4-1
Introduction to Linear Regression
(SW Chapter 4)
Empirical problem: Class size and educational output
• Policy question: What is the effect of reducing class size by one student per class? By 8 students per class?
• What is the right output (performance) measure?
  • parent satisfaction
  • student personal development
  • future adult welfare
  • future adult earnings
  • performance on standardized tests
4-2
What do data say about class sizes and test scores?
The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420)
Variables:
• 5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
• Student-teacher ratio (STR) = no. of students in the district divided by no. of full-time equivalent teachers
4-3
An initial look at the California test score data:
4-4
Do districts with smaller classes (lower STR) have higher test
scores?
4-5
The class size/test score policy question:
• What is the effect on test scores of reducing STR by one student/class?
• Object of policy interest: ΔTest score / ΔSTR
• This is the slope of the line relating test score and STR
4-6
This suggests that we want to draw a line through the
Test Score v. STR scatterplot – but how?
4-7
Some Notation and Terminology
(Sections 4.1 and 4.2)
The population regression line:
Test Score = β0 + β1·STR
β1 = slope of population regression line
   = ΔTest score / ΔSTR
   = change in test score for a unit change in STR
• Why are β0 and β1 "population" parameters?
• We would like to know the population value of β1.
• We don't know β1, so we must estimate it using data.
4-8
How can we estimate β0 and β1 from data?
Recall that Ȳ was the least squares estimator of μY: Ȳ solves
min_m Σ(Yi – m)²   (sum over i = 1,…,n)
By analogy, we will focus on the least squares ("ordinary least squares" or "OLS") estimator of the unknown parameters β0 and β1, which solves
min_{b0,b1} Σ[Yi – (b0 + b1Xi)]²
4-9
The OLS estimator solves:
min_{b0,b1} Σ[Yi – (b0 + b1Xi)]²
• The OLS estimator minimizes the average squared difference between the actual values of Yi and the predictions (predicted values) based on the estimated line.
• This minimization problem can be solved using calculus (App. 4.2).
• The result is the OLS estimators of β0 and β1.
4-10
Why use OLS, rather than some other estimator?
• OLS is a generalization of the sample average: if the "line" is just an intercept (no X), then the OLS estimator is just the sample average of Y1,…,Yn (Ȳ).
• Like Ȳ, the OLS estimator has some desirable properties: under certain assumptions, it is unbiased (that is, E(β̂1) = β1), and it has a tighter sampling distribution than some other candidate estimators of β1 (more on this later)
• Importantly, this is what everyone uses – the common "language" of linear regression.
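The OLS minimization problem has the closed-form solution β̂1 = Σ(Xi – X̄)(Yi – Ȳ)/Σ(Xi – X̄)² and β̂0 = Ȳ – β̂1·X̄ (App. 4.2). A minimal sketch in Python with NumPy (made-up illustrative data, not the California sample):

```python
import numpy as np

def ols(x, y):
    # Closed-form OLS: b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),
    # b0 = ybar - b1 * xbar; these solve min_{b0,b1} sum [y - (b0 + b1 x)]^2.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()
    b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Sanity check: data that lie exactly on a line are recovered exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 5.0 + 2.0 * x
b0, b1 = ols(x, y)
```

With noisy data the same two lines return the least squares fit; everything that follows (standard errors, t-statistics, confidence intervals) is built on these two numbers.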
4-11
4-12
Application to the California Test Score – Class Size data
Estimated slope = β̂1 = –2.28
Estimated intercept = β̂0 = 698.9
Estimated regression line: TestScore = 698.9 – 2.28·STR
4-13
Interpretation of the estimated slope and intercept
TestScore = 698.9 – 2.28·STR
• Districts with one more student per teacher on average have test scores that are 2.28 points lower.
• That is, ΔTest score / ΔSTR = –2.28
• The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9.
• This interpretation of the intercept makes no sense – it extrapolates the line outside the range of the data – in this application, the intercept is not itself economically meaningful.
4-14
Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8
predicted value: ŶAntelope = 698.9 – 2.28×19.33 = 654.8
residual: ûAntelope = 657.8 – 654.8 = 3.0
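The Antelope calculation is just the estimated line evaluated at one district's STR; a sketch with the rounded slide coefficients hard-coded:

```python
# Rounded OLS estimates from the slide
b0, b1 = 698.9, -2.28
str_antelope, score_antelope = 19.33, 657.8

y_hat = b0 + b1 * str_antelope     # predicted value: about 654.8
u_hat = score_antelope - y_hat     # residual: about 3.0
```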
4-15
OLS regression: STATA output
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
TestScore = 698.9 – 2.28 STR
(we’ll discuss the rest of this output later)
4-16
The OLS regression line is an estimate, computed using our sample of data; a different sample would have given a different value of β̂1.
How can we:
• quantify the sampling uncertainty associated with β̂1?
• use β̂1 to test hypotheses such as β1 = 0?
• construct a confidence interval for β1?
Like estimation of the mean, we proceed in four steps:
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals
4-17
1. Probability Framework for Linear Regression
Population
population of interest (ex: all possible school districts)
Random variables: Y, X
Ex: (Test Score, STR)
Joint distribution of (Y,X)
The key feature is that we suppose there is a linear
relation in the population that relates X and Y; this linear
relation is the “population linear regression”
4-18
The Population Linear Regression Model (Section 4.3)
Yi = β0 + β1Xi + ui, i = 1,…, n
• X is the independent variable or regressor
• Y is the dependent variable
• β0 = intercept
• β1 = slope
• ui = "error term"
• The error term consists of omitted factors, or possibly measurement error in the measurement of Y. In general, these omitted factors are factors other than the variable X that influence Y.
4-19
Ex.: The population regression line and the error term
What are some of the omitted factors in this example?
4-20
Data and sampling
The population objects (“parameters”) 0 and 1 are
unknown; so to draw inferences about these unknown
parameters we must collect relevant data.
Simple random sampling:
Choose n entities at random from the population of
interest, and observe (record) X and Y for each entity
Simple random sampling implies that {(Xi, Yi)}, i = 1,…,
n, are independently and identically distributed (i.i.d.).
(Note: (Xi, Yi) are distributed independently of (Xj, Yj) for
different observations i and j.)
4-21
Task at hand: to characterize the sampling distribution of
the OLS estimator. To do so, we make three
assumptions:
The Least Squares Assumptions
1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
2. (Xi, Yi), i = 1,…,n, are i.i.d.
3. X and u have finite fourth moments, that is:
   E(X⁴) < ∞ and E(u⁴) < ∞.
We'll discuss these assumptions in order.
We’ll discuss these assumptions in order.
4-22
Least squares assumption #1: E(u|X = x) = 0.
For any given value of X, the mean of u is zero
4-23
Example: Assumption #1 and the class size example
Test Scorei = 0 + 1STRi + ui, ui = other factors
"Other factors:"
• parental involvement
• outside learning opportunities (extra math class, …)
• home environment conducive to reading
• family income is a useful proxy for many such factors
So E(u|X = x) = 0 means E(Family Income|STR) = constant (which implies that family income and STR are uncorrelated). This assumption is not innocuous! We will return to it often.
4-24
Least squares assumption #2:
(Xi,Yi), i = 1,…,n are i.i.d.
This arises automatically if the entity (individual, district)
is sampled by simple random sampling: the entity is
selected then, for that entity, X and Y are observed
(recorded).
The main place we will encounter non-i.i.d. sampling is
when data are recorded over time (“time series data”) –
this will introduce some extra complications.
4-25
Least squares assumption #3:
E(X⁴) < ∞ and E(u⁴) < ∞
Because Yi = β0 + β1Xi + ui, assumption #3 can equivalently be stated as E(X⁴) < ∞ and E(Y⁴) < ∞.
Assumption #3 is generally plausible. A finite domain of the data implies finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too.)
4-26
1. The probability framework for linear regression
2. Estimation: the Sampling Distribution of β̂1 (Section 4.4)
3. Hypothesis Testing
4. Confidence intervals
Like Ȳ, β̂1 has a sampling distribution.
• What is E(β̂1)? (where is it centered)
• What is var(β̂1)? (measure of sampling uncertainty)
• What is its sampling distribution in small samples?
• What is its sampling distribution in large samples?
4-27
The sampling distribution of β̂1: some algebra:
Yi = β0 + β1Xi + ui
Ȳ = β0 + β1X̄ + ū
so Yi – Ȳ = β1(Xi – X̄) + (ui – ū)
Thus,
β̂1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²   (sums over i = 1,…,n)
   = Σ(Xi – X̄)[β1(Xi – X̄) + (ui – ū)] / Σ(Xi – X̄)²
4-28
β̂1 = Σ(Xi – X̄)[β1(Xi – X̄) + (ui – ū)] / Σ(Xi – X̄)²
   = β1·Σ(Xi – X̄)(Xi – X̄) / Σ(Xi – X̄)² + Σ(Xi – X̄)(ui – ū) / Σ(Xi – X̄)²
so
β̂1 – β1 = Σ(Xi – X̄)(ui – ū) / Σ(Xi – X̄)²
4-29
We can simplify this formula by noting that:
Σ(Xi – X̄)(ui – ū) = Σ(Xi – X̄)ui – ū·Σ(Xi – X̄) = Σ(Xi – X̄)ui
(because Σ(Xi – X̄) = 0).
Thus
β̂1 – β1 = Σ(Xi – X̄)ui / Σ(Xi – X̄)² = [(1/n)·Σvi] / [((n–1)/n)·s²X]
where vi = (Xi – X̄)ui.
4-30
β̂1 – β1 = [(1/n)·Σvi] / [((n–1)/n)·s²X], where vi = (Xi – X̄)ui
We now can calculate the mean and variance of β̂1:
E(β̂1 – β1) = E{ [(1/n)·Σvi] / [((n–1)/n)·s²X] }
            = (n/(n–1))·(1/n)·Σ E(vi/s²X)
4-31
Now E(vi/s²X) = E[(Xi – X̄)ui/s²X] = 0
because E(ui|Xi = x) = 0 (for details see App. 4.3)
Thus, E(β̂1 – β1) = (n/(n–1))·(1/n)·Σ E(vi/s²X) = 0
so
E(β̂1) = β1
That is, β̂1 is an unbiased estimator of β1.
4-32
Calculation of the variance of β̂1:
β̂1 – β1 = [(1/n)·Σvi] / [((n–1)/n)·s²X]
This calculation is simplified by supposing that n is large (so that s²X can be replaced by σ²X); the result is
var(β̂1) = var(v) / (n·(σ²X)²)
(For details see App. 4.3.)
4-33
The exact sampling distribution is complicated, but when the sample size is large we get some simple (and good) approximations:
(1) Because var(β̂1) ∝ 1/n and E(β̂1) = β1, β̂1 →p β1
(2) When n is large, the sampling distribution of β̂1 is well approximated by a normal distribution (CLT)
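Both properties – E(β̂1) = β1, and a sampling spread that shrinks like 1/√n – can be illustrated by Monte Carlo simulation. A sketch with made-up parameters (β0 = 1, β1 = 2), where u is drawn independently of X so that E(u|X = x) = 0 holds by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 2.0, 100, 2000

estimates = []
for _ in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)          # E(u|X = x) = 0 by construction
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    estimates.append(np.sum(xd * (y - y.mean())) / np.sum(xd ** 2))

mean_b1 = float(np.mean(estimates))   # close to beta1 = 2 (unbiasedness)
sd_b1 = float(np.std(estimates))      # about sqrt(var(v)/(n*sigma_X^4)) = 0.1 here
```

A histogram of `estimates` would also look approximately normal, as the CLT result predicts.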
4-34
β̂1 – β1 = [(1/n)·Σvi] / [((n–1)/n)·s²X]
When n is large:
• vi = (Xi – X̄)ui ≈ (Xi – μX)ui, which is i.i.d. (why?) and has two moments, that is, var(vi) < ∞ (why?). Thus (1/n)·Σvi is distributed N(0, var(v)/n) when n is large
• s²X is approximately equal to σ²X when n is large
• (n–1)/n = 1 – 1/n ≈ 1 when n is large
Putting these together we have:
Large-n approximation to the distribution of β̂1:
4-35
β̂1 – β1 = [(1/n)·Σvi] / [((n–1)/n)·s²X] ≈ [(1/n)·Σvi] / σ²X,
which is approximately distributed N(0, σ²v/(n·(σ²X)²)).
Because vi = (Xi – X̄)ui, we can write this as:
β̂1 is approximately distributed N(β1, var[(Xi – μX)ui]/(n·σ⁴X))
4-36
Recall the summary of the sampling distribution of Ȳ:
For (Y1,…,Yn) i.i.d. with 0 < σ²Y < ∞,
• The exact (finite sample) sampling distribution of Ȳ has mean μY ("Ȳ is an unbiased estimator of μY") and variance σ²Y/n
• Other than its mean and variance, the exact distribution of Ȳ is complicated and depends on the distribution of Y
• Ȳ →p μY (law of large numbers)
• (Ȳ – E(Ȳ))/√var(Ȳ) is approximately distributed N(0,1) (CLT)
4-37
Parallel conclusions hold for the OLS estimator β̂1:
Under the three Least Squares Assumptions,
• The exact (finite sample) sampling distribution of β̂1 has mean β1 ("β̂1 is an unbiased estimator of β1"), and var(β̂1) is inversely proportional to n.
• Other than its mean and variance, the exact distribution of β̂1 is complicated and depends on the distribution of (X, u)
• β̂1 →p β1 (law of large numbers)
• (β̂1 – E(β̂1))/√var(β̂1) is approximately distributed N(0,1) (CLT)
4-38
4-39
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing (Section 4.5)
4. Confidence intervals
Suppose a skeptic suggests that reducing the number of
students in a class has no effect on learning or,
specifically, test scores. The skeptic thus asserts the
hypothesis,
H0: 1 = 0
We wish to test this hypothesis using data – reach a
tentative conclusion whether it is correct or incorrect.
4-40
Null hypothesis and two-sided alternative:
H0: β1 = 0 vs. H1: β1 ≠ 0
or, more generally,
H0: β1 = β1,0 vs. H1: β1 ≠ β1,0
where β1,0 is the hypothesized value under the null.
Null hypothesis and one-sided alternative:
H0: β1 = β1,0 vs. H1: β1 < β1,0
In economics, it is almost always possible to come up with stories in which an effect could "go either way," so it is standard to focus on two-sided alternatives.
Recall hypothesis testing for the population mean using Ȳ:
t = (Ȳ – μY,0) / (sY/√n)
then reject the null hypothesis if |t| > 1.96,
where the SE of the estimator is the square root of an estimator of the variance of the estimator.
Applied to a hypothesis about β1:
4-42
t = (estimator – hypothesized value) / (standard error of the estimator)
so
t = (β̂1 – β1,0) / SE(β̂1)
where β1,0 is the value of β1 hypothesized under the null (for example, if the null value is zero, then β1,0 = 0).
What is SE(β̂1)?
SE(β̂1) = the square root of an estimator of the variance of the sampling distribution of β̂1
4-43
Recall the expression for the variance of β̂1 (large n):
var(β̂1) = var[(Xi – μX)ui] / (n·(σ²X)²) = σ²v / (n·σ⁴X)
where vi = (Xi – X̄)ui. Estimator of the variance of β̂1:
σ̂²β̂1 = (1/n) × (estimator of σ²v) / (estimator of σ²X)²
      = (1/n) × [ (1/(n–2))·Σ(Xi – X̄)²ûi² ] / [ (1/n)·Σ(Xi – X̄)² ]².
4-44
σ̂²β̂1 = (1/n) × [ (1/(n–2))·Σ(Xi – X̄)²ûi² ] / [ (1/n)·Σ(Xi – X̄)² ]².
OK, this is a bit nasty, but:
• There is no reason to memorize this
• It is computed automatically by regression software
• SE(β̂1) = √σ̂²β̂1 is reported by regression software
• It is less complicated than it seems. The numerator estimates var(v), the denominator estimates var(X).
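The estimator σ̂²β̂1 translates line-for-line into code. A sketch (the helper name `robust_se_slope` is mine, and the data are simulated, not the California sample):

```python
import numpy as np

def robust_se_slope(x, y):
    # Heteroskedasticity-robust SE of the OLS slope:
    # sqrt( (1/n) * [ (1/(n-2)) * sum (x - xbar)^2 * uhat^2 ]
    #             / [ (1/n) * sum (x - xbar)^2 ]^2 )
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xd = x - x.mean()
    b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
    b0 = y.mean() - b1 * x.mean()
    uhat = y - (b0 + b1 * x)                      # OLS residuals
    num = np.sum(xd ** 2 * uhat ** 2) / (n - 2)   # estimates var(v)
    den = (np.sum(xd ** 2) / n) ** 2              # estimates var(X)^2
    return float(np.sqrt(num / den / n))

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
se = robust_se_slope(x, y)    # roughly 1/sqrt(n) = 0.01 for this design
```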
4-45
Return to calculation of the t-statistic:
t = (β̂1 – β1,0) / SE(β̂1) = (β̂1 – β1,0) / √σ̂²β̂1
• Reject at the 5% significance level if |t| > 1.96
• p-value is p = Pr[|t| > |t^act|] = probability in the tails of the normal distribution outside |t^act|
• Both the previous statements are based on the large-n approximation; typically n = 50 is large enough for the approximation to be excellent.
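Given an estimate and its standard error, the t-statistic and large-n p-value are mechanical; the standard normal CDF can be built from math.erf. A sketch with the slides' rounded numbers hard-coded:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

b1_hat, se_b1, b1_null = -2.28, 0.52, 0.0
t = (b1_hat - b1_null) / se_b1               # about -4.38
p_value = 2.0 * (1.0 - norm_cdf(abs(t)))     # two-sided, large-n normal approx
reject_5pct = abs(t) > 1.96                  # True here
```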
4-46
Example: Test Scores and STR, California data
Estimated regression line: TestScore = 698.9 – 2.28·STR
Regression software reports the standard errors:
SE(β̂0) = 10.4   SE(β̂1) = 0.52
t-statistic testing β1,0 = 0:
t = (β̂1 – β1,0)/SE(β̂1) = (–2.28 – 0)/0.52 = –4.38
• The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level.
• Alternatively, we can compute the p-value…
4-47
The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 (10⁻⁵)
4-48
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals (Section 4.6)
In general, if the sampling distribution of an estimator is normal for large n, then a 95% confidence interval can be constructed as estimator ± 1.96 standard errors.
So: a 95% confidence interval for β1 is
{β̂1 ± 1.96·SE(β̂1)}
4-49
Example: Test Scores and STR, California data
Estimated regression line: TestScore = 698.9 – 2.28·STR
SE(β̂0) = 10.4   SE(β̂1) = 0.52
95% confidence interval for β1:
{β̂1 ± 1.96·SE(β̂1)} = {–2.28 ± 1.96×0.52} = (–3.30, –1.26)
Equivalent statements:
• The 95% confidence interval does not include zero;
• The hypothesis β1 = 0 is rejected at the 5% level
A convention for reporting estimated regressions:
4-50
Put standard errors in parentheses below the estimates:
TestScore = 698.9 – 2.28·STR
            (10.4)  (0.52)
This expression means that:
• The estimated regression line is TestScore = 698.9 – 2.28·STR
• The standard error of β̂0 is 10.4
• The standard error of β̂1 is 0.52
4-51
OLS regression: STATA output
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
so:
TestScore = 698.9 – 2.28STR
(10.4) (0.52)
t (β1 = 0) = –4.38, p-value = 0.000
95% conf. interval for β1 is (–3.30, –1.26)
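The confidence interval in this output is reproduced by the estimator ± 1.96 standard errors rule; a sketch with the rounded slide numbers:

```python
b1_hat, se_b1 = -2.28, 0.52
ci_low = b1_hat - 1.96 * se_b1              # about -3.30
ci_high = b1_hat + 1.96 * se_b1             # about -1.26
contains_zero = ci_low <= 0.0 <= ci_high    # False, so reject beta1 = 0 at 5%
```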
4-52
Regression when X is Binary (Section 4.7)
Sometimes a regressor is binary:
 X = 1 if female, = 0 if male
 X = 1 if treated (experimental drug), = 0 if not
 X = 1 if small class size, = 0 if not
So far, β1 has been called a "slope," but that doesn't make much sense if X is binary.
How do we interpret regression with a binary regressor?
4-53
Yi = β0 + β1Xi + ui, where X is binary (Xi = 0 or 1):
• When Xi = 0: Yi = β0 + ui
• When Xi = 1: Yi = β0 + β1 + ui
thus:
• When Xi = 0, the mean of Yi is β0
• When Xi = 1, the mean of Yi is β0 + β1
that is:
• E(Yi|Xi = 0) = β0
• E(Yi|Xi = 1) = β0 + β1
so:
β1 = E(Yi|Xi = 1) – E(Yi|Xi = 0) = population difference in group means
4-54
Example: TestScore and STR, California data
Let Di = 1 if STRi < 20, and Di = 0 if STRi ≥ 20.
The OLS estimate of the regression line relating TestScore to D (with standard errors in parentheses) is:
TestScore = 650.0 + 7.4·D
            (1.3)   (1.8)
Difference in means between groups = 7.4;
SE = 1.8, t = 7.4/1.8 = 4.0
4-55
Compare the regression results with the group means, computed directly:

Class Size          Average score (Ȳ)   Std. dev. (sY)   N
Small (STR < 20)    657.4               19.4             238
Large (STR ≥ 20)    650.0               17.9             182

Estimation: Ȳsmall – Ȳlarge = 657.4 – 650.0 = 7.4
Test whether the difference in means = 0:
t = (Ȳs – Ȳl) / SE(Ȳs – Ȳl) = 7.4/1.83 = 4.05
95% confidence interval = {7.4 ± 1.96×1.83} = (3.8, 11.0)
This is the same as in the regression!
TestScore = 650.0 + 7.4·D
            (1.3)   (1.8)
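The exact equivalence between the dummy-variable regression and the difference in group means is easy to verify numerically; a sketch with simulated data (made-up group means, not the California sample):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
d = (rng.random(n) < 0.5).astype(float)               # binary regressor
y = 650.0 + 7.0 * d + rng.normal(scale=18.0, size=n)  # made-up group means

# OLS of y on d (with intercept)
dd = d - d.mean()
b1 = np.sum(dd * (y - y.mean())) / np.sum(dd ** 2)
b0 = y.mean() - b1 * d.mean()

diff_means = y[d == 1].mean() - y[d == 0].mean()   # equals b1 exactly
mean_group0 = y[d == 0].mean()                     # equals b0 exactly
```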
4-56
Summary: regression when Xi is binary (0/1)
Yi = β0 + β1Xi + ui
• β0 = mean of Y given that X = 0
• β0 + β1 = mean of Y given that X = 1
• β1 = difference in group means, X = 1 minus X = 0
• SE(β̂1) has the usual interpretation
• t-statistics and confidence intervals are constructed as usual
• This is another way to do difference-in-means analysis
• The regression formulation is especially useful when we have additional regressors (coming up soon…)
4-57
Other Regression Statistics (Section 4.8)
A natural question is how well the regression line "fits" or explains the data. There are two regression statistics that provide complementary measures of the quality of fit:
• The regression R² measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit)
• The standard error of the regression measures the fit – the typical size of a regression residual – in the units of Y.
4-58
The R²
Write Yi as the sum of the OLS prediction plus the OLS residual:
Yi = Ŷi + ûi
The R² is the fraction of the sample variance of Yi "explained" by the regression, that is, by Ŷi:
R² = ESS/TSS,
where ESS = Σ(Ŷi – Ȳ)² and TSS = Σ(Yi – Ȳ)²   (sums over i = 1,…,n).
4-59
R² = ESS/TSS, where ESS = Σ(Ŷi – Ȳ)² and TSS = Σ(Yi – Ȳ)²
The R²:
• R² = 0 means ESS = 0, so X explains none of the variation of Y
• R² = 1 means ESS = TSS, so Yi = Ŷi and X explains all of the variation of Y
• 0 ≤ R² ≤ 1
• For regression with a single regressor (the case here), R² is the square of the correlation coefficient between X and Y
4-60
The Standard Error of the Regression (SER)
The standard error of the regression is (almost) the sample standard deviation of the OLS residuals:
SER = √[ (1/(n–2))·Σ(ûi – mean(ûi))² ] = √[ (1/(n–2))·Σûi² ]
(the second equality holds because mean(ûi) = (1/n)·Σûi = 0).
4-61
SER = √[ (1/(n–2))·Σûi² ]
The SER:
• has the units of u, which are the units of Y
• measures the spread of the distribution of u
• measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression line)
• The root mean squared error (RMSE) is closely related to the SER:
RMSE = √[ (1/n)·Σûi² ]
This measures the same thing as the SER – the minor difference is division by n instead of n–2.
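Both fit measures follow directly from the OLS residuals, and the decomposition behind R² (explained plus residual sums of squares adding to the total) is checkable. A sketch with simulated data whose design implies a population R² of about 0.8:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

ess = np.sum((y_hat - y_hat.mean()) ** 2)    # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)            # total sum of squares
r2 = ess / tss
ser = np.sqrt(np.sum(u_hat ** 2) / (n - 2))  # standard error of the regression
```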
4-62
Technical note: why divide by n–2 instead of n–1?
SER = √[ (1/(n–2))·Σûi² ]
• Division by n–2 is a "degrees of freedom" correction like division by n–1 in s²Y; the difference is that, in the SER, two parameters have been estimated (β0 and β1, by β̂0 and β̂1), whereas in s²Y only one has been estimated (μY, by Ȳ).
• When n is large, it makes negligible difference whether n, n–1, or n–2 is used – although the conventional formula uses n–2 when there is a single regressor.
• For details, see Section 15.4
4-63
Example of R² and SER
TestScore = 698.9 – 2.28·STR,  R² = .05, SER = 18.6
            (10.4)  (0.52)
The slope coefficient is statistically significant and large
in a policy sense, even though STR explains only a small
fraction of the variation in test scores.
4-64
A Practical Note: Heteroskedasticity, Homoskedasticity, and the Formula for the Standard Errors of β̂0 and β̂1 (Section 4.9)
• What do these two terms mean?
• Consequences of homoskedasticity
• Implication for computing standard errors
What do these two terms mean?
If var(u|X=x) is constant – that is, the variance of the
conditional distribution of u given X does not depend on
X, then u is said to be homoskedastic. Otherwise, u is
said to be heteroskedastic.
4-65
Homoskedasticity in a picture:
• E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
• The variance of u does not change with (depend on) x
4-66
Heteroskedasticity in a picture:
• E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
• The variance of u depends on x – so u is heteroskedastic.
4-67
A real-world example of heteroskedasticity from labor economics: average hourly earnings vs. years of education (data source: 1999 Current Population Survey)
[Figure: scatterplot of average hourly earnings against years of education, with the OLS regression line; the spread of earnings widens at higher education levels]
4-68
Is heteroskedasticity present in the class size data?
Hard to say…looks nearly homoskedastic, but the spread
might be tighter for large values of STR.
4-69
So far we have (without saying so) allowed u to be heteroskedastic:
Recall the three least squares assumptions:
1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
2. (Xi,Yi), i = 1,…,n, are i.i.d.
3. X and u have finite fourth moments.
Heteroskedasticity and homoskedasticity concern
var(u|X=x). Because we have not explicitly assumed
homoskedastic errors, we have implicitly allowed for
heteroskedasticity.
4-70
What if the errors are in fact homoskedastic?
• You can prove some theorems about OLS (in particular, the Gauss-Markov theorem, which says that OLS is the estimator with the lowest variance among all estimators that are linear functions of (Y1,…,Yn); see Section 15.5).
• The formula for the variance of β̂1 and the OLS standard error simplifies (App. 4.4): If var(ui|Xi = x) = σ²u, then
var(β̂1) = var[(Xi – μX)ui] / (n·(σ²X)²) = … = σ²u / (n·σ²X)
Note: var(β̂1) is inversely proportional to var(X): more spread in X means more information about β̂1.
4-71
The general formula for the standard error of β̂1 is the square root of:
σ̂²β̂1 = (1/n) × [ (1/(n–2))·Σ(Xi – X̄)²ûi² ] / [ (1/n)·Σ(Xi – X̄)² ]².
Special case under homoskedasticity:
σ̂²β̂1 = (1/n) × [ (1/(n–2))·Σûi² ] / [ (1/n)·Σ(Xi – X̄)² ].
Sometimes it is said that the lower formula is simpler.
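When the errors really are homoskedastic, the two formulas should agree for large n; a quick simulation check (a sketch; simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=n)
u = rng.normal(size=n)        # homoskedastic: var(u|X = x) is constant
y = 1.0 + 2.0 * x + u

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
uhat = y - (y.mean() - b1 * x.mean()) - b1 * x

# Heteroskedasticity-robust variance estimator of the slope
var_robust = (np.sum(xd ** 2 * uhat ** 2) / (n - 2)) / ((np.sum(xd ** 2) / n) ** 2) / n
# Homoskedasticity-only variance estimator
var_homo = (np.sum(uhat ** 2) / (n - 2)) / np.sum(xd ** 2)

ratio = var_robust / var_homo    # close to 1 under homoskedasticity
```

Rerunning with, say, `u = rng.normal(size=n) * np.abs(x)` makes the errors heteroskedastic, and the ratio moves well away from 1 – which is exactly why the robust formula is the safe default.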
4-72
The homoskedasticity-only formula for the standard error
of 1
ˆ
 and the “heteroskedasticity-robust” formula (the
formula that is valid under heteroskedasticity) differ – in
general, you get different standard errors using the
different formulas.
Homoskedasticity-only standard errors are the
default setting in regression software –
sometimes the only setting (e.g. Excel). To get
the general “heteroskedasticity-robust”
standard errors you must override the default.
If you don’t override the default and there is in fact
heteroskedasticity, you will get the wrong standard errors
(and wrong t-statistics and confidence intervals).
4-73
The critical points:
• If the errors are homoskedastic and you use the heteroskedastic formula for standard errors (the one we derived), you are OK
• If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, the standard errors are wrong.
• The two formulas coincide (when n is large) in the special case of homoskedasticity
• The bottom line: you should always use the heteroskedasticity-based formulas – these are conventionally called the heteroskedasticity-robust standard errors.
4-74
Heteroskedasticity-robust standard errors in STATA
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
Use the “, robust” option!!!
4-75
Summary and Assessment (Section 4.10)
• The initial policy question:
  Suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (this "treatment") on test scores?
• Does our regression analysis give a convincing answer? Not really – districts with low STR tend to be ones with lots of other resources and higher-income families, which provide kids with more learning opportunities outside school… this suggests that corr(ui, STRi) > 0, so E(ui|Xi) ≠ 0.
4-76
Digression on Causality
The original question (what is the quantitative effect of an intervention that reduces class size?) is a question about a causal effect: the effect on Y of applying a unit of the treatment is β1.
• But what is, precisely, a causal effect?
• The common-sense definition of causality isn't precise enough for our purposes.
• In this course, we define a causal effect as the effect that is measured in an ideal randomized controlled experiment.
4-77
Ideal Randomized Controlled Experiment
• Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in reporting, etc.!
• Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors)
• Controlled: having a control group permits measuring the differential effect of the treatment
• Experiment: the treatment is assigned as part of the experiment; the subjects have no choice, which means that there is no "reverse causality" in which subjects choose the treatment they think will work best.
4-78
Back to class size:
• What is an ideal randomized controlled experiment for measuring the effect on Test Score of reducing STR?
• How does our regression analysis of observational data differ from this ideal?
  o The treatment is not randomly assigned
  o In the US – in our observational data – districts with higher family incomes are likely to have both smaller classes and higher test scores.
  o As a result it is plausible that E(ui|Xi = x) ≠ 0.
  o If so, Least Squares Assumption #1 does not hold.
  o If so, β̂1 is biased: does an omitted factor make class size seem more important than it really is?
Ch 10 Slides.doc546555544554555455555777777
 
Ch 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of businessCh 12 Slides.doc. Introduction of science of business
Ch 12 Slides.doc. Introduction of science of business
 
Ch 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modelingCh 11 Slides.doc. Introduction to econometric modeling
Ch 11 Slides.doc. Introduction to econometric modeling
 
Chapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric bookChapter_1_Intro.pptx. Introductory econometric book
Chapter_1_Intro.pptx. Introductory econometric book
 

Recently uploaded

do's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Jobdo's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of JobRemote DBA Services
 
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...Suhani Kapoor
 
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big BoodyDubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boodykojalkojal131
 
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service Bhiwandi
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service BhiwandiVIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service Bhiwandi
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service BhiwandiSuhani Kapoor
 
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Employee of the Month - Samsung Semiconductor India Research
Employee of the Month - Samsung Semiconductor India ResearchEmployee of the Month - Samsung Semiconductor India Research
Employee of the Month - Samsung Semiconductor India ResearchSoham Mondal
 
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen Dating
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen DatingDubai Call Girls Starlet O525547819 Call Girls Dubai Showen Dating
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen Datingkojalkojal131
 
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...Suhani Kapoor
 
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceanilsa9823
 
The Impact of Socioeconomic Status on Education.pdf
The Impact of Socioeconomic Status on Education.pdfThe Impact of Socioeconomic Status on Education.pdf
The Impact of Socioeconomic Status on Education.pdftheknowledgereview1
 
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home Made
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home MadeDubai Call Girls Naija O525547819 Call Girls In Dubai Home Made
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home Madekojalkojal131
 
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...shivangimorya083
 
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证女王大学硕士毕业证成绩单(加急办理)认证海外毕业证
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证obuhobo
 
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...Suhani Kapoor
 
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...Niya Khan
 
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...Suhani Kapoor
 
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012rehmti665
 
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...Suhani Kapoor
 
Internshala Student Partner 6.0 Jadavpur University Certificate
Internshala Student Partner 6.0 Jadavpur University CertificateInternshala Student Partner 6.0 Jadavpur University Certificate
Internshala Student Partner 6.0 Jadavpur University CertificateSoham Mondal
 

Recently uploaded (20)

do's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Jobdo's and don'ts in Telephone Interview of Job
do's and don'ts in Telephone Interview of Job
 
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...
VIP Russian Call Girls in Bhilai Deepika 8250192130 Independent Escort Servic...
 
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big BoodyDubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
 
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service Bhiwandi
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service BhiwandiVIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service Bhiwandi
VIP Call Girl Bhiwandi Aashi 8250192130 Independent Escort Service Bhiwandi
 
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Employee of the Month - Samsung Semiconductor India Research
Employee of the Month - Samsung Semiconductor India ResearchEmployee of the Month - Samsung Semiconductor India Research
Employee of the Month - Samsung Semiconductor India Research
 
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen Dating
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen DatingDubai Call Girls Starlet O525547819 Call Girls Dubai Showen Dating
Dubai Call Girls Starlet O525547819 Call Girls Dubai Showen Dating
 
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...
VIP Call Girls Firozabad Aaradhya 8250192130 Independent Escort Service Firoz...
 
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Gosainganj Lucknow best sexual service
 
The Impact of Socioeconomic Status on Education.pdf
The Impact of Socioeconomic Status on Education.pdfThe Impact of Socioeconomic Status on Education.pdf
The Impact of Socioeconomic Status on Education.pdf
 
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home Made
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home MadeDubai Call Girls Naija O525547819 Call Girls In Dubai Home Made
Dubai Call Girls Naija O525547819 Call Girls In Dubai Home Made
 
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
 
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证女王大学硕士毕业证成绩单(加急办理)认证海外毕业证
女王大学硕士毕业证成绩单(加急办理)认证海外毕业证
 
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...
VIP Call Girls Service Jamshedpur Aishwarya 8250192130 Independent Escort Ser...
 
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...
Neha +91-9537192988-Friendly Ahmedabad Call Girls has Complete Authority for ...
 
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...
VIP Call Girls in Jamshedpur Aarohi 8250192130 Independent Escort Service Jam...
 
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012
Call Girls Mukherjee Nagar Delhi reach out to us at ☎ 9711199012
 
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...
VIP Call Girls Service Cuttack Aishwarya 8250192130 Independent Escort Servic...
 
Internshala Student Partner 6.0 Jadavpur University Certificate
Internshala Student Partner 6.0 Jadavpur University CertificateInternshala Student Partner 6.0 Jadavpur University Certificate
Internshala Student Partner 6.0 Jadavpur University Certificate
 

Ch 4 Slides.doc

4-9
The OLS estimator solves:

  min over (b₀, b₁) of Σᵢ₌₁ⁿ [Yᵢ – (b₀ + b₁Xᵢ)]²

• The OLS estimator minimizes the average squared difference between the actual values of Yᵢ and the prediction (predicted value) based on the estimated line.
• This minimization problem can be solved using calculus (App. 4.2).
• The result is the OLS estimators of β₀ and β₁.
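The calculus solution to this minimization problem has a simple closed form, which can be sketched in a few lines of Python. The data here are made-up numbers standing in for (STR, test score) pairs, not the California data set.

```python
import numpy as np

# Hypothetical toy data standing in for (STR, TestScore) pairs.
X = np.array([15.0, 17.0, 19.0, 21.0, 23.0, 25.0])
Y = np.array([680.0, 672.0, 665.0, 654.0, 650.0, 644.0])

# OLS solves min over (b0, b1) of sum_i [Y_i - (b0 + b1*X_i)]^2.
# The closed-form solution (App. 4.2):
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0_hat = Y.mean() - b1_hat * X.mean()
```

The same numbers come out of any least-squares routine (e.g. `np.polyfit(X, Y, 1)`), since they solve the same minimization.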
4-10
Why use OLS, rather than some other estimator?
• OLS is a generalization of the sample average: if the “line” is just an intercept (no X), then the OLS estimator is just the sample average of Y₁, …, Yₙ (that is, Ȳ).
• Like Ȳ, the OLS estimator has some desirable properties: under certain assumptions, it is unbiased (that is, E(β̂₁) = β₁), and it has a tighter sampling distribution than some other candidate estimators of β₁ (more on this later).
• Importantly, this is what everyone uses – the common “language” of linear regression.

4-11
4-12
Application to the California Test Score – Class Size data

Estimated slope = β̂₁ = –2.28
Estimated intercept = β̂₀ = 698.9
Estimated regression line: TestScore = 698.9 – 2.28·STR

4-13
Interpretation of the estimated slope and intercept

  TestScore = 698.9 – 2.28·STR

• Districts with one more student per teacher on average have test scores that are 2.28 points lower.
• That is, ΔTest score/ΔSTR = –2.28.
• The intercept (taken literally) means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9.
• This interpretation of the intercept makes no sense – it extrapolates the line outside the range of the data – in this application, the intercept is not itself economically meaningful.

4-14
Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for which STR = 19.33 and Test Score = 657.8.

  predicted value: Ŷ_Antelope = 698.9 – 2.28×19.33 = 654.8
  residual: û_Antelope = 657.8 – 654.8 = 3.0
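The Antelope arithmetic can be checked directly; the coefficients are the slide's estimates and the district values are as reported above.

```python
# Predicted value and residual for one observation, using the
# estimated line TestScore = 698.9 - 2.28*STR from the slides.
b0_hat, b1_hat = 698.9, -2.28
STR_antelope, score_antelope = 19.33, 657.8

y_hat = b0_hat + b1_hat * STR_antelope   # predicted value: 654.8 (to 1 dp)
u_hat = score_antelope - y_hat           # residual: 3.0 (to 1 dp)
```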
4-15
OLS regression: STATA output

regress testscr str, robust

Regression with robust standard errors           Number of obs =     420
                                                 F(  1,   418) =   19.26
                                                 Prob > F      =  0.0000
                                                 R-squared     =  0.0512
                                                 Root MSE      =  18.581
-------------------------------------------------------------------------
        |               Robust
testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------+----------------------------------------------------------------
    str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
  _cons |    698.933   10.36436    67.44   0.000     678.5602   719.3057
-------------------------------------------------------------------------

  TestScore = 698.9 – 2.28·STR

(we’ll discuss the rest of this output later)

4-16
The OLS regression line is an estimate, computed using our sample of data; a different sample would have given a different value of β̂₁.
How can we:
• quantify the sampling uncertainty associated with β̂₁?
• use β̂₁ to test hypotheses such as β₁ = 0?
• construct a confidence interval for β₁?
Like estimation of the mean, we proceed in four steps:
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals
4-17
1. Probability Framework for Linear Regression

Population
• population of interest (ex: all possible school districts)

Random variables: Y, X
• Ex: (Test Score, STR)

Joint distribution of (Y, X)
• The key feature is that we suppose there is a linear relation in the population that relates X and Y; this linear relation is the “population linear regression”

4-18
The Population Linear Regression Model (Section 4.3)

  Yᵢ = β₀ + β₁Xᵢ + uᵢ, i = 1, …, n

• X is the independent variable or regressor
• Y is the dependent variable
• β₀ = intercept
• β₁ = slope
• uᵢ = “error term”
• The error term consists of omitted factors, or possibly measurement error in the measurement of Y. In general, these omitted factors are factors other than the variable X that influence Y.

4-19
Ex.: The population regression line and the error term
What are some of the omitted factors in this example?

4-20
Data and sampling
The population objects (“parameters”) β₀ and β₁ are unknown; so to draw inferences about these unknown parameters we must collect relevant data.
Simple random sampling: Choose n entities at random from the population of interest, and observe (record) X and Y for each entity.
Simple random sampling implies that {(Xᵢ, Yᵢ)}, i = 1, …, n, are independently and identically distributed (i.i.d.). (Note: (Xᵢ, Yᵢ) are distributed independently of (Xⱼ, Yⱼ) for different observations i and j.)

4-21
Task at hand: to characterize the sampling distribution of the OLS estimator. To do so, we make three assumptions:

The Least Squares Assumptions
1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
2. (Xᵢ, Yᵢ), i = 1, …, n, are i.i.d.
3. X and u have four finite moments, that is: E(X⁴) < ∞ and E(u⁴) < ∞.

We’ll discuss these assumptions in order.
4-22
Least squares assumption #1: E(u|X = x) = 0.
For any given value of X, the mean of u is zero.

4-23
Example: Assumption #1 and the class size example

  Test Scoreᵢ = β₀ + β₁STRᵢ + uᵢ, uᵢ = other factors

“Other factors:”
• parental involvement
• outside learning opportunities (extra math class, …)
• home environment conducive to reading
• family income is a useful proxy for many such factors
So E(u|X = x) = 0 means E(Family Income|STR) = constant (which implies that family income and STR are uncorrelated). This assumption is not innocuous! We will return to it often.

4-24
Least squares assumption #2: (Xᵢ, Yᵢ), i = 1, …, n are i.i.d.
This arises automatically if the entity (individual, district) is sampled by simple random sampling: the entity is selected, then, for that entity, X and Y are observed (recorded).
The main place we will encounter non-i.i.d. sampling is when data are recorded over time (“time series data”) – this will introduce some extra complications.

4-25
Least squares assumption #3: E(X⁴) < ∞ and E(u⁴) < ∞
Because Yᵢ = β₀ + β₁Xᵢ + uᵢ, assumption #3 can equivalently be stated as E(X⁴) < ∞ and E(Y⁴) < ∞.
Assumption #3 is generally plausible. A bounded domain of the data implies finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too.)
4-26
1. The probability framework for linear regression
2. Estimation: the Sampling Distribution of β̂₁ (Section 4.4)
3. Hypothesis Testing
4. Confidence intervals

Like Ȳ, β̂₁ has a sampling distribution.
• What is E(β̂₁)? (where is it centered)
• What is var(β̂₁)? (measure of sampling uncertainty)
• What is its sampling distribution in small samples?
• What is its sampling distribution in large samples?

4-27
The sampling distribution of β̂₁: some algebra:

  Yᵢ = β₀ + β₁Xᵢ + uᵢ
  Ȳ = β₀ + β₁X̄ + ū

so Yᵢ – Ȳ = β₁(Xᵢ – X̄) + (uᵢ – ū). Thus,

  β̂₁ = Σᵢ₌₁ⁿ (Xᵢ – X̄)(Yᵢ – Ȳ) / Σᵢ₌₁ⁿ (Xᵢ – X̄)²
      = Σᵢ₌₁ⁿ (Xᵢ – X̄)[β₁(Xᵢ – X̄) + (uᵢ – ū)] / Σᵢ₌₁ⁿ (Xᵢ – X̄)²

4-28
  β̂₁ = Σᵢ (Xᵢ – X̄)[β₁(Xᵢ – X̄) + (uᵢ – ū)] / Σᵢ (Xᵢ – X̄)²
      = β₁·[Σᵢ (Xᵢ – X̄)(Xᵢ – X̄) / Σᵢ (Xᵢ – X̄)²] + Σᵢ (Xᵢ – X̄)(uᵢ – ū) / Σᵢ (Xᵢ – X̄)²

so

  β̂₁ – β₁ = Σᵢ (Xᵢ – X̄)(uᵢ – ū) / Σᵢ (Xᵢ – X̄)²

4-29
We can simplify this formula by noting that:

  Σᵢ (Xᵢ – X̄)(uᵢ – ū) = Σᵢ (Xᵢ – X̄)uᵢ – [Σᵢ (Xᵢ – X̄)]·ū = Σᵢ (Xᵢ – X̄)uᵢ.

Thus

  β̂₁ – β₁ = Σᵢ (Xᵢ – X̄)uᵢ / Σᵢ (Xᵢ – X̄)² = [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X]

where vᵢ = (Xᵢ – X̄)uᵢ.

4-30
  β̂₁ – β₁ = [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X], where vᵢ = (Xᵢ – X̄)uᵢ

We now can calculate the mean and variance of β̂₁:

  E(β̂₁ – β₁) = E{ [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X] } = (n/(n–1))·E[ (1/n) Σᵢ (vᵢ/s²_X) ]

4-31
Now E(vᵢ/s²_X) = E[(Xᵢ – X̄)uᵢ/s²_X] = 0 because E(uᵢ|Xᵢ = x) = 0 (for details see App. 4.3).

Thus

  E(β̂₁ – β₁) = (n/(n–1))·E[ (1/n) Σᵢ (vᵢ/s²_X) ] = 0

so E(β̂₁) = β₁.
That is, β̂₁ is an unbiased estimator of β₁.
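Unbiasedness can be illustrated with a small Monte Carlo sketch: draw many samples from a population with a known slope, estimate β₁ by OLS in each, and check that the estimates average out to the true value. All numbers here (true coefficients, n, error standard deviation) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 700.0, -2.0   # made-up "true" population parameters
n, reps = 100, 2000

slopes = np.empty(reps)
for r in range(reps):
    X = rng.uniform(14, 26, n)
    u = rng.normal(0, 10, n)          # E(u|X) = 0 holds by construction
    Y = beta0 + beta1 * X + u
    slopes[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

mean_slope = slopes.mean()            # close to beta1: the estimator is unbiased
```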
4-32
Calculation of the variance of β̂₁:

  β̂₁ – β₁ = [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X]

This calculation is simplified by supposing that n is large (so that s²_X can be replaced by σ²_X); the result is

  var(β̂₁) = var(v) / [n·(σ²_X)²]

(For details see App. 4.3.)

4-33
The exact sampling distribution is complicated, but when the sample size is large we get some simple (and good) approximations:
(1) Because var(β̂₁) ∝ 1/n and E(β̂₁) = β₁, β̂₁ →p β₁
(2) When n is large, the sampling distribution of β̂₁ is well approximated by a normal distribution (CLT)

4-34
  β̂₁ – β₁ = [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X]

When n is large:
• vᵢ = (Xᵢ – X̄)uᵢ ≈ (Xᵢ – μ_X)uᵢ, which is i.i.d. (why?) and has two moments, that is, var(vᵢ) < ∞ (why?). Thus (1/n) Σᵢ vᵢ is distributed N(0, var(v)/n) when n is large.
• s²_X is approximately equal to σ²_X when n is large.
• (n–1)/n = 1 – 1/n ≈ 1 when n is large.
Putting these together we have the large-n approximation to the distribution of β̂₁:

4-35
  β̂₁ – β₁ = [(1/n) Σᵢ vᵢ] / [((n–1)/n)·s²_X] ≈ [(1/n) Σᵢ vᵢ] / σ²_X,

which is approximately distributed N(0, σ²_v / [n·(σ²_X)²]).
Because vᵢ = (Xᵢ – X̄)uᵢ, we can write this as:

  β̂₁ is approximately distributed N(β₁, var[(Xᵢ – μ_X)uᵢ] / (n·σ⁴_X))

4-36
Recall the summary of the sampling distribution of Ȳ: For (Y₁, …, Yₙ) i.i.d. with 0 < σ²_Y < ∞,
• The exact (finite sample) sampling distribution of Ȳ has mean μ_Y (“Ȳ is an unbiased estimator of μ_Y”) and variance σ²_Y/n
• Other than its mean and variance, the exact distribution of Ȳ is complicated and depends on the distribution of Y
• Ȳ →p μ_Y (law of large numbers)
• (Ȳ – E(Ȳ))/√var(Ȳ) is approximately distributed N(0, 1) (CLT)

4-37
Parallel conclusions hold for the OLS estimator β̂₁: Under the three Least Squares Assumptions,
• The exact (finite sample) sampling distribution of β̂₁ has mean β₁ (“β̂₁ is an unbiased estimator of β₁”), and var(β̂₁) is inversely proportional to n.
• Other than its mean and variance, the exact distribution of β̂₁ is complicated and depends on the distribution of (X, u)
• β̂₁ →p β₁ (law of large numbers)
• (β̂₁ – E(β̂₁))/√var(β̂₁) is approximately distributed N(0, 1) (CLT)

4-38
4-39
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing (Section 4.5)
4. Confidence intervals

Suppose a skeptic suggests that reducing the number of students in a class has no effect on learning or, specifically, test scores. The skeptic thus asserts the hypothesis

  H₀: β₁ = 0

We wish to test this hypothesis using data – reach a tentative conclusion whether it is correct or incorrect.

4-40
Null hypothesis and two-sided alternative:

  H₀: β₁ = 0 vs. H₁: β₁ ≠ 0

or, more generally,

  H₀: β₁ = β₁,₀ vs. H₁: β₁ ≠ β₁,₀

where β₁,₀ is the hypothesized value under the null.

Null hypothesis and one-sided alternative:

  H₀: β₁ = β₁,₀ vs. H₁: β₁ < β₁,₀

In economics, it is almost always possible to come up with stories in which an effect could “go either way,” so it is standard to focus on two-sided alternatives.

Recall hypothesis testing for the population mean using Ȳ:

4-41
  t = (Ȳ – μ_Y,0) / (s_Y/√n)

then reject the null hypothesis if |t| > 1.96, where the SE of the estimator is the square root of an estimator of the variance of the estimator.

Applied to a hypothesis about β₁:

4-42
  t = (estimator – hypothesized value) / (standard error of the estimator)

so

  t = (β̂₁ – β₁,₀) / SE(β̂₁)

where β₁,₀ is the value of β₁ hypothesized under the null (for example, if the null value is zero, then β₁,₀ = 0).

What is SE(β̂₁)?
SE(β̂₁) = the square root of an estimator of the variance of the sampling distribution of β̂₁.

4-43
Recall the expression for the variance of β̂₁ (large n):

  var(β̂₁) = var[(Xᵢ – μ_X)uᵢ] / [n·(σ²_X)²] = σ²_v / (n·σ⁴_X)

where vᵢ = (Xᵢ – X̄)uᵢ. Estimator of the variance of β̂₁:

  σ̂²_β̂₁ = (1/n) × (estimator of σ²_v) / (estimator of σ²_X)²
        = (1/n) × [ (1/(n–2)) Σᵢ (Xᵢ – X̄)²ûᵢ² ] / [ (1/n) Σᵢ (Xᵢ – X̄)² ]²
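The variance estimator above translates directly into code. This is a sketch on simulated data; the population parameters and sample size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(14, 26, n)                     # made-up regressor values
Y = 700 - 2.0 * X + rng.normal(0, 10, n)       # made-up population line + noise

# OLS fit
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
u_hat = Y - (b0 + b1 * X)                      # residuals

# Robust variance estimator from the slide:
# (1/n) * [ (1/(n-2)) sum (Xi - Xbar)^2 uhat_i^2 ] / [ (1/n) sum (Xi - Xbar)^2 ]^2
num = np.sum((X - X.mean()) ** 2 * u_hat ** 2) / (n - 2)   # estimates var(v)
den = (np.sum((X - X.mean()) ** 2) / n) ** 2               # estimates var(X)^2
var_b1 = num / den / n
se_b1 = np.sqrt(var_b1)

t_stat = b1 / se_b1                             # t-statistic for H0: beta1 = 0
```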
4-44
  σ̂²_β̂₁ = (1/n) × [ (1/(n–2)) Σᵢ (Xᵢ – X̄)²ûᵢ² ] / [ (1/n) Σᵢ (Xᵢ – X̄)² ]²

OK, this is a bit nasty, but:
• There is no reason to memorize this
• It is computed automatically by regression software
• SE(β̂₁) = √σ̂²_β̂₁ is reported by regression software
• It is less complicated than it seems. The numerator estimates var(v), the denominator estimates var(X).

4-45
Return to calculation of the t-statistic:

  t = (β̂₁ – β₁,₀) / SE(β̂₁) = (β̂₁ – β₁,₀) / √σ̂²_β̂₁

• Reject at the 5% significance level if |t| > 1.96
• The p-value is p = Pr[|t| > |tᵃᶜᵗ|] = probability in the tails of the normal distribution outside |tᵃᶜᵗ|
• Both the previous statements are based on the large-n approximation; typically n = 50 is large enough for the approximation to be excellent.

4-46
Example: Test Scores and STR, California data
Estimated regression line: TestScore = 698.9 – 2.28·STR
Regression software reports the standard errors:

  SE(β̂₀) = 10.4   SE(β̂₁) = 0.52

t-statistic testing β₁,₀ = 0:

  t = (β̂₁ – β₁,₀)/SE(β̂₁) = (–2.28 – 0)/0.52 = –4.38

• The 1% two-sided critical value is 2.58, so we reject the null at the 1% significance level.
• Alternatively, we can compute the p-value…

4-47
The p-value based on the large-n standard normal approximation to the t-statistic is 0.00001 (10⁻⁵).
4-48
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals (Section 4.6)

In general, if the sampling distribution of an estimator is normal for large n, then a 95% confidence interval can be constructed as estimator ± 1.96 standard errors. So: a 95% confidence interval for β₁ is

  {β̂₁ ± 1.96·SE(β̂₁)}

4-49
Example: Test Scores and STR, California data
Estimated regression line: TestScore = 698.9 – 2.28·STR

  SE(β̂₀) = 10.4   SE(β̂₁) = 0.52

95% confidence interval for β₁:

  {β̂₁ ± 1.96·SE(β̂₁)} = {–2.28 ± 1.96×0.52} = (–3.30, –1.26)

Equivalent statements:
• The 95% confidence interval does not include zero;
• The hypothesis β₁ = 0 is rejected at the 5% level.

A convention for reporting estimated regressions:
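The interval arithmetic is worth checking once by hand, using the slide's reported estimate and standard error:

```python
# 95% confidence interval as estimator +/- 1.96 * SE, with the
# California regression numbers reported on the slide.
b1_hat, se_b1 = -2.28, 0.52

ci_lower = b1_hat - 1.96 * se_b1   # -3.30 (to 2 dp)
ci_upper = b1_hat + 1.96 * se_b1   # -1.26 (to 2 dp); interval excludes zero
```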
4-50
Put standard errors in parentheses below the estimates:

  TestScore = 698.9 – 2.28·STR
              (10.4)   (0.52)

This expression means that:
• The estimated regression line is TestScore = 698.9 – 2.28·STR
• The standard error of β̂₀ is 10.4
• The standard error of β̂₁ is 0.52

4-51
OLS regression: STATA output

regress testscr str, robust

Regression with robust standard errors           Number of obs =     420
                                                 F(  1,   418) =   19.26
                                                 Prob > F      =  0.0000
                                                 R-squared     =  0.0512
                                                 Root MSE      =  18.581
-------------------------------------------------------------------------
        |               Robust
testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------+----------------------------------------------------------------
    str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
  _cons |    698.933   10.36436    67.44   0.000     678.5602   719.3057
-------------------------------------------------------------------------

so:
  TestScore = 698.9 – 2.28·STR
              (10.4)   (0.52)

  t (β₁ = 0) = –4.38, p-value = 0.000
  95% conf. interval for β₁ is (–3.30, –1.26)
  • 52. 4-52 Regression when X is Binary (Section 4.7) Sometimes a regressor is binary:  X = 1 if female, = 0 if male  X = 1 if treated (experimental drug), = 0 if not  X = 1 if small class size, = 0 if not So far, 1 has been called a “slope,” but that doesn’t make much sense if X is binary. How do we interpret regression with a binary regressor?
4-53
Yi = β0 + β1Xi + ui, where X is binary (Xi = 0 or 1):
 When Xi = 0: Yi = β0 + ui
 When Xi = 1: Yi = β0 + β1 + ui
thus:
 When Xi = 0, the mean of Yi is β0
 When Xi = 1, the mean of Yi is β0 + β1
that is:
 E(Yi|Xi = 0) = β0
 E(Yi|Xi = 1) = β0 + β1
so:
 β1 = E(Yi|Xi = 1) – E(Yi|Xi = 0)
    = population difference in group means
4-54
Example: TestScore and STR, California data
Let Di = 1 if STRi < 20, and Di = 0 if STRi ≥ 20.
The OLS estimate of the regression line relating TestScore to D (with standard errors in parentheses) is:
 TestScore = 650.0 + 7.4D
             (1.3)   (1.8)
Difference in means between groups = 7.4; SE = 1.8
 t = 7.4/1.8 = 4.0
4-55
Compare the regression results with the group means, computed directly:

Class Size          Average score (Ȳ)   Std. dev. (sY)     N
Small (STR < 20)         657.4               19.4          238
Large (STR ≥ 20)         650.0               17.9          182

Estimation: Ȳsmall – Ȳlarge = 657.4 – 650.0 = 7.4
Test β1 = 0: t = (Ȳs – Ȳl)/SE(Ȳs – Ȳl) = 7.4/1.83 = 4.05
95% confidence interval = {7.4 ± 1.96×1.83} = (3.8, 11.0)
This is the same as in the regression!
 TestScore = 650.0 + 7.4D
             (1.3)   (1.8)
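The equivalence on this slide is an algebraic identity: with a binary regressor, the OLS slope is exactly the difference in group means. A toy check (numbers invented for illustration):

```python
import numpy as np

# Toy data: D = 1 for "small class" observations, 0 otherwise (invented).
d = np.array([0., 0., 0., 1., 1., 1., 1.])
y = np.array([650., 648., 652., 657., 659., 655., 661.])

# OLS slope of y on d ...
dd = d - d.mean()
slope = np.sum(dd * (y - y.mean())) / np.sum(dd**2)
# ... equals the difference in group means.
diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(slope, diff_in_means)
```

Both numbers agree (up to floating-point rounding), which is why regression with a dummy is just difference-in-means analysis.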
4-56
Summary: regression when Xi is binary (0/1)
 Yi = β0 + β1Xi + ui
 β0 = mean of Y given that X = 0
 β0 + β1 = mean of Y given that X = 1
 β1 = difference in group means, X = 1 minus X = 0
 SE(β̂1) has the usual interpretation
 t-statistics and confidence intervals are constructed as usual
 This is another way to do difference-in-means analysis
 The regression formulation is especially useful when we have additional regressors (coming up soon…)
4-57
Other Regression Statistics (Section 4.8)
A natural question is how well the regression line “fits,” or explains, the data. There are two regression statistics that provide complementary measures of the quality of fit:
 The regression R2 measures the fraction of the variance of Y that is explained by X; it is unitless and ranges between zero (no fit) and one (perfect fit).
 The standard error of the regression measures the fit, that is, the typical size of a regression residual, in the units of Y.
4-58
The R2
Write Yi as the sum of the OLS prediction and the OLS residual:
 Yi = Ŷi + ûi
The R2 is the fraction of the sample variance of Yi “explained” by the regression, that is, by Ŷi:
 R2 = ESS/TSS,
where ESS = Σi=1..n (Ŷi – Ȳ)² and TSS = Σi=1..n (Yi – Ȳ)².
4-59
R2 = ESS/TSS, where ESS = Σi=1..n (Ŷi – Ȳ)² and TSS = Σi=1..n (Yi – Ȳ)²
The R2:
 R2 = 0 means ESS = 0, so X explains none of the variation of Y
 R2 = 1 means ESS = TSS, so Yi = Ŷi and X explains all of the variation of Y
 0 ≤ R2 ≤ 1
 For regression with a single regressor (the case here), R2 is the square of the correlation coefficient between X and Y
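The last bullet can be verified directly: in the single-regressor case, ESS/TSS works out to the squared sample correlation of X and Y. A small sketch with invented numbers:

```python
import numpy as np

# Toy data (invented for illustration).
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2., 1., 4., 3., 6., 5.])

# Fit OLS, then compute R^2 = ESS/TSS.
dx = x - x.mean()
b1 = np.sum(dx * (y - y.mean())) / np.sum(dx**2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
r2 = np.sum((yhat - y.mean())**2) / np.sum((y - y.mean())**2)

# Compare with the squared correlation of X and Y.
corr_sq = np.corrcoef(x, y)[0, 1] ** 2
print(r2, corr_sq)  # equal (up to floating-point rounding)
```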
4-60
The Standard Error of the Regression (SER)
The standard error of the regression is (almost) the sample standard deviation of the OLS residuals:
 SER = √[ (1/(n–2)) Σi=1..n (ûi – û̄)² ] = √[ (1/(n–2)) Σi=1..n ûi² ]
(the second equality holds because the residuals average to zero: (1/n) Σi=1..n ûi = 0).
4-61
SER = √[ (1/(n–2)) Σi=1..n ûi² ]
The SER:
 has the units of u, which are the units of Y
 measures the spread of the distribution of u
 measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
 The root mean squared error (RMSE) is closely related to the SER:
  RMSE = √[ (1/n) Σi=1..n ûi² ]
 This measures the same thing as the SER; the minor difference is division by n instead of n–2.
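The SER/RMSE distinction above amounts to dividing the same sum of squared residuals by n–2 versus n. A quick sketch on invented toy data:

```python
import numpy as np

# Toy data (invented); fit OLS and collect the residuals.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
y = np.array([1., 3., 2., 5., 4., 7., 6., 9.])
dx = x - x.mean()
b1 = np.sum(dx * (y - y.mean())) / np.sum(dx**2)
b0 = y.mean() - b1 * x.mean()
u = y - (b0 + b1 * x)   # OLS residuals; they sum to (numerically) zero
n = len(y)

ser = np.sqrt(np.sum(u**2) / (n - 2))   # divide by n - 2
rmse = np.sqrt(np.sum(u**2) / n)        # divide by n
print(ser, rmse)   # SER is slightly larger; the gap vanishes as n grows
```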
4-62
Technical note: why divide by n–2 instead of n–1?
 SER = √[ (1/(n–2)) Σi=1..n ûi² ]
 Division by n–2 is a “degrees of freedom” correction like division by n–1 in sY²; the difference is that, in the SER, two parameters have been estimated (β0 and β1, by β̂0 and β̂1), whereas in sY² only one has been estimated (μY, by Ȳ).
 When n is large, it makes negligible difference whether n, n–1, or n–2 is used, although the conventional formula uses n–2 when there is a single regressor.
 For details, see Section 15.4
4-63
Example of R2 and SER
 TestScore = 698.9 – 2.28STR, R2 = .05, SER = 18.6
             (10.4)   (0.52)
The slope coefficient is statistically significant and large in a policy sense, even though STR explains only a small fraction of the variation in test scores.
4-64
A Practical Note: Heteroskedasticity, Homoskedasticity, and the Formula for the Standard Errors of β̂0 and β̂1 (Section 4.9)
 What do these two terms mean?
 Consequences of homoskedasticity
 Implication for computing standard errors
What do these two terms mean?
If var(u|X = x) is constant, that is, if the variance of the conditional distribution of u given X does not depend on X, then u is said to be homoskedastic. Otherwise, u is said to be heteroskedastic.
4-65
Homoskedasticity in a picture:
 E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
 The variance of u does not change with (depend on) x
4-66
Heteroskedasticity in a picture:
 E(u|X = x) = 0 (u satisfies Least Squares Assumption #1)
 The variance of u depends on x, so u is heteroskedastic.
4-67
A real-world example of heteroskedasticity from labor economics: average hourly earnings vs. years of education (data source: 1999 Current Population Survey)
[Figure: scatterplot of average hourly earnings against years of education, with the OLS fitted line; axes: Years of Education (5–20), Average Hourly Earnings (0–60).]
4-68
Is heteroskedasticity present in the class size data?
Hard to say… looks nearly homoskedastic, but the spread might be tighter for large values of STR.
4-69
So far we have (without saying so) allowed u to be heteroskedastic:
Recall the three least squares assumptions:
1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
2. (Xi, Yi), i = 1,…, n, are i.i.d.
3. X and u have four finite moments.
Heteroskedasticity and homoskedasticity concern var(u|X = x). Because we have not explicitly assumed homoskedastic errors, we have implicitly allowed for heteroskedasticity.
4-70
What if the errors are in fact homoskedastic?
 You can prove some theorems about OLS (in particular, the Gauss-Markov theorem, which says that OLS is the estimator with the lowest variance among all estimators that are linear functions of (Y1,…,Yn); see Section 15.5).
 The formula for the variance of β̂1 and the OLS standard error simplifies (App. 4.4): if var(ui|Xi = x) = σu², then
  var(β̂1) = var[(Xi – μX)ui] / [n(σX²)²] = … = σu² / (nσX²)
Note: var(β̂1) is inversely proportional to var(X): more spread in X means more information about β1.
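The simplification var(β̂1) = σu²/(nσX²) can be checked by Monte Carlo: simulate many homoskedastic samples and compare the spread of the slope estimates with the formula. A sketch with made-up parameter values:

```python
import numpy as np

# Made-up simulation parameters (illustration only).
rng = np.random.default_rng(42)
n, reps = 200, 2000
sigma_u, sigma_x = 2.0, 3.0

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, sigma_x, n)
    u = rng.normal(0.0, sigma_u, n)        # homoskedastic: var(u|x) constant
    y = 1.0 + 0.5 * x + u
    dx = x - x.mean()
    slopes[r] = np.sum(dx * (y - y.mean())) / np.sum(dx**2)

theory = sigma_u**2 / (n * sigma_x**2)     # sigma_u^2 / (n sigma_X^2)
print(slopes.var(), theory)                # the two should be close
```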
4-71
The general formula for the standard error of β̂1 is the square root of:
 σ̂²β̂1 = (1/n) × [ (1/(n–2)) Σi=1..n (Xi – X̄)² ûi² ] / [ (1/n) Σi=1..n (Xi – X̄)² ]²
Special case under homoskedasticity:
 σ̂²β̂1 = (1/n) × [ (1/(n–2)) Σi=1..n ûi² ] / [ (1/n) Σi=1..n (Xi – X̄)² ]
Sometimes it is said that the lower formula is simpler.
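Both variance formulas above can be coded directly. The sketch below uses simulated data whose error spread grows with x (all numbers invented), so the two formulas disagree:

```python
import numpy as np

# Simulated heteroskedastic data: var(u|x) grows with x (invented setup).
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n) * x

# OLS fit and residuals.
dx = x - x.mean()
b1 = np.sum(dx * (y - y.mean())) / np.sum(dx**2)
b0 = y.mean() - b1 * x.mean()
uhat = y - (b0 + b1 * x)

# General (heteroskedasticity-robust) variance of beta1-hat.
var_robust = (np.sum(dx**2 * uhat**2) / (n - 2)) / (np.sum(dx**2) / n) ** 2 / n
# Homoskedasticity-only variance of beta1-hat.
var_homosk = (np.sum(uhat**2) / (n - 2)) / np.sum(dx**2)

print(np.sqrt(var_robust), np.sqrt(var_homosk))  # the two SEs differ
```

With this design the robust standard error comes out larger, because the big-error observations sit at large (Xi – X̄)².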
4-72
The homoskedasticity-only formula for the standard error of β̂1 and the “heteroskedasticity-robust” formula (the formula that is valid under heteroskedasticity) differ; in general, you get different standard errors using the different formulas.
Homoskedasticity-only standard errors are the default setting in regression software, and sometimes the only setting (e.g. Excel). To get the general “heteroskedasticity-robust” standard errors you must override the default.
If you don’t override the default and there is in fact heteroskedasticity, you will get the wrong standard errors (and wrong t-statistics and confidence intervals).
4-73
The critical points:
 If the errors are homoskedastic and you use the heteroskedastic formula for standard errors (the one we derived), you are OK.
 If the errors are heteroskedastic and you use the homoskedasticity-only formula for standard errors, the standard errors are wrong.
 The two formulas coincide (when n is large) in the special case of homoskedasticity.
 The bottom line: you should always use the heteroskedasticity-robust formulas; these are conventionally called the heteroskedasticity-robust standard errors.
4-74
Heteroskedasticity-robust standard errors in STATA

regress testscr str, robust

Regression with robust standard errors      Number of obs =     420
                                            F(  1,   418) =   19.26
                                            Prob > F      =  0.0000
                                            R-squared     =  0.0512
                                            Root MSE      =  18.581
-------------------------------------------------------------------------
        |               Robust
testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------+----------------------------------------------------------------
    str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
  _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
-------------------------------------------------------------------------

Use the “, robust” option!!!
4-75
Summary and Assessment (Section 4.10)
 The initial policy question: Suppose new teachers are hired so the student-teacher ratio falls by one student per class. What is the effect of this policy intervention (this “treatment”) on test scores?
 Does our regression analysis give a convincing answer? Not really: districts with low STR tend to be ones with lots of other resources and higher income families, which provide kids with more learning opportunities outside school… this suggests that corr(ui, STRi) > 0, so E(ui|Xi) ≠ 0.
4-76
Digression on Causality
The original question (what is the quantitative effect of an intervention that reduces class size?) is a question about a causal effect: the effect on Y of applying a unit of the treatment is β1.
 But what is, precisely, a causal effect?
 The common-sense definition of causality isn’t precise enough for our purposes.
 In this course, we define a causal effect as the effect that is measured in an ideal randomized controlled experiment.
4-77
Ideal Randomized Controlled Experiment
 Ideal: subjects all follow the treatment protocol (perfect compliance, no errors in reporting, etc.!)
 Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors)
 Controlled: having a control group permits measuring the differential effect of the treatment
 Experiment: the treatment is assigned as part of the experiment: the subjects have no choice, which means that there is no “reverse causality” in which subjects choose the treatment they think will work best.
4-78
Back to class size:
 What is an ideal randomized controlled experiment for measuring the effect on Test Score of reducing STR?
 How does our regression analysis of observational data differ from this ideal?
  o The treatment is not randomly assigned.
  o In the US (in our observational data) districts with higher family incomes are likely to have both smaller classes and higher test scores.
  o As a result it is plausible that E(ui|Xi = x) ≠ 0.
  o If so, Least Squares Assumption #1 does not hold.
  o If so, β̂1 is biased: does an omitted factor make class size seem more important than it really is?