Advanced Quantitative Methods - PS 602
                 Notes – Version as of 8/30/12
                Robert D. Duval - WVU Dept of Political Science

                   Class                              Office
                  116 Woodburn               301A Woodburn
                  Th 9:00-11:30              M-F 12:00-1:00
                    Phone             304-293-9537 (office)
                                                            304-599-8913
                                   (home)
                                     email: Bob.Duval@mail.wvu.edu
                             (Do not use Mix email address!)




1/18/2013
Syllabus
                    Required texts
                    Additional readings
                    Computer exercises
                    Course requirements
                      Midterm - in class, open book (30%)
                      Final – in-class, open book (30%)
                      Research paper (30%)
                      Participation (10%)
                    http://www.polsci.wvu.edu/duval/ps602/602syl.htm


January 18, 2013                                                        Slide 2
Prerequisites
         A fundamental understanding of calculus
         An informal but intuitive understanding of the
             mathematics of probability

         A sense of humor




January 18, 2013                                           Slide 3
Statistics is an innate cognitive skill
         We all possess the ability to do rudimentary statistical
             analysis
              – in our heads
                – intuitively.
         The cognitive machinery for stats is built into us, just like it
             is for calculus.
         This is part of how we process information about the world.
         It is not simply mysterious arcane jargon
           It is simply the mysterious arcane way you already think.
           Much of it formalizes simple intuition.
           Why do we set alpha (α) to .05?



January 18, 2013                                                              Slide 4
Introduction
         This course is about Regression analysis.
               The principal method in the social sciences
         Three basic parts to the course:
              An introduction to the general Model
              The formal assumptions and what they mean.
              Selected special topics that relate to regression and linear
                   models.




January 18, 2013                                                              Slide 5
Introduction: The General Linear
                         Model
         The General Linear Model (GLM) is a phrase used to
             indicate a class of statistical models which include
             simple linear regression analysis.
         Regression is the predominant statistical tool used in
             the social sciences due to its simplicity and versatility.
         Also called
           Linear Regression Analysis.
           Multiple Regression
           Ordinary Least Squares



January 18, 2013                                                          Slide 6
Simple Linear Regression:
                            The Basic Mathematical Model

                    Regression is based on the concept of the simple
                     proportional relationship - also known as the
                     straight line.

                    We can express this idea mathematically!
                      Theoretical aside: All theoretical statements of
                       relationship imply a mathematical theoretical
                       structure.
                       Just because it isn't explicitly stated doesn't mean
                        that the math isn't implicit in the language itself!



January 18, 2013                                                              Slide 7
Math and language
          When we speak about the world, we often have
              embedded implicit relationships that we are referring to.

         For instance:
              Increasing taxes will send us into recession.
              Decreasing taxes will spur economic growth.




January 18, 2013                                                         Slide 8
From Language to models
              The idea that reducing taxes on the wealthy will spur
               economic growth (or increasing taxes will harm economic
               growth) suggests that there is a proportional relationship
               between tax rates and growth in domestic product.
               So let's look!
                 Disclaimer! The "models" that follow are meant to be
                     examples. They are not "good" models, only useful ones to
                     talk about!




January 18, 2013                                                                Slide 9
GDP and Average Tax Rates: 1986-2009
[Figure: scatterplot of GDP against the average tax rate, 1986-2009]
Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1
          (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
The Stats
 regress gdp avetaxrate

       Source |       SS       df       MS             Number of obs   =       24
 -------------+------------------------------          F( 1,     22)   =     5.11
        Model |    44236671     1    44236671          Prob > F        =   0.0340
     Residual |   190356376    22 8652562.57           R-squared       =   0.1886
 -------------+------------------------------          Adj R-squared   =   0.1517
        Total |   234593048    23 10199697.7           Root MSE        =   2941.5

 ------------------------------------------------------------------------------
          gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
   avetaxrate | -1341.942    593.4921    -2.26   0.034    -2572.769   -111.1148
        _cons |   26815.88   7910.084     3.39   0.003     10411.37    43220.39
 ------------------------------------------------------------------------------




January 18, 2013                                                                    Slide 11
But wait, there's more…
         What is the model?
              There is a directly proportional relationship between tax
               rates and economic growth.
              How about an equation?
              We will get back to this…
         Can you critique the “model?”
         Can we look at it differently?
         If higher



January 18, 2013                                                           Slide 13
Effective Tax Rate and Growth in GDP:
1986-2009
[Figure: scatterplot of growth in GDP against the effective tax rate, 1986-2009]
Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1
          (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
. regress gdpchange avetaxrate

        Source |       SS       df       MS             Number of obs   =       23
  -------------+------------------------------          F( 1,     21)   =     6.15
         Model | 22.4307279      1 22.4307279           Prob > F        =   0.0217
      Residual | 76.5815075     21 3.64673845           R-squared       =   0.2265
  -------------+------------------------------          Adj R-squared   =   0.1897
         Total | 99.0122354     22 4.50055615           Root MSE        =   1.9096

  ------------------------------------------------------------------------------
     gdpchange |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
    avetaxrate |   .9889805   .3987662     2.48   0.022     .1597008     1.81826
         _cons | -7.977765    5.292757    -1.51   0.147    -18.98466    3.029126
  ------------------------------------------------------------------------------




January 18, 2013                                                                     Slide 15
Finally…sort of

  . regress gdpchange taxratechange

        Source |       SS       df       MS             Number of obs   =       23
  -------------+------------------------------          F( 1,     21)   =    11.00
         Model | 34.0305253      1 34.0305253           Prob > F        =   0.0033
      Residual | 64.9817101     21 3.09436715           R-squared       =   0.3437
  -------------+------------------------------          Adj R-squared   =   0.3124
         Total | 99.0122354     22 4.50055615           Root MSE        =   1.7591

  -------------------------------------------------------------------------------
      gdpchange |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  --------------+----------------------------------------------------------------
  taxratechange |   .2803104   .0845261     3.32   0.003     .1045288     .456092
          _cons |   5.414708   .3780098    14.32   0.000     4.628593    6.200822
  -------------------------------------------------------------------------------




January 18, 2013                                                                     Slide 16
Alternate Mathematical
                    Notation for the Line
 Alternate Mathematical Notation for the straight line -
      don't ask why!
       10th Grade Geometry
                 y = mx + b
       Statistics Literature
                 Y_i = a + bX_i + e_i
       Econometrics Literature
                 Y_i = B_0 + B_1X_i + e_i

            Or (Like your Textbook)
                 Y_i = B_1 + B_2X_i + u_i
January 18, 2013                                      Slide 17
Alternate Mathematical
                   Notation for the Line – cont.
         These are all equivalent. We simply have to live with
             this inconsistency.

          We won't use the geometric tradition, and so you just
              need to remember that B_0 and a are both the same
              thing.




January 18, 2013                                                  Slide 18
Linear Regression: the Linguistic
                   Interpretation
         In general terms, the linear model states that the
             dependent variable is directly proportional to the value
             of the independent variable.

         Thus if we state that some variable Y increases in
             direct proportion to some increase in X, we are stating
             a specific mathematical model of behavior - the linear
             model.

         Hence, if we say that the crime rate goes up as
             unemployment goes up, we are stating a simple linear
             model.


January 18, 2013                                                        Slide 19
Linear Regression:
                   A Graphic Interpretation
        
[Figure: "The Straight Line" - a straight line plotted with Y on the vertical axis (0 to 12) against X on the horizontal axis (1 to 10)]


January 18, 2013                                                          Slide 20
The linear model is represented
                   by a simple picture
        
[Figure: "Simple Linear Regression" - the same straight line, Y (0 to 12) plotted against X (1 to 10)]
January 18, 2013                                                            Slide 21
The Mathematical Interpretation:
                   The Meaning of the Regression
                            Parameters
                    a = the intercept
                      the point where the line crosses the Y-axis.
                      (the value of the dependent variable when all of the
                        independent variables = 0)

                    b = the slope
                      the increase in the dependent variable per unit change
                        in the independent variable. (also known as the 'rise
                        over the run')




January 18, 2013                                                                Slide 22
The Error Term
         Such models do not predict behavior perfectly.
         So we must add a component to adjust or compensate
             for the errors in prediction.

          Having fully described the linear model, the rest of the
              semester (as well as several more) will be spent on the
              error.




January 18, 2013                                                       Slide 23
The Nature of Least Squares
                           Estimation
         There is 1 essential goal and there are 4 important
             concerns with any OLS Model




January 18, 2013                                                Slide 24
The 'Goal' of Ordinary Least
                            Squares
         Ordinary Least Squares (OLS) is a method of finding
             the linear model which minimizes the sum of the
             squared errors.

         Such a model provides the best explanation/prediction
             of the data.




January 18, 2013                                                  Slide 25
Why Least Squared error?
         Why not simply minimum error?
          The errors about the line sum to 0.0!
         Minimum absolute deviation (error) models now exist,
             but they are mathematically cumbersome.

         Try algebra with | Absolute Value | signs!




January 18, 2013                                                 Slide 26
Other models are possible...
                    Best parabola...?
                      (i.e. nonlinear or curvilinear relationships)
                    Best maximum likelihood model ... ?
                    Best expert system...?
                    Complex Systems…?
                      Chaos/Non-linear systems models
                      Catastrophe models
                      others

January 18, 2013                                                       Slide 27
The Simple Linear Virtue
         I think we overemphasize the linear model.
         It does, however, embody this rather important notion
             that Y is proportional to X.
         As noted, we can state such relationships in simple
             English.
              As unemployment increases, so does the crime rate.
               As domestic conflict increases, national leaders will
                seek to distract their populations by initiating foreign
                disputes.




January 18, 2013                                                          Slide 28
The Notion of Linear Change
         The linear aspect means that the same amount of
             increase in unemployment will have the same effect on
             crime at both low and high unemployment.

         A nonlinear change would mean that as unemployment
             increases, its impact upon the crime rate might
             increase at higher unemployment levels.




January 18, 2013                                                     Slide 29
Why squared error?
         Because:
              (1) the sum of the errors expressed as deviations would
               be zero as it is with standard deviations, and
              (2) some feel that big errors should be more influential
               than small errors.

         Therefore, we wish to find the values of a and b that
             produce the smallest sum of squared errors.




January 18, 2013                                                          Slide 30
The Parameter estimates
                    In order to do this, we must find parameter estimates
                      which accomplish this minimization.

                    In calculus, if you wish to know when a function is at its
                      minimum, you set the first derivative equal to zero.

                    In this case we must take partial derivatives since we
                      have two parameters (a & b) to worry about.

                     We will look closer at this and it's not a pretty sight!




January 18, 2013                                                                Slide 31
Decomposition of the error in
                       LS




January 18, 2013                           Slide 32
Goodness of Fit

                    Since we are interested in how well the model
                     performs at reducing error, we need to develop a
                     means of assessing that error reduction.
                    Since the mean of the dependent variable
                     represents a good benchmark for comparing
                     predictions, we calculate the improvement in the
                     prediction of Yi relative to the mean of Y
                      (the best guess of Y with no other information).




January 18, 2013                                                          Slide 33
Sum of Squares Terminology
         In mathematical jargon we seek to minimize the
             Unexplained Sum of Squares (USS), where:



                                                        2
                                U SS     ( Yi     Yi )
                                              2
                                         ei




January 18, 2013                                             Slide 34
Sums of Squares
 This gives us the following 'sum-of-squares'
     measures:
      Total Variation = Explained Variation +
       Unexplained Variation
            TSS = Total Sum of Squares = Σ(Y_i - Ȳ)²

            ESS = Explained Sum of Squares = Σ(Ŷ_i - Ȳ)²

            USS = Unexplained Sum of Squares = Σ(Y_i - Ŷ_i)²


January 18, 2013                                                            Slide 35
Sums of Squares Confusion
         Note: Occasionally you will run across ESS and RSS
             which generate confusion since they can be used
             interchangeably.
              ESS can be the error sums-of-squares, or alternatively,
               the estimated or explained SSQ.
              Likewise RSS can be the residual SSQ, or the regression
               SSQ.
              Hence the use of USS for Unexplained SSQ in this
               treatment.
              Gujarati uses ESS (Explained sum of squares) and RSS
               (Residual sum of squares)


January 18, 2013                                                         Slide 36
The Parameter estimates
          In order to find the 'best' parameters, we must find
              parameter estimates which accomplish this minimization.
               That is, the parameters that yield the smallest sum of squared
                    errors.
         In calculus, if you wish to know when a function is at its
             minimum, you take the first derivative and set it equal to 0.0
              The second derivative must be positive as well, but we do not
                   need to go there.
         In this case we must take partial derivatives since we have
             two parameters to worry about.




January 18, 2013                                                               Slide 37
Deriving the Parameter
                          Estimates
          Since

                 USS = Σ(Y_i - Ŷ_i)²
                     = Σe_i²
                     = Σ(Y_i - a - bX_i)²

         We can take the partial derivative of USS with respect
             to both a and b.
                 ∂USS/∂a = Σ 2(Y_i - a - bX_i)(-1)

                 ∂USS/∂b = Σ 2(Y_i - a - bX_i)(-X_i)


January 18, 2013                                                               Slide 38
Deriving the Parameter Estimates
                      (cont.)
         Which simplifies to


                 ∂USS/∂a = -2 Σ(Y_i - a - bX_i) = 0

                 ∂USS/∂b = -2 Σ X_i(Y_i - a - bX_i) = 0

         We also set these derivatives to 0 to indicate that we
             are at a minimum.



January 18, 2013                                                   Slide 39
Deriving the Parameter Estimates
                      (cont.)
         We now add a “hat” to the parameters to indicate that
             the results are estimators.


                 ∂USS/∂â = -2 Σ(Y_i - â - b̂X_i) = 0

                 ∂USS/∂b̂ = -2 Σ X_i(Y_i - â - b̂X_i) = 0

         We also set these derivatives equal to zero.


January 18, 2013                                                     Slide 40
Deriving the Parameter Estimates
                      (cont.)
         Dividing through by -2 and rearranging terms, we get



                 Σ Y_i = â·n + b̂(Σ X_i)

                 Σ X_iY_i = â(Σ X_i) + b̂(Σ X_i²)




January 18, 2013                                                 Slide 41
Deriving the Parameter Estimates
                      (cont.)
         We can solve these equations simultaneously to get
             our estimators.



                 b̂_1 = [n Σ X_iY_i - (Σ X_i)(Σ Y_i)] / [n Σ X_i² - (Σ X_i)²]

                     = Σ(X_i - X̄)(Y_i - Ȳ) / Σ(X_i - X̄)²

                 â = Ȳ - b̂_1X̄
January 18, 2013                                                             Slide 42
Deriving the Parameter Estimates
                      (cont.)
          The estimator for a also shows that the regression line
              always goes through the point (X̄, Ȳ), the intersection
              of the two means.
         This formula is quite manageable for bivariate
             regression. If there are two or more independent
             variables, the formula for b2, etc. becomes
             unmanageable!

              See matrix algebra in POLS603!




January 18, 2013                                                       Slide 43
Tests of Inference
         t-tests for coefficients
         F-test for entire model
         Since we are interested in how well the model performs at
             reducing error, we need to develop a means of assessing
             that error reduction.
              Since the mean of the dependent variable represents a good
                   benchmark for comparing predictions, we calculate the
                   improvement in the prediction of Yi relative to the mean of Y
              Remember that the mean of Y is your best guess of Y with no
                   other information.
                    Well, often, assuming the data is normally distributed!




January 18, 2013                                                                   Slide 44
T-Tests
         Since we wish to make probability statements about
             our model, we must do tests of inference.
          Fortunately,

                 B̂ / se_B ~ t_(n-2)

          OK, so what is the se_B?




January 18, 2013                                               Slide 45
Standard Errors of Estimates
 These estimates have variation associated with
     them:

                 se_a = s · √( ΣX_i² / (n·Σx_i²) )

                 se_b̂ = s / √( Σx_i² )

     where lowercase x_i = X_i - X̄ denotes deviations from the
     mean and s is the standard error of the regression.


January 18, 2013                                   Slide 46
This gives us the F test:
 The F-test tells us whether the full model is significant


                 F_(k-1, n-k) = [ESS / (k-1)] / [USS / (n-k)]



 Note that F statistics have 2 different degrees of
     freedom: k-1 and n-k, where k is the number of
     regressors in the model.


January 18, 2013                                              Slide 47
More on the F-test
         In addition, the F test for the entire model must be
             adjusted to compensate for the changed degrees of
             freedom.
         Note that F increases as n or R2 increases and
             decreases as k – the number of independent variables
             - increases.




January 18, 2013                                                    Slide 48
F test for different Models
 The F test can tell us whether two different models
     of the same dependent variable are significantly
     different.
      i.e. whether adding a new variable will
        significantly improve estimation

        F = [(R²_new - R²_old) / number of new regressors] /
            [(1 - R²_new) / (n - number of parameters in the new model)]




January 18, 2013                                                                      Slide 49
The correlation coefficient
          A measure of how closely the points cluster about the
              regression line

         It ranges between -1.0 and +1.0
         It is closely related to the slope.




January 18, 2013                                            Slide 50
R2 (r-square, r2)
            The r2 (or R-square) is also called the coefficient of determination.



                                 r² = ESS / TSS = 1 - (USS / TSS)
            It is the percent of the variation in Y explained by X

            It must range between 0.0 and 1.0.

             An r² of .95 means that 95% of the variation about the mean of Y is "caused" (or at least
              explained) by the variation in X




January 18, 2013                                                                                         Slide 51
Observations on r2
         R2 always increases as you add independent variables.
              The r2 will go up even if X2 , or any new variable, is a
                   completely random variable.

         r2 is an important statistic, but it should never be seen
             as the focus of your analysis.

         The coefficient values, their interpretation, and the tests
             of inference are really more important.

         Beware of r2 = 1.0 !!!



January 18, 2013                                                          Slide 52
The Adjusted R²
 Since R2 always increases with the addition of a new
     variable, the adjusted R2 compensates for added
     explanatory variables.

                 R̄² = 1 - [USS / (n-k-1)] / [TSS / (n-1)]


 Note that it may fall below 0.0!
       But such values indicate poorly formed models.

January 18, 2013                                             Slide 53
Comments on the adjusted R-squared
         R-squared will always go up with the addition of a new
             variable.

         Adjusted r-squared will go down if the variable
             contributes nothing new to the explanation of the
             model.
              As a rule of thumb, if the new variable has a t-value
                   greater than 1.0, it increases the adjusted r-square.




January 18, 2013                                                           Slide 54
The assumptions of the
                           model
         We will spend the next 7 weeks on this!




January 18, 2013                                    Slide 55
Model
                         The Scalar Version
         The basic multiple regression model is a simple
             extension of the bivariate equation.
          By adding extra independent variables, we are creating
              a multiple-dimensioned space, where the model fit is
              some appropriate surface in that space.
               For instance, if there are two independent variables,
                    we are fitting the points to a 'plane in space'.
              Visualizing this in more dimensions is a good trick.




January 18, 2013                                                       Slide 56
The Scalar Equation
         The basic linear model:



                    Y_i = a + b_1X_1i + b_2X_2i + ... + b_kX_ki + e_i


         If bivariate regression can be described as a line on a
             plane, multiple regression represents a k-dimensional
             object in a k+1 dimensional space.



January 18, 2013                                                          Slide 57
The Matrix Model
         We can use a different type of mathematical structure
             to describe the regression model
              Frequently called Matrix or Linear Algebra
         The multiple regression model may be easily
             represented in matrix terms.

                                Y = XB + e
         Where the Y, X, B and e are all matrices of data,
             coefficients, or residuals



January 18, 2013                                                  Slide 58
The Matrix Model (cont.)
              The matrices in Y = XB + e are represented by

                       | Y_1 |        | X_11  X_12 ... X_1k |        | B_1 |        | e_1 |
                   Y = | Y_2 |    X = | X_21  X_22 ... X_2k |    B = | B_2 |    e = | e_2 |
                       | ... |        | ...   ...  ...  ... |        | ... |        | ... |
                       | Y_n |        | X_n1  X_n2 ... X_nk |        | B_k |        | e_n |

              Note that we postmultiply X by B since this order makes them
                    conformable.
              Also note that X_1 is a column of 1's, to obtain an intercept term.


January 18, 2013                                                                Slide 59
Assumptions of the model
                       Scalar Version
         The OLS model has seven fundamental assumptions.
              This count varies from author to author, based upon what each
                   author thinks is a separate assumption!
         These assumptions form the foundation for all regression
             analysis.
         Failure of a model to conform to these assumptions
             frequently presents problems for estimation and inference.
              The “problem” may range from minor to severe!
         These problems, or violations of the assumptions, almost
             invariably arise out of substantive or theoretical problems!




January 18, 2013                                                               Slide 60
Model
                   Scalar Version (cont.)
                1. The e_i's are normally distributed.
                2. E(e_i) = 0
                3. E(e_i²) = σ²
                4. E(e_i·e_j) = 0  (i ≠ j)
                5. The X's are nonstochastic with values fixed in repeated
                 samples, and Σ(X_ik - X̄_k)²/n is a finite nonzero number.
                6. The number of observations is greater than the
                 number of coefficients estimated.
                7. No exact linear relationship exists between any of the
                 explanatory variables.

January 18, 2013                                                      Slide 61
Model
                           The English Version
         The errors have a normal distribution.
          The residuals are homoskedastic.
                   (The variation in the errors doesn't change across values of the independent or
                    dependent variables.)

          There is no serial correlation in the errors.
                   (The errors are unrelated to their neighbors.)

         There is no multicollinearity.
                  (No variable is a perfect function of another variable.)

          The X's are fixed (non-stochastic) and have some variation, but no infinite
              values.

          There are more data points than unknowns.
          The model is linear in its parameters.
                   All modeled relationships are directly proportional.

          OK…so it's not really English….
January 18, 2013                                                                                     Slide 62
Model:
                         The Matrix Version
         These same assumptions expressed in matrix format
             are:

               1. e ~ N(0, Σ)
               2. Σ = σ²I
               3. The elements of X are fixed in repeated samples, and
                    (1/n)X′X is nonsingular and its elements are finite




January 18, 2013                                                          Slide 63
Properties of Estimators (θ̂)
          Since we are concerned with error, we will be
              concerned with those properties of estimators which
              have to do with the errors produced by the estimates.
          We use the symbol θ̂ to denote a general parameter estimate.
            It could represent a regression slope (B), a sample
                    mean (X̄), a standard deviation (s), or many
                    other statistics or estimators based on some sample.




January 18, 2013                                                          Slide 64
Types of estimator error
         Estimators are seldom exactly correct due to any
             number of reasons, most notably sampling error and
             biased selection.
         There are several important concepts that we need to
             understand in examining how well estimators do their
             job.




January 18, 2013                                                    Slide 65
Sampling error
         Sampling error is simply the difference between the
             true value of a parameter and its estimate in any given
             sample.
                           Sampling Error = θ̂ - θ

         This sampling error means that an estimator will vary
             from sample to sample and therefore estimators have
             variance.
                                    2           2
                                                                2                 2
                     V ar ( )   E[        E ( )]           E(        )   [ E ( )]



January 18, 2013                                                                        Slide 66
Bias
                    The bias of an estimate is the difference between its
                     expected value and its true value.
                    If the estimator is always low (or high) then the
                     estimator is biased.
                     An estimator is unbiased if

                                     E(θ̂) = θ
                     And

                                     Bias = E(θ̂) - θ
January 18, 2013                                                         Slide 67
Mean Squared Error
          The mean squared error (MSE) is different from the
              estimator's variance in that the variance measures
              dispersion about the estimated parameter while mean
              squared error measures the dispersion about the true
              parameter.

                      Mean squared error = E(θ̂ - θ)²

         If the estimator is unbiased then the variance and MSE
             are the same.



January 18, 2013                                                    Slide 68
Mean Squared Error (cont.)
         The MSE is important for time series and forecasting
             since it allows for both bias and efficiency:

          For instance:

                      MSE = variance + (bias)²
         These concepts lead us to look at the properties of
          estimators. Estimators may behave differently in large
          and small samples, so we look at both
           small sample and,
           large (asymptotic) sample properties.




January 18, 2013                                                   Slide 69
Small Sample Properties
         These are the ideal properties. We desire these to
             hold.
              Unbiasedness
              Efficiency
              Best Linear Unbiased Estimator
         If the small sample properties hold, then by extension,
             the large sample properties hold.




January 18, 2013                                                    Slide 70
Bias
          An estimator is unbiased if
                                       
                                    E(θ̂) = θ
         In other words, the average value of the estimator in
             repeated sampling equals the true parameter.
         Note that whether an estimator is biased or not implies
             nothing about its dispersion.




January 18, 2013                                                    Slide 71
Bias
[Figure: two probability densities plotted over X (-3 to 2.6); the legend labels them "Normal (s=1.0)" and "Series3", illustrating an unbiased versus a biased estimator]
January 18, 2013                                                                          Slide 72
Efficiency
                     An estimator is efficient if it is unbiased and its variance is
                       less than that of any other unbiased estimator of the parameter.
                           θ̂ is unbiased;

                           Var(θ̂) ≤ Var(θ̃),

                           where θ̃ is any other unbiased estimator of θ


                     There are instances in which we might choose a biased
                       estimator, if it has a smaller variance.




January 18, 2013                                                                        Slide 73
Efficiency
[Figure: two normal densities plotted over X (-3 to 2.6), one with s = 1.0 and one with s = 2.0; the s = 1.0 curve is taller and more concentrated]
                   Series 1 (s=1.0) is more efficient than Series 2 (s=2.0)
January 18, 2013                                                                                       Slide 74
BLUE (Best Linear
                      Unbiased Estimate)
          An estimator θ̂ is described as a BLUE estimator if it
                    is a linear function of the sample data,
                    is unbiased, and
                    satisfies Var(θ̂) ≤ Var(θ̃), where θ̃ is any other
                    linear unbiased estimator of θ.




January 18, 2013                                                         Slide 75
What is a linear estimator?
                    A linear estimator looks like the formula for a straight
                     line
                                    θ̂ = a_1x_1 + a_2x_2 + a_3x_3 + ... + a_nx_n


                    The linearity referred to is the linearity in the
                     parameters
                      Note that the sample mean is an example of a
                        linear estimator.
                                X̄ = (1/n)X_1 + (1/n)X_2 + (1/n)X_3 + ... + (1/n)X_n
January 18, 2013                                                                           Slide 76
BLUE is Bueno
         If your estimator (e.g. the Regression B) is the BLUE
             estimator, then you have a very good estimator –
             relative to other regression style estimators.
         The problem is that if certain assumptions are violated,
             then OLS may no longer be the “best” estimator.
              There might be a better one!
         You can still hope that the large sample properties hold
             though!




January 18, 2013                                                     Slide 77
Asymptotic (Large Sample)
                 Properties
         Asymptotically unbiased
         Consistency
         Asymptotic efficiency




January 18, 2013                      Slide 78
Asymptotic bias
 An estimator is asymptotically unbiased if

                            lim_(n→∞) E(θ̂_n) = θ



 As the sample size gets larger the estimated
     parameter gets closer to the true value.
 For instance:
                    S² = Σ(X_i - X̄)² / n   and   E(S²) = σ²(1 - 1/n)

January 18, 2013                                                       Slide 79
Consistency
     The point on which a sampling distribution collapses as
         n grows is called the probability limit (plim)
     If the bias and variance both decrease as the sample
         size gets larger, the estimator is consistent.

                    lim_(n→∞) Pr(|θ̂ - θ| < ε) = 1

     Usually noted by

                    plim θ̂ = θ


January 18, 2013                                         Slide 80
Asymptotic efficiency
          An estimator is asymptotically efficient if
               it has an asymptotic distribution with finite mean and variance,
               it is consistent, and
               no other estimator has smaller asymptotic variance.




January 18, 2013                                                       Slide 81
Rifle and Target Analogy
         Small sample properties
                     Bias: the shots cluster around some spot other than the
                      bull's-eye.
                     Efficiency: one rifle's cluster is smaller than another's.
                     BLUE: the smallest scatter among rifles of a particular type of
                      simple construction.




January 18, 2013                                                                      Slide 82
Rifle and Target Analogy
                            (cont.)
         Asymptotic properties
           Think of increased sample size as getting closer to the target.
                    Asymptotic Unbiasedness means that as the sample size gets larger,
                      the center of the point cluster moves closer to the target center.
                     With consistency, the point cluster moves closer to the target center
                       and the cluster shrinks in size.
                    If it is asymptotically efficient, then no other rifle has a smaller cluster
                      that is closer to the true center.
               When all of the assumptions of the OLS model hold, its estimators
                    are:
                     unbiased,
                     minimum variance, and
                     BLUE.




January 18, 2013                                                                                    Slide 83
How we will approach the
                         question.
                      Definition
                      Implications
                      Causes
                      Tests
                      Remedies




January 18, 2013                              Slide 84
Non-zero Mean for the
                    residuals (Definition)
         Definition:
              The residuals have a mean other than 0.0.
               Note that this refers to the true errors: the
                    estimated residuals always have a mean of 0.0, even when
                    the true errors do not.




January 18, 2013                                                            Slide 85
Non-zero Mean for the
                   residuals (Implications)
          The true regression line is

                          Y_i = a + bX_i + e_i,   where E(e_i) = μ ≠ 0

          Therefore the intercept is biased.
          The slope, b, is unbiased. There is also no way of
              separating out a and μ.




January 18, 2013                                               Slide 86
Non-zero Mean for the residuals
                 (Causes, Tests, Remedies)
         Causes: Non-zero means result from some form of
             specification error. Something has been omitted from
             the model which accounts for that mean in the
             estimation.

         We will discuss Tests and Remedies when we look
             closely at Specification errors.




January 18, 2013                                                    Slide 87
Non-normally distributed
                  errors : Definition
          The residuals are not NID(0, σ²)

                   Normality Tests Section
                   Assumption      Value                        Probability                       Decision(5%)
                   Skewness        5.1766                       0.000000                          Rejected
                   Kurtosis        4.6390                       0.000004                          Rejected
                   Omnibus         48.3172                      0.000000                          Rejected
[Figure: histogram of the residuals of rate90, ranging from about -1000 to 2000]

January 18, 2013                                                                                                 Slide 88
Non-normally distributed
                 errors : Implications
         The existence of residuals which are not normally
             distributed has several implications.
              First is that it implies that the model is to some
                degree misspecified.
              A collection of truly stochastic disturbances should
                have a normal distribution. The central limit theorem
                states that as the number of random variables
                increases, the sum of their distributions tends to be a
                normal distribution.
              Distribution theory – beyond the scope of this course



January 18, 2013                                                          Slide 89
Non-normally distributed
          errors : Implications (cont.)
         If the residuals are not normally distributed, then the
             estimators of a and b are also not normally distributed.
         Estimates are, however, still BLUE.
         Estimates are unbiased and have minimum variance.
         They are no longer efficient, even though they are
             asymptotically unbiased and consistent.
         It is only our hypothesis tests which are suspect.




January 18, 2013                                                        Slide 90
Non-normally distributed
                   errors: Causes
          Generally caused by a misspecification error.
         Usually an omitted variable.
         Can also result from
              Outliers in data.
              Wrong functional form.




January 18, 2013                                          Slide 91
Non-normally distributed errors: Tests for non-normality
                Chi-Square goodness of fit
                  Since the cumulative normal frequency distribution
                     has a chi-square distribution, we can test for the
                     normality of the error terms using a standard chi-
                     square statistic.
                    We take our residuals, group them, and count how
                     many occur in each group, along with how many we
                     would expect in each group.




January 18, 2013                                                          Slide 92
Non-normally distributed errors: Tests for non-normality (cont.)
               We then calculate the simple χ² statistic:

                                 χ² = Σ_(i=1..k) (O_i - E_i)² / E_i

               This statistic has (N-1) degrees of freedom, where N is
                    the number of classes.




January 18, 2013                                                          Slide 93
Non-normally distributed errors: Tests for non-normality (cont.)
         Jarque-Bera test
              This test examines both the skewness and kurtosis of a
                   distribution to test for normality.


                                 JB = n·[ S²/6 + (K - 3)²/24 ]

               where S is the skewness and K is the kurtosis of the
                    residuals.
               JB has a χ² distribution with 2 df.




January 18, 2013                                                        Slide 94
Non-normally distributed
                  errors: Remedies
         Try to modify your theory. Omitted variable? Outlier
             needing specification?
         Modify your functional form by taking some variance
             transforming step such as square root, exponentiation,
             logs, etc.
              Be mindful that you are changing the nature of the model.
         Bootstrap it!
              From the shameless commercial division!




January 18, 2013                                                           Slide 95
Multicollinearity: Definition
         Multicollinearity is the condition where the independent
          variables are related to each other. Causation is not
          implied by multicollinearity.
          As any two (or more) variables become more and more
           closely correlated, the condition worsens, and
           'approaches singularity'.
          Since the X's are fixed (or they are supposed to be,
           anyway), this is a sample problem.
         Since multicollinearity is almost always present, it is a
          problem of degree, not merely existence.



January 18, 2013                                                      Slide 96
Multicollinearity:
                            Implications
         Consider the following cases
             A. No multicollinearity
                    The regression would appear to be identical to separate
                     bivariate regressions
                    This produces variances which are biased upward (too large)
                     making t-tests too small.
                    The coefficients are unbiased.
                    For multiple regression this satisfies the assumption.




January 18, 2013                                                                   Slide 97
Multicollinearity:
                        Implications (cont.)
             B. Perfect Multicollinearity
                    Some variable Xi is a perfect linear combination of one or
                     more other variables Xj, therefore X'X is singular, and |X'X| =
                     0.
                      This is matrix algebra notation. It means that one variable is a
                        perfect linear function of another.   (e.g. X2 = X1+3.2)
                    The effects of X1 and X2 cannot be separated.
                     The standard errors for the B's are infinite.
                    A model cannot be estimated under such circumstances. The
                     computer dies.
                      And takes your model down with it…



January 18, 2013                                                                          Slide 98
Multicollinearity:
                        Implications (cont.)
             C. A high degree of Multicollinearity
                     When the independent variables are highly correlated, the
                      variances and covariances of the Bi's are inflated (t-ratios
                      are lower) and R2 tends to be high as well.
                    The B's are unbiased (but perhaps useless due to their imprecise
                     measurement as a result of their variances being too large). In fact
                     they are still BLUE.
                    OLS estimates tend to be sensitive to small changes in the data.
                    Relevant variables may be discarded.




January 18, 2013                                                                         Slide 99
Multicollinearity: Causes
         Sampling mechanism. Poorly constructed design & measurement
             scheme or limited range.
              Too small a sample range
             Constrained theory:
               (X1 does affect X2) e.g. electricity consumption = wealth + house size
             Statistical model specification: adding polynomial terms or trend
              indicators.
             Too many variables in the model - the model is over-determined.
             Theoretical specification is wrong. Inappropriate construction of
              theory or even measurement.
               If your dependent variable is constructed using an independent
                    variable, the collinearity is built in.




January 18, 2013                                                                 Slide 100
Multicollinearity:
                          Tests/Indicators
          |X'X| approaches 0.0
          The variance-covariance matrix is nearly singular, so its
              determinant approaches 0.0
               Since the determinant is a function of variable scale, this
                    measure doesn't help a whole lot. We could, however,
                    use the determinant of the correlation matrix and
                    therefore bound the range from 0.0 to 1.0




January 18, 2013                                                             Slide 101
Multicollinearity:
                   Tests/Indicators (cont.)
               Tolerance:
                     If the tolerance equals 1, the variables are unrelated. If
                      Tol_j = 0, then they are perfectly correlated.
                     To calculate, regress each independent variable on all the
                      other independent variables.
                       Variance Inflation Factors (VIFs)

                                   VIF_j = 1 / (1 - R_j^2)

                       Tolerance

                                   TOL_j = 1 - R_j^2 = 1 / VIF_j
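
               In Stata, VIFs (and the tolerances they imply) can be
                    obtained after a regression; a minimal sketch with
                    hypothetical variables:

                      quietly regress y x1 x2 x3
                      * variance inflation factor for each regressor (1/VIF = tolerance)
                      estat vif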

January 18, 2013                                                              Slide 102
Interpreting VIFs
         No multicollinearity produces VIFs = 1.0
         If the VIF is greater than 10.0, then multicollinearity is
             probably severe: 90% of the variance of Xj is explained
             by the other X's.

         In small samples, a VIF of about 5.0 may indicate
             problems




January 18, 2013                                                       Slide 103
Multicollinearity:
         Tests/Indicators (cont.)
 R2 deletes - try all possible models of the X's, including
     and excluding each variable based on the change in R2
     with its inclusion/omission (taken 1 at a time)
 The F is significant, but no t-value is.
 Adjusted R2 declines when a new variable is added
 Multicollinearity is of concern when either

                        r_{X1,X2} > r_{X1,Y}
                        r_{X1,X2} > r_{X2,Y}
January 18, 2013                                       Slide 104
Multicollinearity:
         Tests/Indicators (cont.)
 I would avoid the rule of thumb

                        r_{X1,X2} >= .6

 Betas are > 1.0 or < -1.0
 Sign changes occur with the introduction of a new
     variable
 The R2 is high, but few t-ratios are significant.
 Eigenvalues and Condition Index - If this topic is
     beyond Gujarati, it's beyond me.


January 18, 2013                                       Slide 105
Multicollinearity: Remedies
         Increase sample size
              Pooled cross-sectional time series
                    Thereby introducing all sorts of new problems!
            Omit Variables
            Scale Construction/Transformation
            Factor Analysis
             Constrain the estimation, such as setting the value of one
              coefficient relative to another.




January 18, 2013                                                         Slide 106
Multicollinearity: Remedies
                     (cont.)
         Change design (LISREL maybe or Pooled cross-
             sectional Time series)
              Thereby introducing all sorts of new problems!
         Ridge Regression
              This technique introduces a small amount of bias into the
                   coefficients to reduce their variance.
         Ignore it - report adjusted R2 and claim it warrants
             retention in the model.




January 18, 2013                                                           Slide 107
Heteroskedasticity:
                       Definition
         Heteroskedasticity is a problem where the error terms
             do not have a constant variance.
                              E(e_i^2) = \sigma_i^2


         That is, they may have a larger variance when values
             of some Xi (or the Yi's themselves) are large (or small).




January 18, 2013                                                         Slide 108
Heteroskedasticity:
                       Definition
         This often gives the plots of the residuals by the
             dependent variable or appropriate independent
             variables a characteristic fan or funnel shape.




                   [Figure: residual scatter plot showing the characteristic
                        fan/funnel shape as the values increase]


January 18, 2013                                                Slide 109
Heteroskedasticity:
                          Implications
                    The regression B's are unbiased.
                    But they are no longer the best estimator. They are
                     not BLUE (not minimum variance - hence not
                     efficient).
                    They are, however, consistent.




January 18, 2013                                                       Slide 110
Heteroskedasticity:
                      Implications (cont.)
                The estimator variances are not asymptotically
                   efficient, and they are biased.
                    So confidence intervals are invalid.
                    What do we know about the bias of the variance?
                     If Yi is positively correlated with ei, the bias is negative
                       (hence t-values will be too large).
                     With positive bias, many t's are too small.




January 18, 2013                                                                  Slide 111
Heteroskedasticity:
                        Implications (cont.)
         Types of Heteroskedasticity
              There are a number of types of heteroskedasticity.
                    Additive
                    Multiplicative
                    ARCH (Autoregressive conditional heteroskedastic) - a time
                     series problem.




January 18, 2013                                                                  Slide 112
Heteroskedasticity: Causes
         It may be caused by:
              Model misspecification - omitted variable or improper
                   functional form.
              Learning behaviors across time
              Changes in data collection or definitions.
              Outliers or breakdown in model.
                    Frequently observed in cross sectional data sets where
                     demographics are involved (population, GNP, etc).




January 18, 2013                                                              Slide 113
Heteroskedasticity: Tests
         Informal Methods
              Plot the data and look for patterns!
              Plot the residuals by the predicted dependent variable
                   (Resids on the Y-axis)
                    Plotting the squared residuals actually makes more sense,
                     since that is what the assumption refers to!
              Homoskedasticity will be a random scatter horizontally
                   across the plot.




January 18, 2013                                                                 Slide 114
Heteroskedasticity: Tests
                      (cont.)
         Park test
               As an exploratory test, log the squared residuals and
                    regress them on the logged values of the suspected
                    independent variable.
                         ln \hat{u}_i^2 = ln \sigma^2 + B ln X_i + v_i
                                        = a + B ln X_i + v_i
              If the B is significant, then heteroskedasticity may be a
                   problem.
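
               A minimal Stata sketch of the Park test (y and x are
                    hypothetical variable names; assumes x > 0):

                      quietly regress y x
                      predict uhat, resid
                      * log of the squared residuals on log X
                      gen ln_u2 = ln(uhat^2)
                      gen ln_x = ln(x)
                      * a significant slope suggests heteroskedasticity
                      regress ln_u2 ln_x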




January 18, 2013                                                            Slide 115
Heteroskedasticity: Tests
                      (cont.)
                    Glejser Test
                       This test is quite similar to the Park test, except that it uses the absolute
                           values of the residuals, and a variety of transformed X's.

                          |\hat{u}_i| = B_1 + B_2 X_i + v_i
                          |\hat{u}_i| = B_1 + B_2 \sqrt{X_i} + v_i
                          |\hat{u}_i| = B_1 + B_2 (1/X_i) + v_i
                          |\hat{u}_i| = B_1 + B_2 (1/\sqrt{X_i}) + v_i
                          |\hat{u}_i| = \sqrt{B_1 + B_2 X_i} + v_i
                          |\hat{u}_i| = \sqrt{B_1 + B_2 X_i^2} + v_i

                        A significant B_2 indicates heteroskedasticity.
                       Easy test, but has problems.



January 18, 2013                                                                                  Slide 116
Heteroskedasticity: Tests
                      (cont.)
         Goldfeld-Quandt test
               Order the n cases by the X that you think is correlated
                    with e_i^2.
              Drop a section of c cases out of the middle
                   (one-fifth is a reasonable number).
              Run separate regressions on both upper and lower
                   samples.




January 18, 2013                                                         Slide 117
Heteroskedasticity: Tests
            (cont.)
 Goldfeld-Quandt test (cont.)
 Do an F-test for the difference in error variances
     F has (n - c - 2k)/2 degrees of freedom for both the
     numerator and the denominator

                 F_{((n-c-2k)/2, (n-c-2k)/2)} = s^2_{ee1} / s^2_{ee2}
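
 A minimal Stata sketch (hypothetical y and x, 50 cases,
     dropping the middle 10 after sorting on x):

       sort x
       quietly regress y x in 1/20
       scalar rss_low = e(rss)
       quietly regress y x in 31/50
       scalar rss_high = e(rss)
       * ratio of the two error variances (same df top and bottom here);
       * put the larger variance in the numerator before consulting F tables
       display rss_high/rss_low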




January 18, 2013                                        Slide 118
Heteroskedasticity: Tests
                      (cont.)
         Breusch-Pagan-Godfrey Test (Lagrangian Multiplier
             test)
              Estimate model with OLS
              Obtain
                              \tilde{\sigma}^2 = \sum \hat{u}_i^2 / n

               Construct variables

                              p_i = \hat{u}_i^2 / \tilde{\sigma}^2




January 18, 2013                                              Slide 119
Heteroskedasticity: Tests
                      (cont.)
         Breusch-Pagan-Godfrey Test (cont.)
              Regress pi on the X (and other?!) variables

                      p_i = \alpha_1 + \alpha_2 Z_{2i} + \alpha_3 Z_{3i} + ... + \alpha_m Z_{mi} + v_i

               Calculate

                              \Theta = (1/2) (ESS)

               Note that

                              \Theta \sim \chi^2_{m-1}
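
               In Stata, a Breusch-Pagan / Cook-Weisberg test of this
                    kind is available after a regression; a minimal sketch:

                      quietly regress y x1 x2
                      * tests for heteroskedasticity against the fitted values
                      estat hettest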




January 18, 2013                                                               Slide 120
Heteroskedasticity: Tests
                      (cont.)
          White's Generalized Heteroskedasticity test
              Estimate model with OLS and obtain residuals
              Run the following auxiliary regression

               \hat{u}_i^2 = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + \alpha_4 X_{2i}^2
                             + \alpha_5 X_{3i}^2 + \alpha_6 X_{2i} X_{3i} + v_i

                    (shown here for a model with two regressors)

               Higher powers may also be used, along with more X's



January 18, 2013                                                     Slide 121
Heteroskedasticity: Tests
                      (cont.)
          White's Generalized Heteroskedasticity test (cont.)
               Note that

                              n \cdot R^2 \sim \chi^2

              The degrees of freedom is the number of coefficients
                   estimated above.
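
               In Stata, White's test is available after a regression; a
                    minimal sketch:

                      quietly regress y x1 x2
                      * White's test appears in the information-matrix test output
                      estat imtest, white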




January 18, 2013                                                      Slide 122
Heteroskedasticity:
                        Remedies
         GLS
              We will cover this after autocorrelation
         Weighted Least Squares
              si2 is a consistent estimator of σi2
              use same formula (BLUE) to get a & ß




January 18, 2013                                          Slide 123
Iteratively weighted least squares (IWLS)


     Iteratively weighted least squares (IWLS)

      1.     Obtain estimates of ei2 using OLS

                                     Y_i = a + b X_i + e_i
      2.     Use these to get "1st round" estimates of σi
      3.     Using formula above replace wi with 1/ si and obtain new estimates for a
             and ß.
      4.     Adjust data
                                     Y_i^* = Y_i / s_i ,   X_i^* = X_i / s_i
      5.     Use these to re-estimate

                                     Y_i^* = a^* + b^* X_i^* + e_i^*
      6.     Repeat Step 3-5 until a and ß converge.
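
       A minimal Stata sketch of one weighting pass (hypothetical y
           and x, with s holding the estimated sigma_i from step 2):

             gen ystar = y/s
             gen xstar = x/s
             gen wcons = 1/s
             * the transformed model carries 1/s as its "constant" regressor
             regress ystar xstar wcons, noconstant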

January 18, 2013                                                                   Slide 124
White's corrected standard
               errors
 White's corrected standard errors
   For normal OLS

                 Var(\hat{B}_1) = \sigma^2 / TSS_x

     We can restate this as

                 Var(\hat{B}_1) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \sigma_i^2 / TSS_x^2

     Since TSS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 ,
     this is the same when \sigma_i^2 = \sigma^2

January 18, 2013                                                               Slide 125
White's corrected standard errors
                              (cont.)
 White's corrected standard errors
   White's solution is to use the robust
       estimator

                 \hat{Var}(\hat{B}_j) = \sum_{i=1}^{n} \hat{r}_{ij}^2 \hat{u}_i^2 / RSS_j^2




      When you see robust standard errors, it
          usually refers to this estimator

January 18, 2013                                              Slide 126
Obtaining Robust errors
         In Stata, just add ", r" to the regress command
                regress approval unemrate
              becomes
                regress approval unemrate, r
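
              The same estimator can also be requested with the longer
                   option name:
                regress approval unemrate, vce(robust)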




January 18, 2013                                            Slide 127
Autocorrelation: Definition
         Autocorrelation is simply the presence of correlation
          between adjacent residuals in a series.
         If a residual is negative (or positive) then its neighbors
          tend to also be negative (or positive).
         Most often autocorrelation is between adjacent
          observations, however, lagged or seasonal patterns
          can also occur.
         Autocorrelation is also usually a function of order by
          time, but it can occur for other orders as well – firm or
          state size.



January 18, 2013                                                       Slide 128
Autocorrelation: Definition
               (cont.)
 The assumption violated is

                 E(e_i e_j) = 0   for i \neq j

 This means that the Pearson's r (correlation
     coefficient) between the residuals from OLS and the
     same residuals lagged one period (or more) is non-
     zero:
                 E(e_i e_j) \neq 0



January 18, 2013                                     Slide 129
Autocorrelation: Definition
                            (cont.)
         Most autocorrelation is what we call 1st order
             autocorrelation, meaning that the residuals are related
             to their contiguous values.
         Autocorrelation can be rather complex, producing
             counterintuitive patterns and correlations.




January 18, 2013                                                       Slide 130
Autocorrelation: Definition
                            (cont.)
         Types of Autocorrelation
              Autoregressive processes
              Moving Averages




January 18, 2013                                 Slide 131
Autocorrelation: Definition
               (cont.)
 Autoregressive processes AR(p)
   The residuals are related to their preceding
          values.
                     e_t = \rho e_{t-1} + u_t

      This is classic 1st order autocorrelation
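
       A minimal Stata sketch that generates such a series
           (0.7 is an arbitrary illustrative value of rho):

             clear
             set obs 200
             gen t = _n
             tsset t
             gen u = rnormal()
             gen e = u
             * build e_t = 0.7*e_(t-1) + u_t row by row;
             * replace processes observations in order, so L.e is already updated
             replace e = 0.7*L.e + u in 2/200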




January 18, 2013                                   Slide 132
Autocorrelation: Definition
                            (cont.)
         Autoregressive processes (cont.)
              In 2nd order autocorrelation the residuals are related to
                   their t-2 values as well

                     e_t = \rho_1 e_{t-1} + \rho_2 e_{t-2} + u_t

               Larger order processes may occur as well

                     e_t = \rho_1 e_{t-1} + \rho_2 e_{t-2} + ... + \rho_p e_{t-p} + u_t




January 18, 2013                                                                                            Slide 133
Autocorrelation: Definition
                            (cont.)
         Moving Average Processes MA(q)
                     e_t = u_t + \theta u_{t-1}



         The error term is a function of some random error plus
             a portion of the previous random error.




January 18, 2013                                                   Slide 134
Autocorrelation: Definition
                            (cont.)
         Moving Average Processes (cont.)
         Higher order processes for MA(q) also exist.

                     e_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + ... + \theta_q u_{t-q}



         The error term is a function of some random error plus
             a portion of the previous random error.




January 18, 2013                                                              Slide 135
Autocorrelation: Definition
                            (cont.)
         Mixed processes ARMA(p,q)
                     e_t = \rho_1 e_{t-1} + \rho_2 e_{t-2} + ... + \rho_p e_{t-p} + u_t
                           + \theta_1 u_{t-1} + \theta_2 u_{t-2} + ... + \theta_q u_{t-q}




         The error term is a complex function of both
             autoregressive and moving average processes.




January 18, 2013                                                                   Slide 136
Autocorrelation: Definition
                            (cont.)
         There are substantive interpretations that can be
             placed on these processes.
              AR processes represent shocks to systems that have
               long-term memory.
               MA processes are quick shocks to a system that can
                handle the process 'efficiently,' having only short term
                memory.




January 18, 2013                                                          Slide 137
Autocorrelation: Implications
         Coefficient estimates are unbiased, but the estimates
             are not BLUE

         The variances are often greatly underestimated (biased
             small)

         Hence hypothesis tests are exceptionally suspect.
              In fact, strongly significant t-tests (P < .001) may well be
                   insignificant once the effects of autocorrelation are
                   removed.




January 18, 2013                                                              Slide 138
Autocorrelation: Causes
         Specification error
              Omitted variable - e.g., inflation
         Wrong functional form
         Lagged effects
         Data Transformations
              Interpolation of missing data
              Differencing




January 18, 2013                                  Slide 139
Autocorrelation: Tests
         Observation of residuals
              Graph/plot them!
         Runs of signs
              Geary test




January 18, 2013                            Slide 140
Autocorrelation: Tests (cont.)
 Durbin-Watson d
                 d = \sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2 / \sum_{t=2}^{n} \hat{u}_t^2




 Criteria for hypothesis of AC
   Reject if d < dL
   Do not reject if d > dU
   Test is inconclusive if dL <= d <= dU.
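
 A minimal Stata sketch (assumes a time variable t and
     hypothetical y and x):

       tsset t
       quietly regress y x
       estat dwatson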

January 18, 2013                                   Slide 141
Autocorrelation: Tests (cont.)
                    Durbin-Watson d (cont.)
                        Note that the d is symmetric about 2.0, so that negative
                         autocorrelation will be indicated by a d > 2.0.
                        Use the same distances above 2.0 as upper and lower
                         bounds.




            Analysis of Time Series. http://cnx.org/content/m34544/latest/
January 18, 2013                                                              Slide 142
Autocorrelation: Tests (cont.)
                    Durbin's h
                      Cannot use DW d if there is a lagged endogenous
                       variable in the model

                 h = (1 - d/2) \sqrt{ T / (1 - T s_c^2) }


                      s_c^2 is the estimated variance of the coefficient on the
                       Y_{t-1} term
                      h has a standard normal distribution



January 18, 2013                                                         Slide 143
Autocorrelation: Tests (cont.)
                    Tests for higher order autocorrelation
                      Ljung-Box Q (χ2 statistic)

                 Q' = T(T+2) \sum_{j=1}^{L} r_j^2 / (T - j)
                      Also called the Portmanteau test

                      Breusch-Godfrey
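
                      A minimal Stata sketch of both (assumes tsset data and
                       hypothetical y and x):

                         quietly regress y x
                         predict uhat, resid
                         * Ljung-Box portmanteau Q on the residuals
                         wntestq uhat
                         * Breusch-Godfrey LM test for higher-order autocorrelation
                         estat bgodfrey, lags(1/4)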



January 18, 2013                                                  Slide 144
Autocorrelation: Remedies
         Generalized Least Squares
              Later!
         First difference method
                  Take 1st differences of your Xs and Y
                  Regress ΔY on ΔX
                  Assumes that ρ = 1!
                  This changes your model from one that explains rates to
                   one that explains changes.

         Generalized differences
              Requires that ρ be known.


January 18, 2013                                                             Slide 145
Autocorrelation: Remedies
         Cochrane-Orcutt method
              (1) Estimate the model using OLS and obtain the residuals,
               \hat{u}_t.
              (2) Using the residuals from the OLS, run the following
               regression.

                      \hat{u}_t = \hat{\rho} \hat{u}_{t-1} + v_t




January 18, 2013                                                         Slide 146
Autocorrelation:
               Cochrane-Orcutt method (cont.)

                    (3) Using the \hat{\rho} obtained, perform the regression on the
                      generalized differences

                         Where
                                      Y_t^* = B_1^* + B_2^* X_t^* + e_t^*

                           B_1^* = B_1 (1 - \hat{\rho}),  Y_t^* = Y_t - \hat{\rho} Y_{t-1},
                           X_t^* = X_t - \hat{\rho} X_{t-1},  and  B_2^* = B_2

                    (4) Substitute the values of B_1 and B_2 into the original
                     regression to obtain new estimates of the residuals.
                    (5) Return to step 2 and repeat - until \hat{\rho} no longer changes.
                       No longer changes means (approximately) changes are less than 3
                        significant digits - or, for instance, at the 3rd decimal place.
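
                    In Stata, the whole iteration is packaged in the prais
                     command; a minimal sketch with hypothetical variables:

                       tsset t
                       * corc requests the Cochrane-Orcutt transformation
                       prais y x, corc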




January 18, 2013                                                                                      Slide 147
Autocorrelation with lagged Dependent
                              Variables
         The presence of a lagged dependent variable causes
             special estimation problems.
         Essentially you must purge the lagged error term of its
             autocorrelation by using a two stage IV solution.
         Be careful with lagged dependent variable models.
              The lagged dep var may simply scoop up all the variance to be
               explained.
              A variety of models use lagged dependent variables:
                Adaptive expectations
                Partial adjustment
                Rational expectations



January 18, 2013                                                               Slide 148
Model Specification:
                       Definition
         The analyst should understand one fundamental “truth”
             about statistical models. They are all misspecified.
              We exist in a world of incomplete information at best.
               Hence model misspecification is an ever-present
               danger.
              We do, however, need to come to terms with the
               problems associated with misspecification so we can
               develop a feeling for the quality of information,
               description, and prediction produced by our models.




January 18, 2013                                                        Slide 149
Criteria for a “Good Model”
         Hendry & Richard Criteria
           Be data admissible – predictions must be logically
                   possible
                  Be consistent with theory
                   Have weakly endogenous regressors (errors and X's
                    uncorrelated)
                  Exhibit parameter stability – relationship cannot vary
                   over time – unless modeled in that way
                  Exhibit data coherency – random residuals
                  Be encompassing – contain or explain the results of
                   other models



January 18, 2013                                                            Slide 150
Model Specification:
                    Definition (cont.)
         There are basically 4 types of misspecification we need
             to examine:
              functional form
              inclusion of an irrelevant variable
              exclusion of a relevant variable
              measurement error and misspecified error term




January 18, 2013                                                    Slide 151
Model Specification:
                      Implications
         If an omitted variable is correlated with the included
             variables, the estimates are biased as well as
             inconsistent.
         In addition, the error variance is incorrect, and usually
             overestimated.
         If the omitted variable is uncorrelated with the included
             variables, the errors are still biased, even though the
             B's are not.




January 18, 2013                                                       Slide 152
Model Specification:
                      Implications
         Incorrect functional form can result in autocorrelation or
             heteroskedasticity.
         See the notes for these problems for the implications of
             each.




January 18, 2013                                                       Slide 153
Model Specification:
                            Causes
         This one is easy - theoretical design.
              something is omitted, irrelevantly included, mismeasured
                   or non-linear.
              This problem is explicitly theoretical.




January 18, 2013                                                          Slide 154
Data Mining
  There are techniques out there that look for variables to
   add.
  These are often atheoretical – but can they work?
   Note that data mining may alter the 'true' level of
    significance
   With c candidates for variables in the model, and k
    actually chosen with an α=.05, the true level of
    significance is:

                 \alpha^* = 1 - (1 - \alpha)^{c/k}

    Note similarity to Bonferroni correction


January 18, 2013                                                Slide 155
Model Specification: Tests
            Actual Specification Tests
              No test can reveal poor theoretical construction per se.
              The best indicator that your model is misspecified is the
                     discovery that the model has some undesirable
                     statistical property; e.g. a misspecified functional form
                     will often be indicated by a significant test for
                     autocorrelation.
                    Sometimes time-series models will have negative
                     autocorrelation as a result of poor design.



January 18, 2013                                                           Slide 156
Ramsey RESET Test
         The Ramsey RESET test is a "Regression Specification
             Error Test."

         You add powers of the predicted values of Y (e.g., \hat{Y}^2,
             \hat{Y}^3) to the regression model

         If they have a significant coefficient then the errors are
             related to the predicted values, indicating that there is a
             specification error.

         This is based on demonstrating that there is some non-
             random behavior left in the residuals
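
         In Stata, the RESET test is available after a regression; a
             minimal sketch:

               quietly regress y x1 x2
               * Ramsey RESET using powers of the fitted values
               estat ovtest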


January 18, 2013                                                           Slide 157
Model Specification: Tests
         Specification Criteria for lagged designs
              Most useful for comparing time series models with same
                   set of variables, but differing number of parameters




January 18, 2013                                                          Slide 158
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1
Ps602 notes part1

More Related Content

Viewers also liked

Se rapprocher de la nature dans un monde connecté
Se rapprocher de la nature dans un monde connectéSe rapprocher de la nature dans un monde connecté
Se rapprocher de la nature dans un monde connectéVal'hor - En Quête de Vert
 
Making a room for herself - A study of the feminist rationale in "the scarlet...
Making a room for herself - A study of the feminist rationale in "the scarlet...Making a room for herself - A study of the feminist rationale in "the scarlet...
Making a room for herself - A study of the feminist rationale in "the scarlet...Seema Rathore
 
Herry 20091015094658__2294__0
Herry  20091015094658__2294__0Herry  20091015094658__2294__0
Herry 20091015094658__2294__0dpermata
 
Attentes des jardiniers urbains : quels végétaux, quels services ?
Attentes des jardiniers urbains : quels végétaux, quels services ?Attentes des jardiniers urbains : quels végétaux, quels services ?
Attentes des jardiniers urbains : quels végétaux, quels services ?Val'hor - En Quête de Vert
 
Making a room for her own self
Making a room for her own selfMaking a room for her own self
Making a room for her own selfSeema Rathore
 
Bab 2 dasar – dasar perilaku individual kelompok 1
Bab 2 dasar – dasar perilaku individual kelompok 1Bab 2 dasar – dasar perilaku individual kelompok 1
Bab 2 dasar – dasar perilaku individual kelompok 1duiandi
 
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดา
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดาบทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดา
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดาSpudbua Ead
 
Perfect marketing plan 116 eng
Perfect marketing plan 116 engPerfect marketing plan 116 eng
Perfect marketing plan 116 engtammylby
 

Viewers also liked (10)

Presentation maria precious
Presentation   maria preciousPresentation   maria precious
Presentation maria precious
 
Se rapprocher de la nature dans un monde connecté
Se rapprocher de la nature dans un monde connectéSe rapprocher de la nature dans un monde connecté
Se rapprocher de la nature dans un monde connecté
 
Presentation jomelyn
Presentation jomelynPresentation jomelyn
Presentation jomelyn
 
Making a room for herself - A study of the feminist rationale in "the scarlet...
Making a room for herself - A study of the feminist rationale in "the scarlet...Making a room for herself - A study of the feminist rationale in "the scarlet...
Making a room for herself - A study of the feminist rationale in "the scarlet...
 
Herry 20091015094658__2294__0
Herry  20091015094658__2294__0Herry  20091015094658__2294__0
Herry 20091015094658__2294__0
 
Attentes des jardiniers urbains : quels végétaux, quels services ?
Attentes des jardiniers urbains : quels végétaux, quels services ?Attentes des jardiniers urbains : quels végétaux, quels services ?
Attentes des jardiniers urbains : quels végétaux, quels services ?
 
Making a room for her own self
Making a room for her own selfMaking a room for her own self
Making a room for her own self
 
Bab 2 dasar – dasar perilaku individual kelompok 1
Bab 2 dasar – dasar perilaku individual kelompok 1Bab 2 dasar – dasar perilaku individual kelompok 1
Bab 2 dasar – dasar perilaku individual kelompok 1
 
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดา
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดาบทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดา
บทที่ 2 ผู้มีหน้าที่เสียภาษีเงินได้บุคคลธรรมดา
 
Perfect marketing plan 116 eng
Perfect marketing plan 116 engPerfect marketing plan 116 eng
Perfect marketing plan 116 eng
 

Similar to Ps602 notes part1

Writing Sample
Writing SampleWriting Sample
Writing SampleYiqun Li
 
4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdfLeonardo Auslender
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsLeonardo Auslender
 
R Regression Models with Zelig
R Regression Models with ZeligR Regression Models with Zelig
R Regression Models with Zeligizahn
 
Intro to quant_analysis_students
Intro to quant_analysis_studentsIntro to quant_analysis_students
Intro to quant_analysis_studentsmstegman
 
Machine Learning
Machine LearningMachine Learning
Machine LearningShiraz316
 
Lesson 6 measures of central tendency
Lesson 6 measures of central tendencyLesson 6 measures of central tendency
Lesson 6 measures of central tendencynurun2010
 
Quantitative Methods Assignment Help
Quantitative Methods Assignment HelpQuantitative Methods Assignment Help
Quantitative Methods Assignment HelpExcel Homework Help
 
Top schools in noida
Top schools in noidaTop schools in noida
Top schools in noidaEdhole.com
 
Econometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptxEconometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptxIndraYu2
 
Newbold_chap13.ppt
Newbold_chap13.pptNewbold_chap13.ppt
Newbold_chap13.pptcfisicaster
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Sam Clifford
 
The impact of political stability on economic growth
The impact of political stability on economic growthThe impact of political stability on economic growth
The impact of political stability on economic growthMariaTsouka
 
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...Bohdan Pavlyshenko
 
03 ch ken black solution
03 ch ken black solution03 ch ken black solution
03 ch ken black solutionKrunal Shah
 
13 ch ken black solution
13 ch ken black solution13 ch ken black solution
13 ch ken black solutionKrunal Shah
 
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Lili Zhang
 

Similar to Ps602 notes part1 (20)

Quantitative Methods Assignment Help
Quantitative Methods Assignment HelpQuantitative Methods Assignment Help
Quantitative Methods Assignment Help
 
Writing Sample
Writing SampleWriting Sample
Writing Sample
 
4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf
 
Visual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning ModelsVisual Tools for explaining Machine Learning Models
Visual Tools for explaining Machine Learning Models
 
R Regression Models with Zelig
R Regression Models with ZeligR Regression Models with Zelig
R Regression Models with Zelig
 
Intro to quant_analysis_students
Intro to quant_analysis_studentsIntro to quant_analysis_students
Intro to quant_analysis_students
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Lesson 6 measures of central tendency
Lesson 6 measures of central tendencyLesson 6 measures of central tendency
Lesson 6 measures of central tendency
 
Krupa rm
Krupa rmKrupa rm
Krupa rm
 
Quantitative Methods Assignment Help
Quantitative Methods Assignment HelpQuantitative Methods Assignment Help
Quantitative Methods Assignment Help
 
Top schools in noida
Top schools in noidaTop schools in noida
Top schools in noida
 
Econometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptxEconometric (Indonesia's Economy).pptx
Econometric (Indonesia's Economy).pptx
 
Newbold_chap13.ppt
Newbold_chap13.pptNewbold_chap13.ppt
Newbold_chap13.ppt
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018
 
RMCPWSM_GCM_2015
RMCPWSM_GCM_2015RMCPWSM_GCM_2015
RMCPWSM_GCM_2015
 
The impact of political stability on economic growth
The impact of political stability on economic growthThe impact of political stability on economic growth
The impact of political stability on economic growth
 
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...
Linear, Machine Learning or Probabilistic Predictive Models: What's Best for ...
 
03 ch ken black solution
03 ch ken black solution03 ch ken black solution
03 ch ken black solution
 
13 ch ken black solution
13 ch ken black solution13 ch ken black solution
13 ch ken black solution
 
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
 

Ps602 notes part1

  • 1. Advanced Quantitative Methods - PS 602 Notes – Version as of 8/30/12 Robert D. Duval - WVU Dept of Political Science Class Office 116 Woodburn 301A Woodburn Th 9:00-11:30 M-F 12:00-1:00 Phone 304-293-9537 (office) 304-599-8913 (home) email: Bob.Duval@mail.wvu.edu (Do not use Mix email address!) 1/18/2013
  • 2. Syllabus  Required texts  Additional readings  Computer exercises  Course requirements  Midterm - in class, open book (30%)  Final – in-class, open book (30%)  Research paper (30%)  Participation (10%)  http://www.polsci.wvu.edu/duval/ps602/602syl.htm January 18, 2013 Slide 2
  • 3. Prerequisites  An fundamental understanding of calculus  An informal but intuitive understanding of the mathematics of Probability  A sense of humor January 18, 2013 Slide 3
  • 4. Statistics is an innate cognitive skill  We all possess the ability to do rudimentary statistical analysis  – in our heads  – intuitively.  The cognitive machinery for stats is built in to us, just like it is for calculus.  This is part of how we process information about the world  It is not simply mysterious arcane jargon  It is simply the mysterious arcane way you already think.  Much of it formalizes simple intuition.  Why do we set alpha (α) to .05? January 18, 2013 Slide 4
  • 5. Introduction  This course is about Regression analysis.  The principle method in the social science  Three basic parts to the course:  An introduction to the general Model  The formal assumptions and what they mean.  Selected special topics that relate to regression and linear models. January 18, 2013 Slide 5
  • 6. Introduction: The General Linear Model  The General Linear Model (GLM) is a phrase used to indicate a class of statistical models which include simple linear regression analysis.  Regression is the predominant statistical tool used in the social sciences due to its simplicity and versatility.  Also called  Linear Regression Analysis.  Multiple Regression  Ordinary Least Squares January 18, 2013 Slide 6
  • 7. Simple Linear Regression: The Basic Mathematical Model  Regression is based on the concept of the simple proportional relationship - also known as the straight line.  We can express this idea mathematically!  Theoretical aside: All theoretical statements of relationship imply a mathematical theoretical structure.  Just because it isn‟t explicitly stated doesn‟t mean that the math isn‟t implicit in the language itself! January 18, 2013 Slide 7
  • 8. Math and language  When we speak about the world, we often have imbedded implicit relationships that we are referring to.  For instance:  Increasing taxes will send us into recession.  Decreasing taxes will spur economic growth. January 18, 2013 Slide 8
  • 9. From Language to models  The idea that reducing taxes on the wealthy will spur economic growth (or increasing taxes will harm economic growth) suggests that there is a proportional relationship between tax rates and growth in domestic product.  So lets look!  Disclaimer! The “models” that follow are meant to be examples. They are not “good” models, only useful ones to talk about! January 18, 2013 Slide 9
  • 10. GDP and Average Tax Rates: 1986-2009 Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1 (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
  • 11. The Stats regress gdp avetaxrate Source | SS df MS Number of obs = 24 -------------+------------------------------ F( 1, 22) = 5.11 Model | 44236671 1 44236671 Prob > F = 0.0340 Residual | 190356376 22 8652562.57 R-squared = 0.1886 -------------+------------------------------ Adj R-squared = 0.1517 Total | 234593048 23 10199697.7 Root MSE = 2941.5 ------------------------------------------------------------------------------ gdp | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- avetaxrate | -1341.942 593.4921 -2.26 0.034 -2572.769 -111.1148 _cons | 26815.88 7910.084 3.39 0.003 10411.37 43220.39 ------------------------------------------------------------------------------ January 18, 2013 Slide 11
  • 12. The Stats regress gdp avetaxrate Source | SS df MS Number of obs = 24 -------------+------------------------------ F( 1, 22) = 5.11 Model | 44236671 1 44236671 Prob > F = 0.0340 Residual | 190356376 22 8652562.57 R-squared = 0.1886 -------------+------------------------------ Adj R-squared = 0.1517 Total | 234593048 23 10199697.7 Root MSE = 2941.5 ------------------------------------------------------------------------------ gdp | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- avetaxrate | -1341.942 593.4921 -2.26 0.034 -2572.769 -111.1148 _cons | 26815.88 7910.084 3.39 0.003 10411.37 43220.39 ------------------------------------------------------------------------------ January 18, 2013 Slide 12
  • 13. But wait, there‟s more…  What is the model?  There is a directly proportional relationship between tax rates and economic growth.  How about an equation?  We will get back to this…  Can you critique the “model?”  Can we look at it differently?  If higher January 18, 2013 Slide 13
  • 14. Effective Tax Rate and Growth in GDP: 1986-2009 Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1 (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
  • 15. . regress gdpchange avetaxrate Source | SS df MS Number of obs = 23 -------------+------------------------------ F( 1, 21) = 6.15 Model | 22.4307279 1 22.4307279 Prob > F = 0.0217 Residual | 76.5815075 21 3.64673845 R-squared = 0.2265 -------------+------------------------------ Adj R-squared = 0.1897 Total | 99.0122354 22 4.50055615 Root MSE = 1.9096 ------------------------------------------------------------------------------ gdpchange | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- avetaxrate | .9889805 .3987662 2.48 0.022 .1597008 1.81826 _cons | -7.977765 5.292757 -1.51 0.147 -18.98466 3.029126 ------------------------------------------------------------------------------ January 18, 2013 Slide 15
  • 16. Finally…sort of . regress gdpchange taxratechange Source | SS df MS Number of obs = 23 -------------+------------------------------ F( 1, 21) = 11.00 Model | 34.0305253 1 34.0305253 Prob > F = 0.0033 Residual | 64.9817101 21 3.09436715 R-squared = 0.3437 -------------+------------------------------ Adj R-squared = 0.3124 Total | 99.0122354 22 4.50055615 Root MSE = 1.7591 ------------------------------------------------------------------------------- gdpchange | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- taxratechange | .2803104 .0845261 3.32 0.003 .1045288 .456092 _cons | 5.414708 .3780098 14.32 0.000 4.628593 6.200822 ------------------------------------------------------------------------------- January 18, 2013 Slide 16
  • 17. Alternate Mathematical Notation for the Line  Alternate Mathematical Notation for the straight line - don‟t ask why!  10th Grade Geometry y = mx + b  Statistics Literature Yi a bX i ei  Econometrics Literature Y i = B 0 + B 1 X i +e i  Or (Like your Textbook) Yi = B1 +B2Xi + ui January 18, 2013 Slide 17
  • 18. Alternate Mathematical Notation for the Line – cont.  These are all equivalent. We simply have to live with this inconsistency.  We won‟t use the geometric tradition, and so you just need to remember that B0 and a are both the same thing. January 18, 2013 Slide 18
  • 19. Linear Regression: the Linguistic Interpretation  In general terms, the linear model states that the dependent variable is directly proportional to the value of the independent variable.  Thus if we state that some variable Y increases in direct proportion to some increase in X, we are stating a specific mathematical model of behavior - the linear model.  Hence, if we say that the crime rate goes up as unemployment goes up, we are stating a simple linear model. January 18, 2013 Slide 19
  • 20. Linear Regression: A Graphic Interpretation  The Straight Line 12 10 8 Y 6 4 2 0 1 2 3 4 5 6 7 8 9 10 X January 18, 2013 Slide 20
  • 21. The linear model is represented by a simple picture  Simple Linear Regression 12 10 8 Y 6 4 2 0 1 2 3 4 5 6 7 8 9 10 X January 18, 2013 Slide 21
  • 22. The Mathematical Interpretation: The Meaning of the Regression Parameters  a = the intercept  the point where the line crosses the Y-axis.  (the value of the dependent variable when all of the independent variables = 0)  b = the slope  the increase in the dependent variable per unit change in the independent variable. (also known as the 'rise over the run') January 18, 2013 Slide 22
  • 23. The Error Term  Such models do not predict behavior perfectly.  So we must add a component to adjust or compensate for the errors in prediction.  Having fully described the linear model, the rest of the semester (as well as several more) will be spent of the error. January 18, 2013 Slide 23
  • 24. The Nature of Least Squares Estimation  There is 1 essential goal and there are 4 important concerns with any OLS Model January 18, 2013 Slide 24
  • 25. The 'Goal' of Ordinary Least Squares  Ordinary Least Squares (OLS) is a method of finding the linear model which minimizes the sum of the squared errors.  Such a model provides the best explanation/prediction of the data. January 18, 2013 Slide 25
  • 26. Why Least Squared error?  Why not simply minimum error?  The error‟s about the line sum to 0.0!  Minimum absolute deviation (error) models now exist, but they are mathematically cumbersome.  Try algebra with | Absolute Value | signs! January 18, 2013 Slide 26
  • 27. Other models are possible...  Best parabola...?  (i.e. nonlinear or curvilinear relationships)  Best maximum likelihood model ... ?  Best expert system...?  Complex Systems…?  Chaos/Non-linear systems models  Catastrophe models  others January 18, 2013 Slide 27
  • 28. The Simple Linear Virtue  I think we over emphasize the linear model.  It does, however, embody this rather important notion that Y is proportional to X.  As noted, we can state such relationships in simple English.  As unemployment increases, so does the crime rate.  As domestic conflict increased, national leaders will seek to distract their populations by initiating foreign disputes. January 18, 2013 Slide 28
  • 29. The Notion of Linear Change  The linear aspect means that the same amount of increase in unemployment will have the same effect on crime at both low and high unemployment.  A nonlinear change would mean that as unemployment increases, its impact upon the crime rate might increase at higher unemployment levels. January 18, 2013 Slide 29
  • 30. Why squared error?  Because:  (1) the sum of the errors expressed as deviations would be zero as it is with standard deviations, and  (2) some feel that big errors should be more influential than small errors.  Therefore, we wish to find the values of a and b that produce the smallest sum of squared errors. January 18, 2013 Slide 30
  • 31. The Parameter estimates  In order to do this, we must find parameter estimates which accomplish this minimization.  In calculus, if you wish to know when a function is at its minimum, you set the first derivative equal to zero.  In this case we must take partial derivatives since we have two parameters (a & b) to worry about.  We will look closer at this and it‟s not a pretty sight! January 18, 2013 Slide 31
  • 32. Decomposition of the error in LS January 18, 2013 Slide 32
  • 33. Goodness of Fit  Since we are interested in how well the model performs at reducing error, we need to develop a means of assessing that error reduction.  Since the mean of the dependent variable represents a good benchmark for comparing predictions, we calculate the improvement in the prediction of Yi relative to the mean of Y  (the best guess of Y with no other information). January 18, 2013 Slide 33
• 34. Sum of Squares Terminology  In mathematical jargon we seek to minimize the Unexplained Sum of Squares (USS), where:  USS = Σ(Yi − Ŷi)² = Σei² January 18, 2013 Slide 34
• 35. Sums of Squares  This gives us the following 'sum-of-squares' measures:  Total Variation = Explained Variation + Unexplained Variation  TSS (Total Sum of Squares) = Σ(Yi − Ȳ)²  ESS (Explained Sum of Squares) = Σ(Ŷi − Ȳ)²  USS (Unexplained Sum of Squares) = Σ(Yi − Ŷi)² January 18, 2013 Slide 35
  • 36. Sums of Squares Confusion  Note: Occasionally you will run across ESS and RSS which generate confusion since they can be used interchangeably.  ESS can be the error sums-of-squares, or alternatively, the estimated or explained SSQ.  Likewise RSS can be the residual SSQ, or the regression SSQ.  Hence the use of USS for Unexplained SSQ in this treatment.  Gujarati uses ESS (Explained sum of squares) and RSS (Residual sum of squares) January 18, 2013 Slide 36
• 37. The Parameter estimates  In order to find the 'best' parameters, we must find parameter estimates which accomplish this minimization.  That is, the parameters that have the smallest sum of squared errors.  In calculus, if you wish to know when a function is at its minimum, you take the first derivative and set it equal to 0.0  The second derivative must be positive as well, but we do not need to go there.  In this case we must take partial derivatives since we have two parameters to worry about. January 18, 2013 Slide 37
• 38. Deriving the Parameter Estimates  Since USS = Σ(Yi − Ŷi)² = Σei² = Σ(Yi − a − bXi)²  We can take the partial derivative of USS with respect to both a and b.  ∂USS/∂a = 2Σ(Yi − a − bXi)(−1)  ∂USS/∂b = 2Σ(Yi − a − bXi)(−Xi) January 18, 2013 Slide 38
• 39. Deriving the Parameter Estimates (cont.)  Which simplifies to  ∂USS/∂a = −2Σ(Yi − a − bXi) = 0  ∂USS/∂b = −2ΣXi(Yi − a − bXi) = 0  We also set these derivatives to 0 to indicate that we are at a minimum. January 18, 2013 Slide 39
• 40. Deriving the Parameter Estimates (cont.)  We now add a "hat" to the parameters to indicate that the results are estimators.  ∂USS/∂â = −2Σ(Yi − â − b̂Xi) = 0  ∂USS/∂b̂ = −2ΣXi(Yi − â − b̂Xi) = 0  We also set these derivatives equal to zero. January 18, 2013 Slide 40
• 41. Deriving the Parameter Estimates (cont.)  Dividing through by −2 and rearranging terms, we get  ΣYi = ân + b̂(ΣXi)  ΣXiYi = â(ΣXi) + b̂(ΣXi²) January 18, 2013 Slide 41
• 42. Deriving the Parameter Estimates (cont.)  We can solve these equations simultaneously to get our estimators.  b̂ = [nΣXiYi − ΣXiΣYi] / [nΣXi² − (ΣXi)²] = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²  â = Ȳ − b̂X̄ January 18, 2013 Slide 42
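To make these formulas concrete, here is a minimal sketch in Python with numpy (the language and the data values are illustrative choices, not from the slides):

    import numpy as np

    # Hypothetical data, invented purely for illustration
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    n = len(X)

    # b = [n*sum(XiYi) - sum(Xi)*sum(Yi)] / [n*sum(Xi^2) - (sum(Xi))^2]
    b = (n * np.sum(X * Y) - X.sum() * Y.sum()) / (n * np.sum(X**2) - X.sum()**2)
    # Equivalent deviation form: sum((Xi-Xbar)(Yi-Ybar)) / sum((Xi-Xbar)^2)
    b_dev = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
    # a = Ybar - b*Xbar, so the fitted line passes through (Xbar, Ybar)
    a = Y.mean() - b * X.mean()
    print(a, b, np.isclose(b, b_dev))   # the two slope formulas agree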
• 43. Deriving the Parameter Estimates (cont.)  The estimator for â also shows that the regression line always goes through the point (X̄, Ȳ), the intersection of the two means.  This formula is quite manageable for bivariate regression. If there are two or more independent variables, the formula for b2, etc. becomes unmanageable!  See matrix algebra in POLS603! January 18, 2013 Slide 43
  • 44. Tests of Inference  t-tests for coefficients  F-test for entire model  Since we are interested in how well the model performs at reducing error, we need to develop a means of assessing that error reduction.  Since the mean of the dependent variable represents a good benchmark for comparing predictions, we calculate the improvement in the prediction of Yi relative to the mean of Y  Remember that the mean of Y is your best guess of Y with no other information.  Well, often, assuming the data is normally distributed! January 18, 2013 Slide 44
• 45. T-Tests  Since we wish to make probability statements about our model, we must do tests of inference.  Fortunately,  t(n−2) = B / seB  OK, so what is the seB? January 18, 2013 Slide 45
• 46. Standard Errors of Estimates  These estimates have variation associated with them  sea = σ̂ √( ΣXi² / (nΣxi²) )  seB = σ̂ / √(Σxi²)  (where xi = Xi − X̄) January 18, 2013 Slide 46
• 47. This gives us the F test:  The F-test tells us whether the full model is significant  F(k−1, n−k) = [ESS/(k − 1)] / [USS/(n − k)]  Note that F statistics have 2 different degrees of freedom: k−1 and n−k, where k is the number of parameters estimated (including the intercept). January 18, 2013 Slide 47
  • 48. More on the F-test  In addition, the F test for the entire model must be adjusted to compensate for the changed degrees of freedom.  Note that F increases as n or R2 increases and decreases as k – the number of independent variables - increases. January 18, 2013 Slide 48
• 49. F test for different Models  The F test can tell us whether two different models of the same dependent variable are significantly different.  i.e. whether adding a new variable will significantly improve estimation  F = [(R²new − R²old) / number of new regressors] / [(1 − R²new) / df]  where df = n − number of parameters in the new model January 18, 2013 Slide 49
• 50. The correlation coefficient  A measure of how closely the observations cluster about the regression line  It ranges between -1.0 and +1.0  It is closely related to the slope. January 18, 2013 Slide 50
• 51. R2 (r-square, r2)  The r2 (or R-square) is also called the coefficient of determination.  r² = ESS/TSS = 1 − USS/TSS  It is the percent of the variation in Y explained by X  It must range between 0.0 and 1.0.  An r2 of .95 means that 95% of the variation about the mean of Y is "caused" (or at least explained) by the variation in X January 18, 2013 Slide 51
  • 52. Observations on r2  R2 always increases as you add independent variables.  The r2 will go up even if X2 , or any new variable, is a completely random variable.  r2 is an important statistic, but it should never be seen as the focus of your analysis.  The coefficient values, their interpretation, and the tests of inference are really more important.  Beware of r2 = 1.0 !!! January 18, 2013 Slide 52
• 53. The Adjusted R2  Since R2 always increases with the addition of a new variable, the adjusted R2 compensates for added explanatory variables.  Adjusted R² = 1 − [USS/(n − k − 1)] / [TSS/(n − 1)]  Note that it may fall below 0.0!  But such values indicate poorly formed models. January 18, 2013 Slide 53
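A short continuation of the same hypothetical example, computing the sum-of-squares decomposition and the fit statistics defined on the preceding slides (again a sketch with invented data):

    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    n, p = len(X), 2                       # p = parameters estimated (a and b)

    b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
    a = Y.mean() - b * X.mean()
    Yhat = a + b * X

    TSS = np.sum((Y - Y.mean())**2)        # total sum of squares
    ESS = np.sum((Yhat - Y.mean())**2)     # explained sum of squares
    USS = np.sum((Y - Yhat)**2)            # unexplained (residual) sum of squares

    r2 = ESS / TSS                         # = 1 - USS/TSS
    adj_r2 = 1 - (USS / (n - p)) / (TSS / (n - 1))
    F = (ESS / (p - 1)) / (USS / (n - p))  # F(p-1, n-p) for the whole model
    print(r2, adj_r2, F)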
  • 54. Comments on the adjusted R-squared  R-squared will always go up with the addition of a new variable.  Adjusted r-squared will go down if the variable contributes nothing new to the explanation of the model.  As a rule of thumb, if the new variable has a t-value greater than 1.0, it increases the adjusted r-square. January 18, 2013 Slide 54
  • 55. The assumptions of the model  We will spend the next 7 weeks on this! January 18, 2013 Slide 55
• 56. Model The Scalar Version  The basic multiple regression model is a simple extension of the bivariate equation.  By adding extra independent variables, we are creating a multiple-dimensioned space, where the model fit is some appropriate surface in that space.  For instance, if there are two independent variables, we are fitting the points to a 'plane in space'.  Visualizing this in more dimensions is a good trick. January 18, 2013 Slide 56
• 57. The Scalar Equation  The basic linear model:  Yi = a + b1X1i + b2X2i + ... + bkXki + ei  If bivariate regression can be described as a line on a plane, multiple regression represents a k-dimensional object in a k+1 dimensional space. January 18, 2013 Slide 57
• 58. The Matrix Model  We can use a different type of mathematical structure to describe the regression model  Frequently called Matrix or Linear Algebra  The multiple regression model may be easily represented in matrix terms.  Y = XB + e  Where Y, X, B and e are all matrices of data, coefficients, or residuals January 18, 2013 Slide 58
• 59. The Matrix Model (cont.)  The matrices in Y = XB + e are  Y = [Y1, Y2, ..., Yn]′ (n × 1)  X = [X11 X12 ... X1k; X21 X22 ... X2k; ...; Xn1 Xn2 ... Xnk] (n × k)  B = [B1, B2, ..., Bk]′ (k × 1)  e = [e1, e2, ..., en]′ (n × 1)  Note that we postmultiply X by B since this order makes them conformable.  Also note that X1 is a column of 1's to obtain an intercept term. January 18, 2013 Slide 59
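A minimal sketch of the matrix setup in Python/numpy, using the familiar closed form B̂ = (X′X)⁻¹X′Y (which these slides defer to the matrix-algebra course; the data are invented):

    import numpy as np

    # Hypothetical data: two regressors, invented for illustration
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
    Y  = np.array([3.0, 4.0, 8.0, 9.0, 12.0])

    # The first column of 1's gives the intercept term
    X = np.column_stack([np.ones_like(x1), x1, x2])

    # B = (X'X)^{-1} X'Y; solve() is numerically safer than an explicit inverse
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ B                          # residuals
    print(B, e)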
  • 60. Assumptions of the model Scalar Version  The OLS model has seven fundamental assumptions.  This count varies from author to author, based upon what each author thinks is a separate assumption!  These assumptions form the foundation for all regression analysis.  Failure of a model to conform to these assumptions frequently presents problems for estimation and inference.  The “problem” may range from minor to severe!  These problems, or violations of the assumptions, almost invariably arise out of substantive or theoretical problems! January 18, 2013 Slide 60
• 61. Model Scalar Version (cont.)  1. The ei's are normally distributed.  2. E(ei) = 0  3. E(ei²) = σ²  4. E(eiej) = 0 (i ≠ j)  5. X's are nonstochastic with values fixed in repeated samples and Σ(Xik − X̄k)²/n is a finite nonzero number.  6. The number of observations is greater than the number of coefficients estimated.  7. No exact linear relationship exists between any of the explanatory variables. January 18, 2013 Slide 61
• 62. Model The English Version  The errors have a normal distribution.  The errors are homoskedastic.  (The variation in the errors doesn't change across values of the independent or dependent variables)  There is no serial correlation in the errors.  (The errors are unrelated to their neighbors.)  There is no perfect multicollinearity.  (No variable is a perfect linear function of another variable.)  The X's are fixed (non-stochastic) and have some variation, but no infinite values.  There are more data points than unknowns.  The model is linear in its parameters.  All modeled relationships are directly proportional  OK…so it's not really English…. January 18, 2013 Slide 62
• 63. Model: The Matrix Version  These same assumptions expressed in matrix format are:  1. e ~ N(0, Σ)  2. Σ = σ²I  3. The elements of X are fixed in repeated samples and (1/n)X′X is nonsingular and its elements are finite January 18, 2013 Slide 63
• 64. Properties of Estimators (θ̂)  Since we are concerned with error, we will be concerned with those properties of estimators which have to do with the errors produced by the estimates - the θ̂'s  We use the symbol θ to denote a general parameter  It could represent a regression slope (B), a sample mean (X̄), a standard deviation (s), and many other statistics or estimators based on some sample. January 18, 2013 Slide 64
  • 65. Types of estimator error  Estimators are seldom exactly correct due to any number of reasons, most notably sampling error and biased selection.  There are several important concepts that we need to understand in examining how well estimators do their job. January 18, 2013 Slide 65
• 66. Sampling error  Sampling error is simply the difference between the true value of a parameter and its estimate in any given sample.  Sampling Error = θ̂ − θ  This sampling error means that an estimator will vary from sample to sample and therefore estimators have variance.  Var(θ̂) = E[θ̂ − E(θ̂)]² = E(θ̂²) − [E(θ̂)]² January 18, 2013 Slide 66
• 67. Bias  The bias of an estimate is the difference between its expected value and its true value.  If the estimator is always low (or high) then the estimator is biased.  An estimator is unbiased if  E(θ̂) = θ  And  Bias = E(θ̂) − θ January 18, 2013 Slide 67
• 68. Mean Squared Error  The mean square error (MSE) is different from the estimator's variance in that the variance measures dispersion about the estimated parameter while mean squared error measures the dispersion about the true parameter.  Mean square error = E(θ̂ − θ)²  If the estimator is unbiased then the variance and MSE are the same. January 18, 2013 Slide 68
• 69. Mean Squared Error (cont.)  The MSE is important for time series and forecasting since it allows for both bias and efficiency:  MSE = variance + (bias)²  These concepts lead us to look at the properties of estimators. Estimators may behave differently in large and small samples, so we look at both  small sample and,  large (asymptotic) sample properties. January 18, 2013 Slide 69
  • 70. Small Sample Properties  These are the ideal properties. We desire these to hold.  Unbiasedness  Efficiency  Best Linear Unbiased Estimator  If the small sample properties hold, then by extension, the large sample properties hold. January 18, 2013 Slide 70
• 71. Bias  A parameter is unbiased if  E(θ̂) = θ  In other words, the average value of the estimator in repeated sampling equals the true parameter.  Note that whether an estimator is biased or not implies nothing about its dispersion. January 18, 2013 Slide 71
• 72. Bias  [Figure: two probability densities over X, a normal distribution (s = 1.0) centered on the true parameter and a second, biased distribution centered away from it] January 18, 2013 Slide 72
• 73. Efficiency  An estimator is efficient if it is unbiased and its variance is less than that of any other unbiased estimator of the parameter.  θ̂ is unbiased;  Var(θ̂) ≤ Var(θ̃), where θ̃ is any other unbiased estimator of θ  There might be instances in which we might choose a biased estimator, if it has a smaller variance. January 18, 2013 Slide 73
• 74. Efficiency  [Figure: two normal densities over X; Series 1 (s = 1.0) is more efficient than Series 2 (s = 2.0)] January 18, 2013 Slide 74
• 75. BLUE (Best Linear Unbiased Estimate)  An estimator θ̂ is described as a BLUE estimator if it  is a linear function  is unbiased  satisfies Var(θ̂) ≤ Var(θ̃), where θ̃ is any other linear unbiased estimator of θ January 18, 2013 Slide 75
• 76. What is a linear estimator?  A linear estimator looks like the formula for a straight line  θ̂ = a1x1 + a2x2 + a3x3 + ... + anxn  The linearity referred to is the linearity in the parameters  Note that the sample mean is an example of a linear estimator.  X̄ = (1/n)X1 + (1/n)X2 + (1/n)X3 + ... + (1/n)Xn January 18, 2013 Slide 76
  • 77. BLUE is Bueno  If your estimator (e.g. the Regression B) is the BLUE estimator, then you have a very good estimator – relative to other regression style estimators.  The problem is that if certain assumptions are violated, then OLS may no longer be the “best” estimator.  There might be a better one!  You can still hope that the large sample properties hold though! January 18, 2013 Slide 77
  • 78. Asymptotic (Large Sample) Properties  Asymptotically unbiased  Consistency  Asymptotic efficiency January 18, 2013 Slide 78
• 79. Asymptotic bias  An estimator is asymptotically unbiased if  lim(n→∞) E(θ̂n) = θ  As the sample size gets larger the estimated parameter gets closer to the true value.  For instance:  S² = Σ(Xi − X̄)²/n and E(S²) = (1 − 1/n)σ² January 18, 2013 Slide 79
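The S² example is easy to verify by simulation; a sketch (Python/numpy, with an arbitrarily chosen true variance of 4.0):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma2 = 4.0                           # true variance, chosen arbitrarily

    for n in (5, 50, 500):
        draws = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))
        S2 = np.sum((draws - draws.mean(axis=1, keepdims=True))**2, axis=1) / n
        # E(S^2) = (1 - 1/n) * sigma^2, so the bias shrinks as n grows
        print(n, round(S2.mean(), 3), (1 - 1/n) * sigma2)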
• 80. Consistency  The point at which a distribution collapses is called the probability limit (plim)  If the bias and variance both decrease as the sample size gets larger, the estimator is consistent.  lim(n→∞) P(|θ̂n − θ| < δ) = 1  Usually noted by plim θ̂n = θ January 18, 2013 Slide 80
• 81. Asymptotic efficiency  An estimator is asymptotically efficient if  it has an asymptotic distribution with finite mean and variance  it is consistent  no other estimator has smaller asymptotic variance January 18, 2013 Slide 81
• 82. Rifle and Target Analogy  Small sample properties  Bias: The shots cluster around some spot other than the bull's-eye.  Efficient: When one rifle's cluster is smaller than another's.  BLUE - Smallest scatter for rifles of a particular type of simple construction January 18, 2013 Slide 82
• 83. Rifle and Target Analogy (cont.)  Asymptotic properties  Think of increased sample size as getting closer to the target.  Asymptotic Unbiasedness means that as the sample size gets larger, the center of the point cluster moves closer to the target center.  With consistency, the point cluster moves closer to the target center and the cluster shrinks in size.  If it is asymptotically efficient, then no other rifle has a smaller cluster that is closer to the true center.  When all of the assumptions of the OLS model hold its estimators are:  unbiased  minimum variance, and  BLUE January 18, 2013 Slide 83
  • 84. How we will approach the question.  Definition  Implications  Causes  Tests  Remedies January 18, 2013 Slide 84
  • 85. Non-zero Mean for the residuals (Definition)  Definition:  The residuals have a mean other than 0.0.  Note that this refers to the true residuals. Hence the estimated residuals have a mean of 0.0, while the true residuals are non-zero. January 18, 2013 Slide 85
• 86. Non-zero Mean for the residuals (Implications)  The true regression line is Yi = a + bXi + ei, where E(ei) = μ ≠ 0  Therefore the intercept is biased.  The slope, b, is unbiased. There is also no way of separating out a and μ. January 18, 2013 Slide 86
  • 87. Non-zero Mean for the residuals (Causes, Tests, Remedies)  Causes: Non-zero means result from some form of specification error. Something has been omitted from the model which accounts for that mean in the estimation.  We will discuss Tests and Remedies when we look closely at Specification errors. January 18, 2013 Slide 87
• 88. Non-normally distributed errors: Definition  The residuals are not NID(0, σ²)  Normality Tests (decision at 5%): Skewness 5.1766, p = 0.000000, Rejected; Kurtosis 4.6390, p = 0.000004, Rejected; Omnibus 48.3172, p = 0.000000, Rejected  [Figure: histogram of the residuals of rate90, visibly non-normal] January 18, 2013 Slide 88
• 89. Non-normally distributed errors: Implications  The existence of residuals which are not normally distributed has several implications.  First is that it implies that the model is to some degree misspecified.  A collection of truly stochastic disturbances should have a normal distribution. The central limit theorem states that as the number of random variables increases, their sum tends toward a normal distribution.  Distribution theory – beyond the scope of this course January 18, 2013 Slide 89
• 90. Non-normally distributed errors: Implications (cont.)  If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.  Estimates are, however, still BLUE.  Estimates are unbiased and have minimum variance among linear unbiased estimators.  They are no longer efficient among all estimators, even though they are asymptotically unbiased and consistent.  It is only our hypothesis tests which are suspect. January 18, 2013 Slide 90
• 91. Non-normally distributed errors: Causes  Generally caused by a misspecification error.  Usually an omitted variable.  Can also result from  Outliers in data.  Wrong functional form. January 18, 2013 Slide 91
• 92. Non-normally distributed errors: Tests for non-normality  Chi-Square goodness of fit  Since the cumulative normal frequency distribution has a chi-square distribution, we can test for the normality of the error terms using a standard chi-square statistic.  We take our residuals, group them, and count how many occur in each group, along with how many we would expect in each group. January 18, 2013 Slide 92
• 93. Non-normally distributed errors: Tests for non-normality (cont.)  We then calculate the simple χ² statistic.  χ² = Σ(i=1 to N) (Oi − Ei)² / Ei  This statistic has (N − 1) degrees of freedom, where N is the number of classes. January 18, 2013 Slide 93
• 94. Non-normally distributed errors: Tests for non-normality (cont.)  Jarque-Bera test  This test examines both the skewness and kurtosis of a distribution to test for normality.  JB = n[S²/6 + (K − 3)²/24]  Where S is the skewness and K is the kurtosis of the residuals.  JB has a χ² distribution with 2 df. January 18, 2013 Slide 94
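The JB statistic is simple enough to compute directly; a sketch (Python/numpy, with simulated residuals standing in for real ones):

    import numpy as np

    def jarque_bera(resid):
        """JB = n[S^2/6 + (K-3)^2/24]; compare to a chi-square with 2 df."""
        e = np.asarray(resid) - np.mean(resid)
        n, s = e.size, np.std(e)          # population (1/n) standard deviation
        S = np.mean(e**3) / s**3          # skewness
        K = np.mean(e**4) / s**4          # kurtosis (3.0 under normality)
        return n * (S**2 / 6 + (K - 3)**2 / 24)

    rng = np.random.default_rng(1)
    print(jarque_bera(rng.normal(size=500)))       # small for normal draws
    print(jarque_bera(rng.exponential(size=500)))  # large for skewed draws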
  • 95. Non-normally distributed errors: Remedies  Try to modify your theory. Omitted variable? Outlier needing specification?  Modify your functional form by taking some variance transforming step such as square root, exponentiation, logs, etc.  Be mindful that you are changing the nature of the model.  Bootstrap it!  From the shameless commercial division! January 18, 2013 Slide 95
• 96. Multicollinearity: Definition  Multicollinearity is the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.  As any two (or more) variables become more and more closely correlated, the condition worsens, and 'approaches singularity'.  Since the X's are fixed (or they are supposed to be anyway), this is a sample problem.  Since multicollinearity is almost always present, it is a problem of degree, not merely existence. January 18, 2013 Slide 96
  • 97. Multicollinearity: Implications  Consider the following cases A. No multicollinearity  The regression would appear to be identical to separate bivariate regressions  This produces variances which are biased upward (too large) making t-tests too small.  The coefficients are unbiased.  For multiple regression this satisfies the assumption. January 18, 2013 Slide 97
• 98. Multicollinearity: Implications (cont.) B. Perfect Multicollinearity  Some variable Xi is a perfect linear combination of one or more other variables Xj, therefore X'X is singular, and |X'X| = 0.  This is matrix algebra notation. It means that one variable is a perfect linear function of another. (e.g. X2 = X1 + 3.2)  The effects of X1 and X2 cannot be separated.  The standard errors for the B's are infinite.  A model cannot be estimated under such circumstances. The computer dies.  And takes your model down with it… January 18, 2013 Slide 98
• 99. Multicollinearity: Implications (cont.) C. A high degree of Multicollinearity  When the independent variables are highly correlated the variances and covariances of the Bi's are inflated (t-ratios are lower) and R2 tends to be high as well.  The B's are unbiased (but perhaps useless due to their imprecise measurement as a result of their variances being too large). In fact they are still BLUE.  OLS estimates tend to be sensitive to small changes in the data.  Relevant variables may be discarded. January 18, 2013 Slide 99
• 100. Multicollinearity: Causes  Sampling mechanism. Poorly constructed design & measurement scheme or limited range.  Too small a sample range  Constrained theory: X1 does affect X2, e.g. electricity consumption = wealth + house size  Statistical model specification: adding polynomial terms or trend indicators.  Too many variables in the model - the model is over-determined.  Theoretical specification is wrong. Inappropriate construction of theory or even measurement.  If your dependent variable is constructed using an independent variable January 18, 2013 Slide 100
• 101. Multicollinearity: Tests/Indicators  |X'X| approaches 0.0  The variance covariance matrix is singular, so its determinant is 0.0  Since the determinant is a function of variable scale, this measure doesn't help a whole lot. We could, however, use the determinant of the correlation matrix and therefore bound the range from 0.0 to 1.0 January 18, 2013 Slide 101
• 102. Multicollinearity: Tests/Indicators (cont.)  Tolerance:  If the tolerance equals 1, the variables are unrelated. If Tolj = 0, then they are perfectly correlated.  To calculate, regress each independent variable on all the other independent variables  Variance Inflation Factors (VIFs)  VIF = 1 / (1 − Rk²)  Tolerance: TOLj = 1 − Rj² = 1/VIFj January 18, 2013 Slide 102
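Tolerance and VIFs follow directly from those auxiliary regressions; a sketch (Python/numpy, invented data where two regressors are nearly collinear):

    import numpy as np

    def vif_and_tolerance(X):
        """Regress each column on the others (plus a constant);
        return (VIF_j, TOL_j) = (1/(1 - Rj^2), 1 - Rj^2) for each column."""
        X = np.asarray(X, dtype=float)
        n, k = X.shape
        out = []
        for j in range(k):
            y = X[:, j]
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ coef
            r2 = 1 - resid @ resid / np.sum((y - y.mean())**2)
            out.append((1 / (1 - r2), 1 - r2))
        return out

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.1, size=100)   # nearly collinear with x1
    x3 = rng.normal(size=100)
    print(vif_and_tolerance(np.column_stack([x1, x2, x3])))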
  • 103. Interpreting VIFs  No multicollinearity produces VIFs = 1.0  If the VIF is greater than 10.0, then multicollinearity is probably severe. 90% of the variance of Xj is explained by the other X‟s.  In small samples, a VIF of about 5.0 may indicate problems January 18, 2013 Slide 103
• 104. Multicollinearity: Tests/Indicators (cont.)  R2 deletes - tries all possible models of X's and includes/excludes based on small changes in R2 with the inclusion/omission of the variables (taken 1 at a time)  F is significant, but no t value is.  Adjusted R2 declines with a new variable  Multicollinearity is of concern when either  rX1X2 > rX1Y or rX1X2 > rX2Y January 18, 2013 Slide 104
• 105. Multicollinearity: Tests/Indicators (cont.)  I would avoid the rule of thumb  rX1X2 > .6  Beta's are > 1.0 or < -1.0  Sign changes occur with the introduction of a new variable  The R2 is high, but few t-ratios are.  Eigenvalues and Condition Index - If this topic is beyond Gujarati, it's beyond me. January 18, 2013 Slide 105
  • 106. Multicollinearity: Remedies  Increase sample size  Pooled cross-sectional time series  Thereby introducing all sorts of new problems!  Omit Variables  Scale Construction/Transformation  Factor Analysis  Constrain the estimation. Such as the case where you can set the value of one coefficient relative to another. January 18, 2013 Slide 106
  • 107. Multicollinearity: Remedies (cont.)  Change design (LISREL maybe or Pooled cross- sectional Time series)  Thereby introducing all sorts of new problems!  Ridge Regression  This technique introduces a small amount of bias into the coefficients to reduce their variance.  Ignore it - report adjusted R2 and claim it warrants retention in the model. January 18, 2013 Slide 107
• 108. Heteroskedasticity: Definition  Heteroskedasticity is a problem where the error terms do not have a constant variance.  E(ei²) = σi²  (rather than a constant σ²)  That is, they may have a larger variance when values of some Xi (or the Yi's themselves) are large (or small). January 18, 2013 Slide 108
• 109. Heteroskedasticity: Definition  This often gives the plots of the residuals by the dependent variable or appropriate independent variables a characteristic fan or funnel shape.  [Figure: scatter plot whose vertical spread increases with X, the classic fan shape] January 18, 2013 Slide 109
  • 110. Heteroskedasticity: Implications  The regression B's are unbiased.  But they are no longer the best estimator. They are not BLUE (not minimum variance - hence not efficient).  They are, however, consistent. January 18, 2013 Slide 110
• 111. Heteroskedasticity: Implications (cont.)  The estimator variances are not asymptotically efficient, and they are biased.  So confidence intervals are invalid.  What do we know about the bias of the variance?  If Yi is positively correlated with ei, the bias is negative (hence t values will be too large.)  With positive bias, many t's are too small. January 18, 2013 Slide 111
  • 112. Heteroskedasticity: Implications (cont.)  Types of Heteroskedasticity  There are a number of types of heteroskedasticity.  Additive  Multiplicative  ARCH (Autoregressive conditional heteroskedastic) - a time series problem. January 18, 2013 Slide 112
  • 113. Heteroskedasticity: Causes  It may be caused by:  Model misspecification - omitted variable or improper functional form.  Learning behaviors across time  Changes in data collection or definitions.  Outliers or breakdown in model.  Frequently observed in cross sectional data sets where demographics are involved (population, GNP, etc). January 18, 2013 Slide 113
  • 114. Heteroskedasticity: Tests  Informal Methods  Plot the data and look for patterns!  Plot the residuals by the predicted dependent variable (Resids on the Y-axis)  Plotting the squared residuals actually makes more sense, since that is what the assumption refers to!  Homoskedasticity will be a random scatter horizontally across the plot. January 18, 2013 Slide 114
• 115. Heteroskedasticity: Tests (cont.)  Park test  As an exploratory test, log the squared residuals and regress them on the logged values of the suspected independent variable.  ln ûi² = ln σ² + B ln Xi + vi = a + B ln Xi + vi  If the B is significant, then heteroskedasticity may be a problem. January 18, 2013 Slide 115
• 116. Heteroskedasticity: Tests (cont.)  Glejser Test  This test is quite similar to the Park test, except that it uses the absolute values of the residuals, and a variety of transformed X's, for example:  |ûi| = B1 + B2Xi + vi  |ûi| = B1 + B2√Xi + vi  |ûi| = B1 + B2(1/Xi) + vi  |ûi| = B1 + B2(1/√Xi) + vi  A significant B2 indicates heteroskedasticity.  Easy test, but has problems. January 18, 2013 Slide 116
  • 117. Heteroskedasticity: Tests (cont.)  Goldfeld-Quandt test  Order the n cases by the X that you think is correlated with ei2.  Drop a section of c cases out of the middle (one-fifth is a reasonable number).  Run separate regressions on both upper and lower samples. January 18, 2013 Slide 117
• 118. Heteroskedasticity: Tests (cont.)  Goldfeld-Quandt test (cont.)  Do F-test for difference in error variances  F((n−c−2k)/2, (n−c−2k)/2) = s²ee2 / s²ee1  F has (n − c − 2k)/2 degrees of freedom for each sample January 18, 2013 Slide 118
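A sketch of the whole Goldfeld-Quandt procedure (Python/numpy; the simulated errors deliberately grow with X, so the ratio should exceed 1.0):

    import numpy as np

    def goldfeld_quandt(X, Y, drop_frac=0.2):
        """Order by X, drop the middle fraction, fit OLS to each tail,
        and return F = SSE_upper / SSE_lower (k = 2 parameters per fit)."""
        order = np.argsort(X)
        X, Y = X[order], Y[order]
        n = len(X)
        c = int(n * drop_frac)
        m = (n - c) // 2                   # cases in each tail

        def sse(x, y):
            Z = np.column_stack([np.ones(len(x)), x])
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            r = y - Z @ coef
            return r @ r

        # df for each tail regression: (n - c - 2k)/2 with k = 2
        return sse(X[n - m:], Y[n - m:]) / sse(X[:m], Y[:m])

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 10, 200)
    y = 2 + 3 * x + rng.normal(scale=x)    # error variance grows with x
    print(goldfeld_quandt(x, y))           # well above 1.0 here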
• 119. Heteroskedasticity: Tests (cont.)  Breusch-Pagan-Godfrey Test (Lagrangian Multiplier test)  Estimate model with OLS  Obtain σ̃² = Σûi²/n  Construct variables pi = ûi²/σ̃² January 18, 2013 Slide 119
• 120. Heteroskedasticity: Tests (cont.)  Breusch-Pagan-Godfrey Test (cont.)  Regress pi on the X (and other?!) variables  pi = α1 + α2Z2i + α3Z3i + ... + αmZmi  Calculate Θ = (1/2)ESS  Note that Θ ~ χ²(m−1) January 18, 2013 Slide 120
• 121. Heteroskedasticity: Tests (cont.)  White's Generalized Heteroskedasticity test  Estimate model with OLS and obtain residuals  Run the auxiliary regression: regress ûi² on the X's, their squares, and their cross-products  Higher powers may also be used, along with more X's January 18, 2013 Slide 121
• 122. Heteroskedasticity: Tests (cont.)  White's Generalized Heteroskedasticity test (cont.)  Note that n·R² ~ χ²  The degrees of freedom is the number of coefficients estimated above. January 18, 2013 Slide 122
  • 123. Heteroskedasticity: Remedies  GLS  We will cover this after autocorrelation  Weighted Least Squares  si2 is a consistent estimator of σi2  use same formula (BLUE) to get a & ß January 18, 2013 Slide 123
• 124. Iteratively weighted least squares (IWLS)  Iteratively weighted least squares (IWLS)  1. Obtain estimates of ei2 using OLS on Yi = a + bXi + ei  2. Use these to get "1st round" estimates of σi  3. Using the formula above, replace wi with 1/si and obtain new estimates for a and ß.  4. Adjust data: Yi* = Yi/si, Xi* = Xi/si  5. Use these to re-estimate Yi* = α* + ß*Xi* + ei*  6. Repeat Steps 3-5 until a and ß converge. January 18, 2013 Slide 124
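One weighting pass of this procedure looks like the following sketch (Python/numpy; here si is simply assumed proportional to Xi for illustration, rather than estimated from a first round):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(1, 10, 200)
    y = 2 + 3 * x + rng.normal(scale=x)        # heteroskedastic errors

    s_i = x                                     # assumed estimate of sigma_i
    Z = np.column_stack([np.ones_like(x), x])
    Zw, yw = Z / s_i[:, None], y / s_i          # divide Y and all regressors by s_i
    coef, *_ = np.linalg.lstsq(Zw, yw, rcond=None)
    print(coef)                                 # in IWLS, iterate until these converge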
• 125. White's corrected standard errors  White's corrected standard errors  For normal OLS  Var(B̂1) = σ² / TSSx  We can restate this as  Var(B̂1) = Σ(xi − x̄)²σi² / TSSx²  since this reduces to the first form when σi² = σ² January 18, 2013 Slide 125
• 126. White's corrected standard errors (cont.)  White's solution is to use the robust estimator  Var(B̂j) = Σ r̂ij² ûi² / RSSj²  When you see robust standard errors, it usually refers to this estimator January 18, 2013 Slide 126
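In practice this is usually computed in an equivalent matrix ("sandwich") form, (X′X)⁻¹ X′diag(ûi²)X (X′X)⁻¹; a sketch (Python/numpy, simulated heteroskedastic data):

    import numpy as np

    def white_hc0(X, y):
        """HC0 robust variance: (X'X)^{-1} X'diag(e^2)X (X'X)^{-1},
        a matrix form of the estimator above."""
        XtX_inv = np.linalg.inv(X.T @ X)
        B = XtX_inv @ X.T @ y
        e = y - X @ B
        meat = X.T @ (X * e[:, None]**2)       # X'diag(e^2)X
        V = XtX_inv @ meat @ XtX_inv
        return B, np.sqrt(np.diag(V))          # coefficients and robust SEs

    rng = np.random.default_rng(5)
    x = rng.uniform(1, 10, 200)
    X = np.column_stack([np.ones_like(x), x])
    y = 2 + 3 * x + rng.normal(scale=x)        # heteroskedastic errors
    print(white_hc0(X, y))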
  • 127. Obtaining Robust errors  In Stata, just add a , r to the regress command  regress approval unemrate  Becomes  regress approval unemrate, r January 18, 2013 Slide 127
  • 128. Autocorrelation: Definition  Autocorrelation is simply the presence of correlation between adjacent (contemporaneous) residuals.  If a residual is negative (or positive) then its neighbors tend to also be negative (or positive).  Most often autocorrelation is between adjacent observations, however, lagged or seasonal patterns can also occur.  Autocorrelation is also usually a function of order by time, but it can occur for other orders as well – firm or state size. January 18, 2013 Slide 128
• 129. Autocorrelation: Definition (cont.)  The assumption violated is  E(eiej) = 0 (i ≠ j)  This means that the Pearson's r (correlation coefficient) between the residuals from OLS and the same residuals lagged one period (or more) is non-zero.  E(eiej) ≠ 0 January 18, 2013 Slide 129
  • 130. Autocorrelation: Definition (cont.)  Most autocorrelation is what we call 1st order autocorrelation, meaning that the residuals are related to their contiguous values.  Autocorrelation can be rather complex, producing counterintuitive patterns and correlations. January 18, 2013 Slide 130
  • 131. Autocorrelation: Definition (cont.)  Types of Autocorrelation  Autoregressive processes  Moving Averages January 18, 2013 Slide 131
• 132. Autocorrelation: Definition (cont.)  Autoregressive processes AR(p)  The residuals are related to their preceding values.  et = φet−1 + ut  This is classic 1st order autocorrelation January 18, 2013 Slide 132
• 133. Autocorrelation: Definition (cont.)  Autoregressive processes (cont.)  In 2nd order autocorrelation the residuals are related to their t-2 values as well  et = φ1et−1 + φ2et−2 + ut  Larger order processes may occur as well  et = φ1et−1 + φ2et−2 + ... + φpet−p + ut January 18, 2013 Slide 133
• 134. Autocorrelation: Definition (cont.)  Moving Average Processes MA(q)  et = ut + θut−1  The error term is a function of some random error plus a portion of the previous random error. January 18, 2013 Slide 134
• 135. Autocorrelation: Definition (cont.)  Moving Average Processes (cont.)  Higher order processes for MA(q) also exist.  et = ut + θ1ut−1 + θ2ut−2 + ... + θqut−q  The error term is a function of some random error plus portions of the previous random errors. January 18, 2013 Slide 135
• 136. Autocorrelation: Definition (cont.)  Mixed processes ARMA(p,q)  et = φ1et−1 + φ2et−2 + ... + φpet−p + ut + θ1ut−1 + θ2ut−2 + ... + θqut−q  The error term is a complex function of both autoregressive and moving average processes. January 18, 2013 Slide 136
• 137. Autocorrelation: Definition (cont.)  There are substantive interpretations that can be placed on these processes.  AR processes represent shocks to systems that have long-term memory.  MA processes are quick shocks to a system that can handle the process 'efficiently,' having only short-term memory. January 18, 2013 Slide 137
  • 138. Autocorrelation: Implications  Coefficient estimates are unbiased, but the estimates are not BLUE  The variances are often greatly underestimated (biased small)  Hence hypothesis tests are exceptionally suspect.  In fact, strongly significant t-tests (P < .001) may well be insignificant once the effects of autocorrelation are removed. January 18, 2013 Slide 138
• 139. Autocorrelation: Causes  Specification error  Omitted variable – e.g. inflation  Wrong functional form  Lagged effects  Data Transformations  Interpolation of missing data  Differencing January 18, 2013 Slide 139
  • 140. Autocorrelation: Tests  Observation of residuals  Graph/plot them!  Runs of signs  Geary test January 18, 2013 Slide 140
• 141. Autocorrelation: Tests (cont.)  Durbin-Watson d  d = Σ(t=2 to n)(ût − ût−1)² / Σ(t=1 to n)ût²  Criteria for hypothesis of AC  Reject if d < dL  Do not reject if d > dU  Test is inconclusive if dL ≤ d ≤ dU. January 18, 2013 Slide 141
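The d statistic itself is a one-liner; a sketch comparing white-noise residuals with strongly autocorrelated ones (Python/numpy, simulated series):

    import numpy as np

    def durbin_watson(u):
        """d = sum_{t=2..n}(u_t - u_{t-1})^2 / sum_t u_t^2; roughly 2(1 - rho)."""
        u = np.asarray(u)
        return np.sum(np.diff(u)**2) / np.sum(u**2)

    rng = np.random.default_rng(6)
    white = rng.normal(size=500)
    ar1 = np.zeros(500)
    for t in range(1, 500):                    # AR(1) errors with phi = 0.8
        ar1[t] = 0.8 * ar1[t - 1] + white[t]
    print(durbin_watson(white), durbin_watson(ar1))   # near 2.0 vs far below 2.0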
  • 142. Autocorrelation: Tests (cont.)  Durbin-Watson d (cont.)  Note that the d is symmetric about 2.0, so that negative autocorrelation will be indicated by a d > 2.0.  Use the same distances above 2.0 as upper and lower bounds. Analysis of Time Series. http://cnx.org/content/m34544/latest/ January 18, 2013 Slide 142
• 143. Autocorrelation: Tests (cont.)  Durbin's h  Cannot use DW d if there is a lagged endogenous variable in the model  h = (1 − d/2) √( T / (1 − T·sc²) )  sc2 is the estimated variance of the Yt-1 term  h has a standard normal distribution January 18, 2013 Slide 143
• 144. Autocorrelation: Tests (cont.)  Tests for higher order autocorrelation  Ljung-Box Q (χ2 statistic)  Q′ = T(T + 2) Σ(j=1 to L) rj²/(T − j)  Also called the Portmanteau test  Breusch-Godfrey January 18, 2013 Slide 144
  • 145. Autocorrelation: Remedies  Generalized Least Squares  Later!  First difference method  Take 1st differences of your Xs and Y  Regress Δ Y on ΔX  Assumes that Φ = 1!  This changes your model from one that explains rates to one that explains changes.  Generalized differences  Requires that Φ be known. January 18, 2013 Slide 145
• 146. Autocorrelation: Remedies  Cochrane-Orcutt method  (1) Estimate model using OLS and obtain the residuals, ût.  (2) Using the residuals from the OLS, run the following regression.  ût = ρ̂ût−1 + vt January 18, 2013 Slide 146
• 147. Autocorrelation: Cochrane-Orcutt method (cont.)  (3) Using the ρ̂ obtained, perform the regression on the generalized differences  Yt* = B1* + B2*Xt* + et*  Where B1* = B1(1 − ρ̂), Yt* = (Yt − ρ̂Yt−1), Xt* = (Xt − ρ̂Xt−1), and B2* = B2  (4) Substitute the values of B1 and B2 into the original regression to obtain new estimates of the residuals.  (5) Return to step 2 and repeat – until ρ̂ no longer changes.  No longer changes means (approximately) changes are less than 3 significant digits – or, for instance, at the 3rd decimal place. January 18, 2013 Slide 147
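The whole iteration fits in a few lines; a sketch (Python/numpy, with simulated AR(1) errors so the true values are known):

    import numpy as np

    def ols(Z, y):
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return coef, y - Z @ coef

    def cochrane_orcutt(x, y, tol=1e-3, max_iter=50):
        Z = np.column_stack([np.ones_like(x), x])
        (b1, b2), u = ols(Z, y)                        # step 1: OLS, keep residuals
        rho, rho_old = 0.0, np.inf
        for _ in range(max_iter):
            rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # step 2: u_t on u_{t-1}
            if abs(rho - rho_old) < tol:                 # step 5: stop once rho settles
                break
            rho_old = rho
            ys, xs = y[1:] - rho * y[:-1], x[1:] - rho * x[:-1]   # step 3
            (b1_star, b2), _ = ols(np.column_stack([np.ones_like(xs), xs]), ys)
            b1 = b1_star / (1 - rho)                     # B1* = B1(1 - rho)
            u = y - b1 - b2 * x         # step 4: residuals from the original data
        return b1, b2, rho

    rng = np.random.default_rng(7)
    n = 300
    x = rng.uniform(0, 10, n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = 0.7 * e[t - 1] + rng.normal()
    print(cochrane_orcutt(x, 2 + 3 * x + e))    # close to (2, 3, 0.7)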
• 148. Autocorrelation with lagged Dependent Variables  The presence of a lagged dependent variable causes special estimation problems.  Essentially you must purge the lagged error term of its autocorrelation by using a two stage IV solution.  Careful with lagged dependent variable models.  The lagged dep var may simply scoop up all the variance to be explained.  A variety of models use lagged dependent variables:  Adaptive expectations  Partial adjustment  Rational expectations. January 18, 2013 Slide 148
  • 149. Model Specification: Definition  The analyst should understand one fundamental “truth” about statistical models. They are all misspecified.  We exist in a world of incomplete information at best. Hence model misspecification is an ever-present danger.  We do, however, need to come to terms with the problems associated with misspecification so we can develop a feeling for the quality of information, description, and prediction produced by our models. January 18, 2013 Slide 149
  • 150. Criteria for a “Good Model”  Hendry & Richard Criteria  Be data admissible – predictions must be logically possible  Be consistent with theory  Have weakly endogenous regressors (errors and X‟s uncorrelated)  Exhibit parameter stability – relationship cannot vary over time – unless modeled in that way  Exhibit data coherency – random residuals  Be encompassing – contain or explain the results of other models January 18, 2013 Slide 150
  • 151. Model Specification: Definition (cont.)  There are basically 4 types of misspecification we need to examine:  functional form  inclusion of an irrelevant variable  exclusion of a relevant variable  measurement error and misspecified error term January 18, 2013 Slide 151
• 152. Model Specification: Implications  If an omitted variable is correlated with the included variables, the estimates are biased as well as inconsistent.  In addition, the error variance is incorrect, and usually overestimated.  If the omitted variable is uncorrelated with the included variables, the estimated error variance is still biased, even though the B's are not. January 18, 2013 Slide 152
  • 153. Model Specification: Implications  Incorrect functional form can result in autocorrelation or heteroskedasticity.  See the notes for these problems for the implications of each. January 18, 2013 Slide 153
  • 154. Model Specification: Causes  This one is easy - theoretical design.  something is omitted, irrelevantly included, mismeasured or non-linear.  This problem is explicitly theoretical. January 18, 2013 Slide 154
• 155. Data Mining  There are techniques out there that look for variables to add.  These are often atheoretical – but can they work?  Note that data mining may alter the 'true' level of significance  With c candidates for variables in the model, and k actually chosen with an α = .05, the true level of significance is:  α* = 1 − (1 − α)^(c/k)  Note the similarity to the Bonferroni correction: α′ = 1 − (1 − α)^(1/k) January 18, 2013 Slide 155
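A quick worked example of that formula, with invented numbers: screening c = 20 candidate variables and keeping k = 5 at a nominal α of .05:

    # alpha* = 1 - (1 - alpha)^(c/k)
    alpha, c, k = 0.05, 20, 5
    print(1 - (1 - alpha)**(c / k))   # about 0.185, far looser than the nominal .05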
  • 156. Model Specification: Tests  Actual Specification Tests  No test can reveal poor theoretical construction per se.  The best indicator that your model is misspecified is the discovery that the model has some undesirable statistical property; e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.  Sometimes time-series models will have negative autocorrelation as a result of poor design. January 18, 2013 Slide 156
• 157. Ramsey RESET Test  The Ramsey RESET test is a "Regression Specification Error Test."  You add the predicted values of Y to the regression model  If they have a significant coefficient then the errors are related to the predicted values, indicating that there is a specification error.  This is based on demonstrating that there is some non-random behavior left in the residuals January 18, 2013 Slide 157
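A sketch of a common form of the test, which augments the model with powers of the fitted values and F-tests their joint contribution (Python/numpy/scipy; the specific powers ŷ² and ŷ³ are an assumed textbook convention, not taken from the slide):

    import numpy as np
    from scipy import stats

    def reset_test(X, y, powers=(2, 3)):
        """Refit with powers of the fitted values added; F-test their contribution."""
        def sse(Z):
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            r = y - Z @ coef
            return r @ r
        yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
        Xa = np.column_stack([X] + [yhat**p for p in powers])
        sse_r, sse_u = sse(X), sse(Xa)         # restricted vs augmented model
        q, dfd = len(powers), len(y) - Xa.shape[1]
        F = ((sse_r - sse_u) / q) / (sse_u / dfd)
        return F, stats.f.sf(F, q, dfd)        # statistic and p-value

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 5, 200)
    X = np.column_stack([np.ones_like(x), x])
    y = 1 + 2 * x**2 + rng.normal(size=200)    # true model is quadratic
    print(reset_test(X, y))                    # large F flags the misspecification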
  • 158. Model Specification: Tests  Specification Criteria for lagged designs  Most useful for comparing time series models with same set of variables, but differing number of parameters January 18, 2013 Slide 158