Econometrics: ECON2300 – Lecture 1
The Econometric Model:
Econometrics is about how we can use theory and data from economics, business and the social
sciences, along with tools from statistics, to answer “how much” type questions.
In economics we express our ideas about relationships between economic variables using the
mathematical concept of a function. An example of this is when expressing the price of a house in
terms of its size.
Price = f(size)
Hedonic Model: A model that decomposes the item being researched into its constituent
characteristics, and obtains estimates of the contributory value of each characteristic
An example of a hedonic model for house price might be expressed as:
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning)
Economic theory does not claim to be able to predict the specific behaviour of any individual or firm,
but rather it describes the average or systematic behaviour of many individuals or firms.
Economic models = Generalisation
In fact we realise that there will be a random and unpredictable component e that we will call random
error. Hence the econometric model for price would be
Price = f(size, bedrooms, bathrooms, stories, age, pool, airconditioning) + e
The random error e accounts for the many factors affecting price that we have omitted from this
simplistic model, and it also reflects the intrinsic uncertainty in economic activity.
Take for example the demand relation:
q^d = f(p, p^s, p^c, i) = β1 + β2·p + β3·p^s + β4·p^c + β5·i
(quantity demanded as a function of the good's own price p, the price of substitutes p^s, the price of
complements p^c, and income i)
The corresponding econometric model is:
q^d = f(p, p^s, p^c, i) + e = β1 + β2·p + β3·p^s + β4·p^c + β5·i + e
Econometric Models include the error term, e
In every model there are two parts:
1. A systematic portion – part we obtain from economic theory, includes assumptions about the
functional form.
2. An unobservable random component – “noise” component which obscures our understanding
of the relationship among variables: e.
How Do we Obtain Data?
In an ideal world:
1. We would design an experiment to obtain economic observations or sample information
2. Repeating the experiment N times would create a sample of N sample observations
In the real world:
Economists work in a complex world in which data on variables are “observed” and rarely obtained
from a controlled experiment. It is usually not feasible to conduct an experiment to obtain economic data. Thus we
use non-experimental data generated by an uncontrolled experiment.
Experimental data: Variables can be fixed at specific values in repeated trials of the
experiment
Non-experimental data: Values are neither fixed nor repeatable
Most economic, financial or accounting data are collected for administrative rather than research
purposes, often by government agencies or industry. The data may be:
 Time-series form – data collected over discrete intervals of time (stock market index, CPI,
GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)
 Cross-sectional form – data collected over sample units in a particular time period (income in
suburbs in Brisbane during 2009, or household census)
 Panel data form – data that follow individual microunits over time (data for 30 countries for
the period 1980-2005, monthly value of 3 stock market indices over the last 5 years)
Data may be collected at various levels of aggregation:
 Micro – data collected on individual economic decision-making units such as
individuals, households, or firms
 Macro – data resulting from a pooling or aggregating over individuals, households, or firms
at the local, state, or national levels
Data collected may also represent flow or a stock:
 Flow – outcome measures over a period of time, such as the consumption of petrol during the
last quarter of 2005
 Stock – outcome measured at a particular point in time, such as the quantity of crude oil held
by BHP in its Australian storage tanks on April 1, 2002, or the asset value of Macquarie Bank
on 5th July 2009.
Data collected may be quantitative or qualitative:
 Quantitative – numerical data, data that can be expressed as numbers or some transformation
of them such as real prices or per capita income
 Qualitative – outcomes of an "either-or" situation, that is, whether an attribute is present
or not. E.g. colour, or whether a consumer purchased a certain good or not (dummy
variables)
Statistical Inference:
The aim of statistics is to "infer" or learn something about the real world by analysing a sample of
data. The ways in which statistical inference is carried out include:
 Estimating economic parameters, such as elasticities
 Predicting economic outcomes, such as the enrolments in bachelor degree programs in
Australia for the next 5 years.
 Testing economic hypotheses, such as: Is newspaper advertising better than "email"
advertising for increasing sales?
Econometrics includes all of these aspects of statistical inference. There are two types of inference:
1. Deductive: go from a general case to a specific case: this is used in mathematical proofs
2. Inferential: go from a specific case to a general case: this is used in statistics
Review of Statistic Concepts:
Random variables: Discrete and Continuous
Random variable: A random variable is a variable whose value is unknown until it is observed; it is
not perfectly predictable. The value of the random variable results from an experiment (controlled or
uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables. Lower case
letters (e.g. x) are usually used to denote values of random variables.
Discrete random variable:
A discrete random variable can take only a finite number of values that can be counted by using the
positive integers
 E.g. The number of cars you own, your age in whole years, etc.
 Dummy variables:
D = 1 if the person is female; D = 0 if the person is not female
Probability distribution of a discrete random variable:
A discrete random variable has a probability density function which summarises all the possible
values of a discrete random variable together with their associated probabilities. It can be in the form
of a table, formula or graph.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. For a discrete random variable X:
Mean: μ = E(X) = Σ x·P(X = x)
Variance: σ² = Var(X) = E[(X − μ)²] = Σ (x − μ)²·P(X = x)
In the graph of such a distribution there are only distinct values that the variable x can take, which
is what makes the variable discrete – the probability density function is NOT continuous.
Discrete probability distributions are:
1. Mutually exclusive – no overlap between values
2. Collectively exhaustive – full sample space covered, includes every possibility
Example: A 5-sided dice is biased; the sides show 0, 1, 2, 3 & 4 respectively. The following table
shows the probability distribution.
a) Calculate the mean & variance of X
b) Sketch the probability distribution of X
c) Find P(X ≤ 2)
X     0     1     2     3     4
P(X)  0.10  0.45  0.30  0.10  0.05
Solution:
a) i) Mean:
μ = E(X) = Σ x·P(X = x)
= 0·P(X=0) + 1·P(X=1) + 2·P(X=2) + 3·P(X=3) + 4·P(X=4)
= 0(0.10) + 1(0.45) + 2(0.30) + 3(0.10) + 4(0.05)
= 1.55
ii) Variance:
σ² = Σ (x − μ)²·P(X = x)
= (0 − 1.55)²(0.10) + (1 − 1.55)²(0.45) + (2 − 1.55)²(0.30) + (3 − 1.55)²(0.10) + (4 − 1.55)²(0.05)
= 0.9475
b) [Sketch: bar graph of P(X = x) against x, with bars of height 0.10, 0.45, 0.30, 0.10 and 0.05 at
x = 0, 1, 2, 3, 4.]
c) P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2) = 0.10 + 0.45 + 0.30 = 0.85
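These calculations are easy to check in code. A minimal Python sketch (numpy assumed; the array names are ours, not part of the course materials):

import numpy as np

x = np.array([0, 1, 2, 3, 4])                  # values X can take
p = np.array([0.10, 0.45, 0.30, 0.10, 0.05])   # P(X = x)

mean = np.sum(x * p)                 # E(X) = sum of x * P(X=x)
var = np.sum((x - mean)**2 * p)      # Var(X) = sum of (x - mu)^2 * P(X=x)
p_le_2 = p[x <= 2].sum()             # P(X <= 2)
print(mean, var, p_le_2)             # 1.55, 0.9475, 0.85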
Continuous random variable:
A continuous random variable can take any real value (not just whole numbers or positive) generally
measurable.
 E.g. Your height, the temperature etc.
Easy way to establish, is to pick a random number eg. 3.4314135315 and ask if the variable can take
that value? If yes then it is continuous, if no it is discrete.
Probability distribution of a continuous random variable:
A continuous random variable has a probability density function which is a smooth non-negative
function representing likely and unlikely values of the random variable.
Two key features of a probability distribution are its centre (location) and width (dispersion); the
mean, μ, and variance, σ², respectively. Let f(x) denote the pdf for a continuous random variable X.
Mean: μ = E(X) = ∫ x f(x) dx
Variance: σ² = Var(X) = E[(X − μ)²] = ∫ (x − μ)² f(x) dx
There are an infinite number of points in an interval of a continuous random variable, so a positive
probability cannot be assigned to each point – the area of a line = 0. Therefore, for a continuous
random variable, P(X= x) = 0.
We can only assign probabilities to a range of values or, to put it another way, we can only assign a
probability that X will lie within a certain range of values.
P(x1 ≤ X ≤ x2) = ∫ from x1 to x2 of f(x) dx
Note that it does not matter if greater-than or greater-than-or-equal-to symbols are used, as the
difference is negligible (the probability of a single value is 0).
The Normal Distribution:
The most useful continuous distribution is the normal distribution. The normal distribution has a
probability density function (pdf) of:
f(x) = 1/√(2πσ²) · exp(−(x − μ)²/(2σ²)),  −∞ < x < ∞
Important Parameters of the normal distribution:
1. μ = mean: the centre of the distribution.
2. σ² = variance: level of dispersion
Properties of the normal distribution:
 Symmetric about the mean
 Bell shaped
 The mean, median and mode are all equal
 Used to find the probabilities of a range of values
 Probabilities of a single value = 0. E.g. P(X=3) = 0
 There are an infinite number of normal distributions – one for each combination of μ and σ
 Area under the probability density function is equal to 1
o As symmetric, each side has 0.5 area
 Probability is measured by the area under the curve – the cumulative distribution function
The Standardised Normal Distribution:
 Variance and Standard Deviation of 1
 Mean of 0
 Values greater than the mean have positive Z-Values
 Values less than the mean have negative Z-Values
The most useful element of the normal distribution is that we can “standardise” it to the standard
normal distribution of which we have tables to determine probabilities (Z values)
Z = (X − μ)/σ
Example: In a given population, heights of people are normally distributed with a mean of 160cm
and standard deviation of 10cm.
a) What is the probability that a person is more than 163.5cm tall?
b) What proportion of people have heights between 155cm and 163.5cm?
Solution:
a) [Sketch: normal curve centred at 160cm with the area above 163.5cm shaded.]
P(X > 163.5) = P(Z > (163.5 − 160)/10)
= P(Z > 0.35)
= 0.5 − 0.1368
= 0.3632
b) [Sketch: normal curve centred at 160cm with the area between 155cm and 163.5cm shaded.]
P(155 ≤ X ≤ 163.5) = P((155 − 160)/10 ≤ Z ≤ (163.5 − 160)/10)
= P(−0.5 ≤ Z ≤ 0.35)
= 0.1915 + 0.1368
= 0.3283
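Both probabilities can be verified with scipy's normal cdf (a sketch, assuming scipy is available; norm.cdf takes the mean and standard deviation as its loc and scale arguments):

from scipy import stats

mu, sigma = 160, 10
# a) P(X > 163.5)
print(1 - stats.norm.cdf(163.5, loc=mu, scale=sigma))   # ~0.3632
# b) P(155 <= X <= 163.5)
print(stats.norm.cdf(163.5, loc=mu, scale=sigma)
      - stats.norm.cdf(155, loc=mu, scale=sigma))       # ~0.3283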
The Chi-Square Distribution:
Chi-square random variables arise when standard normal random variables are squared. If Z1,
Z2, ..., Zm denote m independent N(0,1) random variables, then
V = Z1² + Z2² + ... + Zm² = Σ Zi² ~ χ²(m)
The notation V ~ χ²(m) is read as: the random variable V has a chi-square distribution with m
degrees of freedom.
The degrees of freedom parameter m indicates the number of independent N(0,1) random variables
that are squared and summed to form V. The value of m determines the entire shape of the chi-
square distribution – including its mean and variance.
E(V) = E[χ²(m)] = m
var(V) = var[χ²(m)] = 2m
The values of V must be non-negative, v ≥ 0, because V is formed by squaring and summing m
standardised normal N(0,1) random variables. The distribution has a long tail, or is
skewed to the right (long tail to the right). As the degrees of freedom m get larger, the
distribution becomes more symmetric and "bell-shaped"; the chi-square
distribution converges to, and essentially becomes, the normal distribution.
The student ‘t’ Distribution:
A 't' random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the
square root of an independent chi-square random variable, V ~ χ²(m), divided by its degrees of freedom:
t = Z/√(V/m) ~ t(m)
The t-distribution's shape is completely determined by the degrees of freedom parameter m, and the
distribution is symbolised by t(m).
Note that the t distribution is more spread out than the standard normal distribution and less peaked.
With mean and variance:
E(t(m)) = 0
var(t(m)) = m/(m − 2), for m > 2
As the number of degrees of freedom approaches infinity, the distribution approaches the standard
normal, N(0,1).
The F distribution:
An F random variable is formed by the ratio of two independent chi-square random variables that
have been divided by their degrees of freedom. If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2 are
independent, then:
F = (V1/m1)/(V2/m2) ~ F(m1, m2)
The F-distribution is said to have m1 numerator degrees of freedom and m2 denominator degrees of
freedom. The values of m1 and m2 determine the shape of the distribution, which in general looks
like the figure below.
The graph below shows the range of shapes the distribution can take for different degrees of
freedom.
Laws of Expectation and Variation:
E[b] = b, Var[b] = 0
E[aX] = aE[X], Var[aX] = a²·Var[X]
E[aX + b] = aE[X] + b, Var[aX + b] = a²·Var[X]
E[X + Y] = E[X] + E[Y]
Var[X + Y] = Var[X] + Var[Y] (when X and Y are uncorrelated)
Where:
a and b are constants
X and Y are random variables
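These laws can be checked numerically. A small Python sketch (numpy assumed; the constants a = 5 and b = 7 are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(3, 2, size=1_000_000)   # any random variable will do
a, b = 5.0, 7.0
print((a * X + b).mean(), a * X.mean() + b)   # E[aX + b] = aE[X] + b
print((a * X + b).var(), a**2 * X.var())      # Var[aX + b] = a^2 Var[X]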
The Error Term:
The error term in a regression model is a random variable. Like other random variables it is
characterised by:
a) A mean (or expected value)
b) A variance
c) A distribution (i.e. probability density function)
We usually assume the random error term of an econometric model to:
a) Have expected value of zero
b) Have a variance which we will call σ2
The smaller the variance of the error term, the more efficient the model.
Sampling Distributions:
We can usually draw many samples of size n from a population. Each sample can be used to
compute a sample statistic (e.g. a sample mean); these statistics will vary from sample to sample. If
we take infinitely many samples of a normally distributed random variable X in the population, the
sample statistic X̄ will also be normally distributed.
The probability distribution that gives all possible values of a statistic and associated probabilities is
known as a sampling distribution.
If Xi ~ N(μ, σ²) then X̄ ~ N(μ, σ²/N)
If the distribution of X is non-normal but n is large, then X̄ is approximately normally distributed.
The approximation is good when n ≥ 30 – this is known as the central limit theorem.
Central limit Theorem:
If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance
σ², and Ȳ = Σ Yi / N, then
Z_N = (Ȳ − μ)/(σ/√N)
has a probability distribution that converges to the standard normal N(0,1) as N → ∞.
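A quick simulation illustrating the theorem, drawing from a deliberately non-normal (exponential) population with μ = σ = 1; the sample size, repetitions and seed are our own illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
N, reps = 50, 10_000
samples = rng.exponential(scale=1.0, size=(reps, N))     # non-normal population
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(N))    # Z_N for each sample
print(z.mean(), z.std())   # close to 0 and 1: Z_N is approximately N(0,1)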
Estimators & Estimates:
A point estimator is a rule or formula which tells us how to use a set of sample observations to
estimate the value of a parameter of interest. A point estimate is the value obtained after the
observations have been substituted into the formula.
Desirable properties of point estimators include:
 Unbiased – an estimator θ̂ is an unbiased estimator of the population parameter θ if E(θ̂) = θ
 Efficiency – θ̂1 is more efficient than θ̂2 if var(θ̂1) < var(θ̂2)
 
 Consistency- the distribution of the estimator becomes more concentrated about the
population parameter as the sample size becomes larger
Note that both bias and variance approach 0 as n approaches infinity.
Estimate: is a particular value for a parameter
Estimator: a formula to get estimate
Examples:
X̄ = Σ Xi / N is the best linear unbiased estimator of μ = E(X)
σ̂² = Σ(Xi − X̄)²/N is a biased but consistent estimator of σ² = E[(X − μ)²]
σ̂² = Σ(Xi − X̄)²/(N − 1) is an unbiased and consistent estimator of σ² = E[(X − μ)²]
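The bias of the N-divisor estimator is easy to see by simulation. A sketch (numpy assumed; the ddof argument selects the divisor, N or N − 1):

import numpy as np

rng = np.random.default_rng(1)
sigma2, N, reps = 4.0, 10, 100_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
print(x.var(axis=1, ddof=0).mean())   # divisor N:   ~3.6, biased downward
print(x.var(axis=1, ddof=1).mean())   # divisor N-1: ~4.0, unbiased (true sigma^2 = 4)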
Confidence Intervals:
A confidence interval, or interval estimate, is a range of values which contains information not only
about the location of the population mean, but also about the precision with which we estimate it.
We can generally use the sampling distribution of an estimator to derive a confidence interval for the
population parameter.
In general, a 100(1−α)% confidence interval for the population mean is given by:
CI = x̄ ± Z(α/2)·σ/√n
Where 1−α is the level of confidence (α is the significance level).
Prior to selecting a random sample, the probability that a CI will contain the population parameter is
100(1-α)%. Eg. If we took many samples of size n and calculated the many corresponding random
intervals x̄ ± Z(α/2)·σ/√n, then 100(1−α)% of them would contain μ.
After we construct a confidence interval, either it does or it does not contain the population
parameter, with probability 1 or 0 (so we can only say we are 100(1−α)% confident that a
particular confidence interval contains the parameter).
General conclusion: “We can say with 100(1-α)% confidence that the population parameter is
between lower bound and upper bound.”
Hypothesis Testing:
An hypothesis is a statement or claim about the value(s) of one or more population parameters. To
test a hypothesis we
1. Identify a test statistic and find its sampling distribution when the hypothesis is true
2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely
5 steps:
1. State H0 and H1 – H0 must contain an equality (=, ≤, or ≥)
2. State a decision rule – Reject H0 if...
3. Calculate test statistic
4. Compare, and make decision
5. Write conclusion
Note:
o One-tail or two tail tests can be used
o Can use critical values or p-value method
Econometrics: ECON2300 – Lecture 2
An Econometric Model:
For a given set of data the aim of an econometric model is to fit a regression line and then check how
well it fits.
In order to investigate this relationship between expenditure and income we must build an economic
model and then a corresponding econometric model that forms the basis for a quantitative or
empirical economic analysis.
We must express mathematically which variables are dependent and independent. (In this case we
can say that the weekly expenditure depends on income – y depends on x)
We represent our economic model mathematically by the conditional mean:
E(y|x) = μ_y|x = β1 + β2x
The conditional mean E(y|x) is called a simple regression function as there is only one
explanatory variable. The unknown regression parameters β1 and β2 are the intercept and slope
respectively.
β2 = ΔE(y|x)/Δx = dE(y|x)/dx
For each value of x there is potentially a range of values of y – in fact each has a probability
distribution.
The figure above shows that the regression line passes through the mean of each distribution of
expenditure at each level of income.
The difference between the actual value of y and the expected value is known as the random error
term.
e = y − E(y) = y − (β1 + β2x)
If we rearrange:
y = β1 + β2x + e
Assumptions of the Simple Linear Regression (SLR) Model:
1. The population can be represented by:
y = β1 + β2x + e
2. The mean value of y, for each value of x is given by the linear regression function
E(y|x) = β1 + β2x
Error term: This means that the mean error term is 0.
E(e) = 0
3. For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance
var(y|x) = σ²
Error term: This means that the error terms are homoskedastic: constant variance. Violation
of this is heteroskedasticity.
var(e) = var(y) = σ²
4. The sample values of y are all uncorrelated and have zero covariance, implying there is no
linear association among them:
cov(yi, yj) = 0
Error term: There is no Serial Correlation. Note that this assumption can be made stronger
by assuming that the random errors e are all statistically independent in which case the values
of y are also statistically independent.
5. The variable x is not random and must take at least two different values.
6. (optional) The values of y are normally distributed about their mean for each value of x:
y ~ N(β1 + β2x, σ²)
Error term: The values of e are normally distributed about their mean:
e ~ N(0, σ²)
If the values of y are normally distributed, then so are the values of e, and vice versa.
The Error term:
If the regression parameters β1 and β2 were known, then for any value of y we could calculate:
e = y − E(y) = y − (β1 + β2x)
However, the values of β1 and β2 are never known for certain and therefore it is impossible to
calculate e.
The random error e represents all factors affecting y other than x. These factors cause individual
observations y to differ from the mean value:
E(y) = β1 + β2x
Estimating the Parameters of the Simple Linear Regression:
Our problem is to estimate the location of E(y) = β1 + β2x that best represents our data. We would
expect this line to be somewhere in the middle of all the data points since it represents mean, or
average, behaviour. To estimate β1 and β2 we could simply draw a line through the middle of the
data and then measure the slope and intercept with a ruler. The problem with this method is that
different people would draw different lines – in fact there would be an infinite set of possibilities –
and that it would not be accurate.
The estimated regression line is given by:
ŷi = b1 + b2xi
The least squares principle:
The least squares method involves finding estimators b1 and b2 that provide the smallest sum of
squared residuals:
min Σ êi² = min Σ (yi − ŷi)²
The solutions are:
b2 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
b1 = ȳ − b2x̄
We usually use a computer to calculate these values as the process would take too long and be too
tedious to do by hand.
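Still, the two formulas are simple enough to code directly. A Python sketch with made-up toy data (any regression package should give the same b1 and b2):

import numpy as np

def ols_simple(x, y):
    # Least squares estimates for y = b1 + b2*x + e
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data, for illustration only
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
print(ols_simple(x, y))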
Interpreting the estimates:
 The value of b2 is an estimate of β2, the amount by which y increases per unit increase in x
 The value of b1 is an estimate of β1, what y would be when x = 0
Because the least squares estimate is generated using sample data, different samples will lead to
different values of b1 and b2. Therefore b1 and b2 are random variables.
In this context we call b1 and b2 the least squares estimators, but when actual sample values are
substituted then we obtain values of random variables which are estimates.
Estimators: Formulas for estimates
Estimates: Actual values given by the estimators
The variances and Covariance of b1 and b2:
var(b1) = σ²·Σxi² / (N·Σ(xi − x̄)²)
var(b2) = σ² / Σ(xi − x̄)²
The square roots of the estimated variances are known as standard errors.
cov(b1, b2) = −σ²·x̄ / Σ(xi − x̄)²
Summary: the variances and covariances of b1 and b2
 The larger the variance in the error term, σ², the greater the uncertainty there is in the
statistical model, and the larger the variances and covariance of the least squares estimators.
 The larger the sum of squares, Σ(xi − x̄)², the smaller the variances of the least squares
estimators and the more precisely we can estimate the unknown parameters
In (a), the data are bunched, so Σ(xi − x̄)² is smaller and we cannot estimate the line very
accurately. In (b), Σ(xi − x̄)² is larger and we can estimate the unknown parameters more
precisely.
 The larger the sample size N, the smaller the variances and covariances of the least squares
estimators
 The larger the term Σxi² is, the larger the variance of the least squares estimator b1.
The further our data are from x = 0, the more difficult it is to interpret β1.
 The absolute magnitude of the covariance increases the larger in magnitude is the sample
mean x̄, and the covariance has a sign opposite that of x̄.
The probability distribution of the Least Squares Estimators:
 If the normality assumption about the error terms is correct, then the least squares estimators
are normally distributed.
 If assumptions 1–5 hold, and if the sample size is sufficiently large (n ≥ 30), then by the
central limit theorem the least squares estimators have a distribution that approximates the
normal distribution.
The Gauss-Markov Theorem:
Under the assumptions SR1–SR5 of the linear regression model, the estimators b1 and b2 have the
smallest variance of all linear and unbiased estimators of β1 and β2. They are the Best Linear Unbiased
Estimators (BLUE) of β1 and β2.
To clarify what the Gauss-Markov theorem does, and does not, say:
1. The estimators b1 and b2 are “best” when compared to similar estimators, those that are linear
and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators.
2. They are the “best” within their class because they have the minimum variance. When
comparing two linear and unbiased estimators we always want to use the one with the
smallest variance.
3. In order for the Gauss-Markov Theorem to hold, assumptions SR1–SR5 must be true. If any
of these assumptions are not true, then b1 and b2 are not the best linear unbiased estimators of
β1 and β2.
4. The Gauss-Markov theorem does not depend on the assumption of normality
5. In simple linear regression these are the estimators to use.
6. The theorem applies to the least squares estimators. It does not apply to the least squares
estimates from a single sample.
Estimating the variance of the Error term:
The variance of the random error ei is:
var(ei) = σ² = E[(ei − E(ei))²] = E[(ei − 0)²] = E(ei²)
assuming that the mean error = 0 assumption is correct.
The unbiased estimator of variance is:
σ̂² = Σ êi² / (N − 2), with E(σ̂²) = σ²
Interval Estimation:
Confidence interval:
CI = bk ± tcrit·se(bk)
Where:
bk = b1 or b2
tcrit = the critical value t(1 − α/2, N − 2), where N − 2 are the degrees of freedom
se = the standard error given by the regression estimation
Before sampling, we can make the probability statement there is a 100(1-α)% chance that the real
value lies within the interval.
After sampling, we can only make a confidence interval – we are 100(1-α)% confident that the real
value lies within the interval.
Example:
Construct a 95% confidence interval for β2 for the following equation when there are 40
observations.
ŷ = 83.4 + 10.21x
(se) (43.4) (2.09)
Solution:
CI = b2 ± tcrit·se(b2)
= 10.21 ± t(1 − 0.05/2, 40 − 2) × 2.09
= 10.21 ± t(0.975, 38) × 2.09
= 10.21 ± 2.024 × 2.09
= 10.21 ± 4.23016
We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
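A sketch reproducing this interval with scipy (t.ppf returns the critical value t(0.975, 38)):

from scipy import stats

b2, se_b2, N = 10.21, 2.09, 40
t_crit = stats.t.ppf(0.975, df=N - 2)              # ~2.024
lo, hi = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(round(lo, 2), round(hi, 2))                  # ~5.98, ~14.44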
Hypothesis Testing:
We can conduct a hypothesis test on the slope of the regression line.
Step 1: State Hypothesis:
H0: βk = c,  βk ≤ c,  or  βk ≥ c
H1: βk ≠ c,  βk > c,  or  βk < c
Step 2: Decision rule:
Reject H0 if .....
Step 3: Calculate test statistic
Step 4: Compare and decision
Step 5: Conclusion
Example:
Using 40 observations on food expenditure.
ŷ = 83.4 + 10.21x
(se) (43.4) (2.09)
Test whether β2 is less than or equal to 0 at the 5% level of significance.
Step 1: State Hypothesis
H0: β2 ≤ 0
H1: β2 > 0
Step 2: Decision Rule
Reject H0 if tcalc > tcrit
Step 3: Calculate test statistic
tcalc = (b2 − 0)/se(b2) = (10.21 − 0)/2.09 = 4.88
Step 4: Compare and decision
4.88 > 1.686 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that the value of β2, the
increase in expenditure for a 1-unit increase in income, is not less than or equal to 0.
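The same one-tail test in code (scipy assumed; note the one-tail critical value uses the 95th percentile):

from scipy import stats

b2, se_b2, N = 10.21, 2.09, 40
t_calc = (b2 - 0) / se_b2                 # 4.88
t_crit = stats.t.ppf(0.95, df=N - 2)      # ~1.686 for a one-tail 5% test
print(t_calc, t_crit, t_calc > t_crit)    # True -> reject H0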
Types of errors:
           | H0 true          | H0 false
Reject H0  | Type 1 error = α | No error
DNR H0     | No error         | Type 2 error
[Sketch: t-distribution with the rejection region in the upper tail.]
For this one-tail test, tcrit = t(1 − α, N − 2) = t(0.95, 40 − 2) ≈ 1.686; reject H0 if tcalc > tcrit.
Econometrics: ECON2300 – Lecture 3
The least Squares Predictor:
The linear regression model provides a way to predict y given any value of x. This is extremely
important for forecasters; be it in politics, finance or business. Accurate predictions provide a basis
for better decision making.
Our first SR assumption is that our model is linear: for a given value of the explanatory variable, x0,
the value of the dependent variable y0 is given by the econometric model:
y0 = β1 + β2x0 + e0
Where e0 is a random error. This random error has:
1. Mean: E(e0) = 0
2. Variance: var(e0) = σ²
3. Covariance: cov(e0, ei) = 0
The least squares predictor (or estimator) of y0 (given x0) is:
ŷ0 = b1 + b2x0
To evaluate how well this predictor or estimator performs we define the forecast error, which is
analogous to the least squares residual:
f = y0 − ŷ0 = (β1 + β2x0 + e0) − (b1 + b2x0) = (β1 − b1) + (β2 − b2)x0 + e0
[Figure: fitted line ŷi = b1 + b2xi, with the forecast error f shown as the gap between the actual value yi
and the predicted value ŷi at xi.]
Now, if we apply the assumptions SR1 to SR5:
E(f) = E(y0 − ŷ0) = (β1 − E(b1)) + (β2 − E(b2))x0 + E(e0) = 0
since:
E(b1) = β1, E(b2) = β2 and E(e0) = 0
The variance of the prediction error is:
var(f) = var(y0 − ŷ0) = σ²·[1 + 1/N + (x0 − x̄)²/Σ(xi − x̄)²]
If SR6 holds, or the sample size is large enough, then the prediction error is normally distributed.
Note that, the further x0 is from the sample mean, the larger the variance of the prediction error.
 This means that as you extrapolate more and more your predictions will be less accurate.
Note the variance of the forecast error is smaller when:
i) The overall uncertainty in the model is smaller, as measured by the variance of the
random errors σ2
ii) The sample size N is larger
iii) The variation in the explanatory variable is larger
iv) The distance of x0 from x̄ is smaller
The forecast error variance is estimated by replacing σ² with its estimator σ̂²:
var̂(f) = σ̂²·[1 + 1/N + (x0 − x̄)²/Σ(xi − x̄)²]
= σ̂² + σ̂²/N + (x0 − x̄)²·[σ̂²/Σ(xi − x̄)²]
= σ̂² + σ̂²/N + (x0 − x̄)²·var̂(b2)
[Figure: fitted line ŷi = b1 + b2xi with prediction interval bands that are narrowest near the sample mean
of x (e.g. at a point x1 inside the data) and that widen at points such as x2 outside the data.]
Obviously:
The estimate that the estimator or predictor
gives at x1 will be close to the actual value, as
there are lots of data points that the regression
is based on around x1 – it is close to the sample
mean.
At x2, there are no points very close to those the
regression was based on, so the prediction will
be less accurate, i.e. it will have a larger variance.
i.e. We can do a better job of predicting in the
region where we have more sample
information.
The standard error of the forecast:
se(f) = √var̂(f)
Hence, we can construct a (1 − α)×100% prediction interval for y0:
y0 = ŷ0 ± tcrit·se(f)
Example:
Calculate a 95% prediction interval for y0 when x0 = 20:
y0 = ŷ0 ± tcrit·se(f)
Step 1: Linear equation
From the output above we can determine a linear regression:
ŷ = b1 + b2x
ŷ = 83.416 + 10.21x
(se) (43.41) (2.093)
Therefore, when x0 = 20:
ŷ0 = 83.416 + 10.21(20) = 287.616
Step 2: Determine se(f)
se(f) = √var̂(f)
var̂(f) = σ̂² + σ̂²/N + (x0 − x̄)²·var̂(b2)
= (89.517)² + (89.517)²/40 + (20 − 19.605)²·(2.0932)²
= 8214.34
Here 89.517 is the S.E. of the regression, N = 40 is the sample size, x0 = 20 is the x-value, x̄ = 19.605 is
the mean of x, and se(b2) = 2.0932. Note: var̂(b2) = se(b2)².
Step 3: Confidence interval
y0 = ŷ0 ± t(1 − α/2, N − 2)·se(f)
= 287.616 ± t(0.975, 40 − 2)·√8214.34
= 287.616 ± 2.024·√8214.34
104.17 ≤ y ≤ 471.06
Therefore we can say with 95% confidence that the true expenditure on food will be between
$104.17 and $471.06.
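A sketch reproducing this interval in Python; the inputs are the regression statistics listed in Step 2, and small differences from the notes are rounding:

import numpy as np
from scipy import stats

b1, b2 = 83.416, 10.21
sigma_hat, N = 89.517, 40          # S.E. of regression, sample size
x0, xbar, se_b2 = 20.0, 19.605, 2.0932

y0_hat = b1 + b2 * x0                                                # 287.616
var_f = sigma_hat**2 + sigma_hat**2 / N + (x0 - xbar)**2 * se_b2**2  # ~8214.3
half = stats.t.ppf(0.975, df=N - 2) * np.sqrt(var_f)
print(y0_hat - half, y0_hat + half)                                  # ~104.1, ~471.1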
Transforming x to obtain se(f):
A simple way to obtain the prediction and prediction interval estimates with EViews ( or any other
econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of the values,
generating a new variable:
Genr: x2 = x − x0
2. Then estimate the regression model by running a regression analysis
3. The estimated standard error of the forecast is given by:
se(f) = √(var̂(b1) + σ̂²)
Example:
The transformation has the following effect:
Measuring Goodness-of-Fit:
Two major reasons for analysing the model
y = β1 + β2x + e
1. To explain how the dependent variable (yi) changes as the independent variable (xi) changes
2. To predict y0 given an x0
These two objectives come under the broad headings of estimation and prediction. Closely allied
with the prediction problem discussed in the previous section is the desire to use xi to explain as
much of the variation in the dependent variable yi as possible.
ŷi = b1 + b2xi
SST = total sum of squares – measure of total variation in the dependent variable about its sample
mean
SSR = regression sum of squares – the part that is explained by the regression
SSE = sum of squared errors – that part of the total variation that is unexplained
Coefficient of determination: R²
The coefficient of determination measures the proportion of the variation in the dependent variable
that is explained by the regression model:
R² = SSR/SST = 1 − SSE/SST
0 ≤ R² ≤ 1
If R² = 1 the data fall exactly on the fitted least squares regression line and we have a perfect fit. If the
sample data for y and x are uncorrelated and show no linear association, then the least squares fitted
line is "horizontal", so SSR = 0 and R² = 0.
For a simple regression model, R² can also be computed as the square of the correlation coefficient
between yi and ŷi.
 R² = 1: All the sample data falls exactly on the fitted least squares line, SSE = 0
 R² = 0: The sample data for y and x are uncorrelated, the least squares fitted line is horizontal
and equal to the mean of y, so that SSR = 0
Note:
1. R² is a descriptive measure
2. By itself, it does NOT measure the quality of the regression model
3. It is NOT the objective of regression analysis to find the model with the highest R²
4. By adding more variables R² will automatically increase even if the variables have no
economic justification; this is why we use adjusted R² in multiple regression analysis (we will
expand on this when we study multiple regression):
[Figure: for a point (xi, yi) and the fitted line ŷi, the total deviation yi − ȳ (SST component) splits into
the explained component ŷi − ȳ (SSR) and the unexplained component êi = yi − ŷi (SSE).]
R̄² = 1 − [SSE/(N − K)] / [SST/(N − 1)]
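Both measures reduce to one-line functions. In the sketch below the SSE and SST figures are taken from the Lecture 4 hamburger-sales example later in these notes (N = 75, K = 3), purely for illustration:

def r_squared(sse, sst):
    return 1 - sse / sst

def adj_r_squared(sse, sst, N, K):
    # penalises extra regressors via SSE/(N-K) and SST/(N-1)
    return 1 - (sse / (N - K)) / (sst / (N - 1))

print(r_squared(1718.943, 3115.482))             # ~0.448
print(adj_r_squared(1718.943, 3115.482, 75, 3))  # ~0.433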
Example:
For the same data as before:
The Effects of Scaling the Data:
Data we obtain is not always in a convenient form for presentation in a table or use in a regression
analysis. When the scale of the data is not convenient, it can be altered without changing any of the
real underlying relationships between variables.
If we scale x by 1/c:
y = β1 + β2x + e
becomes
y = β1 + (cβ2)(x/c) + e
If we scale y by 1/c:
y = β1 + β2x + e
becomes
y/c = (β1/c) + (β2/c)x + e/c
Example: if we now report income in $100 units.
Because b2 = 10.21 and x = 200, after scaling b2 = 0.1021 and x = 2.
This makes no change to the underlying model.
Choosing a Functional Form:
So far we have assumed that the mean household food expenditure is a linear function of household
income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2x, which
implies that there is a linear, straight-line relationship between E(y) and x.
(Note on scaling, continued from above: when the scale of x is altered, the standard error of the
regression coefficient changes by the same multiplicative factor as the coefficient, so that their ratio, the
t-statistic, is unaffected; all other regression statistics are unchanged. When y is scaled, the error term is
scaled in the process, so the least squares residuals and the standard errors of the regression coefficients
are also scaled, but the t-statistics and R² are unaffected.)
In the real world this might not be the case, and this was only assumed to make the analysis easier.
The starting point in all econometric analysis is economic theory. What does economics really say
about the relation between food expenditure and income, holding all else constant? We expect there
to be a positive relationship between these variables because food is a normal good. But nothing says
the relationship must be a straight line. In fact we do not expect that as household income rises that
food expenditures will continue to rise indefinitely at the same constant rate. Instead, as income rises,
we expect food expenditures to rise, but at a decreasing rate – the law of diminishing returns.
The term linear in “linear regression model”
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (e.g. the βk values must not be raised to
powers or multiplied by other parameters) but is not, necessarily, "linear in the variables"
(e.g. x can appear as x², x³, etc.)
Linear in parameters: the parameters are not multiplied together, divided, squared, cubed etc.
f(x) = β0 + β1x1 + ... + βkxk
1. each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.
An example of a model that is non-linear in the parameters is:
f(x) = β0 + β0β1x  or  f(x) = β0·x^β1
The first is non-linear because the slope of the line is expressed as a product of two parameters.
As a result, nonlinear least squares regression must be used to fit such models; linear least
squares cannot be used.
Because of this fact, the simple linear regression model is much more flexible than it appears at first
glance. By transforming the variables y and x, we can represent many curved, nonlinear
relationships and still use the linear regression model. Choosing an algebraic form for the
relationship means choosing transformations of the original variables.
The slopes of these functional forms can be determined by taking the derivatives of the functions.
Note: the most important implication of transforming variables is that the regression result
interpretations change. Both the slope and elasticity change from the linear relationship case.
Some common function types are:
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – simplest model
Example on Food Expenditure:
1. Plotting data
2. Testing hypotheses:
All slope coefficients are significantly different from zero at the 5% level of significance.
3. Performing residual analysis: Testing for normally distributed Errors
The k-th moment (from physics) of the random variable e is:
μk = E[(e − μ)^k]
Where μ denotes the mean of e. Measures of spread, symmetry and "peakedness" are:
Variance: σ² = μ2
Skewness: S = μ3/σ³
Kurtosis: K = μ4/σ⁴ – whether the tails are thicker or thinner than expected
If e is normally distributed then S = 0 and K = 3. Formalising this is the Jarque-Bera test:
The Jarque-Bera test is a test of how far measures of residual skewness and kurtosis are from 0 and
3 (normality).
To test the null hypothesis of normality of the errors, we use the test statistic:
JB = (N/6)·(S² + (K − 3)²/4)
Where:
N = sample size
S = skewness
K = Kurtosis
When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 df.
Step 1: State the hypothesis:
H0: the errors are normally distributed
H1:the errors are not normally distributed
Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991
Step 3: Calculate test statistic:
JB = (N/6)·(S² + (K − 3)²/4)
= (40/6)·(0.097² + (2.99 − 3)²/4)
= 0.063
Step 4: Compare and decision
0.063 < 5.991 therefore do not reject H0.
Step 5: conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the
5% level of significance.
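The statistic and its critical value in code (scipy assumed):

from scipy import stats

N, S, K = 40, 0.097, 2.99
JB = (N / 6) * (S**2 + (K - 3)**2 / 4)
print(JB)                           # ~0.063
print(stats.chi2.ppf(0.95, df=2))   # ~5.991, so do not reject H0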
4. Assessing forecasting performance
5. Measuring goodness-of-fit with different dependent variables:
The R² from a linear model measures how well the linear model explains the variation in y, while
the R² from a log-linear model measures how well that model explains the variation in ln(y). The
two measures should NOT be compared.
To compare goodness-of-fit in models with different dependent variables, we can compute the
generalised R²:
R²g = [corr(y, ŷ)]² = r²(y,ŷ)
(The ordinary R² values cannot be compared, as each model has a different dependent variable.)
6. Using the principle of parsimony – Use the simplest model
The principle of parsimony states that you should use the simplest model if two models
appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4
Multiple Regression A:
The simple regression model we have studied so far relates the dependent variable y to only ONE
explanatory variable x.
When we turn an economics model with more than one explanatory variable into its corresponding statistical
model, we refer to it as a multiple regression model.
Changes and Extensions from the simple regression model:
1. Interpretation of the β parameters:
The population regression line is:
E(yi | xi2, ..., xiK) = β1 + β2xi2 + ... + βKxiK
The k-th slope coefficient measures the effect of a change in the variable xk, upon the expected value
of y, all other variables held constant. Mathematically:
βk = ΔE(yi | xi2, ..., xiK)/Δxik (all other x's held constant) = ∂E(yi | xi2, ..., xiK)/∂xik
Note: the x’s start at 2 as 1 refers to the intercept term (which has no slope).
2. The assumption concerning the characteristics of the explanatory (x) variables
The assumptions of the multiple regression model are:
MR1: yi = β1 + β2xi2 + ... + βKxiK + ei, where i = 1, ..., N
- The model is linear in parameters but may be non-linear in the variables
MR2: E(yi) = β1 + β2xi2 + ... + βKxiK, which is synonymous with E(ei) = 0
- The expected (average) value of yi depends on the values of the explanatory variables and
the unknown parameters.
MR3: var(yi) = var(ei) = σ² – the error terms are homoskedastic (have constant variance)
MR4: cov(yi,yj) = cov(ei,ej) = 0 – There is no serial correlation
MR5: The values of each xik are not random and are not exact linear functions of the other
explanatory variables
MR6: (optional) yi ~ N[(β1 + β2xi2 + ... + βKxiK), σ²], which is equivalent to ei ~ N(0, σ²)
3. The degrees of freedom for the t-distribution
We will go into further detail on this later in the summary.
Least Squares Estimation:
The fitted regression line for the multiple regression model is:
ŷi = b1 + b2xi2 + ... + bKxiK
The least squares residual is:
êi = yi − ŷi = yi − b1 − b2xi2 − ... − bKxiK
Similarly to the simple linear regression, the unknown parameters β1,...,βK are obtained by
minimising the residual sum of squares:
Σ êi² = Σ (yi − ŷi)² = Σ (yi − b1 − b2xi2 − ... − bKxiK)²
Solving the first-order conditions for a minimum yields messy expressions for the ordinary least
squares estimators, even when K is small.
For example, even when K = 3 the OLS formulas are lengthy.
In practice we use matrix algebra to solve these systems:
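A minimal sketch of that matrix-algebra solution, b = (X'X)⁻¹X'y, in Python; the data matrix below is made up purely for illustration (the first column of ones gives the intercept):

import numpy as np

def ols(X, y):
    # Solve the normal equations (X'X) b = X'y for the OLS estimates
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1, 5.7, 1.2],     # column of ones, x2, x3
              [1, 6.0, 0.9],
              [1, 5.5, 2.0],
              [1, 6.2, 1.5],
              [1, 5.9, 1.8]])
y = np.array([74.0, 71.5, 78.2, 70.9, 75.3])
print(ols(X, y))                 # [b1, b2, b3]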
To understand graphically what a multiple regression model embodies look at the image below:
The equation forms a surface or plane which describes the position of the variable.
Example:
The model is given by:
Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
(se)  (6.352)    (1.096)           (0.683)
Interpretation of the coefficients:
b2: Sales revenue is expected to fall by about $7908 when the price increases by $1, holding
the amount of advertising constant.
b3: Sales revenue is expected to increase by about $1863 when advertising increases by $1,
holding the price constant.
Properties Of The OLS Estimators: (OLS = Ordinary Least Squares)
The Gauss-Markov Theorem says that:
If MR1 to MR5 are correct, the OLS estimators b1,...,bK have the smallest variance of all linear and
unbiased estimators of β1, ...,βK – they are the Best Linear and Unbiased Estimators (BLUE).
Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6).
However, if MR6 does hold, then the OLS estimators are also normally distributed.
Again, with larger values of K the formulas for the variances of the OLS estimators are messy. For
example, when K = 3, we can show that:
Where r23 is the sample correlation coefficient between x2 and x3, −1 < r < 1
The variances and covariances are often presented in the form of a covariance matrix. For K = 3, this matrix
takes the form:
In practice, however, σ², the population variance, is unknown. So instead we use an unbiased estimator of
the error variance:
σ̂² = Σ êi² / (N − K) = Σ (yi − ŷi)² / (N − K)
The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² in
the appropriate formulas. The square roots of the estimated variances are still known as standard errors.
It is important to understand the factors affecting the variance of bi (i = 2,...,K):
1. The larger σ², the larger the variance of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of the
least squares estimator.
4. The larger the correlation between the explanatory variables, the larger the variance of the
least squares estimators. "Independent" variables ideally exhibit variation that is "independent"
of the variation in other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable
is known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to
a larger variance of b2.
Inferences in the Multiple Regression Model:
If the assumptions MR1 – MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters
The approach is that followed for the simple regression model in weeks 2 and 3 for the parameters of the
simple regression model.
1. Confidence interval:
A 100(1-α)% confidence interval for βk is given by:
bk ± tcrit·se(bk), for k = 1, ..., K
Where:
K = the number of βk parameters; e.g. for ŷi = b1 + b2xi2 + b3xi3, K = 3
tcrit = t(1 − α/2, N − K)
se(bk) = the standard error of bk given in the regression output
Example: construct a 95% confidence interval for the coefficient of advertising for the following
model which was based on N = 75 observations on hamburger sales.
Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
(se)  (6.352)    (1.096)           (0.683)
Solution:
b3 ± t(1 − α/2, N − K)·se(b3) = b3 ± t(1 − 0.05/2, 75 − 3)·se(b3)
= b3 ± t(0.975, 72)·se(b3)
= 1.863 ± 1.993(0.683)
0.502 ≤ β3 ≤ 3.224
We can say with 95% confidence that the true change in sales for a one dollar increase in
advertising is between $502 and $3224.
2. Hypotheses Testing
2.1.A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters.
Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:
t = (bk − c)/se(bk) ~ t(N − K)
Even if MR6 doesn’t hold, the test is still valid provided the sample size is large.
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854·PRICE + 1.862583·ADVERT
(se)  (6.352)    (1.096)           (0.683)
Solution:
Step 1: State Hypotheses
H0: β2 = 0
H1: β2 ≠ 0
Step 2: Decision Rule
Reject H0 if |tcalc| > tcrit
Step 3: Calculate Test Statistic
tcalc = (bk − βk)/se(bk) = (b2 − 0)/se(b2) = (−7.908 − 0)/1.096 = −7.215
Step 4: Compare and Decision
|-7.215| > 1.993 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that the price does not have
no effect on the revenue, i.e. we can conclude at the 5% level of significance that the price has an
effect on revenue.
2.2.Testing of a null hypothesis consisting of two or more hypotheses about the parameters in the
multiple regression model.
F- Tests
Used in:
1. Overall significance of the Model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification Tests
4. Testing for Heteroskedasticity
5. Testing for Serial correlation
Note: We adopt assumptions MR1-MR6 (i.e. including normality). If the errors are not normal, then
the results presented will hold approximately if the sample is large.
tcrit = t(1 − α/2, N − K) = t(0.975, 72) = 1.993
Reject H0 if |tcalc| > tcrit, i.e. if |tcalc| > 1.993
A Familiar Form of the F-test:
From ECON1320 we saw that we could express F as:
F = [SSR/(K − 1)] / [SSE/(N − K)] = [(SST − SSE)/(K − 1)] / [SSE/(N − K)]
However, this is just a particular example of a more general F-statistic that can be used to test
sets of joint hypotheses.
The general F-test:
A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters.
Under MR1 to MR6, we can test a joint null hypothesis concerning the parameters using the F statistic:
F = [(SSER − SSEU)/J] / [SSEU/(N − K)] ~ F(J, N − K)
Where:
J = the number of restrictions in H0
SSEU = The unrestricted sum of squared errors from the original, unrestricted multiple regression
Model.
SSER = The restricted sum of squared errors from a regression model in which the null hypothesis
is assumed to be true
Note: Even if MR6 doesn’t hold, the test is still valid provided the sample size is large (by the central
limit theorem)
The General F-test can be used to test 3 types of hypotheses:
1. When used to test H0: βk=0 against H1: βk ≠ 0; the F-test is equivalent to a t-test
J = 1
2. When used to test H0: β2 = β3 = ... = βK = 0 against H1: at least one βk ≠ 0
J = K − 1
3. The F-test can also be used to test whether some combination of parameters is
collectively significant to the model
1 ≤ J < K
Restrictions:
When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis
sets a βk value to 0, then we set that βk to 0 in the regression equation. Instead of using the least squares
estimates that minimise the sum of squared errors, we find estimates that minimise the sum of squared
errors subject to the parameter constraints – restrictions. This means that the sum of squared errors will
increase; a constrained minimum is larger than an unconstrained minimum.
The theory behind the F-test is that if the two sums of squared errors are significantly different, then the
assumption that the parameter takes the value given in the null hypothesis has significantly reduced the
ability of the model to fit the data, and thus the data do not support the null hypothesis. On the other hand,
if the null hypothesis is true, we expect the data to be compatible with the conditions placed on the
parameters – we would expect little change in the sum of squared errors when the null hypothesis is true.
1. Testing with 1 restriction (J=1)
Example: Test whether revenue is related to price at the 5% level of significance when N = 75.
Ŝ = 118.9136 − 7.907854 PRICE + 1.862583 ADVERT
(se)  (6.352)     (1.096)          (0.683)
Solution:
Step 1: State Hypotheses & apply restriction
H0: β2 = 0
H1: β2 ≠ 0
Now, impose the restriction assuming the null is correct, i.e. price is not significant and β2 is 0,
and then find the restricted regression equation.
Ŝ = 74.180 + 1.733 ADVERT
(se) (1.80)   (0.890)
Step 2: Decision Rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate Test Statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)]
  = [(2961.827 − 1718.943)/1] / [1718.943/(75−3)]
  = 52.06
Step 4: Compare and Decision
52.06 > 3.97 therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to reject the claim that price has
no effect on revenue; i.e. we can conclude at the 5% level of significance that price has an
effect on revenue.
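A minimal Python sketch of the same calculation, using the SSE values reported above (the critical value comes from scipy rather than tables):

    from scipy import stats

    sse_r, sse_u = 2961.827, 1718.943
    J, N, K, alpha = 1, 75, 3, 0.05

    f_calc = ((sse_r - sse_u) / J) / (sse_u / (N - K))
    f_crit = stats.f.ppf(1 - alpha, J, N - K)
    print(f_calc, f_crit)   # 52.06 and 3.97, so H0 is rejected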
The t-test and F-test - a relationship:
When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used and the
outcomes will be identical.
In fact, the square of a t random variable with df degrees of freedom is an F random variable with
distribution F(1,df)
F-statistic = (t-statistic)²
F-crit = (t-crit)²
52.06 = (−7.215)²
3.97 = (1.993)²
F_crit = F(1−α, J, N−K) = F(0.95, 1, 75−3) = 3.97, so reject H0 if F_calc > F_crit.
2. Testing with J = K-1 restrictions: the overall significance of the model
An important application of the F-test is for what is called “Testing the overall significance of a model”.
Consider the general multiple regression model with (K - 1) explanatory variables and K unknown
coefficients.
Unrestricted model: y_i = β1 + β2 x_i2 + β3 x_i3 + ... + βK x_iK + e_i
To examine whether we have a viable explanatory model, we set up the following null and alternative
hypotheses.
Restricted model: y_i = β1 + e_i
Therefore: SSE_R = SST_U, while SSE_U is just the unrestricted model's SSE.
Step 1: State Hypotheses and calculate restricted model
H0: β2 = 0, β3 = 0, ..., βK = 0
H1: at least one of the βk is nonzero
Estimate restricted model:
Ŝ = 77.375
(se) (0.749)
SSE_R = 3115.482 (= SST_U)
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)]
  = [(3115.482 − 1718.943)/2] / [1718.943/(75−3)]
  = 29.248
Step 4: Compare and decision
29.248 > 3.12 Therefore reject H0.
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory
variables has an effect on sales.
3. Testing a Group of parameters (1 ≤ J < K)
Consider the model:
Note: the null has K − 1 restrictions, so it is referred to
as a joint hypothesis.
(Critical value for the overall significance test above: F_crit = F(1−α, J, N−K) = F(0.95, 3−1, 75−3) = 3.12, so reject H0 if F_calc > F_crit.)

Does advertising have an effect on sales?
Step 1: State Hypotheses
H0: β3 = 0 and β4 = 0
H1: β3 ≠ 0 or β4 ≠ 0 or both are nonzero
Step 2: Decision rule
Reject H0 if Fcalc > Fcrit
Step 3: Calculate test statistic
F = [(SSE_R − SSE_U)/J] / [SSE_U/(N−K)]
  = [(1896.391 − 1532.084)/2] / [1532.084/(75−4)]
  = 8.44
Step 4: Compare and decision
8.44 > 3.126, therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a
statistically significant effect on sales.
F_crit = F(1−α, J, N−K) = F(0.95, 2, 75−4) = 3.126, so reject H0 if F_calc > F_crit.
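In practice the restricted model need not be estimated by hand; a hedged statsmodels sketch (the DataFrame df and the names SALES, PRICE, ADVERT are assumptions, with ADVERT² taken to be the fourth regressor):

    import statsmodels.formula.api as smf

    df["ADVERT2"] = df["ADVERT"] ** 2
    unrestricted = smf.ols("SALES ~ PRICE + ADVERT + ADVERT2", data=df).fit()

    # H0: beta3 = beta4 = 0 -- each equality is one restriction, so J = 2
    ftest = unrestricted.f_test("ADVERT = 0, ADVERT2 = 0")
    print(ftest.fvalue, ftest.pvalue)   # compare with F(0.95, 2, N-4)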
Prediction:
We want to predict the value of y, denoted y0, when the explanatory variables take the values x02, x03, ..., x0K.
The prediction error (or forecast error) is f = y0 − ŷ0. The prediction error is a random variable with a
mean and a variance. If assumptions MR1 to MR5 hold then E(f) = E(y0 − ŷ0) = 0 and
var(f) = var(y0 − ŷ0), an expression with many terms, each involving σ². The prediction error variance is
estimated by replacing σ² with σ̂². The square root of the estimated forecast error variance is still called
the standard error of the forecast. If assumption MR6 (normality) is correct, or the sample size is large,
then a 100(1−α)% confidence interval or prediction interval for y0 is:
ŷ0 − t_c se(f) ≤ y0 ≤ ŷ0 + t_c se(f),  where t_c = t(1−α/2, N−K)
Example: Construct a 95% confidence interval for the prediction of y0 when P = 5.50 and A = 1200
Solution:
ŷ0 ± t(1−α/2, N−K) se(f) = ŷ0 ± t(0.975, 72) se(f)
ŷ0 = 118.91 − 7.91(5.50) + 1.863(1.2) = 77.66
(Advertising is measured in thousands of dollars, so A = $1200 enters the equation as 1.2.)
Therefore create two new variables:
P * = (P – 5.50) and A* = (A – 1200)
y0 = 77.66 ± 1.993 × 4.9429
67.809 ≤ y0 ≤ 87.5112
We can therefore say with 95% confidence that when the price is $5.50 and the advertising
expenditure is $1200, the true value of sales lies between 67.8 thousand and 87.5 thousand.
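A minimal sketch of the interval arithmetic, assuming the values computed above (ŷ0 = 77.66 and se(f) = 4.9429):

    from scipy import stats

    y0_hat, se_f = 77.66, 4.9429
    N, K, alpha = 75, 3, 0.05

    t_c = stats.t.ppf(1 - alpha / 2, df=N - K)
    print(y0_hat - t_c * se_f, y0_hat + t_c * se_f)   # roughly 67.81 and 87.51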
A reminder:
Estimated regression models describe the relationship between the economic variables for values similar to
those found in the sample data. Extrapolating the results to extreme values is generally not a good idea.
Predicting the value of dependent variables for values of the explanatory variables far from the sample
values invites disaster.
Goodness of Fit:
If the regression model contains an intercept, we can still decompose the variation in the dependent
variable (SST) into its explainable and unexplainable components (SSR and SSE). Then the coefficient of
determination still measures the proportion of the variation in the dependent variable that is explained by
the regression model:
R² = SSR/SST = 1 − SSE/SST
The interpretation of R² is identical to its interpretation in the simple regression model, i.e. 100·R²% of the
variation in the dependent variable can be explained by the estimated equation (R² = 1 implies a perfect fit).
Adjusted R2:
A problem with R² is that it can be made large by adding more and more variables to the model, even when
they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:
Adjusted R² = 1 − [SSE/(N−K)] / [SST/(N−1)]
Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by
the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted
R-Squared will increase if the t-statistic on the new variable is greater than 1 in absolute value!)
SST = (N−1) × (standard deviation of the dependent variable)²
Econometrics: ECON2300 – Lecture 5
Multiple Regression B:
Non-sample information:
In many estimation problems, economic theory and experience provides us with information on the
parameters that is over and above the information contained in the sample data. If this non-sample
information is correct, and if we can combine it with the sample information, then we can estimate the
parameters with greater precision.
Some non-sample information can be written in the form of linear equality restrictions on the unknown
parameters. (e.g. several parameters sum to one). We can incorporate this information into the estimation
process by simply substituting the restrictions into the model.
One example is when dealing with a firm which has constant returns to scale – take for example the
Cobb-Douglas function whose parameters α and β must sum to 1 under constant returns to scale:
y_t = A K_t^α L_t^β
We can show that when K and L both increase by the proportion λ, y also increases by the proportion λ
under constant returns to scale:
y_t* = A (λK_t)^α (λL_t)^β = λ^(α+β) A K_t^α L_t^β = λ^(α+β) y_t, which equals λ y_t when α + β = 1
In order to incorporate the non-sample information, and impose constant returns to scale we should then
estimate the following model:
y_t = A K_t^α L_t^(1−α)
The model is now the function of a single unknown parameter α.
A technique to obtain an estimate of α in this case is known as restricted least squares - we “force” β = 1 – α
To estimate the above model in practice, we can use the least squares method – as the model is linear in its
parameters. We would convert the model to a log-log form:
ln(y_t) = ln(A) + α ln(K_t) + (1−α) ln(L_t) + e_t
To ensure the restriction holds we re-arrange and collect terms:
ln(y_t / L_t) = ln(A) + α ln(K_t / L_t) + e_t
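A hedged Python sketch of this restricted estimation; the data here are artificially generated purely to make the snippet self-contained (in practice y, K and L would come from the sample):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    K = rng.uniform(1.0, 10.0, 100)                              # capital (simulated)
    L = rng.uniform(1.0, 10.0, 100)                              # labour (simulated)
    y = 2.0 * K**0.3 * L**0.7 * np.exp(rng.normal(0, 0.05, 100)) # output with noise

    # Restricted model: ln(y/L) = ln(A) + alpha * ln(K/L) + e
    X = sm.add_constant(np.log(K / L))
    res = sm.OLS(np.log(y / L), X).fit()

    alpha_hat = res.params[1]
    print(alpha_hat, 1 - alpha_hat)   # alpha and the implied beta = 1 - alpha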
The restricted Least Squares Estimator:
The least squares estimates we obtain after imposing the restrictions are known as restricted least squares
(RLS) estimates.
The RLS estimator:
 Is biased unless the restrictions are EXACTLY true
 Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the
restrictions are true
By incorporating the additional information with the data, we usually give up unbiasedness in return for
reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test
(Wald test).
Model Specification:
There are several key questions you should ask yourself when specifying a model:
Q1. What are the important considerations when choosing a model?
A1. The problem, the economic model
Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted and irrelevant variables in the model
Q3. Are there ways of assessing whether a model is adequate?
A3. Yes you can use model Diagnostics – A test of adequate functional form
In examining these model specifications we will look at the following example:
Omitted variables:
It is possible that a chosen model may have important variables omitted. Our economic principles may have
overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic
theory.
We will consider a sample of married couples where both husbands and wives work. This sample was used
by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from
this sample are in edu_inc.dat.
We are interested in the impact of level of education, both the husband’s education and the wife’s education,
on family income. Summary statistics for the data appear in table 6.2. The estimated relationship is:
We estimate that an additional year of education for the husband will increase annual income by $3132, and
an additional year of education for the wife will increase income by $4523. If we now incorrectly omit
wife’s education from the equation:
FAMINC = the combined income of husband and wife
If we omit a relevant variable, then the least squares estimator will generally be biased, although it will
have lower variance.
Including irrelevant variables does not cause least squares method to be biased – however variance and
therefore standard errors will be greater.
When we omit WEDU it leads us to overstate the effect of an extra year of education for the husband by
about $2000. This change in magnitude of a coefficient is typical of the effect of incorrectly omitting a
relevant variable.
To write a general expression for this bias for the case where one explanatory variable is omitted from a
model with two explanatory variables, we write the underlying model as:
y_i = β1 + β2 x_i2 + β3 x_i3 + e_i
Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing
an incorrect constraint on the parameters. This of course has the implication of a reduced variance, but
causes biased coefficient estimators. We can show (in appendix 6B) that the bias in the new estimator b2* of β2 is:
bias(b2*) = E(b2*) − β2 = β3 × cov(x2, x3) / var(x2)
We can include further variables for instance, KL6 – the number of children under the age of 6. The larger
the number of young children, the fewer the number of hours likely to be worked and hence a lower family
income would be expected.
FAMINĈ = −7755 + 3211 HEDU + 4777 WEDU − 14311 KL6
(se)      (11163)  (796)       (1061)      (5004)
(p-value) (0.488)  (0.000)     (0.000)     (0.004)
Notice that compared to the original estimated equation that the coefficients haven’t changed considerably
for HEDU and WEDU.
This outcome occurs because KL6 is not highly correlated with the education variables. From a general
modelling perspective, it means that useful results can still be obtained when a relevant variable is
omitted if that variable is uncorrelated with the included variables and our interest is on the coefficients
of the included variables.
Irrelevant Variables:
The consequences of omitting relevant variables may lead you to think that a good strategy is to include as
many variables as possible in your model. However this will:
Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation
between the included and omitted relevant variable.
Note: if cov(x2, x3) = 0 or if β3 = 0, then the bias will be 0, i.e. b2* will be unbiased.
Corr(KL6, HEDU) = 0.105
Corr(KL6, WEDU) = 0.129
1. Complicate your model
2. Inflate the variances of your estimates
To examine this, we will add two artificially generated variables X5 and X6. These variables were
constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family
income.
FAMINĈ = −7759 + 3340 HEDU + 5869 WEDU − 14200 KL6 + 889 X5 − 1067 X6
(se)      (11195)  (1250)      (2278)      (5044)      (2242)   (1982)
(p-value) (0.488)  (0.000)     (0.000)     (0.004)     (0.692)  (0.591)
The first thing that we notice is that the p-values for the two new coefficients are much greater than 0.05.
They do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients for all other
variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant
variables has reduced the precision of the estimated coefficients for other variables in the equation.
The result follows because, by the Gauss-Markov theorem, the least squares estimator of the correct model
is the minimum variance linear unbiased estimator.
A Practical Approach:
We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data
In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony
Hypothesis Testing:
The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients.
As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove
the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is
irrelevant, or (b) retained because there are strong theoretical reasons for including it.
The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET
RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect
functional form.
Intuition:
Hypotheses:
H0: The functional form is correct, no omitted variables (extra terms are not statistically significant)
H1: The functional form is incorrect, and/or there are omitted variables (extra terms are statistically
significant)
Suppose that we have specified and estimated the regression model:
y_i = β1 + β2 x_i2 + β3 x_i3 + e_i
The predicted or "fitted" values of y_i are:
ŷ_i = b1 + b2 x_i2 + b3 x_i3
There are two alternative forms for the test:
Artificial Model 1: y_i = β1 + β2 x_i2 + β3 x_i3 + γ1 ŷ_i² + e_i
Artificial Model 2: y_i = β1 + β2 x_i2 + β3 x_i3 + γ1 ŷ_i² + γ2 ŷ_i³ + e_i
Example: FAMINC model:
Step 1: State hypothesis
H0: γ = 0
H1: γ ≠ 0
Step 2: Decision Rule
Reject H0 if p-value < α = 0.05
Step 3: Calculate test statistic
Ramsey RESET Test:
If the chosen model and algebraic form are correct, then squared and cubed terms of the “fitted or
predicted” values should not contain any explanatory power.
If we can significantly improve the model by artificially including powers of the predictions of the model,
then the original model must have been inadequate
p-value = 0.0440
Step 4: Compare
0.0440 < 0.05 Therefore reject H0
Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted
variables or the functional form is incorrect.
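Recent versions of statsmodels ship a RESET implementation; a hedged sketch, where res is assumed to be the fitted OLS results object for the FAMINC regression:

    from statsmodels.stats.diagnostic import linear_reset

    # power=3 adds the squared and cubed fitted values (Artificial Model 2)
    reset = linear_reset(res, power=3, test_type="fitted", use_f=True)
    print(reset.pvalue)   # reject H0 (correct specification) if p-value < 0.05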
Selection of Models – Information Criteria
Akaike Information Criterion (AIC):
 Is often used in model selection for non-nested alternatives – smaller values of the AIC are preferred
AIC = ln(SSE/N) + 2K/N
The Schwarz Criterion (SC):
 Is an alternative to the AIC that imposes a larger penalty for additional coefficients
SC = ln(SSE/N) + K ln(N)/N
Adjusted R²:
 Penalizes for the addition of regressors which do not contribute to the explanatory power of the
model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not
have the interpretation of R².
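A minimal sketch of the three measures using the formulas above (note that packages such as EViews compute AIC/SC from the log-likelihood, so the levels can differ; only the rankings across models matter):

    import numpy as np

    def aic(sse, N, K):
        return np.log(sse / N) + 2 * K / N

    def sc(sse, N, K):
        return np.log(sse / N) + K * np.log(N) / N

    def adj_r2(sse, sst, N, K):
        return 1 - (sse / (N - K)) / (sst / (N - 1))

    # Illustrated with the sales model figures used earlier
    print(aic(1718.943, 75, 3), sc(1718.943, 75, 3), adj_r2(1718.943, 3115.482, 75, 3))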
Collinear Economic Variables:
When data are the result of an uncontrolled experiment many of the economic variables may move together
in systematic ways.
Such variables are said to be collinear, and the problem is labelled collinearity, or multicollinearity when
several variables are involved.
Co-linearity: Moving together in a linear way
When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be
possible to isolate the economic relationship or parameters of interest.
Consequences of collinearity:
1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact
multicollinearity. The least squares estimator is not defined.
Multicollinearity calculation:
b = (XᵀX)⁻¹ Xᵀy
From linear algebra, we know that a matrix whose rows and columns are not linearly independent
does not have an inverse, so under exact collinearity (XᵀX)⁻¹ – and hence b – cannot be calculated.
2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard
errors and covariances of the least squares estimators may be large.
var(b2) = σ² / [(1 − r23²) Σ(x_i2 − x̄2)²]
For perfect collinearity:
r23 = −1 or 1, therefore (1 − r23²) = 0
3. Large standard errors make the usual t-values small and lead to the conclusion that parameter
estimates are not significantly different from 0, ALTHOUGH high R2 or F-values indicate
“significant” explanatory power of the model as a whole.
t_calc = b_i / se(b_i) = a small value
In general we reject H0 (βi = 0) if |t_calc| > |t_crit|; with inflated standard errors t_calc is small,
so we fail to reject H0 and conclude that βi is 0 even when the variable belongs in the model.
4. Estimates may be very sensitive to the addition or deletion of a few observations, or the deletion of
an apparently insignificant variable.
5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate
forecasts may still be possible.
For near perfect collinearity:
r23 ≈ −1 or 1, therefore (1 − r23²) ≈ 0
Example – Chinese Coal Production
We can detect multicollinearity by (a sketch of both checks appears below):
 Computing sample correlation coefficients between variables. A common rule of thumb is that
multicollinearity is a problem if the sample correlation between any pair of variables is greater
than 0.8 or 0.9. (This only looks at pairs of variables.)
 Estimating auxiliary regressions (i.e. regress each explanatory variable on all the others).
Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater
than about 0.8. (This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.)
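A hedged sketch of both checks, assuming a pandas DataFrame X that holds only the explanatory variables:

    import statsmodels.api as sm

    # 1. Pairwise sample correlations: flag any |r| above about 0.8-0.9
    print(X.corr())

    # 2. Auxiliary regressions: regress each explanatory variable on all the others
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        print(col, round(r2, 3))   # a problem if R^2 is above about 0.8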
Pair-wise Correlations:
Conclusion:
The pair-wise correlation between some of the inputs is extremely high, such as between ln(x2)
and ln(x3).
Auxiliary regression on ln(x3):
Solution:
A possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4,5 & 6 all are statistically insignificant (=0)
Conduct a Wald Test:
H0: β2 + β3 + β4 + β5 + β6 + β7 = 1, β4 = 0, β5 = 0, β6 = 0
Mitigating the Effects of Multicollinearity:
The collinearity problem occurs because the data do not contain enough information about the effects of the
individual explanatory variables. We can include more information into the estimation process by:
 Obtaining more, and better data – not always possible in non-experimental contexts
 Introducing non-sample information into the estimation process in the form of restrictions on the
parameters.
Nonlinear Relationships:
Relationships between economic variables cannot always be adequately represented by straight lines. We
saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal,
polynomial and various other nonlinear-in-the-variables functional forms.
 Linear in parameters, non-linear in variables
We can also use these types of functional forms in multiple regression models. In multiple regression
models, we also use models that involve interaction terms. When using these types of models some changes
in model interpretation are required.
Introductory Econometrics: ECON2300 – Dummy Variable Models
The Use of Dummy Variables in Econometric Models:
Assumption MR1 in the multiple regression model is:
y_i = β1 + β2 x_i2 + ... + βK x_iK + e_i  for i = 1, ..., N
1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and if the parameters are not the same for all the observations, then
the meaning of the least squares estimates of the parameters is not clear
There are some economic problems or questions where we might expect the parameters to be different for
different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in starting salaries of graduates?
3. Does having a pool make a difference in a house’s sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?
Dummy variables:
1. The simplest procedures for extending the multiple regression model to situations in which the
regression parameters are different for some or all of the observations in a sample
2. Dummy variables are explanatory variables that only take two values usually 0 and 1
3. These simple variables are a very powerful tool for capturing qualitative characteristics of
individuals, such as gender, race and geographic region of residence.
There are two main types of dummy variables:
1. Intercept Dummy Variables: parameter (coefficients) denoted - δ
2. Slope Dummy variables: parameter (coefficients) denoted – γ
Intercept Dummy Variables:
Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models
with intercept dummy variables take the form:
y_i = β1 + δ D_i + β2 x_i2 + ... + βK x_iK + e_i
where Di = 1 if the i-th observation has a certain characteristic and Di = 0 otherwise:
E(y_i) = (β1 + δ) + β2 x_i2 + ... + βK x_iK  if D_i = 1  (intercept: β1 + δ)
E(y_i) = β1 + β2 x_i2 + ... + βK x_iK        if D_i = 0  (intercept: β1)
Note that the least squares estimator properties are not affected by the fact that one of the explanatory
variables consists only of zeros and ones – D is treated as any other explanatory variable. We can construct
an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a
statistical test of whether the effect is “statistically significant”. If δ = 0, the variable has no effect on the
variable in question.
Example: House prices
A model that allows the intercept to vary with the presence or absence of a particular characteristic
Estimated equation:
Pricê = 29.68 + 5.69 Pool + 8.60 Sqft
In this model the value of Pool = 0 defines the
reference group (homes with no pool).
Log-Linear Models:
If ln(PRICE_pool) = β1 + β2 SQFT + δ + e   (houses with a pool, POOL = 1)
and ln(PRICE_nopool) = β1 + β2 SQFT + e    (houses without a pool, POOL = 0)
Then: ln(PRICE_pool) − ln(PRICE_nopool) = ln(PRICE_pool / PRICE_nopool) = δ
And: PRICE_pool / PRICE_nopool = e^δ, so
(PRICE_pool − PRICE_nopool) / PRICE_nopool = e^δ − 1
Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things
being equal.
Slope Dummy variables:
Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that
allows β2 to vary across observations takes the form:
y_i = β1 + β2 x_i2 + γ D_i x_i2 + β3 x_i3 + ... + βK x_iK + e_i
E(y_i) = β1 + (β2 + γ) x_i2 + β3 x_i3 + ... + βK x_iK  if D_i = 1  (slope of x_i2: β2 + γ)
E(y_i) = β1 + β2 x_i2 + β3 x_i3 + ... + βK x_iK        if D_i = 0  (slope of x_i2: β2)
Slope and Intercept Dummy Variables Combined:
Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (eg. level of schooling)
Example: WAGES
Explaining wages as a function of individual characteristics using white males as the reference group:
WAGE = β1 + β2 EDUC + δ1 BLACK + δ2 FEMALE + γ (BLACK × FEMALE) + e
(γ takes effect only for individuals who are both black and female.)
To test the null hypothesis that neither race nor gender affect wages at the 1% Level of significance:
Now: Explaining wages as a function of location using workers in the northeast as the reference
group:
WAGE = β1 + β2 EDUC + δ1 SOUTH + δ2 MIDWEST + δ3 WEST + e
(The regional dummies are not significant at the 5% level of significance.)
Testing the Equivalence of Two regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model,
we are allowing every coefficient in the model to differ based on the qualitative factor – we are specifying
two regressions.
A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy
variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test
is known as a Chow test.
Explaining wage as a function of individual characteristics:
WAGE = β1 + β2 EDUC + δ1 BLACK + δ2 FEMALE + γ (BLACK × FEMALE) + e
To test if there are differences between the wage regressions for the south and the rest of the country we
estimate the model:
The two regression equations are:
If south = 1
If south = 0
A Chow test at the 10% level of significance:
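A hedged sketch of how such a Chow test can be run via the dummy-interaction approach (the DataFrame df and the names WAGE, EDUC, SOUTH are assumptions; a full Chow test would interact SOUTH with every regressor in the model):

    import statsmodels.formula.api as smf

    df["SOUTH_EDUC"] = df["SOUTH"] * df["EDUC"]
    full = smf.ols("WAGE ~ EDUC + SOUTH + SOUTH_EDUC", data=df).fit()

    # H0: all dummy coefficients are zero (the two regressions are equivalent)
    chow = full.f_test("SOUTH = 0, SOUTH_EDUC = 0")
    print(chow.fvalue, chow.pvalue)   # reject H0 at the 10% level if p-value < 0.10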
Controlling For Time:
Dummy variables are frequently used to control for:
 Seasonal effects
 Annual effects
 Regime effects (government)
Example: Emergency room cases
Data on number of emergency room cases per day is available in the file fullmoon.wk1. The model:
Example – Stockton House prices
Example – Investment tax credits
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and food expenditure for a high-
income household we would be more accurate for a low-income house-hold as they have less choice
and only have a limited income which they MUST spend on food. Alternatively a high-income
household could have extravagant or simple food taste – a large variance at high income levels:
resulting in heteroskedasticity.
How can we model this phenomenon?
Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread
(skedasticity). An alternative and much more general assumption is:
var(e_i) = σ_i²
Heteroskedasticity is often encountered in cross-section studies, where different individuals may have
very different characteristics. It is less common in time-series studies.
Properties of the OLS Estimator:
If the errors are heteroskedastic then:
 OLS is still a linear and unbiased estimator. But it is inefficient in that it is no longer BLUE
– Best linear unbiased estimator
 The variances of the OLS estimators are no longer given by the formulas we discussed in
earlier lectures. Thus, confidence intervals and hypothesis tests based on these variances
are no longer valid.
There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters and a standard errors formula that works
either way (White robust standard errors)
2. If heteroskedasticity is known to be present, use Generalised Least Squares (Weighted Least
Squares) – BLUE if variance is known
3. Test for Heteroskedasticity: (Goldfeld-Quant test or White’s General Test or Breusch- Pagan
Test)
a. If present, use feasible Generalised Least Squares (if variance unknown and must be
estimated)
b. If no evidence use least squares as it is BLUE
White’s Approximate Estimator for the Variances of the Least Sqaures Estimator under
Heteroskedasticity:
Whites estimator:
a) Is strictly appropriate only in large samples
b) If errors are homoskedastic, it converges to the least squares formula
The variances of the OLS estimators depend on σ_i² rather than σ². In the case of the simple linear
model:
y_i = β1 + β2 x_i + e_i
the variance of b2 is given by:
var(b2) = Σ w_i² σ_i²,  where w_i = (x_i − x̄) / Σ(x_i − x̄)²

If we replace σ_i² with ê_i² we obtain White's heteroskedasticity consistent estimator.
White's robust standard errors leave the coefficient estimates unchanged; only the standard errors change.
What would happen if we always compute the standard errors (and therefore t-ratios) using White’s
formula instead of the traditional Least Squares?
This is known as Heteroskedasticity-Robust Inference, and it is used by many applied economists.
Robust estimation is a “branch” of econometrics.
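A hedged statsmodels sketch of heteroskedasticity-robust inference (y and a design matrix X with a constant column are assumed):

    import statsmodels.api as sm

    ols_res = sm.OLS(y, X).fit()                    # usual least squares
    robust_res = sm.OLS(y, X).fit(cov_type="HC0")   # White's robust standard errors

    print(ols_res.params)      # identical coefficients...
    print(ols_res.bse)         # ...but these standard errors
    print(robust_res.bse)      # differ from these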
When the true variance is homoskedastic and the sample is large, White's formula converges
approximately to:
σ̂² = SSE / N
The Generalised Least Squares (Weighted Least Squares):
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator
2. One way of overcoming this dilemma is to change or transform our statistical model into one
with homoskedastic errors and then use Least squares
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error
model into a homoskedastic error model.
If σ_i² is known, then we can weight the original data (including the constant term) and then perform
OLS on the transformed model. The transformed model is:
y_i / σ_i = β1 (1/σ_i) + β2 (x_i2/σ_i) + ... + βK (x_iK/σ_i) + e_i/σ_i
or equivalently
y_i* = β1 x_i1* + β2 x_i2* + ... + βK x_iK* + e_i*
The transformed model satisfies all the assumptions of the multiple regression model (including
homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased
estimates. The estimator is known as Generalised Least Squares (GLS) or Weighted Least Squares
(WLS).
Sometimes σ_i² is only known up to a factor of proportionality. In this case, we can still transform the
original model in such a way that the transformed errors are homoskedastic. Some popular
heteroskedastic specifications:
σ_i² = σ² x_ij²  – therefore, divide by x_ij
σ_i² = σ² x_ij   – therefore, divide by √x_ij
If our assumptions about the form of heteroskedasticity are incorrect then GLS will yield biased
estimates.
For σ_i² = σ² x_ij² (divide by x_ij):
var(e_i*) = var(e_t / x_t) = (1/x_t²) var(e_t) = (1/x_t²) σ² x_t² = σ²
For σ_i² = σ² x_ij (divide by √x_ij):
var(e_i*) = var(e_t / √x_t) = (1/x_t) var(e_t) = (1/x_t) σ² x_t = σ²
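A hedged sketch of the first case, var(e_i) = σ²x_i², where each observation is weighted by 1/x_i (x and y are assumed 1-D numpy arrays):

    import statsmodels.api as sm

    X = sm.add_constant(x)

    # statsmodels' WLS expects weights proportional to 1/var(e_i)
    wls_res = sm.WLS(y, X, weights=1.0 / x**2).fit()
    print(wls_res.params, wls_res.bse)

    # Equivalent by hand: divide y, the constant and x by x, then run OLS
    ols_star = sm.OLS(y / x, X / x[:, None]).fit()
    print(ols_star.params)   # same coefficient estimates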
Feasible Generalised Least Squares:
If we reject the null hypotheses of homoskedasticity, we might wish to use an estimation technique for
the coefficients and the standard errors that accounts for heteroskedasticity.
We have already shown that if we “weight” the original data by some appropriate value we can
achieve a transformed model with homoskedastic errors that can be estimated by Ordinary Least
Squares (OLS).
We also note that the task of finding an appropriate weight in a multiple regression model is more
complicated, as we might have several variables that are potentially an option.
Feasible Generalised Least Squares is based on the idea that we should use all the information
available, therefore, we will construct a suitable weight that is a function of all the explanatory
variables in the original model.
If σ_i² is unknown then it must be estimated. The resulting estimator is known as Feasible Generalised
Least Squares (FGLS). A popular specification:
σ_i² = exp(α1 + α2 z_i2 + ... + αS z_iS)
In this case, we estimate the model:
ln(ê_i²) = α1 + α2 z_i2 + ... + αS z_iS + v_i
And then use the variance estimator:
σ̂_i² = exp(α̂1 + α̂2 z_i2 + ... + α̂S z_iS)
The aim is to produce a prediction of σ_i² based on this model and then use it to weight the original
model.
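A hedged sketch of this two-step procedure (y, a design matrix X with constant, and a matrix Z holding a constant plus the analyst's chosen z variables are assumed):

    import numpy as np
    import statsmodels.api as sm

    ols_res = sm.OLS(y, X).fit()

    # Step 1: regress ln(ehat^2) on the z's to estimate the alphas
    aux = sm.OLS(np.log(ols_res.resid ** 2), Z).fit()

    # Step 2: predicted variances become the weights for the original model
    sigma2_hat = np.exp(aux.fittedvalues)
    fgls_res = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
    print(fgls_res.params, fgls_res.bse)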
A Heteroskedastic Partition:
If we have the variance structure
σ_i² = σ1²  for i = 1, ..., N1
σ_i² = σ2²  for i = N1 + 1, ..., N
Then we can estimate σ1² by applying OLS to the first N1 observations, and estimate σ2² by applying
OLS to the remaining N2 = N − N1 observations.
We can then develop weights and apply GLS to the model in the usual way (using all N observations)
We can apply GLS by generating a weight variable based on the two sub-samples:
 Immediately after estimating each partitioned equation, save the SE of the regression:
o Scalar se_rural=@se
o Scalar se_metro=@se
 Then generate the new series:
o Series weight = metro*(1/se_metro) + (1-metro)*(1/se_rural)
Detecting Heteroskedasticity:
Methods for detecting the presence of heteroskedasticity
1. Plots of the least squares residuals, or squared residuals (with more than one
explanatory variable, we plot the least squares residuals against each explanatory
variable, or against the fitted values, to see if those residuals vary in a systematic way
relative to the specified variable).
2. White’s General test
3. Goldfeld-Quandt test
4. Breusch-Pagan test
Testing for heteroskedasticity:
There are several possible test for heteroskedasticity as mentioned above. They all have the same
hypotheses:
H0: σ_t² = σ², the variance is constant (homoskedasticity)
H1: σ_t² ≠ σ², the variance is not constant (heteroskedasticity)
White’s General Test:
When conducting white’s general test of heteroskedasticity, there are two alternatives available:
A: White test for heteroskedasticity – no crossed terms (S = 3): x1, x2, x3
This option is a regression of squared residuals on the independent variables and their
squares
B: White test for heteroskedasticity – including crossed terms (S = 6): x1, x2, x3, x1x2, x1x3, x2x3
This option is a regression of squared residuals on independent variables, their squares
and cross products.
This option uses up more degrees of freedom and it is not recommended when the
number of observations is relatively small.
In both cases the test procedures are:
i. Run the regression, save the residuals
ii. Run a second regression of the squared residuals on original explanatory variables plus
some extra terms
The tests are valid in large samples and the computed "F" statistic is a small-sample approximation.
White’s General Test:
Step 1: State hypotheses
H0: σ_t² = σ², the variance is constant (homoskedasticity)
H1: σ_t² ≠ σ², the variance is not constant (heteroskedasticity)
Step 2: Decision rule
Reject H0 if p-value < α
Or
Reject H0 if WG > χ²(1−α, S), where S = number of terms of interest
Step 3: Calculate test statistic
WG = N R2
Where: R2
is the coefficient of determination in the regression of 2
ˆi
e on all unique variables
contained in X1...XK, their squares (and their cross-products).
Note: the test doesn’t require any specific assumptions about the form of heteroskedasticity.
However, it may have low power, and it is non-constructive in the sense that it doesn’t tell us
what to do next.
Step 4: Decision
Reject H0 or Do Not Reject H0
Step 5: Conclusion
Conclusion about whether heteroskedasticity is present or not.
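statsmodels implements this test directly; a hedged sketch, with res an assumed fitted OLS results object and X its design matrix (constant included):

    from statsmodels.stats.diagnostic import het_white

    # het_white regresses the squared residuals on X, its squares and cross-products
    wg_stat, wg_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
    print(wg_stat, wg_pvalue)   # reject homoskedasticity if p-value < alpha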
The Goldfeld-Quandt Test
The Goldfeld-Quandt test involves splitting the sample into two approximately equal subsamples. If
heteroskedasticity exists, some observations will have large variances and others will have small
variances.
1. Divide the sample such that the observations with potentially high variances are in one
subsample and those with potentially low variances are in the other subsample. Make subset 1
the group with higher variances.
2. Compute the estimated error variances σ̂1² and σ̂2² for each of the subsamples.
3. Compute the Goldfeld-Quandt statistic and compare to the F-distribution critical values.
Step 1: State hypotheses
H0: σ1² = σ2², the variance is constant (homoskedasticity)
H1: σ1² > σ2², the variance is not constant (heteroskedasticity)
Step 2: Decision rule
Reject H0 if p-value < α
Or
Reject H0 if GQ > F(1−α, N1−K, N2−K), where N1 and N2 are the number of observations in
each subsample.
Step 3: Calculate test statistic
GQ = σ̂1² / σ̂2²
Step 4: Decision
Reject H0 or Do Not Reject H0
Step 5: Conclusion
Conclusion about whether heteroskedasticity is present or not.
Notes: the above test is a one-sided test because the alternative hypothesis suggested which sample
partition will have the larger variance.
If we suspect that two sample partitions could have different variances, but we do not know which
variance is potentially larger, then a two-sided test with alternative hypothesis is more appropriate.
To perform a two-sided test at the 5% significance level we put the larger variance estimate in the
numerator and use a critical value such that:
P(F > Fcrit) = 0.025
Breusch-Pagan Test:
Step 1: State hypotheses
H0: σ_i² = σ², for all i
H1: σ_i² = h(α1 + α2 z_i2 + ... + αS z_iS)
Step 2: Decision rule
Reject H0 if p-value < α
Or
Reject H0 if BP > χ²(1−α, S−1)
Step 3: Calculate test statistic
BP = N R²
Step 4: Decision
Reject H0 or Do Not Reject H0
Step 5: Conclusion
Conclusion about whether heteroskedasticity is present or not.
Note: The zi values are not specified by the test and are chosen by the analyst.
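Both of the remaining tests are also available in statsmodels; a hedged sketch (res, X and y as before, with the z's of the Breusch-Pagan test taken to be the columns of X):

    from statsmodels.stats.diagnostic import het_breuschpagan, het_goldfeldquandt

    # Breusch-Pagan: BP = N*R^2 from regressing squared residuals on the z's
    bp_stat, bp_pvalue, _, _ = het_breuschpagan(res.resid, X)
    print(bp_stat, bp_pvalue)

    # Goldfeld-Quandt: split the sample and compare the subsample variances
    gq_stat, gq_pvalue, _ = het_goldfeldquandt(y, X)
    print(gq_stat, gq_pvalue)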
Econometrics: ECON2300 – Lecture 8
Models with Autocorrelated Errors:
In a time-series context, the multiple linear regression model assumption MR4 states that there is no serial
correlation or autocorrelation:
In cross-section situations, where all data is recorded at a single point in time, the randomness of the
sample implies that the error terms for different observations (households or firms) will be
uncorrelated. There is no particular ordering of the observations that is more natural or better than
another.
Recall:
corr(e_t, e_s) = cov(e_t, e_s) / √(var(e_t) var(e_s)) = 0  for t ≠ s
However, when we have time-series data, where the observations follow a natural ordering through
time, there is always a possibility that successive errors will be correlated with each other. The change
in a level of an explanatory variable may have behavioural implications beyond the time period in
which it occurred. The consequences of economic decision that result in changes in economic
variables can last a long time. The possibility of autocorrelation should ALWAYS be entertained
when we are dealing with time-series data.
These effects do not happen simultaneously but are spread, or distributed, over future time periods. As
shown in Figure 9.1 economic actions taken at one point in time t have effects on the economy at time
t, but also at times t+1, t+2...t+n.
This carryover will be related to, or correlated with, the effects of earlier shocks, or impacts. When
circumstances such as these lead to error terms that are correlated, we say that autocorrelation exists.
It is important to note that MR2 (E(e) =0) and MR3 (Constant variance) can still hold when
autocorrelation is present.
(Diagram: an economic action at time t has an effect at time t, and further effects at times t+1, t+2, ...)
First-order Autoregressive Errors:
In this topic, we assume the errors follow an AR(1) process (i.e. an autoregressive process of order 1):
e_t = ρ e_{t−1} + v_t  where −1 < ρ < 1 is the autocorrelation coefficient.
Where the v_t are independent random error terms with mean zero and constant variance, as we usually
assume about the error term in a regression model:
E(v_t) = 0,  var(v_t) = σ_v²,  cov(v_t, v_s) = 0 for t ≠ s
When the equation errors follow an AR(1) model they continue to have a zero mean. For the variance
of et, it can be shown that:
var(e_t) = σ_e² = σ_v² / (1 − ρ²)  (so the homoskedastic property holds)
However, the covariance between the errors corresponding to different observations is different. Since
we are using time-series data, when we say “ the covariance between errors corresponding to different
observations” we are referring to the covariance between errors for different time periods. This
covariance will be nonzero because of the existence of a lagged relationship between the errors from
different time periods.
cov(e_t, e_{t−k}) = σ_e² ρ^k,  for k > 0
Clearly we have shown that there is correlation in time series data:
However, when we propose a model using time series data, we expect the independent variables (the
x’s) to explain the behaviour of yt (unemployment) over time. Therefore, no correlation over time is
expect to remain in the error term.
Properties of the OLS Estimator:
Consequences for the Least Squares Estimator
If we have an equation whose errors exhibit autocorrelation, but we ignore it, or are simply unaware of
it, what effect does this have on the properties of least squares estimates?
1. The least squares estimator is still a linear unbiased estimator
2. OLS however is inefficient (i.e. it is no longer the BLUE – Best Linear Unbiased Estimator); it
is possible to find an alternative estimator with a lower variance. Having a lower variance
means that there is a higher probability of obtaining a coefficient estimate close to its true value.
It also means hypothesis tests have greater power and a lower probability of a Type II error.
3. The formulas for the standard errors usually computed for the least squares estimators are no
longer correct, and hence confidence intervals and hypothesis tests that use these standard
errors may be misleading.
Although the usual least squares standard errors are not the correct ones, it is possible to compute
correct standard errors for the least squares estimator when the errors are auto-correlated. These standard
errors are known as HAC (Heteroskedasticity and autocorrelation consistent) standard errors, or
Newey-West standard errors, and are analogous to the heteroskedasticity consistent standard errors
introduced in chapter 8.
These new estimators have the advantage of being consistent for autocorrelated errors that are not
necessarily AR(1) and do not require specification of the dynamic error model that is needed to get an
estimator with a lower variance.
1. Newey-West standard errors are robust to both autocorrelation and heteroskedasticity over time
2. Heteroskedasticity over time is when the variance changes over time. This is common in
financial time series.
3. Newey-West are not recommended for the traditional heteroskedasticity in cross-sectional
models such as those presented in previous topics – in this case White standard errors are
recommended
4. These robust standard errors are recommended for large samples only.
Estimation of Model with Autocorrelated Errors:
1. In order to estimate a model with autocorrelated errors, a model must be chosen.
2. The most commonly adopted model is the AR(1), “first order autoregressive process” This
process postulates that a “proportion” of the previous period’s value of “e” is carried over.
3. An AR(1) model of the error:
e_t = ρ e_{t−1} + v_t  where −1 < ρ < 1 is the autocorrelation
coefficient, which indicates the "proportion" of the error that is
carried from period t−1 to t.
Autocorrelation parameter: ρ
Assume there is a shock of size “1” and the “proportion” that is remembered from one period to the
next is 0.9. i.e ρ = 0.9.
The shock will disappear from the memory after about 40 periods.
There are several estimation options available:
1. Generalised Least Squares (ρ is known)
i) Transform the model to a “star” model with non-autocorrelated errors
ii) Use Least Squares on the transformed model (the β’s are estimated in this step)
2. Feasible Generalised Least Squares (Cochrane-Orcutt or Prais-Winsten)
i) Transform the model to a “star” model with non-autocorrelated errors (the parameter ρ
is estimated in this step)
ii) Use Least Squares on the transformed model (the β’s are estimated in this step)
3. Non-linear Estimation techniques
The use of non-linear estimation techniques is recommended.
Although the technical details of these techniques are beyond the scope of this course, we will
make use of the Eviews non-linear estimation technique.
Method 1: Generalised Least Squares
If the errors follow an AR(1) process and ρ is known, then we can obtain unbiased and efficient
estimates by applying OLS to a transformed model:
Only T-1 observations are used for the estimation (one observation is lost through lagging).
This transformation is known as the Cochrane-Orcutt transformation. If ρ is unknown we can
use the first-order sample correlation coefficient as an estimator.
There are a number of ways to estimate ρ, e.g. the first-order sample correlation of the residuals:
r1 = corr(ê_t, ê_{t−1}) = cov(ê_t, ê_{t−1}) / var(ê_{t−1})
In eviews:
 @cor(x1,x2)
 Sample correlogram
Example: to show how confidence intervals can be misleading
P = price of sugar cane divided by the price of jute (a substitute)
A = area of sugar cane planted in thousands of hectares in a region in Bangladesh
Original model (how we would estimate without accounting for autocorrelation):
In command line: ls log(a) c log(p)
ln(Â) = 3.89326 + 0.776119 ln(P)
(se)    (0.06134)  (0.2775)
Now let's estimate ρ from the sample correlogram:
1. In command line: ls log(a) c log(p)
2. View  Residual tests  Correlogram Q-tests
The transformed GLS equation is:
ln(A_t) − 0.395 ln(A_{t−1}) = β1 (1 − 0.395) + β2 [ln(P_t) − 0.395 ln(P_{t−1})] + v_t
In command line:
 ls (log(a) - 0.395*log(A(-1))) (1-0.395) (log(P)-0.395*log(P(-1)))
Dependent Variable: LOG(A)-0.395*LOG(A(-1))
Method: Least Squares
Date: 05/21/10 Time: 18:20
Sample (adjusted): 2 34
Included observations: 33 after adjustments
Variable Coefficient Std. Error t-Statistic Prob.
1-0.395 3.899243 0.087209 44.71165 0.0000
LOG(P)-0.395*LOG(P(-1)) 0.876123 0.255584 3.427925 0.0017
R-squared 0.274865 Mean dependent var 2.427009
Adjusted R-squared 0.251474 S.D. dependent var 0.324645
S.E. of regression 0.280875 Akaike info criterion 0.356874
Sum squared resid 2.445605 Schwarz criterion 0.447572
Log likelihood -3.888426 Hannan-Quinn criter. 0.387391
Durbin-Watson stat 1.773865
ρ ≈ r1 = 0.395 (read from the k = 1 bar of the residual correlogram)
Note that it is also possible to obtain unbiased and efficient estimates by estimating the model:
This model is nonlinear in the parameters, which makes it difficult to find values of the parameters that
minimise the sum of squares function.
EViews finds the so-called nonlinear least squares (NLS) estimates numerically (by systematically
evaluating the sum of squares function at different values of the parameters until the least squares
estimates are found). NLS estimation is equivalent to iterative GLS estimation using the Cochrane-
Orcutt transformation.
Estimation equation:
 (log(a)) = c(1)*(1-c(3)) + c(2)*log(p) + c(3)*log(a(-1)) -c(2)*c(3)*log(p(-1))
Dependent Variable: LOG(A)
Method: Least Squares
Date: 05/21/10 Time: 19:05
Sample (adjusted): 2 34
Included observations: 33 after adjustments
Convergence achieved after 6 iterations
(LOG(A)) = C(1)*(1-C(3)) + C(2)*LOG(P) + C(3)*LOG(A(-1)) -C(2)*C(3)
*LOG(P(-1))
Coefficient Std. Error t-Statistic Prob.
C(1) 3.898771 0.092166 42.30159 0.0000
C(3) 0.422139 0.166047 2.542281 0.0164
C(2) 0.888372 0.259298 3.426060 0.0018
R-squared 0.277777 Mean dependent var 3.999309
Adjusted R-squared 0.229629 S.D. dependent var 0.325164
S.E. of regression 0.285399 Akaike info criterion 0.416650
Sum squared resid 2.443575 Schwarz criterion 0.552696
Log likelihood -3.874725 Hannan-Quinn criter. 0.462425
Durbin-Watson stat 1.820559
For an Autoregression model:
In command window: ls log(a) c log(p) ar(1)
Dependent Variable: LOG(A)
Method: Least Squares
Date: 05/21/10 Time: 19:06
Sample (adjusted): 2 34
Included observations: 33 after adjustments
Convergence achieved after 7 iterations
Variable Coefficient Std. Error t-Statistic Prob.
C 3.898771 0.092165 42.30197 0.0000
LOG(P) 0.888370 0.259299 3.426048 0.0018
AR(1) 0.422140 0.166047 2.542284 0.0164
R-squared 0.277777 Mean dependent var 3.999309
Adjusted R-squared 0.229629 S.D. dependent var 0.325164
S.E. of regression 0.285399 Akaike info criterion 0.416650
Sum squared resid 2.443575 Schwarz criterion 0.552696
Log likelihood -3.874725 Hannan-Quinn criter. 0.462425
F-statistic 5.769216 Durbin-Watson stat 1.820560
Prob(F-statistic) 0.007587
Inverted AR Roots .42
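The Cochrane-Orcutt transformation is also easy to reproduce by hand; a hedged Python sketch, assuming 1-D numpy arrays la = ln(A) and lp = ln(P) and the estimate ρ = 0.395 from the correlogram:

    import numpy as np
    import statsmodels.api as sm

    rho = 0.395

    # The "star" variables lose one observation through lagging
    ystar = la[1:] - rho * la[:-1]
    xstar = lp[1:] - rho * lp[:-1]
    const = np.full_like(ystar, 1 - rho)   # transformed intercept column

    res = sm.OLS(ystar, np.column_stack([const, xstar])).fit()
    print(res.params)   # [beta1, beta2] of the original model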
Testing For Autocorrelation:
There are several methods for detecting the presence of autocorrelation:
1. Residual plots
2. Residual Correlograms
3. Durbin-Watson test
4. Lagrange Multiplier test
1. Residual plots:
Postive autocorrelation is likely to be present if residual plots reveal runs of positive residuals
followed by runs of negative residuals.
Negative autocorrelation is likely to be present if positive residuals tend to be followed by
negative residuals and negative residuals tend to be followed by positive residuals (+ve, -ve,
+ve, -ve in order)
2. Residuals Correlograms:
(View  Residual tests  correlogram Q-Stat)
(Figure: residual patterns under positive autocorrelation and under negative autocorrelation.)
Downloaded by Lamin Dampha (ldampha@utg.edu.gm)
lOMoARcPSD|2941205
From the diagram it can be seen that Autocorrelation is significant
3. Durbin-Watson Test
The Durbin-Watson test is by far the most important way of detecting AR(1) errors.
The test assumes the errors follow an AR(1) process:
et = ρet-1 + vt,  where −1 < ρ < 1
ρ is the autocorrelation parameter we test. It is assumed that the vt are independent random errors with distribution N(0, σv²).
The assumption of normally distributed errors is needed to derive the probability distribution of
the test statistic used in the Durbin-Watson test. The DW-Statistic probability distribution
depends on the values of the explanatory variables. It is impossible to tabulate critical values
that can be used for every possible problem.
To overcome this problem we use a “bounds test”
Durbin and Watson considered two other statistics, dL and dU, whose probability distributions do not depend on the explanatory variables and which have the property that
dL < d < dU
irrespective of the explanatory variables in the model under consideration.
Step 1: State Hypotheses
H0: ρ = 0 (no autocorrelation)
H1: ρ > 0 (positive autocorrelation is present) – note we don't usually test for negative autocorrelation
Step 2: Decision Rule
Reject H0 if d < dL
Do not reject H0 if d > dU
No conclusion if dL < d < dU
Note: these values are tabulated in the Durbin-Watson tables
Step 3: Calculate Test Statistic
From the EViews output
Step 4: Decision/Comparison
Step 5: Conclusion
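For reference, the d statistic that EViews reports is d = Σ(êt − êt-1)² / Σêt², which is approximately 2(1 − ρ̂); a minimal Python computation from a residual vector:

import numpy as np

def durbin_watson(e):
    # d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
    # d is near 2 when rho = 0, near 0 under strong positive autocorrelation,
    # and near 4 under strong negative autocorrelation
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)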
4. A Lagrange Multiplier (LM) Test (Breusch-Godfrey test)
Step 1: State Hypotheses
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Decision Rule
Reject H0 if LM > χ²(1), the chi-square critical value with 1 degree of freedom
Step 3: Calculate test statistic
LM = T·R²
where R² is the coefficient of determination from the auxiliary regression of the residuals êt on xi1, ..., xik and êt-1, and T is the sample size.
Alternatively, in EViews: View → Residual Tests → Serial Correlation LM Test
Step 4: Comparison/Decision
Step 5: Conclusion
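A minimal Python sketch of the LM statistic (assuming the regressor matrix X already includes a constant, and setting the presample residual to zero, which is one common convention):

import numpy as np

def lm_test_ar1(e, X):
    # Breusch-Godfrey LM statistic for AR(1) errors:
    # regress residuals e_t on the original regressors and e_{t-1}; LM = T * R^2,
    # compared with a chi-square(1) critical value
    e = np.asarray(e, dtype=float)
    T = len(e)
    elag = np.concatenate([[0.0], e[:-1]])   # lagged residual, presample value set to 0
    Z = np.column_stack([X, elag])           # X is assumed to already contain a constant
    b = np.linalg.lstsq(Z, e, rcond=None)[0]
    u = e - Z @ b
    r2 = 1 - (u @ u) / ((e - e.mean()) @ (e - e.mean()))
    return T * r2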
Note the following points:
1. The Durbin-Watson test is an exact test valid in finite samples. The LM test is an approximate large-sample test, the approximation occurring because the random errors are replaced by residuals.
2. The Durbin-Watson test is not valid if one of the explanatory variables is the lagged
dependent variable yt-1. The LM test can still be used in these circumstances. This fact is
particularly relevant for distributed lag models.
3. The Durbin-Watson test is designed for a specific form of autocorrelation under the alternative, known as AR(1) errors (more on these shortly). The LM test can be used for other types of autocorrelated models by including additional lagged errors and using an F or χ² test to test the relevance of their inclusion.
Econometrics – ECON2300: Lecture 9 Dynamic Models
Autoregressive Models:
An autoregressive model expresses the current value of a variable as a function of its own lagged values. An autoregressive model of order p, denoted AR(p), takes the form:
yt = δ + θ1yt-1 + θ2yt-2 + ... + θpyt-p + vt
where the vt are independent random error terms with mean zero and constant variance σv². The error term is "well-behaved", so the model can be estimated using OLS. The usual hypothesis testing procedures and goodness-of-fit statistics are valid. We choose a value of p using the usual methods – hypothesis tests, residual analysis, information criteria, parsimony.
We want to model U.S. inflation, given CPI data for the period December 1983 to May 2006.
Preparing the data: the inflation series INFLN is constructed from the CPI series (typically as the monthly percentage change, e.g. INFLN = 100·(log(CPIt) − log(CPIt-1))).
The regression model: in this case we will use an AR(2) model:
INFLNt = δ + θ1INFLNt-1 + θ2INFLNt-2 + vt
In command line: ls INFLN c INFLN(-1) INFLN(-2)
Dependent Variable: INFLN
Method: Least Squares
Date: 05/21/10 Time: 22:17
Sample (adjusted): 1984M03 2006M05
Included observations: 267 after adjustments
Variable Coefficient Std. Error t-Statistic Prob.
C 0.209278 0.021781 9.608328 0.0000
INFLN(-1) 0.355224 0.060520 5.869540 0.0000
INFLN(-2) -0.180537 0.060341 -2.991927 0.0030
R-squared 0.120232 Mean dependent var 0.253534
Adjusted R-squared 0.113568 S.D. dependent var 0.209803
S.E. of regression 0.197531 Akaike info criterion -0.394674
Sum squared resid 10.30084 Schwarz criterion -0.354368
Log likelihood 55.68895 Hannan-Quinn criter. -0.378483
F-statistic 18.03964 Durbin-Watson stat 1.963006
Prob(F-statistic) 0.000000
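The same regression can be reproduced outside EViews by OLS on lagged values; a minimal Python sketch, where infln is a placeholder name for the inflation series:

import numpy as np

def fit_ar(y, p):
    # Estimate y_t = delta + theta_1*y_{t-1} + ... + theta_p*y_{t-p} + v_t by OLS
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    coef = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return coef  # [delta, theta_1, ..., theta_p]

# e.g. delta, th1, th2 = fit_ar(infln, 2)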
Now view the correlogram of the residuals: View → Residual Tests → Correlogram Q-Stat
Now try an AR(3) model:
In command line: ls INFLN c INFLN(-1) INFLN(-2) INFLN(-3)
Dependent Variable: INFLN
Method: Least Squares
Date: 05/21/10 Time: 22:21
Sample (adjusted): 1984M04 2006M05
Included observations: 266 after adjustments
Variable Coefficient Std. Error t-Statistic Prob.
C 0.188335 0.025290 7.446877 0.0000
INFLN(-1) 0.373292 0.061481 6.071690 0.0000
INFLN(-2) -0.217919 0.064472 -3.380029 0.0008
INFLN(-3) 0.101254 0.061268 1.652641 0.0996
R-squared 0.129295 Mean dependent var 0.253389
Adjusted R-squared 0.119325 S.D. dependent var 0.210185
S.E. of regression 0.197247 Akaike info criterion -0.393799
Sum squared resid 10.19345 Schwarz criterion -0.339912
Log likelihood 56.37528 Hannan-Quinn criter. -0.372150
F-statistic 12.96851 Durbin-Watson stat 2.000246
Prob(F-statistic) 0.000000
The first model, AR(2), is slightly better, as both the Schwarz criterion and the AIC are lower for it. Also, INFLN(-3) is not significant at the 5% level of significance.
The equation that tells us the value yT+1 is (for the AR(2) case):
yT+1 = δ + θ1yT + θ2yT-1 + vT+1
Our estimate (or forecast) of this value is:
ŷT+1 = δ̂ + θ̂1yT + θ̂2yT-1
The forecast value two periods beyond the end of the sample is:
ŷT+2 = δ̂ + θ̂1ŷT+1 + θ̂2yT
And so on for future periods. Confidence intervals for our forecasts are difficult to compute manually because forecast error variances are highly non-linear functions of the variances of the OLS estimators.
Using our preferred model AR(2) to forecast two periods beyond the end of the sample:
Accounting for coefficient uncertainty, a 95% confidence interval for yT+1 is:
ŷT+1 ± tc·se(f) = 0.2599 ± 1.9689 × 0.19896 = 0.2599 ± 0.39174
−0.1319 ≤ yT+1 ≤ 0.6516
(In EViews the critical value can be obtained with: scalar tc = @qtdist(0.975,264). The forecast error variance itself can't be calculated manually due to the non-linearity.)
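Point forecasts, at least, are easy to build recursively, feeding each forecast back in as a lagged value; a minimal Python sketch using AR(p) coefficients in the form [δ, θ1, ..., θp]:

import numpy as np

def forecast_ar(coef, history, steps):
    # Recursive point forecasts from an AR(p) fit;
    # coef = [delta, theta_1, ..., theta_p], history holds the observed series
    delta, thetas = coef[0], coef[1:]
    p = len(thetas)
    path = list(history[-p:])          # most recent p observations
    out = []
    for _ in range(steps):
        # theta_1 multiplies the most recent value, theta_2 the one before, etc.
        yhat = delta + sum(t * v for t, v in zip(thetas, path[::-1]))
        out.append(yhat)
        path.append(yhat)
        path = path[-p:]
    return out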
Finite Distributed Lag Models:
A finite distributed lag (FDL) model expresses the current value of the dependent variable as a function of current and lagged values of exogenous variables (variables external to the variable of interest). If there is only one exogenous variable, a finite distributed lag model of order q takes the form:
yt = α + β0xt + β1xt-1 + ... + βqxt-q + vt
where the vt are independent random error terms with mean zero and constant variance. The coefficients βs are called distributed lag weights.
Again, the model can be estimated using OLS, and the usual hypothesis testing procedures and goodness-of-fit statistics are valid. We choose a value of q using the usual methods (AIC and Schwarz).
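Estimating an FDL model is again just OLS, with the regressor matrix built from current and lagged x; a minimal Python sketch:

import numpy as np

def fit_fdl(y, x, q):
    # Estimate y_t = alpha + beta_0*x_t + ... + beta_q*x_{t-q} + v_t by OLS
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T - q)] + [x[q - s:T - s] for s in range(q + 1)])
    coef = np.linalg.lstsq(X, y[q:], rcond=None)[0]
    return coef  # [alpha, beta_0, ..., beta_q]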
Impact and Delay Multipliers:
Suppose y and x have been constant for at least the last q periods. Then suppose x is increased by 1 unit before returning to its original level. Then:
 yt will increase by β0 units. This coefficient is known as the impact multiplier.
 yt+1 will increase by β1 units. This coefficient is called the one-period delay multiplier.
 yt+s will increase by βs units. These βs coefficients are called s-period delay multipliers.
Interim and Total Multipliers:
Suppose y and x have been constant for at least the last q periods. Then suppose x is increased by 1 unit and maintained at this new level.
 yt will increase by β0 units. This coefficient is known as the impact multiplier.
 yt+1 will increase by β0 + β1 units. This sum is known as the one-period interim multiplier.
 yt+s will increase by β0 + β1 + ... + βs units. This sum is known as the s-period interim multiplier.
 The final effect on y is the total multiplier, Σ(s=0 to q) βs.
The two cases can be tabulated (illustrated here for q = 2):
Temporary one-unit increase in x:
x   y
0   α
0   α
0   α
1   α + β0   (impact multiplier)
0   α + β1   (1-period delay multiplier)
0   α + β2   (2-period delay multiplier)
0   α
0   α
0   α
Permanent one-unit increase in x:
x   y
0   α
0   α
0   α
1   α + β0             (impact multiplier)
1   α + β0 + β1        (1-period interim multiplier)
1   α + β0 + β1 + β2   (2-period interim multiplier)
1   α + β0 + β1 + β2
1   α + β0 + β1 + β2
1   α + β0 + β1 + β2   (the effect settles at the total multiplier)
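Given estimated lag weights, the delay, interim and total multipliers follow mechanically (the interim multipliers are running sums of the β's); a small Python illustration with hypothetical values:

import numpy as np

betas = np.array([0.5, 0.3, 0.1])   # hypothetical beta_0, beta_1, beta_2
delay = betas                        # s-period delay multipliers
interim = np.cumsum(betas)           # s-period interim multipliers
total = betas.sum()                  # total multiplier
print(delay, interim, total)         # [0.5 0.3 0.1] [0.5 0.8 0.9] 0.9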
The Almon Lag:
Multicollinearity can be a problem in FDL (finite distributed lag) models, particularly if q is large. This makes it difficult to identify the multipliers. One solution is to impose some constraints on the lag coefficients. A popular lag structure is the Almon lag (a quadratic or second-degree polynomial):
βi = α0 + α1i + α2i²,  i = 0, 1, ..., q
Therefore, substituting into the FDL model:
yt = α + Σ(i=0 to q) (α0 + α1i + α2i²)xt-i + vt
Simplifying, and collecting terms on the α's:
yt = α + α0zt0 + α1zt1 + α2zt2 + vt,  where ztk = Σ(i=0 to q) i^k·xt-i
In the Almon scheme y is regressed on the constructed Z variables, not on the original x variables. Once the α values are estimated, the original β's can be estimated.
Suppose that our model is an FDL of order q = 4. We know that each βi is given by:
βi = α0 + α1i + α2i²
Therefore:
β0 = α0 + α1(0) + α2(0)² = α0
β1 = α0 + α1(1) + α2(1)² = α0 + α1 + α2
β2 = α0 + α1(2) + α2(2)² = α0 + 2α1 + 4α2
β3 = α0 + α1(3) + α2(3)² = α0 + 3α1 + 9α2
β4 = α0 + α1(4) + α2(4)² = α0 + 4α1 + 16α2
We use the data to estimate α0, α1 and α2. This structure is known as the Almon polynomial.
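Constructing the Z variables and recovering the β's is mechanical; a minimal Python sketch for a quadratic Almon lag of order q (the dependent variable, aligned as y[q:], would then be regressed on a constant plus these Z columns):

import numpy as np

def almon_design(x, q):
    # Z variables for a quadratic Almon lag: z_{tk} = sum_i i^k * x_{t-i}, k = 0, 1, 2
    x = np.asarray(x, dtype=float)
    T = len(x)
    lags = np.column_stack([x[q - i:T - i] for i in range(q + 1)])  # columns x_t, ..., x_{t-q}
    i = np.arange(q + 1)
    return np.column_stack([lags @ (i ** k) for k in range(3)])

def almon_betas(alphas, q):
    # Recover beta_i = a0 + a1*i + a2*i^2 from the estimated alphas
    a0, a1, a2 = alphas
    i = np.arange(q + 1)
    return a0 + a1 * i + a2 * i ** 2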
Infinite Distributed Lag:
The main problem with the finite distributed lag model is that the number of lags q must be chosen empirically using selection criteria such as the AIC and SBC. This is viewed as too data-driven: you might fit the data perfectly but actually be fitting the errors, much as (by a loose analogue of Taylor's theorem) almost any pattern can be fitted by a polynomial of high enough degree. A more appropriate approach is to choose an infinite distributed lag model:
yt = α + β0xt + β1xt-1 + β2xt-2 + ... + et
The model is impossible to estimate as it stands, since there are an infinite number of parameters. Models have been developed that are parsimonious (simpler but effective) and which reduce the number of parameters to estimate. The cost of reducing the number of parameters is that these models must assume particular patterns for the parameters βi, which are called distributed lag weights. The most popular pattern is the geometric distributed lag:
βi = βϕ^i,  i = 0, 1, 2, ...
 The βi are the distributed lag weights.
 β is a scaling factor and the parameter ϕ is less than 1 in absolute value.
 The lag weights βi decline towards zero as i gets larger.
Although an improvement on the finite distributed lag, the geometric distributed lag still imposes a strong pattern of decline on the parameters. This model would not do well in a situation in which the peak effect does not occur for several periods, such as when modelling monetary or fiscal policy.
Thus, a preferred approach is to use an autoregressive distributed lag (ARDL) model. ARDL models are infinite distributed lag models, but they are flexible and parsimonious.
An autoregressive distributed lag (ARDL) model expresses the current value of the dependent variable as a function of its own lagged values as well as current and lagged values of exogenous variables. When there is only one exogenous variable, an ARDL(p,q) model takes the form:
yt = δ + θ1yt-1 + ... + θpyt-p + δ0xt + δ1xt-1 + ... + δqxt-q + vt
where the vt are independent random error terms with mean zero and constant variance σv². Again, the model can be estimated using OLS, and the usual hypothesis testing procedures and goodness-of-fit statistics are valid. We choose values of p and q using the usual methods.
An example of an ARDL (autoregressive distributed lag) model is:
yt = δ + θ1yt-1 + δ0xt + δ1xt-1 + vt
This is denoted ARDL(1,1) as it contains one lagged value of y and one lagged value of x. A model containing p lags of y and q lags of x is denoted ARDL(p,q). Despite its simple appearance, the ARDL(1,1) model represents an infinite lag.
If the usual error assumptions on the error term e hold, then the parameters can be estimated
by least squares.
Any ARDL(p,q) model can be written in the form of an infinite distributed lag model:
Estimates of the lag weights β (and therefore the delay, interim and total multipliers) can be
found from estimates of the δ and θ coefficients. The precise relationship between them
depends on the values for p and q. For example if p = 2 and q = 3.
Downloaded by Lamin Dampha (ldampha@utg.edu.gm)
lOMoARcPSD|2941205
The Geometric Lag:
If p = 1 and q = 0 then the ARDL(p,q) model is an infinite distributed lag model with geometrically declining weights. This model is also called a geometric lag model because the lag weights begin at δ0 and then evolve geometrically through time according to the relationship βs = θ1βs-1. If |θ1| < 1 then the total multiplier is:
Σ(s=0 to ∞) βs = δ0(1 + θ1 + θ1² + ...) = δ0 / (1 − θ1)
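A small Python illustration with hypothetical values of δ0 and θ1 shows the geometrically declining weights and the total multiplier δ0/(1 − θ1):

import numpy as np

delta0, theta1 = 0.8, 0.6                 # hypothetical estimates with |theta1| < 1
s = np.arange(12)
betas = delta0 * theta1 ** s              # geometric lag weights: beta_s = delta0 * theta1^s
print(betas[:4])                          # weights decline toward zero
print(betas.sum(), delta0 / (1 - theta1)) # partial sum approaches the total multiplier, 2.0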
Econometrics – ECON2300: Non-Stationary Time Series Data Lecture 10
Stationarity:
A time series yt is said to be stationary if the mean, variance and covariances of the series are all finite and do not depend on t. Mathematically, the series is stationary if:
E(yt) = μ < ∞ (constant mean)
var(yt) = σ² < ∞ (constant variance)
cov(yt, yt+s) = cov(yt, yt-s) = γs (covariance depends on s, not t)
This applies mainly to macroeconomic variables: bond ratios, stock market indices, exchange rates, etc.
Recall from last week that AR(1) models take the form:
yt = ρyt-1 + vt
where the vt are white noise. If |ρ| < 1 then the process is stationary, with constant mean and variance var(yt) = σv²/(1 − ρ²).
[Figures: two examples of stationary series, which conform with the conditions of stationarity above.]
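A quick simulation makes the conditions concrete: with |ρ| < 1 the sample mean and variance of an AR(1) settle near their theoretical values; a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(rho, T, sigma_v=1.0):
    # Simulate y_t = rho*y_{t-1} + v_t with v_t ~ N(0, sigma_v^2)
    y = np.zeros(T)
    v = rng.normal(0, sigma_v, T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + v[t]
    return y

y = simulate_ar1(0.7, 5000)
print(y.mean(), y.var())   # mean near 0, variance near sigma_v^2/(1-rho^2) = 1.96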
Trend-stationary Processes:
Consider a model with a deterministic trend:
yt = α + λt + vt
where the vt are white noise. For this model:
E(yt) = α + λt
So the process is non-stationary. If we knew the value of λ we could get a stationary process by de-
trending. The unknown parameters can be estimated using OLS.
A non-stationary process is said to be trend-stationary if it can be made stationary by de-trending.
Difference-stationary Processes:
Consider the special case of an AR(1) model with ρ = 1:
yt = α + yt-1 + vt
In this case:
E(yt) = y0 + αt
var(yt) = tσv²
So the process is non-stationary. The model is known as a random walk with drift. In the special case
where α = 0, the model is known simply as a random walk. (no drift)
[Figures] The first series is trending upward; if we estimate the trend equation we can de-trend it. De-trended series: we found that the series increased by 0.1 per unit of time, so if we remove this trend from the series it becomes stationary; the original series is trend-stationary.
An interesting feature of a random walk process is that the first-differenced series is stationary. For a random walk process with drift:
Δyt = yt − yt-1 = α + vt
Thus:
E(Δyt) = α
var(Δyt) = σv²
cov(Δyt, Δyt+s) = 0 for s ≠ 0
A non-stationary process is said to be difference-stationary if it can be made stationary by differencing
but cannot be made stationary by de-trending.
[Figures: a random walk with drift, and the same series made stationary through differencing]
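A quick simulation illustrates the point: the level series wanders, but its first difference is just drift plus noise; a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(1)
alpha, T = 0.1, 500
v = rng.normal(0, 1, T)
y = np.cumsum(alpha + v)             # random walk with drift: y_t = y_{t-1} + alpha + v_t
dy = np.diff(y)                      # first difference: alpha + v_t, which is stationary
print(y.var(), dy.mean(), dy.var())  # y's variance grows with T; dy has stable moments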
Spurious Regressions:
The main reason why it is important to know whether a time series is stationary or non-stationary before embarking on a regression analysis is that there is a danger of obtaining apparently significant regression results from unrelated data when non-stationary series are used. If one or more variables in a regression analysis are difference-stationary then there is a danger of obtaining apparently significant results even though the variables are totally unrelated. Such regressions are said to be spurious. In such cases:
 The finite sample properties of the OLS estimator are unknown
 The usual t- and F-statistics do not have well-defined distributions
 R² values are totally unreliable
 The DW statistic tends to zero
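The danger is easy to demonstrate by regressing one independently generated random walk on another; in simulations like the following Python sketch, the t-ratio on the slope is frequently far above 2 even though the series are unrelated:

import numpy as np

rng = np.random.default_rng(2)
T = 200
y = np.cumsum(rng.normal(size=T))        # two independent random walks
x = np.cumsum(rng.normal(size=T))
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (T - 2)
se_b2 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # std. error of the slope
print(b[1] / se_b2)   # |t| often far above 2 despite no true relationship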
Unit Root Tests
If we know a variable is non-stationary then it is important to determine whether it is difference-stationary or trend-stationary. One popular approach is to consider the model:
yt = α + ρyt-1 + λt + vt   (1)
and then use formal testing procedures to test:
Step 1: State Hypotheses
H0: ρ = 1
H1: ρ < 1
Note that unit root tests are complicated by the fact that if H0 is true then the distribution of the OLS estimator depends on whether the (unknown) true data-generating process contains an intercept and/or a time trend. Thus, we can't just estimate (1) and conduct a standard t-test. Instead we conduct Dickey-Fuller (DF) tests.
Dickey-Fuller (DF) Tests
To conduct a Dickey-Fuller test we use OLS to estimate one or more of the models:
Δyt = α + γyt-1 + λt + vt   (2)
Δyt = α + γyt-1 + vt   (3)
Δyt = γyt-1 + vt   (4)
where γ ≡ ρ − 1. Equation (2) is just another way of writing equation (1) above; equations (3) and (4) are restricted versions of (2). Irrespective of which equation we estimate, we test the null hypothesis H0: γ = 0 by comparing the standard t-statistic to critical values obtained from special Dickey-Fuller tables. In this situation, the t-statistic is usually called a τ (tau) statistic.
Our ability to reject H0: γ = 0 when it is false (the power of the test) is low if the estimated regression doesn't contain exactly the same deterministic regressors as the true data-generating process. What to do? In practice, if the series appears to be wandering or fluctuating around a
 linear trend, we use test equation (2)
 sample average that is non-zero, we use test equation (3)
 sample average of zero, we use test equation (4)
If we use a particular test equation and fail to reject H0, this could be because the test equation contains
the wrong deterministic regressors. Given this possibility, we sometimes re-conduct the test using a
different test equation.
Augmented Dickey-Fuller (ADF) Tests:
To conduct an ADF test we follow exactly the same procedure except that we augment each test equation with lagged differences, for example:
Δyt = α + γyt-1 + Σ(s=1 to m) asΔyt-s + λt + vt
We add as many "augmentation terms" as we need to ensure the residuals are not autocorrelated. We still test H0: γ = 0 by comparing the tau statistic to critical values in the usual DF tables.
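The tau statistic is just the OLS t-ratio on γ in the chosen test equation, compared with Dickey-Fuller rather than standard t critical values; a minimal Python sketch of the intercept-only case, test equation (3):

import numpy as np

def df_tau(y):
    # Dickey-Fuller tau: t-ratio on gamma in dy_t = alpha + gamma*y_{t-1} + v_t;
    # compare with DF critical values, not the standard t tables
    y = np.asarray(y, dtype=float)
    dy, ylag = np.diff(y), y[:-1]
    T = len(dy)
    X = np.column_stack([np.ones(T), ylag])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    e = dy - X @ b
    s2 = e @ e / (T - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(cov[1, 1])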
Model Selection and Estimation:
If we suspect a time series contains a (deterministic or stochastic) trend we should...
1. Conduct unit root tests
2. If there is no unit root then any trend must be deterministic implying we should include a time
trend in our model
3. If there is a unit root then as a rule, we should first-difference the series. If we suspect this new
series contains a trend we should return to Step 1. Otherwise we can go ahead and model the
first-differenced series.
ECON2300 – Lecture 11: Cointegration
Order of Integration:
Recall that if yt follows a random walk, yt = yt-1 + vt, then γ = 0 and the first difference of yt becomes:
Δyt = yt − yt-1 = vt
An interesting feature of this differenced series is that it is stationary, since vt, being an independent (0, σv²) random variable, is stationary. A series like yt, which can be made stationary by taking the first difference once, is said to be integrated of order one, and is denoted I(1).
Stationary series are said to be integrated of order zero, and are denoted I(0).
In general, the order of integration is the minimum number of times the series must be differenced to make it stationary.
Linear combinations of I(1) variables are usually also I(1). But this is not always the case.
For example:
Consider the following series: the Federal Funds rate (F) and the change in the Federal Funds rate (∆F).
Conduct a Dickey-Fuller test for stationarity of the change in the Federal Funds rate, ∆F:
1. The ∆F plot appears to be stationary, fluctuating around 0; therefore we use the test equation without the intercept term.
From the test we reject the null hypothesis that ∆F is non-stationary and conclude that it is stationary. Therefore, as the first difference is stationary, we say that the series Ft is I(1), because it had to be differenced once to make it stationary. Note also that ∆Ft is I(0), as it is a stationary series.
Suppose that wt is a random walk and εyt and εxt are white noise. Processes built from these components (for example, a random walk plus white noise) are all I(1).
Linear combinations of I(1) variables are usually I(1). However, sometimes a linear combination of I(1) variables is I(0). In this case, the variables are said to be cointegrated.
Cointegration: when two or more I(1) variables are combined linearly and the result is an I(0) series.
In the example above, the linear combination zt is I(0), so yt and xt are cointegrated.
Example:
In this case, there is no linear combination of yt and xt that is I(0). Therefore the variables are
not cointegrated.
If we regress y on x we see that our model and explanatory variables are highly significant!
(F-stat = 1482.33, p-values <<0.001)
BUT! As we discussed last week, this regression is spurious. The two variables were
generated independently and in truth, have no relation to one another, yet the results suggest
that the simple regression model fits the data well and is highly significant! These results
are, however, completely meaningless or spurious. The apparent significance of the
relationship is false. It results from the fact that we have related one series with a stochastic
trend to another series with another stochastic trend.
Example 3:
Again, there is no linear combination of yt and xt that is I(0), so the two variables are not cointegrated.
As a general rule, non-stationary time-series variables should not be used in regression models, to avoid the problem of spurious regression. However, there is an exception to this rule. If yt and xt are non-stationary I(1) variables, then we expect their difference, or any linear combination of them, such as et = yt − β1 − β2xt, to be I(1) as well. However, there is an important case where et = yt − β1 − β2xt is a stationary I(0) process. In this case yt and xt are said to be cointegrated.
Example 4:
The linear combination zt = yt − 1.25xt is I(0), so the two variables are cointegrated.
Example 5:
Testing for Cointegration:
When two variables are cointegrated, it implies that yt and xt have similar stochastic trends,
and since the difference et is stationary, they never diverge too far from each other. A natural
way to test whether yt and xt are cointegrated is to test whether the errors et = yt – β1 – β2xt
are stationary. Since we cannot observe et, we test the stationarity of the least squares
residuals êt = yt − b1 − b2xt using the Dickey-Fuller test.
The test for cointegration is effectively a test of the stationarity of the residuals. If the
residuals are stationary, then yt and xt are said to be cointegrated; if the residuals are non-
stationary, then yt and xt are not cointegrated, and any apparent regression relationship
between them is spurious.
Suppose the variables yt, xt2, ..., xtK are all I(1) and are cointegrated. Then we can write:
yt = β1 + β2xt2 + ... + βKxtK + et
where et is I(0). In this case OLS is unbiased and (super-)consistent. If the variables are not cointegrated then et will be I(1) and the regression would be spurious.
One method of testing whether variables are cointegrated is to estimate the regression model and use an ADF (augmented Dickey-Fuller) test to determine whether the residuals are I(0). However, because the test is based on estimates of the et, we must use critical values obtained from special Engle-Granger (EG) tables.
Step 1: State Hypotheses
H0: the series are not cointegrated  residuals are non-stationary
H1: the series are cointegrated  residuals are stationary
Engle-Granger (EG) table of critical values:
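A minimal Python sketch of the two-step Engle-Granger procedure (step 1: the cointegrating regression by OLS; step 2: a Dickey-Fuller regression, without intercept, on the residuals; the resulting tau is compared with EG critical values, not DF or t tables):

import numpy as np

def engle_granger_tau(y, x):
    # Step 1: cointegrating regression y_t = b1 + b2*x_t + e_t
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    # Step 2: DF regression on the residuals: de_t = g*e_{t-1} + u_t (no intercept)
    de, elag = np.diff(e), e[:-1]
    g = (elag @ de) / (elag @ elag)
    u = de - g * elag
    s2 = u @ u / (len(de) - 1)
    return g / np.sqrt(s2 / (elag @ elag))   # tau statistic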
Example 5: continued
Model Selection and Estimation:
 If variables are stationary, or I(1) and cointegrated, we can estimate a regression
relationship between the levels of those variables without fear of encountering a
spurious regression.
 If the variables are I(1) and not cointegrated, we need to estimate a relationship in first
differences.
 If the variables are trend-stationary we should estimate a regression relationship that
includes a trend variable.

revision-notes-introductory-econometrics-lecture-1-11.pdf

  • 1.
    Econometrics: ECON2300 –Lecture 1 The Econometric Model: Econometrics is about how we can use theory and data from economics, business and the social sciences, along with tools from statistics, to answer “how much” type questions. In economics we express our ideas about relationships between economics variables using the mathematical concept of a function. An example of this is when expressing the price of a house in terms of its size. Price = f(size) Hedonic Model: A model that decomposes the item being researched into its constituent characteristics, and obtains estimates of the contributory value of each characteristic An example of a hedonic model for house price might be expressed as: ) , , , , , , ( Price oning airconditi pool age stories bathrooms bedrooms size f  Economic theory does not claim to be able to predict the specific behaviour of any individual or firm, but rather is describes the average or systematic behaviour of many individuals for firms. Economic models = Generalisation In fact we realise that there will be a random and unpredictable component e that we will call random error. Hence the econometric model for price would be e oning airconditi pool age stories bathrooms bedrooms size f   ) , , , , , , ( Price The random error e, accounts for the many factors that affect sales that we have omitted from this simplistic model, and it also reflects the intrinsic uncertainty in economic activity. Take for example the demand relation: i p p p i p p p f q c s c s d 5 4 3 2 1 ) , , , (            The corresponding econometric model is: e i p p p i p p p f q c s c s d        5 4 3 2 1 ) , , , (      Econometric Models include the error term, e Please note that LIFT does not warrant the correctness of the materials contained within the notes. Additionally, in some cases, these notes were created for previous semesters and years. Courses are subject to change over time, both in content and scope of assessment. Thus the information contained within may or may not be assessed this semester, or the information may have been superseded. These notes reproduce some copyrighted material, the use of which has not always been specifically authorised by the copyright owner. We are making these materials available for the purposes of research and study and as such believe that this constitutes fair dealing with any such copyrighted material pursuant to s 40 Copyright Act 1968 (Cth). Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 2.
    In every modelthere are two parts: 1. A systematic portion – part we obtain from economic theory, includes assumptions about the functional form. 2. An unobservable random component – “noise” component which obscures our understanding of the relationship among variables: e. How Do we Obtain Data? In an ideal world: 1. We would design an experiment to obtain economic observations or sample information 2. Repeating the experiment N times would create a sample of N sample observations In the real world: Economists work in a complex world in which data on variables are “observed” and rarely obtained from a controlled experiment. It is just not feasible to conduct an experiment to obtain data. Thus we use non-experimental data generated by an uncontrolled experiment. Experimental data: Variables can be fixed at specific values in repeated trials of the experiment Non-experimental data: Values are neither fixed nor repeatable Most economic, financial or accounting data are collected for administrative rather than research purposes, often by government agencies or industry. The data may be:  Time-series form – data collected over discrete intervals of time (stock market index, CPI, GDP, interest rates, the annual price of wheat in Australia from 1880 to 2009)  Cross-sectional form – data collected over sample units in a particular time period (income in suburbs in Brisbane during 2009, or household census)  Panel data form – data that follow individual microunits over time (data for 30 countries for the period 1980-2005, monthly value of 3 stock market indices over the last 5 years) Data may be collected at various level of aggregation:  Micro – data collected on individual economic decision-making units units such as individuals, households, or firms  Macro – data resulting from a pooling or aggregating over individuals, households, or firms at the local, state, or national levels Data collected may also represent flow or a stock:  Flow – outcome measures over a period of time, such as the consumption of petrol during the last quarter of 2005  Stock – outcome measured at a particular point in time, such as the quantity of crude oil held by BHP in its Australian storage tanks on April 1, 2002, or the asset value of Macquarie Bank on 5th July 2009. Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 3.
    Data collected maybe quantitative or qualitative:  Quantitative – numerical data, data that can be expressed as numbers or some transformation of them such as real prices or per capital income  Qualitative – outcomes that of an “either-or” situation that is whether an attribute is present or not. Eg. Colour, or whether a consumer purchased a certain good or not (Dummy variables) Statistical Inference: The aim of statistics is to “infer” or learn something about the real world by analysing a sample of data. The ways which statistical inference are carried out include:  Estimating economic parameters, such as elasticities  Predicting economic outcomes, such as the enrolments in bachelor degree programs in Australia for the next 5 years.  Testing economic hypotheses, such as: Ii newspaper advertising better than “email” advertising for increasing sales? Econometrics includes all of these aspects of statistical inference. There are two types of inference: 1. Deductive: go from a general case  to  a specific case: this is used in mathematical proofs 2. Inferential: go from a specific case  to  a general case: this is used in statistics Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 4.
    Review of StatisticConcepts: Random variables: Discrete and Continuous Random variable: A random variable is a variable whose value is unknown until it is observed, it is not perfectly predictable. The value of the random variable results from an experiment (controlled or uncontrolled). Uppercase letters (e.g. X) are usually used to denote random variables. Lower case letters (e.g. x) are usually used to denote values of random variables. Discrete random variable: A discrete random variable can take only a finite number of values that can be counted by using the positive integers  E.g. The number of cars you own, your age in whole years, etc.  Dummy variables:       female) isnot person If female is person If 0 1 D Probability distribution of a discrete random variable: A discrete random variable has a probability density function which summarises all the possible values of a discrete random variable together with their associated probabilities. It can be in the form of a table, formula or graph. Two key features of a probability distribution are its centre (location) and width (dispersion); the mean, μ, and variance, σ2 , respectively. For a discrete random variable X: Mean:      ) ( ) ( x X P x X E  Variance:             ) ( ) ( ) ( ) ( 2 2 2 x X P x X E X Var    It can be seen in the graph above that there are only distinct values that the variable x can take which is what a discrete variable is – the probability density function is NOT continuous. Discrete probability distributions are: 1. Mutually exclusive – no overlap between values 2. Collectly exhausted – full sample space covered, includes every possibility frequency Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 5.
    Example: A 5sided dice is biased: the sides have 0, 1, 2, 3 & 4 respectively the following table shows the probability distribution. a) Calculate the mean & variance of X b) Sketch the probability distribution of X c) Find P(X 2  ) X 0 1 2 3 4 P(X) 0.10 0.45 0.30 0.10 0.05 Solution: a) i) Mean: 55 . 1 05 . 0 4 10 . 0 3 30 . 0 2 45 . 0 1 10 . 0 0 ) 4 ( 4 ) 3 ( 3 ) 2 ( 2 ) 1 ( 1 ) 0 ( 0 ) (                               X P X P X P X P X P x X P X  ii) Variance:   9475 . 0 05 . 0 ) 55 . 1 4 ( 10 . 0 ) 55 . 1 3 ( 30 . 0 ) 55 . 1 2 ( 45 . 0 ) 55 . 1 1 ( 10 . 0 ) 55 . 1 0 ( ) ( ) ( 2 2 2 2 2 2 2                      x X P X   b) c) P(X 2  ) 85 . 0 40 . 0 45 . 0 10 . 0 ) 2 ( ) 1 ( ) 0 ( ) 2 (            X P X P X P X P X P(X=x) 0 1 2 3 4 0.45 0.30 0.10 0.05 Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 6.
    Continuous random variable: Acontinuous random variable can take any real value (not just whole numbers or positive) generally measurable.  E.g. Your height, the temperature etc. Easy way to establish, is to pick a random number eg. 3.4314135315 and ask if the variable can take that value? If yes then it is continuous, if no it is discrete. Probability distribution of a continuous random variable: A continuous random variable has a probability density function which is a smooth non-negative function representing likely and unlikely values of the random variable. Two key features of a probability distribution are its centre (location) and width (dispersion); the mean, μ, and variance, σ2 , respectively. Let f(x) denote the pdf for a random continuous variable X. Mean:        dx x f x X E ) ( ) (  Variance:             dx x f x X E X Var ) ( ) ( ) ( ) ( 2 2 2    There are an infinite number of points in an interval of a continuous random variable, so a positive probability cannot be assigned to each point – the area of a line = 0. Therefore, for a continuous random variable, P(X= x) = 0. We can only assign probabilities to a range of values or to put it another way, we can only assign a probability that X will lie within a certain range of variables.     2 1 ) ( ) ( 2 1 x x dx x f x X x P Note that it does not matter if greater than or greater than or equal to symbols are used as the difference in negligible (the probability of a single value is 0). Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 7.
    The Normal Distribution: Themost useful continuous distribution is the normal distribution. The Normal distribution has a probability distribution function (pdf) of:                x e x f x - , 2 1 ) ( 2 2 2 ) ( 2    Important Parameters of the normal distribution: 1. μ = mean: the centre of the distribution. 2. σ2 = variance: level of dispersion 3. Properties of the normal distribution:  Symmetric about the mean  Bell shaped  The mean, μ median and mode are all the same  Used to find the probabilities of range  Probabilities of a single value = 0. E.g. P(X=3) = 0  There are an infinite number of normal distributions for each value of μ and σ  Area under the probability Density function is equal to 1 o As symmetric, each side has 0.5 area  Probability is measured by the area under the curve – the cumulative distribution function The Standardised Normal Distribution:  Variance and Standard Deviation of 1  Mean of 0  Values greater than the mean have positive Z-Values  Values less than the mean have negative Z-Values The most useful element of the normal distribution is that we can “standardise” it to the standard normal distribution of which we have tables to determine probabilities (Z values) Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 8.
        X Z Example: In agiven population, heights of people are normally distributed with a mean of 160cm and standard deviation of 10cm. a) What is the probability that a person is more than 163.5cm tall? b) What proportion of people have heights between 155cm and 164cm? Solution: a) b) 160cm     3632 . 0 1368 . 0 5 . 0 35 . 0 0 5 . 0 10 160 5 . 163 10 160 160 5 . 0 5 . 163 160 5 . 0 ) 5 . 163 (                         Z P Z P X P X P 160cm       3283 . 0 1368 . 0 1915 . 0 35 . 0 0 0 5 . 0 10 160 5 . 163 10 160 160 10 160 160 10 160 155 5 . 163 160 ) 160 155 ( ) 5 . 163 155 (                                         Z P Z P Z P Z P X P X P X P Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 9.
    The Chi-Square Distribution: TheChi-square random variables arise when standard normal random variables are squared. If Z1, Z2, ..., Zm denote m independent N(0,1) random variables, then          m i m i m m Z Z Z Z V 1 2 ) ( 2 2 ) ( 2 2 2 2 1 ~ ~    The notation 2 ) ( ~ m X V is read as: the random variable V has a chi-square distribution with m degrees of freedom. The degrees of freedom parameter m indicates the number of independent N(0,1) random variables that are squared and summed to form V. The value of m determines the entire shape of the chi- square distribution – including its mean and variance.     m V m E V E m m 2 var ) var( ) ( 2 ) ( 2 ) (       The Values of V must be non-negative, v  0, because V is formed by squaring and summing m standardised normal N(0,1) random variables. The distribution has a long tail, or is Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 10.
    skewed to theright (long tail to the right). As the degrees of freedom increase m gets larger and the distribution becomes more symmetric and “bell-shaped”. As m gets larger, the chi-square distribution converges to and essentially becomes the normal distribution. The student ‘t’ Distribution: A ‘t’ random variable is formed by dividing a standard normal random variable Z ~ N(0,1) by the square root of an independent chi-square random variable, V ~ χ2 m. m t m V Z t ~  The t-distributions shape is completely determined by the degrees of freedom parameter, m and the distribution is symbolised by tm. Note that the t distribution is more spreadout than the standard normal distribution and less peaked. With mean and variance: 2 ) var( 0 ) ( ) ( ) (    m m t t E m m As the number of degrees of freedom approaches infinity, the distribution approaches the standard normal. N(0,1). Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 11.
    The F distribution: AnF random variable is formed by the ratio of two independent chi-square random variables that have been divided by their degrees of freedom. If V1 ~ χ2 m1 and V2 ~ χ2 m2 and if V1 and V2 are independent, then: ) , ( ~ / / 2 1 2 2 1 1 m m F m V m V F  The F-distribution is said to have m1 numerator degrees of freedom and m2 denominator degree’s of freedom. The values of m1 and m2 determine the shape of the distribution, which in general looks like the figure below. The graph below shows the range of shapes the distribution can take for different degrees of freedom. Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 12.
    Laws of Expectationand Variation: ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ 0 ] [ ] [ 2 2 Y Var X Var Y X Var Y E X E Y X E X Var a b aX Var b X aE b aX E X Var a aX Var X aE aX E b Var b b E                The Error Term: The error term in a regression model is a random variable. Like other random variables it is characterised by: a) A mean (or expected value) b) A variance c) A distribution (i.e. probability density function) We usually assume the random error term of an econometric model to: a) Have expected value of zero b) Have a variance which we will call σ2 Where: a and bare constants X and Y are random variables Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 13.
    The smaller thevariance of the error term, the more efficient the model. Sampling Distributions: We can usually draw many samples of size n from a population. Each sample can be used to compute a sample statistic (eg. A sample mean) these statistics will vary from sample to sample. If we take infinitely many samples of a normally distributed random variable X in the population the sample statistic X will also be normally distributed. The probability distribution that gives all possible values of a statistic and associated probabilities is known as a sampling distribution. If Xi ~ N(μ,σ2 ) then, ) / , ( ~ 2 N N X   If the distribution of X is non-normal but n is large, then X is approximately normally distributed. The approximation is good when n 30  - this is known as the central limit theory. Central limit Theorem: If Y1, ..., YN are independent and identically distributed random variables with mean μ and variance σ2 , and   N Yi Y / , then N Y ZN     has a probability distribution that converges to the standard normal as N (0,1) as N  ∞. Estimators & Estimates: A point estimator is a rule or formula which tells us how to use a set of sample observations to estimate the value of a parameter of interest. A point estimate is the value obtained after the observations have been substituted into the formula. Desirable properties of point estimators include:  Unbiased – an estimator ˆ is an unbiased estimator of the population parameter θ if E(ˆ) = θ  Efficiency - 1 ˆ  is more efficient than 2 ˆ  if     2 1 ˆ var ˆ var     Consistency- the distribution of the estimator becomes more concentrated about the population parameter as the sample size becomes larger Note that both bias and variance approach 0 as n approaches infinity. Estimate: is a particular value for a parameter Estimator: a formula to get estimate Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 14.
    Examples: N Xi X   isthe best linear unbiased estimator of ) (X E     N X Xi    2 2 ̂ is a biased but consistent estimator of 2 2 ) (     X E   1 ˆ 2 2     N X Xi  is an unbiased and consistent estimator of 2 2 ) (     X E Confidence Intervals: A confidence interval or interval estimate, is a range of values which contains information not only about the location of the population mean, but about the precision with which we estimate it. We can generally use the sampling distribution of an estimator to derive a confidence interval for the population parameter. In general, a 100(1-α)% confidence interval for the population mean is given by: n Z x CI      2 / Where α is the level of confidence. Prior to selecting a random sample, the probability that a CI will contain the population parameter is 100(1-α)%. Eg. If we took many samples of size n and calculated the many corresponding random 1-α = 0.95 α/2 α/2 Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 15.
    intervals n Z x     2 /then 100(1-α)% would contain μ. After we construct a confidence interval, either it does or it does not contain the population parameter, with probabilities 1 and 0 (so we can only say we are 100(1-α)% confident that a particular confidence interval contains the parameter. General conclusion: “We can say with 100(1-α)% confidence that the population parameter is between lower bound and upper bound.” Hypothesis Testing: An hypothesis is a statement or claim about the value(s) of one or more population parameters. To test a hypothesis we 1. Identify a test statistic and find its sampling distribution when the hypothesis is true 2. Reject the hypothesis if the test statistic takes a value that is deemed unlikely 5 steps: 1. State H0 and H1 – H0 must contain an equality (    , , ) 2. State a decision rule – Reject H0 if... 3. Calculate test statistic 4. Compare, and make decision 5. Write conclusion Note: o One-tail or two tail tests can be used o Can use critical values or p-value method Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 16.
    Econometrics: ECON2300 –Lecture 2 An Econometric Model: For a given set of data the aim of a econometric model is to fit a regression line and then check how good it fits. In order to investigate this relationship between expenditure and income we must build an economic model and then a corresponding econometric model that forms the basis for a quantitative or empirical economic analysis. We must express mathematically which variables are dependent and independent. (In this case we can say that the weekly expenditure depends on income – y depends on x) We represent our economic model mathematically by the conditional mean: x x y E x y 2 1 | ) | (       The conditional mean ) | ( x y E is called a simple regression function as there is only one explanatory variable. The unknown regression parameters 1  and 2  are the intercept and slope respectively. dx x y dE x x y E ) | ( ) | ( 2      Please note that LIFT does not warrant the correctness of the materials contained within the notes. Additionally, in some cases, these notes were created for previous semesters and years. Courses are subject to change over time, both in content and scope of assessment. Thus the information contained within may or may not be assessed this semester, or the information may have been superseded. These notes reproduce some copyrighted material, the use of which has not always been specifically authorised by the copyright owner. We are making these materials available for the purposes of research and study and as such believe that this constitutes fair dealing with any such copyrighted material pursuant to s 40 Copyright Act 1968 (Cth). Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 17.
    For each valueof x there is potentially a range of values of y – in fact each has a probability distribution. The figure above shows that the regression line passes through the mean of each distribution of expenditure at each level of income. The difference between the actual value of y and the expected value is known as the random error term. ) ( ) ( 2 1 x y y E y e        If we rearrange: e x y    2 1   Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 18.
    Assumptions of theSimple Linear Regression (SLR) Model: 1. The population can be represented by: e x y    2 1   2. The mean value of y, for each value of x is given by the linear regression function x x y E 2 1 ) | (     Error term: This means that the mean error term is 0. 0 ) (  e E 3. For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance 2 ) | var(   x y Error term: This means that the error terms are homoskedastic: constant variance. Violation of this is hetroskadastic. ) var( ) var( 2 y e    v 4. The sample values of y are all uncorrelated and have zero covariance, implying there is no linear association amoung them, 0 ) , cov(  j i y y Error term: There is no Serial Correlation. Note that this assumption can be made stronger by assuming that the random errors e are all statistically independent in which case the values of y are also statistically independent. Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 19.
    5. The variablex is not random and must take at least two different values. 6. (optional) The values of y are normally distributed about their mean for each value of x.     2 2 1 , ~    x N y  Error term: The values of e are normally distributed about their mean ) , 0 ( ~ 2  N e If the values of y are normally distributed, and vice versa. The Error term: If the regression parameters 1  and 2  were known then for any value of y we could calculate: ) ( ) ( 2 1 x y y E y e        However, the values of 1  and 2  are never known for certain and therefore it is impossible to calculate e. The random error e represents all factors affecting y other than x. These factors cause individual observations y to differ from the mean value: x y E 2 1 ) (     Estimating the Parameters of the Similar Linear Regression: Our problem is to estimate the location of x y E 2 1 ) (     that best represents our data. We would expect this line to be somewhere in the middle of all the data points ince it represents mean, or average behaviour. To estimate 1  and 2  we could simply draw a line through the middle of the data and then measure the slope and intercept with a ruler. The problem with this method is that different people would draw different lines – in fact there would be an infinite set of possibilities, and that it would not be accurate. Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 20.
    The estimated regressionline is given by: i i x b b y 2 1 ˆ   The least squares principle: The least squares method involves finding estimators 1  and 2  that provide the smallest sum of squared residuals:       2 2 ˆ min ˆ min i i i y y e       2 2 ) ( ) )( ( x x y y x x b i i i x b y b 2 1   We usually use a computer to calculate these values as the process would take too long and be too tedious to do by hand. Interpreting the estimates:  The value of b2 is an estimate of 2  , the amount by which y increases per unit increase in x  The value of b1 is an estimate of 1  , what y would be when x = 0 Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 21.
    Because the leastsquares estimate is generated using sample data, different samples will lead to different values of b1 and b2. Therefore b1 and b2 are random variables. In this context we call b1 and b2 the least squares estimators, but when actual sample values are substituted then we obtain values of random variables which are estimates. Estimators: Formulas for estimates Estimates: Actual values given by the estimators The variances and Covariance of b1 and b2:             2 2 2 1 ) ( ) var( x x N x b i i             2 2 2 ) ( ) var( x x b i  The square roots of the estimated variances are known as standard errors.             2 2 2 1 ) ( ) , cov( x x x b b i  Summary: the variances and covariances of b1 and b2  The larger the variance in the error term, 2  , the greater the uncertainty there is in the statistical model, and the larger the variances and covariance of the least squares estimators.  The larger the sum of squares,  2   x xi , the smaller the variances of the least squares estimators and the more precisely we can estimate the unknown parameters Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 22.
    In a thedata are bunched, the  2   x xi is smaller and we cannot estimate the line very accurately. In b the  2   x xi is larger and we can estimate the unknown parameters more precisely.  The larger the sample size N, the smaller the variances and covariance’s of the least squares estimators  The larger the term  2 i x is, the larger the variance of the least squares estimator b1 The further our data are from x = 0, the more difficult it is to interpret B1.  The absolute magnitude of the covariance increase the larger in magnitude is the sample mean x , and the covariance has a sign opposite that of x . The probability distribution of the Least Squares Estimators:  If the normality assumption about the error terms, is correct, the the least squares estimators are normally distributed.  If assumptions 1 – 5 hold, and if the sample size is sufficiently large ( 30  n ), then by the central limit theorem the least squares estimators have a distribution that approximates the normal distribution shown. The Gauss-Markov Theorem: Under the assumptions of SR1-SR5 of the linear regression model, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators 2 1 &   . They are the Best Linear Unbiased Estimators (BLUE) of 2 1 &   . To clarify what the Gauss-Markov theorem does, and does not, say: 1. The estimators b1 and b2 are “best” when compared to similar estimators, those that are linear and unbiased. The theorem does not say that b1 and b2 are the best of all possible estimators. 2. They are the “best” within their class because they have the minimum variance. When comparing two linear and unbiased estimators we always want to use the one with the smallest variance. 3. In order for the Gauss-Markov Theorem to hold, assumptions SR1-SR5 must be true. If any of these assumptions are not true, then b1 and b2 are not the best linear unbiased estimators of B1 and B2. 4. The Gauss-Markov theorem does not depend on the assumption of normality 5. In simple linear regression these are the estimators to use. 6. The theorem applies to the least squares estimators. It does not apply to the least squares estimates from a single same. Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
  • 23.
    Estimating the varianceof the Error term: The variance of the random error ei is:     ) ( ) 0 ( ) ( ) var( 2 2 2 2 i i i i i e E e E e E e E e        Assuming that the mean error = 0 assumption is correct. The unbiased estimator of variance is: 2 ˆ ˆ 2 2    N ei  with 2 2 ) ˆ (    E Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
Interval Estimation:

Confidence interval: CI = bₖ ± t_crit · se(bₖ)

Where:
bₖ = b1 or b2
t_crit = the critical value t(1 − α/2, N − 2), where N − 2 is the degrees of freedom
se(bₖ) = the standard error given by the regression estimation

Before sampling, we can make the probability statement that there is a 100(1 − α)% chance that the interval will contain the true value. After sampling, we can only make a confidence statement – we are 100(1 − α)% confident that the true value lies within the interval.

Example: Construct a 95% confidence interval for β2 for the following equation when there are 40 observations.

ŷ = 83.4 + 10.21x
(se) (43.4) (2.09)

Solution:

CI = b2 ± t(1 − α/2, N − 2) · se(b2)
   = 10.21 ± t(1 − 0.05/2, 40 − 2) × 2.09
   = 10.21 ± t(0.975, 38) × 2.09
   = 10.21 ± 2.024 × 2.09
   = 10.21 ± 4.23

We can say with 95% confidence that the true value of β2 lies within the interval 5.98 to 14.44.
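A minimal sketch of this interval calculation, using only the quoted summary figures (the scipy call is our addition, not part of the notes' EViews workflow):

```python
from scipy import stats

b2, se_b2, N = 10.21, 2.09, 40
t_crit = stats.t.ppf(1 - 0.05 / 2, N - 2)    # t(0.975, 38) = 2.024
lower, upper = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(round(lower, 2), round(upper, 2))      # approximately 5.98 and 14.44
```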
Hypothesis Testing:

We can conduct a hypothesis test on the slope of the regression line.

Step 1: State hypotheses:
H0: βₖ = c (or βₖ ≤ c, or βₖ ≥ c)
H1: βₖ ≠ c (or βₖ > c, or βₖ < c)
Step 2: Decision rule: Reject H0 if .....
Step 3: Calculate the test statistic
Step 4: Compare and decide
Step 5: Conclusion
Example: Using 40 observations on food expenditure,

ŷ = 83.4 + 10.21x
(se) (43.4) (2.09)

test whether β2 is less than or equal to 0 at the 5% level of significance.

Step 1: State hypotheses
H0: β2 ≤ 0
H1: β2 > 0

Step 2: Decision rule
Reject H0 if t_calc > t_crit. For this one-tail test, t_crit = t(1 − α, N − 2) = t(0.95, 38) = 1.686

Step 3: Calculate test statistic

t_calc = (b2 − 0) / se(b2) = 10.21 / 2.09 = 4.88

Step 4: Compare and decision
4.88 > 1.686, therefore reject H0

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that β2, the increase in expenditure for a one-unit increase in income, is greater than 0.

Types of errors:

                   H0 true            H0 false
Reject H0          Type I error (α)   No error
Do not reject H0   No error           Type II error (β)
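A short sketch of the same one-tail test, again using only the figures quoted in the worked example (assumed values, not live regression output):

```python
from scipy import stats

b2, se_b2, N, c = 10.21, 2.09, 40, 0
t_calc = (b2 - c) / se_b2                  # 4.88
t_crit = stats.t.ppf(0.95, N - 2)          # one-tail 5% critical value, 1.686
p_value = 1 - stats.t.cdf(t_calc, N - 2)   # right-tail p-value
print(t_calc > t_crit, p_value)            # True -> reject H0
```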
Econometrics: ECON2300 – Lecture 3

The Least Squares Predictor:

The linear regression model provides a way to predict y given any value of x. This is extremely important for forecasters, be it in politics, finance or business. Accurate predictions provide a basis for better decision making.

Our first SR assumption is that the model is linear: for a given value of the explanatory variable, x0, the value of the dependent variable y0 is given by the econometric model:

y0 = β1 + β2x0 + e0

Where e0 is a random error. This random error has:
1. Mean: E(e0) = 0
2. Variance: var(e0) = σ²
3. Covariance: cov(e0, eᵢ) = 0

The least squares predictor (or estimator) of y0 (given x0) is:

ŷ0 = b1 + b2x0

To evaluate how well this predictor performs, we define the forecast error, which is analogous to the least squares residual:

f = y0 − ŷ0 = (β1 + β2x0 + e0) − (b1 + b2x0)

Now, if we apply assumptions SR1 to SR5:

E(f) = E(y0 − ŷ0) = (β1 − E(b1)) + (β2 − E(b2))x0 + E(e0) = 0

since E(b1) = β1, E(b2) = β2 and E(e0) = 0.
                    2 2 0 2 0 ) ( ) ( 1 1 ) ˆ var( ) var( x x x x N y y f i  If SR6holds, or the sample size is large enough, then the prediction error is normally distributed. Note that, the further x0 is from the sample mean, the larger the variance of the prediction error.  This means that as you extrapolate more and more your predictions will be less accurate. Note the variance of the forecast error is smaller when: i) The overall uncertainty in the model is smaller, as measured by the variance of the random errors σ2 ii) The sample size N is larger iii) The variation in the explanatory variable is larger iv) The value of x0 from x is smaller The forecast error variance is estimated by replacing σ2 with its estimator: ) ( r̂ va ) ( ˆ ˆ ) ( ˆ ) ( ˆ ˆ ) ( ) ( ˆ ˆ ˆ ) ( ) ( 1 1 ˆ ) ˆ var( 2 2 0 2 2 2 2 2 0 2 2 2 2 0 2 2 2 2 2 0 2 b x x N x x x x N x x x x N x x x x N f i i i                                        i i x b b y 2 1 ˆ   i ŷ x x2 x1 Obviously: The estimate that the estimator or predictor gives at x1 will be close to the actual value as there are lots of data points that the regression is based on round x1 – it is close to the sample mean. At x2, there are no points very close that the regression was based on, so the prediction will be less accurate aka will have a larger variance. i.e. We can do a better job of predicting in the region where we have more sample information. The standard error of the forecast: ) ( r̂ va ) ( f f se  Hence, we can construct a (1-α)x100% prediction interval for y0: ) ( ˆ0 0 f se t y y crit   Downloaded by Lamin Dampha (ldampha@utg.edu.gm) lOMoARcPSD|2941205
Example: Calculate a 95% prediction interval for y when x0 = 20:

ŷ0 ± t_crit · se(f)

Step 1: Linear equation
From the output we can determine the linear regression:

ŷ = b1 + b2x = 83.416 + 10.21x
(se)   (43.41)  (2.093)

Therefore, when x0 = 20:

ŷ0 = 83.416 + 10.21(20) = 287.616

Step 2: Determine se(f)

se(f) = √v̂ar(f)

v̂ar(f) = σ̂² [ 1 + 1/N + (x0 − x̄)² / Σ(xᵢ − x̄)² ]
       = σ̂² + σ̂²/N + (x0 − x̄)² · v̂ar(b2)
       = 89.517² + 89.517²/40 + (20 − 19.605)² × (2.0932)²
       = 8214.34

Here 89.517 is the standard error of the regression (σ̂), N = 40 is the sample size, x0 = 20 is the x-value, x̄ = 19.605 is the mean of x, and se(b2) = 2.0932. Note: v̂ar(b2) = se(b2)².
Step 3: Prediction interval

ŷ0 ± t_crit · se(f) = 287.616 ± t(0.975, 40 − 2) × √8214.34
                    = 287.616 ± 2.024 × 90.633

104.17 ≤ y0 ≤ 471.06

Therefore we can say with 95% confidence that the true expenditure on food will be between $104.17 and $471.06.
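A hedged sketch of the forecast-interval arithmetic above, plugging in the summary numbers quoted in the example (σ̂ = 89.517, N = 40, x0 = 20, x̄ = 19.605, se(b2) = 2.0932):

```python
import numpy as np
from scipy import stats

sigma_hat, N, x0, xbar, se_b2 = 89.517, 40, 20.0, 19.605, 2.0932
y0_hat = 83.416 + 10.21 * x0                        # point prediction, 287.616

var_f = sigma_hat**2 + sigma_hat**2 / N + (x0 - xbar)**2 * se_b2**2
se_f = np.sqrt(var_f)                               # roughly 90.6
t_crit = stats.t.ppf(0.975, N - 2)                  # 2.024
print(y0_hat - t_crit * se_f, y0_hat + t_crit * se_f)  # about 104.2 to 471.1
```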
Transforming x to obtain se(f):

A simple way to obtain the prediction and prediction interval estimates with EViews (or any other econometrics package, including Excel) is as follows:
1. Transform the independent variable x by subtracting x0 from each of the values. Generate a new variable: Genr → x2 = x − x0
2. Then estimate the regression model by running a regression analysis
3. The estimated standard error of the forecast is given by: se(f) = √( v̂ar(b1) + σ̂² ), where b1 is the intercept of the transformed regression

Example: [EViews output not reproduced]

The transformation has the following effect: the intercept of the transformed regression estimates E(y0), so its estimated variance is the estimated variance of ŷ0. [EViews output not reproduced]

Measuring Goodness-of-Fit:

There are two major reasons for analysing the model y = β1 + β2x + e:
1. To explain how the dependent variable (yᵢ) changes as the independent variable (xᵢ) changes
2. To predict y0 given an x0

These two objectives come under the broad headings of estimation and prediction. Closely allied with the prediction problem discussed in the previous section is the desire to use xᵢ to explain as much of the variation in the dependent variable yᵢ as possible, via the fitted line ŷᵢ = b1 + b2xᵢ.
SST = total sum of squares – a measure of the total variation in the dependent variable about its sample mean
SSR = regression sum of squares – the part of the total variation that is explained by the regression
SSE = sum of squared errors – the part of the total variation that is unexplained

The decomposition for a single observation is:

yᵢ − ȳ = (ŷᵢ − ȳ) + êᵢ
(total) = (explained component) + (unexplained component)

Coefficient of determination, R²:

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the regression model:

R² = SSR/SST = 1 − SSE/SST,   0 ≤ R² ≤ 1

If R² = 1, the data fall exactly on the fitted least squares regression line and we have a perfect fit. If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is horizontal, so SSR = 0 and R² = 0.

For a simple regression model, R² can also be computed as the square of the correlation coefficient between yᵢ and ŷᵢ.

- R² = 1: all the sample data fall exactly on the fitted least squares line, SSE = 0
- R² = 0: the sample data for y and x are uncorrelated; the least squares fitted line is horizontal and equal to the mean of y, so that SSR = 0

Note:
1. R² is a descriptive measure
2. By itself, it does NOT measure the quality of the regression model
3. It is NOT the objective of regression analysis to find the model with the highest R²
4. By adding more variables, R² will automatically increase even if the variables have no economic justification. This is why we use the adjusted R² in multiple regression analysis (we will expand on this when we study multiple regression).
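A minimal sketch of the SST = SSR + SSE decomposition and the two equivalent R² formulas (hypothetical data; the decomposition holds exactly for a least squares fit that includes an intercept):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# fit y = b1 + b2*x by least squares
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)        # total variation
sse = np.sum((y - y_hat) ** 2)           # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)    # explained variation

print(ssr + sse, sst)                    # SST = SSR + SSE
print(ssr / sst, 1 - sse / sst)          # two equivalent R-squared formulas
```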
The adjusted R² is:

R̄² = 1 − [SSE/(N − K)] / [SST/(N − 1)]

Example: For the same data as before: [output not reproduced]

The Effects of Scaling the Data:

The data we obtain are not always in a convenient form for presentation in a table or use in a regression analysis. When the scale of the data is not convenient, it can be altered without changing any of the real underlying relationships between variables.

If we scale x by 1/c:
y = β1 + β2x + e  becomes  y = β1 + (cβ2)(x/c) + e

If we scale y by 1/c:
y = β1 + β2x + e  becomes  y/c = β1/c + (β2/c)x + e/c

Example: if income x is measured in dollars, with b2 = 10.21 and x = 200, then after rescaling income into $100 units we have x = 2 and b2 = 1021. The underlying model is unchanged.

When the scale of x is altered, the standard error of the regression coefficient changes by the same multiplicative factor as the coefficient, so that their ratio, the t-statistic, is unaffected. All other regression statistics are unchanged. When y is rescaled, the error term is scaled too, so the least squares residuals will also be scaled. This will affect the standard errors of the regression coefficients, but will not affect t-statistics or R².

Choosing a Functional Form:

So far we have assumed that mean household food expenditure is a linear function of household income. That is, we assumed the underlying economic relationship to be E(y) = β1 + β2x, which implies that there is a linear, straight-line relationship between E(y) and x.
In the real world this might not be the case; linearity was only assumed to make the analysis easier. The starting point in all econometric analysis is economic theory. What does economics really say about the relation between food expenditure and income, holding all else constant? We expect there to be a positive relationship between these variables because food is a normal good. But nothing says the relationship must be a straight line. In fact, we do not expect that as household income rises, food expenditure will continue to rise indefinitely at the same constant rate. Instead, as income rises, we expect food expenditure to rise at a decreasing rate – the law of diminishing returns.

The term linear in "linear regression model":
1. Does not mean a linear relationship between the economic variables.
2. Does mean that the model is "linear in the parameters" (the βk values must not be raised to powers or multiplied by other parameters, etc.) but not, necessarily, "linear in the variables" (e.g. x can appear as x², x³, etc.).

Linear in parameters: the parameters are not multiplied together, divided, squared, cubed, etc.:

f(x) = β0 + β1x1 + ... + βkxk

1. each explanatory variable in the function is multiplied by an unknown parameter,
2. there is at most one unknown parameter with no corresponding explanatory variable, and
3. all of the individual terms are summed to produce the final function value.

Examples of models that are non-linear in the parameters are:

f(x) = β0 + β0β1x   or   f(x) = β0 x^β1

These are non-linear because the slope is expressed as a product of two parameters. As a result, nonlinear least squares regression must be used to fit such models; linear least squares cannot be used.

Because of this fact, the simple linear regression model is much more flexible than it appears at first glance. By transforming the variables y and x, we can represent many curved, nonlinear relationships and still use the linear regression model. Choosing an algebraic form for the relationship means choosing transformations of the original variables. The slopes of these functions can be determined by taking derivatives.

Note: the most important implication of transforming variables is that the interpretation of the regression results changes. Both the slope and the elasticity change from the linear relationship case.

Some common function types are: [table of functional forms not reproduced]
A Practical Approach:
1. Plotting the data and choosing economically-plausible models
2. Testing hypotheses concerning the parameters
3. Performing residual analysis
4. Assessing forecasting performance
5. Measuring goodness-of-fit (R²)
6. Using the principle of parsimony – the simplest model

Example on Food Expenditure:

1. Plotting data: [scatter plots not reproduced]
2. Testing hypotheses:

All slope coefficients are significantly different from zero at the 5% level of significance.

3. Performing residual analysis: testing for normally distributed errors

The k-th moment (a term borrowed from physics) of the random variable e is:

μₖ = E(e − μ)ᵏ

where μ denotes the mean of e. Measures of spread, symmetry and "peakedness" are:

Variance: σ² = μ₂
Skewness: S = μ₃/σ³
Kurtosis: K = μ₄/σ⁴ – whether the tails are thicker or thinner than those of a normal distribution

If e is normally distributed then S = 0 and K = 3. Formalising this is the Jarque-Bera test.

The Jarque-Bera Test:

The Jarque-Bera test is a test of how far measures of residual skewness and kurtosis are from 0 and 3 (normality). To test the null hypothesis of normality of the errors, we use the test statistic:

JB = (N/6) [ S² + (K − 3)²/4 ]

Where:
N = sample size
S = skewness
K = kurtosis

When the null hypothesis is true, the Jarque-Bera statistic JB has a χ² distribution with 2 degrees of freedom.

Step 1: State the hypotheses:
H0: the errors are normally distributed
H1: the errors are not normally distributed

Step 2: Decision rule:
Reject H0 if JB > χ²(0.95, 2) = 5.991

Step 3: Calculate the test statistic:

JB = (N/6) [ S² + (K − 3)²/4 ] = (40/6) [ (−0.097)² + (2.99 − 3)²/4 ] = 0.063

Step 4: Compare and decision
0.063 < 5.991, therefore do not reject H0.

Step 5: Conclusion
There is insufficient evidence to conclude that the errors are not normally distributed at the 5% level of significance.
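A sketch of the Jarque-Bera arithmetic with the example's figures; note that scipy also provides stats.jarque_bera for computing the statistic directly from a residual series:

```python
from scipy import stats

N, S, K = 40, -0.097, 2.99
JB = (N / 6) * (S**2 + (K - 3) ** 2 / 4)      # 0.063
chi2_crit = stats.chi2.ppf(0.95, df=2)        # 5.991
print(JB, JB > chi2_crit)                     # 0.063, False -> do not reject H0
```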
4. Assessing forecasting performance: [output not reproduced]

5. Measuring goodness-of-fit with different dependent variables:

The R² from a linear model measures how well the linear model explains the variation in y, while the R² from a log-linear model measures how well that model explains the variation in ln(y). The two measures should NOT be compared.

To compare goodness-of-fit in models with different dependent variables, we can compute the generalised R²:

R²_g = [corr(y, ŷ)]² = r²(y, ŷ)

We cannot compare the ordinary R² values, as each model has a different dependent variable.
6. Using the principle of parsimony – use the simplest model

The principle of parsimony states that you should use the simplest model if two models appear to be of equal forecasting ability.
Econometrics: ECON2300 – Lecture 4

Multiple Regression A:

The simple regression model we have studied so far relates the dependent variable y to only ONE explanatory variable x. When we turn an economic model with more than one explanatory variable into its corresponding statistical model, we refer to it as a multiple regression model.

Changes and extensions from the simple regression model:

1. Interpretation of the β parameters:

The population regression line is:

E(yᵢ | xᵢ2, ..., xᵢK) = β1 + β2xᵢ2 + ... + βKxᵢK

The k-th slope coefficient measures the effect of a change in the variable xk upon the expected value of y, all other variables held constant. Mathematically:

βk = ΔE(yᵢ | xᵢ2, ..., xᵢK) / Δxᵢk, all other x's held constant

Note: the x's start at 2, as the subscript 1 refers to the intercept term (which has no slope variable).

2. The assumptions concerning the characteristics of the explanatory (x) variables

The assumptions of the multiple regression model are:

MR1: yᵢ = β1 + β2xᵢ2 + ... + βKxᵢK + eᵢ, where i = 1, ..., N
- The model is linear in the parameters but may be non-linear in the variables

MR2: E(yᵢ) = β1 + β2xᵢ2 + ... + βKxᵢK, which is synonymous with E(eᵢ) = 0
- The expected (average) value of yᵢ depends on the values of the explanatory variables and the unknown parameters

MR3: var(yᵢ) = var(eᵢ) = σ² – the error terms are homoskedastic (have constant variance)

MR4: cov(yᵢ, yⱼ) = cov(eᵢ, eⱼ) = 0 – there is no serial correlation
MR5: The values of each xᵢk are not random and are not exact linear functions of the other explanatory variables

MR6: (optional) yᵢ ~ N[(β1 + β2xᵢ2 + ... + βKxᵢK), σ²], which is equivalent to eᵢ ~ N(0, σ²)

3. The degrees of freedom for the t-distribution

We will go into further detail on this later in the summary.

Least Squares Estimation:

The fitted regression line for the multiple regression model is:

ŷᵢ = b1 + b2xᵢ2 + ... + bKxᵢK

The least squares residual is:

êᵢ = yᵢ − ŷᵢ = yᵢ − b1 − b2xᵢ2 − ... − bKxᵢK

Similarly to simple linear regression, the estimates of the unknown parameters β1, ..., βK are obtained by minimising the residual sum of squares:

Σᵢ êᵢ² = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − b1 − b2xᵢ2 − ... − bKxᵢK)²

Solving the first-order conditions for a minimum yields messy expressions for the ordinary least squares estimators, even when K is small. In practice we use matrix algebra to solve these systems:
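A minimal sketch of that matrix solution, b = (XᵀX)⁻¹Xᵀy, with hypothetical data (the variable names and values here are ours, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 75
x2 = rng.uniform(4, 7, N)                    # e.g. price
x3 = rng.uniform(0.5, 3, N)                  # e.g. advertising
y = 119 - 7.9 * x2 + 1.9 * x3 + rng.normal(0, 5, N)

X = np.column_stack([np.ones(N), x2, x3])    # design matrix with intercept
b = np.linalg.solve(X.T @ X, X.T @ y)        # OLS estimates b1, b2, b3

e_hat = y - X @ b
sigma2_hat = e_hat @ e_hat / (N - X.shape[1])   # sigma-hat^2, df = N - K
cov_b = sigma2_hat * np.linalg.inv(X.T @ X)     # covariance matrix of b
print(b, np.sqrt(np.diag(cov_b)))               # estimates and standard errors
```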
To understand graphically what a multiple regression model embodies, look at the image below: [3-D plot not reproduced]

The equation forms a surface, or plane, which describes the position of the dependent variable.

Example: [regression output not reproduced]
The model is given by:

Ŝ = 118.9136 − 7.907854(PRICE) + 1.862583(ADVERT)
(se)  (6.352)    (1.096)           (0.683)

Interpretation of the coefficients:
b2: sales are expected to fall by 7908 units when the price increases by $1, holding the amount of advertising constant.
b3: sales are expected to increase by 1863 units when advertising increases by $1, holding the price constant.

Properties of the OLS Estimators (OLS = Ordinary Least Squares):

The Gauss-Markov theorem says: if MR1 to MR5 are correct, the OLS estimators b1, ..., bK have the smallest variance of all linear and unbiased estimators of β1, ..., βK – they are the Best Linear Unbiased Estimators (BLUE).

Remember that the Gauss-Markov theorem does not depend on the assumption of normality (MR6). However, if MR6 does hold, then the OLS estimators are also normally distributed.

Again, with larger values of K, the formulas for the variances of the OLS estimators are messy. For example, when K = 3, we can show that:

var(b2) = σ² / [ (1 − r²₂₃) Σ(xᵢ2 − x̄2)² ]

where r23 is the sample correlation coefficient between x2 and x3, with −1 < r23 < 1.
The variances and covariances are often presented in the form of a covariance matrix. For K = 3, this matrix contains var(b1), var(b2) and var(b3) on the diagonal and the covariances between each pair of estimators off the diagonal. [matrix not reproduced]

In practice, however, the population error variance σ² is unknown, so instead we use an unbiased estimator of it:

σ̂² = Σᵢ êᵢ² / (N − K) = Σᵢ (yᵢ − ŷᵢ)² / (N − K)

The estimated variances and covariances of the OLS estimators are obtained by replacing σ² with σ̂² in the appropriate formulas. The square roots of the estimated variances are still known as standard errors.

It is important to understand the factors affecting the variance of bᵢ (i = 2, ..., K):
1. The larger σ², the larger the variances of the least squares estimators.
2. The larger the sample size, the smaller the variances.
3. More variation in an explanatory variable around its mean leads to a smaller variance of its least squares estimator.
4. The larger the correlation between the explanatory variables, the larger the variances of the least squares estimators. "Independent" variables ideally exhibit variation that is independent of the variation in the other explanatory variables.
5. Variation in one explanatory variable that is connected to variation in another explanatory variable is known as multicollinearity (see next week). E.g. a larger correlation between x2 and x3 leads to a larger variance of b2.

Inferences in the Multiple Regression Model:

If assumptions MR1 – MR6 hold, we can:
1. Construct confidence intervals for each of the K parameters
2. Conduct a significance test for each of the K parameters
3. Conduct a hypothesis test on any of the parameters or combinations of parameters

The approach is the one followed in weeks 2 and 3 for the parameters of the simple regression model.

1. Confidence interval:

A 100(1 − α)% confidence interval for βk is given by:

bk ± t_crit · se(bk),  for k = 1, ..., K

Where:
K = the number of βᵢ parameters (e.g. for ŷᵢ = b1 + b2xᵢ2 + b3xᵢ3, K = 3)
t_crit = t(1 − α/2, N − K)
se(bk) = the standard error of bk given in the regression output

Example: construct a 95% confidence interval for the coefficient of advertising for the following model, which was based on N = 75 observations on hamburger sales.

Ŝ = 118.9136 − 7.907854(PRICE) + 1.862583(ADVERT)
(se)  (6.352)    (1.096)           (0.683)

Solution:

b3 ± t(1 − α/2, N − K) · se(b3) = 1.863 ± t(0.975, 72) × 0.683 = 1.863 ± 1.993(0.683)

0.502 ≤ β3 ≤ 3.224

We can say with 95% confidence that the true change in sales for a one-dollar increase in advertising is between $502 and $3224.

2. Hypothesis Testing

2.1. A simple null hypothesis is a null hypothesis with a single restriction on one or more parameters. Under MR1 to MR6, we can test the null hypothesis H0: βk = c using the t-statistic:

t = (bk − c) / se(bk) ~ t(N − K)

Even if MR6 doesn't hold, the test is still valid provided the sample size is large.

Example: Test whether sales revenue is related to price at the 5% level of significance when N = 75.

Ŝ = 118.9136 − 7.907854(PRICE) + 1.862583(ADVERT)
(se)  (6.352)    (1.096)           (0.683)

Solution:
Step 1: State hypotheses
H0: β2 = 0
H1: β2 ≠ 0

Step 2: Decision rule
Reject H0 if |t_calc| > t_crit, where t_crit = t(1 − α/2, N − K) = t(0.975, 72) = 1.993

Step 3: Calculate test statistic

t_calc = (b2 − β2) / se(b2) = (−7.908 − 0) / 1.096 = −7.215

Step 4: Compare and decision
|−7.215| > 1.993, therefore reject H0

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that price does not have no effect on revenue, i.e. we can conclude at the 5% level of significance that price has an effect on revenue.

2.2. Testing a null hypothesis consisting of two or more hypotheses about the parameters in the multiple regression model: F-tests

F-tests are used for:
1. The overall significance of the model
2. Testing economic hypotheses involving more than one parameter in the model
3. Misspecification tests
4. Testing for heteroskedasticity
5. Testing for serial correlation

Note: we adopt assumptions MR1–MR6 (i.e. including normality). If the errors are not normal, then the results presented will hold approximately if the sample is large.
A Familiar Form of the F-test:

From ECON1320 we saw that we could express F as:

F = [SSR/(K − 1)] / [SSE/(N − K)] = [(SST − SSE)/(K − 1)] / [SSE/(N − K)]

However, this is just a particular case of a more general F-statistic that can be used to test sets of joint hypotheses.

The general F-test:

A joint null hypothesis is a null hypothesis with two or more restrictions on two or more parameters. Under MR1 to MR6, we can test a joint null hypothesis using the F-statistic:

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] ~ F(J, N − K)

Where:
J = the number of restrictions in H0
SSE_U = the unrestricted sum of squared errors from the original, unrestricted multiple regression model
SSE_R = the restricted sum of squared errors from a regression model in which the null hypothesis is assumed to be true

Note: even if MR6 doesn't hold, the test is still valid provided the sample size is large (by the central limit theorem).

The general F-test can be used to test three types of hypotheses:
1. H0: βk = 0 against H1: βk ≠ 0; here the F-test is equivalent to a t-test (J = 1)
2. H0: β2 = β3 = ... = βK = 0 against H1: at least one βk ≠ 0 (J = K − 1)
3. Whether some combination of parameters is collectively significant to the model (1 ≤ J < K)

Restrictions: When we have a restriction, we assume that the null hypothesis is true; for example, if the null hypothesis sets a parameter to 0, then we set that βk to 0 in the regression equation. Instead of using the least squares estimates that minimise the sum of squared errors, we find estimates that minimise the sum of squared errors subject to the parameter constraints – the restrictions. This means that the sum of squared errors will increase: a constrained minimum is larger than an unconstrained minimum.

The idea behind the F-test is that if the two sums of squared errors are significantly different, then the assumption imposed on the parameters by the null hypothesis has significantly reduced the ability of the model to fit the data, and thus the data do not support the null hypothesis. On the other hand, if the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters – we would expect little change in the sum of squared errors when the null hypothesis is true.
1. Testing with one restriction (J = 1)

Example: Test whether sales revenue is related to price at the 5% level of significance when N = 75.

Ŝ = 118.9136 − 7.907854(PRICE) + 1.862583(ADVERT)
(se)  (6.352)    (1.096)           (0.683)

Solution:

Step 1: State hypotheses and apply the restriction
H0: β2 = 0
H1: β2 ≠ 0

Now impose the restriction, assuming the null is correct (i.e. price is not significant and β2 = 0), and re-estimate the regression:

Ŝ = 74.180 + 1.733(ADVERT)
(se) (1.80)   (0.890)

Step 2: Decision rule
Reject H0 if F_calc > F_crit, where F_crit = F(0.95, 1, 75 − 3) = 3.97

Step 3: Calculate test statistic

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] = [(2961.827 − 1718.943)/1] / [1718.943/(75 − 3)] = 52.06

Step 4: Compare and decision
52.06 > 3.97, therefore reject H0

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that price does not have no effect on revenue, i.e. we can conclude at the 5% level of significance that price has an effect on revenue.

The t-test and F-test – a relationship:

When conducting a two-tail test for a single parameter, either a t-test or an F-test can be used and the outcomes will be identical. In fact, the square of a t random variable with df degrees of freedom is an F random variable with distribution F(1, df):

F-statistic = (t-statistic)²:  52.06 = (−7.215)²
F-crit = (t-crit)²:  3.97 = (1.993)²
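A sketch of the general F-statistic computation, assuming only the SSE values quoted in the example:

```python
from scipy import stats

sse_r, sse_u, J, N, K = 2961.827, 1718.943, 1, 75, 3
F = ((sse_r - sse_u) / J) / (sse_u / (N - K))   # 52.06
F_crit = stats.f.ppf(0.95, J, N - K)            # 3.97
p_value = 1 - stats.f.cdf(F, J, N - K)
print(F, F > F_crit, p_value)                   # True -> reject H0
```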
2. Testing with J = K − 1 restrictions: the overall significance of the model

An important application of the F-test is what is called "testing the overall significance of a model". Consider the general multiple regression model with (K − 1) explanatory variables and K unknown coefficients:

Unrestricted model: yᵢ = β1 + β2xᵢ2 + β3xᵢ3 + ... + βKxᵢK + eᵢ

To examine whether we have a viable explanatory model, we set up the following null and alternative hypotheses.

Restricted model: yᵢ = β1 + eᵢ, so that SSE_R = SST_U

Step 1: State hypotheses and estimate the restricted model
H0: β2 = 0, β3 = 0, ..., βK = 0
H1: at least one of the βk is nonzero

Note: the null contains K − 1 hypotheses, so it is referred to as a joint hypothesis.

Estimated restricted model:

Ŝ = 77.375
(se) (0.749)

SSE_R = 3115.482 (= SST_U)

Step 2: Decision rule
Reject H0 if F_calc > F_crit, where F_crit = F(0.95, 3 − 1, 75 − 3) = 3.12

Step 3: Calculate test statistic

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] = [(3115.482 − 1718.943)/2] / [1718.943/(75 − 3)] = 29.248

Step 4: Compare and decision
29.248 > 3.12, therefore reject H0.

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that at least one of the explanatory variables has an effect on sales.

3. Testing a group of parameters (1 ≤ J < K)

Consider the model: [output not reproduced; the model extends the sales equation so that advertising enters through two terms, with coefficients β3 and β4]
Does advertising have an effect on sales?

Step 1: State hypotheses
H0: β3 = 0 and β4 = 0
H1: β3 ≠ 0 or β4 ≠ 0 (or both are nonzero)

Step 2: Decision rule
Reject H0 if F_calc > F_crit, where F_crit = F(0.95, 2, 75 − 4) = 3.126

Step 3: Calculate test statistic

F = [(SSE_R − SSE_U)/J] / [SSE_U/(N − K)] = [(1896.391 − 1532.084)/2] / [1532.084/(75 − 4)] = 8.44

Step 4: Compare and decision
8.44 > 3.126, therefore reject H0

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that advertising has a statistically significant effect on sales.
Prediction:

We predict the value of y when the explanatory variables take the values x02, ..., x0K. The prediction error (or forecast error) is f = y0 − ŷ0. The prediction error is a random variable with a mean and a variance. If assumptions MR1 to MR5 hold, then:

E(f) = E(y0 − ŷ0) = 0 and var(f) = var(y0 − ŷ0), an expression with many terms, each involving σ²

The prediction error variance is estimated by replacing σ² with σ̂². The square root of the estimated forecast error variance is still called the standard error of the forecast. If assumption MR6 (normality) is correct, or the sample size is large, then a 100(1 − α)% prediction interval for y0 is:

ŷ0 ± t_c · se(f), where t_c = t(1 − α/2, N − K)

Example: Construct a 95% prediction interval for y0 when PRICE = 5.50 and advertising expenditure is $1200 (ADVERT = 1.2 in $1000 units):

ŷ0 = 118.91 − 7.91(5.50) + 1.863(1.2) = 77.66

To obtain se(f), use the transformation trick from Lecture 3: se(f) = (v̂ar(b1*) + σ̂²)^(1/2), where b1* is the intercept of the transformed regression. Therefore create two new variables: P* = P − 5.50 and A* = A − 1.2.
y0 = 77.66 ± 1.993 × 4.9429

67.809 ≤ y0 ≤ 87.5112

We can therefore say with 95% confidence that when the price is $5.50 and advertising expenditure is $1200, the true value of sales lies between 67.809 thousand and 87.5112 thousand.

A reminder: estimated regression models describe the relationship between the economic variables for values similar to those found in the sample data. Extrapolating the results to extreme values is generally not a good idea. Predicting the value of the dependent variable for values of the explanatory variables far from the sample values invites disaster.

Goodness of Fit:

If the regression model contains an intercept, we can still decompose the variation in the dependent variable (SST) into its explainable and unexplainable components (SSR and SSE). The coefficient of determination still measures the proportion of the variation in the dependent variable that is explained by the regression model:

R² = SSR/SST = 1 − SSE/SST

The interpretation of R² is identical to its interpretation in the simple regression model, i.e. R² × 100% of the variation can be explained by the estimated equation (1 implies a perfect fit).

Adjusted R²:

A problem with R² is that it can be made large by adding more and more variables to the model, even when they have no economic justification. The adjusted R-squared imposes a penalty for adding more variables:

R̄² = 1 − [SSE/(N − K)] / [SST/(N − 1)]

Adjusted R-squared does not give the proportion of variation in the dependent variable that is explained by the model. It should not be used as a criterion for adding or deleting variables (if we add a variable, adjusted R-squared will increase whenever the t-statistic on the new variable is greater than 1 in absolute value!).
SST = (N − 1) × (sample standard deviation of the dependent variable)²
Econometrics: ECON2300 – Lecture 5

Multiple Regression B:

Non-sample information:

In many estimation problems, economic theory and experience provide us with information on the parameters over and above the information contained in the sample data. If this non-sample information is correct, and if we can combine it with the sample information, then we can estimate the parameters with greater precision.

Some non-sample information can be written in the form of linear equality restrictions on the unknown parameters (e.g. several parameters sum to one). We can incorporate this information into the estimation process by simply substituting the restrictions into the model.

One example is a firm which has constant returns to scale – take for example the Cobb-Douglas function, whose parameters α and β must sum to 1 under constant returns to scale:

y_t = A K_t^α L_t^β

We can show that when K and L both increase by the proportion λ, output y also increases by the proportion λ under constant returns to scale:

A (λK_t)^α (λL_t)^β = λ^(α+β) A K_t^α L_t^β = λ^(α+β) y_t = λ y_t  when α + β = 1

In order to incorporate the non-sample information and impose constant returns to scale, we should then estimate the model:

y_t = A K_t^α L_t^(1−α)

The model is now a function of a single unknown parameter, α. The technique for obtaining an estimate of α in this case is known as restricted least squares – we "force" β = 1 − α.

To estimate the above model in practice, we can use the least squares method, as the model is linear in its parameters once we convert it to a log-log function:
ln(y_t) = ln(A) + α ln(K_t) + (1 − α) ln(L_t) + e_t

To ensure the restriction holds, we rearrange and collect terms:

ln(y_t / L_t) = ln(A) + α ln(K_t / L_t) + e_t

The Restricted Least Squares Estimator:

The least squares estimates we obtain after imposing the restrictions are known as restricted least squares (RLS) estimates. The RLS estimator:
- Is biased unless the restrictions are EXACTLY true
- Has a smaller variance than the OLS (ordinary least squares) estimator, whether or not the restrictions are true

By incorporating the additional information with the data, we usually give up unbiasedness in return for reduced variances. Evidence on whether the restrictions are true can, of course, be obtained using an F-test (Wald test).

Model Specification:

There are several key questions you should ask yourself when specifying a model:

Q1. What are the important considerations when choosing a model?
A1. The problem at hand and the economic model.

Q2. What are the consequences of choosing the wrong model?
A2. If the wrong model is used, there can be omitted or irrelevant variables in the model.

Q3. Are there ways of assessing whether a model is adequate?
A3. Yes, you can use model diagnostics – a test of adequate functional form.
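Returning briefly to the restricted least squares substitution above, here is a hedged sketch with hypothetical Cobb-Douglas data (the variable names are ours; the lecture itself works in EViews):

```python
import numpy as np

rng = np.random.default_rng(2)
K_in = rng.uniform(50, 200, 30)                  # capital input
L_in = rng.uniform(20, 100, 30)                  # labour input
y = 2.0 * K_in**0.3 * L_in**0.7 * np.exp(rng.normal(0, 0.05, 30))

# Restricted model: ln(y/L) = ln(A) + alpha * ln(K/L) + e
X = np.column_stack([np.ones(30), np.log(K_in / L_in)])
coef, *_ = np.linalg.lstsq(X, np.log(y / L_in), rcond=None)

lnA, alpha = coef
beta = 1 - alpha                                 # restriction: beta = 1 - alpha
print(np.exp(lnA), alpha, beta)                  # close to 2.0, 0.3, 0.7
```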
In examining these model specification issues, we will look at the following example.

Omitted variables:

It is possible that a chosen model may have important variables omitted. Our economic principles may have overlooked a variable, or lack of data may lead us to drop a variable even when it is prescribed by economic theory.

We will consider a sample of married couples where both husbands and wives work. This sample was used by labour economist Tom Mroz in a classic paper on female labour force participation. The variables from this sample are in edu_inc.dat. We are interested in the impact of the level of education – both the husband's education and the wife's education – on family income (FAMINC, the combined income of husband and wife). Summary statistics for the data appear in table 6.2.

The estimated relationship is: [output not reproduced]

We estimate that an additional year of education for the husband will increase annual income by $3132, and an additional year of education for the wife will increase income by $4523.

If we now incorrectly omit wife's education from the equation: [output not reproduced]

If we omit a relevant variable, the least squares estimator will generally be biased, although it will have a lower variance. Including irrelevant variables does not cause the least squares estimator to be biased; however, the variances, and therefore the standard errors, will be greater.
When we omit WEDU, we are led to overstate the effect of an extra year of education for the husband by about $2000. This change in the magnitude of a coefficient is typical of the effect of incorrectly omitting a relevant variable.

To write a general expression for this bias for the case where one explanatory variable is omitted from a model with two explanatory variables, we write the underlying model as:

yᵢ = β1 + β2xᵢ2 + β3xᵢ3 + eᵢ

Omitting x3 from the equation is equivalent to imposing the restriction β3 = 0. It can be viewed as imposing an incorrect constraint on the parameters. This of course has the implication of a reduced variance, but it causes biased coefficient estimators. We can show (in appendix 6B) that for the new estimator b2* of β2:

bias(b2*) = E(b2*) − β2 = β3 · cov(x2, x3) / var(x2)

Omission of a relevant variable leads to omitted variable bias. The bias increases with the correlation between the included variable and the omitted relevant variable. Note: if cov(x2, x3) = 0 or if β3 = 0, then the bias will be 0, i.e. b2* will be unbiased.

We can include further variables, for instance KL6 – the number of children under the age of 6. The larger the number of young children, the fewer the hours likely to be worked, and hence a lower family income would be expected.

FAMÎNC = 7755 + 3211(HEDU) + 4777(WEDU) − 14311(KL6)
(se)      (11163) (796)       (1061)       (5004)
(p-value) (0.488) (0.000)     (0.000)      (0.004)

Notice that, compared to the original estimated equation, the coefficients on HEDU and WEDU have not changed considerably. This outcome occurs because KL6 is not highly correlated with the education variables: corr(KL6, HEDU) = 0.105 and corr(KL6, WEDU) = 0.129. From a general modelling perspective, it means that useful results can still be obtained when a relevant variable is omitted, if that variable is uncorrelated with the included variables and our interest is in the coefficients of the included variables.

Irrelevant Variables:

The consequences of omitting relevant variables may lead you to think that a good strategy is to include as many variables as possible in your model. However, this will:
1. Complicate your model
2. Inflate the variances of your estimates

To examine this, we will add two artificially generated variables, X5 and X6. These variables were constructed so that they are correlated with HEDU and WEDU, but are not expected to influence family income.

FAMÎNC = 7759 + 3340(HEDU) + 5869(WEDU) − 14200(KL6) + 889(X5) − 1067(X6)
(se)      (11195) (1250)      (2278)       (5044)       (2242)    (1982)
(p-value) (0.488) (0.000)     (0.000)      (0.004)      (0.692)   (0.591)

The first thing we notice is that the p-values for the two new coefficients are much greater than 0.05. They do indeed appear to be irrelevant variables. Also, the standard errors of the coefficients for all other variables have increased, with p-values increasing correspondingly. The inclusion of these irrelevant variables has reduced the precision of the estimated coefficients for the other variables in the equation. This result follows because, by the Gauss-Markov theorem, the least squares estimator of the correct model is the minimum variance linear unbiased estimator.

A Practical Approach:

We should choose a functional form that:
1. Is consistent with what economic theory tells us about the relationship between the variables
2. Is compatible with assumptions MR1 to MR5
3. Is flexible enough to fit the data

In a multiple regression context, this mainly involves:
1. Hypothesis testing
2. Performing residual analysis
3. Assessing forecasting performance
4. Comparing information criteria
5. Using the principle of parsimony

Hypothesis Testing:

The usual t- and F-tests are available for testing simple and joint hypotheses concerning the coefficients. As usual, failure to reject a null hypothesis can occur because the data are not sufficiently rich to disprove the hypothesis. If a variable has an insignificant coefficient, it can either be (a) discarded because it is irrelevant, or (b) retained because there are strong theoretical reasons for including it.

The adequacy of a model can also be tested using a general specification test known as RESET.
Testing for Model Misspecification: RESET

RESET (Regression Specification Error Test) is designed to detect omitted variables and incorrect functional form.

Intuition (the Ramsey RESET test): if the chosen model and algebraic form are correct, then squared and cubed terms of the "fitted" or "predicted" values should not contain any explanatory power. If we can significantly improve the model by artificially including powers of the predictions of the model, then the original model must have been inadequate.

Hypotheses:
H0: The functional form is correct and there are no omitted variables (the extra terms are not statistically significant)
H1: The functional form is incorrect and/or there are omitted variables (the extra terms are statistically significant)

Suppose that we have specified and estimated the regression model:

yᵢ = β1 + β2xᵢ2 + β3xᵢ3 + eᵢ

The predicted or "fitted" values of yᵢ are:

ŷᵢ = b1 + b2xᵢ2 + b3xᵢ3

There are two alternative forms for the test:

Artificial model 1: yᵢ = β1 + β2xᵢ2 + β3xᵢ3 + γ1ŷᵢ² + eᵢ
Artificial model 2: yᵢ = β1 + β2xᵢ2 + β3xᵢ3 + γ1ŷᵢ² + γ2ŷᵢ³ + eᵢ

Example: the FAMINC model:

Step 1: State hypotheses
H0: γ = 0
H1: γ ≠ 0

Step 2: Decision rule
Reject H0 if p-value < α = 0.05

Step 3: Calculate test statistic
p-value of the F-statistic = 0.0440

Step 4: Compare and decision
0.0440 < 0.05, therefore reject H0

Step 5: Conclusion
There is sufficient evidence at the 5% level of significance to conclude that there are omitted variables or that the functional form is incorrect.
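A hedged sketch of the RESET idea (artificial model 1) on hypothetical data: fit the candidate model, add ŷ², and F-test the added term (this mirrors, but is not, the EViews output above):

```python
import numpy as np
from scipy import stats

def fit(X, y):
    """Return OLS coefficients and the sum of squared errors."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return b, r @ r

rng = np.random.default_rng(6)
N = 200
x = rng.uniform(1, 10, N)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, N)      # true relation is quadratic

X = np.column_stack([np.ones(N), x])          # (mis)specified linear model
b, sse_r = fit(X, y)
y_hat = X @ b

X_aug = np.column_stack([X, y_hat**2])        # artificial model 1
_, sse_u = fit(X_aug, y)

J, K = 1, X_aug.shape[1]
F = ((sse_r - sse_u) / J) / (sse_u / (N - K))
p = 1 - stats.f.cdf(F, J, N - K)
print(F, p)                                   # small p flags misspecification
```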
Selection of Models – Information Criteria:

Akaike Information Criterion (AIC):
- Often used in model selection among non-nested alternatives – smaller values of the AIC are preferred:

AIC = ln(SSE/N) + 2K/N

The Schwarz Criterion (SC):
- An alternative to the AIC that imposes a larger penalty for additional coefficients:

SC = ln(SSE/N) + K ln(N)/N

Adjusted R²:
- Penalises the addition of regressors which do not contribute to the explanatory power of the model. It is sometimes used to select regressors, although the AIC and SC are superior. It does not have the interpretation of R².
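A minimal sketch of the two criteria as functions of SSE, N and K; the illustrative numbers simply reuse the SSE values quoted earlier and are not a model-selection exercise from the notes:

```python
import numpy as np

def aic(sse, n, k):
    return np.log(sse / n) + 2 * k / n

def sc(sse, n, k):
    return np.log(sse / n) + k * np.log(n) / n

# e.g. comparing an unrestricted (K = 3) and a restricted (K = 2) model:
print(aic(1718.943, 75, 3), aic(2961.827, 75, 2))
print(sc(1718.943, 75, 3), sc(2961.827, 75, 2))
```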
Collinear Economic Variables:

When data are the result of an uncontrolled experiment, many of the economic variables may move together in systematic ways. Such variables are said to be collinear, and the problem is labelled collinearity, or multicollinearity when several variables are involved.

Collinearity: variables moving together in a linear way.

When there is collinearity, there is no guarantee that the data will be "rich in information", nor that it will be possible to isolate the economic relationship or the parameters of interest.

Consequences of collinearity:

1. One or more exact linear relationships among the explanatory variables: exact collinearity, or exact multicollinearity. The least squares estimator is not defined. In matrix form, the OLS estimator is:

b = (XᵀX)⁻¹ Xᵀy

From linear algebra, we know that a matrix whose rows and columns are not linearly independent does not have an inverse, so under exact collinearity XᵀX cannot be inverted and b cannot be calculated.

2. Nearly exact linear dependencies among the explanatory variables: some of the variances, standard errors and covariances of the least squares estimators may be large. For K = 3:

var(b2) = σ² / [ (1 − r²₂₃) Σ(xᵢ2 − x̄2)² ]

For perfect collinearity: r23 = −1 or 1, therefore (1 − r²₂₃) = 0.
For near-perfect collinearity: r23 ≈ −1 or 1, therefore (1 − r²₂₃) ≈ 0.

3. Large standard errors make the usual t-values small and lead to the conclusion that parameter estimates are not significantly different from 0, ALTHOUGH a high R² or F-value indicates "significant" explanatory power of the model as a whole:

t_calc = bᵢ / se(bᵢ) = a small value

In general we reject H0 (βᵢ = 0) only if |t_calc| > |t_crit|; with an inflated standard error we fail to reject, and would therefore conclude that βᵢ is 0.

4. Estimates may be very sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.

5. Despite the difficulties in isolating the effects of individual variables from such a sample, accurate forecasts may still be possible.
Example – Chinese Coal Production: [output not reproduced]

We can detect multicollinearity by:
- Computing sample correlation coefficients between variables. A common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of explanatory variables is greater than 0.8 or 0.9. This only looks at pairs of variables.
- Estimating auxiliary regressions (i.e. regressing each explanatory variable on all the others). Multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about 0.8. This looks at combinations of variables, e.g. x2 = 2x3 + 5x4.
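A sketch of both detection devices on hypothetical data – pairwise correlations and the auxiliary-regression R² (the quantity that underlies the variance inflation factor):

```python
import numpy as np

def aux_r2(X, j):
    """R^2 from regressing column j of X on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
x2 = rng.normal(size=100)
x3 = 0.95 * x2 + rng.normal(scale=0.1, size=100)   # nearly collinear with x2
X = np.column_stack([x2, x3, rng.normal(size=100)])

print(np.corrcoef(X, rowvar=False))                # pairwise correlations
print([aux_r2(X, j) for j in range(X.shape[1])])   # > 0.8 flags a problem
```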
Pair-wise correlations: [correlation matrix not reproduced]

Conclusion: the pairwise correlation between some of the inputs is extremely high, such as between ln(x2) and ln(x3).

Auxiliary regression on ln(x3): [output not reproduced]

Solution: a possible solution in this case is to use non-sample information:
1. Constant returns to scale
2. Variables 4, 5 and 6 are all statistically insignificant (= 0)

Conduct a Wald test of:

H0: Σᵢ₌₂⁷ βᵢ = 1, β4 = 0, β5 = 0, β6 = 0
Mitigating the Effects of Multicollinearity:

The collinearity problem occurs because the data do not contain enough information about the effects of the individual explanatory variables. We can bring more information into the estimation process by:
- Obtaining more, and better, data – not always possible in non-experimental contexts
- Introducing non-sample information into the estimation process in the form of restrictions on the parameters

Nonlinear Relationships:

Relationships between economic variables cannot always be adequately represented by straight lines. We saw in Week 4 that we can add more flexibility to a regression model by considering logarithmic, reciprocal, polynomial and various other nonlinear-in-the-variables functional forms – linear in the parameters, non-linear in the variables.

We can also use these types of functional forms in multiple regression models, and in multiple regression models we also use models that involve interaction terms. When using these types of models, some changes in model interpretation are required.
Example: [output not reproduced]
Introductory Econometrics: ECON2300 – Dummy Variable Models

The Use of Dummy Variables in Econometric Models:

Assumption MR1 in the multiple regression model is:

yᵢ = β1 + β2xᵢ2 + ... + βKxᵢK + eᵢ, for i = 1, ..., N

1. The statistical model we assume is appropriate for all N observations in our sample
2. The parameters of the model, βk, are the same for each and every observation
3. If this assumption does not hold, and the parameters are not the same for all the observations, then the meaning of the least squares estimates of the parameters is not clear

There are some economic problems or questions where we might expect the parameters to be different for different observations:
1. Everything else the same, is there a difference between male and female earnings?
2. Does studying econometrics make a difference in the starting salaries of graduates?
3. Does having a pool make a difference in a house's sale price in the Brisbane market?
4. Is there a difference in the demand for illicit drugs across race groups?

Dummy variables:
1. The simplest procedure for extending the multiple regression model to situations in which the regression parameters are different for some or all of the observations in a sample
2. Dummy variables are explanatory variables that take only two values, usually 0 and 1
3. These simple variables are a very powerful tool for capturing qualitative characteristics of individuals, such as gender, race and geographic region of residence

There are two main types of dummy variables:
1. Intercept dummy variables: coefficients denoted δ
2. Slope dummy variables: coefficients denoted γ

Intercept Dummy Variables:

Intercept dummy variables allow the intercept to change for a subset of observations in the sample. Models with intercept dummy variables take the form:

yᵢ = β1 + δDᵢ + β2xᵢ2 + ... + βKxᵢK + eᵢ

where Dᵢ = 1 if the i-th observation has a certain characteristic and Dᵢ = 0 otherwise:
E(yᵢ) = (β1 + δ) + β2xᵢ2 + ... + βKxᵢK  if Dᵢ = 1 (intercept: β1 + δ)
E(yᵢ) = β1 + β2xᵢ2 + ... + βKxᵢK        if Dᵢ = 0 (intercept: β1)

Note that the least squares estimator properties are not affected by the fact that one of the explanatory variables consists only of zeros and ones – D is treated like any other explanatory variable. We can construct an interval estimate for δ, or we can test the significance of its least squares estimate. Such a test is a statistical test of whether the effect is "statistically significant". If δ = 0, the characteristic has no effect on the variable in question.

Example: House prices

A model that allows the intercept to vary with the presence or absence of a particular characteristic.

Estimated equation:

Prîce = 29.68 + 5.69(Pool) + 8.60(Sqft)

In this model the value Pool = 0 defines the reference group (homes with no pool). Two equivalent models would be: [not reproduced]
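A minimal sketch of fitting an intercept-dummy model in the spirit of the house-price example (the data and coefficients here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
sqft = rng.uniform(10, 35, N)                        # size (hypothetical units)
pool = (rng.uniform(size=N) < 0.3).astype(float)     # dummy: 1 if a pool
price = 30 + 6 * pool + 8.5 * sqft + rng.normal(0, 5, N)

X = np.column_stack([np.ones(N), pool, sqft])
b, *_ = np.linalg.lstsq(X, price, rcond=None)
print(b)   # the pool coefficient shifts the intercept; the sqft slope is shared
```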
Log-Linear Models:

If a pool: ln(PRICE_pool) = β1 + δ + β2SQFT + e
If no pool: ln(PRICE_nopool) = β1 + β2SQFT + e

Then:

ln(PRICE_pool) − ln(PRICE_nopool) = δ

so

ln(PRICE_pool / PRICE_nopool) = δ
PRICE_pool / PRICE_nopool = e^δ

and

(PRICE_pool − PRICE_nopool) / PRICE_nopool = e^δ − 1

Thus, houses with pools are 100(e^δ − 1)% more expensive than houses without pools, all other things being equal.

Slope Dummy Variables:

Slope dummy variables allow the slope to change for a subset of observations in the sample. A model that allows β2 to vary across observations takes the form:

yᵢ = β1 + β2xᵢ2 + γ(Dᵢxᵢ2) + β3xᵢ3 + ... + βKxᵢK + eᵢ
E(y_i) = β1 + (β2 + γ) x_{i2} + β3 x_{i3} + ... + βK x_{iK}   if D_i = 1  (slope of x_{i2}: β2 + γ)
E(y_i) = β1 + β2 x_{i2} + β3 x_{i3} + ... + βK x_{iK}          if D_i = 0  (slope of x_{i2}: β2)

Slope and Intercept Dummy Variables Combined: [illustrated graphically in the original slides]

Testing for Qualitative Effects:
Dummy variables are frequently used to measure:
1. Interactions between qualitative factors (e.g. race and gender)
2. The effects of qualitative factors having more than two categories (e.g. level of schooling)

Example: WAGES
Explaining wages as a function of individual characteristics, using white males as the reference group:

WAGE = β1 + β2 EDUC + δ1 BLACK + δ2 FEMALE + γ (BLACK × FEMALE) + e

The interaction coefficient γ affects the wage only for workers who are both black and female.
To test the null hypothesis that neither race nor gender affects wages at the 1% level of significance, we test H0: δ1 = δ2 = γ = 0 with an F-test. [EViews output shown in the original slides]

Now: Explaining wages as a function of location, using workers in the northeast as the reference group:

WAGE = β1 + β2 EDUC + δ1 SOUTH + δ2 MIDWEST + δ3 WEST + e

Not significant at the 5% level of significance.
Again, not significant at the 5% level of significance. [EViews output shown in the original slides]
Testing the Equivalence of Two Regressions:
By including an intercept dummy variable and an interaction term for every variable in a regression model, we allow every coefficient in the model to differ based on the qualitative factor – we are specifying two regressions. A test of the equivalence of the two regressions is a test of the joint null hypothesis that all the dummy variable coefficients are zero. We can test this null hypothesis using a standard F-test. This particular F-test is known as a Chow test.

Explaining wage as a function of individual characteristics:

WAGE = β1 + β2 EDUC + δ1 BLACK + δ2 FEMALE + γ (BLACK × FEMALE) + e

To test whether there are differences between the wage regressions for the south and the rest of the country, we estimate a model that interacts SOUTH with every regressor. [Model shown in the original slides] The two regression equations are:
One equation if SOUTH = 1 and another if SOUTH = 0. [Both equations shown in the original slides; a sketch of the F-test computation follows.]
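A minimal sketch of a Chow test in Python, assuming simulated data: the restricted model pools all observations, the unrestricted model adds a full set of SOUTH interactions, and the F-statistic compares the two sums of squared residuals.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 300
educ = rng.uniform(8, 20, n)
south = rng.integers(0, 2, n)
wage = 2 + 1.1 * educ + 0.5 * south - 0.2 * south * educ + rng.normal(0, 2, n)

# Restricted model: the same coefficients for both regions
Xr = sm.add_constant(educ)
ssr_r = sm.OLS(wage, Xr).fit().ssr

# Unrestricted model: intercept and slope both shift with SOUTH
Xu = sm.add_constant(np.column_stack([educ, south, south * educ]))
ssr_u = sm.OLS(wage, Xu).fit().ssr

J = 2                                   # number of restrictions (SOUTH and SOUTH*EDUC)
df = n - Xu.shape[1]                    # residual degrees of freedom, unrestricted model
F = ((ssr_r - ssr_u) / J) / (ssr_u / df)
p = 1 - stats.f.cdf(F, J, df)
print(f"Chow F = {F:.3f}, p-value = {p:.4f}")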
A Chow test at the 10% level of significance: [EViews output shown in the original slides]

Controlling for Time:
Dummy variables are frequently used to control for:
- Seasonal effects
- Annual effects
- Regime effects (e.g. changes of government)

Example: Emergency room cases
Data on the number of emergency room cases per day are available in the file fullmoon.wk1. The model: [shown in the original slides]
Example – Stockton house prices: [EViews output shown in the original slides]
Example – Investment tax credits: [EViews output shown in the original slides]
ECONOMETRICS: ECON2300 – Lecture 7
Heteroskedasticity
If we were to guess food expenditure for a low-income household and for a high-income household, we would be more accurate for the low-income household: with a limited income that MUST largely be spent on food, it has less choice. A high-income household, by contrast, could have extravagant or simple food tastes – a large variance at high income levels. The result is heteroskedasticity.

How can we model this phenomenon? Note that assumption MR3 says that the errors have equal variance, or equal (homo) spread (skedasticity). An alternative and much more general assumption is:

var(e_i) = σ_i²

Heteroskedasticity is often encountered in cross-section studies, where different individuals may have very different characteristics. It is less common in time-series studies.

Properties of the OLS Estimator:
If the errors are heteroskedastic then:
- OLS is still a linear and unbiased estimator, but it is inefficient: it is no longer BLUE (Best Linear Unbiased Estimator).
- The variances of the OLS estimators are no longer given by the formulas we discussed in earlier lectures. Thus, confidence intervals and hypothesis tests based on those variances are no longer valid.

There are three alternative courses of action to deal with heteroskedasticity:
1. If in doubt, use least squares for the parameters together with a standard-errors formula that works either way (White robust standard errors).
2. If heteroskedasticity is known to be present, use Generalised Least Squares (Weighted Least Squares) – BLUE if the variance is known.
3. Test for heteroskedasticity (Goldfeld-Quandt test, White's general test, or Breusch-Pagan test):
   a. If present, use Feasible Generalised Least Squares (if the variance is unknown and must be estimated)
   b. If there is no evidence of it, use least squares, as it is BLUE

White's Approximate Estimator for the Variances of the Least Squares Estimator under Heteroskedasticity:
White's estimator:
a) Is strictly appropriate only in large samples
b) If the errors are homoskedastic, it converges to the least squares formula

The variances of the OLS estimators depend on σ_i² rather than σ². In the case of the simple linear model

y_i = β1 + β2 x_i + e_i

the variance of b2 is given by:

var(b2) = Σᵢ [ (x_i − x̄) / Σᵢ (x_i − x̄)² ]² σ_i² = Σᵢ w_i² σ_i²

If we replace σ_i² with ê_i² we obtain White's heteroskedasticity-consistent estimator. White's robust standard errors leave the coefficient estimates unchanged; only the standard errors differ from the (invalid) conventional ones.
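A minimal sketch of heteroskedasticity-robust (White) standard errors with statsmodels; HC1 is one common finite-sample variant, and the data are simulated for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
income = rng.uniform(10, 100, n)
# Error variance grows with income: classic heteroskedasticity
food = 40 + 0.1 * income + rng.normal(0, 0.02 * income)

X = sm.add_constant(income)
ols = sm.OLS(food, X).fit()                   # conventional (invalid) standard errors
robust = sm.OLS(food, X).fit(cov_type="HC1")  # White heteroskedasticity-consistent SEs
print(ols.bse, robust.bse)                    # same coefficients, different standard errors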
What would happen if we always computed the standard errors (and therefore the t-ratios) using White's formula instead of the traditional least squares formula? This is known as heteroskedasticity-robust inference, and it is used by many applied economists. Robust estimation is a "branch" of econometrics. When the true variance is homoskedastic and the sample is large, White's formula converges approximately to:

σ̂² = SSE / N

The Generalised Least Squares (Weighted Least Squares) Estimator:
1. Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator.
2. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors and then use least squares.
3. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model.

If σ_i² is known then we can weight the original data (including the constant term) and then perform OLS on the transformed model. The transformed model is:

y_i / σ_i = β1 (1/σ_i) + β2 (x_{i2}/σ_i) + ... + βK (x_{iK}/σ_i) + e_i/σ_i

or

y_i* = β1 x_{i1}* + β2 x_{i2}* + ... + βK x_{iK}* + e_i*

The transformed model satisfies all the assumptions of the multiple regression model (including homoskedasticity). Thus, applying OLS to the transformed model yields best linear unbiased estimates. The estimator is known as Generalised Least Squares (GLS) or Weighted Least Squares (WLS).

Sometimes σ_i² is known only up to a factor of proportionality. In this case we can still transform the original model in such a way that the transformed errors are homoskedastic. Two popular heteroskedastic specifications:

- σ_i² = σ² x_{ij}²: divide the model through by x_{ij}
- σ_i² = σ² x_{ij}: divide the model through by √x_{ij}

If our assumptions about the form of heteroskedasticity are incorrect, GLS loses these good properties: the estimator is no longer efficient and its reported standard errors are invalid.

For σ_i² = σ² x_t² (divide by x_t):

var(e_t*) = var(e_t / x_t) = (1/x_t²) var(e_t) = (1/x_t²) σ² x_t² = σ²

For σ_i² = σ² x_t (divide by √x_t):

var(e_t*) = var(e_t / √x_t) = (1/x_t) var(e_t) = (1/x_t) σ² x_t = σ²
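A minimal sketch of the weighting idea in statsmodels, assuming the variance is proportional to x², so the weights are 1/x²; the data are simulated.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + rng.normal(0, 1, n) * x     # var(e_i) = sigma^2 * x_i^2

X = sm.add_constant(x)
# WLS weights are proportional to 1 / var(e_i); here 1 / x^2
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params, wls.bse)                  # BLUE if the variance assumption is correct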
Feasible Generalised Least Squares:
If we reject the null hypothesis of homoskedasticity, we might wish to use an estimation technique for the coefficients and the standard errors that accounts for heteroskedasticity. We have already shown that if we "weight" the original data by some appropriate value we can achieve a transformed model with homoskedastic errors that can be estimated by Ordinary Least Squares (OLS). We also note that the task of finding an appropriate weight in a multiple regression model is more complicated, as we might have several variables that are potentially an option. Feasible Generalised Least Squares is based on the idea that we should use all the information available; therefore we construct a suitable weight that is a function of all the explanatory variables in the original model.

If σ_i² is unknown then it must be estimated. The resulting estimator is known as Feasible Generalised Least Squares (FGLS). A popular specification is:

σ_i² = exp(α1 + α2 z_{i2} + ... + αS z_{iS})

In this case, we estimate the model:

ln(ê_i²) = ln(σ_i²) + v_i = α1 + α2 z_{i2} + ... + αS z_{iS} + v_i

and then use the variance estimator:

σ̂_i² = exp(α̂1 + α̂2 z_{i2} + ... + α̂S z_{iS})

The aim is to produce a prediction of σ_i² based on the model and then use it to weight the original model.
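A minimal sketch of this FGLS procedure in Python, using the exponential variance function above with z = x; all names and data are illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + rng.normal(0, 1, n) * np.exp(0.5 + 0.15 * x) ** 0.5

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Step 1: regress log squared residuals on z (= x here) to estimate the variance function
aux = sm.OLS(np.log(ols.resid**2), X).fit()
sigma2_hat = np.exp(aux.fittedvalues)

# Step 2: weight the original model by the predicted variances and re-estimate
fgls = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(fgls.params, fgls.bse)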
A Heteroskedastic Partition:
If we have the variance structure

var(e_i) = σ1²  for i = 1, ..., N1
var(e_i) = σ2²  for i = N1+1, ..., N

then we can estimate σ1² by applying OLS to the first N1 observations, and estimate σ2² by applying OLS to the remaining N2 = N − N1 observations. We can then develop weights and apply GLS to the model in the usual way (using all N observations).

We can apply GLS by generating a weight variable based on the two sub-samples:
- Immediately after estimating each partitioned equation, save the SE of the regression:
  - Scalar se_rural = @se
  - Scalar se_metro = @se
- Then generate the new series:
  - Series weight = metro*(1/se_metro) + (1-metro)*(1/se_rural)

Detecting Heteroskedasticity:
Methods for detecting the presence of heteroskedasticity:
1. Plots of the least squares residuals or squared residuals (with more than one explanatory variable, we plot the least squares residuals against each explanatory variable, or against the fitted values, to see whether those residuals vary in a systematic way relative to the specified variable)
2. White's general test
3. Goldfeld-Quandt test
4. Breusch-Pagan test
Testing for Heteroskedasticity:
There are several possible tests for heteroskedasticity, as mentioned above. They all have the same hypotheses:

H0: σ_t² = σ², the variance is constant (homoskedasticity)
H1: σ_t² ≠ σ², the variance is not constant (heteroskedasticity)

White's General Test:
When conducting White's general test of heteroskedasticity, there are two alternatives available:

A: White test for heteroskedasticity – no crossed terms (S = 3: x1, x2, x3).
This option is a regression of the squared residuals on the independent variables and their squares.

B: White test for heteroskedasticity – including crossed terms (S = 6: x1, x2, x3, x1·x2, x1·x3, x2·x3).
This option is a regression of the squared residuals on the independent variables, their squares and their cross products. It uses up more degrees of freedom and is not recommended when the number of observations is relatively small.

In both cases the test procedure is:
i. Run the regression and save the residuals
ii. Run a second regression of the squared residuals on the original explanatory variables plus the extra terms

The tests are valid in large samples, and the computed "F" statistic is a small-sample approximation.
White's General Test:
Step 1: State hypotheses
H0: σ_t² = σ², the variance is constant (homoskedasticity)
H1: σ_t² ≠ σ², the variance is not constant (heteroskedasticity)

Step 2: Decision rule
Reject H0 if p-value < α, or reject H0 if WG > χ²(S−1), where S = the number of terms of interest

Step 3: Calculate the test statistic
WG = N R²
where R² is the coefficient of determination in the regression of ê_i² on all unique variables contained in x1, ..., xK, their squares (and their cross products).
Note: the test doesn't require any specific assumptions about the form of heteroskedasticity. However, it may have low power, and it is non-constructive in the sense that it doesn't tell us what to do next.

Step 4: Decision
Reject H0 or do not reject H0

Step 5: Conclusion
Conclude whether heteroskedasticity is present or not.
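A minimal sketch of White's test using statsmodels' het_white, which regresses the squared residuals on the regressors, their squares and cross products and returns the N·R² statistic with its p-value; the data are simulated.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + rng.normal(0, 0.5 * x)      # heteroskedastic errors

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
wg, wg_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(f"WG = NR^2 = {wg:.2f}, p-value = {wg_pvalue:.4f}")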
The Goldfeld-Quandt Test:
The Goldfeld-Quandt test involves splitting the sample into two approximately equal subsamples. If heteroskedasticity exists, some observations will have large variances and others will have small variances.
1. Divide the sample such that the observations with potentially high variances are in one subsample and those with potentially low variances are in the other. Make subset 1 the group with the higher variances.
2. Compute the estimated error variances σ̂1² and σ̂2² for each of the subsamples.
3. Compute the Goldfeld-Quandt statistic and compare it to the F-distribution critical values.

Step 1: State hypotheses
H0: σ1² = σ2², the variance is constant (homoskedasticity)
H1: σ1² > σ2², the variance is not constant (heteroskedasticity)

Step 2: Decision rule
Reject H0 if p-value < α, or reject H0 if GQ > F(N1−K, N2−K), where N1 and N2 are the numbers of observations in the two subsamples.

Step 3: Calculate the test statistic
GQ = σ̂1² / σ̂2²

Step 4: Decision
Reject H0 or do not reject H0

Step 5: Conclusion
Conclude whether heteroskedasticity is present or not.

Notes: the above test is one-sided because the alternative hypothesis suggests which sample partition has the larger variance. If we suspect that the two sample partitions could have different variances but we do not know which variance is potentially larger, then a two-sided alternative is more appropriate. To perform a two-sided test at the 5% significance level we put the larger variance estimate in the numerator and use a critical value such that P(F > Fcrit) = 0.025.
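A minimal sketch with statsmodels' het_goldfeldquandt, which splits the sample, fits OLS to each part and returns the variance-ratio F statistic; simulated data again, sorted so the split separates low- and high-variance observations.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(6)
n = 300
x = np.sort(rng.uniform(1, 10, n))          # sorted: the second half has the larger variance
y = 5 + 2 * x + rng.normal(0, 0.5 * x)

X = sm.add_constant(x)
gq, p_value, ordering = het_goldfeldquandt(y, X, alternative="increasing")
print(f"GQ = {gq:.2f}, p-value = {p_value:.4f}")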
Breusch-Pagan Test:
Step 1: State hypotheses
H0: σ_i² = σ² for all i
H1: σ_i² = h(α1 + α2 z_{i2} + ... + αS z_{iS})

Step 2: Decision rule
Reject H0 if p-value < α, or reject H0 if BP > χ²(S−1)

Step 3: Calculate the test statistic
BP = N R²

Step 4: Decision
Reject H0 or do not reject H0

Step 5: Conclusion
Conclude whether heteroskedasticity is present or not.

Note: The z_i variables are not specified by the test; they are chosen by the analyst.
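A minimal sketch with statsmodels' het_breuschpagan; the second argument is the set of z variables (here a constant and x) chosen by the analyst.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, n)
y = 5 + 2 * x + rng.normal(0, 0.5 * x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
bp, bp_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)  # z variables = [1, x]
print(f"BP = NR^2 = {bp:.2f}, p-value = {bp_pvalue:.4f}")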
Econometrics: ECON2300 – Lecture 8
Models with Autocorrelated Errors:
In a time-series context, assumption MR4 of the multiple linear regression model states that there is no serial correlation or autocorrelation. In cross-section situations, where all data are recorded at a single point in time, the randomness of the sample implies that the error terms for different observations (households or firms) will be uncorrelated. There is no particular ordering of the observations that is more natural or better than another. Recall:

corr(e_t, e_s) = cov(e_t, e_s) / √( var(e_t) var(e_s) ),  for t ≠ s

However, when we have time-series data, where the observations follow a natural ordering through time, there is always a possibility that successive errors will be correlated with each other. A change in the level of an explanatory variable may have behavioural implications beyond the time period in which it occurred: the consequences of economic decisions that result in changes in economic variables can last a long time. The possibility of autocorrelation should ALWAYS be entertained when we are dealing with time-series data.

These effects do not happen simultaneously but are spread, or distributed, over future time periods. As shown in Figure 9.1, economic actions taken at one point in time t have effects on the economy at time t, but also at times t+1, t+2, ..., t+n. This carryover will be related to, or correlated with, the effects of earlier shocks or impacts. When circumstances such as these lead to error terms that are correlated, we say that autocorrelation exists. It is important to note that MR2 (E(e) = 0) and MR3 (constant variance) can still hold when autocorrelation is present.

[Figure 9.1: an economic action at time t has an effect at time t, at time t+1, at time t+2, and so on.]
First-order Autoregressive Errors:
In this topic we assume the errors follow an AR(1) process (i.e. an autoregressive process of order 1):

e_t = ρ e_{t−1} + v_t

where −1 < ρ < 1 is the autocorrelation coefficient, and the v_t are independent random error terms with mean zero and constant variance, as we usually assume about the error term in a regression model:

E(v_t) = 0,  var(v_t) = σ_v²,  cov(v_t, v_s) = 0 for t ≠ s

When the equation errors follow an AR(1) model they continue to have a zero mean. For the variance of e_t, it can be shown that:

var(e_t) = σ_e² = σ_v² / (1 − ρ²)

(so the homoskedasticity property holds). However, the covariance between the errors corresponding to different observations is no longer zero. Since we are using time-series data, when we say "the covariance between errors corresponding to different observations" we are referring to the covariance between errors for different time periods. This covariance is nonzero because of the lagged relationship between the errors from different time periods:

cov(e_t, e_{t−k}) = σ_e² ρ^k,  k > 0

Clearly, there is correlation in time-series data. However, when we propose a model using time-series data, we expect the independent variables (the x's) to explain the behaviour of y_t (e.g. unemployment) over time. Therefore no correlation over time should remain in the error term.
Properties of the OLS Estimator – Consequences for the Least Squares Estimator:
If we have an equation whose errors exhibit autocorrelation, but we ignore it, or are simply unaware of it, what effect does this have on the properties of the least squares estimates?
1. The least squares estimator is still a linear unbiased estimator.
2. OLS, however, is inefficient (i.e. it is no longer BLUE – the Best Linear Unbiased Estimator): it is possible to find an alternative estimator with a lower variance. Having a lower variance means there is a higher probability of obtaining a coefficient estimate close to its true value. It also means hypothesis tests have greater power and a lower probability of a Type II error.
3. The formulas for the standard errors usually computed for the least squares estimators are no longer correct, and hence confidence intervals and hypothesis tests that use these standard errors may be misleading.

Although the usual least squares standard errors are not the correct ones, it is possible to compute correct standard errors for the least squares estimator when the errors are autocorrelated. These standard errors are known as HAC (heteroskedasticity and autocorrelation consistent) standard errors, or Newey-West standard errors, and are analogous to the heteroskedasticity-consistent standard errors introduced in chapter 8. These new estimators have the advantage of being consistent for autocorrelated errors that are not necessarily AR(1), and they do not require specification of the dynamic error model that is needed to get an estimator with a lower variance.
1. Newey-West standard errors are robust to both autocorrelation and heteroskedasticity over time.
2. Heteroskedasticity over time arises when the variance changes over time; this is common in financial time series.
3. Newey-West standard errors are not recommended for the traditional heteroskedasticity in cross-sectional models, such as those presented in previous topics – in that case White standard errors are recommended.
4. These robust standard errors are recommended for large samples only. (A sketch follows.)
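A minimal sketch of Newey-West (HAC) standard errors in statsmodels; maxlags controls how many autocovariances enter the correction and is an illustrative choice here.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
T = 200
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):                      # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

X = sm.add_constant(x)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac.bse)                             # Newey-West standard errors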
Estimation of a Model with Autocorrelated Errors:
1. In order to estimate a model with autocorrelated errors, a model for the errors must be chosen.
2. The most commonly adopted model is the AR(1) "first-order autoregressive process", which postulates that a "proportion" of the previous period's value of e is carried over to the current period.
3. An AR(1) model of the error:

e_t = ρ e_{t−1} + v_t

where −1 < ρ < 1 is the autocorrelation coefficient, which indicates the "proportion" of the error that is carried over from period t−1 to period t.

Autocorrelation parameter ρ: assume there is a shock of size 1 and the "proportion" that is remembered from one period to the next is 0.9, i.e. ρ = 0.9. The shock will have effectively disappeared from memory after about 40 periods.

There are several estimation options available:
1. Generalised Least Squares (ρ is known)
   i) Transform the model to a "star" model with non-autocorrelated errors
   ii) Use least squares on the transformed model (the β's are estimated in this step)
2. Feasible Generalised Least Squares (Cochrane-Orcutt or Prais-Winsten)
   i) Transform the model to a "star" model with non-autocorrelated errors (the parameter ρ is estimated in this step)
   ii) Use least squares on the transformed model (the β's are estimated in this step)
3. Non-linear estimation techniques

The use of non-linear estimation techniques is recommended. Although the technical details of these techniques are beyond the scope of this course, we will make use of the EViews non-linear estimation routine.

Method 1: Generalised Least Squares
If the errors follow an AR(1) process and ρ is known, then we can obtain unbiased and efficient estimates by applying OLS to the transformed model (for the simple regression y_t = β1 + β2 x_t + e_t):

y_t − ρ y_{t−1} = β1 (1 − ρ) + β2 (x_t − ρ x_{t−1}) + v_t
Only T−1 observations are used for the estimation (one observation is lost through lagging). This transformation is known as the Cochrane-Orcutt transformation.

If ρ is unknown we can use the first-order sample correlation coefficient of the residuals as an estimator. There are a number of ways to estimate ρ:

r1 = corr(e_t, e_{t−1}) = côv(e_t, e_{t−1}) / v̂ar(e_{t−1})

In EViews:
- @cor(x1, x2)
- Sample correlogram

Example (to show how confidence intervals can be misleading):
P = price of sugar cane divided by the price of jute (a substitute)
A = area of sugar cane planted, in thousands of hectares, in a region of Bangladesh

Original model (how we would estimate without accounting for autocorrelation):
In the command line: ls log(a) c log(p)

ln(A)̂ = 3.89326 + 0.776119 ln(P)
(se)      (0.06134)   (0.2775)

Now let's estimate ρ from the sample correlogram:
1. In the command line: ls log(a) c log(p)
2. View → Residual tests → Correlogram Q-stats
From the correlogram, ρ ≈ r1 = 0.395. The transformed GLS equation is:

y_t* = ln(A_t) − 0.395 ln(A_{t−1}) = β1 (1 − 0.395) + β2 (ln(P_t) − 0.395 ln(P_{t−1})) + v_t

In the command line:
ls (log(a) - 0.395*log(a(-1))) (1-0.395) (log(p)-0.395*log(p(-1)))

Dependent Variable: LOG(A)-0.395*LOG(A(-1))
Method: Least Squares
Sample (adjusted): 2 34; Included observations: 33 after adjustments

Variable                    Coefficient   Std. Error   t-Statistic   Prob.
1-0.395                     3.899243      0.087209     44.71165      0.0000
LOG(P)-0.395*LOG(P(-1))     0.876123      0.255584     3.427925      0.0017

R-squared 0.274865; Adjusted R-squared 0.251474; S.E. of regression 0.280875; Sum squared resid 2.445605; Log likelihood -3.888426; Durbin-Watson stat 1.773865; Akaike info criterion 0.356874; Schwarz criterion 0.447572; Hannan-Quinn criter. 0.387391; Mean dependent var 2.427009; S.D. dependent var 0.324645
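A minimal sketch of the same idea in Python: estimate ρ from the first-order autocorrelation of the OLS residuals, apply the Cochrane-Orcutt transformation, and re-run OLS. statsmodels' GLSAR automates the iterated version; the data are simulated, not the sugar-cane series.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 100
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.4 * e[t - 1] + rng.normal()
y = 3.9 + 0.8 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
r = ols.resid
rho = np.sum(r[1:] * r[:-1]) / np.sum(r[:-1] ** 2)   # first-order sample autocorrelation

# Cochrane-Orcutt transformation, losing the first observation
y_star = y[1:] - rho * y[:-1]
X_star = np.column_stack([np.full(T - 1, 1 - rho), x[1:] - rho * x[:-1]])
gls = sm.OLS(y_star, X_star).fit()
print(rho, gls.params)

# Equivalent iterated version built into statsmodels:
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print(glsar.model.rho, glsar.params)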
Note that it is also possible to obtain unbiased and efficient estimates by estimating the model:

ln(A_t) = β1 (1 − ρ) + β2 ln(P_t) + ρ ln(A_{t−1}) − ρ β2 ln(P_{t−1}) + v_t

This model is nonlinear in the parameters, which makes it difficult to find the values of the parameters that minimise the sum of squares function. EViews finds the so-called nonlinear least squares (NLS) estimates numerically (by systematically evaluating the sum of squares function at different values of the parameters until the least squares estimates are found). NLS estimation is equivalent to iterative GLS estimation using the Cochrane-Orcutt transformation.

Estimation equation:
(log(a)) = c(1)*(1-c(3)) + c(2)*log(p) + c(3)*log(a(-1)) - c(2)*c(3)*log(p(-1))

Dependent Variable: LOG(A)
Method: Least Squares
Sample (adjusted): 2 34; Included observations: 33 after adjustments
Convergence achieved after 6 iterations

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C(1)       3.898771      0.092166     42.30159      0.0000
C(3)       0.422139      0.166047     2.542281      0.0164
C(2)       0.888372      0.259298     3.426060      0.0018

R-squared 0.277777; Adjusted R-squared 0.229629; S.E. of regression 0.285399; Sum squared resid 2.443575; Log likelihood -3.874725; Durbin-Watson stat 1.820559; Akaike info criterion 0.416650; Schwarz criterion 0.552696; Hannan-Quinn criter. 0.462425; Mean dependent var 3.999309; S.D. dependent var 0.325164

For an autoregressive-errors model, in the command window: ls log(a) c log(p) ar(1)

Dependent Variable: LOG(A)
Method: Least Squares
Sample (adjusted): 2 34; Included observations: 33 after adjustments
Convergence achieved after 7 iterations

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          3.898771      0.092165     42.30197      0.0000
LOG(P)     0.888370      0.259299     3.426048      0.0018
AR(1)      0.422140      0.166047     2.542284      0.0164

R-squared 0.277777; Adjusted R-squared 0.229629; S.E. of regression 0.285399; Sum squared resid 2.443575; Log likelihood -3.874725; F-statistic 5.769216; Prob(F-statistic) 0.007587; Durbin-Watson stat 1.820560; Akaike info criterion 0.416650; Schwarz criterion 0.552696; Hannan-Quinn criter. 0.462425; Mean dependent var 3.999309; S.D. dependent var 0.325164
Inverted AR Roots: .42

Testing for Autocorrelation:
There are several methods for detecting the presence of autocorrelation:
1. Residual plots
2. Residual correlograms
3. Durbin-Watson test
4. Lagrange Multiplier test

1. Residual plots: Positive autocorrelation is likely to be present if residual plots reveal runs of positive residuals followed by runs of negative residuals. Negative autocorrelation is likely to be present if positive residuals tend to be followed by negative residuals and negative residuals tend to be followed by positive residuals (+ve, −ve, +ve, −ve in order).

2. Residual correlograms: (View → Residual tests → Correlogram Q-stat) [The original slides show example correlograms for positive and negative autocorrelation.]
From the correlogram it can be seen that the autocorrelation is significant.

3. Durbin-Watson test
The Durbin-Watson test is by far the most important way of detecting AR(1) errors:

e_t = ρ e_{t−1} + v_t

where −1 < ρ < 1 is the autocorrelation coefficient; ρ is the parameter we test. It is assumed that the v_t are independent random errors with distribution N(0, σ_v²). The assumption of normally distributed errors is needed to derive the probability distribution of the test statistic used in the Durbin-Watson test.

The DW statistic's probability distribution depends on the values of the explanatory variables, so it is impossible to tabulate critical values that can be used for every possible problem. To overcome this problem we use a "bounds test". Durbin and Watson considered two other statistics, dL and dU, whose probability distributions do not depend on the explanatory variables and which have the property that

dL < d < dU
irrespective of the explanatory variables in the model under consideration.

Step 1: State hypotheses
H0: ρ = 0 (no autocorrelation)
H1: ρ > 0 (positive autocorrelation is present) – note: we don't usually test for negative autocorrelation

Step 2: Decision rule
Reject H0 if d < dL
Do not reject H0 if d > dU
The test is inconclusive if dL < d < dU
Note: dL and dU are tabulated in the Durbin-Watson tables

Step 3: Calculate the test statistic
d is read from the EViews output

Step 4: Decision/Comparison
Step 5: Conclusion

4. A Lagrange Multiplier (LM) test (Breusch-Godfrey test)
Step 1: State hypotheses
H0: ρ = 0
H1: ρ ≠ 0

Step 2: Decision rule
Reject H0 if LM > χ²(1)

Step 3: Calculate the test statistic
LM = T R²
where R² is the coefficient of determination in the regression of ê_t on 1, x_{i2}, ..., x_{iK} and ê_{t−1},
and T is the sample size. Alternatively, in EViews: View → Residual tests → Serial correlation LM test.

Step 4: Comparison/Decision
Step 5: Conclusion

Note the following points (a sketch of both tests in Python follows):
1. The Durbin-Watson test is an exact test valid in finite samples. The LM test is an approximate large-sample test, the approximation arising because the random errors are replaced by residuals.
2. The Durbin-Watson test is not valid if one of the explanatory variables is the lagged dependent variable y_{t−1}. The LM test can still be used in these circumstances. This fact is particularly relevant for distributed lag models.
3. The Durbin-Watson test is designed for a specific form of autocorrelation under the alternative, known as AR(1) errors (more on these shortly). The LM test can be used for other types of autocorrelation models by including additional lagged errors and using an F or χ² test to test the relevance of their inclusion.
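A minimal sketch of both tests in Python; durbin_watson returns the d statistic (to be compared with the dL/dU bounds from tables), and acorr_breusch_godfrey returns the LM = T·R² statistic with its χ² p-value.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(10)
T = 150
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("DW d =", durbin_watson(res.resid))           # near 2 under H0, below 2 when rho > 0
lm, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=1)
print(f"LM = {lm:.2f}, p-value = {lm_pvalue:.4f}")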
Econometrics – ECON2300: Lecture 9
Dynamic Models
Autoregressive Models:
An autoregressive model expresses the current value of a variable as a function of its own lagged values. An autoregressive model of order p, denoted AR(p), takes the form:

y_t = δ + θ1 y_{t−1} + θ2 y_{t−2} + ... + θp y_{t−p} + v_t

where the v_t are independent random error terms with mean zero and constant variance σ_v². The error term is "well behaved", so the model can be estimated using OLS. The usual hypothesis-testing procedures and goodness-of-fit statistics are valid. We choose a value of p using the usual methods – hypothesis tests, residual analysis, information criteria, parsimony.

We want to model U.S. inflation, given CPI data for the period December 1983 to May 2006. After preparing the data, we estimate an AR(2) model.
In the command line: ls INFLN c INFLN(-1) INFLN(-2)

Dependent Variable: INFLN
Method: Least Squares
Sample (adjusted): 1984M03 2006M05; Included observations: 267 after adjustments

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.209278      0.021781     9.608328      0.0000
INFLN(-1)   0.355224      0.060520     5.869540      0.0000
INFLN(-2)   -0.180537     0.060341     -2.991927     0.0030

R-squared 0.120232; Adjusted R-squared 0.113568; S.E. of regression 0.197531; Sum squared resid 10.30084; Log likelihood 55.68895; Akaike info criterion -0.394674; Schwarz criterion -0.354368; Hannan-Quinn criter. -0.378483; Mean dependent var 0.253534; S.D. dependent var 0.209803
F-statistic 18.03964; Prob(F-statistic) 0.000000; Durbin-Watson stat 1.963006

Now view the correlogram of the residuals: View → Residual tests → Correlogram Q-stat

Now try an AR(3) model. In the command line: ls INFLN c INFLN(-1) INFLN(-2) INFLN(-3)

Dependent Variable: INFLN
Method: Least Squares
Sample (adjusted): 1984M04 2006M05; Included observations: 266 after adjustments

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.188335      0.025290     7.446877      0.0000
INFLN(-1)   0.373292      0.061481     6.071690      0.0000
INFLN(-2)   -0.217919     0.064472     -3.380029     0.0008
INFLN(-3)   0.101254      0.061268     1.652641      0.0996

R-squared 0.129295; Adjusted R-squared 0.119325; S.E. of regression 0.197247; Sum squared resid 10.19345; Log likelihood 56.37528; F-statistic 12.96851; Prob(F-statistic) 0.000000; Akaike info criterion -0.393799; Schwarz criterion -0.339912; Hannan-Quinn criter. -0.372150; Durbin-Watson stat 2.000246; Mean dependent var 0.253389; S.D. dependent var 0.210185
The first model, AR(2), is slightly better, as seen from its lower Schwarz criterion and AIC. Also, INFLN(-3) is not significant at the 5% level of significance.
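A minimal sketch of the same lag-order comparison in Python using statsmodels' AutoReg on a simulated inflation-like series; the model with the lower AIC/BIC is preferred.

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(11)
T = 300
y = np.zeros(T)
for t in range(2, T):                      # simulate an AR(2) process
    y[t] = 0.2 + 0.35 * y[t - 1] - 0.18 * y[t - 2] + 0.2 * rng.normal()

for p in (2, 3):
    fit = AutoReg(y, lags=p, trend="c").fit()
    print(f"AR({p}): AIC = {fit.aic:.3f}, BIC = {fit.bic:.3f}")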
The equation that gives us the value y_{T+1} is the AR(2) equation evaluated one period beyond the end of the sample; our forecast of this value replaces the unknown parameters with their estimates. The forecast two periods beyond the end of the sample is built up in the same way, and so on for future periods. Confidence intervals for our forecasts are difficult to compute manually because the forecast error variances are highly non-linear functions of the variances of the OLS estimators.

Using our preferred model, AR(2), to forecast beyond the end of the sample, and accounting for coefficient uncertainty, a 95% confidence interval for y_{T+1} is:

ŷ_{T+1} ± t_c σ̂ = 0.2599 ± 1.9689 × 0.19896 = 0.2599 ± 0.39174

so −0.1319 ≤ y_{T+1} ≤ 0.6516

(In EViews: scalar tc = @qtdist(0.975, 264). Intervals further out can't be calculated manually due to the non-linearity.)

Finite Distributed Lag Models:
A finite distributed lag (FDL) model expresses the current value of the dependent variable as a function of current and lagged values of exogenous variables (variables external to the variable of interest). If there is only one exogenous variable, a finite distributed lag model of order q takes the form:

y_t = α + β0 x_t + β1 x_{t−1} + ... + βq x_{t−q} + v_t

where the v_t are independent random error terms with mean zero and constant variance. The coefficients βs are called distributed lag weights.
Again, the model can be estimated using OLS, and the usual hypothesis-testing procedures and goodness-of-fit statistics are valid. We choose a value of q using the usual methods (AIC and Schwarz).

Impact and Delay Multipliers:
Suppose y and x have been constant for at least the last q periods, and then suppose x is increased by 1 unit before being returned to its original level. Then:
- y_t will increase by β0 units. This coefficient is known as the impact multiplier.
- y_{t+1} will increase by β1 units. This coefficient is called the one-period delay multiplier.
- y_{t+s} will increase by βs units. These coefficients are called s-period delay multipliers.

Interim and Total Multipliers:
Suppose y and x have been constant for at least the last q periods, and then suppose x is increased by 1 unit and maintained at this new level. Then:
- y_t will increase by β0 units (the impact multiplier).
- y_{t+1} will increase by β0 + β1 units (the one-period interim multiplier).
- y_{t+s} will increase by β0 + β1 + ... + βs units (the s-period interim multiplier).
- The final effect on y is the total multiplier: Σ_{s=0}^{q} βs.

[The original slides illustrate both cases with diagrams of a temporary and a sustained unit change in x.]
The Almon Lag:
Multicollinearity can be a problem in FDL (finite distributed lag) models, particularly if q is large, and it makes it difficult to identify the multipliers. One solution is to impose constraints on the lag coefficients. A popular lag structure is the Almon lag, in which the weights follow a quadratic (second-degree) polynomial in the lag length:

βs = α0 + α1 s + α2 s²,  s = 0, 1, ..., q

Substituting this into the FDL model and simplifying produces a regression of y on a small number of constructed variables, as set out on the next page.
Therefore, in the Almon scheme y is regressed on the constructed z variables, not on the original x variables. Once the α values are estimated, the original β's can be recovered. Suppose that our model is an FDL model with q = 4. We know that each βs is given by βs = α0 + α1 s + α2 s². Therefore:

β0 = α0 + α1(0) + α2(0)² = α0
β1 = α0 + α1(1) + α2(1)² = α0 + α1 + α2
β2 = α0 + α1(2) + α2(2)² = α0 + 2α1 + 4α2
β3 = α0 + α1(3) + α2(3)² = α0 + 3α1 + 9α2
β4 = α0 + α1(4) + α2(4)² = α0 + 4α1 + 16α2

We use the data to estimate α0, α1 and α2. This is known as the Almon polynomial.
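A minimal sketch of the Almon construction in Python for q = 4: substituting βs = α0 + α1·s + α2·s² into the FDL model gives a regression of y on z0 = Σ x_{t−s}, z1 = Σ s·x_{t−s} and z2 = Σ s²·x_{t−s}, after which the β's are recovered from the estimated α's. Data and weights are illustrative.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
T, q = 300, 4
x = rng.normal(size=T)
true_beta = np.array([1.0, 1.6, 1.8, 1.6, 1.0])   # lies exactly on a quadratic in s
y = np.zeros(T)
for t in range(q, T):
    y[t] = 2 + true_beta @ x[t - q: t + 1][::-1] + 0.5 * rng.normal()

# Build the lag matrix: column s holds x_{t-s} for t = q, ..., T-1
lags = np.column_stack([x[q - s: T - s] for s in range(q + 1)])
s = np.arange(q + 1)
Z = np.column_stack([lags @ s**0, lags @ s**1, lags @ s**2])  # z0, z1, z2

fit = sm.OLS(y[q:], sm.add_constant(Z)).fit()
a0, a1, a2 = fit.params[1:]
beta_hat = a0 + a1 * s + a2 * s**2            # recover the lag weights
print(beta_hat)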
Infinite Distributed Lag:
The main problem with the FDL model is that the number of lags q must be chosen empirically, using selection criteria such as AIC and SBC. This is viewed as too data-driven – you might fit the data perfectly but actually be fitting the errors, an echo of Taylor's theorem in the sense that any data can be fitted with a polynomial of high enough degree. A more appropriate approach is to choose an infinite distributed lag model:

y_t = α + β0 x_t + β1 x_{t−1} + β2 x_{t−2} + ... + e_t
The model is impossible to estimate as written, since there are an infinite number of parameters. Models have been developed that are parsimonious (simpler but doing an effective job) and which reduce the number of parameters to estimate. The cost of reducing the number of parameters is that these models must assume particular patterns for the parameters βi, which are called distributed lag weights. The most popular is the geometric distributed lag:

βi = β φ^i

- the βi are the distributed lag weights
- β is a scaling factor and the parameter φ is less than 1 in absolute value
- the lag weights βi decline towards zero as i gets larger

Although an improvement on the finite distributed lag, the geometric distributed lag still imposes a strong pattern of decline on the parameters. This model would not do well in a situation in which the peak effect does not occur for several periods, such as when modelling monetary or fiscal policy. Thus, a preferred approach is to use an autoregressive distributed lag (ARDL) model. ARDL models are infinite distributed lag models, but they are flexible and parsimonious.

An autoregressive distributed lag model expresses the current value of the dependent variable as a function of its own lagged values as well as current and lagged values of exogenous variables. When there is only one exogenous variable, an ARDL(p, q) model takes the form:

y_t = δ + θ1 y_{t−1} + ... + θp y_{t−p} + δ0 x_t + δ1 x_{t−1} + ... + δq x_{t−q} + v_t

where the v_t are independent random error terms with mean zero and constant variance σ_v². Again, the model can be estimated using OLS, and the usual hypothesis-testing procedures and goodness-of-fit statistics are valid. We choose values of p and q using the usual methods.

An example of an ARDL model is the ARDL(1,1):

y_t = δ + θ1 y_{t−1} + δ0 x_t + δ1 x_{t−1} + v_t

It is denoted ARDL(1,1) because it contains one lagged value of y and one lagged value of x. A model containing p lags of y and q lags of x is denoted ARDL(p, q). Despite its simple appearance, the ARDL(1,1) model represents an infinite lag.
If the usual assumptions on the error term e hold, then the parameters can be estimated by least squares. Any ARDL(p, q) model can be written in the form of an infinite distributed lag model. Estimates of the lag weights β (and therefore the delay, interim and total multipliers) can be found from estimates of the δ and θ coefficients. The precise relationship between them depends on the values of p and q; the original slides work through the case p = 2 and q = 3.
The Geometric Lag:
If p = 1 and q = 0 then the ARDL(p, q) model is an infinite distributed lag model with lag weights βs = δ0 θ1^s. This model is also called a geometric lag model because the lag weights begin at δ0 and then evolve geometrically through time according to the relationship βs = θ1 β_{s−1}. If |θ1| < 1 then the total multiplier is the geometric sum:

Σ_{s=0}^{∞} βs = δ0 / (1 − θ1)
Econometrics – ECON2300: Non-Stationary Time Series Data
Lecture 10
Stationarity:
A time series y_t is said to be stationary if the mean, variance and covariances of the series are all finite and do not depend on t. Mathematically, the series is stationary if:

E(y_t) = μ < ∞  (constant mean)
var(y_t) = σ² < ∞  (constant variance)
cov(y_t, y_{t+s}) = cov(y_t, y_{t−s}) = γ_s  (covariance depends on s, not t)

This applies mainly to macroeconomic variables: bond ratios, the stock market, exchange rates, etc.

Recall from last week that AR(1) models take the form:

y_t = α + ρ y_{t−1} + v_t

where the v_t are white noise. If |ρ| < 1 then the series is stationary. [The original slides show two examples of stationary series, which conform with the conditions above.]
Trend-Stationary Processes:
Consider a model with a deterministic trend:

y_t = α + λ t + v_t

where the v_t are white noise. For this model:

E(y_t) = α + λ t

so the process is non-stationary. If we knew the value of λ we could obtain a stationary process by de-trending. The unknown parameters can be estimated using OLS. A non-stationary process is said to be trend-stationary if it can be made stationary by de-trending.

Difference-Stationary Processes:
Consider the special case of an AR(1) model with ρ = 1:

y_t = α + y_{t−1} + v_t

In this case:

E(y_t) = y_0 + α t,  var(y_t) = t σ_v²

so the process is non-stationary. This model is known as a random walk with drift. In the special case where α = 0 the model is known simply as a random walk (no drift).

[Figures: the first series trends upward; estimating the trend equation lets us de-trend it. In the de-trended series the estimated increase of 0.1 per unit of time has been removed, so the series becomes trend-stationary.]
An interesting feature of a random walk process is that the first-differenced series is stationary. For a random walk process with drift:

Δy_t = y_t − y_{t−1} = α + v_t

Thus:

E(Δy_t) = α,  var(Δy_t) = σ_v²,  cov(Δy_t, Δy_{t−s}) = 0 for s ≠ 0

A non-stationary process is said to be difference-stationary if it can be made stationary by differencing but cannot be made stationary by de-trending.

[Figure: a random walk with drift, which can be made stationary through differencing.]
Spurious Regressions:
The main reason it is important to know whether a time series is stationary or non-stationary before embarking on a regression analysis is that there is a danger of obtaining apparently significant regression results from unrelated data when non-stationary series are used. If one or more variables in a regression analysis are difference-stationary then there is a danger of obtaining apparently significant results even though the variables are totally unrelated. Such regressions are said to be spurious. In such cases:
- The finite-sample properties of the OLS estimator are unknown
- The usual t- and F-statistics do not have well-defined distributions
- R² values are totally unreliable
- The DW statistic tends to zero
Unit Root Tests:
If we know a variable is non-stationary then it is important to determine whether it is difference-stationary or trend-stationary. One popular approach is to consider the model:

y_t = α + ρ y_{t−1} + λ t + v_t    (1)

and then use formal testing procedures to test:

Step 1: State hypotheses
H0: ρ = 1
H1: ρ < 1

Note that unit root tests are complicated by the fact that if H0 is true, then the distribution of the OLS estimator depends on whether the (unknown) true data-generating process contains an intercept and/or a time trend. Thus we can't just estimate (1) and conduct a standard t-test. Instead we conduct Dickey-Fuller (DF) tests.

Dickey-Fuller (DF) Tests:
To conduct a Dickey-Fuller test we use OLS to estimate one or more of the models:

Δy_t = α + γ y_{t−1} + λ t + v_t    (2)
Δy_t = α + γ y_{t−1} + v_t          (3)
Δy_t = γ y_{t−1} + v_t              (4)

where γ ≡ ρ − 1. Equation (2) is just another way of writing equation (1) above; equations (3) and (4) are restricted versions of (2). Irrespective of which equation we estimate, we reject the null hypothesis H0: γ = 0 by comparing the standard t-statistic to critical values obtained from special Dickey-Fuller tables. In this situation, the t-statistic is usually called a τ (tau) statistic.
Our ability to reject H0: γ = 0 when it is false (the power of the test) is low if the estimated regression doesn't contain exactly the same deterministic regressors as the true data-generating process. What to do? In practice, if the series appears to be wandering or fluctuating around:
- a linear trend, we use test equation (2)
- a sample average that is non-zero, we use test equation (3)
- a sample average of zero, we use test equation (4)

If we use a particular test equation and fail to reject H0, this could be because the test equation contains the wrong deterministic regressors. Given this possibility, we sometimes re-conduct the test using a different test equation.
Augmented Dickey-Fuller (ADF) Tests:
To conduct an ADF test we follow exactly the same procedure, except that we augment the test equations with lagged differences Δy_{t−1}, Δy_{t−2}, ... We add as many "augmentation terms" as we need to ensure the residuals are not autocorrelated. We still test H0: γ = 0 by comparing the tau statistic to the critical values in the usual DF tables.

Model Selection and Estimation:
If we suspect a time series contains a (deterministic or stochastic) trend we should (see the sketch below):
1. Conduct unit root tests
2. If there is no unit root then any trend must be deterministic, implying we should include a time trend in our model
3. If there is a unit root then, as a rule, we should first-difference the series. If we suspect this new series contains a trend we should return to Step 1; otherwise we can go ahead and model the first-differenced series.
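A minimal sketch of an ADF test in Python; the regression argument selects the deterministic terms ('ct' ≈ test equation (2), 'c' ≈ (3)), and the lag augmentation is chosen automatically by AIC here.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(13)
T = 300
y = np.cumsum(0.1 + rng.normal(size=T))    # random walk with drift: a unit root

for spec in ("ct", "c"):                   # with trend, and with intercept only
    tau, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression=spec, autolag="AIC")
    print(f"regression='{spec}': tau = {tau:.3f}, p-value = {pvalue:.3f}, 5% cv = {crit['5%']:.3f}")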
ECON2300 – Lecture 11: Cointegration
Order of Integration:
Recall that if y_t follows a random walk then γ = 0 and the first difference of y_t becomes:

Δy_t = y_t − y_{t−1} = v_t

An interesting feature of this series is that it is stationary, since v_t, being an independent (0, σ_v²) random variable, is stationary. A series like y_t, which can be made stationary by taking the first difference once, is said to be integrated of order one, denoted I(1). Stationary series are said to be integrated of order zero, denoted I(0). In general, the order of integration of a series is the minimum number of times it must be differenced to make it stationary.

Linear combinations of I(1) variables are usually also I(1), but this is not always the case.

Example: the Federal Funds rate.
F: Federal Funds rate; ∆F: change in the Federal Funds rate.
Conduct a Dickey-Fuller test for stationarity of the change in the Federal Funds rate, ∆F:
1. The ∆F plot appears to be stationary and fluctuating around 0, so we use the test equation without the intercept term.
From the test we reject the null hypothesis that ∆F is non-stationary and accept the alternative that it is stationary. Therefore, as the first difference is stationary, we say that the series F_t is I(1), because it had to be differenced once to make it stationary. Note also that ∆F_t is I(0), as it is a stationary series.

Suppose that w_t is a random walk and ε_{yt} and ε_{xt} are white noise; then the processes built from them in the original slides are all I(1). Linear combinations of I(1) variables are usually I(1). However, sometimes a linear combination of I(1) variables is I(0). In this case the variables are said to be cointegrated.

Cointegration: when two or more I(1) variables can be combined linearly to give an I(0) series.
Here the linear combination z_t is I(0), so y_t and x_t are cointegrated.

Example: In this case there is no linear combination of y_t and x_t that is I(0); therefore the variables are not cointegrated. Yet if we regress y on x we see that our model and explanatory variables are highly significant! (F-stat = 1482.33, p-values << 0.001)
BUT! As we discussed last week, this regression is spurious. The two variables were generated independently and, in truth, have no relation to one another, yet the results suggest that the simple regression model fits the data well and is highly significant. These results are, however, completely meaningless, or spurious. The apparent significance of the relationship is false: it results from relating one series with a stochastic trend to another series with its own stochastic trend.

Example 3: Again, there is no linear combination of y_t and x_t that is I(0), so the two variables are not cointegrated.

As a general rule, non-stationary time-series variables should not be used in regression models, to avoid the problem of spurious regression. However, there is an exception to this rule. If y_t and x_t are non-stationary I(1) variables, then we expect their difference, or any linear combination of them, such as e_t = y_t − β1 − β2 x_t, to be I(1) as well. However, there is an important case in which e_t = y_t − β1 − β2 x_t is a stationary I(0) process. In this case y_t and x_t are said to be cointegrated.
Example 4: The linear combination z_t = y_t − 1.25 x_t is I(0), so the two variables are cointegrated.
Example 5: [EViews output and plots shown in the original slides]
Testing for Cointegration:
When two variables are cointegrated, it implies that y_t and x_t have similar stochastic trends, and since the difference e_t is stationary, they never diverge too far from each other. A natural way to test whether y_t and x_t are cointegrated is to test whether the errors e_t = y_t − β1 − β2 x_t are stationary. Since we cannot observe e_t, we test the stationarity of the least squares residuals ê_t = y_t − b1 − b2 x_t using the Dickey-Fuller test. The test for cointegration is effectively a test of the stationarity of the residuals. If the residuals are stationary, then y_t and x_t are said to be cointegrated; if the residuals are non-stationary, then y_t and x_t are not cointegrated, and any apparent regression relationship between them is spurious.

Suppose the variables y_t, x_{t2}, ..., x_{tK} are all I(1) and are cointegrated. Then we can write:

y_t = β1 + β2 x_{t2} + ... + βK x_{tK} + e_t

where e_t is I(0). In this case OLS is unbiased and (super-)consistent. If the variables are not cointegrated then e_t will be I(1) and the regression will be spurious. One method of testing whether the variables are cointegrated is to estimate the regression model and use an ADF (augmented Dickey-Fuller) test to determine whether the residuals are I(0). However, because the test is based on estimates of the e_t, we must use critical values obtained from special Engle-Granger (EG) tables.

Step 1: State hypotheses
H0: the series are not cointegrated (the residuals are non-stationary)
H1: the series are cointegrated (the residuals are stationary)

[An Engle-Granger (EG) table of critical values is shown in the original slides.]
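A minimal sketch of an Engle-Granger cointegration test in Python; statsmodels' coint runs the residual-based ADF test and supplies the special EG critical values. The shared-trend construction below is illustrative.

import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(14)
T = 300
w = np.cumsum(rng.normal(size=T))          # shared stochastic trend (random walk)
x = w + rng.normal(size=T)                 # both x and y are I(1)...
y = 2 + 1.25 * w + rng.normal(size=T)      # ...and share the trend, so they cointegrate

t_stat, p_value, crit = coint(y, x, trend="c")
print(f"EG tau = {t_stat:.3f}, p-value = {p_value:.4f}, 5% cv = {crit[1]:.3f}")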
Example 5 (continued): [EViews output and plots shown in the original slides]
Model Selection and Estimation:
- If the variables are stationary, or I(1) and cointegrated, we can estimate a regression relationship between the levels of those variables without fear of encountering a spurious regression.
- If the variables are I(1) and not cointegrated, we need to estimate a relationship in first differences.
- If the variables are trend-stationary, we should estimate a regression relationship that includes a trend variable.