BASIC ECONOMETRICS
Classical Assumptions
Introduction
The simplest econometric model is the ordinary least squares (OLS) model, which minimizes the sum of squared errors (the deviations between the actual and estimated values of the dependent variable). The classical linear regression model (CLRM) is built upon some important assumptions.
By relaxing these assumptions of the CLRM, we are confronted with several econometric problems. The major econometric problems that arise when we relax the CLRM assumptions are:
1. Heteroscedasticity,
2. Autocorrelation, and
3. Multicollinearity.
Heteroscedasticity
The classical linear regression model assumes that the disturbances ui appearing in the population regression function are homoscedastic; that is, they all have the same variance. In this lesson we examine the validity of this assumption and find out what happens if it is not fulfilled. We seek answers to the following questions:
What is the nature of heteroscedasticity?
What are its consequences?
How can we detect it?
What are the remedial measures?
Nature of Heteroscedasticity
Where the conditional variance of the Y population varies with X, the situation is known, appropriately, as heteroscedasticity, or unequal spread or variance. That is, var(ui) = σi², a variance that differs from observation to observation. We can illustrate the problem of heteroscedasticity as in Figure 1.
Figure 1 Heteroscedasticity
Reasons for Heteroscedasticity
The various reasons for the origin of heteroscedasticity are:
1. In error-learning models, as people learn, their errors of behaviour become smaller over time.
2. As income grows, people have more discretionary income and hence more scope for choice about its disposition.
3. As data collection techniques improve, σi² is likely to decrease.
4. It can also arise as a result of the presence of outliers.
5. If there is skewness in the distribution of one or more regressors included in the model, there is a chance of heteroscedasticity.
6. Incorrect data transformation.
7. Incorrect functional form.
Consequences of Heteroscedasticity
Under the CLRM assumptions, the OLS estimators are BLUE. With heteroscedasticity, the consequences are:
1. OLS estimators are still linear.
2. OLS estimators are still unbiased.
3. But they no longer have minimum variance; that is, they are no longer efficient. In short, OLS estimators are no longer BLUE in small as well as in large samples.
4. A bias arises from the fact that the conventional estimator of σ² is no longer an unbiased estimator of the true σ². As a result, the usual confidence intervals and hypothesis tests based on the t and F distributions are unreliable; if conventional testing procedures are employed, there is a possibility of drawing wrong conclusions.
In short, in the presence of heteroscedasticity, OLS estimators are no longer BLUE, so we rely on other methods such as Generalized Least Squares (GLS) for estimation. Similarly, ordinary hypothesis testing is unreliable, raising the possibility of drawing wrong conclusions. Therefore it is essential to detect and solve the problem of heteroscedasticity before estimation.
Detection of Heteroscedasticity
There are no hard and fast rules for detecting heteroscedasticity; we have only a few rules of thumb. This situation is inevitable, because σi² can be known only if we have the entire Y population corresponding to the chosen X's, and such data are rare in most economic investigations. In most cases, therefore, detecting heteroscedasticity is a matter of intuition, educated guesswork, prior empirical experience, or sheer speculation.
Let us examine some of the informal and formal methods of detecting heteroscedasticity. Most of these methods are based on examining the OLS residuals ûi, since they are the ones we observe, in the hope that they are good estimates of the disturbances ui.
Informal Methods
Nature of the problem: Very often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be encountered. Based on past studies, one can analyse the nature of heteroscedasticity in surveys of a given kind, and in similar surveys one generally expects unequal variances among the disturbances. As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception.
Graphical Method: If there is no empirical information about the nature of heteroscedasticity, in practice one can carry out the regression analysis on the assumption that there is no heteroscedasticity and then do a post-mortem examination of the squared residuals ûi² to see whether they exhibit any systematic pattern. Although the ûi² are not the same thing as the ui², they can be used as proxies, especially if the sample size is sufficiently large.
An examination of the ûi² may reveal the patterns shown in Figure 3.2. Here we plot ûi² against the estimated Y values, Ŷi, and check whether Ŷi is systematically related to ûi². If a pattern appears, heteroscedasticity is present.
Figure 3.2 Detection of Heteroscedasticity
In panel (a) of Figure 3.2 we see no systematic pattern between the two variables, suggesting that no heteroscedasticity is present in the data. Panels (b) to (e), however, show definite patterns, and therefore heteroscedasticity is present in those data.
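The post-mortem examination described above can also be sketched numerically. Below is a minimal numpy illustration (not from the lecture; the data are simulated with an error standard deviation that grows with X): after fitting OLS, the mean of ûi² over the upper half of the fitted values is compared with the mean over the lower half.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
# heteroscedastic setup: the error standard deviation grows with x
y = 2.0 + 0.5 * x + rng.normal(0, 0.3 * x)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
u_sq = (y - y_hat) ** 2

# split fitted values into low/high halves; under homoscedasticity the
# mean squared residual should be roughly equal in both halves
order = np.argsort(y_hat)
low, high = u_sq[order[: n // 2]], u_sq[order[n // 2:]]
ratio = high.mean() / low.mean()
print(f"mean u^2 (high yhat) / mean u^2 (low yhat) = {ratio:.2f}")
```

A ratio well above 1 is the numerical counterpart of the fanning-out pattern in panels (b) to (e) of Figure 3.2.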
Formal Methods
Park Test: Park formalized the graphical method by suggesting that σi² is some function of the explanatory variable Xi. The functional form he suggested is
σi² = σ² Xi^β e^(vi), or, in logs, ln σi² = ln σ² + β ln Xi + vi
Since σi² is generally not known, Park suggested using ûi² as a proxy and running the regression
ln ûi² = ln σ² + β ln Xi + vi = α + β ln Xi + vi
If β turns out to be statistically significant, it suggests that heteroscedasticity is present in the data. The Park test is thus a two-stage procedure:
• First, run the OLS regression disregarding the heteroscedasticity question and obtain the residuals ûi.
• Second, run the regression of ln ûi² on ln Xi.
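The two stages can be sketched as follows. This is an illustrative numpy implementation on simulated data (the data-generating process and variable names are my own, not from the lecture); the t statistic for β in the second-stage regression is computed by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)   # sd of u_i proportional to x

# Stage 1: plain OLS, keep the residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b

# Stage 2: Park regression  ln(u_i^2) = alpha + beta * ln(X_i) + v_i
Z = np.column_stack([np.ones(n), np.log(x)])
g, *_ = np.linalg.lstsq(Z, np.log(u ** 2), rcond=None)
v = np.log(u ** 2) - Z @ g
s2 = v @ v / (n - 2)                     # residual variance of stage 2
cov = s2 * np.linalg.inv(Z.T @ Z)        # covariance of (alpha, beta)
t_beta = g[1] / np.sqrt(cov[1, 1])
print(f"beta = {g[1]:.2f}, t = {t_beta:.2f}")
```

Here the simulated variance is proportional to x², so β should come out near 2 with a clearly significant t value.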
Goldfeld–Quandt Test: In this popular method, one assumes that the heteroscedastic variance σi² is positively related to one of the explanatory variables in the regression model. Suppose σi² is positively related to Xi as
σi² = σ² Xi²
where σ² is a constant. This says that σi² is proportional to the square of the X variable: σi² grows as X grows, so heteroscedasticity is likely to be present in the model.
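The Goldfeld–Quandt idea can be sketched in numpy as follows, assuming (as above) that the variance grows with X. The drop fraction and simulated data are illustrative choices of mine, not prescribed by the lecture.

```python
import numpy as np

def goldfeld_quandt(y, x, drop_frac=0.2):
    """Sort by x, drop the middle observations, fit OLS on each
    sub-sample, and return F = RSS_high / RSS_low (equal df in both)."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    n = len(x)
    c = int(n * drop_frac)          # central observations omitted
    m = (n - c) // 2                # size of each sub-sample
    def rss(ys, xs):
        X = np.column_stack([np.ones(len(xs)), xs])
        b, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ b
        return e @ e
    return rss(y[-m:], x[-m:]) / rss(y[:m], x[:m])

rng = np.random.default_rng(2)
n = 120
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.4 * x)   # sigma_i^2 grows with x^2
F = goldfeld_quandt(y, x)
print(f"GQ F-statistic: {F:.2f}")
```

Under homoscedasticity F should hover near 1; a large F (judged against the F distribution with the sub-sample degrees of freedom) signals heteroscedasticity.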
Remedial Measures for Heteroscedasticity
When σi² is known: the method of weighted least squares
As we have seen, if σi² is known, the most straightforward method of correcting heteroscedasticity is weighted least squares (WLS).
Although weighted least squares is often treated as an extension of OLS, technically it is the other way around: OLS is a special case of weighted least squares in which all the weights equal 1. Solving the weighted sum of squares is therefore analogous to solving the OLS problem.
When σi² is not known:
If the true σi² were known, we could use the WLS method to obtain BLUE estimators. But the true σi² are rarely known. Therefore, if we want to use WLS, we have to resort to some ad hoc assumption about σi² and transform the original regression model so that the transformed model satisfies the homoscedasticity assumption.
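One common ad hoc assumption is σi proportional to Xi. The WLS transformation then amounts to dividing every term of the model by Xi and running OLS on the transformed variables. A minimal numpy sketch on simulated data (the proportionality assumption and the numbers are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)   # sd(u_i) = 0.5*x_i, so sigma_i ∝ x_i

# Transform  y = b1 + b2*x + u  by dividing through by x:
#   y/x = b1*(1/x) + b2 + u/x
# The transformed error u/x has constant variance, so OLS applies.
ys = y / x
Xs = np.column_stack([1.0 / x, np.ones(n)])
bw, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
b1_wls, b2_wls = bw[0], bw[1]
print(f"WLS estimates: intercept {b1_wls:.2f}, slope {b2_wls:.2f}")
```

Note that in the transformed regression the roles swap: the coefficient on 1/x estimates the original intercept, and the constant estimates the original slope.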
Re-specification of the model
Instead of speculating about σi², re-specifying the model by choosing a different functional form can reduce heteroscedasticity. For example, estimating the model in log form instead of as a linear regression often reduces heteroscedasticity.
AUTOCORRELATION
There are generally three types of data available for empirical analysis:
•Cross-section data,
•Time series data, and
•A combination of cross section and time series, also known as pooled data.
In developing the classical linear regression model (CLRM) we made several assumptions.
However, we noted that not all these assumptions would hold in every type of data.
As a matter of fact, we saw in the previous section that the assumption of homoscedasticity, or
equal error variance, may not be always tenable in cross-sectional data.
In other words, cross-sectional data are often plagued by the problem of heteroscedasticity.
AUTOCORRELATION - NATURE
The assumption of no autocorrelation states that there is no correlation between members of a series of observations ordered in time (as in time series data) or in space (as in cross-sectional data). That is, autocorrelation is absent from the disturbances ui if
E(ui, uj) = 0 for i ≠ j
Otherwise, if the disturbance terms of a dataset ordered in time or space are correlated with each other, the situation is termed autocorrelation. That is,
E(ui, uj) ≠ 0 for i ≠ j
Now let us look at some possible patterns of autocorrelation and no autocorrelation in Figure 3.3. On the vertical axis of Figure 3.3 we take both the population disturbances (u) and their sample counterparts (û), and on the horizontal axis, time; we then plot the corresponding points.
In Figure 3.3, parts (a) to (d) show errors that follow systematic patterns; hence there is autocorrelation. Part (e) reveals no such pattern, and hence there is no autocorrelation.
Figure 3.3 Patterns of Autocorrelation
Positive and negative autocorrelation
Autocorrelation can be positive or negative. Its value ranges from −1 (perfect negative autocorrelation) to +1 (perfect positive autocorrelation); a value close to 0 indicates no autocorrelation.
Positive autocorrelation occurs when an error of a given sign between two values of the series lagged by k tends to be followed by an error of the same sign. When data exhibiting positive autocorrelation are plotted, the points trace a smooth, snake-like curve, as on the left of Figure 3.4.
Figure 3.4 Types of Autocorrelation
Negative autocorrelation occurs when an error of a given sign between two values of the series lagged by k tends to be followed by an error of the opposite sign. With negative autocorrelation, the points form a zigzag pattern if connected, as shown on the right of Figure 3.4.
AUTOCORRELATION - REASONS
The following are the major reasons for autocorrelation.
Inertia: A salient feature of most time series is inertia, or sluggishness. Well-known examples of such time series are GNP and price indices.
Specification bias, excluded variable case: The residuals (which are estimates of ui) may suggest that some variables that were originally candidates but were not included in the model, for a variety of reasons, should be included. Consider, for example, a demand-for-meat model
Yt = β1 + β2X2t + β3X3t + β4X4t + ut
where Yt = quantity of meat demanded, X2 = price of meat, X3 = consumer income, X4 = price of fish, and t = time.
Now suppose we run the regression omitting the price of fish:
Yt = β1 + β2X2t + β3X3t + vt
If X4 in fact belongs in the model, the error term vt = β4X4t + ut will exhibit a systematic pattern, producing (apparent) autocorrelation.
Specification bias, incorrect functional form: To explain this, consider a marginal cost function. Suppose the true model is quadratic,
Marginal Costt = β1 + β2 Outputt + β3 Outputt² + ut
but instead we fit the linear model
Marginal Costt = α1 + α2 Outputt + vt
The fitted line will lie systematically above or below the true cost curve, so vt picks up the omitted squared term and appears autocorrelated. This is depicted in Figure 3.5.
Figure 3.5 Specification Bias
Cobweb phenomenon: The supply of many agricultural commodities reflects the so-called cobweb phenomenon, where supply reacts to price with a lag of one time period because supply decisions take time to implement.
Lags: In a time series regression model, the lagged value of the dependent variable is sometimes included as one of the explanatory variables. For example,
Consumptiont = β1 + β2 Incomet + β3 Consumptiont−1 + ut
Manipulation of data: In empirical analysis the raw data are often manipulated (averaged, interpolated or extrapolated), which can smooth the series and induce autocorrelation.
Data transformation: Sometimes data transformation itself leads to autocorrelation.
Consequences of Autocorrelation
In the presence of autocorrelation one should not use OLS to estimate, to establish confidence intervals, or to test hypotheses; the Generalised Least Squares (GLS) method should be used instead. This is because, in the presence of autocorrelation:
1. The least squares estimators are still linear and unbiased.
2. But they are not efficient compared with procedures that take autocorrelation into account. In short, the usual OLS estimators are not BLUE, because they do not possess the property of minimum variance.
Apart from this, the other consequences of autocorrelation are:
3. The estimated variances of the OLS estimators are biased. The usual formulas for the variances and standard errors of the OLS estimators can seriously underestimate the true values, thereby inflating the 't' values.
4. Therefore, the usual 't' and F tests are generally unreliable.
5. The usual formula for computing the error variance is a biased estimator of the true σ².
6. As a consequence, the conventionally computed R² may be an unreliable measure of the true R².
7. The conventionally computed variances and standard errors of forecasts may also be inefficient.
Detection Measures
There is a variety of tests for detecting autocorrelation.
1. Graphical Method
There are various ways of examining the residuals (errors) graphically:
• Time sequence plot (Figure 1)
• Plot of standardized residuals (Figure 2)
Both figures (Figures 1 and 2) clearly show that the residuals follow systematic patterns, and hence there is autocorrelation.
Runs Test
Suppose we have several residuals that are negative, then a series of positive residuals, and then several negative residuals again. If the residuals were purely random, could we observe such a pattern? Intuitively, it seems unlikely. This intuition can be checked by the so-called runs test, sometimes also known as the Geary test, a nonparametric (and admittedly crude) method.
For the runs test, we simply note down the sign, + or −, of each residual.
We now define a run as an uninterrupted sequence of one symbol or attribute, such as + or -.
We further define the length of a run as the number of elements in it.
By examining how the runs behave in a strictly random sequence of observations, we can derive
a test of randomness of runs. If there are too many runs, it means that the residuals change sign
frequently, thus suggesting negative autocorrelation. Similarly, if there are too few runs, it
suggests positive autocorrelation.
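Counting the runs and comparing the count with its expectation under randomness can be sketched as follows. This numpy illustration (my own, on simulated AR(1) residuals) uses the standard mean and variance of the number of runs under the hypothesis of randomness.

```python
import numpy as np

def runs_test(resid):
    """Geary runs test on residual signs: returns (runs, z), where z is
    the normal-approximation statistic for the observed number of runs."""
    signs = resid > 0
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    n1, n2 = int(signs.sum()), int((~signs).sum())
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, (runs - mean) / np.sqrt(var)

# positively autocorrelated residuals: long stretches of one sign -> few runs
rng = np.random.default_rng(4)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.9 * e[t - 1] + rng.normal()
runs, z = runs_test(e)
print(f"runs = {runs}, z = {z:.2f}")
```

A strongly negative z (too few runs) points to positive autocorrelation; a strongly positive z (too many runs) points to negative autocorrelation, matching the rule stated above.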
Durbin–Watson d Statistic
This is one of the most widely used tests, as the d statistic is based on the estimated residuals, which are routinely computed in regression analysis. It is defined as
d = Σ(ût − ût−1)² / Σût²
that is, the ratio of the sum of squared differences in successive residuals to the residual sum of squares (RSS). Note that in the numerator of the d statistic the number of observations is n − 1, because one observation is lost in taking successive differences.
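A minimal numpy sketch of the d statistic on simulated data (not from the lecture). Since d ≈ 2(1 − ρ̂), values near 2 indicate no first-order autocorrelation, values near 0 positive autocorrelation, and values near 4 negative autocorrelation.

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2."""
    diff = np.diff(resid)
    return float(diff @ diff / (resid @ resid))

rng = np.random.default_rng(5)
white = rng.normal(size=500)          # no autocorrelation: d near 2
d_white = durbin_watson(white)

ar = np.zeros(500)                    # AR(1) with rho = 0.8: d near 0.4
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()
d_ar = durbin_watson(ar)
print(f"d (white noise) = {d_white:.2f}, d (AR(1), rho=0.8) = {d_ar:.2f}")
```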
Remedial Measures
1. Try to find out whether the autocorrelation is pure autocorrelation or the result of mis-specification of the model.
2. Transform the original model so that the transformed model does not have the problem of (pure) autocorrelation.
3. In large samples, we can use the Newey–West method to obtain standard errors of the OLS estimators that are corrected for autocorrelation.
4. In some situations we can continue to use the OLS method.
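The transformation idea can be sketched as a quasi-differencing step in the spirit of Cochrane–Orcutt, using ρ̂ ≈ 1 − d/2 from the Durbin–Watson statistic. This numpy illustration on simulated data is my own, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):                 # AR(1) disturbances, rho = 0.7
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

def dw(e):
    d = np.diff(e)
    return float(d @ d / (e @ e))

# Step 1: OLS on the original model, estimate rho from d
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
rho = 1.0 - dw(e) / 2.0

# Step 2: quasi-difference and re-estimate:
#   (y_t - rho*y_{t-1}) on (x_t - rho*x_{t-1})
ys = y[1:] - rho * y[:-1]
xs = x[1:] - rho * x[:-1]
Xs = np.column_stack([np.ones(n - 1), xs])
bs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
es = ys - Xs @ bs
print(f"rho_hat = {rho:.2f}, d before = {dw(e):.2f}, d after = {dw(es):.2f}")
```

After the transformation the residuals should be approximately white, with d back near 2, while the slope estimate is essentially unchanged.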
Multicollinearity
Another important assumption of the Classical Linear Regression Model (CLRM) is that there
is no Multicollinearity among the regressors included in the multiple regression models. In
practice, one rarely encounters perfect multicollinearity but cases of near or very high
Multicollinearity can be found, where explanatory variables are linearly correlated in many
instances.
The term multicollinearity was coined by Ragnar Frisch in his 1934 book Statistical Confluence Analysis. Because of strong interrelationships among the explanatory variables, it becomes difficult to find out how much each of them influences the dependent variable.
Economic variables are usually interrelated in several ways, and because of these inter-relationships the statistical results obtained from the explanatory variables are often ambiguous; when this happens, a multicollinearity problem is said to exist. In this section we explain the nature, reasons, consequences, detection measures and ways to solve the problem of multicollinearity.
Multicollinearity - NATURE
Multicollinearity generally occurs when there are high correlations between two or more
predictor variables. In other words, one predictor variable can be used to predict the other.
This creates redundant information, skewing the results in a regression model.
In the classical linear model (CLM) it was assumed that there are no exact linear relationships among the sample values of the explanatory variables. This requirement can also be stated as the absence of perfect multicollinearity; a linear relationship among the sample values of the explanatory variables is known as multicollinearity. The existence of perfect multicollinearity means that the OLS method cannot provide estimates of the population parameters.
To understand multicollinearity, consider the following model:
Y = β1+β2X2+β3X3+u (1)
where hypothetical sample values for X2 and X3 are given below:
X2: 1 2 3 4 5 6
X3: 2 4 6 8 10 12
From this we can easily observe that X3 = 2X2. Therefore, while Equation (1) seems to contain two distinct explanatory variables (X2 and X3), the information provided by X3 is in fact not distinct from that of X2: as we have seen, X3 is an exact linear function of X2.
When this situation occurs, X2 and X3 are said to be linearly dependent, which implies that X2 and X3 are perfectly collinear. More formally, two variables X2 and X3 are linearly dependent if one variable can be expressed as a linear function of the other.
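The consequence of such exact dependence can be verified numerically: with X3 = 2X2, the matrix X′X is singular, so the OLS normal equations have no unique solution. A small numpy check using the sample values above:

```python
import numpy as np

x2 = np.array([1.0, 2, 3, 4, 5, 6])
x3 = 2.0 * x2                       # exact linear dependence, as in the text
X = np.column_stack([np.ones(6), x2, x3])

# X'X is singular: rank falls short of the 3 parameters,
# and the determinant is (numerically) zero
rank = np.linalg.matrix_rank(X.T @ X)
det = np.linalg.det(X.T @ X)
print(f"rank(X'X) = {rank} (need 3), det(X'X) = {det:.1e}")
```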
Multicollinearity may arise from several sources:
1.The data collection method employed, for example, sampling over a limited range of the
values taken by the regressors in the population.
2.Constraints on the model or in the population being sampled.
For example, in the regression of electricity consumption on income (X2) and house size (X3)
there is a physical constraint in the population in that families with higher incomes generally
have larger homes than families with lower incomes.
3.Model specification, for example, adding polynomial terms to a regression model, especially
when the range of the X variable is small.
4.An overdetermined model.
This happens when the model has more explanatory variables than the number of
observations. This could happen in medical research where there may be a small number of
patients about whom information is collected on a large number of variables.
Consequences of Multicollinearity
It can be shown that even if multicollinearity is very high, the OLS estimators still retain the property of BLUE.
Theoretical consequences
It is true that even in the case of high multicollinearity the OLS estimators are unbiased, but unbiasedness is a multi-sample, or repeated-sampling, property; it says nothing about the properties of the estimators in any one given sample.
It is also true that collinearity does not destroy the property of minimum variance in the class of all linear unbiased estimators: the OLS estimators still have minimum variance, that is, they are efficient. But this does not mean that the variance of an OLS estimator will necessarily be small.
Finally, multicollinearity is essentially a sample phenomenon, in the sense that even if the X variables are not linearly related in the population, they may be so related in the particular sample.
For these reasons, the fact that the OLS estimators are BLUE despite multicollinearity is of little consolation in practice.
Practical consequences
1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult. Because of this, confidence intervals tend to be much wider, leading to acceptance of the zero null hypothesis more readily.
2. Also because of this, the 't' ratios of one or more coefficients tend to be statistically insignificant: in the case of high collinearity the estimated standard errors increase dramatically, making the t values smaller, so one will increasingly accept the null hypothesis.
3. Although the t ratios of one or more coefficients are statistically insignificant, R² (the overall measure of goodness of fit) can be very high. That is, on the basis of the 't' test one or more of the partial slope coefficients are statistically insignificant and we accept
H0: β2 = β3 = · · · = βk = 0.
But because R² is so high, say 0.9, on the basis of the F test one can reject H0. This is one of the signals of multicollinearity: insignificant 't' values but a high overall R² and a significant F value.
In addition, the OLS estimators and their standard errors can be sensitive to small changes in the data.
Multicollinearity - DETECTION
Since multicollinearity is essentially a sample phenomenon, arising out of the largely nonexperimental data
collected in most social sciences, we do not have one unique method of detecting it or measuring its
strength.
1.Simple correlation coefficient
Multicollinearity is caused by inter correlations between the explanatory variables.
Therefore, the most logical way to detect multicollinearity would appear to be through the correlation coefficient between explanatory variables. When an equation contains only two explanatory variables, the simple correlation coefficient is an adequate measure for detecting multicollinearity. If the value of the correlation coefficient is large, problems from multicollinearity might emerge. The difficulty is defining what value counts as large; most researchers consider 0.9 the threshold beyond which problems are likely to occur.
2. R2 from auxiliary regressions
In the case where we have more than two variables, the use of the simple correlation coefficient to detect
bivariate correlations, and therefore problematic multicollinearity, is highly unreliable, because an exact
linear dependency can occur among three or more variables simultaneously. In these cases, we use
auxiliary regressions. If a near-linear dependency exists, the auxiliary regression will display a small equation standard error, a large R², and a statistically significant F value for the overall significance of the regressors.
3. Variance Inflation Factor
For regressor Xj, the VIF is defined as VIF_j = 1/(1 − R_j²), where R_j² is the R² from the auxiliary regression of Xj on the remaining regressors. The VIF can therefore be used as an indicator of multicollinearity: the larger the VIF, the more troublesome, or collinear, the variable Xj, and vice versa. As a rule of thumb, if the VIF of a variable exceeds 10, that variable is said to be highly collinear.
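The VIF computation via auxiliary regressions can be sketched in numpy as follows (simulated data; the near-collinear setup is my own illustration):

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from the auxiliary
    regression of column j on the remaining columns plus a constant."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    r2 = 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(7)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.1, size=n)   # nearly collinear with x2
x4 = rng.normal(size=n)                   # independent regressor
X = np.column_stack([x2, x3, x4])
print([round(vif(X, j), 1) for j in range(3)])
```

The two near-collinear columns produce VIFs far above the rule-of-thumb cutoff of 10, while the independent regressor stays near 1.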
Examination of partial correlations: Farrar and Glauber have suggested that one should look at the partial correlation coefficients. Thus, in the regression of Y on X2, X3 and X4, a finding that R^2_1.234 is very high but r^2_12.34, r^2_13.24 and r^2_14.23 are comparatively low may suggest that the variables X2, X3 and X4 are highly intercorrelated and that at least one of them is superfluous.
Klein's rule of thumb: Multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R², that is, the one obtained from the regression of Y on all the regressors.
Multicollinearity - REMEDIAL MEASURES
What can be done if multicollinearity is serious?
We have two choices:
(i) Do nothing or (ii) Follow some rules of thumb.
Blanchard says: “When students run their first ordinary least squares (OLS) regression, the first problem that they usually encounter is that of multicollinearity. Many of them conclude that there is something wrong with OLS; some resort to new and often creative techniques to get around the problem. But, we tell them, this is wrong. Multicollinearity is God’s will, not a problem with OLS or statistical technique in general.”
Do nothing: What Blanchard is saying is that multicollinearity is essentially a data deficiency problem, and sometimes we have no choice over the data available for empirical analysis. Also, it is not necessarily the case that all the coefficients in a regression model are statistically insignificant. Moreover, even if we cannot estimate one or more regression coefficients with great precision, a linear combination of them (i.e., an estimable function) can be estimated relatively efficiently.
Follow some rules of thumb
A priori information Suppose we consider the model:
Yi = β1 +β2X2i +β3X3i +ui
where Y=consumption, X2=income, and X3=wealth.
As noted before, income and wealth tend to be highly collinear. But suppose a priori we believe that β3 = 0.10β2; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We can then run the regression
Yi = β1 + β2X2i + 0.10β2X3i + ui = β1 + β2Xi + ui
where Xi = X2i + 0.10X3i.
Once we obtain an estimate of β2, we can estimate β3 from the postulated relationship between β2 and β3.
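The restriction can be imposed exactly as described: build the composite regressor Xi = X2i + 0.10X3i and run a single regression. A numpy sketch on simulated consumption data (the data-generating values are my own and satisfy β3 = 0.10β2 by construction):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 150
income = rng.uniform(20, 100, n)
wealth = 5.0 * income + rng.normal(scale=10, size=n)   # highly collinear
# true model with beta2 = 0.8 and beta3 = 0.10 * beta2 = 0.08
y = 10.0 + 0.8 * income + 0.08 * wealth + rng.normal(scale=2, size=n)

# impose the prior restriction by building the composite regressor
x = income + 0.10 * wealth
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
beta2 = b[1]
beta3 = 0.10 * beta2        # recovered from the postulated relationship
print(f"beta2_hat = {beta2:.3f}, implied beta3_hat = {beta3:.3f}")
```

Because the composite regressor collapses the two collinear variables into one, the single coefficient is estimated precisely, and β3 follows from the restriction.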
Combining cross-sectional and time series data:
A variant of the extraneous or a priori information technique is the combination of cross-sectional
and time-series data, known as pooling the data.
Dropping a variable(s) and specification bias: When faced with severe multicollinearity, one of the
“simplest” things to do is to drop one of the collinear variables. Thus, in our consumption–income–
wealth illustration, when we drop the wealth variable, we obtain regression, which shows that,
whereas in the original model the income variable was statistically insignificant, it is now “highly”
significant.
Transformation of variables
Suppose we have time series data on consumption expenditure, income, and wealth. One reason for
high multicollinearity between income and wealth in such data is that over time both the variables
tend to move in the same direction. One way of minimizing this dependence is to proceed as
follows.
If the relation
Yt = β1 + β2X2t + β3X3t + ut   (i)
holds at time t, it must also hold at time t − 1, because the origin of time is arbitrary anyway. Therefore, we have
Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1   (ii)
Basic Econometrics
39. BASIC ECONOMETRICS
Classical Assumptions
Multicollinearity-REMEDIALS
If we subtract (ii) from (i), we obtain
Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt   (iii)
where vt = ut − ut−1.
Equation (iii) is known as the first difference form because we run the regression, not on the original
variables, but on the differences of successive values of the variables. The first difference regression
model often reduces the severity of multicollinearity because, although the levels of X2 and X3 may
be highly correlated, there is no a priori reason to believe that their differences will also be highly
correlated.
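The claim that first differences need not inherit the collinearity of the levels is easy to check numerically. In this numpy sketch (simulated trending series, my own illustration), two strongly trending regressors are almost perfectly correlated in levels but essentially uncorrelated in first differences:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
trend = np.arange(n, dtype=float)
# income and wealth both trend upward, so their levels move together
x2 = 10 + 0.5 * trend + rng.normal(scale=2, size=n)
x3 = 50 + 2.0 * trend + rng.normal(scale=8, size=n)

r_levels = np.corrcoef(x2, x3)[0, 1]
r_diffs = np.corrcoef(np.diff(x2), np.diff(x3))[0, 1]
print(f"corr(levels) = {r_levels:.2f}, corr(first differences) = {r_diffs:.2f}")
```

Differencing removes the common trend, which is exactly what drove the collinearity in levels.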
Additional or new data: Since multicollinearity is a sample feature, it is possible that in another sample involving the same variables collinearity may not be as serious as in the first sample. Sometimes simply increasing the size of the sample (if possible) can attenuate the collinearity problem.
Other methods of remedying multicollinearity: Multivariate statistical techniques such as factor analysis and principal components, or techniques such as ridge regression, are often employed to “solve” the problem of multicollinearity.
Panel Data Models
We distinguish the following data structures:
1. Time series data: {xt, t = 1, . . . , T}. A univariate series, e.g. a price series: its path over time is modeled, and the path may also depend on third variables. Multivariate data, e.g. several price series: their individual as well as their common dynamics are modeled, and third variables may be included.
2. Cross-sectional data are observed at a single point in time for several individuals, countries, assets, etc.: xi, i = 1, . . . , N. The interest lies in modeling the heterogeneity across individuals.
3. A panel data set (also called longitudinal data) has both a cross-sectional and a time series dimension, where all cross-section units are observed during the whole time period: xit, i = 1, . . . , N, t = 1, . . . , T. T is usually small. We distinguish between balanced and unbalanced panels. Example of a balanced panel: the Mikrozensus in Austria is a household (hh) survey of the same size, 22,500 households, each quarter. Each household records its consumption expenditures for 5 quarters, so each quarter 4,500 households enter/leave the Mikrozensus. Since every household is observed for its full five quarters, this is a balanced (rotating) panel.
Why Panel Data?
Since panel data relate to individuals, firms, states, countries, etc., over time, there is bound to
be heterogeneity in these units.
The techniques of panel data estimation can take such heterogeneity explicitly into account by
allowing for individual-specific variables, as we shall show shortly.
1.We use the term individual in a generic sense to include microunits such as individuals,
firms, states, and countries.
2. By combining time series of cross-section observations, panel data give “more informative
data, more variability, less collinearity among variables, more degrees of freedom and more
efficiency.”
3. By studying repeated cross sections of observations, panel data are better suited to study the dynamics of change. Spells of unemployment, job turnover, and labor mobility are better studied with panel data.
4. Panel data can better detect and measure effects that simply cannot be observed in pure
cross-section or pure time series data. For example, the effects of minimum wage laws on
employment and earnings can be better studied if we include successive waves of minimum
wage increases in the federal and/or state minimum wages.
5. Panel data enable us to study more complicated behavioral models. For example,
phenomena such as economies of scale and technological change can be better handled by
panel data than by pure cross-section or pure time series data.
6. By making data available for several thousand units, panel data can minimize the bias that
might result if we aggregate individuals or firms into broad aggregates.
Consider the data given in Table 16.1, which are taken from a famous study of investment theory proposed by Y. Grunfeld. Grunfeld was interested in finding out how real gross investment (Y) depends on the real value of the firm (X2) and real capital stock (X3). Although the original study covered several companies, for illustrative purposes we have obtained data on four companies: General Electric (GE), General Motors (GM), U.S. Steel (US), and Westinghouse. Data for each company on the preceding three variables are available for the period 1935–1954.
Thus, there are four cross-sectional units and 20 time periods; in all, therefore, we have 80 observations. A priori, Y is expected to be positively related to X2 and X3. In principle, we could run four time series regressions, one for each company, or we could run 20 cross-sectional regressions, one for each year, although in the latter case we would have to worry about the degrees of freedom.
Pooling, or combining, all 80 observations, we can write the Grunfeld investment function as:
Yit = β1 + β2X2it + β3X3it + uit,  i = 1, 2, 3, 4;  t = 1, 2, ..., 20  (16.2.1)
where i stands for the ith cross-sectional unit and t for the tth time period. As a matter of convention, we will let i denote the cross-section identifier and t the time identifier. It is assumed that there are a maximum of N cross-sectional units and a maximum of T time periods.
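As a rough sketch of what pooling means, a regression of the form (16.2.1) can be estimated by ordinary least squares on simulated data with the same 4 × 20 layout (all coefficient values and variable ranges below are invented, not Grunfeld's actual figures):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in for the Grunfeld setup: N = 4 firms, T = 20 years.
N, T = 4, 20
beta1, beta2, beta3 = 10.0, 0.1, 0.3       # arbitrary "true" coefficients

X2 = rng.uniform(500, 5000, size=(N, T))   # real value of the firm
X3 = rng.uniform(100, 2000, size=(N, T))   # real capital stock
u = rng.normal(0.0, 25.0, size=(N, T))     # classical disturbance
Y = beta1 + beta2 * X2 + beta3 * X3 + u    # investment

# Pooling: stack all N*T = 80 observations and run a single OLS.
y = Y.ravel()
X = np.column_stack([np.ones(N * T), X2.ravel(), X3.ravel()])
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b1, b2, b3]
```

With 80 pooled observations the slope estimates land close to the true values, which is the sense in which pooling buys degrees of freedom.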
If each cross-sectional unit has the same number of time series observations, such a panel is called a balanced panel. In the present example we have a balanced panel, as each company in the sample has 20 observations. If the number of observations differs among panel members, we call such a panel an unbalanced panel. In this chapter we will largely be concerned with a balanced panel.
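A quick way to check balancedness in practice is to count the distinct time periods per unit; here is a minimal sketch with a deliberately unbalanced toy panel (the firm labels echo the example, but the numbers are purely illustrative):

```python
import pandas as pd

# Toy long-format panel: US is missing 1937, so the panel is unbalanced.
df = pd.DataFrame({
    "firm":   ["GE", "GE", "GE", "GM", "GM", "GM", "US", "US"],
    "year":   [1935, 1936, 1937, 1935, 1936, 1937, 1935, 1936],
    "invest": [33.1, 45.0, 77.2, 317.6, 391.8, 410.6, 209.9, 355.3],
})

# Balanced iff every firm is observed in the same number of periods.
counts = df.groupby("firm")["year"].nunique()
balanced = counts.nunique() == 1
```

The same one-liner scales to any long-format panel, which makes it a cheap sanity check before choosing an estimator that assumes balance.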
Panel Data Models
Initially, we assume that the X's are nonstochastic and that the error term follows the classical assumptions, namely, uit ∼ N(0, σ²).
b) Fixed effects model ...
c) Random effects model
The Fixed Effects or Least-Squares Dummy Variable (LSDV) Regression Model
One way to take into account the “individuality” of each company or each cross-sectional unit is to let the intercept vary for each company but still assume that the slope coefficients are constant across firms. To see this, we write model (16.2.1) as:
Yit = β1i + β2X2it + β3X3it + uit  (16.3.2)
Notice that we have put the subscript i on the intercept term to suggest that the intercepts of the four firms may be different; the differences may be due to special features of each company, such as managerial style or managerial philosophy. In the literature, model (16.3.2) is known as the fixed effects (regression) model (FEM).
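The LSDV idea — one dummy per firm in place of a common constant — can be sketched on simulated data (the firm intercepts, slopes, and variable ranges below are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# N = 4 firms with different intercepts b_1i but common slopes.
N, T = 4, 20
alphas = np.array([5.0, 15.0, 30.0, 50.0])   # firm-specific intercepts
beta2, beta3 = 0.1, 0.3

X2 = rng.uniform(500, 5000, size=(N, T))
X3 = rng.uniform(100, 2000, size=(N, T))
u = rng.normal(0.0, 10.0, size=(N, T))
Y = alphas[:, None] + beta2 * X2 + beta3 * X3 + u

# One dummy column per firm (and no common constant, to avoid the
# dummy variable trap), plus the two regressors.
firm_id = np.repeat(np.arange(N), T)         # 0,0,...,1,1,...,3
D = np.eye(N)[firm_id]                       # (N*T) x N dummy matrix
X = np.column_stack([D, X2.ravel(), X3.ravel()])
b, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)

intercepts, slopes = b[:N], b[N:]            # four intercepts, two slopes
```

OLS on this augmented design estimates a separate intercept for each firm while forcing the slopes to be common, which is exactly the restriction in (16.3.2).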
The term “fixed effects” is due to the fact that, although the intercept may differ across individuals (here the four companies), each individual’s intercept does not vary over time; that is, it is time invariant. Notice that if we were to write the intercept as β1it, it would suggest that the intercept of each company or individual is time variant. It may be noted that the FEM given in (16.3.2) assumes that the (slope) coefficients of the regressors do not vary across individuals or over time.
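Because the fixed intercepts are time invariant, they can also be eliminated by the within transformation: demeaning each firm's data over time wipes out any time-invariant term, so OLS on the demeaned data recovers the common slopes without estimating any intercepts. A minimal sketch on simulated data (all parameter values invented):

```python
import numpy as np

rng = np.random.default_rng(3)

N, T = 4, 20
alphas = np.array([5.0, 15.0, 30.0, 50.0])  # time-invariant firm intercepts
beta2, beta3 = 0.1, 0.3
X2 = rng.uniform(500, 5000, size=(N, T))
X3 = rng.uniform(100, 2000, size=(N, T))
Y = alphas[:, None] + beta2 * X2 + beta3 * X3 + rng.normal(0, 10, (N, T))

# Within transformation: y_it - ybar_i, x_it - xbar_i.
# The firm intercept b_1i is constant over t, so it drops out entirely.
Yd = Y - Y.mean(axis=1, keepdims=True)
X2d = X2 - X2.mean(axis=1, keepdims=True)
X3d = X3 - X3.mean(axis=1, keepdims=True)

X = np.column_stack([X2d.ravel(), X3d.ravel()])  # no constant needed
slopes, *_ = np.linalg.lstsq(X, Yd.ravel(), rcond=None)
```

The within estimator gives the same slope estimates as the dummy-variable regression, which is why the two are treated as equivalent formulations of the fixed effects model.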
RANDOM EFFECTS MODEL
This is precisely the approach suggested by the proponents of the so-called error components model (ECM) or random effects model (REM). The basic idea is to start with (16.3.2):
Yit = β1i + β2X2it + β3X3it + uit  (16.4.1)
Instead of treating β1i as fixed, we assume that it is a random variable with a mean value of β1 (no subscript i here). The intercept value for an individual company can then be expressed as
β1i = β1 + εi,  i = 1, 2, ..., N  (16.4.2)
where εi is a random error term with a mean value of zero and a variance of σε².
What we are essentially saying is that the four firms included in our sample are a drawing from a much larger universe of such companies; they have a common mean value for the intercept (= β1), and the individual differences in the intercept values of each company are reflected in the error term εi.
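The error-components structure in (16.4.2) can be illustrated by simulation: drawing a firm-level effect εi once per firm and adding an idiosyncratic uit makes the composite disturbance correlated within a firm, with within-firm covariance equal to σε² (all parameter values below are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# b_1i = beta1 + eps_i: each firm's intercept is a draw around beta1.
N, T = 500, 5
beta1, sigma_eps, sigma_u = 10.0, 3.0, 1.0

eps = rng.normal(0.0, sigma_eps, size=N)    # firm-level random effect
u = rng.normal(0.0, sigma_u, size=(N, T))   # idiosyncratic error

# Composite disturbance w_it = eps_i + u_it: the same eps_i appears in
# every period for firm i, so w is correlated within a firm but
# uncorrelated across firms.
w = eps[:, None] + u

# Empirical within-firm covariance across two periods; its population
# value is sigma_eps^2 = 9.
cov_within = np.cov(w[:, 0], w[:, 1])[0, 1]
```

This within-firm correlation is precisely why pooled OLS standard errors are wrong under the ECM and why GLS-type random effects estimation is used instead.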