Econometrics for
Development Professionals
Rudm 6041
Zemen Ayalew (PhD)
Course objective
• The aim of this course is to equip the students
with the necessary skills, including both the
acquisition of habits of thought and knowledge of
the techniques of modern econometrics.
• The course is application oriented.
• The course also aims to provide students with the
ability to use appropriate software in an effective
manner.
Contents of the course
• What is econometrics? Meaning of Econometric models;
Econometric tools; Aims and methodology of
econometrics. Data sources and data types; Single entities
and group entities; Functional forms and stochastic
structure of variables.
• Statistical Background: Probability; Random variables and
Probability Distributions; The Normal Probability
Distributions and Related Distributions; Classical Statistical
Inferences; Properties of Estimators; Sampling
Distributions for Samples from a Normal Population;
Testing of Hypotheses; Relationship between Confidence
Interval Procedures and Tests of Hypotheses.
…Contents
• Simple Regression: Introduction; Specification of the Relationships;
Statistical Inference in the Linear Regression Model; Analysis of
Variance for the Simple Regression Model; Prediction with the
Simple Regression Model; Alternative Functional Forms for
Regression Equations; The Regression Fallacy
• Multiple Regression: Introduction: A Model with two Explanatory
Variables; Statistical Inferences in the Multiple Regression Model;
Interpretation of the Regression Coefficients; Partial Correlation and
Multiple Correlation; Relationships among Simple, partial, and
Multiple Correlation Coefficients; prediction in the Multiple
Regression Model; Analysis of Variance and Tests of Hypotheses;
Degrees of Freedom; Tests for Stability.
… contents
• Introduction to Time Series Analysis:
Introduction; Two Methods of Time-Series
Analysis: Frequency Domain and Time
Domain; Stationary and Non-stationary Time
Series; Some Useful Models for Time Series.
Outline
1.1. What is Econometrics?
1.2. Steps in Empirical Economic Analysis
1.3. Structure of Economic Data
1.4. Causality and the Notion of Ceteris Paribus
1.5. Review of Basic Statistical Theory
Chapter 1. Introduction to Econometrics
1.1. What is Econometrics?
Definition
• It is a combination of mathematical and
statistical methods, economics and data to
answer empirical questions in economics.
• Alternatively, it is the application of economic theory,
mathematics, and statistical techniques for
the purpose of testing hypotheses and
estimating and forecasting economic
phenomena.
• Econometrics = Economics+ mathematics+
statistics+ computing
• Economic data is non-experimental data
• We cannot simply classify individuals or firms in
an experimental group and a control group.
• Individuals are typically free to self-select
themselves in a group (e.g., education,
occupation, product market, etc).
• Economic models (either simple or
sophisticated) are key to interpret the
statistical results in econometric
applications.
What Makes it Different?
[Figure: diagram relating Econometrics to Economic Theory, Mathematical Economics, Economic Statistics, and Mathematical Statistics]
Econometrics deserves to be studied in its own right for the
following reasons:
• Economic theory makes statements or hypotheses that are mostly qualitative in nature. The law of demand, for example, does not provide any numerical measure of the relationship between price and quantity demanded. Providing such measures is the job of the econometrician.
• The main concern of mathematical economics is to express economic theory in mathematical form without regard to measurability or empirical verification of the theory. Econometrics is mainly interested in the empirical verification of economic theory.
• Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables. It does not go any further; using the collected data to test economic theories is the job of the econometrician.
• Mathematical statistics provides many of the tools for economic studies, but econometrics supplies economic studies with many special methods of quantitative analysis based on economic data.
Why Econometrics?
• Analysis
– Testing economic theories
• Test Keynes hypothesis: Consumption increases with
income
– Estimation of economic relationships
• Estimating demand and supply equations
• Forecasting or Prediction
– Use current and past economic data to predict future
values of variables such as inflation, GDP, stock prices,
etc.
• Policy making
– To test the outcome of different government
economic policy moves
• Impact of extension intervention on farm productivity;
• Effects of price ceiling on market stabilization.
• The research process in applied econometrics is not
simply linear, but it has “loops”.
– The original question, model, and even the data
collection can be modified after looking at
preliminary econometric results.
• Steps followed in the research process in
econometrics are:
1. Formulation of the question(s) of interest.
2. Collection of data
3. Specification of the econometric model
4. Estimation, validation, hypotheses testing and
prediction.
1.2. Steps in Empirical Economic Analysis
• Suppose we want to test the hypothesis:
Efficiency in production increases with
extension contact.
• This is important for policy makers to
improve farm productivity by increasing
extension contact with farmers.
Empirical Question(s)/Hypothesis
• There are different datasets that can be
collected to test this hypothesis.
1. Data on production costs versus data on
output and inputs.
2. Data at farm level versus data at household
level.
3. Time series data (e.g., a farmer over several
years), or cross-sectional data (e.g., many
farmers at the same period); or panel data
(e.g., many farmers over multiple periods).
Collection of Data
• An econometric model is an economic model in which we take into account what is and what is not observable to the researcher.
• A researcher’s decision of which economic model
to estimate depends critically on what is
observable.
• For instance, we can study efficiency either by
estimating a cost function or by estimating a
production function.
• Whether we estimate the cost or production
function depends very much on the available data:
with data on production costs we can estimate a
cost function; with data on output and inputs we
can estimate a production function.
Specification of the Econometric Model
• Suppose that we decide to estimate a
production function.
Y = F(L, K, N, u)
Y = Yield per hectare
L = Amount of labor used
K = Amount of physical capital used
N = Number of extension contacts in a year
u = Other factors unobservable to the researcher
• An important specification assumption is the
choice of the functional form of the
production function F(.).
• Following previous literature can be useful
here.
• For instance, it is well-known that the Cobb-
Douglas (C-D) production function provides
a good fit for farm production.
• The C-D production function is also very
convenient because it is linear in logarithms.
• In logarithms, the C-D production function is:
$$\ln Y = \beta_0 + \beta_L \ln L + \beta_K \ln K + \beta_N \ln N + u$$
• The β’s are parameters to be estimated.
• u represents unobservable factors affecting yield, e.g., weather.
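As a minimal sketch of why the log form is convenient, the Python snippet below checks numerically that taking logs of a C-D function written in levels gives the linear expression ln Y = β0 + βL ln L + βK ln K + βN ln N + u. The level form Y = exp(β0) · L^βL · K^βK · N^βN · exp(u), the parameter values, and the input values are assumptions chosen only for illustration; they are not estimates from this course.

```python
import math

# Hypothetical parameter values, for illustration only.
BETA_0, BETA_L, BETA_K, BETA_N = 0.5, 0.4, 0.3, 0.1


def cobb_douglas_yield(labor, capital, contacts, u=0.0):
    """Assumed level form: Y = exp(b0) * L^bL * K^bK * N^bN * exp(u)."""
    return (math.exp(BETA_0) * labor ** BETA_L * capital ** BETA_K
            * contacts ** BETA_N * math.exp(u))


def log_yield(labor, capital, contacts, u=0.0):
    """Log form: ln Y = b0 + bL ln L + bK ln K + bN ln N + u."""
    return (BETA_0 + BETA_L * math.log(labor) + BETA_K * math.log(capital)
            + BETA_N * math.log(contacts) + u)


# Hypothetical farm: 120 units of labor, 35 of capital, 6 extension contacts.
labor, capital, contacts, u = 120.0, 35.0, 6.0, 0.05
lhs = math.log(cobb_douglas_yield(labor, capital, contacts, u))
rhs = log_yield(labor, capital, contacts, u)
print(abs(lhs - rhs) < 1e-9)  # the two forms agree up to rounding

# In this form each beta is an elasticity: doubling N scales Y by 2 ** BETA_N.
ratio = (cobb_douglas_yield(labor, capital, 2 * contacts, u)
         / cobb_douglas_yield(labor, capital, contacts, u))
```

Because the log form is linear in the β’s, it can be estimated with the linear regression tools introduced later in the course.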
• Dealing with the unobservable (or error term or
disturbance) u, is one of the most important issues
in any econometric analysis.
• Certain conditions on the statistical properties of
the error term are key for the good properties of
our estimators of the parameters of interest.
• To a certain extent, we will be able to test for these
conditions. However, the economic interpretation
of the error term (i.e., which are the main factors in
it) is very important to correctly interpret our
estimation results.
• We want to estimate the parameters β in
the production function.
• After estimation, we have to make
specification tests in order to validate some
of the assumptions that we have made for
estimation.
• The results of these tests may imply a re-
specification and re-estimation of the
model.
Estimation, Validation, Hypotheses Testing,
prediction
• Once we have a validated model, we can
interpret the results from an economic point of
view, make tests, and predictions.
• For instance, we may be interested in testing
the significance of extension contact or testing
constant returns to scale.
Methodology of Econometrics
Broadly speaking, traditional econometric methodology
proceeds along the following lines:
1. Statement of theory or hypothesis.
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Collecting the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes.
To illustrate the preceding steps, let us consider the well-
known Keynesian theory of consumption.
1. Statement of Theory or Hypothesis
• Keynes states that, on average, consumers increase their consumption as their income increases, but not by as much as the increase in their income. That is, the marginal propensity to consume (MPC) is less than 1.
2. Specification of the Mathematical Model of Consumption (single-equation model)
Y = β1 + β2X, 0 < β2 < 1 (I.3.1)
Y = consumption expenditure (dependent variable)
X = income (independent, or explanatory, variable)
β1 = the intercept
β2 = the slope coefficient
• The slope coefficient β2 measures the MPC.
3. Specification of the Econometric Model of Consumption
• The relationships between economic variables are generally inexact. In addition to income, other variables affect consumption expenditure. For example, family size, the ages of the family members, and family religion are likely to exert some influence on consumption.
• To allow for the inexact relationships between economic variables, (I.3.1) is modified as follows:
Y = β1 + β2X + u
• Here u, known as the disturbance or error term, is a random (stochastic) variable that has well-defined probabilistic properties. The disturbance term u may represent all those factors that affect consumption but are not taken into account explicitly.
4. Obtaining Data
• To obtain the numerical values of β1 and β2, we need data. Look at Table I.1, which relates to personal consumption expenditure (PCE) and gross domestic product (GDP). The data are in “real” terms.
5. Estimation of the Econometric Model
• Regression analysis is the main tool used to obtain the estimates. Using this technique and the data given in Table I.1, we obtain estimates of β1 and β2 of −184.08 and 0.7064, respectively. Thus, the estimated consumption function is:
Ŷi = −184.08 + 0.7064Xi (I.3.3)
• The estimated regression line is shown in Figure I.3. The regression line fits the data quite well. The slope coefficient (i.e., the MPC) is about 0.70: an increase in real income of 1 birr led, on average, to an increase of about 0.70 birr in real consumption.
6. Hypothesis Testing
• Hypothesis testing means finding out whether the estimates obtained in Eq. (I.3.3) are in accord with the expectations of the theory that is being tested. Keynes expected the MPC to be positive but less than 1. In our example we found the MPC to be about 0.70. But before we accept this finding as confirmation of Keynesian consumption theory, we must ask whether this estimate is sufficiently below unity. In other words, is 0.70 statistically less than 1? If it is, it may support Keynes’ theory.
• Such confirmation or refutation of economic theories on the basis of sample evidence rests on a branch of statistical theory known as statistical inference (hypothesis testing).
7. Forecasting or Prediction
• To illustrate, suppose we want to predict the mean consumption expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars. Putting this value into (I.3.3), predicted consumption would be:
Ŷ1997 = −184.0779 + 0.7064 (7269.8) = 4951.3
• The actual value of the consumption expenditure reported in 1997 was 4913.5 billion dollars. The estimated model (I.3.3) thus over-predicted the actual consumption expenditure by about 37.8 billion dollars. This forecast error is about 0.77 percent of the actual consumption expenditure for 1997.
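The forecast arithmetic can be reproduced with a few lines of Python. This is only a sketch: the coefficients and the 1997 figures are the ones quoted in the text, while the function and variable names are made up for illustration.

```python
# Estimated consumption function quoted in the text:
# Y_hat = -184.0779 + 0.7064 * X (billions of dollars).
B1_HAT = -184.0779
B2_HAT = 0.7064


def predict_consumption(income):
    """Predicted mean consumption expenditure for a given income (GDP)."""
    return B1_HAT + B2_HAT * income


gdp_1997 = 7269.8          # GDP value for 1997 quoted in the text
actual_pce_1997 = 4913.5   # actual consumption expenditure quoted in the text

predicted = predict_consumption(gdp_1997)
forecast_error = predicted - actual_pce_1997   # positive means over-prediction
error_percent = 100 * forecast_error / actual_pce_1997

print(round(predicted, 1), round(forecast_error, 1), round(error_percent, 2))
```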
• Now suppose the government decides to propose a reduction in the income tax. What will be the effect of such a policy on income and thereby on consumption expenditure and ultimately on employment?
• Suppose that, as a result of the proposed policy change, investment expenditure increases. What will be the effect on the economy? As macroeconomic theory shows, the change in income following a dollar’s worth of change in investment expenditure is given by the income multiplier M, which is defined as:
M = 1/(1 − MPC)
• With an MPC of about 0.70, the multiplier is about M = 3.33. That is, an increase (decrease) of a dollar in investment will eventually lead to more than a threefold increase (decrease) in income; note that it takes time for the multiplier to work.
• The critical value in this computation is the MPC. Thus, a quantitative estimate of the MPC provides valuable information for policy purposes. Knowing the MPC, one can predict the future course of income, consumption expenditure, and employment following a change in the government’s fiscal policies.
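A short Python sketch of the multiplier formula M = 1/(1 − MPC) follows. The function name is hypothetical; the two MPC values are the rounded 0.70 used in the text and the unrounded slope estimate 0.7064, included to show how sensitive M is to the MPC.

```python
def income_multiplier(mpc):
    """Income multiplier M = 1 / (1 - MPC), defined for 0 < MPC < 1."""
    if not 0 < mpc < 1:
        raise ValueError("MPC must lie strictly between 0 and 1")
    return 1 / (1 - mpc)


m_rounded = income_multiplier(0.70)     # MPC rounded as in the text
m_estimate = income_multiplier(0.7064)  # MPC equal to the estimated slope

print(round(m_rounded, 2), round(m_estimate, 2))
```

A small change in the estimated MPC moves the multiplier noticeably, which is one reason a careful estimate of the MPC matters for policy.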
8. Use of the Model for Control or Policy Purposes
• Suppose we have the estimated consumption function given in (I.3.3). Suppose
further the government believes that consumer expenditure of about 4900 will
keep the unemployment rate at its current level of about 4.2%. What level of
income will guarantee the target amount of consumption expenditure?
• If the regression results given in (I.3.3) seem reasonable, simple arithmetic will show that:
4900 = −184.0779 + 0.7064X (I.3.6)
which gives X = 7197, approximately. That is, an income level of about 7197 (billion) dollars, given an MPC of about 0.70, will produce an expenditure of about 4900 billion dollars.
• As these calculations suggest, an estimated model may be used for control, or policy, purposes. By an appropriate fiscal and monetary policy mix, the government can manipulate the control variable X to produce the desired level of the target variable Y.
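The “simple arithmetic” behind (I.3.6) is just the estimated consumption function solved for X. A minimal Python sketch, with hypothetical function and variable names and the coefficients quoted in the text:

```python
# Estimated consumption function from the text: Y_hat = B1_HAT + B2_HAT * X.
B1_HAT = -184.0779
B2_HAT = 0.7064


def required_income(target_consumption):
    """Solve target = B1_HAT + B2_HAT * X for the income level X."""
    return (target_consumption - B1_HAT) / B2_HAT


target = 4900.0                    # target consumption, billions of dollars
income_needed = required_income(target)
print(round(income_needed))        # close to the 7197 reported in the text
```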
[Figure: flowchart of the methodology: Economic Theory → Mathematical Model → Econometric Model → Data Collection → Estimation → Hypothesis Testing → Forecasting → Application in control or policy studies]
• Different types of datasets have their own
advantages and disadvantages.
• Some econometric methods may be valid (i.e.,
have good properties) for some types of data but
not for others.
• We typically distinguish four types of datasets:
1. Cross-sectional data
2. Time series data
3. Pooled cross-sectional data
4. Panel or longitudinal data
1.3. Economic Data
• A cross-sectional dataset is a sample of
individuals, or households, or firms, or
cities, or states, or countries, etc, taken at
a given point in time.
• We often assume that these data have
been obtained by random sampling.
• Sometimes we do not have a random
sample: sample selection problem, spatial
correlation, stratified samples.
Cross-sectional Data
Example of a cross-sectional dataset:
Obsno Output Educ Extcont Sex Married
1 3.10 11 2 1 0
2 3.24 12 22 1 1
3 3.00 11 2 0 0
. . . . . .
. . . . . .
. . . . . .
499 11.56 16 5 0 1
500 3.50 14 5 1 0
• A time series dataset consists of observations on one or several variables over several periods of time (days, weeks, months, years).
• A key feature of time series data is
that, typically, observations are
correlated across time. We do not have
a random sample.
Time Series Data
Problems of time series data
• Dependence across time
– This period’s observation depends somewhat on
last period’s observation (E.g. Income, price, etc)
• Seasonality: Frequency of data
– Many weekly, monthly or quarterly time series
data are affected by seasonality (E.g: Ice cream
sales fall in winter)
Example of a time series dataset:
Obsno Year Month Exrate Irate
1 1990 1 1.32 7.35
2 1990 2 1.30 7.30
3 1990 3 1.29 7.32
. . . . .
. . . . .
. . . . .
191 2005 11 1.11 4.26
192 2005 12 1.10 4.31
• A sequence of cross sections of the same variables from the same population over a period of time is called pooled cross-sectional data.
• It is useful data to analyze the evolution over time
of the cross-sectional distribution of variables such
as individual wages, household income, firms’
investments, etc.
• Analysis is similar to cross sectional data, with the
additional consideration of structural changes due
to time
• Relatively new concept useful for analyzing policy
effects
Pooled Cross Sectional Data
• In panel data we have a group of individuals (or
households, firms, countries, etc) who are observed
at several points in time. That is, we have time
series data for each individual in the sample.
• The key feature of panel data that distinguishes
them from pooled cross sections is that the same
individuals are followed over a given period of
time.
• Using panel data we can control for time-invariant
unobserved characteristics of individuals, firms,
countries, etc.
Panel or Longitudinal Data
Example of a panel dataset: 100 cities over 2 years
Obsno City Year Murders Population Police
1 1 1999 5 350,000 440
2 1 2000 8 359,200 471
3 2 1999 2 64,300 75
4 2 2000 1 65,100 75
. . . . . .
. . . . . .
199 100 1999 25 543,000 520
200 100 2000 32 546,200 493
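To make the panel structure concrete, the Python sketch below takes the four rows shown for cities 1 and 2 and computes the change in each variable between 1999 and 2000 within each city. Differencing within a unit is one common way (chosen here as an illustration, not named in the slides) to remove characteristics of a city that do not change over time. The population of city 2 in 1999 is read as 64,300.

```python
# Rows for cities 1 and 2 as shown in the example panel dataset.
rows = [
    {"city": 1, "year": 1999, "murders": 5, "population": 350_000, "police": 440},
    {"city": 1, "year": 2000, "murders": 8, "population": 359_200, "police": 471},
    {"city": 2, "year": 1999, "murders": 2, "population": 64_300, "police": 75},
    {"city": 2, "year": 2000, "murders": 1, "population": 65_100, "police": 75},
]

# Index the rows by city and year: the same city appears in both years,
# which is what distinguishes a panel from a pooled cross section.
by_city = {}
for row in rows:
    by_city.setdefault(row["city"], {})[row["year"]] = row

# Within-city change from 1999 to 2000 for each variable.
changes = {}
for city, years in by_city.items():
    first, last = years[1999], years[2000]
    changes[city] = {
        name: last[name] - first[name]
        for name in ("murders", "population", "police")
    }

print(changes)
```

Anything about a city that is the same in both years (for example, its location) cancels out of these differences, which is the sense in which panel data let us control for time-invariant unobserved characteristics.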
• Most empirical questions in economics are associated with the identification of causal effects.
• X has a causal effect on Y: X→Y
• Example: How does one more extension contact change farm productivity, if all other relevant factors are held fixed?
• The notion of ceteris paribus (i.e., “other
factors being equal”) plays an important
role in the analysis of causality.
1.4. Causality and the Notion of “Ceteris
Paribus”
• Let us consider the relationship between two
variables X and Y.
• Assume that change in X causes change in Y:
∆X→∆Y
Y=Farm productivity, X=Extension contact
Questions
• How about other factors affecting Y? Experience?
Education? Others?
• What is the functional form that describes the
relationship between Y and X?
• How can we be sure we are capturing a ceteris
paribus relationship between Y and X?
• Let
$$y = \beta_0 + \beta_1 x + u$$
be our simple (= one-variable) linear (“y=a+bx” type) regression model.
• y: dependent variable, explained variable, regressand
• x: independent variable (not in the statistical sense), explanatory variable, regressor
• u: error term, disturbance, unobserved factors
• Suppose we know β0 and β1.
• How will y change if x changes from x1 to x2?
$$y_1 = \beta_0 + \beta_1 x_1 + u_1$$
$$y_2 = \beta_0 + \beta_1 x_2 + u_2$$
$$y_2 - y_1 = \beta_1 (x_2 - x_1) + (u_2 - u_1)$$
$$\Delta y = \beta_1 \Delta x + \Delta u$$
• The ceteris paribus assumption implies that Δu = 0 (all other factors are held constant), so
$$\Delta y = \beta_1 \Delta x$$
that is, x has a linear effect on y.
• In the simple linear regression model (SLRM), β0 and β1 are called parameters:
β0 - intercept parameter
β1 - slope parameter
• Estimation of these parameters is the primary interest in applied econometrics.
• In our model, β1 shows by how much y changes when x changes by one unit, holding other factors constant.
• In a laboratory it is possible to do an experiment where we hold everything fixed, vary only one variable (x), and then measure how the outcome variable (y) changes. This way we can have ∆u=0.
• But in economics we don’t do experiments in a lab,
we have to rely on observations. In this case ∆u is
most likely to be different from 0.
• How should we proceed then?
SLRM Assumption 1
• Because x and u are random variables, we need a concept that deals with probability.
Assumption 1: E(u)=0
• This is not a very restrictive assumption, as long as our empirical equation contains an intercept, β0.
• If E(u)=a≠0, we can always write:
$$y = \beta_0 + \beta_1 x + u = (\beta_0 + a) + \beta_1 x + (u - a) = \beta_0' + \beta_1 x + u'$$
where β0’ = β0 + a and u’ = u − a, so that now E(u’)=0.
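A small simulation can illustrate why Assumption 1 costs little when the model has an intercept. The sketch below is an assumption-laden illustration: the values β0 = 1, β1 = 2, and a = E(u) = 3 are hypothetical, and the fit uses the standard least-squares formulas b1 = Cov(x, y)/Var(x) and b0 = ȳ − b1·x̄. The non-zero mean of the error should show up in the fitted intercept (close to β0 + a = 4), while the slope stays close to 2.

```python
import random


def ols(x, y):
    """Return (b0, b1) from a simple least-squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


random.seed(0)
BETA_0, BETA_1, A = 1.0, 2.0, 3.0   # hypothetical values; A is E(u)
n = 5000

x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(A, 1.0) for _ in range(n)]      # error with mean A, not 0
y = [BETA_0 + BETA_1 * xi + ui for xi, ui in zip(x, u)]

b0, b1 = ols(x, y)
print(round(b0, 2), round(b1, 2))   # intercept absorbs A; slope is unaffected
```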
SLRM Assumption 2
• An important assumption about how x and u are related is:
Assumption 2: E(u|x)=E(u), i.e., the average value of u does not depend on the value of x.
• Combining this with Assumption 1 gives us:
E(u|x)=E(u)=0
• This means that for any given value of x the expectation of the error term is zero.
In the education example:
E(u|x=16) = E(u|x=8) = 0
where E(u|x=16) is the average ability of people with 16 years of education and E(u|x=8) is the average ability of people with 8 years of education.
Result
Assumption 2 (together with Assumption 1) implies that cov(x,u)=0:
$$\mathrm{Cov}(x,u) = E[(x - E(x))(u - E(u))] = E(xu) - E(x)E(u) = E(xu) = E[x\,E(u|x)] = 0$$
The third equality uses E(u)=0; the fourth uses the law of iterated expectations.
Violation of assumption 2
• In the education example, u represents innate
ability (energy level, smartness, genetic
determinants…). It might be that people with
higher u (good potential) choose to have more
education.
• If we believe that u↑ as x↑, then cov(x,u)>0 and Assumption 2 is violated. This is a very serious problem; we will learn later how to deal with it.
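The consequence of cov(x,u)>0 can be seen in a simulation. The sketch below is illustrative only: the true slope of 0.5, the way “ability” u feeds into x, and all variable names are assumptions. With these choices cov(x,u) = 2 and var(x) = 5 in the second case, so the least-squares slope should settle near 0.5 + 2/5 = 0.9 instead of 0.5.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


random.seed(1)
BETA_0, BETA_1 = 1.0, 0.5   # hypothetical true parameters
n = 20000

u = [random.gauss(0, 1) for _ in range(n)]       # unobserved "ability"
noise = [random.gauss(0, 1) for _ in range(n)]   # other variation in x

# Case 1: x is unrelated to u, so Assumption 2 holds.
x_ok = [12 + 2 * v for v in noise]
y_ok = [BETA_0 + BETA_1 * xi + ui for xi, ui in zip(x_ok, u)]

# Case 2: people with higher u choose more x, so cov(x, u) > 0.
x_bad = [12 + 2 * ui + v for ui, v in zip(u, noise)]
y_bad = [BETA_0 + BETA_1 * xi + ui for xi, ui in zip(x_bad, u)]

slope_ok = ols_slope(x_ok, y_ok)
slope_bad = ols_slope(x_bad, y_bad)
print(round(slope_ok, 2), round(slope_bad, 2))
```

In the second case the fitted slope attributes part of the effect of u to x, so it overstates the true effect of x on y.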
Population Regression Function
• We have the following result:
$$y = \beta_0 + \beta_1 x + u, \quad E(u|x) = 0 \;\Rightarrow\; E(y|x) = \beta_0 + \beta_1 x + E(u|x) = \beta_0 + \beta_1 x$$
• This is called the population regression function; it is a linear function of x.
• A one-unit increase in x changes the expected value of y by the amount β1:
$$\Delta E(y|x) = \beta_1 \Delta x = \beta_1 \quad \text{when } \Delta x = 1$$
• Another result is that y can be split into a systematic part and an unsystematic part:
$$y = E(y|x) + u = \beta_0 + \beta_1 x + u$$
where E(y|x) = β0 + β1x is the systematic part and u is the unsystematic part.
• We do not know β0 and β1. What should we do?
Estimating the parameters
• Suppose we have a random sample of size n from the population:
$$\{(x_i, y_i) : i = 1, \ldots, n\}$$
• We must decide how to use the data to obtain estimates of the intercept and the slope parameters.
• There are several approaches that we can use. Let’s start by looking at the data and then use the method of least squares.
Data
• Sample: the pairs (x1, y1), (x2, y2), …, (xn, yn), i.e.
$$\{(x_i, y_i) : i = 1, \ldots, n\}$$
• For observation i we have:
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
so that
$$y_1 = \beta_0 + \beta_1 x_1 + u_1, \quad y_2 = \beta_0 + \beta_1 x_2 + u_2, \quad \ldots, \quad y_n = \beta_0 + \beta_1 x_n + u_n$$
• We have x and y, so we can plot the data.
• We do not have β0 and β1, hence we do not know ui.
Least Squares Approach
• This approach obtains the parameter values that minimize the sum of squared residuals (= least squares):
$$\min_{b_0, b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
• First order conditions:
$$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$$
$$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0$$
• The solution to this system of linear equations is:
$$b_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
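The least-squares formulas b1 = Cov(X, Y)/Var(X) and b0 = Ȳ − b1·X̄ can be applied by hand to a tiny dataset. The five (x, y) pairs below are invented purely for illustration. The sketch also checks that the fitted values satisfy the two first order conditions: the residuals sum to zero and are uncorrelated with x.

```python
# Hypothetical sample, invented only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and variance (the 1/n factors cancel in the ratio).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n

b1 = cov_xy / var_x          # slope estimate
b0 = y_bar - b1 * x_bar      # intercept estimate

# First order conditions evaluated at (b0, b1).
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
foc_b0 = sum(residuals)                                # should be ~0
foc_b1 = sum(xi * ri for xi, ri in zip(x, residuals))  # should be ~0

print(round(b0, 4), round(b1, 4))
```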
Second Order Conditions
• To make sure we have found a minimum and not a maximum, we need to check the second order condition for the least squares problem.
• For that we need the second order derivatives, and we check whether the resulting matrix is positive definite:
$$\begin{pmatrix} \dfrac{\partial^2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{\partial b_0^2} & \dfrac{\partial^2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{\partial b_0 \, \partial b_1} \\[2ex] \dfrac{\partial^2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{\partial b_1 \, \partial b_0} & \dfrac{\partial^2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2}{\partial b_1^2} \end{pmatrix}$$
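As a worked check (a sketch added for illustration, using the notation S for the sum of squares), write S(b0, b1) = Σ (yi − b0 − b1xi)². Differentiating S twice gives the matrix of second derivatives:

$$H = \begin{pmatrix} \dfrac{\partial^2 S}{\partial b_0^2} & \dfrac{\partial^2 S}{\partial b_0 \, \partial b_1} \\[2ex] \dfrac{\partial^2 S}{\partial b_1 \, \partial b_0} & \dfrac{\partial^2 S}{\partial b_1^2} \end{pmatrix} = 2 \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$

$$\det H = 4 \left[ n \sum_{i=1}^{n} x_i^2 - \Big( \sum_{i=1}^{n} x_i \Big)^2 \right] = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The first diagonal element, 2n, is positive, and the determinant is positive whenever the xi are not all equal. Under that condition H is positive definite, so the solution of the first order conditions is a minimum.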
Econometrics _1.pptx

  • 1.
  • 2.
    Course objective • Theaim of this course is to equip the students with the necessary skills, including both the acquisition of habits of thought and knowledge of the techniques of modern econometrics. • The course is application oriented. • The course also aims to provide students with the ability to use appropriate software in an effective manner. 2
  • 3.
    Contents of thecourse • What is econometrics? Meaning of Econometric models; Econometric tools; Aims and methodology of econometrics. Data sources and data types; Single entities and group entities; Functional forms and stochastic structure of variables. • Statistical Background: Probability; Random variables and Probability Distributions; The Normal Probability Distributions and Related Distributions; Classical Statistical Inferences; Properties of Estimators; Sampling Distributions for Samples from a Normal Population; Testing of Hypotheses; Relationship between Confidence Interval Procedures and Tests of Hypotheses. 3
  • 4.
    …Contents • Simple Regression:Introduction; Specification of the Relationships; Statistical Inference in the Linear Regression Model; Analysis of Variance for the Simple Regression Model; Prediction with the Simple Regression Model; Alternative Functional Forms for Regression Equations; The Regression Fallacy • Multiple Regression: Introduction: A Model with two Explanatory Variables; Statistical Inferences in the Multiple Regression Model; Interpretation of the Regression Coefficients; Partial Correlation and Multiple Correlation; Relationships among Simple, partial, and Multiple Correlation Coefficients; prediction in the Multiple Regression Model; Analysis of Variance and Tests of Hypotheses; Degrees of Freedom; Tests for Stability. 4
  • 5.
    … contents • Introductionto Time Series Analysis: Introduction; Two Methods of Time-Series Analysis: Frequency Domain and Time Domain; Stationary and Non-stationary Time Series; Some useful Model for Time Series. 5
  • 6.
    Outline 1.1. What isEconometrics? 1.2. Steps in Empirical Economic Analysis 1.3. Structure of Economic Data 1.4. Causality and the Notion of Ceteris Paribus 1.5. Review of Basic Statistical Theory 6 Chapter1. Introduction to Econometrics
  • 7.
    1.1. What isEconometrics? Definition • It is a combination of mathematical and statistical methods, economics and data to answer empirical questions in economics. • Or It is the application of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses and estimating and forecasting economic phenomena. • Econometrics = Economics+ mathematics+ statistics+ computing 7
  • 8.
    • Economic datais non-experimental data • We cannot simply classify individuals or firms in an experimental group and a control group. • Individuals are typically free to self-select themselves in a group (e.g., education, occupation, product market, etc). • Economic models (either simple or sophisticated) are key to interpret the statistical results in econometric applications. 8 What Makes it Different?
  • 9.
  • 10.
    10 Econometrics deserves tobe studied in its own right for the following reasons: Economic theory makes statements or hypotheses that are mostly qualitative in nature (the law of demand), the law does not provide any numerical measure of the relationship. This is the job of the econometrician. The main concern of mathematical economics is to express economic theory in mathematical form without regard to measurability or empirical verification of the theory. Econometrics is mainly interested in the empirical verification of economic theory. Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables. It does not go any further. The one who does that is the econometrician. Mathematical statistics provides many of tools for economic studies, but econometrics supplies the later with many special methods of quantitative analysis based on economic data
  • 11.
    Why Econometrics? • Analysis –Testing economic theories • Test Keynes hypothesis: Consumption increases with income – Estimation of economic relationships • Estimating demand and supply equations • Forecasting or Prediction – Use current and past economic data to predict future values of variables such as inflation, GDP, stock prices, etc. • Policy making – To test the outcome of different government economic policy moves • Impact of extension intervention on farm productivity; • Effects of price ceiling on market stabilization.
  • 12.
    • The researchprocess in applied econometrics is not simply linear, but it has “loops”. – The original question, model, and even the data collection can be modified after looking at preliminary econometric results. • Steps followed in the research process in econometrics are: 1. Formulation of the question(s) of interest. 2. Collection of data 3. Specification of the econometric model 4. Estimation, validation, hypotheses testing and prediction. 12 1.2. Steps in Empirical Economic Analysis
  • 13.
    • Suppose wewant to test the hypothesis: Efficiency in production increases with extension contact. • This is important for policy makers to improve farm productivity by increasing extension contact with farmers. 13 Empirical Question(s)/Hypothesis
  • 14.
    14 • There aredifferent datasets that can be collected to test this hypothesis. 1. Data on production costs versus data on output and inputs. 2. Data at farm level versus data at household level. 3. Time series data (e.g., a farmer over several years), or cross-sectional data (e.g., many farmers at the same period); or panel data (e.g., many farmers over multiple periods). Collection of Data
  • 15.
    15 • An econometricmodel is an economic model where we take into account what is observable and not to the researcher. • A researcher’s decision of which economic model to estimate depends critically on what is observable. • For instance, we can study efficiency either by estimating a cost function or by estimating a production function. • Whether we estimate the cost or production function depends very much on the available data: with data on production costs we can estimate a cost function; with data on output and inputs we can estimate a production function. Specification of the Econometric Model
  • 16.
    16 • Suppose thatwe decide to estimate a production function. Specification of the Econometric Model Y = F( L , K , N, u) Y = Yield per hectare L = Amount of labor used K = Amount of physical capital used N = Extension contact in a year u = Other factors unobservable to researcher
  • 17.
    17 • An importantspecification assumption is the choice of the functional form of the production function F(.). • Following previous literature can be useful here. • For instance, it is well-known that the Cobb- Douglas (C-D) production function provides a good fit for farm production. • The C-D production function is also very convenient because it is linear in logarithms. Specification of the Econometric Model
  • 18.
    18 • In logarithms,the C-D production function is: Specification of the Econometric Model • The β’s are parameters to be estimated. • u represents unobservable factors affecting yield, e.g., weather. u N K L Y N K L      ln ln ln ln 0    
  • 19.
    • Dealing withthe unobservable (or error term or disturbance) u, is one of the most important issues in any econometric analysis. • Certain conditions on the statistical properties of the error term are key for the good properties of our estimators of the parameters of interest. • To a certain extent, we will be able to test for these conditions. However, the economic interpretation of the error term (i.e., which are the main factors in it) is very important to correctly interpret our estimation results. 19 Specification of the Econometric Model
  • 20.
    20 • We wantto estimate the parameters β in the production function. • After estimation, we have to make specification tests in order to validate some of the assumptions that we have made for estimation. • The results of these tests may imply a re- specification and re-estimation of the model. Estimation, Validation, Hypotheses Testing, prediction
  • 21.
    21 • Once wehave a validated model, we can interpret the results from an economic point of view, make tests, and predictions. • For instance, we may be interested in testing the significance of extension contact or testing constant returns to scale. Estimation, Validation, Hypotheses Testing, prediction
  • 22.
    Methodology of Econometrics
    Broadly speaking, traditional econometric methodology proceeds along the following lines:
    1. Statement of theory or hypothesis
    2. Specification of the mathematical model of the theory
    3. Specification of the statistical, or econometric, model
    4. Collecting the data
    5. Estimation of the parameters of the econometric model
    6. Hypothesis testing
    7. Forecasting or prediction
    8. Using the model for control or policy purposes
    To illustrate the preceding steps, let us consider the well-known Keynesian theory of consumption.
  • 23.
    1. Statement of Theory or Hypothesis
    • Keynes states that, on average, consumers increase their consumption as their income increases, but not by as much as the increase in their income; that is, the marginal propensity to consume (MPC) is less than 1.
    2. Specification of the Mathematical Model of Consumption (single-equation model)
      $Y = \beta_1 + \beta_2 X, \quad 0 < \beta_2 < 1$   (I.3.1)
    • Y = consumption expenditure (dependent variable)
    • X = income (independent, or explanatory, variable)
    • β1 = the intercept
    • β2 = the slope coefficient
    • The slope coefficient β2 measures the MPC.
  • 24.
  • 25.
    3. Specification of the Econometric Model of Consumption
    • The relationships between economic variables are generally inexact. In addition to income, other variables affect consumption expenditure. For example, size of family, ages of the members of the family, family religion, etc., are likely to exert some influence on consumption.
    • To allow for the inexact relationships between economic variables, (I.3.1) is modified as follows:
      $Y = \beta_1 + \beta_2 X + u$   (I.3.2)
    • Here u, known as the disturbance, or error, term, is a random (stochastic) variable that has well-defined probabilistic properties. The disturbance term u may well represent all those factors that affect consumption but are not taken into account explicitly.
  • 26.
    4. Obtaining Data
    • To obtain the numerical values of β1 and β2, we need data. Look at Table I.1, which reports data on personal consumption expenditure (PCE) and gross domestic product (GDP). The data are in “real” terms.
  • 27.
    5. Estimation of the Econometric Model
    • Regression analysis is the main tool used to obtain the estimates. Using this technique and the data given in Table I.1, we obtain the following estimates of β1 and β2, namely, −184.08 and 0.7064. Thus, the estimated consumption function is:
      $\hat{Y}_i = -184.08 + 0.7064 X_i$   (I.3.3)
    • The estimated regression line is shown in Figure I.3. The regression line fits the data quite well. The slope coefficient (i.e., the MPC) is about 0.70: an increase in real income of 1 dollar led, on average, to an increase of about 70 cents in real consumption.
  • 28.
    6. Hypothesis Testing
    • The aim is to find out whether the estimates obtained in Eq. (I.3.3) are in accord with the expectations of the theory that is being tested. Keynes expected the MPC to be positive but less than 1. In our example we found the MPC to be about 0.70. But before we accept this finding as confirmation of Keynesian consumption theory, we must enquire whether this estimate is sufficiently below unity. In other words, is 0.70 statistically less than 1? If it is, it may support Keynes’ theory.
    • Such confirmation or refutation of economic theories on the basis of sample evidence is based on a branch of statistical theory known as statistical inference (hypothesis testing).
  • 29.
    7. Forecasting or Prediction
    • To illustrate, suppose we want to predict the mean consumption expenditure for 1997. The GDP value for 1997 was 7269.8 billion dollars. Putting this value into (I.3.3), predicted consumption would be:
      $\hat{Y}_{1997} = -184.0779 + 0.7064 (7269.8) = 4951.3$
    • The actual value of the consumption expenditure reported in 1997 was 4913.5 billion dollars. The estimated model (I.3.3) thus over-predicted the actual consumption expenditure by about 37.82 billion dollars. We could say the forecast error is about 37.8 billion dollars, which is about 0.77 percent of the actual consumption expenditure for 1997.
    • Now suppose the government decides to propose a reduction in the income tax. What will be the effect of such a policy on income and thereby on consumption expenditure and ultimately on employment?
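The forecast arithmetic can be checked with a few lines of Python. The coefficients and 1997 figures are the ones quoted in the text; the variable names are chosen here for illustration. Small differences from the quoted forecast error arise because the slope is rounded to four decimals.

```python
# Estimated coefficients and 1997 figures as quoted in the text (billions of dollars).
beta1_hat = -184.0779
beta2_hat = 0.7064
gdp_1997 = 7269.8
actual_pce_1997 = 4913.5

# Point forecast from the estimated consumption function.
forecast = beta1_hat + beta2_hat * gdp_1997

# Forecast error in levels and as a share of actual consumption expenditure.
forecast_error = forecast - actual_pce_1997
pct_of_actual = 100 * forecast_error / actual_pce_1997

print(round(forecast, 1), round(forecast_error, 1), round(pct_of_actual, 2))
```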
  • 30.
    • Suppose that, as a result of the proposed policy change, investment expenditure increases. What will be the effect on the economy? As macroeconomic theory shows, the change in income following a dollar’s worth of change in investment expenditure is given by the income multiplier M, which is defined as:
      $M = \frac{1}{1 - MPC}$
    • With an MPC of about 0.70, the multiplier is about M = 3.33. That is, an increase (decrease) of a dollar in investment will eventually lead to more than a threefold increase (decrease) in income; note that it takes time for the multiplier to work.
    • The critical value in this computation is the MPC. Thus, a quantitative estimate of the MPC provides valuable information for policy purposes. Knowing the MPC, one can predict the future course of income, consumption expenditure, and employment following a change in the government’s fiscal policies.
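A small sketch of the multiplier calculation follows. The function name and the input check are illustrative choices; the two MPC values are the rounded figure used in the text and the estimated slope.

```python
def income_multiplier(mpc: float) -> float:
    """Income multiplier M = 1 / (1 - MPC); restricted here to 0 <= MPC < 1."""
    if not 0 <= mpc < 1:
        raise ValueError("MPC must lie in [0, 1)")
    return 1.0 / (1.0 - mpc)

m_rounded = income_multiplier(0.70)     # MPC rounded to 0.70, as in the text
m_estimate = income_multiplier(0.7064)  # MPC at the estimated slope

print(round(m_rounded, 2), round(m_estimate, 2))
```

The multiplier is quite sensitive to the MPC: moving from 0.70 to the unrounded estimate of 0.7064 already raises M from about 3.33 to about 3.41.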
  • 31.
    8. Use of the Model for Control or Policy Purposes
    • Suppose we have the estimated consumption function given in (I.3.3). Suppose further the government believes that consumer expenditure of about 4900 (billion dollars) will keep the unemployment rate at its current level of about 4.2%. What level of income will guarantee the target amount of consumption expenditure?
    • If the regression results given in (I.3.3) seem reasonable, simple arithmetic will show that:
      $4900 = -184.0779 + 0.7064 X$   (I.3.6)
    • This gives X = 7197, approximately. That is, an income level of about 7197 (billion) dollars, given an MPC of about 0.70, will produce an expenditure of about 4900 billion dollars.
    • As these calculations suggest, an estimated model may be used for control, or policy, purposes. By an appropriate fiscal and monetary policy mix, the government can manipulate the control variable X to produce the desired level of the target variable Y.
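Solving (I.3.6) for X amounts to inverting the estimated consumption function. A minimal check, using the coefficients quoted in the text and illustrative variable names:

```python
# Coefficients of the estimated consumption function, as quoted in the text.
beta1_hat = -184.0779
beta2_hat = 0.7064
target_consumption = 4900.0  # billions of dollars

# Invert Y = beta1 + beta2 * X to find the income level that hits the target.
required_income = (target_consumption - beta1_hat) / beta2_hat
print(round(required_income))
```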
  • 32.
    Economic theory → Mathematical model → Econometric model → Data collection → Estimation → Hypothesis testing → Forecasting → Application in control or policy studies
  • 33.
    1.3. Economic Data
    • Different types of datasets have their own advantages and disadvantages.
    • Some econometric methods may be valid (i.e., have good properties) for some types of data but not for others.
    • We typically distinguish four types of datasets:
      1. Cross-sectional data
      2. Time series data
      3. Pooled cross-sectional data
      4. Panel or longitudinal data
  • 34.
    Cross-sectional Data
    • A cross-sectional dataset is a sample of individuals, households, firms, cities, states, countries, etc., taken at a given point in time.
    • We often assume that these data have been obtained by random sampling.
    • Sometimes we do not have a random sample, for example because of sample selection problems, spatial correlation, or stratified sampling.
  • 35.
    Cross-sectional Data
    • Example of a cross-sectional dataset

    Obsno   Output   Educ   Extcont   Sex   Married
    1         3.10     11         2     1         0
    2         3.24     12        22     1         1
    3         3.00     11         2     0         0
    ...        ...    ...       ...   ...       ...
    499      11.56     16         5     0         1
    500       3.50     14         5     1         0
  • 36.
    Time Series Data
    • A time series dataset consists of observations on one or several variables over several periods of time (days, weeks, months, years).
    • A key feature of time series data is that, typically, observations are correlated across time. We do not have a random sample.
  • 37.
    Time Series Data
    Problems of time series data
    • Dependence across time: this period’s observation depends somewhat on last period’s observation (e.g., income, prices).
    • Seasonality (frequency of data): many weekly, monthly, or quarterly time series are affected by seasonality (e.g., ice cream sales fall in winter).
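Dependence across time can be made concrete with a simulated first-order autoregressive, AR(1), series. This is only a sketch: the AR(1) form, the persistence value, and the sample length are assumptions chosen for illustration, not features of any dataset in the course.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 5000, 0.8  # sample length and persistence: illustrative assumptions

# AR(1): this period's value depends on last period's value plus a new shock.
shocks = rng.normal(0.0, 1.0, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = rho * y[t - 1] + shocks[t]

# Sample correlation between y_t and y_{t-1}.
lag1_corr = np.corrcoef(y[1:], y[:-1])[0, 1]
print(round(lag1_corr, 2))
```

A lag-one correlation well above zero is exactly what rules out treating the observations as a random sample.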
  • 38.
    Time Series Data
    • Example of a time series dataset

    Obsno   Year   Month   Exrate   Irate
    1       1990       1     1.32    7.35
    2       1990       2     1.30    7.30
    3       1990       3     1.29    7.32
    ...      ...     ...      ...     ...
    191     2005      11     1.11    4.26
    192     2005      12     1.10    4.31
  • 39.
    Pooled Cross Sectional Data
    • A sequence of cross sections of the same variables from the same population over a period of time is called pooled cross-sectional data.
    • Such data are useful for analyzing the evolution over time of the cross-sectional distribution of variables such as individual wages, household income, firms’ investments, etc.
    • The analysis is similar to that of cross-sectional data, with the additional consideration of structural changes over time.
    • It is a relatively new approach, useful for analyzing policy effects.
  • 40.
    Panel or Longitudinal Data
    • In panel data we have a group of individuals (or households, firms, countries, etc.) who are observed at several points in time. That is, we have time series data for each individual in the sample.
    • The key feature of panel data that distinguishes them from pooled cross sections is that the same individuals are followed over a given period of time.
    • Using panel data we can control for time-invariant unobserved characteristics of individuals, firms, countries, etc.
  • 41.
    Panel or Longitudinal Data
    • Example of a panel dataset: 100 cities over 2 years

    Obsno   City   Year   Murders   Population   Police
    1          1   1999         5      350,000      440
    2          1   2000         8      359,200      471
    3          2   1999         2       64,300       75
    4          2   2000         1       65,100       75
    ...      ...    ...       ...          ...      ...
    299      100   1999        25      543,000      520
    300      100   2000        32      546,200      493
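One common way to use panel data to control for time-invariant unobserved characteristics is first-differencing, sketched below on synthetic two-period data. The data-generating process, the parameter values, and the helper `ols_slope` are all assumptions made for this illustration; the slides do not prescribe this particular estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000    # individuals, each observed in two periods (assumed)
beta = 2.0  # true effect of x on y (assumed)

a = rng.normal(0.0, 1.0, n)                    # time-invariant unobserved trait
x = a[:, None] + rng.normal(0.0, 1.0, (n, 2))  # x is correlated with the trait
e = rng.normal(0.0, 0.5, (n, 2))               # idiosyncratic error
y = 1.0 + beta * x + a[:, None] + e

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Pooled regression ignores the panel structure and picks up the trait.
pooled = ols_slope(x.ravel(), y.ravel())

# Differencing the two periods removes the time-invariant trait a_i.
first_diff = ols_slope(x[:, 1] - x[:, 0], y[:, 1] - y[:, 0])

print(round(pooled, 2), round(first_diff, 2))
```

In this setup the pooled slope is pulled away from the true value of 2 because the unobserved trait moves with x, while the first-difference slope stays close to it.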
  • 42.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • Most empirical questions in economics are associated with the identification of causal effects.
    • X has a causal effect on Y: X → Y
    • Example: How does one more extension contact change farm productivity, if all other relevant factors are held fixed?
    • The notion of ceteris paribus (i.e., “other factors being equal”) plays an important role in the analysis of causality.
  • 43.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • Let us consider the relationship between two variables X and Y.
    • Assume that a change in X causes a change in Y: ∆X → ∆Y
    • Y = farm productivity, X = extension contact
    Questions
    • How about other factors affecting Y? Experience? Education? Others?
    • What is the functional form that describes the relationship between Y and X?
    • How can we be sure we are capturing a ceteris paribus relationship between Y and X?
  • 44.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • Let $y = \beta_0 + \beta_1 x + u$ be our simple (one-variable) linear (“y = a + bx” type) regression model.
    • y: dependent variable, explained variable, regressand
    • x: independent variable (not in the statistical sense), explanatory variable, regressor
    • u: error term, disturbance, unobserved factors
  • 45.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • Suppose we know $\beta_0, \beta_1$.
    • How will y change if x changes from $x_1$ to $x_2$?
      $y_1 = \beta_0 + \beta_1 x_1 + u_1$
      $y_2 = \beta_0 + \beta_1 x_2 + u_2$
      $y_2 - y_1 = \beta_1 (x_2 - x_1) + (u_2 - u_1)$
      $\Delta y = \beta_1 \Delta x + \Delta u$
    • The ceteris paribus assumption implies that $\Delta u = 0$ (all other factors are held constant), so $\Delta y = \beta_1 \Delta x$: x has a linear effect on y.
  • 46.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • In the simple linear regression model (SLRM), $\beta_0, \beta_1$ are called parameters:
      $\beta_0$: intercept parameter
      $\beta_1$: slope parameter
    • Estimation of these parameters is the primary interest in applied econometrics.
    • In our model, $\beta_1$ shows by how much y changes when x changes by one unit, holding other factors constant.
  • 47.
    1.4. Causality and the Notion of “Ceteris Paribus”
    • In an experiment it is possible to hold everything fixed, vary only one variable (x), and then measure how the outcome variable (y) changes. This way we can have ∆u = 0.
    • But in economics we usually cannot run experiments in a lab; we have to rely on observations. In this case ∆u is most likely to be different from 0.
    • How should we proceed then?
  • 48.
    SLRM Assumption 1
    • Because x and u are random variables, we need a concept that deals with probability.
    Assumption 1: E(u) = 0
    • This is not a very restrictive assumption, as long as our empirical equation contains an intercept, β0.
    • If E(u) = a ≠ 0, we can always write:
      $y = \beta_0 + \beta_1 x + u = (\beta_0 + a) + \beta_1 x + (u - a) = \beta_0' + \beta_1 x + u'$
      where $\beta_0' = \beta_0 + a$ and $u' = u - a$, and now E(u') = 0.
  • 49.
    SLRM Assumption 2
    • An important assumption about how x and u are related is:
    Assumption 2: E(u|x) = E(u), i.e., the average value of u does not depend on the value of x.
    • Combining this with Assumption 1 gives us: E(u|x) = E(u) = 0
    • This means that for any given value of x the expectation of the error term is zero.
    • In the education example: E(u|x=16) = E(u|x=8) = 0, where E(u|x=16) is the average ability of people with 16 years of education and E(u|x=8) is the average ability of people with 8 years of education.
  • 50.
    Result
    Assumption 2 (together with Assumption 1) implies that Cov(x, u) = 0:
      $Cov(x, u) = E[(x - E(x))(u - E(u))] = E[xu - E(x)\,u] = E(xu) - E(x)E(u) = E(xu) = E[x\,E(u|x)] = 0$
    The last two steps use E(u) = 0, the law of iterated expectations, and E(u|x) = 0.
  • 51.
    Violation of Assumption 2
    • In the education example, u represents innate ability (energy level, smartness, genetic determinants, etc.). It might be that people with higher u (good potential) choose to have more education.
    • If we believe that u increases as x increases, so that cov(x, u) > 0, then Assumption 2 is violated. This is a very serious problem; we will learn later how to deal with it.
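The consequence of cov(x, u) > 0 can be seen in a small simulation. This is a sketch on synthetic data: the parameter values and the way education is made to depend on ability are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta0, beta1 = 1.0, 0.5  # true parameters (assumed)

u = rng.normal(0.0, 1.0, n)      # unobserved ability
x = u + rng.normal(0.0, 1.0, n)  # education rises with ability, so cov(x, u) > 0
y = beta0 + beta1 * x + u

# Slope from a simple regression of y on x.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# The gap between the slope and beta1 equals cov(x, u) / var(x).
bias = np.cov(x, u)[0, 1] / np.var(x, ddof=1)

print(round(slope, 2), round(beta1 + bias, 2))
```

Here the regression slope settles near 1.0 rather than the true 0.5, because part of the effect of ability is attributed to education.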
  • 52.
    Population Regression Function
    • We have the following result:
      $y = \beta_0 + \beta_1 x + u, \quad E(u|x) = 0$
      $E(y|x) = E(\beta_0 + \beta_1 x + u \,|\, x) = \beta_0 + \beta_1 x$
    • This is called the population regression function; it is a linear function of x.
    • A one-unit increase in x changes the expected value of y by the amount β1:
      $\Delta E(y|x) = \beta_1 \Delta x = \beta_1 \quad \text{for } \Delta x = 1$
  • 53.
    Population Regression Function
    • Another result is that
      $y = E(y|x) + u = \beta_0 + \beta_1 x + u$
      where $E(y|x) = \beta_0 + \beta_1 x$ is the systematic part and u is the unsystematic part.
    • We do not know β0 and β1. What should we do?
  • 54.
    Estimating the parameters
    • Suppose we have a random sample of size n from the population: $\{(x_i, y_i) : i = 1, \ldots, n\}$
    • We must decide how to use the data to obtain estimates of the intercept and the slope parameters.
    • There are several approaches that we can use. Let’s start by looking at the data and then use the method of least squares.
  • 55.
    Data
    • Sample: $\{(x_i, y_i) : i = 1, \ldots, n\}$
      $y = (y_1, y_2, \ldots, y_n)', \quad x = (x_1, x_2, \ldots, x_n)'$
      $y_1 = \beta_0 + \beta_1 x_1 + u_1$
      $y_2 = \beta_0 + \beta_1 x_2 + u_2$
      $\vdots$
      $y_n = \beta_0 + \beta_1 x_n + u_n$
    • For observation i we have: $y_i = \beta_0 + \beta_1 x_i + u_i$
    • We have x and y, so we can plot the data.
    • We do not have β0 and β1, hence we do not know $u_i$.
  • 56.
    Least Squares Approach
    • This approach chooses the parameter estimates so that the sum of squared residuals is minimized (hence “least squares”):
      $\min_{b_0, b_1} \sum_{i=1}^{n} \hat{u}_i^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
    • First order conditions:
      $\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0$
      $\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = 0 \;\Rightarrow\; -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0$
  • 57.
    Least Squares Approach
    • The solution to this system of linear equations is:
      $b_0 = \bar{Y} - b_1 \bar{X}, \quad b_1 = \frac{Cov(X, Y)}{Var(X)}$
      where Cov and Var denote the sample covariance and sample variance.
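The closed-form solution can be computed directly and checked against the first order conditions. The sketch below uses synthetic data (the "true" line and noise level are assumptions) and cross-checks the result against NumPy's degree-1 polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0.0, 10.0, n)
y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, n)  # assumed true line plus noise

# Closed-form least squares solution: b1 = Cov(X, Y) / Var(X), b0 = Ybar - b1 * Xbar.
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# First order conditions: residuals sum to zero and are orthogonal to x.
resid = y - b0 - b1 * x
foc_b0 = resid.sum()
foc_b1 = (x * resid).sum()

# Cross-check: np.polyfit with degree 1 returns the slope first, then the intercept.
slope_np, intercept_np = np.polyfit(x, y, 1)

print(b0, b1, foc_b0, foc_b1)
```

Both first order conditions evaluate to zero up to floating-point error, which is just the system of two linear equations being satisfied at (b0, b1).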
  • 58.
    Second Order Conditions
    • To make sure we have found a minimum and not a maximum, we need to check the second order condition for the least squares problem.
    • For that we need the second order derivatives, and we must check that the resulting matrix is positive definite. Writing $S = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$, the matrix is:
      $\begin{pmatrix} \dfrac{\partial^2 S}{\partial b_0^2} & \dfrac{\partial^2 S}{\partial b_0 \partial b_1} \\ \dfrac{\partial^2 S}{\partial b_1 \partial b_0} & \dfrac{\partial^2 S}{\partial b_1^2} \end{pmatrix}$