1. 405: ECONOMETRICS
Chapter # 1: THE NATURE OF REGRESSION ANALYSIS
By: Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics: Kuwait University
2. THE MODERN INTERPRETATION OF REGRESSION
• Regression analysis is concerned with the study of the dependence of
one variable, the dependent variable, on one or more other variables,
the explanatory variables, with a view to estimating and/or predicting
the (population) mean or average value of the former in terms of the
known or fixed (in repeated sampling) values of the latter.
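The idea of estimating the average value of the dependent variable at fixed values of the explanatory variable can be sketched in a few lines of Python. The data below are invented purely for illustration, and the names `observations` and `conditional_means` are hypothetical, not taken from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (x, y) pairs: x is the explanatory variable, held at a few
# fixed values; y is the dependent variable. Invented for illustration only.
observations = [
    (65, 66), (65, 67), (65, 68),
    (70, 69), (70, 70), (70, 71),
    (75, 72), (75, 73), (75, 74),
]

def conditional_means(pairs):
    """Return {x: average of y over all pairs sharing that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

mean_y_given_x = conditional_means(observations)
for x, y_avg in mean_y_given_x.items():
    print(f"x = {x}: average y = {y_avg}")
```

Each printed value is a sample estimate of the mean of the dependent variable at one fixed value of the explanatory variable; the regression question is how these conditional means move as that value changes.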
Examples
1. Consider Galton’s law of universal regression. Our concern is finding
out how the average height of sons changes, given the fathers’ height.
To see how this can be done, consider Figure 1.1, which is a scatter
diagram, or scattergram.
3.
4. 2. Consider the scattergram in Figure 1.2, which gives the distribution in a
hypothetical population of heights of boys measured at fixed ages.
5. 3. An economist may be interested in studying the dependence of personal
consumption expenditure on after-tax or disposable real personal income.
Such an analysis may be helpful in estimating the marginal propensity to
consume (MPC).
4. A monopolist who can fix the price or output (but not both) may want to
find out the response of the demand for a product to changes in price. Such
an experiment may enable the estimation of the price elasticity of the
demand for the product and may help determine the most profitable price.
5. We may want to study the rate of change of money wages in relation to the
unemployment rate. The curve in Figure 1.3 is an example of the Phillips
curve. Such a scattergram may enable the labor economist to predict the
average change in money wages given a certain unemployment rate.
6.
7. 6. The higher the rate of inflation π, the lower the proportion (k) of their
income that people would want to hold in the form of money (see Figure 1.4).
8. • 7. The marketing director of a company may want to know how the
demand for the company’s product is related to, say, advertising
expenditure. Such a study will be of considerable help in finding out the
elasticity of demand with respect to advertising expenditure. This
knowledge may be helpful in determining the “optimum” advertising
budget.
• 8. Finally, an agronomist may be interested in studying the dependence of
crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and
fertilizer. Such a dependence analysis may enable the prediction of the
average crop yield, given information about the explanatory variables.
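As a rough illustration of the consumption example, the sketch below fits a straight line to invented income and consumption figures and reads the slope as an estimate of the MPC. It assumes ordinary least squares as the fitting method, which this chapter has not introduced; the data and the helper `ols_fit` are hypothetical.

```python
from statistics import mean

def ols_fit(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical disposable income and consumption expenditure (same units).
income = [100, 120, 140, 160, 180, 200]
consumption = [85, 98, 118, 130, 149, 162]

intercept, mpc = ols_fit(income, consumption)
print(f"estimated MPC (slope): {mpc:.3f}")
print(f"estimated intercept:   {intercept:.3f}")
```

The slope is the estimated change in average consumption per unit change in income. The same fit applied to quantity and price, or to demand and advertising expenditure, would be a starting point for the elasticity examples.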
9. 1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
• In statistical relationships among variables we essentially deal with random
or stochastic variables, that is, variables that have probability distributions.
In functional or deterministic dependency, on the other hand, we also deal
with variables, but these variables are not random or stochastic.
• The dependence of crop yield on temperature, rainfall, sunshine, and
fertilizer, for example, is statistical in nature.
• In deterministic phenomena, we deal with relationships of the type, say,
exhibited by Newton’s law of gravity, which states: Every particle in the
universe attracts every other particle with a force directly proportional to
the product of their masses and inversely proportional to the square of the
distance between them. Symbolically, $F = k\,(m_1 m_2 / r^2)$, where
$F$ = force, $m_1$ and $m_2$ are the masses of the two particles,
$r$ = distance, and $k$ = constant of proportionality. We are not concerned
with such deterministic relationships.
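The contrast between the two kinds of relationship can be sketched as follows. The gravity function follows the formula on this slide, with `k=1.0` as a placeholder constant; the crop-yield rule (its coefficients and its normally distributed disturbance) is an assumption made up for illustration.

```python
import random

def gravity_force(m1, m2, r, k=1.0):
    """Deterministic: F = k * m1 * m2 / r**2 (k=1.0 is a placeholder)."""
    return k * m1 * m2 / r ** 2

def crop_yield(rainfall, rng):
    """Statistical: a hypothetical linear rule plus a random disturbance."""
    return 2.0 + 0.5 * rainfall + rng.gauss(0.0, 1.0)

rng = random.Random(0)  # seeded so the run is reproducible

# Same inputs, repeated three times.
forces = [gravity_force(2.0, 3.0, 1.5) for _ in range(3)]
yields = [crop_yield(10.0, rng) for _ in range(3)]

print("forces:", forces)  # identical values every time
print("yields:", yields)  # values scatter around an average
```

With identical inputs the deterministic function returns one value, while the statistical one returns a distribution of values. Regression analysis is concerned with the mean of that distribution.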
10. 1.4 REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the dependence of one variable on
other variables, it does not necessarily imply causation. In the crop-yield
example cited previously, there is no statistical reason to assume that rainfall
does not depend on crop yield. The fact that we treat crop yield as dependent
on rainfall (among other things) is due to non-statistical considerations:
Common sense suggests that the relationship cannot be reversed, for we
cannot control rainfall by varying crop yield. A statistical relationship in
itself cannot logically imply causation. To ascribe causality, one must appeal
to a priori or theoretical considerations.
11. 1.5 REGRESSION VERSUS CORRELATION
• In correlation analysis, the primary objective is to measure the strength or
degree of linear association between two variables. For example, we may be
interested in the correlation between smoking and lung cancer, between
scores on statistics and mathematics examinations, and so on. In regression
analysis, we try to estimate or predict the average value of one variable on
the basis of the fixed values of other variables.
• Regression and correlation have some fundamental differences. In
regression analysis there is an asymmetry in the way the dependent and
explanatory variables are treated.
• In correlation analysis, we treat any (two) variables symmetrically; there is
no distinction between the dependent and explanatory variables. The
correlation between scores on mathematics and statistics examinations is
the same as that between scores on statistics and mathematics
examinations. Moreover, both variables are assumed to be random, whereas
most of the regression theory to be dealt with here is conditional upon the
assumption that the dependent variable is stochastic but the explanatory
variables are fixed or nonstochastic.
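The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses invented examination scores and two hypothetical helpers, `correlation` and `slope`; the slope is the least-squares slope, which is an assumption about the fitting method rather than something defined on this slide.

```python
from math import sqrt
from statistics import mean

def _cross_products(x, y):
    """Sums of squared and cross deviations from the means."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy, sxx, syy

def correlation(x, y):
    """Sample coefficient of linear correlation between x and y."""
    sxy, sxx, syy = _cross_products(x, y)
    return sxy / sqrt(sxx * syy)

def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    sxy, sxx, _ = _cross_products(explanatory, dependent)
    return sxy / sxx

# Hypothetical examination scores for five students.
math_scores = [52, 61, 70, 75, 88]
stat_scores = [58, 60, 74, 71, 90]

r_math_stat = correlation(math_scores, stat_scores)
r_stat_math = correlation(stat_scores, math_scores)
b_stat_on_math = slope(stat_scores, math_scores)
b_math_on_stat = slope(math_scores, stat_scores)

print("correlation, either order:", r_math_stat, r_stat_math)
print("slope, statistics on mathematics:", b_stat_on_math)
print("slope, mathematics on statistics:", b_math_on_stat)
```

Swapping the two variables leaves the correlation unchanged but gives a different regression slope, because regression treats one variable as dependent and the other as given.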
12. 1.6 TERMINOLOGY AND NOTATION
• In the literature the terms dependent variable and explanatory variable are
described variously. A representative list is:
13. • We will use the dependent variable/explanatory variable or the more
neutral regressand and regressor terminology.
• The term random is a synonym for the term stochastic. A random or
stochastic variable is a variable that can take on any set of values, positive
or negative, with a given probability.
14. 1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
• Types of Data
• There are three types of data: time series, cross-section, and pooled (i.e.,
combination of time series and cross-section) data.
• A time series is a set of observations on the values that a variable takes at
different times. It is collected at regular time intervals, such as daily,
weekly, monthly, quarterly, annually, quinquennially, that is, every 5 years
(e.g., the census of manufactures), or decennially (e.g., the census of
population).
• Most empirical work based on time series data assumes that the underlying
time series is stationary. Loosely speaking, a time series is stationary if its
mean and variance do not vary systematically over time.
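The loose definition of stationarity suggests a crude check: split a series in two and compare the mean and variance of each half. The sketch below does this for two simulated series. It is an informal diagnostic under invented data, not a formal stationarity test, and `half_summaries` and `mean_gap` are hypothetical helper names.

```python
import random
from statistics import mean, pvariance

def half_summaries(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

def mean_gap(series):
    """Absolute difference between the two half-sample means."""
    (m1, _), (m2, _) = half_summaries(series)
    return abs(m2 - m1)

rng = random.Random(42)  # seeded so the run is reproducible
n = 400

# Simulated series: pure noise around zero, and noise around a rising trend.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
trending = [0.05 * t + rng.gauss(0.0, 1.0) for t in range(n)]

print("noise halves:   ", half_summaries(noise))
print("trending halves:", half_summaries(trending))
```

The noise series has similar means in both halves, while the trending series does not: its mean varies systematically over time, so it is not stationary in the loose sense used here.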
15.
16. • Cross-Section Data. Cross-section data are data on one or more variables
collected at the same point in time, such as the census of population
conducted by the Census Bureau every 10 years. An example of cross-sectional
data is given in Table 1.1. For each year the data on the 50 states are
cross-sectional data. Just as time series data create their own problems
because of the stationarity issue, cross-sectional data too have their own
problems, specifically the problem of heterogeneity.
• From Table 1.1 we see that we have some states that produce huge amounts
of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska).
When we include such heterogeneous units in a statistical analysis, the size
or scale effect must be taken into account. To see this clearly, we plot in
Figure 1.6 the data on eggs produced and their prices in 50 states for the
year 1990. This figure shows how widely scattered the observations are.
19. • Pooled Data. Pooled, or combined, data contain elements of both time
series and cross-section data. The data in Table 1.1 are an example of pooled
data. For each year we have 50 cross-sectional observations and for each state
we have two time series observations on prices and output of eggs, a total of
100 pooled (or combined) observations.
• Panel, Longitudinal, or Micropanel Data. This is a special type of pooled
data in which the same cross-sectional unit (say, a family or a firm) is
surveyed over time.
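The counting in the pooled-data example (50 states observed in two years, 100 observations in all) can be made concrete with a small sketch. The state labels are placeholders, the second year (1991) is an assumption because the slides name only 1990, and no actual egg prices or outputs are reproduced.

```python
from itertools import product

# Placeholder labels for the 50 cross-sectional units.
states = [f"state_{i:02d}" for i in range(1, 51)]
# 1990 appears in the slides; 1991 is an assumed second year.
years = [1990, 1991]

# One record per (state, year) pair: the pooled data set.
pooled = [{"state": s, "year": y} for s, y in product(states, years)]

# A cross-section fixes the year and varies the unit.
cross_section_1990 = [row for row in pooled if row["year"] == 1990]

# A time series fixes the unit and varies the year.
time_series_one_state = [row for row in pooled if row["state"] == "state_01"]

print(len(pooled), len(cross_section_1990), len(time_series_one_state))
```

Because the same 50 units appear in every year, this layout is also a panel: each state can be followed over time.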
• The Sources of Data
• The data used in empirical analysis may be collected by a governmental
agency (e.g., the Department of Commerce), an international agency (e.g.,
the International Monetary Fund (IMF) or the World Bank), a private
organization (e.g., the Standard & Poor’s Corporation), or an individual.
Literally, there are thousands of such agencies collecting data for one
purpose or another.
20. • The Accuracy of Data
• The quality of the data is often not that good. Some reasons for that are:
• First, as noted, most social science data are nonexperimental in nature.
Therefore, there is the possibility of observational errors, either of omission
or commission.
• Second, even in experimentally collected data, errors of measurement arise
from approximations and roundoffs.
• Third, in questionnaire-type surveys, the problem of nonresponse can be
serious; a researcher is lucky to get a 40% response to a questionnaire.
• Fourth, the sampling methods used in obtaining the data may vary so
widely that it is often difficult to compare the results obtained from the
various samples.
• Fifth, economic data are generally available at a highly aggregate level. For
example, most macrodata (e.g., GNP, inflation, unemployment) are available
only for the economy as a whole.
• The researcher should always keep in mind that the results of research are
only as good as the quality of the data.
21. A Note on the Measurement Scales of Variables.
• The variables that we will generally encounter fall into four broad
categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is
important that we understand each.
• Ratio Scale. For a variable X taking two values, X1 and X2, the ratio X1/X2
and the distance (X2 − X1) are meaningful quantities. Comparisons such as
X2 ≤ X1 or X2 ≥ X1 are also meaningful.
• Interval Scale. The distance between two time periods, say (2000 − 1995), is
meaningful, but not the ratio of two time periods (2000/1995).
• Ordinal Scale. Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists but the
distances between the categories cannot be quantified.
• Nominal Scale. Variables such as gender and marital status simply denote
categories. Such variables cannot be expressed on the ratio, interval, or
ordinal scales.
• Econometric techniques that may be suitable for ratio scale variables may
not be suitable for nominal scale variables. Therefore, it is important to
bear in mind the distinctions among the four types.
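One way to keep the four scales apart is to ask which operations give an interpretable answer on each. The sketch below uses made-up values; the income figures, the grade ordering list, and the marital-status sample are all hypothetical.

```python
from collections import Counter

# Ratio scale: ratios, distances, and ordering are all interpretable.
income_a, income_b = 200.0, 100.0
income_ratio = income_a / income_b   # "twice as much" is meaningful
income_gap = income_a - income_b

# Interval scale: distances are interpretable, ratios are not.
year_gap = 2000 - 1995               # five years apart is meaningful
# 2000 / 1995 can be computed but has no sensible interpretation.

# Ordinal scale: only the ordering is interpretable.
GRADE_ORDER = ["C", "B", "A"]        # lowest to highest

def grade_rank(grade):
    """Position of a grade in the ordering; gaps between ranks mean nothing."""
    return GRADE_ORDER.index(grade)

a_outranks_b = grade_rank("A") > grade_rank("B")

# Nominal scale: only category membership and counts are interpretable.
marital_status = ["single", "married", "married", "single", "married"]
status_counts = Counter(marital_status)

print(income_ratio, income_gap, year_gap, a_outranks_b, dict(status_counts))
```

A technique that relies on differences or ratios, such as averaging, therefore fits ratio scale data but not grade labels or marital-status categories.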