Introduction to Regression Analysis

INTRODUCTION TO REGRESSION ANALYSIS AND TYPES OF DATA
Session 02
Created by: Sibashis Chakraborty

ORIGIN
The term “regression” was first introduced by Francis Galton in connection with his famous “Law of Universal Regression”.
The law stated that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move or “regress” toward the average height in the population as a whole.
The law was later confirmed by Karl Pearson.
In Galton's words, this was “regression to mediocrity”.

MODERN INTERPRETATION OF REGRESSION
In very general terms, regression is concerned with describing and evaluating the functional or causal relationship among variables.
More specifically, regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
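As a sketch in symbols, and assuming for illustration a single explanatory variable and a linear form (neither is required by the definition above), the population mean of the dependent variable Y at a given value X_i of the explanatory variable can be written as:

```latex
E(Y \mid X_i) = \beta_1 + \beta_2 X_i, \qquad Y_i = E(Y \mid X_i) + u_i
```

Here \beta_1 and \beta_2 are unknown parameters to be estimated, and u_i is the deviation of an individual observation Y_i from that conditional mean.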

SIMPLE REGRESSION VS MULTIPLE REGRESSION
If we are studying the dependence of a variable on only a single explanatory variable, such as that of consumption expenditure on real income, the study is known as simple, or two-variable, regression analysis.
If we are studying the dependence of one variable on more than one explanatory variable, such as that of crop yield on rainfall, temperature, sunshine, and fertilizer, it is known as multiple regression analysis.
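The difference can be sketched with ordinary least squares in Python. This is only an illustration: the helper `ols`, all numbers, and the "true" coefficients used to generate the data are hypothetical and were chosen so that the fit is exact. NumPy is assumed to be available.

```python
import numpy as np

def ols(y, *regressors):
    """Return least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    # Design matrix: a column of ones for the intercept, then one column per regressor.
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Simple (two-variable) regression: consumption on income (hypothetical data).
income = np.array([80.0, 100.0, 120.0, 140.0, 160.0])
consumption = 10.0 + 0.6 * income
simple_coef = ols(consumption, income)

# Multiple regression: crop yield on four explanatory variables (hypothetical data).
rainfall = np.array([10.0, 12.0, 9.0, 14.0, 11.0, 13.0])
temperature = np.array([20.0, 22.0, 19.0, 25.0, 21.0, 18.0])
sunshine = np.array([6.0, 8.0, 5.0, 7.0, 9.0, 4.0])
fertilizer = np.array([5.0, 7.0, 6.0, 8.0, 4.0, 9.0])
crop_yield = (2.0 + 0.5 * rainfall + 0.3 * temperature
              + 0.2 * sunshine + 0.1 * fertilizer)
multiple_coef = ols(crop_yield, rainfall, temperature, sunshine, fertilizer)

print("simple:  ", np.round(simple_coef, 4))
print("multiple:", np.round(multiple_coef, 4))
```

Because the illustrative data contain no disturbance term, both fits recover the generating coefficients; with real data the estimates would only approximate the population values.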

REGRESSION VERSUS CAUSATION
A statistical relationship in itself cannot logically imply causation.
To understand this better, let us refer to the statistical relationship between income and consumption presented in our previous slide:
$Y = \beta_1 + \beta_2 X + u$
where Y is consumption, X is income, and u is the disturbance term.
Written this way, the relationship treats causation as running one way, from income to consumption, and not the other way around.
Yet there is no statistical reason to assume that income does not depend upon consumption. The fact that we treat consumption as dependent on income is due to non-statistical considerations.

REGRESSION VERSUS CAUSATION (CONTINUED)
“A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.” – M. G. Kendall and A. Stuart

REGRESSION VS CORRELATION
The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables.
In regression analysis, we are not primarily interested in such a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
In regression analysis, there is an asymmetry in the way the dependent and explanatory variables are treated.
In correlation analysis, however, we treat the variables symmetrically.

NOTE ON SYMMETRIC AND ASYMMETRIC TREATMENT OF VARIABLES
In regression analysis, the dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling). This treatment is asymmetric.
In correlation analysis, on the other hand, no such distinction is made between the dependent and explanatory variables: both are treated as random. This treatment is symmetric.
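One numerical consequence of this distinction can be sketched in plain Python: the correlation coefficient is unchanged when the two variables swap roles, whereas the least-squares slope depends on which variable is treated as dependent. The data and function names below are hypothetical and serve only as an illustration.

```python
def mean(values):
    return sum(values) / len(values)

def cross_deviation_sum(a, b):
    """Sum of products of deviations of a and b from their respective means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def correlation(a, b):
    """Sample correlation coefficient between a and b."""
    denominator = (cross_deviation_sum(a, a) * cross_deviation_sum(b, b)) ** 0.5
    return cross_deviation_sum(a, b) / denominator

def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    return (cross_deviation_sum(dependent, explanatory)
            / cross_deviation_sum(explanatory, explanatory))

# Hypothetical observations on two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r_xy = correlation(x, y)      # correlation of x with y
r_yx = correlation(y, x)      # same value: the roles are interchangeable
slope_y_on_x = slope(y, x)    # y treated as the dependent variable
slope_x_on_y = slope(x, y)    # x treated as the dependent variable

print("correlation (either order):", round(r_xy, 4), round(r_yx, 4))
print("slope of y on x:", round(slope_y_on_x, 4))
print("slope of x on y:", round(slope_x_on_y, 4))
```

The two slopes differ, although their product equals the squared correlation coefficient, which is one way the two analyses are linked.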

TYPES OF DATA
Data: Time Series, Cross-Section, Pooled Data, Panel Data

TIME SERIES DATA
Time series data, also called macro data, are those collected for the same entity over different periods of time.
Simply put, a time series dataset consists of observations on one or more variables over time.
The issue of ‘data frequency’ is important in the context of time series data. The most common data frequencies are annual, quarterly, monthly, and weekly.

CROSS-SECTION DATA
Cross-section data, also known as micro data, are those collected for different entities in a single period of time. A cross-sectional dataset may consist of a sample of individuals, households, firms, regions, countries, or any other type of unit at a specific point in time.
Cross-sectional data are extensively used in agricultural economics, industrial economics, labour economics, health economics, demography, etc.

POOLED DATA
Pooled, or combined, data contain elements of both time-series and cross-section data.

PANEL DATA
Panel data, a special type of pooled data also called longitudinal data, are data collected for multiple entities, where each entity is observed at two or more points in time.
For instance, if we collect data on some macroeconomic variables (GDP, money supply, exports, etc.) for several countries for two or more years, and arrange these data in a systematic manner, then our dataset is called a panel dataset.
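How these data types relate can be sketched with a small Python structure. The country labels, years, and GDP figures below are entirely hypothetical, and keying a dictionary by (country, year) is just one convenient arrangement, not a standard.

```python
# Hypothetical GDP figures (arbitrary units), keyed by (country, year).
# Several entities observed over several periods: a panel dataset.
panel = {
    ("A", 2020): 100.0, ("A", 2021): 104.0, ("A", 2022): 109.0,
    ("B", 2020): 250.0, ("B", 2021): 255.0, ("B", 2022): 262.0,
    ("C", 2020): 80.0,  ("C", 2021): 83.0,  ("C", 2022): 85.0,
}

# Time series: one entity observed over different periods.
time_series = {year: gdp for (country, year), gdp in panel.items()
               if country == "A"}

# Cross-section: different entities observed in a single period.
cross_section = {country: gdp for (country, year), gdp in panel.items()
                 if year == 2021}

# Every entity here is observed in every period (a "balanced" arrangement).
entities = sorted({country for country, _ in panel})
periods = sorted({year for _, year in panel})
is_balanced = len(panel) == len(entities) * len(periods)

print("time series for A:", time_series)
print("cross-section for 2021:", cross_section)
print("entities:", entities, "periods:", periods, "balanced:", is_balanced)
```

Fixing the entity yields a time series, fixing the period yields a cross-section, and keeping both dimensions gives the panel.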

UPCOMING PRESENTATIONS
In our upcoming discussions, we will dive deeper into the various assumptions of regression analysis and their implications.

REFERENCES
Gujarati, Damodar N., Porter, Dawn C., and Gunasekar, Sangeetha. Basic Econometrics (Fifth Edition).
Bhaumik, Sankar K. Principles of Econometrics: A Modern Approach Using EViews (First Edition).
