Correlation and Regression Analysis
Day 5, Session I
M. Amir Hossain, Ph.D.
Professor, ISRT, University of Dhaka
Correlation and Regression
The most commonly used forms of bivariate statistical analysis
Useful in making business and economic decisions
Helpful in identifying the nature of the relationship among
business and economic variables
Recognize that there is a quantifiable relationship between two
or more variables
One variable depends on another and can be estimated from it
Correlation and Regression
Examples of related variables:
Students' GPAs and the amount of time they spend studying
A firm's sales and its expenditure on advertisement
Dependent and independent variables
Deciding which variable is dependent and which is independent is crucial
By convention:
X : Independent variable
Y : Dependent variable
Scatter Diagram
A plot of the paired observations of X and Y on a graph
Graphically shows the relationship between two variables
It is common practice to place the dependent variable on the Y-axis
and the independent variable on the X-axis
Example: Sales and advertisement expenditures (in million Taka) of a
firm in different months:
Sales (Y)           3   6   4   6   3   5   4
Advertisement (X)   2   4   2   3   1   3   2.5
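A scatter diagram is normally drawn with plotting software. As a dependency-free sketch (the function name `text_scatter` and its grid size are illustrative assumptions), the paired observations from the table can be placed on a character grid with advertisement (X) along the horizontal axis and sales (Y) up the vertical axis:

```python
# Paired observations from the table (million Taka).
advert = [2, 4, 2, 3, 1, 3, 2.5]   # X: independent variable (horizontal axis)
sales = [3, 6, 4, 6, 3, 5, 4]      # Y: dependent variable (vertical axis)


def text_scatter(x, y, width=9, height=7):
    """Return the rows of a crude character-grid scatter diagram (hypothetical helper)."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each value onto the grid; larger Y values go nearer the top row.
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - yi) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Left border for the Y-axis, bottom border for the X-axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


for line in text_scatter(advert, sales):
    print(line)
```

The points drift from the lower left toward the upper right, which is the visual signature of a direct (positive) relationship.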
Correlation Analysis
Correlation Analysis: A group of statistical techniques used to
measure the strength of the relationship (correlation) between
two variables.
Scatter Diagram: A chart that portrays the relationship between
the two variables of interest.
Dependent Variable: The variable that is being predicted or
estimated.
Independent Variable: The variable that provides the basis for
estimation. It is the predictor variable.
The Coefficient of Correlation, r
The coefficient of correlation (r) measures the strength and direction of the linear relationship between two variables.
It requires interval- or ratio-scaled data (variables).
It ranges from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect linear correlation.
Values close to 0.0 indicate little or no linear correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
The Coefficient of Correlation, r
[Figure: Perfect Negative Correlation, scatter plot of Y against X, both axes 0 to 10]
[Figure: Strong Positive Correlation, scatter plot of Y against X, both axes 0 to 10]
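Formula for the correlation coefficient (r)
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$
or, equivalently, in terms of deviations from the means,
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} $$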
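The sketch below implements both forms of the formula and applies them to the sales and advertisement table (the function names are illustrative). The two forms are algebraically identical, so they should agree up to floating-point rounding. Applied to the table as extracted here, both give r ≈ 0.889 (so r² ≈ 0.791). That differs from the r = 0.759 quoted on the coefficient-of-determination slide, which may have been computed from a different version of the data.

```python
import math

# Sales and advertisement data from the table (million Taka).
advert = [2, 4, 2, 3, 1, 3, 2.5]   # X: independent variable
sales = [3, 6, 4, 6, 3, 5, 4]      # Y: dependent variable


def r_raw_sums(x, y):
    """Computational form: uses n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


def r_deviations(x, y):
    """Definitional form: uses deviations from the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r1 = r_raw_sums(advert, sales)
r2 = r_deviations(advert, sales)
print(round(r1, 3), round(r2, 3))  # both forms agree
```

The raw-sums form suits hand calculation from column totals; the deviations form is less prone to rounding error when the values are large.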
Coefficient of Determination
The coefficient of determination, r², is the proportion of the total
variation in the dependent variable Y that is explained, or
accounted for, by the variation in the independent variable X.
The coefficient of determination is the square of the coefficient of
correlation and ranges from 0 to 1.
Example: Sales and advertisement expense data,
r = 0.759 and r² = (0.759)² = 0.576
57.6% of the variation in sales can be explained by the variation in
advertisement expenses
Regression Analysis
In regression analysis, an equation is developed to express the
relationship between the dependent and independent variables
In simple linear regression, this equation is a straight line
Purpose: to determine the regression equation, which is used to predict
the value of the dependent variable (Y) from the
independent variable (X).
Procedure: select a sample from the population and list the paired
data for each observation; draw a scatter diagram to give a visual
portrayal of the relationship; determine the regression equation.
Regression Analysis
The relationship between X and Y is described by a linear
function
Changes in Y are assumed to be caused by changes in X
Linear regression population model:
$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$
where β0 and β1 are the population model coefficients and εi is a
random error term.
Simple Linear Regression Model
The population regression model:
$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$
Yi: dependent variable
β0: population Y intercept
β1: population slope coefficient
Xi: independent variable
εi: random error term
β0 + β1Xi is the linear component and εi is the random error component.
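The model splits each Yi into a linear component and a random error. The sketch below draws observations from it. The values β0 = 2, β1 = 0.5, the chosen X values, and normally distributed errors with standard deviation 1 are illustrative assumptions, not values from the lecture. Averaging many draws at one X value recovers the linear component, because the errors average out to roughly zero.

```python
import random

# Hypothetical population values, chosen only for illustration.
BETA0 = 2.0   # population Y intercept
BETA1 = 0.5   # population slope coefficient
SIGMA = 1.0   # spread of the random error term (normality is an assumption)


def draw_observation(x, rng):
    """Return (linear component, random error, Y) for one value of X."""
    linear = BETA0 + BETA1 * x       # beta0 + beta1 * X_i
    error = rng.gauss(0.0, SIGMA)    # epsilon_i, assumed N(0, SIGMA^2)
    return linear, error, linear + error


rng = random.Random(42)
for x in [1.0, 2.0, 3.0]:
    linear, error, y = draw_observation(x, rng)
    print(f"X={x:.1f}  linear={linear:.2f}  error={error:+.2f}  Y={y:.2f}")

# Many draws at a fixed X average out close to the linear component.
ys = [draw_observation(4.0, rng)[2] for _ in range(100_000)]
print(f"mean Y at X=4: {sum(ys) / len(ys):.3f} (linear component {BETA0 + BETA1 * 4.0})")
```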
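[Figure: population regression line with intercept β0 and slope β1; for a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y on the line, so that Yi = β0 + β1Xi + εi]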
Regression Analysis
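We estimate β0 and β1 by the values b0 and b1 that make the error sum of
squares ∑e² a minimum, where ei = yi - ŷi is the difference between the
observed and the fitted value of y
The error sum of squares ∑e² is a minimum when
$$ b_1 = \hat{\beta}_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_0 = \hat{\beta}_0 = \bar{y} - b_1 \bar{x} $$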
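The least-squares formulas translate directly into code. This sketch (function names are illustrative) fits the sales and advertisement table and then checks the "minimum" claim numerically: nudging either coefficient away from the fitted values increases ∑e².

```python
# Sales (Y) and advertisement (X) data from the table, in million Taka.
advert = [2, 4, 2, 3, 1, 3, 2.5]
sales = [3, 6, 4, 6, 3, 5, 4]


def fit_least_squares(x, y):
    """Return (b0, b1) from the least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(b0, b1, x, y):
    """Error sum of squares: sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))


b0, b1 = fit_least_squares(advert, sales)
print(f"y-hat = {b0:.3f} + {b1:.3f} x")  # y-hat = 1.474 + 1.182 x
best = sse(b0, b1, advert, sales)

# Nudging either coefficient away from the least-squares values increases sum(e^2).
for d0, d1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(d0, d1, sse(b0 + d0, b1 + d1, advert, sales) > best)
```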
Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the
population regression line:
$$ \hat{y}_i = b_0 + b_1 x_i $$
where ŷi is the estimated (or predicted) y value for observation i,
b0 is the estimate of the regression intercept, b1 is the estimate of
the regression slope, and xi is the value of x for observation i
Interpretation of the Slope and the Intercept
b0 is the estimated average value of y when the value
of x is zero (if x = 0 is within the range of observed x
values)
b1 is the estimated change in the average value of y
as a result of a one-unit change in x
Regression Analysis
ŷ is the average predicted value of Y for any X.
b0 is the Y-intercept, or the estimated Y value when X = 0.
b1 is the slope of the line, or the average change in ŷ for
each one-unit change in X.
Prediction
The regression equation can be used to predict a value
for y, given a particular x
For a specified value xn+1, the predicted value is
$$ \hat{y}_{n+1} = b_0 + b_1 x_{n+1} $$
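A minimal prediction sketch, refitting the sales and advertisement table so the block is self-contained. The new advertising spend x = 3.5 million Taka is a hypothetical value, chosen to lie inside the observed range of x (1 to 4) so that the prediction is an interpolation rather than an extrapolation. The last line also illustrates the slope interpretation: raising x by one unit raises ŷ by b1.

```python
advert = [2, 4, 2, 3, 1, 3, 2.5]  # X (million Taka)
sales = [3, 6, 4, 6, 3, 5, 4]     # Y (million Taka)


def fit_least_squares(x, y):
    """Return (b0, b1) from the least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def predict(b0, b1, x_new):
    """Predicted value y-hat_{n+1} = b0 + b1 * x_{n+1}."""
    return b0 + b1 * x_new


b0, b1 = fit_least_squares(advert, sales)
x_next = 3.5  # hypothetical advertising spend for a new month
print(f"predicted sales: {predict(b0, b1, x_next):.2f}")  # predicted sales: 5.61

# Slope interpretation: a one-unit increase in x raises y-hat by b1.
print(predict(b0, b1, x_next + 1) - predict(b0, b1, x_next))
```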
Regression Analysis (Coefficient of determination)
R² = the proportion (often stated as a percentage) of the total variation in the dependent variable
explained by the independent variable.
From a linear regression model one can write
$$ R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} $$
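$$ \text{Total variation (TSS)} = \sum (y_i - \bar{y})^2 $$
$$ \text{Unexplained variation (ESS)} = \sum (y_i - \hat{y}_i)^2 $$
$$ \text{Explained variation (RSS)} = \sum (\hat{y}_i - \bar{y})^2 $$
$$ \text{Coefficient of determination } (r^2) = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}} $$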
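The sketch below computes the three sums of squares for the sales and advertisement table and confirms that TSS = ESS + RSS, so the two expressions for r² agree. It follows this deck's naming (ESS = unexplained, RSS = explained); many texts use these abbreviations the other way round. On the table as extracted, it gives r² ≈ 0.791, which equals the square of the correlation coefficient for the same data but differs from the 0.576 quoted on the coefficient-of-determination slide. That figure may come from a different version of the data.

```python
advert = [2, 4, 2, 3, 1, 3, 2.5]  # X (million Taka)
sales = [3, 6, 4, 6, 3, 5, 4]     # Y (million Taka)


def fit_least_squares(x, y):
    """Return (b0, b1) from the least-squares formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def variation_decomposition(x, y):
    """Return (TSS, ESS, RSS) in this deck's naming:
    ESS = unexplained (error) variation, RSS = explained (regression) variation."""
    b0, b1 = fit_least_squares(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * a for a in x]
    tss = sum((b - y_bar) ** 2 for b in y)
    ess = sum((b - f) ** 2 for b, f in zip(y, y_hat))
    rss = sum((f - y_bar) ** 2 for f in y_hat)
    return tss, ess, rss


tss, ess, rss = variation_decomposition(advert, sales)
print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"RSS/TSS = {rss / tss:.3f}, 1 - ESS/TSS = {1 - ess / tss:.3f}")
```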
Coefficient of determination = R²