1. CORRELATION AND REGRESSION ANALYSIS
CHAPTER-15
RESEARCH CONCEPTS AND
Dr Deepak Chawla · Dr Neena Sondhi
2. What is Correlation?
Correlation measures the degree of association between two or more variables. When we are dealing with two variables, we speak of simple correlation; when more than two variables are involved, the subject matter of interest is called multiple correlation.
SLIDE 15-1
3. Types of Correlation
Positive correlation: When two variables X and Y move in the same direction, the correlation between them is positive.
Negative correlation: When two variables X and Y move in opposite directions, the correlation is negative.
Zero correlation: The correlation between two variables X and Y is zero when the variables move with no connection to each other.
SLIDE 15-2
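The three cases can be illustrated numerically. A minimal Python sketch (the `pearson` helper and the sample data are illustrative, not from the chapter):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
r_pos = pearson(x, [2, 4, 6, 8, 10])   # X and Y move together  -> +1.0
r_neg = pearson(x, [10, 8, 6, 4, 2])   # X and Y move oppositely -> -1.0
r_zero = pearson(x, [3, 1, 4, 1, 3])   # no systematic connection -> 0.0
```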
8. Quantitative Estimate of a Linear Correlation
The linear correlation coefficient takes a value
between –1 and +1 (both values inclusive).
If the value of the correlation coefficient is equal to 1,
the two variables are perfectly positively correlated
and the scatter of the points of the variables X and Y
will lie on a positively sloped straight line.
Similarly, if the correlation coefficient between the two
variables X and Y is –1, the scatter of the points of
these variables will lie on a negatively sloped straight
line and such a correlation will be called a perfectly
negative correlation.
SLIDE 15-7
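The boundary cases can be checked directly: points generated exactly on a sloped straight line give a correlation coefficient of exactly +1 or −1. A short sketch (data invented for illustration):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

x = [0, 1, 2, 3]
r_up = pearson(x, [2, 5, 8, 11])    # points on y = 3x + 2  -> +1.0
r_down = pearson(x, [7, 5, 3, 1])   # points on y = -2x + 7 -> -1.0
```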
9. Testing the Significance of the Correlation Coefficient
The hypothesis to be tested is mentioned below:
H0 : ρ = 0
H1 : ρ ≠ 0
The test statistic is given by:
t = r √(n − 2) / √(1 − r²)
where
ρ = Population correlation coefficient between the variables X and Y
r = Sample correlation coefficient between the variables X and Y
n − 2 = The degrees of freedom
Now for a given level of significance, if computed |t| is greater than
tabulated |t| with n – 2 degrees of freedom, the null hypothesis of
no correlation between X and Y is rejected.
SLIDE 15-8
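The test above can be sketched in Python. The sample values r = 0.6, n = 20 and the 5% critical value of t with 18 degrees of freedom (approximately 2.101) are illustrative assumptions, not from the chapter:

```python
import math

def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.6, 20                  # illustrative sample correlation and size
t = corr_t_stat(r, n)           # ~3.18
t_crit = 2.101                  # approx. two-tailed 5% point of t, 18 df (assumed table value)
reject_h0 = abs(t) > t_crit     # True: reject H0 of no correlation
```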
10. Simple Regression Analysis
The simple linear regression equation can be written as
Y = α + βX + U
where
U = Stochastic error term
α, β = Parameters to be estimated
The equation is estimated using the ordinary least squares (OLS) method: the regression line is drawn so as to minimize the error sum of squares.
SLIDE 15-9
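The OLS estimates have closed forms: β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and α̂ = Ȳ − β̂X̄. A minimal sketch (the data are invented so that OLS recovers a known line exactly):

```python
def ols_simple(x, y):
    """OLS estimates (alpha_hat, beta_hat) minimizing the error sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

# Data generated from Y = 2 + 3X with no error, so OLS recovers the line exactly.
alpha, beta = ols_simple([1, 2, 3, 4], [5, 8, 11, 14])   # -> (2.0, 3.0)
```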
12. Simple Regression Analysis
The estimate of the error (residual) term is obtained as:
Û = Y − Ŷ
where Y is the observed value and Ŷ the estimated value of the dependent variable.
The estimate of the variance of the error term is given by:
σ̂² = Σ Û² / (n − k)
where n and k denote the sample size and the number of parameters to be estimated, respectively.
SLIDE 15-11
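A short sketch of the residual computation and the variance estimate, with k = 2 parameters in the simple model (the three data points are illustrative):

```python
def ols_residual_variance(x, y):
    """Residuals U_hat = Y - Y_hat and variance estimate sum(U_hat^2) / (n - k)."""
    n, k = len(x), 2                 # k = 2 estimated parameters (alpha, beta)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    resid = [b - (alpha + beta * a) for a, b in zip(x, y)]   # U_hat = Y - Y_hat
    sigma2 = sum(e * e for e in resid) / (n - k)
    return resid, sigma2

resid, sigma2 = ols_residual_variance([1, 2, 3], [1, 2, 4])   # sigma2 = 1/6
```

As a sanity check, OLS residuals always sum to zero when an intercept is included.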
13. Test of Significance of Regression Parameters
The hypothesis to be tested for the slope coefficient is mentioned
below as:
H0 : β = 0
H1 : β ≠ 0
The test statistic used to test the significance of the slope coefficient is given by:
t = β̂ / SE(β̂)
where SE(β̂) is the estimated standard error of β̂. At a given level of significance, the computed |t| is compared with the tabulated t with n − 2 degrees of freedom to accept or reject the null hypothesis.
SLIDE 15-12
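In the simple model, SE(β̂) = √(σ̂² / Σ(X − X̄)²). A sketch putting the pieces together (the five data points are invented for illustration):

```python
import math

def slope_t_stat(x, y):
    """t = beta_hat / se(beta_hat), with se(beta_hat) = sqrt(sigma2_hat / Sxx)."""
    n, k = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    sigma2 = sse / (n - k)                     # error variance estimate
    return beta / math.sqrt(sigma2 / sxx)

t = slope_t_stat([1, 2, 3, 4, 5], [3, 5, 7, 9, 12])   # ~19.05, with n - 2 = 3 df
```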
14. Goodness of Fit of Regression Equation
r² is used to measure the goodness of fit of the regression equation. This measure is also called the coefficient of determination of a regression equation, and it takes a value between 0 and 1. The higher the value of r², the better the goodness of fit.
SLIDE 15-13
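The coefficient of determination can be computed as r² = 1 − SSE/SST, where SSE is the error sum of squares and SST the total sum of squares. A sketch on invented data:

```python
def r_squared(x, y):
    """Coefficient of determination r2 = 1 - SSE/SST (0 <= r2 <= 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    sse = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))   # unexplained
    sst = sum((b - my) ** 2 for b in y)                              # total
    return 1 - sse / sst

r2 = r_squared([1, 2, 3, 4, 5], [3, 5, 7, 9, 12])   # ~0.992: a very good fit
```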
15. Testing the Significance of r²
The hypothesis to be tested is:
H0 : r² = 0
H1 : r² > 0
The test statistic F is given by the expression:
F = [r² / (k − 1)] / [(1 − r²) / (n − k)]
For a given level of significance α, the computed value of the F statistic is compared with the tabulated value of F with k − 1 degrees of freedom in the numerator and n − k degrees of freedom in the denominator. If the computed F exceeds the tabulated F, the null hypothesis is rejected in favour of the alternative hypothesis.
SLIDE 15-14
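A short sketch of the F computation (the values r² = 0.75, n = 20, k = 2 are illustrative assumptions):

```python
def r2_f_stat(r2, n, k):
    """F = (r2 / (k - 1)) / ((1 - r2) / (n - k))."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Illustrative values: r2 = 0.75 from a two-variable regression on n = 20 points.
f = r2_f_stat(0.75, 20, 2)   # -> 54.0
```

In the two-variable model (k = 2), this F statistic equals the square of the slope t statistic, so the two tests are equivalent.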
16. Multiple Regression Model
In the multiple regression model, there are at least two independent variables. The linear multiple regression model with two independent variables would look like:
Y = b0 + b1 X1 + b2 X2 + U
b0, b1 and b2 are the parameters to be estimated. The estimation is carried out using the OLS method.
SLIDE 15-15
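With two regressors, the OLS estimates can be obtained from the normal equations in deviation form. A sketch, using data invented so that OLS recovers known parameters exactly:

```python
def ols_two_regressors(x1, x2, y):
    """OLS for Y = b0 + b1*X1 + b2*X2 + U via the deviation-form normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Data generated from Y = 1 + 2*X1 + 3*X2 with no error; OLS recovers the parameters.
b0, b1, b2 = ols_two_regressors([1, 2, 3, 4], [2, 1, 4, 3], [9, 8, 19, 18])
```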
17. Multiple Regression Model
In the case of the multiple regression model, we have the concept of the multiple correlation squared, R²Y·X1X2, which indicates the explanatory power of the model.
The test of significance of the individual parameters is
conducted using the t statistic. To be able to use the t
statistic we need the estimates of the variance of the
estimated coefficients of the regression equation.
SLIDE 15-16
19. Multiple Regression Model
Let us assume that we want to test the significance of the slope
coefficient of the variable X1. We can write the null and alternative
hypothesis as:
H0 : b1 = 0
H1 : b1 ≠ 0
The test statistic may be written as t = b̂1 / SE(b̂1), where SE(b̂1) is the estimated standard error of b̂1.
The value of the test statistic t is computed and compared with
the table value of t for a given level of significance. If the
computed value of |t| is greater than table value of |t|, we reject
H0 in favour of the alternative hypothesis H1.
SLIDE 15-18
20. Testing the Significance of R²
The test for the significance of R2 is carried out using the F
statistic, which is already explained in the case of the two
variable linear regression model. The hypothesis to be tested is
listed as under:
H0 : b1 = b2 = 0 ⇒ R² = 0
H1 : not all b's are zero ⇒ R² > 0
If R² is equal to 0, all the slope coefficients are equal to zero, since none of the independent variables would explain any variation in Y.
The test for the significance of R2 is shown through the analysis
of variance (ANOVA) table.
SLIDE 15-19
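The ANOVA table splits the total sum of squares into a regression part (SSR) and an error part (SSE), and F is the ratio of their mean squares. A sketch with an invented decomposition (SSR = 170, SSE = 30, n = 30, k = 3 are illustrative; the critical value 3.35 is an assumed approximate table value of F(2, 27)):

```python
def anova_f(ssr, sse, n, k):
    """ANOVA F = mean square regression / mean square error."""
    msr = ssr / (k - 1)   # regression degrees of freedom: k - 1
    mse = sse / (n - k)   # error degrees of freedom: n - k
    return msr / mse

# Illustrative decomposition: SST = SSR + SSE = 170 + 30, so R2 = 170/200 = 0.85.
f = anova_f(170, 30, 30, 3)   # -> 76.5
f_crit = 3.35                 # approx. 5% point of F(2, 27); assumed table value
reject_h0 = f > f_crit        # True: the regression is significant overall
```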
22. Dummy Variables in Regression Analysis
In regression analysis, the dependent variable is generally metric
in nature and it is most often influenced by other metric
variables.
However, there could be situations where the dependent variable
may be influenced by the qualitative variables like gender,
marital status, profession, geographical region, colour, or
religion.
The question arises as to how qualitative variables can be quantified. In situations like this, dummy variables come to our rescue: they are used to quantify the qualitative variables.
The number of dummy variables required in the regression
model is equal to the number of categories of data less one.
Dummy variables usually assume two values 0 and 1.
SLIDE 15-21
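The "categories less one" rule can be sketched directly: a qualitative variable with four categories needs three 0/1 dummies, with one category serving as the all-zeros baseline. A minimal helper (the region names are invented for illustration):

```python
def dummy_code(values, categories):
    """Encode a qualitative variable with len(categories) levels into
    len(categories) - 1 dummy columns; categories[0] is the baseline (all zeros)."""
    return [[1 if v == c else 0 for c in categories[1:]] for v in values]

regions = ["North", "South", "East", "West"]        # 4 categories -> 3 dummies
rows = dummy_code(["South", "North", "West"], regions)
# rows == [[1, 0, 0], [0, 0, 0], [0, 0, 1]]  ("North" is the baseline)
```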
23. Example of a Dummy Variable Regression
Suppose the starting salary of a college lecturer is influenced not
only by years of teaching experience but also by gender. Therefore,
the model could be specified as:
Y = f (X, D)
Where,
Y = Starting salary of a college lecturer in thousands of rupees per month
X = No. of years of work experience
D is a dummy variable which takes values
D = 1 (if the respondent is a male)
= 0 (if the respondent is a female)
The model could be written as,
Y = α + β X + γ D + U
SLIDE 15-22
24. Example of a Dummy Variable Regression
This can be estimated by using ordinary least squares (OLS) techniques. Suppose the estimated regression equation is:
Ŷ = α̂ + β̂X + γ̂D
For a male respondent (D = 1) this gives Ŷ = (α̂ + γ̂) + β̂X, while for a female respondent (D = 0) it gives Ŷ = α̂ + β̂X. The two equations differ by the amount γ̂, which can be positive or negative. If γ̂ is positive, it implies that the average salary of a male lecturer exceeds that of a female lecturer by the amount γ̂, keeping the number of years of experience constant.
SLIDE 15-23
SLIDE 15-23
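The salary example can be sketched end to end. The data below are hypothetical: males are paid a constant 2 (thousand) more than females at every experience level, so the estimated γ̂ should come out as +2:

```python
def ols_two(x1, x2, y):
    """Deviation-form OLS for Y = a + b1*x1 + b2*x2 (two-regressor normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((a - m2) ** 2 for a in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (b - my) for a, b in zip(x1, y))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Hypothetical salaries (thousands per month): a parallel shift of +2 for males.
X = [1, 2, 3, 1, 2, 3]          # years of experience
D = [1, 1, 1, 0, 0, 0]          # 1 = male, 0 = female
Y = [24, 26, 28, 22, 24, 26]
alpha, beta, gamma = ols_two(X, D, Y)   # -> (20.0, 2.0, 2.0)
```

Here γ̂ = 2 is the estimated male-female salary gap at any fixed level of experience, exactly as the slide describes.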