• Correlation is another way of assessing the relationship between variables.
– it measures the extent of correspondence between the ordering of two random variables.
• There is a large amount of resemblance between regression and correlation but for their methods of interpretation of the relationship.
– For example, a scatter diagram is of tremendous help when trying to describe the type of relationship existing between two variables.
2. Module Aims
Introduction
Measuring correlation
Interpretation of the correlation coefficient
Deducing r from a scatter diagram
Solved Example
Causality
Spurious correlation
Coefficient of determination
Spearman’s rank correlation
Development Finance (M.Sc.) Correlation Analysis 2 / 10
3. Introduction
Correlation is another way of assessing the relationship between
variables.
it measures the extent of correspondence between the ordering of two
random variables.
There is a large amount of resemblance between regression and
correlation but for their methods of interpretation of the relationship.
For example, a scatter diagram is of tremendous help when trying to
describe the type of relationship existing between two variables.
Development Finance (M.Sc.) Correlation Analysis 3 / 10
4. Measuring correlation
Pearson’s correlation coefficient (linear product-moment
correlation coefficient) expresses the strength of the relationship.
This coefficient is generally used when variables are of quantitative
nature, that is, ratio or interval scale variables.
Pearson’s correlation coefficient is denoted by r and is defined by:
r =
n
P
xy −
P
x
P
y
rn
n
P
x2
− (
P
x)
2
o n
n
P
y2
− (
P
y)
2
o
The value of r always lies between –1 and 1 inclusive, i.e, −1 ≤ r ≤ 1.
If Y increases when X increases, we say that, there is positive or
direct correlation between them.
However, if Y decreases when X increases (or vice versa), then we say
that, they are negatively or inversely correlated.
The sign of r is always the same as that of (the gradient) b in the
regression equation: ŷ = a + bX.
Development Finance (M.Sc.) Correlation Analysis 4 / 10
5. Interpretation of the correlation coefficient
The extreme values of r, i.e, when r = ±1, indicate that there is perfect
(positive or negative) correlation between X and Y.
However, if r is 0, there is no or zero correlation between X and Y.
Pearson’s correlation coeff. measures linear relationship only not non-linear relationships.
NB: We observe that the strength of the relationship between X and Y is the same whether r = 0.85 or –0.85.
The only difference is that the there is direct correlation in the first case and inverse correlation in the second.
Development Finance (M.Sc.) Correlation Analysis 5 / 10
6. Deducing r from a scatter diagram
Development Finance (M.Sc.) Correlation Analysis 6 / 10
8. Causality
Causality, also known as causation
a cause-effect relationship between two variables.
A significant correlation does not necessarily indicate causality but
rather a common linkage in a sequence of events.
One type of significant correlation situation is when both variables are
influenced by a common cause and therefore are correlated with each
other.
Example:
individuals with a higher level of income have both higher levels of
savings and spending.
We might find that there is a positive correlation between level of
savings and level of spending but this does not mean that one variable
causes the other.
Development Finance (M.Sc.) Correlation Analysis 8 / 10
9. Spurious correlation and coefficient of determination
Spurious Correlation
Spurious correlation occurs between two variables that are
supposed to be mutually independent.
Coefficient of determination
The coefficient of determination is much more useful than the
correlation coefficient
It is denoted by r2 and is simply the square of the correlation coefficient.
While the correlation coefficient only describes the strength of the
relationship, the coefficient of determination gives the variability in Y
explained by the variability in X.
For example:
if r = 0.8 (high positive correlation), r2
is only 0.64.
therefore, the variability in X accounts for 64% of the variability in Y.
In terms of regression, it simply means that, apart from the predictor X,
there are other factors which also influence the response variable Y (the
remaining 36%) but which have not been considered in the analysis.
Development Finance (M.Sc.) Correlation Analysis 9 / 10
10. Spearman’s rank correlation
This is a non-parametric method of finding the correlation
between two random variables.
The calculation is carried out under the following assumptions:
1 The pairs of observations are at least at the order scale of measurement.
2 The observations have been sampled from a bivariate population of
continuous random variables.
3 The distribution needs not be of any specific form.
Merits
1 Not much needs to be known about the population.
2 This method is very useful in a wide variety of applications.
Drawbacks
1 Unlike the Pearson’s correlation coefficient, this method does not take
into account the values of the observations.
2 This method is not very appropriate with too many ties.
Development Finance (M.Sc.) Correlation Analysis 10 / 10