Concept and Importance of Correlation
We may come across certain series wherein there may be more than one variable. A distribution
in which two variables are measured on each unit is called a Bivariate Distribution. If we measure more
than two variables on each unit of a distribution, it is called a Multivariate Distribution. In a
bivariate distribution, we may be interested to find if there is any relationship between the two
variables under study. The Correlation is a statistical tool which studies the relationship between
two variables and the correlation analysis involves various methods and techniques used for
studying and measuring the extent of the relationship between the two variables. Correlation
analysis is used as a statistical tool to ascertain the association between two variables.
“When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is known as
correlation.”
- Croxton & Cowden
“Correlation is an analysis of the covariation between two or more variables.”
- A. M. Tuttle
“Correlation Analysis contributes to the understanding of economic behaviour, aids in locating
the critically important variables on which others depend, may reveal to the economist the
connections by which disturbances spread and suggest to him the paths through which
stabilizing forces may become effective.”
- W. A. Neiswanger
“The effect of correlation is to reduce the range of uncertainty of our prediction.”
- Tippett
The problem in analyzing the association between two variables can be broken down into three
parts:
o We try to know whether the two variables are related or independent of each other.
o If we find that there is a relationship between the two variables, we try to know its nature
and strength. This means whether these variables have a positive or a negative
relationship and how close that relationship is.
o We may like to know if there is a causal relationship between them. This means that the
variation in one variable causes variation in another.
When data regarding two or more variables are available, we may study the related variation of
these variables. For example, in data regarding heights (x) and weights (y) of students of a college,
we find that those students who have greater height would have greater weight. Also, students
who have lesser height would have lesser weight. This type of related variation among variables
is called correlation. Correlation may be (i) Simple correlation (ii) Multiple correlation (iii)
Partial correlation. Simple correlation concerns related variation between two variables. Multiple
correlation and partial correlation concern related variation among three or more variables.
Two variables are said to be correlated when they vary such that
a. The higher values of one variable correspond to the higher values of the other and the
lower values of the variable correspond to the lower values of the other. or
b. The higher values of one variable correspond to the lower values of the other.
Generally, it can be seen that those who are tall will have greater weight, and those who are short
will have lesser weight. Thus height (x) and weight (y) of persons show related variation. And so
they are correlated. On the other hand production (x) and price (y) of vegetables show variation
in opposite directions. Here the higher the production the lower would be the price.
In both the above examples, the variables x and y show related variation, and so they are correlated.
TYPES OF CORRELATION
Correlation is positive (direct) if the variables vary in the same directions, that is, if they increase
and decrease together.
Height (x) and weight (y) of persons are positively correlated.
Correlation is negative (inverse) if the variables vary in the opposite directions, that is, if one
variable increases, the other decreases. Production (x) and price (y) of vegetables are
negatively correlated.
If variables do not show related variation, they are said to be non – correlated. If variables show
exact linear relationship, they are said to be perfectly correlated. Perfect correlation may be
positive or negative.
Correlation and Causation
o The correlation may be due to chance, particularly when the data pertain to a small
sample.
o It is possible that both the variables are influenced by one or more other variables.
o There may be another situation where both the variables may be influencing each other so
that we cannot say which is the cause and which is the effect.
Types of Correlation
o Positive and Negative: If the values of the two variables deviate in the same direction
i.e., if the increase in the values of one variable results, on an average, in a corresponding
increase in the values of the other variable or if a decrease in the values of one variable
results, on an average, in a corresponding decrease in the values of the other variable,
correlation is said to be positive or direct. For example: Price & Supply of the
commodity. On the other hand, correlation is said to be negative or inverse if the
variables deviate in the opposite direction i.e., if the increase (decrease) in the values of
one variable results, on the average, in a corresponding decrease (increase) in the values
of the other variable. For example: Temperature and Sale of Woolen Garments.
o Linear and Non-Linear: The correlation between two variables is said to be linear if
corresponding to a unit change in one variable, there is a constant change in the other
variable over the entire range of the values. For example: y = ax + b. The relationship
between two variables is said to be non-linear or curvilinear if corresponding to a unit
change in one variable, the other variable does not change at a constant rate but at a
fluctuating rate. When this is plotted in the graph this will not be a straight line.
o Simple, Partial and Multiple: The distinction amongst these three types of correlation
depends upon the number of variables involved in a study. If only two variables are
involved in a study, then the correlation is said to be simple correlation. When three or
more variables are involved in a study, then it is a problem of either partial or multiple
correlation. In multiple correlation, three or more variables are studied simultaneously.
But in partial correlation we consider only two variables influencing each other while
the effect of other variable is held constant. For example: Let us suppose that we have
three variables, number of hours studied (x); IQ (y); marks obtained (z). In a multiple
correlation we will study the correlation between z with 2 variables x & y. In contrast,
when we study the relationship between x & z, keeping an average IQ as constant, it is
said to be a study involving partial correlation.
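The distinction drawn above can be illustrated numerically with the standard formulas for partial and multiple correlation coefficients. The sketch below is not from the text; the three simple correlation coefficients are hypothetical values assumed for the hours-studied (x), IQ (y), marks-obtained (z) example:

```python
# A hedged sketch of partial vs multiple correlation using the standard
# formulas. The simple correlation coefficients below are assumed values
# for illustration, not data from the text.
import math

def partial_r(r_xz, r_xy, r_yz):
    """Correlation between x and z, holding y constant (r_xz.y)."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))

def multiple_R(r_zx, r_zy, r_xy):
    """Multiple correlation of z on x and y jointly (R_z.xy)."""
    num = r_zx ** 2 + r_zy ** 2 - 2 * r_zx * r_zy * r_xy
    return math.sqrt(num / (1 - r_xy ** 2))

# Assumed simple correlations: hours-marks, hours-IQ, IQ-marks
r_xz, r_xy, r_yz = 0.8, 0.5, 0.6
print(round(partial_r(r_xz, r_xy, r_yz), 3))   # x and z with IQ held constant
print(round(multiple_R(r_xz, r_yz, r_xy), 3))  # z on x and y together
```

Note that the partial coefficient (about 0.72 here) is smaller than the simple hours–marks correlation of 0.8, because part of that association worked through IQ.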
Methods of Correlation
The principal methods of studying correlation are:
o Scatter Diagram Method
o Covariance Method (Karl Pearson’s Coefficient of Correlation)
o Rank Correlation Method
o Concurrent Deviation Method
Process of Calculating Coefficient of Correlation
o Calculate the means of the two series: X and Y.
o Take deviations in the two series from their respective means, indicated as x and y. The
deviation should be taken in each case as the value of the individual item minus (–) the
mean.
o Square the deviations in both the series and obtain the sum of the deviation-squared
columns. This would give ∑x² and ∑y².
o Take the product of the deviations, that is, ∑xy. This means individual deviations are to
be multiplied by the corresponding deviations in the other series and then their sum is
obtained.
o The values thus obtained in the preceding steps, ∑xy, ∑x² and ∑y², are to be used in the
formula for correlation.
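The steps above can be sketched in plain Python. The height and weight values below are hypothetical, used only to illustrate the deviation method:

```python
# A minimal sketch of the deviation method described above, using only
# built-in Python. The data values are illustrative, not from the text.
def pearson_r(X, Y):
    n = len(X)
    # Step 1: means of the two series
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Step 2: deviations from the respective means
    x = [xi - mean_x for xi in X]
    y = [yi - mean_y for yi in Y]
    # Step 3: sums of squared deviations, giving ∑x² and ∑y²
    sum_x2 = sum(d * d for d in x)
    sum_y2 = sum(d * d for d in y)
    # Step 4: sum of products of corresponding deviations, ∑xy
    sum_xy = sum(dx * dy for dx, dy in zip(x, y))
    # Step 5: r = ∑xy / sqrt(∑x² · ∑y²)
    return sum_xy / (sum_x2 * sum_y2) ** 0.5

heights = [160, 165, 170, 175, 180]  # hypothetical x values (cm)
weights = [55, 60, 66, 70, 76]       # hypothetical y values (kg)
print(round(pearson_r(heights, weights), 4))
```

As expected for heights and weights that rise together, the resulting r is close to +1.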
SCATTER DIAGRAM METHOD
Scatter diagram is a graphic presentation of bivariate data. Here, bivariate data with n pairs of
values is represented by n points on the xy – plane. The two variables are taken along the two
axes, and every pair of values in the data is represented by a point on the graph.
The pattern of distribution of points on the graph can be made use of for the rough estimation of
degree of correlation between the variables.
In the scatter diagram –
a. If the points form a line with positive slope (a line moving upwards), the variables are
positively and perfectly correlated.
b. If the points form a line with negative slope (a line moving downwards), the variables are
negatively and perfectly correlated.
c. If the points cluster around a line with positive slope, the variables are positively
correlated.
d. If the points cluster around a line with negative slope, the variables are negatively
correlated.
e. If the points are spread all over the graph, the variables are non-correlated.
f. Any other curve-form of spread of points indicates a curvilinear relation between the
variables.
Scatter diagram is one of the simplest ways of diagrammatic representation of a bivariate
distribution and provides us one of the simplest tools of ascertaining the correlation between two
variables. Suppose we are given n pairs of values of two variables X and Y. For example, if the
variables X and Y denote the height and weight respectively, then the pairs may represent the
heights and weights (in pairs) of n individuals. These n points may be plotted as dots (.) with
reference to the x-axis and y-axis in the xy-plane. (It is customary to take the independent
variable along the x-axis and the dependent variable along the y-axis.) The diagram of dots so
obtained is known as a scatter diagram. From the scatter diagram we
can form a fairly good, though rough idea about the relationship between the two variables. The
following points may be borne in mind in interpreting the scatter diagram regarding the
correlation between the two variables:
1. If the points are very dense, i.e., very close to each other, a fairly good amount of
correlation may be expected between the two variables. On the other hand, if the points
are widely scattered, a poor correlation may be expected between them.
2. If the points on the scatter diagram reveal any trend (either upward or downward), the
variables are said to be correlated, and if no trend is revealed, the variables are
non-correlated.
3. If there is an upward trend rising from the lower left-hand corner and going up to the
upper right-hand corner, the correlation is positive, since this reveals that the values of
the two variables move in the same direction. If, on the other hand, the points depict a
downward trend from the upper left-hand corner, the correlation is negative, since in this
case the values of the two variables move in opposite directions.
4. In particular, if all the points lie on a straight line starting from the left bottom and going
up towards the right top, the correlation is perfect and positive, and if all the points lie on
a straight line starting from the left top and coming down to the right bottom, the correlation
is perfect and negative.
5. The method of scatter diagram is readily comprehensible and enables us to form a rough
idea of the nature of the relationship between the two variables merely by inspection of
the graph. Moreover, this method is not affected by extreme observations, whereas all
mathematical formulae of ascertaining correlation between two variables are affected by
extreme observations. However, this method is not suitable if the number of observations
is fairly large.
6. The method of scatter diagram tells us about the nature of the relationship whether it is
positive or negative and whether it is high or low. It does not provide us exact measure of
the extent of the relationship between the two variables.
7. The scatter diagram enables us to obtain an approximate estimating line or line of best fit
by free hand method. The method generally consists in stretching a piece of thread
through the plotted points to locate the best possible line.
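As a rough illustration of the idea, a scatter diagram can even be sketched as plain text. The snippet below is purely illustrative (not from the text): it maps hypothetical height–weight pairs onto a character grid so the upward trend from lower left to upper right is visible without any plotting library.

```python
# A purely illustrative text "scatter diagram": each (x, y) pair is scaled
# onto a character grid, column 0 = smallest x, top row = largest y.
def ascii_scatter(pairs, width=20, height=10):
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        # scale each value into the grid range
        col = round((x - min(xs)) / (max(xs) - min(xs)) * (width - 1))
        row = round((y - min(ys)) / (max(ys) - min(ys)) * (height - 1))
        grid[height - 1 - row][col] = "*"  # top row of text = largest y
    return "\n".join("".join(r) for r in grid)

# Hypothetical heights (x) and weights (y): the stars rise from the lower
# left towards the upper right, suggesting positive correlation.
print(ascii_scatter([(160, 55), (165, 60), (170, 66), (175, 70), (180, 76)]))
```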
KARL PEARSON’S COEFFICIENT OF CORRELATION (COVARIANCE METHOD)
This is a measure of linear relationship between the two variables. It indicates the degree of
correlation between the two variables. It is denoted by ‘r’.
INTERPRETATION OF COEFFICIENT OF CORRELATION
a. A positive value of r indicates positive correlation
b. A negative value of r indicates negative correlation
c. r = +1 means, correlation is perfect positive.
d. r = -1 means, correlation is perfect negative.
e. r = 0 (or low) means, the variables are non – correlated.
Karl Pearson’s measure, known as the Pearsonian correlation coefficient between two variables
(series) X and Y, usually denoted by r, is a numerical measure of linear relationship between
them and is defined as the ratio of the covariance between X and Y, written as Cov(X, Y), to the
product of the standard deviations of X and Y.
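Symbolically, this definition can be written as:

```latex
r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i}(X_i - \bar{X})^2}\;\sqrt{\sum_{i}(Y_i - \bar{Y})^2}}
```

The common factor n (or n − 1) in the covariance and the standard deviations cancels, which is why the deviation sums alone suffice in the calculation steps given earlier.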
Assumptions of the Karl Pearson’s Correlation
o The two variables X and Y are linearly related.
o The two variables are affected by a large number of independent causes, so as to form a
normal distribution.
Coefficient of Determination
The strength of r is judged by the coefficient of determination, r². For r = 0.9, r² = 0.81. We
multiply it by 100, thus getting 81 per cent. This suggests that when r is 0.9, 81 per cent of the
total variation in the Y series can be attributed to the relationship with X.
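The arithmetic above is trivial to verify (the value r = 0.9 is taken from the text):

```python
# A quick numeric check of the coefficient-of-determination reading above.
r = 0.9
r_squared = round(r ** 2, 2)   # coefficient of determination
explained = r_squared * 100    # per cent of variation in Y explained by X
print(r_squared, explained)
```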
Limitations of Spearman’s Method of Correlation
o Spearman’s r is a distribution-free or non-parametric measure of correlation.
o As such, the result may not be as dependable as in the case of ordinary correlation where
the distribution is known.
o Another limitation of rank correlation is that it cannot be applied to a grouped frequency
distribution.
o When the number of observations is quite large and one has to assign ranks to the
observations in the two series, then such an exercise becomes rather tedious and time-
consuming. This becomes a major limitation of rank correlation.
Some Limitations of Correlation Analysis
o Correlation analysis cannot determine cause-and-effect relationship.
o Another mistake that occurs frequently is on account of misinterpretation of the
coefficient of correlation and the coefficient of determination.
o Another mistake in the interpretation of the coefficient of correlation occurs when one
concludes a positive or negative relationship even though the two variables are actually
unrelated.