1. MIT Arts, Commerce and Science College, Alandi
Measures of Correlation
Presented by
Prof. Dr. Sangita Birajdar
Assistant Professor,
Department of Statistics,
MIT ACSC, Alandi
Copyright @ Dr. Sangita Birajdar 1
2. OBJECTIVES
In this unit you will learn about correlation and
its measures:
1. Types of data
2. Concept and meaning of correlation
3. Types of correlation
4. Scatter diagram, its interpretation and merits and demerits
5. Covariance and its properties
6. Karl Pearson’s Coefficient of Correlation, its properties and its interpretation
7. Spearman’s Rank correlation coefficient and its interpretation.
8. Concept and meaning of regression
9. Lines of regression
10. Regression coefficients, their properties and interpretation
11. Numerical examples and problems
3. TYPES OF DATA
Univariate data: Data in which only one characteristic (a single
variable) is measured on each unit under study are known as univariate data.
For Example: Marks of students in particular subject, Monthly
income of workers, Height of individuals, Blood pressure of adults,
etc.
Bivariate data: Data in which two characteristics (two variables) are
measured at the same time on each unit under study are known as
bivariate data.
For example: Day temperature and ice cream sales, income and
expenditure of families, height and weight of students, and monthly
electricity bill and consumption.
4. TYPES OF DATA
Trivariate data: When the data involve three variables measured on
each unit, they are called trivariate data.
For example: age, weight and blood pressure of a person; yield of
a crop, temperature and amount of fertiliser used; price, demand
and supply of a commodity.
Bivariate data (Definition): A set of n pairs of observations
$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$ on two variables
X and Y under study is called bivariate data.
5. CONCEPT OF CORRELATION
• The mathematical measure of correlation was given by the great
Mathematician and Bio-statistician Karl Pearson in 1896 in the form of
correlation coefficient.
• This was extensively used by Sir Francis Galton to explain many
phenomena in biology and genetics.
• Correlation is a statistical technique that shows whether pairs of variables
are related to each other and how strongly they are correlated.
• The extent of linear relationship between two variables is called
correlation.
• It measures the intensity of the relationship between two variables, not
causation. That is, correlation does not by itself establish a cause and
effect relationship.
• When two variables are correlated, an increase or decrease in the values of
one variable corresponds to an increase or decrease in the values of the other
variable.
6. Types of Correlation
Depending on the direction of change in the pair of
variables, correlation is classified into the following three
types:
• Positive Correlation.
• Negative Correlation.
• Zero Correlation.
7. Positive Correlation
A positive correlation indicates the extent to which the values of
the two variables change in the same direction, i.e. the values of
one variable increase as the values of the other increase, or
decrease as the values of the other decrease. This type of
correlation is also called direct correlation.
For example:
1. Marks obtained in an examination by a group of students are
positively correlated with the number of hours the students
studied for examination.
2. Sale of ice cream is positively correlated with day temperature.
3. Height and weight of a group of persons are positively correlated:
as height increases, weight also increases on average.
8. Negative Correlation
A negative correlation indicates the extent to which the values of
the two variables change in opposite directions, i.e. the values of
one variable increase (decrease) as the values of the other variable
decrease (increase). This type of correlation is also called inverse
correlation.
For example:
1. Volume and pressure of a perfect gas.
2. Supply and price of a commodity.
3. Slope of a hill and walking speed: as the slope increases,
the speed a walker reaches tends to decrease.
9. Zero Correlation
Zero correlation means no linear relationship between the two variables
X and Y; i.e. a change in one variable (X) is not associated with
a change in the other variable (Y).
For example, body weight and intelligence, shoe size and monthly
salary, amount of tea drunk and level of intelligence, etc.
10. Think about the following
• Age and weight of a person.
• Blood pressure of a group of bulky persons and their weights.
• Speed of a vehicle and the time required to stop it after applying
the brakes.
• Selling prices of flats and their distance from the central place.
• The crop yield and rainfall (up to a certain extent).
• Marks in English and Marks in Mathematics.
• Height and marks obtained by students.
• Demand and price of the commodity.
• Amount of cereal in meal and maintaining healthy weight.
• Sale of woolen garments and day temperature.
• Time students spend on social media and their performance in
examinations.
• Shoe size and monthly salary.
• Amount of tea drunk and level of intelligence
11. Remember that !!!
• If an increase or decrease in the values of one variable does not
correspond to any systematic increase or decrease in the values of
the other variable, then the two variables are uncorrelated.
• Sometimes, the relationship between two variables is simple
coincidence. For instance, the relation between the arrival of
migratory birds in a sanctuary and the birth rates in the
locality. Such correlation may be attributed to chance.
• A third variable’s impact on two variables may give rise to a
relation between the two variables. For instance, the relation
between illiteracy and crime arises due to increase in
population.
12. Measures of correlation
1) Scatter Diagram
2) Karl Pearson’s Coefficient of Correlation.
3) Spearman’s Rank Correlation.
13. Scatter Diagram
• A scatter plot visualises the relationship or association between two variables.
• It is the simplest and most attractive method of diagrammatic representation of
bivariate data, and gives an idea of whether the variables are correlated
or not.
• In this method each pair of observations is represented by a point in the XY
plane. It can be defined as follows:
Definition: Suppose $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ are bivariate data on two
variables X and Y. If the n pairs of observations are plotted on the XY plane, taking
one variable on the X axis and the other on the Y axis, so that every ordered
pair $(x_i, y_i)$ gives a dot or point, the resulting diagram of dots is known as a
scatter diagram.
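The plotting step in the definition can be sketched in a few lines of Python. This is only an illustrative text-mode sketch, not the method used on the slides: the function name `ascii_scatter`, the grid size and the hours/marks data are all assumptions made for the example.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Return a text scatter diagram with one '*' per (x, y) pair.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each observation to a column (X axis) and a row (Y axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: hours studied (X) and marks obtained (Y)
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 42, 50, 55, 68, 74]
plot = ascii_scatter(hours, marks)
print(plot)
```

With this data the points rise from the lower left to the upper right of the grid, which is the pattern expected for a positive correlation.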
15. Merits and Demerits of Scatter Diagram
Merits of Scatter Diagram
1) Scatter diagram is the simplest measure of correlation; it gives a
rough idea of the nature of the relationship between two variables.
2) It is easy to understand.
3) It is not influenced by extreme values.
4) The scatter diagram makes it possible to obtain the line of best fit by the free hand method.
Demerits of Scatter Diagram
1) It fails to give the magnitude (numeric value) of correlation.
2) It is not useful for qualitative data.
3) It is a subjective method.
16. Exercise
What type of correlation do you expect in the following situations?
1) A student who has many absences has a decrease in grades.
2) The longer someone invests, the more compound interest he will earn.
3) The less time I spend marketing my business, the fewer new customers I
will have.
4) As the temperature goes up, ice cream sales also go up.
5) When an employee works more hours, his pay check increases
proportionately.
6) As one exercises more, his marks in English are lower.
7) The older a man gets, the less hair he has.
8) The more errors a computer program has, the longer it takes to run.
17. Covariance
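Covariance: Joint variation between two variables.
If two variables are correlated then Cov(X, Y) ≠ 0 but if they are
not correlated then Cov(X, Y) = 0.
If $\{(x_i, y_i);\ i = 1, 2, 3, \ldots, n\}$ is a bivariate data set on (X, Y), then
covariance is defined as the arithmetic mean of the products of the deviations of
the observations from their respective means. It is denoted by Cov(X, Y)
and given by
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
where $\bar{x}$ and $\bar{y}$ are the arithmetic means of X and Y.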
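A minimal Python sketch of this definition is given here for illustration. It divides by n, following the "arithmetic mean" wording of the definition; the function name `covariance` and the temperature/sales figures are assumptions made for the example.

```python
def covariance(xs, ys):
    """Cov(X, Y): mean of the products of deviations from the respective means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

# Hypothetical data: day temperature (deg C) and ice cream sales (units)
temperature = [20, 25, 30, 35, 40]
sales = [10, 14, 19, 25, 32]

cov_xy = covariance(temperature, sales)
print(cov_xy)  # positive value: the two variables move in the same direction
```

Here the means are 30 and 20, the products of deviations add up to 275, and the covariance is 275 / 5 = 55.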
18. Properties of Covariance
1. Cov ( X, Y) = Cov (Y, X)
2. Cov (X, X) = Var (X)
3. Cov(X, -Y)= Cov(-X, Y) = - Cov(X, Y)
4. Cov (X, constant) = 0
5. Effect of change of origin: Covariance is invariant to change of origin,
i.e. Cov (X - a, Y - b) = Cov (X, Y), where ‘a’ and ‘b’ are constants.
6. Effect of change of origin and scale: Covariance is invariant to change of
origin but not to change of scale,
i.e. Cov ((X - a)/h, (Y - b)/k) = Cov (X, Y)/(hk), where ‘a’, ‘b’, ‘h’ and ‘k’
are constants with h ≠ 0 and k ≠ 0.
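The properties listed above can be checked numerically. The sketch below is an illustration only: the data, the constants a = 6, b = 3, h = 2, k = 5 and the helper name `covariance` are assumptions, and the covariance divides by n.

```python
from math import isclose

def covariance(xs, ys):
    """Cov(X, Y) as the mean of products of deviations (divides by n)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
cov_xy = covariance(x, y)

sym = covariance(y, x)                    # property 1: Cov(Y, X)
neg = covariance(x, [-v for v in y])      # property 3: Cov(X, -Y)
const = covariance(x, [7] * len(x))       # property 4: Cov(X, constant)
shifted = covariance([v - 6 for v in x],  # property 5: change of origin
                     [v - 3 for v in y])
scaled = covariance([(v - 6) / 2 for v in x],  # property 6: origin and scale
                    [(v - 3) / 5 for v in y])

print(isclose(sym, cov_xy), isclose(neg, -cov_xy), isclose(shifted, cov_xy))
print(isclose(const, 0.0, abs_tol=1e-12), isclose(scaled, cov_xy / (2 * 5)))
```

Shifting both variables leaves the covariance unchanged, while dividing X by 2 and Y by 5 divides the covariance by 10.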
19. KARL PEARSON’S CORRELATION COEFFICIENT OR PRODUCT MOMENT CORRELATION
COEFFICIENT
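Definition: Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be bivariate data on (X, Y).
Karl Pearson’s coefficient of correlation, denoted by r or r(X, Y) or $r_{xy}$, is
defined as the ratio of the covariance to the product of the standard deviations:
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \, \sigma_y}$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y.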
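As an illustration of the definition, a short Python sketch follows. It computes the covariance and both standard deviations with divisor n and takes their ratio; the function name `pearson_r` and the hours/marks data are assumptions made for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data: hours studied (X) and marks obtained (Y)
hours = [2, 4, 6, 8, 10]
marks = [50, 60, 55, 75, 70]

r = pearson_r(hours, marks)
print(round(r, 2))
```

For this data Cov(X, Y) = 22, Var(X) = 8 and Var(Y) = 86, so r = 22 / sqrt(8 × 86), which is about 0.84 and indicates a fairly strong positive correlation.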
20. Properties of Karl Pearson’s Correlation Coefficient
1) Limits of Pearson’s correlation coefficient: Pearson’s correlation coefficient lies
between -1 and +1, i.e. -1 ≤ r ≤ +1
2) Corr(X, Y) = Corr(Y, X)
3) Corr (X, X) = 1
4) Corr ( -X, Y) = Corr (X, -Y) = -Corr (X, Y)
5) Effect of change of origin
Corr(X - a, Y - b) = Corr(X,Y), where ‘a’ and ‘b’ are constants.
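6) Effect of Change of Origin and Scale
Statement: Pearson’s correlation coefficient is invariant to change of origin and scale,
i.e. Corr((X - a)/h, (Y - b)/k) = Corr(X, Y) when ‘h’ and ‘k’ have the same sign. If ‘h’ and ‘k’
have opposite signs, only the sign of the coefficient changes (compare property 4).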
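The effect of change of origin and scale can be illustrated numerically. The sketch below is a check on made-up data, with assumed constants a = 6, b = 60, h = 2 and k = 5; the helper name `pearson_r` is also an assumption.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Karl Pearson's r with all moments computed using divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

x = [2, 4, 6, 8, 10]
y = [50, 60, 55, 75, 70]
r_xy = pearson_r(x, y)

# Change of origin and scale with scale factors of the same sign
u = [(v - 6) / 2 for v in x]
w = [(v - 60) / 5 for v in y]
r_uw = pearson_r(u, w)

# Scale factors of opposite sign: only the sign of r changes
w_neg = [(v - 60) / -5 for v in y]
r_flipped = pearson_r(u, w_neg)

print(isclose(r_uw, r_xy), isclose(r_flipped, -r_xy))
```

This is why r can be computed on coded values (for example, deviations from an assumed mean) without changing the result.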
22. Merits and Demerits of Karl Pearson’s Coefficient of Correlation
Merits:
1. It depends upon all the observations.
2. It gives the extent of linear association between two variables.
3. It also indicates the type of correlation.
Demerits:
1. It fails to measure the non-linear relationship between two
variables.
2. It is unduly affected by extreme values.
3. It cannot be calculated for qualitative data.
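The first two demerits can be seen on small made-up data sets. The following sketch is illustrative only: the data are assumptions chosen to make the effect obvious, and `pearson_r` is a hypothetical helper using divisor n.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r with all moments computed using divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Demerit 1: an exact but non-linear relationship (y = x^2) gives r = 0
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [x * x for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Demerit 2: one extreme pair can reverse the sign of r
x_line = [1, 2, 3, 4, 5]
y_line = [5, 4, 3, 2, 1]                  # perfect negative linear relationship
r_line = pearson_r(x_line, y_line)
r_outlier = pearson_r(x_line + [20], y_line + [20])

print(r_curve, round(r_line, 2), round(r_outlier, 2))
```

In the second data set a single extreme pair (20, 20) changes r from about -1 to a value above +0.9, even though the other five pairs are unchanged.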
23. SPEARMAN’S RANK CORRELATION (R)
• Some qualitative variables, for example beauty, honesty and temperament,
cannot be measured directly and need to be quantified in terms of ranks.
• Some quantitative variables, such as income and weight, can also be
more meaningful when expressed in terms of ranks.
• The British psychologist C.E. Spearman in 1904 developed a measure called
Spearman’s Rank Correlation. It measures the linear association
between the ranks assigned to qualitative variables measured on an ordinal scale,
as well as to quantitative variables measured on an interval or ratio scale and
converted into ranks.
• The Spearman’s Rank Correlation coefficient is derived from Karl
Pearson’s coefficient of correlation by replacing the individual values of the
variables with their ranks, and it is interpreted in the same way as Karl
Pearson’s coefficient of correlation.
• Ranking: Ordered arrangement of items according to their merits.
• Rank: The number indicating the position in ranking.
24. SPEARMAN’S RANK CORRELATION (R)
• The Spearman’s Rank Correlation coefficient is denoted by R
and given by
$$R = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
• Where, $d_i$ = Rank ($x_i$) – Rank ($y_i$), the difference between the ranks of the i-th pair,
n = number of pairs of observations.
Note: The Spearman rank correlation coefficient R lies
between –1 and +1.
• The data under consideration must be ranked before proceeding with the
Spearman’s Rank Correlation evaluation. The ranks are
assigned to both the variables in the same order, either ascending or
descending.
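A short Python sketch of the rank correlation for data without ties is given here as an illustration. It uses the standard formula R = 1 − 6Σd² / (n(n² − 1)) with d the difference between the two ranks of a pair; the function names, the choice of descending ranks (rank 1 for the largest value) and the judges' scores are assumptions made for the example.

```python
def ranks(values):
    """Rank the values in descending order (rank 1 = largest); assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman's rank correlation for untied data."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data: scores given to five contestants by two judges
judge_a = [86, 72, 91, 65, 78]
judge_b = [80, 75, 88, 60, 70]

print(ranks(judge_a))   # ranks by judge A
print(ranks(judge_b))   # ranks by judge B
R = spearman_r(judge_a, judge_b)
print(round(R, 2))
```

The ranks are [2, 4, 1, 5, 3] and [2, 3, 1, 5, 4], so Σd² = 2 and R = 1 − 12/120 = 0.9, which indicates close agreement between the two judges.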
25. Ranks with ties
• If an observation is repeated two or more times in the data set,
the repeated observations are said to be “tied”.
• Each tied observation is given a rank equal to the mean of the ranks of the
positions they occupy in the ordered data set, and the next
observation is assigned the rank following those already
used.
• The number of observations getting the same rank is called the
length of the tie and is denoted by m.
26. Ranks with ties
• For example, in the data set 70, 74, 74, 78, and 79 kg,
the 2nd and 3rd observations are tied; the mean of 2 and 3 is 2.5, so
the ranks of the five observations are 1, 2.5, 2.5, 4, and 5. The
length of the tie is m = 2.
• In the data set 1.6, 1.7, 1.9, 1.9, and 1.9, the 3rd, 4th and 5th
observations are tied; the mean of 3, 4, and 5 is 4, so the ranks of
the five observations are 1, 2, 4, 4, 4. In this case the length of the tie
is m = 3.
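The ranking rule for tied observations can be sketched in Python and checked against the two data sets above. The function names `average_ranks` and `tie_lengths` are assumptions for this illustration, and ranks are assigned in ascending order as in the examples.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1       # first position occupied by v
        m = ordered.count(v)               # length of the tie (1 if not tied)
        rank_of[v] = first + (m - 1) / 2   # mean of positions first .. first+m-1
    return [rank_of[v] for v in values]

def tie_lengths(values):
    """Length m of every group of tied observations."""
    return sorted(values.count(v) for v in set(values) if values.count(v) > 1)

weights = [70, 74, 74, 78, 79]
readings = [1.6, 1.7, 1.9, 1.9, 1.9]

print(average_ranks(weights), tie_lengths(weights))
print(average_ranks(readings), tie_lengths(readings))
```

The first data set gives ranks 1, 2.5, 2.5, 4, 5 with m = 2, and the second gives ranks 1, 2, 4, 4, 4 with m = 3, in agreement with the worked examples.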
27. Ranks with ties
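• The formula for Spearman’s rank correlation with ties is
$$R = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$$
Where, $d_i$ = difference between the ranks of the i-th pair,
n = number of pairs of observations, and
the correction term m(m² – 1)/12 is added once for every group of tied
observations of length m in either variable.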
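A hedged Python sketch of the calculation with ties follows. The formula did not survive in the slide text, so the sketch assumes the commonly taught tie-corrected form R = 1 − 6[Σd² + Σ m(m² − 1)/12] / (n(n² − 1)), with one correction term per group of tied values in either variable. The function names and the height data are assumptions; the weights are the tied data set from the example above.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1
        m = ordered.count(v)
        rank_of[v] = first + (m - 1) / 2
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of tied values of length m."""
    lengths = [values.count(v) for v in set(values)]
    return sum(m * (m ** 2 - 1) / 12 for m in lengths if m > 1)

def spearman_with_ties(xs, ys):
    """Tie-corrected Spearman's R (assumed textbook form, see lead-in)."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

weights = [70, 74, 74, 78, 79]        # kg, one tie of length m = 2
heights = [150, 155, 160, 165, 162]   # cm, hypothetical, no ties

R = spearman_with_ties(weights, heights)
print(round(R, 2))
```

Here Σd² = 2.5 and the correction term is 2(2² − 1)/12 = 0.5, so R = 1 − 6(3)/120 = 0.85.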
28. Merits and Demerits of Spearman’s rank correlation
• Merits of Spearman’s rank correlation
1. It depends upon all the observations.
2. It measures the linear association between ranks assigned to
qualitative variables measured on an ordinal scale as well as to
quantitative variables converted into ranks.
3. It also indicates the type of correlation.
• Demerits of Spearman’s rank correlation
1. It is only an approximate measure, as the actual values are not
used in the calculation.
2. It is difficult to calculate Spearman’s rank correlation when
there are many ties.