Covariance and correlation both measure the relationship between two variables. Covariance measures how much two variables vary together, while correlation measures the strength and direction of their linear relationship. Correlation values range from -1 to +1: 0 indicates no linear relationship, positive values indicate a direct relationship, and negative values indicate an inverse relationship. Correlation can be examined visually with a scatter diagram and quantified with Pearson's correlation coefficient or Spearman's rank correlation coefficient. Correlation does not imply causation: two variables can be correlated without one causing the other.
3. Scatter Diagram
Suppose we have two variables:
Age of the rider
Speed of the motorcycle
Age (years): 16 22 34 45 21 24 27 53 32 24 26 29
Speed (km/h): 62 60 45 46 50 55 54 30 36 39 48 50
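A scatter diagram is normally drawn with a plotting tool. As a rough, dependency-free sketch, the rider data above can be placed on a character grid. The helper name `ascii_scatter` and the grid size are assumptions made for illustration, and the scaling assumes the values in each list are not all identical.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text-mode scatter diagram as a list of rows (top row = largest y)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point onto the grid; column 0 holds the smallest x.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(r) for r in grid]


age = [16, 22, 34, 45, 21, 24, 27, 53, 32, 24, 26, 29]
speed = [62, 60, 45, 46, 50, 55, 54, 30, 36, 39, 48, 50]

rows = ascii_scatter(age, speed)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" horizontal: age (years), vertical: speed (km/h)")
```

The points drift from the top left (young, fast) to the bottom right (old, slow), which is the visual signature of a negative relationship.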
6. Covariance: Test of Relationship
Covariance is a measure of how much two random variables vary together.
It is similar to variance: variance tells you how a single variable varies,
whereas covariance tells you how two variables vary together.
A positive covariance between two variables indicates a positive
relationship between them. This means that if one variable deviates from its
mean in one direction, the other variable tends to deviate from its mean in
the same direction.
A negative covariance between two variables indicates a negative relationship
between them: when one variable is above its mean, the other tends to be below its mean.
The problem with measuring the relationship between two variables by
covariance is that it is not a standardized measure: its magnitude depends on the
units of measurement, so covariances from different data sets cannot be compared directly.
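To illustrate why covariance is not standardized, the sketch below computes the sample covariance of the rider data twice, once with speed in km/h and once in m/s. The function name `sample_covariance` and the choice of the n - 1 divisor are assumptions for this example; a population covariance would divide by n instead.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


age = [16, 22, 34, 45, 21, 24, 27, 53, 32, 24, 26, 29]
speed_kmh = [62, 60, 45, 46, 50, 55, 54, 30, 36, 39, 48, 50]
speed_ms = [v / 3.6 for v in speed_kmh]  # the same speeds in metres per second

cov_kmh = sample_covariance(age, speed_kmh)
cov_ms = sample_covariance(age, speed_ms)
print(f"cov(age, speed in km/h) = {cov_kmh:.2f}")
print(f"cov(age, speed in m/s)  = {cov_ms:.2f}")
```

Both values are negative, but they differ by the conversion factor 3.6 even though the underlying relationship is identical. This is the gap that the correlation coefficient closes.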
12. Covariance & Correlation
Covariance is a measure of how much two random variables vary together about
their means.
Correlation is the degree of inter-relatedness between two
or more variables.
Correlation analysis is the process of finding the degree of relationship
between two or more variables by applying statistical tools and
techniques.
The term correlation was introduced into statistics by Francis Galton in 1888; Karl Pearson later formalized the product-moment coefficient.
Denoted by r
Value lies between -1 and +1
16. Correlation
Correlation measures only the extent of the relationship between
variables.
Karl Pearson (1857–1936), a British biometrician, developed the
formula for the correlation coefficient.
The correlation coefficient between two variables X and Y is
denoted by r(X, Y) or r_XY.
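For reference, the coefficient is conventionally written as the covariance divided by the product of the two standard deviations. The symbols below (sample size n, observations x_i and y_i, means x̄ and ȳ) are standard notation assumed here rather than taken from the slides.

```latex
r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing by the standard deviations cancels the units, which is why r always lies between -1 and +1.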
17. Coefficient of Correlation
• The coefficient of correlation gives a numerical value
for the strength of the linear relationship between two
variables.
• A linear relationship (or linear association) is a statistical term used to
describe a straight-line relationship between two variables.
• It can take values from –1 to +1.
• r = +0.6: positive correlation
• r = –0.7: negative correlation
• Correlation is the concept of a linear relationship between two variables,
whereas the correlation coefficient is the number that measures that
relationship.
18. Assumptions of Pearson Product Moment Correlation
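Quantitative measure: both variables are measured on an interval or ratio scale
Linearity: X and Y should be linearly related
Absence of outliers: there should be no outliers
Normality: both variables should be approximately normally distributed
Minimum of 30 observations (a common rule of thumb)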
19. Application
Relationship between variables such as job satisfaction and
turnover intention
Market risk and scrip price
Satisfaction and purchase intention
Yield of rice and rainfall
ICT use and learning effectiveness
20. Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson Coefficient of Correlation
Spearman’s Rank Correlation Coefficient: a non-parametric measure of the
correlation between two ordinal variables (rank correlation)
21. Types of Correlation
On the basis of direction of correlation: positive correlation, negative correlation
On the basis of number of variables: simple correlation, partial correlation, multiple correlation
On the basis of linearity: linear correlation, non-linear correlation
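22. Correlation on the basis of number of variables
Simple correlation: the relationship between two variables, e.g. sales and expenses, or income and consumption
Partial correlation: the relationship between two variables with the effect of a third held constant, e.g. height and weight with the effect of age removed
Multiple correlation: the joint relationship of two or more variables with another variable, e.g. rainfall and temperature with the yield of wheat, or advertising and number of salespersons with sales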
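As a sketch of what "holding a third variable constant" means in practice, the first-order partial correlation can be built from the three simple correlations. The subscripts H, W and A (height, weight, age) are labels chosen here for the example above, not notation from the slides.

```latex
r_{HW \cdot A} = \frac{r_{HW} - r_{HA} \, r_{WA}}
                      {\sqrt{\left(1 - r_{HA}^{2}\right)\left(1 - r_{WA}^{2}\right)}}
```

If age is unrelated to both height and weight, then r_{HA} = r_{WA} = 0 and the partial correlation reduces to the simple correlation r_{HW}.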
27. Correlation doesn’t imply causation
The consumption of ice cream increases
during the summer months, so there is a
strong correlation between summer temperatures
and ice-cream sales. In this example there is
also a causal relationship, because hot
weather pushes ice-cream sales up.
Ice-cream sales also have a strong
correlation with shark attacks. Clearly, shark
attacks are not caused by ice cream: both
rise in hot weather, when more people
go swimming. So there is no causation
between the two.
Hence correlation does not always imply
causation.
30. Problem: Variance–Covariance Method
If the covariance between variables X and Y is 12.5 and the variances of X and Y are
16.4 and 13.8 respectively, find the coefficient of correlation between them.
31. Problem 2) If the covariance between variables X and Y is 12.5 and the variances
of X and Y are 16.4 and 13.8, find the coefficient of correlation between them.
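The problem can be worked directly from the definition of the coefficient as covariance divided by the product of the standard deviations. The arithmetic below is a worked sketch using only the three figures given in the problem.

```latex
r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}
  = \frac{12.5}{\sqrt{16.4 \times 13.8}}
  = \frac{12.5}{\sqrt{226.32}}
  \approx \frac{12.5}{15.044}
  \approx 0.831
```

The result indicates a strong positive correlation between X and Y.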
33. Problem: Find the coefficient of correlation between X and Y by the square of values method / Product Moment Method
35. Road Rash
The ages of 12 motorcycle riders and their average riding speeds are as follows:
Age (years): 16 22 34 45 21 24 27 53 32 24 26 29
Speed (km/h): 62 60 45 46 50 55 54 30 36 39 48 50
The correlation coefficient is r ≈ -0.728, a fairly strong negative correlation: older riders tend to ride more slowly.
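The coefficient for this data set can be reproduced with the product moment method, working from deviations about the means. This is a minimal sketch; the function name `pearson_r` is an assumption chosen for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r by the product moment method (deviations from the means)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations of X from its mean
    dy = [y - mean_y for y in ys]  # deviations of Y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


age = [16, 22, 34, 45, 21, 24, 27, 53, 32, 24, 26, 29]
speed = [62, 60, 45, 46, 50, 55, 54, 30, 36, 39, 48, 50]

r = pearson_r(age, speed)
print(f"r = {r:.3f}")
```

Because the deviations of X and Y appear in both the numerator and the denominator, the result is unchanged if speed is expressed in different units.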
36. Height and Weight
Weight (kg) Height (cm)
56 149
78 164
66 157
63 155
49 159
48 156
52 169
50 163
72 176
79 177
82 175
48 144
56 164
68 169
59 165
The correlation coefficient is 0.6551.
Interpretation: the correlation is positive and moderately high, i.e. height and weight
are positively related.
This indicates that as the height of an individual increases, weight tends to
increase as well, and vice versa.
37. Rank Correlation Coefficient
This correlation formula is used when the relationship between two variables is studied in
terms of the rank of each case within each variable.
It is mostly used when both variables relate to some attribute.
It is appropriate for data sets that are ordinal in nature.
The coefficient ranges from -1 to +1.
38. Spearman’s Rank Correlation
Spearman's rank-order correlation is the nonparametric version of
the Karl Pearson coefficient of correlation.
Spearman's correlation coefficient (ρ, also written r_s)
measures the strength and direction of association between two ranked
variables.
This method is also called rank correlation. It works on the ranks of the
observed scores of the variables (the variables can be on an ordinal, interval or ratio
scale).
Ranks are assigned as 1 for the highest value, 2 for the second
highest, then 3, 4, ..., N.
40. Relation between two attributes Beauty and Intelligence
Subject Score X Score Y
A 7.1 61
B 7.4 53
C 7.9 76
D 6.3 43
E 8.3 73
F 9.6 77
G 7.6 69
H 8.8 81
I 5.9 47
J 6.6 36
X represents beauty
Y represents intelligence
Are X and Y related?
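Spearman's rank correlation coefficient is computed from the rank differences di:
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Rank X Rank Y
7 6
6 7
4 3
9 9
3 4
1 2
5 5
2 1
10 8
8 10
41. Subject Rank X (RX) Rank Y (RY) di = RX – RY di²
A 7 6 1 1
B 6 7 -1 1
C 4 3 1 1
D 9 9 0 0
E 3 4 -1 1
F 1 2 -1 1
G 5 5 0 0
H 2 1 1 1
I 10 8 2 4
J 8 10 -2 4
Σdi² = 14
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 14}{10(10^2 - 1)} = 1 - \frac{84}{990} \approx 0.92$$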
Rank Correlation can be interpreted in the same fashion as
the Karl Pearson Correlation Coefficient
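The ranking and the rank-difference formula can also be sketched in code, ranking the raw scores of subjects A to J directly with 1 as the highest. The function names `ranks_descending` and `spearman_rho` are assumptions for this example, and the shortcut formula used here is only valid when there are no tied scores, which holds for this data set.

```python
def ranks_descending(values):
    """Rank values so that 1 = highest. Assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no-ties case only."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


score_x = [7.1, 7.4, 7.9, 6.3, 8.3, 9.6, 7.6, 8.8, 5.9, 6.6]  # beauty, subjects A-J
score_y = [61, 53, 76, 43, 73, 77, 69, 81, 47, 36]            # intelligence, subjects A-J

rank_x = ranks_descending(score_x)
rank_y = ranks_descending(score_y)
rho = spearman_rho(score_x, score_y)
print("Rank X:", rank_x)
print("Rank Y:", rank_y)
print(f"rho = {rho:.3f}")
```

When scores are tied, the usual approach is to assign average ranks and compute Pearson's coefficient on the ranks instead of using the shortcut formula.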
42. Movie Ranking by 3 Critics: What's the Correlation?
Year  Movie                          TOI Critic  TOI User  IMDB
2020  Angrezi Medium                 3.5         3.5       7.3
2020  Baaghi 3                       2.5         2.8       2
2020  Thappad                        4.5         4.1       6.9
2020  Shubh Mangal Zyada Saavdhan    3.5         3.4       5.9
2020  Bhoot - Part 1                 2.5         3         5.5
2020  Love Aaj Kal                   3           3.1       5
2020  Hacked                         2           2.5       4.2
2020  Shikara                        3           3.2       3.3
2020  Malang                         3.5         3.4       6.5
2020  Jawaani Janeman                3.5         3.4       6.7
2020  Panga                          4           3.9       7
2020  Street Dancer 3D               3.5         3.4       3.6
2020  Jai Mummy Di                   2           2.5       3.5
2020  Tanhaji - The Unsung Warrior   4           4.3       7.7
2020  Chhapaak                       3.5         3.4       5
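One way to approach the question is to compute the coefficient for each pair of rating sources. The sketch below uses Pearson's coefficient on the raw ratings, which is an assumption: the slide does not say which method is intended, and the many tied ratings mean the simple rank-difference formula would not apply without adjustment. The names `pearson_r` and `ratings` are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Ratings in the same movie order as the table above.
ratings = {
    "TOI Critic": [3.5, 2.5, 4.5, 3.5, 2.5, 3, 2, 3, 3.5, 3.5, 4, 3.5, 2, 4, 3.5],
    "TOI User": [3.5, 2.8, 4.1, 3.4, 3, 3.1, 2.5, 3.2, 3.4, 3.4, 3.9, 3.4, 2.5, 4.3, 3.4],
    "IMDB": [7.3, 2, 6.9, 5.9, 5.5, 5, 4.2, 3.3, 6.5, 6.7, 7, 3.6, 3.5, 7.7, 5],
}

# One coefficient per unordered pair of sources.
pairwise = {
    (a, b): pearson_r(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

The three sources use different scales (TOI out of 5, IMDB out of 10), which does not matter here because the coefficient is unit-free.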