SlideShare a Scribd company logo
STAT 22000 Lecture Slides
Correlation
Yibi Huang
Department of Statistics
University of Chicago
Recall when we introduced scatter plots in Chapter 1, we assessed
the strength of the association between two variables by eyeballs.
Weak Association Strong Association
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
X
Y
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
X
Y
1
Recall when we introduced scatter plots in Chapter 1, we assessed
the strength of the association between two variables by eyeballs.
Weak Association Strong Association
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
X
Y
Range of Y
when X is
fixed ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
X
Y
Range of Y
when X is
fixed
Large spread of Y Small spread of Y
when X is known when X is known
1
Correlation = Correlation Coefficient, r
Correlation r is a numerical measure of the direction and strength
of the linear relationship between two numerical variables.
“r” always lies between −1 and 1; the strength increases as you
move away from 0 to either −1 or 1.
• r > 0: positive association
• r < 0: negative association
• r ≈ 0: very weak linear relationship
• large |r|: strong linear relationship
• r = −1 or r = 1: only when all the data points on the
scatterplot lie exactly along a straight line
−1 0 1
Neg. Assoc. Pos. Assoc.
Strong Weak No Assoc Weak Strong
Perfect Perfect 2
Positive Correlations
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0.2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0.4
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0.6
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0.8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = 0.9
3
Negative Correlations
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.1
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.3
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.5
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.7
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.95
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−1
0
1
2
3
r = −0.99
4
Formula for Computing the Correlation Coefficient “r”
(x1, y1)
(x2, y2)
(x3, y3)
.
.
.
(xn, yn)
The correlation coefficient r
(or simply, correlation) is defined as:
r =
1
n − 1
n
X
i=1
xi − x̄
sx
!
| {z }
z-score of xi
yi − ȳ
sy
!
| {z }
z-score of yi
.
where sx and sy are respectively the sample SD of X
and of Y.
Usually, we find the correlation using softwares
rather than by manual computation.
5
Why r Measures the Strength of a Linear Relationship?
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−3 −2 −1 0 1 2 3
−3
−2
−1
0
1
2
3
x
y
+
−
+ −
x
y
xi −
− x >
> 0
xi −
− x <
< 0
xi −
− x >
> 0
xi −
− x <
< 0
yi −
− y >
> 0
yi −
− y <
< 0
yi −
− y >
> 0
yi −
− y <
< 0
What is the sign of
xi − x̄
sx
!
yi − ȳ
sy
!
??
Here r > 0;
more positive
contributions than
negative.
What kind of points
have large
contributions to the
correlation?
6
Correlation r Has No Unit
r =
1
n − 1
n
X
i=1
xi − x̄
sx
!
| {z }
z-score of xi
yi − ȳ
sy
!
| {z }
z-score of yi
.
After standardization, the z-score of neither xi nor yi has a unit.
• So r is unit-free.
• So we can compare r between data sets, where variables are
measured in different units or when variables are different.
E.g. we may compare the
r between [swim time and pulse],
with the
r between [swim time and breathing rate].
7
Correlation r Has No Unit (2)
Changing the units of variables does not change the correlation
coefficient r, because we get rid of all the units when we
standardize them (get z-scores).
E.g., no matter the temperatures are recorded in ◦F, or ◦C, the
correlations obtained are equal because
C =
5
9
(F − 32).
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
25 30 35 40 45 50 55
45
55
65
75
HIGH
Daily low (F)
Daily
high
(F)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−5 0 5 10
10
15
20
25
HIGH
Daily low (C)
Daily
high
(C)
8
“r” Does Not Distinguish x & y
Sometimes one use the X variable to predict the Y variable. In this
case, X is called the explanatory variable, and Y the response.
The correlation coefficient r does not distinguish between the two.
It treats x and y symmetrically.
r =
1
n − 1
n
X
i=1
xi − x̄
sx
!
yi − ȳ
sy
!
Swapping the x-, y-axes doesn’t change r (both r = 0.74.)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
20 30 40 50
50
60
70
80
Daily low (F)
Daily
high
(F)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
50 60 70 80
20
30
40
50
Daily high (F)
Daily
low
(F)
9
Correlation r Describes Linear Relationships Only
The scatter plot below shows a perfect nonlinear association. All
points fall on the quadratic curve y = 1 − 4(x − 0.5)2.
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.4 0.8
0.0
0.4
0.8
1.2
x
y
●
●
●
●
●
●
r of all black dots = 0.803,
r of all dots = −0.019.
(black + white)
No matter how strong the association,
the r of a curved relationship is NEVER 1 or −1.
It can even be 0, like the plot above. 10
Correlation Is VERY Sensitive to Outliers
Sometimes a single outlier can change r drastically.
● ●
●
●
●
●
●
●
●
●
●
For the plot on the left,
r =









0.0031 with the outlier
0.6895 without the outlier
Outliers that may remarkably change the form of associations
when removed are called influential points.
Remark: Not all outliers are influential points.
11
When Data Points Are Clustered ...
● ●
●
●
●
●
●
●
●
●
●
●
●
● ●
In the plot above, each of the two
clusters exhibits a weak negative
association (r = −0.336 and −0.323).
But the whole diagram shows a
moderately strong positive association
(r = 0.849).
• This is an example of the Simpson’s paradox.
• An overall r can be misleading when data points are clustered.
• Cluster-wise r’s should be reported as well.
12
Always Check the Scatter Plots (1)
The 4 data sets below have identical x, y, sx, sy, and r.
Dataset 1 Dataset 2 Dataset 3 Dataset 4
x y x y x y x y
10 8.04 10 9.14 10 7.46 8 6.58
8 6.96 8 8.14 8 6.77 8 5.76
13 7.58 13 8.75 13 12.76 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.10 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.10 4 5.36 19 12.50
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
Ave 9 7.5 9 7.5 9 7.5 9 7.5
SD 3.16 1.94 3.16 1.94 3.16 1.94 3.16 1.94
r 0.82 0.82 0.82 0.82 13
Always Check the Scatter Plots (2)
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
0
5
10
15
Dataset 1
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
0
5
10
15
Dataset 2
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
0
5
10
15
Dataset 3
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
0
5
10
15
Dataset 4
• In Dataset 2, y can be predicted exactly from x. But r < 1,
because r only measures linear association.
• In Dataset 3, r would be 1 instead of 0.82 if the outlier were
actually on the line.
The correlation coefficient can be misleading in the presence of
outliers, multiple clusters, or nonlinear association.
14
Correlation Indicates Association, Not Causation
Source: http://www.nejm.org/doi/full/10.1056/NEJMon1211064
15
Questions
• Why do both variables have to be numerical when computing
their correlation coefficient?
• If the law requires women to marry only men 2 years older
than themselves, what is the correlation of the ages between
husbands and wives?
16

More Related Content

Similar to L22.pdf

EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
mathsjournal
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
mathsjournal
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
mathsjournal
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
mathsjournal
 
12 x1 t01 03 integrating derivative on function (2013)
12 x1 t01 03 integrating derivative on function (2013)12 x1 t01 03 integrating derivative on function (2013)
12 x1 t01 03 integrating derivative on function (2013)
Nigel Simmons
 
Project_Report.pdf
Project_Report.pdfProject_Report.pdf
Project_Report.pdf
LuisCarvalho731494
 
Semana 21 funciones álgebra uni ccesa007
Semana 21 funciones  álgebra uni ccesa007Semana 21 funciones  álgebra uni ccesa007
Semana 21 funciones álgebra uni ccesa007
Demetrio Ccesa Rayme
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansion
Alexander Litvinenko
 
12 x1 t01 01 log laws (2013)
12 x1 t01 01 log laws (2013)12 x1 t01 01 log laws (2013)
12 x1 t01 01 log laws (2013)
Nigel Simmons
 
Splay Tree
Splay TreeSplay Tree
Nelson maple pdf
Nelson maple pdfNelson maple pdf
Nelson maple pdf
NelsonP23
 
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
YogeshIJTSRD
 
elec2200-6.pdf
elec2200-6.pdfelec2200-6.pdf
elec2200-6.pdf
AttareqTareq
 
computer graphics slides by Talha shah
computer graphics slides by Talha shahcomputer graphics slides by Talha shah
computer graphics slides by Talha shah
Syed Talha
 
Method for beam 20 1-2020
Method for beam 20 1-2020Method for beam 20 1-2020
Method for beam 20 1-2020
RahulMeena188
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
MuhannadSaleh
 
Introduction to Gaussian Processes
Introduction to Gaussian ProcessesIntroduction to Gaussian Processes
Introduction to Gaussian Processes
Dmytro Fishman
 
Chapter 6 exponents and surds
Chapter 6 exponents and surdsChapter 6 exponents and surds
Chapter 6 exponents and surds
Sarah Sue Calbio
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HC
pearlapplepen
 
3D Transformation
3D Transformation3D Transformation
3D Transformation
SwatiHans10
 

Similar to L22.pdf (20)

EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
 
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONSEDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
EDGE-NEIGHBOR RUPTURE DEGREE ON GRAPH OPERATIONS
 
12 x1 t01 03 integrating derivative on function (2013)
12 x1 t01 03 integrating derivative on function (2013)12 x1 t01 03 integrating derivative on function (2013)
12 x1 t01 03 integrating derivative on function (2013)
 
Project_Report.pdf
Project_Report.pdfProject_Report.pdf
Project_Report.pdf
 
Semana 21 funciones álgebra uni ccesa007
Semana 21 funciones  álgebra uni ccesa007Semana 21 funciones  álgebra uni ccesa007
Semana 21 funciones álgebra uni ccesa007
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansion
 
12 x1 t01 01 log laws (2013)
12 x1 t01 01 log laws (2013)12 x1 t01 01 log laws (2013)
12 x1 t01 01 log laws (2013)
 
Splay Tree
Splay TreeSplay Tree
Splay Tree
 
Nelson maple pdf
Nelson maple pdfNelson maple pdf
Nelson maple pdf
 
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
The Queue Length of a GI M 1 Queue with Set Up Period and Bernoulli Working V...
 
elec2200-6.pdf
elec2200-6.pdfelec2200-6.pdf
elec2200-6.pdf
 
computer graphics slides by Talha shah
computer graphics slides by Talha shahcomputer graphics slides by Talha shah
computer graphics slides by Talha shah
 
Method for beam 20 1-2020
Method for beam 20 1-2020Method for beam 20 1-2020
Method for beam 20 1-2020
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
 
Introduction to Gaussian Processes
Introduction to Gaussian ProcessesIntroduction to Gaussian Processes
Introduction to Gaussian Processes
 
Chapter 6 exponents and surds
Chapter 6 exponents and surdsChapter 6 exponents and surds
Chapter 6 exponents and surds
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HC
 
3D Transformation
3D Transformation3D Transformation
3D Transformation
 

Recently uploaded

Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
simonomuemu
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 

Recently uploaded (20)

Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 

L22.pdf

  • 1. STAT 22000 Lecture Slides Correlation Yibi Huang Department of Statistics University of Chicago
  • 2. Recall when we introduced scatter plots in Chapter 1, we assessed the strength of the association between two variables by eyeballs. Weak Association Strong Association ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● X Y ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● X Y 1
  • 3. Recall when we introduced scatter plots in Chapter 1, we assessed the strength of the association between two variables by eyeballs. Weak Association Strong Association ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● X Y Range of Y when X is fixed ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● X Y Range of Y when X is fixed Large spread of Y Small spread of Y when X is known when X is known 1
  • 4. Correlation = Correlation Coefficient, r Correlation r is a numerical measure of the direction and strength of the linear relationship between two numerical variables. “r” always lies between −1 and 1; the strength increases as you move away from 0 to either −1 or 1. • r > 0: positive association • r < 0: negative association • r ≈ 0: very weak linear relationship • large |r|: strong linear relationship • r = −1 or r = 1: only when all the data points on the scatterplot lie exactly along a straight line −1 0 1 Neg. Assoc. Pos. Assoc. Strong Weak No Assoc Weak Strong Perfect Perfect 2
  • 5. Positive Correlations ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0.4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = 0.9 3
  • 6. Negative Correlations ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.3 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.95 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −1 0 1 2 3 r = −0.99 4
  • 7. Formula for Computing the Correlation Coefficient “r” (x1, y1) (x2, y2) (x3, y3) . . . (xn, yn) The correlation coefficient r (or simply, correlation) is defined as: r = 1 n − 1 n X i=1 xi − x̄ sx ! | {z } z-score of xi yi − ȳ sy ! | {z } z-score of yi . where sx and sy are respectively the sample SD of X and of Y. Usually, we find the correlation using softwares rather than by manual computation. 5
  • 8. Why r Measures the Strength of a Linear Relationship? ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x y + − + − x y xi − − x > > 0 xi − − x < < 0 xi − − x > > 0 xi − − x < < 0 yi − − y > > 0 yi − − y < < 0 yi − − y > > 0 yi − − y < < 0 What is the sign of xi − x̄ sx ! yi − ȳ sy ! ?? Here r > 0; more positive contributions than negative. What kind of points have large contributions to the correlation? 6
  • 9. Correlation r Has No Unit r = 1 n − 1 n X i=1 xi − x̄ sx ! | {z } z-score of xi yi − ȳ sy ! | {z } z-score of yi . After standardization, the z-score of neither xi nor yi has a unit. • So r is unit-free. • So we can compare r between data sets, where variables are measured in different units or when variables are different. E.g. we may compare the r between [swim time and pulse], with the r between [swim time and breathing rate]. 7
  • 10. Correlation r Has No Unit (2) Changing the units of variables does not change the correlation coefficient r, because we get rid of all the units when we standardize them (get z-scores). E.g., no matter the temperatures are recorded in ◦F, or ◦C, the correlations obtained are equal because C = 5 9 (F − 32). ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 25 30 35 40 45 50 55 45 55 65 75 HIGH Daily low (F) Daily high (F) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● −5 0 5 10 10 15 20 25 HIGH Daily low (C) Daily high (C) 8
  • 11. “r” Does Not Distinguish x & y Sometimes one use the X variable to predict the Y variable. In this case, X is called the explanatory variable, and Y the response. The correlation coefficient r does not distinguish between the two. It treats x and y symmetrically. r = 1 n − 1 n X i=1 xi − x̄ sx ! yi − ȳ sy ! Swapping the x-, y-axes doesn’t change r (both r = 0.74.) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 30 40 50 50 60 70 80 Daily low (F) Daily high (F) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 50 60 70 80 20 30 40 50 Daily high (F) Daily low (F) 9
  • 12. Correlation r Describes Linear Relationships Only The scatter plot below shows a perfect nonlinear association. All points fall on the quadratic curve y = 1 − 4(x − 0.5)2. ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 0.4 0.8 0.0 0.4 0.8 1.2 x y ● ● ● ● ● ● r of all black dots = 0.803, r of all dots = −0.019. (black + white) No matter how strong the association, the r of a curved relationship is NEVER 1 or −1. It can even be 0, like the plot above. 10
  • 13. Correlation Is VERY Sensitive to Outliers Sometimes a single outlier can change r drastically. ● ● ● ● ● ● ● ● ● ● ● For the plot on the left, r =          0.0031 with the outlier 0.6895 without the outlier Outliers that may remarkably change the form of associations when removed are called influential points. Remark: Not all outliers are influential points. 11
  • 14. When Data Points Are Clustered ... ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● In the plot above, each of the two clusters exhibits a weak negative association (r = −0.336 and −0.323). But the whole diagram shows a moderately strong positive association (r = 0.849). • This is an example of the Simpson’s paradox. • An overall r can be misleading when data points are clustered. • Cluster-wise r’s should be reported as well. 12
  • 15. Always Check the Scatter Plots (1) The 4 data sets below have identical x, y, sx, sy, and r. Dataset 1 Dataset 2 Dataset 3 Dataset 4 x y x y x y x y 10 8.04 10 9.14 10 7.46 8 6.58 8 6.96 8 8.14 8 6.77 8 5.76 13 7.58 13 8.75 13 12.76 8 7.71 9 8.81 9 8.77 9 7.11 8 8.84 11 8.33 11 9.26 11 7.81 8 8.47 14 9.96 14 8.10 14 8.84 8 7.04 6 7.24 6 6.13 6 6.08 8 5.25 4 4.26 4 3.10 4 5.36 19 12.50 12 10.84 12 9.13 12 8.15 8 5.56 7 4.82 7 7.26 7 6.42 8 7.91 5 5.68 5 4.74 5 5.73 8 6.89 Ave 9 7.5 9 7.5 9 7.5 9 7.5 SD 3.16 1.94 3.16 1.94 3.16 1.94 3.16 1.94 r 0.82 0.82 0.82 0.82 13
  • 16. Always Check the Scatter Plots (2) ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 20 0 5 10 15 Dataset 1 ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 20 0 5 10 15 Dataset 2 ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 20 0 5 10 15 Dataset 3 ● ● ● ● ● ● ● ● ● ● ● 0 5 10 15 20 0 5 10 15 Dataset 4 • In Dataset 2, y can be predicted exactly from x. But r < 1, because r only measures linear association. • In Dataset 3, r would be 1 instead of 0.82 if the outlier were actually on the line. The correlation coefficient can be misleading in the presence of outliers, multiple clusters, or nonlinear association. 14
  • 17. Correlation Indicates Association, Not Causation Source: http://www.nejm.org/doi/full/10.1056/NEJMon1211064 15
  • 18. Questions • Why do both variables have to be numerical when computing their correlation coefficient? • If the law requires women to marry only men 2 years older than themselves, what is the correlation of the ages between husbands and wives? 16