1. 14-12-13
Magdy Ibrahim Mostafa
Prof. Obstetrics & Gynecology, Faculty of Medicine, Cairo University
Director; Research, Biostatistics & IT Units, MEDC, Cairo University
Management member; EBM Unit, MEDC, Cairo University
Scientific Council Member, Egyptian IT Fellowship
Board Member, Egyptian Ob/Gyn Fellowship
Associate Editor; Kasr Al Aini Journal of Obstetrics and Gynecology
Peer Reviewer; Gyn Endocrin J, Gyn Oncol J, Obstet Gynecol Invest Journal
Peer Reviewer; Cairo University Medical Journal, Kasr El Aini Medical Journal, MEFS Journal
2. 14-12-13
Correlation
In two series of numerical data
Examples: Age & Height, Age & BMD
The values in one variable may vary
correspondingly with the other one
Correlation: + ve OR - ve
When the two variables increase & decrease in parallel
(Same direction)
positive correlation.
When one goes up the other goes down proportionally
(Opposite directions)
negative correlation
Correlation ≠ Causation
3. 14-12-13
Importance of correlation
1. Facilitates difficult measurements (an easily measured variable can stand in for one that is hard to measure)
2. Study of effectors:
• Dependent variable (outcome)
• Independent variable(s) (predictors or effectors)
Correlation between payment & working hours
Scatter diagram
4. 14-12-13
Correlation between Payment & working hours
Conclusion:
1. As working hours increase, payment increases:
Positive correlation (proportionate correlation)
2. The increase in payment is constant in relation to the
increase in working hours: Linear correlation
Correlation between TV watch & school grade
Scatter diagram
5. 14-12-13
Correlation between TV watching & school grade
Conclusion:
1. As TV watching hours increase, the final school grade
decreases:
Negative correlation (inverse correlation)
2. The decrease in grade is constant in relation to the
increase in TV watching hours: Linear correlation
Correlation between Age and Height
Scatter Diagram
6. 14-12-13
Correlation between Age and Height
Scatter Diagram
[Figure: four scatter diagrams. Panel titles: Positive linear correlation; Negative linear correlation; No correlation; Non linear correlation. Axis labels: Weight (g), Distance before discomfort (m), Height (cm) and Amniotic fluid volume (ml), plotted against Gestational age (weeks) or Age (years).]
7. 14-12-13
Non-Linear Correlation
[Figure: the same Height against Age (Years) points fitted two ways. Linear panel: straight line. Non-Linear panel: curve. The curve is closer to the points.]
The correlation coefficient
(meaning and magnitude)
Examining plots is a good way to determine the nature and
strength of the relationship between two variables
However, you need an objective measure to replace subjective
descriptions like strong, weak, I can't make up my mind, and
none
8. 14-12-13
The correlation coefficient
(meaning and magnitude)
Mathematically, correlation is represented by
what is known as:
correlation coefficient
The magnitude of the correlation coefficient ranges from:
0 (no correlation) to 1 (perfect correlation)
The sign (+ or -) gives the direction and is not part of the magnitude,
so the signed coefficient runs from -1 to +1
The correlation coefficient
(interpretation)
Interpretation of “cc”:
From 0 to 0.25 (0 to -0.25) = little or no relationship
From 0.25 to 0.50 (-0.25 to -0.50) = fair
From 0.50 to 0.75 (-0.50 to -0.75) = moderate to good
Greater than 0.75 (or beyond -0.75) = very good to excellent
Strong relation may not be clinically important
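The interpretation bands above can be sketched as a small lookup. The function name `interpret_cc` is illustrative, and placing a value that falls exactly on a cut-off into the lower band is an assumption, since the slide does not say which band owns the boundary.

```python
def interpret_cc(r):
    """Map a correlation coefficient to the verbal labels listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # the sign only gives the direction
    # Assumption: a value exactly on a cut-off falls into the lower band.
    if size <= 0.25:
        return "little or no relationship"
    if size <= 0.50:
        return "fair"
    if size <= 0.75:
        return "moderate to good"
    return "very good to excellent"


for cc in (0.012, -0.73, 0.983):
    print(cc, "->", interpret_cc(cc))
```

The label describes statistical strength only; it says nothing about clinical importance.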
9. 14-12-13
The correlation
Does NOT tell us if Y is a function of X
Does NOT tell us if X is a function of Y
Does NOT tell us if X causes Y
Does NOT tell us if Y causes X
Coefficient does NOT tell us what the scatterplot looks like
Correlation between Age and Height
Strength of correlation
[Figure: two scatter diagrams of Height (Cm) against Age (Years). One panel: cc = 0.012, Weak. Other panel: cc = 0.983, Strong.]
10. 14-12-13
Correlation between Age and Height
Direction of correlation
[Figure: two scatter diagrams. Height against Age (Years): cc = 0.983, Positive. Number of cold episodes/year against Exposure to Sun (h/week): cc = -0.73, Negative.]
Correlation coefficient
(X = independent variable, Y = dependent variable)
+1: X increases, Y increases
0: X changes, Y does not follow
-1: X increases, Y decreases
11. 14-12-13
Which test?
Linear correlation:
• Normal data: Pearson product-moment correlation (r)
• Non-normal data: Spearman correlation (R)
The Pearson correlation
Is a measure of the strength of the linear correlation between
two variables in one sample
“r” indicates:
Strength of relationship (strong, weak, or none): from 0 to 1
Direction of relationship: either (-)ve or (+)ve
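A minimal sketch of how “r” can be computed from the deviations of each value from its mean. The function name `pearson_r` and the working-hours data are hypothetical, chosen only to echo the earlier payment and school-grade examples.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products and of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, not taken from the slides
hours = [10, 20, 30, 40, 50]
payment = [100, 200, 300, 400, 500]  # rises in step with hours
grade = [90, 80, 70, 60, 50]         # falls in step with hours

print(pearson_r(hours, payment))  # perfect positive linear correlation
print(pearson_r(hours, grade))    # perfect negative linear correlation
```

The magnitude of the result gives the strength and the sign gives the direction.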
12. 14-12-13
Pearson Correlation
Assumptions:
1. Variables are quantitative or ordinal
2. Normally distributed variables
3. Linear relationship (monotonic + constant change)
The Pearson “r” is
Symmetric, since the correlation of x and y is the same as the
correlation of y and x
Unaffected by linear transformations, such as adding a constant
to all numbers or dividing all numbers by a constant
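The two properties above can be checked numerically. This is a sketch on made-up numbers; `pearson_r` is an illustrative helper, not a library function. Note that the invariance holds for positive constants: multiplying one variable by a negative constant keeps the magnitude of “r” but flips its sign.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                       # symmetry: swap the variables
x_shifted = [a + 100 for a in x]             # add a constant
y_scaled = [b / 10 for b in y]               # divide by a positive constant
r_transformed = pearson_r(x_shifted, y_scaled)
r_negated = pearson_r(x, [-b for b in y])    # negative constant: sign flips

print(r_xy, r_yx, r_transformed, r_negated)
```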
WARNING: Never compute correlation coefficients for nominal
variables, even if they are nicely coded with numbers. A
correlation between governorate and income is meaningless
13. 14-12-13
The Pearson correlation
“r” is a measure of LINEAR ASSOCIATION
When “r” = ZERO
This means NO LINEAR CORRELATION;
this does NOT mean there is NO CORRELATION
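A small illustration of this point, using made-up data in which y is completely determined by x (y = x squared) and yet “r” is zero. `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect U-shaped (non-linear) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [a ** 2 for a in x]

r = pearson_r(x, y)
print(r)  # zero: the rising and falling halves cancel out
```

Only a scatter diagram reveals the curve here; the coefficient alone would suggest no relationship.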
16. 14-12-13
Pearson Correlation
Pearson “r” is an appropriate summary measure for the first plot
only, since data are near a straight line
In the second plot, the relationship is not linear, so it doesn't
make sense to describe how tightly the points cluster around a
straight line
In the third plot, the perfect relationship is distorted by an
outlier point
In the fourth plot, there appear to be two subgroups of cases in
which there is no linear relationship between the two variables
17. 14-12-13
Pearson Correlation
If you don't plot your data, you can't tell whether a correlation
coefficient is a good summary of the relationship
The value of a correlation coefficient also depends on the range
of values for which observations are taken
Even if there is a linear relationship between two variables, you
won't detect it if you consider a small range of values of the
variables
For example, height may be a poor predictor of weight if you
restrict your range of heights to those over six feet
No extrapolation
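The effect of a restricted range can be sketched with synthetic data. Everything here is an assumption made for illustration: the x values, the deterministic "noise" pattern, and the cut-off of 90. The underlying linear relationship is the same in both calculations; only the range of x differs.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data: y follows x plus a repeating scatter between -10 and +10
x = list(range(100))
scatter = [2 * ((7 * i) % 11 - 5) for i in x]
y = [a + e for a, e in zip(x, scatter)]

r_full = pearson_r(x, y)

# Keep only the top of the range (x >= 90), like "heights over six feet"
top = [(a, b) for a, b in zip(x, y) if a >= 90]
r_restricted = pearson_r([a for a, _ in top], [b for _, b in top])

print(round(r_full, 2), round(r_restricted, 2))
```

Over the full range the coefficient is close to 1; within the narrow slice the scatter dominates and the coefficient falls close to 0.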
18. 14-12-13
Pearson Correlation
Limitations:
Linearity: Can’t describe non-linear relationships (most
biological relations)
Truncation of range: Strength of relationship is underestimated
if you can’t see the full range of x values
No proof of causation
Testing hypothesis
Pearson correlation coefficient describes the correlation between
the sample observations on two variables in the same way that ρ
describes the relationship in a population
Thus we need to know whether we may conclude that ρ ≠ 0
The hypotheses are:
H0: ρ = 0 (no correlation in the population)
Ha: ρ ≠ 0 (there is correlation in the population)
19. 14-12-13
Testing hypothesis
The test used is the t test (revise t test uses)
Statistically significant doesn’t mean clinically important or
useful
If you are examining many correlation coefficients, you have to use
the Bonferroni adjustment
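A sketch of the usual form of this test: t = r × sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom, compared against a t table. The function names and the example values (r = 0.5, n = 27, 10 correlations) are hypothetical. The Bonferroni adjustment is shown in its simplest form, dividing the significance level by the number of correlations tested.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level when n_tests correlations are examined."""
    return alpha / n_tests


# Hypothetical example values
r, n = 0.5, 27
t = t_statistic(r, n)
print("t =", round(t, 3), "with", n - 2, "degrees of freedom")
print("per-test alpha for 10 correlations:", bonferroni_alpha(0.05, 10))
```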
Coefficient of determination
The square of Pearson cc, r², is the proportion of variation in
the values of y that is explained by the regression model with x
Amount of variance accounted for in y by x
Percentage increase in accuracy you gain by using the
regression line to make predictions
0 ≤ r² ≤ 1 (100%)
The larger r², the stronger the linear relationship
The closer r² is to 1, the more confident we are in our
prediction
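A sketch that checks the "proportion of variation explained" reading on made-up numbers: the square of Pearson “r” is compared with 1 - SSE/SST from the least-squares line. The helper names `pearson_r` and `least_squares_line` are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
predicted = [intercept + slope * a for a in x]
mean_y = sum(y) / len(y)
sse = sum((b - p) ** 2 for b, p in zip(y, predicted))  # unexplained variation
sst = sum((b - mean_y) ** 2 for b in y)                # total variation in y

r_squared = pearson_r(x, y) ** 2
explained = 1 - sse / sst
print(round(r_squared, 3), round(explained, 3))  # both 0.6
```

Here the regression line accounts for 60% of the variation in y.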
20. 14-12-13
Coefficient of determination
Example
Topography of adipose tissue (AT) is associated with
metabolic complications considered as risk factors for
cardiovascular disease
It is therefore useful to measure the amount of intraabdominal AT as part
of the evaluation of the cardiovascular-disease risk of
an individual. Computed tomography (CT), the only
available technique that precisely and reliably
measures the amount of deep abdominal AT, is, however,
costly, exposes the subject to irradiation and is not
available to many physicians
21. 14-12-13
Example
Despres and his colleagues conducted a study to
develop equations to predict the amount of deep
abdominal AT from simple anthropometric
measurements
Among the measurements taken on each subject were
deep abdominal AT obtained by CT and waist
circumference. The question of interest is how well
deep abdominal AT correlates with waist
circumference
Spearman Correlation
It is a measure of the strength and direction of association that
exists between two variables measured on at least an ordinal
scale
It is denoted by the symbol rs (or R)
The test is used for either ordinal variables or for interval/ratio
data that has failed the assumptions necessary for conducting
the Pearson's product-moment correlation
The values of the variables are converted to ranks and then
correlated
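A minimal sketch of that two-step idea: rank each variable, then apply the Pearson formula to the ranks. The function names are illustrative, and giving tied values the average of their ranks is an assumption (a common convention that the slide does not spell out). The data are made up: y = x cubed, which is monotonic but not linear.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: monotonic but clearly non-linear
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]

print(round(pearson_r(x, y), 3))   # below 1: the points do not lie on a line
print(round(spearman_r(x, y), 3))  # 1.0: the ranks agree perfectly
```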
22. 14-12-13
Spearman Correlation
Assumptions:
1. Variables are measured on an ordinal, interval or ratio scale
2. Variables need NOT be normally distributed
3. There is a monotonic relationship (either the variables increase
in value together or as one variable value increases the other
variable value decreases) but linearity is not needed
4. This type of correlation is NOT very sensitive to outliers
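The last point can be illustrated on made-up data: nine points on a straight line plus one extreme value that keeps the same order. The helpers are illustrative sketches, and average ranks for ties are an assumption. Ranking shrinks the outlier to "the largest value", so the Spearman coefficient is unchanged while the Pearson coefficient drops.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Hypothetical data: y equals x except for one extreme last value
x = list(range(1, 11))
y = list(range(1, 10)) + [100]

pearson = pearson_r(x, y)
spearman = pearson_r(average_ranks(x), average_ranks(y))
print(round(pearson, 2), round(spearman, 2))
```

An outlier that breaks the ordering of the values would lower the Spearman coefficient as well, which is why the slide says "not very sensitive" rather than "insensitive".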
SPSS work