13
Correlation coefficient (r test)
CORRELATION
[Q:
Define correlation. (BSMMU, MD Radiology, January 2010,
July 2009)
Short note: Correlation & regression (BSMMU, MD
Radiology, January, 2009)]
In statistics, the word correlation refers to the relationship between
two variables. If a change in one variable is accompanied by a change in the
other variable, the variables are said to be correlated.
Sometimes two continuous characters are measured in the same
person, such as weight and cholesterol, or weight and height. At
other times, the same character is measured in two related groups,
such as tallness in parents and tallness in children, or
intelligence quotient (IQ) in brothers and in their corresponding sisters
(siblings). The relationship or association between two
quantitatively measured or continuous variables is called
correlation.
Remember, correlation does not imply causation.
The relationship between two random variables is known as a
bivariate relationship. The known variable (or variables) is called
the independent variable(s). The variable we are trying to predict
is the dependent variable.
Example: A medical researcher may be interested in the bivariate
relationship between a patient’s blood pressure x and heart rate y.
Here x is the independent variable and y is the dependent variable.
Types of correlation
[Q:
2. Biostatistics-126
Discuss different types of correlation with figures.
(BSMMU, MD Radiology, January, 2010)
Classify correlation with figures of each. (BSMMU, MD
Radiology, July, 2009)]
1. Positive correlation:
If the two variables move in the same direction,
the correlation is called positive correlation.
In positive correlation, the two variables react in the same
way, increasing or decreasing together.
Example:
a. Height and weight of a group of people are positively
correlated.
b. Temperatures in Celsius and Fahrenheit have a perfect positive
correlation.
In perfect positive correlation, the coefficient of correlation (r) =
+1, and in partial (less than perfect) positive correlation 0 < r < 1.
2. Negative correlation:
If the two variables move in opposite
directions, the correlation is called negative correlation.
In negative correlation, as one variable increases, the other
decreases.
Example: One variable might be the number of hunters in a
region and the other variable could be the deer population.
Perhaps as the number of hunters increases, the deer
population decreases. This is an example of a negative
correlation.
In perfect negative correlation, the coefficient of correlation (r)
= -1, and in partial (less than perfect) negative correlation -1 < r < 0.
3. Zero correlation:
If the movement of one variable does not affect the
movement of the other variable, the variables are not
correlated, and this is defined as zero correlation.
In zero correlation, the coefficient of correlation (r) = 0.
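The three cases above can be illustrated numerically. The following is a minimal sketch, not part of the original text: the function name `pearson_r` and all three data sets are made up for illustration, and r is computed from deviations about the means.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Illustrative data (assumed for this sketch, not taken from the text)
celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # rises with Celsius
falling = [50, 40, 30, 20, 10]                  # falls as Celsius rises
no_trend = [2, 1, 0, 1, 2]                      # symmetric, no linear trend

r_positive = pearson_r(celsius, fahrenheit)  # perfect positive correlation
r_negative = pearson_r(celsius, falling)     # perfect negative correlation
r_zero = pearson_r(celsius, no_trend)        # zero (linear) correlation
print(round(r_positive, 3), round(r_negative, 3), round(r_zero, 3))
```

The last data set is deliberately U-shaped: r is zero because there is no straight-line trend, even though the points follow a clear pattern.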
Correlation in brief
When the value of one variable is related to the value of
another, they are said to be correlated.
The coefficient of correlation (r) measures such a relationship.
The value of r ranges from -1 (perfectly correlated in the
negative direction) to +1 (perfectly correlated in the positive
direction).
When r = 0, the two variables are not correlated.
How can you tell if there is a correlation?
On a graph of the data, a person can tell if there is a correlation
by how closely the points resemble a line. If the points are scattered
about at random, there may be no correlation. If the points
closely fit a quadratic or exponential curve, etc., then they have
a nonlinear correlation.
How can you tell by inspection the type of correlation?
If the points on the graph follow a line with positive slope,
then there is a positive correlation (y increases as x increases). If
the slope of the line is negative, then there is a negative
correlation (as x increases y decreases).
Correlation coefficient
[Q:
Write short note on: Correlation coefficient. (BSMMU, MD
Radiology, January, 2010)]
An important aspect of correlation is how strong it is.
The extent or degree of relationship between two sets of figures is
measured in terms of a quantity called the correlation coefficient. It
is denoted by the letter ‘r’.
Another name for r is the Pearson product moment correlation
coefficient, in honor of Karl Pearson, who developed it about 1900.
When two variable characters in the same series or individuals are
measurable in quantitative units, such as height and weight,
temperature and pulse rate, age and vital capacity, circulating
proteins in grams and surface area in square meters, or systolic and
diastolic blood pressure in mm of Hg, it is often necessary and
possible to know not only whether there is any association or
relationship between them but also the degree or extent of
such relationship.
Correlation coefficient (r) test
It measures the relationship between two variables when one is
dependent on the other.
Formula

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$

or

$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}} $$

where $X = x - \bar{x}$ and $Y = y - \bar{y}$,
x = one variable,
y = the other variable.
d.f. = n - 2, where n is the number of (x, y) pairs.
Example
Problem: Find the correlation coefficient between the
following variables.
x variable: 5, 8, 12, 15.
y variable: 20, 25, 28, 30.
Solution:
The following table shows the relationship between the above
variables. Here X is the deviation of x from its mean and Y is the deviation of y from its mean.

x      y      X      Y       X²     Y²      XY
5      20     -5     -5.75   25     33.06   28.75
8      25     -2     -0.75   4      0.56    1.50
12     28     2      2.25    4      5.06    4.50
15     30     5      4.25    25     18.06   21.25
Mean of x = 10, mean of y = 25.75
Sum X² = 58, Sum Y² = 56.75, Sum XY = 56

$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \, \sum Y^2}} = \frac{56}{\sqrt{58 \times 56.75}} = \frac{56}{\sqrt{3291.5}} = \frac{56}{57.37} = 0.976 $$

d.f. = n - 2 = 4 - 2 = 2
r = 0.976 means strong correlation.
At 2 d.f., r exceeds the 5% critical value of 0.950, so p < 0.05.
The null hypothesis (no correlation) is rejected.
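As a check on the arithmetic, the worked example can be recomputed from the raw data. The sketch below is an illustration and not part of the original text: the variable names are made up here, and the t statistic is an added step that uses n - 2 degrees of freedom, the standard choice when testing Pearson's r.

```python
from math import sqrt

x = [5, 8, 12, 15]
y = [20, 25, 28, 30]
n = len(x)

mean_x = sum(x) / n                      # 10.0
mean_y = sum(y) / n                      # 25.75
X = [xi - mean_x for xi in x]            # deviations of x from its mean
Y = [yi - mean_y for yi in y]            # deviations of y from its mean

sum_X2 = sum(d * d for d in X)             # 58.0
sum_Y2 = sum(d * d for d in Y)             # 56.75
sum_XY = sum(a * b for a, b in zip(X, Y))  # 56.0

r = sum_XY / sqrt(sum_X2 * sum_Y2)

# t statistic for the null hypothesis of no correlation,
# assuming the standard n - 2 degrees of freedom
df = n - 2
t = r * sqrt(df) / sqrt(1 - r * r)

print(f"r = {r:.3f}, t = {t:.2f}, d.f. = {df}")
```

Recomputing from the raw data gives Sum XY = 56 and Sum Y² = 56.75, so r is about 0.976 and t is about 6.35 on 2 degrees of freedom. The tabulated two-tailed 5% value of t for 2 d.f. is 4.303, so the correlation is significant at p < 0.05.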
Strength of correlation:
Correlation coefficient    Degree of association
0.8 to 1                   Strong
0.5 to 0.79                Moderate
0.2 to 0.49                Weak
0 to 0.19                  Negligible
1. Strong positive correlation: when r = 0.80 to 0.99, i.e. r ≥ 0.8
2. Moderate positive correlation: when r = 0.70 to 0.79
3. Limited degree of correlation: when r = 0.50 to 0.69
4. No correlation or zero correlation: when r < 0.5
5. Negative correlation: when r < 0
N.B.: The extent of correlation varies between minus one and plus one,
i.e. -1 ≤ r ≤ +1.
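The four-band table above can be written as a small lookup. This is a sketch under stated assumptions: the function name `strength_of_correlation` is hypothetical, the bands are applied to the absolute value of r so that negative coefficients are graded too, and values falling between the tabulated bands (such as 0.795) are assigned to the lower band.

```python
def strength_of_correlation(r):
    """Grade |r| using the strong / moderate / weak / negligible bands."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # assumption: the sign gives direction, the size gives strength
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "negligible"

labels = [strength_of_correlation(r) for r in (0.959, -0.62, 0.3, 0.05)]
print(labels)  # ['strong', 'moderate', 'weak', 'negligible']
```

Such labels describe the size of r only; whether a given r is statistically significant also depends on the number of pairs.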
Problem for practice
During a laboratory experiment, muscular contractions of a frog
muscle were measured against different doses of a given drug. The
height of the curve was considered as the response to the drug.
The observations were as below.
Serial number of experiment   1      2      3      4      5
Dose of drug                  0.3    0.4    0.6    0.8    0.9
Response to drug              54.0   59.0   60.0   65.0   70.0
From the above data calculate the correlation coefficient and its
significance.
[Answer: r = 0.9633, p < 0.01]
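The stated answer can be checked with the raw-score form of the same coefficient, which avoids computing deviations first. This form is algebraically equivalent to the deviation formula but is not given in the text, so treat it as an added illustration; the variable names are made up here.

```python
from math import sqrt

dose = [0.3, 0.4, 0.6, 0.8, 0.9]
response = [54.0, 59.0, 60.0, 65.0, 70.0]
n = len(dose)

sum_x = sum(dose)
sum_y = sum(response)
sum_xy = sum(a * b for a, b in zip(dose, response))
sum_x2 = sum(a * a for a in dose)
sum_y2 = sum(b * b for b in response)

# Raw-score formula:
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))
```

With 5 pairs the standard test has n - 2 = 3 degrees of freedom, for which the tabulated 1% critical value of r is 0.959. This agrees with the stated p < 0.01.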
[Q:
Calculate Pearson's correlation coefficient between the X
and Y variables given below:
X = 5, 7, 10, 12 Y = 4, 6, 9, 11
(BSMMU, MD Radiology, January, 2010)
Find out the correlation coefficient between the following
2 variables. (BSMMU, MD Radiology, January, 2009)
Variable-I (X-variable): 10, 15, 20, 25 (n=4)
Variable-II (Y-variable): 30, 35, 40, 45 (n=4)
The length & weight of 7 mice are given below.
Compute 'r' and test for its significance.
Length = 2, 5, 8, 12, 14, 19, 22.
Weight = 1, 4, 3, 4, 8, 9, 8
(BSMMU, MD Radiology, January, 2010)]
What Is Rank Correlation?
Consider a situation where the data do not contain precise
sample values, so that precise measurement is unattainable. In
this situation, the data may be ranked (as in the GPA system, where
different ranges of marks are ranked as different grades) in order of
size, importance, etc., using the numbers 1, 2,...,n. Statistics based on such ranks
are called rank-order statistics, and correlations computed from them are called rank correlations. Rank correlation
is used when the data are not presented as precise sample values.
What is the coefficient of rank correlation?
Suppose the data values x and y are each arranged in
order of size and replaced by their ranks. The correlation coefficient can then be computed
from these ranks. This
coefficient of rank correlation is denoted by r_rank, or briefly r, and
is calculated by the equation

$$ r_{rank} = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

where
d = difference between the ranks of corresponding values of x and y
n = number of pairs of values (x, y) in the data
The above equation is called SPEARMAN'S FORMULA FOR
RANK CORRELATION.
Example: A group of 5 Army officers participated in
competitions in both SWIMMING and RUNNING. The following
table shows their ranks according to their
achievements in both tests. The table also shows the
difference between the ranks (d) and the square of each difference (d²).
OFFICER   RUNNING (x)   SWIMMING (y)   d = x - y   d²
Selim     5             3              2           4
Habib     2             1              1           1
Ismail    4             5              -1          1
Tauhid    1             2              -1          1
Mesbah    3             4              -1          1
From the above table we have Sum d² = 8 and n = 5, so

$$ r_{rank} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.6 $$
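The calculation for the five officers can be sketched in a few lines. The function name `spearman_from_ranks` is hypothetical, and the formula used, 1 - 6 Sum d² / (n(n² - 1)), is Spearman's formula for ranks without ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Ranks in table order: Selim, Habib, Ismail, Tauhid, Mesbah
running = [5, 2, 4, 1, 3]
swimming = [3, 1, 5, 2, 4]

sum_d2 = sum((a - b) ** 2 for a, b in zip(running, swimming))  # 8
r_rank = spearman_from_ranks(running, swimming)
print(sum_d2, round(r_rank, 3))  # 8 0.6
```

A value of 0.6 indicates a moderate positive agreement between the running and swimming ranks.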
Spearman's rank correlation is a technique used to test
the direction and strength of the relationship between two
variables. In other words, it is a device to show whether
one set of numbers is related to another set of
numbers.
It uses the statistic r_rank (Rs), which falls between -1 and +1.
If the r_rank (Rs) value does not differ significantly from 0, the null hypothesis (no association) is accepted.
Otherwise, it is rejected.
The rank correlation method can be used when
1. The values of the variables are available in rank order form.
2. The data are qualitative in nature and can be ranked in some
order.
3. The data were originally quantitative in nature but because of
smallness of sample size were converted into ranks.
What are the types of correlation coefficient? Discuss with
figure. (BSMMU, MD Radiology, January, 2009)
Pearson’s correlation coefficient or Spearman's rank
correlation coefficient
When associated variables are normally distributed, such as height
and weight, Pearson’s correlation coefficient is used. When two
variables are correlated but not normally distributed, Spearman's
formula for rank correlation coefficient is used.
When calculating a correlation coefficient for ordinal data, select
Spearman's technique. For interval or ratio-type data, use
Pearson's technique.
REGRESSION
In experimental sciences, after having understood the correlation
between two variables, there are situations when it is necessary to
estimate or predict the value of one character (variable, say Y) from
the knowledge of the other character (variable, say X), such as to
estimate height when weight is known. This is possible when the
two are linearly correlated. The former variable (Y, i.e., height) to be
estimated is called the dependent variable and the latter (X, i.e., weight),
which is known, is called the independent variable. This is done by
finding another constant called the regression coefficient (b).
People use regression on an intuitive level every day. In business, a
well-dressed man is thought to be financially successful. A mother
knows that more sugar in her children's diet results in higher
energy levels. The ease of waking up in the morning often
depends on how late you went to bed the night before.
Quantitative regression adds precision by developing a
mathematical formula that can be used for predictive purposes.
For example, a medical researcher might want to use body weight
(independent variable) to predict the most appropriate dose for a
new drug (dependent variable).
Regression means change in the measurements of a variable
character, on the positive or negative side, beyond the mean.
Regression coefficient is a measure of the change in the
dependent character (Y) with one unit change in the independent
character (X). It is denoted by the letter ‘b’, which indicates the
change in one variable (Y) from its mean for one unit of
move, deviation or change in another variable (X) from its
mean when both are correlated. This helps to calculate or
predict any expected value of Y, i.e., Yc, corresponding to X. When the
corresponding values Yc1, Yc2, ..., Ycn are plotted on a graph, a
straight line called the regression line or the mean correlation line
(Y on X) is obtained. The same was referred to as an imaginary line
while explaining various types of correlation.
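The text does not give a formula for b, so the sketch below assumes the standard least-squares estimate, b = Sum XY / Sum X², where X and Y are deviations from the means, with the line passing through the point of means. The data are the x and y values from the correlation example; the names `a`, `b` and `predict` are made up for illustration.

```python
x = [5, 8, 12, 15]    # independent variable
y = [20, 25, 28, 30]  # dependent variable
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Assumed least-squares slope: b = Sum XY / Sum X^2 (deviations from the means)
sum_XY = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_X2 = sum((xi - mean_x) ** 2 for xi in x)
b = sum_XY / sum_X2

# Intercept chosen so that the line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

def predict(x_value):
    """Expected value Yc on the regression line of Y on X."""
    return a + b * x_value

fitted = [predict(xi) for xi in x]  # Yc1 ... Ycn, which lie on a straight line
print(f"b = {b:.3f}, a = {a:.3f}")
```

Here b is 56/58, about 0.966, so Y is expected to rise by roughly 0.97 units for each unit rise in X. The fitted values fall on one straight line, which is the regression line of Y on X.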
The regression technique is primarily used to
1. Estimate the relationship that exists, on the average, between
the dependent variable and the explanatory (independent)
variable.
2. Determine the effect of each of the explanatory variables on
the dependent variable, controlling for the effects of all the other
explanatory variables.
3. Predict the value of the dependent variable for a given value
of the explanatory variable.
Types
Three types of regression models are fundamental to
epidemiological research:
1. linear regression
2. logistic regression
3. Cox proportional hazards regression, a type of survival
analysis.
Linear regression: the dependent variable is a continuous
measure (such as body weight) whose frequency distribution
is the normal distribution, and the independent variables may
be both continuous and categorical.
Logistic regression: the dependent variable is derived from the
presence or absence of a characteristic.
Cox proportional hazards: the dependent variable represents the
time from a baseline of some type to the occurrence of an event of
interest.
[Reference: Bonita R, Beaglehole R, Kjellström T 2006. Basic
epidemiology, 2nd edition, WHO.]
Difference between correlation and regression analysis
There are two important points of differences between correlation
and regression analysis.
1. Whereas correlation coefficient is a measure of degree of
relationship between x and y, the objective of regression
analysis is to study the nature of relationship between the
variables.
2. The cause and effect relation is more clearly indicated through
regression analysis than by correlation. Correlation is merely a
tool for ascertaining the degree of relationship between two
variables and, therefore, we cannot say that one variable is the
cause and the other the effect.
Scatter diagram
The graphical representation of bivariate data is called a scatter
diagram. The graph obtained by plotting the values of the
variables x and y along the x-axis and y-axis respectively in the x-y
plane gives the scatter diagram.
From the scatter diagram it can be readily ascertained whether
there is any correlation between the variables or not. If
correlation exists, the type of correlation can also be ascertained.
Utilities of scatter diagram
1. It is a simple and non-mathematical method of studying
correlation between the variables. As such, it can be easily
understood.
2. It is not influenced by the size of the extreme values whereas
most of the mathematical methods of finding correlation are
influenced by extreme values.
3. Making a scatter diagram usually is the first step in
investigating the relationship between the variables.