05/04/14 Dr Tarek Amin 1
Investigating the Relationship
between Two or More Variables
Professor Tarek Tawfik Amin
Public Health, Faculty of Medicine
The Relationship Between Variables
Variables can be categorized into two types, dependent and independent, when investigating a relationship:
A dependent variable is explained or affected
by an independent variable (e.g., height depends on age).
Two variables are independent of each other if the pattern of
variation in the scores for one variable is not
related to or associated with variation in the scores
for the other variable, e.g., the level of education in Ecuador and the infant
mortality in Mali.
Techniques used to Analyze the Relationship between Two Variables
I- Tabular and graphical methods:
These present data in a way that reveals a
possible relationship between two variables.
II- Statistical methods:
Mathematical operations used to quantify,
in a single number, the strength of a
relationship (measures of association).
When both variables are measured at least
at the ordinal level, they also indicate the
direction of the relationship.
Bivariate table for categorical data
Scatter plot for interval/ratio data.
Lambda, Cramér's V (nominal)
Gamma, Somers' d, Kendall's tau-b/c
(ordinal with few values)
Spearman's rank-order correlation coefficient
(ordinal scales with many values)
Pearson's product moment correlation (interval/ratio)
These techniques are collectively called
bivariate descriptive statistics.
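Spearman's rank-order coefficient listed above is Pearson's r computed on ranks instead of raw scores. The sketch below assumes tied values receive the mean of their rank positions (a common convention); the function names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and the data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of deviation cross-products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Monotonic but not linear: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3))  # 0.943
print(spearman_rho(x, y))         # 1.0
```

Because ranking discards the spacing between scores, rho reaches 1.0 for any strictly increasing relationship, while Pearson's r stays below 1.0 here because the relationship is not a straight line.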
o Correlational techniques are used to study the relationship between two variables.
o They may be used in exploratory studies in
which one intends to determine whether a relationship exists,
o and in hypothesis testing about a particular relationship.
Correlation techniques are used to determine the direction
and the strength
of association between two variables.
Pearson Correlation (Numeric, interval/ratio)
The Pearson product moment correlation coefficient (r or rho)
is the usual method by which the relation between two
variables is quantified.
Type of data required:
Interval/ratio, sometimes ordinal data.
At least two measures on each subject, taken at the same time.
The sample must be representative of the population.
The variables that are being correlated must be normally distributed.
The relationship between the variables must be LINEAR.
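The linearity requirement matters because Pearson's r measures only straight-line association. In the minimal sketch below (made-up data, hypothetical function name `pearson_r`), y is completely determined by x, yet r is 0 because the relationship is U-shaped rather than linear.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation sums (equivalent to the z-score formula)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
u_shaped = [v ** 2 for v in x]        # perfectly predictable, but curved
straight = [3 * v + 1 for v in x]     # perfectly predictable and linear

print(pearson_r(x, u_shaped))  # 0.0
print(pearson_r(x, straight))  # 1.0
```

Plotting the data on a scatter plot before computing r is the usual way to catch a curved pattern like this one.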
Directions of Correlation on a Scatter Plot
Relationships Measured with Correlation Coefficient
The correlation coefficient is the mean of the cross-products
of the z-scores:
$$ r = \frac{\sum z_X z_Y}{n} $$
$z_X$ = the z-score of variable X
$z_Y$ = the z-score of variable Y
$n$ = number of observations (pairs)
Because the means and standard deviations
of any two variables generally differ,
we cannot directly compare their raw scores.
However, we can transform them from the
ordinary absolute figures to z-scores, which have a
mean of 0 and an SD of 1.
The correlation is the mean of the cross-
products of the z-scores for each pair of values
included, a measure of how much each pair
of observations (scores) varies together.
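The z-score definition of r above can be written out directly. This is a minimal sketch, assuming z-scores are computed with the population SD (dividing by n), which is what makes the formula with n in the denominator give the correct r; if the sample SD (dividing by n - 1) is used instead, the sum of cross-products must also be divided by n - 1. The function names are hypothetical and the data are invented.

```python
import math


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("a constant variable has no z-scores")
    return [(v - mean) / sd for v in values]


def pearson_r_z(x, y):
    """r = (sum of zX * zY) / n, the mean cross-product of paired z-scores."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired observations of equal length")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r_z(x, y), 3))  # 0.8
```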
Correlation Coefficient (r)
The correlation coefficient r allows us to
state mathematically the relationship that
exists between two variables. The correlation
coefficient may range from +1.00 through 0.00 to -1.00.
A +1.00 indicates a perfect positive relationship,
0.00 indicates no linear relationship,
and -1.00 indicates a perfect negative relationship.
I- Strength of the Correlation Coefficient
How large should r be for it to be useful?
For decision making, r should be at least 0.95, whereas for studies of
human behavior an r of 0.5 is considered fair.
The strength of r (judged by its absolute value) is as follows:
0.00-0.25 Little, if any
0.26-0.49 Low
0.50-0.69 Moderate
0.70-0.89 High
0.90-1.00 Very high
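The bands above can be applied mechanically to |r|, with the sign of r giving the direction. A minimal sketch, assuming that a value falling between two printed bands (for example 0.255) is assigned to the lower band; the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Return (strength, direction) for a correlation coefficient r.

    Strength applies the bands above to |r|; values between two printed
    bands (e.g. 0.255) fall into the lower band (an assumption).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)
    if size < 0.26:
        strength = "little if any"
    elif size < 0.50:
        strength = "low"
    elif size < 0.70:
        strength = "moderate"
    elif size < 0.90:
        strength = "high"
    else:
        strength = "very high"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction


print(describe_correlation(-0.166))  # ('little if any', 'negative')
print(describe_correlation(0.75))    # ('high', 'positive')
```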
II- Significance of the Correlation
The level of statistical significance is greatly
affected by the sample size n.
If r is based on a sample of 1,000, there is a much
greater likelihood that it represents the r of the
population than if it were based on 10 subjects.
With large sample sizes, even r values described as
demonstrating "little if any" relationship may be statistically significant.
Statistical significance implies that r is unlikely to
have occurred by chance alone; that is, the population
correlation is likely to differ from zero.
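The sample-size effect can be seen in the usual test of whether the population correlation is zero, which converts r to a t statistic with n - 2 degrees of freedom: t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below (hypothetical function name `t_for_r`) computes only t; the p-value or critical value would come from a t table or a statistics package.

```python
import math


def t_for_r(r, n):
    """t statistic for H0: population correlation = 0 (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("|r| must be below 1 for a finite t")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# The same weak r = 0.10 with a small and a large sample.
for n in (10, 1000):
    print(n, round(t_for_r(0.10, n), 2))
# 10 0.28
# 1000 3.18
```

For a two-tailed test at the 0.05 level, the critical t is about 2.306 with 8 df and about 1.96 with 998 df, so r = 0.10 is not significant with 10 subjects but is significant with 1,000.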
III- Direction of Correlation
- The correlation coefficient also tells us the type
of relationship that exists, that is, whether it is
positive or negative.
- The relationship between job satisfaction and job
turnover has been shown to be negative; an
inverse relationship exists between them.
When one variable increases, the other decreases.
- Those with higher grades have higher retention rates
(a positive relationship).
Increases in the score of one variable are accompanied by
increases in the other.
Relationships Measured by Correlation
When using the formula with z-scores, r is the
average of the cross-products of the z-scores:
$$ r = \frac{\sum z_X z_Y}{n} $$
Five subjects took a quiz (X), on which the scores ranged from
6 to 10, and an examination (Y).
Calculate r and determine the pattern of correlation.
Formula for calculating the correlation coefficient r:
$$ r = \frac{\sum z_X z_Y}{n} $$
A perfect positive relationship between two variables.
Table columns: Subject | X (quiz) | Y (examination) | zX | zY | zX × zY
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 5.00
$$ r = \frac{\sum z_X z_Y}{n} = \frac{5.00}{5} = +1.00 $$
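The worked example can be reproduced step by step. The individual scores did not survive in this text, so the values below are hypothetical: they are chosen only to match the stated summary figures (X: mean 8, SD 1.41; Y: mean 90, SD 5.66) and the perfect positive pattern. As in the formula, z-scores use the population SD (dividing by n).

```python
import math

# HYPOTHETICAL scores, chosen to match the stated means and SDs.
quiz = [6, 7, 8, 9, 10]        # X
exam = [82, 86, 90, 94, 98]    # Y


def z_scores(values):
    """Return (mean, population SD, z-scores) for a list of scores."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sd, [(v - mean) / sd for v in values]


mx, sdx, zx = z_scores(quiz)
my, sdy, zy = z_scores(exam)
products = [a * b for a, b in zip(zx, zy)]   # zX * zY for each subject
r = sum(products) / len(products)            # mean cross-product

for subject, row in enumerate(zip(quiz, exam, zx, zy, products), start=1):
    print(subject, *(round(v, 2) for v in row))
print(f"mean X={mx:.0f}, SD={sdx:.2f}; mean Y={my:.0f}, SD={sdy:.2f}")
print(f"sum zX*zY={sum(products):.2f}, r={r:+.2f}")
```

Each subject's zX equals zY, so every cross-product is a square and they sum to n, which is why r comes out at +1.00.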
The following table is SPSS output describing the correlations among age, education in years,
smoking history, satisfaction with current weight, and overall state of health for a randomly selected sample.
[Table: SPSS correlation matrix for age, education in years, smoking history, satisfaction with current weight, and overall state of health]
* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
Figure (1): Insulin resistance (HOMA-IR) in relation to serum ferritin level among cases and controls.
Figure (2): 1,25 (OH) vitamin D in relation to body mass index among obese and lean controls (r = -0.166, P = 0.036).