Introduction
Correlation, the concept of a relationship or dependence
between variables, transcends statisti-
cal analysis. Cloudy days are related to (correlated with) cooler
temperatures. Natural disasters
are related to declines in the stock market. An impending test is
related to the need to study, and
grinding noises in the engine compartment of a car are usually
related to repair bills.
Some relationships are stronger than others, so statistical
procedures have been developed
to quantify, or numerically gauge, the strength of the
relationship between two variables. The
numerical indicators are called correlation coefficients, and one
of the most common is the
Pearson correlation coefficient, which indicates the strength of
the relationship between
interval- or ratio-scale variables. The name Pearson refers to
Karl Pearson, whose impact not
just on studying correlation but on statistical analysis generally
may be greater than that of
any other individual.
In the early years of the 20th century, Pearson founded the first
department of statistical analy-
sis at University College London. Under Pearson’s direction,
the department attracted, among
others, William Sealy Gosset of t test fame; Ronald Fisher, who
produced analysis of variance;
and Charles Spearman, for whom an alternative correlation
coefficient is named, as well as an
elegant statistical procedure based on correlation called factor
analysis. To put it succinctly, it is
difficult to overstate the impact that Pearson had on the
evolution of statistical analysis.
A man of fierce independence, Pearson centered his education at Cambridge
on religion and philosophy rather than mathematics. As a student of religion, he
sued the university over the
compulsory chapel attendance required of all undergraduates.
Winning his suit brought a
change to university rules—after which Pearson chose to attend
chapel. His graduate work (in
Germany) emphasized literature, and it is a testimony to his
extraordinary breadth of talent
that his greatest contributions would be in statistical analysis.
Pearson was a contemporary
of Einstein, who sought a grand theory that would unite all of
physics. Pearson tried to do the
same with mathematics. That both men were disappointed in
these efforts should not detract
from what they did accomplish. Although Pearson’s associations
with his colleagues were not
always harmonious, he and the others who found an academic
home in his department virtu-
ally defined modern quantitative analysis. Whether or not they
realize it, almost all of those
who crunch numbers for any length of time rely on their work.
8.1 The Hypothesis of Association
Previous chapters concentrated on tests of significant
difference. The z test, the t tests, analy-
sis of variance, and the repeated-measures designs test the
differences between groups. They
all fall under a general assumption referred to as the hypothesis
of difference. But some
kinds of analyses do not involve questions about whether there
are significant differences
between groups.
(The faulty reasoning: water is common to each experience, so water must be
the cause.)
A classic study demonstrates, among other things, a correlation
between the sale of ice cream
by vendors on city streets and burglaries in the same city.
Someone rushing to judgment
about cause might wish to curb ice cream sales or check the
criminal records of ice cream
vendors to reduce the number of burglaries. Such an individual
does not recognize that hot-
ter weather—and the open windows that result—probably drive
both ice cream sales and
burglaries. It is not unusual for some third variable to explain
an association between a first
and a second. Although correlation values provide some
evidence for causation, correlation
alone is rarely sufficient to demonstrate cause.
Scatterplots
Breaking down the word correlation—co-relation—makes its
meaning clear: the variables
are related. The evidence for the relationship is that the
characteristics co-vary. As the level of
one variable changes, the other changes as well because both
variables contain some of the
same information. The higher the correlation, the more common
information they contain.
A researcher gathers verbal ability and intelligence scores for
12 subjects and presents them
in Table 8.1. Note that the first participant has a verbal ability
score of 20 and an intelligence
score of 80. Scanning the two rows of data, we can see that as
the values of one score increase,
so do those of the other. In other words, there appears to be a positive
relationship between the two variables.

Intelligence: 80 95 90 100 100 100 110 115 120 115 110 125
Figure 8.1: The relationship between verbal ability and
intelligence
In the Figure 8.1 scatterplot, intelligence scores are plotted
along the vertical, or y, axis and
the verbal ability scores are plotted along the horizontal, or x,
axis. Each diamond-shaped
point in the graph, then, represents an intelligence score and a
verbal ability score.
The plot verifies what our cursory view of the two rows of data
in the table suggested: A posi-
tive correlation exists between measures of intelligence and
those of verbal ability. The gen-
eral trend is from lower left to upper right. As the value of one
variable increases, the value of
the other tends to do likewise. The incline is not dramatic, but
the graph shows a general rise
in the data points.
Less-than-Perfect Relationships
The relationship certainly is not perfect. The fourth, fifth, and
sixth participants all have the
same level of intelligence but different levels of verbal ability.
The same is true of participants
8 and 10, as well as participants 7 and 11. Still, there is a
general lower-left to upper-right
relationship, which might be expected. Brighter people often
have more complex language
patterns, something suggested by higher verbal-ability scores.
It also is not surprising that the relationship between
intelligence and verbal ability is less
than perfect. An extensive vocabulary alone is no guarantee of
9. Although perfect correlations are rare when dealing with
people, that is not necessarily the
case elsewhere. Mathematicians, for example, enjoy the
stability of perfect relationships; the
formula for the area of a circle, A = πr² (where the area is
found by multiplying the value of pi
by the square of the radius), works for circles of any size
because a perfect relationship exists
between a circle’s radius and its area.
Still, even imperfect correlations, such as those related to
human-subjects research, can be
very important. If health professionals know a correlation, even
a weak one, exists between
exposure to secondhand smoke and the later development of
respiratory problems, they can
warn against such exposure. In that particular instance, by the
way, the research supports the
causal assumption. If educators know there is a correlation
between how much homework
students do and their success on a high school exit exam,
educators can encourage students to
complete more assignments. The instructors expect that pass
rates will rise as a consequence.
In the case of homework and exit exam scores, however, a
causal relationship is not as clear.
Perhaps people who have a higher level of academic
achievement do more homework and
have higher exit exam scores. That suggests the academic
achievement is the causal element
rather than the homework. Maybe the increased homework is the
manifestation of that other
variable, academic achievement, or perhaps parental involvement is the
causal factor.

Correlations can also be negative. For example, involvement with video-gaming
while a text passage is read to subjects is probably associated with lower
retention of the details of the text passage; as the value of one increases,
the value of the other declines. A correlation of 0 indicates no
relationship—fluctuations in the value of one variable are unrelated to
changes in the value of the other. Values greater than 0 but less than 1.0 in
absolute value indicate imperfect relationships, with the strength of the
relationship declining as the value approaches 0.
Correlating two variables does not require that they both
measure the same characteristic
or even that they both be gathered from the same subjects.
Often, entirely different kinds
of things are correlated. The example of secondhand smoke and
respiratory issues involves
two completely different variables, but the strength of the
relationship between them can be
calculated nevertheless. As long as the two variables can be
quantified—reduced to a num-
ber—the strength of any relationship can be determined.
Requirements for the Pearson Correlation
Researchers may employ any of several different correlation
procedures. The appropriate
procedure for a particular problem is determined by
characteristics such as the scale and
normality of the data involved. The Pearson correlation, for
example, requires variables of
either interval or ratio scale. Nominal or ordinal scale data can
be correlated as well, but they
involve other correlation procedures. In addition to interval or
ratio data, the Pearson corre-
lation also requires the following:
• In their populations, the characteristics are assumed to be normally
distributed. Normality cannot be fully reflected in relatively small samples,
but researchers must have reason to believe that the samples come from
populations that are normal.
• The distributions from which the samples come must be similarly distributed.
• The two samples are assumed to be randomly selected from their populations.
• The relationship between the variables must be linear; it remains constant
throughout their ranges.
Recall that normality is indicated when the standard deviation is
about one-sixth of the range,
the measures of central tendency all have about the same value,
and so on (Chapter 2). The
way data are distributed in the scatterplot also suggests the
normality of the two variables
involved in a correlation. When both variables are normal, the
points in the plot will be dis-
tributed from left to right, with the frequency of the points
gradually increasing toward the
middle of the graph and then gradually decreasing to the right
extreme. If the relationship is
positive (example A in Figure 8.2), the scatter is generally from
lower left to upper right. If
it is negative (example B in Figure 8.2), the graph follows a
pattern from upper left to lower
right. If the variables have no correlation (example C in Figure 8.2), the
points fall into a circular pattern. The greater frequency in the middle of
the circle reflects the fact that most of the data in any normal distribution
occur near the middle of the distribution. (The pattern in our example does
not look circular because so few data are present.)
The similar-distribution requirement does
not mean that the standard deviations should
be the same. That is not likely to happen
unless both variables are measured along the
same range. It means that the standard devia-
tions should account for similar proportions
of their respective ranges.
The strength of a correlation is affected by
range attenuation. When the range of scores
in either variable is artificially abbreviated,
the correlation value will be artificially low.
Range attenuation can be indicated by a standard deviation that is
substantially smaller
than we know it to be in the population. If
we were correlating intelligence scores with
reading comprehension, and the intelligence
scores have a standard deviation of 8 points
when we know that the population standard
deviation is 15 points, we can expect any resulting correlation
value to be artificially low. One
of the advantages of random selection is that random samples of
a reasonable size tend to
mirror their populations reasonably well. Range restriction
problems are much less likely to
occur with randomly selected samples.
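The attenuating effect of a restricted range can be sketched numerically. The data below are simulated, not from the chapter; the variable names and the 0.6 slope are assumptions chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical illustration of range attenuation: correlate two related
# variables, then artificially restrict the range of x and re-correlate.
rng = np.random.default_rng(0)
x = rng.normal(100, 15, 500)             # e.g., intelligence-like scores
y = 0.6 * x + rng.normal(0, 12, 500)     # a related second variable

r_full = np.corrcoef(x, y)[0, 1]

# Keep only the middle of the x range, as if sampling a narrow group
mask = (x > 90) & (x < 110)
r_restricted = np.corrcoef(x[mask], y[mask])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))
# The restricted-range correlation is noticeably weaker than the full-range one.
```

The restriction does not change the underlying relationship; it only narrows the spread of x, which mechanically shrinks the computed coefficient.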
Linear and Nonlinear Correlations
When the relationship between two variables is linear, it means
that the degree to which they
change in concert with each other is the same throughout their
ranges; if it is low and posi-
tive, it is low and positive at low levels of both variables and at
higher levels of both variables.
Some correlations, however, are not linear. Consider the
correlation between anxiety and the
quality of a musician’s performance. In that instance, a little
anxiety is probably a good thing.
It prompts the individual to prepare for the performance by
practicing, studying the music
carefully, asking others for feedback, and so on. Without
anxiety, the musician might not make
the necessary preparations. It seems likely that, at least in the
early going, the quality of the
performance improves as anxiety increases.
But it is possible that if anxiety continues to increase, the
individual’s performance may reach
a plateau and then begin to diminish. The musician may become
so anxious that concentra-
tion is difficult and performance declines, with more anxiety
actually depreciating the quality
of the music. These conditions describe a relationship that is
curvilinear. It is illustrated in
Table 8.2, where anxiety is gauged as a function of someone’s increasing
pulse rate in beats per minute.

Figure 8.2: Scatterplots for positive, negative, and zero correlations
(A: a positive correlation; B: a negative correlation; C: a zero correlation)

The first six pairs of scores suggest a different relationship between
anxiety and performance than the last six pairs of scores. The first part of
the distribution makes the relationship look linear and positive. The latter
part of the data makes
the relationship look linear but
negative. An accurate picture of the relationship requires data
throughout the entire ranges
of the two variables.
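A small numeric sketch makes the point. The inverted-U data below are hypothetical (not Table 8.2's actual values); performance rises with anxiety, peaks, then falls, so the full-range Pearson r is near zero even though the relationship is strong.

```python
import numpy as np

# Hypothetical inverted-U data: performance peaks at a moderate anxiety level.
anxiety = np.arange(1, 10)               # anxiety levels 1..9
performance = 25 - (anxiety - 5) ** 2    # peak at anxiety = 5

r_full = np.corrcoef(anxiety, performance)[0, 1]
r_rising = np.corrcoef(anxiety[:5], performance[:5])[0, 1]    # first half
r_falling = np.corrcoef(anxiety[4:], performance[4:])[0, 1]   # second half

# Full range: r near 0; rising half: strongly positive; falling half:
# strongly negative. A linear statistic misses the curved relationship.
print(round(r_rising, 3), round(r_falling, 3))
```

This is why the linearity requirement matters: a near-zero Pearson r can mask a strong but curvilinear association.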
Understanding Correlation Values
It is important not to confuse the sign of the correlation (+ or −) with its
strength. A correlation of −0.50 contains the same amount of information
about the two variables as does a correlation of +0.50. The sign makes a
great deal of difference in how the relationship is interpreted, but it has
nothing to do with the strength of the
relationship. With positive correla-
tions, as the value of one variable increases so does the value of
the other. When correlations
are negative, increasing values of one variable are associated
with decreasing values of the
other.
Earlier we noted that different scales of data require different
types of correlation proce-
dures. The number of variables involved also dictates the need
for different correlation
procedures:
• Bivariate correlations indicate the relationship between
two variables. For exam-
ple, the correlation between intelligence and verbal aptitude is a
bivariate correla-
tion. This chapter focuses on bivariate correlations.
• Multiple correlation gauges the relationship between one
variable and a combina-
tion of others. For example, the correlation between a combined
reading compre-
hension and vocabulary measure with an analytical-ability
measure would indicate
how well reading comprehension and vocabulary ability,
combined, correlate with
analytical ability.
• Canonical correlation measures the relationship between
two groups of variables.
For example, determining how a combination of reading
comprehension and vocab-
ulary ability and a combination of analytical ability and
problem-solving ability
relate calls for a canonical correlation.
• Partial correlation measures the relationship between two
variables after neu-
tralizing the influence of some third variable on both of the first
two. For example,
a correlation of analytical ability with problem-solving ability,
with the influence of
age controlled in both of the other variables, eliminates age
differences as a factor
in the resulting correlation. In effect, a partial correlation would
be the correlation
of analytical ability with problem-solving ability as if all
subjects were the same
age.
• Semipartial correlation gauges the relationship between
two variables after neu-
tralizing the influence of a third on either of the first two. For
example, a correlation
of intelligence with verbal aptitude, with the influence of age controlled in
only one of the two variables, would be a semipartial correlation.
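A partial correlation can be computed directly from the three pairwise correlations. The formula below is the standard one (it is not shown in this excerpt), and the numeric values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with a third variable z held constant in both.

    Standard formula: (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical values: analytical ability vs. problem solving correlate 0.6,
# and each correlates 0.5 with age (the variable being controlled).
print(round(partial_r(0.6, 0.5, 0.5), 4))  # → 0.4667
```

Note how controlling for the shared variable shrinks the coefficient: the 0.6 pairwise correlation drops once age's contribution to both variables is removed.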
Formula 8.1

rxy = ∑[(zx)(zy)] / (n − 1)
Note that the r symbol has x and y subscripts. These indicate
that the procedure correlates
two variables designated x and y. Which variable is assigned x
and which y is unimportant,
since correlation does not presume that the x variable causes y,
for example. Formula 8.1
indicates that if the x and y scores are transformed into z scores
(Formula 3.1: z = (x − M)/s), the value of rxy (the correlation value) is the
sum of the products of each participant's zx and zy scores, divided by one
less than the number of participants in the data group (rather than the
number of scores).
The n − 1 signifies that this is a correlation formula for sample,
rather than population, data.
It is the same adjustment for sample data made with the
standard deviation calculation in
Chapter 1. Formula 8.1 can be used to calculate the correlation
value of the verbal ability
and intelligence scores from the earlier example. Calculating
the equivalent verbal-ability and
intelligence z values with Formula 3.1 produces the z values for
the original raw scores listed
in Table 8.3.
Here, each pair of z scores is multiplied and the products
summed:
(−1.991 × −1.902) + (−1.212 × −0.761) + . . . + (1.385 × 1.522)

Dividing the sum by n − 1 = 11 yields rxy = 0.938.
8.2 Calculating the Pearson Correlation
With a maximum possible correlation value of 1.0, rxy = 0.938
indicates a strong relationship
between verbal ability and intelligence, something that is
reflected in the fact that many intel-
ligence tests include subtests of verbal ability.
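Formula 8.1 translates directly into code. The sketch below uses tiny hypothetical scores (the chapter's verbal-ability column is not reproduced here), and the sample standard deviation, matching the n − 1 convention in the formula.

```python
from statistics import mean, stdev

def pearson_r_via_z(xs, ys):
    """Formula 8.1: convert each score to a z score (sample sd), multiply
    the paired z scores, sum the products, and divide by n - 1."""
    n = len(xs)
    mx, sx = mean(xs), stdev(xs)   # stdev divides by n - 1, as sample data require
    my, sy = mean(ys), stdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# A tiny check with perfectly related (hypothetical) scores:
print(pearson_r_via_z([1, 2, 3], [2, 4, 6]))  # → 1.0
```

Perfectly proportional scores produce identical z values in both lists, so the sum of products equals n − 1 and the coefficient is exactly 1.0.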
Although Formula 8.1 is visually simple, the need to transform
everything into z scores before
calculating rxy makes the calculations very time consuming and
tedious. Completing the cal-
culations by hand takes too much time. Formula 8.2, the
formula we will use, turns out to be
the formula programmed into many hand-calculators. It is
visually more complex but much
easier to execute:
Formula 8.2
rxy = [n∑xy − (∑x)(∑y)] / √{[n∑x² − (∑x)²][n∑y² − (∑y)²]}
where
x = one of the scores in each pair, as in the z-score formula above.
y = the other score in the pair.
n = the number of participants (the number of pairs of scores).
∑xy indicates that each pair of scores is multiplied and then the products
for all pairs are summed. The resulting value is the “sum of the
cross-products.”
∑x² indicates that each x score is squared, and then the squares are summed.
(∑x)² indicates that the original x scores are totaled, and then the total
is squared.
∑y² indicates that each y score is squared, and then the squares are summed.
(∑y)² indicates that the original y scores are totaled, and then the total
is squared.
The formula is not as daunting as it appears. The
process will become familiar after a few problems.
Probably Excel or a hand-calculator with a built-in
correlation function will perform most of the statis-
tical “heavy-lifting,” but it is helpful to prepare for
that occasional time when there is no computer and
the calculator has no correlation function.
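For that occasional time, Formula 8.2 can be sketched as a short function. The sums mirror the definitions above; the test data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Formula 8.2, the raw-score computing formula for the Pearson r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of cross-products
    sum_x2 = sum(x * x for x in xs)               # each x squared, then summed
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return num / den

# Perfectly proportional (hypothetical) pairs give the maximum value:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

Unlike Formula 8.1, no intermediate z scores are needed; the five sums are enough, which is why calculators program this version.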
A Correlation Example
A researcher is duplicating a classic experiment by
psychologist E. L. Thorndike. The experiment relates
to Thorndike’s Law of Effect, which maintains that
behaviors followed by a satisfying state of affairs will
likely be repeated. In the experiment, the researcher
sets up a cage equipped with a door that opens if a
cat placed in the cage bats a string suspended inside it.
rxy = [10(137.25) − (55)(33.5)] / √{[10(385) − (55)²][10(141) − (33.5)²]}
    = (1372.5 − 1842.5) / √[(3850 − 3025)(1410 − 1122.25)]
    = −470 / √(825 × 287.75)
    = −0.965
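As a cross-check, the summary sums above can be plugged into Formula 8.2 in a few lines. The value 137.25 for ∑xy is taken from the intermediate step 1372.5 = 10 × 137.25 shown in the computation.

```python
from math import sqrt

# Summary values from the cat-and-string example (n = 10 pairs)
n, sum_x, sum_y = 10, 55, 33.5
sum_xy, sum_x2, sum_y2 = 137.25, 385, 141

num = n * sum_xy - sum_x * sum_y                            # 1372.5 - 1842.5
den = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = num / den
print(round(r, 3))  # → -0.965
```

The negative sign falls out of the numerator: the cross-product sum is smaller than (∑x)(∑y)/n would require, so higher values of one variable pair with lower values of the other.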
Interpreting Results
The relationship is indeed negative, and because the maximum correlation is
±1.0, the relationship is also very strong. Neither of those conclusions
indicates whether the result is statistically significant, however. As with
z, t, and F, significance is determined by comparing the calculated value to
the table value indicated by the relevant degrees of freedom and the selected
level of probability. A calculated correlation whose absolute value is at
least as large as the table value is one that probably did not occur by
chance. For the Pearson correlation, the values are in Table 8.5 (see also
Table B.5 in Appendix B).
Like the t and F values, the correct critical value for r is
determined by degrees of freedom
and by the level of probability the researcher selects. The
degrees of freedom for a Pearson
correlation are the number of pairs of data, minus 2. Be careful to use the
number of pairs, not the total number of scores, when determining the degrees
of freedom.
Researchers most commonly settle on p = 0.05 or 0.01. The p = 0.1 level
occurs in statistical tables less often because in most research settings, a
one-in-ten chance of a random correlation is too great. No one wants to
conclude that a correlation is statistically significant when there is too
much chance that the finding will not hold up under further investigation. In
exploratory or descriptive research, when there is little prior research on
which to rely, however, investigators sometimes relax the probability to
p = 0.1.
Table 8.5: The critical values of rxy
(lowest statistically significant correlation for the specified probability)

Number of xy pairs (n)   df (n − 2)   p = 0.10   p = 0.05   p = 0.01
3                        1            0.988      0.997      1.000
4                        2            0.900      0.950      0.990
5                        3            0.805      0.878      0.959
6                        4            0.729      0.811      0.917
7                        5            0.669      0.754      0.875
8                        6            0.621      0.707      0.834
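The lookup-and-compare step can be sketched as a small helper. The p = 0.05 critical values below are transcribed from Table 8.5 and from values quoted in the chapter's worked examples (df = 8, 13, and 14); the function name is an assumption.

```python
# Critical values of r at p = 0.05, keyed by df (number of pairs minus 2).
# Values from Table 8.5 plus examples quoted later in the chapter.
CRITICAL_R_05 = {1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754,
                 6: 0.707, 8: 0.632, 13: 0.514, 14: 0.497}

def significant_at_05(r: float, n_pairs: int) -> bool:
    """Reject H0 when |r| reaches the tabled critical value for df = n - 2."""
    return abs(r) >= CRITICAL_R_05[n_pairs - 2]

# The cat-study result: r = -0.965 with 10 pairs (df = 8, critical r = 0.632)
print(significant_at_05(-0.965, 10))  # → True
```

Taking the absolute value first matters: a strongly negative correlation such as −0.965 is just as significant as an equally strong positive one.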
The Statistical Hypotheses
The null and alternate hypotheses for correlation reflect the fact
that we have moved away
from the hypothesis of difference. The null hypothesis is that no
relationship between the
variables exists. Symbolically, it is written H0: ρ = 0. The symbol ρ is the
Greek letter rho (as in “row” your boat) and the equivalent of r. So the null
hypothesis states that the correlation (r) equals 0. More specifically, it
means that there is no statistically significant relationship. The alternate
hypothesis states that the correlation does not equal 0, that a statistically
significant relationship will emerge each time data are collected and the
relationship calculated: HA: ρ ≠ 0.
The Coefficient of Determination
One of our important recurring themes is the distinction
between statistical significance and
practical importance. Determining practical importance was the
reason for omega-squared
and eta-squared calculations for significant t test and ANOVA
results, respectively.
Effect sizes take on particular importance with correlation
because with large samples, rela-
tively small correlations can be statistically significant. The
effect size corresponding to the
Pearson correlation is the coefficient of determination (rxy²). As the
notation suggests, the coefficient of determination is the square of the
correlation coefficient. Squaring the correlation indicates how much of the
variance in y is explained by x (or vice versa, since correlation does not
establish which variable influences which).
When the variables describe the behavior of people, small
coefficients of determination do
not surprise us because they are part of human subjects’
complexity. Very few individual vari-
ables will explain large proportions of human behavior.
Sometimes, however, even low correlations and low rxy² values are important.
If research revealed that the correlation between the age of first exposure
to illegal narcotics and the development of an addiction was rxy = −0.3, that
value (note the negative correlation) indicates that the younger subjects are
at first exposure, the more likely they are to develop an addiction. The
resulting rxy² value would be just 0.09. But even
if just 9% of the variance
in addiction is explained by age at first exposure, within the
context of human complexity that
would be considered important. Practical importance is a
function of consequences.
Comparing Correlation Values
In isolation, correlation coefficients can be difficult to interpret
because correlation strength
does not increase or decrease in consistent increments. The change from
rxy = 0.2 to rxy = 0.3 is a less dramatic increase in strength than the
increase from rxy = 0.75 to rxy = 0.85, for example. Although the Pearson r
requires equal-interval data, in the resulting coefficients an increase in
correlation strength of 0.1 reflects a very different change from 0.8 to 0.9
than it does from 0.2 to 0.3. It takes a much stronger increase in the
relationship to increase by 0.1 in the upper ranges of correlation values
than in the lower ranges.
Problem 8.1 suggests some of the hazard in rushing
to judgment about cause from correlation data. While
we might be tempted to reduce the problem to “older
people contribute more to charity than younger peo-
ple,” other factors are probably at work, not the least
of which is that age likely correlates with income as
well. Perhaps it is not age that explains contribution
amount so much as income. The correlation value,
while instructive and important, indicates only how variables
co-vary, not necessarily why
the variables involved vary.
8.3 Correlating Data When One Variable Is Dichotomous
If the consultant had asked how the donation amount and the
donor’s gender relate, Pear-
son still provides the answer, but the procedure becomes a
point-biserial correlation. The
word point refers to the continuous variable, the amount of
money donated in this exam-
ple. The word biserial refers to the other variable, which has
only two levels. The required
change is coding the gender variable in a way that reflects its
dichotomy: as either 0 or
1. Which group (females or males) is coded 0 and which is coded 1 does not
affect the strength of the coefficient.
The point-biserial correlation has a number of applications.
Questions about the relation-
ship between marital status and income, between public versus
private school students and
achievement, or between Republicans’ and Democrats’
optimism are all questions that could
rxy = [15(710) − (8)(1,195)] / √{[15(8) − (8)²][15(133,425) − (1,195)²]}
    = 0.19
Still testing at p = 0.05 and with the degrees of freedom still df = 13, from
Table 8.5 the critical value is still rxy 0.05(13) = 0.514. Therefore the
statistical decision will be to fail to reject H0.
The relationship between the donor’s gender and the amount
contributed is not statistically
significant. The rxy = 0.19 result is probably a random
correlation that is unlikely to reach the
critical value from the table in any new analysis with new
subjects.
The interpretation of the point-biserial correlation is the same
as it is for conventional Pear-
son correlations, except that sign of the coefficient is a function
only of which variable is
coded 1. If male donors had been coded with 1s, the correlation
would have been negative,
rxy = −0.19. Consider a few more applications for the point-
biserial correlation:
• What is the relationship between whether or not a parent
earned a college degree
and the child’s grades?
• How is whether or not a student is a native speaker of
English related to the
student’s test score?
A psychologist examines the relationship between risk-taking and success in
solving novel problems. Having devised the Inventory Risk
Survey Catalog (the I-RiSC), the
psychologist gauges the willingness of a group of 16-year-olds
to do the unconventional and
then provides a series of word problems with which the
participants are unfamiliar. Scores on
the I-RiSC and the problems for 10 participants are listed in
Table 8.9.
Table 8.9: Risk-taking and problem-solving success data
I-RiSC: 2 7 4 5 1 8 7 9 3 6
Problems: 14 17 14 16 12 17 16 17 15 15
To complete the problem in Excel, it is best to set up the data in
two columns. Two rows also
will work, but parallel columns are visually simpler.
1. Create a label in cell A1 for “I-RiSC” and in cell B1
“ProbSolv” so that
the I-RiSC data appear in cells A2 to A11
and the ProbSolv data appear in B2 to B11.
2. From the ribbon at the top of the page, click the Data tab, and then Data
Analysis at the far right.
3. Select Correlation, which is the second option in the window.
4. In the Input Range window enter A2:B11, which indicates the
cells where the data
are found. Note that the default groups the data in columns.
(Change the default if the data are grouped in rows.)

           Column 1    Column 2
Column 1   1           0.904203
Column 2   0.904203    1
The result of the analysis is a Pearson correlation of rxy = 0.904. The 1s in
the diagonal indicate that each variable correlates perfectly with itself
(rxy = 1.0), of course. Note that the output does not indicate whether the
calculated value is statistically significant, which makes a check of the
critical values table necessary. Table 8.5 indicates that
rxy 0.05(8) = 0.632. The relationship between risk-taking and problem solving
is statistically significant. Were these data not contrived, it would be
quite important to know that about 82% (rxy² = 0.904² = 0.818) of the
variance in problem-solving success is explained by whatever the I-RiSC
measures, ostensibly the subject's willingness to be unconventional.
Apply It!
Investigating the Correlation
between Crime and Unemployment
A law enforcement analyst is interested in any link
between crime and unemployment as a guide to allocat-
ing crime-prevention funds. Specifically, she would like
to know whether murders and property crimes correlate
with the unemployment rate.
The analyst obtains the murder and property-crime rates
for her state for the 16 years from 1990 to 2005 from
the FBI Uniform Crime Reports (rates are per 100,000
inhabitants). She then consults the Bureau of Labor Sta-
tistics for the unemployment rate in the state for the
Year   Murder rate   Property-crime rate   Unemployment rate
1993   6.4           4662                  6.9
1994   6.2           4678                  6.1
1995   5.7           4460                  5.6
1996   5.8           4438                  5.4
1997   5.4           4279                  4.9
1998   6.1           4040                  4.5
1999   5.5           3852                  4.2
2000   5.1           3592                  4.0
2001   4.9           3456                  4.7
2002   4.3           3412                  5.8
2003   4.2           3289                  6.0
2004   4.7           3168                  5.5
2005   5.0           3081                  5.1
The Excel results indicate the following:
• The correlation between murder rate and unemployment is rxy = 0.386.
• Comparing the murder rate/unemployment rate correlation to the critical
value from Table 8.5 (rxy 0.05(14) = 0.497) indicates that the calculated
correlation is not statistically significant at p = 0.05.
• The analyst fails to reject the null hypothesis, ρ = 0.
• The property-crimes rate and unemployment correlation is rxy = 0.551.
• Comparing the calculated value to the critical value from Table 8.5 (the
same rxy 0.05(14) = 0.497, since df are unchanged) indicates that this
correlation is statistically significant at p = 0.05.
8.5 Spearman’s Rho
The Pearson correlation requires that both variables must be at
least interval scale. The point-
biserial correlation requires that one variable must be at least
interval scale, and the other
must be a variable with only two levels.
Neither of these correlations is helpful when the data are
ordinal scale, which describes much
of the data that psychologists and other social scientists
encounter. Nearly everyone who goes
to the mall or answers the telephone has been asked to take a
survey, particularly if it hap-
pens to be an election year. Survey data are usually ordinal
scale. It is common for the ques-
tionnaires to have a Likert-type format, where a statement is
read and the respondents are
asked the degree to which they agree with the statement by
selecting from a range of choices
such as:
• Strongly agree
• Agree
• Neither agree nor disagree
• Disagree
• Strongly disagree
Although surveyors commonly code the responses (strongly agree = 1,
agree = 2, and so on) and then calculate means and standard deviations for
all respondents, those statistics
respondents, those statistics
assume that the data are at least interval scale. Survey data
rarely are. The Likert types of
responses are essentially rankings. A response of “strongly
agree” is more positive than
“agree” but precisely how much more is not clear. Besides, one
respondent’s “disagree” may
be another respondent’s “strongly disagree.” These data are
more safely treated as ordinal
scale responses.
Correlating Ordinal or Mixed Ordinal/Interval Data
In addition to survey data, ordinal scale characterizes other
common data, such as class rank-
ings and percentile scores. Sometimes the variables
investigators might wish to correlate
have mixed scales. For example, a researcher wants to correlate
subjects’ income (ratio scale
data) with their optimism (usually gauged with a Likert-type
survey and so ordinal scale).
In addition, income is often not
normally distributed. The
ordinal scale of one variable and the lack of normality in the
other rule out a Pearson’s
correlation.
Charles Spearman, Pearson’s colleague at University College
London, developed a tremendously
flexible correlation procedure. It accommodates two
variables, provided they fit any of the following:
• Both are ordinal scale.
• One variable is ordinal scale and one is interval or ratio
scale.
• Two variables are interval or ratio scale, but one or both
fail to meet the Pearson
correlation requirement for normality.
Following are the steps for calculating Spearman’s rho:
1. Rank the scores for both variables separately.
2. For each pair of rankings, subtract the second ranking in the
pair from the first to
produce a difference score, d.
3. Square each of the d values for d².
4. Sum the d² values for ∑d².
5. Solve for ρ.
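The five steps can be sketched in Python. This is a minimal illustration assuming the scores for both variables have already been ranked (the function name and toy data are my own, not from the text):

```python
# A sketch of Spearman's rho from pre-ranked data (no ties in this toy case).
def spearman_rho(ranks_x, ranks_y):
    # Steps 2-4: difference score d for each pair, squared, then summed.
    d = [rx - ry for rx, ry in zip(ranks_x, ranks_y)]
    sum_d2 = sum(di ** 2 for di in d)
    # Step 5: rho = 1 - (6 * sum of d^2) / (n * (n^2 - 1)).
    n = len(ranks_x)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Perfectly agreeing rankings yield rho = 1.0; fully reversed yield -1.0.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The formula rewards agreement between the two rankings: every disagreement inflates ∑d², pulling rho down from 1.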
Ranking Tied Scores
The ranking procedure must follow rules. If some of the scores
for one of the variables occur more than once, all of the tied
scores must receive the same ranking. If someone were
ranking the following values,
for example:
3, 5, 6, 6, 7, 8, 8, 8, 9, 10
ranking the values from smallest to largest produces the
following values:
1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.
The smallest value, 3, was ranked “1,” the 5 was ranked “2,”
and so on. The two 6s and the
three 8s were handled as follows:
• Because the two 6s occupy rankings 3 and 4, those two rankings
are added and divided by
the number of them (2), which results in 3.5
([3 + 4] ÷ 2). After both 6s are ranked
3.5 (for places 3 and 4), the next value in the data set, 7, is
ranked 5.
• Likewise, the three 8s occupy rankings 6, 7, and 8, which average
to 7 ([6 + 7 + 8] ÷ 3), so each 8 is ranked 7; the 9 and the 10
then take rankings 9 and 10.
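The tie-handling rule can be sketched as a small Python function (the function name is my own) that reproduces the worked example above:

```python
# Tied scores share the mean of the rank positions they occupy.
def rank_with_ties(scores):
    ordered = sorted(scores)
    rank_of = {}
    i = 0
    while i < len(ordered):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The run occupies rank positions i+1 through j; assign their mean.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[s] for s in scores]

print(rank_with_ties([3, 5, 6, 6, 7, 8, 8, 8, 9, 10]))
# [1.0, 2.0, 3.5, 3.5, 5.0, 7.0, 7.0, 7.0, 9.0, 10.0]
```

The output matches the hand-ranked values in the text: the two 6s share 3.5 and the three 8s share 7.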
[Problem 8.1 data table, partially shown; final pairs: 8/37, 8/40, 9/42, 10/39]
Calculations for a Spearman’s rho solution, based on the
information in Problem 8.1, give

ρ = 1 − 6∑d² / [n(n² − 1)] = 1 − 6(24.5) / [10(10² − 1)] = 0.852
Table 8.13 lists the critical values for Spearman’s rho (Table
B.6 in Appendix B). There are no
degrees of freedom for this procedure. The correct critical value
for rho is indicated by the
number of data pairs. Note that for p = 0.05 and 10 pairs,
ρ.05(10) = 0.648. The relationship
between emotional stability and age among service personnel
assigned to combat zones is
statistically significant; therefore, we reject H0.
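As a quick check, the values from the text (∑d² = 24.5, n = 10, critical value 0.648) can be plugged into the formula:

```python
# Verifying the Problem 8.1 solution: sum of d^2 = 24.5 across n = 10 pairs.
sum_d2 = 24.5
n = 10
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(round(rho, 3))          # 0.852
print(rho > 0.648)            # True: exceeds the critical value, so reject H0
```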
Try It!: #5
Spearman’s rho requires data of what scale?
Apply It!
Exploring the Correlation between
Job Satisfaction and Commute Times
As part of the justification for allowing workers
to work at home part-time, the human resources
director for a large firm intends to investigate
any correlation between job satisfaction and
average commute time for employees. The
director asks ten randomly selected employees
to fill out a job-satisfaction questionnaire with
the following responses to a series of questions:
Response Score
• very satisfied (vs) 1
• somewhat satisfied (ss) 2
• somewhat dissatisfied (sd) 3
• very dissatisfied (vd) 4
The employees were also asked to indicate their average one-
way commute time in minutes.
Recognizing that job satisfaction responses will be ordinal
scale, the HR director opts for
Spearman’s rho. The data and the difference scores are shown in
Table 8.14.
Table 8.14: Spearman’s rho data for the correlation between job
satisfaction and commute time
[Table 8.14 data not shown]

When interval or ratio data are ranked, the actual distance between any two data points
is lost. When the ages of the service personnel were ranked,
• the 25-year-old was 1,
• the 26-year-old was 2,
• and the 32-year-old was 3.
Once ranked, the fact that from the first to the second ranking is
a one-year difference and
from the second to the third ranking is a six-year difference is
lost. Pearson’s r retains those differences.
For n = 10, the Spearman’s rho formula is

ρ = 1 − 6∑d² / [n(n² − 1)] = 1 − 6(31) / [10(10² − 1)] = 0.812
For p = 0.05 and 10 pairs of data, the critical value is
rs0.05(10) = 0.648. The relationship between
job satisfaction and average commute time is statistically
significant. Those who commute the
least time have the highest levels of job satisfaction. Perhaps
the attitudes of those who have
the lowest levels of job satisfaction—those who have the
longest commutes—will improve if they are allowed to work at home part of the time.
Pairs   Pearson*   Spearman
6       0.811      0.886
10      0.632      0.648
*for Pearson, df = number of pairs − 2
In the examples above, the value required for significance with
a Spearman correlation is
higher than that required for a Pearson correlation.
Another limitation of the Spearman correlation is that we cannot
square the Spearman value
to determine the proportion of variance in y explained by x.
Spearman’s rho has no equivalent
of rxy². When the data do not meet the Pearson requirements,
however, the researcher has no
choice. When the data do meet the requirements, a Pearson’s r
is usually preferable to Spear-
man’s rho.
Correlation in Research
Correlation procedures answer enough of the questions that
interest researchers and con-
sumers of research that the procedures pervade research
literature. Arroyo (2015) exam-
ined the correlation between work engagement and internal self-
concept. Arroyo found that
people tend to engage in the work they do to earn a living, not
for the external rewards, but
for the work’s own sake; their work is intrinsically satisfying.
Ceci and Kumar (2015), meanwhile, asked whether happiness
correlates with creative capac-
ity. They found no significant correlation but did find a
significant correlation between cre-
ative capacity and intrinsic motivation, suggesting that those
other decreases. The sign of the coefficient, however, is
unrelated to its strength
(Objective 2).
The differences among the correlation procedures in this
chapter are in the kinds
of variables they accommodate. The Pearson correlation
requires interval or ratio
variables that are normally and similarly distributed (Objective
3). A special applica-
tion of Pearson, the point-biserial correlation, requires an
interval/ratio variable and a
second variable that has only two manifestations, or a
dichotomously scored variable
(Objective 5). Spearman’s rho accommodates any combination
of ordinal, interval, or
ratio variables (Objective 6). Because the data used in a Pearson
correlation contain
more information than the rankings that make up the data for
Spearman’s approach,
the Pearson value provides more information about the nature
of the relationship
between the variables. This is evident in the fact that the
Pearson value can be squared
to produce the coefficient of determination. The rxy² value
indicates the proportion of
one variable that can be explained by changes in the other
(Objective 4). Spearman
values have no equivalent of this statistic.
When two variables share information, they are correlated. The
amount of one explained
by the other is what that rxy² value, the coefficient of
determination, indicates. This con-
cept provides a foundation for regression, which is the focus of
Chapter 9. Regression
allows what is known about the relationship between x and y to be used to predict the
value of y from a value of x.
It involves calculations and thinking with which you are already
familiar, so work the
end-of-chapter problems, reread any of the sections in Chapter
8, and prepare for
Chapter 9.
bivariate correlations Include all proce-
dures that test for significant relationships
between two variables.
canonical correlation Measures the rela-
tionship between two groups of variables.
coefficient of determination Indicates the
proportion of one variable in a Pearson cor-
relation that can be explained by the other.
correlation matrix A box in which the vari-
ables involved are listed in rows as well as
in columns, and each variable is correlated
with all variables, including itself.
hypothesis of association The umbrella
term for significance tests that analyze the
correlation between or among variables.
hypothesis of difference The umbrella
term for significance tests that analyze the
differences between groups.
linear Describes a relationship between
two variables whose strength is consistent
throughout their ranges. With curvilinear
relationships, the strength and sometimes
the direction of the relationship change across the variables’ ranges.
range attenuation Occurs when a variable
is not measured throughout its entire range.
Attenuated range artificially reduces the
strength of any resulting correlation value.
scatterplot A graph representing two vari-
ables, one on the horizontal axis, the other
on the vertical axis. Each point in the graph
indicates the measure of both variables for
one individual.
semi-partial correlation Gauges the rela-
tionship between two variables, controlling
for a third in just one of the first two.
Spearman’s rho A correlation procedure
for two ordinal variables; one ordinal and
one interval/ratio variable; or two interval
or ratio variables that fail to meet the Pearson
correlation requirement for normality.
Review Questions
Answers to the odd-numbered questions are provided in
Appendix A.
1. What values indicate the strongest and weakest values for a
Pearson’s r?
2. What is the equivalent in a Pearson correlation for η²?
3. What are the requirements for calculating Pearson’s r?
4. What is “range attenuation,” and how does it affect
relationship between
those two variables?
b. What is the resulting coefficient?
c. How much of the variability in arrest records can be explained by what time the
juvenile goes to bed?
Juvenile Retire Arrest
1 9.0 No
2 9.5 No
3 11.0 Yes
4 11.5 Yes
5 10.0 Yes
6 9.75 No
7 10.0 No
8 10.25 Yes
9. A group of consumers has just taken two surveys on (a) their
attitude about
the economy and (b) their attitude about those in government. In
both, higher
scores mean more optimism. The data are ordinal scale. Are the
two attitudes
related?
Consumer Economy Government
the test.
Student Minutes (x) Score (y)
1 15 57
2 80 84
3 0 60
4 75 92
5 30 65
6 10 60
7 22 75
8 15 68
a. Is the relationship statistically significant?
b. How much of the variance in test scores can be explained by
differences in the
amount of time spent reading?
11. A district psychologist is working with developmentally
disabled students in a
special education setting and is curious about the relationship
between students’
persistence on puzzle tasks (measured in the number of minutes
they remain on
task) and their number of absences from class.
Student Persist Absent
b. Is the relationship statistically significant?
Employee Sales Blood pressure
1 1 150
2 4 140
3 3 140
4 6 110
5 2 140
6 4 130
7 0 160
8 3 110
9 5 120
10 7 160
13. An industrial psychologist is determining the relationship
between workers’ willing-
ness to embrace new manufacturing procedures, gauged with a
dogmatism scale
(higher scores indicate greater dogmatism), and their level of
job satisfaction (higher
scores indicate greater satisfaction). The satisfaction data are at
least ordinal scale.
a. What is the relationship?
b. What is the null hypothesis?
c. Do you reject or fail to reject the null hypothesis?