Introduction
Correlation, the concept of a relationship or dependence
between variables, transcends statisti-
cal analysis. Cloudy days are related to (correlated with) cooler
temperatures. Natural disasters
are related to declines in the stock market. An impending test is
related to the need to study, and
grinding noises in the engine compartment of a car are usually
related to repair bills.
Some relationships are stronger than others, so statistical
procedures have been developed
to quantify, or numerically gauge, the strength of the
relationship between two variables. The
numerical indicators are called correlation coefficients, and one
of the most common is the
Pearson correlation coefficient, which indicates the strength of
the relationship between
interval- or ratio-scale variables. The name Pearson refers to
Karl Pearson, whose impact not
just on studying correlation but on statistical analysis generally
may be greater than that of
any other individual.
In the early years of the 20th century, Pearson founded the first
department of statistical analy-
sis at University College London. Under Pearson’s direction,
the department attracted, among
others, William Sealy Gosset of t test fame; Ronald Fisher, who
produced analysis of variance;
and Charles Spearman, for whom an alternative correlation
coefficient is named, as well as an
elegant statistical procedure based on correlation called factor
analysis. To put it succinctly, it is
difficult to overstate the impact that Pearson had on the
evolution of statistical analysis.
A man of fierce independence, Pearson centered his education at Cambridge
on religion and philosophy rather than mathematics. As a student of religion, he
sued the university over the
compulsory chapel attendance required of all undergraduates.
Winning his suit brought a
change to university rules—after which Pearson chose to attend
chapel. His graduate work (in
Germany) emphasized literature, and it is a testimony to his
extraordinary breadth of talent
that his greatest contributions would be in statistical analysis.
Pearson was a contemporary
of Einstein, who sought a grand theory that would unite all of
physics. Pearson tried to do the
same with mathematics. That both men were disappointed in
these efforts should not detract
from what they did accomplish. Although Pearson’s associations
with his colleagues were not
always harmonious, he and the others who found an academic
home in his department virtu-
ally defined modern quantitative analysis. Whether or not they
realize it, almost all of those
who crunch numbers for any length of time rely on their work.
8.1 The Hypothesis of Association
Previous chapters concentrated on tests of significant
difference. The z test, the t tests, analy-
sis of variance, and the repeated-measures designs test the
differences between groups. They
all fall under a general assumption referred to as the hypothesis
of difference. But some
kinds of analyses do not involve questions about whether there
are significant differences
between groups.
(The faulty reasoning: water is common to each experience, so water must be
the cause.)
A classic study demonstrates, among other things, a correlation
between the sale of ice cream
by vendors on city streets and burglaries in the same city.
Someone rushing to judgment
about cause might wish to curb ice cream sales or check the
criminal records of ice cream
vendors to reduce the number of burglaries. Such an individual
does not recognize that hot-
ter weather—and the open windows that result—probably drive
both ice cream sales and
burglaries. It is not unusual for some third variable to explain
an association between a first
and a second. Although correlation values provide some
evidence for causation, correlation
alone is rarely sufficient to demonstrate cause.
Scatterplots
Breaking down the word correlation—co-relation—makes its
meaning clear: the variables
are related. The evidence for the relationship is that the
characteristics co-vary. As the level of
one variable changes, the other changes as well because both
variables contain some of the
same information. The higher the correlation, the more common
information they contain.
A researcher gathers verbal ability and intelligence scores for
12 subjects and presents them
in Table 8.1. Note that the first participant has a verbal ability
score of 20 and an intelligence
score of 80. Scanning the two rows of data, we can see that as
the values of one score increase,
so do those of the other. In other words, there appears to be a positive
relationship between the two variables.

Intelligence: 80 95 90 100 100 100 110 115 120 115 110 125
Figure 8.1: The relationship between verbal ability and
intelligence
In the Figure 8.1 scatterplot, intelligence scores are plotted
along the vertical, or y, axis and
the verbal ability scores are plotted along the horizontal, or x,
axis. Each diamond-shaped
point in the graph, then, represents an intelligence score and a
verbal ability score.
The plot verifies what our cursory view of the two rows of data
in the table suggested: A posi-
tive correlation exists between measures of intelligence and
those of verbal ability. The gen-
eral trend is from lower left to upper right. As the value of one
variable increases, the value of
the other tends to do likewise. The incline is not dramatic, but
the graph shows a general rise
in the data points.
Less-than-Perfect Relationships
The relationship certainly is not perfect. The fourth, fifth, and
sixth participants all have the
same level of intelligence but different levels of verbal ability.
The same is true of participants
8 and 10, as well as participants 7 and 11. Still, there is a
general lower-left to upper-right
relationship, which might be expected. Brighter people often
have more complex language
patterns, something suggested by higher verbal-ability scores.
It also is not surprising that the relationship between
intelligence and verbal ability is less
than perfect. An extensive vocabulary alone is no guarantee of
9. Although perfect correlations are rare when dealing with
people, that is not necessarily the
case elsewhere. Mathematicians, for example, enjoy the
stability of perfect relationships; the
formula for the area of a circle, A = πr² (where the area is
found by multiplying the value of pi
by the square of the radius), works for circles of any size
because a perfect relationship exists
between a circle’s radius and its area.
Still, even imperfect correlations, such as those related to
human-subjects research, can be
very important. If health professionals know a correlation, even
a weak one, exists between
exposure to secondhand smoke and the later development of
respiratory problems, they can
warn against such exposure. In that particular instance, by the
way, the research supports the
causal assumption. If educators know there is a correlation
between how much homework
students do and their success on a high school exit exam,
educators can encourage students to
complete more assignments. The instructors expect that pass
rates will rise as a consequence.
In the case of homework and exit exam scores, however, a
causal relationship is not as clear.
Perhaps people who have a higher level of academic
achievement do more homework and
have higher exit exam scores. That suggests the academic
achievement is the causal element
rather than the homework. Maybe the increased homework is the
manifestation of that other
variable, academic achievement, or perhaps parental involvement is the
causal factor.

Correlations can also be negative. For example, involvement with video-gaming
while a text passage is read to subjects is probably associated with lower
retention of the details of the text passage; as the value of one increases,
the value of the other declines. A correlation of 0 indicates no
relationship—fluctuations in the value of one variable are unrelated to
changes in the value of the other. Values greater than 0 but less than 1.0 in
absolute value indicate imperfect relationships, with the strength of the
relationship declining as the value approaches 0.
Correlating two variables does not require that they both
measure the same characteristic
or even that they both be gathered from the same subjects.
Often, entirely different kinds
of things are correlated. The example of secondhand smoke and
respiratory issues involves
two completely different variables, but the strength of the
relationship between them can be
calculated nevertheless. As long as the two variables can be
quantified—reduced to a num-
ber—the strength of any relationship can be determined.
Requirements for the Pearson Correlation
Researchers may employ any of several different correlation
procedures. The appropriate
procedure for a particular problem is determined by
characteristics such as the scale and
normality of the data involved. The Pearson correlation, for
example, requires variables of
either interval or ratio scale. Nominal or ordinal scale data can
be correlated as well, but they
involve other correlation procedures. In addition to interval or
ratio data, the Pearson corre-
lation also requires the following:
• In their populations, the characteristics are assumed to be normally
distributed. Normality cannot be fully reflected in relatively small samples,
but researchers must have reason to believe that the samples come from
populations that are normal.
• The distributions from which the samples come must be similarly distributed.
• The two samples are assumed to be randomly selected from their populations.
• The relationship between the variables must be linear; it remains constant
throughout their ranges.
Recall that normality is indicated when the standard deviation is
about one-sixth of the range,
the measures of central tendency all have about the same value,
and so on (Chapter 2). The
way data are distributed in the scatterplot also suggests the
normality of the two variables
involved in a correlation. When both variables are normal, the
points in the plot will be dis-
tributed from left to right, with the frequency of the points
gradually increasing toward the
middle of the graph and then gradually decreasing to the right
extreme. If the relationship is
positive (example A in Figure 8.2), the scatter is generally from
lower left to upper right. If
it is negative (example B in Figure 8.2), the graph follows a
pattern from upper left to lower
right. If the variables have no correlation (example C in Figure 8.2), the
points fall into a circular pattern. The greater frequency in the middle of
the circle reflects the fact that most of the data in any normal distribution
occur near the middle of the distribution. (The pattern in our example does
not look circular because so few data are present.)
The similar-distribution requirement does
not mean that the standard deviations should
be the same. That is not likely to happen
unless both variables are measured along the
same range. It means that the standard devia-
tions should account for similar proportions
of their respective ranges.
The strength of a correlation is affected by
range attenuation. When the range of scores
in either variable is artificially abbreviated,
the correlation value will be artificially low.
Range attenuation can be indicated by a standard deviation that is
substantially smaller
than we know it to be in the population. If
we were correlating intelligence scores with
reading comprehension, and the intelligence
scores have a standard deviation of 8 points
when we know that the population standard
deviation is 15 points, we can expect any resulting correlation
value to be artificially low. One
of the advantages of random selection is that random samples of
a reasonable size tend to
mirror their populations reasonably well. Range restriction
problems are much less likely to
occur with randomly selected samples.
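The attenuating effect of a restricted range can be sketched numerically. The data below are simulated, not from the chapter; the variable names and the 0.6 slope are assumptions chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical illustration of range attenuation: correlate two related
# variables, then artificially restrict the range of x and re-correlate.
rng = np.random.default_rng(0)
x = rng.normal(100, 15, 500)             # e.g., intelligence-like scores
y = 0.6 * x + rng.normal(0, 12, 500)     # a related second variable

r_full = np.corrcoef(x, y)[0, 1]

# Keep only the middle of the x range, as if sampling a narrow group
mask = (x > 90) & (x < 110)
r_restricted = np.corrcoef(x[mask], y[mask])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))
# The restricted-range correlation is noticeably weaker than the full-range one.
```

The restriction does not change the underlying relationship; it only narrows the spread of x, which mechanically shrinks the computed coefficient.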
Linear and Nonlinear Correlations
When the relationship between two variables is linear, it means
that the degree to which they
change in concert with each other is the same throughout their
ranges; if it is low and posi-
tive, it is low and positive at low levels of both variables and at
higher levels of both variables.
Some correlations, however, are not linear. Consider the
correlation between anxiety and the
quality of a musician’s performance. In that instance, a little
anxiety is probably a good thing.
It prompts the individual to prepare for the performance by
practicing, studying the music
carefully, asking others for feedback, and so on. Without
anxiety, the musician might not make
the necessary preparations. It seems likely that, at least in the
early going, the quality of the
performance improves as anxiety increases.
But it is possible that if anxiety continues to increase, the
individual’s performance may reach
a plateau and then begin to diminish. The musician may become
so anxious that concentra-
tion is difficult and performance declines, with more anxiety
actually depreciating the quality
of the music. These conditions describe a relationship that is
curvilinear. It is illustrated in
Table 8.2, where anxiety is gauged as a function of someone’s increasing
pulse rate in beats per minute.

Figure 8.2: Scatterplots for positive, negative, and zero correlations
(A: a positive correlation; B: a negative correlation; C: a zero correlation)

The first six pairs of scores suggest a different relationship between
anxiety and performance than the last six pairs of scores. The first part of
the distribution makes the relationship look linear and positive. The latter
part of the data makes
the relationship look linear but
negative. An accurate picture of the relationship requires data
throughout the entire ranges
of the two variables.
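A small numeric sketch makes the point. The inverted-U data below are hypothetical (not Table 8.2's actual values); performance rises with anxiety, peaks, then falls, so the full-range Pearson r is near zero even though the relationship is strong.

```python
import numpy as np

# Hypothetical inverted-U data: performance peaks at a moderate anxiety level.
anxiety = np.arange(1, 10)               # anxiety levels 1..9
performance = 25 - (anxiety - 5) ** 2    # peak at anxiety = 5

r_full = np.corrcoef(anxiety, performance)[0, 1]
r_rising = np.corrcoef(anxiety[:5], performance[:5])[0, 1]    # first half
r_falling = np.corrcoef(anxiety[4:], performance[4:])[0, 1]   # second half

# Full range: r near 0; rising half: strongly positive; falling half:
# strongly negative. A linear statistic misses the curved relationship.
print(round(r_rising, 3), round(r_falling, 3))
```

This is why the linearity requirement matters: a near-zero Pearson r can mask a strong but curvilinear association.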
Understanding Correlation Values
It is important not to confuse the sign of the correlation (+ or −) with its
strength. A correlation of −0.50 contains the same amount of information
about the two variables as does a correlation of +0.50. The sign makes a
great deal of difference in how the relationship is interpreted, but it has
nothing to do with the strength of the
relationship. With positive correla-
tions, as the value of one variable increases so does the value of
the other. When correlations
are negative, increasing values of one variable are associated
with decreasing values of the
other.
Earlier we noted that different scales of data require different
types of correlation proce-
dures. The number of variables involved also dictates the need
for different correlation
procedures:
• Bivariate correlations indicate the relationship between
two variables. For exam-
ple, the correlation between intelligence and verbal aptitude is a
bivariate correla-
tion. This chapter focuses on bivariate correlations.
• Multiple correlation gauges the relationship between one
variable and a combina-
tion of others. For example, the correlation between a combined
reading compre-
hension and vocabulary measure with an analytical-ability
measure would indicate
how well reading comprehension and vocabulary ability,
combined, correlate with
analytical ability.
• Canonical correlation measures the relationship between
two groups of variables.
For example, determining how a combination of reading
comprehension and vocab-
ulary ability and a combination of analytical ability and
problem-solving ability
relate calls for a canonical correlation.
• Partial correlation measures the relationship between two
variables after neu-
tralizing the influence of some third variable on both of the first
two. For example,
a correlation of analytical ability with problem-solving ability,
with the influence of
age controlled in both of the other variables, eliminates age
differences as a factor
in the resulting correlation. In effect, a partial correlation would
be the correlation
of analytical ability with problem-solving ability as if all
subjects were the same
age.
• Semipartial correlation gauges the relationship between
two variables after neu-
tralizing the influence of a third on either of the first two. For
example, a correlation
of intelligence with verbal aptitude, with the influence of age controlled in
only one of the two variables, would be a semipartial correlation.
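A partial correlation can be computed directly from the three pairwise correlations. The formula below is the standard one (it is not shown in this excerpt), and the numeric values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with a third variable z held constant in both.

    Standard formula: (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical values: analytical ability vs. problem solving correlate 0.6,
# and each correlates 0.5 with age (the variable being controlled).
print(round(partial_r(0.6, 0.5, 0.5), 4))  # → 0.4667
```

Note how controlling for the shared variable shrinks the coefficient: the 0.6 pairwise correlation drops once age's contribution to both variables is removed.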
Formula 8.1

rxy = ∑[(zx)(zy)] / (n − 1)
Note that the r symbol has x and y subscripts. These indicate
that the procedure correlates
two variables designated x and y. Which variable is assigned x
and which y is unimportant,
since correlation does not presume that the x variable causes y,
for example. Formula 8.1
indicates that if the x and y scores are transformed into z scores
(Formula 3.1: z = (x − M)/s), the value of rxy (the correlation value) is the
sum of the products of each participant's zx and zy scores, divided by one
less than the number of participants in the data group (rather than the
number of scores).
The n − 1 signifies that this is a correlation formula for sample,
rather than population, data.
It is the same adjustment for sample data made with the
standard deviation calculation in
Chapter 1. Formula 8.1 can be used to calculate the correlation
value of the verbal ability
and intelligence scores from the earlier example. Calculating
the equivalent verbal-ability and
intelligence z values with Formula 3.1 produces the z values for
the original raw scores listed
in Table 8.3.
Here, each pair of z scores is multiplied and the products
summed:
(−1.991 × −1.902) + (−1.212 × −0.761) + . . . + (1.385 × 1.522)

Dividing the sum by n − 1 = 11 yields rxy = 0.938.
8.2 Calculating the Pearson Correlation
With a maximum possible correlation value of 1.0, rxy = 0.938
indicates a strong relationship
between verbal ability and intelligence, something that is
reflected in the fact that many intel-
ligence tests include subtests of verbal ability.
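Formula 8.1 translates directly into code. The sketch below uses tiny hypothetical scores (the chapter's verbal-ability column is not reproduced here), and the sample standard deviation, matching the n − 1 convention in the formula.

```python
from statistics import mean, stdev

def pearson_r_via_z(xs, ys):
    """Formula 8.1: convert each score to a z score (sample sd), multiply
    the paired z scores, sum the products, and divide by n - 1."""
    n = len(xs)
    mx, sx = mean(xs), stdev(xs)   # stdev divides by n - 1, as sample data require
    my, sy = mean(ys), stdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# A tiny check with perfectly related (hypothetical) scores:
print(pearson_r_via_z([1, 2, 3], [2, 4, 6]))  # → 1.0
```

Perfectly proportional scores produce identical z values in both lists, so the sum of products equals n − 1 and the coefficient is exactly 1.0.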
Although Formula 8.1 is visually simple, the need to transform
everything into z scores before
calculating rxy makes the calculations very time consuming and
tedious. Completing the cal-
culations by hand takes too much time. Formula 8.2, the
formula we will use, turns out to be
the formula programmed into many hand-calculators. It is
visually more complex but much
easier to execute:
Formula 8.2
rxy = [n∑xy − (∑x)(∑y)] / √{[n∑x² − (∑x)²][n∑y² − (∑y)²]}
where
x = one of the scores in each pair, as in the z-score formula above.
y = the other score in the pair.
n = the number of participants (the number of pairs of scores).
∑xy indicates that each pair of scores is multiplied and then the products
for all pairs are summed. The resulting value is the “sum of the
cross-products.”
∑x² indicates that each x score is squared, and then the squares are summed.
(∑x)² indicates that the original x scores are totaled, and then the total
is squared.
∑y² indicates that each y score is squared, and then the squares are summed.
(∑y)² indicates that the original y scores are totaled, and then the total
is squared.
The formula is not as daunting as it appears. The
process will become familiar after a few problems.
Probably Excel or a hand-calculator with a built-in
correlation function will perform most of the statis-
tical “heavy-lifting,” but it is helpful to prepare for
that occasional time when there is no computer and
the calculator has no correlation function.
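For that occasional time, Formula 8.2 can be sketched as a short function. The sums mirror the definitions above; the test data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Formula 8.2, the raw-score computing formula for the Pearson r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of cross-products
    sum_x2 = sum(x * x for x in xs)               # each x squared, then summed
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return num / den

# Perfectly proportional (hypothetical) pairs give the maximum value:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

Unlike Formula 8.1, no intermediate z scores are needed; the five sums are enough, which is why calculators program this version.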
A Correlation Example
A researcher is duplicating a classic experiment by
psychologist E. L. Thorndike. The experiment relates
to Thorndike’s Law of Effect, which maintains that
behaviors followed by a satisfying state of affairs will
likely be repeated. In the experiment, the researcher
sets up a cage equipped with a door that opens if a
cat placed in the cage bats a string suspended inside it.
rxy = [10(137.25) − (55)(33.5)] / √{[10(385) − (55)²][10(141) − (33.5)²]}
    = (1372.5 − 1842.5) / √[(3850 − 3025)(1410 − 1122.25)]
    = −470 / √(825 × 287.75)
    = −0.965
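As a cross-check, the summary sums above can be plugged into Formula 8.2 in a few lines. The value 137.25 for ∑xy is taken from the intermediate step 1372.5 = 10 × 137.25 shown in the computation.

```python
from math import sqrt

# Summary values from the cat-and-string example (n = 10 pairs)
n, sum_x, sum_y = 10, 55, 33.5
sum_xy, sum_x2, sum_y2 = 137.25, 385, 141

num = n * sum_xy - sum_x * sum_y                            # 1372.5 - 1842.5
den = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = num / den
print(round(r, 3))  # → -0.965
```

The negative sign falls out of the numerator: the cross-product sum is smaller than (∑x)(∑y)/n would require, so higher values of one variable pair with lower values of the other.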
Interpreting Results
The relationship is indeed negative, and because the maximum correlation is
±1.0, the relationship is also very strong. Neither of those conclusions
indicates whether the result is statistically significant, however. As with
z, t, and F, significance is determined by comparing the calculated value to
the table value indicated by the relevant degrees of freedom and the selected
level of probability. A calculated correlation whose absolute value is at
least as large as the table value is one that probably did not occur by
chance. For the Pearson correlation, the values are in Table 8.5 (see also
Table B.5 in Appendix B).
Like the t and F values, the correct critical value for r is
determined by degrees of freedom
and by the level of probability the researcher selects. The
degrees of freedom for a Pearson
correlation are the number of pairs of data, minus 2. Be careful to use the
number of pairs, not the total number of scores, when determining the degrees
of freedom.
Researchers most commonly settle on p = 0.05 or 0.01. The p = 0.1 level
occurs in statistical tables less often because in most research settings, a
one-in-ten chance of a random correlation is too great. No one wants to
conclude that a correlation is statistically significant when there is too
much chance that the finding will not hold up under further investigation. In
exploratory or descriptive research, when there is little prior research on
which to rely, however, investigators sometimes relax the probability to
p = 0.1.
Table 8.5: The critical values of rxy
(lowest statistically significant correlation for the specified probability)

Number of xy pairs (n)   df (n − 2)   p = 0.10   p = 0.05   p = 0.01
3                        1            0.988      0.997      1.000
4                        2            0.900      0.950      0.990
5                        3            0.805      0.878      0.959
6                        4            0.729      0.811      0.917
7                        5            0.669      0.754      0.875
8                        6            0.621      0.707      0.834
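The lookup-and-compare step can be sketched as a small helper. The p = 0.05 critical values below are transcribed from Table 8.5 and from values quoted in the chapter's worked examples (df = 8, 13, and 14); the function name is an assumption.

```python
# Critical values of r at p = 0.05, keyed by df (number of pairs minus 2).
# Values from Table 8.5 plus examples quoted later in the chapter.
CRITICAL_R_05 = {1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754,
                 6: 0.707, 8: 0.632, 13: 0.514, 14: 0.497}

def significant_at_05(r: float, n_pairs: int) -> bool:
    """Reject H0 when |r| reaches the tabled critical value for df = n - 2."""
    return abs(r) >= CRITICAL_R_05[n_pairs - 2]

# The cat-study result: r = -0.965 with 10 pairs (df = 8, critical r = 0.632)
print(significant_at_05(-0.965, 10))  # → True
```

Taking the absolute value first matters: a strongly negative correlation such as −0.965 is just as significant as an equally strong positive one.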
The Statistical Hypotheses
The null and alternate hypotheses for correlation reflect the fact
that we have moved away
from the hypothesis of difference. The null hypothesis is that no
relationship between the
variables exists. Symbolically, it is written H0: ρ = 0. The symbol ρ is the
Greek letter rho (as in “row” your boat) and the equivalent of r. So the null
hypothesis states that the correlation (r) equals 0. More specifically, it
means that there is no statistically significant relationship. The alternate
hypothesis states that the correlation does not equal 0, that a statistically
significant relationship will emerge each time data are collected and the
relationship calculated: HA: ρ ≠ 0.
The Coefficient of Determination
One of our important recurring themes is the distinction
between statistical significance and
practical importance. Determining practical importance was the
reason for omega-squared
and eta-squared calculations for significant t test and ANOVA
results, respectively.
Effect sizes take on particular importance with correlation
because with large samples, rela-
tively small correlations can be statistically significant. The
effect size corresponding to the
Pearson correlation is the coefficient of determination (rxy²). As the
notation suggests, the coefficient of determination is the square of the
correlation coefficient. Squaring the correlation indicates how much of the
variance in y is explained by x (or vice versa, since correlation does not
establish which variable influences which).
When the variables describe the behavior of people, small
coefficients of determination do
not surprise us because they are part of human subjects’
complexity. Very few individual vari-
ables will explain large proportions of human behavior.
Sometimes, however, even low correlations and low rxy² values are important.
If research revealed that the correlation between the age of first exposure
to illegal narcotics and the development of an addiction was rxy = −0.3, that
value (note the negative correlation) indicates that the younger subjects are
at first exposure, the more likely they are to develop an addiction. The
resulting rxy² value would be just 0.09. But even
if just 9% of the variance
in addiction is explained by age at first exposure, within the
context of human complexity that
would be considered important. Practical importance is a
function of consequences.
Comparing Correlation Values
In isolation, correlation coefficients can be difficult to interpret
because correlation strength
does not increase or decrease in consistent increments. The change from
rxy = 0.2 to rxy = 0.3 is a less dramatic increase in strength than the
increase from rxy = 0.75 to rxy = 0.85, for example. Although the Pearson r
requires equal-interval data, in the resulting coefficients an increase in
correlation strength of 0.1 reflects a very different change from 0.8 to 0.9
than it does from 0.2 to 0.3. It takes a much stronger increase in the
relationship to increase by 0.1 in the upper ranges of correlation values
than in the lower ranges.
Problem 8.1 suggests some of the hazard in rushing
to judgment about cause from correlation data. While
we might be tempted to reduce the problem to “older
people contribute more to charity than younger peo-
ple,” other factors are probably at work, not the least
of which is that age likely correlates with income as
well. Perhaps it is not age that explains contribution
amount so much as income. The correlation value,
while instructive and important, indicates only how variables
co-vary, not necessarily why
the variables involved vary.
8.3 Correlating Data When One Variable Is Dichotomous
If the consultant had asked how the donation amount and the
donor’s gender relate, Pear-
son still provides the answer, but the procedure becomes a
point-biserial correlation. The
word point refers to the continuous variable, the amount of
money donated in this exam-
ple. The word biserial refers to the other variable, which has
only two levels. The required
change is coding the gender variable in a way that reflects its
dichotomy: as either 0 or
1. Which group (females or males) is coded 0 and which is coded 1 does not
affect the strength of the coefficient.
The point-biserial correlation has a number of applications.
Questions about the relation-
ship between marital status and income, between public versus
private school students and
achievement, or between Republicans’ and Democrats’
optimism are all questions that could
rxy = [15(710) − (8)(1,195)] / √{[15(8) − (8)²][15(133,425) − (1,195)²]}
    = 0.19
Still testing at p = 0.05 and with the degrees of freedom still df = 13, from
Table 8.5 the critical value is still rxy 0.05(13) = 0.514. Therefore the
statistical decision will be to fail to reject H0.
The relationship between the donor’s gender and the amount
contributed is not statistically
significant. The rxy = 0.19 result is probably a random
correlation that is unlikely to reach the
critical value from the table in any new analysis with new
subjects.
The interpretation of the point-biserial correlation is the same
as it is for conventional Pear-
son correlations, except that sign of the coefficient is a function
only of which variable is
coded 1. If male donors had been coded with 1s, the correlation
would have been negative,
rxy = −0.19. Consider a few more applications for the point-
biserial correlation:
• What is the relationship between whether or not a parent
earned a college degree
and the child’s grades?
• How is whether or not a student is a native speaker of
English related to the
student’s test score?
A psychologist examines the relationship between risk-taking and success in
solving novel problems. Having devised the Inventory Risk
Survey Catalog (the I-RiSC), the
psychologist gauges the willingness of a group of 16-year-olds
to do the unconventional and
then provides a series of word problems with which the
participants are unfamiliar. Scores on
the I-RiSC and the problems for 10 participants are listed in
Table 8.9.
Table 8.9: Risk-taking and problem-solving success data
I-RiSC: 2 7 4 5 1 8 7 9 3 6
Problems: 14 17 14 16 12 17 16 17 15 15
To complete the problem in Excel, it is best to set up the data in
two columns. Two rows also
will work, but parallel columns are visually simpler.
1. Create a label in cell A1 for “I-RiSC” and in cell B1
“ProbSolv” so that
the I-RiSC data appear in cells A2 to A11
and the ProbSolv data appear in B2 to B11.
2. From the ribbon at the top of the page, click the Data tab, and then Data
Analysis at the far right.
3. Select Correlation, which is the second option in the window.
4. In the Input Range window enter A2:B11, which indicates the
cells where the data
are found. Note that the default groups the data in columns.
(Change the default if the data are grouped in rows.)

           Column 1    Column 2
Column 1   1           0.904203
Column 2   0.904203    1
The result of the analysis is a Pearson correlation of rxy = 0.904. The 1s in
the diagonal indicate that each variable correlates perfectly with itself
(rxy = 1.0), of course. Note that the output does not indicate whether the
calculated value is statistically significant, which makes a check of the
critical values table necessary. Table 8.5 indicates that
rxy 0.05(8) = 0.632. The relationship between risk-taking and problem solving
is statistically significant. Were these data not contrived, it would be
quite important to know that about 82% (rxy² = 0.904² = 0.818) of the
variance in problem-solving success is explained by whatever the I-RiSC
measures, ostensibly the subject's willingness to be unconventional.
Apply It!
Investigating the Correlation
between Crime and Unemployment
A law enforcement analyst is interested in any link
between crime and unemployment as a guide to allocat-
ing crime-prevention funds. Specifically, she would like
to know whether murders and property crimes correlate
with the unemployment rate.
The analyst obtains the murder and property-crime rates
for her state for the 16 years from 1990 to 2005 from
the FBI Uniform Crime Reports (rates are per 100,000
inhabitants). She then consults the Bureau of Labor Sta-
tistics for the unemployment rate in the state for the
Year   Murder rate   Property-crime rate   Unemployment rate
1993   6.4           4662                  6.9
1994   6.2           4678                  6.1
1995   5.7           4460                  5.6
1996   5.8           4438                  5.4
1997   5.4           4279                  4.9
1998   6.1           4040                  4.5
1999   5.5           3852                  4.2
2000   5.1           3592                  4.0
2001   4.9           3456                  4.7
2002   4.3           3412                  5.8
2003   4.2           3289                  6.0
2004   4.7           3168                  5.5
2005   5.0           3081                  5.1
The Excel results indicate the following:
• The correlation between murder rate and unemployment is rxy = 0.386.
• Comparing the murder rate/unemployment rate correlation to the critical
value from Table 8.5 (rxy 0.05(14) = 0.497) indicates that the calculated
correlation is not statistically significant at p = 0.05.
• The analyst fails to reject the null hypothesis, ρ = 0.
• The property-crimes rate and unemployment correlation is rxy = 0.551.
• Comparing the calculated value to the critical value from Table 8.5 (the
same rxy 0.05(14) = 0.497, since df are unchanged) indicates that this
correlation is statistically significant at p = 0.05.
8.5 Spearman’s Rho
The Pearson correlation requires that both variables must be at
least interval scale. The point-
biserial correlation requires that one variable must be at least
interval scale, and the other
must be a variable with only two levels.
Neither of these correlations is helpful when the data are
ordinal scale, which describes much
of the data that psychologists and other social scientists
encounter. Nearly everyone who goes
to the mall or answers the telephone has been asked to take a
survey, particularly if it hap-
pens to be an election year. Survey data are usually ordinal
scale. It is common for the ques-
tionnaires to have a Likert-type format, where a statement is
read and the respondents are
asked the degree to which they agree with the statement by
selecting from a range of choices
such as:
• Strongly agree
• Agree
• Neither agree nor disagree
• Disagree
• Strongly disagree
Although surveyors commonly code the responses (strongly agree = 1,
agree = 2, and so on) and then calculate means and standard deviations for
all respondents, those statistics
respondents, those statistics
assume that the data are at least interval scale. Survey data
rarely are. The Likert types of
responses are essentially rankings. A response of “strongly
agree” is more positive than
“agree” but precisely how much more is not clear. Besides, one
respondent’s “disagree” may
be another respondent’s “strongly disagree.” These data are
more safely treated as ordinal
scale responses.
Correlating Ordinal or Mixed Ordinal/Interval Data
In addition to survey data, ordinal scale characterizes other
common data, such as class rank-
ings and percentile scores. Sometimes the variables
investigators might wish to correlate
have mixed scales. For example, a researcher wants to correlate
subjects’ income (ratio scale
data) with their optimism (usually gauged with a Likert-type
survey and so ordinal scale).
In addition, income is often not
normally distributed. The
ordinal scale of one variable and the lack of normality in the
other rule out a Pearson’s
correlation.
Charles Spearman, Pearson’s colleague at University College
London, developed a tremendously
flexible correlation procedure. It accommodates two
variables, provided they fit any of the following:
• Both are ordinal scale.
• One variable is ordinal scale and one is interval or ratio
scale.
• Two variables are interval or ratio scale, but one or both
fail to meet the Pearson
correlation requirement for normality.
Following are the steps for calculating Spearman’s rho:
1. Rank the scores for both variables separately.
2. For each pair of rankings, subtract the second ranking in the
pair from the first to
produce a difference score, d.
3. Square each of the d values for d².
4. Sum the d² values for ∑d².
5. Solve for ρ.
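The five steps can be sketched in Python. This is a minimal illustration assuming the scores for both variables have already been ranked (the function name and toy data are my own, not from the text):

```python
# A sketch of Spearman's rho from pre-ranked data (no ties in this toy case).
def spearman_rho(ranks_x, ranks_y):
    # Steps 2-4: difference score d for each pair, squared, then summed.
    d = [rx - ry for rx, ry in zip(ranks_x, ranks_y)]
    sum_d2 = sum(di ** 2 for di in d)
    # Step 5: rho = 1 - (6 * sum of d^2) / (n * (n^2 - 1)).
    n = len(ranks_x)
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Perfectly agreeing rankings yield rho = 1.0; fully reversed yield -1.0.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The formula rewards agreement between the two rankings: every disagreement inflates ∑d², pulling rho down from 1.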
Ranking Tied Scores
The ranking procedure must follow rules. If some of the scores
for one of the variables occur more than once, all of the tied
scores must receive the same ranking. If someone were
ranking the following values,
for example:
3, 5, 6, 6, 7, 8, 8, 8, 9, 10
ranking the values from smallest to largest produces the
following values:
1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.
The smallest value, 3, was ranked “1,” the 5 was ranked “2,”
and so on. The two 6s and the
three 8s were handled as follows:
• Because the two 6s occupy rankings 3 and 4, those two rankings
are added and divided by
the number of them (2), which results in 3.5
([3 + 4] ÷ 2). After both 6s are ranked
3.5 (for places 3 and 4), the next value in the data set, 7, is
ranked 5.
• Likewise, the three 8s occupy rankings 6, 7, and 8, which average
to 7 ([6 + 7 + 8] ÷ 3), so each 8 is ranked 7; the 9 and the 10
then take rankings 9 and 10.
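The tie-handling rule can be sketched as a small Python function (the function name is my own) that reproduces the worked example above:

```python
# Tied scores share the mean of the rank positions they occupy.
def rank_with_ties(scores):
    ordered = sorted(scores)
    rank_of = {}
    i = 0
    while i < len(ordered):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The run occupies rank positions i+1 through j; assign their mean.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[s] for s in scores]

print(rank_with_ties([3, 5, 6, 6, 7, 8, 8, 8, 9, 10]))
# [1.0, 2.0, 3.5, 3.5, 5.0, 7.0, 7.0, 7.0, 9.0, 10.0]
```

The output matches the hand-ranked values in the text: the two 6s share 3.5 and the three 8s share 7.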
[Problem 8.1 data table, partially shown; final pairs: 8/37, 8/40, 9/42, 10/39]
Calculations for a Spearman’s rho solution, based on the
information in Problem 8.1, give

ρ = 1 − 6∑d² / [n(n² − 1)] = 1 − 6(24.5) / [10(10² − 1)] = 0.852
Table 8.13 lists the critical values for Spearman’s rho (Table
B.6 in Appendix B). There are no
degrees of freedom for this procedure. The correct critical value
for rho is indicated by the
number of data pairs. Note that for p = 0.05 and 10 pairs,
ρ.05(10) = 0.648. The relationship
between emotional stability and age among service personnel
assigned to combat zones is
statistically significant; therefore, we reject H0.
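As a quick check, the values from the text (∑d² = 24.5, n = 10, critical value 0.648) can be plugged into the formula:

```python
# Verifying the Problem 8.1 solution: sum of d^2 = 24.5 across n = 10 pairs.
sum_d2 = 24.5
n = 10
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(round(rho, 3))          # 0.852
print(rho > 0.648)            # True: exceeds the critical value, so reject H0
```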
Try It!: #5
Spearman’s rho requires data of what scale?
Apply It!
Exploring the Correlation between
Job Satisfaction and Commute Times
As part of the justification for allowing workers
to work at home part-time, the human resources
director for a large firm intends to investigate
any correlation between job satisfaction and
average commute time for employees. The
director asks ten randomly selected employees
to fill out a job-satisfaction questionnaire with
the following responses to a series of questions:
Response Score
• very satisfied (vs) 1
• somewhat satisfied (ss) 2
• somewhat dissatisfied (sd) 3
• very dissatisfied (vd) 4
The employees were also asked to indicate their average one-
way commute time in minutes.
Recognizing that job satisfaction responses will be ordinal
scale, the HR director opts for
Spearman’s rho. The data and the difference scores are shown in
Table 8.14.
Table 8.14: Spearman’s rho data for the correlation between job
satisfaction and commute time
[Table 8.14 data not shown]

When interval or ratio data are ranked, the actual distance between any two data points
is lost. When the ages of the service personnel were ranked,
• the 25-year-old was 1,
• the 26-year-old was 2,
• and the 32-year-old was 3.
Once ranked, the fact that from the first to the second ranking is
a one-year difference and
from the second to the third ranking is a six-year difference is
lost. Pearson’s r retains those differences.
For n = 10, the Spearman’s rho formula is

ρ = 1 − 6∑d² / [n(n² − 1)] = 1 − 6(31) / [10(10² − 1)] = 0.812
For p = 0.05 and 10 pairs of data, the critical value is
rs0.05(10) = 0.648. The relationship between
job satisfaction and average commute time is statistically
significant. Those who commute the
least time have the highest levels of job satisfaction. Perhaps
the attitudes of those who have
the lowest levels of job satisfaction—those who have the
longest commutes—will improve if they are allowed to work at home part of the time.
Pairs   Pearson*   Spearman
6       0.811      0.886
10      0.632      0.648
*for Pearson, df = number of pairs − 2
In the examples above, the value required for significance with
a Spearman correlation is
higher than that required for a Pearson correlation.
Another limitation of the Spearman correlation is that we cannot
square the Spearman value
to determine the proportion of variance in y explained by x.
Spearman’s rho has no equivalent
of rxy². When the data do not meet the Pearson requirements,
however, the researcher has no
choice. When the data do meet the requirements, a Pearson’s r
is usually preferable to Spear-
man’s rho.
Correlation in Research
Correlation procedures answer enough of the questions that
interest researchers and con-
sumers of research that the procedures pervade research
literature. Arroyo (2015) exam-
ined the correlation between work engagement and internal self-
concept. Arroyo found that
people tend to engage in the work they do to earn a living, not
for the external rewards, but
for the work’s own sake; their work is intrinsically satisfying.
Ceci and Kumar (2015), meanwhile, asked whether happiness
correlates with creative capac-
ity. They found no significant correlation but did find a
significant correlation between cre-
ative capacity and intrinsic motivation, suggesting that those
other decreases. The sign of the coefficient, however, is
unrelated to its strength
(Objective 2).
The differences among the correlation procedures in this
chapter are in the kinds
of variables they accommodate. The Pearson correlation
requires interval or ratio
variables that are normally and similarly distributed (Objective
3). A special applica-
tion of Pearson, the point-biserial correlation, requires an
interval/ratio variable and a
second variable that has only two manifestations, or a
dichotomously scored variable
(Objective 5). Spearman’s rho accommodates any combination
of ordinal, interval, or
ratio variables (Objective 6). Because the data used in a Pearson
correlation contain
more information than the rankings that make up the data for
Spearman’s approach,
the Pearson value provides more information about the nature
of the relationship
between the variables. This is evident in the fact that the
Pearson value can be squared
to produce the coefficient of determination. The rxy² value
indicates the proportion of
one variable that can be explained by changes in the other
(Objective 4). Spearman
values have no equivalent of this statistic.
When two variables share information, they are correlated. The
amount of one explained
by the other is what that rxy² value, the coefficient of
determination, indicates. This con-
cept provides a foundation for regression, which is the focus of
Chapter 9. Regression
allows what is known about the relationship between x and y to be used to predict the
value of y from a value of x.
It involves calculations and thinking with which you are already
familiar, so work the
end-of-chapter problems, reread any of the sections in Chapter
8, and prepare for
Chapter 9.
bivariate correlations Include all proce-
dures that test for significant relationships
between two variables.
canonical correlation Measures the rela-
tionship between two groups of variables.
coefficient of determination Indicates the
proportion of one variable in a Pearson cor-
relation that can be explained by the other.
correlation matrix A box in which the vari-
ables involved are listed in rows as well as
in columns, and each variable is correlated
with all variables, including itself.
hypothesis of association The umbrella
term for significance tests that analyze the
correlation between or among variables.
hypothesis of difference The umbrella
term for significance tests that analyze the
differences between groups.
linear Describes a relationship between
two variables whose strength is consistent
throughout their ranges. With curvilinear
relationships, the strength and sometimes
the direction of the relationship change across the variables’ ranges.
range attenuation Occurs when a variable
is not measured throughout its entire range.
Attenuated range artificially reduces the
strength of any resulting correlation value.
scatterplot A graph representing two vari-
ables, one on the horizontal axis, the other
on the vertical axis. Each point in the graph
indicates the measure of both variables for
one individual.
semi-partial correlation Gauges the rela-
tionship between two variables, controlling
for a third in just one of the first two.
Spearman’s rho A correlation procedure
for two ordinal variables; one ordinal and
one interval/ratio variable; or two interval
or ratio variables that fail to meet the Pearson
correlation requirement for normality.
Review Questions
Answers to the odd-numbered questions are provided in
Appendix A.
1. What values indicate the strongest and weakest values for a
Pearson’s r?
2. What is the equivalent in a Pearson correlation for η²?
3. What are the requirements for calculating Pearson’s r?
4. What is “range attenuation,” and how does it affect
relationship between
those two variables?
b. What is the resulting coefficient?
c. How much of the variability in arrest records can be explained by what time the
juvenile goes to bed?
Juvenile Retire Arrest
1 9.0 No
2 9.5 No
3 11.0 Yes
4 11.5 Yes
5 10.0 Yes
6 9.75 No
7 10.0 No
8 10.25 Yes
9. A group of consumers has just taken two surveys on (a) their
attitude about
the economy and (b) their attitude about those in government. In
both, higher
scores mean more optimism. The data are ordinal scale. Are the
two attitudes
related?
Consumer Economy Government
the test.
Student Minutes (x) Score (y)
1 15 57
2 80 84
3 0 60
4 75 92
5 30 65
6 10 60
7 22 75
8 15 68
a. Is the relationship statistically significant?
b. How much of the variance in test scores can be explained by
differences in the
amount of time spent reading?
11. A district psychologist is working with developmentally
disabled students in a
special education setting and is curious about the relationship
between students’
persistence on puzzle tasks (measured in the number of minutes
they remain on
task) and their number of absences from class.
Student Persist Absent
b. Is the relationship statistically significant?
Employee Sales Blood pressure
1 1 150
2 4 140
3 3 140
4 6 110
5 2 140
6 4 130
7 0 160
8 3 110
9 5 120
10 7 160
13. An industrial psychologist is determining the relationship
between workers’ willing-
ness to embrace new manufacturing procedures, gauged with a
dogmatism scale
(higher scores indicate greater dogmatism), and their level of
job satisfaction (higher
scores indicate greater satisfaction). The satisfaction data are at
least ordinal scale.
a. What is the relationship?
b. What is the null hypothesis?
c. Do you reject or fail to reject the null hypothesis?