12
The Chi-Square Test: Analyzing Categorical Data
Learning Objectives
After reading this chapter, you should be able to:
• Describe the conditions that fit chi-square tests.
• Calculate and interpret the goodness-of-fit test and the chi-square test of independence.
• Calculate and interpret the phi coefficient and Cramer’s V.
Chapter Outline

12.1 Examining Categorical Data
12.2 The Goodness-of-Fit (1 × k) Chi-Square
    Calculating the Test Statistic
    Interpreting the Test Statistic
    Understanding the Chi-Square Hypotheses
    Distinguishing Between Goodness-of-Fit Chi-Square Tests and t-Tests or ANOVAs
    A 1 × k (Goodness-of-Fit) Chi-Square Problem With Unequal fe Values
    A Final 1 × k Problem
12.3 The Chi-Square and Statistical Power
12.4 The Goodness-of-Fit Test in Excel
12.5 The Chi-Square Test of Independence
    Setting up the Chi-Square Test of Independence
    Interpreting the Chi-Square Test of Independence
    Phi Coefficient and Cramer's V
    A 3 × 3 Test of Independence Problem
Chapter Summary
12.1 Examining Categorical Data
The 19th-century British statesman Benjamin Disraeli is credited with saying that there are three kinds of lies: lies, damned lies, and statistics. Clearly, he had to have a place in this book, even if it is in the final chapter. But he belongs here because of another comment that is particularly relevant to the topics in this chapter. He observed that what we anticipate seldom occurs and what we least expect generally happens (Oxford, 1980). Disraeli's expressed skepticism was almost certainly tongue in cheek. Indeed, the work on regression in Chapters 9 and 10 is based on the understanding that outcomes are not unpredictable, but the statement provides an effective segue into the connection between what occurs and what might be expected to occur. That analysis is the focus of this chapter.

Part of the discussion in Chapter 2 was how data differ according to scale, and how the statistics that can be calculated also relate to scale; you learned about different types of data scales and the appropriate types of statistics for each. For example, for nominal scale data, only the mode (Mo) makes sense as a measure of central tendency. Subsequent chapters revealed that it is not only descriptive statistics that are specific to the scale of the data. The more involved statistical tests are also data-scale dependent. Recall that the dependent variable in a t-test, a z-test, and ANOVA must be data that fit a continuous (interval or ratio) scale. Both variables in the Pearson Correlation must be at least interval scale. These distinctions are very important. Along with whether the hypothesis deals with difference or association and whether the groups are independent, the scale of the data is an important guide to determining the appropriate statistical procedure.
Sometimes the best data available are not continuous. There may be no way to verify the normality of the data (perhaps because they are not normal). The groups involved in the analysis may not have equivalent variances. And perhaps the relationships between variables are not linear. When important assumptions about the quality of the data cannot be satisfied, the answer is to move to tests with more relaxed requirements.

Chapter 8 provided the first discussion of nonparametric statistical analysis. Recall that these tests set aside a number of important assumptions about the characteristics of the groups involved in an analysis. Removing some of the requirements associated with the characteristics of the data provides greater analytical flexibility.

Chi-square tests allow for the analysis of data that are exclusively categorical. This distinguishes them from all the statistical tests discussed in previous chapters. In addition, the use of categorical data, since categorical/nominal data cannot reflect the characteristics of normality, suggests that the chi-square tests are nonparametric tests, also referred to as "distribution-free" tests. No assumptions are needed about how the data are distributed, nor are there requirements regarding their scale. These tests provide flexibility, but they exact a cost as well, which will be discussed as the chapter progresses. The chi-square procedures have many applications in business analysis and decision making and are a mainstay in the manager's statistical "toolbox."

The chi-square tests were developed by Karl Pearson—the "Pearson Correlation" Pearson. Note that the Greek letter for "c" is written χ and pronounced with a hard c, so chi is pronounced "kie," rhyming with "pie." There are two chi-square procedures discussed in this chapter. Both of them have two names. The first is called the goodness-of-fit chi-square test, or alternatively the 1 × k (said "one by kay") chi-square.
12.2 The Goodness-of-Fit (1 × k) Chi-Square

Perhaps a market research specialist is trying to determine whether several local "talk radio" stations have approximately similar audiences. Among other things, the answer will influence what individual stations can charge for advertising. The market research specialist makes a random selection of people from a local telephone book and calls each number to ask residents if they listen to talk radio. For those answering in the affirmative, the question is which station they prefer.

Both names for this procedure are instructive for what they reveal about the kind of analysis involved. "Goodness-of-fit," however awkward the grammar, indicates that what is at issue is how "good" the data fit an initial hypothesis. That initial hypothesis describes the expected distribution of the data. In the talk radio example, the procedure will be to test whether listeners prefer the major talk stations in about equal proportions; it will provide an analysis of how well the data fit that assumption.

Key Terms: The goodness-of-fit, or 1 × k chi-square, is a test for significant differences in the categories of a single, nominal scale variable.
The 1 × k chi-square name is a reminder that the procedure involves just a single grouping variable (the "1" in the name), which is divided into some number (k) of categories. Note that the "k" here has the same meaning that it had in ANOVA. It refers to the number of groups in the analysis. In the preceding example:

• The categorical (grouping) variable is the preferred radio station.
• The k refers to the number of categories into which the single variable is divided, which is the number of talk radio stations that will be involved in the analysis.

Take care not to confuse the number of categories with the number of variables involved in the analysis. If there are five talk radio stations in the listening area, there are five categories or dimensions of the single variable, preferred radio station.

Recall that nominal data are also often called categorical, or "count," data, stemming from the fact that the measurement involved with these kinds of data is a matter of asking listeners which station they listen to, and then sorting them into the relevant category (preferred radio station) accordingly. This "count" becomes the dependent variable in chi-square tests. The analysis hinges on the frequency with which subjects fall into the individual categories. Note that the issue is not how much more the individual prefers station A to station B, something that would indicate ordinal scale data, and the question is not how much time the individual spends listening, which would provide ratio scale data. The only question for respondents who listen to talk radio is which is the preferred station.

If the question that drives the study is whether listeners prefer the five stations in about equal proportions, that is the hypothesis that will be tested. In that instance, the expectation is that the numbers of listeners in each of the categories that represent the different talk radio stations will be reasonably similar. If they are, the resulting chi-square value will not be statistically significant. Statistically significant results emerge in a goodness-of-fit chi-square when there is a substantial discrepancy between what is observed in the data and what is expected based on that initial hypothesis. Exactly how much of a discrepancy there must be is determined by comparing the calculated value of chi-square to a critical value that, like the other t, F, and r test statistics, is determined by the degrees of freedom for the problem and the probability level at which the test is conducted.

Referring back to the radio station problem, the market research specialist needs to either find support for the hypothesis that listeners tune in to all five stations in about equal proportions or update the expectation. Ninety-five listeners are asked about their station preferences. Sixty of the respondents name one of the five stations in the market area. The other 35 listen to subscription stations on satellite radio that are not located in the area. Since the interest is in listeners to local stations, those 35 people were excluded from the study. From the remaining 60 listeners, the results are as follows:

Station A  15
Station B  8
Station C  12
Station D  10
Station E  15

Review Question A: What is the scale of the data required by either of the chi-square tests?
The survey of the 60 respondents indicates that preferences are as follows:

Station A  15
Station B  8
Station C  12
Station D  10
Station E  15

These results provide a range of 8 listeners for the least frequently mentioned Station B to 15 listeners for the most popular stations, which are Stations A and E, so there are clearly differences in listeners' preferences. The question the chi-square procedure will answer is whether the differences in "count" across the five stations are just random differences that can be expected because of sampling error, or whether the differences are great enough that they are likely to emerge every time data are collected and the results analyzed. It is that last outcome, of course, that defines statistical significance.

Calculating the Test Statistic

The chi-square test statistic, which is neither intimidating to look at nor difficult to calculate, has this form:

Formula 12.1    χ² = Σ (fo − fe)² / fe

Where
χ² = the value of chi-square
fo = the frequency observed; how many individuals occur in the particular categories
fe = the frequency expected; how often individuals can be expected to occur in the particular category according to the initial hypothesis

The chi-square test statistic involves a good deal of repetitive subtracting and squaring. A good way to keep the calculations straight is to complete them in something like Table 12.1. The successive steps in the calculations are represented in the rows beginning with the fo row near the top of the table, and then working down the rows one at a time. Each row in the table represents a calculation step for determining the χ² value.
Table 12.1: The 1 × k chi-square

Following the steps outlined above,

Statistic      Station A  Station B  Station C  Station D  Station E
fo             15         8          12         10         15
fe             12         12         12         12         12
fo − fe        3          −4         0          −2         3
(fo − fe)²     9          16         0          4          9
(fo − fe)²/fe  .75        1.33       0          .33        .75

χ² = .75 + 1.33 + 0 + .33 + .75 = 3.16
The values in the fo row are counts of the number of individuals that occur in each category of the variable.

• These "frequency observed" values are the number of listeners from the sample of 60 who indicate that they listen to a particular radio station.
• The sum of the fo values across the multiple categories of the variable must always sum to the total sample size, n.

The second row, designated fe, indicates what is expected based on whatever hypothesis or assumption prompted the analysis.

• For the problem above, the hypothesis/assumption is that listeners are attracted to the five stations in approximately equal proportions.
• That expectation is reflected in equal values for each of the different categories.

Later there will be a problem where the expectation is that the categories will not be equal, which must also be reflected in the fe values. If, for example, station A is associated with a major network and has several nationally syndicated shows, perhaps the expectation is that station A is twice as popular as the others. In that case, the fe value for station A would be twice as high as the fe values for each of the other stations.

Here, however, the problem is simpler. The hypothesis is that the stations are equally popular, so determining what to expect is simply a matter of dividing the total number of listeners by the number of categories:

fe = n/k

Where
n = the total of all subjects in all categories
k = the number of categories

So for the radio station problem, because n = 60 and k = 5, the fe values are determined as follows:

fe = 60/5 = 12 in each fe category.
Study the formula for the test statistic for a moment, review the order of mathematical operations, and the process for calculating χ² will be straightforward. Recall from what your ninth-grade algebra teacher told you that when there are multiple operations required:

• The first step is to complete whatever needs to be done in the parentheses ("please").
• Then deal with exponents ("excuse").
• Handle multiplication and division ("my dear") next.
• Then complete any addition and subtraction last ("Aunt Sally") when they are not in parentheses.

So in terms of the rows in Table 12.1, the process is to:

1. Fill in the fo values in the first line, determined by the number of people who indicate that they listen to a particular station.
2. For this problem at least, because of the expectation that the five stations are all about equally popular among listeners, divide n by k and enter that value in each of the five fe boxes on the second line.

Now, following order of mathematical operations in columns 1 through 5:

3. On line 3 enter the value that is fo − fe for each of the five stations; for each category it is the fo value from the first line minus the fe value from the second line.
4. On line 4 deal with the exponent by squaring each difference between fo and fe.
5. On the next line, divide the result of the previous line, (fo − fe)², by the fe value from the second line.
6. On the final line, sum the (fo − fe)²/fe values across the five categories to determine the χ² value.

The result of completing these steps for the data above is that χ² = 3.16.
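For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation. Python and the variable names are illustrative choices, not part of the chapter:

    # Sketch of the 1 x k (goodness-of-fit) chi-square for the radio-station example.
    observed = [15, 8, 12, 10, 15]          # fo values for Stations A-E
    n = sum(observed)                        # 60 listeners
    k = len(observed)                        # 5 categories
    expected = [n / k] * k                   # equal-preference hypothesis: fe = 12 each

    chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    print(round(chi_square, 2))              # 3.17 (3.16 in the text, which rounds each term first)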
Interpreting the Test Statistic

The next step is to determine whether a χ² value of 3.16 is statistically significant by comparing it to the appropriate critical value of χ² from Table 12.2. As with the other tables, the critical value is determined according to the probability level at which the test is conducted (p = .05 is the default level for this test as well), and the number of degrees of freedom for the problem. The df for the goodness-of-fit chi-square are k − 1, the number of categories of the variable, minus 1. Here, df = 4.
Table 12.2: The critical values of chi-square
df    p = 0.05    p = 0.01    p = 0.001
1 3.84 6.64 10.83
2 5.99 9.21 13.82
3 7.82 11.35 16.27
4 9.49 13.28 18.47
5 11.07 15.09 20.52
6 12.59 16.81 22.46
7 14.07 18.48 24.32
8 15.51 20.09 26.13
9 16.92 21.67 27.88
10 18.31 23.21 29.59
11 19.68 24.73 31.26
12 21.03 26.22 32.91
13 22.36 27.69 34.53
14 23.69 29.14 36.12
15 25.00 30.58 37.70
16 26.30 32.00 39.25
17 27.59 33.41 40.79
18 28.87 34.81 42.31
19 30.14 36.19 43.82
20 31.41 37.57 45.32
Source: http://home.comcast.net/~sharov/PopEcol/tables/chisq.html Retrieved: 6 July, 2012.
As with the other tests conducted so far, the calculated value of chi-square is statistically significant when it is equal to, or larger than, a critical value. That value is determined by the probability level of the test and the degrees of freedom for the problem. For χ².05(4), the table indicates that the critical value is 9.49. A critical value from the table greater than the calculated value of chi-square indicates that the fo to fe difference is best explained by differences that could occur by chance in the chi-square distribution. The result is not statistically significant. This result prompts the marketing analyst to fail to reject the null hypothesis.
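The critical values in Table 12.2 can also be reproduced in software rather than read from a printed table. As an illustration only (SciPy is an assumption here, not a tool the chapter uses), the df = 4 critical value and the p-value for the calculated χ² of 3.16 look like this:

    # Reproducing a Table 12.2 critical value and the test's p-value with SciPy.
    from scipy.stats import chi2

    alpha = 0.05
    df = 4                                    # k - 1 categories for the radio problem
    critical_value = chi2.ppf(1 - alpha, df)  # value a chi-square must reach to be significant
    p_value = chi2.sf(3.16, df)               # chance of a chi-square this large or larger

    print(round(critical_value, 2), round(p_value, 3))  # 9.49 0.531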
As an aside, the critical values for chi-square are usually carried to two decimals in their tables, just as z values were in their table. With chi-square, stopping at two decimals is not just a matter of crowding more values onto a page as was the case with z. The nominal data upon which chi-square values are based are relatively crude compared to ordinal, or interval or ratio, data, and it makes less sense than with those other data to imply the level of exactness suggested by three decimals.
Understanding the Chi-Square Hypotheses

Like all statistical hypotheses, the null and alternate hypotheses in chi-square problems refer to what occurs in the populations the samples represent.

• In the language of statistics, the null hypothesis for a chi-square problem is that the frequency expected is equal to the frequency observed (Ho: fe = fo). The fact that the result in the problem just worked was not statistically significant indicates that in a population where people listen to five talk radio stations in equal proportions, it is not improbable to draw a sample in which the numbers of listeners who prefer each station range from 8 to 15 out of 60. A sample with fo values of 15, 8, 12, 10, 15 is still consistent with the null hypothesis, and it is one of the outcomes that make up the chi-square distribution.
• The alternate hypothesis is that fo values of 15, 8, 12, 10, 15 do not constitute a sample likely to occur in the chi-square distribution. Stated symbolically it is Ha: fe ≠ fo, an indication that what was expected differs significantly from what was actually observed.

The null hypothesis indicates that any variability between what is observed and what is expected is explained by what will probably occur in the chi-square distribution. In other words, there is insufficient evidence to reject the possibility that what occurred is likely to have occurred by chance. The alternate hypothesis is that there is too much difference between what is observed and what is expected to conclude that the outcome is due to chance.
Distinguishing Between Goodness-of-Fit Chi-Square Tests and t-Tests or ANOVAs

The 1 × k, or goodness-of-fit, chi-square procedure falls under the general hypothesis of difference category of procedures. In that regard it is similar to the independent samples t-test and to ANOVA. Like those procedures, the value of the chi-square statistic is a measure of difference. The primary difference is the scale of the data in the analysis. In independent samples t-tests and ANOVA, the t and the F respectively are measures of the difference between the means of the samples involved in the analysis. The independent (grouping) variable is categorical, and the dependent variable is continuous (interval or ratio scale). On the other hand, the chi-square statistic measures the difference between the frequencies of occurrence of a nominal (categorical) variable compared with what is expected. The larger the gap between the expected and observed frequency distribution, the greater the difference between fo and fe. Since the dependent variable is the "count," or frequency, of occurrence of the categorical variable of interest, it is impossible to calculate means and standard deviations, which makes t-tests and ANOVAs impossible.
A 1 × k (Goodness-of-Fit) Chi-Square Problem With Unequal fe Values

This first chi-square problem was based on the assumption that listeners preferred the five radio stations in about equal proportions, but equal fe values across all categories of the variable are not always the case. Perhaps a consumer advocate is testing the claim made by the manufacturer of an energy drink called Rush that consumers prefer its product
2-to-1 over the major competitor's product (Advantage) based on taste alone. If this is accurate, a random sample of preferences from consumers of energy drinks should indicate that twice as many prefer Rush over Advantage. Since it is highly unlikely that a random sample will yield exactly those results even if the claim is accurate, the chi-square test can be used to determine whether sample results are close enough to support that claim, or whether results are significantly different from the claim.

The consumer advocate takes a sample of 150 students and finds that 27 of them have used both Rush and Advantage and express a preference for one over the other. The other 123 students prefer either some other energy drink, or use none at all. Their responses are discarded. Of the remaining 27 students, 16 of them prefer Rush and 11 prefer Advantage. Just as with the first problem, the 11 and 16 numbers represent the fo values, and their sum equals the value of n for the problem. That is the easy part. Because the claim by the manufacturer of Rush is that consumers prefer its drink 2-to-1 over the major competitor, Advantage, the fe values must reflect the 2-to-1 expectation.
Calculating fe Values for Unequal Categories

The total of both frequencies observed and frequencies expected must sum to the total, n. This will always be the case, regardless of the particular hypothesis.

Σfo = n, and
Σfe = n

To calculate the fe values when the numbers in multiple categories are not the same will involve vindicating that ninth-grade math teacher who said that someday algebra would be helpful. To determine the fe values,

1. Let x equal fe for the number who prefer the Advantage energy drink.
2. Since the expectation in this example is that twice as many consumers will prefer Rush over Advantage, let 2x be the fe for those who prefer the Rush energy drink.
3. Because the fe categories must sum to the total, then x + 2x = n.
4. Since n = 27, the expression can be changed as follows: x + 2x = 27.
   a. If x + 2x = 27, it follows that 3x = 27.
   b. If 3x = 27, then x = 27/3, which makes x equal to 9.

As a result of these calculations, the fe value for the Rush consumers is 18 (2x = 2 × 9 = 18), and the fe value for Advantage consumers is 9 (x = 9). With those values in hand, the claim that Rush is preferred twice as often as Advantage on the basis of taste can be tested with the 1 × k chi-square. The solution is in Table 12.3.
Review Question B: What is the null hypothesis in a chi-square problem?
Table 12.3: A 1 × k chi-square for unequal fe values: Is Rush twice as popular as Advantage?

               Advantage  Rush
fo             11         16
fe             9          18
fo − fe        2          −2
(fo − fe)²     4          4
(fo − fe)²/fe  .44        .22

χ² = .44 + .22 = .66
χ² = .66
χ².05(1) = 3.84. Accept Ho.
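The algebra for the unequal fe values and the test itself translate directly into a few lines of code. This is a minimal sketch under the same 2-to-1 assumption; the dictionary layout is illustrative only:

    # Goodness-of-fit test with unequal fe values: the Rush vs. Advantage example.
    observed = {"Advantage": 11, "Rush": 16}     # fo values from the 27 usable responses
    weights  = {"Advantage": 1,  "Rush": 2}      # the manufacturer's claimed 1:2 ratio

    n = sum(observed.values())                   # 27
    unit = n / sum(weights.values())             # x = 27 / 3 = 9
    expected = {name: unit * w for name, w in weights.items()}   # {'Advantage': 9, 'Rush': 18}

    chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
    print(round(chi_square, 2))                  # 0.67 (reported as .66 in the text after rounding)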
Interpreting the Results

Since the calculated value of chi-square is lower than the critical value from the table for p = .05 and 1 degree of freedom, the decision is to fail to reject. That part is straightforward enough, but in this problem where the claim is that Rush is twice as popular as Advantage, what does failing to reject mean? The key is the null hypothesis, which always reflects the expectation upon which the test is based. Because this problem was set up with an fe value for one outcome that is two times the value of the other, failing to reject the null hypothesis means that there is not enough evidence to reject the claim that Rush is twice as popular as Advantage. To say it another way, although the data do not reflect exactly a 2-to-1 preference for Rush (16 is not 2 × 11), the departure from that claim is not sufficient to allow the consumer advocate to reject it.

Note that the makers of Rush maintain that their product is twice as popular as Advantage based on taste. Whether it is entirely taste or not probably cannot be verified. Perhaps it is marketing ability that prompts the students to prefer Rush, or maybe the costs of the two products differ, or maybe one comes in a more convenient size than the other. The way the data were collected made the consumer's stated preference the issue, without questions about the reasons for the preference. For whatever reason, students prefer the one product to the other by a great enough margin that it could be 2-to-1.
A Final 1 × k Problem

To solidify the grasp of the goodness-of-fit procedure, here is one more problem. In this instance a consulting company is retained by a satellite provider to determine whether, in a particular region of the country, satellite TV is three times more popular than free TV, and whether cable TV is twice as popular as free TV. The consulting company is retained to check the veracity of that claim. A random sample of 93 viewers in the region is examined and found to rely on the following for television service:
Satellite  65
Cable      16
Free       12

The same approach used in the last problem to determine the fe values produces the following:

3x (satellite) + 2x (cable) + x (free TV) = 93
6x = 93
x = 15.5

That x value makes the fe for free TV = 15.5, for cable TV = 2 × 15.5 = 31, and for satellite TV = 3 × 15.5 = 46.5.

Note that the fe values sum to 93, as they must. Do not be distracted by the fact that the fe values are not whole numbers. Although asking which type of TV service people are using will make the fo values whole, the fe numbers indicating what people are expected to use can take on any value. The calculations for this problem are in Table 12.4.
Table 12.4: Another 1 × k chi-square problem

               Satellite  Cable  Free TV
fo             65         16     12
fe             46.5       31     15.5
fo − fe        18.5       −15    −3.5
(fo − fe)²     342.25     225    12.25
(fo − fe)²/fe  7.36       7.26   .79

χ² = 7.36 + 7.26 + .79 = 15.41

With a calculated χ² = 15.41 and the table value for χ² of 5.99 when testing at p = .05 with 2 degrees of freedom, the results indicate that the null hypothesis should be rejected. The way these 93 people are distributed does not fit a chi-square distribution where cable is twice as popular and satellite is three times as popular as free television.
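The same pattern extends to any number of categories and any claimed ratio. A minimal sketch assuming the 3:2:1 claim from the text (again, the names are illustrative):

    # Goodness-of-fit test for the satellite / cable / free TV claim.
    observed = {"Satellite": 65, "Cable": 16, "Free": 12}
    weights  = {"Satellite": 3,  "Cable": 2,  "Free": 1}

    n = sum(observed.values())                       # 93 viewers
    x = n / sum(weights.values())                    # 93 / 6 = 15.5
    expected = {c: x * w for c, w in weights.items()}

    chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
    print(round(chi_square, 2))                      # 15.41, well past the 5.99 critical value (df = 2)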
Review Question C: How is the scale of the data involved in an analysis related to the power of the statistical procedure?
The difficulty with a problem such as this one is that the initial hypothesis is actually two hypotheses: Satellite is three times as popular as free television, and cable is twice as popular as free television. When the results indicate rejecting the null hypothesis, it could be because the expected ratio of satellite versus free TV customers is not supported, because the ratio of cable versus free TV customers is not supported, or because neither hypothesis is supported. Recall that this was the case with a significant F in ANOVA as well. It did not provide clear evidence for which two specific groups are significantly different. A more definitive chi-square test would require that the consultant gather new data and check those hypotheses individually.
12.3 The Chi-Square and Statistical Power

As noted at the beginning of the chapter, distribution-free tests like chi-square procedures provide great flexibility. They provide no restrictions regarding the scale of the data, there are no normality assumptions to contend with, and these procedures work quite well with small samples. In the statistical version of the "there's no such thing as a free lunch" expression, however, chi-square and most nonparametric procedures have a drawback.

Note that even though there were differences between what was observed and what was expected in the first two problems in this chapter, neither chi-square value was statistically significant. The chi-square procedures are not very sensitive to minor variations between what is seen and what is expected. Indeed, the differences must be fairly substantial to produce a significant chi-square value. Recall that power in statistical testing is the ability of a procedure to detect statistical significance. Compared to something like ANOVA, the chi-square procedures are not particularly powerful.

Much of the lack of power comes down to the scale of the data that are involved. Nominal data provide no information about what is measured except the category of the variable to which the individual belongs. For each of the energy drink consumers who prefer Rush over Advantage, all that is revealed is which energy drink is preferred. Nothing in the data indicates how much more one drink is preferred over the other. There is no ranking of preference on a scale of 1 to 5. All that is known is that, presented a choice, the consumer chose Rush.

Recall that all statistical tests are based on probabilities, and that because the outcome is therefore never a certainty, there is the constant possibility of a Type I or Type II decision error. Because the chi-square procedures are insensitive to minor variations between fo and fe, chi-square analyses are more inclined toward Type II (beta) decision errors than traditional parametric tests of significant differences. The risk is that an analysis that suggests that results are not statistically significant might be set aside if new data were gathered and the analysis run a second time.

On the bright side, Type I (alpha) decision errors are relatively uncommon. It is not likely that upon finding a result statistically significant, further testing with new data would suggest otherwise; a decision to reject the null hypothesis is unlikely to be overturned by a second analysis.
12.4 The Goodness-of-Fit Test in Excel

The procedures in the Excel Data Analysis package do not include the chi-square tests. However, because the test statistic involves a good deal of repetitive subtracting, squaring, dividing, and so on, it is not difficult to set up an Excel spreadsheet to accommodate a 1 × k problem. This can be done by organizing a spreadsheet to complete the same calculations used in Tables 12.1, 12.3, and 12.4. To illustrate, an organic vegetable grower claims that in spite of the higher price, shoppers will select organically grown spinach as often as spinach grown with the help of pesticides and chemical fertilizers. The first 30 people who buy spinach on a particular day at a grocery store are examined for whether they bought the organically grown vegetable. Results are as follows:

Organically grown: 10
Conventionally grown: 20

To test the grower's claim,

• Enter the labels organic in cell B1 and conventional in C1.
• Enter the labels fo in cell A2, fe in A3, fo − fe in A4, (fo − fe) sqd in A5, ÷fe in A6, and sum in A7.
• Enter the values 10 and 20 in cells B2 and C2 respectively.

Since the claim is that organic spinach will sell as frequently as conventionally grown spinach, the fe values are simply n ÷ 2 = 15.

• Enter 15 in both B3 and C3.
• In cell B4 enter the formula =B2-B3 and then press Enter.
• With the cursor on B4, hold the shift key down and move the cursor to C4, and then enter the command to fill right, which will be near the far right of the menu ribbon, so that the procedure in B4 is repeated in C4.
• In cell B5 enter the command =B4^2 to square the value in B4.
• Repeat the B5 procedure in C5 using the fill right command as above.
• In cell B6 enter the command =B5/B3.
• Repeat the B6 procedure in C6 using the fill right command.
• In cell B7 enter the command =SUM(B6:C6).

The last command in the above sequence produces the chi-square value for this problem, χ² = 3.33. The critical value from the table for 1 degree of freedom and p = .05 is χ² = 3.84. The results are not significantly different from the organic grower's claim; the organically grown spinach may be just as popular as the traditionally produced spinach, in spite of its higher price. Figure 12.1 is a screenshot of the spreadsheet for this problem.
Figure 12.1: Chi-square goodness-of-fit using Excel
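Outside Excel, the same goodness-of-fit test is available as a single function call in statistical software. As an illustration only (scipy.stats.chisquare is an assumption, not part of the chapter's toolkit), the spinach example could be checked like this:

    # Checking the spinach example with SciPy's goodness-of-fit test.
    from scipy.stats import chisquare

    result = chisquare(f_obs=[10, 20], f_exp=[15, 15])          # equal-preference claim
    print(round(result.statistic, 2), round(result.pvalue, 3))  # 3.33 0.068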
12.5 The Chi-Square Test of Independence

The goodness-of-fit or 1 × k chi-square procedure accommodates just one categorical (grouping) variable. In that regard, it is similar to a one-way ANOVA, which likewise involves just a single variable, although it is an interval or ratio scale rather than a nominal scale variable with the data divided into any number of categories or groups. Although basing an analysis on a single variable keeps the arithmetic simple, it consigns to error any variance that is not explained by that single variable. On the other hand, when multiple independent variables are included, as is the case in factorial ANOVA, there is less residual variance and a smaller error term. In addition, besides each variable contributing to the result, sometimes multiple variables act together, a phenomenon called a "statistical interaction" in factorial ANOVA. A similar thing can happen with chi-square procedures. Sometimes a single variable is an inadequate explanation of an outcome, and in those circumstances a second variable will act in concert with the first.

The factorial ANOVA has an approximate equivalent in one of the chi-square procedures. It is called the chi-square test of independence, or the r × k chi-square. The names for this procedure are just as informative as were those for the one-variable chi-square test. The "test of independence" alludes to the fact that what is tested is whether the two variables included in the analysis operate independently.

Key Terms: The chi-square test of independence, or r × k chi-square, is a test of the independence of two nominal scale variables.
In the language of ANOVA, the variables are analyzed to determine whether they interact. When the chi-square value is statistically significant, it indicates that the two variables do not operate independently, a result that will lead to an ancillary analysis to determine the level of their relationship.

As was the case in factorial ANOVA, both of the interacting variables in the r × k chi-square are categorical. The r × k designation refers to the way the analysis is set up. The data are organized so that the levels of one variable are indicated in the rows (r) of a table, and the other variable is represented in the columns, with each column representing a separate category (k) of that variable. Although the calculations will fit the same table that was used for the goodness-of-fit (1 × k) problems, the frequency expected (fe) values are calculated differently.
Setting up the Chi-Square Test of Independence

Key Terms: The contingency table organizes data into rows for the categories of one variable, and columns for the categories of the other in an r × k chi-square.

The rows and columns explanation just above provides the organization for a contingency table that is used in this chi-square test of independence. The way it is used can be illustrated with a problem. The human resources department for a fast food chain is considering offering an early retirement package to some of the more senior managers in an effort to reduce payroll. The department wishes to be able to predict whether a severance package with a $15,000 bonus offered to early retirees will affect retirement plans. Because of the potential costs associated with offering the bonus to dozens of senior managers, the human resources people need to have some understanding of the impact that the bonus will have on employees' decisions to retire. Among the managers, 30 senior managers are identified and randomly divided into two groups of 15 each.

• The 15 managers in the first group are asked to complete a questionnaire, to be submitted anonymously, which includes a question about whether the respondent anticipates retiring within the next three years. Among this group, two managers indicate that they intend to retire within the specified period.
• Those in the second group of 15 managers are asked whether, if a $15,000 bonus were offered to those who retire in the next three years, they would retire in that period. Of the 15, seven managers indicate that they would retire within the next three years if the bonus were offered.

Note that there are two potentially related variables involved. One is whether managers intend to retire in the coming three years. The other is whether they would retire in that time frame if a bonus were offered. These two variables can be represented in a table that looks much like what was used earlier to set up a two-way ANOVA. In addition to helping one to visualize the problem, when used in the chi-square test of independence the table helps with the task of deriving the fe values. But before worrying about the calculations, note the contingency table below. It is organized with the categories of one variable in the rows of the table and the categories of the other variable in the table columns:
Retirement Yes Retirement No
No Bonus
Bonus
Organizing the Contingency Table

This same table appears in Table 12.5 with the results of the survey filled in. There are also totals for each row, each column, and a value for all subjects together, n. This particular contingency table is a 2 × 2. Although the chi-square test of independence is limited to two variables, those two variables can each have any number of categories. There could have been five levels of the bonus, for example, offered to different groups of potential retirees: no bonus, a $5,000 bonus, a $10,000 bonus, a $15,000 bonus, and a $20,000 bonus. There might have been more than two categories of the retirement decision as well: retire in the next year, retire in two to three years, retire in four to five years, and so on. With the 2 × 2 problem, there are just four cells labeled "a" through "d." Note that the value in cell "a" is the number that represents the combination of the no-bonus group and the number in that group who indicated that they would retire. The number in cell "d" is the combination of the bonus and the number in that group who opted not to retire, and so on.

Table 12.5: The chi-square test of independence for retirement decisions and a severance bonus

A. The Contingency Table

               Will Retire  Won't Retire  Row Totals
No Bonus       a 2          b 13          15
Bonus          c 7          d 8           15
Column Totals  9            21            n = 30

B. Completing the Analysis

Statistic      a      b      c      d
fo             2      13     7      8
fe             4.5    10.5   4.5    10.5
fo − fe        −2.5   2.5    2.5    −2.5
(fo − fe)²     6.25   6.25   6.25   6.25
(fo − fe)²/fe  1.39   .60    1.39   .60

χ² = 1.39 + .60 + 1.39 + .60 = 3.98

The two sample groups each have values that sum to 15, a value reflected in the row totals on the right. The sum of the two rows must total n. Although the 2 columns together must also sum to 30, the individual columns will not necessarily each be 15, since the number opting for and the number opting against retirement from the bonus and no-bonus groups are not equal.
The fo and fe Values in the Chi-Square Test of Independence

The values in each of the four cells are the fo values that will be used to calculate the chi-square value. Note that in Part B of Table 12.5, the fo values are listed just as they were listed in the goodness-of-fit problems done earlier.

The difference between the way the chi-square values are calculated in goodness-of-fit and test of independence problems is in the way the fe values are determined. For each of the "a" through "d" cell values, fe is calculated as the total of the row in which the particular cell sits, times the total of the column of which the cell is part, divided by n. Symbolically, for each cell fe = (row ttl × col ttl)/n. This makes the fe values in the retirement problem as follows:

• Cell a: (15 × 9)/30 = 4.5
• Cell b: (15 × 21)/30 = 10.5
• Cell c: (15 × 9)/30 = 4.5
• Cell d: (15 × 21)/30 = 10.5

Once the fe values are determined, the rest of the calculations for a chi-square value are the same as they were for a goodness-of-fit test and are completed in Part B of Table 12.5. Begin by subtracting fe from fo, square the difference, and so on. The critical value of chi-square comes from the same table used for goodness-of-fit problems. The degrees of freedom for the test are determined by taking the number of rows minus 1, times the number of columns minus 1 (df = (r − 1) × (k − 1)). For this problem, df = (2 − 1) × (2 − 1) = 1.
With the calculated value χ² = 3.98 and the table value χ².05(1) = 3.84, the result is statistically significant.
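Because fe for every cell is the product of its row and column totals divided by n, the whole table of expected values is an outer product. A minimal sketch of the retirement problem (NumPy is an assumed tool here, not one used in the chapter):

    # 2 x 2 chi-square test of independence for the retirement example.
    import numpy as np

    observed = np.array([[2, 13],    # No Bonus: will retire, won't retire
                         [7, 8]])    # Bonus:    will retire, won't retire

    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    n = observed.sum()

    expected = np.outer(row_totals, col_totals) / n        # [[4.5, 10.5], [4.5, 10.5]]
    chi_square = ((observed - expected) ** 2 / expected).sum()
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1) # 1
    print(round(chi_square, 2), df)                        # 3.97 1 (3.98 in the text after rounding)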
In the context of the r × k procedure, what does a significant outcome mean? Chi-square results are statistically significant when the fe values diverge enough from the fo values that the difference between the two is unlikely to have occurred by chance. The implication is that the factor that creates the fo versus fe difference in r × k problems is the relationship between the two variables. If the retirement decision and the availability of the retirement bonus variables are unrelated, which is to say that they operate independently, there will be no significant result. The fo = fe null hypothesis for an r × k problem has the same meaning as the null hypothesis in a Pearson Correlation problem. It means that there is no relationship between the two variables. The difference is that a Pearson Correlation cannot be calculated between two nominal variables.
The Yates Correction to 2 × 2 Problems

Earlier we said that Type I decision errors are relatively uncommon with chi-square procedures. While that is generally true, the 2 × 2 problem, where each variable has two levels like the example here, may be an exception. With those problems there can be a tendency to incorrectly find statistical significance when one or more of the fe values in the problem fall below 5.0. In what is now called the "Yates correction," Yates suggested curbing this tendency by subtracting .5 from all fo − fe cell differences in any 2 × 2 problem if at least one fe value is less than 5.0. The reduced fo − fe difference makes a significant chi-square value less likely, of course, and so reduces Type I error probability.
There is not a consensus about the use of the Yates correction, and some statisticians argue that the .5 reduction is in fact an overcorrection that makes the procedure unnecessarily conservative. The decision in this book has been to not incorporate the correction. The issue is raised so that the reader will know what it is and why some analysts recommend making the correction. For more information, Howell (1992) provides a helpful discussion of the Yates correction.
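As an illustration of how much difference the correction can make (SciPy is an assumption here; the chapter does not use it), scipy.stats.chi2_contingency applies Yates' adjustment when df = 1 through its correction flag. Applied to the retirement table, the corrected value would no longer clear the 3.84 critical value:

    # Comparing the uncorrected and Yates-corrected chi-square for the 2 x 2 example.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[2, 13], [7, 8]])   # retirement example from Table 12.5

    stat_plain, p_plain, dof, expected = chi2_contingency(observed, correction=False)
    stat_yates, p_yates, _, _ = chi2_contingency(observed, correction=True)

    print(round(stat_plain, 2))  # 3.97 -- past the 3.84 critical value at p = .05
    print(round(stat_yates, 2))  # 2.54 -- no longer significant with the correction applied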
Interpreting the Chi-Square Test of Independence

The issue in both the chi-square goodness-of-fit (1 × k) and test of independence (r × k) is whether what is observed is consistent with an expected outcome. In the case of the test of independence, a significant result (rejecting the null hypothesis) indicates that the two variables are not functioning independently; they are correlated. In our example, rejecting Ho indicates that the intention to retire and the availability of the cash bonus are related. In a reference back to the difference between correlation and causation in Chapter 8, it is not clear that the bonus causes managers to make a retirement decision, but for whatever reason, there are significantly more intents to retire when the bonus is part of the equation.

At this point the focus turns to the nature of that relationship. Since it is clear from comparing the calculated chi-square value to the table value that there is a correlation, the question now is of the strength of the relationship between the two variables.
Phi Coefficient and Cramer's V

To this point the analysis was similar to ANOVA or t-tests and fell under the general umbrella of the hypothesis of difference. Having determined a significant difference between fo and fe, the focus now shifts to a hypothesis of association issue.

The correlation procedure for interval/ratio variables that meet normality requirements was Pearson's r. For a correlation of ordinal scale data, or for interval/ratio data that fail to satisfy normality requirements, Spearman's rho was the answer. The need here is for a correlation procedure based on nominal data, and there are several from which to choose. Pearson, who developed chi-square, also developed a correlation procedure for nominal variables called the coefficient of contingency, C. Because it produces quite a conservative correlation value, it is not as widely used as some of the alternatives. The upper bound for most correlation procedures is 1.0. The coefficient of contingency cannot reach that value.

Two of the other correlation procedures for nominal variables are the phi coefficient, φ (φ is the Greek equivalent of f), and Cramer's V, which are both explained here. The contingency coefficient, phi coefficient, and Cramer's V are all based directly on the chi-square value, which makes them easy to calculate once the chi-square value has been determined.

Key Terms: Phi coefficient and Cramer's V are both correlation procedures for nominal data used after a significant r × k chi-square result.
In a statistically significant 2 × 2, 2 × 3, or 3 × 2 chi-square problem, the phi coefficient will be the appropriate follow-up correlation procedure. The formula is the following:

Formula 12.2    φ = √(χ²/n)

Where
χ² = the value calculated in the r × k procedure
n = the total number of subjects

For the decision-to-retire and the cash bonus problem, χ² = 3.98, and n = 30. Therefore, to solve for φ:

φ = √(χ²/n) = √(3.98/30) = √.13
φ = .36

The retirement/cash bonus relationship is φ = .36. Although φ cannot have a negative value (the way χ² is calculated and the square root function in the φ formula do away with that possibility), it is interpreted like any other correlation statistic. A 0 correlation indicates no relationship; a 1.0 correlation indicates a perfect correlation. The correlation here of φ = .36 is "modest," or perhaps "low," by correlation standards.

When either of the two variables in the analysis has two levels, V = φ. This is because the formula for Cramer's V is

Formula 12.3    V = √(φ²/(fewer of rows or columns − 1))

Where
φ² = the square of the phi coefficient
rows or columns = the number of levels of the two variables

If there are just 2 levels of either variable, the divisor is 1 and √.36² = .36, so V = φ. If the fewest number of rows or columns is three, this changes, of course, and the correct correlation value to calculate is Cramer's V.

Furthermore, if the chi-square value is statistically significant, the correlation coefficient will be significant at the same level; there are no separate significance tests necessary for C, V, or φ. If the chi-square value is not statistically significant (if the decision in the initial chi-square analysis is to fail to reject), there is no point in calculating a correlation value since failing to reject Ho: fo = fe indicates that the variables are independent.
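Both follow-up statistics are simple functions of the chi-square value, so they are easy to wrap in small helpers. A minimal sketch (the function names are mine, not the book's):

    # Helper functions for Formulas 12.2 and 12.3.
    from math import sqrt

    def phi_coefficient(chi_square, n):
        # Formula 12.2: phi = sqrt(chi-square / n)
        return sqrt(chi_square / n)

    def cramers_v(chi_square, n, n_rows, n_cols):
        # Formula 12.3: V = sqrt(phi squared / (fewer of rows or columns - 1))
        phi = phi_coefficient(chi_square, n)
        return sqrt(phi ** 2 / (min(n_rows, n_cols) - 1))

    # Retirement example: chi-square = 3.98, n = 30, a 2 x 2 table, so V equals phi.
    print(round(phi_coefficient(3.98, 30), 2))   # 0.36
    print(round(cramers_v(3.98, 30, 2, 2), 2))   # 0.36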
A 3 × 3 Test of Independence Problem

A property management company in a large city manages the landscaping and rent collection at several apartment complexes. Some of the complexes are quite large, with more than 50 units; some are very small, with fewer than 10 units; and the others are classified as medium-sized. Collecting and crediting rent payments each month is very time-consuming. A bookkeeper at the management company guesses that in the smaller complexes there is a more intimate relationship between manager and tenant than in the larger complexes, and rent difficulties are correspondingly lower as a result. To test that assumption, the bookkeeper examines data from 100 apartments located in each of small, medium, and large complexes and determines the number of rent payments that are on time, that are within one week late, and that are more than one week late. The data are below:

Rent Submission

               On-time  Within 1 week  >1 week late  Row totals
Small          a 65     b 30           c 5           100
Medium         d 55     e 35           f 10          100
Large          g 45     h 25           i 30          100
Column totals  165      90             45            300
The first question is whether the two variables of apartment complex size and rent lateness are independent. If the chi-square value is statistically significant, the decision will be that they are not independent, and that will prompt a second question about the strength of their relationship. The calculations for this chi-square test of independence are in Table 12.6.

Table 12.6: A chi-square test of independence for the size of the apartment complex and the lateness of the rent

               a     b    c     d    e    f     g     h    i
fo             65    30   5     55   35   10    45    25   30
fe             55    30   15    55   30   15    55    30   15
fo − fe        10    0    −10   0    5    −5    −10   −5   15
(fo − fe)²     100   0    100   0    25   25    100   25   225
(fo − fe)²/fe  1.82  0    6.67  0    .83  1.67  1.82  .83  15

Σ = χ² = 28.64

• The calculated χ² = 28.64. Since this is a 3 × 3 problem, df = (3 − 1) × (3 − 1) = 4.
• The critical value of chi-square is χ².05(4) = 9.49.
The result is statistically significant. How promptly renters pay their monthly rent is related to the size of the complex in which they live.

Determining the strength of the correlation calls for Cramer's V, since both variables have more than two levels.

V = √(φ²/(fewer of rows or columns − 1))

But since V requires the calculation first of the phi coefficient, the answer begins there.

φ = √(χ²/n)
  = √(28.64/300)
  = .31

With a value for phi, V can be calculated.

V = √(φ²/(fewer of rows or columns − 1))
  = √(.31²/2)
  = √.05
  = .22

The relationship between the size of the rental complex and how promptly rent is paid is V = .22. The correlation is not particularly robust, but it is statistically significant, since the value of chi-square upon which it is based is significant. Based on the analysis, those at the property management company are in a position to alter procedures in some way that responds to the relationship between rent payment and complex size. Perhaps it will make a difference if rent collections can be made online. Maybe an effort to improve the social relationship between the apartment manager and the tenants, particularly in large apartment complexes, will prompt rent payments to be made in a more timely fashion.
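For completeness, here is a sketch of the whole 3 × 3 analysis in code. SciPy and NumPy are illustrative choices; the chapter works the problem by hand:

    # 3 x 3 rent-lateness problem: test of independence followed by Cramer's V.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[65, 30,  5],    # Small:  on-time, within 1 week, >1 week late
                         [55, 35, 10],    # Medium
                         [45, 25, 30]])   # Large

    chi_square, p_value, df, expected = chi2_contingency(observed)
    n = observed.sum()
    phi = np.sqrt(chi_square / n)
    cramers_v = np.sqrt(phi ** 2 / (min(observed.shape) - 1))

    print(round(chi_square, 2), df)      # 28.64 4 -- well past the 9.49 critical value
    print(round(cramers_v, 2))           # 0.22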
Review Question D: What does phi coefficient measure?
Chapter Summary

The chi-square tests assume that Disraeli was unnecessarily skeptical. In fact, an informed expectation of an outcome should provide a fairly good indicator of what will actually occur, which is the understanding upon which both chi-square tests are based. The chi-square tests answer many of the same questions that earlier tests in this book answered. The difference is that the chi-square tests are based on nominal data. Pearson developed tests for data which indicate nothing more than the count of the number that occur in a particular category. Consequently, the analysis is based on differences between the frequency observed (fo) and the frequency expected (fe) (Objective 1).

The goodness-of-fit, or 1 × k, procedure analyzes whether the proportions occurring in the multiple categories of a single variable are consistent with what is expected based on an initial hypothesis. The chi-square test of independence, also called the r × k procedure, straddles the boundary between tests of the hypothesis of difference and those related to the hypothesis of association. The initial analysis establishes whether two variables function independently. Like the goodness-of-fit test, this part of the analysis is based on the magnitude of the fo − fe difference. A significant value of chi-square means rejecting the probability that the variables are independent (Objective 2). At that point the question is about the strength of the relationship between the two variables. That correlation can be gauged by one of several correlation procedures designed for nominal data. Those covered in this chapter include the phi coefficient and Cramer's V (Objective 3).

It should be noted here that this book represents only a brief introduction to the analysis procedures that can be useful for managers. The list of statistical procedures covered in 12 chapters is far from exhaustive, but it is a valuable beginning. The different tests explained in Chapters 1–12 are representative of those that are appropriate to many kinds of business analysis. Figure 12.2 is a flowchart-like guide to which test will answer the manager's question. It is provided here as a summary and overview of the preceding chapters. As the decision tree is followed from the top down, note the issues are:

• Is the question about differences or associations?
• Are the data involved nominal (categorical), ordinal, or interval/ratio?
• How many groups are involved?
• Are the groups independent?
Figure 12.2: Finding the appropriate test
Answering each question above in turn guides one to the test tailored to the particular problem. If the question is about differences between groups (1), the measures involved are interval or ratio scale (2), there are 4 groups involved (3) that are independent (4), one of the ANOVA tests will answer the question.

No statistics book can provide comprehensive coverage of every statistical test, and early in the development of this book a decision was made to present tests neither for significant differences nor association for ordinal data. They are presented in Figure 12.2 to round it out, but they are not found elsewhere in the book. Should the reader wish to pursue the Mann-Whitney, Kruskal-Wallis, Wilcoxon, or Friedman's ANOVA tests, Tanner (2011) is a useful source.
[Figure 12.2 is a decision tree. Reading from the top down, it branches on whether the question is about differences or associations, on the data scale (nominal, ordinal, or interval/ratio), on the number of groups (1, 2, or 2+), and on whether the groups are independent or related. The branches end at the tests named in the figure: chi-square goodness of fit, chi-square test of independence, Mann-Whitney U, Wilcoxon T, Kruskal-Wallis H, Friedman's ANOVA, independent t, before/after t, analysis of variance, phi coefficient, Spearman's rho, Pearson Correlation, point-biserial correlation, multiple correlation, and semi-partial correlation.]
When the procedures encountered in this book are used by managers with an understanding of the procedures' purpose and an appreciation for their requirements, they offer the promise that the related decisions will be more reasonable and better informed, and can equip managers with the tools they need to be most effective. Finally, if you wish to broaden your horizons in the future, virtually all of the more advanced procedures that are beyond the scope of this book are based on the concepts represented here. Your authors wish you the best of luck. The first author can be reached with comments and questions at [email protected]
Answers to Review Questions
A. The chi-square procedures require data of only nominal
scale.
B. The null hypothesis is that the frequency observed equals the
frequency expected,
fo = fe, meaning that there is not enough evidence to reject the
possibility that
what emerged in the analysis was consistent with the prediction.
C. Nominal data provide very little measurement information,
and the chi-square
procedures are a case in point. The analyses are based on
nothing more than the
frequency with which data occur in the various categories.
These data yield noth-
ing about how much of a measured quality is present, for
example. Because none
of the data nuances present with ordinal and interval/ratio data
are gauged,
differences must be substantial to prompt rejecting the null
hypothesis. These are
not particularly powerful procedures.
D. Phi coefficient measures the strength of the relationship
between two nominal
variables when at least one of them has only two categories.
Chapter Formulas
Formula 12.1: χ² = Σ (fo − fe)² / fe
This is the formula for the chi-square test statistic. The same formula is used both for the "goodness of fit" test and for the r × k chi-square test of independence.
Formula 12.2: φ = √(χ²/n)
The phi coefficient is the measure of the correlation for two nominal variables when the chi-square test of independence indicates a significant result, and when one of the variables involved has two or fewer categories.
Formula 12.3: V = √(φ² / (smaller of rows or columns − 1))
When the test of independence is significant and both variables have at least three categories, Cramer's V is calculated rather than the phi coefficient. V requires phi, however, which must be calculated first.
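For readers who want to check hand calculations, the three formulas above translate almost directly into code. The short Python sketch below is only an illustration: the category counts are hypothetical, and the p-value lookup assumes the SciPy library is available.

import math
from scipy import stats

def chi_square(observed, expected):
    # Formula 12.1: chi-square is the sum of (fo - fe)^2 / fe across the categories
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def phi_coefficient(chi2, n):
    # Formula 12.2: phi = the square root of chi-square divided by n
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    # Formula 12.3: V = the square root of phi^2 / (smaller of rows or columns - 1)
    return math.sqrt(phi_coefficient(chi2, n) ** 2 / (min(rows, cols) - 1))

# Hypothetical 1 x k (goodness-of-fit) problem: 75 observations, equal fe values
fo = [30, 20, 25]
fe = [25, 25, 25]
x2 = chi_square(fo, fe)                      # 2.0
p_value = stats.chi2.sf(x2, df=len(fo) - 1)  # compare this to .05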
Management Application Exercises
Unless otherwise stated, use p = .05 in all your answers.
1. Three new movies, each with the potential to be a
blockbuster, are released on the
same day. Reporters from the local television station are
interested to see whether
one appears to have caught the public attention more than the
others. The reporter
goes to the local multiplex and asks those waiting to buy tickets
which movie they
intend to see. On the basis of results from 52 people, are there
significant differ-
ences in movie preferences? The data are as follows:
Fantasy Haven: 22
Night of Terror: 18
Fists of Glory: 12
2. Data from behavioral psychology indicate that administering
a tangible reward to
subjects will prompt response levels twice as frequent as from
subjects who receive
a nontangible reward. To test this notion in a business context,
two sales seminars
are compared. In one seminar, sales representatives are tossed a
piece of candy
every time they ask a relevant question or provide an insightful
comment. In the
other seminar, only verbal reinforcement is provided. At the end
of the seminars,
data are as follows:
verbal reinforcement seminar—17 questions/comments
tangible reward seminar—27 questions/comments
a. What is the fe value for each group?
b. Are the results consistent with the expectation?
3. A Department of Labor study of education and employment
found that unem-
ployed full-time students take twice as many units as students
who are full-time
employees and 1.5 times more units than students who are part-
time employees.
a. If the fe for the unemployed student is 16 units, what are the
fe values for
students who work part time and full time?
b. If the student who is unemployed takes 16 units, the student
who is em-
ployed part time takes 14 units, and the full-time employee
takes 12 units,
is the expectation supported?
4. In a management trainee program for a multinational
corporation, trainees are
expected to learn a foreign language. Besides classes at a
language training insti-
tute, tutors are available. Experience suggests that those
learning Japanese seek
the help of tutors twice as frequently as those who are learning
Spanish. Among 20
students of Japanese, 16 ask for the help of tutors. Among 30
students of Spanish, 8
ask for tutors’ help.
a. What are the fe values?
b. Are results consistent with prior experience?
c. In this instance, what does HA specify?
5. A marketing analyst is examining the relationship between
shoppers’ ethnicity
and the purchase of a certain grocery item. From ethnic group A,
2 of 12 people
purchased the item. From ethnic group B, 5 of 10 people
purchased the item. From
ethnic group C, 4 of 14 people purchased the item.
a. Are the shoppers’ ethnicities and the tendency to purchase
this item inde-
pendent?
b. If not, what is the correlation?
6. During the summer months, when electricity usage is high,
the power company
appeals to customers to reduce consumption by 10% as a public
service to avoid
blackouts. An alternative is to offer rebates to customers who
reduce usage by
10% compared to the same month the previous year. Among 50
randomly selected
customers just asked to reduce electricity use, 14 reduce their
use by 10% or more.
Among 50 randomly selected customers offered rebates, 25
reduce their electricity
use by 10%. Are the differences between the public service
appeal and the rebates
statistically significant?
7. A number of nonprofit groups use fireworks sales as the
major fundraiser in the
days before the 4th of July. Some of the nonprofits are service
groups such as the
Veterans of Foreign Wars. Others are intended for support of
groups like the cheer-
leaders from the local high school. The questions are whether
those two groups
attract different numbers of customers, and whether the gender
of the customer is
a factor. Among 20 men who bought fireworks during a
particular 2-hour period,
14 purchased from service organizations, the other 6 purchased
from non-service
groups. Of the 18 women who purchased in the same period, 8
bought from ser-
vice organizations, and 10 from non-service groups. Is the
gender of the purchaser
related to the group from which the purchase is made?
8. A corporate CEO is interested in whether 20 management
trainees’ possession of
a graduate degree (yes/no) is related to their promotion within
the first five years
(yes/no). The χ² value is 8.450.
a. What does the x2 value indicate about possession of a
graduate degree and
promotion?
b. What is the value of φ?
Key Terms
• The goodness-of-fit, or 1 × k, is a chi-square test for
significant differences between
the frequency observed and the frequency expected in the
categories of one nominal-
scale variable.
• The chi-square test of independence, or the r × k chi-square, is
a test of whether
two nominal scale variables operate independently. The test
statistic is the same as
for the goodness-of-fit chi-square. A statistically significant
result indicates that the
two are not independent and a correlation procedure follows.
• The data in a chi-square test of independence are often
arranged in a contingency
table with the rows indicating the categories of one variable and
the columns the
categories of the other. With the count of the data entered into
the resulting cells, the
table provides a visual indicator of the way the variables
operate together.
• When a chi-square test of independence is statistically
significant, it is followed by
a measure of the strength of the correlation between the
variables. This is usually a
phi coefficient if there are only two categories of one of the
variables, or Cramer’s
V when there are more than two categories.
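As a rough software counterpart to the contingency-table procedure described above, the sketch below runs SciPy's chi2_contingency on a made-up 2 × 3 table and then applies Formulas 12.2 and 12.3; the counts are hypothetical and only illustrate the mechanics.

import math
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows are the two categories of one variable,
# columns are the three categories of the other
table = np.array([[14, 8, 10],
                  [6, 12, 4]])

# chi2_contingency derives the fe values from the row and column totals
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)

n = table.sum()
phi = math.sqrt(chi2 / n)                         # Formula 12.2
v = math.sqrt(phi ** 2 / (min(table.shape) - 1))  # Formula 12.3, Cramer's V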
BU
Due: 09/01/2015
CE 371
Numerical Methods in Civil Engineering
PROJECT
Fall’14
Page 1 of 1
two infinite cylindrical surfaces of radii r1 and r2, as shown in the figure below.
Initially (at
ywhere (inner circle is insulated);
The inner
The dimensionless heat conduction equation in radial coordinates is
∂²T/∂r² + (1/r) ∂T/∂r = ∂T/∂t
To reduce the situation to a characteristic value problem, we must render the boundary conditions homogeneous, that is, of the form 0 at the boundaries.
The differential equation becomes
∂²T/∂r² + (1/r) ∂T/∂r = ∂T/∂t
subject to:
ii) the boundary conditions: 0
r
a) Simulate the equation using the explicit finite difference scheme with Δt = 0.25, 0.5, 1.0 (a rough sketch of an explicit scheme appears after the note below).
b) Simulate the equation using the implicit finite difference scheme with Δt = 0.25, 0.5, 1.0.
c) Plot your findings and compare the results for a and b.
r1 = 1 m
r2 = 10 m
NOTE: Your project reports will be evaluated for both
correctness and accuracy of
your results and for clarity and neatness of your report. Be sure
to organize your
plots neatly.
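Because parts of the problem statement above did not survive extraction, the following Python sketch is only a rough illustration of what an explicit (forward-time, centered-space) finite difference simulation of the radial heat equation might look like. The grid spacing, the initial condition (T = 1 everywhere), and the boundary conditions (insulated at r1, T held at 0 at r2) are assumptions made for the sketch, not the assignment's actual conditions.

import numpy as np

# Explicit (FTCS) sketch for dT/dt = d2T/dr2 + (1/r) dT/dr on r1 <= r <= r2.
# Assumed conditions: T = 1 initially, dT/dr = 0 at r1 (insulated), T = 0 at r2.
r1, r2 = 1.0, 10.0
nr = 91
r = np.linspace(r1, r2, nr)
dr = r[1] - r[0]

def explicit_step(T, dt):
    # Advance one forward-Euler time step of the finite difference scheme
    Tn = T.copy()
    i = np.arange(1, nr - 1)
    d2T = (T[i + 1] - 2 * T[i] + T[i - 1]) / dr ** 2   # second derivative in r
    d1T = (T[i + 1] - T[i - 1]) / (2 * dr * r[i])      # (1/r) times first derivative
    Tn[i] = T[i] + dt * (d2T + d1T)
    Tn[0] = Tn[1]     # insulated inner boundary
    Tn[-1] = 0.0      # fixed outer boundary
    return Tn

T = np.ones(nr)
dt = 0.4 * dr ** 2    # the explicit scheme is only stable for dt on the order of dr^2 / 2
for _ in range(5000):
    T = explicit_step(T, dt)

With a grid this fine, the explicit scheme would generally be unstable at the Δt values listed in parts (a) and (b); the implicit scheme has no such restriction, which is presumably the point of the comparison asked for in part (c).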
11
Confidence Intervals
Learning Objectives
After reading this chapter, you should be able to:
• Distinguish between point and interval-estimates of values.
• Calculate confidence intervals for t-tests and regression
solutions.
• Explain the factors that affect the width of a confidence
interval.
iStockphoto/Thinkstock
Chapter Outline
11.1 A Confidence Interval for a One-Sample t-Test
Confidence Intervals and the Significance of the Test
The Width of the Interval
11.2 A Confidence Interval for an Independent Samples t-Test
An Independent Samples t-Test Example
Calculating the Confidence Interval of the Difference
11.3 The Confidence Interval of the Prediction
The Standard Error of the Estimate
Calculating the Confidence Interval of the Prediction
Regarding the Width of the Interval
The Excel Confidence Intervals for a Regression
Solution
Chapter Summary
Introduction
Early in the book there was a distinction drawn between the
statistics that describe sample data and the parameter values
that describe the characteristics of populations.
The relationship between statistics and parameters is important.
The value in calculat-
ing many statistics is connected to how well they represent one
of the possible values
of a related parameter. In fact a good deal of analysis and
decision making is based on
the understanding that when samples are large and randomly
selected, the statistics that
describe the sample will also provide useful indicators of what
the parameter values will
likely be. In the progression from one type of analysis to the
next in this book, there have
been a number of instances in which a statistic was calculated
and used in a procedure
because the parameter value was not available. In each
situation, the statistic that was cal-
culated from the sample data was employed as an estimate of
what the related parameter
value would likely be for the entire population. For example,
• The sample mean, M, is one of the possible values of the
population mean, μ.
• The standard deviation, s, can serve as an estimate of σ.
• The sample standard error of the mean (SEM) was used to complete the one-sample t-test in the place of the population standard error of the mean, σM, which the z-test requires.
• The standard error of the difference, SEd, in the independent samples t-test substituted for the standard deviation of the differences σM1−M2 between the
means of all possible pairs of samples in a distribution of
difference scores.
• Ordinary least squares regression produces a predicted value
of the criterion
variable (y’) when the actual value of that variable, y, is
unknown.
In each case, the statistic is a particular, discrete value that
takes the place of the related
parameter. The value of the statistic is based on the
understanding that it is the best estimate
available from whatever data are at hand for the value of the
more-difficult-to-determine
parameter. Usually a relatively large sample that is created by
randomly selecting the indi-
viduals who make it up will reflect the essential characteristics
of the population from which
the sample was drawn. But “usually” means that sometimes the
sample does not accurately
represent the population, and the difficulty is that at the time it
may not be clear whether the
sample is representative.
Many of the statistical procedures used to this point involve
calculating a value that indi-
cates the degree of difference or association between groups and
then determining whether
that value is statistically significant. The calculated values
represent what are called
point estimates of outcomes; they are discrete numbers that are
used to determine when
a difference or a relationship reflected in sample data is likely
to have occurred by chance.
The difficulty is that even when a value is based on the most
careful data collection proce-
dures, there is always a risk of sampling error. The
point estimate by itself provides no indicator of its
accuracy. It is difficult to know how much confidence
to have in results that are based on these values.
One way to address the limitations of point esti-
mates is to move away from the focus on discrete
values and rely instead on what are called interval estimates of
the relevant values. For
example, rather than asking whether the sample mean provides
an accurate estimate of
the mean of the population from which the sample was drawn,
the question becomes,
“What range of values is likely to capture the true value of the
population mean from
which a significantly different sample was drawn?” In other
words, instead of relying
on point estimates to estimate population parameters, a
confidence interval provides,
with a specified level of probability, a range of values
within which the estimated population parameter is
likely to fall. Relying on a range of values to capture
the value of interest rather than trying to pinpoint it
with a discrete value is why this approach to statis-
tical analysis makes reference to “confidence inter-
vals.” The emphasis is on the use of “interval esti-
mates” rather than on “point estimates.”
It isn’t uncommon to see both point and interval esti-
mates used in the same analysis. If a statistical test
produces a statistically significant result, the analysis that
provided a point estimate of
the value of the population parameter is often followed by a
confidence interval for that
population parameter. That way one can know (a) what the best
estimate of the value is,
and (b) how much variation should be allowed around that value
in order to have a rea-
sonable chance of capturing it. Different statistical procedures
involve different kinds of
confidence intervals.
11.1 A Confidence Interval for a One-Sample t-Test
Suppose that the market development team associated with an
electric power plant located
in Fresno County, California, wishes to estimate average
monthly electricity usage. Cal-
culating the mean (M) use of electricity among a randomly
selected sample of Fresno
Key Terms: Point estimates
are discrete values that are cal-
culated for unknown parameter
values.
Key Terms: Interval esti-
mates are calculated ranges
within which an unknown
parameter value is likely to
occur. Because the interval is
established based on a specified
level of confidence, it is also
called a confidence interval.
residents will probably provide a reasonable estimate of mean
use among all members of
the county (μ), so long as the sample is of an adequate size and
is selected in a way that
minimizes sampling error. However, what if the goal is to
compare domestic use in Fresno
County to domestic use in the country as a whole?
As an example, suppose that the mean electricity bill for
domestic users in the country
as a whole is $230.835 per month. For Fresno County, the
monthly costs for 31 ran-
domly selected residences average $245, with a standard
deviation of 37.555. Since
SEM = s/√n, SEM = 37.555/√31 = 6.745. Recall from Chapter 4 that t can be calculated as follows:
t = (M − μM)/SEM
Remembering that μM and μ have the same value,
t = (245 − 230.835)/6.745
t = 2.100
If the probability of a type I error is set at p = .05, then the critical value of t for df = 30 (n − 1) is 2.042 (Table 4.1). Average electricity use by the
people in Fresno County is sig-
nificantly different than it is for the country as a whole. For a
one-sample t-test, a statisti-
cally significant result indicates that the mean of the population
from which the sample
was drawn (Fresno) is different from the population mean to
which it was compared
(United States); to say it differently, it indicates that there are
two different populations
involved—one is the population from which the sample was
drawn (Fresno), and the
other is the population to which the sample was compared
(United States). But is
the sample mean (M) that prompts a significant t value really an
accurate estimate of
the population from which the sample was drawn? In the
electricity use problem, does
M provide a reasonably good estimate of electricity use in the
population of all electric-
ity users in Fresno County? How heavily can one afford to rely
on the “M as a good rep-
resentation of m” assumption? Rather than calculat-
ing the sample mean (M) and relying on sample size
and random selection to make the case that M 5 m,
the confidence interval of the population mean
provides a range of values, thus the reference to an
interval, within which mM, the mean of the popu-
lation to which the sample does belong, will occur
with a specified level of probability. The confidence
interval for a one-sample t is calculated with this
formula:
Key Terms: The confidence
interval of the popula-
tion mean is a range within
which the value of an unknown
population mean has a specified
probability of occurring.
Formula 11.1: C.I..95 = ±t(SEM) + M
Where
C.I..95 = a .95 confidence interval, or a range of values within which the probability is p = .95 that the true value of the population mean, μM, will be included.
t = the critical value of t for the degrees of freedom associated with the t-test. The ± symbol indicates that the critical value of t from the table is included twice, once as a positive value and a second time as a negative value.
SEM = the calculated value of the standard error of the mean.
M = the sample mean.
As is the case with any type of statistical testing, when
calculating confidence intervals the
analyst deals in probabilities rather than certainties. The range
of values that is a .95 con-
fidence interval for a one-sample t-test will capture the mean of
the population, which is
represented by the sample 19 times out of 20. The other side of
that coin for a .95 confi-
dence interval is that based on averages, 1 time in 20 the
parameter value that is sought
will occur outside the interval.
The level of probability for the confidence interval is
determined by whatever the prob-
ability level was for the original test of statistical significance.
Since most statistical testing
is conducted at p = .05, that is also the most common standard for the confidence interval.
The confidence interval calculated after a one-sample t that is statistically significant at
p = .05 must produce a range of values that will miss the true value of the population mean no more than 5 times in 100.
However, confidence intervals are stated in terms of the probability of capturing the particular value rather than the probability of missing that value. So instead of a p = .05 confidence
interval, it is a p = .95 confidence interval. If the t-test had been conducted at p = .01, a .99 confidence interval would be
the appropriate interval estimate to calculate, and so on.
To calculate the confidence interval for the one-sample t
procedure completed above, the
process is as follows:
C.I..95 = ±t(SEM) + M
C.I..95 = ±2.042(6.745) + 245
C.I..95 = ±13.773 + 245
C.I..95 = 258.773, 231.227
Review Question A:
What is the probabil-
ity that the true value
will be outside a .95
confidence interval?
The result is interpreted this way: With .95 confidence, the
mean monthly cost of electric-
ity in the population to which the Fresno County people belong
is somewhere between
$231.23 and $258.77.
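A quick way to reproduce Formula 11.1 in software is sketched below, assuming the SciPy library is available; the numbers are the Fresno figures from the example.

import math
from scipy import stats

M, s, n, mu = 245.0, 37.555, 31, 230.835   # sample mean, s, n, and the comparison mean
SEM = s / math.sqrt(n)                     # standard error of the mean, about 6.745
t_crit = stats.t.ppf(0.975, df=n - 1)      # two-tailed critical t at p = .05, about 2.042

# Formula 11.1: C.I..95 = plus or minus t(SEM) + M
lower, upper = M - t_crit * SEM, M + t_crit * SEM   # about 231.23 and 258.77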
The initial question that prompted the t-test could be framed
this way: Is monthly electric-
ity usage in Fresno County consistent with electricity usage for
the same month nation-
wide? Because the t value is statistically significant, the answer
is “no.” The statistically
significant t indicates that the sample probably belongs to a
population other than the one
to which it was compared. Maybe it is summer, and with
temperatures higher in Cali-
fornia than elsewhere, California consumers have higher
electricity usage than what is
characteristic of the country as a whole. For whatever reason,
the t-test indicates that the
sample belongs to a different population than the population
that is the country.
The sample mean used in the t-test is a point estimate of the
mean of a population. If the
t-test result is not statistically significant, the conclusion is that
the sample mean is one of
the many sample means making up the population of sample means.
However, the fact that the
t-test result is statistically significant indicates that the sample
mean is a point estimate of
the value of some population mean different from what was
indicated for the country as a
whole. What the point estimate does not indicate, however, is
how well the sample mean
estimates the value of the mean of the population to which the
sample belongs. The confi-
dence interval responds to this absence by providing a way to
calculate a range of values
within which the population mean will probably occur. The
“probably” is the reminder
that there is never a certainty of including the value. In this
case, there is a 95% probability
that the mean of the population represented by the sample is
somewhere between $231.23
and $258.77.
Note that the range of values in the confidence interval does not
include the original pop-
ulation mean ($230.835) from the initial t-test. When a sample
mean in a one-sample t-test
is significantly higher than the population mean, the sample
mean (M) is probably not
one of the possible values of the population mean, a population
with μM = 230.835 in this case. Because the sample mean was significantly higher than
μM, the mean of the population to which the sample probably does belong will have a value
higher than 230.835 in the t-test. So the values in the range that is the confidence
interval are all beyond the original value of μM. If a significant t-test involved a sample mean
less than μM, the confidence interval would include a range with values all lower than μM.
Confidence Intervals and the Significance of the Test
Confidence intervals are only calculated for significant results.
If the value of t is not sta-
tistically significant, the confidence interval will include a
range of values that includes
the original value of μM. This happens because a non-significant t-test result is interpreted
to mean that the value of μM from the test is one of the values
that could be the mean for
the population to which the sample was compared. An example
will clarify this.
Suppose sales of frozen yogurt for a franchise average $1,125
per day. For a particularly
cold week, the daily sales were as follows:
$974, $1,256, $1,170, $842, $875, $1,056, $1,145
Are sales during that week significantly different from average
daily sales of $1,125? Ver-
ify that:
M = 1045.429
s = 155.679
SEM = s/√n = 155.679/√7 = 58.841
t = (M − μM)/SEM = (1045.429 − 1125.0)/58.841 = −1.352
With t.05(6) = 2.447, the result is not significant.
The sample mean from that non-significant result, with the
critical value of t and the esti-
mated standard error of the mean, produce this .95 confidence
interval:
C.I..95 = ±t(SEM) + M = ±2.447(58.841) + 1045.429 = 1189.413, 901.445
The t-test was not significant, and the resulting confidence
interval therefore includes the
original population mean ($1,125) as one of the values in the
interval. Logically, if the point
estimate (M) is not significantly different from the population
mean (μM), then maybe the
mean of the population represented by the sample does have the
same value as the speci-
fied population mean. At least it is a possibility that cannot be
rejected.
The Width of the Interval
The narrower the confidence interval, the more precise the estimate
the interval provides for
the value of the unknown population mean. Sometimes
confidence intervals are so wide
that they seem not to be very helpful as indicators of the
unknown value, so it makes sense
to analyze the factors that affect the width of the interval. There
are several factors, includ-
ing the level of probability at which the interval is calculated.
For the electricity problem,
the .95 confidence interval stretched from $231.227 to
$258.773, around the $245 sample
mean. If the market development team wanted a greater level of
certainty about capturing
the value of the population mean to which the sample belongs, a
C.I..99 could be used, but
note what happens as a result. First, verify that the critical t
value for p = .01 and df = 30 is
2.750. Now the calculated t value of 2.100 is not statistically
significant. Second, to calculate
the .99 confidence interval:
C.I..99 = ±t(SEM) + M
C.I..99 = ±2.750(6.745) + 245
C.I..99 = ±18.549 + 245
C.I..99 = 263.549, 226.451
The price for a greater certainty of capturing the true value of
the population mean
(p = .99 rather than p = .95) represented by the sample is a
wider confidence interval and
less precision. Now the market development department is 99%
certain that the confi-
dence interval includes the population mean, but the interval
estimate is much broader
and probably less helpful. Note that the interval now includes
the population mean of
$230.835. Precision and certainty push the confidence interval
in opposite directions. One
is improved only at the expense of reducing the other.
Just as the confidence interval widened when a .99 confidence interval was used, it could
have been narrowed by calculating a .90 C.I. instead (although the particular table
in this book does not provide the critical values for significance testing at p = .1). Note
that with a C.I..90 the probability of missing the true value of
the population mean to which
the sample belongs is 1 time in 10. As the level of probability is
relaxed, the critical value
of t diminishes accordingly, which shrinks the C.I.
The other element in the width of the confidence interval is the
standard error of the mean,
SEM. Since SEM = s/√n, the value of the standard error of the
mean can be reduced by
either increasing the sample size (larger n, larger divisor,
smaller resulting value), finding
a way to reduce the variability in the scores so that the standard
deviation is smaller, or
both. As it turns out, increasing the sample size usually will
serve both purposes. Besides
the larger n value, it also usually decreases the s value. Small
samples tend to be platykur-
tic; they ordinarily have relatively large standard deviations
given their ranges. As sample
sizes grow, particularly randomly selected samples, overall
variability tends to decrease.
This is consistent with the central tendency characteristics of
normal distributions, as more
of the scores selected will tend to occur in the middle of the
distribution near the mean.
Since the standard deviation measures how much individual
values tend to vary from the
mean of the group, increasing sample size usually shrinks the
standard deviation value.
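The trade-offs described in this section can be seen by recomputing the width of the interval, 2 × t(SEM), at different confidence levels and sample sizes. The sketch below reuses the electricity example's standard deviation; the larger sample size is hypothetical.

import math
from scipy import stats

def ci_width(s, n, confidence):
    # Width of a one-sample confidence interval: 2 times the critical t times SEM
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return 2 * t_crit * (s / math.sqrt(n))

print(ci_width(37.555, 31, 0.95))   # about 27.5: the .95 interval from the example
print(ci_width(37.555, 31, 0.99))   # about 37.1: more certainty, a wider interval
print(ci_width(37.555, 124, 0.95))  # about 13.4: quadrupling n roughly halves the width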
11.2 A Confidence Interval for an Independent Samples t-Test
A statistically significant independent samples t-test indicates
that the two samples represented by M1 and M2 probably did
not come from the same population; they
came from populations with different means. Stated
symbolically, μ1 − μ2 ≠ 0. Consistent with the lan-
guage used earlier, the two sample means in the
case of the independent t-test are point estimates
of their respective population means. The more
difference there is between M1 and M2 in a statis-
tically significant t-test, the greater the difference
there probably is between the means of the two
related populations. The confidence interval for the independent
samples t-test is called
the confidence interval of the difference. It provides a way to
estimate the difference
between the two population means represented by the sample
means in a statistically
significant independent samples t-test.
The confidence interval of the difference employs this formula:
Key Terms: The confidence
interval of the difference
is a range probably contain-
ing the difference between two
unknown population means.
Formula 11.2: C.I..95 = ±t(SEd) + (M1 − M2)
Where
C.I..95 = a range of values within which the difference between the means of the populations represented by the samples will be captured with p = .95
t = the critical value of t for n1 + n2 − 2 degrees of freedom (the same number of degrees of freedom as there were for the independent samples t)
SEd = the calculated value of the standard error of the difference
M1, M2 = the two sample means from the independent t-test
As with the one-sample t-test, a confidence interval of the
difference is only calculated if
the independent samples t-test is statistically significant.
Considering the purpose for the
confidence interval makes the reason for this evident. What is at
issue in the independent
samples t is whether the two samples likely represent
populations with the same means,
or for practical purposes, whether the two samples belong to the
same population. That
probability is rejected when the result is significant. If the
analysis is not significant and
the decision is to “fail to reject,” then the samples may come
from populations with the
same means. It makes little sense, therefore, to calculate a
confidence interval for the dif-
ference between the population means if there may be only one
population involved.
An Independent Samples t-Test Example
A pharmaceutical manufacturer has received Food and Drug
Administration (FDA)
approval to market a new drug for people with high cholesterol.
In two sales regions, pre-
scription sales have been similar. But then at some point, one of
the two representatives
receives special training on the drug’s chemistry so that the
representative can explain more knowledgeably how the drug
works and what the side effects are likely to be. The question
is whether the extra training is worth the trouble—do the doc-
tors visited by the sales rep who received the extra training
write a significantly different number of prescriptions than
the doctors the other rep visits? The numbers of prescription
orders for the drug by doctors in the two sales regions over a
seven-week period following the training are as follows:
Extra training: 13, 10, 14, 17, 16, 12, 15
No extra training: 12, 10, 9, 12, 8, 8, 9
Review Question B:
What is the relation-
ship between the level
of confidence and the
width of the confi-
dence interval?
Verify the following descriptive statistics:
                   Mean      Standard Deviation      Standard Error of the Mean
Extra Training     13.857     2.410                   .911
No Training         9.714     1.704                   .644
Recall that SEd = √(SEM1² + SEM2²) = √(.911² + .644²) = 1.116
t = (M1 − M2)/SEd = (13.857 − 9.714)/1.116 = 3.712; t.05(12) = 2.179. Reject H0.
The result is statistically significant. As the difference between
the sample means sug-
gests, the doctors visited by the sales representative with the
extra training are writing
significantly more prescriptions than the doctors in the other
sales district. In terms of
the number of prescriptions the doctors write, the two sales reps
probably now represent
two distinct populations. The sample means provide point
estimates of the means of the
two populations involved. The confidence interval of the
difference provides a different
way to contrast the two populations by indicating, with the
specified probability, what
range of values is likely to capture the magnitude of the
difference between the two
population means.
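Before turning to the confidence interval, the t-test above can be reproduced with a few lines of Python. SciPy's pooled-variance test gives the same t value here because the two groups are the same size; the data are the prescription counts from the example.

from scipy import stats

extra_training = [13, 10, 14, 17, 16, 12, 15]
no_training = [12, 10, 9, 12, 8, 8, 9]

# Pooled-variance independent samples t-test, df = n1 + n2 - 2 = 12
t_stat, p_value = stats.ttest_ind(extra_training, no_training, equal_var=True)
# t_stat comes out near 3.71 and p_value is well below .05, so H0 is rejected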
Calculating the Confidence Interval of the Difference
Formula 11.2 is:
C.I..95 = ±t(SEd) + (M1 − M2)
Note that although the final term in the formula indicates M1 − M2, it is the absolute
value of the difference between the means that the formula requires. If M2 happens to
have a greater value than M1, the result will be a difference that is negative, but whether
M1 − M2 is positive or negative isn't relevant to the confidence interval. It is the absolute
value of M1 − M2 that is entered in the formula.
All the required values are available from the t-test solution
above. Substituting them
gives the following for the confidence interval:
C.I..95 = ±2.179(1.116) + (13.857 − 9.714)
C.I..95 = 6.575, 1.711
a. 2.179 × 1.116 + (13.857 − 9.714) is the upper bound of the confidence interval, and
b. −2.179 × 1.116 + (13.857 − 9.714) is the lower bound of the confidence interval.
The significant independent samples t-test result that indicates
that the two samples prob-
ably do not represent populations with the same mean is
followed with a confidence inter-
val estimating how much difference there is between the means
of the two populations.
The calculations indicate that with p = .95 confidence, the
difference will be somewhere
from 1.711 to 6.575 prescriptions per week.
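The confidence interval of the difference can be computed the same way, as in the sketch below, which follows Formula 11.2 step by step using the prescription data; SciPy supplies the critical t value.

import math
import statistics
from scipy import stats

extra_training = [13, 10, 14, 17, 16, 12, 15]
no_training = [12, 10, 9, 12, 8, 8, 9]

sem1 = statistics.stdev(extra_training) / math.sqrt(len(extra_training))
sem2 = statistics.stdev(no_training) / math.sqrt(len(no_training))
se_d = math.sqrt(sem1 ** 2 + sem2 ** 2)    # standard error of the difference, about 1.116

diff = abs(statistics.mean(extra_training) - statistics.mean(no_training))
df = len(extra_training) + len(no_training) - 2
t_crit = stats.t.ppf(0.975, df=df)         # about 2.179

# Formula 11.2: C.I..95 = plus or minus t(SEd) + (M1 - M2)
lower, upper = diff - t_crit * se_d, diff + t_crit * se_d   # about 1.71 and 6.58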
Note that the confidence interval does not estimate the means of
the two populations.
Rather, it estimates the difference between their means. The
sample means provide esti-
mates of the population means, which is as close as we come to
identifying their values
without further data collection and analysis.
11.3 The Confidence Interval of the Prediction
The first two confidence intervals in this chapter were
calculated and interpreted in ref-
erence to population means. When t is significant in the one-
sample test, the confidence
interval is a range of values within which the mean of the
population represented by the
sample is likely to occur. When t is significant in an
independent groups test, the confi-
dence interval indicates how much difference there is likely to
be between the means of
the two populations inferred by the samples.
The confidence interval of the prediction has a different
orientation. It is a confidence
interval for a regression solution, and because the task in
regression is to predict the value
of y from x (or from multiple x predictors), what the confidence
interval produces is a
range of values within which the true value of y will probably
occur, given a specific value
for x.
Remember that the basis for regression is that when-
ever variables are significantly correlated, a number
of estimates of the value of y from x will be more
accurate than a series of random estimates. While
that represents a very important theoretical princi-
ple, managers are not usually interested in a series of
estimates. For example, given a correlation between
the price of some product and the resulting sales vol-
ume, a production manager who is trying to forecast what
production will need to be in
order to satisfy demand is often not interested in a series of
estimates of sales volume (y)
for several different prices (xs), but rather in one estimate of y
from x. The question is often
“If the price is ___, what are sales most likely to be?” The value
of the confidence interval
is that it provides a way to estimate how precise that one
prediction of sales volume is
likely to be.
Key Terms: The confidence
interval of the prediction is
a range probably contain-
ing the true value of a criterion
variable.
Small values of the standard error of the estimate indicate
relatively little error and inter-
vals from y’ − 1(SEest) to y’ + 1(SEest) that are relatively
narrow. When that occurs, there
can be increased confidence in the predicted value of y. Small
intervals indicate little error
in the prediction. On the other hand, if that range is quite wide,
it suggests there could be
a good deal of variance between the predicted value of y and its
actual value.
Capturing the true value of the criterion variable about 2⁄3 of
the time, which is what y’ ± SEest provides for, produces essentially a p = .68 confidence interval. The problem is that
the range of values that is y’ ± 1(SEest) would miss the true
value of y about
1⁄3 of the time.
The probability of not capturing the true predicted value in such
a confidence interval is
too great to be helpful in most analytical situations. To improve
the probability of captur-
ing y, the more precise C.I..95 and C.I..99 confidence intervals
of the prediction are calcu-
lated. The formula for a C.I..95 is:
Formula 11.4: C.I..95 = ±tn−2(SEest) + y’
Formula 11.3: C.I..68 = ±1(SEest) + y’
It is important to be cautious about the distinction between
specificity and precision. The
difficulty is not in predicting the value of y given x, which can
be done whenever x and
y are significantly correlated. The challenge is in making
precise predictions of the value
of y from x. Although the least squares regression equation
provides a particular value
that is the best prediction of the criterion variable from a
specific value of x, what consti-
tutes the best prediction is relative. The best prediction in some
circumstances can be very
imprecise. A number of factors conspire against accurate
predictions, most notably, weak
correlations. The problem that the regression solution does not
solve is that the predicted
value of y, (y’), yields little information about how precise the
prediction is likely to be. It
is a shortfall that the confidence interval addresses.
The Standard Error of the Estimate
Recall that in any normal distribution, plus or minus 1 standard
deviation from the mean,
accounts for about 2⁄3 of the population, 68.26% to be more
exact. Recall also from Chapter
9 that the standard error of the estimate (SEest), which
estimates prediction errors in regres-
sion problems, is by definition a standard deviation value. In fact
it is the standard deviation
of all possible error scores from an infinite number of
predictions of y from x. While that
definition has important conceptual value, in practice the
standard error of the estimate is
in fact an estimated value, as the name suggests. Recall that
SEest = sy√(1 − r²xy). Because
it is a type of standard deviation value, and since ± one standard
deviation from the mean
in any normal distribution includes 68% of that distribution, we
can say by extension that
±1(SEest) from the predicted value of y (the y’ point) should
provide a range of values within
which the true value of y will occur about 68% of the time.
Where
C.I..95 = a .95 confidence interval for the regression solution
t = the critical value of t for n − 2 degrees of freedom, where n = the number of pairs of scores
SEest = the standard error of the estimate
y’ = the predicted value for the criterion variable
Calculating the Confidence Interval of the Prediction
A produce importer recognizes a relationship between how long
it takes produce to get
from shipping docks to the retail market and how much of the
order is lost to spoilage
because of over-ripening. The data for number of days (x) and
the percentage of the order
lost to over-ripeness (y) is as follows:
Number of days: 12, 15, 11, 9, 7, 9, 12, 10, 7, 8, 9, 8
Percentage lost: 8, 10, 7, 7, 6, 8, 9, 9, 5, 6, 7, 6
With a dock workers’ strike looming and produce coming into
port, the importer wishes
to predict what the losses will be if it takes 18 days to get the
produce to the retailer.
Verify the following descriptive statistics:
                      Mean      Standard Deviation
Number of Days (x)    9.750     2.379
% Lost (y)            7.333     1.497
rxy = .868 and is statistically significant at p = .05
y’ = a + bx
b = rxy(sy/sx) = .868(1.497 ÷ 2.379) = .546
a = My − bMx = 7.333 − (.546)(9.750) = 2.010
y’ = 2.010 + .546(18) = 11.838
Based on these data, if it takes 18 days to get the produce to
retail outlets, there will be an
11.838% loss due to over-ripening.
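A sketch of the whole calculation, through the confidence interval of the prediction in Formula 11.4, is shown below. It uses the produce data from the example; statistics.correlation requires Python 3.10 or later, and the standard error of the estimate is the Chapter 9 formula, SEest = sy√(1 − r²xy).

import math
import statistics
from scipy import stats

days = [12, 15, 11, 9, 7, 9, 12, 10, 7, 8, 9, 8]
lost = [8, 10, 7, 7, 6, 8, 9, 9, 5, 6, 7, 6]

r = statistics.correlation(days, lost)                # about .868
b = r * statistics.stdev(lost) / statistics.stdev(days)
a = statistics.mean(lost) - b * statistics.mean(days)
y_pred = a + b * 18                                   # about 11.8 percent predicted loss

se_est = statistics.stdev(lost) * math.sqrt(1 - r ** 2)  # standard error of the estimate
t_crit = stats.t.ppf(0.975, df=len(days) - 2)            # critical t for n - 2 df

# Formula 11.4: C.I..95 = plus or minus t(SEest) + y'
lower, upper = y_pred - t_crit * se_est, y_pred + t_crit * se_est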
Like the values for M in the one-sample t, and M1 − M2 in the
independent t, the 11.838
value is a point estimate—a discrete number—indicating a
solution value. As with t-tests,
the confidence interval of the prediction produces an interval
estimate, or a range of num-
bers within which the true percentage of spoilage will occur.
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx
12The Chi-Square Test Analyzing Categorical DataLea.docx

More Related Content

Similar to 12The Chi-Square Test Analyzing Categorical DataLea.docx

1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
Zoha Qureshi
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
Remas Mohamed
 
t-Test Project Instructions and Rubric Project Overvi.docx
t-Test Project Instructions and Rubric  Project Overvi.docxt-Test Project Instructions and Rubric  Project Overvi.docx
t-Test Project Instructions and Rubric Project Overvi.docx
mattinsonjanel
 
Read the article Competition or Complement Six Sigma and TOC that.docx
Read the article Competition or Complement Six Sigma and TOC that.docxRead the article Competition or Complement Six Sigma and TOC that.docx
Read the article Competition or Complement Six Sigma and TOC that.docx
makdul
 
Performance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorialPerformance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorial
Bilkent University
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
smumbahelp
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
smumbahelp
 

Similar to 12The Chi-Square Test Analyzing Categorical DataLea.docx (20)

Data analysis.pptx
Data analysis.pptxData analysis.pptx
Data analysis.pptx
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
 
man0 ppt.pptx
man0 ppt.pptxman0 ppt.pptx
man0 ppt.pptx
 
Factor analysis using spss 2005
Factor analysis using spss 2005Factor analysis using spss 2005
Factor analysis using spss 2005
 
t-Test Project Instructions and Rubric Project Overvi.docx
t-Test Project Instructions and Rubric  Project Overvi.docxt-Test Project Instructions and Rubric  Project Overvi.docx
t-Test Project Instructions and Rubric Project Overvi.docx
 
Read the article Competition or Complement Six Sigma and TOC that.docx
Read the article Competition or Complement Six Sigma and TOC that.docxRead the article Competition or Complement Six Sigma and TOC that.docx
Read the article Competition or Complement Six Sigma and TOC that.docx
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Performance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorialPerformance Evaluation for Classifiers tutorial
Performance Evaluation for Classifiers tutorial
 
Episode 18 : Research Methodology ( Part 8 )
Episode 18 :  Research Methodology ( Part 8 )Episode 18 :  Research Methodology ( Part 8 )
Episode 18 : Research Methodology ( Part 8 )
 
STATISTICAL TOOLS USED IN ANALYTICAL CHEMISTRY
STATISTICAL TOOLS USED IN ANALYTICAL CHEMISTRYSTATISTICAL TOOLS USED IN ANALYTICAL CHEMISTRY
STATISTICAL TOOLS USED IN ANALYTICAL CHEMISTRY
 
Episode 12 : Research Methodology ( Part 2 )
Episode 12 :  Research Methodology ( Part 2 )Episode 12 :  Research Methodology ( Part 2 )
Episode 12 : Research Methodology ( Part 2 )
 
Data analysis and Interpretation
Data analysis and InterpretationData analysis and Interpretation
Data analysis and Interpretation
 
Samle size
Samle sizeSamle size
Samle size
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
 
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
Alex Korbonits, "AUC at what costs?" Seattle DAML June 2016
 
Mb0050 research methodology
Mb0050   research methodologyMb0050   research methodology
Mb0050 research methodology
 
Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14
 
Advice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation ResearchAdvice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation Research
 
analysis plan.ppt
analysis plan.pptanalysis plan.ppt
analysis plan.ppt
 

More from hyacinthshackley2629

Your company nameYour nameInstruction Page1. O.docx
Your company nameYour nameInstruction Page1. O.docxYour company nameYour nameInstruction Page1. O.docx
Your company nameYour nameInstruction Page1. O.docx
hyacinthshackley2629
 
Your Company NameYour Company NameBudget Proposalfor[ent.docx
Your Company NameYour Company NameBudget Proposalfor[ent.docxYour Company NameYour Company NameBudget Proposalfor[ent.docx
Your Company NameYour Company NameBudget Proposalfor[ent.docx
hyacinthshackley2629
 
Your company is a security service contractor that consults with bus.docx
Your company is a security service contractor that consults with bus.docxYour company is a security service contractor that consults with bus.docx
Your company is a security service contractor that consults with bus.docx
hyacinthshackley2629
 
Your company You are a new Supply Chain Analyst with the ACME.docx
Your company   You are a new Supply Chain Analyst with the ACME.docxYour company   You are a new Supply Chain Analyst with the ACME.docx
Your company You are a new Supply Chain Analyst with the ACME.docx
hyacinthshackley2629
 
Your Communications PlanDescriptionA.What is your .docx
Your Communications PlanDescriptionA.What is your .docxYour Communications PlanDescriptionA.What is your .docx
Your Communications PlanDescriptionA.What is your .docx
hyacinthshackley2629
 
Your Communication InvestigationFor your mission after reading y.docx
Your Communication InvestigationFor your mission after reading y.docxYour Communication InvestigationFor your mission after reading y.docx
Your Communication InvestigationFor your mission after reading y.docx
hyacinthshackley2629
 
Your Communications PlanFirst step Choose a topic. Revi.docx
Your Communications PlanFirst step Choose a topic. Revi.docxYour Communications PlanFirst step Choose a topic. Revi.docx
Your Communications PlanFirst step Choose a topic. Revi.docx
hyacinthshackley2629
 
Your coffee franchise cleared for business in both countries (Mexico.docx
Your coffee franchise cleared for business in both countries (Mexico.docxYour coffee franchise cleared for business in both countries (Mexico.docx
Your coffee franchise cleared for business in both countries (Mexico.docx
hyacinthshackley2629
 

More from hyacinthshackley2629 (20)

Your company nameYour nameInstruction Page1. O.docx
Your company nameYour nameInstruction Page1. O.docxYour company nameYour nameInstruction Page1. O.docx
Your company nameYour nameInstruction Page1. O.docx
 
Your Company NameYour Company NameBudget Proposalfor[ent.docx
Your Company NameYour Company NameBudget Proposalfor[ent.docxYour Company NameYour Company NameBudget Proposalfor[ent.docx
Your Company NameYour Company NameBudget Proposalfor[ent.docx
 
Your company recently reviewed the results of a penetration test.docx
Your company recently reviewed the results of a penetration test.docxYour company recently reviewed the results of a penetration test.docx
Your company recently reviewed the results of a penetration test.docx
 
Your company wants to explore moving much of their data and info.docx
Your company wants to explore moving much of their data and info.docxYour company wants to explore moving much of their data and info.docx
Your company wants to explore moving much of their data and info.docx
 
Your company plans to establish MNE manufacturing operations in Sout.docx
Your company plans to establish MNE manufacturing operations in Sout.docxYour company plans to establish MNE manufacturing operations in Sout.docx
Your company plans to establish MNE manufacturing operations in Sout.docx
 
Your company just purchased a Dell server MD1420 DAS to use to store.docx
Your company just purchased a Dell server MD1420 DAS to use to store.docxYour company just purchased a Dell server MD1420 DAS to use to store.docx
Your company just purchased a Dell server MD1420 DAS to use to store.docx
 
your company is moving to a new HRpayroll system that is sponsored .docx
your company is moving to a new HRpayroll system that is sponsored .docxyour company is moving to a new HRpayroll system that is sponsored .docx
your company is moving to a new HRpayroll system that is sponsored .docx
 
Your company is considering the implementation of a technology s.docx
Your company is considering the implementation of a technology s.docxYour company is considering the implementation of a technology s.docx
Your company is considering the implementation of a technology s.docx
 
Your company is a security service contractor that consults with bus.docx
Your company is a security service contractor that consults with bus.docxYour company is a security service contractor that consults with bus.docx
Your company is a security service contractor that consults with bus.docx
 
Your company has just sent you to a Project Management Conference on.docx
Your company has just sent you to a Project Management Conference on.docxYour company has just sent you to a Project Management Conference on.docx
Your company has just sent you to a Project Management Conference on.docx
 
Your company has designed an information system for a library.  The .docx
Your company has designed an information system for a library.  The .docxYour company has designed an information system for a library.  The .docx
Your company has designed an information system for a library.  The .docx
 
Your company has had embedded HR generalists in business units for t.docx
Your company has had embedded HR generalists in business units for t.docxYour company has had embedded HR generalists in business units for t.docx
Your company has had embedded HR generalists in business units for t.docx
 
Your company You are a new Supply Chain Analyst with the ACME.docx
Your company   You are a new Supply Chain Analyst with the ACME.docxYour company   You are a new Supply Chain Analyst with the ACME.docx
Your company You are a new Supply Chain Analyst with the ACME.docx
 
Your company has asked that you create a survey to collect data .docx
Your company has asked that you create a survey to collect data .docxYour company has asked that you create a survey to collect data .docx
Your company has asked that you create a survey to collect data .docx
 
Your Communications PlanDescriptionA.What is your .docx
Your Communications PlanDescriptionA.What is your .docxYour Communications PlanDescriptionA.What is your .docx
Your Communications PlanDescriptionA.What is your .docx
 
Your community includes people from diverse backgrounds. Answer .docx
Your community includes people from diverse backgrounds. Answer .docxYour community includes people from diverse backgrounds. Answer .docx
Your community includes people from diverse backgrounds. Answer .docx
 
Your Communications Plan Please respond to the following.docx
Recall that nonparametric tests set aside a number of important assumptions about the characteristics of the groups involved in an analysis. Removing some of the requirements associated with the characteristics of the data provides greater analytical flexibility.

Chi-square tests allow for the analysis of data that are exclusively categorical, which distinguishes them from all the statistical tests discussed in previous chapters. Because categorical (nominal) data cannot reflect the characteristics of normality, the chi-square tests are nonparametric, also referred to as "distribution-free," tests. No assumptions are needed about how the data are distributed, nor are there requirements regarding their scale. These tests provide flexibility, but they exact a cost as well, which will be discussed as the chapter progresses. The chi-square procedures have many applications in business analysis and decision making and are a mainstay in the manager's statistical "toolbox."

The chi-square tests were developed by Karl Pearson, the same Pearson of the Pearson Correlation. Note that the Greek letter chi is written χ and is pronounced with a hard c: "kie," rhyming with "pie."
There are two chi-square procedures discussed in this chapter, and both have two names. The first is called the goodness-of-fit chi-square test, or alternatively the 1 × k (said "one by kay") chi-square.

12.2 The Goodness-of-Fit (1 × k) Chi-Square

Perhaps a market research specialist is trying to determine whether several local "talk radio" stations have approximately similar audiences. Among other things, the answer will influence what individual stations can charge for advertising. The market research specialist makes a random selection of people from a local telephone book and calls each number to ask residents whether they listen to talk radio. For those answering in the affirmative, the question is which station they prefer.

Both names for this procedure are instructive for what they reveal about the kind of analysis involved. "Goodness-of-fit," however awkward the grammar, indicates that what is at issue is how well the data fit an initial hypothesis. That initial hypothesis describes the expected distribution of the data. In the talk radio example, the procedure will test whether listeners prefer the major talk stations in about equal proportions; it will provide an analysis of how well the data fit that assumption.

Key Terms: The goodness-of-fit, or 1 × k, chi-square is a test for significant differences in the categories of a single, nominal scale variable.

The 1 × k name is a reminder that the procedure involves just a single grouping variable (the "1" in the name), which is divided into some number (k) of categories. The "k" here has the same meaning it had in ANOVA: it refers to the number of groups in the analysis. In the preceding example:

• The categorical (grouping) variable is the preferred radio station.
• The k refers to the number of categories into which the single variable is divided, which is the number of talk radio stations involved in the analysis.

Take care not to confuse the number of categories with the number of variables involved in the analysis. If there are five talk radio stations in the listening area, there are five categories, or dimensions, of the single variable, preferred radio station.

Recall that nominal data are also often called categorical, or "count," data, a name that stems from the way they are measured: listeners are asked which station they listen to and are then sorted into the relevant category (preferred radio station) accordingly.
This "count" becomes the dependent variable in chi-square tests. The analysis hinges on the frequency with which subjects fall into the individual categories. Note that the issue is not how much more the individual prefers Station A to Station B, which would indicate ordinal scale data, nor how much time the individual spends listening, which would provide ratio scale data. The only question for respondents who listen to talk radio is which station they prefer.

If the question that drives the study is whether listeners prefer the five stations in about equal proportions, that is the hypothesis that will be tested. In that instance, the expectation is that the numbers of listeners in each of the categories representing the different talk radio stations will be reasonably similar. If they are, the resulting chi-square value will not be statistically significant. Statistically significant results emerge in a goodness-of-fit chi-square when there is a substantial discrepancy between what is observed in the data and what is expected based on the initial hypothesis. Exactly how much of a discrepancy there must be is determined by comparing the calculated value of chi-square to a critical value that, like the critical values for the t, F, and r statistics, is determined by the degrees of freedom for the problem and the probability level at which the test is conducted.

Referring back to the radio station problem, the market research specialist needs either to find support for the hypothesis that listeners tune in to all five stations in about equal proportions or to update that expectation. Ninety-five listeners are asked about their station preferences. Sixty of the respondents name one of the five stations in the market area; the other 35 listen to subscription stations on satellite radio that are not located in the area. Since the interest is in listeners to local stations, those 35 people are excluded from the study. The results from the remaining 60 listeners are as follows:

Station A  15
Station B   8
Station C  12
Station D  10
Station E  15

Review Question A: What is the scale of the data required by either of the chi-square tests?
These survey results range from 8 listeners for the least frequently mentioned station (Station B) to 15 listeners for the most popular stations (A and E), so there are clearly differences in listeners' preferences. The question the chi-square procedure answers is whether the differences in "count" across the five stations are just the random differences that can be expected because of sampling error, or whether they are great enough that they would likely emerge every time data were collected and analyzed. That last outcome, of course, is what defines statistical significance.

Calculating the Test Statistic

The chi-square test statistic, which is neither intimidating to look at nor difficult to calculate, has this form:

Formula 12.1
χ² = Σ[(fo − fe)²/fe]

where
χ² = the value of chi-square
fo = the frequency observed: how many individuals actually occur in a particular category
fe = the frequency expected: how many individuals can be expected to occur in a particular category according to the initial hypothesis

The calculation involves a good deal of repetitive subtracting and squaring, and a good way to keep it straight is to complete it in something like Table 12.1. The successive steps are represented in the rows, beginning with the fo row near the top of the table and working down one row at a time; each row represents one calculation step for determining the χ² value.

Table 12.1: The 1 × k chi-square

Statistic        Station A   Station B   Station C   Station D   Station E
fo                  15           8          12          10          15
fe                  12          12          12          12          12
fo − fe              3          −4           0          −2           3
(fo − fe)²           9          16           0           4           9
(fo − fe)²/fe       .75        1.33          0          .33         .75

χ² = .75 + 1.33 + 0 + .33 + .75 = 3.16

The values in the fo row are counts of the number of individuals who occur in each category of the variable.

• These "frequency observed" values are the numbers of listeners from the sample of 60 who indicate that they listen to a particular radio station.
• The sum of the fo values across the categories of the variable must always equal the total sample size, n.

The second row, designated fe, indicates what is expected based on whatever hypothesis or assumption prompted the analysis.
• For the problem above, the hypothesis is that listeners are attracted to the five stations in approximately equal proportions.
• That expectation is reflected in equal fe values for each of the categories.

Later there will be a problem where the expectation is that the categories will not be equal, which must also be reflected in the fe values. If, for example, Station A is associated with a major network and carries several nationally syndicated shows, perhaps the expectation is that Station A is twice as popular as the others. In that case, the fe value for Station A would be twice as high as the fe value for each of the other stations. Here, however, the problem is simpler. The hypothesis is that the stations are equally popular, so determining what to expect is simply a matter of dividing the total number of listeners by the number of categories:

fe = n/k

where
n = the total number of subjects in all categories
k = the number of categories

So for the radio station problem, because n = 60 and k = 5, fe = 60/5 = 12 in each category.

Study the formula for the test statistic for a moment, review the order of mathematical operations, and the process for calculating χ² will be straightforward. Recall from ninth-grade algebra that when there are multiple operations:

• Complete whatever needs to be done in the parentheses first ("please").
• Then deal with exponents ("excuse").
• Handle multiplication and division next ("my dear").
• Complete any addition and subtraction that is not in parentheses last ("Aunt Sally").

So in terms of the rows in Table 12.1, the process is to:

1. Fill in the fo values in the first line, determined by the number of people who indicate that they listen to a particular station.
2. Because the expectation for this problem is that the five stations are about equally popular, divide n by k and enter that value in each of the five fe boxes on the second line.

Now, following the order of operations in each of columns 1 through 5:

3. On line 3, enter fo − fe for each station: the fo value from the first line minus the fe value from the second line.
4. On line 4, deal with the exponent by squaring each difference between fo and fe.
5. On the next line, divide the result of the previous line, (fo − fe)², by the fe value from the second line.
6. On the final line, sum the (fo − fe)²/fe values across the five categories to determine χ².

The result of completing these steps for the data above is χ² = 3.16.
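For readers who want to double-check the arithmetic outside a hand-worked table, the same goodness-of-fit test can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the text's own procedure; it assumes the SciPy library is available, and the variable names (observed, expected) are illustrative only. Because the code works with unrounded terms, it reports 3.17 rather than the 3.16 obtained when each row of Table 12.1 is rounded to two decimals.

    from scipy.stats import chisquare

    # Observed listener counts for Stations A through E (Table 12.1)
    observed = [15, 8, 12, 10, 15]
    n = sum(observed)                 # 60 listeners
    k = len(observed)                 # 5 categories
    expected = [n / k] * k            # equal popularity: fe = 12 in each category

    # Chi-square computed "by hand," mirroring the rows of Table 12.1
    chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    print(round(chi_sq, 2))           # 3.17 (unrounded terms)

    # The same test with SciPy, which also reports a p-value
    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(round(stat, 2), round(p_value, 2))   # 3.17, p of roughly .53; not significant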
Interpreting the Test Statistic

The next step is to determine whether a χ² value of 3.16 is statistically significant by comparing it to the appropriate critical value of χ² from Table 12.2. As with the other tables, the critical value is determined by the probability level at which the test is conducted (p = .05 is the default level for this test as well) and the number of degrees of freedom for the problem. The df for the goodness-of-fit chi-square are k − 1, the number of categories of the variable minus 1. Here, df = 4.

Table 12.2: The critical values of chi-square

df    p = .05   p = .01   p = .001
1      3.84      6.64     10.83
2      5.99      9.21     13.82
3      7.82     11.35     16.27
4      9.49     13.28     18.47
5     11.07     15.09     20.52
6     12.59     16.81     22.46
7     14.07     18.48     24.32
8     15.51     20.09     26.13
9     16.92     21.67     27.88
10    18.31     23.21     29.59
11    19.68     24.73     31.26
12    21.03     26.22     32.91
13    22.36     27.69     34.53
14    23.69     29.14     36.12
15    25.00     30.58     37.70
16    26.30     32.00     39.25
17    27.59     33.41     40.79
18    28.87     34.81     42.31
19    30.14     36.19     43.82
20    31.41     37.57     45.32

Source: http://home.comcast.net/~sharov/PopEcol/tables/chisq.html. Retrieved 6 July 2012.

As with the other tests conducted so far, the calculated value of chi-square is statistically significant when it is equal to or larger than the critical value set by the probability level of the test and the degrees of freedom for the problem. For χ².05(4), the table indicates that the critical value is 9.49. A critical value from the table greater than the calculated value of chi-square indicates that the fo-to-fe difference is best explained by differences that could occur by chance in the chi-square distribution. The result is not statistically significant, which prompts the marketing analyst to fail to reject the null hypothesis.

As an aside, the critical values for chi-square are usually reported to two decimals, just as z values were in their table. With chi-square, stopping at two decimals is not just a matter of fitting more values onto a page, as it was with z: the nominal data upon which chi-square values are based are relatively crude compared with ordinal, interval, or ratio data, and it makes less sense with these data to imply the level of exactness suggested by three decimals.
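Critical values can also be regenerated from the chi-square distribution itself rather than looked up. The short sketch below is offered only as a cross-check on Table 12.2; it assumes SciPy is installed and is not part of the text's own procedure.

    from scipy.stats import chi2

    # A critical value is the chi-square score that cuts off an upper tail of size alpha
    for df in (1, 2, 3, 4, 5):
        row = [round(chi2.ppf(1 - alpha, df), 2) for alpha in (0.05, 0.01, 0.001)]
        print(df, row)                     # agrees with Table 12.2 to within .01 of rounding

    # df = 4 at p = .05 reproduces the 9.49 used for the radio station problem
    print(round(chi2.ppf(0.95, 4), 2))     # 9.49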
  • 18. • In the language of statistics, the null hypothesis for a chi- square problem is that the frequency expected is equal to the frequency observed (Ho: fe 5 fo). The fact that the result in the problem just worked was not statistically sig- nificant indicates that in a population where people listen to five talk radio stations in equal proportions, it is not improbable to draw a sample in which the numbers of listeners who prefer each station range from 8 to 15 out of 60. A sample with fo values of 15, 8, 12, 10, 15 is still consistent with the null hypothesis, and it is one of the outcomes that make up the chi-square distribution. • The alternate hypothesis is that fo values of 15, 8, 12, 10, 15 is not a sam- ple likely to occur in the chi-square distribution. Stated symbolically it is was actually observed. The null hypothesis indicates that any variability between what is observed and what is expected is explained by what will probably occur in the chi- square distribution. In other words, there is insufficient evidence to reject the possibility that what occurred is likely to have occurred by chance. The alternate hypothesis is that there is too much dif- ference between what is observed and what is expected to conclude that the outcome is
  • 19. due to chance. Distinguishing Between Goodness-of-Fit Chi-Square Tests and t-Tests or ANOVAs The 1 3 k, or goodness-of-fit, chi-square procedure falls under the general hypothesis of difference category of procedures. In that regard it is similar to the independent samples t-test and to ANOVA. Like those procedures, the value of the chi-square statistic is a mea- sure of difference. The primary difference is the scale of the data in the analysis. In inde- pendent samples t-tests and ANOVA, the t and the F respectively are measures of the difference between the means of the samples involved in the analysis. The independent (grouping) variable is categorical, and the dependent variable is continuous (interval or ratio scale). On the other hand, the chi-square statistic measures the difference between the frequencies of occurrence of a nominal (categorical) variable compared with what is expected. The larger the gap between the expected and observed frequency distribution, the greater the difference between fo and fe. Since the dependent variable is the “count,” or frequency, of occurrence of the categorical variable of interest, it is impossible to calculate means and standard deviations, which makes t-tests and ANOVAs impossible. A 1 3 k (Goodness-of-Fit) Chi-Square Problem With Unequal fe Values
  • 20. This first chi-square problem was based on the assumption that listeners preferred the five radio stations in about equal proportions, but equal fe values across all categories of the variable are not always the case. Perhaps a consumer advocate is testing the claim made by the manufacturer of an energy drink called Rush that consumers prefer its product Section 12.2 The Goodness-of-Fit (1 3 k) Chi-Square tan81004_12_c12_295-322.indd 303 2/22/13 3:44 PM CHAPTER 12 2-to-1 over the major competitor’s product (Advantage) based on taste alone. If this is accu- rate, a random sample of preferences from consumers of energy drinks should indicate that twice as many prefer Rush over Advantage. Since it is highly unlikely that a random sample will yield exactly those results even if the claim is accurate, the chi-square test can be used to determine whether sample results are close enough to support that claim, or whether results are significantly different from the claim. The consumer advocate takes a sample of 150 students and finds that 27 of them have used both Rush and Advantage and express a preference for one over the other. The other 123 students prefer either some other energy drink, or use none at all. Their responses are discarded. Of the remaining 27 students, 16 of them prefer Rush
  • 21. and 11 prefer Advantage. Just as with the first problem, the 11 and 16 numbers represent the fo values, and their sum equals the value of n for the problem. That is the easy part. Because the claim by the manufacturer of Rush is that consumers prefer its drink 2-to-1 over the major competitor, Advantage, the fe values must reflect the 2-to-1 expectation. Calculating fe Values for Unequal Categories The total of both frequencies observed and frequencies expected must sum to the total, n. This will always be the case, regardless of the particular hypothesis. S fo 5 n, and S fe 5 n To calculate the fe values when the numbers in multiple cat- egories are not the same will involve vindicating that ninth- grade math teacher who said that someday algebra would be helpful. To determine the fe values, 1. Let x equal fe for the number who prefer the Advantage energy drink 2. Since the expectation in this example is that twice as many consumers will prefer Rush over Advantage, let 2x be the fe for those who prefer the Rush energy drink 3. Because the fe categories must sum to the total then x 1 2x 5
  • 22. n 4. Since, n 5 27, the expression can be changed as follows: x 1 2x 5 27 a. If x 1 2x 5 27, if follows that 3x 5 27 b. If 3x 5 27, then x 5 27/3, which makes x equal to 9 As a result of these calculations, the fe value for the Rush consumers is 18 (2x 5 2 3 9 5 18), and the fe value for Advantage consumers is 9 (x 5 9). With those values in hand, the claim that Rush is preferred twice as often as Advantage on the basis of taste can be tested with the 1 3 k chi-square. The solution is Table 12.3. Review Question B: What is the null hypothesis in a chi- square problem? Section 12.2 The Goodness-of-Fit (1 3 k) Chi-Square tan81004_12_c12_295-322.indd 304 2/22/13 3:44 PM CHAPTER 12 Table 12.3: A 1 3 k chi-square for unequal fe values: Is Rush twice as popular as Advantage? Advantage Rush fo 11 16 fe 9 18
  • 23. fo 2 fe 2 22 ( fo 2 fe) 2 4 4 ( fo 2 fe) 2/fe .44 .22 x2 5 .44 1.22 5 .66 x2 5 .66 x2.05(1) 5 3.84. Accept Ho. Interpreting the Results Since the calculated value of chi-square is lower than the critical value from the table for p 5 .05 and 1 degree of freedom, the decision is to fail to reject. That part is straight- forward enough, but in this problem where the claim is that Rush is twice as popular as Advantage, what does failing to reject mean? The key is the null hypothesis, which always reflects the expectation upon which the test is based. Because this problem was set up with an fe value for one outcome that is two times the value of the other, failing to reject the null hypothesis means that there is not enough evidence to reject the claim that Rush is twice as popular as Advantage. To say it another way, although the data do not reflect exactly a 2-to-1 preference for Rush (16 is not 2 3 11), the departure from that claim is not sufficient to allow the consumer advocate to reject it.
Note that the makers of Rush maintain that their product is twice as popular as Advantage based on taste. Whether the preference is entirely a matter of taste probably cannot be verified. Perhaps marketing prompts the students to prefer Rush, or the costs of the two products differ, or one comes in a more convenient size than the other. The way the data were collected made the consumer's stated preference the issue, without questions about the reasons for the preference. For whatever reason, students prefer the one product to the other by a great enough margin that the ratio could be 2-to-1.

A Final 1 × k Problem

To solidify a grasp of the goodness-of-fit procedure, here is one more problem. A consulting company is retained by a satellite provider to check the claim that, in a particular region of the country, satellite TV is three times more popular than free TV and cable TV is twice as popular as free TV. A random sample of 93 viewers in the region is examined and found to rely on the following for television service:

Satellite  65
Cable      16
Free       12

The same approach used in the last problem to determine the fe values produces the following:

3x (satellite) + 2x (cable) + x (free TV) = 93
6x = 93
x = 15.5

That x value makes the fe for free TV 15.5, for cable TV 2 × 15.5 = 31, and for satellite TV 3 × 15.5 = 46.5. Note that the fe values sum to 93, as they must. Do not be distracted by the fact that the fe values are not whole numbers. Although asking which type of TV service people use will make the fo values whole, the fe numbers indicating what people are expected to use can take on any value. The calculations for this problem are in Table 12.4.

Table 12.4: Another 1 × k chi-square problem

                 Satellite    Cable    Free TV
fo                  65         16        12
fe                  46.5       31        15.5
fo − fe             18.5      −15        −3.5
(fo − fe)²         342.25     225        12.25
(fo − fe)²/fe        7.36       7.26       .79

χ² = 7.36 + 7.26 + .79 = 15.41

With a calculated χ² = 15.41 and a table value of 5.99 when testing at p = .05 with 2 degrees of freedom, the results indicate that the null hypothesis should be rejected. The way these 93 people are distributed does not fit a chi-square distribution in which cable is twice as popular, and satellite three times as popular, as free television.

Review Question C: How is the scale of the data involved in an analysis related to the power of the statistical procedure?
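The satellite problem follows the same pattern in code, with 3-to-2-to-1 weights supplying the unequal fe values. As before, this is a sketch assuming SciPy rather than part of the text's own workflow.

    from scipy.stats import chisquare

    observed = [65, 16, 12]             # satellite, cable, free TV
    weights = [3, 2, 1]                 # the claimed 3-to-2-to-1 popularity
    n = sum(observed)                   # 93 viewers

    expected = [n * w / sum(weights) for w in weights]   # 46.5, 31.0, 15.5
    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(round(stat, 2), p_value)      # about 15.41 with p < .001, so reject Ho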
  • 27. hypothesis is actually two hypotheses: Satellite is three times as popular as free television, and cable is twice as pop- ular as free television. When the results indicate rejecting the null hypothesis, it could be because the expected ratio of satellite versus free TV customers is not supported, because the ratio of cable versus free TV customers is not supported, or because neither hypothesis is supported. Recall that this was the case with a significant F in ANOVA as well. It did not provide clear evidence for which two specific groups are significantly different. A more definitive chi-square test would require that the consultant gather new data and check those hypotheses individually. 12.3 The Chi-Square and Statistical Power As noted at the beginning of the chapter, distribution-free tests like chi-square proce-dures provide great flexibility. They provide no restrictions regarding the scale of the data, there are no normality assumptions to contend with, and these procedures work quite well with small samples. In the statistical version of the “there’s no such thing as a free lunch,” expression, however, chi-square and most nonparametric procedures have a drawback. Note that even though there were differences between what was observed and what was expected in the first two problems in this chapter, neither chi- square value was statistically significant. The chi-square procedures are not very sensitive to minor variations between
  • 28. what is seen and what is expected. Indeed, the differences must be fairly substantial to produce a significant chi-square value. Recall that power in statistical testing is the ability of a procedure to detect statistical significance. Compared to something like ANOVA, the chi-square procedures are not particularly powerful. Much of the lack of power comes down to the scale of the data that are involved. Nominal data provide no information about what is measured except the category of the variable to which the individual belongs. For each of the energy drink consumers who prefer Rush over Advantage, all that is revealed is which energy drink is preferred. Nothing in the data indicates how much more one drink is preferred over the other. There is no ranking of preference on a scale of 1 to 5. All that is known is that, presented a choice, the consumer chose Rush. Recall that all statistical tests are based on probabilities, and that because the outcome is therefore never a certainty, there is the constant possibility of a Type I or Type II decision error. Because the chi-square procedures are insensitive to minor variations between fo and fe, chi-square analyses are more inclined toward Type II (beta) decision errors than tradi- tional parametric tests of significant differences. The risk is that an analysis that suggests that results are not statistically significant might be set aside if new data were gathered and the analysis run a second time.
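The tendency toward Type II errors can be illustrated with a small simulation. The sketch below is an illustration only: it assumes NumPy and SciPy are available, and the population proportions are invented for the example. It draws repeated samples of 60 listeners from a population in which preferences genuinely differ somewhat and counts how often the goodness-of-fit test reaches significance at p < .05.

    import numpy as np
    from scipy.stats import chisquare

    rng = np.random.default_rng(seed=1)
    true_props = [0.26, 0.14, 0.20, 0.16, 0.24]   # real, but modest, differences
    n, trials, rejections = 60, 5000, 0

    for _ in range(trials):
        sample = rng.multinomial(n, true_props)   # one simulated survey of 60 listeners
        expected = [n / len(true_props)] * len(true_props)
        _, p = chisquare(f_obs=sample, f_exp=expected)
        rejections += p < 0.05

    # The rejection rate estimates the test's power in this scenario; with category
    # differences this small it typically comes out well under one-half, so a real
    # effect is missed (a Type II error) more often than it is detected.
    print(rejections / trials)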
On the bright side, Type I (alpha) decision errors are relatively uncommon. It is not likely that, upon finding a result statistically significant, further testing with new data would suggest otherwise; a decision to reject the null hypothesis is unlikely to be overturned by a second analysis.

12.4 The Goodness-of-Fit Test in Excel

The procedures in the Excel Data Analysis package do not include the chi-square tests. However, because the test statistic involves a good deal of repetitive subtracting, squaring, and dividing, it is not difficult to set up an Excel spreadsheet to accommodate a 1 × k problem by organizing it to complete the same calculations used in Tables 12.1, 12.3, and 12.4.

To illustrate, an organic vegetable grower claims that, in spite of the higher price, shoppers will select organically grown spinach as often as spinach grown with the help of pesticides and chemical fertilizers. The first 30 people who buy spinach on a particular day at a grocery store are examined for whether they bought the organically grown vegetable. Results are as follows:

Organically grown: 10
Conventionally grown: 20

To test the grower's claim:

• Enter the label organic in cell B1 and conventional in C1.
• Enter the labels fo in cell A2, fe in A3, fo − fe in A4, (fo − fe) sqd in A5, ÷fe in A6, and sum in A7.
• Enter the values 10 and 20 in cells B2 and C2, respectively. Since the claim is that organic spinach will sell as frequently as conventionally grown spinach, the fe values are simply n ÷ 2 = 15.
• Enter 15 in both B3 and C3.
• In cell B4, enter the formula =B2-B3 and press Enter.
• With the cursor on B4, hold the Shift key down, move the cursor to C4, and use the Fill Right command (near the far right of the menu ribbon) so that the procedure in B4 is repeated in C4.
• In cell B5, enter =B4^2 to square the value in B4.
• Repeat the B5 procedure in C5 using Fill Right as above.
• In cell B6, enter =B5/B3.
• Repeat the B6 procedure in C6 using Fill Right.
• In cell B7, enter =SUM(B6:C6).

The last command in the sequence produces the chi-square value for this problem, χ² = 3.33; the same arithmetic is cross-checked in the short sketch that follows.
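The cross-check can be done in any environment that computes the chi-square statistic. A minimal Python sketch, assuming SciPy is available, follows; it simply repeats the spreadsheet's arithmetic and is not part of the Excel procedure itself.

    from scipy.stats import chisquare

    observed = [10, 20]        # organic, conventional
    expected = [15, 15]        # the grower's claim: equal popularity among 30 buyers

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(round(stat, 2), round(p_value, 2))   # 3.33, p near .07, so fail to reject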
  • 31. grown spinach may be just as popular as the traditionally produced spinach, in spite of its higher price. Figure 12.1 is a screenshot of the spreadsheet for this problem. tan81004_12_c12_295-322.indd 308 2/22/13 3:44 PM CHAPTER 12Section 12.5 The Chi-Square Test of Independence Figure 12.1: Chi-square goodness-of-fit using Excel 12.5 The Chi-Square Test of Independence The goodness-of-fit or 1 3 k chi-square procedure accommodates just one categorical (grouping) variable. In that regard, it is similar to a one-way ANOVA, which likewise involves just a single variable, although it is an interval or ratio scale rather than a nomi- nal scale variable with the data divided into any number of categories or groups. Although basing an analysis on a single variable keeps the arithmetic simple, it consigns to error any variance that is not explained by that single variable. On the other hand, when multiple independent variables are included, as is the case in factorial ANOVA, there is less residual variance and a smaller error term. In addition, besides each variable contributing to the result, sometimes multiple variables act together, a phenomenon called a “statistical inter- action” in factorial ANOVA. A similar thing can happen with chi-square procedures. Some- times a single variable is an inadequate explanation of an
  • 32. outcome, and in those circum- stances a second variable will act in concert with the first. The factorial ANOVA has an approximate equiva- lent in one of the chi-square procedures. It is called the chi-square test of independence, or the r 3 k chi-square. The names for this procedure are just as informative as were those for the one-variable chi-square test. The “test of independence” alludes to the fact that what is tested is whether the two vari- ables included in the analysis operate independently. Key Terms: The chi-square test of independence, or r 3 k chi-square is a test of the inde- pendence of two nominal scale variables. tan81004_12_c12_295-322.indd 309 2/22/13 3:45 PM CHAPTER 12Section 12.5 The Chi-Square Test of Independence In the language of ANOVA, the variables are analyzed to determine whether they interact. When the chi-square value is statistically significant, it indicates that the two variables do not operate independently, a result that will lead to an ancillary analysis to determine the level of their relationship. As was the case in factorial ANOVA, both of the interacting variables in the r 3 k chi- square are categorical. The r 3 k designation refers to the way
The data are organized so that the levels of one variable are indicated in the rows (r) of a table and the other variable is represented in the columns, with each column representing a separate category (k) of that variable. Although the calculations fit the same table that was used for the goodness-of-fit (1 × k) problems, the frequency expected (fe) values are calculated differently.

Setting up the Chi-Square Test of Independence

The rows-and-columns arrangement just described provides the organization for the contingency table used in the chi-square test of independence. The way it is used can be illustrated with a problem.

Key Terms: The contingency table in an r × k chi-square organizes the data into rows for the categories of one variable and columns for the categories of the other.

The human resources department for a fast food chain is considering offering an early retirement package to some of the more senior managers in an effort to reduce payroll. The department wishes to predict whether a severance package with a $15,000 bonus offered to early retirees will affect retirement plans. Because of the potential costs associated with offering the bonus to dozens of senior managers, the human resources people need some understanding of the impact the bonus will have on employees' decisions to retire. Among the managers, 30 senior managers are identified and randomly divided into two groups of 15 each.

• The 15 managers in the first group are asked to complete a questionnaire, submitted anonymously, which includes a question about whether the respondent anticipates retiring within the next three years. Among this group, two managers indicate that they intend to retire within the specified period.
• Those in the second group of 15 managers are asked whether, if a $15,000 bonus were offered to those who retire in the next three years, they would retire in that period. Of the 15, seven managers indicate that they would retire within the next three years if the bonus were offered.

Note that there are two potentially related variables involved. One is whether managers intend to retire in the coming three years. The other is whether they would retire in that time frame if a bonus were offered. These two variables can be represented in a table much like the one used earlier to set up a two-way ANOVA. In addition to helping one visualize the problem, the table helps with the task of deriving the fe values when it is used in the chi-square test of independence. Before worrying about the calculations, note the contingency table below, organized with the categories of one variable in the rows and the categories of the other variable in the columns:

              Retirement: Yes    Retirement: No
No Bonus
Bonus

Organizing the Contingency Table

The same table appears in Table 12.5 with the results of the survey filled in, along with totals for each row, each column, and a value for all subjects together, n. This particular contingency table is a 2 × 2. Although the chi-square test of independence is limited to two variables, those two variables can each have any number of categories. There could have been five levels of the bonus, for example, offered to different groups of potential retirees: no bonus, a $5,000 bonus, a $10,000 bonus, a $15,000 bonus, and a $20,000 bonus. There might have been more than two categories of the retirement decision as well: retire in the next year, retire in two to three years, retire in four to five years, and so on.

With the 2 × 2 problem there are just four cells, labeled "a" through "d." The value in cell "a" represents the combination of the no-bonus group and the number in that group who indicated that they would retire; the value in cell "d" is the combination of the bonus group and the number in that group who opted not to retire, and so on.
Table 12.5: The chi-square test of independence for retirement decisions and a severance bonus

A. The Contingency Table

                Will Retire   Won't Retire   Row Totals
No Bonus          a  2          b 13             15
Bonus             c  7          d  8             15
Column Totals        9            21           n = 30

B. Completing the Analysis

Statistic          a        b        c        d
fo                 2       13        7        8
fe                 4.5     10.5      4.5     10.5
fo − fe           −2.5      2.5      2.5     −2.5
(fo − fe)²         6.25     6.25     6.25     6.25
(fo − fe)²/fe      1.39      .60     1.39      .60

χ² = 1.39 + .60 + 1.39 + .60 = 3.98

The two sample groups each sum to 15, a value reflected in the row totals on the right, and the sum of the two rows must equal n. Although the two columns together must also sum to 30, the individual columns will not necessarily each be 15, since the numbers opting for and against retirement in the bonus and no-bonus groups are not equal.

The fo and fe Values in the Chi-Square Test of Independence

The values in each of the four cells are the fo values used to calculate chi-square, and they are listed in Part B of Table 12.5 just as they were in the goodness-of-fit problems worked earlier. The difference between the way chi-square is calculated in goodness-of-fit and test-of-independence problems lies in how the fe values are determined. For each of the "a" through "d" cells, fe is the total of the row in which the cell sits, times the total of the column in which the cell sits, divided by n. Symbolically, for each cell, fe = (row total × column total)/n. This makes the fe values in the retirement problem as follows:

• Cell a: (15 × 9)/30 = 4.5
• Cell b: (15 × 21)/30 = 10.5
• Cell c: (15 × 9)/30 = 4.5
• Cell d: (15 × 21)/30 = 10.5

Once the fe values are determined, the rest of the calculations are the same as for a goodness-of-fit test and are completed in Part B of Table 12.5: subtract fe from fo, square the difference, and so on. The critical value of chi-square comes from the same table used for goodness-of-fit problems. The degrees of freedom for the test are the number of rows minus 1, times the number of columns minus 1: df = (r − 1) × (k − 1). For this problem, df = (2 − 1) × (2 − 1) = 1. With the calculated value χ² = 3.98 and the table value χ².05(1) = 3.84, the result is statistically significant.

In the context of the r × k procedure, what does a significant outcome mean? Chi-square results are statistically significant when the fe values diverge enough from the fo values that the difference between the two is unlikely to have occurred by chance. The implication is that the factor creating the fo versus fe difference in r × k problems is the relationship between the two variables. If the retirement decision and the availability of the retirement bonus were unrelated, which is to say that they operated independently, there would be no significant result. The Ho: fo = fe null hypothesis for an r × k problem has the same meaning as the null hypothesis in a Pearson Correlation problem: it means there is no relationship between the two variables. The difference is that a Pearson Correlation cannot be calculated between two nominal variables.

The Yates Correction to 2 × 2 Problems

Earlier we said that Type I decision errors are relatively uncommon with chi-square procedures. While that is generally true, the 2 × 2 problem, where each variable has two levels as in the example here, may be an exception. With those problems there can be a tendency to incorrectly find statistical significance when one or more of the fe values falls below 5.0. In what is now called the "Yates correction," Yates suggested curbing this tendency by subtracting .5 from the absolute value of each fo − fe cell difference in any 2 × 2 problem in which at least one fe value is less than 5.0. The reduced fo − fe differences make a significant chi-square value less likely, of course, and so reduce the probability of a Type I error.
  • 40. procedure unnecessarily conservative. The decision in this book has been to not incorporate the correction. The issue is raised so that the reader will know what it is and why some analysts recommend making the correction. For more information, Howell (1992) provides a helpful discussion of the Yates correction. Interpreting the Chi-square Test of Independence The issue in both the chi-square goodness-of-fit (1 3 k) and test of independence (r 3 k) is whether what is observed is consistent with an expected outcome. In the case of the test of independence, a significant result (rejecting the null hypothesis) indicates that the two variables are not functioning independently; they are correlated. In our example, rejecting Ho indicates that the intention to retire and the availability of the cash bonus are related. In a reference back to the difference between correlation and causation in Chapter 8, it is not clear that the bonus causes managers to make a retirement decision, but for whatever reason, there are significantly more intents to retire when the bonus is part of the equation. At this point the focus turns to the nature of that relationship. Since it is clear from com- paring the calculated chi-square value to the table value that there is a correlation, the question now is of the strength of the relationship between the two variables. Phi Coefficient and Cramer’s V
  • 41. To this point the analysis was similar to ANOVA or t-tests and fell under the general umbrella of the hypothesis of difference. Having determined a significant difference between fo and fe, the focus now shifts to a hypothesis of association issue. The correlation procedure for interval/ratio variables that meet normality requirements was Pearson’s r. For a correlation of ordinal scale data, or for interval/ratio data that fail to satisfy normality requirements, Spearman’s rho was the answer. The need here is for a correlation procedure based on nominal data, and there are several from which to choose. Pearson, who developed chi-square, also developed a correlation procedure for nominal variables called coefficient of contingency, C. Because it produces quite a con- servative correlation value, it is not as widely used as some of the alternatives. The upper bound for most correlation procedures is 1.0. The coefficient of contingency can- not reach that value. Two of the other correlation procedures for nomi- nal variables are phi coefficient, f (f is the Greek equivalent of f ) , and Cramer’s V, which are both explained here. Contingency coefficient, phi coeffi- cient and Cramer’s V are all based directly on the chi-square value, which makes them easy to calcu- late once the chi-square value has been determined. Key Terms: Phi coefficient and Cramer’s V are both cor-
  • 42. relation procedures for nominal data used after a significant r 3 k chi-square result. tan81004_12_c12_295-322.indd 313 2/22/13 3:45 PM CHAPTER 12Section 12.5 The Chi-Square Test of Independence Formula 12.3 V 5 "1f2/fewer of rows or columns 2 1 2 Where x2 5 the value calculated in the r 3 k procedure n 5 the total number of subjects For the decision-to-retire and the cash bonus problem, x2 5 3.98, and n 5 30. Therefore to solve for f: f 5 "1x2/n 2 5 "13.98/30 2 5 "1.33 f 5 .36 The retirement/cash bonus relationship is f 5 .36. Although f cannot have a negative value (the ways x2 is calculated and the square root function in the f formula do away with that possibility), it is interpreted like any other correlation statistic. A 0 correlation indicates no relationship; a 1.0 correlation indicates a perfect correlation. The correlation here of f 5 .36 is “modest,” or perhaps “low,” by correlation standards.
  • 43. When either of the two variables in the analysis has two levels, V 5 f. This is because the formula for Cramer’s V is, Furthermore, if the chi-square value is statistically significant, the correlation coefficient will be significant at the same level; there are no separate significance tests necessary for C, V, or f. If the chi-square value is not statistically significant (if the decision in the initial chi-square analysis is to fail to reject), there is no point in calculating a correlation value since failing to reject Ho: fo 5 fe indicates that the variables are independent. In a statistically significant 2 3 2, 2 3 3, or 3 3 2 chi-square problem, phi coefficient will be the appropriate follow-up correlation procedure. The formula is the following: Formula 12.2 f 5 "1x2/n 2 Where f2 5 the square of the phi coefficient rows or columns 5 the number of levels of the two variables If there are just 2 levels of either variable, the divisor is 1 and ".362 5 .36, V 5 f. If the fewest number of rows or columns is three, this changes, of course, and the correct correla- tion value to calculate is Cramer’s V. tan81004_12_c12_295-322.indd 314 2/22/13 3:45 PM
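Both follow-up coefficients are simple functions of the chi-square value, so they are easy to script. The helper functions below are a sketch using only the Python standard library; the function names are illustrative, not from the text.

    from math import sqrt

    def phi_coefficient(chi_sq, n):
        """Formula 12.2: phi = sqrt(chi-square / n)."""
        return sqrt(chi_sq / n)

    def cramers_v(chi_sq, n, n_rows, n_cols):
        """Formula 12.3: V = sqrt(phi^2 / (fewer of rows or columns - 1))."""
        phi = phi_coefficient(chi_sq, n)
        return sqrt(phi ** 2 / (min(n_rows, n_cols) - 1))

    # Retirement-bonus problem: chi-square = 3.98 with n = 30 in a 2 x 2 table
    print(round(phi_coefficient(3.98, 30), 2))   # 0.36
    print(round(cramers_v(3.98, 30, 2, 2), 2))   # 0.36; with two levels, V equals phi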
  • 44. CHAPTER 12Section 12.5 The Chi-Square Test of Independence A 3 3 3 Test of Independence Problem A property management company in a large city manages the landscaping and rent col- lection at several apartment complexes. Some of the complexes are quite large, with more than 50 units; some are very small, with fewer than 10 units; and the others are classi- fied as medium-sized. Collecting and crediting rent payments each month is very time- consuming. A bookkeeper at the management company guesses that in the smaller complexes there is a more intimate relationship between manager and tenant than in the larger complexes, and rent difficulties are correspondingly lower as a result. To test that assumption, the bookkeeper examines data from 100 apartments located in each of small, medium, and large complexes and determines the number of rent payments that are on time, that are within one week late, and that are more than one week late. The data are below: Rent Submission On-time Within 1 week .1 week late Row totals Small a 65 b 30 c5 100 Medium d55 e 35 f 10 100
  • 45. Large g45 h 25 i 30 100 Column totals 165 90 45 300 The first question is whether the two variables of apartment complex size and rent late- ness are independent. If the chi-square value is statistically significant, the decision will be that they are not independent, and that will prompt a second question about the strength of their relationship. The calculations for this chi- square test of independence are in Table 12.6. Table 12.6: A chi-square test of independence for the size of the apartment complex and the lateness of the rent a b c d e f g h i fo 65 30 5 55 35 10 45 25 30 fe 55 30 15 55 30 15 55 30 15 fo 2 fe 10 0 210 0 5 25 210 25 15 ( fo 2 fe) 2 100 0 100 0 25 25 100 25 225 ( fo 2 fe) 2/fe 1.82 0 6.67 0 .83 1.67 1.82 .83 15 S 28.64 • The calculated x2 5 28.64. Since this is a 3 3 3 problem, df 5
  • 46. (3 2 1) 3 (3 2 1) 5 4. • The critical value of chi-square x2.05(4) 5 9.49. tan81004_12_c12_295-322.indd 315 2/22/13 3:45 PM CHAPTER 12Section 12.5 The Chi-Square Test of Independence The result is statistically significant. How promptly renters pay their monthly rent is related to the size of the complex in which they live. Determining the strength of the correlation calls for Cramer’s V since both variables have more than two levels. V 5"1f2/fewer of rows or columns 2 1 2 But since V requires the calculation first of phi coefficient, the answer begins there. f 5 "1x2/n 2 5 "128.64/300 2 5 .31 With a value for phi, V can be calculated. V 5"1f2/fewer of rows or columns 2 1 2 5 "1 .312/2 2
  • 47. 5 ".05 5 .22 The relationship between the size of the rental complex and how promptly rent is paid is V 5 .22. The correlation is not particularly robust, but it is statistically significant, since the value of chi-square upon which it is based is significant. Based on the analysis, those at the property management company are in a position to alter procedures in some way that responds to the relationship between rent payment and complex size. Perhaps it will make a difference if rent collections can be made online. Maybe an effort to improve the social relationship between the apartment manager and the tenants, particularly in large apartment complexes, will prompt rent payments to be made in a more timely fashion. Review Question D: What does phi coef- ficient measure? tan81004_12_c12_295-322.indd 316 2/22/13 3:45 PM CHAPTER 12Chapter Summary Chapter Summary The chi-square tests assume that Disraeli was unnecessarily skeptical. In fact, an informed expectation of an outcome should provide a fairly good indicator of what will actually
  • 48. occur, which is the understanding upon which both chi-square tests are based. The chi- square tests answer many of the same questions that earlier tests in this book answered. The difference is that the chi-square tests are based on nominal data. Pearson developed tests for data which indicate nothing more than the count of the number that occur in a particular category. Consequently, the analysis is based on differences between the fre- quency observed ( fo) and the frequency expected ( fe) (Objective 1). The goodness-of-fit, or 1 3 k procedure analyzes whether the proportions occurring in the multiple categories of a single variable are consistent with what is expected based on an initial hypothesis. The chi-square test of independence, also called the r 3 k procedure, straddles the boundary between tests of the hypothesis of difference and those related to the hypothesis of association. The initial analysis establishes whether two variables func- tion independently. Like the goodness-of-fit test, this part of the analysis is based on the magnitude of the fo 2 fe difference. A significant value of chi- square means rejecting the probability that the variables are independent (Objective 2). At that point the question is about the strength of the relationship between the two variables. That correlation can be gauged by one of several correlation procedures designed for nominal data. Those cov- ered in this chapter include the phi coefficient and Cramer’s V (Objective 3).
  • 49. It should be noted here that this book represents only a brief introduction to the analysis procedures that can be useful for managers. The list of statistical procedures covered in 12 chapters is far from exhaustive, but it is a valuable beginning. The different tests explained in Chapters 1212 are representative of those that are appropriate to many kinds of busi- ness analysis. Figure 12.2 is a flowchart-like guide to which test will answer the manager’s question. It is provided here as a summary and overview of the preceding chapters. As the decision tree is followed from the top down, note the issues are: • Is the question about differences or associations? • Are the data involved nominal (categorical), ordinal, or interval/ratio? • How many groups are involved? • Are the groups independent? tan81004_12_c12_295-322.indd 317 2/22/13 3:45 PM CHAPTER 12Chapter Summary Figure 12.2: Finding the appropriate test Answering each question above in turn guides one to the test tailored to the particular problem. If the question is about differences between groups (1), the measures involved are interval or ratio scale (2), there are 4 groups involved (3) that are independent (4), one of the ANOVA tests will answer the question.
No statistics book can provide comprehensive coverage of every statistical test, and early in the development of this book a decision was made to present tests neither for significant differences nor association for ordinal data. They are presented in Figure 12.2 to round it out, but they are not found elsewhere in the book. Should the reader wish to pursue Mann-Whitney, Kruskal-Wallis, Wilcoxon, or Friedman's ANOVA tests, Tanner (2011) is a useful source.

[Figure 12.2 is a decision tree. Its branches are organized by the type of question (differences or associations), the data scale (nominal, ordinal, or interval/ratio), the number of groups (1, 2, or 2+), and whether the groups are independent or related. The branches lead to the chi-square goodness of fit, chi-square test of independence, Mann-Whitney U, Wilcoxon T, Kruskal-Wallis H, Friedman's ANOVA, independent t, before/after t, analysis of variance, phi coefficient, Spearman's rho, Pearson Correlation, point-biserial correlation, multiple correlation, and semi-partial correlation.]

When the procedures encountered in this book are used by managers with an understanding of the procedures' purpose and an appreciation for their requirements, they offer the promise that the related decisions will be more reasonable and better informed, and can equip managers with the tools they need to be most effective. Finally, if you wish to broaden your horizons in the future, virtually all of the more advanced procedures that are beyond the scope of this book are based on the concepts represented here. Your authors wish you the best of luck. The first author can be reached with comments and questions at [email protected]

Answers to Review Questions

A. The chi-square procedures require data of only nominal scale.
B. The null hypothesis is that the frequency observed equals the frequency expected, fo = fe, meaning that there is not enough evidence to reject the possibility that what emerged in the analysis was consistent with the prediction.

C. Nominal data provide very little measurement information, and the chi-square procedures are a case in point. The analyses are based on nothing more than the frequency with which data occur in the various categories. These data yield nothing about how much of a measured quality is present, for example. Because none of the data nuances present with ordinal and interval/ratio data are gauged, differences must be substantial to prompt rejecting the null hypothesis. These are not particularly powerful procedures.

D. The phi coefficient measures the strength of the relationship between two nominal variables when at least one of them has only two categories.

Chapter Formulas

Formula 12.1
χ² = Σ(fo − fe)²/fe
is the formula for the chi-square test statistic. The same formula is used for both the goodness-of-fit test and for the r × k chi-square test of independence.

Formula 12.2
φ = √(χ²/n)
The phi coefficient is the measure of the correlation for two nominal variables when the chi-square test of independence indicates a significant result, and when one of the variables involved has two or fewer categories.

Formula 12.3
V = √(φ²/(smaller of rows or columns − 1))
When the test of independence is significant and both variables have at least three categories, Cramer's V is calculated rather than the phi coefficient. V requires phi, however, which must be calculated first.
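These formulas are straightforward to script, and a short, hypothetical sketch can be handy for checking hand calculations such as the exercises that follow. The counts below are invented; scipy.stats.chisquare applies Formula 12.1 to a 1 × k (goodness-of-fit) problem and also reports the p-value, with df = k − 1.

```python
from scipy.stats import chisquare

# Invented example: 60 customers sorted into three categories of one
# nominal variable, with a null hypothesis of equal expected frequencies.
f_obs = [28, 19, 13]
f_exp = [sum(f_obs) / len(f_obs)] * len(f_obs)   # fe = 20 in each category

# Formula 12.1: chi-square = sum of (fo - fe)^2 / fe
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))

# scipy reproduces the statistic and adds the p-value (df = k - 1 = 2 here)
result = chisquare(f_obs, f_exp)
print(f"chi-square = {chi_square:.3f}, p = {result.pvalue:.4f}")
```

For problems with unequal expected frequencies, replace f_exp with the hypothesized counts, keeping the same total as the observed frequencies.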
Management Application Exercises

Unless otherwise stated, use p = .05 in all your answers.

1. Three new movies, each with the potential to be a blockbuster, are released on the same day. Reporters from the local television station are interested to see whether one appears to have caught the public attention more than the others. The reporter goes to the local multiplex and asks those waiting to buy tickets which movie they intend to see. On the basis of results from 52 people, are there significant differences in movie preferences? The data are as follows:
Fantasy Haven: 22
Night of Terror: 18
Fists of Glory: 12

2. Data from behavioral psychology indicate that administering a tangible reward to subjects will prompt response levels twice as frequent as from subjects who receive a nontangible reward. To test this notion in a business context, two sales seminars are compared. In one seminar, sales representatives are tossed a piece of candy every time they ask a relevant question or provide an insightful comment. In the other seminar, only verbal reinforcement is provided. At the end of the seminars, data are as follows:
verbal reinforcement seminar—17 questions/comments
tangible reward seminar—27 questions/comments
a. What is the fe value for each group?
b. Are the results consistent with the expectation?

3. A Department of Labor study of education and employment found that unemployed full-time students take twice as many units as students who are full-time employees and 1.5 times more units than students who are part-time employees.
a. If the fe for the unemployed student is 16 units, what are the fe values for students who work part time and full time?
b. If the student who is unemployed takes 16 units, the student who is employed part time takes 14 units, and the full-time employee takes 12 units, is the expectation supported?

4. In a management trainee program for a multinational corporation, trainees are expected to learn a foreign language. Besides classes at a language training institute, tutors are available. Experience suggests that those learning Japanese seek the help of tutors twice as frequently as those who are learning Spanish. Among 20 students of Japanese, 16 ask for the help of tutors. Among 30 students of Spanish, 8 ask for tutors' help.
a. What are the fe values?
b. Are results consistent with prior experience?
c. In this instance, what does HA specify?

5. A marketing analyst is examining the relationship between shoppers' ethnicity and the purchase of a certain grocery item. From ethnic group A, 2 of 12 people purchased the item. From ethnic group B, 5 of 10 people purchased the item. From ethnic group C, 4 of 14 people purchased the item.
a. Are the shoppers' ethnicities and the tendency to purchase this item independent?
b. If not, what is the correlation?

6. During the summer months, when electricity usage is high, the power company appeals to customers to reduce consumption by 10% as a public service to avoid blackouts. An alternative is to offer rebates to customers who reduce usage by 10% compared to the same month the previous year. Among 50 randomly selected customers just asked to reduce electricity use, 14 reduce their use by 10% or more. Among 50 randomly selected customers offered rebates, 25 reduce their electricity use by 10%. Are the differences between the public service appeal and the rebates statistically significant?

7. A number of nonprofit groups use fireworks sales as the major fundraiser in the days before the 4th of July. Some of the nonprofits are service groups such as the Veterans of Foreign Wars. Others are intended for support of groups like the cheerleaders from the local high school. The questions are whether those two groups attract different numbers of customers, and whether the gender of the customer is a factor. Among 20 men who bought fireworks during a particular 2-hour period, 14 purchased from service organizations; the other 6 purchased from non-service groups. Of the 18 women who purchased in the same period, 8 bought from service organizations, and 10 from non-service groups. Is the gender of the purchaser related to the group from which the purchase is made?

8. A corporate CEO is interested in whether 20 management trainees' possession of a graduate degree (yes/no) is related to their promotion within the first five years (yes/no). The χ² value is 8.450.
a. What does the χ² value indicate about possession of a graduate degree and promotion?
b. What is the value of φ?

Key Terms

• The goodness-of-fit, or 1 × k, chi-square is a test for significant differences between the frequency observed and the frequency expected in the categories of one nominal-scale variable.

• The chi-square test of independence, or the r × k chi-square, is a test of whether two nominal-scale variables operate independently. The test statistic is the same as for the goodness-of-fit chi-square. A statistically significant result indicates that the two are not independent and a correlation procedure follows.

• The data in a chi-square test of independence are often arranged in a contingency table with the rows indicating the categories of one variable and the columns the categories of the other. With the count of the data entered into the resulting cells, the table provides a visual indicator of the way the variables operate together.

• When a chi-square test of independence is statistically significant, it is followed by a measure of the strength of the correlation between the variables. This is usually a phi coefficient if there are only two categories of one of the variables, or Cramer's V when there are more than two categories.
11
Confidence Intervals

Learning Objectives

After reading this chapter, you should be able to:

• Distinguish between point and interval estimates of values.
• Calculate confidence intervals for t-tests and regression solutions.
• Explain the factors that affect the width of a confidence interval.
Chapter Outline

11.1 A Confidence Interval for a One-Sample t-Test
Confidence Intervals and the Significance of the Test
The Width of the Interval

11.2 A Confidence Interval for an Independent Samples t-Test
An Independent Samples t-Test Example
Calculating the Confidence Interval of the Difference

11.3 The Confidence Interval of the Prediction
The Standard Error of the Estimate
Calculating the Confidence Interval of the Prediction
Regarding the Width of the Interval
The Excel Confidence Intervals for a Regression Solution

Chapter Summary

Introduction
Early in the book there was a distinction drawn between the statistics that describe sample data and the parameter values that describe the characteristics of populations. The relationship between statistics and parameters is important. The value in calculating many statistics is connected to how well they represent one of the possible values of a related parameter. In fact a good deal of analysis and decision making is based on the understanding that when samples are large and randomly selected, the statistics that describe the sample will also provide useful indicators of what the parameter values will likely be. In the progression from one type of analysis to the next in this book, there have been a number of instances in which a statistic was calculated and used in a procedure because the parameter value was not available. In each situation, the statistic that was calculated from the sample data was employed as an estimate of what the related parameter value would likely be for the entire population. For example,

• The sample mean, M, is one of the possible values of the population mean, μ.
• The standard deviation, s, can serve as an estimate of σ.
• The sample standard error of the mean (SEM) was used to complete the one-sample t-test in the place of the population standard error of the mean, σM, which the z-test requires.
• The standard error of the difference, SEd, in the independent samples t-test substituted for the standard deviation of the differences, σM1−M2, between the means of all possible pairs of samples in a distribution of difference scores.
• Ordinary least squares regression produces a predicted value of the criterion variable (y′) when the actual value of that variable, y, is unknown.
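The first three of these estimates can be computed directly from any sample. The short sketch below uses a small, invented data set purely for illustration; M, s, and SEM stand in for μ, σ, and σM when the parameter values are unavailable.

```python
import math
import statistics

# Invented sample data; in practice these would be randomly selected observations.
sample = [231, 258, 240, 245, 262, 229, 251, 247]

n = len(sample)
M = statistics.mean(sample)      # sample mean, the point estimate of mu
s = statistics.stdev(sample)     # sample standard deviation, the estimate of sigma
SEM = s / math.sqrt(n)           # standard error of the mean, the estimate of sigma_M

print(f"n = {n}, M = {M:.3f}, s = {s:.3f}, SEM = {SEM:.3f}")
```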
In each case, the statistic is a particular, discrete value that takes the place of the related parameter. The value of the statistic is based on the understanding that it is the best estimate available from whatever data are at hand for the value of the more-difficult-to-determine parameter. Usually a relatively large sample that is created by randomly selecting the individuals who make it up will reflect the essential characteristics of the population from which the sample was drawn. But "usually" means that sometimes the sample does not accurately represent the population, and the difficulty is that at the time it may not be clear whether the sample is representative.

Many of the statistical procedures used to this point involve calculating a value that indicates the degree of difference or association between groups and then determining whether that value is statistically significant. The calculated values represent what are called point estimates of outcomes; they are discrete numbers that are used to determine when a difference or a relationship reflected in sample data is likely to have occurred by chance. The difficulty is that even when a value is based on the most careful data collection procedures, there is always a risk of sampling error. The point estimate by itself provides no indicator of its accuracy. It is difficult to know how much confidence to have in results that are based on these values.

One way to address the limitations of point estimates is to move away from the focus on discrete values and rely instead on what are called interval estimates of the relevant values. For example, rather than asking whether the sample mean provides an accurate estimate of the mean of the population from which the sample was drawn, the question becomes, "What range of values is likely to capture the true value of the population mean from which a significantly different sample was drawn?" In other words, instead of relying on point estimates to estimate population parameters, a confidence interval provides, with a specified level of probability, a range of values within which the estimated population parameter is likely to fall. Relying on a range of values to capture the value of interest rather than trying to pinpoint it with a discrete value is why this approach to statistical analysis makes reference to "confidence intervals." The emphasis is on the use of "interval estimates" rather than on "point estimates."

It isn't uncommon to see both point and interval estimates used in the same analysis. If a statistical test produces a statistically significant result, the analysis that provided a point estimate of the value of the population parameter is often followed by a confidence interval for that population parameter. That way one can know (a) what the best estimate of the value is, and (b) how much variation should be allowed around that value in order to have a reasonable chance of capturing it. Different statistical procedures involve different kinds of confidence intervals.
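Before turning to the specific formulas, the contrast between the two kinds of estimates can be made concrete. The minimal sketch below extends the invented sample used earlier: the sample mean is the point estimate, and the t distribution is used to wrap a .95 interval estimate around it, anticipating Formula 11.1 in the next section.

```python
import math
import statistics
from scipy.stats import t

sample = [231, 258, 240, 245, 262, 229, 251, 247]   # invented data

n = len(sample)
M = statistics.mean(sample)                 # point estimate of the population mean
SEM = statistics.stdev(sample) / math.sqrt(n)
t_crit = t.ppf(0.975, df=n - 1)             # two-tailed critical value for p = .05

lower, upper = M - t_crit * SEM, M + t_crit * SEM   # interval estimate
print(f"point estimate: M = {M:.2f}")
print(f".95 interval estimate: {lower:.2f} to {upper:.2f}")
```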
Key Terms: Point estimates are discrete values that are calculated for unknown parameter values.

Key Terms: Interval estimates are calculated ranges within which an unknown parameter value is likely to occur. Because the interval is established based on a specified level of confidence, it is also called a confidence interval.

11.1 A Confidence Interval for a One-Sample t-Test

Suppose that the market development team associated with an electric power plant located in Fresno County, California, wishes to estimate average monthly electricity usage. Calculating the mean (M) use of electricity among a randomly selected sample of Fresno residents will probably provide a reasonable estimate of mean use among all members of the county (μ), so long as the sample is of an adequate size and is selected in a way that minimizes sampling error. However, what if the goal is to compare domestic use in Fresno County to domestic use in the country as a whole?

As an example, suppose that the mean electricity bill for domestic users in the country as a whole is $230.835 per month. For Fresno County, the monthly costs for 31 randomly selected residences average $245, with a standard deviation of 37.555. Since SEM = s/√n, SEM = 37.555/√31 = 6.745. Recall from Chapter 4 that t can be calculated as follows:

t = (M − μM)/SEM

Remembering that μM and μ have the same value,

t = (245 − 230.835)/6.745
t = 2.100

If the probability of a type I error is set at p = .05, then the critical value of t for df = 30 (n − 1) is 2.042 (Table 4.1). Average electricity use by the people in Fresno County is significantly different than it is for the country as a whole. For a one-sample t-test, a statistically significant result indicates that the mean of the population from which the sample was drawn (Fresno) is different from the population mean to which it was compared (United States); to say it differently, it indicates that there are two different populations involved—one is the population from which the sample was drawn (Fresno), and the other is the population to which the sample was compared (United States).

But is the sample mean (M) that prompts a significant t value really an accurate estimate of the population from which the sample was drawn? In the electricity use problem, does M provide a reasonably good estimate of electricity use in the population of all electricity users in Fresno County? How heavily can one afford to rely on the "M as a good representation of μ" assumption? Rather than calculating the sample mean (M) and relying on sample size and random selection to make the case that M = μ, the confidence interval of the population mean provides a range of values, thus the reference to an interval, within which μM, the mean of the population to which the sample does belong, will occur with a specified level of probability.

Key Terms: The confidence interval of the population mean is a range within which the value of an unknown population mean has a specified probability of occurring.

The confidence interval for a one-sample t is calculated with this formula:

Formula 11.1
C.I..95 = ±t(SEM) + M

Where
C.I..95 = a .95 confidence interval, or a range of values within which the probability is p = .95 that the true value of the population mean, μM, will be included.
t = the critical value of t for the degrees of freedom associated with the t-test. The ± symbol indicates that the critical value of t from the table is included twice, once as a positive value and a second time as a negative value.
SEM = the calculated value of the standard error of the mean.
M = the sample mean.

As is the case with any type of statistical testing, when calculating confidence intervals the analyst deals in probabilities rather than certainties. The range of values that is a .95 confidence interval for a one-sample t-test will capture the mean of the population that is represented by the sample 19 times out of 20. The other side of that coin for a .95 confidence interval is that, based on averages, 1 time in 20 the parameter value that is sought will occur outside the interval.

The level of probability for the confidence interval is determined by whatever the probability level was for the original test of statistical significance. Since most statistical testing is conducted at p = .05, that is also the most common standard for the confidence interval. The confidence interval calculated after a one-sample t that is statistically significant at p = .05 must produce a range of values that will miss the true value of the population mean no more than 5 times in 100. However, confidence intervals are stated in terms of the probability of capturing the particular value rather than the probability of missing that value. So instead of a p = .05 confidence interval, it is a p = .95 confidence interval. If the t-test had been conducted at p = .01, a .99 confidence interval would be the appropriate interval estimate to calculate, and so on.

To calculate the confidence interval for the one-sample t procedure completed above, the process is as follows:

C.I..95 = ±t(SEM) + M
C.I..95 = ±2.042(6.745) + 245
C.I..95 = ±13.773 + 245
C.I..95 = 258.773, 231.227

Review Question A: What is the probability that the true value will be outside a .95 confidence interval?
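The same arithmetic can be scripted. The sketch below is a minimal check of the Fresno County example using the summary numbers reported above (n = 31, M = 245, s = 37.555, μ = 230.835); scipy is used only to look up the critical t value, which matches the 2.042 read from Table 4.1.

```python
import math
from scipy.stats import t

n, M, s = 31, 245.0, 37.555      # sample size, sample mean, sample standard deviation
mu = 230.835                     # population mean for the country as a whole

SEM = s / math.sqrt(n)           # 6.745
t_stat = (M - mu) / SEM          # 2.100
t_crit = t.ppf(0.975, df=n - 1)  # about 2.042 for df = 30

lower, upper = M - t_crit * SEM, M + t_crit * SEM
print(f"SEM = {SEM:.3f}, t = {t_stat:.3f}, critical t = {t_crit:.3f}")
print(f"C.I.95 = {lower:.3f} to {upper:.3f}")   # about 231.227 to 258.773
```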
The result is interpreted this way: With .95 confidence, the mean monthly cost of electricity in the population to which the Fresno County people belong is somewhere between $231.23 and $258.77.

The initial question that prompted the t-test could be framed this way: Is monthly electricity usage in Fresno County consistent with electricity usage for the same month nationwide? Because the t value is statistically significant, the answer is "no." The statistically significant t indicates that the sample probably belongs to a population other than the one to which it was compared. Maybe it is summer, and with temperatures higher in California than elsewhere, California consumers have higher electricity usage than what is characteristic of the country as a whole. For whatever reason, the t-test indicates that the sample belongs to a different population than the population that is the country.

The sample mean used in the t-test is a point estimate of the mean of a population. If the t-test result is not statistically significant, the conclusion is that the sample mean is one of the many samples making up the population of sample means. However, the fact that the t-test result is statistically significant indicates that the sample mean is a point estimate of the value of some population mean different from what was indicated for the country as a whole. What the point estimate does not indicate, however, is how well the sample mean estimates the value of the mean of the population to which the sample belongs. The confidence interval responds to this absence by providing a way to calculate a range of values within which the population mean will probably occur. The "probably" is the reminder that there is never a certainty of including the value. In this case, there is a 95% probability that the mean of the population represented by the sample is somewhere between $231.23 and $258.77.

Note that the range of values in the confidence interval does not include the original population mean ($230.835) from the initial t-test. When a sample mean in a one-sample t-test is significantly higher than the population mean, the sample mean (M) is probably not one of the possible values of the population mean, a population with μM = 230.835 in this case. Because the sample mean was significantly higher than μM, the mean of the population to which the sample probably does belong will have a value higher than the 230.835 in the t-test. So the values in the range that is the confidence interval are all beyond the original value of μM. If a significant t-test involved a sample mean less than μM, the confidence interval would include a range with values all lower than μM.

Confidence Intervals and the Significance of the Test

Confidence intervals are only calculated for significant results. If the value of t is not statistically significant, the confidence interval will include a range of values that includes the original value of μM. This happens because a non-significant t-test result is interpreted to mean that the value of μM from the test is one of the values that could be the mean for the population to which the sample was compared. An example will clarify this.

Suppose sales of frozen yogurt for a franchise average $1,125 per day. For a particularly cold week, the daily sales were as follows:

$974, $1,256, $1,170, $842, $875, $1,056, $1,145
Are sales during that week significantly different from average daily sales of $1,125? Verify that:

M = 1045.429
s = 155.679
SEM = s/√n = 155.679/√7 = 58.841
t = (M − μM)/SEM = (1045.429 − 1125.0)/58.841 = −1.352

With t.05(6) = 2.447, the result is not significant. The sample mean from that non-significant result, with the critical value of t and the estimated standard error of the mean, produce this .95 confidence interval:

C.I..95 = ±t(SEM) + M = ±2.447(58.841) + 1045.429 = 1189.413, 901.445

The t-test was not significant, and the resulting confidence interval therefore includes the original population mean ($1,125) as one of the values in the interval. Logically, if the point estimate (M) is not significantly different from the population mean (μM), then maybe the mean of the population represented by the sample does have the same value as the specified population mean. At least it is a possibility that cannot be rejected.

The Width of the Interval

The narrower the confidence interval, the more precise an estimate the interval provides for the value of the unknown population mean. Sometimes confidence intervals are so wide that they seem not to be very helpful as indicators of the unknown value, so it makes sense to analyze the factors that affect the width of the interval. There are several factors, including the level of probability at which the interval is calculated.

For the electricity problem, the .95 confidence interval stretched from $231.227 to $258.773, around the $245 sample mean. If the market development team wanted a greater level of certainty about capturing the value of the population mean to which the sample belongs, a C.I..99 could be used, but note what happens as a result. First, verify that the critical t value for p = .01 and df = 30 is 2.750. Now the calculated t value of 2.100 is not statistically significant. Second, to calculate the .99 confidence interval:

C.I..99 = ±t(SEM) + M
C.I..99 = ±2.750(6.745) + 245
C.I..99 = ±18.549 + 245
C.I..99 = 263.549, 226.451

The price for a greater certainty of capturing the true value of the population mean (p = .99 rather than p = .95) represented by the sample is a wider confidence interval and less precision. Now the market development department is 99% certain that the confidence interval includes the population mean, but the interval estimate is much broader and probably less helpful. Note that the interval now includes the population mean of $230.835. Precision and certainty push the confidence interval in opposite directions. One is improved only at the expense of reducing the other.

Just as the confidence interval widened when the .99 confidence interval was used, it could have been narrowed by calculating a .90 C.I. instead (although the particular table in this book does not provide the critical values for significance testing at p = .1). Note that with a C.I..90 the probability of missing the true value of the population mean to which the sample belongs is 1 time in 10. As the level of probability is relaxed, the critical value of t diminishes accordingly, which shrinks the C.I.

The other element in the width of the confidence interval is the standard error of the mean, SEM. Since SEM = s/√n, the value of the standard error of the mean can be reduced by either increasing the sample size (larger n, larger divisor, smaller resulting value), finding a way to reduce the variability in the scores so that the standard deviation is smaller, or both. As it turns out, increasing the sample size usually will serve both purposes. Besides the larger n value, it also usually decreases the s value. Small samples tend to be platykurtic; they ordinarily have relatively large standard deviations given their ranges. As sample sizes grow, particularly randomly selected samples, overall variability tends to decrease. This is consistent with the central tendency characteristics of normal distributions, as more of the scores selected will tend to occur in the middle of the distribution near the mean. Since the standard deviation measures how much individual values tend to vary from the mean of the group, increasing sample size usually shrinks the standard deviation value.
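The trade-offs described in this subsection are easy to see numerically. The sketch below, using the Fresno figures again, recomputes the interval at several confidence levels and at a hypothetical larger sample size; holding s constant at the larger n is purely for illustration, since in practice larger samples usually shrink s as well.

```python
import math
from scipy.stats import t

M, s = 245.0, 37.555   # Fresno sample mean and standard deviation from the text

def ci(level, n):
    """Return the confidence interval of the population mean at the given level."""
    SEM = s / math.sqrt(n)
    t_crit = t.ppf(1 - (1 - level) / 2, df=n - 1)
    return M - t_crit * SEM, M + t_crit * SEM

for level in (0.90, 0.95, 0.99):
    lo, hi = ci(level, n=31)
    print(f"n = 31, {level:.0%} C.I.: {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")

# A hypothetical larger sample narrows the interval at the same confidence level.
lo, hi = ci(0.95, n=124)
print(f"n = 124, 95% C.I.: {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")
```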
11.2 A Confidence Interval for an Independent Samples t-Test

A statistically significant independent samples t-test indicates that the two samples represented by M1 and M2 probably did not come from the same population; they came from populations with different means. Stated symbolically, μ1 − μ2 ≠ 0. Consistent with the language used earlier, the two sample means in the case of the independent t-test are point estimates of their respective population means. The more difference there is between M1 and M2 in a statistically significant t-test, the greater the difference there probably is between the means of the two related populations.

The confidence interval for the independent samples t-test is called the confidence interval of the difference. It provides a way to estimate the difference between the two population means represented by the sample means in a statistically significant independent samples t-test. The confidence interval of the difference employs this formula:

Key Terms: The confidence interval of the difference is a range probably containing the difference between two unknown population means.

Formula 11.2
C.I..95 = ±t(SEd) + (M1 − M2)

Where
C.I..95 = a range of values within which the difference between the means of the populations represented by the samples will be captured with p = .95
t = the critical value of t for n1 + n2 − 2 degrees of freedom (the same number of degrees of freedom as there were for the independent samples t)
SEd = the calculated value of the standard error of the difference
M1, M2 = the two sample means from the independent t-test

As with the one-sample t-test, a confidence interval of the difference is only calculated if the independent samples t-test is statistically significant. Considering the purpose for the confidence interval makes the reason for this evident. What is at issue in the independent samples t is whether the two samples likely represent populations with the same means, or for practical purposes, whether the two samples belong to the same population. That probability is rejected when the result is significant. If the analysis is not significant and the decision is to "fail to reject," then the samples may come from populations with the same means. It makes little sense, therefore, to calculate a confidence interval for the difference between the population means if there may be only one population involved.

An Independent Samples t-Test Example

A pharmaceutical manufacturer has received Food and Drug Administration (FDA) approval to market a new drug for people with high cholesterol. In two sales regions, prescription sales have been similar. But then at some point, one of the two representatives receives special training on the drug's chemistry so that the representative can explain more knowledgeably how the drug works and what the side effects are likely to be. The question is whether the extra training is worth the trouble—do the doctors visited by the sales rep who received the extra training write a significantly different number of prescriptions than the doctors the other rep visits? The numbers of prescription orders for the drug by doctors in the two sales regions over a seven-week period following the training are as follows:

Extra training: 13, 10, 14, 17, 16, 12, 15
No extra training: 12, 10, 9, 12, 8, 8, 9

Review Question B: What is the relationship between the level of confidence and the width of the confidence interval?

Verify the following descriptive statistics:

                 Mean     Standard Deviation   Standard Error of the Mean
Extra Training   13.857   2.410                .911
No Training       9.714   1.704                .644

Recall that SEd = √(SEM1² + SEM2²) = √(.911² + .644²) = 1.116

t = (M1 − M2)/SEd = (13.857 − 9.714)/1.116 = 3.712; t.05(12) = 2.179. Reject H0.

The result is statistically significant. As the difference between the sample means suggests, the doctors visited by the sales representative with the extra training are writing significantly more prescriptions than the doctors in the other sales district. In terms of the number of prescriptions the doctors write, the two sales reps probably now represent two distinct populations. The sample means provide point estimates of the means of the two populations involved. The confidence interval of the difference provides a different way to contrast the two populations by indicating, with the specified probability, what range of values is likely to capture the magnitude of the difference between the two population means.

Calculating the Confidence Interval of the Difference

Formula 11.2 is:

C.I..95 = ±t(SEd) + (M1 − M2)

Note that although the final term in the formula indicates M1 − M2, it is the absolute value of the difference between the means that the formula requires. If M2 happens to have a greater value than M1, the result will be a difference that is negative, but whether M1 − M2 is positive or negative isn't relevant to the confidence interval. It is the absolute value of M1 − M2 that is entered in the formula.

All the required values are available from the t-test solution above. Substituting them gives the following for the confidence interval:

C.I..95 = ±2.179(1.116) + (13.857 − 9.714)
C.I..95 = 6.575, 1.711

a. 2.179 × 1.116 + (13.857 − 9.714) is the upper bound of the confidence interval, and
b. −2.179 × 1.116 + (13.857 − 9.714) is the lower bound of the confidence interval.

The significant independent samples t-test result that indicates that the two samples probably do not represent populations with the same mean is followed with a confidence interval estimating how much difference there is between the means of the two populations. The calculations indicate that with p = .95 confidence, the difference will be somewhere from 1.711 to 6.575 prescriptions per week.

Note that the confidence interval does not estimate the means of the two populations. Rather, it estimates the difference between their means. The sample means provide estimates of the population means, which is as close as we come to identifying their values without further data collection and analysis.
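The prescription example can be reproduced in a few lines; the data come from the example above, and scipy is used only for the critical t value at n1 + n2 − 2 degrees of freedom.

```python
import math
import statistics
from scipy.stats import t

extra    = [13, 10, 14, 17, 16, 12, 15]   # prescriptions, rep with extra training
no_extra = [12, 10,  9, 12,  8,  8,  9]   # prescriptions, rep without extra training

def sem(data):
    return statistics.stdev(data) / math.sqrt(len(data))

M1, M2 = statistics.mean(extra), statistics.mean(no_extra)
SEd = math.sqrt(sem(extra) ** 2 + sem(no_extra) ** 2)   # standard error of the difference
t_stat = (M1 - M2) / SEd                                # about 3.712

df = len(extra) + len(no_extra) - 2
t_crit = t.ppf(0.975, df=df)                            # about 2.179 for df = 12

diff = abs(M1 - M2)
lower, upper = diff - t_crit * SEd, diff + t_crit * SEd
print(f"t = {t_stat:.3f}, critical t = {t_crit:.3f}")
print(f"C.I.95 of the difference: {lower:.3f} to {upper:.3f}")   # about 1.711 to 6.575
```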
11.3 The Confidence Interval of the Prediction

The first two confidence intervals in this chapter were calculated and interpreted in reference to population means. When t is significant in the one-sample test, the confidence interval is a range of values within which the mean of the population represented by the sample is likely to occur. When t is significant in an independent groups test, the confidence interval indicates how much difference there is likely to be between the means of the two populations inferred by the samples.

The confidence interval of the prediction has a different orientation. It is a confidence interval for a regression solution, and because the task in regression is to predict the value of y from x (or from multiple x predictors), what the confidence interval produces is a range of values within which the true value of y will probably occur, given a specific value for x.

Key Terms: The confidence interval of the prediction is a range probably containing the true value of a criterion variable.

Remember that the basis for regression is that whenever variables are significantly correlated, a number of estimates of the value of y from x will be more accurate than a series of random estimates. While that represents a very important theoretical principle, managers are not usually interested in a series of estimates. For example, given a correlation between the price of some product and the resulting sales volume, a production manager who is trying to forecast what production will need to be in order to satisfy demand is often not interested in a series of estimates of sales volume (y) for several different prices (xs), but rather in one estimate of y from x. The question is often "If the price is ___, what are sales most likely to be?" The value of the confidence interval is that it provides a way to estimate how precise that one prediction of sales volume is likely to be.

It is important to be cautious about the distinction between specificity and precision. The difficulty is not in predicting the value of y given x, which can be done whenever x and y are significantly correlated. The challenge is in making precise predictions of the value of y from x. Although the least squares regression equation provides a particular value that is the best prediction of the criterion variable from a specific value of x, what constitutes the best prediction is relative. The best prediction in some circumstances can be very imprecise. A number of factors conspire against accurate predictions, most notably, weak correlations. The problem that the regression solution does not solve is that the predicted value of y (y′) yields little information about how precise the prediction is likely to be. It is a shortfall that the confidence interval addresses.

The Standard Error of the Estimate

Recall that in any normal distribution, plus or minus 1 standard deviation from the mean accounts for about 2⁄3 of the population, 68.26% to be more exact. Recall also from Chapter 9 that the standard error of the estimate (SEest), which estimates prediction errors in regression problems, is by definition a standard deviation value. In fact it is the standard deviation of all possible error scores from an infinite number of predictions of y from x. While that definition has important conceptual value, in practice the standard error of the estimate is in fact an estimated value, as the name suggests. Recall that SEest = sy√(1 − r²xy). Because it is a type of standard deviation value, and since ± one standard deviation from the mean in any normal distribution includes 68% of that distribution, we can say by extension that ±1(SEest) from the predicted value of y (y′) should provide a range of values within which the true value of y will occur about 68% of the time.

Small values of the standard error of the estimate indicate relatively little error and intervals from y′ − 1(SEest) to y′ + 1(SEest) that are relatively narrow. When that occurs, there can be increased confidence in the predicted value of y. Small intervals indicate little error in the prediction. On the other hand, if that range is quite wide, it suggests there could be a good deal of variance between the predicted value of y and its actual value.

Capturing the true value of the criterion variable about 2⁄3 of the time, which is what y′ ± SEest provides for, produces essentially a p = .68 confidence interval:

Formula 11.3
C.I..68 = ±1(SEest) + y′

The problem is that the range of values that is y′ ± 1(SEest) would miss the true value of y about 1⁄3 of the time. The probability of not capturing the true predicted value in such a confidence interval is too great to be helpful in most analytical situations. To improve the probability of capturing y, the more precise C.I..95 and C.I..99 confidence intervals of the prediction are calculated. The formula for a C.I..95 is:

Formula 11.4
C.I..95 = ±tn−2(SEest) + y′

Where
C.I..95 = a .95 confidence interval for the regression solution
t = the critical value of t for n − 2 degrees of freedom, where n = the number of pairs of scores
SEest = the standard error of the estimate
y′ = the predicted value for the criterion variable
Calculating the Confidence Interval of the Prediction

A produce importer recognizes a relationship between how long it takes produce to get from shipping docks to the retail market and how much of the order is lost to spoilage because of over-ripening. The data for number of days (x) and the percentage of the order lost to over-ripeness (y) are as follows:

Number of days: 12, 15, 11, 9, 7, 9, 12, 10, 7, 8, 9, 8
Percentage lost: 8, 10, 7, 7, 6, 8, 9, 9, 5, 6, 7, 6

With a dock workers' strike looming and produce coming into port, the importer wishes to predict what the losses will be if it takes 18 days to get the produce to the retailer. Verify the following descriptive statistics:

                     Mean    Standard Deviation
Number of Days (x)   9.750   2.379
% Lost (y)           7.333   1.497

rxy = .868 and is statistically significant at p = .05

y′ = a + bx
b = rxy(sy/sx) = .868(1.497 ÷ 2.379) = .546
a = My − bMx = 7.333 − (.546)(9.750) = 2.010
y′ = 2.010 + .546(18) = 11.838
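The regression arithmetic above, and the confidence interval of the prediction that follows from Formula 11.4, can be checked with a short script. The data and the 18-day prediction come from the example; scipy is used only for the critical t value at n − 2 degrees of freedom.

```python
import math
import statistics
from scipy.stats import t

days = [12, 15, 11, 9, 7, 9, 12, 10, 7, 8, 9, 8]   # x: days in transit
lost = [ 8, 10,  7, 7, 6, 8,  9,  9, 5, 6, 7, 6]   # y: percentage of the order lost

n = len(days)
mx, my = statistics.mean(days), statistics.mean(lost)
sx, sy = statistics.stdev(days), statistics.stdev(lost)

# Pearson correlation, then the least squares slope and intercept
r = sum((x - mx) * (y - my) for x, y in zip(days, lost)) / ((n - 1) * sx * sy)
b = r * (sy / sx)                    # about .546
a = my - b * mx                      # about 2.010

y_pred = a + b * 18                  # predicted loss for an 18-day transit, about 11.838
SE_est = sy * math.sqrt(1 - r ** 2)  # standard error of the estimate

# Formula 11.4: C.I.95 = +/- t(n-2)(SEest) + y'
t_crit = t.ppf(0.975, df=n - 2)
lower, upper = y_pred - t_crit * SE_est, y_pred + t_crit * SE_est

print(f"r = {r:.3f}, y' = {y_pred:.3f}")
print(f"C.I.95 of the prediction: {lower:.3f} to {upper:.3f}")
```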
Based on these data, if it takes 18 days to get the produce to retail outlets, there will be an 11.838% loss due to over-ripening.

Like the values for M in the one-sample t, and M1 − M2 in the independent t, the 11.838 value is a point estimate—a discrete number—indicating a solution value. As with t-tests, the confidence interval of the prediction produces an interval estimate, or a range of numbers within which the true percentage of spoilage will occur