Week 5 Lecture 14
The Chi Square Test
Quite often, patterns of responses or measures give us a lot of information. Patterns are generally the result of counting how many things fit into a particular category. Whenever we make a histogram, bar chart, or pie chart, we are looking at the pattern of the data. Frequently, changes in these visual patterns will be our first clue that things have changed, and the first clue that we need to initiate a research study (Lind, Marchel, & Wathen, 2008).
One of the most useful tests for examining patterns and relationships in data involving counts (how many fit into this category, how many into that, etc.) is the chi-square. It is extremely easy to calculate and has many more uses than we will cover. Examining patterns involves two uses of the chi-square - the goodness of fit test and the contingency table. Both of these uses have a common trait: they involve counts per group. In fact, the chi-square is the only statistic we will look at that applies when we have counts across multiple groups (Tanner & Youssef-Morgan, 2013).
Chi Square Goodness of Fit Test
The goodness of fit test checks to see if the data distribution (counts per group) matches some pattern we are interested in. Example: Are the employees in our example company distributed equally across the grades? Or, a more reasonable expectation for a company: are the employees distributed in a pyramid fashion – most on the bottom and few at the top?
The Chi Square test compares the actual versus a proposed distribution of counts by generating a measure for each cell or count: (actual – expected)²/expected. Summing these for all of the cells or groups provides us with the Chi Square statistic. As with our other tests, we determine the p-value of getting a result as large or larger to decide whether we reject or fail to reject our null hypothesis. An example will show the approach using Excel.
Regardless of which Chi Square test we are performing, the chi square related functions are found in the fx Statistics window rather than in the Data Analysis tool where we found the t and ANOVA test functions. The most important for us are:
• CHISQ.TEST (actual range, expected range) – returns the p-value for the test
• CHISQ.INV.RT(p-value, df) – returns the actual Chi Square value for the p-value
or probability value used.
• CHISQ.DIST.RT(X, df) – returns the p-value for a given chi square value X.
When we have a table of actual and expected results, using =CHISQ.TEST(actual range, expected range) will provide us with the p-value of the calculated chi square value (but does not give us the actual calculated chi square value for the test). We can compare this value against our alpha criterion (generally 0.05) to make our decision about rejecting or not rejecting the null hypothesis.
If, after finding the p-value for our chi square test, we want to determine the calculated value of the chi square statistic, we can use the =CHISQ.INV.RT(probability, df) function; the value for probability is our chi square test outcome, and the degrees of freedom (df) equals the number of cells in our actual table minus 1 (6 – 1 = 5 for a problem working with our 6 grade levels). Finally, if we are interested in the probability of exceeding a particular chi square value, we can use the CHIDIST or CHISQ.DIST.RT function.
Excel Example. To see if our employees are distributed in a traditional pyramid shape, we would use the Chi Square Goodness of Fit test, as we are dealing both with count data and with a proposed distribution pattern. For this test, let us assume the following table shows the expected distribution of our 50 employees in a pyramid organizational structure.

Grade: A B C D E F Total
Count: 15 12 10 6 4 3 50

The actual or observed distribution within our sample is shown below.

Grade: A B C D E F Total
Count: 15 7 5 5 12 6 50
The research question: Are employees distributed in a pyramidal fashion?
Step 1: Ho: No difference exists between observed and expected frequency counts.
Ha: Observed and expected frequencies differ.
Step 2: Reject the null hypothesis if the p-value < alpha = .05.
Step 3: Chi Square Goodness of Fit test.
Step 4: Conduct the test. Below is a screenshot of an Excel solution.
Step 5: Conclusions and Interpretation. Since our p-value of 0.00024 is < our alpha of 0.05, we reject the null hypothesis. The employees are not distributed in a pyramid pattern.
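Outside Excel, the same goodness of fit statistic can be sketched in Python; the counts below are the expected and observed tables from this example, and the critical value quoted in the comment comes from a standard chi square table.

```python
observed = [15, 7, 5, 5, 12, 6]   # actual counts per grade (A through F)
expected = [15, 12, 10, 6, 4, 3]  # proposed pyramid distribution

# Goodness of fit statistic: sum of (observed - expected)^2 / expected per cell
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of cells minus 1

print(round(chi_sq, 2))  # 23.75
# The critical value for df = 5 at alpha = .05 is about 11.07 (chi square table),
# so 23.75 is well past it: reject the null hypothesis.
```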
Side Note: We might think that if our sample had an equal number of employees per grade, we would have a better chance of grade-based differences averaging out. Doing this same test and assuming an equal distribution across grades produces a p-value of 0.063, causing us to fail to reject the null hypothesis. The student is encouraged to try this; the expected value for each grade would be 50/6.
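To check this side note, here is a quick sketch under the equal-distribution assumption (expected count of 50/6 per grade):

```python
observed = [15, 7, 5, 5, 12, 6]
expected = [50 / 6] * 6  # equal distribution across the six grades

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 10.48
# The df = 5, alpha = .05 critical value is about 11.07, so the statistic does
# not reach the rejection region, consistent with the p-value of 0.063 above.
```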
Effect size. For a single-row goodness-of-fit test, the associated effect size measure is called effect size r, and equals the square root of: chi square value/(N*df), where df = the number of cells – 1. A value less than .30 is considered small, between .30 and .50 is considered moderate, and more than .50 is considered large (Steinberg, 2008). Since we rejected the null in the example above, the effect size would be: r = sqrt(23.75/(50*5)) = sqrt(0.095) = 0.31. This is a moderate impact, suggesting that both sample size and variable interaction had some impact. With moderate results, we generally would want to get a larger sample and repeat the test (Tanner & Youssef-Morgan, 2013).
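The effect size arithmetic above, as a short sketch:

```python
import math

chi_sq, n, df = 23.75, 50, 5      # values from the goodness of fit example
r = math.sqrt(chi_sq / (n * df))  # effect size r = sqrt(chi_sq / (N * df))
print(round(r, 2))  # 0.31, a moderate effect
```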
Chi Square Contingency Table test
Contingency table tests, also known as tests of independence, are slightly more complex than goodness of fit tests. They classify the data by two or more variable labels (we will limit our discussions to two variable tables), and the layout looks a lot like the input table for the ANOVA 2-factor without replication test we looked at last week. Both variables involve the counts per category (nominal, ordinal, or interval/ratio data in ranges) of items that meet our research interest (Lind, Marchel, & Wathen, 2008).
With most contingency tables, we do not have a given expected frequency as we had with the goodness of fit situation. To find the expected value for each cell of a multiple row/column table, we use the formula: row total * column total/grand total (which suggests the expected frequency is the average of the observed frequencies per cell, not an unreasonable expectation). Once we have generated the values for the expected table, we use the same formula to perform the Chi Square test. Manually, this is the sum of ((actual – expected)²/expected) for all of the cells. The same fx Chi Square functions used for the Goodness of Fit test are used for the Contingency Table analysis.
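A minimal sketch of the expected-table step, using hypothetical 2x2 counts (the numbers below are made up for illustration, not taken from the lecture's data set):

```python
# Hypothetical 2x2 observed counts (rows and columns are illustrative labels)
observed = [
    [12, 8],
    [14, 16],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Same chi square formula as the goodness of fit test, summed over all cells
chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)*(columns - 1)
```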
The null hypothesis for a contingency table test is "no relationship exists between the variables." The alternate hypothesis would be: "a relationship exists." In general, you are testing either for similar distributions between the groups of interest or to see if a relationship ("correlation") exists (even if the data is nominal level). The df for a contingency table is (number of rows – 1)*(number of columns – 1).
Excel Example. The data entry for this test is the same as with our earlier test, and the functions are found in the fx statistical list. One possible explanation for different salaries is performance on the job, reflected in the performance rating. We might wonder if males and females are evaluated differently (either due to actual performance or to bias; if so, we have another issue to examine). So, our research question for this issue becomes: are males and females rated the same?
Step 1: Ho: Male and female ratings are similar (no difference in distributions).
Ha: Male and female rating distributions differ.
Step 2: Reject Ho if p-value is < alpha = 0.05.
Step 3: Chi Square Contingency Table Analysis.
Step 4: Perform Test.
Step 5: Conclusions and Interpretation. Since the p-value (CHISQ.TEST result) is greater than (>) alpha = .05, we fail to reject the null hypothesis and conclude that males and females are evaluated in a similar pattern. It does not appear that performance ratings impact average salary differences.
Effect size. Now, as with the t-test and ANOVA, had we rejected the null hypothesis, we would have wanted to examine the practical impact of the outcome using an effect size measure. The effect size measure for the Chi Square is a correlation measure. Two measures are generally used with contingency table outcomes – the Phi coefficient and Cramer's V (Tanner & Youssef-Morgan, 2013).
The Phi coefficient (= square root of (chi square/sample size)) provides a rough estimate of the correlation between the two variables. Phi is primarily used with small tables (2x2, 2x3, or 3x2). Values below .30 are weak, .30 to about .50 are moderate, and above .50 (to 1) are strong relationships (Tanner & Youssef-Morgan, 2013).
Cramer's V can be considered as a percent of the shared variation – or common variation – between the variables. It equals the square root of (phi squared/(smaller number of rows or columns – 1)). It ranges from 0 (no relationship or variation in common) to 1.0 (strong relationship, all variation in common) (Tanner & Youssef-Morgan, 2013).
For our example above, it would not make sense to calculate either value since we did not reject the null; but for illustrative purposes we will.
• Phi = square root of (1.978/50) = square root of (0.03956) = 0.199 – small, no relationship
• V = square root of (0.199^2/(2-1)) = 0.199. Note, when the smaller of the number of rows and columns equals 2 (making the denominator 1), V will equal Phi (Tanner & Youssef-Morgan, 2013).
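These two effect size calculations can be sketched as follows; the chi square value and sample size are the ones quoted above, and the table dimension (2) matches the (2-1) denominator in the example.

```python
import math

chi_sq, n = 1.978, 50        # chi square value and sample size from the example
phi = math.sqrt(chi_sq / n)  # Phi coefficient

smaller_dim = 2              # smaller of the number of rows and columns
v = math.sqrt(phi ** 2 / (smaller_dim - 1))  # Cramer's V

print(round(phi, 3), round(v, 3))  # 0.199 0.199 -- V equals Phi here
```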
Caution
Due to the division involved in calculating the Chi Square value, it is strongly influenced by cells that have small expected values. Most texts say simply that if the expected frequency in one or more cells is less than (<) 5, do not use the Chi Square distribution in a hypothesis test. There are some different opinions about this issue; different texts issue different rules on what to do if we have expected frequencies of 5 or less in cells.
As a compromise, let's use the standard that no more than 20% of the cells should have an expected value of less than 5. If they do, we need to combine rows or columns to reduce this percentage (Lind, Marchel, & Wathen, 2008).
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.
Steinberg, W. J. (2008). Statistics Alive! Thousand Oaks, CA: Sage Publications, Inc.
Tanner, D. E., & Youssef-Morgan, C. M. (2013). Statistics for Managers. San Diego, CA: Bridgepoint Education.
Week 5 Lecture 13
This week we look at two different approaches to analyzing data and making inferences about the populations they come from. The first is confidence intervals, a range of values that we expect to contain the actual population mean based on the sample results we obtained. The other is a way to use nominal and ordinal data in a statistical analysis. The Chi Square family of tests looks at patterns within samples and sees whether the underlying populations could contain the same pattern of measure distributions (Lind, Marchel, & Wathen, 2008).
Confidence Intervals
When we perform a t-test or ANOVA, we are using a single point estimate for the means of the populations we are testing. Some professionals and managers are a bit uncomfortable with this; they understand that the sample has a sampling error – and the actual population mean could be – and most likely is – a bit different. They are interested in getting an estimate of what the sampling error is and how much the population mean could differ from the sample mean.
We deal with this through the use of confidence intervals, a range of values that have a specific probability of containing the actual population mean. We have seen one example of a confidence interval already: the intervals used to determine which population means varied when we rejected the null hypothesis for the ANOVA test were confidence intervals.
Confidence intervals often provide the added information and comfort about estimates of population parameter values that single point estimates lack. Since the one thing we do know about a statistic generated from a sample is that it will not exactly equal the population parameter, we can use a confidence interval to get a better feel for the range of values that might be the actual population parameter. They also give us an indication of how much variation exists in the data set. The larger the range (at the same confidence level), the more variation within the sample data set and the less representative the mean would be (Lind, Marchel, & Wathen, 2008). We are going to look at two different kinds of confidence intervals this week – intervals for a one sample mean and intervals for the differences between the means of two samples (Lind, Marchel, & Wathen, 2008).
One Sample Confidence Interval for the mean
A confidence interval is simply a range of values that could contain the actual population parameter of interest. It is centered on the sample mean, and uses the variation in the sample to estimate a range of possible values (Lind, Marchel, & Wathen, 2008). To construct a confidence interval, we use several pieces of information from the sample and the confidence level we want.
From the sample we use the mean, standard deviation, and size. For the confidence level, we choose a desired probability (usually set at 95%) that the interval does, in fact, contain the population mean.
Example. The confidence interval for the female mean salary in the population would be calculated this way. The sample mean value is 38, the standard deviation is 18.3, and the sample size is 25 (from Week 1 material). Once we determine the confidence level we want, we use the associated 2-tail t value to achieve it. The t-value is found with the fx function t.inv.2t(Prob, df). For a 95% confidence interval, we would use t.inv.2t(0.05, 24), which equals 2.064 (rounded).
We now have all the information we need to construct a 95% confidence interval for the female salary mean:
CI = mean +/- t * stdev/sqrt(sample size) = 38 +/- 2.064*18.3/sqrt(25) = 38 +/- 7.6.
This is typically written as 30.4 to 45.6. Note: the standard deviation divided by the square root of the sample size is called the standard error of the mean, and is the variation measure of the sample used in several statistical tests, including the t-test and confidence intervals.
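The same interval can be sketched outside Excel; the t value is hardcoded from Excel's t.inv.2t(0.05, 24), since Python's standard library has no inverse t function.

```python
import math

mean, stdev, n = 38, 18.3, 25  # female salary sample values from the lecture
t = 2.064                      # t.inv.2t(0.05, 24), the two-tailed 95% t value

se = stdev / math.sqrt(n)      # standard error of the mean
margin = t * se
low, high = mean - margin, mean + margin
print(round(low, 1), round(high, 1))  # 30.4 45.6
```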
The associated 95% CI for males is 44.6 to 59.3. Note that the endpoints overlap – the male smallest value is 44.6 while the female largest value is 45.6. This suggests that both population average salaries could be the same and around 45. However, just as the two one-sample t-tests gave us misleading information on possible equality, using two confidence intervals to compare two populations also is not the best approach.
The Confidence Interval for mean differences.
When comparing multiple samples, it is always best to use all the possible information in a single test or procedure. The same is true for confidence intervals. If we are interested in seeing if sample means could be equal, we look to see if the difference between the averages could be 0 or not. If so, then the means could be the same; if not, then the means must be significantly different.
The formula for the mean difference confidence interval is mean difference +/- t*standard error. The standard error for the difference of two populations is found by adding the variance/sample size (which is the standard error squared) for each and taking the square root (Lind, Marchel, & Wathen, 2008). For our salary data set we have the following values:
Female mean = 38    Male mean = 52    t = t.inv.2t(0.05, 48) = 2.011
Female Stdev = 18.3    Male Stdev = 17.8    Total sample size = 50, df = 48
Standard error = sqrt(Variance(female)/25 + Variance(male)/25) = sqrt(334.7/25 + 316/25) = 5.10.
This gives us a 95% confidence interval for the difference equaling:
(52 – 38) +/- 2.011 * 5.10 = 14 +/- 10.3 = 3.7 to 24.3.
Since this confidence interval does not contain 0, we are 95% confident that the male and female salary means are not equal – which is the same result we got from our 2 sample t-test in week 2. We also now have a sense of how much variation exists in our measures.
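A sketch of the mean-difference interval, computing the standard error from the two sample standard deviations directly (so the decimals differ very slightly from the rounded variances quoted above); the t value is hardcoded from Excel's t.inv.2t(0.05, 48).

```python
import math

f_mean, m_mean = 38, 52
f_sd, m_sd = 18.3, 17.8
n = 25          # per-group sample size (two groups of 25, df = 48)
t = 2.011       # t.inv.2t(0.05, 48), the two-tailed 95% t value

# Standard error of the difference: sqrt(var1/n1 + var2/n2)
se = math.sqrt(f_sd ** 2 / n + m_sd ** 2 / n)
diff = m_mean - f_mean
low, high = diff - t * se, diff + t * se
print(round(low, 1), round(high, 1))  # 3.7 24.3 -- the interval excludes 0
```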
Side note: The "+/- t*SE" term is often called the margin of error. We most often hear this phrase in conjunction with opinion polls – particularly political polls: "candidate A has a 43% approval rating with a margin of error of 3.5%." While we do not deal with proportions in this class, they are calculated the same as an empirical probability – number of positive replies divided by the sample size. The construction of these margins or confidence intervals is conceptually the same – a t-value and a standard error of the proportion based on the sample size and results (Lind, Marchel, & Wathen, 2008).
References
Lind, D. A., Marchel, W. G., & Wathen, S. A. (2008). Statistical Techniques in Business & Finance (13th ed.). Boston: McGraw-Hill Irwin.