3. The Chi-square distribution
• Positively skewed but becomes
symmetrical with increasing degrees of
freedom
• Mean = k where k = degrees of freedom
• Variance = 2k
• Assuming a normally distributed dataset
and sampling a single $z^2$ value at a time:
$\chi^2(1) = z^2$
– If more than one… $\chi^2(N) = \sum_{i=1}^{N} z_i^2$
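The properties listed on this slide can be checked with a small simulation. The sketch below is illustrative only: the function name `simulate_chi_square`, the number of draws, and the seed are assumptions, not part of the original slides. It builds chi-square values by summing k squared standard-normal z scores and compares the sample mean and variance with k and 2k.

```python
import random
import statistics

def simulate_chi_square(k, draws=20000, seed=0):
    """Return `draws` values, each the sum of k squared standard-normal z scores."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(draws)]

for k in (1, 5, 20):
    sample = simulate_chi_square(k)
    # The sample mean should land near k and the sample variance near 2k.
    print(k, round(statistics.mean(sample), 2), round(statistics.variance(sample), 2))
```

With larger k, a histogram of the simulated values would also look less skewed, in line with the first bullet.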
Admission.edhole.com
4. Why used?
• Chi-square analysis is primarily used to
deal with categorical (frequency) data
• We measure the “goodness of fit” between
our observed outcome and the expected
outcome for some variable
• With two variables, we test in particular
whether they are independent of one
another using the same basic approach.
5. One-dimensional
• Suppose we want to know how people in a
particular area will vote in general and go
around asking them.
Republican Democrat Other
20 30 10
• How will we go about seeing what’s really
going on?
6. • Hypothesis: Dems should win district
• Solution: chi-square analysis to determine
if our outcome is different from what would
be expected if there was no preference
$\chi^2 = \sum \frac{(O - E)^2}{E}$
7.
           Republican   Democrat   Other
Observed   20           30         10
Expected   20           20         20
• Plug in to formula
$\chi^2 = \frac{(20 - 20)^2}{20} + \frac{(30 - 20)^2}{20} + \frac{(10 - 20)^2}{20}$
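The same arithmetic can be sketched in a few lines of Python. The helper name `chi_square_statistic` is an illustrative choice; the expected counts assume, as on the slide, that "no preference" means an equal split of the 60 respondents.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 30, 10]                               # Republican, Democrat, Other
total = sum(observed)
expected = [total / len(observed)] * len(observed)    # no preference: 20 each

statistic = chi_square_statistic(observed, expected)
print(statistic)  # 0/20 + 100/20 + 100/20 = 10.0
```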
8. $\chi^2(2) = 10$
$\chi^2_{.05} = 5.99$
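The comparison of the obtained value (10, with 2 degrees of freedom) against the .05 critical value (5.99) can be reproduced without a table. This is a sketch that relies on one fact specific to df = 2: the upper-tail probability is exactly exp(-x/2). The function names are illustrative, and the shortcut does not carry over to other degrees of freedom.

```python
import math

def chi_square_sf_df2(x):
    """P(chi-square with 2 df > x); for df = 2 this equals exp(-x / 2)."""
    return math.exp(-x / 2)

def chi_square_critical_df2(alpha):
    """Value that cuts off an upper-tail area of alpha when df = 2."""
    return -2 * math.log(alpha)

statistic = 10.0
alpha = 0.05

critical = chi_square_critical_df2(alpha)
p_value = chi_square_sf_df2(statistic)

print(round(critical, 2))     # 5.99
print(round(p_value, 4))      # 0.0067
print(statistic > critical)   # True, so H0 is rejected at the .05 level
```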
• Reject H0
• The district will probably vote Democratic
• However…
9. Conclusion
• Note that all we can really conclude is that our
data differ from the outcome expected under a
given situation (here, no preference)
– Although it would appear that the district will vote
Democratic, really we can only conclude that people
were not responding by chance
– Regardless of the position of the frequencies we’d
have come up with the same result
– In other words, it is a non-directional test regardless
of the prediction
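The point about the position of the frequencies can be illustrated directly. The sketch below (helper names are illustrative) shuffles the observed counts 20, 30, 10 across the three categories and recomputes the statistic against the same expected counts of 20.

```python
from itertools import permutations

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [20, 20, 20]

# Every ordering of the same three counts, e.g. Republicans ahead instead of Democrats.
results = {perm: chi_square_statistic(perm, expected)
           for perm in permutations([20, 30, 10])}

for perm, value in results.items():
    print(perm, value)   # every ordering gives 10.0
```

Because each ordering produces the same value, the statistic alone cannot say which party is ahead, only that the counts depart from an even split.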
10. More complex
• What do stats kids do with their free time?
          TV   Nap   Worry   Stare at Ceiling
Males     30   40    20      10
Females   20   30    40      10
11. • Is there a relationship between gender
and what the stats kids do with their
free time?
          TV   Nap   Worry   Stare at Ceiling   Total
Males     30   40    20      10                 100
Females   20   30    40      10                 100
Total     50   70    60      20                 200
• Expected: $E_{ij} = \frac{R_i C_j}{N}$ (row total times column total, divided by the grand total)
• Example for males TV: (100*50)/200 = 25
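The expected count for every cell follows the same row-total times column-total rule. A minimal sketch, with `expected_frequencies` as an illustrative helper name and the table taken from the slide:

```python
observed = [
    [30, 40, 20, 10],   # Males:   TV, Nap, Worry, Stare at Ceiling
    [20, 30, 40, 10],   # Females: TV, Nap, Worry, Stare at Ceiling
]

def expected_frequencies(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
print(expected[0])  # [25.0, 35.0, 30.0, 10.0]
print(expected[1])  # [25.0, 35.0, 30.0, 10.0]
```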
12.
              TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
Total         50        70        60        20                 200
• df = (R-1)(C-1)
– R = number of rows
– C = number of columns
– Here df = (2-1)(4-1) = 3
13. Interpretation
$\chi^2(3) = 10.10$
$\chi^2_{.05} = 7.82$
• Reject H0, there is some relationship
between gender and how stats students
spend their free time
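The whole test of independence can be sketched end to end. The function names are illustrative, and the tail-probability helper uses a closed form that holds only for 3 degrees of freedom; for other tables a statistics library or a printed table would be needed.

```python
import math

observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]

def chi_square_independence(table):
    """Return (statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count for this cell
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def chi_square_sf_df3(x):
    """P(chi-square with 3 df > x); this closed form is valid only for df = 3."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

statistic, df = chi_square_independence(observed)
p_value = chi_square_sf_df3(statistic)

print(round(statistic, 2), df)   # 10.1 3
print(p_value < 0.05)            # True, so H0 is rejected
```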
14. Other
• Important point about the non-directional
nature of the test: the chi-square test by
itself cannot speak to specific hypotheses
about the way the results would come out
• Not useful for ordinal data because of this:
the statistic ignores the order of the categories
15. Assumptions
• Normality
– Rule of thumb is that we need an expected frequency of at least 5
in each cell
• Inclusion of non-occurrences
– Must include all responses, not just the positive ones
• Independence
– Not that the variables are independent or related (that’s what the
test can be used for), but rather as with our t-tests, the
observations (data points) don’t have any bearing on one
another.
• To help with the last two, make sure that your N equals
the total number of people who responded
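The expected-frequency rule of thumb is easy to check before running the test. A small sketch, assuming the gender by free-time table from the earlier slides; `cells_below_threshold` is an illustrative helper name and the threshold of 5 is the rule of thumb stated above.

```python
def cells_below_threshold(table, threshold=5):
    """Return (row, column, expected) for every cell whose expected count is too small."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < threshold:
                flagged.append((i, j, e))
    return flagged

observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]

print(cells_below_threshold(observed))   # [] because the smallest expected count is 10
```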
16. Measures of Association
• Contingency coefficient
• Phi
• Cramer’s Phi
• Odds Ratios
• Kappa
• These were discussed in 5700
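As a brief reminder of how two of these measures are built on the chi-square statistic, the sketch below uses the standard textbook formulas for the contingency coefficient and Cramér's phi (often written V). The slides do not give these formulas, so the helper names and the choice of example (the gender by free-time table, chi-square of about 10.10 with N = 200) are assumptions made for illustration.

```python
import math

def contingency_coefficient(chi_square, n):
    """C = sqrt(chi^2 / (chi^2 + N))."""
    return math.sqrt(chi_square / (chi_square + n))

def cramers_phi(chi_square, n, rows, cols):
    """Cramér's phi (V) = sqrt(chi^2 / (N * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))

chi_square = 10.0952   # statistic for the 2 x 4 gender by free-time table
n = 200

c = contingency_coefficient(chi_square, n)
v = cramers_phi(chi_square, n, rows=2, cols=4)
print(round(c, 3), round(v, 3))
```

For a 2 x 2 table, Cramér's phi reduces to the ordinary phi coefficient, sqrt(chi-square / N).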