Chi-square test
Dr. Shivraj Nile
Zhejiang Chinese Medical University,
Hangzhou, Zhejiang 310053, PR China
nileshivraj@hotmail.com
Questions?
• What is the chi-square distribution? How is it
related to the Normal distribution?
• How is the chi-square distribution related to
the sampling distribution of the variance?
• How do we test a population value of the variance and put
confidence intervals around a population
value?
• How is the F distribution related to the Normal?
To chi-square?
Distributions
• There are many theoretical distributions, both
continuous and discrete. Howell calls these test
statistics.
• We use 4 test statistics a lot: z (unit normal), t
(Student’s t-distribution), chi-square (χ²), and F
(Fisher’s F-distribution).
• z and t are closely related to the sampling
distribution of means; chi-square and F are
closely related to the sampling distribution of
variances.
Chi Square Test
A chi-squared (χ²) test is an analysis of categorical data based on
observations of a random set of variables. Usually, it compares
two sets of data: the observed frequencies and the frequencies
expected under a hypothesis. The test was introduced
by Karl Pearson in 1900 for the analysis of categorical data and
distributions, so it is known as Pearson’s chi-squared test.
A Pearson’s chi-square test is a statistical test for categorical
data. It is used to determine whether your data are significantly
different from what you expected.
1. The chi-square goodness of fit test is used to test whether the
frequency distribution of a categorical variable is different from
your expectations.
2. The chi-square test of independence is used to test whether two
categorical variables are related to each other.
• The chi-square test is used to estimate how likely the
observations would be if the null hypothesis
were true.
• A hypothesis is a proposition that a given condition or
statement might be true, which we can test afterwards.
Chi-squared tests are often constructed from a sum of
squared errors, or through the sample variance.
• Chi-square is often written as Χ² and is pronounced “kye-square”
(rhymes with “eye-square”). It is also called chi-squared.
Formula
The chi-squared test checks whether there is any difference between the observed values
and the expected values. The formula can be written as:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where O_i is the observed value and E_i is the expected value for category i.
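The formula can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `chi_square_statistic` and the coin-flip counts are assumptions made up for the example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin flips, 60 heads and 40 tails, fair coin expected.
coin_chi2 = chi_square_statistic([60, 40], [50, 50])
print(coin_chi2)  # 4.0
```

Each category contributes (10)²/50 = 2, so the two categories together give 4.0.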
When to use a chi-square test
A Pearson’s chi-square test may be an appropriate option
for your data if all of the following are true:
1. You want to test a hypothesis about one or
more categorical variables. If one or more of your
variables is quantitative, you should use a
different statistical test. Alternatively, you could
convert the quantitative variable into a categorical
variable by separating the observations into intervals.
2. The sample was randomly selected from
the population.
3. There are a minimum of five observations
expected in each group or combination of groups.
How to perform a chi-square test
The exact procedure for performing a Pearson’s chi-square test
depends on which test you’re using, but it generally follows these
steps:
1. Create a table of the observed and expected frequencies. In this
step, carefully consider which expected values are most
appropriate for your null hypothesis.
2. Calculate the chi-square value from your observed and expected
frequencies using the chi-square formula.
3. Find the critical chi-square value in a chi-square critical value
table or using statistical software.
4. Compare the chi-square value to the critical value to determine
which is larger.
5. Decide whether to reject the null hypothesis. You should reject the
null hypothesis if the chi-square value is greater than the critical value. If
you reject the null hypothesis, you can conclude that your data are
significantly different from what you expected.
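The five steps can be sketched as follows for a goodness of fit test at α = 0.05. The function name and the counts are hypothetical; the critical values are the usual tabulated upper-tail values for 1 to 4 degrees of freedom, hard-coded here instead of being read from a table.

```python
# Upper-tail critical values of chi-square at alpha = 0.05 (standard table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def goodness_of_fit_decision(observed, expected_proportions):
    """Walk through the five steps for a goodness-of-fit test at alpha = 0.05."""
    total = sum(observed)
    # Step 1: expected frequencies implied by the null hypothesis.
    expected = [total * p for p in expected_proportions]
    # Step 2: chi-square value from observed and expected frequencies.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 3: critical value for df = number of categories - 1.
    df = len(observed) - 1
    critical = CRITICAL_05[df]
    # Steps 4 and 5: compare the two values and decide.
    reject = chi2 > critical
    return chi2, df, critical, reject


# Hypothetical counts for three categories expected to be equally likely.
chi2, df, critical, reject = goodness_of_fit_decision([30, 50, 20], [1 / 3, 1 / 3, 1 / 3])
print(round(chi2, 2), df, critical, reject)
```

With these made-up counts the chi-square value is about 14, well above 5.991, so the null hypothesis of equal proportions would be rejected.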
How to report a chi-square test
If you decide to include a Pearson’s chi-square test in your research
paper, dissertation or thesis, you should report it in your results section. You
can follow these rules if you want to report statistics in APA* Style:
1. You don’t need to provide a reference or formula since the chi-square test is
a commonly used statistic.
2. Refer to chi-square using its Greek symbol, Χ2. Although the symbol looks
very similar to an “X” from the Latin alphabet, it’s actually a different
symbol. Greek symbols should not be italicized.
3. Include a space on either side of the equal sign.
4. If your chi-square is less than one, you should include a leading zero (a zero
before the decimal point) since the chi-square can be greater than one.
5. Provide two significant digits after the decimal point.
6. Report the chi-square alongside its degrees of freedom (df), sample size,
and p value, following this format: Χ² (degrees of freedom, N = sample size)
= chi-square value, p = p value.
*APA: American Psychological Association
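The reporting format in rule 6 can be sketched as a small formatting helper. The function name and the numbers passed to it are hypothetical, and the way the p value is printed (three decimals, no leading zero) is an assumption based on common APA practice, not something the rules above state.

```python
def format_chi_square_apa(chi2, df, n, p):
    """Format a result as: Χ² (df, N = n) = value, p = p value."""
    # Two digits after the decimal point; the leading zero is kept
    # for values below one (for example 0.82).
    chi2_text = f"{chi2:.2f}"
    # Assumption: p is shown with three decimals and no leading zero.
    p_text = f"{p:.3f}"
    if p_text.startswith("0"):
        p_text = p_text[1:]
    return f"Χ² ({df}, N = {n}) = {chi2_text}, p = {p_text}"


# Hypothetical result used only to show the layout.
print(format_chi_square_apa(4.74, 2, 176, 0.094))
# Χ² (2, N = 176) = 4.74, p = .094
```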
Three essentials to apply the chi-square test are
1. A random sample
2. Qualitative data
3. Lowest expected frequency not less than 5
Steps:
1. State the null hypothesis (H0).
2. Prepare a contingency table and note down the observed
frequencies or data (O).
3. Determine the expected number (E) for each cell as E = CT × RT / GT
(column total × row total / grand total).
4. Find the difference between observed and expected frequencies
in each cell (O - E).
5. Calculate the chi-square value for each cell as (O - E)²/E.
6. Sum the chi-square values of all cells to get the total chi-square value,
χ² = ∑(O - E)²/E. The degrees of freedom are d.f. = (c - 1)(r - 1), where
c and r are the numbers of columns and rows.
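Steps 3 to 6 can be sketched as below. The function names are made up for this illustration; the counts are the premedical/other table from Example 2 later in these slides, so the printed values can be compared with that example.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: premedical, other. Columns: good, poor knowledge.
observed = [[16, 24], [20, 140]]
print(expected_frequencies(observed))
chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)
```

The expected counts come out as 7.2, 32.8, 28.8 and 131.2, and the chi-square value as about 16.396 with 1 degree of freedom.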
Calculation of chi-square value
• To determine the probability using a
chi-square table, you first need to determine the
degrees of freedom (DF).
• Degrees of freedom: in a genetic cross, the number of
phenotypic possibilities in your cross minus
one.
• DF = number of groups (phenotype classes) - 1
• Using the DF value, look up the probability
of your chi-square value in the chi-square table.
• If the probability read from the table
is greater than 0.05 (5%), the null
hypothesis is not rejected: the differences between observed and
expected values can be attributed to chance alone.
[Table: chi-square probabilities by DF value]
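As an alternative to a printed chi-square table, the upper-tail probability for a whole-number DF can be computed directly. The sketch below uses the standard closed-form series for integer degrees of freedom; the function name and the chi-square value 4.74 are illustrative assumptions.

```python
import math


def chi_square_upper_tail(x, df):
    """P(chi-square with integer df >= x), using closed-form series."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{k=0}^{m-1} y^k / k!
        m = df // 2
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(m))
    # Odd df = 2m + 1:
    # erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{m} y^(k - 1/2) / Gamma(k + 1/2)
    m = (df - 1) // 2
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1)
    )
    return tail


# Three phenotype classes give DF = 3 - 1 = 2.
p = chi_square_upper_tail(4.74, 3 - 1)
print(round(p, 3))
```

A probability above 0.05 means the null hypothesis is not rejected at the 5% level.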
The mathematical properties
of chi-square distribution
Types of chi-square tests
The two types of Pearson’s chi-square
tests are:
1. Chi-square goodness of fit test
2. Chi-square test of independence
Mathematically, these are actually the same test.
However, we often think of them as different tests
because they’re used for different purposes.
1. Chi-square goodness of fit test
You can use a chi-square goodness of fit test when you
have one categorical variable. It allows you to test whether the
frequency distribution of the categorical variable is significantly
different from your expectations. Often, but not always, the
expectation is that the categories will have equal proportions.
Example: Hypotheses for chi-square goodness of fit test
Expectation of different proportions
• Null hypothesis (H0): The bird species visit the bird feeder in
the same proportions as the average over the past five years.
• Alternative hypothesis (HA): The bird species visit the bird
feeder in different proportions from the average over the past
five years.
2. Chi-square test of independence
You can use a chi-square test of
independence when you have two categorical
variables. It allows you to test whether the two variables
are related to each other. If two variables are
independent (unrelated), the probability of belonging to
a certain group of one variable isn’t affected by the other
variable.
Example: Chi-square test of independence
• Null hypothesis (H0): The proportion of people who are
left-handed is the same for Americans and Canadians.
• Alternative hypothesis (HA): The proportion of people
who are left-handed differs between nationalities.
1. Tests of goodness-of-fit
Test whether the observed frequencies of one variable are
significantly different from the expected frequencies
of the same variable.
E.g. occurrences of heads and tails while flipping a coin.
2. Chi-square tests of independence (or relationship)
Test whether two variables are associated with or independent of each
other.
E.g. association between smoking and lung cancer.
• The chi-square test of independence is probably the
most frequently used hypothesis test in medicine.
• In these slides, we will use the chi-square test to evaluate
differences among populations when the test variable is
nominal, dichotomous, ordinal, or grouped interval.
Independence Defined
• Two variables are independent if, for all cases, the
classification of a case into a particular category of one
variable (the group variable) has no effect on the
probability that the case will fall into any particular
category of the second variable (the test variable).
• When two variables are independent, there is no
relationship between them. We would expect the
frequency breakdowns of the test variable to be similar
for all groups.
Independence Demonstrated
Suppose we are interested in the relationship between
gender and attending college.
• If there is no relationship between gender and
attending college and 40% of our total sample attend
college, we would expect 40% of the males in our
sample to attend college and 40% of the females to
attend college.
• If there is a relationship between gender and
attending college, we would expect a higher
proportion of one group to attend college than the
other group, e.g. 60% to 20%.
Displaying Independent and Dependent Relationships
[Figure: two bar charts of the proportion attending college (vertical axis from 0% to 100%) for Males, Females and Total.]
Independent Relationship between Gender and College: Males 40%, Females 40%, Total 40%.
Dependent Relationship between Gender and College: Males 60%, Females 20%, Total 40%.
When the variables are independent, the proportion in both groups is close to
the proportion for the total sample.
When group membership makes a difference, the dependent relationship is
indicated by one group having a higher proportion than the
proportion for the total sample.
Independent and Dependent Variables
• The two variables in a chi-square test of independence
each play a specific role.
  • The group variable is also known as the
  independent variable because it has an influence
  on the test variable.
  • The test variable is also known as the dependent
  variable because its value is believed to be
  dependent on the value of the group variable.
• The chi-square test of independence is a test of the
influence or impact that a subject’s value on one
variable has on the same subject’s value for a second
variable.
Chi square distribution
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency.
Expected frequencies are computed as
if there is no difference between the
groups, i.e. both groups have the
same proportion.
This formula computes how
the pattern of observed
frequencies differs from the
pattern of expected
frequencies.
Chi square distribution
1. The chi-square distribution is a nonsymmetrical distribution.
2. Chi-square distributions are determined by their degrees of freedom.
Chi square test statistic
• Cannot be negative because all discrepancies are
squared.
• Will be zero only in the unusual event that each
observed frequency exactly equals the corresponding
expected frequency.
• The larger the discrepancy between the expected
frequencies and their corresponding observed
frequencies, the larger the observed value of chi-square.
Table. Partial Table of Critical Values of Chi-Square
The probability for a chi-square test statistic can be
obtained from the chi-square probability distribution.
[Figure: chi-square density curve with the upper-tail rejection region of area 0.05 marked.]
The decision rule
The quantity
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
will be small if the observed and
expected frequencies are close together and will be large if
the differences are large.
The computed value of χ² is compared with the tabulated
value of χ² with k - 1 degrees of freedom. The decision rule,
then, is: reject H0 if the computed χ² is greater than or equal to the
tabulated χ² for the chosen value of α.
Example 1: A genetic engineer was attempting to cross a tiger and a
cheetah. She predicted the phenotypic outcome of the traits she was
observing to be in the following ratio: 4 stripes only : 3 spots only : 9
both stripes and spots. When the cross was performed and she
counted the individuals, she found 50 with stripes only, 41 with spots
only and 85 with both. Run the chi-square test. (Hint: calculate the
percents of observed and expected.)

| Ratios | Observed # | Expected # | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Stripes Only | | | | | |
| Spots Only | | | | | |
| Stripes/Spots | | | | | |
| Chi Square Sum | | | | | Σ = |

Degrees of Freedom: ___________
Accept or Reject Null Hypothesis: ______________________________________________
Chi-square (χ²) = ∑(Oi – Ei)²/Ei
Set up a table to keep track of the calculations:

| Expected ratio | Observed # | Expected # | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| 4 stripes | 50 | 44 | 6 | 36 | 0.82 |
| 3 spots | 41 | 33 | 8 | 64 | 1.94 |
| 9 stripes/spots | 85 | 99 | -14 | 196 | 1.98 |
| 16 total | 176 total | 176 total | 0 total | | Sum = 4.74 |

4/16 × 176 = expected # of stripes = 44
3/16 × 176 = expected # of spots = 33
9/16 × 176 = expected # of stripes/spots = 99
Degrees of Freedom (df) = 3 - 1 = 2 (3 different characteristics:
stripes, spots, or both)
Since 4.74 is less than the critical value 5.991 (df = 2, α = 0.05), we do not reject the null
hypothesis put forward by the engineer.
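The calculation for the tiger and cheetah cross can be reproduced with a short script. The helper name `expected_from_ratio` is made up for this sketch; the counts, the 4:3:9 ratio and the critical value 5.991 are the ones used in the example.

```python
def expected_from_ratio(ratio, total):
    """Split `total` individuals according to a predicted ratio such as 4:3:9."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [50, 41, 85]    # stripes only, spots only, both
ratio = [4, 3, 9]          # predicted phenotypic ratio
expected = expected_from_ratio(ratio, sum(observed))   # 44, 33 and 99

per_cell = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(per_cell)
df = len(observed) - 1
critical = 5.991           # tabulated value for df = 2, alpha = 0.05

print([round(c, 2) for c in per_cell], round(chi2, 2), df, chi2 < critical)
# [0.82, 1.94, 1.98] 4.74 2 True
```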
The term “degrees of freedom” (d.f. or df) describes the freedom for
values, or variables, to vary. The p-value is the probability of obtaining test
results at least as extreme as the result actually observed, under the
assumption that the null hypothesis is correct.
Ⅱ Chi-Square test
(tests of goodness-of-fit)
Model assumptions: no cell has an expected frequency less than 5.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
If at least one cell has an expected frequency less than 5, use the continuity-corrected form:
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$
Degrees of freedom: k - 1, where k is the number of outcomes.
Example 1
As personnel director, you want to test the perception of
fairness of three methods of performance evaluation. Of 180
employees, 63 rated Method 1 as fair. 45 rated Method 2 as
fair. 72 rated Method 3 as fair. At the 0.05 level, is there a
difference in perceptions?
H0: p1 = p2 = p3 = 1/3
H1: At least one proportion is different
α = 0.05
Expected frequencies: E1 = E2 = E3 = 180 × 1/3 = 60
$$\chi^2 = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$$
With df = 3 - 1 = 2, the critical value at α = 0.05 is 5.99. Since 6.3 > 5.99,
reject H0 at α = 0.05. There is evidence of a difference in
proportions.
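The performance evaluation example can be checked numerically. This is a sketch using the counts from the example; the p value line relies on the fact that, for 2 degrees of freedom only, the upper-tail probability of chi-square equals exp(-χ²/2).

```python
import math

observed = [63, 45, 72]      # employees rating each method as fair
n = sum(observed)            # 180 employees
expected = [n / 3] * 3       # H0: p1 = p2 = p3 = 1/3, so 60 each

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1       # k - 1 = 2

# Closed form valid for df = 2 only.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), df, round(p_value, 3))
```

The p value comes out just below 0.05, which agrees with rejecting H0 at the 0.05 level.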
Exercise 1
Ask 100 people (n) which of 3 candidates (k) they will
vote for. At the 0.05 level, is there a difference among the
candidates?

| Candidate | Tom | Bill | Mary | Total |
|---|---|---|---|---|
| Number of people | 35 | 20 | 45 | 100 |
Ⅲ Chi-Square test
(tests of independence or relationship)
Hypothesis test for 2×2 table
Choose the test from the sample size n and the expected frequencies E:
• Pearson chi-square: n ≥ 40 and E ≥ 5
• Continuity correction of chi-square: n ≥ 40 and 1 ≤ E < 5
• Fisher’s exact test: n < 40 or E < 1
In the formulas below, a, b, c and d are the four cell counts and n = a + b + c + d.

1. Hypothesis test for 2×2 table: Pearson chi-square (n ≥ 40 and E ≥ 5)
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$$

1. Hypothesis test for 2×2 table: continuity correction of chi-square (n ≥ 40 and 1 ≤ E < 5)
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{(|ad - bc| - n/2)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$$

1. Hypothesis test for 2×2 table: Fisher’s exact test (n < 40 or E < 1)
$$P = \frac{(a + b)! \, (c + d)! \, (a + c)! \, (b + d)!}{a! \, b! \, c! \, d! \, n!}$$
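The choice among the three 2×2 methods, and the three formulas, can be sketched as below. All function names are made up for this illustration, and the code assumes every row and column total is greater than zero. The two usage lines take their counts from Example 2 and Exercise 2 later in these slides.

```python
from math import factorial


def smallest_expected(a, b, c, d):
    """Smallest expected count of the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))


def choose_test(a, b, c, d):
    """Apply the n / E rule from the slide to pick a test."""
    n = a + b + c + d
    e_min = smallest_expected(a, b, c, d)
    if n >= 40 and e_min >= 5:
        return "pearson"
    if n >= 40 and e_min >= 1:
        return "continuity"
    return "fisher"


def pearson_2x2(a, b, c, d):
    """(ad - bc)^2 * n / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def continuity_corrected_2x2(a, b, c, d):
    """(|ad - bc| - n/2)^2 * n / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return (abs(a * d - b * c) - n / 2) ** 2 * n / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )


def fisher_table_probability(a, b, c, d):
    """Probability of this exact table with fixed margins (the P formula)."""
    n = a + b + c + d
    numerator = (
        factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    )
    denominator = (
        factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    )
    return numerator / denominator


print(choose_test(16, 24, 20, 140), round(pearson_2x2(16, 24, 20, 140), 3))
print(choose_test(8, 10, 4, 23), round(continuity_corrected_2x2(8, 10, 4, 23), 3))
```

Note that `fisher_table_probability` returns the probability of one table only; a full Fisher's exact test sums this probability over the observed table and all tables at least as extreme with the same margins.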
Example 2
A sample of 200 college students participated in
a study designed to evaluate the level of college
students’ knowledge of a certain group of
common diseases. The following table shows
the students classified by major field of study
and level of knowledge of the group of diseases:

| Major | Good | Poor | Total |
|---|---|---|---|
| Premedical | 16 | 24 | 40 |
| Other | 20 | 140 | 160 |
| Total | 36 | 164 | 200 |

Do these data suggest that there is a relationship between
knowledge of the group of diseases and major field of study
of the college students from which the present sample was
drawn? Let α = 0.05.
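In general, the four cells of a 2×2 table (a four-fold table) are labeled:

| Major | Good | Poor | Total |
|---|---|---|---|
| Premedical | a | b | R1 |
| Other | c | d | R2 |
| Total | C1 | C2 | n |

Observed cells: a = 16, b = 24, c = 20, d = 140
Expected cells:
$$E_{11} = \frac{40 \times 36}{200} = 7.2; \quad E_{12} = \frac{40 \times 164}{200} = 32.8$$
$$E_{21} = \frac{160 \times 36}{200} = 28.8; \quad E_{22} = \frac{160 \times 164}{200} = 131.2$$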
H0: there is no relationship between knowledge and major field (they are independent)
H1: there is a relationship between knowledge and major field
α = 0.05
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 7.2)^2}{7.2} + \frac{(24 - 32.8)^2}{32.8} + \frac{(20 - 28.8)^2}{28.8} + \frac{(140 - 131.2)^2}{131.2} = 16.396$$
Equivalently, using the cell counts directly:
$$\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} = \frac{(16 \times 140 - 24 \times 20)^2 \times 200}{40 \times 160 \times 36 \times 164} = 16.396$$
1. Chi-square test for 2×2 table
df = (R - 1)(C - 1) = 1
$$\chi^2_{0.05,\,1} = 3.84$$
Since 16.396 > 3.84, reject H0 at α = 0.05.
There is a relationship between knowledge of the group of
diseases and major field of study of the college students.
Premedical students have a higher rate of good knowledge
of the diseases (16/40 = 40%) than other students (20/160 = 12.5%).
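Exercise 2
A study was conducted to determine whether the antibody
status of wives is related to the antibody status of their
husbands. 45 couples were examined; the data on
the incidence of anti-sperm antibodies are as follows:

| Ab of wife | Ab of husband - | Ab of husband + | Total |
|---|---|---|---|
| - | 8 | 10 | 18 |
| + | 4 | 23 | 27 |
| Total | 12 | 33 | 45 |

Question: Is the antibody status of wives
related to the antibody status of their husbands?
H0: the antibody status of wives is not related to the antibody
status of their husbands
H1: the antibody status of wives is related to the antibody
status of their husbands
α = 0.05
Here n = 45 ≥ 40 and the smallest expected frequency is 18 × 12 / 45 = 4.8, which lies between 1 and 5, so the continuity-corrected statistic is used:
$$\chi^2 = \frac{(|ad - bc| - n/2)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} = \frac{(|8 \times 23 - 10 \times 4| - 45/2)^2 \times 45}{18 \times 27 \times 12 \times 33} = 3.452$$
Since 3.452 < 3.84 (the critical value for df = 1 at α = 0.05), do not reject H0; we cannot conclude that the antibody status of
wives is related to the antibody status of their husbands.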
Hypothesis test for R×C table
Available tests: Pearson chi-square, Fisher’s exact test
2. Hypothesis test for R×C table
$$\chi^2 = \sum \frac{(O - E)^2}{E} = n \left( \sum \frac{O^2}{n_R \, n_C} - 1 \right)$$
where n_R and n_C are the row total and column total of each cell and n is the grand total.
Model assumptions:
The expected frequency should be greater than 5 in
more than 4/5 of the cells;
the expected frequency in any cell should be greater
than 1.
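The two forms of the R×C statistic and the model assumptions can be sketched as below. The function name is made up for this illustration. Because the observed table for the next example is not reproduced in these slides, the usage line reuses the 2×2 counts from Example 2 (a 2×2 table is the smallest R×C table).

```python
def rxc_chi_square(table):
    """Chi-square for an R x C table, computed two equivalent ways."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Definition: sum of (O - E)^2 / E with E = row total * column total / n.
    by_definition = 0.0
    expected_cells = []
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            expected_cells.append(e)
            by_definition += (o - e) ** 2 / e

    # Shortcut: n * (sum of O^2 / (n_R * n_C) - 1), no expected counts needed.
    shortcut = n * (
        sum(
            o ** 2 / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table)
            for j, o in enumerate(row)
        )
        - 1
    )

    # Model assumptions: E > 5 in more than 4/5 of the cells, and E > 1 everywhere.
    share_above_5 = sum(e > 5 for e in expected_cells) / len(expected_cells)
    assumptions_ok = share_above_5 > 4 / 5 and min(expected_cells) > 1

    df = (len(table) - 1) * (len(table[0]) - 1)
    return by_definition, shortcut, df, assumptions_ok


by_definition, shortcut, df, assumptions_ok = rxc_chi_square([[16, 24], [20, 140]])
print(round(by_definition, 3), round(shortcut, 3), df, assumptions_ok)
```

Both forms give about 16.396 for this table, which matches the value obtained in Example 2.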
Pearson chi-square for R×C table
Example 3:
To study menstrual dysfunction in distance runners, investigators
conducted an observational study of three groups of women. The first
two groups were volunteers who regularly engaged in some
form of running, and the third, a control group, consisted of
women who did not run but were otherwise similar to the
other two groups. The runners were divided into joggers who
jog "slow and easy" 5 to 30 miles per week, and runners who
run more than 30 miles per week and combine long, slow
distance with speed work. The investigators used a survey to
show that the three groups were similar in the amount of
physical activity (aside from running), distribution of ages,
heights, occupations, and type of birth control methods being
used.
Are these data consistent with the hypothesis that running
does not increase the likelihood that a woman will consult
her physician for a menstrual problem?
The table shows these expected frequencies, together with
the expected frequencies of women who did not consult
their physicians.
$$E_{11} = \frac{69 \times 54}{165} = 22.58$$
$$\chi^2 = \frac{(14 - 22.58)^2}{22.58} + \frac{(40 - 31.42)^2}{31.42} + \dots = 9.627$$
$$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$$
H0: π1 = π2 = π3
H1: At least one is different from the others
α = 0.05
$$\chi^2_{0.05,\,2} = 5.99$$
Since 9.627 > 5.99, reject H0 at the 0.05 level, so we can conclude that running
increases the likelihood that a woman will consult her physician
for a menstrual problem.
Chi-square IMP.ppt
Chi-square IMP.ppt

Chi-square IMP.ppt

  • 1.
    Chi-square test Dr. ShivrajNile Zhejiang Chinese Medical University, Hangzhou, Zhejiang 310053, PR China nileshivraj@hotmail.com
  • 2.
    Questions?  What isthe chi-square distribution? How is it related to the Normal distribution?  How is the chi-square distribution related to the sampling distribution of the variance?  Test a population value of the variance; put confidence intervals around a population value.  How is the F distribution related the Normal? To Chi-square?
  • 3.
    Distributions  There aremany theoretical distributions, both continuous and discrete. Howell calls these test statistics.  We use 4 test statistics a lot: z (unit normal), t (student’s t-distribution), chi-square ( ), and F (Probability Density Function) distribution.  Z and t are closely related to the sampling distribution of means; chi-square and F are closely related to the sampling distribution of variances. 2 
  • 4.
    Chi Square Test Achi-squared test (χ2) is basically a data analysis on the basis of observations of a random set of variables. Usually, it is a comparison of two statistical data sets. This test was introduced by Karl Pearson in 1900 for categorical data analysis and distribution. So, it was mentioned as Pearson’s chi-squared test. A Pearson’s chi-square test is a statistical test for categorical data. It is used to determine whether your data are significantly different from what you expected. 1.The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations. 2.The chi-square test of independence is used to test whether two categorical variables are related to each other.
  • 5.
     The chi-squaretest is used to estimate how likely the observations that are made would be, by considering the assumption of the null hypothesis as true.  A hypothesis is a consideration that a given condition or statement might be true, which we can test afterwards. Chi-squared tests are usually created from a sum of squared falsities or errors over the sample variance.  Chi-square is often written as Χ2 and is pronounced “kye- square” (rhymes with “eye-square”). It is also called chi- squared. Formula The chi-squared test is done to check if there is any difference between the observed value and expected value. The formula can be written as; χ2 = ∑(Oi – Ei)2/Ei Where, Oi : Observed value and Ei : Expected value.
  • 6.
    When to usea chi-square test A Pearson’s chi-square test may be an appropriate option for your data if all of the following are true: 1. You want to test a hypothesis about one or more categorical variables. If one or more of your variables is quantitative, you should use a different statistical test. Alternatively, you could convert the quantitative variable into a categorical variable by separating the observations into intervals. 2. The sample was randomly selected from the population. 3. There are a minimum of five observations expected in each group or combination of groups.
  • 7.
    How to performa chi-square test The exact procedure for performing a Pearson’s chi-square test depends on which test you’re using, but it generally follows these steps: 1. Create a table of the observed and expected frequencies. In this step we have to carefully consider which expected values are most appropriate for your null hypothesis. 2. Calculate the chi-square value from your observed and expected frequencies using the chi-square formula. 3. Find the critical chi-square value in a chi-square critical value table or using statistical software. 4. Compare the chi-square value to the critical value to determine which is larger. 5. Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-square value is greater than the critical value. If you reject the null hypothesis, you can conclude that your data are significantly different from what you expected.
  • 8.
    How to reporta chi-square test If you decide to include a Pearson’s chi-square test in your research paper, dissertation or thesis, you should report it in your results section. You can follow these rules if you want to report statistics in APA* Style: 1. You don’t need to provide a reference or formula since the chi-square test is a commonly used statistic. 2. Refer to chi-square using its Greek symbol, Χ2. Although the symbol looks very similar to an “X” from the Latin alphabet, it’s actually a different symbol. Greek symbols should not be italicized. 3. Include a space on either side of the equal sign. 4. If your chi-square is less than zero, you should include a leading zero (a zero before the decimal point) since the chi-square can be greater than zero. 5. Provide two significant digits after the decimal point. 6. Report the chi-square alongside its degrees of freedom (df), sample size, and p value, following this format: Χ2 (degrees of freedom, N = sample size) = chi-square value, p = p value). *American Psychological Association
  • 9.
    Three essentials toapply chi- square test are 1. A random sample 2. Qualitative data 3. Lowest frequency not less than 5 Steps :- 1. Assumption of Null Hypothesis (HO). 2. Prepare a contingency table and note down the observed frequencies or data (O). 3. Determine the expected number (E) by multiplying CT × RT /GT (column total, row total and grand total ). 4. Find the difference between observed and expected frequencies in each cell (O-E). 5. Calculate chi- square value for each cell with (O-E)²/E. 6. Sum up all chi –square values to get the total chi-square value (χ)² d.f. (degrees of freedom) = χ²= ∑(O-E)²/E and d.f. is (c-1) (r-1). Calculation of chi-square value
  • 10.
     In orderto determine the probability using a chi square chart you need to determine the degrees of freedom (DF)  Degrees of Freedom: is the number of phenotypic possibilities in your cross minus one.  DF = # of groups (phenotype classes) – 1  Using the DF value, determine the probability or distribution using the Chi Square table  If the level of significance read from the table is greater than 0.05 or 5% then the null hypothesis is accepted and the results are due to chance alone and are unbiased. DF VALUE:
  • 11.
    The mathematical properties ofchi-square distribution
  • 12.
    Types of chi-squaretests The two types of Pearson’s chi-square tests are: 1. Chi-square goodness of fit test 2. Chi-square test of independence Mathematically, these are actually the same test. However, we often think of them as different tests because they’re used for different purposes.
  • 13.
    1. Chi-square goodnessof fit test You can use a chi-square goodness of fit test when you have one categorical variable. It allows you to test whether the frequency distribution of the categorical variable is significantly different from your expectations. Often, but not always, the expectation is that the categories will have equal proportions. Example: Hypotheses for chi-square goodness of fit test Expectation of different proportions  Null hypothesis (H0): The bird species visit the bird feeder in the same proportions as the average over the past five years.  Alternative hypothesis (HA): The bird species visit the bird feeder in different proportions from the average over the past five years.
  • 14.
    2. Chi-square testof independence You can use a chi-square test of independence when you have two categorical variables. It allows you to test whether the two variables are related to each other. If two variables are independent (unrelated), the probability of belonging to a certain group of one variable isn’t affected by the other variable. Example: Chi-square test of independence  Null hypothesis (H0): The proportion of people who are left-handed is the same for Americans and Canadians.  Alternative hypothesis (HA): The proportion of people who are left-handed differs between nationalities.
  • 15.
    1. Tests ofgoodness-of-fit Observed frequencies of one variable are significantly different from the expected frequencies of the same variable. E.g. occurrences of heads and tails while flipping a coin. 2. Chi-Square tests of independence (or relationship) Two variables are associated or independent of the other. E.g. association between smoking and lung cancer. Types of chi-square tests
  • 16.
     The chi-squaretest of independence is probably the most frequently used hypothesis test in the medicine.  In this PPT, we will use chi-square test to evaluate differences among population when the test variable is nominal, dichotomous, ordinal, or grouped interval. Chi-square test
  • 17.
    Independence Defined  Twovariables are independent if, for all cases, the classification of a case into a particular category of one variable (the group variable) has no effect on the probability that the case will fall into any particular category of the second variable (the test variable).  When two variables are independent, there is no relationship between them. We would expect that the frequency breakdowns of the test variable to be similar for all groups.
  • 18.
    Independence Demonstrated Suppose weare interested in the relationship between gender and attending college.  If there is no relationship between gender and attending college and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college.  If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.
  • 19.
    Displaying Independent andDependent Relationships Independent Relationship between Gender and College 40% 40% 40% 0% 20% 40% 60% 80% 100% Males Females Total Poportion Attending College Dependent Relationship between Gender and College 60% 20% 40% 0% 20% 40% 60% 80% 100% Males Females Total Poportion Attending College When the variables are independent, the proportion in both groups is close to the same size as the proportion for the total sample. When group membership makes a difference, the dependent relationship is indicated by one group having a higher proportion than the proportion for the total sample.
  • 20.
    Independent and DependentVariables  The two variables in a chi-square test of independence each play a specific role.  The group variable is also known as the independent variable because it has an influence on the test variable.  The test variable is also known as the dependent variable because its value is believed to be dependent on the value of the group variable.  The chi-square test of independence is a test of the influence or impact that a subject’s value on one variable has on the same subject’s value for a second variable.
  • 21.
    Chi square distribution    E E O2 2 ) (  Expected frequency observed frequency Expected frequency are computed as if there is no difference between the groups, i.e. both groups have the same proportion. This formula compute how the pattern of observed frequency differs from the pattern of expected frequency.
  • 22.
    2. Chi-square distributionsare determined by degree of freedom Chi square distribution 1. Chi-square distribution is a nonsymmetrical distribution
  • 23.
    Chi square teststatistic  Cannot be negative because all discrepancies are squared.  Will be zero only in the unusual event that each observed frequency exactly equals the corresponding expected frequency.  Larger the discrepancy between the expected frequencies and their corresponding observed frequencies, the larger the observed value of chi-square.
  • 24.
    Table . PartialTable of Critical Values of Chi-Square Probability for chi square test statistic can be obtained from the chi-square probability distribution. 0.05 reject region
  • 25.
    The decision rule Thequantity will be small if the observed and expected frequency are close together and will be large if the differences are large. The computed value of χ2 is compared with the tabulated value of with K-1 degrees of freedom. The decision rule, then is: reject H0 if χ2 is greater than or equal to the tabulated χ2 for the chosen value of α.   E E O 2 ) (
  • 26.
    Ratios Observed #Expected # O-E (O-E)2 (O-E)2/E Stripes Only Spots Only Stripes/Spots Chi Square Sum Σ= Degrees of Freedom: ___________ Accept of Reject Null Hypothesis:______________________________________________ Example 1: A genetics engineer was attempting to cross a tiger and a cheetah. She predicted a phenotypic outcome of the traits she was observing to be in the following ratio 4 stripes only: 3, spots only: 9, both stripes and spots. When the cross was performed and she counted the individuals she found 50 with stripes only, 41 with spots only and 85 with both. Run the Chi-Square Test (Hint: Calculate the precents of observed and expected)
  • 27.
    Chi-square: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$. Set up a table to keep track of the calculations:
    Expected ratio     Observed #   Expected #   O-E       (O-E)^2   (O-E)^2/E
    4 stripes          50           44           6         36        0.82
    3 spots            41           33           8         64        1.94
    9 stripes/spots    85           99           -14       196       1.98
    16 total           176 total    176 total    0 total             Sum = 4.74
    4/16 * 176 = expected # of stripes = 44
    3/16 * 176 = expected # of spots = 33
    9/16 * 176 = expected # of stripes/spots = 99
    Degrees of freedom (df) = 3 - 1 = 2 (3 different characteristics: stripes, spots, or both).
    Since 4.74 is less than the critical value 5.991 ($\alpha = 0.05$, df = 2), I fail to reject the null hypothesis put forward by the engineer: the observed counts are consistent with the predicted 4:3:9 ratio.
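The table above can be reproduced in a few lines. This is a minimal sketch: the helper names `expected_from_ratio` and `chi_square` are illustrative, and the critical value 5.991 is simply the tabulated figure quoted in the slide.

```python
def expected_from_ratio(ratio, total):
    """Split `total` individuals according to a predicted ratio such as 4:3:9."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [50, 41, 85]          # stripes only, spots only, both
ratio = [4, 3, 9]                # predicted phenotypic ratio
expected = expected_from_ratio(ratio, sum(observed))   # [44.0, 33.0, 99.0]
statistic = chi_square(observed, expected)
critical = 5.991                 # tabulated value for alpha = 0.05, df = 2

print(round(statistic, 2))       # 4.74
print("reject H0" if statistic >= critical else "fail to reject H0")
```

The statistic is computed on counts, not percentages, so the expected values must add up to the same total (176) as the observed values.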
  • 28.
    The term “degrees of freedom” (d.f. or df) describes how many values, or variables, are free to vary once the totals are fixed. The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
  • 29.
    Ⅱ Chi-Square test (tests of goodness-of-fit)
  • 30.
    Tests of goodness-of-fit. Model assumptions:
    - No cell has an expected frequency less than 5: $\chi^2 = \sum \frac{(O - E)^2}{E}$
    - At least one cell has an expected frequency less than 5: $\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$
    Degrees of freedom: $k - 1$, where $k$ is the number of outcomes.
  • 31.
    Example 1 (tests of goodness-of-fit): As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the 0.05 level, is there a difference in perceptions?
  • 32.
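    Tests of goodness-of-fit. $H_0$: $p_1 = p_2 = p_3 = 1/3$; $H_1$: at least one is different; $\alpha = 0.05$.
    $E_1 = E_2 = E_3 = 180 \times \frac{1}{3} = 60$
    $\chi^2 = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$
    With df = 3 - 1 = 2 the critical value is 5.991, and 6.3 exceeds it. Reject $H_0$ at $\alpha = 0.05$. There is evidence of a difference in proportions.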
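The same calculation can be checked numerically. This is a minimal sketch with illustrative function names. The p-value step is an addition to the slide: it relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2). Other degrees of freedom need a table or a statistics library.

```python
import math

def gof_equal_proportions(observed):
    """Goodness-of-fit statistic when H0 says all categories are equally likely."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)

fair_ratings = [63, 45, 72]      # Methods 1, 2, 3
statistic = gof_equal_proportions(fair_ratings)
p_value = p_value_df2(statistic)

print(round(statistic, 2))       # 6.3
print(round(p_value, 3))         # 0.043
```

A p-value below 0.05 leads to the same decision as comparing 6.3 with the tabulated 5.991.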
  • 33.
    Exercise 1 (tests of goodness-of-fit): Ask 100 people (n) which of 3 candidates (k) they will vote for. At the 0.05 level, is there a difference among the candidates?
    Candidate   Tom   Bill   Mary   Total
    Count       35    20     45     100
  • 34.
    Ⅲ Chi-Square test (tests of independence or relationship)
  • 35.
    1. Hypothesis test for 2×2 table (n is the total sample size, E the smallest expected frequency)
    - Pearson chi-square: n ≥ 40 and E ≥ 5
    - Continuity correction of chi-square: n ≥ 40 and 1 ≤ E < 5
    - Fisher's exact test: n < 40 or E < 1
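The selection rule can be written as a small function. This is a sketch under the assumption that E in the rule means the smallest expected count of the four cells. The function names and the three tables are hypothetical, chosen only so that each branch is reached.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]

def choose_2x2_test(table):
    """Apply the rule of thumb using n and the smallest expected count."""
    n = sum(sum(row) for row in table)
    smallest = min(min(row) for row in expected_counts(table))
    if n < 40 or smallest < 1:
        return "fisher_exact"
    if smallest < 5:
        return "continuity_corrected_chi_square"
    return "pearson_chi_square"

# Hypothetical tables chosen to land in each branch.
print(choose_2x2_test([[30, 20], [25, 25]]))   # pearson_chi_square
print(choose_2x2_test([[20, 3], [15, 7]]))     # continuity_corrected_chi_square
print(choose_2x2_test([[3, 2], [1, 4]]))       # fisher_exact
```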
  • 36.
    1. Hypothesis test for 2×2 table: Pearson chi-square ($n \ge 40$ and $E \ge 5$)
    $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$
  • 37.
    1. Hypothesis test for 2×2 table: continuity correction of chi-square ($n \ge 40$ and $1 \le E < 5$)
    $\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E} = \frac{(|ad - bc| - n/2)^2 \, n}{(a + b)(c + d)(a + c)(b + d)}$
  • 38.
    1. Hypothesis test for 2×2 table: Fisher's exact test ($n < 40$ or $E < 1$)
    $P = \frac{(a + b)! \, (c + d)! \, (a + c)! \, (b + d)!}{a! \, b! \, c! \, d! \, n!}$
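The factorial formula gives the probability of one particular table when the row and column totals are held fixed. A minimal sketch follows. The function name and the small table are hypothetical, and the comparison with the hypergeometric form is included only as a cross-check.

```python
from math import comb, factorial

def fisher_table_probability(a, b, c, d):
    """Probability of one 2x2 table with fixed margins (factorial formula)."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator

# Hypothetical small table, not taken from the slides.
p = fisher_table_probability(3, 2, 1, 4)
# Same value written as a hypergeometric probability.
p_hyper = comb(5, 3) * comb(5, 1) / comb(10, 4)
print(round(p, 4), round(p_hyper, 4))   # 0.2381 0.2381
```

The p-value of the test itself is obtained by adding this probability over the observed table and every table with the same margins that is at least as extreme.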
  • 39.
    Example 2 (hypothesis test for 2×2 table): A sample of 200 college students participated in a study designed to evaluate the level of college students’ knowledge of a certain group of common diseases. The following table shows the students classified by major field of study and level of knowledge of the group of diseases:
  • 40.
    major        good   poor   total
    premedical   16     24     40
    other        20     140    160
    total        36     164    200
    Do these data suggest that there is a relationship between knowledge of the group of diseases and major field of study of the college students from which the present sample was drawn? Let α = 0.05.
  • 41.
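    Four cells: the four-fold table
    major        good   poor   total
    premedical   a      b      R1
    other        c      d      R2
    total        C1     C2     n
    Observed cells: 16, 24 (premedical); 20, 140 (other)
    Expected cells: 7.2, 32.8 (premedical); 28.8, 131.2 (other)
    $E_{11} = \frac{40 \times 36}{200} = 7.2$; $E_{12} = \frac{40 \times 164}{200} = 32.8$; $E_{21} = \frac{160 \times 36}{200} = 28.8$; $E_{22} = \frac{160 \times 164}{200} = 131.2$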
  • 42.
    1. Chi-Square test for 2×2 table. $H_0$: there is no relationship (independence) between knowledge and major field; $H_1$: there is a relationship between knowledge and major field; $\alpha = 0.05$.
    Observed cells: 16, 24; 20, 140. Expected cells: 7.2, 32.8; 28.8, 131.2.
    $\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 7.2)^2}{7.2} + \frac{(24 - 32.8)^2}{32.8} + \frac{(20 - 28.8)^2}{28.8} + \frac{(140 - 131.2)^2}{131.2} = 16.396$
    $\chi^2 = \frac{(ad - bc)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} = \frac{(16 \times 140 - 24 \times 20)^2 \times 200}{40 \times 160 \times 36 \times 164} = 16.396$
  • 43.
    1. Chi-Square test for 2×2 table. df = (R - 1)(C - 1) = 1; $\chi^2_{0.05,1} = 3.84$. Since 16.396 > 3.84, reject $H_0$ at α = 0.05. There is a relationship between knowledge of the group of diseases and major field of study of the college students. Premedical students have a higher rate of good knowledge of the diseases (16/40 = 40%) than other students (20/160 = 12.5%).
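Both routes to the statistic for this 2×2 example can be verified with a short script. This is a minimal sketch with illustrative function names. The p-value helper is an addition to the slides and uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)).

```python
import math

def pearson_2x2(a, b, c, d):
    """Pearson chi-square from observed and expected counts in a 2x2 table."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    return sum((observed[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))

def pearson_2x2_shortcut(a, b, c, d):
    """Equivalent shortcut: (ad - bc)^2 * n / product of the four margins."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

a, b, c, d = 16, 24, 20, 140     # premedical good/poor, other good/poor
print(round(pearson_2x2(a, b, c, d), 3))            # 16.396
print(round(pearson_2x2_shortcut(a, b, c, d), 3))   # 16.396
print(p_value_df1(pearson_2x2_shortcut(a, b, c, d)) < 0.05)   # True
```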
  • 44.
    Exercise 2: A study was conducted to determine whether the antibody status in wives is related to the antibody status in their husbands. 45 couples were examined; the data regarding the incidence of anti-sperm antibodies are as follows:
  • 45.
    Ab of wife   Ab of husband -   Ab of husband +   Total
    -            8                 10                18
    +            4                 23                27
    total        12                33                45
    Question: Is the antibody status in wives related to the antibody status in their husbands?
  • 46.
    $H_0$: the antibody status in wives is not related to the antibody status in their husbands; $H_1$: the antibody status in wives is related to the antibody status in their husbands; $\alpha = 0.05$.
    $\chi^2 = \frac{(|ad - bc| - n/2)^2 \, n}{(a + b)(c + d)(a + c)(b + d)} = 3.452$
    Since 3.452 < 3.84 (df = 1), do not reject $H_0$. The data do not provide evidence that the antibody status in wives is related to the antibody status in their husbands.
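The corrected statistic for the exercise table can be checked as follows. This is a minimal sketch with illustrative function names. The uncorrected value is computed only for comparison and is not part of the slide's solution.

```python
def corrected_2x2(a, b, c, d):
    """Continuity-corrected chi-square: (|ad - bc| - n/2)^2 * n / margins."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return (abs(a * d - b * c) - n / 2) ** 2 * n / margins

def uncorrected_2x2(a, b, c, d):
    """Pearson chi-square without the continuity correction, for comparison."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return (a * d - b * c) ** 2 * n / margins

a, b, c, d = 8, 10, 4, 23        # counts from the exercise table
smallest_expected = min(a + b, c + d) * min(a + c, b + d) / (a + b + c + d)
print(round(smallest_expected, 1))            # 4.8, so the correction applies
print(round(corrected_2x2(a, b, c, d), 3))    # 3.452
print(round(uncorrected_2x2(a, b, c, d), 3))  # 4.848
```

The uncorrected value would exceed 3.84 while the corrected one does not, so the choice of formula decides the outcome for this table.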
  • 47.
    2. Hypothesis test for R×C table
    - Pearson chi-square
    - Fisher's exact test
  • 48.
    Pearson chi-square for R×C table
    $\chi^2 = \sum \frac{(O - E)^2}{E} = n \left( \sum \frac{O^2}{n_R \, n_C} - 1 \right)$, where $n_R$ and $n_C$ are the row total and column total of each cell.
    Model assumptions: the expected frequency should be greater than 5 in more than 4/5 of the cells; the expected frequency in any cell should be greater than 1.
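The two forms of the R×C statistic can be compared numerically. This is a minimal sketch: the function names are illustrative and the 3×2 table is hypothetical, used only to check that the definition and the shortcut agree.

```python
def chi_square_rxc(table):
    """Pearson chi-square from the definition: sum of (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

def chi_square_rxc_shortcut(table):
    """Shortcut: n * (sum of O^2 / (row total * column total) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(observed ** 2 / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table)
            for j, observed in enumerate(row))
    return n * (s - 1)

def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 3x2 table used only to check that the two forms agree.
table = [[12, 28], [15, 25], [30, 20]]
print(abs(chi_square_rxc(table) - chi_square_rxc_shortcut(table)) < 1e-9)  # True
print(degrees_of_freedom(table))   # 2
```

The shortcut avoids computing each expected frequency separately, which is convenient for hand calculation. The definition form makes it easier to check the expected-frequency assumptions.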
  • 49.
    Example 3: To study menstrual dysfunction in distance runners, investigators carried out an observational study of three groups of women. The first two groups were volunteers who regularly engaged in some form of running, and the third, a control group, consisted of women who did not run but were otherwise similar to the other two groups. The runners were divided into joggers, who jog "slow and easy" 5 to 30 miles per week, and runners, who run more than 30 miles per week and combine long, slow distance with speed work. The investigators used a survey to show that the three groups were similar in the amount of physical activity (aside from running), distribution of ages, heights, occupations, and type of birth control methods being used.
  • 50.
    Are these data consistent with the hypothesis that running does not increase the likelihood that a woman will consult her physician for a menstrual problem?
  • 51.
    The table shows these expected frequencies, together with the expected frequencies of women who did not consult their physicians. $E_{11} = \frac{69 \times 54}{165} = 22.58$
  • 52.
    $H_0$: $\pi_1 = \pi_2 = \pi_3$; $H_1$: at least one is different from the others; $\alpha = 0.05$.
    $\chi^2 = \frac{(14 - 22.58)^2}{22.58} + \frac{(40 - 31.42)^2}{31.42} + \dots = 9.627$
    $\nu = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$; $\chi^2_{0.05,2} = 5.99$
    Since 9.627 > 5.99, reject $H_0$ at the 0.05 level. The proportion of women who consult a physician for a menstrual problem differs among the three groups, so the data are not consistent with the hypothesis that running does not increase this likelihood.