Chi Square Test
Rabindra Adhikary
M. Optom 1st Batch
Tilganga Institute of Ophthalmology, PU
• Statistical test that measures the association
between two categorical variables.
• Most commonly applied to questionnaire data
from a survey
– Responses are categories, e.g. yes / no [nominal or ordinal]
• If the observed χ2 test statistic is greater than the
critical value from the table, the null hypothesis (H0) is
rejected
– χ2 > critical value at the given df and level of significance:
H0 rejected
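Formula
$\chi^2 = \sum \frac{(O - E)^2}{E}$
– O = observed frequency
– E = expected frequency (if the null hypothesis were true)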
Example
• Suppose you observe 100 people to see who
deposits garbage in the can and who litters. You
want to see if there is a difference between the
genders.
• A person can fall under one of the following 4
categories:
– Male, deposits garbage
– Male, litters
– Female, deposits garbage
– Female, litters
        Deposit  Litter
Female     18       7
Male       42      33
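We can put this nominal data in a 2X2
contingency table
Observed:
        Deposit  Litter  Total
Female     18       7      25
Male       42      33      75
Total      60      40     100
If the null hypothesis were true, there would be no difference between the genders.
So, the expected outcome would be:
Expected:
        Deposit  Litter  Total
Female     15      10      25
Male       45      30      75
Total      60      40     100
– Expected count = (row total × column total) / grand total, e.g. Female, Deposit: 25 × 60 / 100 = 15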
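The expected table can be derived from the row and column totals alone. A minimal Python sketch, assuming the table is stored as a list of rows; the function name `expected_counts` is ours, not from the slides:

```python
def expected_counts(observed):
    """Expected cell counts if rows and columns were independent.

    Each cell is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the slides: rows = Female, Male; columns = Deposit, Litter
observed = [[18, 7], [42, 33]]
expected = expected_counts(observed)
print(expected)  # [[15.0, 10.0], [45.0, 30.0]]
```

Only the margins (25, 75, 60, 40) enter the calculation, which is why the expected table keeps the same totals as the observed one.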
Single Table
Observed  Expected  O-E  (O-E)2
   18        15       3     9
   42        45      -3     9
    7        10      -3     9
   33        30       3     9
χ2 = 9/15 + 9/45 + 9/10 + 9/30 = 2
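The same arithmetic can be checked in a few lines of Python. This is only a sketch of the sum of (O-E)2 / E over the four cells; the function name `chi_square_statistic` is our own choice:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given as flat lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells in the order used in the single table above
observed = [18, 42, 7, 33]
expected = [15, 45, 10, 30]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))  # 2.0
```

Each cell contributes 9 / E, so the cells with the smallest expected counts (here 10) add the most to the statistic.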
Degree of Freedom [df]
• df = [C-1][R-1]
– C = number of columns
– R = number of rows
– For our 2X2 table: df = [2-1][2-1] = 1
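As a rough sketch, the degrees of freedom and the table lookup can be written in Python as below. The dictionary `CRITICAL_005` holds standard chi square table values for the 0.05 level of significance only; it and the function names are assumptions made for illustration, and other significance levels would need their own table.

```python
# Standard chi square critical values at the 0.05 level of significance
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """df = (C - 1)(R - 1) for a contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(chi2, df):
    """True when the statistic exceeds the 0.05 critical value for this df."""
    return chi2 > CRITICAL_005[df]


df = degrees_of_freedom(2, 2)  # the 2X2 garbage example
print(df, CRITICAL_005[df])    # 1 3.841
print(reject_null(2.0, df))    # False: a statistic of 2 does not exceed 3.841
```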
• H0 is rejected when chi square statistic > critical value
But in our observation,
• Chi square statistic (2) < critical value (3.841, at df = 1 and 0.05 level of significance)
– It turned out the other way around!
• Null Hypothesis is NOT Rejected.
– Meaning ?
We retain the null hypothesis, i.e.:
• The littering or depositing behavior is
independent of gender
• Do you get how to work out the expected outcome?
– Toss a coin 100 times. Observed outcome: H-60, T-40
• What was the expected outcome?
• What is the chi squared statistic?
• Is the null hypothesis rejected?
• The further the observed values are from
the expected values, the greater the chance of the null
hypothesis being rejected
– Do you agree? How?
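One possible working of the coin exercise, sketched in Python. It assumes the null hypothesis is a fair coin (so 50 heads and 50 tails are expected), df = number of categories - 1 = 1, and the 0.05 critical value of 3.841; the slides leave these choices to the reader.

```python
def coin_chi_square(heads, tosses=100):
    """Chi square statistic for a coin, assuming a fair coin under H0."""
    expected = tosses / 2
    tails = tosses - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


CRITICAL_DF1_005 = 3.841  # df = 1, 0.05 level of significance

chi2 = coin_chi_square(60)
print(chi2, chi2 > CRITICAL_DF1_005)  # 4.0 True

# The further the observed split is from 50/50, the larger the statistic
for heads in (55, 60, 70):
    print(heads, coin_chi_square(heads))  # 1.0, then 4.0, then 16.0
```

Under these assumptions the 60/40 split gives a statistic of 4, which exceeds 3.841, while a 55/45 split (statistic 1) would not.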
Key Assumptions in χ2
• Each individual appears in the table once only
• The result for each individual is independent
of all other individuals
• At least 80% of the expected values in the table
should be greater than 5.
• This test is valid only when you have a
reasonable sample size
Key Assumptions in χ2
• For a 2X2 table [only 2 categories in each variable]
– χ2 test can be used when total sample size is > 40
– If the total sample size is 20-40 and the smallest
expected frequency is at least 5, χ2 test can be used
– Otherwise Fisher’s exact test should be used [SPSS
will automatically give this]
• For all other tables:
– χ2 can be used if no more than 20% of the expected
frequencies are less than 5 and none is less than 1
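These rules of thumb can be turned into a small checker. The Python sketch below is illustrative only: the function name `recommended_test` and the returned labels are ours, and because the slides do not say what to do when a larger table fails the rule, the last label is just a placeholder.

```python
def recommended_test(expected):
    """Apply the rules of thumb above to a table of expected frequencies."""
    cells = [e for row in expected for e in row]
    total = sum(cells)  # expected counts sum to the total sample size
    is_2x2 = len(expected) == 2 and all(len(row) == 2 for row in expected)

    if is_2x2:
        if total > 40:
            return "chi-square"
        if 20 <= total <= 40 and min(cells) >= 5:
            return "chi-square"
        return "fisher exact"

    # All other tables: at most 20% of cells below 5, and none below 1
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    if share_below_5 <= 0.20 and min(cells) >= 1:
        return "chi-square"
    return "chi-square not advised"  # placeholder label, not from the slides


print(recommended_test([[15, 10], [45, 30]]))  # chi-square (total is 100)
print(recommended_test([[3, 7], [9, 11]]))     # fisher exact (total 30, smallest 3)
```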
• More Examples…..
• Discussion….
Thank You
