ACHARYA NARENDRA DEVA UNIVERSITY OF AGRICULTURE &
TECHNOLOGY, KUMARGANJ, AYODHYA (U.P.) 224229
Assignment
on
Chi-square test
Course No : STAT-502 4(3+1)
Course name : Statistical methods for applied sciences
Presented to : Dr. Vishal Mehta, Assistant Professor,
Department of Agril. Statistics
Presented by : Vikas Yadav, Id. No. A-11153/19/22,
Ph. D. 1st Semester, Soil Science and Agril. Chemistry
Content:
• Introduction
• Properties of Chi-square test
• Limitations of Chi-square test
• Types of Chi-square test
1. Chi-square test for goodness of fit
2. Chi-square test for independence
• Chi-square test for goodness of fit: Example
• Chi-square test for independence: Example
• References
Introduction:
• The chi-square (χ²) test is a statistical method used to
determine whether there is a significant difference
between the observed and expected frequencies in one
or more categories. It is commonly used in fields such as
medical research, social sciences, and business to test
goodness of fit, independence, and association.
Properties
The chi-square distribution, on which the test is based, has the
following important properties:
1. The variance is equal to twice the number of degrees of
freedom.
2. The chi-square distribution curve approaches the normal
distribution as the degrees of freedom increase.
3. The mean of the distribution is equal to the number of
degrees of freedom.
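The mean and variance properties can be checked with a small simulation. The sketch below is an illustration only: it relies on the standard fact that a chi-square variable with k degrees of freedom is the sum of k squared standard normal variables, and the function name, seed and sample size are arbitrary choices made for this example.

```python
import random
import statistics


def simulate_chi_square(df, n_samples=20000, seed=0):
    """Draw chi-square variates as sums of `df` squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_samples)
    ]


df = 4
samples = simulate_chi_square(df)
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# Theory: mean = df, variance = 2 * df
print(f"sample mean     = {sample_mean:.2f} (theory: {df})")
print(f"sample variance = {sample_var:.2f} (theory: {2 * df})")
```

The simulated values should land close to 4 and 8, with small differences caused by sampling noise.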
Limitations of Chi-Square Test
There are two limitations to using the chi-square test that you
should be aware of.
• The chi-square test, for starters, is extremely sensitive to
sample size. Even insignificant relationships can appear
statistically significant when a large enough sample is used.
Keep in mind that "statistically significant" does not always
imply "meaningful" when using the chi-square test.
• Be mindful that the chi-square test can only determine whether
two variables are related. It does not follow that one variable
has a causal relationship with the other. Establishing causality
would require a more detailed analysis.
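The sensitivity to sample size can be illustrated with made-up counts. In the sketch below, both data sets have the same 52% / 48% split against an expected 50% / 50% split, and only the sample size differs. The counts are hypothetical, and 3.841 is the usual 5% critical value for 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_VALUE_DF1 = 3.841  # 5% significance level, 1 degree of freedom

# Same proportions (52% vs 48%), sample sizes of 100 and 10,000
small = chi_square_statistic([52, 48], [50, 50])
large = chi_square_statistic([5200, 4800], [5000, 5000])

print(f"n = 100:    chi-square = {small:.2f}, significant: {small > CRITICAL_VALUE_DF1}")
print(f"n = 10000:  chi-square = {large:.2f}, significant: {large > CRITICAL_VALUE_DF1}")
```

The statistic grows in proportion to the sample size, so the same small deviation is not significant with 100 observations but is significant with 10,000.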
Types of chi-square test
1. Chi-square test for goodness of fit
2. Chi-square test for independence
Chi-square test for goodness of fit:
• Chi-square test for goodness of fit is used to determine
whether the observed data follow a certain distribution. For
example, if we want to know whether the observed data
follow a normal distribution or not, we can use the chi-square
test for goodness of fit. The test compares the observed
frequencies with the expected frequencies based on the null
hypothesis.
Chi-Square Goodness of Fit Test: Formula
• A Chi-Square goodness of fit test uses the following null and
alternative hypotheses
• H0: (null hypothesis) A variable follows a hypothesized
distribution.
• H1: (alternative hypothesis) A variable does not follow a
hypothesized distribution.
We use the following formula to calculate the Chi-Square test
statistic χ²:
χ² = Σ (O − E)² / E
where:
Σ: the sum over all categories
O: observed frequency in a category
E: expected frequency in that category under H0
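As a minimal sketch, the formula can be written as a short function that returns the contribution of each category, which are then summed. The function name and the example counts are hypothetical and serve only to show the arithmetic.

```python
def chi_square_contributions(observed, expected):
    """Return (O - E)^2 / E for each category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical counts for three categories
observed = [18, 22, 20]
expected = [20, 20, 20]

contributions = chi_square_contributions(observed, expected)
statistic = sum(contributions)  # the chi-square test statistic

print("per-category contributions:", [round(c, 3) for c in contributions])
print("chi-square statistic:", round(statistic, 3))
```

Looking at the individual contributions shows which categories deviate most from what the null hypothesis expects.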
Chi-Square test Goodness of Fit Test: Example
• A shop owner claims that an equal number of customers
come into his shop each weekday. To test this hypothesis, an
independent researcher records the number of customers that
come into the shop on a given week and finds the following:
• Monday: 50 customers
• Tuesday: 60 customers
• Wednesday: 40 customers
• Thursday: 47 customers
• Friday: 53 customers
Solution:
We will use the following steps to perform a Chi-Square goodness
of fit test to determine if the data is consistent with the shop
owner’s claim
Step 1: Define the hypotheses.
We will perform the Chi-Square goodness of fit test using the
following hypotheses:
• H0: An equal number of customers come into the shop each day.
• H1: The number of customers coming into the shop is not equal
across the days.
Step 2: Calculate (O − E)² / E for each day.
A total of 250 customers came into the shop during the week.
Thus, if an equal number came in each day, the expected value
“E” for each day would be 250 / 5 = 50.
• Monday: (50 − 50)² / 50 = 0
• Tuesday: (60 − 50)² / 50 = 2
• Wednesday: (40 − 50)² / 50 = 2
• Thursday: (47 − 50)² / 50 = 0.18
• Friday: (53 − 50)² / 50 = 0.18
Step 3: Calculate the test statistic χ².
χ² = Σ (O − E)² / E = 0 + 2 + 2 + 0.18 + 0.18 = 4.36
Step 4: Calculate the p-value of the test statistic χ².
The test has k − 1 = 5 − 1 = 4 degrees of freedom, where k is the
number of categories (days). The p-value associated with
χ² = 4.36 and 4 degrees of freedom is 0.359472.
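Steps 2 to 4 can be reproduced with the standard library alone. The sketch below assumes the closed-form upper-tail probability of the chi-square distribution that holds when the degrees of freedom are even (here 4); for odd degrees of freedom a statistics library or a table would be needed. The helper name is hypothetical.

```python
import math


def chi_square_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


# Customers observed Monday to Friday
observed = [50, 60, 40, 47, 53]
expected_each = sum(observed) / len(observed)  # 250 / 5 = 50 under H0

statistic = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = len(observed) - 1
p_value = chi_square_sf_even_df(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The result agrees with the hand calculation: a statistic of 4.36 and a p-value of about 0.3595.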
Conclusion
• Since this p-value is not less than 0.05, we fail to reject the
null hypothesis. This means we do not have sufficient
evidence to say that the true distribution of customers is
different from the distribution that the shop owner claimed
Chi-square test for independence:
Chi-square test for independence is used to determine whether
there is a relationship between two variables. For example, if
we want to know whether there is a relationship between
smoking and lung cancer, we can use the chi-square test for
independence. The test compares the observed frequencies
with the expected frequencies based on the null hypothesis that
there is no relationship between the two variables.
Assumptions
• Both variables are CATEGORICAL
• Observations are INDEPENDENT
• The EXPECTED COUNT for each cell is AT LEAST 5
• Categories are MUTUALLY EXCLUSIVE (each observation falls in
exactly one cell)
• Data is chosen RANDOMLY
Chi-Square test for independence : Example
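• We want to see if age has an impact on what political party
you vote for. We collect a random sample of 135 people and
display it in the following contingency table broken down by
age and political party. Only the 18–30 group and the Liberals
are named in this example, so the other groups carry generic
labels.

Age group      Liberals   Other party   Total
18–30             10          25          35
Age group 2       30          15          45
Age group 3       50           5          55
Total             90          45         135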
Solution
Hypothesis
Let's start by stating our hypotheses:
• H0: Age has no impact on the political party you vote for.
The two variables are independent.
• H1: Age does have an impact on the political party you vote
for. The two variables are dependent.
Significance Level and Critical Value
For this example we will use a 5% significance level. The
degrees of freedom are (rows − 1) × (columns − 1), so with 3 age
groups and 2 parties we have:
v = (3 − 1)(2 − 1) = 2
Using the significance level, the degrees of freedom and a
Chi-Square probability table, we find our critical value to be
5.991. This means our Chi-Square statistic needs to be greater
than 5.991 for us to reject the null hypothesis and conclude
that the variables are not independent.
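The table value can be cross-checked without a table. With exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution is exp(−x / 2), so the critical value at significance level α is −2 ln(α). The sketch below relies on that special case only and does not generalise to other degrees of freedom.

```python
import math

alpha = 0.05  # 5% significance level

# For df = 2: P(X > x) = exp(-x / 2), so solving for x gives -2 * ln(alpha)
critical_value = -2.0 * math.log(alpha)

print(f"critical value (df = 2, alpha = {alpha}): {critical_value:.3f}")
```

This reproduces the 5.991 read from the probability table.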
Calculating Expected Counts
We now need to determine the expected count for each cell in
our contingency table. These are the values expected if the null
hypothesis is true, and they are calculated using the following formula:
E_{r,c} = (n_r × n_c) / n_T
where n_r and n_c are the row and column totals for the cell's categories
and n_T is the total number of observations.
For example, the expected count for ages 18–30 who voted Liberals is:
E_{1,1} = (35 × 90) / 135 ≈ 23.3
We can then populate the contingency table with these expected values
(in brackets):

Age group      Liberals     Other party   Total
18–30          10 (23.3)    25 (11.7)       35
Age group 2    30 (30)      15 (15)         45
Age group 3    50 (36.7)     5 (18.3)       55
Total          90           45             135
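The expected counts for all six cells follow from the row and column totals. The sketch below assumes the observed table implied by the example's calculations: rows are the three age groups (18–30 first) and columns are Liberals and the other party. The layout of the nested list is a choice made for this illustration.

```python
# Observed counts: rows = age groups, columns = [Liberals, other party]
observed = [
    [10, 25],  # ages 18-30
    [30, 15],  # second age group
    [50, 5],   # third age group
]

row_totals = [sum(row) for row in observed]        # n_r
col_totals = [sum(col) for col in zip(*observed)]  # n_c
grand_total = sum(row_totals)                      # n_T

# E[r][c] = n_r * n_c / n_T
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for obs_row, exp_row in zip(observed, expected):
    print("  ".join(f"{o:>3} ({e:5.1f})" for o, e in zip(obs_row, exp_row)))
```

Each printed cell shows the observed count followed by its expected count in brackets.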
Chi-Square Statistic
It is now time to calculate the Chi-Square statistic using the
formula χ² = Σ (O − E)² / E, summed over all six cells:
χ² = (10 − 23.3)²/23.3 + (30 − 30)²/30 + (50 − 36.7)²/36.7 +
(25 − 11.7)²/11.7 + (15 − 15)²/15 + (5 − 18.3)²/18.3
This equals 37.2 when the rounded expected counts are used.
Therefore, our statistic is much greater than the critical value
of 5.991, and so we can reject the null hypothesis.
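The whole test can be run end to end as a check. The sketch below again assumes the observed table implied by the calculation (rows are age groups, columns are parties) and uses the fact that for 2 degrees of freedom the p-value is exp(−χ² / 2). Because it keeps the expected counts unrounded, its statistic comes out near 37.4 rather than the 37.2 obtained from rounded values.

```python
import math

# Observed counts: rows = age groups, columns = [Liberals, other party]
observed = [[10, 25], [30, 15], [50, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic using unrounded expected counts
statistic = 0.0
for r, row in enumerate(observed):
    for c, obs in enumerate(row):
        exp = row_totals[r] * col_totals[c] / grand_total
        statistic += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991             # 5% level, 2 degrees of freedom
p_value = math.exp(-statistic / 2)  # valid because df == 2

print(f"chi-square = {statistic:.2f}, df = {df}")
print(f"reject H0 at 5%: {statistic > critical_value}")
print(f"p-value = {p_value:.2e}")
```

Either way, the statistic is far above the critical value, so the conclusion does not depend on the rounding.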
Conclusion
In this example we have described and shown the Chi-Square
test of independence. This test assesses whether two
categorical variables are dependent on each other. It is used
in Data Science for Feature Selection, where we only want
modelling features that are associated with the target.
References
1. Wikipedia
2. www.towarddatascience.com
3. www.statology.org
4. Agresti, A. (2018). An introduction to categorical data
analysis. Wiley.
5. Kothari, C. R. (2004). Research methodology: methods and
techniques. New Age International.
