Chi-Square Test and Relationship Between Variables
Lecturer: Nyamu Waweru
Punnett Square - Inheritance Example:
• If two carriers (both with sickle cell trait, genotype AS) have a child:
                   A (from Father)   S (from Father)
A (from Mother)    AA                AS
S (from Mother)    AS                SS
• Probability of a child with Sickle Cell Disease (SS): 1/4 or 25%
• Probability of a child with Sickle Cell Trait (AS): 1/2 or 50%
• Probability of a normal child (AA): 1/4 or 25%
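The Punnett square above can be reproduced by enumerating every combination of one allele from each parent. The following is a minimal sketch, assuming each allele is passed on with probability 1/2; the function name `cross` is hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Return offspring genotype probabilities for two parental genotypes.

    Each genotype is a two-letter string such as "AS". Every allele is
    assumed to be transmitted with probability 1/2 (simple Mendelian model).
    """
    # Sorting the pair makes "SA" and "AS" count as the same genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


probs = cross("AS", "AS")
for genotype in ("AA", "AS", "SS"):
    print(genotype, probs[genotype])
```

For two carriers this gives 1/4, 1/2 and 1/4 for AA, AS and SS, the same 1:2:1 ratio shown in the square.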
Sickle Cell Disease, Inheritance, and Statistical Analysis
• Sickle Cell Disease (SCD) Inheritance Pattern
• 1. Genetic Basis:
– Cause: A point mutation in the β-globin gene (on chromosome 11)
– Effect: This mutation leads to a single amino acid substitution (glutamic acid → valine) in the
hemoglobin protein
– This produces Hemoglobin S (HbS) instead of the normal Hemoglobin A (HbA)
– Consequence: Under low oxygen conditions, HbS polymerizes, causing red blood cells to
deform into a characteristic "sickle" shape
– This leads to hemolytic anemia, vaso-occlusive crises, and organ damage
• 2. Inheritance Pattern: Autosomal Recessive
– Alleles Involved:
• A: Normal allele (codes for HbA)
• S: Mutant sickle allele (codes for HbS)
– Genotypes and Phenotypes:
• AA (Homozygous Dominant): Normal phenotype. No sickle cell disease or trait
• AS (Heterozygous): Sickle Cell Trait. Individuals are generally healthy carriers
• They produce both HbA and HbS. They are relatively resistant to severe malaria (Plasmodium
falciparum), which explains the high allele frequency in malaria-endemic regions (an
example of balanced polymorphism)
• SS (Homozygous Recessive): SCD. Individuals have the full-blown disease
Chi-square test statistic = χ²
• The Chi-square (χ²) test is a non-parametric statistical test
used to determine whether there is a significant association
between two categorical variables
• In health and genetics, it helps determine whether observed
outcomes differ from expected outcomes due to chance or
because of a true relationship between variables
• Chi is a letter of the Greek alphabet; the symbol is χ and it's
pronounced like KYE, the sound in "kite."
• The chi square test uses the statistic chi squared, written χ²
• The "test" that uses this statistic helps an investigator determine
whether an observed set of results matches an expected outcome
• In some types of research (genetics provides many examples) there
may be a theoretical basis for expecting a particular result: not a
guess, but a predicted outcome based on a sound theoretical
foundation
Purpose of Chi-Square Test in Health Research
• The Chi-square test is a statistical hypothesis test used to
determine if there is a significant association
(relationship) between two categorical variables in a sample
• It assesses whether observed differences in frequencies are due
to chance or reflect a real relationship in the population
• To test independence between two categorical variables
– Is Sickle Cell disease status associated with gender?
• To test goodness of fit—how well observed data fit expected
ratios
– Do Sickle Cell inheritance patterns follow Mendelian ratios?
• To test association between exposure and outcome
– Is malaria protection related to presence of Sickle Cell trait?
Chance Factor in a Trial/Experiment
• Chance (tossing a coin)
– We need to consider for a moment what might cause the
observed outcome to differ from the expected outcome
– You know what all the possible outcomes are (only two: head and
tail), and you know what the probability of each is
– However, in any single trial (toss) you can't say what the outcome
will be
– Why? Because of the element of chance, which is a random
factor. Saying that chance is a random factor just means that you
can't control it
– But it's there every time you flip that coin
– Chance is a factor that must always be considered; it's often
present but not recognized
– Since it may affect experimental work, it must be taken into
account when results are interpreted
Inherent and Error Factors in a Trial/Experiment
• Inherent
– What else might cause an observed outcome to differ from the expected? Suppose that
at your last physical exam, your doctor told you that your resting pulse rate was 60 (per
minute) and that that's good, that's normal for you
– When you measure it yourself later you find it's 58 at one moment, 63 ten minutes
later, 57 ten minutes later
– Why isn't it the same every time, and why isn't it 60 every time? When measurements
involving living organisms are under study, there will always be the element of inherent
variability
– Your resting pulse rate may vary a bit, but it's consistently about 60, and those slightly
different values are still normal
• Error
– In addition to these factors, there's the element of error. You've done enough lab work
already to realize that people introduce error into experimental work in performing
steps of procedures and in making measurements
– Instruments, tools, implements themselves may have built-in limitations that contribute
to error
– Putting all of these factors together, it's not hard to see how an observed result may
differ a bit from an expected result
– But these small departures from expectation are not significant departures
– That is, we don't regard the small differences observed as being important
Key Components of Chi-Square:
• Null Hypothesis (H₀):
– There is no association between the two variables
– They are independent
• Alternative Hypothesis (H₁):
– There is an association between the two variables
• Categorical Variables:
– Variables that take categories instead of numerical values (e.g., genotype, disease presence,
gender)
– Data are organized in a contingency table (e.g., a 2×2 table)
• Observed Frequencies (O):
– The actual counts found in the sample or experiment
• Expected Frequencies (E):
– The counts we would expect to see if the null hypothesis were true, i.e. if there were no
association (based on theory or probability)
– Formula for the expected frequency in a cell of a contingency table: E = (Row Total × Column Total) / Grand Total
• p-value:
– The probability of obtaining results at least as extreme as those observed if the null
hypothesis were true. If p < 0.05, the result is conventionally regarded as statistically significant
• Degrees of Freedom (df):
– For a contingency table: (rows − 1) × (columns − 1)
– For a goodness-of-fit test: (number of categories − 1)
Steps to Perform the Test:
• State the Hypotheses
– Null (H₀): No association between variables
– Alternative (H₁): There is an association between variables
• Collect and Tabulate Data (construct a contingency table with observed frequencies)
• Calculate the Expected Frequency for each cell: E = (Row Total × Column Total) / Grand Total
• Calculate the χ² statistic: χ² = Σ [(O − E)² / E]
• Determine the Degrees of Freedom (df): df = (number of rows − 1) × (number of columns − 1)
• Compare the calculated χ² value with the critical value, or find the p-value using the χ² value
and df (from a χ² distribution table or software)
– If calculated χ² > critical χ² → Reject H₀
– If calculated χ² ≤ critical χ² → Fail to reject H₀
• Make a Decision: Interpret Results
– If p-value ≤ significance level (α, usually 0.05), reject H₀. The association is statistically
significant
– If p-value > α, fail to reject H₀. There is not enough evidence to conclude a significant
association
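The steps above can be sketched end to end in a few lines of Python. This is an illustrative outline rather than a reference implementation: the function name `chi_square_independence` is hypothetical, the 2×2 counts are made up for the demonstration, and the decision uses the standard tabulated critical values at α = 0.05 for df 1 to 4 instead of an exact p-value.

```python
# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(observed):
    """Chi-square test of independence for a table given as a list of rows.

    Returns (statistic, df, reject_h0) using the 0.05 critical value.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count per cell: (row total x column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    reject_h0 = statistic > CRITICAL_05[df]
    return statistic, df, reject_h0


# Hypothetical 2x2 counts, for illustration only (not taken from the lecture).
stat, df, reject = chi_square_independence([[20, 30], [30, 20]])
print(round(stat, 2), df, reject)
```

With these made-up counts every expected cell is 25, so χ² = 4.0 with df = 1, which exceeds 3.841 and leads to rejecting H₀.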
Calculation of chi square
• The formula for calculating χ² is: χ² = Σ [(o − e)² / e], where "o" is observed and "e" is
expected
• The sigma symbol, Σ, means "sum of what follows."
• For each category (type or group such as "heads") of outcome that is possible, we would
have an expected value and an observed value (for the number of heads and the number of
tails, e.g.)
• For each one of those categories (outcomes) we would calculate the quantity (o − e)² / e and
then add them for all the categories, which was two in the coin toss example (head category
and tail category)
• It is convenient to organize the data in table form, as shown below for two coin toss
experiments
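As a small worked illustration of the formula, the sketch below tabulates o, e and (o − e)² / e for a coin toss experiment. The counts (47 heads and 53 tails in 100 tosses) are an assumption made for this example and do not come from the slides; the helper name `chi_square_statistic` is likewise hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed illustrative data: 100 tosses of a coin.
categories = ["Heads", "Tails"]
observed = [47, 53]
expected = [50, 50]  # a fair coin predicts a 1:1 ratio

print(f"{'Category':<10}{'o':>5}{'e':>5}{'(o-e)^2/e':>12}")
for name, o, e in zip(categories, observed, expected):
    print(f"{name:<10}{o:>5}{e:>5}{(o - e) ** 2 / e:>12.2f}")

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")
```

Each category contributes 9/50 = 0.18, so the statistic for these assumed counts is 0.36.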
Example: Sickle Cell Disease Inheritance Pattern
• Sickle Cell Disease (SCD) follows Mendelian inheritance involving
two alleles:
– A = Normal hemoglobin gene
– S = Sickle cell gene
• When two carriers (AS × AS) have children, the expected
Mendelian ratio of genotypes is:
Genotype   Expected Ratio   Expected %
AA         1                25%
AS         2                50%
SS         1                25%
Total      4                100%
• Observed Data Example
– Let's say in a hospital genetics clinic, 160 children of
carrier parents (AS × AS) were tested:
Genotype   Observed (O)   Expected %   Expected (E) = 160 × %
AA         48             25%          40
AS         80             50%          80
SS         32             25%          40
Total      160            100%         160
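The goodness-of-fit calculation for this table can be sketched as follows. The observed counts and the 1:2:1 ratio are taken from the slides; the variable names are illustrative, and the p-value line relies on the fact that, for df = 2 only, the upper-tail chi-square probability equals exp(−χ²/2).

```python
import math

observed = {"AA": 48, "AS": 80, "SS": 32}
ratio = {"AA": 1, "AS": 2, "SS": 1}  # Mendelian 1:2:1 expectation for AS x AS

total = sum(observed.values())
ratio_sum = sum(ratio.values())
expected = {g: total * r / ratio_sum for g, r in ratio.items()}

# Sum (O - E)^2 / E over the three genotype categories
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

# Goodness of fit: df = number of categories - 1
df = len(observed) - 1

# Valid for df = 2 only: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(expected)
print(round(chi2, 2), df, round(p_value, 3))
```

This gives χ² = 3.2 with df = 2 and a p-value of about 0.20. Since 3.2 is below the 0.05 critical value for 2 degrees of freedom (5.991), the observed counts do not differ significantly from the Mendelian expectation.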
Example #2
• Scenario: A researcher wants to know if the presence of Sickle Cell Trait (a
categorical variable: Yes/No) is associated with a history of severe malaria in
childhood (another categorical variable: Yes/No) in a population
• 1. Hypotheses:
– H₀: There is no association between sickle cell trait status and history of severe malaria
(they are independent)
– H₁: There is an association between sickle cell trait status and history of severe malaria
• 2. Data Collection (Observed Frequencies):
– A sample of 400 individuals from a malaria-endemic region is surveyed
                         History of Severe Malaria: YES   History of Severe Malaria: NO   Row Total
Sickle Cell Trait: YES   30                               170                             200
Sickle Cell Trait: NO    80                               120                             200
Column Total             110                              290                             400
Example #2 cont'd
• 3. Calculate Expected Frequencies (E):
• Expected for (Trait=YES, Malaria=YES):
– (200 * 110) / 400 = 55
• Expected for (Trait=YES, Malaria=NO):
– (200 * 290) / 400 = 145
• Expected for (Trait=NO, Malaria=YES):
– (200 * 110) / 400 = 55
• Expected for (Trait=NO, Malaria=NO):
– (200 * 290) / 400 = 145
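The expected frequencies above, and the χ² statistic that follows from them, can be checked with a short script. This is a sketch using only the observed counts from the slides; the variable names are illustrative.

```python
observed = [
    [30, 170],  # Sickle cell trait YES: severe malaria YES, NO
    [80, 120],  # Sickle cell trait NO:  severe malaria YES, NO
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total x column total) / grand total for each cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# chi-square = sum of (O - E)^2 / E over the four cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)
print(round(chi2, 2), df)
```

The unrounded statistic is about 31.35 with df = 1. The figure of 31.34 quoted in the slides is what results when each of the four terms (11.36, 4.31, 11.36, 4.31) is rounded to two decimals before summing.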
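Example #2 cont'd
• 4. Calculate the χ² statistic: χ² = Σ [(O − E)² / E]
– (30 − 55)² / 55 + (170 − 145)² / 145 + (80 − 55)² / 55 + (120 − 145)² / 145
– = 11.36 + 4.31 + 11.36 + 4.31 = 31.34
• 5. Determine the Degrees of Freedom: df = (2 − 1) × (2 − 1) = 1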
• 6. Find the p-value and Interpret:
– For df = 1, a χ² value of 31.34 is highly significant (p-value < 0.001).
– Interpretation: Since the p-value is much less than 0.05, we reject the null hypothesis.
– There is a statistically significant association between having sickle cell trait and a history of
severe malaria.
• 7. Epidemiological Conclusion:
– Looking at the observed data, individuals with the sickle cell trait (AS) were less likely to have
a history of severe malaria (30/200 = 15%) compared to those without the trait (80/200 =
40%).
– The chi-square test indicates that this observed difference is statistically significant and
unlikely to be due to random chance alone
– This supports the established theory of the heterozygote advantage against malaria
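The p-value claim and the two proportions in this conclusion can be verified numerically. The sketch below assumes the standard identity that, for 1 degree of freedom only, the upper-tail chi-square probability equals erfc(√(χ²/2)); the function name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with df = 1.

    Valid for df = 1 only: P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


p = p_value_df1(31.34)

# Proportion with a history of severe malaria in each group (from the observed table)
risk_with_trait = 30 / 200
risk_without_trait = 80 / 200

print(f"p-value = {p:.2e}")
print(f"with trait: {risk_with_trait:.0%}, without trait: {risk_without_trait:.0%}")
```

The computed p-value is far below 0.001, in line with the slide, and the proportions come out as 15% and 40%.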
• Summary of Key Points
– SCD Inheritance: Autosomal recessive. Carriers (AS) have the sickle cell trait, are
generally healthy, and are relatively resistant to severe malaria
– Chi-Square Test: A tool to test for an association between two categorical variables
– Application: Used in epidemiology to assess observed relationships, such as the link
between sickle cell trait and malaria resistance, by providing a p-value to judge statistical
significance
– Interpretation: A low p-value (≤ 0.05) indicates that the relationship observed in the
sample data is unlikely to be due to chance alone
Types of Chi-Square Tests
Type                   Purpose                                                               Example
Goodness of Fit Test   To test if observed data fit a theoretical distribution               Sickle Cell inheritance ratios (AA, AS, SS)
Test of Independence   To test association between two categorical variables                 Association between gender and Sickle Cell status
Homogeneity Test       To test if two populations have the same distribution of a variable   Comparing genotype distribution between two ethnic groups
Interpretation in Health Context
• How do you interpret it? What do the results look like?
– Usually, the higher the chi-square statistic, the greater the likelihood
that the finding is significant, but you must look at the corresponding
p-value to determine significance
• In our Sickle Cell example:
– χ² = 3.2 → not significant → the observed genotype counts are consistent with Mendelian inheritance
• In a public health example:
– If χ² shows significance between gender and HIV status → gender is
associated with infection risk
Selection of critical value of chi square
• Having calculated a χ² value for the data in experiment #2, we now need to evaluate that χ²
value
• To do so we must compare our calculated χ² with the appropriate critical value of χ² from the
table shown on slide 22
– All of these critical values in the table have been predetermined by statisticians
• To select a value from the table, we need to know 2 things:
– 1. The number of degrees of freedom. For a goodness-of-fit test that is one less than the number
of categories (groups) we have. For the coin toss experiment that is 2 groups − 1 = 1, so the
critical value of χ² will be in the first row of the table. (For the sickle cell inheritance example,
with three genotype categories, df = 3 − 1 = 2.)
– 2. The probability value, which reflects the degree of confidence we want to have in our
interpretation
– The column headings 0.05 and 0.01 correspond to probabilities (significance levels)
– 0.05 means that when we draw our conclusion, we may be 95% confident that we have drawn
the correct conclusion
– That shows that we can't be certain; there would still be a 5% probability of rejecting a null
hypothesis that is actually true
– But 95% is very good. 0.01 would give us 99% confidence, only a 1% likelihood of drawing the
wrong conclusion
– We will now agree that, unless told otherwise, we will always use the 0.05 probability column
(95% confidence level)
– For 1 degree of freedom, in our coin toss experiment, the table χ² value is 3.84. We compare
the calculated χ² (0.36) to that
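Looking up a critical value and comparing it with a calculated χ² can be sketched as a small table lookup. The dictionary below holds standard tabulated critical values for 1 to 3 degrees of freedom; the names `CRITICAL_VALUES` and `is_significant` are hypothetical and used only for this illustration.

```python
# Standard chi-square critical values: {degrees of freedom: {probability: value}}
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
}


def is_significant(chi2: float, df: int, alpha: float = 0.05) -> bool:
    """Return True when the calculated chi-square exceeds the critical value."""
    critical = CRITICAL_VALUES[df][alpha]
    return chi2 > critical


print(is_significant(0.36, df=1))    # coin toss value quoted in the slides
print(is_significant(3.2, df=2))     # sickle cell inheritance example
print(is_significant(31.34, df=1))   # trait vs. severe malaria example
```

Only the third comparison exceeds its critical value, so only the trait and malaria example is significant at the 0.05 level.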
The interpretation
• In every χ² test the calculated χ² value will either be
– (i) less than or equal to the critical χ² value OR
– (ii) greater than the critical χ² value
• If calculated χ² ≤ critical χ², then we conclude that there is no statistically significant
difference between the two distributions
• That is, the observed results are not significantly different from the expected results, and the
numerical difference between observed and expected can be attributed to chance
• If calculated χ² > critical χ², then we conclude that there is a statistically significant
difference between the two distributions
• That is, the observed results are significantly different from the expected results, and the
numerical difference between observed and expected cannot be attributed to chance alone
• That means that the difference found is due to some other factor. We may be 95% confident that
something else, some other factor, caused the difference
• The χ² test won't identify that other factor, only that there is some factor other than
chance responsible for the difference between the two distributions
Hypothesis testing
• Biostatisticians formally describe what we've just done in terms of testing a hypothesis
• This process begins with stating the "null hypothesis."
• The null hypothesis says that the difference found between observed distribution and
expected distribution is not significant, i.e. that the difference is just due to random chance
• Then we use the χ² test to test the validity of that null hypothesis. If calculated χ² ≤
critical χ², then we fail to reject (informally, "accept") the null hypothesis
• That means that the two distributions are not significantly different, that the difference we
see is due to chance, not some other factor
• On the other hand, if calculated χ² > critical χ², then we reject the null hypothesis
• That means that the two distributions are significantly different, that the difference we see
is not due to chance alone
• Note this well:
– In performing the chi squared test in this course, it is not sufficient in your
interpretation to say "accept null hypothesis" or "reject null hypothesis."
– You will be expected to fully state whether the distributions being compared are
significantly different or not and whether the difference is due to chance alone or
other factors
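One way to practise the full statement required above is to generate it from the comparison itself. The helper below is a hypothetical sketch (the name `interpret` and the exact wording are assumptions, not part of the course material); it takes the calculated and critical χ² values and returns a complete sentence instead of a bare "accept" or "reject".

```python
def interpret(chi2: float, critical: float) -> str:
    """Return a full-sentence interpretation of a chi-square comparison."""
    if chi2 <= critical:
        return (
            f"Calculated chi-square ({chi2}) <= critical value ({critical}): "
            "the observed and expected distributions are not significantly "
            "different; the difference can be attributed to chance."
        )
    return (
        f"Calculated chi-square ({chi2}) > critical value ({critical}): "
        "the observed and expected distributions are significantly "
        "different; the difference is not due to chance alone."
    )


print(interpret(0.36, 3.84))
print(interpret(31.34, 3.84))
```

The first call uses the coin toss value quoted in the slides and the second uses the value from Example #2, both against the critical value for 1 degree of freedom at the 0.05 level.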
Application of Chi-Square in Epidemiological Data
• Chi-square tests are widely applied in public
health to test associations between
categorical variables, such as:
Application Area    Example Question Tested with Chi-Square
Genetics            Is Sickle Cell trait related to malaria resistance?
Epidemiology        Is gender associated with HIV infection rates?
Health Programs     Does vaccination status affect infection prevalence?
Clinical Research   Is there an association between blood group and disease outcome?
