# Chi square test final

Apply it with SPSS

Published in: Health & Medicine, Technology
• Data are measurements or observations that are collected as a source of information. A data unit is one entity (such as a person or business) in the population being studied, about which data are collected. An observation is an occurrence of a specific data item that is recorded about a data unit. A dataset is a complete collection of all observations.
• The history of statistics can be said to start around 1749, although the interpretation of the word "statistics" has changed over time. Pearson's chi-square paper appeared in Philosophical Magazine, Series 5, Vol 50, pp. 157-175. The Philosophical Magazine is one of the oldest scientific journals published in English. It was established by Alexander Tilloch in 1798.
• These column and row totals are also called marginal frequencies. The expected frequencies are the number of observations expected to occur by chance.
• If no relationship exists between the column and row variable, the observed frequencies will be close to the expected frequencies, and the value of the chi-square statistic will be small. If a relationship (or dependency) does occur, the observed frequencies will vary quite a bit from the expected frequencies, and the value of the chi-square statistic will be large.
• Calculate degrees of freedom
• If β is set at 0.10, then the investigator has decided that he is willing to accept a 10% chance of missing an association of a given effect size between Tamiflu and psychosis. This represents a power of 0.90, i.e., a 90% chance of finding an association of that size. For example, suppose that there really would be a 30% increase in psychosis incidence if the entire population took Tamiflu. Then 90 times out of 100, the investigator would observe an effect of that size or larger in his study. This does not mean, however, that the investigator will be absolutely unable to detect a smaller effect; just that he will have less than 90% likelihood of doing so. Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80). These are somewhat arbitrary values, and others are sometimes used; the conventional range for alpha is between 0.01 and 0.10, and for beta, between 0.05 and 0.20. In general the investigator should choose a low value of alpha when the research question makes it particularly important to avoid a type I (false-positive) error, and he should choose a low value of beta when it is especially important to avoid a type II (false-negative) error.
• The mean of the chi-square distribution is equal to the degrees of freedom; therefore, as the degrees of freedom increase, the mean moves more to the right. A one-sided test or directional test is one in which the direction of departure from the null hypothesis has been specified in advance. This is only applicable in the case of a single degree of freedom test, since with more than one degree of freedom, there is more than just a single direction by which the results can depart from the null hypothesis. Thus all tests involving multiple degrees of freedom for a hypothesis are going to be two-sided or nondirectional. If the distribution being used in the test is symmetric, then one-sided corresponds with one-tailed. However, with distributions such as chi-squares and Fs, which are not symmetric, the standard tests use only one tail, but are two-sided, or nondirectional. The chi-square distribution with 1 df is the same as the distribution of the square of a Z or standard normal distribution. The F distribution with 1 numerator df is the same as the distribution of the square of a t with the same denominator df. Compare the critical values in tables or those that you get from SPSS to verify these relationships. They show that a one-tailed chi-square test gives the same results as a two-tailed Z-test, and that a one-tailed F test gives the same results as a two-tailed t test; all are two-sided or nondirectional.
• Only upper-tailed values for χ² are tabulated because these are the values generally used in hypothesis testing.
• The sum of the probabilities is equal to 1.
• McNemar: repeated-measures design or matched case control study.
• A smaller value for χ² means that the null hypothesis will not be rejected as often as it is with the larger, uncorrected chi-square; that is, it is more conservative. Thus, the risk of a type I error (rejecting the null hypothesis when it is true) is smaller; however, the risk of a type II error (not rejecting the null hypothesis when it is false and should be rejected) then increases.
### Chi square test final

1. CHI SQUARE TEST • DR HAR ASHISH JINDAL, JR
2. Contents • Definitions • Milestone in Statistics • Chi square test • Chi square test: Goodness of Fit • Chi square test for homogeneity of Proportion • Chi Square Independent test • Limitation of Chi square • Fisher Exact test • Continuity correction • Overuse of chi square
3. Definitions • Statistics is defined as the science that deals with the collection, presentation, analysis and interpretation of data. • Biostatistics is defined as the application of statistical methods to medical, biological and public health related problems.
4. Statistics • Descriptive: collecting, organizing, summarizing and presenting data • Inferential: making inferences, hypothesis testing (chi square test), determining relationships, making predictions
5. Introduction • Data: a collection of facts from which conclusions can be made. • Observations made on the subjects one after the other are called raw data. – They become useful when they are arranged and organized so that we can extract information from the data and communicate it to others.
6. Definitions • A variable is any characteristic, number, or quantity that can be measured or counted. – Independent variable: is not changed by the other variables, e.g. age – Dependent variable: depends on other factors, e.g. test score depends on time studied • Parameter: any numerical quantity that characterizes a given population or some aspect of it, e.g. the mean
7. Data Types • Quantitative: discrete or continuous (interval data, ratio data) • Qualitative: nominal or ordinal
8. Qualitative Data • Qualitative variables • Example: gender (male, female) • Frequency in category • Nominal or ordinal scale • Examples – Do you have a disease? (nominal) – What is the socioeconomic status? (ordinal)
9. MILESTONE IN STATISTICS • "Karl Pearson's famous chi-square paper appeared in the spring of 1900, an auspicious beginning to a wonderful century for the field of statistics." (published in the Philosophical Magazine)
10. Chi Square Test • Simplest and most widely used non-parametric test in statistical work.
11. Logic of the chi-square • The total number of observations in each column and the total number of observations in each row are considered to be given or fixed. • If we assume that columns and rows are independent, we can calculate the expected frequencies.
12. Logic of Chi square • Compares the observed frequency in each cell with the expected frequency. • If no relationship exists between the column and row variable: – the observed frequencies will be very close to the expected frequencies – they will differ only by small amounts – the value of the chi-square statistic will be small • If a relationship (or dependency) does occur: – the observed frequencies will vary from the expected frequencies – the value of the chi-square statistic will be large
13. Steps for Chi square test • Define null and alternative hypothesis • State alpha • Calculate degrees of freedom • State decision rule • Calculate test statistic • State and interpret results
14. Hypothesis Testing • Tests a claim about a parameter using evidence (data in a sample), to assess relationships between variables • Steps: 1. Formulate a hypothesis about the population 2. Take a random sample 3. Summarize the information (descriptive statistics) 4. Does the information given by the sample support the hypothesis? Are we making any error? (inferential statistics) • Decision rule: convert the research question to null and alternative hypotheses
15. Null Hypothesis • H0 = No difference between observed and expected observations • H1 = A difference is present between observed and expected observations
16. What is statistical significance? • A statistical concept indicating that the result is very unlikely to be due to chance and, therefore, likely represents a true relationship between the variables. • Statistical significance is usually indicated by the p-value (probability value), which should be smaller than the chosen significance level (alpha).
17. State alpha value • Alpha is the probability of a type I error, that is, rejecting a true null hypothesis • For the majority of studies alpha is 0.05 • Meaning: the investigator has set 5% as the maximum chance of incorrectly rejecting the null hypothesis
18. Degree of freedom • It is a positive whole number that indicates the lack of restrictions in calculations. • The degree of freedom is the number of values in a calculation that can vary. • Calculation – For goodness of fit: number of levels (outcomes) - 1 – For independent variables / homogeneity of proportion: (No. of columns - 1) × (No. of rows - 1)
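
The two degree-of-freedom rules on slide 18 can be written as a minimal Python sketch. The function names are illustrative choices, not something the slides define.

```python
def df_goodness_of_fit(levels: int) -> int:
    """Degrees of freedom for a goodness-of-fit test: number of levels - 1."""
    return levels - 1


def df_contingency(rows: int, cols: int) -> int:
    """Degrees of freedom for independence / homogeneity: (rows - 1)(cols - 1)."""
    return (rows - 1) * (cols - 1)


print(df_goodness_of_fit(4))  # four seasons in the SIDS example -> 3
print(df_contingency(3, 2))   # 3 answers x 2 years in the vaccination example -> 2
```
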
19. The Chi-Square Distribution • No negative values • Mean is equal to the degrees of freedom • The standard deviation increases as degrees of freedom increase, so the chi-square curve spreads out more as the degrees of freedom increase. • As the degrees of freedom become very large, the shape becomes more like the normal distribution.
20. The Chi-Square Distribution • The chi-square distribution is different for each value of the degrees of freedom, so different critical values correspond to different degrees of freedom. • We find the critical value that separates the area defined by α from that defined by 1 – α.
21. Finding Critical Value
    • Q. What is the critical χ² value if df = 2 and α = 0.05?
    • If ni = E(ni), χ² = 0.
    • Figure: chi-square curve for df = 2, with "Do not reject H0" below the critical value 5.991 and "Reject H0" in the upper tail of area α = 0.05.

    χ² table (portion); column headings are significance levels:

    | DF | 0.995 | … | 0.95 | … | 0.05 |
    |----|-------|---|------|---|------|
    | 1 | ... | … | 0.004 | … | 3.841 |
    | 2 | 0.010 | … | 0.103 | … | 5.991 |
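
The table lookup on slide 21 can be reproduced numerically. The sketch below uses only the Python standard library: it evaluates the upper-tail probability of the chi-square distribution for integer degrees of freedom through the standard recurrence for the regularized incomplete gamma function, then finds the critical value by bisection. The function names `chi2_sf` and `chi2_critical` are assumptions made for illustration; a statistics package would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical(df: int, alpha: float) -> float:
    """Value c such that P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # expand until the tail area drops below alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(2, 0.05), 3))  # 5.991, as in the table
print(round(chi2_critical(1, 0.05), 3))  # 3.841
```

For df = 2 the tail area has the closed form exp(-x/2), so the critical value is simply -2 ln(0.05), which gives the same 5.991.
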
22. State decision rule • If the calculated value is greater than the critical value of chi square, the null hypothesis will be rejected.
23. Calculate test statistics • Calculated using the formula $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed frequencies and E = expected frequencies • Question: how to find the expected value? – Chi square for goodness of fit and homogeneity of proportion: from a theory, a previous study, comparison groups, or a standard – Chi square for independent variables: Expected value = Row total × Column total / Table total
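
As a minimal sketch of the formula on slide 23, the Python below computes expected counts from the row and column totals and then sums (O - E)²/E over all cells. The 2 × 2 table is hypothetical and is not taken from the slides.

```python
def expected_counts(observed):
    """Expected cell counts: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


table = [[10, 20], [30, 40]]           # hypothetical observed counts
expected = expected_counts(table)      # [[12.0, 18.0], [28.0, 42.0]]
statistic = chi_square(table, expected)
print(expected, round(statistic, 4))   # statistic is 50/63, about 0.7937
```
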
24. State and interpret results • See whether the value of chi square is more than or less than the critical value. • If the value of chi square is less than the critical value, we do not reject the null hypothesis. • If the value of chi square is more than the critical value, the null hypothesis can be rejected.
25. Chi square test • Goodness of fit • For homogeneity of proportions • For 2 independent groups – Cohort study – Case control study – Matched case control study • For > 2 independent groups
26. Goodness of fit • Q. How "close" are the observed values to those which would be expected in a study? • Q. Does a variable have a frequency distribution comparable to the one expected? Answered by the chi-square goodness of fit test. • Expected frequency can be based on – theory – previous experience, OR – comparison groups
27. Goodness of fit • A goodness-of-fit test is an inferential procedure used to determine whether a frequency distribution follows a claimed distribution. • It is a test of the agreement or conformity between the observed frequencies (Oi) and the expected frequencies (Ei) for several classes or categories (i).
28. Example: Is Sudden Infant Death Syndrome seasonal?
    • Null hypothesis: the proportion of deaths due to SIDS in winter, summer, autumn and spring is equal = ¼ = 25%
    • Alternative: not all probabilities stated in the null hypothesis are correct

    | SIDS cases | Observed | Expected = 322 × 1/4 |
    |------------|----------|----------------------|
    | Summer | 78 | 80.5 |
    | Spring | 71 | 80.5 |
    | Autumn | 87 | 80.5 |
    | Winter | 86 | 80.5 |
    | Total | 322 | |

    • Degrees of freedom = k - 1 = 4 - 1 = 3
    • For α = 0.05 and df = 3, the critical value is χ² = 7.81
    • $\chi^2 = \frac{(78-80.5)^2}{80.5} + \frac{(71-80.5)^2}{80.5} + \frac{(87-80.5)^2}{80.5} + \frac{(86-80.5)^2}{80.5} = \frac{169}{80.5} \approx 2.10$
    • Conclusion: as the calculated χ² value is less than the critical value, we do not reject the null hypothesis and state that deaths due to SIDS across seasons are not statistically different from what is expected by chance (i.e. all seasons being equal).
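
The SIDS goodness-of-fit calculation can be checked with a few lines of Python. The counts and the critical value 7.81 come from the slide; the variable names are illustrative.

```python
observed = {"Summer": 78, "Spring": 71, "Autumn": 87, "Winter": 86}

total = sum(observed.values())        # 322 cases
expected = total / len(observed)      # equal seasons: 322 * 1/4 = 80.5
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # k - 1 = 3

critical = 7.81                       # from the slide, alpha = 0.05 and df = 3
reject_null = chi2 > critical

print(round(chi2, 2), df, reject_null)  # 2.1 3 False
```
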
29. Chi square test • Goodness of fit • For homogeneity of proportions • For 2 independent groups – Cohort study – Case control study – Matched case control study • For > 2 independent groups
30. Homogeneity of proportions • In a chi-square test for homogeneity of proportions, we test the claim that different populations have the same proportion of individuals with some characteristic. • EXAMPLE: Is there evidence to indicate that the perception of the effects of vaccination is the same in 2013 as it was in 2000? • Q. What is the effect of vaccination on health? Answers: Good, No, Bad • Null hypothesis H0: no difference between the two populations • H1: there is a difference between the two populations
31. State alpha = 0.05 • Find df = (3 - 1)(2 - 1) = 2 • From the chi-square distribution, the critical value is χ² = 5.991
32. Observed and expected frequencies

    Observed:

    | Answer | 2000 | 2013 | Row total |
    |--------|------|------|-----------|
    | Good effect | 656 | 726 | 1382 |
    | No effect | 283 | 222 | 505 |
    | Bad effect | 50 | 50 | 100 |
    | Column total | 989 | 998 | 1987 |

    Expected:

    | Answer | 2000 | 2013 |
    |--------|------|------|
    | Good effect | (989)(1382)/1987 = 687.87 | (998)(1382)/1987 = 694.13 |
    | No effect | (989)(505)/1987 = 251.36 | (998)(505)/1987 = 253.64 |
    | Bad effect | (989)(100)/1987 = 49.77 | (998)(100)/1987 = 50.23 |
33. Homogeneity of proportions • $\chi^2 = \sum \frac{(O - E)^2}{E}$ • Calculated χ² = 10.871 • Results: as 10.871 > 5.991, we reject the null hypothesis at the 0.05 significance level. • There is a statistically significant difference in the perception of vaccination between 2000 and 2013.
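
The vaccination example can be recomputed from the observed counts alone. This is a sketch, with rows taken as the three answers and columns as the two survey years; it gives about 10.87, which agrees with the slide's 10.871 up to rounding of the expected counts.

```python
# Rows: Good effect, No effect, Bad effect. Columns: 2000, 2013.
observed = [[656, 726],
            [283, 222],
            [50, 50]]

row_totals = [sum(row) for row in observed]        # 1382, 505, 100
col_totals = [sum(col) for col in zip(*observed)]  # 989, 998
n = sum(row_totals)                                # 1987

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(2 - 1) = 2
critical = 5.991                                   # alpha = 0.05, df = 2
print(round(chi2, 2), df, chi2 > critical)         # 10.87 2 True
```
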
34. Chi square test • Goodness of fit • For homogeneity of proportions • For 2 independent groups – Cohort study – Case control study – Matched case control study • For > 2 independent groups
35. Chi square Independence test • It is used to find out whether there is an association between a row variable and a column variable in a contingency table constructed from sample data.
36. Assumptions • The observations should be independent. • All expected frequencies are greater than or equal to 1 (i.e., E ≥ 1). • No more than 20% of the expected frequencies are less than 5. • Calculated as $\chi^2 = \sum \frac{(O - E)^2}{E}$
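
The two conditions on expected frequencies from slide 36 can be checked mechanically. The helper below is a hypothetical sketch; its name and default thresholds are assumptions that mirror the slide (every E at least 1, no more than 20% of cells with E below 5).

```python
def expected_counts_ok(expected, minimum=1.0, small=5.0, max_share_small=0.20):
    """True when every expected count is >= minimum and at most
    max_share_small of the cells have an expected count below small."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < small for e in cells) / len(cells)
    return min(cells) >= minimum and share_small <= max_share_small


# Expected counts from the vaccination example on slide 32.
vaccination = [[687.87, 694.13], [251.36, 253.64], [49.77, 50.23]]
print(expected_counts_ok(vaccination))           # True

# Hypothetical table where one cell in four (25%) is below 5.
print(expected_counts_ok([[3, 10], [10, 10]]))   # False
```
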
37. Expected Count

    | Exposure | Disease present | Disease negative | Total |
    |----------|-----------------|------------------|-------|
    | Present | a | b | a+b |
    | Negative | c | d | c+d |
    | Total | a+c | b+d | tt |

    • Marginal probability (exposure present) = (a+b)/tt
    • Marginal probability (disease present) = (a+c)/tt
    • Joint probability = (a+b)/tt × (a+c)/tt
    • Expected count = sample size (tt) × (a+b)/tt × (a+c)/tt
38. Short cut of Chi Square
39. Short cut of Chi Square • Observed values • Expected values
40. Long form: $\frac{(37-22.5)^2}{22.5} + \frac{(13-27.5)^2}{27.5} + \frac{(17-31.5)^2}{31.5} + \frac{(53-38.5)^2}{38.5} = 29.1$ • Short cut: $\frac{120\,[(37)(53) - (13)(17)]^2}{(54)(66)(50)(70)} = 29.1$
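
The two calculations on slide 40 can be compared directly. The sketch below assumes the standard 2 × 2 shortcut χ² = n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)], with a = 37, b = 13, c = 17, d = 53 read off the slide's numbers.

```python
a, b, c, d = 37, 13, 17, 53
n = a + b + c + d  # 120

# Long form: expected counts from the margins, then sum of (O - E)^2 / E.
rows, cols = (a + b, c + d), (a + c, b + d)       # (50, 70), (54, 66)
observed = [a, b, c, d]
expected = [r * k / n for r in rows for k in cols]  # 22.5, 27.5, 31.5, 38.5
long_form = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Short cut: uses only the four cell counts.
short_cut = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(long_form, 1), round(short_cut, 1))  # 29.1 29.1
```
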
41. Application in various studies • Cohort study • Case control study • Matched case control study
42. Cohort Study • Assumptions: – The two samples are independent – Let a+b = number of people exposed to the risk factor – Let c+d = number of people not exposed to the risk factor • Assess whether there is an association between exposure and disease by calculating the relative risk (RR)
43. Example: To test the association between smoking and lung CA in a cohort study
    • Null hypothesis H0: no association between smoking and lung CA (RR = 1)
    • H1: association present between smoking and lung CA
    • We can define the relative risk of disease: p1 = incidence of disease with exposure present, p2 = incidence of disease with exposure absent, RR = p1/p2
    • Hence for these studies RR = (a/(a+b)) / (c/(c+d))

    | Smoking | Lung CA present | Lung CA absent | Total |
    |---------|-----------------|----------------|-------|
    | YES | 84 | 2914 | 3000 |
    | NO | 87 | 4913 | 5000 |
    | TOTAL | 171 | 7827 | 8000 |

    • RR = (84/3000)/(87/5000) ≈ 1.61
    • We can test the hypothesis that RR = 1 by calculating the chi-square test statistic, with alpha = 0.05 and df = 1
    • CONCLUSION: as χ² > 3.84, we reject the null hypothesis of RR = 1 at the 0.05 significance level.
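
A short Python sketch of the cohort calculation follows. The case counts (84 and 87) and the group sizes (3000 and 5000) are taken from the slide. The non-diseased counts are derived here as group size minus cases, which is an assumption, because the slide's printed cell for exposed non-cases does not add up to the stated row total.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """RR = incidence among the exposed / incidence among the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


rr = relative_risk(84, 3000, 87, 5000)

a, c = 84, 87
b, d = 3000 - a, 5000 - c   # assumption: non-cases = group size - cases
chi2 = chi_square_2x2(a, b, c, d)

print(round(rr, 2), round(chi2, 2), chi2 > 3.84)  # 1.61 10.07 True
```
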
44. Case control study • Assumptions: – The samples are independent – Cases = diseased individuals = a+c – Controls = non-diseased individuals = b+d • Assess whether there is an association between exposure and disease by calculating the odds ratio (OR)
45. Example: To test the association between CHD and smoking in a case control study • Null hypothesis H0: no association between CHD and smoking (OR = 1) • H1: association exists between CHD and smoking (OR > 1 or < 1) • Odds ratio = odds of exposure amongst diseased / odds of exposure amongst non-diseased • Odds of exposure amongst diseased = (a/(a+c)) / (c/(a+c)) = a/c • Odds of exposure amongst non-diseased = (b/(b+d)) / (d/(b+d)) = b/d • Odds ratio = ad/bc • Odds ratio = (112 × 224)/(88 × 176) = 1.62 • We can test whether OR = 1 by calculating the chi-square statistic, with alpha = 0.05 and df = 1 • Conclusion: we reject the null hypothesis that the odds ratio = 1 at the 0.05 significance level, as χ² > 3.84
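
The odds ratio and its chi-square test can be sketched as follows. The slide gives only the products ad = 112 × 224 and bc = 88 × 176, so the assignment a = 112, b = 88, c = 176, d = 224 is an assumption; swapping b and c changes neither the odds ratio nor the chi-square value.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table via the shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 112, 88, 176, 224   # assumed cell layout, see the note above
print(round(odds_ratio(a, b, c, d), 2))      # 1.62
print(round(chi_square_2x2(a, b, c, d), 2))  # 7.69, which exceeds 3.84
```
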
46. Matched case control study • Case-control pairs are matched on characteristics such as age, race, sex • Assumptions: – Samples are not independent – The discordant pairs are case-control pairs with different exposure histories – The matched odds ratio is estimated by bb/cc, where bb = pairs in which the case was exposed but the control was not, and cc = pairs in which the control was exposed but the case was not • Assess whether there is an association between exposure and disease by calculating the matched odds ratio (OR)
47. To test the association of smoking exposure and CHD in a matched case control study
    • Null hypothesis: no association of smoking exposure and CHD (OR = 1)
    • Alternative hypothesis: association exists between smoking exposure and CHD (OR > 1 or < 1)
    • Test whether OR = 1 by calculating McNemar's statistic, with alpha = 0.05 and df = 1

    | CHD present (cases) | CHD absent (controls): smoking history present | CHD absent (controls): smoking history absent |
    |---------------------|-----------------------------------------------|----------------------------------------------|
    | Smoking history present | 20 | 40 (bb) |
    | Smoking history absent | 10 (cc) | 30 |

    • OR = 40/10 = 4
    • $\chi^2 = \frac{(|40 - 10| - 1)^2}{40 + 10} = \frac{841}{50} = 16.82$
    • Conclusion: we reject the null hypothesis that OR = 1, as the calculated χ² > 3.84
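
McNemar's statistic uses only the discordant pairs. The sketch below follows the continuity-corrected form used on the slide, (|bb - cc| - 1)² / (bb + cc); the function names are illustrative.

```python
def mcnemar_corrected(bb: int, cc: int) -> float:
    """McNemar chi-square with continuity correction, from discordant pairs."""
    return (abs(bb - cc) - 1) ** 2 / (bb + cc)


def matched_odds_ratio(bb: int, cc: int) -> float:
    """Matched odds ratio: ratio of the two kinds of discordant pairs."""
    return bb / cc


bb, cc = 40, 10   # case exposed only / control exposed only
print(matched_odds_ratio(bb, cc))   # 4.0
print(mcnemar_corrected(bb, cc))    # 16.82, which exceeds 3.84
```
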
48. Chi square for > 2 independent variables • The chi-square test is used regardless of whether the research question is in terms of proportions or frequencies. • Contingency tables can have any number of rows and columns. • The sample size needs to increase as the number of categories increases, to keep the expected values at an acceptable size.
49. Limitation of Chi square test • Conditions under which the chi-square approximation is adequate: – No expected frequency should be < 2 – No more than 20% of the cells should have an expected frequency < 5 • Question: what to do when these assumptions are not met? Fisher Exact test
50. Fisher Exact test • Gives the exact probability of the occurrence of the observed frequencies • Fisher's exact test is especially appropriate with – small sample sizes (total number of cases < 20), or – an expected number of cases in any cell < 2, or – more than 20% of the cells having expected frequencies < 5 • Ronald A. Fisher (1890–1962)
51. Continuity correction • It subtracts ½ from the absolute difference between observed and expected frequencies in the numerator of χ² before squaring. • It makes the value of χ² smaller, so the null hypothesis is rejected less often, which decreases the type I error. • In the shortcut formula, n/2 is subtracted from the absolute value of ad – bc prior to squaring.
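
To illustrate the continuity correction, the sketch below applies it to the 2 × 2 counts used on slide 40 (a = 37, b = 13, c = 17, d = 53), once cell by cell and once through the shortcut formula. Reusing that table here is a choice made for illustration; the slides do not report a corrected value for it.

```python
a, b, c, d = 37, 13, 17, 53
n = a + b + c + d
margin_product = (a + b) * (c + d) * (a + c) * (b + d)

# Uncorrected shortcut, for comparison.
uncorrected = n * (a * d - b * c) ** 2 / margin_product

# Corrected shortcut: subtract n/2 from |ad - bc| before squaring.
corrected_shortcut = n * (abs(a * d - b * c) - n / 2) ** 2 / margin_product

# Corrected long form: subtract 1/2 from |O - E| in every cell.
observed = [a, b, c, d]
expected = [r * k / n for r in (a + b, c + d) for k in (a + c, b + d)]
corrected_long = sum((abs(o - e) - 0.5) ** 2 / e
                     for o, e in zip(observed, expected))

print(round(uncorrected, 2), round(corrected_shortcut, 2))  # 29.13 27.15
```
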
52. Overuse of Chi square • A common overuse occurs when two groups are being analyzed and the characteristic of interest is measured on a numerical scale: instead of correctly using the t test, researchers convert the numerical scale to an ordinal or even binary scale and then use chi-square. • When numerical variables are analyzed with methods designed for ordinal or categorical variables, the greater specificity or detail of the numerical measurement is wasted. • Categorize a numerical variable, such as age, only after investigating whether the categories are appropriate.
53. Take Home Message • The chi square test is applied to qualitative data, whether nominal or ordinal. • Before applying the chi square test, check that all assumptions are met. • If the value of chi square is large, there is a high probability of rejecting the null hypothesis. • If the value of chi square is small, there is less probability of rejecting the null hypothesis.
54. References • Dawson: Basic and clinical statistics • K. Park: Textbook on Preventive and Social Medicine • Johns Hopkins Bloomberg: Use of Chi square • Non Parametric tests for non statisticians: Corder and Foreman • IBM: SPSS Help