Intro to tests of significance qualitative

3,593 views

Published on

Published in: Health & Medicine
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,593
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
0
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Intro to tests of significance qualitative

  1. 1. Tests ofsignificance Dr P Raghavendra, PG in Dept of Community Medicine, SMC, Vijayawada
  2. 2. Overview• Types of data• Basic biostatistics – central tendency, standard deviation, standard normal curve, Z score• Basic terms – Sampling Variation, Null hypothesis, P value, Standard error• Steps in hypothesis testing• Types of tests of significance• SEDP• Chi Square test• Special Situations in chi square testing
  3. 3. Types of Data
  4. 4. Qualitative Data• Qualitative are those which can be answered as YES or NO, Male or Female, etc.• We can only measure their numbers, eg: Number of males, Number of MDR-TB cases among TB patients, etc.• A set of qualitative data can be expressed as proportions. Eg: Prevalence, success rate.
  5. 5. Quantitative Data• Quantitative are those which can be measured in numbers, like Blood pressure, Age, etc.• A set of quantitative date can be expressed in mean and its standard deviation.• Mean is the average of all variables in the data.• Standard deviation is a measure of the distribution of variables around the mean.
  6. 6. Basic Bio- Statistics
  7. 7. Basic Bio-Statistics• Mean, Median, Mode These are the measures of central tendency.• Standard Deviation It is the measure of distribution of the data around the central value.• Biological Variability
  8. 8. Simple Normal Curve Mean = Median = Mode
  9. 9. • Z value – Gives the position of the variable with respect to central value in terms of standard deviation Z value = Observed Value – mean standard deviation
  10. 10. Numbers!!• Numbers are confusing.• Consider an example: – In a hypothetical population, there were 70 deaths among men and 90 deaths among women. So, where is the death rate high, among men? Or women??
  11. 11. Sampling Variation• Research done on samples and not on populations.• Variability of observations occur among different samples.• This complicates whether the observed difference is due to biological or sampling variation from true variation.• To conclude actual difference, we use tests of significance.
  12. 12. Null Hypothesis• 1st step in testing any hypothesis.• Set up such that it conveys a meaning that there exists no difference between the different samples.• Eg: Null Hypothesis – The mean pulse rate among the two groups are same (or) there is no significant difference between their pulse rates.
  13. 13. • By using various tests of significance we either: –Reject the Null Hypothesis (or) –Accept the Null Hypothesis• Rejecting null hypothesis  difference is significant.• Accepting null hypothesis  difference is not significant.
  14. 14. Level of Significance – “P” Value• P is the probablity of the measured significance in difference attributable to chance.• It is the probability of null hypothesis being true.• We accept or reject the null hypothesis based on P value.• If P value is small, it implies, the probability of attributing chance to cause the significance in difference is small and we reject null hypothesis.• In any test we use, we find out P value or Z score to attribute significance.
  15. 15. How much P is significant• Depends on the type of study.• Balance between Type – I and Type – II errors.• Practically, P < 0.05 (5%) is considered significant.• P = 0.05 implies, – We may go wrong 5 out of 100 times by rejecting null hypothesis. – Or, We can attribute significance with 95% confidence.
  16. 16. Standard Error• To put it simply, it is the standard deviation of means or proportion.• Gives idea about the dispersion of mean or proportions obtained from multiple or repeated samples of same population.
  17. 17. Steps inTesting aHypothesis
  18. 18. General procedure in testing a hypothesis1. Set up a null hypothesis.2. Define alternative hypothesis.3. Calculate the test statistic (t, 2, Z, etc).4. Determine degrees of freedom.5. Find out the corresponding probability level (P Value) for the calculated test statistic and its degree of freedom. This can be read from relevant tables.6. Accept or reject the Null hypothesis depending on P value
  19. 19. Classification of tests of significance• For Qualitative data:- 1. Standard error of difference between 2 proportions (SEp -p ) 1 2 2. Chi-square test or 2• For Quantitative data:- 1. Unpaired (student) ‘t’ test 2. The Z test 3. Paired ‘t’ test
  20. 20. Standard error ofdifference between 2 proportions (SEp -p ) 1 2
  21. 21. Standard error of difference between proportions (SEp1-p2)• For comparing qualitative data between 2 groups.• Usable for large samples only. (> 30 in each group).• We calculate SEp1-p2 by using the formula:
  22. 22. • And then we calculate the test statistic ‘Z Score’ by using the formula: Z = P1 – P2 SEP1-P2• If Z > 1.96 P < 0.05 SIGNIFICANT• If Z < 1.96 P > 0.05 INSIGNIFICANT
  23. 23. Example – 1• Consider a hypothetical study where cure rate of Typhoid fever after treatment with Ciprofloxacin and Ceftriaxone were recorded to be 90% and 80% among 100 patients treated with each of the drug. How can we determine whether cure rate of Ciprofloxacin is better than Ceftriaxone?
  24. 24. Solution• Step – 1: Set up a null hypothesis – H0: “There is no significant difference between cure rates of Ciprofloxacin and Ceftriaxone.”• Step – 2: Define alternative hypothesis – Ha: “ Ciprofloxacin is 1.125 times better in curing typhoid fever than Ceftriaxone”• Step – 3: Calculate the test statistic – ‘Z Score’
  25. 25. Step – 3: Calculating Z score Z= P1 – P2 SEP1-P2– Here, P1 = 90, P2 = 80– SEP1-P2 will be given by the formula: So, Z = 10/5 = 2
  26. 26. • Step – 4: Find out the corresponding P Value – Since Z = 2 i.e., > 1.96, hence, P < 0.05• Step – 5: Accept or reject the Null hypothesis – Since P < 0.05, So we reject the null hypothesis (H0) There is no significant difference between cure rates of Ciprofloxacin and Ceftriaxone – And we accept the alternate hypothesis (Ha) that, Ciprofloxacin is 1.125 times better in curing typhoid fever than Ceftriaxone
  27. 27. Example – 2• Consider another study, where 72 TB patients were divided into 2 treatment groups A and B. Group A were given DOTS and group B were given a newly discovered regimen (Regimen-X) found effective in Nigeria. The cure rates at the end of treatment (assume 100% compliance) among group A and group B were 90% and 80% respectively. Comment on whether this difference is significant or not.
  28. 28. If you think the answer is same for example 1 and 2, remember that –• Numbers what we see, can be misguiding.• Mind can be biased.• Hence always try and test the significance of the difference observed.
  29. 29. Solution• Step – 1: Set up a null hypothesis – H0: “There is no significant difference between cure rates of DOTS and Regimen-X.”• Step – 2: Define alternative hypothesis – Ha: “ DOTS is 1.125 times better in curing typhoid fever than Regimen-X”• Step – 3: Calculate the test statistic – ‘Z Score’
  30. 30. Step – 3: Calculating Z score Z= P1 – P2 SEP1-P2– Here, P1 = 90, P2 = 80– SEP1-P2 will be given by the formula: So, Z = 1.2
  31. 31. • Step – 4: Find out the corresponding P Value – Since Z = 1.2 i.e., < 1.96, hence, P > 0.05• Step – 5: Accept or reject the Null hypothesis – Since P > 0.05, So we accept the null hypothesis (H0) There is no significant difference between cure rates of DOTS and Regimen-X and the new regimen-X is as good as DOTS.
  32. 32. Limitations of SEP1-P2• Sample must be large. – At least 30 in each sample.• It can compare only 2 groups. – Quite often we might be testing significance of difference between many groups. – In such cases we might not be able to do SEP1-P2
  33. 33. Chi-Square Test
  34. 34. Chi square ( 2) test• This test too is for testing qualitative data.• Its advantages over SEDP are: – Can be applied for smaller samples as well as for large samples. – Can be used if there are sub-groups or more than 2 groups
  35. 35. Chi square ( 2) testPrerequisites for Chi square ( 2) test to be applied: – It must be quantitative data. – The sample must be a random sample. – None of the observed values must be zero. – Yates correction must be applied if any observation is less than 5 (but 0).
  36. 36. Steps in Calculating 2 value1. Make a contingency table mentioning the frequencies in all cells.2. Determine the expected value (E) in each cell. E= Row total x Column total rxt Grand total T3. Calculate the difference between observed and expected values in each cell (O-E). Contd…
  37. 37. 4. Calculate 2 value for each cell 2 of each cell = (O – E)2 E5. Sum up 2 value of each cell to get 2 value of the table. 2= (O – E)2 E
  38. 38. Example 3• Consider a study done in a hospital where cases of breast cancer were compared against controls from normal population against possession of a family history of Ca Breast. 100 in each group were studied for presence of family history. 25 of cases and 15 among controls had a positive family history. Comment on the significance of family history in breast cancer.
  39. 39. Solution• From the numbers, it suggests that family history is 1.66 (25/15) times more common in Ca breast. So is it a risk factor in our population?• We need to test for the significance of this difference.• We shall apply 2 test.• As in SEDP, we should follow the same steps.
  40. 40. Solution• Step – 1: Set up a null hypothesis – H0: “There is no significant difference between incidence of family history among cases and controls.”• Step – 2: Define alternative hypothesis – Ha: “Family history is 1.66 times more common in Ca breast”• Step – 3: Calculate the test statistic – ‘ 2’
  41. 41. Step – 3: Calculating 21. Make a contingency table mentioning the frequencies in all cells. Risk factor (Family History) Group Total Present absent Cases 25 ( a) 75 (b) 100 Controls 15 c ( ) 85 d ( ) 100 Total 40 160 200
  42. 42. Step – 3: Calculating 22. Determine the expected value (E) in each cell. E= Row total x Column total rxt Grand total T – For (a), Ea = 100 x 40 / 200 = 20 – For (b), Eb = 100 x 160 / 200 = 80 – For (c), Ec = 100 x 40 / 200 = 20 – For (d), Ed = 100 x 160 / 200 = 80
  43. 43. Step – 3: Calculating 2 (O – E)2 O E (O – E) (O – E)2 Ea 25 20 5 25 1.25b 75 80 -5 25 0.3125c 15 20 5 25 1.25d 85 80 -5 25 0.3125 2 = 3.125
  44. 44. Solution• Step – 4: Determine degrees of freedom. DoF is given by the formula: DoF = (r-1) x (c-1) where r and c are the number of rows and columns respectively Here, r = c = 2. Hence, DoF = (2-1) x (2-1) = 1
  45. 45. Solution• Step – 5: Find out the corresponding P Value – P values can be calculated by using the 2 distribution tables. 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001 1 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83 2 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82 3 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27 4 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47 5 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52 6 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
  46. 46. Solution• Step – 6: Accept or reject the Null hypothesis• In our given scenario, 2 = 3.125• This is less than 3.84 (for P = 0.05 at dof =1)• Hence Null hypothesis is Accepted, i.e.,“There is no significant difference between incidence of family history among cases and controls”
  47. 47. Example - 4• Calculate 2 for the following data and comment on the significance of blood groups in type of leprosy. Non LepromatousBlood group Non-leprosy Lepromatous Total leprosy leprosy A 30 49 52 131 B 60 49 36 145 O 47 59 48 154 AB 13 12 16 41 Total 150 169 152 471
  48. 48. The expected frequencies will be Lepromatous Non LepromatousBlood Group Non-leprosy leprosy leprosy (131/471) x 150 (131/471) x 169 (131/471) x 152 A = 41.7 = 47.0 = 42.3 (145/471) x 150 (145/471)x 169 (145/471)x 152 B = 46.2 =52.0 = 46.8 (154/471)x 150 (154/471)x 169 (154/471)x 152 O = 49.0 = 55.3 = 49.7 (41/171)x 150 (41/171)x 169 (41/171)x 152 AB = 13.1 = 14.7 = 13.2
  49. 49. On substituting 2 formula, 2 in each cell will be: Non Lepromatous Blood group Non Leprosy Lepromatous Total leprosy leprosy A 3.28 0.09 2.22 5.59 B 4.12 0.17 2.49 6.78 O 0.08 0.25 0.06 0.39 AB 0.01 0.50 0.58 1.09 2 13.85
  50. 50. Solution• DoF = (r-1) x (c-1) = (3-1)(4-1) =6• For DoF of 6, a 2 of 13.85 means P will be between 0.01 and 0.05.• Hence, We can conclude that the observations are significant. 0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001 1 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83 2 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82 3 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27 4 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47 5 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52 6 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
  51. 51. Special situations in 2 test• In 2 x 2 tables only, a simplification of formula for 2 can be applied: 2= (ad – bc)2 x N (a+b)(c+d)(a+c)(b+d) (N = a+b+c+d)• Yates correction
  52. 52. Special situations in 2 test• Yates correction cannot be done in multinomial groups.• This can be used only in 2x2 tables.• In multinomial groups, such groups containing values <5 can be merged to form a bigger group to avoid the pitfall.
  53. 53. Thank you

×