Upcoming SlideShare
×

# T test and ANOVA

13,914 views

Published on

TTest and ANOVA

32 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• great

Are you sure you want to  Yes  No
• wonderful material

Are you sure you want to  Yes  No
• Excellent material

Are you sure you want to  Yes  No
Views
Total views
13,914
On SlideShare
0
From Embeds
0
Number of Embeds
2,230
Actions
Shares
0
867
3
Likes
32
Embeds 0
No embeds

No notes for slide
• 2 As a result of this class, you will be able to ...
• 13
• 14
• 2 As a result of this class, you will be able to ...
• 79 Note: There is one dependent variable in the ANOVA model. MANOVA has more than one dependent variable. Ask, what are nominal &amp; interval scales?
• 80
• ### T test and ANOVA

1. 1. FF2613Inferential Statistics, T Test,ANOVA & Proportionate Test Assoc. Prof . Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia ©drtamil@gmail.com 2012
2. 2. FF2613Inferential Statistics Basic Hypothesis Testing ©drtamil@gmail.com 2012
3. 3. Inferential Statistic4 When we conduct a study, we want to make an inference from the data collected. For example; “drug A is better than drug B in treating disease D" ©drtamil@gmail.com 2012
4. 4. Drug A Better Than Drug B?4 Drug A has a higher rate of cure than drug B. (Cured/Not Cured)4 If for controlling BP, the mean of BP drop for drug A is larger than drug B. (continuous data – mm Hg) ©drtamil@gmail.com 2012
5. 5. Null Hypothesis4 Null Hyphotesis; “no difference of effectiveness between drug A and drug B in treating disease D" ©drtamil@gmail.com 2012
6. 6. Null Hypothesis4 H0is assumed TRUE unless data indicate otherwise: • The experiment is trying to reject the null hypothesis • Can reject, but cannot prove, a hypothesis – e.g. “all swans are white” » One black swan suffices to reject » H0 “Not all swans are white” » No number of white swans can prove the hypothesis – since the next swan could still be black. ©drtamil@gmail.com 2012
7. 7. Can reindeer fly?4 You believe reindeer can fly4 Null hypothesis: “reindeer cannot fly”4 Experimental design: to throw reindeer off the roof4 Implementation: they all go splat on the ground4 Evaluation: null hypothesis not rejected • This does not prove reindeer cannot fly: what you have shown is that – “from this roof, on this day, under these weather conditions, these particular reindeer either could not, or chose not to, fly”4 It is possible, in principle, to reject the null hypothesis • By exhibiting a flying reindeer! ©drtamil@gmail.com 2012
8. 8. Significance4 Inferential statistics determine whether a significant difference of effectiveness exist between drug A and drug B.4 If there is a significant difference (p<0.05), then the null hypothesis would be rejected.4 Otherwise, if no significant difference (p>0.05), then the null hypothesis would not be rejected.4 The usual level of significance utilised to reject or not reject the null hypothesis are either 0.05 or 0.01. In the above example, it was set at 0.05. ©drtamil@gmail.com 2012
9. 9. Confidence interval4 Confidence interval = 1 - level of significance.4 If the level of significance is 0.05, then the confidence interval is 95%.4CI = 1 – 0.05 = 0.95 = 95%4If CI = 99%, then level of significance is 0.01. ©drtamil@gmail.com 2012
10. 10. What is level of significance? Chance?Reject H0 Reject H0 .025 .025 -2.0639 0 2.0639 -1.96 1.96 t ©drtamil@gmail.com 2012
11. 11. Fisher’s Use of p-Values4 R.A. Fisher referred to the probability to declare significance as “p-value”.4 “It is a common practice to judge a result significant, if it is of such magnitude that it would be produced by chance not more frequently than once in 20 trials.”4 1/20=0.05. If p-value less than 0.05, then the probability of the effect detected were due to chance is less than 5%.4 We would be 95% confident that the effect detected is due to real effect, not due to chance. ©drtamil@gmail.com 2012
12. 12. Error4 Although we have determined the level of significance and confidence interval, there is still a chance of error.4 There are 2 types; • Type I Error • Type II Error ©drtamil@gmail.com 2012
13. 13. Error REALITY Treatments are Treatments areDECISION not different different Conclude Correct Decision Type II errortreatments are β error not different (Cell a) (Cell b) Conclude Type I error Correct Decisiontreatments are α error different (Cell c) (Cell d) ©drtamil@gmail.com 2012
14. 14. Error Incorrect Null Test of Correct Null Hypothesis Hypothesis Significance (Ho not rejected) (Ho rejected)Null HypothesisNot Rejected Correct Conclusion Type II ErrorNull HypothesisRejected Type I Error Correct Conclusion ©drtamil@gmail.com 2012
15. 15. Type I Error• Type I Error – rejecting the null hypothesis although the null hypothesis is correct e.g.• when we compare the mean/proportion of the 2 groups, the difference is small but the difference is found to be significant. Therefore the null hypothesis is rejected.• It may occur due to inappropriate choice of alpha (level of significance). ©drtamil@gmail.com 2012
16. 16. Type II Error• Type II Error – not rejecting the null hypothesis although the null hypothesis is wrong• e.g. when we compare the mean/proportion of the 2 groups, the difference is big but the difference is not significant. Therefore the null hypothesis is not rejected.• It may occur when the sample size is too small. ©drtamil@gmail.com 2012
17. 17. Example of Type II Error Data of a clinical trial on 30 patients on comparison of pain control between two modes of treatment. Type of treatment * Pain (2 hrs post-op) Crosstabulation Pain (2 hrs post-op) No pain In pain Total Type of treatment Pethidine Count 8 7 15 % within Type 53.3% 46.7% 100.0% of treatment Cocktail Count 4 11 15 % within Type 26.7% 73.3% 100.0% of treatment Total Count 12 18 30 % within Type 40.0% 60.0% 100.0% of treatment Chi-square =2.222, p=0.136p = 0.136. p bigger than 0.05. No significant difference and the null hypothesis was notrejected.There was a large difference between the rates but were notsignificant. Type II Error? ©drtamil@gmail.com 2012
18. 18. Not significant since power of the study is less than 80%. Power is only 32%! ©drtamil@gmail.com 2012
19. 19. Check for the errors4 You can check for type II errors of your own data analysis by checking for the power of the respective analysis4 This can easily be done by utilising software such as Power & Sample Size (PS2) from the website of the Vanderbilt University ©drtamil@gmail.com 2012
20. 20. Determining theappropriate statistical test ©drtamil@gmail.com 2012
21. 21. Data Analysis4 Descriptive – summarising data4 Test of Association4 Multivariate – controlling for confounders ©drtamil@gmail.com 2012
22. 22. Test of Association4 To study the relationship between one or more risk variable(s) (independent) with outcome variable (dependent)4 For example; does ethnicity affects the suicidal/para-suicidal tendencies of psychiatric patients. ©drtamil@gmail.com 2012
23. 23. Problem Flow Chart Independent VariablesEthnicity Marital Status Suicidal Tendencies Dependent Variable ©drtamil@gmail.com 2012
24. 24. Multivariat4 Studies the association between multiple causative factors/variables (independent variables) with the outcome (dependent).4 For example; risk factors such as parental care, practise of religion, education level of parents & disciplinary problems of their child (outcome). ©drtamil@gmail.com 2012
25. 25. Hypothesis Testing4 Distinguish parametric & non-parametric procedures4 Test two or more populations using parametric & non-parametric procedures • Means • Medians • Variances ©drtamil@gmail.com 2012
26. 26. Hypothesis Testing Procedures ©drtamil@gmail.com 2012
27. 27. Parametric Test Procedures4 Involve population parameters • Example: Population mean4 Require interval scale or ratio scale • Whole numbers or fractions • Example: Height in inches: 72, 60.5, 54.74 Have stringent assumptions • Example: Normal distribution4 Examples: Z test, t test ©drtamil@gmail.com 2012
28. 28. Nonparametric Test Procedures4 Statistic does not depend on population distribution4 Data may be nominally or ordinally scaled • Example: Male-female4 May involve population parameters such as median4 Example: Wilcoxon rank sum test ©drtamil@gmail.com 2012
29. 29. Parametric Analysis – QuantitativeQualitative Quantitative Normally distributed data Students t TestDichotomusQualitative Quantitative Normally distributed data ANOVAPolinomialQuantitative Quantitative Repeated measurement of the Paired t Test same individual & item (e.g. Hb level before & after treatment). Normally distributed dataQuantitative - Quantitative - Normally distributed data Pearson Correlationcontinous continous & Linear Regresssion ©drtamil@gmail.com 2012
30. 30. non-parametric testsVariable 1 Variable 2 Criteria Type of TestQualitative Qualitative Sample size < 20 or (< 40 but Fisher TestDichotomus Dichotomus with at least one expected value < 5)Qualitative Quantitativ Data not normally distributed Wilcoxon Rank Sum eDichotomus Test or U Mann- Whitney TestQualitative Quantitativ Data not normally distributed Kruskal-Wallis One ePolinomial Way ANOVA TestQuantitative Quantitativ Repeated measurement of the Wilcoxon Rank Sign e same individual & item TestQuantitative - Quantitativ - Data not normally distributed Spearman/Kendall econtinous continous Rank Correlation ©drtamil@gmail.com 2012
31. 31. Statistical Tests - QualitativeVariable 1 Variable 2 Criteria Type of TestQualitative Qualitative Sample size > 20 dan no Chi Square Test (X2) expected value < 5Qualitative Qualitative Sample size > 30 Proportionate TestDichotomus DichotomusQualitative Qualitative Sample size > 40 but with at X2 Test with YatesDichotomus Dichotomus least one expected value < 5 CorrectionQualitative Quantitative Qualitative Normallysize < 20 or data but Fisher Test Test Sample distributed (< 40 Students tDichotomus Dichotomus with at least one expected value < 5)Qualitative Quantitative Data not normally distributed Wilcoxon Rank Sum ©drtamil@gmail.com 2012
32. 32. Data Analysis4Using SPSS; http://161.142.92.104/spss/4Using Excel; http://161.142.92.104/excel/ ©drtamil@gmail.com 2012
33. 33. FF2613 T Test, ANOVA & Proportionate TestAssoc. Prof . Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia ©drtamil@gmail.com 2012
34. 34. T - TestIndependent T-Test Student’s T-Test Paired T-Test ANOVA © d rta m il@ g m a il. o m c 2012
35. 35. Student’s T-test William Sealy Gosset @“Student”, 1908. The Probable Error of Mean. Biometrika. ©drtamil@gmail.com 2012
36. 36. Student’s T-Test4 To compare the means of two independent groups. For example; comparing the mean Hb between cases and controls. 2 variables are involved here, one quantitative (i.e. Hb) and the other a dichotomous qualitative variable (i.e. case/control).4 t= ©drtamil@gmail.com 2012
37. 37. Examples: Student’s t- test4 Comparing the level of blood cholestrol (mg/dL) between the hypertensive and normotensive.4 Comparing the HAMD score of two groups of psychiatric patients treated with two different types of drugs (i.e. Fluoxetine & Sertraline ©drtamil@gmail.com 2012
38. 38. Example Group Statistics DRUG N Mean Std. Deviation DHAMAWK6 F 35 4.2571 3.12808 S 32 3.8125 4.39529 Independent Samples Test t-test for Equality of Means Sig. Mean t df (2-tailed) DifferenceDHAMAWK6 Equal variances .48 65 .633 .4446 assumed ©drtamil@gmail.com 2012
39. 39. Assumptions of T test4 Observations are normally distributed in each population. (Explore)4 The population variances are equal. ( L e v e n e ’s T e s t)The 2 groups are independent of each other. (Design of study) ©drtamil@gmail.com 2012
40. 40. Manual Calculation4 Sample size > 30 4 Small sample size, equal variance X1 − X 2 t= X1 − X 2 1 1t= s0 + n1 n2 2 2 s s + 1 2 n1 n2 (n1 − 1) s12 + (n2 − 1) s2 2 s0 = 2 (n1 − 1) + (n2 − 1) ©drtamil@gmail.com 2012
41. 41. Example – compare cholesterol level4 Hypertensive : 4 Normal : Mean : 214.92 Mean : 182.19 s.d. : 39.22 s.d. : 37.26 n : 64 n : 36• Comparing the cholesterol level between hypertensive and normal patients.• The difference is (214.92 – 182.19) = 32.73 mg%.• H0 : There is no difference of cholesterol level between hypertensive and normal patients.• n > 30, (64+36=100), therefore use the first formula. ©drtamil@gmail.com 2012
42. 42. Calculation X1 − X 2 t= 2 2 s s 1 + 2 n1 n24t = (214.92- 182.19)________ ((39.222/64)+(37.262/36))0.54 t = 4.1374 df = n1+n2-2 = 64+36-2 = 984 Refer to t table; with t = 4.137, p < 0.001 ©drtamil@gmail.com 2012
43. 43. If df>100, can refer Table A1.We don’t have 4.137 so weuse 3.99 instead. If t = 3.99,then p=0.00003x2=0.00006Therefore if t=4.137,p<0.00006.
44. 44. Or can refer to Table A3. We don’t have df=98, so we use df=60 instead. t = 4.137 > 3.46 (p=0.001)Therefore if t=4.137, p<0.001.
45. 45. Conclusion• Therefore p < 0.05, null hypothesis rejected.• There is a significant difference of cholesterol level between hypertensive and normal patients.• Hypertensive patients have a significantly higher cholesterol level compared to normotensive patients. ©drtamil@gmail.com 2012
46. 46. Exercise (try it)• Comparing the mini test 1 (2012) results between UKM and ACMS students.• The difference is 11.255• H0 : There is no difference of marks between UKM and ACMS students.• n > 30, therefore use the first formula. ©drtamil@gmail.com 2012
47. 47. Exercise (answer)4 Nullhypothesis rejected4 There is a difference of marks between UKM and ACMS students. UKM marks higher than AUCMS ©drtamil@gmail.com 2012
48. 48. T-Test In SPSS4 For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav4 This data came from a case-control study on factors affecting SGA in Kelantan.4 Open the data & select - >Analyse >Compare Means >Ind-Samp T Test… ©drtamil@gmail.com 2012
49. 49. T-Test in SPSS4 We want to see whether there is any association between the mothers’ weight and SGA. So select the risk factor (weight2) into ‘Test Variable’ & the outcome (SGA) into ‘Grouping Variable’.4 Now click on the ‘Define Groups’ button. Enter • 0 (Control) for Group 1 and • 1 (Case) for Group 2.4 Click the ‘Continue’ button & then click the ‘OK’ button. ©drtamil@gmail.com 2012
50. 50. T-Test Results Group Statistics Std. Error SGA N Mean Std. Deviation MeanWeight at first ANC Normal 108 58.666 11.2302 1.0806 SGA 109 51.037 9.3574 .89634 Compare the mean+sd of both groups. • Normal 58.7+11.2 kg • SGA 51.0+ 9.4 kg4 Apparently there is a difference of weight between the two groups. ©drtamil@gmail.com 2012
51. 51. Results & Homogeneity of Variances Independent Samples Test Levenes Test for Equality of Variances t-test for Equality of Means 95% Confidence Interval of the Mean Std. Error Difference F Sig. t df Sig. (2-tailed) Difference Difference Lower UpperWeight at first ANC Equal variances 1.862 .174 5.439 215 .000 7.629 1.4028 4.8641 10.3940 assumed Equal variances 5.434 207.543 .000 7.629 1.4039 4.8612 10.3969 not assumed 4 Look at the p value of Levene’s Test. If p is not significant then equal variances is assumed (use top row). 4 If it is significant then equal variances is not assumed (use bottom row). 4 So the t value here is 5.439 and p < 0.0005. The difference is significant. Therefore there is an association between the mothers weight and SGA. ©drtamil@gmail.com 2012
52. 52. How to present the result?Group N Mean test pNormal 108 58.7+11.2 kg T test <0.0005 t = 5.439 SGA 109 51.0+ 9.4 ©drtamil@gmail.com 2012
53. 53. Paired t-test“Repeated measurement on the same individual” ©drtamil@gmail.com 2012
54. 54. Paired T-Test4 “Repeated measurement on the same individual”4 t= ©drtamil@gmail.com 2012
55. 55. Formula d −0t= sd n (∑ d ) 2 ∑d i 2 − nsd = n −1df = n p − 1 ©drtamil@gmail.com 2012
56. 56. Examples of paired t-test4 Comparing the HAMD score between week 0 and week 6 of treatment with Sertraline for a group of psychiatric patients.4 Comparing the haemoglobin level amongst anaemic pregnant women after 6 weeks of treatment with haematinics. ©drtamil@gmail.com 2012
57. 57. Example Paired Samples Statistics Mean N Std. Deviation Pair DHAMAWK0 13.9688 32 6.48315 1 DHAMAWK6 3.8125 32 4.39529 Paired Samples Test Paired Differences Std. Sig. Mean Deviation t df (2-tailed)Pair DHAMAWK0 - 10.1563 6.75903 8.500 31 .0001 DHAMAWK6 ©drtamil@gmail.com 2012
58. 58. M a n u a l C a l c u l a t i o n The measurement of the systolic and diastolic blood pressures was done two consecutive times with an interval of 10 minutes. You want to d e te r m in e w h e th e r th e r e w a s a n y difference between those two measurements.4 H0:There is no difference of the systolic blood pressure during the first (time 0) and second measurement (time 10 minutes). ©drtamil@gmail.com 2012
59. 59. Calculation4 Calculate the difference between first & second measurement and square it. Total up the difference and the square. ©drtamil@gmail.com 2012
60. 60. Calculation4∑ d = 112 ∑ d2 = 1842 n = 364 Mean d = 112/36 = 3.114 sd = ((1842-1122/36)/35)0.5 d −0 t= sd sd = 6.53 n4 t = 3.11/(6.53/6) t = 2.858 (∑ d ) 24 df = np – 1 = 36 – 1 = 35. ∑ d i2 − n sd = n −14 Refer to t table; df = n p − 1 ©drtamil@gmail.com 2012
61. 61. Refer to Table A3. We don’t have df=35, so we use df=30 instead. t = 2.858, larger than 2.75(p=0.01) but smaller than 3.03(p=0.005). 3.03>t>2.75 Therefore if t=2.858, 0.005<p<0.01.
62. 62. Conclusionwith t = 2.858, 0.005<p<0.01Therefore p < 0.01.Therefore p < 0.05, null hypothesisrejected.Conclusion: There is a significantdifference of the systolic blood pressurebetween the first and secondmeasurement. The mean average of firstreading is significantly higher comparedto the second reading. ©drtamil@gmail.com 2012
63. 63. Paired T-Test In SPSS4 For this exercise, we will be using the data from the CD, under Chapter 7, sgapair.sav4 This data came from a controlled trial on haematinic effect on Hb.4 Open the data & select - >Analyse >Compare Means >Paired-Samples T Test… ©drtamil@gmail.com 2012
64. 64. Paired T-Test In SPSS4 We want to see whether there is any association between the prescription on haematinic to anaemic pregnant mothers and Hb.4 We are comparing the Hb before & after treatment. So pair the two measurements (Hb2 & Hb3) together.4 Click the ‘OK’ button. ©drtamil@gmail.com 2012
65. 65. Paired T-Test Results Paired Samples Statistics Std. Error Mean N Std. Deviation Mean Pair HB2 10.247 70 .3566 .0426 1 HB3 10.594 70 .9706 .11604 Thisshows the mean & standard deviation of the two groups. ©drtamil@gmail.com 2012
66. 66. Paired T-Test Results Paired Samples Test Paired Differences 95% Confidence Interval of the Std. Error Difference Mean Std. Deviation Mean Lower Upper t df Sig. (2-tailed)Pair 1 HB2 - HB3 -.347 .9623 .1150 -.577 -.118 -3.018 69 .004 4 This shows the mean difference of Hb before & after treatment is only 0.347 g%. 4 Yet the t=3.018 & p=0.004 show the difference is statistically significant. ©drtamil@gmail.com 2012
67. 67. How to present the result? Mean D Group N Test p (Diff.) Beforetreatment Paired T- (HB2) vs 70 0.35 + 0.96 test 0.004 After t = 3.018treatment (HB3) ©drtamil@gmail.com 2012
69. 69. ANOVA – Analysis of Variance4 Extension of independent-samples t test4 Comparesthe means of groups of independent observations • Don’t be fooled by the name. ANOVA does not compare variances.4 Can compare more than two groups ©drtamil@gmail.com 2012
70. 70. One-Way ANOVA F-Test4 Tests the equality of 2 or more population means4 Variables • One nominal scaled independent variable – 2 or more treatment levels or classifications (i.e. Race; Malay, Chinese, Indian & Others) • One interval or ratio scaled dependent variable (i.e. weight, height, age)4 Used to analyse completely randomized experimental designs ©drtamil@gmail.com 2012
71. 71. Examples4 Comparing the blood cholesterol levels between the bus drivers, bus conductors and taxi drivers.4 Comparing the mean systolic pressure between Malays, Chinese, Indian & Others. ©drtamil@gmail.com 2012
72. 72. One-Way ANOVA F-Test Assumptions4 Randomness & independence of errors • Independent random samples are drawn4 Normality • Populations are normally distributed4 Homogeneity of variance • Populations have equal variances ©drtamil@gmail.com 2012
73. 73. Example DescriptivesBirth weight N Mean Std. Deviation Minimum MaximumHousewife 151 2.7801 .52623 1.90 4.72Office work 23 2.7643 .60319 1.60 3.96Field work 44 2.8430 .55001 1.90 3.79Total 218 2.7911 .53754 1.60 4.72 ANOVA Birth weight Sum of Squares df Mean Square F Sig. Between Groups .153 2 .077 .263 .769 Within Groups 62.550 215 .291 Total 62.703 217 ©drtamil@gmail.com 2012
74. 74. Manual Calculation ANOVA ©drtamil@gmail.com 2012
75. 75. Manual Calculation4 Notexpected to be calculated manually by medical students. ©drtamil@gmail.com 2012
76. 76. Example:Time To CompleteAnalysis45 samples wereanalysed using 3 differentblood analyser (Mach1,Mach2 & Mach3).15 samples were placedinto each analyser.Time in seconds wasmeasured for eachsample analysis.
77. 77. Example:Time To CompleteAnalysisThe overall mean of theentire sample was 22.71seconds.This is called the “grand”mean, and is oftendenoted by X .If H0 were true then we’dexpect the group meansto be close to the grandmean.
78. 78. Example:Time To CompleteAnalysisThe ANOVA test isbased on the combineddistances from X .If the combineddistances are large, thatindicates we shouldreject H0.
79. 79. The Anova StatisticTo combine the differences from the grand mean we • Square the differences • Multiply by the numbers of observations in the groups • Sum over the groups ( )2 ( ) 2 ( SSB = 15 X Mach1 − X + 15 X Mach 2 − X + 15 X Mach3 − X ) 2where the X * are the group means. “SSB” = Sum of Squares Between groups
80. 80. The Anova StatisticTo combine the differences from the grand mean we • Square the differences • Multiply by the numbers of observations in the groups • Sum over the groups ( )2 ( ) 2 ( SSB = 15 X Mach1 − X + 15 X Mach 2 − X + 15 X Mach3 − X ) 2where the X * are the group means. “SSB” = Sum of Squares Between groupsNote: This looks a bit like a variance.
81. 81. Sum of Squares Between ( ) 2 ( )2 ( SSB = 15 X Mach1 − X + 15 X Mach 2 − X + 15 X Mach3 − X )24 Grand Mean = 22.714 Mean Mach1 = 24.93; (24.93-22.71)2=4.92844 Mean Mach2 = 22.61; (22.61-22.71)2=0.014 Mean Mach3 = 20.59; (20.59-22.71)2=4.49444 SSB = (15*4.9284)+(15*0.01)+(15*4.4944)4 SSB = 141.492 ©drtamil@gmail.com 2012
82. 82. How big is big?4 For the Time to Complete, SSB = 141.4924 Is that big enough to reject H0?4 As with the t test, we compare the statistic to the variability of the individual observations.4 InANOVA the variability is estimated by the Mean Square Error, or MSE
83. 83. MSE Mean Square Error The Mean Square Error is a measure of the variability after the group effects have been taken into account. ∑∑ (x − X j) 1 2MSE = ij N −K j i where xij is the ith observation in the jth group.
84. 84. MSE Mean Square Error The Mean Square Error is a measure of the variability after the group effects have been taken into account. ∑∑ (x − X j) 1 2MSE = ij N −K j i where xij is the ith observation in the jth group.
85. 85. MSE Mean Square Error The Mean Square Error is a measure of the variability after the group effects have been taken into account. ∑∑ (x − X j) 1 2MSE = ij N −K j i
86. 86. ∑∑ (xij − X j ) 1 2 MSE = N −K j iMach1 (x-mean)^2 Mach2 (x-mean)^2 Mach3 (x-mean)^223.73 1.4400 21.5 1.2321 19.74 0.722523.74 1.4161 21.6 1.0201 19.75 0.705623.75 1.3924 21.7 0.8281 19.76 0.688924.00 0.8649 21.7 0.8281 19.9 0.476124.10 0.6889 21.8 0.6561 20 0.348124.20 0.5329 21.9 0.5041 20.1 0.240125.00 0.0049 22.75 0.0196 20.3 0.084125.10 0.0289 22.75 0.0196 20.4 0.036125.20 0.0729 22.75 0.0196 20.5 0.008125.30 0.1369 23.3 0.4761 20.5 0.008125.40 0.2209 23.4 0.6241 20.6 0.000125.50 0.3249 23.4 0.6241 20.7 0.012126.30 1.8769 23.5 0.7921 22.1 2.280126.31 1.9044 23.5 0.7921 22.2 2.592126.32 1.9321 23.6 0.9801 22.3 2.9241SUM 12.8380 9.4160 11.1262 ©drtamil@gmail.com 2012
87. 87. ∑∑ (xij − X j ) 1 2 MSE = N −K j i4 Note that the variation of the means (141.492) seems quite large (more likely to be significant???) compared to the variance of observations within groups (12.8380+9.4160+11.1262=33.3802).4 MSE = 33.3802/(45-3) = 0.7948 ©drtamil@gmail.com 2012
88. 88. Notes on MSE4 Ifthere are only two groups, the MSE is equal to the pooled estimate of variance used in the equal-variance t test.4 ANOVA assumes that all the group variances are equal.4 Other options should be considered if group variances differ by a factor of 2 or more.4 (12.8380 ~ 9.4160 ~ 11.1262)
89. 89. ANOVA F Test4 The ANOVA F test is based on the F statistic SSB (K − 1) F= MSE where K is the number of groups.4 Under H0 the F statistic has an “F” distribution, with K-1 and N-K degrees of freedom (N is the total number of observations)
90. 90. Time to Analyse: F test p-valueTo get a p-value wecompare our F statisticto an F(2, 42)distribution.
91. 91. Time to Analyse: F test p-valueTo get a p-value wecompare our F statisticto an F(2, 42)distribution.In our example 141.492 2F= = 89.015 33.3802 42We cannot draw the linesince the F value is solarge, therefore the pvalue is so small!!!!!!
92. 92. Refer to F Dist. Table (α=0.01). We don’t have df=2;42, so we use df=2;40 instead. F = 89.015, larger than 5.18 (p=0.01) Therefore if F=89.015, p<0.01.Why use df=2;42?We have 3 groupsso K-1 = 2We have 45samples thereforeN-K = 42. ©drtamil@gmail.com 2012
93. 93. Time to Analyse: F test p-valueTo get a p-value wecompare our F statisticto an F(2, 42)distribution.In our example 141.492 2F= = 89.015 33.3802 42The p-value is reallyP(F (2,42) > 89.015) = 0.00000000000008
94. 94. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44
95. 95. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44 Pop Quiz!: Where are the following quantities presented in this table? Sum of Squares Mean Square F Statistic p value Between (SSB) Error (MSE)
96. 96. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44 Sum of Squares Mean Square F Statistic p value Between (SSB) Error (MSE)
97. 97. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44 Sum of Squares Mean Square F Statistic p value Between (SSB) Error (MSE)
98. 98. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44 Sum of Squares Mean Square F Statistic p value Between (SSB) Error (MSE)
99. 99. ANOVA Table Results are often displayed using an ANOVA Table Sum of Mean Squares df Square F Sig.Between 141.492 2 40.746 89.015 .0000000GroupsWithin Groups 33.380 42 .795Total 174.872 44 Sum of Squares Mean Square F Statistic p value Between (SSB) Error (MSE)
100. 100. ANOVA In SPSS4 For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav4 This data came from a case-control study on factors affecting SGA in Kelantan.4 Open the data & select - >Analyse >Compare Means >One-Way ANOVA… ©drtamil@gmail.com 2012
101. 101. ANOVA in SPSS4 We want to see whether there is any association between the babies’ weight and mothers’ type of work. So select the risk factor (typework) into ‘Factor’ & the outcome (birthwgt) into ‘Dependent’.4 Now click on the ‘Post Hoc’ button. Select Bonferonni.4 Click the ‘Continue’ button & then click the ‘OK’ button.4 Then click on the ‘Options’ button. ©drtamil@gmail.com 2012
102. 102. ANOVA in SPSS4 Select ‘Descriptive’, ‘Homegeneity of variance test’ and ‘Means plot’.4 Click ‘Continue’ and then ‘OK’. ©drtamil@gmail.com 2012
103. 103. ANOVA Results DescriptivesBirth weight 95% Confidence Interval for Mean N Mean Std. Deviation Std. Error Lower Bound Upper Bound Minimum MaximumHousewife 151 2.7801 .52623 .04282 2.6955 2.8647 1.90 4.72Office work 23 2.7643 .60319 .12577 2.5035 3.0252 1.60 3.96Field work 44 2.8430 .55001 .08292 2.6757 3.0102 1.90 3.79Total 218 2.7911 .53754 .03641 2.7193 2.8629 1.60 4.72 4 Compare the mean+sd of all groups. 4 Apparently there are not much difference of babies’ weight between the groups. ©drtamil@gmail.com 2012
104. 104. Results & Homogeneity of Variances Test of Homogeneity of Variances Birth weight Levene Statistic df1 df2 Sig. .757 2 215 .4704 Look at the p value of Levene’s Test. If p is not significant then equal variances is assumed. ©drtamil@gmail.com 2012
105. 105. ANOVA Results ANOVA Birth weight Sum of Squares df Mean Square F Sig. Between Groups .153 2 .077 .263 .769 Within Groups 62.550 215 .291 Total 62.703 2174 Sothe F value here is 0.263 and p =0.769. The difference is not significant. Therefore there is no association between the babies’ weight and mothers’ type of work. ©drtamil@gmail.com 2012
106. 106. How to present the result?Type of Work Mean+sd Test p Office 2.76 + 0.60 ANOVA Housewife 2.78 + 0.53 0.769 F = 0.263 Farmer 2.84 + 0.55 ©drtamil@gmail.com 2012
107. 107. Proportionate Test ©drtamil@gmail.com 2012
108. 108. Proportionate Test4 Qualitativedata utilises rates, i.e. rate of anaemia among males & females4 To compare such rates, statistical tests such as Z-Test and Chi-square can be used. ©drtamil@gmail.com 2012
109. 109. Formula p1 − p2 • where p1 is the rate forz= event 1 = a1/n1 1 1  p0 q0  +  • p2 is the rate for event 2 = a2/n2  n1 n2  • a1 and a2 are frequencies of event 1 and 2 p1n1 + p2 n2 4 We refer to the normalp0 = distribution table to n1 + n2 decide whether to reject or not the null hypothesis.q0 = 1 − p0 ©drtamil@gmail.com 2012
110. 110. http://stattrek.com/hypothesis- test/proportion.aspx4 ■The sampling method is simple random sampling.4 ■Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.4 ■The sample includes at least 10 successes and 10 failures.4 ■The population size is at least 10 times as big as the sample size. ©drtamil@gmail.com 2012
111. 111. Example4 Comparison of worm infestation rate between male and female medical students in Year 2.4 Rate for males ; p1= 29/96 = 0.3024 Rate for females;p2 =24/104 = 0.2314 H0: There is no difference of worm infestation rate between male and female medical students in Year 2 ©drtamil@gmail.com 2012
112. 112. Cont. p1 p2p0 q0 ©drtamil@gmail.com 2012
113. 113. Cont.4 p0 = (29/96*96)+(24/104*104) = 0.265 96+1044 q0 = 1 – 0.265 = 0.735 ©drtamil@gmail.com 2012
114. 114. Cont.4z = 0.302 - 0.231 = 1.1367 ((0.735*0.265) (1/96 + 1/104))0.54 From the normal distribution table (A1), z value is significant at p=0.05 if it is above 1.96. Since the value is less than 1.96, then there is no difference of rate for worm infestatation between the male and female students. ©drtamil@gmail.com 2012
115. 115. Refer to Table A1.We don’t have 1.1367 so weuse 1.14 instead. If z = 1.14,then p=0.1271x2=0.2542Therefore if z=1.14,p=0.2542. H0 not rejected
116. 116. Exercise (try it)4 Comparison of failure rate between ACMS and UKM medical students in Year 2 for minitest 1 (MS2 2012).4 Rate for UKM ; p1= 42/196 = 0.2144 Rate for ACMS;p2 = 35/70 = 0.5 ©drtamil@gmail.com 2012
117. 117. Answer4 P1 = 0.214, p2 = 0.5, p0 = 0.289, q0 = 0.7114 N1 = 196, n2 = 70, Z = 20.470.5 = 4.524 p < 0.00006 ©drtamil@gmail.com 2012