1. FF2613
Inferential Statistics, T Test,
ANOVA & Proportionate Test
Assoc. Prof. Dr Azmi Mohd Tamil
Dept of Community Health
Universiti Kebangsaan Malaysia
©drtamil@gmail.com 2012
3. Inferential Statistics
▪ When we conduct a study, we want to
make an inference from the data
collected. For example:
“drug A is better than drug B in treating
disease D”
4. Drug A Better Than Drug B?
▪ Drug A has a higher cure rate than
drug B. (Cured/Not Cured)
▪ For controlling BP, the mean BP
drop for drug A is larger than for drug B.
(continuous data – mm Hg)
5. Null Hypothesis
▪ Null hypothesis:
“no difference in effectiveness between
drug A and drug B in treating disease D”
6. Null Hypothesis
▪ H0 is assumed TRUE unless the data indicate
otherwise:
• The experiment is trying to reject the null
hypothesis
• We can reject, but cannot prove, a hypothesis
– e.g. “all swans are white”
» One black swan suffices to reject it,
» i.e. to conclude “not all swans are white”
» No number of white swans can prove the hypothesis,
since the next swan could still be black.
7. Can reindeer fly?
▪ You believe reindeer can fly
▪ Null hypothesis: “reindeer cannot fly”
▪ Experimental design: to throw reindeer off the
roof
▪ Implementation: they all go splat on the ground
▪ Evaluation: null hypothesis not rejected
• This does not prove reindeer cannot fly: what you have
shown is that
– “from this roof, on this day, under these weather conditions,
these particular reindeer either could not, or chose not to,
fly”
▪ It is possible, in principle, to reject the null
hypothesis
• By exhibiting a flying reindeer!
8. Significance
▪ Inferential statistics determine whether a significant
difference in effectiveness exists between drug A
and drug B.
▪ If there is a significant difference (p < 0.05), the
null hypothesis is rejected.
▪ Otherwise, if there is no significant difference (p ≥ 0.05),
the null hypothesis is not rejected.
▪ The levels of significance usually used to reject or
not reject the null hypothesis are 0.05 or 0.01.
In the above example, it was set at 0.05.
9. Confidence interval
▪ Confidence level (of the confidence interval) = 1 − level of
significance.
▪ If the level of significance is 0.05, then
the confidence level is 95%.
▪ CI = 1 – 0.05 = 0.95 = 95%
▪ If CI = 99%, then the level of
significance is 0.01.
10. What is the level of significance? Chance?
[Figure: two-tailed rejection regions (“Reject H0”), area .025 in each tail; critical values ±2.0639 (t) and ±1.96 (z)]
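The two shaded tails can be located numerically. The sketch below assumes Python 3.8+ and uses only the standard library's `statistics.NormalDist`, so it covers the normal (z) cut-off of ±1.96 only; the ±2.0639 t cut-off depends on the degrees of freedom and still needs a t table (or a library such as SciPy). The function names are illustrative, not from any package.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def two_tailed_critical_z(alpha):
    """Cut-off |z| that leaves alpha/2 in each tail (the 'Reject H0' regions)."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def two_tailed_p_from_z(z):
    """Probability of getting a |z| at least this large if H0 is true."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject H0 if |z| > {two_tailed_critical_z(alpha):.3f}")
```

Running it prints the cut-offs 1.960 (α = 0.05) and 2.576 (α = 0.01).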
11. Fisher’s Use of p-Values
▪ R.A. Fisher referred to the probability used to declare
significance as the “p-value”.
▪ “It is a common practice to judge a result significant, if
it is of such magnitude that it would be produced by
chance not more frequently than once in 20 trials.”
▪ 1/20 = 0.05. If the p-value is less than 0.05, a difference
at least as large as the one detected would arise by
chance alone (i.e. if H0 were true) less than 5% of the time.
▪ We therefore conclude, at the 95% confidence level, that the
effect detected is a real effect rather than due to chance.
12. Error
▪ Although we have determined the level
of significance and confidence interval,
there is still a chance of error.
▪ There are two types:
• Type I Error
• Type II Error
13. Error
DECISION vs REALITY | Treatments are not different | Treatments are different
Conclude treatments are not different | Correct decision (Cell a) | Type II error, β error (Cell b)
Conclude treatments are different | Type I error, α error (Cell c) | Correct decision (Cell d)
14. Error
Test of significance | Correct null hypothesis (H0 should not be rejected) | Incorrect null hypothesis (H0 should be rejected)
Null hypothesis not rejected | Correct conclusion | Type II error
Null hypothesis rejected | Type I error | Correct conclusion
15. Type I Error
• Type I Error – rejecting the null hypothesis
although the null hypothesis is correct,
e.g.
• when we compare the means/proportions of
the two groups, the difference is small but
is found to be significant.
Therefore the null hypothesis is rejected.
• It may occur due to an inappropriate choice of
alpha (level of significance).
16. Type II Error
• Type II Error – not rejecting the null
hypothesis although the null hypothesis is
wrong
• e.g. when we compare the means/proportions
of the two groups, the difference is big but
is not significant. Therefore the
null hypothesis is not rejected.
• It may occur when the sample size is
too small.
17. Example of Type II Error
Data from a clinical trial on 30 patients comparing pain control between
two modes of treatment.
Type of treatment * Pain (2 hrs post-op) Crosstabulation (count, % within type of treatment)
Type of treatment | No pain | In pain | Total
Pethidine | 8 (53.3%) | 7 (46.7%) | 15 (100.0%)
Cocktail | 4 (26.7%) | 11 (73.3%) | 15 (100.0%)
Total | 12 (40.0%) | 18 (60.0%) | 30 (100.0%)
Chi-square = 2.222, p = 0.136
p = 0.136, which is bigger than 0.05. There was no significant difference, and the null hypothesis was not
rejected.
There was a large difference between the rates, but it was not
significant. Type II Error?
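The chi-square value on this slide can be checked by hand. A minimal sketch in Python (standard library only), assuming the plain Pearson chi-square without Yates' correction, which is what reproduces 2.222 here. It uses the fact that with 1 degree of freedom the chi-square statistic is the square of a standard normal variable. `chi_square_2x2` is a hypothetical helper name.

```python
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (chi-square, p-value on 1 df)."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 df, P(chi-square > x) = 2 * P(Z > sqrt(x)) for standard normal Z.
    p = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p


# Pethidine: 8 no pain, 7 in pain; Cocktail: 4 no pain, 11 in pain
chi2, p = chi_square_2x2(8, 7, 4, 11)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

It gives chi-square = 2.222 and p = 0.136, matching the SPSS output.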
18. Not significant because the power of
the study is less than 80%
▪ Power is only 32%!
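The 32% on this slide came from power software, and the exact method is not shown. As a rough cross-check, the sketch below uses a common normal-approximation formula for a two-sided two-proportion test (an assumption, not necessarily what PS computes) with the observed pain-free rates 8/15 and 4/15 and α = 0.05. The function name is illustrative.

```python
from statistics import NormalDist


def approx_power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test
    (normal approximation, no continuity correction)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_h0 = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5  # SE if H0 is true
    se_h1 = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5   # SE if the rates differ
    diff = abs(p1 - p2)
    # Chance that the observed difference lands beyond either critical value.
    upper = 1 - std_normal.cdf((z_crit * se_h0 - diff) / se_h1)
    lower = std_normal.cdf((-z_crit * se_h0 - diff) / se_h1)
    return upper + lower


# Pethidine: 8/15 pain-free; cocktail: 4/15 pain-free
power = approx_power_two_proportions(8 / 15, 4 / 15, 15, 15)
print(f"approximate power = {power:.0%}")
```

This approximation gives about 31%, in line with the slide: with only 15 patients per group, even a difference of 53.3% vs 26.7% is detected less than a third of the time.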
19. Check for the errors
▪ You can check your own data analysis for type II errors
by checking the power of the respective analysis.
▪ This can easily be done using
software such as Power & Sample Size
(PS2) from the Vanderbilt University website.
21. Data Analysis
▪ Descriptive – summarising data
▪ Test of association
▪ Multivariate – controlling for confounders
22. Test of Association
▪ To study the relationship between one
or more risk variable(s) (independent)
and the outcome variable (dependent)
▪ For example: does ethnicity affect the
suicidal/para-suicidal tendencies of
psychiatric patients?
23. Problem Flow Chart
[Figure: Independent variables (Ethnicity, Marital Status) → Dependent variable (Suicidal Tendencies)]
24. Multivariate
▪ Studies the association between
multiple causative factors/variables
(independent variables) and the
outcome (dependent).
▪ For example: risk factors such as
parental care, practice of religion and
education level of parents, with
disciplinary problems of their child (outcome).
25. Hypothesis Testing
▪ Distinguish parametric & non-parametric
procedures
▪ Test two or more populations using
parametric & non-parametric procedures
• Means
• Medians
• Variances
27. Parametric Test Procedures
▪ Involve population parameters
• Example: Population mean
▪ Require interval scale or ratio scale
• Whole numbers or fractions
• Example: Height in inches: 72, 60.5, 54.7
▪ Have stringent assumptions
• Example: Normal distribution
▪ Examples: Z test, t test
28. Nonparametric Test Procedures
▪ Statistic does not depend on population
distribution
▪ Data may be nominally or ordinally
scaled
• Example: Male-female
▪ May involve population parameters such
as median
▪ Example: Wilcoxon rank sum test
29. Parametric Analysis – Quantitative
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Quantitative | Normally distributed data | Student's t Test
Qualitative, polytomous (more than two categories) | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data | Paired t Test
Quantitative, continuous | Quantitative, continuous | Normally distributed data | Pearson Correlation & Linear Regression
30. Non-parametric Tests
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20 or (< 40 but with at least one expected value < 5) | Fisher Test
Qualitative, dichotomous | Quantitative | Data not normally distributed | Wilcoxon Rank Sum Test or Mann-Whitney U Test
Qualitative, polytomous | Quantitative | Data not normally distributed | Kruskal-Wallis One-Way ANOVA Test
Quantitative | Quantitative | Repeated measurement of the same individual & item | Wilcoxon Signed-Rank Test
Quantitative, continuous | Quantitative, continuous | Data not normally distributed | Spearman/Kendall Rank Correlation
31. Statistical Tests - Qualitative
Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi Square Test (χ²)
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 30 | Proportionate Test
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 40 but with at least one expected value < 5 | χ² Test with Yates Correction
Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20 or (< 40 but with at least one expected value < 5) | Fisher Test
Qualitative | Quantitative | Normally distributed data | Student's t Test
Qualitative | Quantitative | Data not normally distributed | Wilcoxon Rank Sum
32. Data Analysis
▪ Using SPSS:
http://161.142.92.104/spss/
▪ Using Excel:
http://161.142.92.104/excel/
33. FF2613
T Test, ANOVA &
Proportionate Test
34. T-Test
Independent T-Test
Student’s T-Test
Paired T-Test
ANOVA
35. Student’s T-test
William Sealy Gosset, a.k.a.
“Student”, 1908. The Probable
Error of a Mean. Biometrika.
36. Student’s T-Test
▪ To compare the means of two independent
groups. For example: comparing the mean
Hb between cases and controls. Two variables
are involved here, one quantitative (i.e. Hb)
and the other a dichotomous qualitative
variable (i.e. case/control).
▪ $t = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$
37. Examples: Student’s t-test
▪ Comparing the level of blood cholesterol
(mg/dL) between hypertensive and
normotensive patients.
▪ Comparing the HAMD score of two
groups of psychiatric patients treated
with two different drugs (i.e.
Fluoxetine & Sertraline).
38. Example
Group Statistics
Variable | DRUG | N | Mean | Std. Deviation
DHAMAWK6 | F | 35 | 4.2571 | 3.12808
DHAMAWK6 | S | 32 | 3.8125 | 4.39529
Independent Samples Test (t-test for Equality of Means)
Variable | | t | df | Sig. (2-tailed) | Mean Difference
DHAMAWK6 | Equal variances assumed | .48 | 65 | .633 | .4446
39. Assumptions of T test
▪ Observations are normally distributed in
each population. (Explore)
▪ The population variances are equal.
(Levene’s Test)
▪ The two groups are independent of each
other. (Design of study)
40. Manual Calculation
▪ Sample size > 30:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
▪ Small sample size, equal variance:
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_0\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad s_0^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
41. Example – compare cholesterol level
▪ Hypertensive: mean 214.92, s.d. 39.22, n 64
▪ Normal: mean 182.19, s.d. 37.26, n 36
• Comparing the cholesterol level between
hypertensive and normal patients.
• The difference is (214.92 – 182.19) = 32.73 mg%.
• H0: There is no difference in cholesterol level
between hypertensive and normal patients.
• n > 30 (64 + 36 = 100), therefore use the first formula.
42. Calculation
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
▪ t = (214.92 − 182.19) / ((39.22²/64) + (37.26²/36))^0.5
▪ t = 4.137
▪ df = n1 + n2 − 2 = 64 + 36 − 2 = 98
▪ Refer to t table; with t = 4.137, p < 0.001
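The manual calculation can be checked with a few lines of Python. This is a sketch using only the standard library: `t_unpooled` is the first (large-sample) formula from slide 40 and `t_pooled` the second (equal-variance) one; both names are illustrative. Following the Table A1 shortcut, the p-value is approximated with the normal curve, which is reasonable only for large df; an exact t p-value needs a t table or a library such as SciPy.

```python
from statistics import NormalDist


def t_unpooled(m1, s1, n1, m2, s2, n2):
    """First formula (large samples): (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)."""
    return (m1 - m2) / (s1 ** 2 / n1 + s2 ** 2 / n2) ** 0.5


def t_pooled(m1, s1, n1, m2, s2, n2):
    """Second formula (small samples, equal variances) using the pooled s0."""
    s0 = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))) ** 0.5
    return (m1 - m2) / (s0 * (1 / n1 + 1 / n2) ** 0.5)


# Cholesterol: hypertensive (n = 64) vs normal (n = 36)
t = t_unpooled(214.92, 39.22, 64, 182.19, 37.26, 36)
df = 64 + 36 - 2
# Normal approximation to the t distribution (large df only).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.3f}, df = {df}, approximate two-tailed p = {p_approx:.1e}")
```

It gives t = 4.137 and df = 98, with an approximate p below 0.00006. Applied to the SPSS group statistics on slide 38, `t_pooled(4.2571, 3.12808, 35, 3.8125, 4.39529, 32)` returns about 0.48, matching the "equal variances assumed" row.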
43. If df > 100, we can refer to Table A1.
We don’t have 4.137, so we
use 3.99 instead. If t = 3.99,
then p = 0.00003 × 2 = 0.00006.
Therefore, if t = 4.137,
p < 0.00006.
44. Or we can refer to Table A3.
We don’t have df = 98,
so we use df = 60 instead.
t = 4.137 > 3.46 (p = 0.001)
Therefore, if t = 4.137, p < 0.001.
45. Conclusion
• Therefore p < 0.05, and the null hypothesis is rejected.
• There is a significant difference in
cholesterol level between hypertensive and
normal patients.
• Hypertensive patients have a significantly
higher cholesterol level compared to
normotensive patients.
46. Exercise (try it)
• Comparing the mini test 1 (2012) results between
UKM and ACMS students.
• The difference is 11.255
• H0: There is no difference in marks between UKM
and ACMS students.
• n > 30, therefore use the first formula.
48. T-Test In SPSS
▪ For this exercise, we will
be using the data from
the CD, under Chapter
7, sga-bab7.sav
▪ This data came from a
case-control study on
factors affecting SGA in
Kelantan.
▪ Open the data & select
Analyse > Compare Means >
Independent-Samples T Test…
49. T-Test in SPSS
▪ We want to see whether
there is any association
between the mothers’ weight
and SGA. So select the risk
factor (weight2) into ‘Test
Variable’ & the outcome
(SGA) into ‘Grouping
Variable’.
▪ Now click on the ‘Define
Groups’ button. Enter
• 0 (Control) for Group 1 and
• 1 (Case) for Group 2.
▪ Click the ‘Continue’ button &
then click the ‘OK’ button.
50. T-Test Results
Group Statistics
Variable | SGA | N | Mean | Std. Deviation | Std. Error Mean
Weight at first ANC | Normal | 108 | 58.666 | 11.2302 | 1.0806
Weight at first ANC | SGA | 109 | 51.037 | 9.3574 | .8963
▪ Compare the mean ± sd of both groups.
• Normal 58.7 ± 11.2 kg
• SGA 51.0 ± 9.4 kg
▪ Apparently there is a difference in
weight between the two groups.
51. Results & Homogeneity of Variances
Independent Samples Test: Weight at first ANC
Row | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | 1.862 | .174 | 5.439 | 215 | .000 | 7.629 | 1.4028 | 4.8641 | 10.3940
Equal variances not assumed | | | 5.434 | 207.543 | .000 | 7.629 | 1.4039 | 4.8612 | 10.3969
▪ Look at the p value of Levene’s Test. If p is not
significant, then equal variances are assumed (use the top
row).
▪ If it is significant, then equal variances are not assumed
(use the bottom row).
▪ Here Levene's p = .174, so we use the top row: t = 5.439 and p < 0.0005. The
difference is significant. Therefore there is an
association between the mothers’ weight and SGA.
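Both rows of the SPSS table can be reproduced from the group statistics on slide 50. The sketch below (plain Python, illustrative function names) assumes the "equal variances not assumed" row is the Welch t-test with Welch–Satterthwaite degrees of freedom, the standard approach, which does reproduce df = 207.543 here.

```python
def pooled_t(m1, s1, n1, m2, s2, n2):
    """'Equal variances assumed' row: pooled variance, df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    s0_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = (s0_sq * (1 / n1 + 1 / n2)) ** 0.5
    return (m1 - m2) / se, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """'Equal variances not assumed' row: Welch t with Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard error of each mean
    t = (m1 - m2) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


normal = (58.666, 11.2302, 108)  # mean, sd, n: weight at first ANC, normal babies
sga = (51.037, 9.3574, 109)      # mean, sd, n: weight at first ANC, SGA babies
t_eq, df_eq = pooled_t(*normal, *sga)
t_w, df_w = welch_t(*normal, *sga)
print(f"equal variances assumed:     t = {t_eq:.3f}, df = {df_eq}")
print(f"equal variances not assumed: t = {t_w:.3f}, df = {df_w:.3f}")
```

The two rows differ only slightly here because the group variances are similar, which is also what Levene's test (p = .174) indicates.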
52. How to present the result?
Group | N | Mean ± sd | Test | p
Normal | 108 | 58.7 ± 11.2 kg | T test, t = 5.439 | < 0.0005
SGA | 109 | 51.0 ± 9.4 kg | |
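55. Formula
$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \quad s_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n - 1}}$
$df = n_p - 1$
where d_i is the difference within each pair and n = n_p is the number of pairs.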
56. Examples of paired t-test
▪ Comparing the HAMD score between
week 0 and week 6 of treatment with
Sertraline for a group of psychiatric
patients.
▪ Comparing the haemoglobin level
of anaemic pregnant women before and after
6 weeks of treatment with haematinics.
57. Example
Paired Samples Statistics
Pair 1 | Mean | N | Std. Deviation
DHAMAWK0 | 13.9688 | 32 | 6.48315
DHAMAWK6 | 3.8125 | 32 | 4.39529
Paired Samples Test (Paired Differences)
Pair 1 | Mean | Std. Deviation | t | df | Sig. (2-tailed)
DHAMAWK0 - DHAMAWK6 | 10.1563 | 6.75903 | 8.500 | 31 | .000
58. Manual Calculation
The systolic and diastolic blood pressures
were measured two consecutive times with an
interval of 10 minutes. You want to determine
whether there was any difference between
those two measurements.
▪ H0: There is no difference in the systolic blood
pressure between the first (time 0) and second
measurement (time 10 minutes).
59. Calculation
▪ Calculate the difference (d) between the first &
second measurement and square it (d²).
Total up the differences (∑d) and the squares (∑d²).
60. Calculation
▪ ∑d = 112, ∑d² = 1842, n = 36
▪ Mean d = 112/36 = 3.11
▪ sd = ((1842 − 112²/36)/35)^0.5 = 6.53
▪ t = 3.11/(6.53/6) = 2.858 (since √36 = 6)
▪ df = n_p − 1 = 36 − 1 = 35
▪ Refer to t table
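The same steps, working only from the two totals, can be scripted. A minimal sketch in plain Python; `paired_t_from_sums` is a hypothetical helper that follows the slide 55 formula with n = number of pairs.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """Paired t-test from the totals of the differences and of their squares.

    Returns (mean difference, sd of differences, t, df) with df = n - 1 pairs.
    """
    mean_d = sum_d / n
    sd = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
    t = (mean_d - 0) / (sd / math.sqrt(n))  # H0: mean difference = 0
    return mean_d, sd, t, n - 1


mean_d, sd, t, df = paired_t_from_sums(112, 1842, 36)
print(f"mean d = {mean_d:.2f}, sd = {sd:.2f}, t = {t:.3f}, df = {df}")
```

It reproduces mean d = 3.11, sd = 6.53, t = 2.858 and df = 35.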
61. Refer to Table A3.
We don’t have df=35,
so we use df=30 instead.
t = 2.858, larger than 2.75
(p=0.01) but smaller than 3.03
(p=0.005). 3.03>t>2.75
Therefore if t=2.858,
0.005<p<0.01.
62. Conclusion
With t = 2.858, 0.005 < p < 0.01.
Therefore p < 0.01, and hence p < 0.05: the null hypothesis is
rejected.
Conclusion: There is a significant
difference in the systolic blood pressure
between the first and second
measurement. The mean of the first
reading is significantly higher than
the second reading.
63. Paired T-Test In SPSS
▪ For this exercise, we will
be using the data from
the CD, under Chapter
7, sgapair.sav
▪ This data came from a
controlled trial on
haematinic effect on Hb.
▪ Open the data & select
Analyse > Compare Means >
Paired-Samples T Test…
64. Paired T-Test In SPSS
▪ We want to see whether
there is any association
between the prescription
of haematinics to
anaemic pregnant
mothers and Hb.
▪ We are comparing the
Hb before & after
treatment. So pair the
two measurements (Hb2
& Hb3) together.
▪ Click the ‘OK’ button.
65. Paired T-Test Results
Paired Samples Statistics
Pair 1 | Mean | N | Std. Deviation | Std. Error Mean
HB2 | 10.247 | 70 | .3566 | .0426
HB3 | 10.594 | 70 | .9706 | .1160
▪ This shows the mean & standard
deviation of the two measurements.
66. Paired T-Test Results
Paired Samples Test (Paired Differences)
Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed)
HB2 - HB3 | -.347 | .9623 | .1150 | -.577 | -.118 | -3.018 | 69 | .004
▪ This shows the mean difference of Hb
before & after treatment is only 0.347
g%.
▪ Yet t = -3.018 & p = 0.004 show that the
difference is statistically significant.
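The t value in this SPSS output follows directly from the three summary numbers of the paired differences. A minimal sketch in plain Python (the helper name is illustrative). Because it starts from the rounded mean −0.347, it gives t ≈ −3.017 rather than SPSS's −3.018.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n_pairs):
    """t = mean difference / (sd of differences / sqrt(n)); df = n - 1."""
    se = sd_diff / math.sqrt(n_pairs)
    return mean_diff / se, n_pairs - 1, se


# HB2 - HB3 from the SPSS output: mean -0.347, sd 0.9623, 70 pairs
t, df, se = paired_t_from_summary(-0.347, 0.9623, 70)
print(f"SE = {se:.4f}, t = {t:.3f}, df = {df}")
```

The large n (70 pairs) shrinks the standard error to about 0.115 g%, which is why a mean change of only 0.347 g% is still significant.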
67. How to present the result?
Group | N | Mean D (Diff.) | Test | p
Before treatment (HB2) vs After treatment (HB3) | 70 | 0.35 ± 0.96 | Paired t-test, t = 3.018 | 0.004
68. ANOVA
69. ANOVA – Analysis of Variance
▪ Extension of the independent-samples t test
▪ Compares the means of groups of
independent observations
• Don’t be fooled by the name. ANOVA does
not compare variances.
▪ Can compare more than two groups
70. One-Way ANOVA F-Test
▪ Tests the equality of 2 or more population means
▪ Variables
• One nominal scaled independent variable
– 2 or more treatment levels or classifications
(e.g. Race: Malay, Chinese, Indian & Others)
• One interval or ratio scaled dependent variable
(e.g. weight, height, age)
▪ Used to analyse completely randomized
experimental designs
71. Examples
▪ Comparing the blood cholesterol levels
between bus drivers, bus conductors
and taxi drivers.
▪ Comparing the mean systolic pressure
between Malays, Chinese, Indians &
Others.
72. One-Way ANOVA F-Test Assumptions
▪ Randomness & independence of errors
• Independent random samples are drawn
▪ Normality
• Populations are normally distributed
▪ Homogeneity of variance
• Populations have equal variances
73. Example
Descriptives: Birth weight
Type of work | N | Mean | Std. Deviation | Minimum | Maximum
Housewife | 151 | 2.7801 | .52623 | 1.90 | 4.72
Office work | 23 | 2.7643 | .60319 | 1.60 | 3.96
Field work | 44 | 2.8430 | .55001 | 1.90 | 3.79
Total | 218 | 2.7911 | .53754 | 1.60 | 4.72
ANOVA: Birth weight
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | .153 | 2 | .077 | .263 | .769
Within Groups | 62.550 | 215 | .291 | |
Total | 62.703 | 217 | | |
76. Example: Time To Complete Analysis
45 samples were analysed using 3 different
blood analysers (Mach1, Mach2 & Mach3).
15 samples were placed into each analyser.
Time in seconds was measured for each
sample analysis.
77. Example: Time To Complete Analysis
The overall mean of the entire sample was 22.71 seconds.
This is called the “grand” mean, and is often
denoted by $\bar{\bar{X}}$.
If H0 were true, then we’d expect the group means
to be close to the grand mean.
80. The ANOVA Statistic
To combine the differences from the grand mean we
• Square the differences
• Multiply by the numbers of observations in the groups
• Sum over the groups
$SSB = 15(\bar{X}_{Mach1} - \bar{\bar{X}})^2 + 15(\bar{X}_{Mach2} - \bar{\bar{X}})^2 + 15(\bar{X}_{Mach3} - \bar{\bar{X}})^2$
where the $\bar{X}_*$ are the group means.
“SSB” = Sum of Squares Between groups
Note: This looks a bit like a variance.
81. Sum of Squares Between
$SSB = 15(\bar{X}_{Mach1} - \bar{\bar{X}})^2 + 15(\bar{X}_{Mach2} - \bar{\bar{X}})^2 + 15(\bar{X}_{Mach3} - \bar{\bar{X}})^2$
▪ Grand Mean = 22.71
▪ Mean Mach1 = 24.93; (24.93 − 22.71)² = 4.9284
▪ Mean Mach2 = 22.61; (22.61 − 22.71)² = 0.01
▪ Mean Mach3 = 20.59; (20.59 − 22.71)² = 4.4944
▪ SSB = (15 × 4.9284) + (15 × 0.01) + (15 × 4.4944)
▪ SSB = 141.492
82. How big is big?
▪ For the Time to Complete, SSB = 141.492
▪ Is that big enough to reject H0?
▪ As with the t test, we compare the statistic to
the variability of the individual observations.
▪ In ANOVA the variability is estimated by the
Mean Square Error, or MSE
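83. MSE – Mean Square Error
The Mean Square Error is a measure of the
variability after the group effects have
been taken into account.
$MSE = \frac{1}{N - K}\sum_{j}\sum_{i}(x_{ij} - \bar{X}_j)^2$
where $x_{ij}$ is the ith observation in the jth group, $\bar{X}_j$ is the mean of group j,
N is the total number of observations and K is the number of groups.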
86. $MSE = \frac{1}{N - K}\sum_{j}\sum_{i}(x_{ij} - \bar{X}_j)^2$
Mach1 | (x − mean)² | Mach2 | (x − mean)² | Mach3 | (x − mean)²
23.73 | 1.4400 | 21.5 | 1.2321 | 19.74 | 0.7225
23.74 | 1.4161 | 21.6 | 1.0201 | 19.75 | 0.7056
23.75 | 1.3924 | 21.7 | 0.8281 | 19.76 | 0.6889
24.00 | 0.8649 | 21.7 | 0.8281 | 19.9 | 0.4761
24.10 | 0.6889 | 21.8 | 0.6561 | 20 | 0.3481
24.20 | 0.5329 | 21.9 | 0.5041 | 20.1 | 0.2401
25.00 | 0.0049 | 22.75 | 0.0196 | 20.3 | 0.0841
25.10 | 0.0289 | 22.75 | 0.0196 | 20.4 | 0.0361
25.20 | 0.0729 | 22.75 | 0.0196 | 20.5 | 0.0081
25.30 | 0.1369 | 23.3 | 0.4761 | 20.5 | 0.0081
25.40 | 0.2209 | 23.4 | 0.6241 | 20.6 | 0.0001
25.50 | 0.3249 | 23.4 | 0.6241 | 20.7 | 0.0121
26.30 | 1.8769 | 23.5 | 0.7921 | 22.1 | 2.2801
26.31 | 1.9044 | 23.5 | 0.7921 | 22.2 | 2.5921
26.32 | 1.9321 | 23.6 | 0.9801 | 22.3 | 2.9241
SUM | 12.8380 | | 9.4160 | | 11.1262
87. MSE
▪ Note that the variation of the means
(SSB = 141.492) seems quite large compared to the
sum of squares of observations within groups
(12.8380 + 9.4160 + 11.1262 = 33.3802), which suggests the
difference may be significant.
▪ MSE = 33.3802/(45 − 3) = 0.7948
88. Notes on MSE
▪ If there are only two groups, the MSE is equal
to the pooled estimate of variance used in the
equal-variance t test.
▪ ANOVA assumes that all the group variances
are equal.
▪ Other options should be considered if group
variances differ by a factor of 2 or more.
▪ Here the within-group sums of squares are similar
(12.8380 ~ 9.4160 ~ 11.1262); with 15 observations per group, so are the variances.
89. ANOVA F Test
▪ The ANOVA F test is based on the F statistic
$F = \frac{SSB/(K - 1)}{MSE}$
where K is the number of groups.
▪ Under H0 the F statistic has an “F” distribution,
with K − 1 and N − K degrees of freedom (N is the
total number of observations).
91. Time to Analyse: F test p-value
To get a p-value we compare our F statistic
to an F(2, 42) distribution.
In our example
$F = \frac{141.492/2}{33.3802/42} = 89.015$
The F value is so large that it cannot even be marked
on a plot of the F(2, 42) distribution, so the p-value
is extremely small.
92. Refer to F Dist. Table (α = 0.01).
We don’t have df = 2;42,
so we use df = 2;40 instead.
F = 89.015, larger than 5.18
(p = 0.01).
Therefore if F = 89.015, p < 0.01.
Why use df = 2;42?
We have 3 groups,
so K − 1 = 2.
We have 45 samples,
therefore N − K = 42.
93. Time to Analyse: F test p-value
The p-value is really
P(F(2,42) > 89.015) ≈ 7.9 × 10^-16, far below 0.01.
94. ANOVA Table
Results are often displayed using an ANOVA Table
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 141.492 | 2 | 70.746 | 89.015 | .0000000
Within Groups | 33.380 | 42 | .795 | |
Total | 174.872 | 44 | | |
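The whole calculation on slides 81 to 89 can be reproduced from the raw analysis times in the slide 86 table. The sketch below is plain Python with no libraries; `one_way_anova` is an illustrative helper, not an SPSS or library function. For the p-value it relies on a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), that holds only when the numerator has 2 degrees of freedom (three groups); other designs need an F table or a statistics library.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of groups, each a list of observations.

    Returns (SSB, SSW, MSE, F, df_between, df_within).
    """
    all_values = [x for group in groups for x in group]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]
    # Between groups: squared distance of each group mean from the grand mean,
    # weighted by the group size.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared distance of each observation from its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    mse = ssw / df_within
    f_stat = (ssb / df_between) / mse
    return ssb, ssw, mse, f_stat, df_between, df_within


mach1 = [23.73, 23.74, 23.75, 24.00, 24.10, 24.20, 25.00, 25.10,
         25.20, 25.30, 25.40, 25.50, 26.30, 26.31, 26.32]
mach2 = [21.5, 21.6, 21.7, 21.7, 21.8, 21.9, 22.75, 22.75,
         22.75, 23.3, 23.4, 23.4, 23.5, 23.5, 23.6]
mach3 = [19.74, 19.75, 19.76, 19.9, 20, 20.1, 20.3, 20.4,
         20.5, 20.5, 20.6, 20.7, 22.1, 22.2, 22.3]

ssb, ssw, mse, f_stat, df1, df2 = one_way_anova([mach1, mach2, mach3])
# Valid for df1 = 2 only: P(F > f) = (1 + 2f/df2) ** (-df2/2).
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"SSB = {ssb:.3f}, SSW = {ssw:.4f}, MSE = {mse:.4f}")
print(f"F({df1}, {df2}) = {f_stat:.3f}, p = {p_value:.2g}")
```

It gives SSB = 141.492, MSE ≈ 0.7948 and F ≈ 89.015, with a mean square between groups of 141.492/2 = 70.746.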
95. ANOVA Table
Pop Quiz!: Where are the following quantities presented in this table?
Sum of Squares Between (SSB), Mean Square Error (MSE), F Statistic, p value
100. ANOVA In SPSS
▪ For this exercise, we will
be using the data from
the CD, under Chapter
7, sga-bab7.sav
▪ This data came from a
case-control study on
factors affecting SGA in
Kelantan.
▪ Open the data & select
Analyse > Compare Means >
One-Way ANOVA…
101. ANOVA in SPSS
▪ We want to see whether
there is any association
between the babies’ weight
and mothers’ type of work.
So select the risk factor
(typework) into ‘Factor’ & the
outcome (birthwgt) into
‘Dependent’.
▪ Now click on the ‘Post Hoc’
button. Select Bonferroni.
▪ Click the ‘Continue’ button.
▪ Then click on the ‘Options’
button.
102. ANOVA in SPSS
▪ Select ‘Descriptive’,
‘Homogeneity of
variance test’ and
‘Means plot’.
▪ Click ‘Continue’ and
then ‘OK’.
103. ANOVA Results
Descriptives: Birth weight
Type of work | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean, Lower Bound | 95% CI for Mean, Upper Bound | Minimum | Maximum
Housewife | 151 | 2.7801 | .52623 | .04282 | 2.6955 | 2.8647 | 1.90 | 4.72
Office work | 23 | 2.7643 | .60319 | .12577 | 2.5035 | 3.0252 | 1.60 | 3.96
Field work | 44 | 2.8430 | .55001 | .08292 | 2.6757 | 3.0102 | 1.90 | 3.79
Total | 218 | 2.7911 | .53754 | .03641 | 2.7193 | 2.8629 | 1.60 | 4.72
▪ Compare the mean ± sd of all groups.
▪ Apparently there is not much
difference in babies’ weight between the
groups.
104. Results & Homogeneity of Variances
Test of Homogeneity of Variances: Birth weight
Levene Statistic | df1 | df2 | Sig.
.757 | 2 | 215 | .470
▪ Look at the p value of Levene’s Test. If p
is not significant, then equal variances are
assumed. Here p = .470, so equal variances are assumed.
105. ANOVA Results
ANOVA: Birth weight
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | .153 | 2 | .077 | .263 | .769
Within Groups | 62.550 | 215 | .291 | |
Total | 62.703 | 217 | | |
▪ So the F value here is 0.263 and p = 0.769.
The difference is not significant. Therefore
there is no association between the
babies’ weight and mothers’ type of work.
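When only the Descriptives table is at hand, the ANOVA table can be rebuilt from each group's n, mean and standard deviation, since each group's within sum of squares is (n − 1)·sd². A minimal sketch in plain Python (illustrative helper name); as with the Mach example, the closed-form p-value used here is valid only for three groups (df1 = 2).

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) per group, as printed in SPSS Descriptives.

    Returns (SSB, SSW, F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ssb = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Each group's within sum of squares is (n - 1) * sd^2.
    ssw = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, f_stat, df_between, df_within


# (n, mean, sd) for housewife, office work and field work (slide 103)
birth_weight = [(151, 2.7801, 0.52623), (23, 2.7643, 0.60319), (44, 2.8430, 0.55001)]
ssb, ssw, f_stat, df1, df2 = anova_from_summary(birth_weight)
assert df1 == 2, "closed-form p-value below is only valid for three groups"
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, F({df1}, {df2}) = {f_stat:.3f}, p = {p_value:.3f}")
```

It recovers SSB ≈ 0.153, SSW ≈ 62.55, F ≈ 0.263 and p ≈ 0.769, matching the SPSS ANOVA table up to rounding of the descriptives.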
106. How to present the result?
Type of Work | Mean ± sd | Test | p
Office | 2.76 ± 0.60 | ANOVA, F = 0.263 | 0.769
Housewife | 2.78 ± 0.53 | |
Farmer | 2.84 ± 0.55 | |
108. Proportionate Test
▪ Qualitative data are summarised as rates, e.g. the rate of
anaemia among males & females
▪ To compare such rates, statistical tests
such as the Z-test and Chi-square can be
used.
109. Formula
$z = \frac{p_1 - p_2}{\sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \quad q_0 = 1 - p_0$
• where p1 is the rate for event 1 = a1/n1
• p2 is the rate for event 2 = a2/n2
• a1 and a2 are the frequencies of event 1 and 2
▪ We refer to the normal distribution table to decide whether or not to reject the null hypothesis.
110. http://stattrek.com/hypothesis-test/proportion.aspx
▪ The sampling method is simple random
sampling.
▪ Each sample point can result in just two
possible outcomes. We call one of these
outcomes a success and the other, a failure.
▪ The sample includes at least 10 successes
and 10 failures.
▪ The population size is at least 10 times as
big as the sample size.
111. Example
▪ Comparison of worm infestation rate
between male and female medical
students in Year 2.
▪ Rate for males: p1 = 29/96 = 0.302
▪ Rate for females: p2 = 24/104 = 0.231
▪ H0: There is no difference in worm
infestation rate between male and
female medical students in Year 2
112. Cont.
[Figure: terms p1, p2, p0 and q0 of the formula]
113. Cont.
▪ p0 = ((29/96 × 96) + (24/104 × 104)) / (96 + 104) = 53/200 = 0.265
▪ q0 = 1 – 0.265 = 0.735
114. Cont.
▪ z = (0.302 − 0.231) / ((0.735 × 0.265)(1/96 + 1/104))^0.5 = 1.1367
▪ From the normal distribution table (A1), a z value
is significant at p = 0.05 if it is above 1.96. Since
the value is less than 1.96, there is no significant
difference in the rate of worm infestation
between the male and female students.
115. Refer to Table A1.
We don’t have 1.1367, so we
use 1.14 instead. If z = 1.14,
then p = 0.1271 × 2 = 0.2542.
Therefore, with z ≈ 1.14,
p = 0.2542. H0 not rejected.
116. Exercise (try it)
▪ Comparison of failure rate between
ACMS and UKM medical students in
Year 2 for minitest 1 (MS2 2012).
▪ Rate for UKM: p1 = 42/196 = 0.214
▪ Rate for ACMS: p2 = 35/70 = 0.5
117. Answer
▪ p1 = 0.214, p2 = 0.5, p0 = 0.289, q0 = 0.711
▪ n1 = 196, n2 = 70, z² = 20.47, so |z| = 20.47^0.5 = 4.52
▪ p < 0.00006
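The proportionate test on slides 109 to 117 is easy to script. A minimal sketch using Python's standard library (`statistics.NormalDist` in place of the normal table); `two_proportion_z` is a hypothetical helper, and it does not check the conditions listed on slide 110. It works from the raw counts, so the worm example gives z ≈ 1.14 rather than 1.1367 (the slide rounds the rates to three decimals first); the conclusion is unchanged. The exercise gives |z| ≈ 4.52, with a negative sign because UKM's failure rate is the lower one.

```python
from statistics import NormalDist


def two_proportion_z(a1, n1, a2, n2):
    """Pooled two-proportion z-test. Returns (z, two-tailed p)."""
    p1, p2 = a1 / n1, a2 / n2
    p0 = (p1 * n1 + p2 * n2) / (n1 + n2)  # pooled rate, i.e. (a1 + a2) / (n1 + n2)
    q0 = 1 - p0
    z = (p1 - p2) / (p0 * q0 * (1 / n1 + 1 / n2)) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Worm infestation: 29/96 males vs 24/104 females (slide 111)
z_worm, p_worm = two_proportion_z(29, 96, 24, 104)
# Minitest failure: 42/196 UKM vs 35/70 ACMS (slide 116)
z_exam, p_exam = two_proportion_z(42, 196, 35, 70)
print(f"worm: z = {z_worm:.3f}, p = {p_worm:.3f}")
print(f"exam: z = {z_exam:.2f}, p = {p_exam:.1e}")
```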
Editor's Notes
As a result of this class, you will be able to ...
Note: There is one dependent variable in the ANOVA model. MANOVA has more than one dependent variable. Ask, what are nominal & interval scales?