# Anova by Hazilah Mohd Amin

Lecture notes: Sampling and data analysis

Published in: Education, Technology

1. Hazilah Mohd Amin: Analysis of Variance (ANOVA)
2. Goals. After completing, you should be able to:
   - Recognize situations in which to use analysis of variance (ANOVA)
   - Perform a single-factor hypothesis test for comparing more than two means and interpret the results
3. The F-Distribution
   - Analysis-of-variance procedures rely on the F-distribution.
   - There are infinitely many F-distributions; we identify an F-distribution (and its F-curve) by its degrees of freedom.
   - An F-distribution has two numbers of degrees of freedom: one for the numerator and one for the denominator.
4. Key Fact: the F-distribution curve.
5. Find the Critical Value: Example
   - Find the F value for 8 df in the numerator, 14 df in the denominator, and 0.05 area in the right tail of the F-distribution curve.
   - Critical value: F(α; df numerator, df denominator) = F(0.05; 8, 14) = ?
6. From Table 12.1 (p. 534), the critical value is F(0.05; 8, 14) = 2.70.
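The table lookup above can be cross-checked numerically; a minimal sketch using SciPy (assuming SciPy is available in your environment):

```python
from scipy.stats import f

# Right-tail area of 0.05 means we want the 0.95 quantile of F(8, 14).
alpha = 0.05
df_num, df_den = 8, 14
f_crit = f.ppf(1 - alpha, df_num, df_den)
print(round(f_crit, 2))  # matches the table value 2.70
```

The same call with other degrees of freedom reproduces any entry of the F table, which avoids interpolation for df values the printed table omits.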
7. Hypotheses of One-Way ANOVA
   - H0: all population means are equal, i.e., no treatment effect (no variation in means among groups).
   - H1: at least one population mean is different, i.e., there is a treatment effect. This does not mean that all population means are different; some pairs may be the same.
   - Analysis of variance is a procedure that tests whether differences exist between two or more population means.
8. One-Factor ANOVA: all means are the same; the null hypothesis is true (no treatment effect).
9. One-Factor ANOVA: at least one mean is different; the null hypothesis is not true (a treatment effect is present).
10. One-Way Analysis of Variance
13. One-Factor ANOVA F Test: Example 1
   - You want to see whether three different golf clubs yield different distances.
   - You randomly select five measurements from trials on an automated driving machine for each club.
   - At the 0.05 significance level, is there a difference in mean distance?

| Club 1 | Club 2 | Club 3 |
|---|---|---|
| 254 | 234 | 200 |
| 263 | 218 | 222 |
| 241 | 235 | 197 |
| 237 | 227 | 206 |
| 251 | 216 | 204 |
14. Solution of Example 1
   - The data are interval.
   - The problem objective is to compare mean distances across three types of golf club.
   - We hypothesize that the three population means are equal.
15. Defining the Hypotheses
   - H0: μ1 = μ2 = μ3
   - H1: At least two means differ
16. Notation. Independent samples are drawn from k populations (treatments). The observations are x11, x21, ..., x(n1,1) for sample 1; x12, x22, ..., x(n2,2) for sample 2; ...; x1k, x2k, ..., x(nk,k) for sample k, each with its own sample size and sample mean. X is the "response variable"; its values are called "responses".
17. Terminology. In the context of this problem:
   - Response variable: distance. Experimental unit: a golf club at the moment we record a distance figure.
   - Factor or treatment: the criterion by which we classify the populations (the treatments). In this problem the factor is the type of golf club.
18. The rationale behind the name "Analysis of Variance" (ANOVA)
   - We are testing the difference between means, so why analyze variance?
   - Two types of variability are employed when testing for the equality of the population means: within samples and between samples.
19. One-Way Analysis of Variance: a graphical demonstration of the two types of variability, within samples and between samples.
20. (Two dot plots of Treatments 1, 2 and 3 with identical sample means but different spreads.) The sample means are the same in both plots, but the larger within-sample variability makes it harder to draw a conclusion about the population means; a small variability within the samples makes it easier to draw a conclusion about the population means.
21. One-Factor ANOVA Example: Scatter Diagram. (Scatter plot of distance, roughly 190 to 270, against Club 1, 2 and 3, using the club data above.) From the scatter diagram we can clearly see the difference in sample means because of the small within-sample variability.
22. Test Statistic (F), Critical Value and Rejection Criterion
   - The hypothesis test: H0: μ1 = μ2 = ... = μk versus HA: at least two population means are different.
   - Test statistic: F = MSB / MSW, where MSB is the mean square between groups and MSW is the mean square within groups.
   - Rejection region: F > F(α; k-1, n-k).
   - Degrees of freedom: df1 = k - 1 (k = number of levels or treatments); df2 = n - k (n = sum of the sample sizes from all populations).
23. One-Factor ANOVA Example: Computations. Using the club data: x̄1 = 249.2, x̄2 = 226.0, x̄3 = 205.8, grand mean x̄ = 227.0; n1 = n2 = n3 = 5, n = 15, k = 3. SSB = 4716.4 and SSW = 1119.6, so MSB = 4716.4 / (3 - 1) = 2358.2 and MSW = 1119.6 / (15 - 3) = 93.3.
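The SSB and SSW figures above can be reproduced by hand from the definitions; a minimal sketch in plain Python using only the club measurements given:

```python
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]
k = len(clubs)
n = sum(len(c) for c in clubs)                  # 15
grand_mean = sum(sum(c) for c in clubs) / n     # 227.0
means = [sum(c) / len(c) for c in clubs]        # 249.2, 226.0, 205.8

# Between-groups sum of squares: sample sizes times squared mean deviations.
ssb = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(clubs, means))
# Within-groups sum of squares: squared deviations from each group's own mean.
ssw = sum((x - m) ** 2 for c, m in zip(clubs, means) for x in c)

msb = ssb / (k - 1)   # 2358.2
msw = ssw / (n - k)   # 93.3
print(ssb, ssw, msb, msw)
```
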
24. One-Factor ANOVA Example: Solution
   - H0: μ1 = μ2 = μ3; HA: the μi are not all equal; α = 0.05.
   - df1 = k - 1 = 3 - 1 = 2; df2 = n - k = 15 - 3 = 12.
   - Test statistic: F = MSB / MSW = 2358.2 / 93.3 = 25.275.
   - Critical value: F(α; k-1, n-k) = F(0.05; 2, 12) = 3.885.
   - Decision: the test statistic F exceeds the critical value, so reject H0 at α = 0.05.
   - Conclusion: there is evidence that at least one μi differs from the rest.
25. ANOVA Single Factor: Excel Output (Tools > Data Analysis > ANOVA: Single Factor). Critical value F(α; k-1, n-k) = F(0.05; 2, 12) = 3.885.

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Club 1 | 5 | 1246 | 249.2 | 108.2 |
| Club 2 | 5 | 1130 | 226.0 | 77.5 |
| Club 3 | 5 | 1029 | 205.8 | 94.2 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 4716.4 | 2 | 2358.2 | 25.275 | 4.99E-05 | 3.885 |
| Within Groups | 1119.6 | 12 | 93.3 | | | |
| Total | 5836.0 | 14 | | | | |
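The same F statistic and p-value can be obtained in one call with SciPy's `f_oneway` (assuming SciPy is available), as a cross-check of the Excel output:

```python
from scipy.stats import f_oneway

club1 = [254, 263, 241, 237, 251]
club2 = [234, 218, 235, 227, 216]
club3 = [200, 222, 197, 206, 204]

# One-way ANOVA on the three independent samples.
stat, pvalue = f_oneway(club1, club2, club3)
print(round(stat, 3), pvalue)  # F ≈ 25.275, p ≈ 4.99e-05
```
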
26. Rationale I: Variability Between Samples
   - If H0: μ1 = μ2 = ... = μk is true, we would expect all the sample means to be close to one another.
   - If the alternative hypothesis is true, at least some of the sample means would differ.
   - Thus we measure the variability between sample means (hence MSB, also written MSTr).
27. Rationale II: Variability Within Samples
   - Large variability within the samples weakens the "ability" of the sample means to represent their corresponding population means.
   - Therefore, even though the sample means may differ markedly from one another, we have to consider the within-samples variability (hence MSW, also written MSE).
28. Interpreting the One-Factor ANOVA F Statistic
   - The F statistic is the ratio of the between-groups estimate of variance to the within-groups estimate of variance.
   - The ratio is always non-negative; df1 = k - 1 will typically be small and df2 = n - k will typically be large.
   - If H0: μ1 = μ2 = ... = μk is true, the F ratio should be close to 1 (SSB is small because the sample means differ little).
   - If H0 is false, the ratio will be larger than 1 (SSB is large because the sample means differ substantially).
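The claim that F hovers near 1 under H0 can be illustrated by simulation; a sketch (hypothetical simulation, assuming NumPy and SciPy are available) that repeatedly draws k = 3 samples of size 5 from one common normal population:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
f_stats = []
for _ in range(2000):
    # All three samples come from the same population, so H0 is true here.
    a, b, c = rng.normal(loc=227, scale=10, size=(3, 5))
    f_stats.append(f_oneway(a, b, c).statistic)

# Under H0 the average F ratio is df2 / (df2 - 2) = 12 / 10 = 1.2, close to 1.
print(round(sum(f_stats) / len(f_stats), 2))
```

Repeating the experiment with shifted group means (H0 false) would push the simulated F ratios well above 1, which is exactly why large F values lead to rejection.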
29. Example 2
   - A study was conducted to determine whether the drying time of a certain paint is affected by the type of applicator used. The table on the next slide shows the drying times (in minutes) for 3 different applicators when the paint was applied to standard wallboard.
   - Is there any evidence to suggest that the type of applicator has a significant effect on paint drying time at the 0.05 level?
   - Notation: the type of applicator is the treatment, factor or level; hence k = 3.
30. Notation Used in ANOVA

| Replication | Level 1 | Level 2 | Level 3 | ... | Level k |
|---|---|---|---|---|---|
| 1 | x(1,1) | x(2,1) | x(3,1) | ... | x(k,1) |
| 2 | x(1,2) | x(2,2) | x(3,2) | ... | x(k,2) |
| 3 | x(1,3) | x(2,3) | x(3,3) | ... | x(k,3) |
| Column totals | T1 | T2 | T3 | ... | Tk |

T = grand total = sum of all x's = Σx = ΣTi.
31. Sample Results. (Table of the drying-time data with the sample means x̄1, x̄2, x̄3 for the three applicators.)
32. Solution
   - Assumptions: the data (samples) were randomly collected and all observations are independent; the populations are (approximately) normally distributed; the populations have equal variances.
   - The null and alternative hypotheses: H0: μ1 = μ2 = μ3 (the mean drying time is the same for each applicator); Ha: at least one mean is different (not all drying-time means are equal).
33. Partition of Total Variation. The total variation SST can be split into two parts: SST = SSB + SSW, i.e., Sum of Squares Total = Variation Due to Factor/Treatment (SSB) + Variation Due to Random Sampling (SSW).
   - SSB is commonly referred to as: Sum of Squares Between, Sum of Squares Treatment (SSTr), Sum of Squares Factor, Sum of Squares Among, Sum of Squares Explained, or Among-Groups Variation.
   - SSW is commonly referred to as: Sum of Squares Within, Sum of Squares Error (SSE), Sum of Squares Unexplained, or Within-Groups Variation.
35. Σx and Σx² on a calculator: enter the xi data, then retrieve Σx and Σx².
   - Enter statistics (SD) mode: Mode Mode 1
   - Clear old data: Shift Clr 1 =
   - Enter the xi data: 39.1 DT 39.4 DT 31.1 DT 33.7 DT 30.5 DT 34.6 DT ... 29.5 DT
   - Find Σx: Shift S-SUM 2 = 616.5
   - Find Σx²: Shift S-SUM 1 = 20,316.69
36. Variation: Sums of Squares. (Formulas shown on slide.)
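The shortcut sums-of-squares formula can be applied directly to the calculator totals from the previous slide; a sketch in plain Python (only SST is computed here, since the per-level totals Ti needed for SSB are not listed on these slides):

```python
# Totals read off the calculator slide for Example 2 (drying times):
sum_x = 616.5        # Σx
sum_x2 = 20_316.69   # Σx²
n = 19               # from df(total) = n - 1 = 18

# Shortcut formula: SST = Σx² - (Σx)² / n
sst = sum_x2 - sum_x ** 2 / n
print(round(sst, 2))
```

With the level totals in hand, SSB would follow from the analogous shortcut SSB = Σ(Ti²/ni) - (Σx)²/n, and SSW = SST - SSB.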
37. Mean Square. The mean square for the factor being tested and for the error is obtained by dividing the sum-of-squares value by the corresponding number of degrees of freedom. Calculations: numerator degrees of freedom = df(factor) = k - 1 = 3 - 1 = 2; denominator degrees of freedom = df(error) = n - k = 19 - 3 = 16; df(total) = n - 1 = 19 - 1 = 18.
38. One-Way ANOVA Table. An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations. Format:

| Source of Variation | df | SS | MS | F ratio |
|---|---|---|---|---|
| Between samples | k - 1 | SSB | MSB = SSB / (k - 1) | F = MSB / MSW |
| Within samples | n - k | SSW | MSW = SSW / (n - k) | |
| Total | n - 1 | SST = SSB + SSW | | |

The sums of squares and the degrees of freedom must check: SS(factor) + SS(error) = SS(total), i.e., SSB + SSW = SST; and df(factor) + df(error) = df(total), i.e., df(between) + df(within) = df(total).
39. The Completed ANOVA Table and the Test Statistic. (Shown on slide.)
40. Solution Continued: The Results
   - Critical value: F(α; k-1, n-k) = F(0.05; 2, 16) = 3.63.
   - The test statistic F = 4.27 is in the rejection region.
   - Decision: reject H0 at α = 0.05.
   - Conclusion: there is evidence to suggest the three population means are not all the same; the type of applicator has a significant effect on paint drying time at the 0.05 level of significance.
41. One-Way ANOVA F-Test: Exercise 1
   - You're a trainer for Microsoft Corp. Is there any evidence to suggest that the type of training method has a significant effect on learning time at the 0.05 level?
   - The table shows the learning times (in hours) of 12 people using 4 different training methods.

| M1 | M2 | M3 | M4 |
|---|---|---|---|
| 10 | 11 | 13 | 18 |
| 9 | 16 | 8 | 23 |
| 5 | 9 | 9 | 25 |

Answer: critical value = 4.07; test statistic F = 11.6.
42. Hey! Let's get our hands dirty using SPSS...
43. One-Way Analysis of Variance Using SPSS
   - Suppose we want to know whether students who have to work many hours outside school to support themselves find their grades suffering.
   - We examine this question by comparing the GPAs of students who work various numbers of hours outside school.
   - Let's examine this question using the data in the Student file: File > Open > Student.
44. First examine the average GPA for each of the three work categories (0 hrs, 1-19 hrs, 20+ hrs), recorded in WorkCat. Choose Graph > Boxplot, then choose Simple and click Define. Select GPA as the variable and WorkCat for the Category Axis, then click Options.
45. After clicking Options..., deselect Display groups defined by missing values, then click Continue and OK. You'll get the boxplot shown.
46. What is the boxplot telling us?
   - There is some variation across the groups: the median GPAs (the dark line in the middle of each box) differ slightly between groups.
   - So, should we attribute the observed difference to sampling error, or do the groups genuinely differ?
   - Neither the boxplot nor the medians offer decisive evidence; hence we need ANOVA.
47. We are testing H0: μ1 = μ2 = μ3 against H1: at least two means differ. Before attempting ANOVA, we need to review the ANOVA assumptions: (i) independent samples; (ii) normality; (iii) equality of variances. We can test both (ii) and (iii): Analyze > Descriptive Statistics > Explore.
48. Analyze > Descriptive Statistics > Explore. In the Explore dialog box, select GPA as the Dependent List variable, WorkCat as the Factor List variable, and Plots as the Display option. Next, click Plots... Since we are interested in a normality test, select the normality-test plot option and deselect the others. Click Continue and OK. See the next slide...
49. The output has several parts; let's focus on the tests of normality.
   - The Kolmogorov-Smirnov test assesses whether there is a significant departure from normality in the population distribution of each of the 3 groups. H0: the distributions are normal.
   - Look at the p-values: all are > 0.05, so do not reject H0. Hence there is no evidence of non-normality.
50. We still need to validate the homogeneity-of-variance assumption; we do this within ANOVA. Choose Analyze > Compare Means > One-Way ANOVA. The Dependent List variable is GPA and the Factor variable is WorkCat. Click Options.
51. Under Statistics, select Descriptive and Homogeneity of variance test, then click Continue and OK. For the homogeneity test, H0: the variances are equal. The One-Way ANOVA output consists of many parts; look at the homogeneity-test p-value: it is > 0.05, so do not reject H0.
52. The normality and homogeneity-of-variance assumptions are met, so let's find out whether students who work various numbers of hours outside school differ in their GPAs. The p-value of .000 is very small, so we reject H0 and conclude that the mean GPAs are not all the same. Where are the differences? That calls for a post-hoc test...
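The Student file itself is not reproduced here, but the same check-assumptions-then-ANOVA workflow can be sketched in Python with SciPy. The GPA arrays below are made up purely for illustration; Shapiro-Wilk stands in for the normality test and `levene` for SPSS's homogeneity-of-variance test:

```python
from scipy.stats import shapiro, levene, f_oneway

# Hypothetical GPAs for the three work categories (illustrative data only).
gpa_0hrs   = [3.6, 3.4, 3.8, 3.5, 3.7, 3.3, 3.6, 3.9]
gpa_1_19   = [3.2, 3.5, 3.1, 3.4, 3.3, 3.0, 3.4, 3.2]
gpa_20plus = [2.8, 3.0, 2.7, 3.1, 2.9, 2.6, 3.0, 2.8]

# (ii) Normality: test each group; p > 0.05 means no evidence of non-normality.
for group in (gpa_0hrs, gpa_1_19, gpa_20plus):
    print(round(shapiro(group).pvalue, 3))

# (iii) Homogeneity of variances: H0 is that the group variances are equal.
print(round(levene(gpa_0hrs, gpa_1_19, gpa_20plus).pvalue, 3))

# The ANOVA itself: a small p-value means the mean GPAs are not all the same.
stat, pvalue = f_oneway(gpa_0hrs, gpa_1_19, gpa_20plus)
print(round(stat, 2), round(pvalue, 4))
```

With real data the group arrays would be built by filtering GPA on the WorkCat variable, and a significant result would be followed by a post-hoc comparison, mirroring the SPSS steps above.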
53. End of ANOVA. See you later...
54. One-Way ANOVA F-Test: Exercise 1 Solution (restating Exercise 1 from slide 41: learning times, in hours, of 12 people using 4 training methods; is there evidence at the 0.05 level that the training method affects learning time?).
55. Summary Table Solution

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square (Variance) | F |
|---|---|---|---|---|
| Treatment (Methods) | 4 - 1 = 3 | 348 | 116 | 11.6 |
| Error | 12 - 4 = 8 | 80 | 10 | |
| Total | 12 - 1 = 11 | 428 | | |
56. One-Way ANOVA F-Test Solution
   - H0: μ1 = μ2 = μ3 = μ4; Ha: not all equal; α = 0.05.
   - df1 = 3, df2 = 8; critical value F(0.05; 3, 8) = 4.07.
   - Test statistic: F = MSB / MSE = 116 / 10 = 11.6.
   - Decision: reject H0 at α = 0.05.
   - Conclusion: there is evidence that the population means are different.
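The hand computation for Exercise 1 can be cross-checked with SciPy (assuming SciPy is available):

```python
from scipy.stats import f_oneway, f

m1, m2, m3, m4 = [10, 9, 5], [11, 16, 9], [13, 8, 9], [18, 23, 25]
stat, pvalue = f_oneway(m1, m2, m3, m4)
f_crit = f.ppf(0.95, 3, 8)  # critical value for df1 = 3, df2 = 8

print(round(stat, 1), round(f_crit, 2))  # F = 11.6, critical value ≈ 4.07
print(stat > f_crit)                     # test statistic exceeds the critical value
```
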