
# Anova

Description of one-way ANOVA in Excel and an explanation of the output.

• Factors = 1 (drug condition). Levels = 3 (aspirin, Tylenol and placebo). N = 12 (3 groups of 4).
• Spreadsheet attached with the analysis completed.
• One factor at 3 levels. The number of factors dictates the type of ANOVA; one-way is the most common since it is an extension of the basic t-test.
• GM = 3, SST = 18
• SSB = 8
• Do not reject the null hypothesis, because the F value (3.60) is lower than F crit (4.26) and p (0.071) is greater than alpha (0.05). In this case there is no evidence of a difference in the means. Had the result been significant, the most we could conclude is that a difference exists; the ANOVA would not tell us which of the three drugs is different. Further testing would be required.
• There is a trade-off when conducting a series of pairwise significance tests: the more tests carried out, the higher the experimentwise error rate, i.e. the probability of making at least one Type I error. Controlling that error rate more strictly lowers the power of the tests.
• Definitely worth telling candidates that the probability of making the wrong decision rises sharply when many pairwise t-tests are done instead of an ANOVA. The first instinct of many new analysts is simply to run a lot of t-tests. This IS NOT THE CORRECT APPROACH - EVER.
• Harmonic mean: the harmonic mean is used to take the mean of sample sizes. If there are k samples of sizes $n_1, n_2, \dots, n_k$, then the harmonic mean is defined as $n_h = k / (1/n_1 + 1/n_2 + \dots + 1/n_k)$. For the numbers 1, 2, 3, and 10, the harmonic mean is 4 / (1/1 + 1/2 + 1/3 + 1/10) = 2.069. This is less than the geometric mean of 2.78 and the arithmetic mean of 4.
• Someone will ask the question: "If we have to do pairwise t-tests after the ANOVA anyway, why not just do them and forget the ANOVA?" That is of course their choice, BUT the ANOVA may return a result of no significant difference in one test, saving a lot of time and effort, AND pairwise testing increases the probability of false results.
### Anova

1. Dr. Ian Vallance, Dr. Ricky Tomanek, Impact Laboratories. Ian Galloway FRSS, Lean/Six Sigma green belt. Introduction to ANalysis Of VAriance (ANOVA)
2. Course content

    - What is ANOVA
    - Different types of ANOVA
    - ANOVA Theory
    - Worked example in Excel
        - a) Generating the data
        - b) Explanation/Interpretation of output
        - c) Rules for accepting/rejecting null hypothesis
    - Supplemental testing
    - Summary: ANOVA in Excel
    - Summary
3. 1 What is ANOVA

    ANOVA is a general technique that can be used to test the hypothesis that the means of two or more groups are equal, under the assumption that the sampled populations are normally distributed. Analysis of variance can be used to test differences among several means for significance without increasing the Type I error rate.

    The t-test is designed to test the hypothesis that 2 means could be from the same population of data. But what if we want to compare more than 2 means at the same time?
4. 2 Different types of ANOVA

    To begin, let us consider the effect of temperature on a passive component such as a resistor. We select three different temperatures and observe their effect on the resistors. This experiment can be conducted by measuring all the participating resistors before placing n resistors in each of three different ovens. Each oven is heated to a selected temperature. Then we measure the resistors again after, say, 24 hours and analyze the responses, which are the differences between the measurements before and after exposure to the temperatures.

    The temperature is called a factor. The different temperature settings are called levels. In this example there are three levels or settings of the factor Temperature.
5. Different types of ANOVA

    What is a factor? A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. The intensity setting of a factor is the level. Levels may be quantitative numbers or, in many cases, simply "present" or "not present" ("0" or "1").

    The 1-way ANOVA: In the experiment, there is only one factor, temperature, and the analysis of variance that we will be using to analyze the effect of temperature is called a one-way or one-factor ANOVA.

    The 2-way or 3-way ANOVA: We could have opted to also study the effect of positions in the oven. In this case there would be two factors, temperature and oven position. Here we speak of a two-way or two-factor ANOVA. Furthermore, we may be interested in a third factor, the effect of time. Now we deal with a three-way or three-factor ANOVA.
6. 3 ANOVA Theory

    The theory of ANOVA is long, complicated and detailed and will NOT be looked at in this course. If you do want to learn more, try the following web pages:

    - http://davidmlane.com/hyperstat/intro_ANOVA.html
    - http://itl.nist.gov/div898/handbook/prc/section4/prc43.htm
    - http://www.experiment-resources.com/anova-test.html
    - http://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/one_way_anova.pdf
7. 4 (a) A worked example in Excel 2007

    In this example of a one-way ANOVA we will calculate all the components of the ANOVA without explaining the theory behind the formulas used. The main objectives of this exercise are to learn the typical layout of ANOVA output (the format will look very similar in Excel), to learn how to interpret the output, and finally to learn how to carry out ANOVA using Excel.
8. A worked example in Excel 2007

    In a hypothetical experiment, aspirin, Tylenol, and a placebo were tested to see how much pain relief each provides. Pain relief was rated on a five-point scale. Four subjects were tested in each group and their data are shown below:

    | Aspirin | Tylenol | Placebo |
    |---------|---------|---------|
    | 3       | 2       | 2       |
    | 5       | 2       | 1       |
    | 3       | 4       | 3       |
    | 5       | 4       | 2       |

    How many factors? How many levels? How many subjects in total?
9. A worked example in Excel 2007

    - (1) Data for analysis
    - (2) Use the Data tab
    - (3) Data Analysis ToolPak
10. A worked example in Excel 2007

    Select Data Analysis. Why are we selecting this option?
11. A worked example in Excel 2007

    - Input range is the data to be analysed.
    - If the data columns have labels, tick this box.
    - Alpha is the significance level, i.e. 0.05 corresponds to a 95% confidence level.
    - Output options: where do you want to see the results?
12. A worked example in Excel 2007
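13. 4 (b) A worked example: interpretation of output

    Now that we have an output, we can deconstruct each result and explain the final result.

    Anova: Single Factor

    SUMMARY

    | Groups  | Count | Sum | Average | Variance |
    |---------|-------|-----|---------|----------|
    | Aspirin | 4     | 16  | 4       | 1.33     |
    | Tylenol | 4     | 12  | 3       | 1.33     |
    | Placebo | 4     | 8   | 2       | 0.67     |

    ANOVA

    | Source of Variation | SS | df | MS   | F   | P-value | F crit |
    |---------------------|----|----|------|-----|---------|--------|
    | Between Groups      | 8  | 2  | 4    | 3.6 | 0.07    | 4.26   |
    | Within Groups       | 10 | 9  | 1.11 |     |         |        |
    | Total               | 18 | 11 |      |     |         |        |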
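The SUMMARY rows can be reproduced outside Excel as a sanity check. The sketch below assumes the ratings are read column by column from the data table on slide 8; the names `groups` and `summarize` are illustrative, not part of the original deck.

```python
from statistics import mean, variance

# Pain-relief ratings from the worked example (four subjects per drug).
groups = {
    "Aspirin": [3, 5, 3, 5],
    "Tylenol": [2, 2, 4, 4],
    "Placebo": [2, 1, 3, 2],
}

def summarize(scores):
    """Return (count, sum, average, sample variance) for one group."""
    return len(scores), sum(scores), mean(scores), variance(scores)

summary = {name: summarize(scores) for name, scores in groups.items()}

# Print one row per group in the same column order as the Excel SUMMARY block.
for name, (count, total, avg, var) in summary.items():
    print(f"{name:8s} {count:5d} {total:5d} {avg:8.2f} {var:9.2f}")
```

`statistics.variance` is the sample variance (divisor n - 1), which is what the Variance column reports.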
14. A worked example: interpretation of output

    Total Sum of Squares

    The variation among all the subjects in an experiment is measured by what is called the total sum of squares, or $SS_T$. $SS_T$ is the sum of the squared differences of each score from the mean of all the scores. Letting GM (standing for "grand mean") represent the mean of all scores, then

    $SS_T = \sum (X - GM)^2$

    where $GM = \sum X / N$ and $N$ is the total number of subjects in the experiment.
15. A worked example: interpretation of output

    For the example data:

    $N = 12$

    $GM = (3+5+3+5+2+2+4+4+2+1+3+2)/12 = ?$

    $SS_T = (3-3)^2+(5-3)^2+(3-3)^2+(5-3)^2 + (2-3)^2+(2-3)^2+(4-3)^2+(4-3)^2 + (2-3)^2+(1-3)^2+(3-3)^2+(2-3)^2 = ?$
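The two blanks on this slide can be checked numerically. This is a minimal sketch using the twelve scores in the order they appear in the GM formula; the variable names are illustrative.

```python
# All twelve scores: Aspirin, then Tylenol, then Placebo.
scores = [3, 5, 3, 5, 2, 2, 4, 4, 2, 1, 3, 2]

n_total = len(scores)                     # N
grand_mean = sum(scores) / n_total        # GM = sum(X) / N
ss_total = sum((x - grand_mean) ** 2 for x in scores)  # SST = sum((X - GM)^2)

print(n_total, grand_mean, ss_total)  # 12 3.0 18.0
```

The result matches the Total row of the ANOVA table (SS = 18).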
16. A worked example: interpretation of output

    Sum of Squares Between Groups

    The sum of squares due to differences between groups ($SS_B$) is computed according to the following formula:

    $SS_B = \sum_i n_i (M_i - GM)^2$

    where $n_i$ is the sample size of the ith group, $M_i$ is the mean of the ith group, and GM is the mean of all scores in all groups.
17. A worked example: interpretation of output

    If the sample sizes are equal then the formula can be simplified somewhat:

    $SS_B = n \sum_i (M_i - GM)^2$

    For the example data,

    $M_1 = (3+5+3+5)/4 = 4$

    $M_2 = (2+2+4+4)/4 = 3$

    $M_3 = (2+1+3+2)/4 = 2$

    $GM = 3$, $n = 4$

    $SS_B = 4[(4-3)^2 + (3-3)^2 + (2-3)^2] = ?$
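As a check on the blank above, the sketch below evaluates the general (unequal-n) form of the between-groups formula, which reduces to the simplified form when every group has the same size. The dictionary layout and names are assumptions made for illustration.

```python
groups = {
    "Aspirin": [3, 5, 3, 5],
    "Tylenol": [2, 2, 4, 4],
    "Placebo": [2, 1, 3, 2],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# SSB = sum over groups of n_i * (M_i - GM)^2
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

print(ss_between)  # 8.0
```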
18. A worked example: interpretation of output

    Sum of Squares Error (Sum of Squares within groups)

    The sum of squares error is the sum of the squared differences between the individual scores and their group means.

    $SSE = SSE_1 + SSE_2 + \dots + SSE_a$

    $SSE_1 = \sum (X - M_1)^2$; $SSE_2 = \sum (X - M_2)^2$; $SSE_a = \sum (X - M_a)^2$

    where $M_1$ is the mean of Group 1, $M_2$ is the mean of Group 2, and $M_a$ is the mean of Group a.

    Or you could simply say: $SS_{within} = SS_{total} - SS_{between}$
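Both routes to the within-groups sum of squares can be compared directly. A minimal sketch, with the total and between-groups values (18 and 8) taken from the ANOVA table and the helper name `sse_for_group` chosen for illustration:

```python
groups = {
    "Aspirin": [3, 5, 3, 5],
    "Tylenol": [2, 2, 4, 4],
    "Placebo": [2, 1, 3, 2],
}

def sse_for_group(scores):
    """Sum of squared deviations of each score from its own group mean."""
    group_mean = sum(scores) / len(scores)
    return sum((x - group_mean) ** 2 for x in scores)

# Route 1: add up the per-group error sums of squares.
sse_by_group = {name: sse_for_group(scores) for name, scores in groups.items()}
ss_within = sum(sse_by_group.values())

# Route 2: subtract SS between from SS total (values from the ANOVA table).
ss_within_shortcut = 18 - 8

print(sse_by_group, ss_within, ss_within_shortcut)
```

For this data the per-group values are 4, 4 and 2, so both routes give 10.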
19. A worked example: interpretation of output

    $df_{between} = a - 1 = 3 - 1 = 2$

    $df_{within} = N - a = 12 - 3 = 9$

    $df_{total} = N - 1 = 12 - 1 = 11$

    where $a$ is the number of groups and $N$ is the total number of subjects.

    $df_{total} = df_{between} + df_{within}$ (the "groups" and "error" degrees of freedom, respectively)
20. A worked example: interpretation of output

    Mean squares are estimates of variance and are computed by dividing the sum of squares by the degrees of freedom. The mean square for groups (4.00) was computed by dividing the sum of squares for groups (8.00) by the degrees of freedom for groups (2).
21. A worked example: interpretation of output

    The F ratio is computed by dividing the mean square for between groups by the mean square for within groups. In this example, F = 4.000/1.111 = 3.60
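The degrees of freedom, mean squares and F ratio from the last three slides chain together as follows. This is a sketch that starts from the sums of squares in the ANOVA table; the variable names are illustrative.

```python
# Inputs from the worked example.
ss_between, ss_within = 8.0, 10.0
a = 3         # number of groups
n_total = 12  # total number of subjects

# Degrees of freedom.
df_between = a - 1          # 2
df_within = n_total - a     # 9
df_total = n_total - 1      # 11

# Mean squares: sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F ratio: between-groups variance estimate over within-groups estimate.
f_ratio = ms_between / ms_within

print(df_between, df_within, df_total)
print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 2))
```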
22. A worked example: interpretation of output

    The probability value

    The probability value is the probability of obtaining an F as large as or larger than the one computed from the data, assuming that the null hypothesis is true. It can be computed from an F table. The df for groups (2) is used as the degrees of freedom in the numerator and the df for error (9) is used as the degrees of freedom in the denominator. The probability of an F with 2 and 9 df as large as or larger than 3.60 is 0.071.

    F crit is the highest value of F that can be obtained without rejecting the null hypothesis (obtained from F-test tables for 2 and 9 df).
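The table lookups can be cross-checked without a statistics package in this particular case. The sketch below relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, as they do here; it is not a general-purpose routine, and the function names are made up for illustration.

```python
def f_upper_tail_df1_2(f_value, df_within):
    """P(F >= f_value) for an F distribution with 2 and df_within df.

    Special case only: valid when the numerator df is exactly 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)

def f_critical_df1_2(alpha, df_within):
    """F value whose upper-tail probability is alpha (numerator df = 2 only)."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)

p_value = f_upper_tail_df1_2(3.6, 9)
f_crit = f_critical_df1_2(0.05, 9)

print(round(p_value, 3), round(f_crit, 2))
```

For designs with more than three groups the numerator df is no longer 2, so an F table or a statistics library would be needed instead.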
23. 4 (c) A worked example: interpretation of output

    Interpreting the Anova One Way test results

    | If                                              | Then                              |
    |-------------------------------------------------|-----------------------------------|
    | test statistic > critical value (i.e. F > Fcrit) | Reject the null hypothesis        |
    | test statistic < critical value (i.e. F < Fcrit) | Do not reject the null hypothesis |
    | p value < α                                     | Reject the null hypothesis        |
    | p value > α                                     | Do not reject the null hypothesis |

    Using the above table and the results from the example, is there an indication of a significant difference in the means?
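The decision rules can be written as a small helper and applied to the example output. A minimal sketch; the function name and the returned strings are illustrative choices, and the inputs are the rounded values from the Excel output.

```python
def interpret_anova(f_value, f_crit, p_value, alpha=0.05):
    """Apply both decision rules and return the two verdicts."""
    by_f = "reject H0" if f_value > f_crit else "do not reject H0"
    by_p = "reject H0" if p_value < alpha else "do not reject H0"
    return by_f, by_p

verdicts = interpret_anova(f_value=3.6, f_crit=4.26, p_value=0.071)
print(verdicts)  # ('do not reject H0', 'do not reject H0')
```

The two rules should always agree when F crit and the p-value come from the same F distribution and the same alpha, so a disagreement usually signals a data-entry or lookup error.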
24. 5 Introduction to Tests Supplementing a One-factor Between-Subjects ANOVA

    Unfortunately, when the analysis of variance is significant and the null hypothesis is rejected, the only valid inference that can be made is that at least one population mean is different from at least one other population mean. The analysis of variance does not reveal which population means differ from which others.

    Consequently, further analyses are usually conducted after a significant analysis of variance. These further analyses almost always involve conducting a series of significance tests.
25. Additional info - PCER (not in training)

    The probability that a single significance test will result in a Type I error is called the per-comparison error rate (PCER). The probability that at least one of the tests will result in a Type I error is called the experimentwise error rate (EER).

    Statisticians differ in their views of how strictly the EER must be controlled. Some statistical procedures provide strict control over the EER whereas others control it to a lesser extent. Naturally there is a trade-off between the Type I and Type II error rates. The more strictly the EER is controlled, the lower the power of the significance tests.
26. Additional info - PCER (not in training)

    When a series of significance tests is conducted, the experimentwise error rate (EER) is the probability that one or more of the significance tests results in a Type I error. If the comparisons are independent, then the experimentwise error rate is:

    $\alpha_{ew} = 1 - (1 - \alpha_{pc})^c$

    where $\alpha_{ew}$ is the experimentwise error rate, $\alpha_{pc}$ is the per-comparison error rate, and $c$ is the number of comparisons. For example, if 5 independent comparisons were each to be done at the .05 level, then the probability that at least one of them would result in a Type I error is $1 - (1 - .05)^5 = 0.226$.
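To see how quickly the experimentwise rate grows, the formula can be evaluated for several numbers of comparisons. This is a sketch; the function name is illustrative. Note one assumption stated loudly: pairwise comparisons among the same set of groups are not independent, so the figures in the loop are only a rough indication for that case.

```python
from math import comb

def experimentwise_error_rate(alpha_pc, comparisons):
    """EER for `comparisons` independent tests, each run at level alpha_pc."""
    return 1 - (1 - alpha_pc) ** comparisons

# The slide's example: 5 independent comparisons at the .05 level.
print(round(experimentwise_error_rate(0.05, 5), 3))  # 0.226

# Number of pairwise comparisons among k groups is k(k-1)/2.
# Assumption: independence is treated as an approximation here.
for k in (3, 4, 5, 6):
    c = comb(k, 2)
    print(k, c, round(experimentwise_error_rate(0.05, c), 3))
```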
27. Introduction to Tests Supplementing a One-factor Between-Subjects ANOVA

    The "Honestly Significant Difference" (HSD) test proposed by the statistician John Tukey is based on what is called the "studentised range distribution." To test all pairwise comparisons among means using the Tukey HSD, compute $t_s$ for each pair of means using the formula:

    $t_s = \dfrac{M_i - M_j}{\sqrt{MSE / n_h}}$

    where $M_i - M_j$ is the difference between the ith and jth means, $MSE$ is the mean square within, and $n_h$ is the harmonic mean of the sample sizes of groups i and j.
28. Introduction to Tests Supplementing a One-factor Between-Subjects ANOVA

    The critical value of $t_s$ is determined from the distribution of the studentised range. The number of means in the experiment is used in the determination of the critical value, and this critical value is used for all comparisons among means. Typically, the largest mean is compared with the smallest mean first. If that difference is not significant, no other comparisons will be significant either, so the computations for these comparisons can be skipped.
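For illustration only, the Tukey statistic can be computed for the drug example, even though its ANOVA was not significant and a follow-up test would not normally be run. The sketch below computes $t_s$ for each pair; it does not look up the critical value, which still has to come from a studentised range table. Function and variable names are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

means = {"Aspirin": 4.0, "Tylenol": 3.0, "Placebo": 2.0}
sizes = {"Aspirin": 4, "Tylenol": 4, "Placebo": 4}
mse = 10 / 9  # mean square within, from the ANOVA table

def harmonic_mean(values):
    """k divided by the sum of reciprocals of the k values."""
    return len(values) / sum(1 / v for v in values)

def tukey_ts(mean_i, mean_j, n_i, n_j, mse):
    """Studentised range statistic for one pair of group means."""
    n_h = harmonic_mean([n_i, n_j])
    return (mean_i - mean_j) / sqrt(mse / n_h)

# Compare every pair of groups; with equal sizes n_h is simply 4.
results = {
    (g1, g2): tukey_ts(means[g1], means[g2], sizes[g1], sizes[g2], mse)
    for g1, g2 in combinations(means, 2)
}

for pair, ts in results.items():
    print(pair, round(ts, 3))
```

Following the slide's advice, the largest difference (Aspirin vs Placebo) would be compared with the critical value first.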
29. Summary: ANOVA in Excel

    Anova: Single Factor

    This tool performs a simple analysis of variance on data for two or more samples. The analysis provides a test of the hypothesis that each sample is drawn from the same underlying probability distribution against the alternative hypothesis that underlying probability distributions are not the same for all samples. If there are only two samples, you can use the worksheet function TTEST. With more than two samples, there is no convenient generalization of TTEST, and the Single Factor Anova model can be called upon instead.
30. Summary: ANOVA in Excel

    Anova: Two-Factor with Replication

    This analysis tool is useful when data can be classified along two different dimensions. For example, in an experiment to measure the height of plants, the plants may be given different brands of fertilizer (for example, A, B, C) and might also be kept at different temperatures (for example, low, high). For each of the six possible pairs of {fertilizer, temperature}, we have an equal number of observations of plant height. Using this Anova tool, we can test:

    - Whether the heights of plants for the different fertilizer brands are drawn from the same underlying population. Temperatures are ignored for this analysis.
    - Whether the heights of plants for the different temperature levels are drawn from the same underlying population. Fertilizer brands are ignored for this analysis.
31. Summary: ANOVA in Excel

    - Whether, having accounted for the effects of differences between fertilizer brands found in the first bulleted point, and differences in temperatures found in the second bulleted point, the six samples representing all pairs of {fertilizer, temperature} values are drawn from the same population. The alternative hypothesis is that there are effects due to specific {fertilizer, temperature} pairs over and above the differences that are based on fertilizer alone or on temperature alone.
32. Summary: ANOVA in Excel

    Anova: Two-Factor Without Replication

    This analysis tool is useful when data is classified on two different dimensions, as in the Two-Factor case With Replication. However, for this tool it is assumed that there is only a single observation for each pair (for example, each {fertilizer, temperature} pair in the preceding example).
33. Summary

    - To compare 2 or more means in a single test we use ANOVA
    - The type of ANOVA test to use is decided by the number of FACTORS in the experiment
    - The ANOVA will only tell whether there is a significant difference and gives no information on which mean(s) are different
    - Further pairwise comparisons of the means are required to gain further information on which mean(s) are different
    - Pairwise testing of means can increase the probability of Type I errors