# Multiplicity, how to deal with the testing of more than one hypothesis.

Covers the body of statistics that deals with testing more than one hypothesis or comparing more than two groups.

### Multiplicity, how to deal with the testing of more than one hypothesis.

1. Multiplicity. Gaetan Lion, July 2013
2. Probability of making a Type I error* when using a t test with > 2 samples

   *A false positive: rejecting the null hypothesis when it is true.

   [Chart: probability of a Type I error (initial confidence level 95%) against the number of hypotheses, 1 to 10.]

   Confidence level: 95%. Unadjusted α value: 5%.

   Probability of a Type I error:

   | # of hypotheses | Bonferroni logic | Sidak logic |
   |---|---|---|
   | 1 | 0.05 | 0.05 |
   | 2 | 0.10 | 0.10 |
   | 3 | 0.15 | 0.14 |
   | 4 | 0.20 | 0.19 |
   | 5 | 0.25 | 0.23 |
   | 6 | 0.30 | 0.26 |
   | 7 | 0.35 | 0.30 |
   | 8 | 0.40 | 0.34 |
   | 9 | 0.45 | 0.37 |
   | 10 | 0.50 | 0.40 |
3. How to test > 2 samples

   Two basic steps:

   1. Choose a specific ANOVA method, given your testing framework: Between-Groups, Within-Groups, or Mixed ANOVA; or use a nonparametric equivalent if warranted. This step establishes whether your groups or samples differ overall.
   2. Decide in advance whether to conduct Post Hoc (after the fact) or Planned Comparison tests to figure out which specific groups are different.
4. The ANOVAs

   | ANOVA | Testing | What it compares | Observations |
   |---|---|---|---|
   | Between-Groups | Unpaired testing | Difference between independent groups | Single measures or observations |
   | Within-Groups | Paired testing | Difference between same groups before and after treatment | Repeated measures or observations |
   | Mixed | Mixed testing | Difference between independent groups before and after treatment | Repeated measures or observations |
5. ANOVA semantics

   A One-Way Between-Groups ANOVA is an ANOVA with independent groups, one independent variable, and one dependent variable. For example, the independent variable could be the students' Major and the dependent variable could be math proficiency.

   A Four-Way Between-Groups ANOVA using the same data would, in addition to Major, also look at Gender, Class (Freshman, Sophomore, …), and Ethnicity. You now have four independent variables.

   A Balanced ANOVA means that each group or sample is the same size (the same number of males and females, etc.). An Unbalanced ANOVA means that some of the samples differ in size.
6. Excel ANOVA(s) Add-in cryptic semantics

   “Factor” means the same as “Way.” They both mean Independent Variable.

   “With Replication” can be confused with “Repeated Measures,” which typically means “Within-Groups” or paired testing. “Without Replication” can be confused with “Single Measure,” which typically means “Between-Groups” or unpaired testing.

   In the Excel Add-in, “With Replication” means you have more than one data point per group or sample, which is almost always the case. “Without Replication” can be used for two very different situations: 1) a Two-Way Between-Groups ANOVA with a single observation per category; and 2) a One-Way Within-Groups ANOVA.

   | ANOVA method | Excel Add-in corresponding tool | Note |
   |---|---|---|
   | One-Way Between-Groups ANOVA | Anova: Single Factor | |
   | Two-Way Between-Groups ANOVA | Anova: Two-Factor With Replication | More than one observation per category (standard) |
   | Two-Way Between-Groups ANOVA | Anova: Two-Factor Without Replication | A single observation per category |
   | One-Way Within-Groups ANOVA | Anova: Two-Factor Without Replication | |
7. Post Hoc vs Planned Comparison tests

   | | Post Hoc test | Planned Comparison test |
   |---|---|---|
   | Purpose | Exploratory. You test the difference between all potential combinations of Groups. | Confirmation of a theory or hypothesis. You test only the Groups you expect to be different in a specific direction (greater, lower). |
   | Risk of Type I error | Very low. Very unlikely to generate a false positive (rejecting the null hypothesis when it is true). | Low. Not quite as conservative as a Post Hoc test, but conservative enough. |
   | Risk of Type II error | High. Not unlikely to generate a false negative (accepting the null hypothesis when it is false). This test lacks Power. | Lower risk of Type II error than a Post Hoc test. The test is more sensitive, more likely to uncover a difference. It has more Power. |
8. Hypothesis testing flow chart: multiple hypothesis testing, > 2 samples or groups

   PH means Post Hoc test. PC means Planned Comparison test.

   Research structure: are we testing different groups once, or are we testing the same group(s) at different times?

   | Testing | Description | Multiple hypothesis test (are the groups different?), Normal data | Not Normal data | Transition test to facilitate the Post Hoc test |
   |---|---|---|---|---|
   | Unpaired testing | Difference between independent groups (Between-Groups). Single measure or observation. | Between-Groups ANOVA | Kruskal-Wallis test | Unpaired t test (Normal), Mann-Whitney (not Normal) |
   | Paired testing | Difference between same group before and after treatment (Within-Groups). Repeated measures or observations. | Within-Groups ANOVA | Friedman test | Paired t test (Normal), Wilcoxon Sign Rank Test (not Normal) |
   | Mixed testing | Difference between independent groups before and after treatment (Mixed). Repeated measures or observations. | Mixed ANOVA | No nonparametric alternative | No Post Hoc test |

   Post Hoc tests (which group is different?): Tukey's HSD test (PH), Scheffe test (PH), REGWQ test (PH), Dunnett test (PC), Bonferroni test (PH), Sidak test (PH), Simple contrasts (PC), Repeated contrasts (PC).
9. Two-Way Between-Groups ANOVA example
10. Data format

    Two-Way Between-Groups ANOVA. Two independent variables: cowboy preference in movies, and gender. One dependent variable: intelligence score.

    For the Excel Add-in:

    | Cowboy | Male | Female |
    |---|---|---|
    | John Wayne | 71 | 66 |
    | | 76 | 64 |
    | | 84 | 66 |
    | | 72 | 47 |
    | | 68 | 66 |
    | Clint Eastwood | 65 | 73 |
    | | 53 | 80 |
    | | 70 | 81 |
    | | 46 | 88 |
    | | 53 | 72 |
    | None | 81 | 72 |
    | | 69 | 75 |
    | | 55 | 73 |
    | | 60 | 54 |
    | | 61 | 65 |

    For XLStat. XLStat treats this ANOVA as a linear regression with one dependent variable and two qualitative independent variables.

    | Y: Int. Score | X1: Cowboy | X2: Gender |
    |---|---|---|
    | 71 | J. Wayne | Male |
    | 76 | J. Wayne | Male |
    | 84 | J. Wayne | Male |
    | 72 | J. Wayne | Male |
    | 68 | J. Wayne | Male |
    | 66 | J. Wayne | Female |
    | 64 | J. Wayne | Female |
    | 66 | J. Wayne | Female |
    | 47 | J. Wayne | Female |
    | 66 | J. Wayne | Female |
    | 65 | C. Eastwood | Male |
    | 53 | C. Eastwood | Male |
    | 70 | C. Eastwood | Male |
    | 46 | C. Eastwood | Male |
    | 53 | C. Eastwood | Male |
    | 73 | C. Eastwood | Female |
    | 80 | C. Eastwood | Female |
    | 81 | C. Eastwood | Female |
    | 88 | C. Eastwood | Female |
    | 72 | C. Eastwood | Female |
    | 81 | None | Male |
    | 69 | None | Male |
    | 55 | None | Male |
    | 60 | None | Male |
    | 61 | None | Male |
    | 72 | None | Female |
    | 75 | None | Female |
    | 73 | None | Female |
    | 54 | None | Female |
    | 65 | None | Female |
11. Excel long hand

    Between-Sample Variability (BSV):

    | Group | Sample size | Average | Total avg. | Differ.^2 |
    |---|---|---|---|---|
    | J. Wayne - Male | 5 | 74.2 | 67.5 | 44.4 |
    | J. Wayne - Female | 5 | 61.8 | 67.5 | 32.9 |
    | C. Eastwood - Male | 5 | 57.4 | 67.5 | 102.7 |
    | C. Eastwood - Female | 5 | 78.8 | 67.5 | 126.9 |
    | None - Male | 5 | 65.2 | 67.5 | 5.4 |
    | None - Female | 5 | 67.8 | 67.5 | 0.1 |
    | J. Wayne | 10 | 68.0 | 67.5 | 0.2 |
    | C. Eastwood | 10 | 68.1 | 67.5 | 0.3 |
    | None | 10 | 66.5 | 67.5 | 1.1 |
    | Male | 15 | 65.6 | 67.5 | 3.7 |
    | Female | 15 | 69.5 | 67.5 | 3.7 |

    | Source | SS | DF (k - 1) | MS |
    |---|---|---|---|
    | Corrected Model | 1,562.3 | 5 | 312.5 |
    | Cowboy | 16 | 2 | 8.0 |
    | Gender | 112.1 | 1 | 112.1 |

    Within-Sample Variability (WSV):

    | Group | Sample size - 1 | STDEV | Variance |
    |---|---|---|---|
    | J. Wayne - Male | 4 | 6.2 | 38.2 |
    | J. Wayne - Female | 4 | 8.3 | 69.2 |
    | C. Eastwood - Male | 4 | 9.8 | 96.3 |
    | C. Eastwood - Female | 4 | 6.5 | 42.7 |
    | None - Male | 4 | 10.2 | 103.2 |
    | None - Female | 4 | 8.6 | 73.7 |

    SS Within: 1,693.2. DF (n - k): 24. MS Within: 70.5.

    Between-Sample Variability/Within-Sample Variability output (BSV/WSV):

    | Source | SS | df | MS | F | Sign. |
    |---|---|---|---|---|---|
    | Model | 1,562.3 | 5 | 312.5 | 4.4 | 0.005 |
    | Cowboy | 16.1 | 2 | 8.0 | 0.1 | 0.893 |
    | Gender | 112.1 | 1 | 112.1 | 1.6 | 0.220 |
    | Cowboy*Gender | 1,434.1 | 2 | 717.0 | 10.2 | 0.001 |
    | Error/Residual | 1,693.2 | 24 | 70.5 | | |
    | Corrected Total | 3,255.5 | 29 | | | |
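12. Excel Add-in. Anova: Two-Factor With Replication

    SUMMARY

    | Group | Statistic | Male | Female | Total |
    |---|---|---|---|---|
    | J. Wayne | Count | 5 | 5 | 10 |
    | | Sum | 371 | 309 | 680 |
    | | Average | 74.2 | 61.8 | 68 |
    | | Variance | 38.2 | 69.2 | 90.4 |
    | C. Eastwood | Count | 5 | 5 | 10 |
    | | Sum | 287 | 394 | 681 |
    | | Average | 57.4 | 78.8 | 68.1 |
    | | Variance | 96.3 | 42.7 | 189.0 |
    | None | Count | 5 | 5 | 10 |
    | | Sum | 326 | 339 | 665 |
    | | Average | 65.2 | 67.8 | 66.5 |
    | | Variance | 103.2 | 73.7 | 80.5 |
    | Total | Count | 15 | 15 | |
    | | Sum | 984 | 1042 | |
    | | Average | 65.6 | 69.5 | |
    | | Variance | 118.4 | 106.1 | |

    ANOVA

    | Source of Variation | Corresponds to | SS | df | MS | F | P-value | F crit |
    |---|---|---|---|---|---|---|---|
    | Sample | Cowboy | 16.1 | 2 | 8.0 | 0.11 | 0.893 | 3.4 |
    | Columns | Gender | 112.1 | 1 | 112.1 | 1.59 | 0.220 | 4.3 |
    | Interaction | | 1434.1 | 2 | 717.0 | 10.16 | 0.001 | 3.4 |
    | Within | Error/Residual | 1693.2 | 24 | 70.6 | | | |
    | Total | Corrected Total/Model | 3255.5 | 29 | | | | |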
13. XLStat ANOVA

    [Chart: Pred(I Score) / I Score. Scatter plot of predicted against actual intelligence score; both axes run from 45 to 90.]

    Analysis of variance:

    | Source | DF | SS | MS | F | Pr > F |
    |---|---|---|---|---|---|
    | Model | 5 | 1562.3 | 312.5 | 4.43 | 0.005 |
    | Error | 24 | 1693.2 | 70.5 | | |
    | Corrected Total | 29 | 3255.5 | | | |

    Computed against model Y=Mean(Y).

    Type I Sum of Squares analysis:

    | Source | DF | SS | MS | F | Pr > F |
    |---|---|---|---|---|---|
    | Cowboy | 2 | 16 | 8.0 | 0.1 | 0.893 |
    | Gender | 1 | 112 | 112.1 | 1.6 | 0.220 |
    | Cowboy*Gender | 2 | 1434 | 717.0 | 10.2 | 0.001 |
14. Post Hoc and Planned Comparison tests
15. Tukey’s HSD (PH) vs Dunnett test (PC) for Cowboys

    | | Tukey's Honestly Significant Difference (HSD) test. Post Hoc test | Dunnett test. Planned Comparison |
    |---|---|---|
    | MS Within | 70.5 | 70.5 |
    | n (Number per treatment/Number of Groups) | 10 | 10 |
    | Standard Error | 2.66 = SQRT(MS within × (1/n)) | 3.76 = SQRT(2 × MS within / n) |
    | df within | 24 | 24 |
    | k (# of groups) | 3 | 3 |
    | Critical value at α = 0.05, 2-tail, from table (B) | 3.53 | 2.35 |

    Average intelligence score: C. Eastwood 68.1, J. Wayne 68.0, None 66.5.

    In both tests, A is the standardized difference (difference divided by the Standard Error) and the estimated Z value is A/B × 1.96.

    | Comparison | Differ. | Tukey: Standard. Difference (A) | Tukey: Alpha 0.05 | Tukey: Estimated Z value | Tukey: Estimated 2-tail P val. | Dunnett: Standard. Difference (A) | Dunnett: Alpha 0.05 | Dunnett: Estimated Z value | Dunnett: Est. 2-tail P val. |
    |---|---|---|---|---|---|---|---|---|---|
    | C. East vs None | 1.60 | 0.60 | Not sign. | 0.33 | 0.74 | 0.43 | Not sign. | 0.36 | 0.72 |
    | C. East vs J. Wayne | 0.10 | 0.04 | Not sign. | 0.02 | 0.98 | 0.03 | Not sign. | 0.02 | 0.98 |
    | J. Wayne vs None | 1.50 | 0.56 | Not sign. | 0.31 | 0.75 | 0.40 | Not sign. | 0.33 | 0.74 |
16. Tukey’s test (PH) (for Cowboys)

    Tukey's Honestly Significant Difference (HSD) test. Post Hoc test.

    | Quantity | Value |
    |---|---|
    | MS Within | 70.5 |
    | n (Number per treatment/Number of Groups) | 10 |
    | Standard Error | 2.66 = SQRT(MS within × (1/n)) |
    | df within | 24 |
    | k (# of groups) | 3 |
    | Critical value at α = 0.05, 2-tail, from table (B) | 3.53 |

    Average intelligence score: C. Eastwood 68.1, J. Wayne 68.0, None 66.5.

    | Comparison | Differ. | Standard. Difference (A) | Alpha 0.05 | Estimated Z value (A/B × 1.96) | Estimated 2-tail P val. |
    |---|---|---|---|---|---|
    | C. East vs None | 1.60 | 0.60 | Not sign. | 0.33 | 0.74 |
    | C. East vs J. Wayne | 0.10 | 0.04 | Not sign. | 0.02 | 0.98 |
    | J. Wayne vs None | 1.50 | 0.56 | Not sign. | 0.31 | 0.75 |
17. Dunnett test (PC) (for Cowboys)

    Dunnett test. Planned Comparison.

    | Quantity | Value |
    |---|---|
    | MS Within | 70.5 |
    | n | 10 |
    | Standard Error | 3.76 = SQRT(2 × MS within / n) |
    | df within | 24 |
    | k (# of groups) | 3 |
    | Critical value at α = 0.05, 2-tail, from table (B) | 2.35 |

    Average intelligence score: C. Eastwood 68.1, J. Wayne 68.0, None 66.5.

    | Comparison | Differ. | Standard. Difference (A) | Alpha 0.05 | Estimated Z value (A/B × 1.96) | Est. 2-tail P val. |
    |---|---|---|---|---|---|
    | C. East vs None | 1.60 | 0.43 | Not sign. | 0.36 | 0.72 |
    | C. East vs J. Wayne | 0.10 | 0.03 | Not sign. | 0.02 | 0.98 |
    | J. Wayne vs None | 1.50 | 0.40 | Not sign. | 0.33 | 0.74 |
18. Comparing Dunnett vs Tukey’s across various mean difference levels

    | Test | Standard error | α 5% 2-tail critical value |
    |---|---|---|
    | Dunnett's | 3.76 | 2.35 |
    | Tukey's | 2.66 | 3.53 |

    Z: 1.96.

    | Mean difference | Tukey: Standard. differen. | Tukey: % of 2-tail Critical val. | Tukey: 2-tail Z equival. | Tukey: 2-tail P val est. | Dunnett: Standard. differen. | Dunnett: % of 2-tail Critical val. | Dunnett: 2-tail Z equival. | Dunnett: 2-tail P val est. |
    |---|---|---|---|---|---|---|---|---|
    | 0.5 | 0.19 | 5.3% | 0.10 | 0.92 | 0.13 | 5.7% | 0.11 | 0.91 |
    | 1.0 | 0.38 | 10.7% | 0.21 | 0.83 | 0.27 | 11.3% | 0.22 | 0.82 |
    | 2.0 | 0.75 | 21.3% | 0.42 | 0.68 | 0.53 | 22.7% | 0.44 | 0.66 |
    | 3.0 | 1.13 | 32.0% | 0.63 | 0.53 | 0.80 | 34.0% | 0.67 | 0.51 |
    | 4.0 | 1.51 | 42.7% | 0.84 | 0.40 | 1.06 | 45.3% | 0.89 | 0.37 |
    | 5.0 | 1.88 | 53.3% | 1.05 | 0.30 | 1.33 | 56.6% | 1.11 | 0.27 |
    | 6.0 | 2.26 | 64.0% | 1.25 | 0.21 | 1.60 | 68.0% | 1.33 | 0.18 |
    | 7.0 | 2.64 | 74.7% | 1.46 | 0.14 | 1.86 | 79.3% | 1.55 | 0.12 |
    | 8.0 | 3.01 | 85.3% | 1.67 | 0.09 | 2.13 | 90.6% | 1.78 | 0.08 |
    | 9.0 | 3.39 | 96.0% | 1.88 | 0.06 | 2.40 | 102.0% | 2.00 | 0.05 |
19. Dunnett vs Tukey’s visual comparison

    [Chart: Dunnett vs Tukey 2-tail p value. 2-tail p value (0.0 to 1.0) against mean difference (0.5 to 9.0), one line per test.]

    With a 2-tail test, the Dunnett test is only marginally more sensitive than Tukey’s test: it has slightly more Power, so it is slightly more likely to find a statistically significant difference. However, with Dunnett, if warranted, you can also use a 1-tail test, which would make a huge difference. With Tukey, you can’t do that.
20. Bonferroni vs Sidak test adjusted α value

    Multiple hypothesis testing adjustments. Adjusted α value corresponding to α = 5%:

    | # of hypotheses | Bonferroni | Sidak |
    |---|---|---|
    | 1 | 5.00% | 5.00% |
    | 2 | 2.50% | 2.53% |
    | 3 | 1.67% | 1.70% |
    | 4 | 1.25% | 1.27% |
    | 5 | 1.00% | 1.02% |
    | 6 | 0.83% | 0.85% |
    | 7 | 0.71% | 0.73% |
    | 8 | 0.63% | 0.64% |
    | 9 | 0.56% | 0.57% |
    | 10 | 0.50% | 0.51% |

    With $k$ the number of hypotheses, Bonferroni: $\alpha / k$. Sidak: $1 - (1 - \alpha)^{1/k}$.

    These tests consist of adjusting the relevant Alpha threshold (e.g. 5%) for the number of hypotheses you are testing. Bonferroni simply divides the Alpha value by the number of hypotheses. Sidak uses a compounding formula that is technically more accurate but makes no material difference in this situation.
21. What would be the qualifying α value?

    At what familywise level α would a single hypothesis qualify (Sidak logic)? Columns give the original unpaired t test p value.

    | # of hypotheses | 0.5% | 1% | 2.5% | 5% | 10% | 15% |
    |---|---|---|---|---|---|---|
    | 1 | 0.01 | 0.01 | 0.03 | 0.05 | 0.10 | 0.15 |
    | 2 | 0.01 | 0.02 | 0.05 | 0.10 | 0.19 | 0.28 |
    | 3 | 0.01 | 0.03 | 0.07 | 0.14 | 0.27 | 0.39 |
    | 4 | 0.02 | 0.04 | 0.10 | 0.19 | 0.34 | 0.48 |
    | 5 | 0.02 | 0.05 | 0.12 | 0.23 | 0.41 | 0.56 |
    | 6 | 0.03 | 0.06 | 0.14 | 0.26 | 0.47 | 0.62 |
    | 7 | 0.03 | 0.07 | 0.16 | 0.30 | 0.52 | 0.68 |
    | 8 | 0.04 | 0.08 | 0.18 | 0.34 | 0.57 | 0.73 |
    | 9 | 0.04 | 0.09 | 0.20 | 0.37 | 0.61 | 0.77 |
    | 10 | 0.05 | 0.10 | 0.22 | 0.40 | 0.65 | 0.80 |
22. A Radical Idea: Skipping ANOVA

    [Flow chart: the hypothesis testing flow chart from slide 8, shown again.]
23. Streamlined Testing

    Hypothesis testing flow chart: multiple hypothesis testing, > 2 samples or groups.

    Research structure: are we testing different groups once, or are we testing the same group(s) at different times?

    | Testing | Description | Normal data | Not Normal data |
    |---|---|---|---|
    | Unpaired testing | Difference between independent groups (Between-Groups). Single measure or observation. | Unpaired t test | Mann-Whitney |
    | Paired testing | Difference between same group before and after treatment (Within-Groups). Repeated measures or observations. | Paired t test | Wilcoxon Sign Rank test |

    Post Hoc test (which group is different?): Bonferroni or Sidak test (PH).
24. Disadvantages of Streamlined Testing

    • You can’t run Tukey’s (PH) and Dunnett (PC). Those tests are not just adjustments to the P value, and may be superior in certain circumstances;

    • You don’t have access to any Planned Comparison tests, which are more sensitive (more Power) and allow you to use a 1-tail P value when warranted;

    • ANOVA gives you valuable information about the different independent variables and their interaction.

    Between-Sample Variability/Within-Sample Variability output (BSV/WSV):

    | Source | SS | df | MS | F | Sign. |
    |---|---|---|---|---|---|
    | Model | 1,562.3 | 5 | 312.45 | 4.43 | 0.005 |
    | Cowboy | 16.1 | 2 | 8.03 | 0.11 | 0.893 |
    | Gender | 112.1 | 1 | 112.13 | 1.59 | 0.220 |
    | Cowboy*Gender | 1,434.1 | 2 | 717.03 | 10.16 | 0.001 |
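
The multiplicity formulas on slides 2, 20 and 21 and the two-way ANOVA sums of squares on slides 11 to 13 can be rechecked with a short script. The following is a minimal sketch in plain Python (standard library only), not the author's spreadsheet method. The function names, the `scores` dictionary layout, and the `factor_a`/`factor_b` keys are illustrative assumptions. The ANOVA part assumes a balanced design, as in the cowboy example, and it omits p-values because the standard library has no F distribution.

```python
from statistics import mean

# Part 1: multiplicity formulas (slides 2, 20 and 21).
# Function names are illustrative; they do not come from the slides.

def familywise_error_bonferroni(alpha, k):
    """Bonferroni logic: k * alpha bounds P(at least one false positive)."""
    return min(1.0, k * alpha)

def familywise_error_sidak(alpha, k):
    """Sidak logic: 1 - (1 - alpha)^k, assuming k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

def adjusted_alpha_bonferroni(alpha, k):
    """Per-test threshold alpha / k."""
    return alpha / k

def adjusted_alpha_sidak(alpha, k):
    """Per-test threshold 1 - (1 - alpha)^(1/k)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)

# Part 2: two-way between-groups ANOVA on the slide 10 data.
# Keys are (cowboy, gender); values are the five intelligence scores per cell.
scores = {
    ("J. Wayne", "Male"): [71, 76, 84, 72, 68],
    ("J. Wayne", "Female"): [66, 64, 66, 47, 66],
    ("C. Eastwood", "Male"): [65, 53, 70, 46, 53],
    ("C. Eastwood", "Female"): [73, 80, 81, 88, 72],
    ("None", "Male"): [81, 69, 55, 60, 61],
    ("None", "Female"): [72, 75, 73, 54, 65],
}

def two_way_anova(cells):
    """SS, df, MS and F per source. Assumes a balanced design (equal cell sizes)."""
    values = [v for obs in cells.values() for v in obs]
    grand = mean(values)

    def level_ss(position):
        # Between-level sum of squares for the factor at this key position.
        levels = {key[position] for key in cells}
        total = 0.0
        for level in levels:
            pooled = [v for key, obs in cells.items()
                      if key[position] == level for v in obs]
            total += len(pooled) * (mean(pooled) - grand) ** 2
        return total, len(levels) - 1

    ss_a, df_a = level_ss(0)
    ss_b, df_b = level_ss(1)
    # Between-cell SS (the "Model" row); the interaction is what remains
    # after removing both main effects, which holds for balanced data only.
    ss_cells = sum(len(obs) * (mean(obs) - grand) ** 2 for obs in cells.values())
    ss_within = sum((v - mean(obs)) ** 2 for obs in cells.values() for v in obs)
    df_within = len(values) - len(cells)
    ms_within = ss_within / df_within

    def row(ss, df):
        return {"ss": ss, "df": df, "ms": ss / df, "f": (ss / df) / ms_within}

    return {
        "factor_a": row(ss_a, df_a),                      # Cowboy
        "factor_b": row(ss_b, df_b),                      # Gender
        "interaction": row(ss_cells - ss_a - ss_b, df_a * df_b),
        "within": {"ss": ss_within, "df": df_within, "ms": ms_within},
    }

for k in (1, 2, 5, 10):
    print(k, round(familywise_error_sidak(0.05, k), 2),
          round(adjusted_alpha_sidak(0.05, k), 4))

for source, stats in two_way_anova(scores).items():
    print(source, {name: round(value, 2) for name, value in stats.items()})
```

Run as a script, this prints the Sidak familywise error and adjusted α for a few values of k, followed by one line per ANOVA source, which can be compared against the tables above. For unbalanced data or for p-values, a dedicated statistics package would be the more appropriate tool.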