1
Multiplicity
Gaetan Lion
July 2013
2
Probability of Making a Type I error*
when using a t test with > 2 Samples
*A false positive. Rejecting the null hypothesis when it is true.
[Chart: Probability of Type I Error (initial confidence level 95%), vertical axis from 0% to 45%, plotted against the # of hypotheses (1 to 10).]
Confidence level 95%
Unadjusted α value 5%
Probability of Type I Error
# of hypotheses   Bonferroni logic   Sidak logic   Sidak logic
1                 0.05               0.05          0.05
2                 0.10               0.10          0.10
3                 0.15               0.14          0.14
4                 0.20               0.19          0.19
5                 0.25               0.23          0.23
6                 0.30               0.26          0.26
7                 0.35               0.30          0.30
8                 0.40               0.34          0.34
9                 0.45               0.37          0.37
10                0.50               0.40          0.40
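The table above can be reproduced with a short sketch. It assumes the Bonferroni column is the additive bound (number of hypotheses times α) and the Sidak column is the compounding formula 1 - (1 - α)^m for independent tests; the function names are illustrative and not from the slides.

```python
def bonferroni_bound(alpha: float, m: int) -> float:
    """Additive upper bound on the familywise error rate: m * alpha, capped at 1."""
    return min(1.0, m * alpha)


def sidak_rate(alpha: float, m: int) -> float:
    """Familywise error rate for m independent tests, each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    # One row per number of hypotheses, as in the slide's table.
    for m in range(1, 11):
        print(f"{m:2d}  {bonferroni_bound(0.05, m):.2f}  {sidak_rate(0.05, m):.2f}")
```

The two columns stay close for a small number of hypotheses and drift apart as the number grows, because the additive bound ignores the chance of two or more false positives occurring together.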
3
How to test > 2 Samples
Two Basic Steps:
1) Choose a specific ANOVA method, given your testing
framework: Between-Groups, Within-Groups, or Mixed
ANOVA… or use a nonparametric equivalent if
warranted. This is to figure out whether your groups or
samples are different overall.
2) Decide in advance whether to conduct Post Hoc (after
the fact) or Planned Comparison tests to figure out which
specific groups are different.
4
The ANOVAs
Between-Groups
Unpaired testing. Difference between independent groups.
Single measures or observations.
Within-Groups
Paired testing. Difference between same groups before
and after treatment. Repeated measures or observations.
Mixed
Mixed testing. Difference between independent groups
before and after treatment. Repeated measures or
observations.
5
ANOVA semantics
A One-Way Between-Groups ANOVA is an ANOVA with independent groups, one independent variable, and one dependent variable. The independent variable could be type of student by Major and the dependent variable could be math proficiency.
A Four-Way Between-Groups ANOVA uses the same data but, in addition to Major, also looks at Gender, Class (Freshman, Sophomore, …), and Ethnicity. You now have four independent variables.
A Balanced ANOVA means that each group or sample is of the same size (same number of males and females, etc.). An Unbalanced ANOVA means that some of the samples are of different sizes.
6
Excel ANOVA(s) Add-in Cryptic Semantics
“Factor” means the same as “Way.” They both mean Independent Variable.
“With Replication” can be confused with “Repeated Measures,” which typically means “Within Groups” or paired testing.
“Without Replication” can be confused with “Single Measure,” which typically means “Between Groups” or unpaired testing.
In the Excel Add-in, “With Replication” means you have more than one data point per group or sample, which is almost always the case.
“Without Replication” in the Excel Add-in can be used for two very different situations:
1) Two-Way Between-Groups ANOVA with a single observation per category; and
2) One-Way Within-Groups ANOVA.
ANOVA method                                                  Excel Add-in corresponding tool
One-Way Between-Groups ANOVA                                  Anova: Single Factor
Two-Way Between-Groups ANOVA
  (more than one observation per category, standard)          Anova: Two-Factor With Replication
Two-Way Between-Groups ANOVA
  (a single observation per category)                         Anova: Two-Factor Without Replication
One-Way Within-Groups ANOVA                                   Anova: Two-Factor Without Replication
7
Post Hoc vs Planned Comparison Tests
Post Hoc test
  Purpose: Exploratory. You test the difference between all potential combinations of Groups.
  Risk of Type I error: Very low. Very unlikely to generate a false positive (reject the null hypothesis when it is true).
  Risk of Type II error: High. Not unlikely to generate a false negative (accept the null hypothesis when it is false). This test lacks Power.

Planned Comparison test
  Purpose: Confirmation of theory or hypothesis. You test only the Groups you expect to be different in a specific direction (greater, lower).
  Risk of Type I error: Low. Not quite as conservative as a Post Hoc test, but conservative enough.
  Risk of Type II error: Lower than for a Post Hoc test. The test is more sensitive, more likely to uncover a difference. It has more Power.
8
HYPOTHESIS TESTING FLOW CHART
Multiple hypothesis testing. > 2 Samples or Groups
PH means Post Hoc test
PC means Planned Comparison test

Research structure
  Are we testing different groups once?
  Are we testing the same group(s) at different times?
  Unpaired testing: Difference between independent groups (Between-Groups). Single measure or observation.
  Paired testing: Difference between same group before and after treatment (Within-Groups). Repeated measures or observations.
  Mixed testing: Difference between independent groups before and after treatment (Mixed). Repeated measures or observations.

Multiple hypothesis test: Are the groups different?
  Unpaired testing. Normal: Between-Groups ANOVA. Not normal: Kruskal-Wallis test.
  Paired testing. Normal: Within-Groups ANOVA. Not normal: Friedman test.
  Mixed testing. Normal: Mixed ANOVA. Not normal: No nonparametric alternative.

Transition test, to facilitate the Post Hoc test
  Unpaired testing. Normal: Unpaired t test. Not normal: Mann-Whitney.
  Paired testing. Normal: Paired t test. Not normal: Wilcoxon Sign Rank Test.

Post Hoc test: Which group is different?
  Tukey's HSD test (PH)
  Scheffe test (PH)
  REGWQ test (PH)
  Dunnett test (PC)
  Bonferroni test (PH)
  Sidak test (PH)
  Simple contrasts (PC)
  Repeated contrasts (PC)
  Mixed ANOVA: No Post Hoc test
9
Two-Way Between-Groups
ANOVA example
10
Data Format
For Excel Add-in
Y X1 X2
Int. Score Cowboy Gender
71 J. Wayne Male
76 J. Wayne Male
84 J. Wayne Male
72 J. Wayne Male
68 J. Wayne Male
66 J. Wayne Female
64 J. Wayne Female
66 J. Wayne Female
47 J. Wayne Female
66 J. Wayne Female
65 C. Eastwood Male
53 C. Eastwood Male
70 C. Eastwood Male
46 C. Eastwood Male
53 C. Eastwood Male
73 C. Eastwood Female
80 C. Eastwood Female
81 C. Eastwood Female
88 C. Eastwood Female
72 C. Eastwood Female
81 None Male
69 None Male
55 None Male
60 None Male
61 None Male
72 None Female
75 None Female
73 None Female
54 None Female
65 None Female
For XLStat
XLStat treats this
ANOVA as a linear
regression with one
dependent variable
and two qualitative
independent
variables.
Two-Way Between-Groups ANOVA
Two independent variables: Cowboy preference in movies, Gender
One dependent variable: Intelligence score
Male Female
John Wayne 71 66
76 64
84 66
72 47
68 66
Clint Eastwood 65 73
53 80
70 81
46 88
53 72
None 81 72
69 75
55 73
60 54
61 65
11
Excel Long Hand
Between-Sample Variability (BSV)
(Columns: Sample size, Average, Total Avg., Differ.^2)
J. Wayne - Male 5 74.2 67.5 44.4
J. Wayne - Female 5 61.8 67.5 32.9
C. Eastwood - Male 5 57.4 67.5 102.7
C. Eastwood - Female 5 78.8 67.5 126.9
None - Male 5 65.2 67.5 5.4
None - Female 5 67.8 67.5 0.1
J. Wayne 10 68.0 67.5 0.2
C. Eastwood 10 68.1 67.5 0.3
None 10 66.5 67.5 1.1
Male 15 65.6 67.5 3.7
Female 15 69.5 67.5 3.7
SS DF (k - 1) MS
Corrected Model 1,562.3 5 312.5
Cowboy 16 2 8.0
Gender 112.1 1 112.1
Within-Sample Variability (WSV)
(Columns: Sample size - 1, STDEV, Variance)
J. Wayne - Male 4 6.2 38.2
J. Wayne - Female 4 8.3 69.2
C. Eastwood - Male 4 9.8 96.3
C. Eastwood - Female 4 6.5 42.7
None - Male 4 10.2 103.2
None - Female 4 8.6 73.7
SS Within 1,693.2
DF (n - k) 24
MS Within 70.5
Between-Sample Variability/Within-Sample Variability Output
BSV/WSV
Source SS df MS F Sign.
Model 1,562.3 5 312.5 4.4 0.005
Cowboy 16.1 2 8.0 0.1 0.893
Gender 112.1 1 112.1 1.6 0.220
Cowboy*Gender 1,434.1 2 717.0 10.2 0.001
Error/Residual 1,693.2 24 70.5
Corrected Total 3,255.5 29
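The long-hand calculation above can be sketched in plain Python. This is a minimal illustration that assumes a balanced design (the interaction sum of squares is obtained by subtraction, which is only valid when every cell has the same size); the function name `two_way_anova` and the dictionary layout are assumptions for this example, not part of the original workbook. P values are left out because they need an F distribution that the standard library does not provide.

```python
from statistics import mean

# Intelligence scores from the slides, keyed by (cowboy, gender).
cells = {
    ("J. Wayne", "Male"): [71, 76, 84, 72, 68],
    ("J. Wayne", "Female"): [66, 64, 66, 47, 66],
    ("C. Eastwood", "Male"): [65, 53, 70, 46, 53],
    ("C. Eastwood", "Female"): [73, 80, 81, 88, 72],
    ("None", "Male"): [81, 69, 55, 60, 61],
    ("None", "Female"): [72, 75, 73, 54, 65],
}


def two_way_anova(cells):
    """Return (SS, df, F) dictionaries for a balanced two-way design."""
    all_scores = [y for ys in cells.values() for y in ys]
    grand = mean(all_scores)
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})

    def factor_ss(index, levels):
        # Between-sample variability for one factor:
        # sum over levels of n_level * (level mean - grand mean)^2
        total = 0.0
        for level in levels:
            ys = [y for key, vals in cells.items() if key[index] == level for y in vals]
            total += len(ys) * (mean(ys) - grand) ** 2
        return total

    ss_a = factor_ss(0, levels_a)
    ss_b = factor_ss(1, levels_b)
    # "Corrected Model": variability of the cell means around the grand mean.
    ss_model = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in cells.values())
    ss_inter = ss_model - ss_a - ss_b  # valid for balanced designs only
    # Within-sample variability: squared deviations from each cell's own mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)

    ss = {"A": ss_a, "B": ss_b, "AxB": ss_inter, "within": ss_within}
    df = {"A": len(levels_a) - 1, "B": len(levels_b) - 1}
    df["AxB"] = df["A"] * df["B"]
    df["within"] = len(all_scores) - len(cells)
    ms_within = ss_within / df["within"]
    f = {k: (ss[k] / df[k]) / ms_within for k in ("A", "B", "AxB")}
    return ss, df, f


if __name__ == "__main__":
    ss, df, f = two_way_anova(cells)
    for source in ("A", "B", "AxB"):
        print(source, round(ss[source], 1), df[source], round(f[source], 2))
    print("within", round(ss["within"], 1), df["within"])
```

Here factor A is Cowboy and factor B is Gender, so the printed rows correspond to the Cowboy, Gender, Cowboy*Gender and Error/Residual lines of the output table.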
12
Excel Add-in
Anova: Two-Factor With Replication
SUMMARY Male Female Total
J. Wayne
Count 5 5 10
Sum 371 309 680
Average 74.2 61.8 68
Variance 38.2 69.2 90.4
C. Eastwood
Count 5 5 10
Sum 287 394 681
Average 57.4 78.8 68.1
Variance 96.3 42.7 189.0
None
Count 5 5 10
Sum 326 339 665
Average 65.2 67.8 66.5
Variance 103.2 73.7 80.5
Total
Count 15 15
Sum 984 1042
Average 65.6 69.5
Variance 118.4 106.1
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 16.1 2 8.0 0.11 0.893 3.4
Columns 112.1 1 112.1 1.59 0.220 4.3
Interaction 1434.1 2 717.0 10.16 0.001 3.4
Within 1693.2 24 70.6
Total 3255.5 29
Row labels in the Excel output: Sample = Cowboy, Columns = Gender, Within = Error/Residual, Total = Corrected Total/Model.
13
XLStat ANOVA
[Scatter plot: Pred(I Score) / I Score. Horizontal axis: Pred(I Score), 45 to 90. Vertical axis: I Score, 45 to 90.]
Analysis of variance:
Source DF SS MS F Pr > F
Model 5 1562.3 312.5 4.43 0.005
Error 24 1693.2 70.5
Corrected Total 29 3255.5
Computed against model Y=Mean(Y)
Type I Sum of Squares analysis:
Source DF SS MS F Pr > F
Cowboy 2 16 8.0 0.1 0.893
Gender 1 112 112.1 1.6 0.220
Cowboy*Gender 2 1434 717.0 10.2 0.001
14
Post Hoc and
Planned Comparison tests
15
Tukey’s HSD (PH) vs Dunnett test (PC)
for Cowboys
                                   Tukey's HSD test (Post Hoc)     Dunnett test (Planned Comparison)
MS Within                          70.5                            70.5
n                                  10                              10
Standard Error                     2.66  SQRT(MS within/n)         3.76  SQRT(2 MS within/n)
Critical value @ α 0.05, 2-tail
  df within                        24                              24
  k # groups                       3                               3
  α 0.05 from table (B)            3.53                            2.35

Average intelligence score: C. Eastwood 68.1, J. Wayne 68.0, None 66.5

A = Standardized Difference. Estimated Z value = A/B*1.96.

Tukey's HSD      Differ.   A      α 0.05      Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.60   Not sign.   0.33                0.74
C. East vs J. Wayne   0.10      0.04   Not sign.   0.02                0.98
J. Wayne vs None      1.50      0.56   Not sign.   0.31                0.75

Dunnett          Differ.   A      α 0.05      Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.43   Not sign.   0.36                0.72
C. East vs J. Wayne   0.10      0.03   Not sign.   0.02                0.98
J. Wayne vs None      1.50      0.40   Not sign.   0.33                0.74
16
Tukey’s Test (PH) (for Cowboys)
Tukey's Honestly Significant Difference (HSD) test. Post Hoc test
MS Within        70.5
n                10     Number per treatment/Number of Groups
Standard Error   2.66   SQRT(MS within/n)
Average intelligence score:
C. Eastwood 68.1
J. Wayne 68.0
None 66.5
Critical value @ α 0.05, 2-tail
df within 24
k # groups 3
α 0.05 from table 3.53 (B)

A = Standardized Difference. Estimated Z value = A/B*1.96.
                      Differ.   A      α 0.05      Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.60   Not sign.   0.33                0.74
C. East vs J. Wayne   0.10      0.04   Not sign.   0.02                0.98
J. Wayne vs None      1.50      0.56   Not sign.   0.31                0.75
17
Dunnett test (PC) (for Cowboys)
Dunnett test. Planned Comparison
MS Within        70.5
n                10
Standard Error   3.76   SQRT(2 MS within/n)
Average intelligence score:
C. Eastwood 68.1
J. Wayne 68.0
None 66.5
Critical value @ α 0.05, 2-tail
df within 24
k # groups 3
α 0.05 from table 2.35 (B)

A = Standardized Difference. Estimated Z value = A/B*1.96.
                      Differ.   A      α 0.05      Estimated Z value   Estimated 2-tail P val.
C. East vs None       1.60      0.43   Not sign.   0.36                0.72
C. East vs J. Wayne   0.10      0.03   Not sign.   0.02                0.98
J. Wayne vs None      1.50      0.40   Not sign.   0.33                0.74
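The Tukey and Dunnett worksheets share the same arithmetic, which can be sketched as follows. The sketch follows the slides' approximation: the standardized difference A is rescaled by the tabulated critical value B so that it can be read on a normal (Z) scale. The critical values 3.53 and 2.35 are the table values quoted on the slides; the helper name `z_equivalent` is hypothetical, and this is an estimate of the p value rather than an exact Tukey or Dunnett p value.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

MS_WITHIN = 70.5    # MS Within from the ANOVA table
N = 10              # observations per cowboy group
Q_TUKEY = 3.53      # critical value quoted on the slide (k = 3, df within = 24)
D_DUNNETT = 2.35    # critical value quoted on the slide

means = {"C. Eastwood": 68.1, "J. Wayne": 68.0, "None": 66.5}


def z_equivalent(diff, std_err, critical):
    """Return (A, estimated Z, estimated 2-tail p) using the slides' A/B*1.96 rule."""
    standardized = diff / std_err            # A
    z = standardized / critical * 1.96       # A/B*1.96
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # 2-tail p on the normal scale
    return standardized, z, p


se_tukey = sqrt(MS_WITHIN / N)          # SQRT(MS within/n)
se_dunnett = sqrt(2 * MS_WITHIN / N)    # SQRT(2 MS within/n)

results = {}
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = {
        "tukey": z_equivalent(diff, se_tukey, Q_TUKEY),
        "dunnett": z_equivalent(diff, se_dunnett, D_DUNNETT),
    }

if __name__ == "__main__":
    for pair, r in results.items():
        print(pair, [round(v, 2) for v in r["tukey"]], [round(v, 2) for v in r["dunnett"]])
```

A difference is significant at the 5% level when A exceeds B, which is equivalent to the estimated Z exceeding 1.96.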
18
Comparing Dunnett vs Tukey’s across
various Mean difference levels
             Standard error   α 5% 2-tail critical value
Dunnett's    3.76             2.35
Tukey's      2.66             3.53
Z                             1.96

Tukey's Test
Mean difference   Standard. differen.   % of 2-tail Critical val.   2-tail Z equival.   2-tail P val est.
0.5               0.19                  5.3%                        0.10                0.92
1.0               0.38                  10.7%                       0.21                0.83
2.0               0.75                  21.3%                       0.42                0.68
3.0               1.13                  32.0%                       0.63                0.53
4.0               1.51                  42.7%                       0.84                0.40
5.0               1.88                  53.3%                       1.05                0.30
6.0               2.26                  64.0%                       1.25                0.21
7.0               2.64                  74.7%                       1.46                0.14
8.0               3.01                  85.3%                       1.67                0.09
9.0               3.39                  96.0%                       1.88                0.06

Dunnett Test
Mean difference   Standard. differen.   % of 2-tail Critical val.   2-tail Z equival.   2-tail P val est.
0.5               0.13                  5.7%                        0.11                0.91
1.0               0.27                  11.3%                       0.22                0.82
2.0               0.53                  22.7%                       0.44                0.66
3.0               0.80                  34.0%                       0.67                0.51
4.0               1.06                  45.3%                       0.89                0.37
5.0               1.33                  56.6%                       1.11                0.27
6.0               1.60                  68.0%                       1.33                0.18
7.0               1.86                  79.3%                       1.55                0.12
8.0               2.13                  90.6%                       1.78                0.08
9.0               2.40                  102.0%                      2.00                0.05
19
Dunnett vs Tukey’s visual comparison
[Chart: Dunnett vs Tukey 2-tail p value. Horizontal axis: Mean difference, 0.5 to 9.0. Vertical axis: 2-tail p value, 0.0 to 1.0. Series: Tukey, Dunnett.]
With a 2-tail test, the Dunnett test is only marginally more sensitive than Tukey’s test: it has slightly more Power and is slightly more likely to find a statistically significant difference. However, with Dunnett you can also use a 1-tail test when warranted, which makes a much larger difference. With Tukey, you can’t do that.
20
Bonferroni vs Sidak Test adjusted α value
Multiple hypothesis testing adjustments
Corresponding to α: 5%
Adjusted α value:
# of hypotheses   Bonferroni   Sidak
1 5.00% 5.00%
2 2.50% 2.53%
3 1.67% 1.70%
4 1.25% 1.27%
5 1.00% 1.02%
6 0.83% 0.85%
7 0.71% 0.73%
8 0.63% 0.64%
9 0.56% 0.57%
10 0.50% 0.51%
Bonferroni: α / # of hypotheses
Sidak: 1 - (1 - α)^(1 / # of hypotheses)
These tests consist of adjusting the relevant Alpha threshold (i.e. 5%) for the number of hypotheses you are testing. Bonferroni simply divides the Alpha value by the # of hypotheses. Sidak uses a compounding formula that is technically more accurate but makes no material difference in this situation.
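The two adjustments can be checked with a few lines of Python. This is a minimal sketch of the formulas stated above; the function names are illustrative.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold: alpha divided by the number of hypotheses."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test threshold from the compounding formula 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


if __name__ == "__main__":
    # Print the adjusted thresholds as percentages for 1 to 10 hypotheses.
    for m in range(1, 11):
        print(f"{m:2d}  {bonferroni_alpha(0.05, m):.2%}  {sidak_alpha(0.05, m):.2%}")
```

The Sidak threshold is always slightly larger than the Bonferroni one, which is why Bonferroni is the marginally more conservative of the two.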
21
What would be the qualifying α value?
At what familywise α level would a single hypothesis qualify (Sidak logic)?
Columns: original unpaired t test p value. Rows: # of hypotheses.
# of hypotheses   0.5%   1%     2.5%   5%     10%    15%
1                 0.01   0.01   0.03   0.05   0.10   0.15
2                 0.01   0.02   0.05   0.10   0.19   0.28
3                 0.01   0.03   0.07   0.14   0.27   0.39
4                 0.02   0.04   0.10   0.19   0.34   0.48
5                 0.02   0.05   0.12   0.23   0.41   0.56
6                 0.03   0.06   0.14   0.26   0.47   0.62
7                 0.03   0.07   0.16   0.30   0.52   0.68
8                 0.04   0.08   0.18   0.34   0.57   0.73
9                 0.04   0.09   0.20   0.37   0.61   0.77
10                0.05   0.10   0.22   0.40   0.65   0.80
22
A Radical Idea: Skipping ANOVA
[Figure: the HYPOTHESIS TESTING FLOW CHART from slide 8, repeated here.]
23
Streamlined Testing
HYPOTHESIS TESTING FLOW CHART
Multiple hypothesis testing. > 2 Samples or Groups

Research structure
  Are we testing different groups once?
    Unpaired testing: Difference between independent groups (Between-Groups). Single measure or observation.
  Are we testing the same group(s) at different times?
    Paired testing: Difference between same group before and after treatment (Within-Groups). Repeated measures or observations.

Post Hoc test: Which group is different?
  Unpaired testing. Normal: Unpaired t test. Not normal: Mann-Whitney.
  Paired testing. Normal: Paired t test. Not normal: Wilcoxon Sign Rank test.
  Adjustment: Bonferroni or Sidak test (PH)
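As an illustration of the streamlined route, the sketch below runs pairwise unpaired t tests on the three cowboy groups and compares each p value with a Bonferroni-adjusted threshold. Several choices here are assumptions made for the example and are not stated in the slides: the t test uses a pooled variance (equal variances assumed), and the 2-tail p value is obtained by numerically integrating the Student t density with Simpson's rule so that only the standard library is needed.

```python
from itertools import combinations
from math import gamma, pi, sqrt
from statistics import mean, variance

# Scores from the slides, pooled over gender for each cowboy preference.
groups = {
    "J. Wayne": [71, 76, 84, 72, 68, 66, 64, 66, 47, 66],
    "C. Eastwood": [65, 53, 70, 46, 53, 73, 80, 81, 88, 72],
    "None": [81, 69, 55, 60, 61, 72, 75, 73, 54, 65],
}


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tail_p(t, df, steps=2000):
    """2-tail p value: 1 minus twice the area under the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)


def pooled_t(a, b):
    """Unpaired t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)  # Bonferroni: alpha / # of hypotheses

results = {}
for g1, g2 in pairs:
    t, df = pooled_t(groups[g1], groups[g2])
    p = two_tail_p(t, df)
    results[(g1, g2)] = (t, p, p < alpha_adj)

if __name__ == "__main__":
    for pair, (t, p, significant) in results.items():
        print(pair, round(t, 2), round(p, 2), "sign." if significant else "Not sign.")
```

Swapping `alpha_adj` for the Sidak threshold, 1 - (1 - 0.05)^(1/3), changes the cutoff only slightly.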
24
Disadvantages of Streamlined Testing
• You can’t run Tukey’s (PH) and Dunnett (PC). These tests are not just adjustments to the P value, and may be superior in certain circumstances;
• You don’t have access to any Planned Comparison tests, which are more sensitive (more Power) and allow you to use a 1-tail P value when warranted;
• ANOVA gives you valuable information about the different independent variables and their interaction.
Between-Sample Variability/Within-Sample Variability Output
BSV/WSV
Source SS df MS F Sign.
Model 1,562.3 5 312.45 4.43 0.005
Cowboy 16.1 2 8.03 0.11 0.893
Gender 112.1 1 112.13 1.59 0.220
Cowboy*Gender 1,434.1 2 717.03 10.16 0.001

Multiplicity, how to deal with the testing of more than one hypothesis.
