2. Lecture Outline
• Introduction
• How to conduct t‐tests with R
• How to conduct ANOVA with R
• How to conduct the Tukey HSD test with R
• How to conduct the randomised Tukey HSD test
• How to use topic set size design tools
• How to use power analysis tools
• Summary
10. Cohen’s five‐eighty convention
Possible outcomes of a significance test:
• H0 is true (systems are equivalent)
‐ Can’t reject H0: correct conclusion (probability 1‐α)
‐ Reject H0: Type I error (probability α)
• H0 is false (systems are different)
‐ Can’t reject H0: Type II error (probability β)
‐ Reject H0: correct conclusion (probability 1‐β)
Statistical power (1‐β): ability to detect real differences
Cohen’s five‐eighty convention: α=5%, 1‐β=80% (β=20%)
• Type I errors are treated as 4 times as serious as Type II errors (β/α = 20%/5% = 4)
• The ratio may be set depending on specific situations
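As a minimal sketch of how the five‐eighty convention feeds into a sample size calculation, base R's power.t.test can solve for n once α and 1‐β are fixed. The values delta = 0.05 and sd = 0.1 are hypothetical, chosen only for illustration; they are not from the lecture.

```r
# Cohen's five-eighty convention: alpha = 5%, power = 1 - beta = 80%
alpha <- 0.05
beta  <- 0.20

# Ratio of the two error probabilities (beta / alpha)
ratio <- beta / alpha
print(ratio)

# Hypothetical inputs (assumed for illustration only):
# a mean difference of 0.05, with the per-topic differences
# having a standard deviation of 0.1
res <- power.t.test(delta = 0.05, sd = 0.1,
                    sig.level = alpha, power = 1 - beta,
                    type = "paired", alternative = "two.sided")

# Smallest whole number of topics that reaches the requested power
n_topics <- ceiling(res$n)
print(n_topics)
```

Changing sig.level or power here is one way to explore a different ratio between the two error probabilities.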
20. Two‐sample t‐test (1)
x1j: nDCG of System 1 for the j‐th topic (n1 topics)
x2j: nDCG of System 2 for the j‐th topic (n2 topics)
Assume that the scores are independent and that
x1j ~ N(μ1, σ²), x2j ~ N(μ2, σ²)
i.e. both populations are normal with a common variance σ²
(the homoscedasticity, or equal variance, assumption).
The t‐test is actually quite robust to violations of this
assumption. For a discussion of Student’s and Welch’s
t‐tests, see [Sakai16SIGIRshort, Sakai18book]
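A minimal sketch of the two variants in base R, assuming synthetic nDCG scores (the data, group sizes, and means below are hypothetical, generated only so the example runs). Student's t‐test relies on the equal variance assumption; Welch's t‐test, which is the default of t.test, does not.

```r
set.seed(1)  # make the synthetic scores reproducible

# Hypothetical nDCG scores (synthetic, not from the lecture)
x1 <- rnorm(20, mean = 0.45, sd = 0.10)  # System 1, n1 = 20 topics
x2 <- rnorm(25, mean = 0.40, sd = 0.10)  # System 2, n2 = 25 topics

# Student's t-test: assumes equal variances (homoscedasticity)
student <- t.test(x1, x2, var.equal = TRUE)

# Welch's t-test: does not assume equal variances (R's default)
welch <- t.test(x1, x2, var.equal = FALSE)

print(student$parameter)  # degrees of freedom: n1 + n2 - 2
print(welch$parameter)    # Welch-Satterthwaite approximation
print(c(student = student$p.value, welch = welch$p.value))
```

Comparing the two p‐values on the same data gives a rough feel for how much the equal variance assumption matters in a given case.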
26. One‐way ANOVA, equal group sizes (1)
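• Data format: xij, the score of System i (i = 1, ..., m) for its
j‐th observation (j = 1, ..., n); unpaired data, but
equal group sizes (e.g. #topics)
• Basic assumption: xij ~ N(μi, σ²)
or, equivalently, xij = μi + εij with εij ~ N(0, σ²),
where μi is the population mean for System i and the
common variance σ² expresses homoscedasticity
• Question: Are the m population means equal?
One‐way ANOVA generalises the two‐sample t‐test, and can
handle unequal group sizes as well. See [Sakai18book]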
37. One‐way ANOVA with R (4)
• φA = m‐1 = 3‐1 = 2
• φE1 = m(n‐1) = 3(20‐1) = 57
The system effect is statistically significant at α = 0.05
(the p‐value is below 0.05).
The three systems are probably not all equally effective,
but we don’t know where the difference lies.
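The degrees of freedom above can be checked with a minimal sketch using base R's aov. The scores are synthetic and the system means are hypothetical; only m = 3 and n = 20 follow the slide.

```r
set.seed(1)  # make the synthetic scores reproducible

m <- 3   # number of systems
n <- 20  # observations (e.g. topics) per system

# Hypothetical scores: one row per (system, observation) pair
scores <- data.frame(
  system = factor(rep(paste0("S", 1:m), each = n)),
  score  = rnorm(m * n,
                 mean = rep(c(0.40, 0.45, 0.50), each = n),
                 sd = 0.10)
)

# One-way ANOVA: does the mean score differ across systems?
fit <- aov(score ~ system, data = scores)
tab <- summary(fit)[[1]]
print(tab)

# Degrees of freedom: m - 1 for the system effect, m(n - 1) for the residual
print(tab[["Df"]])
```

The "Pr(>F)" column of the printed table holds the p‐value for the system effect.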
45. Two‐way ANOVA without replication with R (3)
• φA = 3‐1 = 2
• φB = 20‐1 = 19
• φE1 = (3‐1)*(20‐1)= 38
The system effect is statistically highly significant
(so is the topic effect).
The three systems are probably not all equally effective,
but we don’t know where the difference lies.
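A minimal sketch of two‐way ANOVA without replication in base R, assuming synthetic data in which every system is scored once on every topic. The effect sizes and noise levels are hypothetical; only the design (3 systems, 20 topics) follows the slide.

```r
set.seed(1)  # make the synthetic scores reproducible

m <- 3   # number of systems
n <- 20  # number of topics

# Hypothetical per-topic difficulty shared by all systems
topic_effect <- rnorm(n, sd = 0.15)

# One score per (topic, system) cell: no replication
d <- expand.grid(topic = factor(1:n),
                 system = factor(paste0("S", 1:m)))
d$score <- 0.40 +
  c(0, 0.03, 0.06)[as.integer(d$system)] +  # hypothetical system effects
  topic_effect[as.integer(d$topic)] +       # topic effects
  rnorm(m * n, sd = 0.05)                   # residual noise

# Two-way ANOVA with system and topic as main effects
fit <- aov(score ~ system + topic, data = d)
tab <- summary(fit)[[1]]
print(tab)

# Degrees of freedom: system, topic, residual
print(tab[["Df"]])
```

Including the topic factor removes topic‐to‐topic variation from the residual, which is why the residual degrees of freedom drop to (m‐1)(n‐1).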
81. Effect sizes
P‐value = f(sample_size, effect_size)
‐ A large effect size ⇒ a small p‐value
‐ A large sample size ⇒ a small p‐value
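For example, consider the test statistic from the paired t‐test:
t0 = d̄ / √(V/n) = √n · (d̄ / √V)
where d̄ is the mean of the per‐topic score differences,
V is their variance, and n is the topic set size.
The factor d̄ / √V is the magnitude of the difference
in standard deviation units.
‐ A large effect size (standardised mean difference)
⇒ a large t‐value ⇒ a small p‐value
‐ A large sample size (topic set size)
⇒ a large t‐value ⇒ a small p‐value
Any nonzero difference can be made statistically
significant by making n large enough!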
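The dependence on n can be illustrated with a minimal sketch, assuming the usual paired t statistic written as √n times the standardised mean difference. The effect size 0.2 and the topic set sizes are hypothetical, and paired_t and paired_p are helper names introduced here for illustration.

```r
# Paired t statistic as sqrt(n) times the standardised mean difference
paired_t <- function(effect_size, n) sqrt(n) * effect_size

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
paired_p <- function(effect_size, n) {
  t0 <- paired_t(effect_size, n)
  2 * pt(-abs(t0), df = n - 1)
}

es <- 0.2                   # hypothetical small effect size, held fixed
ns <- c(10, 50, 100, 1000)  # hypothetical topic set sizes

# Same effect size, growing n: t grows and p shrinks
p_values <- sapply(ns, function(n) paired_p(es, n))
print(data.frame(n = ns, t = paired_t(es, ns), p = p_values))
```

With the effect size held fixed, the p‐value falls steadily as n grows, which is why a p‐value alone says little about how large a difference is.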