This document provides an overview of statistical power, analysis of variance (ANOVA), and post hoc tests. It defines statistical power and explains how to calculate power and minimum sample size. It then describes ANOVA, comparing it to t-tests. ANOVA partitions variability between and within groups. The document interprets ANOVA tables and explains F distributions. Conditions for ANOVA and post hoc tests like Bonferroni corrections are also covered. Finally, it briefly mentions different types of ANOVA like one-way and factorial.
2. Statistical Power
Statistical power of a study (i.e., test of statistical significance)
− Probability that it will correctly reject a false null hypothesis
− Probability that it will correctly detect an effect/difference
Why calculate statistical power?
− Perhaps you want to know in advance the minimum sample size
necessary to have a reasonable chance of detecting an effect
− Alternatively, if you found out that your (costly) study only had power =
0.3, would you proceed with the study?
                    Fail to reject 𝐻0       Reject 𝐻0
𝐻0 is True          Confidence level        Type I error (𝛼)
𝐻0 is False         Type II error (𝛽)       Power
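To make "power" concrete, below is a minimal Monte Carlo sketch (in Python, an assumed choice; the slides show no code) that estimates power empirically: simulate many studies in which 𝐻0 is false, and count how often a t test correctly rejects it. The parameters anticipate the weight-loss example on the next slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def estimated_power(effect=2, sigma=8, n=30, alpha=0.05, n_sims=10_000):
    """Fraction of simulated studies that correctly reject a false H0."""
    rejections = 0
    for _ in range(n_sims):
        # H0 (mean = 0) is false by construction: the true mean is `effect`
        sample = rng.normal(loc=effect, scale=sigma, size=n)
        _, p = stats.ttest_1samp(sample, popmean=0)
        rejections += p < alpha
    return rejections / n_sims

print(estimated_power())  # roughly 0.26 with these parameters
```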
3. Calculating Statistical Power
Power Calculator: http://www.statisticalsolutions.net/pss_calc.php
You hypothesize that your weight-loss drug helps people lose 2 kg
over a month. Assuming 𝜎 = 8, 𝛼 = 0.05, and power = 0.8, what
is the minimum sample size required to detect an effect?
You realize that you only have budget for 30 participants to
conduct your trial. Assuming the same parameters as above and
with 𝑁 = 30, what is the power of your trial?
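Instead of the online calculator, both questions can be answered with statsmodels; this is a sketch assuming a two-sided one-sample t test, since the slide does not specify the design.

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
d = 2 / 8  # Cohen's d: hypothesized effect (2 kg) / standard deviation (8)

# Q1: minimum sample size for alpha = 0.05 and power = 0.8
n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
print(round(n))  # roughly 128 participants

# Q2: power achieved with only N = 30 participants
power = analysis.solve_power(effect_size=d, alpha=0.05, nobs=30)
print(round(power, 2))  # roughly 0.26, well below the conventional 0.8
```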
4. Parameters of Statistical Power
− Significance level / p-value (𝛼)
− Sample size (𝑁)
− Effect size (Cohen's 𝑑)
− Distribution (𝜎²)
[Figure: eight panels, A to H, comparing pairs of sampling distributions under different parameter settings]
Significance level (A: 𝛼 = 0.01 vs. B: 𝛼 = 0.1)
− Tests with a smaller significance level are more stringent and require
more power (e.g., a larger sample) to detect the same effect
− Increasing 𝛼 from 0.01 to 0.1 lowers the confidence level (99% vs. 90%),
meaning you will reject 𝐻0 more often
− Thus, there is a greater chance of rejecting 𝐻0 in B relative to A
Sample size (C vs. D)
− The bars show the 95% CI. In C, the sample sizes are small and thus the
CIs are wide; in contrast, D has larger sample sizes and thus narrower CIs
− As a result, it is easier to detect the difference in D relative to C
Effect size (E vs. F)
− Given that the size of the difference (effect size) in F is much larger
than in E, a statistical test would find it easier to detect the difference in F
Variance (G vs. H)
− As the distributions in H have less variance than those in G, there is
less overlap in their CIs; thus, it is easier to detect the difference in H
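A small sketch (again statsmodels, same one-sample t test assumption as above) showing how each parameter moves power while the others stay fixed:

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()

# Looser significance level => more power (panels A vs. B)
for alpha in (0.01, 0.1):
    print(alpha, analysis.solve_power(effect_size=0.25, nobs=30, alpha=alpha))

# Larger sample => more power (panels C vs. D)
for n in (30, 120):
    print(n, analysis.solve_power(effect_size=0.25, nobs=n, alpha=0.05))

# Larger effect size, or smaller sigma for the same raw effect, => more power
# (panels E vs. F and G vs. H)
for d in (0.25, 0.8):
    print(d, analysis.solve_power(effect_size=d, nobs=30, alpha=0.05))
```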
5. Comparing more than two means
T-test
− Only two groups/levels are involved
− Dependent t-test: Whether McDonalds makes you gain weight (before vs. after)
− Independent t-test: Whether McDonalds or KFC makes you gain more weight
What if we have more than two levels?
− Analysis of Variance (ANOVA)
6. Hypothesis testing for ANOVA
Null hypothesis (𝐻0)
− The mean outcome is the same across all categories
− 𝜇1 = 𝜇2 = … = 𝜇𝑘
where 𝜇𝑖 = mean of the outcome for observations in category 𝑖
where 𝑘 = number of groups
Alternative hypothesis (𝐻𝑎)
− At least one pair of means differ from each other
Example: Is there a difference in the average weight gain from
consuming three types of fast food (plus a control)?
− Categories: (i) no fast food/control, (ii) McDonalds, (iii) KFC, (iv) Subway
7. Variability partitioning in ANOVA
ANOVA allows us to separate out variability due to conditions/levels
Total variability in weight gain partitions into:
− Between-group variability: variability due to food type
− Within-group variability: variability due to other factors
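A minimal numpy sketch (the weight gains are invented for illustration) verifying that the total variability partitions exactly into between-group and within-group sums of squares:

```python
import numpy as np

# Hypothetical weight gains (kg) for three groups
groups = [
    np.array([1.0, 2.0, 1.5, 2.5]),  # control
    np.array([3.0, 4.0, 3.5, 4.5]),  # McDonalds
    np.array([2.0, 3.0, 2.5, 3.5]),  # KFC
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

ss_total = ((all_obs - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(ss_total, ss_between + ss_within)  # the two values are identical
```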
8. t test vs. ANOVA
t test
− Compare means from two groups
− Are they so far apart that the difference
cannot be attributed to sampling
variability (i.e., randomness)?
− 𝐻0: 𝜇1 = 𝜇2
Test statistic:
𝑡 = [(𝑥̄1 − 𝑥̄2) − (𝜇1 − 𝜇2)] / 𝑆𝐸(𝑥̄1 − 𝑥̄2)
ANOVA
− Compare means from more than two
groups
− Are they so far apart that the difference
cannot be attributed to sampling
variability (i.e., randomness)?
− 𝐻0: 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
Test statistic:
𝐹 = variability between groups / variability within groups
Large test statistics lead to small p-values
If the p-value is small enough, 𝐻0 is rejected and we conclude that the data provides evidence of a
difference in the population means
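A quick scipy sketch (same invented groups as above) computing the F statistic and its p-value in one call:

```python
from scipy import stats

control = [1.0, 2.0, 1.5, 2.5]
mcdonalds = [3.0, 4.0, 3.5, 4.5]
kfc = [2.0, 3.0, 2.5, 3.5]

f_stat, p_value = stats.f_oneway(control, mcdonalds, kfc)
print(f_stat, p_value)  # a large F and small p suggest the means differ
```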
9. F Distribution
Probability distribution associated with the F statistic
− In order to reject 𝐻0, we need a small p-value, which requires a
large F statistic
− To get a large F statistic, the variability between sample means needs to be
greater than the variability within the samples
p-value
− Probability of observing as large a ratio between the 'between' and 'within' group
variabilities, if in fact the means of all groups are equal
[Figure: F distribution, with the 'fail to reject 𝐻0' region in the body and the 'reject 𝐻0' region in the right tail]
𝐹 = variability between groups / variability within groups
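The p-value is the right-tail area of the F distribution beyond the observed statistic. A scipy sketch, where the F value is made up and the degrees of freedom follow the ANOVA table discussed next (k − 1 = 3 for groups, 119 − 3 = 116 for error):

```python
from scipy import stats

f_observed = 4.5              # hypothetical F statistic
df_group, df_error = 3, 116   # 4 groups, 120 observations

p_value = stats.f.sf(f_observed, df_group, df_error)  # right-tail area
print(p_value)  # small p => reject H0
```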
10. Interpreting the ANOVA table (sum of squares)
Sum of squares (total)
− Measures the total variability
− Calculated very similarly to variance
except not scaled by sample size
Sum of squares (group)
− Measures variability between
groups
− Deviation of group mean from
overall mean, weighted by sample
size
Sum of squares (error)
− Measures the variability within
groups
− Unexplained by the group variable
[ANOVA table shown on slide; only the Total row is legible: Df = 119, Sum Sq = 501.9]
11. Interpreting the ANOVA table (degrees of freedom)
Degrees of freedom (total)
− n - 1
− Where n = number of observations
Degrees of freedom (group)
− k – 1
− Where k = number of groups
Degrees of freedom (error)
− Degrees of freedom total – degrees
of freedom group
12. Interpreting the ANOVA table (mean squares)
Mean squares (group)
− Average variability between groups
− Variability (sum sq) scaled by the
associated df
− Sum of squares (group) / degrees of
freedom (group)
Mean squares (error)
− Average variability within groups
− Variability (sum sq) scaled by the
associated df
− Sum of squares (error) / degrees of
freedom (error)
13. Interpreting the ANOVA table (F statistics & p)
F statistic
− Ratio of the between-group and
within-group variability
− Mean square (group) / mean
square (error)
p-value
− Probability of as large a ratio
between the ‘between’ and ‘within’
group variabilities, if in fact the
means of all groups are equal
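In practice, the whole table is produced in one step. A sketch using statsmodels on simulated data (the group means and spread are invented; with 4 groups of 30 observations, the total df of 119 matches the slide's table):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
foods = ["control", "mcdonalds", "kfc", "subway"]
means = [1.0, 3.0, 2.5, 1.5]  # hypothetical group means (kg)

data = pd.DataFrame({
    "food": np.repeat(foods, 30),                        # 4 x 30 = 120 obs
    "weight_gain": rng.normal(np.repeat(means, 30), 2),  # sd = 2, invented
})

model = ols("weight_gain ~ C(food)", data=data).fit()
print(sm.stats.anova_lm(model))  # df, sum_sq, mean_sq, F, PR(>F)
```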
14. Interpreting the ANOVA table (p-value)
If the p-value is small (less than 𝛼),
reject 𝐻0
− The data provides evidence that
at least one pair of means differ
from each other
− But we can't tell which pair
If the p-value is large (more than 𝛼),
fail to reject 𝐻0
− The data does not provide
evidence that any pair of means
differ from each other
− The observed differences could
be due to chance
15. Conditions for ANOVA
Independence
− Within groups: sampled observations must be independent
− Between groups: groups must be independent of each other
Approximate normality
− Within each group, distributions should be nearly normal
Equal variance (homoscedasticity)
− Groups should have roughly equal variability
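A sketch of how these conditions are commonly checked in Python, reusing the simulated data frame from the ANOVA sketch above (Shapiro-Wilk and Levene are standard choices, though other diagnostics exist):

```python
from scipy import stats

# Approximate normality within each group
# (p > 0.05 is consistent with normality)
for name, grp in data.groupby("food"):
    print(name, stats.shapiro(grp["weight_gain"]).pvalue)

# Equal variance across groups
# (p > 0.05 is consistent with homoscedasticity)
samples = [grp["weight_gain"].values for _, grp in data.groupby("food")]
print(stats.levene(*samples).pvalue)

# Independence is a property of the design (random sampling/assignment),
# not something a statistic can confirm
```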
16. So how do we find out which means differ?
We conduct independent t tests for differences between each
possible pair of groups (multiple comparisons)
− However, with multiple t tests, there could be an inflated Type I error rate
(see the sketch below)
Thus, we use a modified significance level, which ranges from the
most liberal to the most conservative
− Most liberal: no correction
− Most conservative: Bonferroni correction
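To see the inflation: with K independent comparisons each run at level 𝛼, the chance of at least one false positive is 1 − (1 − 𝛼)^K.

```python
alpha, K = 0.05, 6  # six pairwise comparisons among four groups

family_wise_error = 1 - (1 - alpha) ** K
print(family_wise_error)  # about 0.265, far above the intended 0.05
```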
17. Bonferroni correction
The Bonferroni correction suggests that a more stringent
significance level is appropriate for multiple comparisons
− Thus, we adjust 𝛼 by the number of comparisons considered
𝛼∗ = 𝛼 / 𝐾
where 𝐾 = number of comparisons = 𝑘(𝑘 − 1) / 2, and 𝑘 = number of groups
18. Bonferroni Correction
In our example, the fast food variable has 4 levels: (i) control, (ii)
McDonalds, (iii) KFC, (iv) Subway. If 𝛼 = 0.05, what should the
modified significance level be?
Number of levels: 𝑘 = 4
𝐾 = 4 × (4 − 1) / 2 = 6
𝛼∗ = 𝛼 / 𝐾 = 0.05 / 6 ≈ 0.0083
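A sketch of the full post hoc procedure, reusing the simulated data frame from the ANOVA sketch; statsmodels' multipletests applies the Bonferroni adjustment, which is equivalent to comparing each raw p-value against 𝛼/𝐾:

```python
from itertools import combinations

from scipy import stats
from statsmodels.stats.multitest import multipletests

groups = {name: grp["weight_gain"].values for name, grp in data.groupby("food")}

pairs = list(combinations(groups, 2))  # 6 comparisons among 4 groups
p_values = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method="bonferroni")
for (a, b), p, r in zip(pairs, p_adjusted, reject):
    print(f"{a} vs {b}: adjusted p = {p:.4f}, reject H0: {r}")
```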
19. Types of ANOVA
One-way ANOVA
− Between-groups
− Repeated measures
Factorial ANOVA
− Two or more independent variables
− Allows for examination of interaction effects
20. The t-test should suffice for most of your hypothesis testing needs
− For our understanding though, what other forms of hypothesis tests are there?
− Chi-Square
− Independent variable: Gender (proportion in general population)
− Dependent variable: Gender (proportion in engineering faculty)
− Linear Regression
− Independent variable: Age
− Dependent variable: Income
− Logistic Regression
− Independent variable: Age
− Dependent variable: Marital status
What other kinds of statistical tests are there?
                           Dependent Variable
                           Continuous            Categorical
Independent   Continuous   Linear Regression     Logistic Regression
Variable      Categorical  t-test                Chi-square test
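As a small illustration of the categorical-categorical cell, a chi-square goodness-of-fit sketch with scipy (the counts are invented; it compares observed gender counts in a faculty against the general-population proportion):

```python
from scipy import stats

observed = [70, 30]  # hypothetical gender counts in an engineering faculty
expected = [50, 50]  # counts expected under a 50/50 general-population split

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)  # a small p suggests the proportions differ
```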
21. Time for practice
In this lab session we will cover:
− ANOVA
− Bonferroni Correction
GitHub repository: https://github.com/eugeneyan/Statistical-Inference