STATISTICS: Key Concepts for Experimental Design and Data Analysis

STATISTICS Michael LaValley
4/14/2016

THE P-VALUE POLICE?
Often researchers see statistics (and statisticians) as
barriers to publishing their important work
However, good statistics can help you avoid wasting time
and money following false leads
My personal feeling is that if you are trying to use
statistics to show why your work is important and
publishable, then you need good statistics

ROLE OF EXPERIMENTAL DESIGN
Statistics can only be as good as the data
Good data requires thoughtfully designed experiments
Some failures of animal experiments to translate to human
trials have raised the issue of experimental design of
animal studies
 NXY-059 for Stroke (Gawrylewski 2007)
 Fluid resuscitation in bleeding trauma patients (Roberts 2002)

EXPERIMENTAL DESIGN
A well designed experiment should
 Produce unbiased comparisons between groups
 Provide precise estimates
Well designed experiments require
 Clear objectives
 Planning
 Sample size large enough to achieve the objectives with good
power

EXPERIMENTAL DESIGN
Comparison/Control group
 Concurrent controls
 Internal control (before and after treatment)
Replication
 Reduce effect of uncontrolled variation
 Quantify the uncertainty in the results
Randomization
 Computer generated
Blocking or stratification
Blinding
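
A minimal sketch of computer-generated randomization, assuming 10 animals and two equal-sized groups. The animal IDs, group names, and seed are hypothetical and chosen only for illustration.

```python
import random

def randomize(units, groups=("treatment", "control"), seed=2016):
    """Randomly assign units to groups of (nearly) equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the groups in turn
    allocation = {g: [] for g in groups}
    for i, unit in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(unit)
    return allocation

mice = [f"mouse_{i}" for i in range(1, 11)]  # hypothetical animal IDs
allocation = randomize(mice)
for group, members in allocation.items():
    print(group, sorted(members))
```

Recording the seed alongside the allocation lets the assignment be audited later.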

HYPOTHESIS TESTS
Hypothesis tests answer a yes/no question about a
population value
Example:
 Quantitative assay for level of antibodies for a virus in mice
 Does a vaccine have an effect on the levels of antibodies?
Null Hypothesis (H0) corresponds to no effect
Alternative Hypothesis (HA) indicates that there is an effect

HYPOTHESIS TESTS
Example:
 Suppose there are 10 mice available for the experiment
 Assay the mice for antibodies before and after vaccination
 Xi is the difference in assay values for mouse number i
 If the mean value of the Xi is close to 0, that suggests no effect
 μ is population mean difference
 Null hypothesis H0: μ=0
 Alternative hypothesis HA: μ≠0

HYPOTHESIS TESTS
A hypothesis test asks whether the data provide enough evidence to reject H0
Rejecting H0 indicates that either
 H0 is wrong
 A rare event occurred (type I error)
We cannot confirm H0 on the basis of a test
 We may fail to reject H0, but we do not accept H0

HYPOTHESIS TESTS
Each test has an associated test statistic
For a paired t-test for the mouse vaccine data
$T = \bar{X} / (s / \sqrt{10})$
 $\bar{X}$ is the mean and s the standard deviation of the 10 differences Xi
We reject H0 when |T| > t*
 t* is chosen so that
Pr(Reject H0 when H0 is true) = α
 In this case, t* is from a t-distribution with 9 degrees of freedom
(number of mice – 1)
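
A short sketch of the paired t statistic for the mouse example. The differences below are hypothetical values made up for illustration, not data from the talk. The critical value 2.262 is the two-sided 0.05 cutoff for a t distribution with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical before/after differences X_i for 10 mice (illustrative only)
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 1.1]

n = len(differences)
mean_diff = statistics.mean(differences)   # X-bar
sd_diff = statistics.stdev(differences)    # s, sample SD with n - 1 denominator
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRIT = 2.262  # two-sided 0.05 critical value, t distribution with 9 df
reject_h0 = abs(t_stat) > T_CRIT
print(f"T = {t_stat:.2f}, reject H0: {reject_h0}")
```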

HYPOTHESIS TESTS
Figure: values used are from the t distribution with 9 degrees of freedom

HYPOTHESIS TESTS
                     Decision
Truth        Not Reject H0        Reject H0
H0 True      Right                Type I Error (α)
H0 False     Type II Error (β)    Right (Power)

Unfortunately with testing comes the possibility of reaching
a wrong conclusion and making an error

HYPOTHESIS TESTS
Type I Error – reject H0 when it is true (false positive
finding)
 Hypothesis tests are set up so that the user specifies the Type I Error
rate
 Significance level α, almost always 0.05
Type II Error – failing to reject H0 when it is false (false
negative finding)
 For a fixed sample size, as the Type I error rate is decreased, the rate of
Type II error is increased
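
The trade-off between the two error rates can be illustrated by simulation. This is a sketch under stated assumptions: 10 normally distributed differences with SD 1, a hypothetical true mean of 1.0 when H0 is false, and two-sided critical values for 9 degrees of freedom (2.262 for α=0.05, 3.250 for α=0.01).

```python
import math
import random

# Two-sided critical values for a t distribution with 9 degrees of freedom
CRITICAL_VALUES = {0.05: 2.262, 0.01: 3.250}

def t_statistic(x):
    """One-sample (paired) t statistic for H0: mean = 0."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return mean / math.sqrt(var / n)

def rejection_rate(true_mean, t_crit, n=10, n_sim=20000, seed=1):
    """Fraction of simulated experiments in which |T| exceeds t_crit."""
    rng = random.Random(seed)
    hits = sum(
        abs(t_statistic([rng.gauss(true_mean, 1.0) for _ in range(n)])) > t_crit
        for _ in range(n_sim)
    )
    return hits / n_sim

results = {}
for alpha, t_crit in CRITICAL_VALUES.items():
    type_i = rejection_rate(0.0, t_crit)       # H0 true: rejections are false positives
    type_ii = 1 - rejection_rate(1.0, t_crit)  # H0 false: hypothetical true mean of 1.0
    results[alpha] = (type_i, type_ii)
    print(f"alpha={alpha}: Type I ~ {type_i:.3f}, Type II ~ {type_ii:.3f}")
```

With the sample size held at 10, the stricter significance level should show a lower Type I rate and a higher Type II rate.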

HYPOTHESIS TESTS
The significance level is the rate of false positive findings
that you are willing to live with
Power is the probability of rejecting the null hypothesis when it is false (1
- Type II Error rate)
 Once the significance level is set, the Power for a given true difference and
variability is determined by the sample size
 For the alternative shown in the figure, the power is 76%

HYPOTHESIS TESTS
Figure: for a 0.05 two-sided t-test with 9 degrees of freedom, we reject the null if T<-2.26 or T>2.26
76% power if true difference is 3.0
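
A simulation sketch of the power figure quoted above. It assumes that the "true difference of 3.0" is expressed in standard-error units of the mean difference, so that T is centred near 3.0. This reading is an assumption, not something the slide states. The function name and seed are hypothetical.

```python
import math
import random

T_CRIT = 2.262  # two-sided 0.05 critical value, t distribution with 9 df

def simulated_power(shift_in_se, n=10, sd=1.0, n_sim=20000, seed=7):
    """Fraction of simulated experiments in which |T| exceeds T_CRIT."""
    rng = random.Random(seed)
    true_mean = shift_in_se * sd / math.sqrt(n)  # centres T near shift_in_se
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(x) / n
        var = sum((v - mean) ** 2 for v in x) / (n - 1)
        if abs(mean / math.sqrt(var / n)) > T_CRIT:
            rejections += 1
    return rejections / n_sim

power = simulated_power(3.0)
print(f"Estimated power: {power:.2f}")
```

Under that assumption the estimate should land close to the 76% shown on the slide.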

HYPOTHESIS TESTS
Role of sample size
 In designing an experiment, one should determine an
appropriate sample size for the goals of the experiment
 Given
 Expected difference between groups
 Expected variability of measurements
 Significance level that will be used
 Power to be targeted
 One can determine the sample size to achieve the study goal
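
A rough sketch of how the four inputs above combine into a sample size for a paired (one-sample) comparison. It uses the normal approximation n = ((z_{1-α/2} + z_{power}) · σ / δ)², which is an assumption made here for simplicity. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_paired(delta, sd, alpha=0.05, power=0.80):
    """Approximate n for a two-sided paired test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_power = z.inv_cdf(power)          # quantile for the targeted power
    n = ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)                 # round up to a whole subject

# Hypothetical inputs: expected difference of 1.0 with SD 1.0
n_needed = sample_size_paired(delta=1.0, sd=1.0)
print(n_needed)
```

The normal approximation tends to understate n when samples are small. Dedicated power software based on the t distribution will give somewhat larger values.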

HYPOTHESIS TESTS
Role of sample size
 There are software packages and online power calculators
available for determining sample size
 If the sample size is too small for the study goal, test result is likely
to be negative (underpowered)
 If the sample size is too large for the study goal, resources will be
wasted

http://homepage.stat.uiowa.edu/~rlenth/Power/

HYPOTHESIS TESTS
P-value
 Smallest level of significance for which you would reject the Null
Hypothesis with your data
 Probability of obtaining data at least as extreme as what was found if the
Null Hypothesis were true
 Provides a measure of the evidence against the Null Hypothesis
 Small p-values (close to 0) show strong evidence against the null hypothesis
 Large p-values (close to 1) show only weak evidence against the null hypothesis
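
The definition of a p-value can be made concrete with a sign-flip randomization test. Note that this is a different technique from the paired t-test used in these slides; it is swapped in here because it computes "at least as extreme under H0" by direct counting. The differences are hypothetical.

```python
from itertools import product

def sign_flip_p_value(differences):
    """Exact two-sided p-value for H0: differences are symmetric about 0."""
    observed = abs(sum(differences))
    n_extreme = 0
    n_total = 0
    # Under H0 each difference is equally likely to be positive or negative,
    # so enumerate every assignment of signs (2**n of them)
    for signs in product((1, -1), repeat=len(differences)):
        total = sum(s * d for s, d in zip(signs, differences))
        n_total += 1
        if abs(total) >= observed - 1e-12:  # tolerance for float rounding
            n_extreme += 1
    return n_extreme / n_total

# Hypothetical before/after differences for 10 mice (illustrative only)
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 1.1]
p_value = sign_flip_p_value(differences)
print(f"p = {p_value:.4f}")
```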

HYPOTHESIS TESTS
If p-value ≤ α then reject H0
The p-value is determined by
 How far the data are from the Null Hypothesis
 The sample size
For the same observed difference, the larger the sample, the smaller the p-value
For the same true difference, the larger the sample, the greater the power

HYPOTHESIS TEST LIMITATIONS
P-values and hypothesis tests give a dichotomous
(significant/not significant) view of study results
Statistically significant means that a difference as large as the one observed
would be unlikely if only chance were at work (H0 true)
 Either H0 is not correct or
 The observed data are a rare event – happening no more than
(100*α)% of the time when H0 is true

HYPOTHESIS TEST LIMITATIONS
Statistical significance doesn’t mean that the observed
difference is important
 Could find a statistically significant result with a large sample
size when the observed difference is small and unimportant
 Could have a large and important difference between groups
with a small sample size and not have statistical significance
 Would especially be the case for an underpowered study

CONFIDENCE INTERVALS
Confidence intervals show the precision of the sample
values as estimates of population values
 Provides a range of population values that are consistent with the
study findings
 Often more informative than the p-values
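
A minimal sketch of a 95% confidence interval for a mean difference, computed as mean ± t* · s/√n. The data are hypothetical, and the multiplier 2.262 applies only to samples of size 10 (9 degrees of freedom).

```python
import math
import statistics

def t_interval(values, t_crit):
    """Confidence interval for the mean: mean +/- t_crit * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical before/after differences for 10 mice (illustrative only)
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 1.1]
lower, upper = t_interval(differences, t_crit=2.262)  # 95% two-sided, 9 df
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

An interval that excludes 0 corresponds to rejecting H0: μ=0 at the 0.05 level with the matching t-test.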

TEST OR INTERVAL LIMITATIONS
A significance test/confidence interval doesn’t provide a
check of the study design
 Example: in a study of gene expression
 Cancer tissue samples kept on ice while the normal tissue samples are processed
 Observed differences in expression may be due to iced/not iced rather than
cancer/normal
 A statistical procedure will never indicate that this is the reason for the result

ROLE OF DATA DISTRIBUTION
Particular tests are tuned for data from the normal
(Gaussian) distribution
 Examples
 T-test
 Standard (Pearson) correlation
Often it is difficult to be sure that the data come from
the normal distribution
 Plot histograms of data – bell-shaped and symmetric?
 Plot ordered data values against expected normal values – is
a straight line obtained? (called Q-Q plots)
 Plots require a substantial amount of data to be conclusive
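
A sketch of the points behind a normal Q-Q plot, without the plotting itself. The plotting positions (i - 0.5)/n are one common convention and are an assumption here. The data values are hypothetical.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Pair each ordered data value with its expected standard normal quantile."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n: one common convention (an assumption)
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

# Hypothetical measurements (illustrative only)
data = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 2.4, 1.1]
points = normal_qq_points(data)
for q, x in points:
    print(f"{q:6.2f} {x:6.2f}")
```

If the data are roughly normal, plotting the second column against the first should give an approximately straight line.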

ROLE OF DATA DISTRIBUTION
Some tests are specifically designed to work reasonably
well with data from any distribution
 Called Nonparametric or distribution-free tests
 Examples
 Wilcoxon test (alternative to t-test)
 Spearman correlation (alternative to standard correlation)
In some situations these may be less likely to reject the null
hypothesis of no difference (lower power) than tests that assume
normal data
May want to see if nonparametric results are similar to
those assuming normality
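
A sketch of the two-sample Wilcoxon rank-sum test, using the normal approximation for the p-value and no tie correction. Both simplifications are assumptions made to keep the example short; statistical packages typically offer exact or corrected versions. The two groups are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Rank sum of x and its two-sided p-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p

# Hypothetical measurements for two groups (illustrative only)
group_a = [5.1, 6.3, 7.8, 8.2, 9.4]
group_b = [1.2, 2.5, 3.1, 4.4, 5.6]
w, p = rank_sum_test(group_a, group_b)
print(f"W = {w}, p = {p:.3f}")
```

Because only ranks enter the calculation, a single extreme value cannot dominate the result the way it can in a t-test.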

EXAMPLE
Study question: what is the effect of calcium on blood
pressure in African-American men
Experiment: a randomized comparison
 Treatment group of 10 men received a calcium supplement for
12 weeks
 Control group of 11 men received a placebo during the same
period
Outcome is the difference in the seated systolic blood
pressure (BP) over the 12-week period
Lyle RM, et al., "Blood pressure and metabolic effects of calcium supplementation in normotensive white
and black men," JAMA, 257(1987), pp. 1772-1776

DATA DISTRIBUTION
Histograms by group

EXAMPLE
These plots aren’t very useful in determining the data
distribution
 Don’t really suggest normality
 Aren’t conclusively non-normal either
 Ambiguity is typical with small numbers
Should probably look at both t-test and Wilcoxon test
 If same results – everything is fine
 If different results – probably trust nonparametric more

EXAMPLE
The t-test is not significant at the 0.05 significance
level
 P-value = 0.12
The Wilcoxon test is not statistically significant at the
0.05 significance level
 P-value = 0.33
The test results are consistent in that with either we fail
to reject the null hypothesis
Important difference? Check the confidence intervals

EXAMPLE
                 Mean Decrease in BP (mm Hg)    95% Confidence Interval
Calcium Group    5.00                           -1.26 to 11.26
Control Group    -0.27                          -4.24 to 3.69
Difference       5.27                           -1.48 to 12.03

EXAMPLE
So we found a 5 mm Hg difference between
groups…
 Might be large enough to be important?
 But can’t rule out that this finding is due to chance (P-value >
α)
If 5 mm Hg is worth pursuing, would need to evaluate
this in a larger sample
 Do the power and sample size calculation!
If not, pursue more promising therapies

MULTIPLE-TESTING
Another issue to be aware of is the limits of ordinary statistical
significance when doing many tests
When we use a significance level of α=0.05, we allow
about 5 out of every 100 tests of true null hypotheses to be false positives
When tens or hundreds of tests are run, false positive findings
are almost guaranteed
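
The claim above can be checked with a one-line calculation. The sketch assumes the tests are independent and that every null hypothesis is true, in which case the chance of at least one false positive among m tests is 1 - (1 - α)^m.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 10, 100):
    print(m, round(prob_any_false_positive(m), 3))
```

Under these assumptions the probability is about 40% for 10 tests and above 99% for 100 tests.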

http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf

MULTIPLE-TESTING
Methods exist (and new ones are being continually
developed) to deal with multiple testing issues
 Bonferroni correction
 Tukey’s method
 False discovery rates
 Which method is used is less important than that something is done
to account for the number of tests
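
A sketch of two of the listed approaches: the Bonferroni correction and the Benjamini-Hochberg procedure, which is one common way of controlling the false discovery rate. The p-values are hypothetical, and Tukey's method is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

p_values = [0.001, 0.012, 0.025, 0.038, 0.20]  # hypothetical test results
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

On these values Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the four smallest. The two methods control different error rates, so the difference is expected.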

REFERENCES
Triola MM, Triola MF. Biostatistics for the Biological
and Health Sciences. Pearson Education Inc., 2006
Broman K. Statistics for Laboratory Scientists I, 2006
(Course Website)
http://ocw.jhsph.edu/courses/StatisticsLaboratoryScientistsI/
Festing MFW, Overend P, Das RG, Borja MC, Berdoy
M. The Design of Animal Experiments. Laboratory
Animal Handbooks #14. Royal Society of Medicine
Press Ltd., 2011

REFERENCES
Festing M. Principles: the need for better
experimental design. TRENDS in Pharmacological
Sciences, 24:341-5, 2003
Roberts I, Kwan I, Evans P, Haig S. Does animal
experimentation inform human healthcare?
Observations from a systematic review of
international animal experiments on fluid resuscitation.
BMJ, 324:474-6, 2002
