Common Statistical Errors
In Medical Publications
Dr Petra Graham
• Statistical errors in medical journals are surprisingly
common. For example:
• Olsen (2003) found that 54% of a sample of 141 papers
published in Infection and Immunity had errors in
reporting, analysis or both.
• Yim et al. (2010) found 79% of a sample of 139 papers
published in the Korean Journal of Pain had errors.
• Nieuwenhuis et al. (2011) found that 15% of articles
reviewed in the top-ranking journals Science, Nature,
Nature Neuroscience, Neuron and The Journal of
Neuroscience had used the wrong method.
Errors are surprisingly common
Types of errors
Errors can be broadly classified into three main areas:
• Errors in design
• Errors in analysis
• Reporting and interpretational errors
Various publications describe these problems,
e.g. Clark and Mulligan (2011), Lang (2004), Olsen (2003), Strasak et al. (2007)
Common Design Errors:
• Lack of a sample size calculation (or wrong calculation)
• Studies with too few subjects are underpowered: a real difference is unlikely to be
detected even if it exists (Altman and Bland, 1995)
Sung et al. (1993)
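A sample size calculation can be sketched in a few lines. The example below is a minimal sketch, assuming a two-sided comparison of two group means with equal group sizes and a normal approximation; the difference (`delta`), standard deviation (`sd`) and the function names are illustrative values and names, not taken from any of the cited studies. A statistician would normally refine this, e.g. with a t-based calculation or an allowance for dropout.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`.

    Normal-approximation formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def approx_power(n, delta, sd, alpha=0.05):
    """Approximate power of the same test with n subjects per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    standard_error = sd * (2 / n) ** 0.5  # SE of the difference in means
    return z.cdf(delta / standard_error - z_alpha)


# Illustrative inputs only: detect a 5-unit difference when the SD is 10.
print(n_per_group(delta=5, sd=10))                   # 63 per group
print(round(approx_power(n=20, delta=5, sd=10), 2))  # roughly 0.35
```

With only 20 subjects per group, the same study would have roughly a one-in-three chance of detecting the difference, which is what "underpowered" means in practice.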
More Design Errors:
• Primary outcome measures unclear
• Randomisation method unclear
• Hypotheses unclear
• An a priori analysis plan should be made
so that it’s clear that the research isn’t
the result of a “fishing expedition”
Errors in analysis
• Testing for equality of baseline characteristics
• Potentially misleading, not meaningful
• Yang et al. (2017): expect 5% (~1) of the baseline tests to be spuriously significant
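The "expect 5%" arithmetic can be checked directly. The sketch below assumes the baseline tests are independent and each is run at the 5% level; the figure of 20 tests is an assumption chosen because 5% of 20 is 1, not a count taken from Yang et al. (2017).

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of spuriously significant tests when every null is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is spuriously significant."""
    return 1 - (1 - alpha) ** n_tests


# Assumed number of baseline characteristics tested: 20 (hypothetical).
print(expected_false_positives(20))     # 1.0
print(round(prob_at_least_one(20), 2))  # 0.64
```

In a properly randomised trial any baseline difference is due to chance by construction, so a "significant" baseline test tells the reader nothing about whether randomisation worked.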
More on analysis errors:
• Use of the wrong test, e.g.
• Two-sample t-test (for independent groups) used where a paired t-test (for dependent groups)
should have been, and vice versa
• Parametric methods used where non-parametric should have been used (e.g. in skewed data)
• Methods not appropriate for data type, e.g. linear regression used with an ordinal response.
• Failure to adjust p-values for multiple testing (to avoid Type I errors)
• Failure to carefully define all of the tests used in the methods section
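Adjusting p-values for multiple testing, mentioned in the list above, can be illustrated with two standard procedures: Bonferroni and Holm. This is a minimal sketch; the p-values are made up for illustration and the function names are not from any particular package.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; never less powerful than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from four tests in one paper.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in bonferroni(raw)])  # [0.04, 0.16, 0.12, 0.02]
print([round(p, 3) for p in holm(raw)])        # [0.03, 0.06, 0.06, 0.02]
```

All four raw p-values are below 0.05, but only two remain below 0.05 after either adjustment.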
And more on analysis errors:
• In RCTs, comparisons are tested within groups but not between groups
• Watson et al. (2009) compared an anti-ageing product (n=30) with a placebo (“vehicle”) (n=30).
• They found the test product showed significant improvement in facial wrinkles compared to
baseline assessment (P = 0·013), with no significant improvement given by the vehicle (P = 0·11).
• But, there was no significant difference between test and vehicle (P=0·72).
• Media suggested this was the first anti-ageing cream “proven to work.”
• But the treatment vs placebo comparison is what matters – this is the only comparison that shows
that the treatment works (or not)!
See Bland and Altman (2011) for a useful discussion of this paper!
And more on analysis:
• Continuous data made binary or into ordinal categories (or ordinal
categories made binary) without justification
• May be done to “find”/increase significance
• Typically a great loss of information results from dichotomisation
• Failure to show/comment on assumptions required for testing.
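The loss of information from dichotomisation can be seen in a small simulation. This is a sketch under stated assumptions: a normally distributed predictor, a linear relationship with the outcome, and a split at the median. The slope, sample size and seed are arbitrary illustrative choices.

```python
import random
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, slope=0.5, seed=1):
    """Correlation of y with continuous x and with x split at its median."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    cut = median(x)
    x_binary = [1.0 if xi > cut else 0.0 for xi in x]  # "high" vs "low"
    return pearson(x, y), pearson(x_binary, y)


r_continuous, r_binary = simulate()
print(round(r_continuous, 2), round(r_binary, 2))
```

Under these assumptions the correlation with the dichotomised predictor should be about 80% of the correlation with the continuous one, so a larger sample is needed to detect the same association.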
Errors/Deficiencies in Reporting Statistics
• Failure to use (or define the use of) a variability measure (e.g. SD)
• Use of mean and standard deviation (SD) in skewed data
• median and quartiles are preferable
• Using standard error (SE) of the mean instead of SD in descriptive
statistics or confusing the two
• SE used because it is smaller so “looks” better
• Reporting thresholds for p-values rather than the actual p-values
• Reporting a p-value but no data (e.g. estimate and interval, change
and interval etc.), like the anti-ageing cream study
• Reporting significance of a test or analysis not shown or described
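The difference between SD and SE, and between mean/SD and median/quartiles in skewed data, is easy to see on a small sample. The data below are hypothetical (a right-skewed variable with one large value) and `describe` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev

# Hypothetical right-skewed sample; not real data.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 30]


def describe(values):
    """Return the summaries discussed above for one sample."""
    n = len(values)
    sd = stdev(values)                   # spread of the individual observations
    q1, _, q3 = quantiles(values, n=4)   # lower and upper quartiles
    return {
        "n": n,
        "mean": mean(values),
        "sd": sd,
        "se": sd / sqrt(n),              # precision of the mean, not spread of the data
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }


for name, value in describe(sample).items():
    print(f"{name}: {value:.2f}")
```

The SE is always smaller than the SD by a factor of the square root of n, which is why it "looks" better, and the single large value pulls the mean well above the median.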
Errors in conclusions
• Conclusions are drawn that are not supported by results
• Interpreting “not significant” as “not different” or “equivalent”
Examples: Yang et al. (2017), Sung et al. (1993)
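Why "not significant" does not mean "equivalent" is clearest from a confidence interval. The sketch below uses a normal-approximation (Wald) interval for a difference in proportions; the counts are hypothetical and are not the results of any study cited here.

```python
from statistics import NormalDist


def diff_in_proportions_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Difference in proportions (A minus B) with a Wald confidence interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical trial: 18/30 successes on treatment A, 22/30 on treatment B.
diff, lower, upper = diff_in_proportions_ci(18, 30, 22, 30)
print(f"difference = {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Here the interval includes zero, so the test is not significant, but it runs from roughly -0.37 to 0.10. The data are compatible with A being much worse than B, so "no difference" is not a supported conclusion.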
• Making too much of
potentially spurious results
in the conclusions
Useful summaries of errors
Altman DG, Bland JM. Statistics notes: Absence of evidence is not evidence of absence. BMJ 1995; 311: 485
Bland JM, Altman DG. Comparisons against baseline within randomised groups are often used and can be highly misleading. Trials 2011,
Clark GT, Mulligan R. Fifteen common mistakes encountered in clinical research. Journal of Prosthodontic Research 2011; 55: 1-6
Lang T. Twenty Statistical Errors Even YOU Can Find in Biomedical Research Articles. Croatian medical journal 2004; 45(4): 361-370
Nieuwenhuis S et al. Erroneous analyses of interactions in neuroscience: a problem of significance. Nature Neuroscience 2011; 14: 1105-
Olsen CH. Guest commentary: Review of the Use of Statistics in Infection and Immunity. Infection and Immunity 2003; 71(12): 6689-6692
Strasak AM et al. Statistical errors in medical research – a review of common pitfalls. Swiss Medical Weekly 2007; 137: 44-49
Yim KH et al. Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain. Korean Journal of Pain 2010;
Sung et al. Octreotide infusion or emergency sclerotherapy for variceal haemorrhage. Lancet 1993; 342: 637-41
Watson REB, et al. A cosmetic ‘anti-ageing’ product improves photoaged skin: a double-blind, randomized controlled trial. Br J Dermatol
Yang et al. Finding the Optimal volume and intensity of Resistance Training Exercise for Type 2 Diabetes: The FORTE Study, a Randomized
Trial. Diabetes Research and Clinical Practice 2017; 130: 98-107.
http://www.tylervigen.com/spurious-correlations (correlation plots)
Thanks to Deb Wyatt, Michael Martin and the MedStats Google Group users for some great examples and references.
• Several of you asked me how you could get in contact with
statisticians to include as reviewers for papers or on your editorial
boards. There are several approaches that can be taken:
1. Email the anzstat mailing list, a list for people interested in statistics.
Because you could identify people at any stage of their career, or
non-statisticians, it would be important to ask for a CV and maybe check
references.
2. Approach university department heads in stats/maths/biostatistics and
ask for recommendations on people to invite.
3. I plan to talk to the Statistical Society of Australia about putting together
a registry of statisticians willing to help.