Copenhagen 2008

How to improve
the chance of getting your manuscript
accepted for publication

Jonas Ranstam PhD

Evidence based medicine
(The Cochrane collaboration 1993)

Cohort study of smoking
and lung cancer (1954)
(Bradford Hill)

Case-control study of
smoking and lung
cancer (1950)
(Bradford Hill)

Randomised clinical
trial of streptomycin
and tuberculosis
(1948)
(Bradford Hill)

Anecdotal
evidence
(Case reports)

Mandatory disclosure of trial results (2008)

Trial registration (2005)

EU directive (2001)

ICH GCP (1996)

CONSORT (1996)

WHO CIOMS (1993)

ICMJE Uniform Requirements (1978)

Helsinki declaration (1964)

Nürnberg convention (1949)

Plan
1. Methodological background
2. General guidelines
3. Special recommendations
a) case reports
b) mechanical experiments
c) in vitro/cadaver experiments
d) cross-sectional studies
e) epidemiological studies
f) randomized trials
4. Summary

What is statistics used for?
1. Describing data (statistics in the plural)

2. Interpreting uncertain data (statistics in the singular)

Two kinds of uncertainty
1. Uncertainty of measurement

2. Uncertainty of sampling

1. Uncertainty of measurement
The precision of the measurement instrument used.

The precision of the Finapres non-invasive blood pressure monitor
is on average 12.1 mm Hg.

2. Uncertainty of sampling
Individual effects vary between subjects. Different
samples of subjects yield different observed mean
effects.

Example
Assume that the cumulative 10-year revision rate
of the Oxford knee prosthesis is 8% and that two
groups of 100 patients receiving the prosthesis are
randomly selected and followed over time.

The two groups are likely to get different numbers
of patients revised during follow up.
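
The example can be illustrated with a small simulation. This is only a sketch under assumptions: each patient is treated as an independent draw with an 8% revision probability, and the function name, the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import random


def simulate_two_groups(n_per_group=100, rate=0.08, n_sim=10_000, seed=2008):
    """Return the fraction of simulated studies in which the two groups
    end up with different numbers of revised patients."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def revised_count():
        # Each patient is revised with probability `rate`, independently.
        return sum(rng.random() < rate for _ in range(n_per_group))

    differing = 0
    for _ in range(n_sim):
        if revised_count() != revised_count():
            differing += 1
    return differing / n_sim


fraction_differing = simulate_two_groups()
print(f"The two groups differ in {fraction_differing:.0%} of simulated studies")
```

Even though both groups come from the same population, identical revision counts are the exception rather than the rule.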

375 randomly ordered patients, of whom 30 (8%) will be revised within 10 years

[Figure: two samples drawn from these patients: 6% revised and 12% revised]

Sampling uncertainty

6% revised vs. 12% revised

H0: The two samples represent the same population
H1: The two samples represent different populations

P-value
The probability of observing an effect at least as large
as the one observed if only sampling uncertainty were at
work (i.e. if H0 were true).

12/100 vs. 6/100, Fisher's exact test p = 0.22
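
The p-value above can be checked with a few lines of code. The sketch below assumes the usual two-sided definition of Fisher's exact test (summing the probabilities of all tables that are no more likely than the observed one, with margins fixed); the function name is chosen here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Sum all tables at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        table_probability(x)
        for x in range(smallest, largest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# 12 revised of 100 vs. 6 revised of 100
p_value = fisher_exact_two_sided(12, 88, 6, 94)
print(f"p = {p_value:.2f}")
```

For these data the result is about 0.22, in line with the value quoted above.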

P-values are often misunderstood
They cannot

- describe clinical relevance (they depend on sample
size)

- show that a difference “does not exist”, because
n.s. is absence of evidence, not evidence of
absence

Confidence interval
A range of values that, with the specified level of
confidence, includes the population parameter being
estimated.

12/100 vs. 6/100, RR = 2.0 (95% CI: 0.7 - 5.6)

[Figure: the interval drawn on a relative risk axis (1/2, 1, 2), with regions labelled p < 0.05 and n.s.]
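
A relative risk and an approximate confidence interval can be computed as sketched below. The sketch assumes the common large-sample approximation on the log scale; the slide does not say which method was used, and this approximation gives roughly 0.8 to 5.1 rather than the 0.7 to 5.6 quoted above. Either way the interval includes 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Relative risk of group A vs. group B with a log-scale Wald interval."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) under the large-sample approximation.
    se_log_rr = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(12, 100, 6, 100)
print(f"RR = {rr:.1f} (95% CI: {lower:.1f} - {upper:.1f})")
```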

Important assumptions
Many statistical methods like the Student's t-test and
ANOVA are based on the assumption of Gaussian
distribution and homogeneous variance.

If the assumptions are not met, use alternative
(non-parametric) methods, like the Mann-Whitney U-test or
the Kruskal-Wallis test (non-parametric ANOVA).

Most conventional methods (both parametric and
non-parametric) require independent observations.

- Patients are independent

- Patients' knees, hips, shoulders, feet, etc. are not

pH against PaCO2 for eight subjects,
with parallel lines fitted for each subject

Bland, J M. et al. BMJ 1995;310:446

Incorrect analysis: r = -0.51, p < 0.001
Correct analysis: r = -0.07, p = 0.7

Copyright ©1995 BMJ Publishing Group Ltd.

How Many Patients? How Many Limbs? Analysis
of Patients or Limbs in the Orthopaedic Literature:
A Systematic Review

Bryant et al. JBJS Am. 2006;88:41-45.

Our findings suggest that a high proportion (42%) of
clinical studies in high-impact-factor orthopaedic journals
involve the inappropriate use of multiple observations from
single individuals, potentially biasing results. Orthopaedic
researchers should attend to this issue when reporting
results.

Most conventional methods (both parametric and
non-parametric) require independent observations.

Include only one observation per patient, or use a
statistical method that can handle dependent data,
e.g. multilevel or mixed effects models.

Always report both the number of observations and the
number of patients.
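
One simple way to get a single observation per patient is sketched below. The data are hypothetical, and averaging the two knees of a bilateral patient is only one possible choice (selecting one knee at random is another); a multilevel or mixed effects model would use all observations instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (patient id, outcome measured on one knee)
observations = [
    ("p1", 4.0), ("p1", 6.0),  # bilateral patient: two knees
    ("p2", 5.0),
    ("p3", 7.0), ("p3", 9.0),  # bilateral patient: two knees
    ("p4", 3.0),
]


def one_value_per_patient(rows):
    """Collapse repeated observations to the mean value per patient."""
    by_patient = defaultdict(list)
    for patient_id, value in rows:
        by_patient[patient_id].append(value)
    return {pid: mean(values) for pid, values in by_patient.items()}


per_patient = one_value_per_patient(observations)
n_observations = len(observations)
n_patients = len(per_patient)
print(f"{n_observations} observations (knees) from {n_patients} patients")
```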

Multiplicity
In contrast to many other forms of precision,
statistical precision depends on the number of
performed measurements (significance tests).

Multiplicity
Each significance test at a 5% significance level
has 5% risk of a false positive test.

Repeated testing increases the risk of at least one
false positive test.

Number of tests Risk of at least one false positive

1 0.05
2 0.10
5 0.23
10 0.40
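
The figures in the table follow from a one-line formula if the tests are assumed to be independent (an assumption made here; correlated tests give somewhat different risks). The function name is chosen for illustration.

```python
def risk_of_any_false_positive(n_tests, alpha=0.05):
    """Risk of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 2, 5, 10):
    print(f"{n_tests:>2} tests: {risk_of_any_false_positive(n_tests):.2f}")
```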

Example 1 (Subgroups, two tests)

Example 2 (Repeated testing, five tests)

Example 3 (Liver function, 10 tests)

Multiplicity
Common in exploratory analyses

Unacceptable in confirmatory analyses

Statistical Methods

“Describe statistical methods with enough detail to
enable a knowledgeable reader with access to the
original data to verify the reported results.”

Required for analytical methods (statistical models,
hypothesis tests, confidence intervals).

Descriptions are often unclear, vague or ambiguous.
They need to be clear and detailed.

Results

“When possible, quantify findings and present them
with appropriate indicators of measurement error or
uncertainty (such as confidence intervals).”

Measures of statistical precision (p-values and confidence
intervals) are necessary for generalizing results beyond
the examined patients.

Results

“Avoid relying solely on statistical hypothesis testing,
such as the use of P values, which fails to convey
important information about effect size.”

Describe both your observations and how you interpret
them (use confidence intervals or p-values).

                         Statistically significant
Clinically significant   yes    no

yes                      a      b
no                       c      d

Writing only that "there was" or "there was no" (statistically significant) difference is too simplistic

Example
Two side effects with a new osteoporosis treatment:

- A statistically significant reduction in body hair
growth rate by 5% (p = 0.04)

- A statistically insignificant increase in systolic
blood pressure by 25 mmHg (p = 0.06)

Confidence intervals are better
than p-values
In contrast to p-values they do

- relate to clinical significance

- show when a difference “does not exist”

because they present lower and upper limits of
potential clinical effects/differences

P-value and confidence interval

P-value            Conclusion from confidence interval
[2 alternatives]   [6 alternatives]

p < 0.05           Statistically but not clinically significant effect
p < 0.05           Statistically and clinically significant effect
p < 0.05           Statistically, but not necessarily clinically, significant effect
n.s.               Inconclusive
n.s.               Neither statistically nor clinically significant effect
p < 0.05           Statistically significant reversed effect

[Figure: each confidence interval drawn on an effect axis marked with 0 and the range of clinically significant effects]
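
The six conclusions can be written as a small decision rule. This is a sketch under assumptions: the effect is measured on a scale where 0 means no effect, the interval is a 95% confidence interval matching a test at the 5% level, and `clinical_threshold` (a name introduced here) is the smallest effect judged clinically significant.

```python
def interpret_interval(lower, upper, clinical_threshold):
    """Classify a confidence interval for an effect (0 = no effect)."""
    if lower > upper or clinical_threshold <= 0:
        raise ValueError("need lower <= upper and a positive threshold")
    if upper < 0:
        # Whole interval on the wrong side of 0.
        return "statistically significant reversed effect"
    if lower > 0:
        # Interval excludes 0, corresponding to p < 0.05.
        if upper < clinical_threshold:
            return "statistically but not clinically significant effect"
        if lower >= clinical_threshold:
            return "statistically and clinically significant effect"
        return "statistically, but not necessarily clinically, significant effect"
    # Interval includes 0, corresponding to n.s.
    if upper >= clinical_threshold:
        return "inconclusive"
    return "neither statistically nor clinically significant effect"


print(interpret_interval(-2.0, 12.0, clinical_threshold=5.0))
```

With a threshold of 5, an interval from -2 to 12 is non-significant yet still compatible with a clinically significant effect, so the rule returns "inconclusive".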

When there is a difference in data

Do not write that there is not a difference!

There were indeed
differences: they are
0.45 and 0.57

Better alternative:

“The observed differences
in extraction torques
between the two types of
uncoated distal pins can
be explained by chance.”

Avoid non-technical use of technical
terms and use clear expressions

- significant: clinically or statistically?
- no difference: statistically insignificant?
- statistical difference: statistically significant?
- matched: selected or just comparable?
- correlation: relation, regression?
- normal: Gaussian distribution?
- random: mathematical algorithm?
- etc.

Case reports can be used for
- Generation of new hypotheses

- Showing inconsistencies in established “facts”

Case reports may need
statistics (in the plural sense)
- Summary description of characteristics

- Description of change or variation over time

Case reports cannot be used for
- Generalizing findings like risk or treatment effect

(This requires statistics in the singular sense)

Mechanical experiments
What do p-values and confidence intervals
relate to?

- Measurement uncertainty (Perhaps)

- Sampling uncertainty (No, there is no
information on subject variation. The
findings cannot be generalized beyond
the device).

c) in vitro/cadaver experiments

In vitro/cadaver experiments
What do p-values and confidence intervals relate
to?

- Measurement uncertainty (Perhaps)

- Sampling uncertainty (Perhaps, if the
observations provide information on
variation between subjects)

Example

In a study with 60 observations, 20 specimens
had been taken from each of 3 subjects.

The specimens were distributed randomly
between one control group and one
experimental group.

What do significance tests of these two groups
tell us?

Remember

- Sampling frame
- Target population
Super (for scientific questions)
Finite (requires corrections)
- Non-responders

Epidemiological studies
- Exploratory, hypothesis generating,
multiplicity issues considered less
important than validity issues

- External validity (source of subjects)

- Internal validity (confounding)

Results

Uniform Requirements: “Where scientifically
appropriate, analyses of the data by variables such as
age and sex should be included.”

Observational studies require adjustment for known
and suspected confounding factors to produce valid
effect estimates.

This adjustment is usually performed using statistical
modelling (e.g. ANCOVA or regression analysis). The
purpose is to increase validity.
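
The effect of adjustment can be shown on synthetic data. In the sketch below (all data and names are invented for illustration) the outcome depends only on age, but exposure is more common among older subjects, so the crude comparison suggests an exposure effect that disappears once age is included in a linear regression model.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: exposure becomes more common with age; the outcome
# depends on age only, so the true exposure effect is zero.
data = []
for age in range(10):
    for rep in range(10):
        exposed = 1 if rep < age else 0
        outcome = 5.0 + 2.0 * age
        data.append((exposed, age, outcome))

crude_effect = (mean(y for e, _, y in data if e == 1)
                - mean(y for e, _, y in data if e == 0))

design = [[1.0, float(e), float(age)] for e, age, _ in data]
outcomes = [y for _, _, y in data]
_, adjusted_effect, age_effect = least_squares(design, outcomes)

print(f"crude effect: {crude_effect:.2f}, adjusted effect: {adjusted_effect:.2f}")
```

The choice of which covariates to include (here, age) has to come from subject-matter knowledge; the code only shows what the adjustment does once that choice is made.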

Results

Automatic stepwise regression (forward or backward)
is not an adequate method for confounding
adjustment.

Clinical trials

“The ICMJE member journals will require, as a
condition of consideration for publication in their
journals, registration in a public trials registry.”

“The ICMJE recommends that journals publish the trial
registration number at the end of the Abstract.”

Clinical trials

“When reporting experiments on human subjects,
authors should indicate whether the procedures
followed were in accordance with the ethical
standards of the responsible committee on human
experimentation (institutional and national) and with
the Helsinki Declaration of 1975, as revised in 2000
(5).”

WORLD MEDICAL ASSOCIATION DECLARATION OF HELSINKI

Ethical Principles for Medical Research Involving Human Subjects

27. ...Reports of experimentation not in accordance
with the principles laid down in this Declaration
should not be accepted for publication.

Purpose of a randomized trial

To test a hypothesis with control of random and
systematic errors.

- No bias (randomization & blinding)

- No multiplicity problems

Randomization
Mathematical algorithm

Stratified

Concealment of outcome

Reproducible
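
A randomization list with these properties could be generated as sketched below. The sketch assumes permuted blocks with 1:1 allocation within each stratum, which is one common scheme and not necessarily the one the author has in mind; the stratum names, block size and seed are illustrative. It covers only the algorithm, stratification and reproducibility points.

```python
import random


def stratified_block_randomization(strata, n_blocks, block_size=4, seed=2008):
    """Return {stratum: [arm, ...]} using permuted blocks within each stratum."""
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)  # documented seed makes the list reproducible
    schedule = {}
    for stratum in strata:
        allocations = []
        for _ in range(n_blocks):
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)  # random order, balanced within the block
            allocations.extend(block)
        schedule[stratum] = allocations
    return schedule


schedule = stratified_block_randomization(["men", "women"], n_blocks=3)
for stratum, allocations in schedule.items():
    print(stratum, "".join(allocations))
```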

Study populations
Intention-to-treat (ITT) principle
    Analyze all randomized subjects according to planned
    treatment regimen.

Full analysis set (FAS)
    The set of subjects that is as close as possible to the
    ideal implied by the ITT-principle.

Per protocol (PP) set
    The set of subjects who complied with the protocol
    sufficiently to ensure that they are likely to exhibit the
    effects of treatment according to the underlying
    scientific model.

FAS vs. PP-set
FAS + no selection bias
- misclassification problem (effect dilution)

PP-set + no contamination problem
- possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same
conclusions, confidence in the trial is supported.
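
The comparison of the two analysis sets can be sketched as follows. The subjects and outcomes are invented, the full analysis set is taken here to be all randomized subjects, and "followed the protocol" is reduced to a single flag; a real trial would define both sets in the protocol.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Subject:
    randomized_arm: str      # arm assigned at randomization
    followed_protocol: bool  # simplified compliance flag (assumption)
    outcome: float


# Hypothetical trial data
subjects = [
    Subject("new", True, 8.0),
    Subject("new", True, 7.0),
    Subject("new", False, 3.0),
    Subject("control", True, 4.0),
    Subject("control", True, 5.0),
    Subject("control", False, 6.0),
]


def mean_difference(group):
    """Mean outcome in the new arm minus mean outcome in the control arm,
    always comparing subjects by the arm they were randomized to."""
    new = [s.outcome for s in group if s.randomized_arm == "new"]
    control = [s.outcome for s in group if s.randomized_arm == "control"]
    return mean(new) - mean(control)


full_analysis_set = subjects
per_protocol_set = [s for s in subjects if s.followed_protocol]

fas_effect = mean_difference(full_analysis_set)
pp_effect = mean_difference(per_protocol_set)
print(f"FAS effect: {fas_effect:.1f}, PP effect: {pp_effect:.1f}")
```

In this invented example the two sets give clearly different estimates, which is the situation that calls for a closer look at the subjects who did not follow the protocol.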

Endpoints
Primary
    The variable capable of providing the most clinically
    relevant evidence directly related to the primary
    objective of the trial

Secondary
    Either measurements supporting the primary endpoint or
    effects related to secondary objectives

Statistical analyses
Confirmatory
    The result concerns a primary endpoint and the p-value
    or confidence interval accounts for potential
    multiplicity.

    The result can support a claim of superiority,
    equivalence or non-inferiority.

Exploratory
    All other analyses.

    The result is either supporting or explanatory, or
    simply just a new hypothesis.
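
How a p-value can account for multiplicity is not specified above. One simple and conservative option, used here purely as an illustration, is the Bonferroni adjustment, in which each p-value is multiplied by the number of tests; the p-values below are made up.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


raw_p_values = [0.01, 0.04, 0.20]  # hypothetical results for three endpoints
adjusted = bonferroni_adjust(raw_p_values)
significant = [p < 0.05 for p in adjusted]

for raw, adj, sig in zip(raw_p_values, adjusted, significant):
    print(f"raw p = {raw:.2f}, adjusted p = {adj:.2f}, significant: {sig}")
```

Only the first of the three hypothetical results remains significant at the 5% level after adjustment.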

Reporting
“For reports of randomized controlled trials authors
should refer to the CONSORT statement.”

Include with the manuscript
Study Protocol

Statistical Analysis Plan

Clinical trials
International regulatory guidelines
ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider:
- baseline covariates
- missing data
- multiplicity issues
- etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

The responsibilities of a statistical reviewer

“To make sure that the authors spell out for the reader
the limitations imposed upon the conclusions by the
design of the study, the collection of data, and the
analyses performed.”

Shor S. The responsibilities of a statistical reviewer. Chest 1972;61:486-487.

Read the manuscript from end to beginning, and look
for weaknesses in the links between:

1. Conclusion
2. Discussion (Discussion section)
3. Results (Results section)
4. Methods (Material & methods section)
5. Data (Material & methods section)
6. Hypothesis (Introduction)

Make sure the chain holds all the way!

Summary
1. Present statistical methods in detail, and the number
of observations included in each analysis.
2. Present data, statistical results and your conclusions
- data description vs. results interpretation
- clinical vs. statistical significance
- absence of evidence is not evidence of
absence
3. Adjust for confounding factors in observational
studies (but do not use stepwise regression)
4. Comply with the CONSORT checklist in randomized
studies
