9. Evaluation of sampling uncertainty
Randomised clinical trial of streptomycin and tuberculosis (1948) (Bradford Hill)
Case-control study of smoking and lung cancer (1950) (Bradford Hill)
Cohort study of smoking and lung cancer (1954) (Bradford Hill)
11. Observed sample
354 consecutive patients with hip fracture treated at the Department of Orthopedics, Umeå University Hospital
12. Unobserved population
All potential hip fracture patients all over the world: past, present and future.
Observed sample
354 consecutive patients with hip fracture treated at the Department of Orthopedics, Umeå University Hospital
13. Unobserved population
[diagram: an observed sample, another observed sample and a third observed sample, all drawn from the same unobserved population]
16. To what population does Experiment A belong?
The mother of all possible realizations of Experiment A
[diagram: five realizations of Experiment A]
17. To what population does Experiment A belong?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A]
Sampling variability
18. To what population does Experiment A belong?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A, each estimating the population mean μ]
Sampling variability
19. What is the sampling variability of these experiments?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A, each estimating the population mean μ]
Observed sampling variability after thousands of experiments
20. Do we need to repeat each experiment thousands of times?
[diagram: a single Experiment A with standard deviation SD, sample size n and population mean μ]
Sampling uncertainty?
21. Can we say anything about sampling uncertainty if only one experiment is performed?
[diagram: a single Experiment A with standard deviation SD and sample size n]
SEM = SD/√n
The interval from −1.96 SEM to +1.96 SEM around the estimate captures the sampling uncertainty.
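A minimal sketch of this idea in Python (the sample is simulated here, since the slides give no data): from a single experiment we estimate the SEM and an approximate 95% interval without ever repeating the experiment.

```python
import math
import random

random.seed(1)

# One single "Experiment A": n = 30 draws from a population that is
# unknown in practice (simulated here as Normal(mu=50, sigma=10)).
n = 30
sample = [random.gauss(50, 10) for _ in range(n)]

mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: SEM = SD / sqrt(n)
sem = sd / math.sqrt(n)

# Approximate 95% confidence interval: mean +/- 1.96 * SEM
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.1f}, SEM = {sem:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The single-sample SD stands in for the spread we would otherwise observe across thousands of repetitions.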
23. Do different ranks in league tables
represent differences in “hospital
quality”?
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability?
24. Or do the differences just reflect sampling variation?
The mother of all possible repetitions of Hospital A
[diagram: five repetitions of Hospital A, each estimating the population mean μ]
Sampling variability
25. It depends on the degree of uncertainty!
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability? ICC ≈ 1.0
26. It depends on the degree of uncertainty!
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability? ICC = 0
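A quick simulation sketches the ICC = 0 situation (the numbers are assumptions: 100 patients per hospital and an identical 20% true complication risk): a complete league table emerges from sampling variation alone.

```python
import random

random.seed(2)

# Five hospitals with IDENTICAL true complication risk - the "ICC = 0"
# situation, where any ranking reflects sampling variation only.
true_risk = 0.20
n_patients = 100

observed = {}
for hospital in "ABCDE":
    events = sum(random.random() < true_risk for _ in range(n_patients))
    observed[hospital] = events / n_patients

ranking = sorted(observed, key=observed.get)
print("observed rates:", observed)
print("league table (best to worst):", ranking)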
27. What is the difference between
quantitative and qualitative science?
(sampling uncertainty)
30. Quantitative research: 99% of all crows are black
All crows cannot be studied simultaneously, but the proportion of black crows can be estimated from a random sample of crows.
Samples are characterized by sampling uncertainty.
This must be quantified to assess the empirical
support of the findings.
40. Generalizable knowledge
[diagram: a single observation generalized in several directions]
P-values and confidence intervals are used to quantify the uncertainty. They help us generalize.
43. Statistical precision
Statistical precision depends on:
a) the variability (SD) between independent observations
b) the number (n) of independent observations
The standard error of an estimate (SE) = SD/√n
With the same variability, a greater sample size is needed to detect a smaller effect.
45. Example: Vaccine trial
Protection of pandemic vaccine: 30% fall ill without vaccine.
Sample size for at most a 5% risk of a false positive and a 20% risk of a false negative result:
Protection   No. of patients
90%          72
80%          94
70%          128
60%          180
50%          268
40%          428
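A rough sketch of such a calculation, using the standard normal-approximation formula for comparing two proportions; the slide's exact method and assumptions are not stated, so these numbers need not reproduce the table, but the pattern (smaller protective effect → larger trial) is the same.

```python
import math

def two_proportion_n(p_control, protection, z_a=1.96, z_b=0.8416):
    """Approximate sample size PER GROUP for comparing two proportions
    (normal approximation; z_a for two-sided alpha = 0.05, z_b for 80% power)."""
    p1 = p_control                      # risk of illness without vaccine
    p2 = p_control * (1 - protection)   # risk with vaccine
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

for protection in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4):
    print(f"protection {protection:.0%}: n per group = {two_proportion_n(0.30, protection)}")
```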
46. Example: Observational safety study
Guillain-Barré syndrome: incidence = 1×10⁻⁵ per person-year
Sample size for at most a 5% risk of a false positive and a 20% risk of a false negative result:
Relative risk   No. of patients   No. affected
100             1 098             9 000
50              2 606             4 500
20              9 075             1 800
10              26 366            900
5               92 248            450
2               992 360           180
47. Statistical precision
The p-value
The probability of obtaining, by chance, a result at least as extreme as the one observed when no true effect exists.
If |Diffmean/SEDiff| > 1.96, then p < 0.05 and Diffmean is considered statistically significant.
48. Statistical precision
Confidence interval
A range of values which, with specified confidence, includes the estimated population parameter.
Diffmean ± 1.96 SEDiff gives a 95% confidence interval.
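Applied to summary statistics, the two formulas can be sketched as follows. The group means echo the BMI example on slide 50, but the common SD of 7.0 and the sample size of 17 per group are assumptions, since the slides do not state n.

```python
import math

# Summary statistics: means from the BMI example (slide 50); the SD and
# n = 17 per group are assumed for illustration (the slides give no n).
mean1, sd1, n1 = 29.2, 7.0, 17
mean2, sd2, n2 = 33.8, 7.0, 17

diff = mean2 - mean1
# Standard error of the difference between two independent means
se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Slide 47: |Diffmean / SEDiff| > 1.96  <=>  p < 0.05 (large-sample test)
significant = abs(diff / se_diff) > 1.96

# Slide 48: Diffmean +/- 1.96 * SEDiff gives a 95% confidence interval
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(f"diff = {diff:.1f}, significant: {significant}, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

With these assumed numbers the interval straddles zero while nearly excluding it, so the confidence interval conveys far more than the bare verdict "not significant".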
49. P-values are usually misconstrued
They do not
- describe clinical relevance, because they depend on sample size
- show that a difference "does not exist", because statistical insignificance indicates absence of evidence, not evidence of absence
- present the uncertainty in the magnitude of an effect or difference, because they relate only to the null effect (the null hypothesis)
50. Results
There was no difference in BMI (p = 0.09), see Table 1.
Table 1 BMI (mean ±SD)
Group 1. 29.2 ±6.9
Group 2. 33.8 ±7.1
51. Confidence intervals are better than p-values
In contrast to p-values, they
- facilitate assessment of clinical significance
- show when a difference "does not exist", because they present lower and upper limits of potential clinical effects/differences
52. Results
There was a difference in BMI of 4.1 (−0.3, 9.0) kg/m², see Table 1.
Table 1 BMI (mean ±SD)
Group 1. 29.2 ±6.9
Group 2. 33.8 ±7.1
53. P-value and confidence interval
Information in p-values [2 possibilities]:
- p < 0.05: statistically significant effect
- n.s.: inconclusive
Information in confidence intervals [2 possibilities]:
[diagram: confidence intervals on an effect scale, positioned relative to 0]
54. P-value and confidence interval
P-value → conclusion from the confidence interval:
- p < 0.05: statistically and clinically significant effect
- p < 0.05: statistically, but not necessarily clinically, significant effect
- n.s.: inconclusive
- n.s.: neither statistically nor clinically significant effect
- p < 0.05: statistically significant reversed effect
[diagram: confidence intervals on an effect scale, relative to 0 and to clinically significant effects]
55. Superiority vs. non-inferiority
[diagram: confidence intervals on a scale from "control better" to "new agent better", with 0 and a margin of non-inferiority or equivalence]
- Superiority shown
- Superiority shown less strongly
- Non-inferiority not shown; superiority not shown
- Non-inferiority shown; superiority not shown
- Equivalence shown; superiority not shown
56. Science as "significant observations"
Data → P < 0.05 [There is a difference]
Data → NS [There is no difference]
57. Science as "significant observations"
Data → P < 0.05 [There is a difference]?
A p-value can be meaningfully interpreted only when the hypothesis is defined a priori and when multiplicity issues are considered.
Data → NS [There is no difference]?
No: statistical insignificance indicates absence of evidence, not evidence of absence.
58. Science as "significant observations"
What should not be asked: Is there a statistically significant difference in the studied group of patients?
What should be asked: Is there an indication of a clinically significant difference among patients in general?
61. Evidence based medicine
1. Strong evidence from at least one systematic review of multiple
well-designed randomized controlled trials.
2. Strong evidence from at least one properly designed randomized
controlled trial of appropriate size.
3. Evidence from well-designed trials such as pseudo-randomized or non-randomized trials, cohort studies, time series or matched case-control studies.
4. Evidence from well-designed non-experimental studies from more
than one center or research group or from case reports.
5. Opinions of respected authorities, based on clinical evidence,
descriptive studies or reports of expert committees.
62. Any claim coming from an observational
study is most likely to be wrong
12 randomised trials have tested 52 observational claims (about the effects of vitamins B6, B12, C, D and E, beta carotene, hormone replacement therapy, folic acid and selenium).
“They all confirmed no claims in the direction of the observational
claim. We repeat that figure: 0 out of 52. To put it in another way,
100% of the observational claims failed to replicate. In fact, five
claims (9.6%) are statistically significant in the opposite direction
to the observational claim.”
Stanley Young and Allan Karr, Significance, September 2011
63. Even good observational research...
A series of observational studies published in the Lancet and the NEJM during the 1980s generated and tested the hypothesis that AIDS was caused by a side effect of a drug (amyl nitrite).
The authors of these publications also claimed to have identified the biological mechanism and urged preventive measures.
Then the virus was detected.
Vandenbroucke JP and Pardoel VP. An autopsy of epidemiologic
methods: the case of “poppers” in the early epidemic of the
acquired immunodeficiency syndrome (AIDS). Am J Epidemiol
1989;129:455-457.
64. What is the most important methodological
difference between observational and
experimental studies?
65. Experimental vs. observational studies
Experiments
Bias is eliminated by design (“Block what you can, randomize
what you cannot”)
Statistical analysis: Focus on precision
Observation
Blocking and randomization are not possible. Bias must be taken into consideration in the statistical analysis.
Statistical analysis: Focus on validity
70. Tests for baseline imbalance
Baseline imbalance after randomization is often tested. This
is not meaningful.
The purpose of randomization is to avoid systematic
imbalance (bias), not random errors (reduced precision).
The method to avoid random baseline imbalance is stratified randomization.
71. Multiplicity
In contrast to many other forms of precision, statistical
precision depends on the number of measurements
performed (the number of hypotheses tested).
The probability of a false positive finding increases with the number of tests performed.
72. Multiplicity
The risk of getting at least one false positive finding can be calculated as 1 − (1 − α)^k,
where k is the number of comparisons performed and α the significance level (usually 0.05).
Number of tests   Risk of at least one false positive
1                 0.05
2                 0.10
10                0.40
20                0.64
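The formula is easy to check directly (α = 0.05 assumed, as on the slide):

```python
# Family-wise risk of at least one false positive among k independent
# tests, each at significance level alpha: 1 - (1 - alpha)^k
def familywise_risk(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 2, 10, 20):
    print(f"{k:>2} tests: {familywise_risk(k):.2f}")
```

Note that 1 − 0.95²⁰ ≈ 0.64: with twenty tests at the 5% level, a false positive somewhere is more likely than not.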
74. Multiplicity
Adjustments of p-values can be made, but these reduce the type 1 error rate at the expense of the type 2 error rate, which means that a larger number of patients will be needed, which in turn means higher cost.
Recommendation: Avoid multiplicity adjustments.
Laboratory experimenters often use Bonferroni correction
to address multiplicity issues within endpoints, but hardly
ever to correct for the multiplicity of endpoints. The work is
therefore hypothesis generating rather than confirmatory.
75. Statistical analyses
Type of test → Result
- Confirmatory: empirical support for a claim of superiority, equivalence or non-inferiority.
- Hypothesis generating: a new hypothesis, which needs to be tested in a new hypothesis test.
76. How can I avoid multiplicity adjustments?
Most trials include more than one outcome.
Define a structure or hierarchy of endpoints: primary, secondary and safety. Define primary endpoint(s) as confirmatory and secondary as hypothesis generating.
No adjustment is necessary when statistical significance is required for all of the multiple endpoints, or for supporting or exploratory hypothesis tests.
77. Endpoints
- Primary: the variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.
- Secondary: effects related to secondary objectives, measurements supporting primary endpoint(s), or hypothesis-generating tests.
78. Validity issues in randomized trials
External validity
Inclusion/exclusion criteria affect the representativity of the results (efficacy vs. effectiveness).
Internal validity
Some subjects withdraw from follow-up. The withdrawal may depend on treatment and on the patient's characteristics. This can bias both efficacy and effectiveness.
79. Study populations
- Intention-to-treat (ITT) principle: analyze all randomized subjects according to randomized treatment.
- Full analysis set (FAS): the set of subjects that is as close as possible to the ideal implied by the ITT principle.
- Per protocol (PP) set: the set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.
80. FAS vs. PP set
FAS: + no selection bias; − misclassification problem (effect dilution)
PP set: + no contamination problem; − possible selection bias (confounding)
When the FAS and the PP set lead to essentially the same conclusions, confidence in the trial is supported.
82. Clinical trials
International regulatory guidelines:
- ICH Topic E9 – Statistical Principles for Clinical Trials
- EMEA Points to Consider documents on baseline covariates, missing data, multiplicity issues, etc.
- similar documents from the FDA
These guidelines can all be found on the internet.
83. Observational studies
Main types
- Cross-sectional studies
- Cohort studies (prospective or historic)
- Case-control studies (always retrospective)
85. Observational studies
Validity
- Selection bias: systematic differences between comparison groups caused by non-random allocation of subjects
- Information bias: misclassification, measurement errors, etc.
- Confounding bias: inadequate analysis, flawed interpretation of results
89. Testing for confounding
Screening for statistically significant effects, or stepwise
regression, is often used to select covariates for inclusion in
a regression model.
However, confounding is a property of the sample, not of the
population. Hypothesis tests have no relevance.
The selection of covariates to adjust for must be based on
clinical knowledge and considerations of cause and effect.
90. All study designs are (more or less) problematic
Observational studies
- Post hoc hypothesis tests, multiple testing
- Multiple modeling, protopathic bias, confounding
- Recycling of data
Experimental studies (laboratory experiments)
- Multiple testing (Bonferroni correction within endpoints)
- Small sample problems (often n=3)
- Pseudoreplication and pooling of samples
Experimental studies (randomized clinical trials)
- External validity
- No long term effects
- No infrequent events
92. Independent observations and replicates
Two rats are sampled from a population with a mean (μ) of 50 and a standard deviation (σ) of 10, and ten measurements of an arbitrary outcome variable are made on each rat.
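A sketch of why this matters (the within-rat measurement noise of σ = 2 is an assumption; the slide does not specify it): treating the 20 measurements as 20 independent observations inflates the apparent sample size, because the independent unit is the rat, not the measurement.

```python
import math
import random

random.seed(3)

# Two rats drawn from a population with mu = 50, sigma = 10 (between-rat),
# plus ten repeated measurements per rat (within-rat noise, sigma = 2,
# an assumption - the slide does not give the measurement error).
rats = [random.gauss(50, 10) for _ in range(2)]
measurements = [[random.gauss(true, 2) for _ in range(10)] for true in rats]

def sem(values):
    """Standard error of the mean of a list of values."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd / math.sqrt(n)

# Wrong: pool all 20 measurements as if they were 20 independent rats.
pooled = [m for rat in measurements for m in rat]
n_pooled = len(pooled)

# Right: the independent unit is the rat, so n = 2 rat means.
rat_means = [sum(ms) / len(ms) for ms in measurements]
n_rats = len(rat_means)

print(f"pseudoreplicated SEM (n = {n_pooled}): {sem(pooled):.2f}")
print(f"correct SEM (n = {n_rats} rats):      {sem(rat_means):.2f}")
```

The pooled analysis pretends to have 20 independent observations when only 2 exist, which is the pseudoreplication problem listed under laboratory experiments on slide 90.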
94. A scientific report
The idea is to try and give all the information to help others to
judge the value of your contributions, not just the information
that leads to judgment in one particular direction or another.
Richard P. Feynman
95. It is impossible to do clinical research
so badly that it cannot be published
“There seems to be no study too fragmented, no hypothesis too
trivial, no literature citation too biased or too egotistical, no design
too warped, no methodology too bungled, no presentation of
results too inaccurate, no argument too circular, no conclusions
too trifling or too unjustified, and no grammar and syntax too
offensive for a paper to end up in print.”
Drummond Rennie 1986 (editor of NEJM and JAMA)
96. Changes in publication practice
1665 – first scientific journals
1858 – the IMRAD structure
1957 – the abstract
1978 – Vancouver convention (ICMJE)
1987 – the structured abstract
Randomized clinical trials
1997 – Reporting guidelines (CONSORT)
1998 – Analysis guidelines (ICH)
2005 – Trial registration (Clinicaltrials.gov)
Observational studies
2007 – Reporting guidelines (STROBE)
2011 – Analysis guidelines (NARA, ICRS, etc.)
98. Clinical Trial Registration
In this editorial, published simultaneously in all member journals, the
International Committee of Medical Journal Editors (ICMJE)
proposes comprehensive trials registration as a solution to the
problem of selective awareness and announces that all 11 ICMJE
member journals will adopt a trials-registration policy to promote this
goal.
The ICMJE member journals will require, as a condition of
consideration for publication, registration in a public trials registry.
Trials must register at or before the onset of patient enrollment. This
policy applies to any clinical trial starting enrollment after July 1,
2005. For trials that began enrollment prior to this date, the ICMJE
member journals will require registration by September 13, 2005,
before considering the trial for publication. We speak only for
ourselves, but we encourage editors of other biomedical journals to
adopt similar policies.