9. Evaluation of sampling uncertainty
Randomised clinical trial of streptomycin and tuberculosis (1948) (Bradford Hill)
Case-control study of smoking and lung cancer (1950) (Bradford Hill)
Cohort study of smoking and lung cancer (1954) (Bradford Hill)
11. Observed sample
354 consecutive patients with hip fracture treated at the Department of Orthopedics, Umeå University Hospital
12. Unobserved population
All potential hip fracture patients all over the world: past, present and future.
Observed sample
354 consecutive patients with hip fracture treated at the Department of Orthopedics, Umeå University Hospital
13. Unobserved population
[diagram: an observed sample, another observed sample and a third observed sample, all drawn from the same unobserved population]
16. To what population does Experiment A belong?
The mother of all possible realizations of Experiment A
[diagram: five realizations of Experiment A]
17. To what population does Experiment A belong?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A]
Sampling variability
18. To what population does Experiment A belong?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A, each estimating the population mean μ]
Sampling variability
19. What is the sampling variability of these experiments?
The mother of all possible repetitions of Experiment A
[diagram: five repetitions of Experiment A, each estimating the population mean μ]
Observed sampling variability after thousands of experiments
20. Do we need to repeat each experiment thousands of times?
[diagram: a single Experiment A with standard deviation SD, sample size n and population mean μ]
Sampling uncertainty?
21. Can we say anything about sampling uncertainty if only one experiment is performed?
[diagram: a single Experiment A with standard deviation SD and sample size n]
SEM = SD/√n
The interval from −1.96 SEM to +1.96 SEM around the estimate captures the sampling uncertainty.
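A minimal sketch of this idea in Python (the sample is simulated here, since the slides give no data): from a single experiment we estimate the SEM and an approximate 95% interval without ever repeating the experiment.

```python
import math
import random

random.seed(1)

# One single "Experiment A": n = 30 draws from a population that is
# unknown in practice (simulated here as Normal(mu=50, sigma=10)).
n = 30
sample = [random.gauss(50, 10) for _ in range(n)]

mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: SEM = SD / sqrt(n)
sem = sd / math.sqrt(n)

# Approximate 95% confidence interval: mean +/- 1.96 * SEM
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.1f}, SEM = {sem:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The single-sample SD stands in for the spread we would otherwise observe across thousands of repetitions.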
23. Do different ranks in league tables
represent differences in “hospital
quality”?
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability?
24. Or do the differences just reflect sampling variation?
The mother of all possible repetitions of Hospital A
[diagram: five repetitions of Hospital A, each estimating the population mean μ]
Sampling variability
25. It depends on the degree of uncertainty!
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability? ICC ≈ 1.0
26. It depends on the degree of uncertainty!
Hospital A Hospital B Hospital C Hospital D Hospital E
Sampling variability? ICC = 0
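A quick simulation sketches the ICC = 0 situation (the numbers are assumptions: 100 patients per hospital and an identical 20% true complication risk): a complete league table emerges from sampling variation alone.

```python
import random

random.seed(2)

# Five hospitals with IDENTICAL true complication risk - the "ICC = 0"
# situation, where any ranking reflects sampling variation only.
true_risk = 0.20
n_patients = 100

observed = {}
for hospital in "ABCDE":
    events = sum(random.random() < true_risk for _ in range(n_patients))
    observed[hospital] = events / n_patients

ranking = sorted(observed, key=observed.get)
print("observed rates:", observed)
print("league table (best to worst):", ranking)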
27. What is the difference between
quantitative and qualitative science?
(sampling uncertainty)
30. Quantitative research: 99% of all crows are black
All crows cannot be studied simultaneously, but the proportion of black crows can be estimated from a random sample of crows.
Samples are characterized by sampling uncertainty.
This must be quantified to assess the empirical
support of the findings.
40. Generalizable knowledge
[diagram: a single observation generalized in several directions]
P-values and confidence intervals are used to quantify the uncertainty. They help us generalize.
43. Statistical precision
Statistical precision depends on:
a) the variability (SD) between independent observations
b) the number (n) of independent observations
The standard error of an estimate (SE) = SD/√n
With the same variability, a greater sample size is needed to detect a smaller effect.
45. Example: Vaccine trial
Protection of pandemic vaccine: 30% fall ill without vaccine.
Sample size for at most a 5% risk of a false positive and a 20% risk of a false negative result:
Protection   No. of patients
90%          72
80%          94
70%          128
60%          180
50%          268
40%          428
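A rough sketch of such a calculation, using the standard normal-approximation formula for comparing two proportions; the slide's exact method and assumptions are not stated, so these numbers need not reproduce the table, but the pattern (smaller protective effect → larger trial) is the same.

```python
import math

def two_proportion_n(p_control, protection, z_a=1.96, z_b=0.8416):
    """Approximate sample size PER GROUP for comparing two proportions
    (normal approximation; z_a for two-sided alpha = 0.05, z_b for 80% power)."""
    p1 = p_control                      # risk of illness without vaccine
    p2 = p_control * (1 - protection)   # risk with vaccine
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

for protection in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4):
    print(f"protection {protection:.0%}: n per group = {two_proportion_n(0.30, protection)}")
```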
46. Example: Observational safety study
Guillain-Barré syndrome: incidence = 1×10⁻⁵ per person-year
Sample size for at most a 5% risk of a false positive and a 20% risk of a false negative result:
Relative risk   No. of patients   No. affected
100             1 098             9 000
50              2 606             4 500
20              9 075             1 800
10              26 366            900
5               92 248            450
2               992 360           180
47. Statistical precision
The p-value
The probability of obtaining, by chance, a result at least as extreme as the one observed when no true effect exists.
If |Diffmean/SEDiff| > 1.96, then p < 0.05 and Diffmean is considered statistically significant.
48. Statistical precision
Confidence interval
A range of values which, with specified confidence, includes the estimated population parameter.
Diffmean ± 1.96 SEDiff gives a 95% confidence interval.
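Applied to summary statistics, the two formulas can be sketched as follows. The group means echo the BMI example on slide 50, but the common SD of 7.0 and the sample size of 17 per group are assumptions, since the slides do not state n.

```python
import math

# Summary statistics: means from the BMI example (slide 50); the SD and
# n = 17 per group are assumed for illustration (the slides give no n).
mean1, sd1, n1 = 29.2, 7.0, 17
mean2, sd2, n2 = 33.8, 7.0, 17

diff = mean2 - mean1
# Standard error of the difference between two independent means
se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Slide 47: |Diffmean / SEDiff| > 1.96  <=>  p < 0.05 (large-sample test)
significant = abs(diff / se_diff) > 1.96

# Slide 48: Diffmean +/- 1.96 * SEDiff gives a 95% confidence interval
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(f"diff = {diff:.1f}, significant: {significant}, "
      f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

With these assumed numbers the interval straddles zero while nearly excluding it, so the confidence interval conveys far more than the bare verdict "not significant".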
49. P-values are usually misconstrued
They do not
- describe clinical relevance, because they depend on sample size
- show that a difference "does not exist", because statistical insignificance indicates absence of evidence, not evidence of absence
- present the uncertainty in the magnitude of an effect or difference, because they relate only to the null effect (the null hypothesis)
50. Results
There was no difference in BMI (p = 0.09), see Table 1.
Table 1 BMI (mean ±SD)
Group 1. 29.2 ±6.9
Group 2. 33.8 ±7.1
51. Confidence intervals are better than p-values
In contrast to p-values, they
- facilitate assessment of clinical significance
- show when a difference "does not exist", because they present lower and upper limits of potential clinical effects/differences
52. Results
There was a difference in BMI of 4.1 (−0.3, 9.0) kg/m², see Table 1.
Table 1 BMI (mean ±SD)
Group 1. 29.2 ±6.9
Group 2. 33.8 ±7.1
53. P-value and confidence interval
Information in p-values [2 possibilities]:
- p < 0.05: statistically significant effect
- n.s.: inconclusive
Information in confidence intervals [2 possibilities]:
[diagram: confidence intervals on an effect scale, positioned relative to 0]
54. P-value and confidence interval
P-value → conclusion from the confidence interval:
- p < 0.05: statistically and clinically significant effect
- p < 0.05: statistically, but not necessarily clinically, significant effect
- n.s.: inconclusive
- n.s.: neither statistically nor clinically significant effect
- p < 0.05: statistically significant reversed effect
[diagram: confidence intervals on an effect scale, relative to 0 and to clinically significant effects]
55. Superiority vs. non-inferiority
[diagram: confidence intervals on a scale from "control better" to "new agent better", with 0 and a margin of non-inferiority or equivalence]
- Superiority shown
- Superiority shown less strongly
- Non-inferiority not shown; superiority not shown
- Non-inferiority shown; superiority not shown
- Equivalence shown; superiority not shown
56. Science as "significant observations"
Data → P < 0.05 [There is a difference]
Data → NS [There is no difference]
57. Science as "significant observations"
Data → P < 0.05 [There is a difference]?
A p-value can be meaningfully interpreted only when the hypothesis is defined a priori and when multiplicity issues are considered.
Data → NS [There is no difference]?
No: statistical insignificance indicates absence of evidence, not evidence of absence.
58. Science as "significant observations"
What should not be asked: Is there a statistically significant difference in the studied group of patients?
What should be asked: Is there an indication of a clinically significant difference among patients in general?
61. Evidence based medicine
1. Strong evidence from at least one systematic review of multiple
well-designed randomized controlled trials.
2. Strong evidence from at least one properly designed randomized
controlled trial of appropriate size.
3. Evidence from well-designed trials such as pseudo-randomized or non-randomized trials, cohort studies, time series or matched case-control studies.
4. Evidence from well-designed non-experimental studies from more
than one center or research group or from case reports.
5. Opinions of respected authorities, based on clinical evidence,
descriptive studies or reports of expert committees.
62. Any claim coming from an observational
study is most likely to be wrong
12 randomised trials have tested 52 observational claims (about the effects of vitamins B6, B12, C, D and E, beta carotene, hormone replacement therapy, folic acid and selenium).
“They all confirmed no claims in the direction of the observational
claim. We repeat that figure: 0 out of 52. To put it in another way,
100% of the observational claims failed to replicate. In fact, five
claims (9.6%) are statistically significant in the opposite direction
to the observational claim.”
Stanley Young and Allan Karr, Significance, September 2011
63. Even good observational research...
A series of observational studies published in the Lancet and the NEJM during the 1980s generated and tested the hypothesis that AIDS was caused by a side effect of a drug (amyl nitrite).
The authors of these publications also claimed to have identified the biological mechanism and urged preventive measures.
Then the virus was detected.
Vandenbroucke JP and Pardoel VP. An autopsy of epidemiologic
methods: the case of “poppers” in the early epidemic of the
acquired immunodeficiency syndrome (AIDS). Am J Epidemiol
1989;129:455-457.
64. What is the most important methodological
difference between observational and
experimental studies?
65. Experimental vs. observational studies
Experiments
Bias is eliminated by design (“Block what you can, randomize
what you cannot”)
Statistical analysis: Focus on precision
Observation
Blocking and randomization are not possible. Bias must be taken into consideration in the statistical analysis.
Statistical analysis: Focus on validity
70. Tests for baseline imbalance
Baseline imbalance after randomization is often tested. This
is not meaningful.
The purpose of randomization is to avoid systematic
imbalance (bias), not random errors (reduced precision).
The method to avoid random baseline imbalance is stratified randomization.
71. Multiplicity
In contrast to many other forms of precision, statistical
precision depends on the number of measurements
performed (the number of hypotheses tested).
The probability of a false positive finding increases with the number of tests performed.
72. Multiplicity
The risk of getting at least one false positive finding can be calculated as 1 − (1 − α)^k,
where k is the number of comparisons performed and α the significance level (usually 0.05).
Number of tests   Risk of at least one false positive
1                 0.05
2                 0.10
10                0.40
20                0.64
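The formula is easy to check directly (α = 0.05 assumed, as on the slide):

```python
# Family-wise risk of at least one false positive among k independent
# tests, each at significance level alpha: 1 - (1 - alpha)^k
def familywise_risk(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 2, 10, 20):
    print(f"{k:>2} tests: {familywise_risk(k):.2f}")
```

Note that 1 − 0.95²⁰ ≈ 0.64: with twenty tests at the 5% level, a false positive somewhere is more likely than not.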
74. Multiplicity
Adjustments of p-values can be made, but these reduce the type 1 error rate at the expense of the type 2 error rate, which means that a larger number of patients will be needed, which in turn means higher cost.
Recommendation: Avoid multiplicity adjustments.
Laboratory experimenters often use Bonferroni correction
to address multiplicity issues within endpoints, but hardly
ever to correct for the multiplicity of endpoints. The work is
therefore hypothesis generating rather than confirmatory.
75. Statistical analyses
Type of test → Result
- Confirmatory: empirical support for a claim of superiority, equivalence or non-inferiority.
- Hypothesis generating: a new hypothesis, which needs to be tested in a new hypothesis test.
76. How can I avoid multiplicity adjustments?
Most trials include more than one outcome.
Define a structure or hierarchy of endpoints: primary, secondary and safety. Define primary endpoint(s) as confirmatory and secondary as hypothesis generating.
No adjustment is necessary when statistical significance is required for all of the multiple endpoints, or for supporting or exploratory hypothesis tests.
77. Endpoints
- Primary: the variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.
- Secondary: effects related to secondary objectives, measurements supporting primary endpoint(s), or hypothesis-generating tests.
78. Validity issues in randomized trials
External validity
Inclusion/exclusion criteria affect the representativity of the results (efficacy vs. effectiveness).
Internal validity
Some subjects withdraw from follow-up. The withdrawal may depend on treatment and on the patient's characteristics. This can bias both efficacy and effectiveness.
79. Study populations
- Intention-to-treat (ITT) principle: analyze all randomized subjects according to randomized treatment.
- Full analysis set (FAS): the set of subjects that is as close as possible to the ideal implied by the ITT principle.
- Per protocol (PP) set: the set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.
80. FAS vs. PP set
FAS: + no selection bias; − misclassification problem (effect dilution)
PP set: + no contamination problem; − possible selection bias (confounding)
When the FAS and the PP set lead to essentially the same conclusions, confidence in the trial is supported.
82. Clinical trials
International regulatory guidelines:
- ICH Topic E9 – Statistical Principles for Clinical Trials
- EMEA Points to Consider documents on baseline covariates, missing data, multiplicity issues, etc.
- similar documents from the FDA
These guidelines can all be found on the internet.
83. Observational studies
Main types
- Cross-sectional studies
- Cohort studies (prospective or historic)
- Case-control studies (always retrospective)
85. Observational studies
Validity
- Selection bias: systematic differences between comparison groups caused by non-random allocation of subjects
- Information bias: misclassification, measurement errors, etc.
- Confounding bias: inadequate analysis, flawed interpretation of results
89. Testing for confounding
Screening for statistically significant effects, or stepwise
regression, is often used to select covariates for inclusion in
a regression model.
However, confounding is a property of the sample, not of the
population. Hypothesis tests have no relevance.
The selection of covariates to adjust for must be based on
clinical knowledge and considerations of cause and effect.
90. All study designs are (more or less) problematic
Observational studies
- Post hoc hypothesis tests, multiple testing
- Multiple modeling, protopathic bias, confounding
- Recycling of data
Experimental studies (laboratory experiments)
- Multiple testing (Bonferroni correction within endpoints)
- Small sample problems (often n=3)
- Pseudoreplication and pooling of samples
Experimental studies (randomized clinical trials)
- External validity
- No long term effects
- No infrequent events
92. Independent observations and replicates
Two rats are sampled from a population with a mean (μ) of 50 and a standard deviation (σ) of 10, and ten measurements of an arbitrary outcome variable are made on each rat.
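A sketch of why this matters (the within-rat measurement noise of σ = 2 is an assumption; the slide does not specify it): treating the 20 measurements as 20 independent observations inflates the apparent sample size, because the independent unit is the rat, not the measurement.

```python
import math
import random

random.seed(3)

# Two rats drawn from a population with mu = 50, sigma = 10 (between-rat),
# plus ten repeated measurements per rat (within-rat noise, sigma = 2,
# an assumption - the slide does not give the measurement error).
rats = [random.gauss(50, 10) for _ in range(2)]
measurements = [[random.gauss(true, 2) for _ in range(10)] for true in rats]

def sem(values):
    """Standard error of the mean of a list of values."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd / math.sqrt(n)

# Wrong: pool all 20 measurements as if they were 20 independent rats.
pooled = [m for rat in measurements for m in rat]
n_pooled = len(pooled)

# Right: the independent unit is the rat, so n = 2 rat means.
rat_means = [sum(ms) / len(ms) for ms in measurements]
n_rats = len(rat_means)

print(f"pseudoreplicated SEM (n = {n_pooled}): {sem(pooled):.2f}")
print(f"correct SEM (n = {n_rats} rats):      {sem(rat_means):.2f}")
```

The pooled analysis pretends to have 20 independent observations when only 2 exist, which is the pseudoreplication problem listed under laboratory experiments on slide 90.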
94. A scientific report
The idea is to try and give all the information to help others to
judge the value of your contributions, not just the information
that leads to judgment in one particular direction or another.
Richard P. Feynman
95. It is impossible to do clinical research
so badly that it cannot be published
“There seems to be no study too fragmented, no hypothesis too
trivial, no literature citation too biased or too egotistical, no design
too warped, no methodology too bungled, no presentation of
results too inaccurate, no argument too circular, no conclusions
too trifling or too unjustified, and no grammar and syntax too
offensive for a paper to end up in print.”
Drummond Rennie 1986 (editor of NEJM and JAMA)
96. Changes in publication practice
1665 – first scientific journals
1858 – the IMRAD structure
1957 – the abstract
1978 – Vancouver convention (ICMJE)
1987 – the structured abstract
Randomized clinical trials
1997 – Reporting guidelines (CONSORT)
1998 – Analysis guidelines (ICH)
2005 – Trial registration (Clinicaltrials.gov)
Observational studies
2007 – Reporting guidelines (STROBE)
2011 – Analysis guidelines (NARA, ICRS, etc.)
98. Clinical Trial Registration
In this editorial, published simultaneously in all member journals, the
International Committee of Medical Journal Editors (ICMJE)
proposes comprehensive trials registration as a solution to the
problem of selective awareness and announces that all 11 ICMJE
member journals will adopt a trials-registration policy to promote this
goal.
The ICMJE member journals will require, as a condition of
consideration for publication, registration in a public trials registry.
Trials must register at or before the onset of patient enrollment. This
policy applies to any clinical trial starting enrollment after July 1,
2005. For trials that began enrollment prior to this date, the ICMJE
member journals will require registration by September 13, 2005,
before considering the trial for publication. We speak only for
ourselves, but we encourage editors of other biomedical journals to
adopt similar policies.