This document provides guidance on key principles for conducting rigorous statistical analysis and research. It discusses the importance of clearly articulating the story being told with the data through use of graphs and tables. Variables of interest, outcomes, and potential confounding factors should be identified. The generalizability and interestingness of results are important to consider. Prospective studies are preferable to retrospective studies, which require consideration of multiple factors to establish credibility. Multiple statistical tests on a single data set require adjustments to avoid inflated false positive rates. Data collection and coding should be done consistently to allow for proper analysis. Overall, the document emphasizes the need for thoughtful statistical methodology to ensure useful and meaningful results.
2. Magnitude
What’s the smallest result anyone will
care about?
Reduce the length of stay by one day?
Decrease mortality from 1% to 0.9%?
Are we trying to prove that there is
a meaningful difference, or that any
difference is too small to care
about?
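To see why the smallest meaningful result matters, here is a rough sketch of how many patients each group would need to detect the mortality change mentioned above. It assumes a two-sided test at 5%, 80% power, equal group sizes, and the usual normal approximation for two proportions. None of these settings come from the slides, and the function name `n_per_group` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per group needed to tell p1 from p2.

    Normal-approximation formula for two independent proportions,
    two-sided test, equal group sizes (all assumptions, see lead-in).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Mortality falling from 1% to 0.9%, as in the example above.
small_effect = n_per_group(0.010, 0.009)
# A larger, hypothetical reduction from 1% to 0.5%, for comparison only.
large_effect = n_per_group(0.010, 0.005)
print(small_effect, large_effect)
```

Under these assumptions the 0.1-point reduction needs well over 100,000 patients per group, while the hypothetical 0.5-point reduction needs only a few thousand. The smaller the result you care about, the larger the study has to be.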
3. Articulation – What’s the Story?
Variable(s) of primary interest
Outcomes
  Continuous: length of stay, pain scores
  Events: infection, DVTs
Confounding variables
  May be demographics or comorbidities
  Known or reasonably expected to affect outcome
Not all outcomes can be neatly
measured as discrete events or
physical units (pain, disability…)
Not every measurable variable is a
confounder. Only control or match for
the ones you are sure of.
5. Articulation – a clear story
Tell as much of your story as you can
using graphs and tables. Clinicians are
a visual audience
Can you explain how variables may
interact to produce the observed results?
Can you explain to a clinician (insurer,
administrator, patient…) what the result
means?
6. Articulation – telling the right story
[Figure: four scatterplots, each with the same fitted line. Panels:
straight line with error; nonlinear, no error; no error, but outlier;
no result except for outlier?]
All of these have the same regression line and R².
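The four panels described here match the classic Anscombe's quartet (Anscombe, 1973). Assuming that is the data set behind the figure, the sketch below fits a least-squares line to each set with the standard library only. The panel labels are taken from the slide, and the helper name `fit_line` is an illustration, not part of any library.

```python
# Anscombe's quartet: four x/y data sets with nearly identical summary
# statistics but very different shapes (assumed to be the slide's data).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "straight line with error": (
        X_SHARED,
        [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    ),
    "nonlinear, no error": (
        X_SHARED,
        [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    ),
    "no error, but outlier": (
        X_SHARED,
        [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    ),
    "no result except for outlier": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def fit_line(x, y):
    """Least-squares slope, intercept, and R^2 for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


FITS = {name: fit_line(x, y) for name, (x, y) in QUARTET.items()}
for name, (slope, intercept, r2) in FITS.items():
    print(f"{name:30s} slope={slope:.2f} intercept={intercept:.2f} R2={r2:.2f}")
```

All four fits come out at a slope of roughly 0.5, an intercept of roughly 3.0, and an R² of roughly 0.67, so the summary numbers alone cannot tell you which story the data are telling. Plot the data first.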
7. Generalizable
Who will be able to benefit from the
results of your study?
All surgeons and patients?
A subset such as:
Urban or rural locations?
Older or younger patients?
An infrequent result (5-10% of cases?)
Something so rare a surgeon may
never see it?
8. Generality
ALL RETROSPECTIVE STUDIES
ARE EXPLORATORY!
Without comparing to another data set, you can’t
confirm
GROUPS DEFINED BY THE
OUTCOMES SHOULD BE
SUSPECT!
Your data set should not drive the analysis
9. Interesting
"Not everything that counts can be
counted, and not everything that can
be counted, counts." Einstein on
endpoints.
Is this new information?
Is this useful? (see also:
Generalizable)
Is this something you yourself
would want to read about on
your own?
10. Credibility - Data ain’t fish!
You can make tasty
imitation crabmeat,
shrimp, etc. by
mixing together
cheaper fish and
seasoning.
You can NOT pull
the same trick with
data.
Collect it right the
https://en.wikipedia.org/wiki/Crab_stick
11. Rosenwasser’s Special Case
“Meta-Analysis is to Analysis
what Metaphysics is to
Physics.”
Robert H. Rosenwasser, MD, FACS,
FAHA
A special case of “data ain’t fish”
Good studies + bad studies do not equal good on
average
12. Credibility – Prospective Studies
A 22-item
checklist for good
reporting of a
randomized
controlled trial is
available at
www.consort-statement.org
Why Randomize?
If you don’t know what other
factors affect the result, you can
at least be confident they’re the
same in all groups.
13. Credibility – Retrospective
Studies
Bradford Hill’s nine criteria for causality
Strength of Association
Consistency – the association is seen repeatedly across studies and settings
Specificity (more causes, less specific)
Temporal relationship – cause before effect
Dose response – more exposure, greater odds
Plausibility – existing theory linking cause + effect
Coherence – does not contradict existing knowledge
Experimental evidence (such as animal studies)
Analogy – parallels other known cause-effect association
Presence doesn’t prove, absence doesn’t disprove,
but each one helps.
14. Credibility: Math problem
If the Type I error is limited to 5% then we expect
one false positive out of 20 different tests where the
null hypothesis is true.
These could be:
20 different studies from the same person
20 different sites attempting the same study
One study containing 20 different tests
This last case is the only one under our control
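The arithmetic behind "one false positive out of 20" can be sketched as follows. It assumes the tests are independent and that the null hypothesis is true in every one of them. The function names are illustrative only.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of false positives across n true-null tests."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    print(
        n_tests,
        round(expected_false_positives(0.05, n_tests), 2),
        round(familywise_error_rate(0.05, n_tests), 3),
    )
```

With 20 independent tests at 5%, the expected count is one false positive, and the chance of seeing at least one is about 64%. Dependence between tests would change the second figure but not the first.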
15. Correcting for multiple tests
In both one-tailed and two-tailed
tests, the total Type I error
probability (area in red) sums up
to α.
In two-tailed tests, the error is
split into α/2 for each of the two
tails.
Bonferroni and other corrections
for multiple tests also divide up the
Type I error between tests.
Bonferroni divides up α among N
tests as α/N.
This correction protects against inflated type I error
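A minimal sketch of the Bonferroni rule described above: each of the N tests is judged against α/N instead of α. The p-values are hypothetical, invented only to show the mechanics, and `bonferroni` is an illustrative helper rather than a library function.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold alpha/N and which tests pass it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from five tests on one data set (illustration only).
P_VALUES = [0.001, 0.012, 0.030, 0.049, 0.200]
threshold, rejected = bonferroni(P_VALUES)
print(threshold, rejected)
```

Four of these five hypothetical results would count as significant at an uncorrected 5%, but only the first survives the corrected threshold of 1%. Bonferroni is conservative, so other corrections may keep more power at the same overall error rate.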
16. Intention to Treat
In randomized studies, analysis must always be based
on the group patients were assigned to, even if they
cross over.
This prevents bias. For example, patients assigned to
a non-operative group may still be given surgery, but
operative patients can’t cross over to non-operative.
Patients having more trouble with one treatment may
be more likely to cross over or drop out
The intention to treat analysis doesn’t ask whether the
treatment is effective; it asks whether the policy of
assigning a patient to the treatment is effective.
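The difference between analyzing by assigned group and by treatment received can be shown with a toy example. The eight records below are entirely hypothetical, invented to show the bookkeeping. The field names and the helper `improvement_rate_by` are likewise assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: two non-operative patients cross over to surgery.
PATIENTS = [
    {"assigned": "non-operative", "received": "non-operative", "improved": 1},
    {"assigned": "non-operative", "received": "non-operative", "improved": 1},
    {"assigned": "non-operative", "received": "surgery", "improved": 0},  # crossed over
    {"assigned": "non-operative", "received": "surgery", "improved": 1},  # crossed over
    {"assigned": "surgery", "received": "surgery", "improved": 1},
    {"assigned": "surgery", "received": "surgery", "improved": 1},
    {"assigned": "surgery", "received": "surgery", "improved": 0},
    {"assigned": "surgery", "received": "surgery", "improved": 1},
]


def improvement_rate_by(records, key):
    """Share of patients who improved, grouped by the given field."""
    totals = defaultdict(lambda: [0, 0])  # group -> [improved count, patient count]
    for record in records:
        totals[record[key]][0] += record["improved"]
        totals[record[key]][1] += 1
    return {group: improved / count for group, (improved, count) in totals.items()}


itt = improvement_rate_by(PATIENTS, "assigned")         # intention to treat
as_treated = improvement_rate_by(PATIENTS, "received")  # by treatment received
print(itt)
print(as_treated)
```

In this made-up data the intention-to-treat rates are equal (75% in both arms), while the as-treated comparison makes non-operative care look better (100% versus about 67%) simply because the patients who struggled moved to surgery.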
17. Six Ways to p-Hack
(list from Leif D. Nelson, Berkeley Initiative for Transparency in the Social Sciences)
Stop collecting data once p<.05
Analyze many measures, but report only those with
p<.05.
Collect and analyze many conditions, but only report
those with p<.05.
Use covariates to get p<.05.
Exclude participants to get p<.05.
Goodhart’s Law: When a
measure becomes a
target, it ceases to be a
good measure
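The first item on the list, stopping once p<.05, can be illustrated with a small simulation. This is a sketch under stated assumptions: every study has no true effect, observations are standard normal with a known SD of 1 (so a z-test applies), and the researcher peeks after every 10 observations up to 50. The look schedule, seed, and function name are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value


def simulate(n_studies=2000, looks=(10, 20, 30, 40, 50), seed=1):
    """Return (fixed-sample, peeking) false positive rates when the null is true."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_studies):
        data = [rng.gauss(0.0, 1.0) for _ in range(looks[-1])]
        # |z| at each interim look: sum / sqrt(n), because the SD is known to be 1
        z_at_look = [abs(sum(data[:n])) / sqrt(n) for n in looks]
        # Honest analysis: test once, at the planned sample size.
        fixed_hits += int(z_at_look[-1] > Z_CRIT)
        # p-hacked analysis: declare success at the first look with p < .05.
        peeking_hits += int(any(z > Z_CRIT for z in z_at_look))
    return fixed_hits / n_studies, peeking_hits / n_studies


fixed_rate, peeking_rate = simulate()
print(fixed_rate, peeking_rate)
```

The fixed-sample rate should land near the nominal 5%, while the peeking rate comes out noticeably higher, even though no study contains a real effect.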
18. Collect Data Consistently
Revision required before analysis is practical:

Sex    Age (years)   Implant          Smoker        Disability (%)
M      45            Acme Brass       Y             75
f      30            Presto Ceramic   2 packs/day   45%
Y      N/A           Zenith Ceramic   No            0.3
male   56            Delta Brass      NO            50
F      ?             Metal            Sometimes     half

The same data, clearly coded with minimal chance of error:

Male   Age (years)   Implant   Ever Smoked?   Disability (%)
1      45            Brass     1              75
0      30            Ceramic   1              45
0      .             Ceramic   0              30
1      56            Brass     0              50
0      .             Brass     1              50
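As a rough sketch of the revision work that inconsistent entry forces, the helpers below recode three of the columns (sex, age, disability) from the inconsistently entered rows. The rules are assumptions, not the author's procedure. In particular, a bare disability value of 1 or less is treated as a fraction, and an unrecognizable sex entry such as "Y" is flagged as missing rather than guessed, even though the coded table records a value for that row.

```python
def code_sex(raw):
    """Return 1 for male, 0 for female, None if the entry cannot be trusted."""
    value = raw.strip().lower()
    if value in {"m", "male"}:
        return 1
    if value in {"f", "female"}:
        return 0
    return None  # e.g. "Y" is not a recognizable sex code


def code_age(raw):
    """Return age in whole years, or None for markers such as 'N/A' or '?'."""
    value = raw.strip()
    return int(value) if value.isdigit() else None


def code_disability(raw):
    """Return disability as a percentage (0-100), or None if unreadable."""
    value = raw.strip().lower()
    if value == "half":
        return 50.0
    try:
        number = float(value.rstrip("%"))
    except ValueError:
        return None
    # Assumption: a bare value of 1 or less was recorded as a fraction.
    if not value.endswith("%") and 0 < number <= 1:
        number = round(number * 100, 6)
    return number


# (sex, age, disability) as entered in the inconsistent table.
MESSY_ROWS = [
    ("M", "45", "75"),
    ("f", "30", "45%"),
    ("Y", "N/A", "0.3"),
    ("male", "56", "50"),
    ("F", "?", "half"),
]
CODED_ROWS = [(code_sex(s), code_age(a), code_disability(d)) for s, a, d in MESSY_ROWS]
for row in CODED_ROWS:
    print(row)
```

Every rule here is a guess about what the person entering the data meant, which is exactly the risk that consistent coding at collection time avoids.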
19. Useful Cynicism from
Statisticians
All models are wrong, but some are useful. (George E. P.
Box)
An approximate answer to the right problem is worth a good
deal more than an exact answer to an approximate
problem. (John Tukey)
The combination of some data and an aching desire for an
answer does not ensure that a reasonable answer can be
extracted from a given body of data. (also John Tukey)
To call in the statistician after the experiment is done may
be no more than asking him to perform a post-mortem
20. Also remember:
People who interview you – whether
hiring committees or patients – are
going to remember whether you spoke
with depth, insight and enthusiasm.
The difference between good medicine
and no medicine is generally smaller
than the difference between good
medicine and bad medicine. Caution
and skepticism help prevent getting bad
medicine out there.