There Is (No) Evidence For That: Epistemic Problems in Critical Care Medicine
SCOTT K. ABEREGG, MD, MPH
SALT LAKE CITY, UTAH
WWW.MEDICALEVIDENCEBLOG.COM
WWW.STATUSIATROGENICUS.BLOGSPOT.COM
@MEDEVIDENCEBLOG
Epistemology
• A branch of philosophy concerned with knowledge and its:
   - Essence and origins
   - Methods
   - Validity
   - Scope and limitations
• “There is (or is not) evidence for” is a glib statement and should be taken with a grain of salt
Intelligence versus Rationality
“Book Smarts”
• The algorithmic mind
• Learns and applies rules to solve well-defined problems
• Measured precisely by IQ tests
• Predicts success at calculus
“Common Sense”
• Epistemic rationality: how well your beliefs map onto reality; sound judgment and decision making
• Instrumental rationality: optimization of goal selection and fulfillment
• Not measured (at all) by intelligence tests (Stanovich argues that we need an “RQ” test)
LUDIC FALLACY: The misuse of games to model real-world problems
A fair coin is flipped and lands on heads 100 times in a row. What is the probability that it will land on heads on the 101st flip?
Dr. John the Statistician
• “The coin has no memory of past flips. The current flip is independent of them. Thus, the probability of the 101st flip landing on heads is 50%.”
• Dr. John has book smarts
Tony the Options Trader
• “The probability that in my lifetime I will ever see a fair coin land on heads 100 times in a row is so low that I reject the premise of the problem that you have presented me. The coin is rigged.”
• Tony has common sense
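Tony's answer is implicitly Bayesian. Below is a minimal sketch that makes the reasoning explicit. It assumes, hypothetically, that the only alternative to a fair coin is a two-headed coin, and that the prior probability of having been handed such a coin is one in a million. The helper names are illustrative.

```python
def posterior_rigged(prior_rigged: float, n_heads: int) -> float:
    """Posterior probability that the coin is two-headed after n_heads heads in a row."""
    like_rigged = 1.0            # a two-headed coin always lands heads
    like_fair = 0.5 ** n_heads   # a fair coin produces n straight heads with probability (1/2)^n
    numerator = prior_rigged * like_rigged
    return numerator / (numerator + (1 - prior_rigged) * like_fair)


def prob_next_heads(prior_rigged: float, n_heads: int) -> float:
    """Predictive probability of heads on the next flip, averaged over both hypotheses."""
    post = posterior_rigged(prior_rigged, n_heads)
    return post * 1.0 + (1 - post) * 0.5


# Dr. John accepts the premise: the prior probability of a rigged coin is exactly zero.
print(prob_next_heads(0.0, 100))    # 0.5
# Tony allows a hypothetical one-in-a-million prior that the coin is rigged.
print(posterior_rigged(1e-6, 10))   # after 10 heads the coin is still most likely fair
print(prob_next_heads(1e-6, 100))   # after 100 heads, heads on the next flip is all but certain
```

A prior of zero reproduces Dr. John's 50%. With any non-negligible prior on a rigged coin, however, 100 straight heads overwhelm it, which is Tony's point.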
No Evidence Condition (There Is No Evidence for…)
• Parachutes for gravitational challenge
   - (“You don’t need a parachute to skydive. You need a parachute to skydive twice.”)
• Mechanical ventilation
• Antibiotics for sepsis
• Intravenous fluids (IVF) for dehydration
• Insulin for DKA
• Knee replacement
Knowledge without Formal Evidence
Category 1 Therapies (Parachute Therapies)
• ARR high [NNT low]
• “Visible” & immediate effects
• Causal pathways “obvious”
   - Type I diabetes → DKA
   - Insulin → resolution of DKA
• Trials “unethical” – no equipoise – high prior probability for Ha
• Implicit Bayesian approach
Knowledge without formal evidence?
• Efficacy of parachute A versus parachute B
• Dose of mechanical ventilation
• Duration/spectrum of antibiotics for sepsis
• Dose of IVF for resuscitation
• “Anytensive” insulin therapy in the ICU
• Arthroscopic debridement for osteoarthritis
No Knowledge without Formal Evidence
Category 2 Therapies
• ARR low(er) [NNT high(er)]
• “Invisible” & delayed effects
• Associations prevalent, causal pathways (CPs) obscure
   - ICU hyperglycemia → ???
   - Insulin → correction of hyperglycemia
• Trials imperative – equipoise – low(er) prior probability for Ha
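The bracketed NNT on both category slides follows from NNT = 1/ARR. Here ARR (absolute risk reduction) is the control-group event rate minus the treated-group event rate. The small sketch below shows this. The event rates are hypothetical illustrations, not data from any trial.

```python
def arr_and_nnt(control_risk: float, treated_risk: float) -> tuple[float, float]:
    """Absolute risk reduction (ARR) and number needed to treat (NNT = 1 / ARR)."""
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment must lower the risk for an NNT to be defined")
    return arr, 1 / arr


# Hypothetical event rates chosen only to contrast the two categories.
examples = {
    "Category 1 (parachute-like effect)": (0.90, 0.10),
    "Category 2 (modest effect)": (0.30, 0.28),
}
for label, (control, treated) in examples.items():
    arr, nnt = arr_and_nnt(control, treated)
    print(f"{label}: ARR = {arr:.0%}, NNT = {nnt:.1f}")
```

An NNT near 1 produces effects visible at the bedside. An NNT of 50 does not, which is why Category 2 therapies need trials.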
There Is (Formal) Evidence for…
Category 2 Therapies: The Trial as a Diagnostic Test of a Hypothesis
“There is Evidence for…”
• True positives
• False positives
   - Type I errors: a specificity issue
   - The “alpha bet”: a product of the unreflective mind
   - “Journal club biases”
   - Ioannidis: flexibility
   - Fraud: you can build an entire career (or product line) on it
• Bayesian interpretations
“There is No Evidence for…”
• True negatives
   - Stochastic dominance of the null hypothesis
• Bayesian interpretations
• False negatives
   - Type II errors: a sensitivity issue
   - Inadequate study power: sensitivity
   - Delta inflation
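Treating a trial as a diagnostic test makes its sensitivity the study power (1 - β) and its specificity 1 - α. As with any test, the predictive value of a result then depends on the prior probability of Ha. The sketch below shows that arithmetic. It assumes an unbiased trial in which a “positive” result simply means p < α, so it ignores the flexibility, biases, and fraud listed above. The function name is illustrative.

```python
def predictive_values(prior_ha: float, alpha: float = 0.05, power: float = 0.80):
    """Return (PPV, NPV) of a trial viewed as a diagnostic test of Ha."""
    true_pos = power * prior_ha             # sensitivity = power
    false_pos = alpha * (1 - prior_ha)      # 1 - specificity = alpha
    true_neg = (1 - alpha) * (1 - prior_ha)
    false_neg = (1 - power) * prior_ha      # beta = 1 - power
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


for prior in (0.50, 0.10, 0.05):
    ppv, npv = predictive_values(prior)
    print(f"prior P(Ha) = {prior:.0%}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Under these assumptions, a “positive” trial of a hypothesis with a 5% prior is more likely false than true, even with no bias at all.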
False Positive Evidence
False Positive Evidence: COS
False Positive Evidence
False Positives: The “alpha bet”
• Threshold for statistical significance (α = 0.05) based on “convention”
• Convention established by Fisher in 1925 in Statistical Methods for Research Workers
• Fisher was suggesting α = 0.05 for the science of the 1920s:
   - Study population: frog legs bathed in Ringer’s solution
   - Study outcome: action potentials
   - Investigators on Earth studying the same topic: a handful
   - Study cost: peanuts
• Fast-forward to the 21st century:
   - Study population: tens of thousands of patients with coronary disease
   - Study outcome: death or non-fatal MI
   - Investigators on Earth studying the problem: many handfuls
   - Study cost: $250 million
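A fixed α fits a crowded field poorly. If many groups test hypotheses that are in fact null, the chance that at least one of them reaches p < 0.05 grows quickly. The sketch below assumes independent tests of true null hypotheses. The counts of 5 and 50 tests are hypothetical stand-ins for “a handful” and “many handfuls”.

```python
def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """P(at least one p < alpha) among n independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of 'significant' results when every null hypothesis is true."""
    return alpha * n_tests


for n in (5, 50):
    print(f"{n} tests: P(at least one false positive) = {prob_any_false_positive(0.05, n):.0%}, "
          f"expected false positives = {expected_false_positives(0.05, n):.2f}")
```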
Statistical Methods for Research Workers
“The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this
point as a limit in judging whether a deviation ought to be considered significant or
not…..Using this criterion we should be led to follow up a false indication only once in
22 trials, even if the statistics were the only guide available.”
“If one in twenty does not seem high enough odds, we may, if we prefer it, draw the
line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point).
Personally, the writer prefers to set a low standard of significance at the 5 per cent point,
and ignore entirely all results which fail to reach this level. A scientific fact should be
regarded as experimentally established only if a properly designed experiment rarely
fails to give this level of significance.”
- Sir R.A. Fisher
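Fisher's arithmetic can be checked directly. The two-sided tail area beyond 1.96 standard deviations is 0.05. Beyond exactly 2 standard deviations it is about 0.0455, roughly 1 in 22, which is where “once in 22 trials” comes from. Below is a short check using only the standard library. The helper name is illustrative.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| >= z) for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.96, 2.0):
    p = two_sided_p(z)
    print(f"z = {z}: p = {p:.4f}, about 1 in {1 / p:.0f}")
# z = 1.96: p = 0.0500, about 1 in 20
# z = 2.0: p = 0.0455, about 1 in 22
```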
What would Fisher say?
Therapy                            Study              Year   p-value
Intensive insulin                  Van den Berghe     2001   <0.04
                                   Brunkhorst         2008   0.74
                                   Van den Berghe     2006   0.33
                                   Preiser            2009   0.41
                                   NICE-SUGAR         2009   0.02 (in the wrong direction)
Drotrecogin alfa                   Bernard/PROWESS    2001   0.005
                                   Abraham            2005   0.34
                                   Ranieri            2012   0.31
Early goal-directed therapy        Rivers             2001   0.009
                                   Muzzin/PROCESS     2014   0.83
                                   ARISE              2015   0.9
                                   Mouncey/ProMISe    2015   0.9
Hypothermia after cardiac arrest   HACA               2002   CI 1.08-1.81
                                   Bernard            2002   0.046
                                   Nielsen            2013   0.51
Stochastic Dominance of the Null Hypothesis: The ARDSnet Population of Studies
• KARMA, n=234, standard α, β, δ
   - Stopped for futility at n=234; observed δ = 1.0%, p=0.85
• ARMA, n=861, standard α, β, δ
   - Stopped for efficacy at n=861; δ = 8.8%, p=0.007
• LaSRS, n=180, standard α, β; δ 15%, revised mid-study to 20% because of low enrollment
   - Observed δ 0.6%, p=1.0
• FACTT, n=1000, standard α, β, δ
   - Observed δ 2.9%, p=0.30
• ALVEOLI, n=549, standard α, β, δ
   - Stopped early; observed δ 2.6%, p=0.48
• ARDSnet II
• ALTA
• EDEN
• OMEGA
• SAILS
This “population” of hypotheses is dominated by Ho; Ho was not rejected 90% of the time
Prior probability of Ho = 1 - prior probability of Ha
Held, BMC Medical Research Methodology 2010, 10:21
http://www.biomedcentral.com/1471-2288/10/21
Minimum posterior probability of Ho; 1 - this = maximum posterior probability of Ha
Implications of Stochastic Dominance of Ho: A Nomogram for p-values
Prior probability of Ha   p-value   Maximum posterior probability of Ha
50%                       0.05      75%
                          0.01      90%
                          0.001     98.5%
10% (ARDSnet)             0.05      <50%
                          0.01      50%
                          0.001     85%
5%                        0.05      <50%
                          0.01      <50%
                          0.001     75%
“Extraordinary claims require extraordinary evidence.”
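The figures in the table can be approximated with the minimum Bayes factor bound -e·p·ln(p) of Sellke, Bayarri and Berger, which is assumed here to be the calibration underlying Held's nomogram. The bound applies only for p < 1/e. Multiplying the prior odds of Ho by this bound gives the smallest posterior odds of Ho that a given p-value can support, and hence the largest posterior probability of Ha. The function names are illustrative.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound -e * p * ln(p) on the Bayes factor for Ho versus Ha (valid for p < 1/e)."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the bound is defined only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def max_posterior_ha(prior_ha: float, p: float) -> float:
    """Largest posterior probability of Ha that a given p-value can support."""
    prior_odds_ho = (1 - prior_ha) / prior_ha
    min_posterior_odds_ho = prior_odds_ho * min_bayes_factor(p)
    return 1 / (1 + min_posterior_odds_ho)


for prior in (0.50, 0.10, 0.05):
    for p in (0.05, 0.01, 0.001):
        best = max_posterior_ha(prior, p)
        print(f"prior P(Ha) = {prior:.0%}, p = {p}: max posterior P(Ha) = {best:.0%}")
```

The computed values land close to those read off the nomogram. For example, the sketch gives about 71% rather than 75% at a 50% prior and p = 0.05, and about 85% at a 10% prior and p = 0.001.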
High Prior Probability of Ho + Lax Alpha Standard = Many False-Positive Studies
Therapy                            Study              Year   p-value
Intensive insulin                  Van den Berghe     2001   <0.04
                                   Brunkhorst         2008   0.74
                                   Van den Berghe     2006   0.33
                                   Preiser            2009   0.41
                                   NICE-SUGAR         2009   0.02 (in the wrong direction)
Drotrecogin alfa                   Bernard/PROWESS    2001   0.005
                                   Abraham            2005   0.34
                                   Ranieri            2012   0.31
Early goal-directed therapy        Rivers             2001   0.009
                                   Muzzin/PROCESS     2014   0.83
                                   ARISE              2015   0.9
                                   Mouncey/ProMISe    2015   0.9
Hypothermia after cardiac arrest   HACA               2002   CI 1.08-1.81
                                   Bernard            2002   0.046
                                   Nielsen            2013   0.51
False Negatives: Inadequate Study Power
Cookbook Design of an RCT (Recipes for Algorithmic Minds)
What Ought to Be
• Type I error rate selected based on the risk of a false positive
• Type II error rate selected based on the risk of a false negative
• Baseline event rate estimated from preliminary or historical data
• Treatment effect size (delta) estimated from preliminary or historical data, or from precedent
What Is
• α of 0.05 used by convention
• Power set at 80 ± 10% by convention
• Baseline event rate estimated
• Estimate how many patients it is feasible to enroll, then reverse-calculate delta from this number
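The two workflows differ only in which quantity is solved for. Below is a minimal sketch for a binary outcome such as mortality. It uses the common unpooled normal-approximation formula n = (z(1 - α/2) + z(power))² × [p_c(1 - p_c) + p_t(1 - p_t)] / δ² per group, where p_t = p_c - δ. The 30% baseline event rate and the feasible enrollment of 300 per group are hypothetical, and the function names are illustrative.

```python
from statistics import NormalDist


def n_per_group(p_control: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Patients per group needed to detect an absolute risk reduction of delta."""
    z = NormalDist().inv_cdf
    p_treat = p_control - delta  # treatment assumed to lower the event rate
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return (z(1 - alpha / 2) + z(power)) ** 2 * variance / delta ** 2


def detectable_delta(p_control: float, n: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """'Reverse calculation': smallest delta that n patients per group can detect (bisection)."""
    lo, hi = 1e-6, p_control
    for _ in range(100):
        mid = (lo + hi) / 2
        if n_per_group(p_control, mid, alpha, power) > n:
            lo = mid  # too few patients for this delta, so the detectable delta is larger
        else:
            hi = mid
    return hi


# "What ought to be": delta chosen from prior data; the sample size follows.
print(round(n_per_group(0.30, 0.10)))
# "What is": enrollment fixed by feasibility; delta follows.
print(round(detectable_delta(0.30, 300), 3))
```

Once n is fixed first, delta is no longer an estimate of the treatment effect but a by-product of what enrollment is feasible.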
Delta Inflation is to Research as Grade Inflation is to Academics
[Figure: predicted delta (%) by trial; labeled point: Esteban18]
Delta Inflation
• Predicted delta systematically overestimates observed delta
• Mean predicted delta: 10.1%
• Mean observed delta: 1.4%
• Delta gap: 8.7%
• Only 5/38 trials provided a justification for delta
• A “cluster” of predicted deltas around 10%
• Observed delta rarely (n=2) exceeds predicted delta
   - Bernard and Rivers: not replicable
• In 26/38 trials, the 95% confidence interval for observed delta did not even include predicted delta
• The wide range of predicted deltas argues against the MCID (minimal clinically important difference) guiding its choice
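The 26/38 statistic amounts to asking whether the predicted delta falls inside the 95% confidence interval of the observed risk difference. The sketch below uses a Wald interval for a difference in proportions. The trial (150/500 versus 143/500 deaths, with a predicted delta of 10%) is hypothetical, chosen only to echo the mean observed and predicted deltas above. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_c: int, n_c: int, events_t: int, n_t: int, level: float = 0.95):
    """Wald confidence interval for the control-minus-treatment difference in event rates."""
    p_c, p_t = events_c / n_c, events_t / n_t
    diff = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical trial: 30% vs 28.6% mortality, 500 patients per arm, predicted delta 10%.
diff, lo, hi = risk_difference_ci(150, 500, 143, 500)
predicted_delta = 0.10
print(f"observed delta {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
print("CI includes predicted delta:", lo <= predicted_delta <= hi)
```

Required sample size scales roughly with 1/δ². A trial sized to detect 10.1% would therefore need on the order of (10.1/1.4)² ≈ 52 times as many patients to detect 1.4% with the same power.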
Conclusions
• Knowledge is based upon evidence
• Evidence has many forms and is not limited to “formal” evidence from RCTs
• RCTs can have false positives and false negatives
• Algorithmic or “cookbook” interpretation of RCTs can lead to erroneous conclusions
• Algorithmic or “cookbook” design of RCTs can lead to erroneous conclusions
believe nothing,
no matter where you read it
or who has said it,
not even if I have said it,
unless it agrees with your own reason
and your own common sense
- Buddha

Epistemic problems 12_18_15
