Pavlos Msaouel MD, PhD
Assistant Professor
Genitourinary Medical Oncology
Translational Molecular Pathology
Confounding and Causal Inference
Disclosures
• Advisory Boards / Honoraria:
o Mirati Therapeutics
o Bristol-Myers Squibb
o Exelixis
• Non-branded educational programs:
o Exelixis
o Pfizer
• Clinical Trials with Grant Support:
o Mirati Therapeutics
o Bristol-Myers Squibb
o Takeda Pharmaceutical Company
• All my clinical trials use Bayesian designs.
• I refuse to use 3+3.
Statistics cannot directly encode causal knowledge
• 4 analytical inputs:
▪ Question
▪ Knowledge → causal knowledge is a necessary ingredient to construct the golem
▪ Data
▪ Relationship
• Examples:
▪ Mud does not cause rain
▪ Symptoms do not cause the disease
• The bulk of human knowledge is causal. The bulk of medical and translational knowledge
is causal.
Statistics cannot directly encode causal knowledge
• Clinical scenario: a patient with clear cell renal cell carcinoma comes to clinic. We give
her immunotherapy instead of an oral TKI and she has a complete response. How do we
know that choosing immunotherapy over the TKI caused the complete response?
▪ The only way to truly know whether immunotherapy was the cause of the complete
response would be to have access to an alternate universe (“counterfactual universe”)
where everything else was equal but we gave the oral TKI instead of immunotherapy.
▪ Counterfactual universes are the true gold standard for causality.
Statistics cannot directly encode causal knowledge
• No access to counterfactual universes: we can never claim causality (Hume, Russell,
Pearson, etc.)
• Less extreme position: randomization can allow us to infer causality (but needs
assumptions)
▪ The process of randomization lies outside statistics
▪ There are other ways to infer causality (they also need assumptions)
• Even less extreme position: lab experiments can allow us to infer causality (but need
assumptions)
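The counterfactual idea can be made concrete with a small simulation. The sketch below is only an illustration: the response probabilities and variable names are invented, not taken from any trial. Every simulated patient has two potential outcomes, only one of which would ever be observed, and randomization lets the observed group difference approximate the average causal effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical potential outcomes (1 = complete response) for every patient.
# In real life only one of the two is ever observed.
y_if_tki = rng.binomial(1, 0.10, n)   # universe where the oral TKI is given
y_if_io = rng.binomial(1, 0.25, n)    # universe where immunotherapy is given

# True average causal effect, visible only with access to both universes.
true_effect = y_if_io.mean() - y_if_tki.mean()

# Randomization: a coin flip decides which universe we get to observe.
gets_io = rng.binomial(1, 0.5, n).astype(bool)
observed = np.where(gets_io, y_if_io, y_if_tki)
estimated_effect = observed[gets_io].mean() - observed[~gets_io].mean()

print(f"true effect      = {true_effect:.3f}")
print(f"estimated effect = {estimated_effect:.3f}")
```

The individual-level effect stays unknowable for any single patient; only the average contrast is recovered, and only under the assumption that the coin flip is unrelated to the potential outcomes.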
What is the purpose of RCTs?
• RCTs are clinical experiments.
• Their purpose is to compare two (or more) interventions.
• Relative measures are used to compare the interventions.
▪ Differences
▪ Ratios (more transportable)
o Hazard ratios (HR) and odds ratios (OR)
• The most reliable estimates are those contrasting all patients enrolled in each
intervention.
• Subgroup inferences are always less precise.
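A rough way to see why subgroup estimates are less precise: with 1:1 allocation, the standard error of the log hazard ratio is approximately sqrt(4 / number of events). The sketch below uses that textbook approximation as an assumption, and the event counts are hypothetical.

```python
import math

def log_hr_standard_error(n_events: int) -> float:
    """Approximate SE of the log hazard ratio with 1:1 allocation: sqrt(4 / events)."""
    return math.sqrt(4.0 / n_events)

def ci_ratio(n_events: int, z: float = 1.96) -> float:
    """Ratio of the upper to the lower 95% CI limit of the HR (a measure of CI width)."""
    return math.exp(2 * z * log_hr_standard_error(n_events))

# Hypothetical event counts: the whole trial vs a subgroup with a quarter of the events.
for label, events in [("all patients", 200), ("subgroup", 50)]:
    print(f"{label:>12}: events={events}, "
          f"SE(log HR)={log_hr_standard_error(events):.3f}, "
          f"upper/lower CI ratio={ci_ratio(events):.2f}")
```

A subgroup holding a quarter of the events has twice the standard error on the log scale, so its confidence interval is much wider.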
Interpreting RCT results
The MF07-01 multicenter, phase III RCT (Soran et al. Annals of Surgical Oncology, 2018)
compared resection of the primary tumor (LRT group) vs no surgery (ST group) in de novo
metastatic breast cancer. The overall survival results were HR = 0.66, 95% CI 0.49 to 0.88,
p = 0.005 favoring the LRT group. However, when looking at Table 1 of the manuscript you
see the following imbalances:
Which of the following statements is most correct?
1. The imbalances in tumor type between LRT vs ST do not bias the results.
2. The results are biased because the ST group had more triple-negative (worse
prognosis) and fewer ER/PR+ (better prognosis) patients
3. The imbalances in tumor type between LRT vs ST suggest that the quality of
randomization was poor with p < 0.05.
4. I have no idea.
WHAT?!!??
Table 1 fallacy
• The practice of seeing imbalances in baseline variables in Table 1 from an RCT and
concluding that these imbalances bias the results.
• Further reading:
• https://discourse.datamethods.org/t/should-we-ignore-covariate-imbalance-and-stop-presenting-a-stratified-table-one-for-randomized-trials/547
• Assmann et al. “Subgroup analysis and other (mis)uses of baseline data in clinical trials” Lancet (2000) PMID: 10744093
• Senn S. “Baseline comparisons in randomized clinical trials” Stat Med (1991) PMID: 1876802
• Begg CB. “Suspended judgment. Significance tests of covariate imbalance in clinical trials” Control Clin Trials (1990) PMID: 2171874
Randomness comes in clusters
Random sequence of 30 coin flips (heads vs tails) generated using the random.org website:
H H T T H T H H T T H H T H H T T T T T T T H T H T H T T H
First 15 coin flips: 9 heads and 6 tails
Last 15 coin flips: 4 heads and 11 tails
[Figure panels: Random pattern | Non-random pattern]
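The counts on this slide can be checked directly from the printed sequence. The short sketch below tallies heads and tails in each half and also reports the longest run of identical flips, which shows how clustered a truly random sequence can look.

```python
from itertools import groupby

flips = "H H T T H T H H T T H H T H H T T T T T T T H T H T H T T H".split()

first_half, second_half = flips[:15], flips[15:]
counts = {
    "first 15": (first_half.count("H"), first_half.count("T")),
    "last 15": (second_half.count("H"), second_half.count("T")),
}
# Longest run of identical outcomes anywhere in the sequence.
longest_run = max(len(list(group)) for _, group in groupby(flips))

for label, (heads, tails) in counts.items():
    print(f"{label}: {heads} heads, {tails} tails")
print(f"longest run of identical flips: {longest_run}")
```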
Randomness comes in clusters
• The very definition of randomization implies there will be imbalances in prognostic factors
• Valid inference does not require such balance
• Stratification balances prognostic factors
• Balanced prognostic factors result in better precision
• Randomization removes bias from confounders resulting in more accurate inferences
Balance what you know. Randomize what you don’t know.
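These points can be illustrated with a simulation. The sketch below is a toy model under stated assumptions: a continuous outcome instead of survival, one binary prognostic factor, and invented effect sizes. It repeats a simple 1:1 randomized trial many times and compares the unadjusted estimate with the estimate adjusted for the prognostic factor.

```python
import numpy as np

def simulate_trials(n_trials=1000, n_patients=200, true_effect=1.0, seed=0):
    """Repeat a hypothetical 1:1 RCT many times; return per-trial summaries."""
    rng = np.random.default_rng(seed)
    imbalance, unadjusted, adjusted = [], [], []
    for _ in range(n_trials):
        x = rng.binomial(1, 0.5, n_patients)   # prognostic factor (e.g., tumor type)
        t = rng.binomial(1, 0.5, n_patients)   # simple randomization, ignores x
        y = true_effect * t + 2.0 * x + rng.normal(0.0, 1.0, n_patients)
        imbalance.append(x[t == 1].mean() - x[t == 0].mean())
        unadjusted.append(y[t == 1].mean() - y[t == 0].mean())
        design = np.column_stack([np.ones(n_patients), t, x])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        adjusted.append(beta[1])               # treatment coefficient, adjusted for x
    return np.array(imbalance), np.array(unadjusted), np.array(adjusted)

imbalance, unadjusted, adjusted = simulate_trials()
print(f"largest chance imbalance in x: {np.abs(imbalance).max():.2f}")
print(f"unadjusted: mean={unadjusted.mean():.3f}, SD={unadjusted.std():.3f}")
print(f"adjusted:   mean={adjusted.mean():.3f}, SD={adjusted.std():.3f}")
```

Across repetitions both estimators center on the true effect (no bias) even though individual trials show chance imbalances; the adjusted estimator simply varies less from trial to trial (better precision).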
Precision and accuracy
• Accuracy is the opposite of bias. High accuracy = low bias.
▪ Accuracy = trueness (answers the question: is my inference true?)
▪ Randomization is a causal concept. It answers the question: was choosing LRT over ST
the cause of the better overall survival?
• Precision measures variability
▪ Answers the question: how close are my measurements to each other?
▪ The less correct term is “power”
Precision and accuracy
• The width of the 95% CI tells us the precision of the study.
• Higher precision: narrower confidence intervals
• Balanced prognostic factors (e.g., via stratification) -> narrower confidence intervals
Precision and accuracy
• Accurate & precise: RCT with balanced prognostic factors
• Accurate but not precise: RCT with imbalanced prognostic factors
• Not accurate but precise: large observational study with strong unaccounted confounding
• Not accurate & not precise: small observational study with strong unaccounted confounding
Interpreting RCT results
The MF07-01 multicenter, phase III RCT (Soran et al. Annals of Surgical Oncology, 2018)
compared resection of the primary tumor (LRT group) vs no surgery (ST group) in de novo
metastatic breast cancer.
The overall survival results were HR = 0.66, 95% CI 0.49 to 0.88, p = 0.005.
The results were precise enough to infer that LRT produces better overall
survival than ST.
If tumor types (and other prognostic variables) had been balanced, this would have resulted in
even higher precision (narrower confidence intervals).
Balance improves “power” (the correct term is precision) even if the sample size is the same.
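The link between CI width and precision can be worked out from the reported numbers. The sketch below assumes the interval is a standard Wald-type interval that is symmetric on the log scale (an assumption, not something stated in the trial report) and recovers the standard error of the log HR and an approximate p-value.

```python
import math

hr, lower, upper = 0.66, 0.49, 0.88   # reported HR and 95% CI

# On the log scale the CI is estimate +/- 1.96 * SE, so its width gives the SE.
se_log_hr = (math.log(upper) - math.log(lower)) / (2 * 1.96)
z = math.log(hr) / se_log_hr
p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # normal approximation

print(f"SE(log HR) = {se_log_hr:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.4f}")
```

The recovered p-value is close to the reported p = 0.005. A narrower interval would mean a smaller standard error, which is exactly what better balance or covariate adjustment buys.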
How to achieve balance
• Stratification
• Adjustment (covariate adjustment): include the prognostic variable in a regression model
▪ This will produce adjusted HRs (vs unadjusted HRs)
"A Chosen One shall come, born of no father, and through him will ultimate balance in the
Force be restored."
- Ancient Jedi prophecy
Stratification vs covariate adjustment
The KEYNOTE-426 trial (Rini et al. NEJM, 2019) was a multicenter phase 3 RCT comparing
pembrolizumab + axitinib vs. sunitinib as first-line therapy for clear cell RCC.
Randomization was stratified according to the International Metastatic Renal Cell
Carcinoma Database Consortium (IMDC) risk group (favorable, intermediate, or poor risk)
and geographic region (North America, Western Europe, or the rest of the world).
Which of the following statements is most correct?
1. There is no need to covariate adjust for IMDC and geographic region as those are
already balanced between the treatment groups due to stratification.
2. The unadjusted hazard ratio is a less biased estimate that is more generalizable to
our patient population. Adjusted hazard ratios are less generalizable.
3. Covariate adjustment will further increase the precision of the hazard ratio
estimate.
4. I have no idea.
The value of covariate adjustment
• Increases precision (even more than stratification)
• Produces more generalizable estimates:
▪ Adjusted HR: compares a patient who received pembrolizumab + axitinib with a patient
who received sunitinib, started with the same IMDC risk, and is from the same
geographic region
▪ Unadjusted HRs make more assumptions because they depend on the entire sample
mix and will not transport to a population with a different covariate mix.
• Balance prognostic factors by using both stratification and adjustment. But if you have to
choose, choose adjustment over stratification.
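The dependence of unadjusted ratios on the covariate mix can be shown with plain arithmetic. The sketch below swaps in odds ratios from a logistic model for hazard ratios from a Cox model, because the odds ratio version needs no simulation. All coefficients are hypothetical.

```python
import math

def expit(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical logistic model: logit P(response) = intercept + treatment + poor risk.
INTERCEPT, LOG_OR_TREATMENT, LOG_OR_POOR_RISK = -2.0, math.log(3.0), 3.0

def marginal_odds_ratio(share_poor_risk: float) -> float:
    """Unadjusted (population-averaged) OR for a given covariate mix."""
    def risk(treated: int) -> float:
        base = INTERCEPT + LOG_OR_TREATMENT * treated
        return ((1 - share_poor_risk) * expit(base)
                + share_poor_risk * expit(base + LOG_OR_POOR_RISK))

    def odds(r: float) -> float:
        return r / (1 - r)

    return odds(risk(1)) / odds(risk(0))

conditional_or = math.exp(LOG_OR_TREATMENT)   # identical in every risk group
for share in (0.2, 0.5, 0.8):
    print(f"poor-risk share {share:.0%}: adjusted OR = {conditional_or:.2f}, "
          f"unadjusted OR = {marginal_odds_ratio(share):.2f}")
```

The adjusted odds ratio is the same whatever the population looks like, while the unadjusted one changes with the share of poor-risk patients, so it does not transport to a clinic with a different case mix.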
• Further reading:
▪ Senn S. “Seven myths of randomisation in clinical trials” Stat Med (2013) PMID: 23255195
▪ https://www.fharrell.com/post/covadj/
Example graph used in basic & translational research
Msaouel et al. Cancer Cell, 2020
Example graph used in clinical & co-clinical research
Shapiro & Msaouel. Clin Genitour Cancer, 2020
The bulk of human knowledge is causal
We need a mathematical language to encode causal knowledge: the do-calculus
Judea Pearl et al. “Causal Inference in Statistics: A Primer”
The bulk of human knowledge is causal
H0 : null hypothesis
D: data
Probability theories:
• P(D | H0) -> Frequentist probability
• P(H0 | D) -> Bayesian probability
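do-Calculus:
• P(D | do(X = x)) distinguishes cases where *we* fix X = x
• “Do” vs “See”
• P(Rain | do(Mud)) = P(Rain) ≠ P(Rain | Mud): making mud has zero effect on rain, but seeing mud makes rain more likely
• P(Disease | do(Symptoms)) = P(Disease) ≠ P(Disease | Symptoms): inducing symptoms has zero effect on the disease
• P(IO colitis | do(Diarrhea)) = P(IO colitis) ≠ P(IO colitis | Diarrhea), where IO colitis = immunotherapy colitis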
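The difference between “seeing” and “doing” can be simulated. The sketch below assumes a toy structural model in which rain causes mud and never the reverse; the probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed structural model: Rain -> Mud.
rain = rng.binomial(1, 0.2, n)
mud = np.where(rain == 1, rng.binomial(1, 0.9, n), rng.binomial(1, 0.05, n))

# "See": condition on observing mud in the data.
p_rain_given_mud = rain[mud == 1].mean()

# "Do": we make mud ourselves (e.g., with a hose). Setting mud = 1 replaces the
# mud equation, but the rain equation is untouched, so rain keeps its baseline.
mud_do = np.ones(n, dtype=int)
p_rain_do_mud = rain[mud_do == 1].mean()

print(f"P(Rain)           = {rain.mean():.2f}")
print(f"P(Rain | Mud)     = {p_rain_given_mud:.2f}")
print(f"P(Rain | do(Mud)) = {p_rain_do_mud:.2f}")
```

Observing mud raises the probability of rain well above its baseline, whereas forcing mud leaves it unchanged: the causal effect of mud on rain is zero.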
• “If Aristotle was alive today, he would be
breathing water”
• In classical logic, a conditional statement that starts from a false premise is
always true
• The goal is to develop artificial general
intelligence
• But we can also use the do-calculus to make
more intelligent clinical decisions
Judea Pearl and Dana MacKenzie. “The Book of Why: The New Science of Cause and Effect”
The Ladder of Causation
Directed acyclic graphs (DAGs)
• Directed acyclic graphs (DAGs) are qualitative visual representations of the structural
causal model describing the functional relationships (i.e., the structural equations)
between variables of interest.
• DAGs fully correspond to the do-calculus.
• “Directed”: all arcs have arrows
Judea Pearl et al. “Causal Inference in Statistics: A Primer”
[Figure: example non-directional arc (no arrowheads) between A and B]
• “Acyclic”: no directed path in the graph forms a closed loop
Judea Pearl et al. “Causal Inference in Statistics: A Primer”
[Figure: cyclic graph with nodes A, B, C, and D, in which A causes A]
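Whether a directed graph is acyclic can be checked mechanically. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). Both graphs are hypothetical examples written for this illustration; the cyclic one assumes the loop A → B → C → D → A.

```python
from graphlib import CycleError, TopologicalSorter

def is_acyclic(parents: dict[str, set[str]]) -> bool:
    """Return True if the directed graph (node -> set of its parents) has no cycle."""
    try:
        # static_order() is lazy, so consume it to trigger cycle detection.
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Hypothetical graphs for illustration only.
confounding_dag = {"A": {"C"}, "B": {"A", "C"}, "C": set()}       # C -> A, C -> B, A -> B
cyclic_graph = {"B": {"A"}, "C": {"B"}, "D": {"C"}, "A": {"D"}}   # A -> B -> C -> D -> A

print("confounding graph is a DAG:", is_acyclic(confounding_dag))
print("looped graph is a DAG:     ", is_acyclic(cyclic_graph))
```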
Basic Types of Causal Relationships
• Direct: direct causal relationship between exposure (A) and outcome (B).
• Mediator: the causal relationship between exposure (A) and outcome (B) is mediated by M.
M is a “mediator”.
• Collider: C acts as a collider (A → C ← B), which blocks the path between exposure (A)
and outcome (B) through C.
• Confounder: C acts as a confounder (A ← C → B). There is no causal relationship between
exposure (A) and outcome (B).
Basic Types of Causal Relationships
• Direct: no need for adjustment.
• Mediator: do not adjust for mediators (usually). Adjustment for M blocks the
causal effect we want to estimate. ↑ Bias (↓ accuracy)
• Collider: never adjust for colliders. Adjustment for C opens a false causal pathway
between A and B (“collider bias”). ↑ Bias (↓ accuracy)
• Confounder: always adjust for confounders (if they are known and the sample size is
sufficient). ↓ Bias (↑ accuracy)
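Collider bias is easy to reproduce. The sketch below is a toy linear simulation with invented coefficients: exposure A and outcome B are generated independently, and C is caused by both. Adjusting for C manufactures an association that does not exist.

```python
import numpy as np

def slope_of_first(y, *covariates):
    """OLS coefficient of the first covariate (model includes an intercept)."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(7)
n = 50_000
a = rng.normal(size=n)              # exposure
b = rng.normal(size=n)              # outcome, truly independent of a
c = a + b + rng.normal(size=n)      # collider: A -> C <- B

effect_unadjusted = slope_of_first(b, a)        # close to the true value, 0
effect_collider_adj = slope_of_first(b, a, c)   # pushed away from 0

print(f"A -> B, unadjusted:     {effect_unadjusted:+.3f}")
print(f"A -> B, adjusted for C: {effect_collider_adj:+.3f}")
```

In this setup the adjusted coefficient lands near -0.5 even though the true effect of A on B is zero.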
Question: what is the effect of primary tumor size on overall survival?
Scenarios
Question: what is the effect of primary tumor size on number of metastases?
Scenarios
Question: what is the effect of diet on risk of RCC?
Scenarios
To adjust or not to adjust?
We are performing a retrospective analysis to determine whether a new immunotherapy
(superlumab) improves overall survival compared with other immunotherapies. We have
also measured baseline serum IL-6 levels for these patients. The scientific consensus on
the relationship between immunotherapy type, serum IL-6 and overall survival is codified
below:
Which of the following statements is most correct?
1. Baseline IL-6 level is a mediator. It should not be adjusted for in the analysis.
2. Baseline IL-6 level is a collider. It should be adjusted for in the analysis.
3. Baseline IL-6 level is a confounder. It should be adjusted for in the analysis.
4. I have no idea.
[DAG with nodes: Immunotherapy, Overall survival, Baseline IL-6 level]
To adjust or not to adjust? (sequel)
We are performing a retrospective analysis to determine whether a new immunotherapy
(superlumab) improves overall survival compared with other immunotherapies. We have
also measured TGFβ genotypes for these patients. The scientific consensus on the
relationship between immunotherapy type, TGFβ genotype and overall survival is codified
below:
Which of the following statements is most correct?
1. TGFβ genotype is a mediator. It should not be adjusted for in the analysis.
2. TGFβ genotype is a collider. It should be adjusted for in the analysis.
3. TGFβ genotype is a confounder. It should be adjusted for in the analysis.
4. I have no idea.
[DAG with nodes: Immunotherapy, Overall survival, TGFβ genotype]
Types of causally “neutral” variables
Gender is a “neutral variable”
Adjustment for gender does not affect bias/accuracy.
Adjustment for gender increases precision (↓ outcome heterogeneity)
Gender is a prognostic factor!
Types of causally “neutral” variables
Cancer center is a “neutral variable”
Adjustment for cancer center does not affect bias/accuracy.
Adjustment for cancer center decreases precision (↓ exposure heterogeneity)
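The opposite effects of the two kinds of “neutral” variable on precision can be checked numerically. The sketch below assumes a linear model with invented coefficients, a binary variable that affects only the outcome (standing in for gender), and a numeric variable that affects only the exposure (standing in for cancer center). It compares the standard error of the exposure coefficient across models.

```python
import numpy as np

def exposure_se(y, exposure, *covariates):
    """OLS standard error of the exposure coefficient (intercept included)."""
    design = np.column_stack([np.ones(len(y)), exposure, *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (len(y) - design.shape[1])
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return float(np.sqrt(covariance[1, 1]))

rng = np.random.default_rng(3)
n = 5_000
outcome_only = rng.binomial(1, 0.5, n)   # stand-in for gender: affects outcome only
exposure_only = rng.normal(size=n)       # stand-in for cancer center: affects exposure only
exposure = exposure_only + rng.normal(size=n)
outcome = 0.5 * exposure + 2.0 * outcome_only + rng.normal(size=n)

se_unadjusted = exposure_se(outcome, exposure)
se_adjust_outcome_only = exposure_se(outcome, exposure, outcome_only)
se_adjust_exposure_only = exposure_se(outcome, exposure, exposure_only)

print(f"SE, unadjusted:                      {se_unadjusted:.4f}")
print(f"SE, adjusted for outcome-only var:   {se_adjust_outcome_only:.4f}")
print(f"SE, adjusted for exposure-only var:  {se_adjust_exposure_only:.4f}")
```

Adjusting for the outcome-only variable removes outcome noise and shrinks the standard error; adjusting for the exposure-only variable removes useful variation in the exposure and inflates it.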
What randomization truly does
• Observational study: adjustment for IMDC reduces bias (↑ accuracy)
• Randomized study (do random treatment): adjustment for IMDC increases precision
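A toy simulation shows the contrast. The sketch below assumes a continuous outcome, a binary poor-risk flag standing in for IMDC risk group, and invented effect sizes. In the observational arm of the simulation, poor-risk patients are more likely to be treated; in the randomized arm, treatment is a coin flip.

```python
import numpy as np

def effect(y, treatment, *covariates):
    """OLS coefficient of treatment (model includes an intercept)."""
    design = np.column_stack([np.ones(len(y)), treatment, *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(11)
n, true_effect = 100_000, 1.0
poor_risk = rng.binomial(1, 0.4, n)   # hypothetical stand-in for IMDC risk group

# Observational: risk group influences who gets treated (confounding).
treated_obs = rng.binomial(1, np.where(poor_risk == 1, 0.8, 0.2))
# Randomized: treatment assignment ignores risk group.
treated_rct = rng.binomial(1, 0.5, n)

def outcome(treated):
    return true_effect * treated - 2.0 * poor_risk + rng.normal(size=n)

y_obs, y_rct = outcome(treated_obs), outcome(treated_rct)
results = {
    "observational, unadjusted": effect(y_obs, treated_obs),
    "observational, adjusted": effect(y_obs, treated_obs, poor_risk),
    "randomized, unadjusted": effect(y_rct, treated_rct),
    "randomized, adjusted": effect(y_rct, treated_rct, poor_risk),
}
for label, value in results.items():
    print(f"{label:>27}: {value:+.2f}")
```

Only the unadjusted observational estimate is far from the true effect of 1.0. In the randomized data both estimates are on target, so adjustment there is about precision, not bias.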
How to make your own DAGs
http://www.dagitty.net/dags.html
How to make your own DAGs
https://causalfusion.net
Estimating the effect of RCC histology on overall survival
Typical multivariable regression model
Corresponding DAG
Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
Estimating the effect of RCC histology on overall survival
Start with a plausible DAG
Corresponding regression
model
Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
Estimating the effect of RCC histology on overall survival
Start with a plausible DAG
Table 2 Fallacy:
this regression model should not
be used to estimate the causal
effect of biological sex on overall
survival. RCC subtype is a
mediator for the effect of
biological sex on overall survival.
Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
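The Table 2 fallacy can be reproduced with a toy simulation. The sketch below assumes a linear outcome on an arbitrary scale and invented coefficients, with sex affecting the outcome both directly and through an aggressive-subtype mediator. The coefficient for sex in a model that also contains the mediator is not the total causal effect of sex.

```python
import numpy as np

def coefficient(y, exposure, *covariates):
    """OLS coefficient of the exposure (model includes an intercept)."""
    design = np.column_stack([np.ones(len(y)), exposure, *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(5)
n = 100_000
sex = rng.binomial(1, 0.5, n)
# Hypothetical: sex shifts the probability of an aggressive subtype (the mediator).
aggressive_subtype = rng.binomial(1, np.where(sex == 1, 0.6, 0.2))
# Direct effect of sex = -0.5 and effect of subtype = -2.0, so the
# total effect of sex = -0.5 + (0.6 - 0.2) * (-2.0) = -1.3.
survival_score = -0.5 * sex - 2.0 * aggressive_subtype + rng.normal(size=n)

total_effect = coefficient(survival_score, sex)
after_adjusting_mediator = coefficient(survival_score, sex, aggressive_subtype)

print(f"total effect of sex (no mediator in model): {total_effect:+.2f}")
print(f"sex coefficient with mediator in model:     {after_adjusting_mediator:+.2f}")
```

The same regression can be the right model for the effect of subtype and the wrong model for the effect of sex, which is why the model has to follow the question.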
Summary
• Imbalances in prognostic factors between treatment groups in RCTs reduce precision but not accuracy
• Stratification and adjustment for prognostic factors increase the
precision of RCTs
• Always adjust, even if you stratify
• DAGs can be used to represent causal relations of interest
• Adjust for confounders but not for colliders or mediators
• The statistical analysis (regression model) will always depend on the
question at hand
Questions?
pmsaouel@mdanderson.org
@PavlosMsaouel

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 

Recently uploaded (20)

Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 

Pavlos Msaouel MD, PhD on Causal Inference and RCT Interpretation

  • 1. Pavlos Msaouel MD, PhD Assistant Professor Genitourinary Medical Oncology Translational Molecular Pathology Confounding and Causal Inference
  • 2. Disclosures • Advisory Boards / Honoraria: o Mirati Therapeutics o Bristol-Myers Squibb o Exelixis • Non-branded educational programs: o Exelixis o Pfizer • Clinical Trials with Grant Support: o Mirati Therapeutics o Bristol-Myers Squibb o Takeda Pharmaceutical Company • All my clinical trials use Bayesian designs. • I refuse to use 3+3.
  • 3. Statistics cannot directly encode causal knowledge • 4 Analytical inputs: ▪ Question ▪ Knowledge -> causal knowledge is a necessary ingredient to construct the golem ▪ Data ▪ Relationship • Examples: ▪ Mud does not cause rain ▪ Symptoms do not cause the disease • The bulk of human knowledge is causal. The bulk of medical and translational knowledge is causal.
  • 4. Statistics cannot directly encode causal knowledge • Clinical scenario: a patient with clear cell renal cell carcinoma comes to clinic. We give her immunotherapy instead of oral TKI and she has a complete response. How do we know that choosing immunotherapy over the TKI caused the complete response? ▪ The only way to truly know if immunotherapy was the cause of the complete response is if we had access to an alternate universe (“counterfactual universe”) where everything else was equal but we gave oral TKI instead of immunotherapy. ▪ Counterfactual universes are the true Gold Standard for causality.
  • 5. Statistics cannot directly encode causal knowledge • No access to counterfactual universes: can never claim causality (Hume, Russell, Pearson etc) • Less extreme position: randomization can allow us to infer causality (but needs assumptions) ▪ The process of randomization lies outside statistics ▪ There are other ways to infer causality (they also need assumptions) • Even less extreme position: lab experiments can allow us to infer causality (but need assumptions)
  • 6. What is the purpose of RCTs? • RCTs are clinical experiments. • Their purpose is to compare two (or more) interventions. • Relative measures are used to compare the interventions. ▪ Differences ▪ Ratios (more transportable) o Hazard ratios (HR) and odds ratios (OR) • The most reliable estimates are those contrasting all patients enrolled in each intervention. • Subgroup inferences are always less precise.
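The difference and ratio measures named on this slide can be illustrated with a small sketch. The event counts below are hypothetical and chosen only for illustration; they do not come from any trial in this deck. Hazard ratios need time-to-event data and a survival model, so only the risk difference, risk ratio, and odds ratio are computed here.

```python
def relative_measures(events_a, n_a, events_b, n_b):
    """Compare arm A against arm B from a 2x2 table of event counts."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return {
        "risk_difference": risk_a - risk_b,  # absolute (difference) measure
        "risk_ratio": risk_a / risk_b,       # ratio measure
        "odds_ratio": odds_a / odds_b,       # ratio measure
    }


# Hypothetical counts: 30 of 100 patients with an event in arm A, 45 of 100 in arm B.
result = relative_measures(30, 100, 45, 100)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the risk ratio and the odds ratio differ noticeably, which is a reminder that the two ratio measures are not interchangeable when events are common.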
  • 7. Interpreting RCT results The MF07-01 multicenter, phase III RCT (Soran et al. Annals of Surgical Oncology, 2018) compared resection of the primary tumor (LRT group) vs no surgery (ST group) in de novo metastatic breast cancer. The overall survival results were HR = 0.66, 95% CI 0.49 to 0.88, p = 0.005 favoring the LRT group. However, when looking at Table 1 of the manuscript you see the following imbalances: Which of the following statements is most correct? 1. The imbalances in tumor type between LRT vs ST do not bias the results. 2. The results are biased because the ST group had more triple-negative (worse prognosis) and fewer ER/PR+ (better prognosis) patients. 3. The imbalances in tumor type between LRT vs ST suggest that the quality of randomization was poor with p < 0.05. 4. I have no idea.
  • 10. Table 1 fallacy • The practice of seeing imbalances in baseline variables in Table 1 from an RCT and concluding that these imbalances bias the results. • Further reading: • https://discourse.datamethods.org/t/should-we-ignore-covariate-imbalance-and-stop-presenting-a-stratified-table-one-for-randomized-trials/547 • Assmann et al. “Subgroup analysis and other (mis)uses of baseline data in clinical trials” Lancet (2000) PMID: 10744093 • Senn S. “Baseline comparisons in randomized clinical trials” Stat Med. (1991) PMID: 1876802 • Begg CB. “Suspended judgment. Significance tests of covariate imbalance in clinical trials” Control Clin Trials (1990) PMID: 2171874
  • 11. Randomness comes in clusters Random coin flip sequence (heads vs tails) 30 times using random.org website: H H T T H T H H T T H H T H H T T T T T T T H T H T H T T H First 15 coin flips: 9 heads and 6 tails Last 15 coin flips: 4 heads and 11 tails Random pattern Non-random pattern
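The head and tail counts on this slide can be checked directly from the sequence as transcribed. The sketch below only recounts the 30 flips shown above; the helper names are made up for illustration.

```python
from itertools import groupby

# The 30 flips exactly as listed on the slide.
flips = "H H T T H T H H T T H H T H H T T T T T T T H T H T H T T H".split()


def count_heads_tails(seq):
    """Return (heads, tails) for a list of 'H'/'T' strings."""
    heads = seq.count("H")
    return heads, len(seq) - heads


first_half = count_heads_tails(flips[:15])
second_half = count_heads_tails(flips[15:])

# Longest run of identical outcomes, to show how a random sequence clusters.
longest_run = max(len(list(group)) for _, group in groupby(flips))

print("first 15 (heads, tails):", first_half)
print("last 15 (heads, tails):", second_half)
print("longest run:", longest_run)
```

The two halves of one fair sequence look quite different from each other, which is the same kind of chance imbalance that shows up between arms in a baseline table.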
  • 12. Randomness comes in clusters • The very definition of randomization implies there will be imbalances in prognostic factors • Valid inference does not require such balance • Stratification balances prognostic factors • Balanced prognostic factors result in better precision • Randomization removes bias from confounders resulting in more accurate inferences Balance what you know. Randomize what you don’t know.
  • 13. Precision and accuracy • Accuracy is the opposite of bias. High accuracy = low bias. ▪ Accuracy = trueness (answers the question: is my inference true?) ▪ Randomization is a causal concept. Answers the question: Was choosing LRT over ST the cause of the better overall survival? • Precision measures variability ▪ Answers the question: how close are my measurements to each other? ▪ The less correct term is “power”
  • 14. Precision and accuracy • The width of the 95% CI can tell us the precision of the study. • Higher precision: more narrow confidence intervals • Balanced prognostic factors (e.g., via stratification) -> more narrow confidence intervals
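As a rough worked example of reading precision off a confidence interval, the sketch below uses the MF07-01 result quoted elsewhere in this deck (HR = 0.66, 95% CI 0.49 to 0.88). It assumes the interval is a standard Wald interval that is symmetric on the log scale, which is an assumption made here and is not stated in the slides; the function name is hypothetical.

```python
import math


def log_hr_summary(hr, ci_low, ci_high, z_crit=1.96):
    """Back-calculate precision from a hazard ratio and its 95% CI.

    Assumes a Wald interval on the log scale: log(HR) +/- z_crit * SE.
    """
    width = math.log(ci_high) - math.log(ci_low)  # CI width on the log scale
    se = width / (2 * z_crit)                     # implied standard error of log(HR)
    z = math.log(hr) / se                         # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value
    return width, se, z, p


width, se, z, p = log_hr_summary(0.66, 0.49, 0.88)
print(f"log-scale CI width: {width:.3f}")
print(f"implied SE of log(HR): {se:.3f}")
print(f"approximate two-sided p: {p:.4f}")
```

A narrower interval gives a smaller implied standard error, so anything that shrinks the standard error at a fixed sample size (such as balancing or adjusting for prognostic factors) shows up as a tighter interval.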
  • 15. Precision and accuracy • Accurate & precise: RCT with balanced prognostic factors • Not accurate but precise: large observational study with strong unaccounted confounding • Accurate but not precise: RCT with imbalanced prognostic factors • Not accurate & not precise: small observational study with strong unaccounted confounding
  • 16. Interpreting RCT results The MF07-01 multicenter, phase III RCT (Soran et al. Annals of Surgical Oncology, 2018) compared resection of the primary tumor (LRT group) vs no surgery (ST group) in de novo metastatic breast cancer. The overall survival results were HR = 0.66, 95% CI 0.49 to 0.88, p = 0.005. • The results were precise enough to make the inference that LRT produces better overall survival vs ST. • If tumor types (and other prognostic variables) were balanced, this would have resulted in even higher precision (more narrow confidence intervals). • Balance improves “power” (the correct term is precision) even if sample size is the same.
  • 17. How to achieve balance • Stratification • Adjustment (covariate adjustment): include the prognostic variable in a regression model ▪ This will produce adjusted HRs (vs unadjusted HRs) “A Chosen One shall come, born of no father, and through him will ultimate balance in the Force be restored.” - Ancient Jedi prophecy
  • 18. Stratification vs covariate adjustment The KEYNOTE-426 trial (Rini et al. NEJM, 2019) was a multicenter phase 3 RCT comparing pembrolizumab + axitinib vs. sunitinib as first-line therapy for clear cell RCC. Randomization was stratified according to the International Metastatic Renal Cell Carcinoma Database Consortium (IMDC) risk group (favorable, intermediate, or poor risk) and geographic region (North America, Western Europe, or the rest of the world). Which of the following statements is most correct? 1. There is no need to covariate adjust for IMDC and geographic region as those are already balanced between the treatment groups due to stratification. 2. The unadjusted hazard ratio is a less biased estimate that is more generalizable to our patient population. Adjusted hazard ratios are less generalizable. 3. Covariate adjustment will further increase the precision of the hazard ratio estimate. 4. I have no idea.
  • 20. The value of covariate adjustment • Increases precision (even more than stratification) • Produces more generalizable estimates: ▪ Adjusted HR: compares a patient who received pembrolizumab + axitinib to a patient who received sunitinib and started with the same IMDC risk and from the same geographic region ▪ Unadjusted HRs make more assumptions because they depend on the entire sample mix and will not transport to a population with a different covariate mix. • Balance prognostic factors by using both stratification and adjustment. But if you have to choose, choose adjustment over stratification. • Further reading: ▪ Senn S. “Seven myths of randomisation in clinical trials” Stat Med (2013) PMID: 23255195 ▪ https://www.fharrell.com/post/covadj/
  • 22. Example graph used in basic & translational Research Msaouel et al. Cancer Cell, 2020
  • 23. Example graph used in clinical & co-clinical research Shapiro & Msaouel. Clin Genitour Cancer, 2020
  • 24. The bulk of human knowledge is causal We need a mathematical language to encode causal knowledge: the do-calculus Judea Pearl et al. “Causal Inference in Statistics: A Primer”
  • 25. The bulk of human knowledge is causal H0: null hypothesis D: data Probability theories: • P(D | H0) -> Frequentist probability • P(H0 | D) -> Bayesian probability do-Calculus: • P(D | do(X = x)) distinguishes cases where *we* fix X = x • “Do” vs “See” • P(Rain | do(Mud)) = P(Rain) ≠ P(Rain | Mud): making mud has zero effect on rain, but seeing mud raises the probability of rain • P(Disease | do(Symptoms)) = P(Disease) ≠ P(Disease | Symptoms) • P(Immunotherapy colitis | do(Diarrhea)) = P(Immunotherapy colitis) ≠ P(Immunotherapy colitis | Diarrhea)
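The “Do” vs “See” distinction can be illustrated with a toy simulation in which rain causes mud but mud has no effect on rain. All probabilities below (0.3, 0.9, 0.05) are invented for illustration, and the function names are hypothetical. Observing mud makes rain much more likely, while forcing mud by intervention leaves the chance of rain where it started.

```python
import random


def simulate(n, do_mud=None, seed=0):
    """Generate (rain, mud) pairs; if do_mud is set, mud is fixed by intervention."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rain = rng.random() < 0.3  # assumed baseline chance of rain
        if do_mud is None:
            # "See": mud arises naturally and depends on rain.
            mud = rng.random() < (0.9 if rain else 0.05)
        else:
            # "Do": we set mud ourselves, cutting the arrow from rain to mud.
            mud = do_mud
        rows.append((rain, mud))
    return rows


def p_rain_given_mud(rows):
    """Proportion of rainy days among the days with mud."""
    rainy_flags = [rain for rain, mud in rows if mud]
    return sum(rainy_flags) / len(rainy_flags)


p_see = p_rain_given_mud(simulate(100_000))               # P(Rain | Mud)
p_do = p_rain_given_mud(simulate(100_000, do_mud=True))   # P(Rain | do(Mud))
print(f"seeing mud: {p_see:.2f}, doing mud: {p_do:.2f}")
```

Under these assumed numbers the observational probability is far above the baseline rain rate, while the interventional probability stays close to it.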
  • 26. • “If Aristotle was alive today, he would be breathing water” • Starting with a false premise always gets you a true inference statement in classical logic • The goal is to develop artificial general intelligence • But we can also use the do-calculus to make more intelligent clinical decisions Judea Pearl and Dana MacKenzie. “The Book of Why: The New Science of Cause and Effect” The Ladder of Causation
  • 27. Directed acyclic graphs (DAGs) • Directed acyclic graphs (DAGs) are qualitative visual representations of the structural causal model describing the functional relationships (ie, the structural equations) between variables of interest. • DAGs fully correspond to the do-calculus. • “Directed”: all arcs have arrows Judea Pearl et al. “Causal Inference in Statistics: A Primer” Example non-directional arc (no arrowheads) between A and B
  • 28. Directed acyclic graphs (DAGs) • Directed acyclic graphs (DAGs) are qualitative visual representations of the structural causal model describing the functional relationships (ie, the structural equations) between variables of interest. • DAGs fully correspond to the do-calculus. • “Acyclic”: no directed path in the graph forms a closed loop Judea Pearl et al. “Causal Inference in Statistics: A Primer” A B C D Cyclic graph: A causes A
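The “acyclic” requirement can be checked mechanically. The sketch below is one common way to do it (Kahn's topological-sort algorithm); the edge lists are hypothetical examples, since the figures from these slides are not part of the transcript.

```python
def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no directed cycle."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Repeatedly remove nodes that have no remaining incoming arrows.
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # If a cycle exists, its nodes can never be removed.
    return removed == len(nodes)


# Hypothetical graphs for illustration.
dag = [("A", "B"), ("B", "C"), ("A", "C")]
cyclic = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]  # A ends up causing A

print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```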
  • 29. Basic Types of Causal Relationships • Direct causal relationship between exposure (A) and outcome (B). • The causal relationship between exposure (A) and outcome (B) is mediated by M. M is a “mediator”. • C acts as a collider blocking the causal relationship between exposure (A) and outcome (B). • C acts as a confounder. There is no causal relationship between exposure (A) and outcome (B).
  • 30. Basic Types of Causal Relationships • Direct relationship: no need for adjustment. • Do not adjust for mediators (usually). Adjustment for M blocks the causal effect we want to estimate: ↑Bias (↓ accuracy). • Never adjust for colliders. Adjustment for C opens a false causal pathway between A and B (“collider bias”): ↑Bias (↓ accuracy). • Always adjust for confounders (if known and enough sample size): ↓Bias (↑ accuracy).
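The confounder and collider rules can be illustrated with a small simulation. Everything below is a toy model with invented probabilities: in both scenarios the exposure A has no effect on the outcome B, and “adjustment” is done by simple stratification on C. The helper names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def diff_in_means(rows, keep=lambda row: True):
    """Mean of B among A=1 minus mean of B among A=0, restricted to rows passing `keep`."""
    exposed = [b for a, b, c in rows if a == 1 and keep((a, b, c))]
    unexposed = [b for a, b, c in rows if a == 0 and keep((a, b, c))]
    return mean(exposed) - mean(unexposed)


rng = random.Random(1)
n = 200_000

# Confounder: C -> A and C -> B, with no arrow from A to B.
confounded = []
for _ in range(n):
    c = int(rng.random() < 0.5)
    a = int(rng.random() < (0.8 if c else 0.2))
    b = int(rng.random() < (0.6 if c else 0.2))
    confounded.append((a, b, c))

crude_confounded = diff_in_means(confounded)
# Adjust by comparing within each level of C, then averaging the two strata.
adjusted_confounded = mean(
    [diff_in_means(confounded, lambda row, s=s: row[2] == s) for s in (0, 1)]
)

# Collider: A -> C <- B, with A and B independent.
collided = []
for _ in range(n):
    a = int(rng.random() < 0.5)
    b = int(rng.random() < 0.5)
    c = int(a == 1 or b == 1)
    collided.append((a, b, c))

crude_collider = diff_in_means(collided)
conditioned_collider = diff_in_means(collided, lambda row: row[2] == 1)

print(f"confounder: crude {crude_confounded:.2f}, adjusted {adjusted_confounded:.2f}")
print(f"collider:   crude {crude_collider:.2f}, conditioned on C {conditioned_collider:.2f}")
```

In this toy setup adjusting for the confounder removes a spurious association, while conditioning on the collider creates one that was not there in the crude comparison.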
  • 31. Question: what is the effect of primary tumor size on overall survival? Scenarios
  • 32. Question: what is the effect of primary tumor size on number of metastases? Scenarios
  • 33. Question: what is the effect of diet on risk of RCC? Scenarios
  • 35. To adjust or not to adjust? We are performing a retrospective analysis to determine whether a new immunotherapy (superlumab) improves overall survival compared with other immunotherapies. We have also measured baseline serum IL-6 levels for these patients. The scientific consensus on the relationship between immunotherapy type, serum IL-6 and overall survival is codified below: [Figure: DAG with nodes Immunotherapy, Overall survival, Baseline IL-6 level] Which of the following statements is most correct? 1. Baseline IL-6 level is a mediator. It should not be adjusted for in the analysis. 2. Baseline IL-6 level is a collider. It should be adjusted for in the analysis. 3. Baseline IL-6 level is a confounder. It should be adjusted for in the analysis. 4. I have no idea.
  • 37. To adjust or not to adjust? (sequel) We are performing a retrospective analysis to determine whether a new immunotherapy (superlumab) improves overall survival compared with other immunotherapies. We have also measured TGFβ genotypes for these patients. The scientific consensus on the relationship between immunotherapy type, TGFβ genotype and overall survival is codified below: [Figure: DAG with nodes Immunotherapy, Overall survival, TGFβ genotype] Which of the following statements is most correct? 1. TGFβ genotype is a mediator. It should not be adjusted for in the analysis. 2. TGFβ genotype is a collider. It should be adjusted for in the analysis. 3. TGFβ genotype is a confounder. It should be adjusted for in the analysis. 4. I have no idea.
  • 39. Types of causally “neutral” variables Gender is a “neutral variable” Adjustment for gender does not affect bias/accuracy. Adjustment for gender increases precision (↓ outcome heterogeneity) Gender is a prognostic factor!
  • 40. Types of causally “neutral” variables Cancer center is a “neutral variable” Adjustment for cancer center does not affect bias/accuracy. Adjustment for cancer center decreases precision (↓ exposure heterogeneity)
  • 41. What randomization truly does • Observational study: adjustment for IMDC reduces bias (↑ accuracy) • Randomized study (do random treatment): adjustment for IMDC increases precision
  • 42. How to make your own DAGs http://www.dagitty.net/dags.html
  • 43. How to make your own DAGs https://causalfusion.net
  • 44. Estimating the effect of RCC histology on overall survival Typical multivariable regression model Corresponding DAG Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
  • 45. Estimating the effect of RCC histology on overall survival Start with a plausible DAG Corresponding regression model Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
  • 46. Estimating the effect of RCC histology on overall survival Start with a plausible DAG Table 2 Fallacy: this regression model should not be used to estimate the causal effect of biological sex on overall survival. RCC subtype is a mediator for the effect of biological sex on overall survival. Shapiro & Msaouel et al. Clin Genitour Cancer, 2020
  • 47. Summary • Imbalances between subgroups in RCTs reduce precision but not accuracy • Stratification and adjustment for prognostic factors increase the precision of RCTs • Always adjust, even if you stratify • DAGs can be used to represent causal relations of interest • Adjust for confounders but not for colliders or mediators • The statistical analysis (regression model) will always depend on the question at hand

Editor's Notes

  1. PMID: 29777404 https://en.wikipedia.org/wiki/Accuracy_and_precision Trueness
  2. PMID: 29777404 https://en.wikipedia.org/wiki/Accuracy_and_precision Trueness
  3. Starting with a false premise always gets you a true inference statement in classical logic