Avoid overfitting in precision medicine: How to use cross-validation to reliably estimate subgroup effects
Nicole Krämer*, Josef Höfler*, Carina Ittrich#
PSI 2019 - Data Science and Machine Learning Session
*Staburo GmbH, Munich, Germany
#Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
The goals of our presentation are to …
… make you aware of how strongly subgroup identification methods can overfit,
… explain how cross-validation can help to obtain more realistic subgroup effects,
… show in simulations that cross-validation leads to more accurate estimates of subgroup effects, and
… illustrate how you can apply cross-validated subgroup effects in a clinical trial.
It is not our goal to
• find good or bad subgroup identification methods, or
• discuss the usefulness of subgroup identification in general.
(Hypothetical) case study
• Randomized phase II trial comparing treatment A and B in a parallel design.
• Endpoint: Progression-free survival
• Relative treatment benefit: hazard ratio (A versus B)
• Biomarkers
  - Solid evidence: expression level of gene SLDEV
  - Exploratory: expression levels of 50 genes
[Figure: trial population (n=200) with 100 patients on treatment A and 100 patients on treatment B]
How to identify a subgroup based on these biomarkers?
Modified "Breast" dataset from the R package biospear.
• In this case, subgroup identification often corresponds to finding a cut-off c and a direction (≤ c or > c).
• Popular strategy: go through a list of cut-offs and find the "best" one, for example by
  a) minimizing the interaction p-value,
  b) minimizing min(HR(≤ c), HR(> c)),
  c) maximizing the partial log-likelihood of the interaction model,
  d) …
Important (boring?) example: one continuous biomarker
Typically, a constraint is added to ensure that the subgroups are sufficiently large.
Do we really believe that the true hazard ratio is 0.53?
In this example, the selected cut-off leads to the smallest interaction p-value (criterion a).
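The cut-off search by criterion a) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes constant hazards (an exponential model) instead of a Cox model, so that each hazard ratio is a ratio of event rates and the interaction test is a Wald test on the difference of the two subgroup log hazard ratios. The function names `log_hr_and_var` and `search_cutoff` and the default `min_size=20` are hypothetical.

```python
import math

# ASSUMPTION: constant hazards (exponential model) instead of a Cox model,
# so the HR (A vs B) in a subgroup is the ratio of the two event rates.

def log_hr_and_var(records):
    """Log HR (A vs B) and its approximate variance for (time, event, arm) records."""
    events = {"A": 0, "B": 0}
    exposure = {"A": 0.0, "B": 0.0}
    for time, event, arm in records:
        events[arm] += event
        exposure[arm] += time
    if events["A"] == 0 or events["B"] == 0:
        return None  # HR not estimable without events in both arms
    log_hr = math.log((events["A"] / exposure["A"]) / (events["B"] / exposure["B"]))
    return log_hr, 1.0 / events["A"] + 1.0 / events["B"]

def search_cutoff(x, time, event, arm, cutoffs, min_size=20):
    """Return (cut-off, direction, interaction p-value, HR in the chosen subgroup)
    for the candidate cut-off with the smallest interaction p-value."""
    best = None
    data = list(zip(x, time, event, arm))
    for c in cutoffs:
        low = [(t, e, a) for xi, t, e, a in data if xi <= c]
        high = [(t, e, a) for xi, t, e, a in data if xi > c]
        if len(low) < min_size or len(high) < min_size:
            continue  # constraint: both subgroups must be sufficiently large
        fit_low, fit_high = log_hr_and_var(low), log_hr_and_var(high)
        if fit_low is None or fit_high is None:
            continue
        # Wald test for the interaction: difference of the two log HRs
        z = (fit_low[0] - fit_high[0]) / math.sqrt(fit_low[1] + fit_high[1])
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        if best is None or p < best[2]:
            # direction: report the side with the smaller HR (larger benefit of A)
            if fit_low[0] <= fit_high[0]:
                best = (c, "<=", p, math.exp(fit_low[0]))
            else:
                best = (c, ">", p, math.exp(fit_high[0]))
    return best
```

Because the p-value is minimized over many candidate cut-offs, the hazard ratio reported for the selected subgroup tends to be too optimistic; in a real analysis a Cox model with a treatment-by-subgroup interaction would replace the rate-ratio approximation.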
Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24), 2867–2880.
1. In each treatment arm, model the probability of a response (e.g., via random forests).
2. For each patient, predict the probability of a response under each treatment.
3. Define the predicted relative treatment benefit: $f_A - f_B$.
4. Learn a classification tree on the predicted relative treatment benefit.
Another example: The Virtual Twin Method
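$$Y = f_A(B_1, \ldots, B_p, X_1, \ldots, X_k) \quad \text{(model fitted in arm A)}$$
$$Y = f_B(B_1, \ldots, B_p, X_1, \ldots, X_k) \quad \text{(model fitted in arm B)}$$
For a patient with biomarkers $B_1, \ldots, B_p$ and other characteristics $X_1, \ldots, X_k$, $f_A(B_1, \ldots, B_p, X_1, \ldots, X_k)$ is the predicted response under treatment A and $f_B(B_1, \ldots, B_p, X_1, \ldots, X_k)$ the predicted response under treatment B. The prediction under the treatment the patient did not receive acts as the patient's "virtual twin".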
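The four steps of the Virtual Twin method can be illustrated with a small, self-contained sketch. To stay within the Python standard library, it swaps in a k-nearest-neighbour average for the random forests of step 1 and a single least-squares split (a depth-one tree) for the tree of step 4; it is not the implementation of Foster et al. The function names `knn_predict`, `virtual_twins` and `best_stump` are hypothetical.

```python
import math

def knn_predict(train, x, k):
    """Mean response of the k training patients closest to x (Euclidean distance).
    Stand-in for the random forest of step 1."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def virtual_twins(X, y, arm, k=5):
    """Steps 1-3: one response model per arm, then Z = f_A(x) - f_B(x) per patient."""
    train_a = [(xi, yi) for xi, yi, a in zip(X, y, arm) if a == "A"]
    train_b = [(xi, yi) for xi, yi, a in zip(X, y, arm) if a == "B"]
    return [knn_predict(train_a, xi, k) - knn_predict(train_b, xi, k) for xi in X]

def best_stump(X, z):
    """Step 4, reduced to one split: the (variable index, threshold) pair that
    minimizes the within-node sum of squares of the predicted benefit z."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X})[:-1]:  # keep both nodes non-empty
            left = [zi for xi, zi in zip(X, z) if xi[j] <= t]
            right = [zi for xi, zi in zip(X, z) if xi[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best[1], best[2]
```

Patients with `X[j] > t` (or `<= t`, whichever node has the larger mean of z) would form the candidate subgroup; a deeper tree or a threshold on z itself are natural extensions.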
Formalization: What is a subgroup identification method?
[Figure: training data and a new patient. Image source: www.maxpixel.net]
The cross-validated subgroup assignment
1. Split the dataset into k blocks (folds).
2. For each fold i = 1, …, k:
   a) train the subgroup model on the remaining k − 1 folds;
   b) assign each patient of fold i to the subgroup or its complement.
• After cross-validation, each patient has a cross-validated subgroup assignment.
• The cross-validated relative treatment benefit (e.g., the hazard ratio) is the relative treatment benefit in the cross-validated subgroup.
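A minimal sketch of the cross-validated subgroup assignment and the cross-validated hazard ratio. Here `fit` stands for any subgroup identification method (for example a cut-off search) that returns a rule mapping a patient record to True (subgroup) or False (complement). Following the definition on the slide, the hazard ratio is computed once on all patients whose cross-validated assignment is True rather than averaged over folds. The hazard ratio is a ratio of event rates (a constant-hazard assumption standing in for a Cox model), and the function names and record layout are hypothetical.

```python
import random

def cv_subgroup_assignment(data, fit, k=10, seed=1):
    """Assign every patient with a rule that was trained without that patient."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k roughly equal folds
    assignment = [None] * len(data)
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        rule = fit(train)  # e.g. cut-off search on the other k - 1 folds
        for i in fold:
            assignment[i] = rule(data[i])
    return assignment

def rate_ratio(records):
    """HR (A vs B) under constant hazards: ratio of the event rates per arm."""
    events = {"A": 0, "B": 0}
    exposure = {"A": 0.0, "B": 0.0}
    for rec in records:
        events[rec["arm"]] += rec["event"]
        exposure[rec["arm"]] += rec["time"]
    return (events["A"] / exposure["A"]) / (events["B"] / exposure["B"])

def cv_hazard_ratio(data, fit, k=10, seed=1):
    """HR_CV: a single HR over all patients assigned to the subgroup by CV."""
    assigned = cv_subgroup_assignment(data, fit, k, seed)
    subgroup = [rec for rec, inside in zip(data, assigned) if inside]
    return rate_ratio(subgroup), assigned
```

Note that the rule may differ from fold to fold (different cut-offs, even different directions); the cross-validated subgroup is the union of the held-out patients each rule assigned to its subgroup.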
How to estimate the relative treatment benefit?
Cross-validation step 1 2 3 4 5 6 7 8 9 10
Selected cut-off 1.18 0.99 0.88 0.86 0.98 1.13 0.86 0.86 0.98 0.84
Direction ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤ ≤
What does the literature say?
• Many papers on subgroup identification methods
  - evaluate whether the method is able to detect the "correct" subgroup (e.g., sensitivity, specificity),
  - but do not evaluate whether the subgroup effect is correctly estimated.
• In general, there is a lot of work on subgroup effect estimation. However, many papers consider only the setting of a pre-defined set of subgroups.
  - Bootstrap approaches are most similar to the proposed cross-validation approach.
• Combining subgroup identification and cross-validation is not new:
  - Freidlin, B., Jiang, W., & Simon, R. (2010). The cross-validated adaptive signature design. Clinical Cancer Research, 16(2), 691–698.
  - Matsui, S., Simon, R., Qu, P., Shaughnessy, J. D., Barlogie, B., & Crowley, J. (2012). Developing and validating continuous genomic signatures in randomized clinical trials for predictive medicine. Clinical Cancer Research, 18(21), 6065–6073.
Simulation study I - univariate cut-off search
• Two-arm clinical trial (1:1 allocation ratio) with endpoint progression-free survival
• One continuous biomarker; three scenarios for the relationship between biomarker and hazard ratio:
  a) linear predictive effect
  b) step-wise predictive effect
  c) no predictive effect
• Simulation of training data (n=75, 150, 300) and test data (n=1000), repeated 1000 times
Training set (n=75, 150, 300):
1. Optimize the cut-off c by minimizing the interaction p-value.
2. Compute $HR_{\text{train}}$ in the identified subgroup (> c or ≤ c).
3. Compute the cross-validated hazard ratio $HR_{\text{CV}}$ (using 10-fold cross-validation).
Test set (n=1000):
4. Compute $HR_{\text{test}}$ in the subgroup defined by the cut-off c and the direction (> c or ≤ c).
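The data-generating step of simulation study I might look like the following sketch. The deck does not give the simulation parameters, so the standard normal biomarker, the exponential event and censoring times, the baseline rate, the censoring rate and the effect size `beta=-1.0` are all hypothetical choices; only the three scenarios and the 1:1 allocation come from the slide. The names `log_hr` and `simulate_trial` are likewise hypothetical.

```python
import math
import random

def log_hr(x, scenario, beta=-1.0):
    """Hypothetical true log HR (A vs B) as a function of the biomarker x."""
    if scenario == "linear":
        return beta * x
    if scenario == "step":
        return beta * (1.0 if x > 0.0 else 0.0)
    if scenario == "none":
        return -0.5  # constant treatment effect: biomarker is not predictive
    raise ValueError(f"unknown scenario: {scenario}")

def simulate_trial(n, scenario, rng, base_rate=0.1, censor_rate=0.05):
    """Two-arm trial with 1:1 allocation, exponential PFS times (hazard of arm A
    = base_rate * HR(x)) and independent exponential censoring."""
    records = []
    for i in range(n):
        arm = "A" if i % 2 == 0 else "B"
        x = rng.gauss(0.0, 1.0)
        rate = base_rate * (math.exp(log_hr(x, scenario)) if arm == "A" else 1.0)
        t_event = rng.expovariate(rate)
        t_censor = rng.expovariate(censor_rate)
        records.append({"x": x, "arm": arm,
                        "time": min(t_event, t_censor),
                        "event": int(t_event <= t_censor)})
    return records
```

One replicate would call `simulate_trial` once for the training set (n=75, 150 or 300) and once for the test set (n=1000) with the same scenario, and then compare $HR_{\text{train}}$, $HR_{\text{CV}}$ and $HR_{\text{test}}$.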
Results: n=150, linear effect
Results – comparison $HR_{\text{train}}$ / $HR_{\text{test}}$
Results – comparison $HR_{\text{CV}}$ / $HR_{\text{test}}$
Simulation study II – Virtual Twin Method
• Binary endpoint (response yes/no)
• n=1000 (!) patients
• 15 normally distributed variables
• The true subgroup is defined by the first two variables.
(Simulation setting from the paper by Foster et al., 2011)
Summary of the simulation studies
• The simulations indicate that
  - the "naïve" subgroup effects are substantially overfitted (too optimistic),
  - overfitting also occurs for large sample sizes, and
  - on average, cross-validated subgroup effects are a good estimate of the subgroup effects on an independent test set.
• However, the results are quite variable.
  - Both the cross-validated and the test set effects vary substantially.
  - Further simulations indicate that this variability may also stem from the variability of the subgroup detection methods themselves.
Let us go back to our case study …
• The goal is to define a subgroup based on the expression levels of p=50 genes.
• Approach: multivariable Cox proportional hazards model
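$$h(t) = h_0(t) \cdot \exp\left(\beta_T \cdot T + \sum_{j=1}^{p} \beta_{j,X} \cdot X_j + \sum_{j=1}^{p} \beta_{j,I} \cdot X_j \cdot T\right)$$
with T = 1 for treatment A and T = 0 for treatment B. The three terms are the treatment effect ($\beta_T \cdot T$), the biomarker effects ($\sum_{j} \beta_{j,X} \cdot X_j$) and the biomarker-dependent treatment effect ($\sum_{j} \beta_{j,I} \cdot X_j \cdot T$).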
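1) Fit the model using regularized regression (here, ridge regression).
2) Obtain a signature S via
$$HR_{A\,vs\,B}(X) = \exp\left(\beta_T + \sum_{j=1}^{p} \beta_{j,I} \cdot X_j\right) = \exp(-S), \quad \text{i.e.} \quad S = -\left(\beta_T + \sum_{j=1}^{p} \beta_{j,I} \cdot X_j\right)$$
3) Cut-off: at the median value of S (could be optimized as well).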
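Steps 2 and 3 can be sketched as below, assuming the coefficients $\beta_T$ and $\beta_{j,I}$ have already been estimated by a ridge-penalized Cox fit (not shown; in R, for example, `glmnet` with `family = "cox"` and `alpha = 0` fits such a model). The sign convention $S = -(\beta_T + \sum_j \beta_{j,I} X_j)$, so that a larger S means a lower predicted hazard ratio of A vs B, and placing patients with S above the median in the subgroup are assumptions of this sketch; `signature` and `split_at_median` are hypothetical names.

```python
import statistics

def signature(x, beta_t, beta_int):
    """S = -(beta_T + sum_j beta_{j,I} * x_j); the predicted HR (A vs B) is exp(-S).
    Larger S means a larger predicted benefit of A over B (assumed convention)."""
    return -(beta_t + sum(b * xj for b, xj in zip(beta_int, x)))

def split_at_median(patients, beta_t, beta_int):
    """Step 3: patients with S above the median of S form the subgroup."""
    scores = [signature(x, beta_t, beta_int) for x in patients]
    cut = statistics.median(scores)
    return [s > cut for s in scores], cut
```

In the cross-validated version, both the coefficients and the median cut-off would be recomputed on the training folds only and then applied to the held-out patients.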
Results for the predictive signature (leave-one-out cross-validation)
Other measures of interest may be cross-validated as well…
Summary
• For many data-driven subgroup identification algorithms, the estimated treatment
effects are too optimistic (“overfitting”).
• This is also the case for
seemingly simple examples (e.g. cut-off detection) and
large sample sizes (e.g. n=1000 for p=15 variables).
• It is important to obtain more realistic estimates.
• The investigated framework may be applied to all endpoint types and any
subgroup identification algorithm.
• On average, the simulations indicate that the cross-validated relative treatment
benefit is a good estimate of the true relative treatment benefit.
Biostatistical services at Staburo
• Clinical Statistics
• Translational Medicine & Biomarkers
• Statistical Programming with CDISC
• Pharmacokinetics/-dynamics
• Health Technology Assessment
• Non-clinical Statistics
Subgroup effects: training set, cross-validation and test set
A small simulation
• Randomly permute the endpoint (time, status) within each treatment arm. In this way, the relationship between biomarker and relative treatment benefit is broken.
• Find the cut-off that minimizes the interaction p-value
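The permutation step can be sketched as follows (the function name `permute_within_arms` and the record layout are hypothetical). Only the endpoint pair (time, status) is shuffled, separately within each arm, so each arm keeps its survival distribution while any biomarker-dependent treatment effect is destroyed. A subgroup hazard ratio that then differs markedly from the hazard ratio in the trial population reflects overfitting of the cut-off search.

```python
import random

def permute_within_arms(records, seed=0):
    """Shuffle (time, event) among the patients of each arm; biomarker values
    and arm labels stay in place. Returns new records, input is not modified."""
    rng = random.Random(seed)
    out = [dict(rec) for rec in records]
    for arm in sorted({rec["arm"] for rec in records}):  # sorted: reproducible order
        idx = [i for i, rec in enumerate(records) if rec["arm"] == arm]
        endpoints = [(records[i]["time"], records[i]["event"]) for i in idx]
        rng.shuffle(endpoints)
        for i, (t, e) in zip(idx, endpoints):
            out[i]["time"], out[i]["event"] = t, e
    return out
```

Repeating the permutation many times and running the cut-off search on each permuted dataset gives the distribution of "subgroup effects" that arise when the biomarker carries no predictive information at all.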
[Figure: hazard ratio in the trial population]
Mok, T. S., Wu, Y.-L., Thongprasert, S., et al. (2009). Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N Engl J Med, 361(10), 947–957.
Properties of baseline variables
A variable is predictive if the relative treatment benefit
(experimental vs. control) depends on the biomarker.
“Potential patient selection marker”
Properties of baseline variables
A variable is prognostic if it informs about a likely outcome in the absence of treatment or irrespective of the treatment received.
Note: Most often, this is only investigated in the control arm ("Placebo"? "Standard of care"?).
Within each treatment arm, EGFR-positive patients do better than EGFR-negative patients.
Note: In the more recent FLAURA trial, Gefitinib/Erlotinib was the control treatment and was compared with Osimertinib.
Predictive effects and interaction models
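$$\text{odds} = \frac{\text{rate}}{100 - \text{rate}} \quad \text{(response rate in \%)}$$
odds ratio = relative treatment benefit
$$\log\frac{P(Y=1)}{1 - P(Y=1)} = \beta_0 + \beta_T \cdot T + \beta_B \cdot BM + \beta_I \cdot T \cdot BM$$
Odds ratio …
… for a biomarker-positive patient: $\exp(\beta_T + \beta_I)$
… for a biomarker-negative patient: $\exp(\beta_T)$
The biomarker is predictive if $\beta_I \neq 0$.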
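The link between response rates, odds ratios and the interaction coefficient can be checked numerically. The sketch assumes 0/1 coding (T = 1 experimental, BM = 1 biomarker-positive) and takes the response rate of each treatment-by-biomarker cell as given; in this saturated model, $\beta_T$ is the log odds ratio in biomarker-negative patients and $\beta_I$ is the difference between the two log odds ratios. The function names are hypothetical and the input rates below are made up for illustration.

```python
import math

def odds(rate_percent):
    """odds = rate / (100 - rate), with the response rate given in percent."""
    return rate_percent / (100.0 - rate_percent)

def interaction_coefficients(rates):
    """rates[(treatment, biomarker)] = response rate in %, with treatment
    1 = experimental / 0 = control and biomarker 1 = positive / 0 = negative.
    Returns (beta_T, beta_I) of the saturated logistic interaction model."""
    or_neg = odds(rates[(1, 0)]) / odds(rates[(0, 0)])  # exp(beta_T)
    or_pos = odds(rates[(1, 1)]) / odds(rates[(0, 1)])  # exp(beta_T + beta_I)
    beta_t = math.log(or_neg)
    beta_i = math.log(or_pos) - beta_t
    return beta_t, beta_i
```

For example, with response rates of 50% vs 50% in biomarker-negative and 75% vs 50% in biomarker-positive patients, the odds ratios are 1 and 3, so $\beta_T = 0$ and $\beta_I = \log 3 \neq 0$: the biomarker is predictive.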

More Related Content

What's hot

Chapter 4(2) Hypothesisi Testing
Chapter 4(2)  Hypothesisi TestingChapter 4(2)  Hypothesisi Testing
Chapter 4(2) Hypothesisi Testingghalan
 
Bowen & Neill (2013) Adventure Therapy Meta-Analysis Presentation
Bowen & Neill (2013) Adventure Therapy Meta-Analysis PresentationBowen & Neill (2013) Adventure Therapy Meta-Analysis Presentation
Bowen & Neill (2013) Adventure Therapy Meta-Analysis PresentationDaniel Bowen
 
Towards Replicable and Genereralizable Genomic Prediction Models
Towards Replicable and Genereralizable Genomic Prediction ModelsTowards Replicable and Genereralizable Genomic Prediction Models
Towards Replicable and Genereralizable Genomic Prediction ModelsLevi Waldron
 
Chapter 4(2) Hypothesisi Testing
Chapter 4(2) Hypothesisi TestingChapter 4(2) Hypothesisi Testing
Chapter 4(2) Hypothesisi TestingSumit Prajapati
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altmanrgveroniki
 
Imran rizvi statistics in meta analysis
Imran rizvi statistics in meta analysisImran rizvi statistics in meta analysis
Imran rizvi statistics in meta analysisImran Rizvi
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecturedrmomusa
 
PRP for Wound healing
PRP for Wound healingPRP for Wound healing
PRP for Wound healingSchuco
 
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...Cytel USA
 
2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sternergveroniki
 
Intent-to-Treat (ITT) Analysis in Randomized Clinical Trials
Intent-to-Treat (ITT) Analysis in Randomized Clinical TrialsIntent-to-Treat (ITT) Analysis in Randomized Clinical Trials
Intent-to-Treat (ITT) Analysis in Randomized Clinical TrialsMike LaValley
 
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...IOSR Journals
 
Detecting flawed meta analyses
Detecting flawed meta analysesDetecting flawed meta analyses
Detecting flawed meta analysesJames Coyne
 
A Lenda do Valor P
A Lenda do Valor PA Lenda do Valor P
A Lenda do Valor PFUAD HAZIME
 
Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiranKiran Ramakrishna
 
Measures and feedback 2016
Measures and feedback 2016Measures and feedback 2016
Measures and feedback 2016Scott Miller
 
Medical Statistics Pt 2
Medical Statistics Pt 2Medical Statistics Pt 2
Medical Statistics Pt 2Fastbleep
 
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...Kazuki Yoshida
 
Reese Norsworthy Rowlands
Reese Norsworthy RowlandsReese Norsworthy Rowlands
Reese Norsworthy RowlandsBarry Duncan
 
Advanced Biostatistics and Data Analysis abdul ghafoor sajjad
Advanced Biostatistics and Data Analysis abdul ghafoor sajjadAdvanced Biostatistics and Data Analysis abdul ghafoor sajjad
Advanced Biostatistics and Data Analysis abdul ghafoor sajjadHeadDPT
 

What's hot (20)

Chapter 4(2) Hypothesisi Testing
Chapter 4(2)  Hypothesisi TestingChapter 4(2)  Hypothesisi Testing
Chapter 4(2) Hypothesisi Testing
 
Bowen & Neill (2013) Adventure Therapy Meta-Analysis Presentation
Bowen & Neill (2013) Adventure Therapy Meta-Analysis PresentationBowen & Neill (2013) Adventure Therapy Meta-Analysis Presentation
Bowen & Neill (2013) Adventure Therapy Meta-Analysis Presentation
 
Towards Replicable and Genereralizable Genomic Prediction Models
Towards Replicable and Genereralizable Genomic Prediction ModelsTowards Replicable and Genereralizable Genomic Prediction Models
Towards Replicable and Genereralizable Genomic Prediction Models
 
Chapter 4(2) Hypothesisi Testing
Chapter 4(2) Hypothesisi TestingChapter 4(2) Hypothesisi Testing
Chapter 4(2) Hypothesisi Testing
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman
 
Imran rizvi statistics in meta analysis
Imran rizvi statistics in meta analysisImran rizvi statistics in meta analysis
Imran rizvi statistics in meta analysis
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecture
 
PRP for Wound healing
PRP for Wound healingPRP for Wound healing
PRP for Wound healing
 
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...
2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translat...
 
2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne
 
Intent-to-Treat (ITT) Analysis in Randomized Clinical Trials
Intent-to-Treat (ITT) Analysis in Randomized Clinical TrialsIntent-to-Treat (ITT) Analysis in Randomized Clinical Trials
Intent-to-Treat (ITT) Analysis in Randomized Clinical Trials
 
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...
Response of Watermelon to Five Different Rates of Poultry Manure in Asaba Are...
 
Detecting flawed meta analyses
Detecting flawed meta analysesDetecting flawed meta analyses
Detecting flawed meta analyses
 
A Lenda do Valor P
A Lenda do Valor PA Lenda do Valor P
A Lenda do Valor P
 
Statistics basics for oncologist kiran
Statistics basics for oncologist kiranStatistics basics for oncologist kiran
Statistics basics for oncologist kiran
 
Measures and feedback 2016
Measures and feedback 2016Measures and feedback 2016
Measures and feedback 2016
 
Medical Statistics Pt 2
Medical Statistics Pt 2Medical Statistics Pt 2
Medical Statistics Pt 2
 
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulat...
 
Reese Norsworthy Rowlands
Reese Norsworthy RowlandsReese Norsworthy Rowlands
Reese Norsworthy Rowlands
 
Advanced Biostatistics and Data Analysis abdul ghafoor sajjad
Advanced Biostatistics and Data Analysis abdul ghafoor sajjadAdvanced Biostatistics and Data Analysis abdul ghafoor sajjad
Advanced Biostatistics and Data Analysis abdul ghafoor sajjad
 

Similar to Avoid overfitting in precision medicine: How to use cross-validation to reliably estimate subgroup effects

Causal inference lecture to Texas Children's fellows
Causal inference lecture to Texas Children's fellowsCausal inference lecture to Texas Children's fellows
Causal inference lecture to Texas Children's fellowsPavlos Msaouel, MD, PhD
 
Methods of randomisation in clinical trials
Methods of randomisation in clinical trialsMethods of randomisation in clinical trials
Methods of randomisation in clinical trialsAmy Mehaboob
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Evangelos Kritsotakis
 
Sample size in clinical research 2021 april
Sample size in clinical research 2021 aprilSample size in clinical research 2021 april
Sample size in clinical research 2021 aprilINAAMUL HAQ
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfPubrica
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerDennis Sweitzer
 
biostatists presentation
biostatists presentationbiostatists presentation
biostatists presentationAnil kumar
 
The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...Levi Waldron
 
UAB Pulmonary board review study design and statistical principles
UAB Pulmonary board review study  design and statistical principles UAB Pulmonary board review study  design and statistical principles
UAB Pulmonary board review study design and statistical principles Terry Shaneyfelt
 
Sample determinants and size
Sample determinants and sizeSample determinants and size
Sample determinants and sizeTarek Tawfik Amin
 
Practical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesPractical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesnQuery
 
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design IssuesExtending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design IssuesnQuery
 
25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.pptPriyankaSharma89719
 
Sample size
Sample sizeSample size
Sample sizezubis
 
Lessons learned in polygenic risk research | Grand Rapids, MI 2019
Lessons learned in polygenic risk research | Grand Rapids, MI 2019Lessons learned in polygenic risk research | Grand Rapids, MI 2019
Lessons learned in polygenic risk research | Grand Rapids, MI 2019Cecile Janssens
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test SelectionVaggelis Vergoulas
 
Lemeshow samplesize
Lemeshow samplesizeLemeshow samplesize
Lemeshow samplesize1joanenab
 

Similar to Avoid overfitting in precision medicine: How to use cross-validation to reliably estimate subgroup effects (20)

Causal inference lecture to Texas Children's fellows
Causal inference lecture to Texas Children's fellowsCausal inference lecture to Texas Children's fellows
Causal inference lecture to Texas Children's fellows
 
Methods of randomisation in clinical trials
Methods of randomisation in clinical trialsMethods of randomisation in clinical trials
Methods of randomisation in clinical trials
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)
 
Sample size in clinical research 2021 april
Sample size in clinical research 2021 aprilSample size in clinical research 2021 april
Sample size in clinical research 2021 april
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
 
JSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzerJSM2013,Proceedings,paper307699_79238,DSweitzer
JSM2013,Proceedings,paper307699_79238,DSweitzer
 
biostatists presentation
biostatists presentationbiostatists presentation
biostatists presentation
 
Copenhagen 2008
Copenhagen 2008Copenhagen 2008
Copenhagen 2008
 
The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...The impact of different sources of heterogeneity on loss of accuracy from gen...
The impact of different sources of heterogeneity on loss of accuracy from gen...
 
UAB Pulmonary board review study design and statistical principles
UAB Pulmonary board review study  design and statistical principles UAB Pulmonary board review study  design and statistical principles
UAB Pulmonary board review study design and statistical principles
 
Copenhagen 23.10.2008
Copenhagen 23.10.2008Copenhagen 23.10.2008
Copenhagen 23.10.2008
 
Sample determinants and size
Sample determinants and sizeSample determinants and size
Sample determinants and size
 
Practical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size ChallengesPractical Methods To Overcome Sample Size Challenges
Practical Methods To Overcome Sample Size Challenges
 
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design IssuesExtending A Trial’s Design Case Studies Of Dealing With Study Design Issues
Extending A Trial’s Design Case Studies Of Dealing With Study Design Issues
 
25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt
 
Sample size
Sample sizeSample size
Sample size
 
Oac guidelines
Oac guidelinesOac guidelines
Oac guidelines
 
Lessons learned in polygenic risk research | Grand Rapids, MI 2019
Lessons learned in polygenic risk research | Grand Rapids, MI 2019Lessons learned in polygenic risk research | Grand Rapids, MI 2019
Lessons learned in polygenic risk research | Grand Rapids, MI 2019
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test Selection
 
Lemeshow samplesize
Lemeshow samplesizeLemeshow samplesize
Lemeshow samplesize
 

Recently uploaded

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Recently uploaded (20)

Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...

Avoid overfitting in precision medicine: How to use cross-validation to reliably estimate subgroup effects

  • 1. How to use cross-validation to reliably estimate subgroup effects Nicole Krämer*, Josef Höfler*, Carina Ittrich# PSI 2019 - Data Science and Machine Learning Session *Staburo GmbH, Munich, Germany #Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach an der Riss, Germany
  • 2. The goals of our presentation are to … … make you aware of how strongly subgroup identification methods can overfit, … explain how cross-validation can help to obtain more realistic subgroup effects, … show in simulations that cross-validation leads to more accurate estimates of subgroup effects, … illustrate how you can apply cross-validated subgroup effects in a clinical trial. It is not our goal to find good or bad subgroup identification methods, or to discuss the usefulness of subgroup identification in general.
  • 3. (Hypothetical) case study • Randomized phase II trial comparing treatments A and B in a parallel design • Endpoint: progression-free survival • Relative treatment benefit: hazard ratio (A versus B) • Biomarkers: with solid evidence, the expression level of gene SLDEV; exploratory, the expression levels of 50 genes • Trial population (n=200): 100 patients on treatment A, 100 patients on treatment B. How can we identify a subgroup based on these biomarkers? Data: modified "Breast" dataset from the R package biospear.
  • 4. Important (boring?) example: one continuous biomarker • In this case, subgroup identification often corresponds to finding a cut-off c and a direction (≤ c or > c). Typically, a constraint is added to ensure that the subgroups are sufficiently large. • Popular strategy: go through a list of cut-offs and pick the "best" one, e.g. a) minimize the interaction p-value, b) minimize $\min(\mathrm{HR}_{\le c}, \mathrm{HR}_{>c})$, c) maximize the partial log-likelihood of the interaction model, d) …. • In this example, the cut-off was chosen because it leads to the smallest interaction p-value (criterion a). Do we really believe that the true hazard ratio is 0.53?
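The grid search of criterion a) can be sketched in Python. This is a minimal sketch under simplifying assumptions that are not from the talk: the endpoint is a binary response instead of progression-free survival, the treatment benefit is an odds ratio instead of a hazard ratio, and the interaction p-value comes from a Wald test on the difference of the two subgroups' log odds ratios instead of a Cox interaction model. The function names, the 0.5 continuity correction and the 20% minimum subgroup size are hypothetical choices.

```python
import math
import random

def log_odds_ratio(rows):
    """Log odds ratio of response (arm A = 1 vs arm B = 0) and its Woolf variance.

    rows: list of (biomarker, arm, response). 0.5 is added to every cell of the
    2x2 table so that empty cells do not break the logarithm.
    """
    a = b = c = d = 0.5
    for _, arm, y in rows:
        if arm == 1 and y == 1:
            a += 1
        elif arm == 1:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def interaction_p(low, high):
    """Two-sided Wald p-value for a difference in log odds ratios between subgroups."""
    lor1, v1 = log_odds_ratio(low)
    lor2, v2 = log_odds_ratio(high)
    z = (lor1 - lor2) / math.sqrt(v1 + v2)
    return math.erfc(abs(z) / math.sqrt(2))

def best_cutoff(rows, min_frac=0.2):
    """Try observed biomarker values as cut-off c (both subgroups keep at least
    min_frac of the patients) and return (c, direction, p) with the smallest p."""
    xs = sorted(r[0] for r in rows)
    n = len(xs)
    best = None
    for i in range(int(min_frac * n), int((1 - min_frac) * n)):
        c = xs[i]
        low = [r for r in rows if r[0] <= c]
        high = [r for r in rows if r[0] > c]
        p = interaction_p(low, high)
        if best is None or p < best[2]:
            # report the side on which treatment A looks better as "the" subgroup
            better_low = log_odds_ratio(low)[0] > log_odds_ratio(high)[0]
            best = (c, "<=" if better_low else ">", p)
    return best

# Hypothetical trial WITHOUT any predictive effect: 200 patients, 1:1 allocation.
random.seed(1)
rows = [(random.gauss(0, 1), i % 2, int(random.random() < 0.4)) for i in range(200)]
c, direction, p = best_cutoff(rows)
subgroup = [r for r in rows if (r[0] <= c) == (direction == "<=")]
print(f"cut-off {c:.2f} ({direction}), smallest p = {p:.3f}, "
      f"subgroup OR = {math.exp(log_odds_ratio(subgroup)[0]):.2f}")
```

Because the p-value is minimized over about 120 candidate cut-offs, it tends to look impressive even though, by construction, the biomarker carries no predictive information here; the odds ratio of the selected subgroup inherits this optimism.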
  • 5. Another example: the Virtual Twin Method (Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24), 2867-2880). 1. In each treatment arm, model the probability of a response (e.g. via random forests) as a function of the biomarkers $B_1, \dots, B_p$ and other characteristics $X_1, \dots, X_k$: $Y = f_A(B_1, \dots, B_p, X_1, \dots, X_k)$ (response under treatment A) and $Y = f_B(B_1, \dots, B_p, X_1, \dots, X_k)$ (response under treatment B). 2. For each patient, predict the probability of a response under both treatments; the prediction for the treatment the patient did not receive plays the role of a "virtual twin". 3. Define the predicted relative treatment benefit as $f_A - f_B$. 4. Learn a classification tree on the predicted relative treatment benefit.
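The four steps can be sketched in Python with two plainly named substitutions so that the example has no dependencies: a k-nearest-neighbour response estimate replaces the random forest of step 1, and a single-split (depth-one) classification tree on the label "predicted benefit above delta" stands in for the full tree of step 4. The threshold delta, k, the simulated data and all function names are hypothetical; only the overall structure (fit per arm, predict both responses, take $f_A - f_B$, learn a tree) follows the slide.

```python
import math
import random

def knn_response_prob(arm_rows, x, k=15):
    """Stand-in for the per-arm random forest: share of responders among the k
    patients of one arm whose covariates are closest (Euclidean) to x."""
    nearest = sorted(arm_rows, key=lambda r: math.dist(r[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def virtual_twin_benefit(rows, k=15):
    """Steps 1-3: predicted benefit Z_i = f_A(x_i) - f_B(x_i) for every patient.

    rows: list of (covariates, arm, response), arm 1 = A, arm 0 = B.
    For brevity, a patient counts among its own neighbours within its own arm.
    """
    arm_a = [(x, y) for x, t, y in rows if t == 1]
    arm_b = [(x, y) for x, t, y in rows if t == 0]
    return [knn_response_prob(arm_a, x, k) - knn_response_prob(arm_b, x, k)
            for x, _, _ in rows]

def stump(rows, z, delta=0.1):
    """Step 4 as a depth-one classification tree for the label Z > delta.

    Returns (j, s, side): the subgroup is x[j] <= s (side "<=") or x[j] > s (">").
    """
    labels = [zi > delta for zi in z]
    n = len(rows)
    best = None
    for j in range(len(rows[0][0])):
        for s in sorted({x[j] for x, _, _ in rows}):
            # misclassifications when predicting "benefit" exactly for x[j] <= s
            err = sum((x[j] <= s) != lab for (x, _, _), lab in zip(rows, labels))
            for side, e in (("<=", err), (">", n - err)):
                if best is None or e < best[3]:
                    best = (j, s, side, e)
    return best[:3]

random.seed(2)
rows = []
for i in range(300):
    x = tuple(random.gauss(0, 1) for _ in range(3))
    arm = i % 2
    # hypothetical truth: A raises the response probability only when x0 > 0
    p_resp = 0.3 + (0.4 if arm == 1 and x[0] > 0 else 0.0)
    rows.append((x, arm, int(random.random() < p_resp)))

z = virtual_twin_benefit(rows)
j, s, side = stump(rows, z)
print(f"identified subgroup: x{j} {side} {s:.2f}")
```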
  • 6. Formalization: What is a subgroup identification method? [Figure: a method is learned from the training data and then applied to a new patient. Image source: www.maxpixel.net]
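One way to make this formalization concrete, as a sketch under our own assumptions: a subgroup identification method is a function that takes the training data and returns a rule, and the rule answers "in the subgroup or not" for a new patient. The type aliases and the toy `median_split` method are hypothetical illustrations, not a method from the talk.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]               # baseline data of one patient
Record = Tuple[Covariates, int, int]         # (covariates, arm, outcome)
Rule = Callable[[Covariates], bool]          # True = patient is in the subgroup
Method = Callable[[Sequence[Record]], Rule]  # training data -> assignment rule

def median_split(train: Sequence[Record]) -> Rule:
    """Toy method: the subgroup is 'covariate 0 above its training median'."""
    xs = sorted(x[0] for x, _, _ in train)
    cutoff = xs[len(xs) // 2]
    return lambda patient: patient[0] > cutoff

method: Method = median_split
train = [((0.1,), 1, 1), ((0.5,), 0, 0), ((0.9,), 1, 0)]
rule = method(train)
print(rule((0.7,)), rule((0.3,)))  # cut-off 0.5 -> True False
```

Any algorithm of this kind (a cut-off search, the Virtual Twin Method) fits the same signature, which is what allows cross-validation to treat the method as a black box.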
  • 7. The cross-validated subgroup assignment • Split the dataset into k blocks (folds). For each fold: train the subgroup model on the other k-1 folds, then use it to assign each patient of the held-out fold to the subgroup or its complement. • After cross-validation, each patient has a cross-validated subgroup assignment.
  • 8. How to estimate the relative treatment benefit? • After cross-validation, each patient has a cross-validated subgroup assignment. • The cross-validated relative treatment benefit (e.g. hazard ratio) is the relative treatment benefit in the cross-validated subgroup. • Selected cut-off in cross-validation steps 1 to 10: 1.18, 0.99, 0.88, 0.86, 0.98, 1.13, 0.86, 0.86, 0.98, 0.84; the direction is ≤ in all ten steps.
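The procedure on these two slides can be sketched in Python. Unlike the talk, the sketch assumes a binary endpoint with the odds ratio as relative treatment benefit, and a cut-off search with a Wald interaction test as the subgroup identification method; the data and all names are hypothetical. The point is the structure: each patient is assigned by a rule trained without that patient's fold, and the cross-validated effect is then estimated once in the pooled cross-validated subgroup.

```python
import math
import random

def log_odds_ratio(rows):
    """Log odds ratio of response (arm 1 vs arm 0), 0.5 added to each cell."""
    a = b = c = d = 0.5
    for _, arm, y in rows:
        if arm == 1 and y == 1:
            a += 1
        elif arm == 1:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def fit_cutoff_rule(train, min_frac=0.2):
    """Subgroup method: cut-off with the largest |Wald z| (= smallest p-value)
    for the difference in log odds ratios; returns a rule biomarker -> bool."""
    xs = sorted(r[0] for r in train)
    best = None
    for i in range(int(min_frac * len(xs)), int((1 - min_frac) * len(xs))):
        cut = xs[i]
        lor_low, v_low = log_odds_ratio([r for r in train if r[0] <= cut])
        lor_high, v_high = log_odds_ratio([r for r in train if r[0] > cut])
        z = abs(lor_low - lor_high) / math.sqrt(v_low + v_high)
        if best is None or z > best[0]:
            best = (z, cut, lor_low > lor_high)
    _, cut, low_side = best
    return lambda x: (x <= cut) if low_side else (x > cut)

def cross_validated_assignment(rows, k=10, seed=0):
    """Split patients into k folds; assign each fold with a rule trained on the others."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    in_subgroup = [False] * len(rows)
    for f in range(k):
        fold = set(order[f::k])
        rule = fit_cutoff_rule([rows[i] for i in range(len(rows)) if i not in fold])
        for i in fold:
            in_subgroup[i] = rule(rows[i][0])
    return in_subgroup

rng = random.Random(3)
rows = [(rng.gauss(0, 1), i % 2, int(rng.random() < 0.4)) for i in range(200)]

naive_rule = fit_cutoff_rule(rows)  # rule learned on all patients
or_naive = math.exp(log_odds_ratio([r for r in rows if naive_rule(r[0])])[0])
cv_members = cross_validated_assignment(rows)
or_cv = math.exp(log_odds_ratio([r for r, m in zip(rows, cv_members) if m])[0])
print(f"naive subgroup OR = {or_naive:.2f}, cross-validated OR = {or_cv:.2f}")
```

Note that the ten fold-specific rules are free to select different cut-offs, just as in the case study; only the resulting assignments are pooled.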
  • 9. What does the literature say? • Many papers on subgroup identification methods evaluate whether the method is able to detect the "correct" subgroup (e.g. sensitivity, specificity), but do not evaluate whether the subgroup effect is correctly estimated. • In general, there is a lot of work on subgroup effect estimation; however, many papers consider only the setting of a pre-defined set of subgroups. Bootstrap approaches are most similar to the proposed cross-validation approach. • Combining subgroup identification and cross-validation is not new! Freidlin, B., Jiang, W., & Simon, R. (2010). The cross-validated adaptive signature design. Clinical Cancer Research, 16(2), 691-698. Matsui, S., Simon, R., Qu, P., Shaughnessy, J. D., Barlogie, B., & Crowley, J. (2012). Developing and validating continuous genomic signatures in randomized clinical trials for predictive medicine. Clinical Cancer Research, 18(21), 6065-6073.
  • 10. Simulation study I: univariate cut-off search • Two-arm clinical trial (1:1 allocation ratio) with endpoint progression-free survival • One continuous biomarker; the relationship between hazard ratio and biomarker is a) a linear predictive effect, b) a step-wise predictive effect, or c) no predictive effect • Simulation of training data (n=75, 150, 300) and test data (n=1000), repeated 1000 times. On the training set (n=75, 150, 300): 1. Optimize the cut-off c by minimizing the interaction p-value. 2. Compute $\mathrm{HR}_{\mathrm{train}}$ of the identified subgroup (> c or ≤ c). 3. Compute the cross-validated hazard ratio $\mathrm{HR}_{\mathrm{CV}}$ (using 10-fold cross-validation). On the test set (n=1000): 4. Compute $\mathrm{HR}_{\mathrm{test}}$ based on the cut-off c and the direction (> c or ≤ c).
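A compact version of the simulation loop for setting c) can be sketched as follows. To stay dependency-free, the sketch swaps the survival endpoint for a binary response and the hazard ratio for an odds ratio (summarized on the log scale), searches the cut-off with a Wald interaction test, and uses a single training size (n = 150) with only 20 replicates; all names are hypothetical. In setting c) the true odds ratio is 1 (log odds ratio 0) in every subgroup, so the test-set column shows what an independent trial would see.

```python
import math
import random

def log_odds_ratio(rows):
    """Log odds ratio of response (arm 1 vs arm 0), 0.5 added to each cell."""
    a = b = c = d = 0.5
    for _, arm, y in rows:
        if arm == 1 and y == 1:
            a += 1
        elif arm == 1:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def fit_cutoff_rule(train, min_frac=0.2):
    """Step 1: cut-off and direction with the smallest Wald interaction p-value."""
    xs = sorted(r[0] for r in train)
    best = None
    for i in range(int(min_frac * len(xs)), int((1 - min_frac) * len(xs))):
        cut = xs[i]
        lor_low, v_low = log_odds_ratio([r for r in train if r[0] <= cut])
        lor_high, v_high = log_odds_ratio([r for r in train if r[0] > cut])
        z = abs(lor_low - lor_high) / math.sqrt(v_low + v_high)
        if best is None or z > best[0]:
            best = (z, cut, lor_low > lor_high)
    _, cut, low_side = best
    return lambda x: (x <= cut) if low_side else (x > cut)

def simulate(n, rng, p_response=0.4):
    """Setting c): neither the treatment nor the biomarker changes the response rate."""
    return [(rng.gauss(0, 1), i % 2, int(rng.random() < p_response)) for i in range(n)]

def subgroup_lor(rows, member):
    return log_odds_ratio([r for r, m in zip(rows, member) if m])[0]

def one_replicate(rng, n_train=150, n_test=1000, k=10):
    train, test = simulate(n_train, rng), simulate(n_test, rng)
    rule = fit_cutoff_rule(train)                                  # step 1
    lor_train = subgroup_lor(train, [rule(r[0]) for r in train])   # step 2
    order = list(range(n_train))
    rng.shuffle(order)
    cv_member = [False] * n_train
    for f in range(k):                                             # step 3
        fold = set(order[f::k])
        fold_rule = fit_cutoff_rule([train[i] for i in range(n_train) if i not in fold])
        for i in fold:
            cv_member[i] = fold_rule(train[i][0])
    lor_cv = subgroup_lor(train, cv_member)
    lor_test = subgroup_lor(test, [rule(r[0]) for r in test])      # step 4
    return lor_train, lor_cv, lor_test

rng = random.Random(4)
results = [one_replicate(rng) for _ in range(20)]  # the talk used 1000 replicates
for name, values in zip(("train", "CV", "test"), zip(*results)):
    print(f"mean log OR ({name}): {sum(values) / len(values):+.3f}")
```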
  • 12. Results – comparison $\mathrm{HR}_{\mathrm{train}}$ / $\mathrm{HR}_{\mathrm{test}}$
  • 13. Results – comparison $\mathrm{HR}_{\mathrm{CV}}$ / $\mathrm{HR}_{\mathrm{test}}$
  • 14. Simulation study II – Virtual Twin Method • Binary endpoint (response yes/no) • n=1000 (!) patients • 15 normally distributed variables • The true subgroup is defined by the first two variables. (Simulation setting from the paper.)
  • 15. Summary of the simulation studies • The simulations indicate that: the "naïve" subgroup effects are substantially overfitted; overfitting also occurs for large sample sizes; on average, cross-validated subgroup effects are a good estimate of the subgroup effects on an independent test set. • However, results are quite variable: both the cross-validated and the test set effects vary substantially. Further simulations indicate that this variability may also be due to the variability of the subgroup detection methods themselves.
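  • 16. Let us go back to our case study … • The goal is to define a subgroup based on the expression levels of p=50 genes. • Approach: multivariate Cox proportional hazards model $h(t) = h_0(t) \cdot \exp\big(\beta_T T + \sum_{j=1}^{p} \beta_{j,X} X_j + \sum_{j=1}^{p} \beta_{j,I} X_j T\big)$ with T = 1 for treatment A and T = 0 for treatment B; the three terms are the treatment effect ($\beta_T T$), the biomarker effects ($\beta_{j,X} X_j$) and the biomarker-dependent treatment effects ($\beta_{j,I} X_j T$). 1) Fit the model using regularized regression (here, ridge regression). 2) Obtain a signature S from the predicted hazard ratio $\mathrm{HR}_{A\,\mathrm{vs}\,B} = \exp\big(\beta_T + \sum_{j=1}^{p} \beta_{j,I} X_j\big)$. 3) Cut-off: at the median value of S (could be optimized as well).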
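Steps 2 and 3 can be sketched in Python once the interaction coefficients are available. Fitting the ridge-penalized Cox model itself is not shown, the coefficient values and patients below are made up, and we take S to be the predicted log hazard ratio $\beta_T + \sum_j \beta_{j,I} X_j$, so that a smaller S means a larger predicted benefit of A; that sign convention is our assumption.

```python
import math
import statistics

def signature(beta_t, beta_int, x):
    """S = beta_T + sum_j beta_{j,I} * x_j, the predicted log HR (A vs B)."""
    return beta_t + sum(b * xj for b, xj in zip(beta_int, x))

def split_at_median(beta_t, beta_int, patients):
    """Step 3: patients with S at or below the median form the subgroup."""
    scores = [signature(beta_t, beta_int, x) for x in patients]
    cut = statistics.median(scores)
    return [s <= cut for s in scores], scores

# Hypothetical coefficients for p = 2 genes and four patients
beta_t, beta_int = -0.2, [0.5, -0.3]
patients = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
in_subgroup, scores = split_at_median(beta_t, beta_int, patients)
print(in_subgroup)                              # [False, True, True, False]
print([round(math.exp(s), 2) for s in scores])  # predicted HR (A vs B) per patient
```

With p=50 genes and n=200 patients, the same two functions apply unchanged; only the fitted coefficients differ.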
  • 17. Results for the predictive signature (leave-one-out cross-validation). Other measures of interest may be cross-validated as well …
  • 18. Summary • For many data-driven subgroup identification algorithms, the estimated treatment effects are too optimistic ("overfitting"). • This is also the case for seemingly simple examples (e.g. cut-off detection) and large sample sizes (e.g. n=1000 for p=15 variables). • It is important to obtain more realistic estimates. • The investigated framework may be applied to all endpoint types and any subgroup identification algorithm. • On average, the simulations indicate that the cross-validated relative treatment benefit is a good estimate of the true relative treatment benefit.
  • 19. Biostatistical services at Staburo: clinical statistics; translational medicine & biomarkers; statistical programming with CDISC; pharmacokinetics/-dynamics; health technology assessment; non-clinical statistics
  • 20. Subgroup effects: training set, cross-validation and test set
  • 21. A small simulation • Randomly permute the endpoint (time, status) within each treatment arm. In this way, the relationship between the biomarker and the relative treatment benefit is broken. • Find the cut-off that minimizes the interaction p-value. [Figure: hazard ratio in the trial population]
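The permutation experiment can be sketched in Python. As a simplification we use a binary response instead of (time, status), so the treatment benefit is an odds ratio, and the cut-off search uses a Wald test for the difference in log odds ratios; the data-generating model and all names are hypothetical. Because outcomes are only shuffled within each arm, the odds ratio in the whole trial population stays exactly the same, so any apparent subgroup effect found afterwards reflects chance and selection, not a real predictive effect.

```python
import math
import random

def log_odds_ratio(rows):
    """Log odds ratio of response (arm 1 vs arm 0), 0.5 added to each cell."""
    a = b = c = d = 0.5
    for _, arm, y in rows:
        if arm == 1 and y == 1:
            a += 1
        elif arm == 1:
            b += 1
        elif y == 1:
            c += 1
        else:
            d += 1
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def best_subgroup_lor(rows, min_frac=0.2):
    """Cut-off search (smallest Wald interaction p-value); returns the log odds
    ratio of the side on which arm 1 looks better."""
    xs = sorted(r[0] for r in rows)
    best = None
    for i in range(int(min_frac * len(xs)), int((1 - min_frac) * len(xs))):
        cut = xs[i]
        lo, v_lo = log_odds_ratio([r for r in rows if r[0] <= cut])
        hi, v_hi = log_odds_ratio([r for r in rows if r[0] > cut])
        z = abs(lo - hi) / math.sqrt(v_lo + v_hi)
        if best is None or z > best[0]:
            best = (z, max(lo, hi))
    return best[1]

def permute_within_arm(rows, rng):
    """Shuffle the outcomes within each arm: per-arm response rates are kept,
    any link between biomarker and treatment benefit is destroyed."""
    out = list(rows)
    for arm in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[1] == arm]
        outcomes = [rows[i][2] for i in idx]
        rng.shuffle(outcomes)
        for i, y in zip(idx, outcomes):
            out[i] = (rows[i][0], arm, y)
    return out

rng = random.Random(5)
# Hypothetical trial in which arm 1 helps only patients with biomarker > 0
rows = []
for i in range(200):
    x, arm = rng.gauss(0, 1), i % 2
    p = 0.3 + (0.3 if arm == 1 and x > 0 else 0.0)
    rows.append((x, arm, int(rng.random() < p)))

overall = math.exp(log_odds_ratio(rows)[0])  # unchanged by the permutation
for _ in range(5):
    permuted = permute_within_arm(rows, rng)
    print(f"overall OR {overall:.2f}, 'best' subgroup OR after permutation "
          f"{math.exp(best_subgroup_lor(permuted)):.2f}")
```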
  • 22. Simulation study II – Virtual Twin Method • Binary endpoint (response yes/no) • n=1000 (!) patients • 15 normally distributed variables • The true subgroup is defined by the first two variables. (Simulation setting from the paper.)
  • 23. Properties of baseline variables: A variable is predictive if the relative treatment benefit (experimental vs. control) depends on the biomarker ("potential patient selection marker"). Reference: Mok, T. S., Wu, Y.-L., Thongprasert, S., et al. (2009). Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. The New England Journal of Medicine, 361(10), 947-957.
  • 24. Properties of baseline variables: A variable is prognostic if it informs about a likely outcome in the absence of treatment or irrespective of the treatment received. Note: most often, this is only investigated in the control arm ("placebo"? "standard of care"?). Within each treatment arm, EGFR-positive patients do better than EGFR-negative patients. Note: in the recent FLAURA trial, the control treatment was gefitinib/erlotinib (compared with osimertinib).
  • 25. Predictive effects and interaction models • Odds = rate / (100 - rate), with the response rate in %; odds ratio = relative treatment benefit. • Logistic interaction model: $\log\frac{P(Y=1)}{1-P(Y=1)} = \beta_0 + \beta_T T + \beta_B BM + \beta_I T \cdot BM$. • Odds ratio … for a biomarker-positive patient: $\exp(\beta_T + \beta_I)$; … for a biomarker-negative patient: $\exp(\beta_T)$. • The biomarker is predictive if $\beta_I \neq 0$.
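The two odds ratios can be checked numerically. With a binary treatment indicator T (1 = experimental) and a binary biomarker BM (1 = positive), the model has four parameters for four treatment-by-biomarker cells, so its coefficients follow directly from the cell response probabilities. The rates below are made up for illustration and are not the IPASS results; the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def saturated_coefficients(rate):
    """Coefficients of the logistic interaction model with binary T and BM.

    rate[(t, bm)]: response probability for treatment t (1 = experimental,
    0 = control) and biomarker status bm (1 = positive, 0 = negative).
    With four cells and four parameters the model reproduces the rates exactly.
    """
    beta_0 = logit(rate[(0, 0)])
    beta_t = logit(rate[(1, 0)]) - beta_0
    beta_b = logit(rate[(0, 1)]) - beta_0
    beta_i = logit(rate[(1, 1)]) - beta_0 - beta_t - beta_b
    return beta_0, beta_t, beta_b, beta_i

# Hypothetical response rates: the experimental arm helps only BM-positive patients
rate = {(0, 0): 0.2, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.5}
b0, bt, bb, bi = saturated_coefficients(rate)
print(f"OR biomarker-negative: {math.exp(bt):.2f}")       # 1.00
print(f"OR biomarker-positive: {math.exp(bt + bi):.2f}")  # 4.00
```

Here $\beta_I = \log 4 \neq 0$, so in this made-up example the biomarker is predictive.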
