Moving beyond the comfort zone in
practicing translational statistics
L.J. Wei
Harvard University
Why are we staying in a
“Comfort Zone”?
 Generally following a fixed pattern for conducting
studies
 Are we like lawyers?
 Avoiding delay of review processes?
What is the goal of a clinical
study?
 Use efficient and reliable procedures to obtain
robust, clinically interpretable results with respect
to risk-benefit perspectives at the patient’s level.
What are the problems?
 The conventional way to conduct trials gives us
fragmentary information
 Lack of a clinically meaningful totality of evidence
 Difficult to use the trial results for future patients’
management
A Few Methodology Issues
1. Estimation vs. testing
 P-value provides little clinical information about
treatment effectiveness
 The size of the effects (efficacy and toxicity)
matters
 Design using interval estimates is quite flexible
 Almost everything we want to know via testing,
we can get from estimation
TREAT study for EPO CV safety
 If we follow the patients up to 48 months,
the control arm's average stroke-free time is
46.9 months and the Darb arm's is 46 months.
The difference is 0.9 month (0.95 CI: 0.4m, 1.4m)
with p<0.001 (very significant).
2. How do we define a primary endpoint with
multiple outcomes?
 What is current practice?
C t/ it l Component/composite analyses
 Efficacy and toxicity (how to connect them
together?)together?)
 Disease burden measure?
 Competing risks problem?Competing risks problem?
 Informative dropout?
Example: Beta-Blocker Evaluation of
Survival (BEST) Trial (NEJM, 2001)
 Study
 Bucindolol vs. placebo
 patients with advanced chronic heart failure
-- n = 2707
 Average follow-up: 2 years
 Primary endpoint: overall survival
 Hazard ratio for death = 0.90 (p-value = 0.1)
BEST Trial
Possible solutions?
 Using the patient’s disease burden or progression
information during the entire follow-up to define
the “responder”
 Creating more than one response category:
ordinal categorical response
 Brian Claggett’s thesis paper
BEST Example: 8 Categories
 1: No events
 2: Alive, non-HF hospitalization only
 3: Alive, 1 HF hosp.
 4: Alive, >1 HF hosp.
 5: Late non-CV death (>12 months)
 6: Late CV death (>12 months)
 7: Early non-CV death (<12 months)
 8: Early CV death (<12 months)
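The eight categories above can be written as a small classification rule. This is only a sketch: the field names (`died`, `cv_death`, `death_month`, `hf_hosp`, `other_hosp`) are hypothetical, a death at exactly 12 months is treated as late (the slide does not say), and a surviving patient with any HF hospitalization is classified by the HF count even if other hospitalizations occurred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical per-patient summary over the whole follow-up period.
    died: bool
    cv_death: bool = False               # only meaningful when died is True
    death_month: Optional[float] = None  # months from randomization to death
    hf_hosp: int = 0                     # number of heart-failure hospitalizations
    other_hosp: int = 0                  # number of non-HF hospitalizations


def best_category(p: Patient) -> int:
    """Map a patient to ordinal categories 1 (best) to 8 (worst)."""
    if p.died:
        if p.death_month is None:
            raise ValueError("death_month is required when died is True")
        early = p.death_month < 12       # assumption: exactly 12 months counts as late
        if early:
            return 8 if p.cv_death else 7
        return 6 if p.cv_death else 5
    if p.hf_hosp > 1:
        return 4
    if p.hf_hosp == 1:
        return 3
    if p.other_hosp > 0:
        return 2
    return 1
```

Each patient then contributes a single ordinal response, so the two arms can be compared on the whole distribution of categories rather than on one component at a time.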
3. How to handle dropouts or competing risks?
 LOCF? BOCF?
 MMRM (model based)
 Pattern mixture model (cannot handle non-
random missing)
 Using responder analysis with different ways to
define informative dropouts for sensitivity analysis
4. Analysis of Covariance
 Compare two treatments with baseline
adjustments via regression models
 For nonlinear models, different adjustments may
lead to incoherent results
 The inadequacy of the Cox ANCOVA
Possible solutions?
 Using the augmentation method by Tsiatis et al.;
Tian et al.
 No need to pre-specify the baseline covariates,
only a set of potential covariates for the adjustment
process
5. Data monitoring
 Heavily utilizing p-value or conditional power
 A low conditional power may indicate that the
sample size is too small or there is no real
treatment difference
 Using estimation and prediction for monitoring?
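For readers who want to see what conditional power looks like numerically, the sketch below uses the standard B-value formulation for a one-sided test. It is an assumption that this is the version meant on the slide; the function name, the one-sided alpha of 0.025 and the interim values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z_interim, info_frac, drift=None, alpha=0.025):
    """Probability of crossing the final one-sided boundary given the interim Z.

    z_interim : interim test statistic Z(t)
    info_frac : information fraction t, strictly between 0 and 1
    drift     : assumed expected final Z; defaults to the current trend
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must be strictly between 0 and 1")
    b_value = z_interim * sqrt(info_frac)      # B(t) = Z(t) * sqrt(t)
    if drift is None:
        drift = z_interim / sqrt(info_frac)    # current-trend estimate of the drift
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    remaining = 1 - info_frac
    shortfall = z_crit - b_value - drift * remaining
    return 1 - _STD_NORMAL.cdf(shortfall / sqrt(remaining))


# Illustrative interim look: half the information, Z = 1.0.
cp_current_trend = conditional_power(1.0, 0.5)
cp_under_null = conditional_power(1.0, 0.5, drift=0.0)
print(f"current trend: {cp_current_trend:.3f}, null drift: {cp_under_null:.3f}")
```

The single number returned cannot distinguish "too little information so far" from "no real effect", which is the ambiguity the slide points out; an interval estimate of the effect at the interim look carries both pieces of information.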
6. Stratified medicine (personalized medicine)?
 A negative trial does not mean the treatment is no
good for anyone
 A positive trial does not mean it works for
everyone
 The usual subgroup analysis is not adequate to
address this issue
 Need a built-in pre-specified procedure for
identifying patients who benefit from treatment
7. Identify patients who respond to the new
therapy (predictive enrichment)
8. How to monitor safety?
 What is the conventional way?
 Component-wise tabulation or analysis?
 No information about multiple AE events at the
patient level
 Graphical method?
9. Quantifying treatment contrast (difference)?
 Should be a model-free parameter
 Using difference of means, median, etc.
 For censored data, using a constant hazard ratio
(heavily model-based)?
 A model-based measure is difficult to interpret or
validate
Issues for the hazard ratio
estimate
 Hazard ratio estimate is routinely used for
designing, monitoring and analyzing clinical
studies in survival analysis
Model-Free Parameter for Treatment
Contrast
* Considering a two-treatment comparison study in
“survival analysis”
* How do we quantify the treatment difference?
• Median failure time (may not be estimable);
• t-year survival rate (not an overall measure)?
• A constant hazard ratio over time with the log-
rank test
Eastern Cooperative Oncology
Group
 E4A03 trial to compare low- and high-dose
dexamethasone for naïve patients with multiple
myeloma
 The primary endpoint is the survival time
 n=445
 The trial stopped early at the second interim
analysis; the low dose was superior.
 Patients on the high-dose arm then received
low-dose and follow-up for overall survival was
continued.
A Cancer Study Example
[Figure: survival probability curves for Group 1 and Group 2; x-axis: Month (0 to 40), y-axis: Probability (0.0 to 1.0)]
 The proportional hazards assumption is not valid
 The PH estimator is estimating a quantity which
cannot be interpreted and, worse, depends on the
study-specific censoring distributions
 Any model-based treatment contrast has such
issues (need a model-free parameter)
 The logrank test is not powerful
 Conventional analysis:
 Log-rank test: p=0.47
 Hazard Ratio: HR=0.87 (0.60, 1.27)
What is the alternative way for
survival analysis?
 Using the area under the curve of Kaplan-Meier
estimate up to a fixed time point
 Restricted mean survival time
 Model-free and a global measure of efficacy
 Can be estimated even under heavy censoring
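The restricted mean survival time described above is the area under the Kaplan-Meier step function from 0 to a fixed time point. A minimal pure-Python sketch follows; the function name `km_rmst` and the toy data are illustrative and are not taken from any trial in this talk.

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier curve on [0, tau].

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    tau    : truncation time (the fixed time point)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0       # current height of the KM curve
    area = 0.0       # accumulated area under the curve
    prev_t = 0.0     # time of the last step
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group tied times; events are counted before censorings at the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            area += surv * (t - prev_t)
            surv *= 1 - deaths / n_at_risk
            prev_t = t
        n_at_risk -= leaving
    # Last piece up to tau. If follow-up ends before tau with a censored
    # observation, this extends the curve flat, which is an assumption.
    area += surv * (tau - prev_t)
    return area


# Toy data: events at months 1 and 3, censoring at months 2 and 4.
toy_rmst = km_rmst([1, 2, 3, 4], [1, 0, 1, 0], tau=4)
print(toy_rmst)
```

Computing this for each arm with the same `tau` and taking the difference or ratio gives a treatment contrast that needs no proportional hazards assumption.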
Cancer Study Example
Restricted Mean (up to 40 months):
 35.4 months vs. 33.3 months
 ∆ = 2.1 (0.1, 4.2) months; p=0.04
 Ratio of Survival time = 35.4/33.3 = 1.06 (1.00,
1.13)
 Ratio of time lost = 6.7/4.6 = 1.46 (1.02, 2.13)
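The "time lost" figures are the truncation time minus each restricted mean (40 − 35.4 = 4.6 and 40 − 33.3 = 6.7). The short check below reproduces the point estimates on this slide from the two restricted means; the variable names are illustrative, and the confidence intervals cannot be reproduced without the patient-level data.

```python
TAU = 40.0                        # truncation time in months
rmst_high, rmst_low = 35.4, 33.3  # restricted means of the two groups

difference = rmst_high - rmst_low        # months of survival gained
ratio_survival = rmst_high / rmst_low    # ratio of restricted mean survival times
lost_high = TAU - rmst_high              # mean months lost before TAU
lost_low = TAU - rmst_low
ratio_lost = lost_low / lost_high        # ratio of restricted mean time lost

print(round(difference, 1), round(ratio_survival, 2), round(ratio_lost, 2))
```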
10. Post-marketing/safety studies ?
 It is not appropriate to use an event driven
procedure to conduct a safety study.
 The event rate is low, the exposure time matters
 Requires a lot of resources (large or long-term
study)
 Meta analysis; observational studies
CV safety study for anti-diabetes
drugs
 Event-driven studies, that is, we need to have a
pre-specified # of events so the resulting
confidence interval for the treatment difference is
“narrow”
 For example, the upper bound of the 95% confidence
interval is less than 1.3
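To give a sense of scale for "a pre-specified # of events", the sketch below applies the usual Schoenfeld-type approximation for the number of events needed so that the upper bound of the two-sided 95% interval for the hazard ratio falls below a margin. The assumptions are mine, not the slide's: 1:1 allocation, a true hazard ratio of 1, and 90% power.

```python
from math import ceil, log
from statistics import NormalDist


def events_needed(margin=1.3, power=0.90, alpha_one_sided=0.025, true_hr=1.0):
    """Approximate total events to rule out a hazard ratio of `margin`.

    Assumes 1:1 allocation; the factor 4 comes from the variance 4/d of the
    log hazard ratio estimate with d total events.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha_one_sided) + z(power)
    effect = log(margin) - log(true_hr)
    return ceil(4 * z_sum ** 2 / effect ** 2)


print(events_needed())            # 611 under the stated assumptions
print(events_needed(power=0.80))  # fewer events at lower power
```

With low event rates, accumulating several hundred events takes a large or long study, which is the resource problem raised on the previous slide.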
The EXAMINE trial (alogliptin)
NEJM, October 3, 2013
RMST (24 months):
Placebo 21.9 (21.7, 22.2)
Alogliptin 22.0 (21.8, 22.3)
Difference -0.08 (-0.39, 0.24)
Ratio 1.00 (0.98, 1.01)
RMST (30 months):
Placebo 27.1 (26.7, 27.4)
Alogliptin 27.2 (26.9, 27.5)
Difference -0.12 (-0.56, 0.33)
Ratio 1.00 (0.98, 1.01)
                            Whole data       1/2 of data      1/3 of data      1/4 of data
                            N=5380           N=2690           N=1793           N=1345
Hazard Ratio                0.824, 1.129     0.775, 1.213     0.742, 1.283     0.711, 1.338
Difference in RMST (30m)    -0.558, 0.325    -0.721, 0.500    -0.852, 0.647    -0.980, 0.753
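The interval bounds in the table above widen roughly like the square root of the reduction in sample size. The check below uses only the numbers in the table (hazard ratio widths are taken on the log scale); the variable names are illustrative.

```python
from math import log, sqrt

# Divisor k means 1/k of the data was used (k = 1 is the whole data set).
divisors = [1, 2, 3, 4]
hr_bounds = [(0.824, 1.129), (0.775, 1.213), (0.742, 1.283), (0.711, 1.338)]
rmst_bounds = [(-0.558, 0.325), (-0.721, 0.500), (-0.852, 0.647), (-0.980, 0.753)]

hr_width = [log(hi) - log(lo) for lo, hi in hr_bounds]   # width on the log-HR scale
rmst_width = [hi - lo for lo, hi in rmst_bounds]         # width in months

hr_relative = [w / hr_width[0] for w in hr_width]
rmst_relative = [w / rmst_width[0] for w in rmst_width]

for k, h, r in zip(divisors, hr_relative, rmst_relative):
    print(f"1/{k} of data: sqrt(k)={sqrt(k):.2f}  HR width x{h:.2f}  RMST width x{r:.2f}")
```

Even with a quarter of the patients, the interval for the 30-month RMST difference stays within about one month on either side, a precision that can be read directly in clinical terms.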
11. Meta analysis for safety issues
 Nissen and Wolski (2007) performed a meta analysis
to examine whether Rosiglitazone (Avandia, GSK), a
drug for treating type 2 diabetes mellitus, significantly
increases the risk of MI or CVD related death.
Example
Effect of Rosiglitazone on MI or CVD Deaths
 Avandia was introduced in 1999 and is widely used
as monotherapy or in fixed-dose combinations
(Avandamet or Avandaryl).
 The original approval of Avandia was based on its
ability to reduce blood glucose and glycated
hemoglobin levels.
 Initial studies were not adequately powered to
determine the effects of this agent on micro- or
macro- vascular complications of diabetes, including
cardiovascular morbidity and mortality.
Example
Effect of Rosiglitazone on MI or CVD Deaths
 However, the effect of any anti-diabetic therapy on
cardiovascular outcomes is particularly important
because more than 65% of deaths in patients with
diabetes are from cardiovascular causes.
 Of 116 screened studies, 48 satisfied the inclusion
criteria for the analysis proposed in Nissen and
Wolski (2007).
42 studies were reported in Nissen and Wolski (2007); the
remaining 6 studies have zero MI or CVD deaths
10 studies with zero MI events
25 studies with zero CVD related deaths
 Event Rates from 0% to 2.70% for MI
 Event Rates from 0% to 1.75% for CVD Death
MI: Log Odds Ratio
95% CI: (1.03, 1.98); p-value = 0.03
(in favor of the control)
CVD Death: Log Odds Ratio
95% CI: (0.98, 2.74); p-value = 0.06
Questions
 Rare events?
 How to utilize studies with 0/0 events?
 Validity of asymptotic inference?
 Exact inference?
 Choice of effect measure?
 Between-study heterogeneity?
 Common treatment effect or study-specific treatment
effect?
 The number of studies not large?
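The questions about 0/0 studies and the choice of effect measure are linked: a study with no events in either arm has no usable log odds ratio, but it still has a well-defined risk difference of zero. The sketch below illustrates this with made-up counts; they are not taken from Nissen and Wolski (2007), and the function names are illustrative.

```python
from math import log
from typing import Optional


def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl) -> Optional[float]:
    """Log odds ratio, or None when any cell of the 2x2 table is zero."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    if min(a, b, c, d) == 0:
        return None          # undefined or infinite without a continuity correction
    return log((a * d) / (b * c))


def risk_difference(events_trt, n_trt, events_ctl, n_ctl) -> float:
    """Difference in event proportions; defined even with zero events."""
    return events_trt / n_trt - events_ctl / n_ctl


# Hypothetical studies: (events_trt, n_trt, events_ctl, n_ctl)
double_zero = (0, 100, 0, 100)
with_events = (2, 200, 1, 200)

print(log_odds_ratio(*double_zero), risk_difference(*double_zero))
print(log_odds_ratio(*with_events), risk_difference(*with_events))
```

An odds-ratio analysis therefore has to drop double-zero studies or apply a continuity correction, whereas a risk-difference analysis can keep them.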
MI
                 Exact Inference         Asymptotic Inference
Estimate         0.18%                   0.19%
95% CI           (-0.08, 0.38)%          (0.02, 0.42)%
P-value          0.27                    0.03
CVD Death
                 Exact Inference         Asymptotic Inference
Estimate         0.063%                  0.11%
95% CI           (-0.13, 0.23)%          (0.00, 0.31)%
P-value          0.83                    0.05
Summary
 Could we modify our statistical training?
 Teaching young generations “how, where and what to
learn”
 Learning from doing a project with mentoring?
 Could we have a coherent approach from the
beginning to the end for a research project?
 George Box: Instead of figuring out the optimal
solution to a wrong problem, try to get A solution to a
right problem.
 Asking ourselves “What is the question?”

2014-10-22 EUGM | WEI | Moving Beyond the Comfort Zone in Practicing Translational Stats
