Determining the appropriate number of events for a survival analysis is a complex task: study planners must predict the sample size that will be needed after accounting for unequal follow-up, dropout and treatment crossover.
Statistical, logistical and ethical considerations must all be balanced when planning a survival analysis, and this complexity has created a need for new analyses and procedures to support the planning of survival trials.
The wider move from fixed to flexible designs has opened up opportunities for advanced methods such as adaptive design and Bayesian analysis to deal with the unique complications of planning for survival data, but these methods bring their own complications that need to be explored too.
7. Sample Size Determination (SSD)
SSD finds the appropriate sample size for your study
Common metrics: Statistical power, interval width, cost
SSD seeks to balance ethical and practical issues
A standard design requirement for regulatory purposes
SSD is crucial to arrive at valid conclusions in a study
High incidence of non-replicable results, Type M/S errors
Yet many studies have insufficient sample size
How to deal with this for more complex studies/outcomes?
8. Sample Size for Survival Analysis
Survival analysis is about the expected duration of time to an event
Methods like the log-rank test & Cox model
Power is related to the number of events, NOT the sample size
Sample size = subjects needed to obtain the required # of events
Flexibility is expected in survival analysis methods and estimation
Sample size methods need to follow suit, but this can make mistakes more likely!
Source: SEER, NCI
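The events-not-subjects point can be made concrete with Schoenfeld's approximation, which gives the required number of events from the hazard ratio, significance level and power alone. A minimal sketch (the function name and defaults are illustrative, assuming a one-sided log-rank test):

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hr, alpha_one_sided=0.025, power=0.90, ratio=1.0):
    """Approximate number of events for a log-rank test (Schoenfeld).

    ratio is the allocation ratio n1/n2; 1:1 allocation gives the
    familiar 4 * (z_a + z_b)^2 / log(HR)^2 formula.
    """
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha_one_sided)
    z_b = nd.inv_cdf(power)
    p = ratio / (1 + ratio)          # proportion allocated to group 1
    return ceil((z_a + z_b) ** 2 / (p * (1 - p) * log(hr) ** 2))

# e.g. HR = 0.667, one-sided alpha = 0.025, 90% power
print(schoenfeld_events(0.667, 0.025, 0.90))  # → 257
```

Note the sample size never appears: the accrual pattern and follow-up only matter for converting this event count into a number of subjects.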
9. Survival SSD Issues
What is the expected survival curve(s) in the group(s)?
Assume a parametric approximation? Which test is appropriate?
Effect of unequal follow-up due to the accrual period?
What accrual pattern to assume? Set max follow-up the same for all?
How to deal with expected dropouts or censoring?
Simple loss to follow-up or integrate the dropout process into the model?
Effect of subjects crossing over from one group to the other?
Account for it at the planning stage? How will the analysis handle it?
10. Evolution of Survival SSD
Move from simple approximations to more complex models
Initial methods based on normal approximations for exponential curves
Later approximations adjust for unequal follow-up and dropout
Complex methods (Markov, simulation) allowed more flexibility
Piece-wise curves, crossover etc.
SSD extensions for other survival models, trial designs & adaptive designs
Cox, Gehan, >2 groups, GSD, SSR etc.
Source: Schoenfeld (1981); Lachin & Foulkes (1986); Lakatos (1988); Freedman (1982)
11. “Using an unstratified log-rank test at the one-sided 2.5%
significance level, a total of 282 events would allow
92.6% power to demonstrate a 33% risk reduction
(hazard ratio for RAD/placebo of about 0.67, as calculated
from an anticipated 50% increase in median PFS, from 6
months in placebo arm to 9 months in the RAD001 arm).
With a uniform accrual of approximately 23 patients per
month over 74 weeks and a minimum follow up of 39
weeks, a total of 352 patients would be required to
obtain 282 PFS events, assuming an exponential
progression-free survival distribution with a median of 6
months in the Placebo arm and of 9 months in RAD001
arm.
With an estimated 10% lost to follow up patients, a total
sample size of 392 patients should be randomized.”
Worked Example: Log-Rank Test
Parameter                            Value
Significance Level (One-Sided)       0.025
Placebo Median Survival (months)     6
Everolimus Median Survival (months)  9
Hazard Ratio                         0.66667
Accrual Period (Weeks)               74
Minimum Follow-Up (Weeks)            39
Power (%)                            92.6
Source: NEJM (2011), nejm.org
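The quoted numbers can be roughly cross-checked with stdlib Python. A sketch, not the exact Lachin-Foulkes calculation the planners would have run, so small differences from the published 352/392 subjects are expected: Schoenfeld's formula gives the power for 282 events, and averaging the exponential event probability over uniform accrual converts events to subjects.

```python
from math import exp, log, sqrt, erf, ceil

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

# Design inputs from the statement above, converted to weeks
med_placebo, med_rad = 26.0, 39.0      # medians: 6 and 9 months
hr = med_placebo / med_rad             # 0.66667 under exponential curves
accrual, min_fu = 74.0, 39.0           # accrual period and minimum follow-up
events, z_alpha = 282, 1.959964        # one-sided 2.5%

# Power for 282 events via Schoenfeld's approximation (1:1 allocation)
power = norm_cdf(abs(log(hr)) * sqrt(events / 4) - z_alpha)
print(f"Power: {power:.3f}")           # → Power: 0.926

# Probability a patient has an event by the final analysis, for an
# exponential survival curve and uniform accrual over [0, accrual]
def p_event(median):
    lam, total = log(2) / median, accrual + min_fu
    return 1 - (exp(-lam * min_fu) - exp(-lam * total)) / (lam * accrual)

n = ceil(events / ((p_event(med_placebo) + p_event(med_rad)) / 2))
print(f"Subjects for {events} events: ~{n}")
n_dropout = ceil(n / 0.9)              # inflate for 10% loss to follow-up
print(f"With 10% dropout: ~{n_dropout}")
```

The power reproduces the quoted 92.6%; the subject counts land close to (slightly above) the published 352 and 392, the gap coming from the simpler events-to-subjects conversion used here.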
13. Adaptive Design Overview
An adaptive design is any trial in which a change or decision is made while the trial is still on-going
Encompasses a wide variety of potential adaptations
e.g. early stopping, SSR, enrichment, seamless, dose-finding
Adaptive trials seek to give trialists control to improve the trial based on all available information
Adaptive trials can decrease costs & improve inferences
15. Group Sequential Design
Group Sequential Designs (GSD) facilitate interim analyses
Interim analyses are those which occur while a trial is on-going
In a GSD, accrued data is analysed at pre-specified times
E.g. After half the subjects have been measured
At an interim analysis, the trial can stop early for benefit or futility
If neither is found, continue until the end or the next interim analysis
However, need to account for the effect of multiple analyses
Do this by “spending” error using an error spending function
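The “spending” idea can be illustrated with the Lan-DeMets O'Brien-Fleming-type spending function, which allocates the one-sided alpha across looks by information fraction (a sketch; the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def obf_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at
    information fraction t: alpha*(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / sqrt(t)))

# Cumulative alpha spent at three equally spaced looks
for t in (1/3, 2/3, 1.0):
    print(f"t = {t:.2f}: cumulative alpha = {obf_spending(t):.5f}")
```

Almost no alpha is spent at the first look and the full 0.025 is available by the final look, which is why O'Brien-Fleming-type bounds demand very extreme early results to stop.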
16. Group Sequential Design for Survival
Worked Example: extend the previous example to a group sequential design with 2 interim analyses, an O'Brien-Fleming efficacy bound and a non-binding Hwang-Shih-DeCani futility bound with gamma = -2
Parameter                            Value
Significance Level (One-Sided)       0.025
Placebo Median Survival (months)     6
Everolimus Median Survival (months)  9
Hazard Ratio                         0.66667
Accrual Period (Weeks)               74
Minimum Follow-Up (Weeks)            39
Power (%)                            92.6
Number of Looks                      3
Efficacy Bound                       O'Brien-Fleming
Futility Bound                       Non-Binding
Beta Spending Function               Hwang-Shih-DeCani
HSD Parameter                        -2
Source: NEJM (2011)
17. Sample Size Re-estimation (SSR)
Will focus here on the specific adaptive design of SSR
Adaptive design focused on increasing the sample size if needed
Obvious adaptation target due to intrinsic SSD uncertainty
Could also adaptively lower N, but this is not encouraged
Two primary types: 1) Unblinded SSR; 2) Blinded SSR
Differ on whether the decision is made on unblinded or blinded data
Both target different aspects of initial SSD uncertainty
18. Unblinded SSR
SSR suggested when the interim effect size is “promising” (Chen et al.)
“Promising” is user-defined but based on the unblinded effect size
Extends GSD with a 3rd option: continue, stop early, or increase N
Power for the optimistic effect but increase N for smaller, still-relevant effects
Updated FDA guidance: a design which “can provide efficiency”
Common criterion proposed for unblinded SSR is conditional power (CP)
Probability of significance given the interim data
2 methods here: Chen, DeMets & Lan (CDL); Cui, Hung & Wang (CHW)
The 1st uses GSD statistics but only at the penultimate look with high CP
The 2nd uses a weighted statistic but is allowed at any look and any CP
19. Sample Size Re-Estimation for Survival
Worked Example: assume the previous group sequential design with an added SSR option. Assume an interim HR = 0.8 (from 0.667), and inherit a total E of 303 (interim E of 101 and 202) and a final look alpha of 0.0231 from the GSD. What will the required E for SSR be under Chen-DeMets-Lan/Cui-Hung-Wang, assuming a maximum events multiplier of 3?
Parameter                            Value
Nominal Final Look Sig. Level        0.0231
Initial HR                           0.667
Interim HR                           0.8
Initial Expected Events (E)          303
Interim Events (2nd Look)            202
Maximum Events                       909
Lower CP Bound (CDL/CHW)             Derived/40%
Upper CP Bound                       92.6%
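One way to see where these numbers lead, as a simplified sketch using the normal approximation for the log-rank score statistic (not nQuery's exact algorithm; variable names are illustrative): compute conditional power at the interim under the observed trend, then solve for the stage-2 events a CHW-style weighted test would need to reach a target conditional power.

```python
from math import log, sqrt, ceil
from statistics import NormalDist

nd = NormalDist()

# Interim quantities from the worked example
d1, d_total = 202, 303                 # interim and originally planned events
hr_interim = 0.8
theta = -log(hr_interim)               # observed log-HR effect, sign flipped
z1 = theta * sqrt(d1 / 4)              # approximate interim log-rank Z
crit = nd.inv_cdf(1 - 0.0231)          # nominal final-look critical value

# Conditional power under the observed trend with the planned 303 events
i1, i_tot = d1 / 4, d_total / 4        # Fisher information ~ events / 4
cp = 1 - nd.cdf((crit * sqrt(i_tot) - z1 * sqrt(i1)
                 - theta * (i_tot - i1)) / sqrt(i_tot - i1))
print(f"Conditional power at planned events: {cp:.3f}")

# CHW: stage-2 events needed for a target CP with the weighted statistic,
# keeping the pre-planned weights sqrt(t) and sqrt(1 - t)
t = d1 / d_total
b = (crit - sqrt(t) * z1) / sqrt(1 - t)
target_cp = 0.926
d2_new = 4 * ((b - nd.inv_cdf(1 - target_cp)) / theta) ** 2
d_new = min(ceil(d1 + d2_new), 3 * d_total)   # cap at 3x original events
print(f"Re-estimated total events (capped): {d_new}")
```

With an interim HR of 0.8, conditional power under the current trend falls near the mid-40s, inside the 40%-92.6% promising zone, so the event target is increased, subject to the 909-event cap.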
20. Adaptive Survival Complications
Unknown follow-up means more interim planning uncertainty
Can be difficult to predict time when interim analysis will occur
Higher numbers likely in active cohort when interim analysis occurs
Adaptive designs for survival often come with new assumptions
Usually strong assumption of constant treatment effect for GSD and SSR
Two dimensions for increasing interim events: sample size & time
An N increase biases towards early trends, a time increase towards later trends?
Freidlin and Korn (2017) caution that such adaptations can effectively pick the treatment's best-looking period
22. Bayesian Sample Size Approaches
Bayesian analysis continues to grow in popularity due to the ability to integrate prior knowledge and data into analyses
1. Sample Size for Bayesian Methods: sample size for specific values of Bayesian parameters
e.g. Bayes Factors, Credible Intervals, Continual Reassessment Method (CRM), utility functions
2. Improving Sample Size Methods: integrating Bayesian methods adds greater context for parameter uncertainty
e.g. Assurance, Predictive Power, Adaptive Designs
23. Bayesian Assurance
Assurance is the unconditional probability of significance given a prior
Equals the expectation of power over a prior distribution for the parameter
Often called the “true probability of success”/“Bayesian power”
Can be considered a Bayesian analogue to sensitivity analysis
Source: O'Hagan et al. (2005)
24. Bayesian Assurance for Survival
Worked Example: assume 282 events are needed, as per the original SSD statement from Yao et al. Assume an assurance calculation based on a normal prior for the log hazard ratio. The normal prior has mean equal to the log of the hazard ratio from the statement (ln(0.66667) = -0.405) & standard deviation of 0.1 (95% of HR probability mass between 0.548 and 0.811).
Parameter                            Value
Significance Level (One-Sided)       0.025
Hazard Ratio                         0.66667
Log Hazard Ratio                     -0.40546
Number of Events                     282
St. Dev of log(HR) Normal Prior      0.1
Source: NEJM (2011)
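The assurance in this example can be checked two ways, as a sketch under the normal approximation for the log-HR estimate: Monte Carlo averaging of power over draws from the prior, and the closed form obtained by integrating the power curve against the normal prior.

```python
from math import log, sqrt, erf
import random

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

events, z_alpha = 282, 1.959964
prior_mean, prior_sd = log(2/3), 0.1   # prior on log(HR): N(-0.4055, 0.1^2)
se = sqrt(4 / events)                  # SE of log-HR estimate, 1:1 allocation

def power(log_hr):
    """Power of the one-sided 2.5% log-rank test at a fixed log(HR)."""
    return norm_cdf(-log_hr / se - z_alpha)

# Monte Carlo assurance: average power over draws from the prior
random.seed(1)
draws = [random.gauss(prior_mean, prior_sd) for _ in range(200_000)]
mc = sum(power(x) for x in draws) / len(draws)

# Closed form: prior uncertainty inflates the variance in the power formula
cf = norm_cdf((-prior_mean / se - z_alpha) / sqrt(1 + (prior_sd / se) ** 2))

print(f"Assurance (MC): {mc:.3f}, closed form: {cf:.3f}")
```

The assurance comes out around 0.87, noticeably below the 92.6% conditional power: averaging over the prior pulls the “true probability of success” down from the power at the point estimate.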
26. Conclusions
Easy to make mistakes (nQuery can guide/help)
Same time units, parameter conversion, equal follow-up
Lots of options and flexibility with survival SSD
But each additional option = an additional assumption
Adaptive design gives more flexibility for a survival study
Unblinded SSR helps with an optimistic effect, but has bias issues
Assurance can give the “true” probability of success
Useful for stakeholder and sensitivity analyses
28. Survival SSD in nQuery
Tables for Classical Design Analysis: Linear-Rank Tests; Cox Regression; CI/NI/Equiv
Tables for Adaptive Design Analysis: Conditional Power; GSD; Unblinded SSR
Tables for Bayesian Design Analysis: Assurance for HR; Assurance for survival rate
30. References
Freedman, L. S. (1982). Tables of the number of patients required in clinical trials using the
logrank test. Statistics in medicine, 1(2), 121-129.
Schoenfeld, D. A. (1983). Sample-size formula for the proportional-hazards regression
model. Biometrics, 499-503.
Lachin, J. M., & Foulkes, M. A. (1986). Evaluation of sample size and power for analyses of
survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance,
and stratification. Biometrics, 507-519.
Lakatos, E. (1988). Sample sizes based on the log-rank statistic in complex clinical
trials. Biometrics, 229-241.
Yao, J. C., et al. (2011). Everolimus for advanced pancreatic neuroendocrine tumors. New
England Journal of Medicine, 364(6), 514-523.
31. References
Jennison, C., & Turnbull, B. W. (1999). Group sequential methods with applications to clinical
trials. CRC Press.
Friede, T., & Kieser, M. (2006). Sample size recalculation in internal pilot study designs: a
review. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 537-555.
US Food and Drug Administration. (2018) Adaptive design clinical trials for drugs and
biologics (Draft guidance). Retrieved from https://www.fda.gov/media/78495/download
Chen, Y. J., DeMets, D. L., & Gordon Lan, K. K. (2004). Increasing the sample size when the
unblinded interim result is promising. Statistics in medicine, 23(7), 1023-1038.
Cui, L., Hung, H. J., & Wang, S. J. (1999). Modification of sample size in group sequential
clinical trials. Biometrics, 55(3), 853-857.
Mehta, C.R. and Pocock, S.J., (2011). Adaptive increase in sample size when interim results
are promising: a practical guide with examples. Statistics in medicine, 30(28), 3267-3284.
32. References
Chen, Y. J., Li, C., & Lan, K. G. (2015). Sample size adjustment based on promising interim
results and its application in confirmatory clinical trials. Clinical Trials, 12(6), 584-595.
O'Hagan, A., Stevens, J. W., & Campbell, M. J. (2005). Assurance in clinical trial
design. Pharmaceutical Statistics, 4(3), 187-201.
Ren, S., & Oakley, J. E. (2014). Assurance calculations for planning clinical trials with
time‐to‐event outcomes. Statistics in medicine, 33(1), 31-45.
Liu, Y., & Lim, P. (2017). Sample size increase during a survival trial when interim results are
promising. Communications in Statistics-Theory and Methods, 46(14), 6846-6863.
Freidlin, B., & Korn, E. L. (2017). Sample size adjustment designs with time-to-event
outcomes: a caution. Clinical Trials, 14(6), 597-604.