Title: Bayesian Approaches To Improve Sample Size
Duration: 60 minutes
Speaker: Ronan Fitzpatrick, Head of Statistics, Statsols
In this webinar you'll learn about:
Bayesian Sample Size Determination: See how the growth of Bayesian analysis has helped transform our ideas about statistical inference and methodologies in clinical trials
Bayesian Assurance: Get an informative answer about how likely the trial is to produce a “positive” outcome, and make better decisions about which trials to back
Posterior Credible Intervals and Mixed Bayesian Likelihood: Enable researchers to use prior information from pilot studies and other sources to make quicker, better decisions
Plus much more
5. WHAT IS SAMPLE SIZE ESTIMATION?
The process of finding the appropriate sample size for your study
Common metrics for this are statistical power, interval width or cost considerations
Generally, we find the minimum sample size expected to reach a given desired value of the metric (a minimal sketch follows below)
The sample size estimate depends on the study design and the statistical analysis
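As a concrete illustration of finding the minimum sample size that reaches a target value of the metric, here is a minimal sketch using statsmodels for a two-sample t-test; the effect size, alpha and power below are illustrative values, not figures from the webinar.

```python
# Minimal sketch of the usual frequentist calculation: find the smallest
# per-group n that reaches a target power for a two-sample t-test.
import math
from statsmodels.stats.power import TTestIndPower

# Illustrative inputs: Cohen's d = 0.5, two-sided alpha = 0.05, target power = 0.80
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, ratio=1.0,
                                          alternative='two-sided')
print(math.ceil(n_per_group))  # about 64 subjects per group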
6. WHY ESTIMATE SAMPLE SIZE?
Crucial to Arrive at Valid Conclusions
Reduce Chance of Large Errors (Type S/M Errors)
Balance Ethical and Practical Considerations (both how many subjects are needed and how many are not needed!)
Standard Trial Design Requirement (EMA, FDA, Nature Group Publishing Guidelines etc.)
But Many Studies Still Have Low Power
Can’t Rely on Past Studies (Crisis of Reproducibility)
7. INTRODUCING nQUERY
Over 20 Years of Experience in Sample Size Determination and Power Analysis for Clinical Trials
Latest Release has Methods for ~250 Trial Designs
Used by 45 of the Top 50 Pharma and Biotech Companies
nQuery’s 20 Years of Success is Based on:
1. Being Easy to Use and Accessible to All Users
9. BAYESIAN ANALYSIS
Bayesian Analysis continues to grow in popularity for statistical analysis in clinical trials
Offers the ability to integrate domain knowledge and prior study data for more efficient and accurate testing & estimation
For sample size determination, there are two broad approaches:
1. Sample Size for Bayesian Methods
2. Bayesian Approaches to Improve Frequentist Sample Size Methods
One Sample Credible Interval with Known Precision
One Sample Credible Interval with Unknown Precision
One Sample Mixed Bayesian Likelihood Criterion
Two Sample Credible Interval with Known Precision
Two Sample Credible Interval with Unknown Precision
Assurance for Superiority Trial Comparing Two Means
Assurance for Equivalence Trial Comparing Two Means
Assurance Non-inferiority Comparing Normal Means
10. BAYESIAN SAMPLE SIZE
1. Sample Size for Bayesian Methods
Sample size for specific values of Bayesian parameters, e.g. Bayes Factors, Credible Intervals, Utility/Cost functions
2. Bayesian Approaches to Improve Sample Size
Integrating Bayesian methods into current methods to add greater context for parameter uncertainty
11. SAMPLE SIZE FOR CREDIBLE INTERVALS
Credible intervals are the most commonly used Bayesian method for interval estimation
Focus here on methods proposed by Adcock (1988) & Joseph and Bélisle (1997) and their subsequent extensions
These methods focus on normal means and on integrating uncertainty in the estimation of the variance(s)
The method chosen depends both on the selection criterion used and on whether the precision is assumed known
12. CREDIBLE INTERVAL CHOICES
Joseph & Bélisle gave 3 selection criteria for sample size:
1. Average Coverage Criterion
2. Average Length Criterion
3. Worst Outcome Criterion
3 main testing scenarios explored for credible intervals (a minimal sketch of the known-precision case follows this list):
1. Bayesian Estimation with Known Precision
2. Bayesian Estimation with Unknown Precision
3. Mixed Bayesian/Likelihood Estimation (unknown precision)
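In the known-precision case the posterior credible interval length does not depend on the observed data, so the three criteria reduce to the same requirement. Below is a minimal sketch of that calculation in the spirit of Adcock (1988) and Joseph & Bélisle (1997), assuming a normal likelihood with known sigma and a conjugate normal prior whose information is equivalent to n0 prior observations; the numbers in the example call are illustrative, not taken from the webinar.

```python
# Sketch: Bayesian sample size for a one-sample credible interval with known
# precision. The posterior sd is sigma / sqrt(n + n0), so a credible interval
# with the given coverage has length 2 * z * sigma / sqrt(n + n0); solve for n.
import math
from scipy.stats import norm

def n_for_credible_length(length, sigma, coverage=0.95, n0=0.0):
    """Smallest n giving a credible interval no longer than `length`."""
    z = norm.ppf(0.5 + coverage / 2.0)          # e.g. 1.96 for 95% coverage
    n = (2.0 * z * sigma / length) ** 2 - n0    # prior contributes n0 "observations"
    return max(0, math.ceil(n))

# Illustrative inputs: sigma = 1, target length 0.5, prior worth 10 observations.
print(n_for_credible_length(length=0.5, sigma=1.0, coverage=0.95, n0=10))  # 52
```

With n0 = 0 this reduces to the usual frequentist confidence-interval formula, which is one way of seeing how prior information lowers the required sample size.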
13. CREDIBLE INTERVAL EXAMPLE
Source: Wiley.com
“In practice, the prior information will be different in every problem, so it is impossible to provide exhaustive tables. Therefore, Table 1 presents a variety of examples that illustrate the relationships between the various criteria discussed for the case of estimating sample size requirements for a single normal mean. Example 1-3 … show that the Bayesian approach can provide larger sample sizes than the frequentist approach, even though prior information is incorporated ... The same examples also illustrate that sample size provided by the ALC tend to be …”
14. ASSURANCE FOR CLINICAL TRIALS
Assurance (sometimes called “Bayesian power”) is the unconditional probability of significance given a prior
Focus on methods proposed by O’Hagan et al. (2005)
Assurance is the expectation of the power averaged over a prior distribution for the effect
Often framed as the “true probability of success” of a trial
Will focus on the simple two-sample normal case here (a worked sketch follows the example on the next slide)
15. ASSURANCE EXAMPLE
“The outcome variable … is reduction in CRP after four weeks relative to baseline, and the principal analysis will be a one-sided test of superiority at the 2.5% significance level. The (two) population variance … is assumed to be … equal to 0.0625. … the test is required to have 80% power to detect a treatment effect of 0.2, leading to a proposed trial size of n1 = n2 = 25 patients … For the calculation of assurance, we suppose that the elicitation of prior information … gives the mean of 0.2 and variance of 0.0625. If we assume a normal prior distribution, we can compute assurances with m = 0.2, v = 0.06 …”
Source: Wiley.com
Parameter                          Value
Significance Level (One-Sided)     0.025
Prior Mean Difference              0.2
Prior Difference Variance          0.06
Posterior Standard Deviation       √0.0625 = 0.25
Sample Size per Group              25
(A worked sketch of this assurance calculation follows.)
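To make these numbers concrete, here is a minimal Monte Carlo sketch (illustrative Python, not nQuery output) of the definition above: average the frequentist power over effects drawn from the prior. It assumes a two-sample z-test approximation with known, equal variances.

```python
# Monte Carlo sketch of assurance (O'Hagan et al., 2005): average the
# frequentist power over draws of the treatment effect from its prior.
# Assumption beyond the slide: a z-test approximation with known, equal
# variances, so power(delta) = Phi(delta/se - z_crit) with se = sigma*sqrt(2/n).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2023)

alpha = 0.025          # one-sided significance level
sigma = 0.25           # common SD (variance 0.0625)
n = 25                 # per-group sample size
prior_mean, prior_var = 0.2, 0.06

se = sigma * np.sqrt(2.0 / n)
z_crit = norm.ppf(1 - alpha)

# Conditional power at the planning effect of 0.2 (about 80%, as in the quote).
print("power at delta=0.2:", norm.cdf(0.2 / se - z_crit))

# Assurance: expectation of power over the prior for the effect.
deltas = rng.normal(prior_mean, np.sqrt(prior_var), size=1_000_000)
assurance_mc = norm.cdf(deltas / se - z_crit).mean()

# Closed form available in the normal-prior / normal-likelihood case.
assurance_cf = norm.cdf((prior_mean - z_crit * se) / np.sqrt(prior_var + se**2))

print("assurance (simulation):", round(assurance_mc, 3))
print("assurance (closed form):", round(assurance_cf, 3))
```

Both routes give an assurance of roughly 0.6, noticeably below the 80% conditional power at an effect of exactly 0.2, which is precisely the extra context about parameter uncertainty that assurance is meant to provide.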
17. nQUERY
The latest release of nQuery (nQuery Advanced) is a complete overhaul resulting in a unified, modern application
What is new in nQuery Advanced?
1. New User Interface
2. 17 Additional Survival Sample Size Tables
3. Installation & Operational Qualification Tools
4. Additional nQuery Bayes Add-on
Existing customers can update at any time; contact your Account Manager
18. NEW INTERFACE
Seeks to combine nQuery’s basic workflow with nTerim’s feature integration
Allows easier customisation and multi-tasking within tables
Guides users towards appropriate answers
19. NEW TABLES
The major area of focus in this release is survival (time-to-event) analysis
Focus on new analyses and on updating current methods for greater flexibility
In total, 17 new survival tables will be included
One Sample Log-Rank Test
One Sample Log-Rank Test assuming Exponential
One Sample Cure Model
Updated Two Sample Log-Rank Test
Two Sample Gehan-Breslow Linear-Rank Test
Two Sample Tarone-Ware Linear-Rank Test
Equivalence Testing Log-Rank Test
Log-Rank Test accounting for Competing Risks
Cluster Randomised Trials using Log-Rank Test
Two Survival Curves using Cox Regression
Equivalence Testing using Cox Regression
Non-inferiority Testing using Cox Regression
Confidence Interval for Median Survival
Confidence Interval for Exponential Mean
20. NEW QUALIFICATION TOOLS
nQuery will now come bundled with automated tools for Installation Qualification (IQ) and Operational Qualification (OQ)
For IQ, it will compare the integrity of each file using a SHA-1 hash against Statsols’ specifications (an illustrative sketch follows)
For OQ, it will run automated scripts for every nQuery Advanced table across a wide range of scenarios
Along with the automated update process, this will …
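As an aside, here is a minimal sketch of the kind of file-integrity check an IQ step performs; it is illustrative Python only, not Statsols' tool, and the file names and digests in the usage comment are hypothetical.

```python
# Illustrative installation-qualification style check (not Statsols' tool):
# hash each installed file with SHA-1 and compare against expected digests.
import hashlib
from pathlib import Path

def sha1_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_install(spec: dict, install_dir: Path) -> list:
    """Return the files whose digest is missing or does not match the spec."""
    failures = []
    for relative_name, expected in spec.items():
        target = install_dir / relative_name
        if not target.is_file() or sha1_of(target) != expected:
            failures.append(relative_name)
    return failures

# Hypothetical usage: `spec` would come from the vendor's specification file.
# failures = verify_install({"nQuery.exe": "da39a3ee..."}, Path("C:/Program Files/nQuery"))
```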
21. OTHER UPGRADES
• Updated Plotting Features with Additional Flexibility
• New Report Feature for Quicker Calculation Summary
• Additional Customisation of Table Size and Performance
• Streamlined Upgrade and License Renewal Process
• Additional Customisation of Output Statements and Notes
23. nQUERY BAYES PLANS
Add credible interval methods for the two-sample case with unequal variances and for binomial proportions
Methods for Bayesian statistics such as Bayes Factors, posterior error rate and decision-theoretic criteria (Lindley)
Additional assurance methods for more complex probability distributions and for other data types (proportions, time-to-event)
Adaptive trials which use Bayesian methods for …
24. OTHER ROADMAP ITEMS
New sample size tables related to equivalence, case-control studies, logistic regression and vaccine studies
Updates to tables including group sequential designs, incidence rates and cluster randomised trials
Optional add-on modules for adaptive designs, sample size re-estimation and simulation analysis
Cross-platform support for Mac OS, Linux and web browser
25. nQUERY COMMUNITY
A new integrated portal which will allow you to access support, webinars and upgrades from within your nQuery application
Will provide a platform for customers to easily submit ideas for new features, request prototypes, get early access to beta versions and gain direct access to our statistical team
Provides a central hub for IT managers to administer their nQuery installations and for finance managers to control, assess, renew and manage licenses
26. nQUERY ADVANCED
nQuery Advanced and nQuery Bayes were released on August 22nd
Contact sales@statsols.com for updates, upgrade details and requests
Contact nquery@statsols.com for any detailed statistical queries
Thanks for listening
27. REFERENCES
Adcock, C. J. (1988). A Bayesian approach to calculating sample sizes. The Statistician, 433-439.
Joseph, L., & Bélisle, P. (1997). Bayesian sample size determination for normal means and differences between normal means. Journal of the Royal Statistical Society: Series D (The Statistician), 46(2), 209-226.
Joseph, L., Du Berger, R., & Bélisle, P. (1997). Bayesian and mixed Bayesian/likelihood criteria for sample size determination. Statistics in Medicine, 16(7), 769-781.
Chow, S. C., Wang, H., & Shao, J. (2007). Sample size calculations in clinical research. CRC Press.
O'Hagan, A., Stevens, J. W., & Campbell, M. J. (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4(3), 187-201.
Editor's Notes
Appropriate would usually be defined in terms of preventing too low a sample size, though too high a sample size has practical costs.
Other metrics include confidence interval width (precision), cost-based and Bayesian methods.
Important to note that you need to specify an exact value for the effect, even though the alternative hypothesis acceptance space can technically be any non-null point value (commonly any non-zero value).
Can be thought of as the area of the alternative pdf which is contained within the rejection region of the null hypothesis.
Interesting to note that power of 50% is equivalent to the lower limit of the CI being equal to zero for a zero-based z-statistic null.
Point 1:
http://rsos.royalsocietypublishing.org/content/1/3/140216 -> Screening problem analogy.
Type S Error = Sign Error, i.e. the sign of the estimate differs from that of the actual population value.
Type M Error = Magnitude Error, i.e. the magnitude of the estimate is very different from (typically much larger than) the actual value (see the sketch below).
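A minimal simulation (illustrative values only, assuming a one-sample z-test with known SD and a small true effect) showing how an underpowered design inflates both error types among the estimates that reach significance:

```python
# Sketch: why low power inflates Type S and Type M errors among
# statistically significant results (in the spirit of Gelman & Carlin).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_effect, sd, n, alpha = 0.1, 1.0, 30, 0.05
se = sd / np.sqrt(n)

# Sampling distribution of the estimate, then keep only "significant" ones.
estimates = rng.normal(true_effect, se, size=200_000)
significant = np.abs(estimates) > norm.ppf(1 - alpha / 2) * se

power = significant.mean()
type_s = (np.sign(estimates[significant]) != np.sign(true_effect)).mean()
exaggeration = np.abs(estimates[significant]).mean() / true_effect  # Type M ratio

print(f"power ~ {power:.2f}, Type S rate ~ {type_s:.2f}, exaggeration ratio ~ {exaggeration:.1f}")
```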
Point 2:
Know we have only 100 subjects available. Need to know what power this will give us, i.e. is there enough power to justify even doing the study.
Phase III clinical trials constitute 90% of trial costs; vital to reduce waste and ensure the trial can fulfil its goal.
Point 3:
Sample size requirements are described in ICH Efficacy Guideline E9: STATISTICAL PRINCIPLES FOR CLINICAL TRIALS
See FDA/NIH draft protocol template here: http://osp.od.nih.gov/sites/default/files/Protocol_Template_05Feb2016_508.pdf (Section 10.5)
Nature Statistical Checklist: http://www.nature.com/nature/authors/gta/Statistical_checklist.doc
Point 4:
In Cohen’s (1962) seminal power analysis of the Journal of Abnormal and Social Psychology he concluded that over half of the published studies were insufficiently powered to reach statistical significance for the main hypothesis. Many journals (e.g. Nature) now require that authors submit power estimates for their studies.
Power/sample size is one of the areas highlighted when discussing the “crisis of reproducibility” (Ioannidis). It is a relatively easy fix compared to finding p-hacking etc.
Alternative linear rank tests include Tarone-Ware and Gehan. Planned for the next release, circa Summer 2016.
Sample size mainly asks “How many subjects are needed to attain X events?”
Most methods are optimised for exponential survival, but one could enter a piece-wise linear approximation of the probability at time t for other distributions (e.g. Weibull).
Analytic vs Simulation = much wider debate. Usually an ease-of-use vs flexibility trade-off. Simulation is better suited to a programming environment, e.g. R.