Journal of Biopharmaceutical Statistics, 22: 737–757, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 1054-3406 print/1520-5711 online
DOI: 10.1080/10543406.2012.678234
DOUBLY RANDOMIZED DELAYED-START DESIGN
FOR ENRICHMENT STUDIES WITH RESPONDERS
OR NONRESPONDERS
Qing Liu, Pilar Lim, Jaskaran Singh, David Lewin,
Barry Schwab, and Justine Kent
Janssen Research & Development, LLC, Raritan, New Jersey, USA
High placebo response has been a major source of bias and is difficult to deal with in
many central nervous system (CNS) clinical trials. This bias has led to a high failure
rate in mood disorder trials even with known effective drugs. For cancer trials, the traditional parallel group design, analyzed with the standard time-to-treatment-failure approach, biases the inference on the maintenance effect of the new drug. To minimize bias, we propose a doubly randomized delayed-start design for clinical trials with enrichment. The design consists of two periods. In the first period, patients can be randomized to receive several doses of a new drug or a control. In the second period, control patients from the first period who form an enriched population can be rerandomized to receive the same or fewer doses of the new drug or to continue on the control. Depending on the clinical
needs, different randomization ratios can be applied to the two periods. The essential
feature is that the design is naturally adaptive because of the randomization for the
second period. As a result, other aspects of the second period, such as the sample
size, can be modified adaptively when an interim analysis is set up for the first period.
At the end of the trial, response data from both randomizations are combined in an
integrated analysis. Because of the enrichment in the second period, the design increases
the probability of trial success and, in addition, reduces the required sample size. Thus,
for clinical development, the design offers greater efficiency.
Key Words: Adaptive design; Enrichment design; Maintenance effect; Proof-of-concept trials;
Randomized start design; Sequential parallel design.
1. INTRODUCTION
High placebo response in many central nervous system (CNS) clinical trials
leads to reduced sensitivity to distinguish an effective therapeutic agent from a
placebo control. For example, clinical trials employing effective antidepressants are
widely known for their high failure rates (Khin and Chen, 2011). There are many
causes for high placebo response, not all well understood. One of the key factors
is an expectation bias of the likelihood of receiving placebo (Papakostas and Fava,
2009; Sinyor et al., 2010; Tedeschini et al., 2010). Therefore, there is a need to
develop new study designs that aim to reduce the placebo response. For cancer
Received October 31, 2011; Accepted February 15, 2012
Address correspondence to Qing Liu, Janssen Research & Development, LLC, Route 202,
PO Box 300, Raritan, NJ 08869, USA; E-mail: QLiu2@its.jnj.com
trials, it is often important to study the maintenance effect of a new treatment
regimen for patients who have responded to an effective control induction therapy.
However, with the traditional parallel group design, the analysis of the maintenance
effect with the standard time-to-treatment failure may be biased by the differential
induction response rate.
To address the issue of bias, we propose a doubly randomized delayed-
start design that is accomplished in two periods. In the first period, patients are
randomized to receive several doses of a new drug or placebo. At the end of the
first period, an assessment is made to determine whether patients who had been
randomized to receive placebo meet the reentry criteria for randomization in the
second period. Those who meet the reentry criteria are rerandomized into the second
period to receive the same or fewer doses of the new drug, depending on results
of the first period, or to continue on placebo. The reentry criteria can consist of
key enrichment elements, in terms of either efficacy or safety. For example, for
antidepressant trials, enrichment is often made with placebo nonresponders from
the first period. For oncology trials, enrichment can be defined to include treatment
responders or patients whose disease has not progressed. To ensure interpretability
of statistical inference, enrichment criteria using specific cutoffs on predetermined
outcome measures or an adjudication process need to be specified in advance in the
study protocol. Depending on the need of the trial design or clinical considerations,
different randomization ratios can be applied to the two periods. The essential
feature is that the design is naturally adaptive because of the randomization for the
second period. As a result, other aspects of the second period, such as the sample
size, choice of the doses of the new drug, duration of the treatment, and follow-up,
can be modified adaptively when an interim analysis is set up for the first period.
At the end of the trial, efficacy data from both randomizations are integrated via
an optimal combination test.
There is a rich history of employing designs with double randomizations for
clinical trials in different therapeutic areas (see the survey article by Mills et al.,
2007). An early application was described by Heyn et al. (1974) for pediatric acute
lymphocytic leukemia, where patients who were initially randomized to receive
a control regimen and who continued without CNS or marrow relapse were
rerandomized to receive a new treatment regimen or one of the two controls. In
modern cancer trials, double randomization schemes are often used by cooperative
oncology groups funded by the National Cancer Institute (NCI) for the development of new drugs from biotech companies. For example, the Eastern Cooperative Oncology
Group (ECOG) trial ECOG 4494 on the maintenance effect of rituximab for
patients with diffuse large B-cell lymphoma employed a double randomization
scheme, in which patients who responded to induction treatment with either
an investigational regimen or standard regimen were rerandomized to different
maintenance treatments (Habermann et al., 2006). The results of this trial, along
with supportive data from other trials, led to the maintenance indication for
rituximab. According to the clinical trial registry www.ClinicalTrials.gov, the
National Cancer Institute (NCI) provided trial details in November 1999 (with
identifier NCT00003150), indicating that the trial was initiated in December 1997.
The idea of randomized delayed start was initiated by Dr. Leber of the U.S.
Food and Drug Administration (FDA) Division of Neuropharmacological Drug
Products during the period between 1994 and 1996 for degenerative neurologic
diseases. Leber (1996) describes a randomized start design in which patients are
randomized to receive a given treatment sequence (i.e., drug/drug, placebo/drug,
or placebo/placebo) at baseline but are not actively rerandomized before entering
period 2. More details on the regulatory background, scientific rationale and
motivation, and further description of the design are provided in the discussion
article by Leber (1997). The sequential parallel design proposed by Fava et al. (2003), in comparison to the randomized start design of Leber (1996), uses enrichment with the placebo nonresponders of the first period for initiation of treatment with the new drug in the second period. However, enrichment designs
with nonresponders critically require randomization to ensure valid statistical
inference and unbiased clinical conclusions (Temple, 1994). This design feature is
not included in the sequential parallel design.
The randomized start design also motivated the concept of what we now call
the (adaptive) doubly randomized delayed-start design in late 1997 for use in the
study of a variety of neurologic and psychiatric drugs. During this time, weighted
combination tests were developed for the simpler problem of sample size adjustment
(Chi and Liu, 1999; Cui et al., 1999). The details for the doubly randomized delayed-
start design, however, including sample size calculation and statistical analysis, have
not been developed and reported as originally intended. A slight modification of
the randomized start design was described by McDermott et al. (2002) for which
the placebo patients in the first period are rerandomized to receive the drug or
to continue on placebo. McDermott et al. (2002) provide a weighted method for
statistical analysis for a general two-period factorial design, which includes the
special case of the randomized start design of Leber (1996). In 2009, we proposed
to the FDA at an end-of-phase-2 meeting the idea of rerandomizing placebo nonresponders in a confirmatory phase 3 clinical trial to receive one of two doses of an antidepressant or the placebo control, due to concerns of potential bias in the sequential parallel design with possibly unbalanced known or unknown prognostic
factors at baseline. In an article by the FDA staff, Chen et al. (2011) note that
in sequential parallel designs, treatment assignment in period 2 is predetermined at
the randomization of period 1 and raise concerns of bias “due to randomness and
possible unbalanced dropouts among placebo non-responders” prior to period 2.
They also show through theoretical development and simulation work that there
is no need for the complex, seemingly unrelated regression analysis proposed by
Tamura and Huang (2007) when rerandomization of placebo nonresponders for the
second period is in place.
As mentioned earlier, the proposed doubly randomized delayed-start design
is by nature an adaptive design, which is different from the nonadaptive design
by Chen et al. (2011). For applications in nonadaptive trial settings, the proposed
design is also distinct in terms of the statistical method, sample size calculation, and
flexibility in randomization.
This article uses antidepressant drug development to illustrate the basic
construction of clinical trials with the doubly randomized delayed-start design.
Therefore, throughout the remainder of the article, we only consider enrichment with placebo nonresponders. The general theory and methods apply to both clinical
settings. We show in section 3.1 that enrichment prior to the second randomization
can greatly enhance the usefulness of the clinical outcomes for the resulting placebo
nonresponders. Specifically, compared to patients randomized in the first period,
clinical outcomes of the placebo nonresponders for the second period are not
only far less variable but also have a higher correlation. As a result, the placebo
nonresponders are expected to be more sensitive to treatment with an effective new
drug, or to be nonresponsive if they are randomized to continue with placebo.
Central to the proposed design is the second randomization. The enrichment process
can introduce unexpected selection bias in comparing the effect of the new drug
with placebo for the placebo nonresponders if the second randomization is not put
in place. Therefore, this randomized delayed-start enrichment for the second period
would effectively address the placebo response issue and, consequently, increase the
effect size while avoiding the potential bias in the statistical analysis and clinical
conclusions. In section 2, we develop an optimal combination test statistic, as well as
a simple closed-form formula for sample size calculation. Both fully take advantage
of the improved sensitivity and predictability of the enrichment. In section 3.2, we
describe a conditional analysis of covariance (ANCOVA) model for each period. We
construct the test statistics for the two periods, which are then used in the optimal
combination test statistic for hypothesis testing. The proposed test method only
relies on a standard ANCOVA, and therefore, no special statistical procedure (e.g.,
the seemingly unrelated regression analysis) is needed. For settings with binary or
time-to-event endpoint, we use the standard logistic regression model or the Cox
proportional hazards model for analysis.
As alluded to earlier, the second randomization also brings in additional
benefits that are not possible with the sequential parallel design. As shown in
section 2.2, the proposed doubly randomized delayed-start design is naturally
adaptive and utilizes the combination test statistic proposed by Cui et al. (1999).
Following the general measure-theoretic framework of Liu et al. (2002) for adaptive
designs, we provide the justification for the proposed design in the appendix for
any type of clinical endpoints, not just for continuous endpoints following normal
distributions. For clinical development in general, this approach can be used for
early clinical programs such as (Phase 2a) proof-of-concept trials or (Phase 2b) dose-
finding trials, as well as confirmatory Phase 2b/3 combination or Phase 3 trials,
which may include adaptive features such as sample size modification, or adaptive
dose-finding. The design can increase the probability of trial success via a reduced
sample size as compared to a standard parallel group design. In addition, the design
can be further expanded to include a randomized withdrawal for patients who are
randomized to receive the new drug in the first period (see the brief description in
section 6). Because of all these features, the doubly randomized delayed-start design
can substantially increase the efficiency (i.e., in cost and resource) and effectiveness
(i.e., in probability of technical and regulatory success) of antidepressant clinical
development.
In section 4.1 we provide an illustrative example of a proof of concept trial
where a single dose of a new drug is compared to placebo. We also compare
the required sample size to that of the sequential parallel design, as well as the
traditional parallel group design. In section 4.2, we also present a Phase 2b trial
with two doses of a new drug and illustrate how to apply the proposed sample size
procedure as well as existing multiple trend test and closed testing procedures. In
section 5, we provide simulation studies to confirm the adaptive measure-theoretic
theory for the combination test and, in addition, show that the weights in the
combination test can also depend on the actual sample size randomized to each
period without inflating the type 1 error rates.
The proposed design avoids the potential bias that may be introduced if the
placebo nonresponders in the placebo–drug and placebo–placebo sequences are not
comparable. At a minimum, the proposed design is more efficient and offers greater
inferential interpretability. Because of its adaptive nature, the design also offers
greater flexibility that benefits clinical development.
2. DESIGN
2.1. Description
To develop the theory and method, we consider the simple setting where
patients are randomized in both periods to receive either one dose of the new drug
or placebo. We then illustrate in section 4.2 how to apply the theory and method for
an application with two doses of a new drug. The design consists of two periods. At
the beginning of period 1, patients are assessed for their baseline variables and then
randomized to receive either placebo or treatment(s) with the new drug. Patients are
then treated during the first period. During the first period, patients are allowed to
drop out due to lack of efficacy or safety concerns. Patients who are treated with
the new drug in the first period may continue with their treatment(s) in the second
period. At the end of the first period, patients are evaluated for their response to
the assigned treatment and are classified as responders or nonresponders. For the
second period, patients who received placebo and have not dropped out in the first
period and who are nonresponders at the end of the first period are randomized to
receive the treatment with the new drug or to continue on placebo in the second
period. The rerandomized placebo patients are then treated during the second period
and at the end of the second period, they are evaluated for their final clinical
outcomes. Note that the durations of the two periods may be the same, but this is not required. Also, the randomization ratios need not be balanced.
2.2. General Theory
For each period, patients are evaluated with respect to a clinical endpoint. The
endpoints for the two periods are not required to be the same for the proposed
design. Let $\theta_1$ be the parameter comparing the onset effect of the new drug to placebo with the period 1 endpoint. For the second period, let $\theta_2$ be the parameter with the period 2 endpoint for comparing the delayed-start effect of the new drug to placebo. We are interested in testing against the global null hypothesis
$$H_0: \theta_1 \le 0 \text{ and } \theta_2 \le 0$$
in favor of the alternative hypothesis
$$H_A: \theta_1 > 0 \text{ or } \theta_2 > 0.$$
For $\theta_1$ and $\theta_2$, the test statistics against the individual null hypotheses $H_{01}: \theta_1 \le 0$ and $H_{02}: \theta_2 \le 0$ are denoted by $Z_1$ and $Z_2$, respectively. To establish the efficacy of
the new drug, we combine $Z_1$ and $Z_2$ via
$$Z = \omega_1^{1/2} Z_1 + \omega_2^{1/2} Z_2 \qquad (1)$$
for certain prespecified weights $\omega_1$ and $\omega_2$ such that $\omega_1 + \omega_2 = 1$.
Note that the combination test statistic is widely used for two-stage adaptive designs in the literature (see Cui et al., 1999) with a single randomization. The difference here is that $Z_1$ and $Z_2$ correspond to two different randomizations. In the following, we define $Z_1$ and $Z_2$, and establish that the test $Z \ge z_\alpha$, for which $z_\alpha$ is the critical value of the standard normal distribution at the significance level $\alpha$, controls the type 1 error rate at $\alpha$.
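To make the combination explicit, the following is a minimal Python sketch of the weighted inverse-normal combination test described above; the function name, the default significance level, and the illustrative p-values are ours and not part of the original article.

```python
import numpy as np
from scipy.stats import norm

def combination_test(p1, p2, w1, alpha=0.025):
    """Weighted inverse-normal combination of two one-sided p-values.

    p1, p2 : one-sided p-values from the period 1 and period 2 analyses
    w1     : prespecified weight for period 1 (the period 2 weight is 1 - w1)
    Returns the combined statistic Z and the decision for the test Z >= z_alpha.
    """
    z1 = norm.ppf(1.0 - p1)                        # Z1 = Phi^{-1}(1 - p1)
    z2 = norm.ppf(1.0 - p2)                        # Z2 = Phi^{-1}(1 - p2)
    z = np.sqrt(w1) * z1 + np.sqrt(1.0 - w1) * z2  # equation (1)
    return z, z >= norm.ppf(1.0 - alpha)

# Illustrative call with hypothetical p-values:
z, reject = combination_test(p1=0.04, p2=0.01, w1=0.4857)
```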
Let $\mathcal{F}$ be the $\sigma$-field of the first period data from all randomized patients. Assume that, for testing against the null hypothesis $\theta_1 \le 0$ in favor of the alternative hypothesis $\theta_1 > 0$, there is a p-value $p_1$, measurable with respect to $\mathcal{F}$, such that $P_{H_{01}}(p_1 \le \epsilon) \le \epsilon$ for all $\epsilon \in (0, 1)$, where $H_{01}: \theta_1 = 0$. Let $\Phi(\cdot)$ be the cumulative distribution function of the standard normal distribution. Define $Z_1$ to be the normal inverse of $p_1$, that is, $Z_1 = \Phi^{-1}(1 - p_1)$. Then under the null hypothesis $H_{01}$, the normal inverse test statistic $Z_1$ is not stochastically larger than a standard normal distribution.
The construction of $Z_2$ is more involved as the trial involves a selection process for patients who are randomized to receive placebo in the first period. Let $\mathcal{G}$ be the $\sigma$-field of the first period data of all patients who are randomized to receive placebo in the first period. Let $g$ represent the process for selecting patients from those who are randomized to receive placebo in the first period to be randomized for the second period. Following the description of the design, $g$ involves excluding patients for various reasons of dropout or patients whose outcomes meet the criteria for treatment responders at the end of the first period. In general, $g$ cannot be fully specified. However, it can be assumed that $g$ is a $\mathcal{G}$-measurable function with range $M$ whose elements represent various choices of subsets of patients to be rerandomized. In Liu et al. (2002), $g$ is known as an adaptation rule. For each $m \in M$, let $p_{2m}$ be a p-value for testing against the null hypothesis $H_{02}: \theta_2 \le 0$ in favor of the alternative hypothesis $H_{A2}: \theta_2 > 0$, such that $P_{H_{02}}(p_{2m} \le C) \le C$ for any $\mathcal{G}$-measurable function $C \in (0, 1)$, where $H_{02}: \theta_2 = 0$. Following Liu et al. (2002), the adaptive p-value is given by
$$p_{2g} = \sum_{m \in M} p_{2m} I_{\{g = m\}} \qquad (2)$$
where $I_{\{g = m\}}$ is the indicator for the event $\{g = m\}$. By the adaptation theory of Liu et al. (2002), we establish the following theorem.
Theorem.
(a) $P_{H_{02}}(p_{2g} \le C) \le C$.
(b) Let $Z_2 = \Phi^{-1}(1 - p_{2g})$. Then under the null hypothesis $H_0: \theta_1 = \theta_2 = 0$, the test statistic
$$Z = \omega_1^{1/2} Z_1 + \omega_2^{1/2} Z_2 \qquad (3)$$
is not stochastically larger than the standard normal distribution.
The proof of the theorem is given in the appendix. Following the theorem, the
combination test statistic $Z$ can be used to test against the null hypothesis $H_0$ in favor of a more specific alternative hypothesis at a specified significance level $\alpha$. By construction, the type 1 error rate of the combination test is controlled even though the sample size for the second period is random, and irrespective of whether dropout at the end of period 1 is informative or not. The results of the theorem remain valid even if the selection process $g$ is expanded to depend on comparative first period data, that is, $g$ is expanded to be an $\mathcal{F}$-measurable function. The theorem relies on the p-values $p_1$ and $p_{2m}$ for $m \in M$ that are not stochastically smaller than uniform distributions. This allows various types of endpoints, including continuous,
binary, and time-to-event endpoints. Therefore, for clinical investigations in general,
this approach can be used for confirmatory Phase 3 trials, as well as for early
clinical development such as (Phase 2a) proof-of-concept trials or (Phase 2b) dose-
finding trials. Either an asymptotic or randomization-based exact justification for
this assumption rests on randomizations at the beginning of each period. Without
the second randomization, such as the design by Fava et al. (2003), there is no
guarantee of uniformity of p2m for m ∈ M.
For continuous endpoints following normal distributions, Chen et al. (2011)
develop a test procedure based on a weighted average of estimates for designs
incorporating double randomizations. They show that the estimates from two
periods have (asymptotic) zero correlation under a constancy assumption of the
correlation between the continuous endpoint measures. The proof also requires that
the sample size for each period is fixed in advance. With the proposed doubly
randomized delayed-start design, the lack of the constancy assumption of the
correlation as well as lack of the normality assumption can be easily handled with
exact randomization tests.
Also note that the theorem assumes the weights $\omega_k$ for $k = 1, 2$ are prespecified. This assumption, however, can be relaxed to allow the weights to depend on blinded first period data. We can easily justify this following the blinded adaptation theory by Liu and Chi (2010a). The prespecified weights $\omega_k$ for $k = 1, 2$ are given in section 2.3.
Standard analysis for two-group comparisons provides estimates of the onset and delayed-start effects, $\theta_1$ and $\theta_2$.
2.3. Sample Size
Let $r_k$ be the randomization ratio of patients receiving placebo to patients receiving the treatment with the new drug for period $k$, where $k = 1, 2$. The numbers of patients for the treatment group with the new drug and the placebo group are $n_k$ and $r_k n_k$, respectively. Assume that for $k = 1$ or 2, $Z_k$ follows asymptotically a normal distribution with mean
$$E(Z_k) = (n_k R_k)^{1/2} \delta_k$$
for $R_k = r_k/(1 + r_k)$ and $\delta_k = \theta_k/\sigma_k$ with some standard deviation $\sigma_k$, and variance $\mathrm{Var}(Z_k) = 1$.
With this canonical form (see Jennison and Turnbull, 2000, p. 49), we provide
a uniform approach to sample size calculation. For a normally distributed endpoint,
$\sigma_k$ is the standard deviation of the normal distribution. For a binary endpoint, $\sigma_k$ is the standard deviation based on the pooled success rate of the two treatment groups. Fava et al. (2003) provide the sample size calculation for the sequential parallel design with binary endpoints; however, there is no sample size calculation method for continuous endpoints, and determination of sample size has to rely on time-consuming simulation studies (Tamura and Huang, 2007).
Under an alternative hypothesis, $Z_1$ and $Z_2$ may not be independent in general. To simplify sample size calculation, we follow Chen et al. (2011) and assume the correlation coefficients between measurements at the first and second periods are identical for both treatment groups. Then $Z_1$ and $Z_2$ are asymptotically independent. As a result, the test statistic $Z$ given in equation (3) follows an asymptotic normal distribution with mean
$$E(Z) \approx \omega_1^{1/2} (n_1 R_1)^{1/2} \delta_1 + \omega_2^{1/2} (n_2^* R_2)^{1/2} \delta_2 \qquad (4)$$
and variance $\mathrm{Var}(Z) = 1$, where $n_2^*$ is the expected number of placebo nonresponders who are randomized to receive the treatment with the new drug.
Now assume further that $\delta_k \ge 0$ for $k = 1, 2$. Then maximizing the power of the test $Z \ge z_\alpha$ is equivalent to maximizing the expectation $E(Z)$ given in equation (4) with respect to $\omega_k$ for $k = 1, 2$, subject to the constraint $\omega_1 + \omega_2 = 1$. This leads to optimal weights $\omega_k^*$ for $k = 1, 2$. By the method of Lagrange multipliers, we obtain
$$\omega_1^* = \frac{n_1 R_1 \delta_1^2}{n_1 R_1 \delta_1^2 + n_2^* R_2 \delta_2^2} \qquad (5)$$
and $\omega_2^* = 1 - \omega_1^*$. Let $\lambda$ be the rate of attrition due to dropout or exclusion of responders of the placebo patients from period 1. Then, for the second period, the expected number of placebo nonresponders who are randomized to receive the treatment with the new drug is
$$n_2^* = r_1 n_1 (1 - \lambda)/(1 + r_2)$$
and the expected number of placebo nonresponders who are randomized to continue on placebo is
$$r_2 n_2^* = r_1 r_2 n_1 (1 - \lambda)/(1 + r_2).$$
Using the optimal weights given in equation (5), $E(Z)$ in equation (4) becomes
$$E(Z) = (n_1 R)^{1/2} \delta_1 \qquad (6)$$
for
$$R = R_1 \left\{ 1 + \frac{R_2 (1 - R_2)}{1 - R_1} (1 - \lambda) \gamma^2 \right\}$$
where $\gamma = \delta_2/\delta_1$. For given type 1 and type 2 error rates $\alpha$ and $\beta$, the required sample size $n_1$ follows the equation
$$n_1 R = (z_\alpha + z_\beta)^2 / \delta_1^2 \qquad (7)$$
where $z_\alpha$ and $z_\beta$ are critical values of the standard normal distribution at $\alpha$ and $\beta$.
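As an illustration of how equations (5) through (7) fit together, the Python sketch below computes the design quantity $R$, the period 1 sample size $n_1$, the expected period 2 size $n_2^*$, and the optimal weight $\omega_1^*$; the function and argument names are ours.

```python
from scipy.stats import norm

def sample_size_period1(theta1, theta2, sigma1, sigma2,
                        r1, r2, attrition, alpha, beta):
    """Period 1 sample size and optimal weight from equations (5)-(7).

    theta1, theta2 : onset and delayed-start treatment effects
    sigma1, sigma2 : standard deviations for the period 1 and period 2 analyses
    r1, r2         : placebo-to-drug randomization ratios for the two periods
    attrition      : rate lambda of dropout plus placebo response in period 1
    alpha, beta    : one-sided type 1 and type 2 error rates
    """
    d1, d2 = theta1 / sigma1, theta2 / sigma2            # effect sizes delta_k
    R1, R2 = r1 / (1.0 + r1), r2 / (1.0 + r2)
    gamma2 = (d2 / d1) ** 2
    R = R1 * (1.0 + R2 * (1.0 - R2) / (1.0 - R1) * (1.0 - attrition) * gamma2)
    n1 = (norm.ppf(1 - alpha) + norm.ppf(1 - beta)) ** 2 / (d1 ** 2 * R)   # eq (7)
    n2_star = r1 * n1 * (1.0 - attrition) / (1.0 + r2)   # expected drug arm, period 2
    w1 = n1 * R1 * d1 ** 2 / (n1 * R1 * d1 ** 2 + n2_star * R2 * d2 ** 2)  # eq (5)
    return n1, n2_star, w1
```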
Note that $n_2^*$ and $r_2 n_2^*$ are expected numbers of placebo nonresponders to be randomized to the new drug or placebo. The actual number of placebo nonresponders to be randomized, that is, $n_2(1 + r_2)$, is a random variable. As
this variability is not accounted for in the sample size calculation procedure, it is
expected that the actual power is less than the nominal level. This is illustrated in
section 5.3 through simulation studies. An ad hoc fix for this problem is to identify
the amount of power loss through simulation studies under different designs and
then use an adjusted power, which is also illustrated in section 5.3.
3. STATISTICAL MODELS
3.1. Enrichment with Nonresponders
Enrichment with nonresponders is an appealing clinical concept as patients
who are responders tend to continue to remain as responders and, as a result,
reduce the ability to detect a treatment difference of an effective drug from
placebo (Temple, 1994). However, less is known about the characteristics of the
nonresponders in terms of their clinical outcome trajectory, the variability, and the
correlation between clinical outcome measures.
We present results of an analysis of the 17-item Hamilton Depression Rating
Scale (HAMD17) total scores of placebo patients from a randomized, double-blind
parallel group trial to study the efficacy and safety of a new drug in patients with
major depressive disorder (MDD). A balanced randomization was planned with
75 patients allocated to each treatment group. Study duration included 1 week of
screening, 1 week of washout, 7 weeks of treatment with patients monitored weekly,
and 1 week of follow-up. While the primary endpoint for the study was week 7
change from baseline in MADRS total score, the week 7 change from baseline in
HAMD17 total score was a key secondary endpoint. This dataset is used to illustrate
the design of the proof-of-concept trial in section 4.1.
In Table 1, we provide the means and standard deviations of changes in
HAMD17 total scores and correlations between HAMD17 total scores for all
randomized placebo patients as well as placebo nonresponders at week 4. For all
randomized patients, a change in the HAMD17 total score is calculated as the
difference between the weekly HAMD17 total score and baseline. A treatment
nonresponder is defined as a patient with the HAMD17 total score greater than 18
at week 4. For treatment nonresponders, a change in the HAMD17 total score is the
difference of the HAMD17 total scores at week 5, 6, or 7 from week 4. Pearson’s
correlation coefficient is used to assess potential relationships between the baseline
or week 4 and subsequent week HAMD17 total score. A full analysis of this dataset
was performed, as discussed later in sections 4.1 and 5.2; because of its scale, it is not presented here.
Table 1 Properties of enrichment with placebo nonresponders

                All patients                      Nonresponders
Week     N    Mean     SD    Corr          N    Mean     SD    Corr
1       69    -4.1     4.2   .504
2       71    -7.1     6.3   .452
3       71    -9.3     7.3   .356
4       71   -10.5     7.6   .245
5       71   -11.7     8.1   .062         33   -1.76     4.1   .582
6       71   -11.3     7.7   .191         33   -0.91     3.5   .554
7       71   -12.4     8.3   .184         33   -1.42     3.8   .628
It is seen that for all placebo patients the mean change score decreases over
time. This observation is consistent with widely known reports of analysis databases
conducted by the FDA. A reference of the reports is provided in Chen et al. (2011).
In addition to the decrease of the mean change scores, we also see that the standard
deviation increases over time while the correlation decreases. More than 50% of the
patients responded to treatment with placebo at week 4. By enriching with placebo
nonresponders, the mean change scores are stabilized, the standard deviations are
reduced by more than 50%, and the correlations are increased by roughly 200% or
more.
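For readers who want to reproduce this kind of summary from a longitudinal HAMD17 dataset, a minimal pandas sketch is given below; the data frame layout and column names ('base', 'wk1', ..., 'wk7') are hypothetical, and the nonresponder rule (HAMD17 total score greater than 18 at week 4) follows the definition above.

```python
import pandas as pd

def enrichment_summary(df):
    """Week-by-week change-score summaries for all placebo patients and
    for week 4 nonresponders, in the spirit of Table 1."""
    nonresp = df[df["wk4"] > 18]                    # placebo nonresponders at week 4
    rows = []
    for wk in range(1, 8):
        col = f"wk{wk}"
        chg = df[col] - df["base"]                  # change from baseline, all patients
        row = {"week": wk, "N": chg.count(), "mean": chg.mean(), "SD": chg.std(),
               "corr": df["base"].corr(df[col])}    # baseline vs weekly total score
        if wk >= 5:                                 # nonresponder changes from week 4
            nchg = nonresp[col] - nonresp["wk4"]
            row.update({"N_nr": nchg.count(), "mean_nr": nchg.mean(),
                        "SD_nr": nchg.std(),
                        "corr_nr": nonresp["wk4"].corr(nonresp[col])})
        rows.append(row)
    return pd.DataFrame(rows)
```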
While this dramatic result testifies to the essence of enrichment with placebo
nonresponders, it also raises concerns of the selection bias in designs (e.g., the
sequential parallel design) where rerandomization to treatment with the drug or
placebo is not in place for placebo nonresponders. This is the case when the
initial randomization fails to balance known or unknown prognostic factors, which
is exacerbated during period 1 by differential and informative dropouts (Chen
et al., 2011). We note that an assessment of the extent of bias is not available,
as simulation studies of the sequential parallel design are only limited to settings
without an imbalance of the prognostic factors.
3.2. Conditional ANCOVA Model
The general theory developed in section 2.2 does not impose a specific
distribution requirement on the endpoint in question. To reflect the trial examples
given in section 4, we construct statistical models that allow continuous endpoints
whose measures can be analyzed via a traditional ANCOVA. We show how to use
the models to derive the standard deviations k for k = 1 2 in section 2.3 for sample
size calculation. The models also provide the basis for simulation studies to evaluate
the operating characteristics of the doubly-randomized design. The ANCOVA
approach can easily be extended to settings with more sophisticated random effect
models for continuous endpoints, logistic regression models for binary endpoints, or
Cox’s proportional hazards model for time-to-event endpoints.
Let the random vector $(Y_0, Y_1, Y_2)$ represent the patient measurements at baseline, period 1, and period 2, respectively. The baseline measure $Y_0$ follows a normal distribution with mean $\mu_0$ and standard deviation $\sigma_0$. To reflect patient entry criteria, $Y_0$ is truncated by a lower cutoff $y_L$ from below. Thus, for patients who are enrolled into the trial, the resulting baseline measure $Y_0$ follows a truncated normal density. For periods 1 and 2, we use conditional normal distributions for $Y_1$ given $Y_0 = y_0$ and $Y_2$ given $Y_1 = y_1$. This serves two purposes. First, patients must meet entry criteria (i.e., baseline $Y_0$ above $y_L$ or being nonresponders) to be randomized in each period. The conditional distributions are unaffected by the entry criteria. Second, as the analysis for the first or second period is based on the change score from the respective baseline, the conditional normal distributions ensure that the change scores are also conditionally normally distributed. As shown below, the conditional normal models motivate the traditional ANCOVA.
Let $Y_{1i}$ be the response for period 1, where $i = 0$ or 1 indicates that the patient receives placebo or treatment with the new drug. Assume that without the truncation on $Y_0$ by the entry criteria, $Y_0$ and $Y_{1i}$ follow a bivariate normal distribution with mean vector $(\mu_0, \mu_{1i})$, correlation $\rho_1$, and variances $\sigma_0^2$ and $\sigma_1^2$. It is well known that the conditional distribution of $Y_{1i}$ given $Y_0 = y_0$ is
$$Y_{1i} \mid y_0 \sim N\left(\mu_{1i} + \rho_1 (y_0 - \mu_0)\,\sigma_1/\sigma_0,\; \sigma_1^2 (1 - \rho_1^2)\right) \qquad (8)$$
Let $\Delta Y_{1i} = Y_{1i} - Y_0$; then
$$\Delta Y_{1i} \mid y_0 \sim N\left(\mu_{1i} + \rho_1 (y_0 - \mu_0)\,\sigma_1/\sigma_0 - y_0,\; \sigma_1^2 (1 - \rho_1^2)\right) \qquad (9)$$
During period 1, patients can drop out due to lack of efficacy or safety concerns. At the end of period 1, the placebo nonresponders are rerandomized to receive the new drug or placebo for period 2. To ensure that this process does not introduce selection bias in the inference, we consider conditional distributions of $Y_{2j}$ given $Y_{10} = y_{10}$, where $j = 1$ or 0 indicates whether patients are randomized to receive the new drug or placebo. Similarly, assume that without the truncation on $Y_{10}$ by the reentry criteria, $Y_{10}$ and $Y_{2j}$ follow a bivariate normal distribution with mean vector $(\mu_{10}, \mu_{2j})$, correlation $\rho_2$, and variances $\sigma_1^2$ and $\sigma_2^2$. Then
$$Y_{2j} \mid y_{10} \sim N\left(\mu_{2j} + \rho_2 (y_{10} - \mu_{10})\,\sigma_2/\sigma_1,\; \sigma_2^2 (1 - \rho_2^2)\right) \qquad (10)$$
For $\Delta Y_{2j} = Y_{2j} - Y_{10}$,
$$\Delta Y_{2j} \mid y_{10} \sim N\left(\mu_{2j} + \rho_2 (y_{10} - \mu_{10})\,\sigma_2/\sigma_1 - y_{10},\; \sigma_2^2 (1 - \rho_2^2)\right) \qquad (11)$$
The conditional models given by equation (9) and equation (11) provide the basis for conditional ANCOVA models for constructing test statistics $Z_1$ and $Z_2$ for the combination test in equation (3). For the first period, $\theta_1 = \mu_{11} - \mu_{10}$ is the treatment effect between the new drug and placebo. Equation (9) suggests the conditional ANCOVA model for $\Delta Y_{1i}$ including a treatment contrast for $\theta_1$, the baseline measure $y_0$, and other potential factors as model predictors. It is immediately obvious that the conditional ANCOVA model, following the conditional model in equation (9), reduces the standard deviation to $\sigma_1 (1 - \rho_1^2)^{1/2}$, resulting in a more powerful analysis. In case the randomization for the first period does not balance out the baseline values $y_0$, the conditional ANCOVA model reduces the bias in the inference. The test statistic $Z_1$ for $\theta_1$ can be based on
the Wald statistic for the treatment contrast, which is easily constructed from its
reported point estimate and standard error.
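As a sketch of how $Z_1$ might be computed in practice, the following fits the period 1 conditional ANCOVA with statsmodels and forms the Wald statistic from the reported point estimate and standard error; the data frame and column names ('change', 'baseline', 'treat') are hypothetical.

```python
import statsmodels.formula.api as smf

def period1_wald(period1):
    """Conditional ANCOVA for period 1: change score on baseline and treatment.

    period1 : data frame with columns 'change' (Y1i - Y0), 'baseline' (y0),
              and 'treat' (1 = new drug, 0 = placebo).
    Returns the Wald statistic Z1 for the treatment contrast.
    """
    fit = smf.ols("change ~ baseline + treat", data=period1).fit()
    return fit.params["treat"] / fit.bse["treat"]
```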
The parameter of interest for the second period is $\theta_2 = \mu_{21} - \mu_{20}$. The conditional ANCOVA model on $\Delta Y_{2j}$ includes the treatment contrast for $\theta_2$, the baseline measure $y_{10}$, and other potential factors. It is for this conditional ANCOVA model that the need for the second randomization stands out. Any imbalance in either $y_0$ or $y_{10}$ is adjusted, providing the basis toward an unbiased inference of $\theta_2$. The sequential parallel design by Fava et al. (2003) lacks this critical feature. It is unclear from Tamura and Huang (2007) if a change score from the period 1 baseline is used for both periods or if the seemingly unrelated regression model necessarily includes any baseline values. Chen et al. (2011) do not explicitly define the endpoint; however, it is clear that their ANCOVA model includes the baseline values of period 2 when placebo nonresponders are rerandomized. An important benefit of the enrichment with placebo nonresponders is also reflected by the conditional ANCOVA model for period 2, for which the standard deviation is dramatically reduced to $\sigma_2 (1 - \rho_2^2)^{1/2}$ due to the large increase of the correlation from $\rho_1$ to $\rho_2$ (see section 4).
It is noted that the standard deviations for the conditional ANCOVA models of periods 1 and 2 are $\sigma_1 (1 - \rho_1^2)^{1/2}$ and $\sigma_2 (1 - \rho_2^2)^{1/2}$, respectively. These standard deviations, rather than those from unadjusted change from baseline scores, are used to calculate the sample size for the examples in section 4.
4. APPLICATION
4.1. Proof of Concept
Clinical trials in mood disorder are known to have large failure rates, mainly
because a large percentage of patients in the trial respond to placebo treatment.
With a doubly randomized delayed-start design, the effect of a new drug can
be further evaluated in placebo non-responders. This design consists of a 2-week
placebo lead-in phase and a drug testing phase with two 4-week periods, where both
phases are double-blind. The purpose of the placebo lead-in phase is to screen out
potential placebo responders. This is important, as without the placebo lead-in, the
placebo response rate at the end of period 1 (or week 4), as seen from Table 1, can
be as high as 50%. In the drug testing phase, eligible patients are randomized with
one-to-one ratio in period 1 to receive either a new test drug or placebo. Treatment
in the second period is based on the patient’s period 1 treatment and response
status. At the end of period 1, all patients are evaluated for efficacy and safety. In
particular, patients are classified as responders if their HAMD17 total scores are
18 or less. For period 2 the one-to-one ratio is also used to rerandomize placebo
nonresponders to receive the new test drug or continue on placebo.
Efficacy will be based on the change from baseline of the HAMD17 total
score with the new test drug compared with placebo after 4 weeks of treatment in
each period. This is carried out with the conditional ANCOVA models with the
respective baseline HAMD17 total score and treatment contrast between the new
test drug and the placebo. The combination test in equation (3) is then used to detect a possible efficacy signal in HAMD17 total score at the one-sided type 1 error rate $\alpha = .1$.
The design parameters for the sample size calculation are based on a full
analysis of the MDD dataset used in section 3.1 followed with hierarchical
longitudinal disease modeling. The resulting models provide, among other things,
the mean and standard deviation of the HAMD17 total scores for each week and the
correlations of the HAMD17 total scores between weeks. Through this modeling, it
is determined that 30% of the patients would respond by the end of 2 weeks after
enrollment, which suggests a 2-week placebo lead-in phase. This would reduce the
responder rate for the drug testing phase. The placebo responder rate for period
1 is also 30%. We assume for the first period a treatment difference $\theta_1 = 3$ in the mean change from baseline in HAMD17 total score between the new test drug and placebo. The full analysis also suggests an increased effect size of at least 1 point (i.e., $\Delta = 1$) in placebo nonresponders in period 2. Thus, we choose $\theta_2 = \theta_1 + \Delta = 4$. With $\sigma_1 = 8$ and $\rho_1 = .25$, $\sigma_1 (1 - \rho_1^2)^{1/2}$ is approximately 7.75. For the second period, $\sigma_2 = 9$ and $\rho_2 = .8$, which leads to $\sigma_2 (1 - \rho_2^2)^{1/2} \approx 5.5$. Based on literature review, we use 10% as the dropout rate during period 1. With the responder rate of 30% for period 1, the attrition rate is $\lambda = .4$. To achieve a minimum 90% power, we use the 95% nominal power, for which the required total number of patients to be randomized for period 1 is 112. Based on a simulation study with 20,000 runs, the simulated power is .92925 with the combination test that uses adaptive weights (see section 5). The optimal weight for the first period in the combination test is .4857.
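For illustration, plugging the design parameters above into the sample size sketch given after equation (7) (a sketch under our own naming, not the authors' software) reproduces the calculation:

```python
n1, n2_star, w1 = sample_size_period1(
    theta1=3, theta2=4, sigma1=7.75, sigma2=5.5,
    r1=1, r2=1, attrition=0.4, alpha=0.10, beta=0.05)
# n1 is about 55.5, i.e., 56 patients per group and 112 randomized in period 1,
# and the optimal period 1 weight w1 is about .486, in line with the values above.
```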
To determine the sample size of the sequential parallel design of Fava et al.
(2003), we performed a series of simulation studies as suggested by Tamura and
Huang (2007). For the seemingly unrelated regression analysis, we used the change
from the respective baseline HAMD17 total score as the dependent variable.
Because of the rerandomization, the correlation between the dependent variables of
the two periods is set to zero following Chen et al. (2011). The required sample
size is 128 using the midpoint .7 of the weight range .6 to .8. In comparison, the required total sample size of a parallel-group design with a treatment difference of 3 points and a standard deviation of 7.75 would be 225 patients after adjusting for 10% dropouts.
4.2. Dose Finding
Consider a Phase 2b dose finding trial for patients with MDD where two
doses of a new test drug are compared to a placebo control. The design again
consists of a 2-week placebo lead-in phase and a drug testing phase with two 4-
week periods, and both phases are double-blind. In the drug testing phase, eligible
patients are randomized in a 1:1:2 ratio into period 1 to receive either of the two doses of the new test drug or a placebo. Treatment in the second period is based on the patient's period 1 treatment and response status. At the end of period 1, all patients are evaluated for efficacy and safety. In particular, patients are classified as responders if there is a 50% or more reduction in their Montgomery–Asberg Depression Rating Scale (MADRS) total scores. Placebo nonresponders from period 1 will be rerandomized in a 1:1:1 ratio to receive either of the two doses of the new test drug or to continue with placebo.
A key feature of this design is the utilization of an unequal randomization
ratio during the first period of the double-blind phase. Based on past clinical trials
experience, designs using an equal randomization ratio for multiple doses tend to
increase the placebo response rate, largely because of the high probability for a
patient to receive the potentially effective new test drug. For the current trial, the
primary endpoint for each period is the change from the respective baseline in the MADRS total scores. With the 1:1:2 randomization ratio for the first period, the treatment difference with respect to the primary endpoint between a dose of the new test drug and placebo is assumed to be 4.5 points. In contrast, the smaller treatment difference of 4 points would be used if the randomization ratio were 1:1:1 (Sinyor et al., 2010). For the second period, we still use the treatment difference of 4.5 because it is expected that the randomization ratio would play a much smaller role in the placebo response rate for the enriched placebo nonresponders. The standard deviations used for periods 1 and 2 are 9 and 6, respectively. Again, we assume that 30% of placebo patients are responders at the end of period 1, and 10% of patients are expected to drop out during period 1.
For the Phase 2b trial, we choose the one-sided type 1 error rate $\alpha = .05$ in order to control the probability of failure of the Phase 3 program. An objective of the Phase 2b trial is to identify the dose(s) to carry forward into a Phase 3 clinical development program. Thus, an analysis comparing each individual dose to placebo is necessary. Therefore, the sample size is based on a pairwise comparison of a dose of the new test drug with the placebo at the type 1 error rate $\alpha = .05$ with 95% nominal power. The resulting sample size for the first randomization is 33 patients for a dose of the new test drug, and 66 patients for the placebo. The weight for the first period is .4969.
To control multiple type 1 error rates at the specified $\alpha = .05$ level, we use the following closed testing procedure. We first perform an overall test against the global null hypothesis that neither dose is effective (compared to placebo) at the $\alpha = .05$ level. If the global null hypothesis is rejected, we then test against the individual null hypothesis that a particular dose of the new test drug is not effective at the $\alpha = .05$ level. The overall test employs the combination test given in equation (3), for which triple trend test statistics (Capizzi et al., 1992) are used for both periods. For each triple trend test statistic, the same trend scores are derived from three different sigmoid Emax models; each trend statistic is based on the ANCOVA model with the baseline MADRS total score and the trend score. As a result, we use the same weight .4969, which is derived from the pairwise sample size calculation already shown, to combine the two triple trend test statistics. For the pairwise comparison, the ANCOVA model includes the baseline MADRS total score and the treatment contrast between the dose in question and placebo. The pairwise test statistics from the two periods are then combined via equation (3) with the first period weight .4969 to test against the individual null hypothesis.
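A minimal sketch of this closed testing logic, assuming the period-wise statistics have already been combined via equation (3), is given below; the triple trend statistics themselves are not implemented here, and the function name is ours.

```python
from scipy.stats import norm

def closed_testing(z_global, z_dose1, z_dose2, alpha=0.05):
    """One-sided closed testing for two doses versus placebo.

    z_global         : combined triple trend statistic against the global null
    z_dose1, z_dose2 : combined pairwise statistics for each dose versus placebo
    """
    crit = norm.ppf(1 - alpha)
    if z_global < crit:                      # global null not rejected: stop
        return {"dose1": False, "dose2": False}
    return {"dose1": z_dose1 >= crit,        # individual nulls tested at level alpha
            "dose2": z_dose2 >= crit}
```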
This example illustrates the flexibility of the doubly randomized delayed-start
design, which is not present in the previous example. There are two important design
features. First, there are only three treatment groups for each period. Second, different randomization ratios for the two periods are employed to take full advantage of the enrichment with the placebo nonresponders. Analytically, we can
apply the closed testing procedure to control the multiple type 1 error rates, and
we can easily apply the existing triple trend test to test against the global null
hypothesis.
In comparison, the sequential parallel design by Fava et al. (2003) does not
have these features. Five treatment sequences would be needed, which requires a
large randomization block. There is also a limitation in choosing the randomization
ratio. There is no existing procedure for triple trend tests using the seemingly
unrelated regression analysis.
5. SIMULATION STUDIES
5.1. Principle
Monte Carlo simulations play an important role in clinical trial designs and analyses. They can be used to verify results established by a statistical theory or to
evaluate operating characteristics that are difficult to study theoretically. The latter
includes either investigations of the robustness of the trial design and analysis or an
acknowledgment of the limitations of the simulation studies. Unfortunately, there
have been many instances of abuse or fraud in practice, which are documented in
great detail by Liu and Chi (2010b).
Prior to carrying out simulation studies, the first task is to construct
a simulation model, which consists of two critical components: (i) the set of
procedures to mimic the trial design that strictly follow the prespecified statistical
analysis of proposed methods, and (ii) a set of mathematical models to approximate
the state of nature. Without (i), it is difficult to regard the simulation results as
relevant to the trial design or analysis in question. As the trial design and analysis
are specified, it is possible and necessary to construct the set of procedures in great
detail. For (ii), it is easy to specify distribution models that follow the assumptions
required by the statistical theory; the challenge is to come up with models for
the state of nature which is not fully understood and difficult to predict. With
this conviction in mind, we construct our simulation model for the simplest design
described in section 2.1 to compare the treatment with a single dose of the new drug
to that of a placebo.
5.2. Simulation Model
For the doubly randomized delayed-start design, our simulation model is
constructed in the following steps, reflecting the actual programming code.
1. As seen in section 3.1 and alluded to in section 4.1, we first performed a full
analysis of a historical dataset from a double-blind, placebo-controlled trial
consisting of a new test drug and an active control with an existing drug. We then
fit the data with a set of hierarchical longitudinal models, from which important
parameters for the doubly randomized delayed-start design are derived. These
parameters include placebo means ($\mu_0$, $\mu_{10}$, and $\mu_{20}$), the standard deviations ($\sigma_0$, $\sigma_1$, and $\sigma_2$), the correlation coefficients ($\rho_1$ and $\rho_2$), and the increased treatment effect $\Delta$ for placebo nonresponders.
2. For the placebo lead-in phase, we generate a large number of baseline values $y_0$ from a normal distribution with mean $\mu_0$ and standard deviation $\sigma_0$. To qualify for randomization into period 1 of the drug testing phase, we subset the baseline values according to the nonresponder criterion $y_0 \ge y_L$. From the subset,
we randomly select $n_{11}$ for the treatment group, and $n_{10} = r_1 n_{11}$ for the placebo group of period 1.
3. For the active treatment group in period 1, we generate $n_{11}$ scores of $y_{11}$ from the conditional normal distribution given in equation (8) with $\mu_{11} = \mu_{10} + \theta_1$ for a specified value of the effect size $\theta_1$; for the placebo group, we generate $n_{10}$ scores of $y_{10}$ from the conditional normal distribution given in equation (8) with $\mu_{10}$.
4. For the period 1 analysis, we use the conditional ANCOVA model for the change
scores of y11 − y0 or y10 − y0 with the baseline value y0 and treatment contrast
for the new test drug versus placebo as the predictors. From the ANCOVA
analysis, we extract the estimate and standard error, and compute the Wald test
statistic Z1.
5. For period 2, we select the period 1 subset of placebo scores $y_{10}$ by the nonresponder criterion $y_{10} \ge y_L$. This subset and its size are denoted by $S_{10}^*$ and $n_{10}^*$, respectively. Let $d$ be the dropout rate of the placebo nonresponders. We generate a number of dropouts, say, $n_d$, from a binomial distribution with size $n_{10}^*$ and probability $d$. We then randomly select $n_{10}^* - n_d$ scores of $y_{10}$ from $S_{10}^*$, which are used as baseline values for period 2. The resulting subset is denoted by $S_{10}$. From $S_{10}$ we randomly select $n_{21}$ scores of $y_{10}$ for the treatment group with the new drug for period 2, and the remaining $n_{20}$ scores of $y_{10}$ are for the placebo group; $n_{21}$ and $n_{20}$ are chosen according to the randomization ratio $r_2 = n_{20}/n_{21}$ for period 2.
6. For the active treatment group in period 2, we generate $n_{21}$ scores of $y_{21}$ from the conditional normal distribution given in equation (10) with $\mu_{21} = \mu_{20} + \theta_2$, for which $\theta_2 = \theta_1 + \Delta$; for the placebo group, we generate $n_{20}$ scores of $y_{20}$ from the conditional normal distribution given in equation (10) with $\mu_{20}$.
7. For the period 2 analysis, we use the conditional ANCOVA model for the change
scores of y21 − y10 or y20 − y10 with the baseline value y10 and treatment contrast
for the new test drug versus the placebo as the predictors. From the ANCOVA
analysis, we extract the estimate and standard error, and compute the Wald test
statistic Z2.
8. We compute the combination test statistic in equation (3) with three choices of
weights. The first is the optimal weight given by equation (5) with the expected
sample size for the second period. The second is the weight by equation (5) with
the actual second period sample size n21. The third is an arbitrarily specified
weight, say, .6, .7, or .8, as suggested by Tamura and Huang (2007).
Steps 2 to 8 are repeated to a specified simulation size under the null hypothesis that $\theta_1 = \theta_2 = 0$ and under various alternative hypotheses with different values of standard deviations and correlation coefficients.
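The following Python sketch condenses steps 2 through 8 for one simulated trial under the conditional normal models of section 3.2. Parameter names follow the text; the function itself, its helper routines, and the fixed pool size for the lead-in draws are our own simplifications, not the authors' actual code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2012)

def cond_draw(baseline, mu, mu_prev, rho, sd, sd_prev):
    """Draw from the conditional normal distribution of equation (8) or (10)."""
    mean = mu + rho * (baseline - mu_prev) * sd / sd_prev
    return rng.normal(mean, sd * np.sqrt(1.0 - rho ** 2))

def ancova_z(change, baseline, treat):
    """Wald statistic for the treatment contrast in a conditional ANCOVA."""
    X = sm.add_constant(np.column_stack([baseline, treat]))
    fit = sm.OLS(change, X).fit()
    return fit.params[-1] / fit.bse[-1]

def one_trial(n11, r1, r2, theta1, delta, d, mu0, mu10, mu20,
              sd0, sd1, sd2, rho1, rho2, yL, w1):
    # Step 2: truncated baselines and the period 1 randomization.
    pool = rng.normal(mu0, sd0, size=20000)
    pool = pool[pool >= yL]
    n10 = int(r1 * n11)
    y0 = rng.choice(pool, size=n11 + n10, replace=False)
    treat1 = np.r_[np.ones(n11), np.zeros(n10)]
    # Step 3: period 1 outcomes from the conditional model (8).
    mu1 = np.where(treat1 == 1, mu10 + theta1, mu10)
    y1 = cond_draw(y0, mu1, mu0, rho1, sd1, sd0)
    # Step 4: period 1 ANCOVA on the change score y1 - y0.
    z1 = ancova_z(y1 - y0, y0, treat1)
    # Step 5: placebo nonresponders, binomial dropouts, second randomization.
    y10 = y1[treat1 == 0]
    y10 = y10[y10 >= yL]
    nd = rng.binomial(len(y10), d)
    y10 = rng.choice(y10, size=len(y10) - nd, replace=False)
    n21 = int(round(len(y10) / (1.0 + r2)))
    treat2 = np.r_[np.ones(n21), np.zeros(len(y10) - n21)]
    # Step 6: period 2 outcomes from the conditional model (10).
    mu2 = np.where(treat2 == 1, mu20 + theta1 + delta, mu20)
    y2 = cond_draw(y10, mu2, mu10, rho2, sd2, sd1)
    # Step 7: period 2 ANCOVA on the change score y2 - y10.
    z2 = ancova_z(y2 - y10, y10, treat2)
    # Step 8: combination test of equation (3) with a prespecified weight w1.
    return np.sqrt(w1) * z1 + np.sqrt(1.0 - w1) * z2
```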
5.3. Results
We consider a phase 3 trial with the same doubly randomized delayed-start
design in the proof-of-concept example in section 4.1. The parameter settings are identical except for the type 1 error rate $\alpha = .025$, power 90%, effect sizes $\theta_1 = 2.5$ and $\theta_2 = 3.5$, and the dropout rate $d = .2$. The required total sample size is 206 patients. The number of simulation runs is 20,000. We simulate the type 1 error rates and values of power for different dropout rates $d = .1$, .2, and .3. The results
are summarized in Tables 2 and 3, where $\omega_{\mathrm{opt}} = .5069$ is the optimal fixed weight, $\omega_{\mathrm{adp}}$ is the adaptive weight, and $\omega_{\mathrm{fix}}^{(1)} = .6$, $\omega_{\mathrm{fix}}^{(2)} = .7$, and $\omega_{\mathrm{fix}}^{(3)} = .8$ are arbitrary fixed weights.

Table 2 Type 1 error rate and power with $\beta = .1$

                                    ω_opt     ω_adp     ω_fix(1)   ω_fix(2)   ω_fix(3)
θ1 = 0,   θ2 = 0,   d = .1          .02640    .02695    .02650     .02675     .02600
θ1 = 0,   θ2 = 0,   d = .2          .02655    .02680    .02650     .02610     .02625
θ1 = 0,   θ2 = 0,   d = .3          .02570    .02680    .02625     .02615     .02545
θ1 = 2.5, θ2 = 3.5, d = .1          .88445    .88680    .88525     .87835     .86195
θ1 = 2.5, θ2 = 3.5, d = .2          .87290    .87680    .87350     .86985     .85245
θ1 = 2.5, θ2 = 3.5, d = .3          .84710    .85460    .85255     .84830     .83545
From Table 2, it is seen that the simulated type 1 error rates are very close to the theoretical value $\alpha = .025$. None of the simulated type 1 error rates are statistically different from $\alpha = .025$ at the two-sided significance level .05. In particular, the combination test using the adaptive weight does not yield type 1 error rates that are significantly different from .025. This is consistent with the unblinded trial modification theory by Liu and Chi (2010a). It is noticed, as expected, that the proposed combination test, using either the optimal weight or the adaptive weight, is more powerful than the test using the prefixed weight .8. By the simulation studies conducted by Chen et al. (2011), the weighted estimate procedure with rerandomization of the placebo nonresponders has nearly the same power as that of the sequential parallel design when the weight for the estimate, based on Tamura and Huang (2007), is chosen between .6 and .8. When a small period 2 standard deviation is used to reflect the enrichment, the resulting optimal weight of the combination test in equation (3) is substantially smaller than .7 or .8. From this, we can infer that the proposed method is more powerful than the sequential parallel design of Fava et al. (2003). This conclusion is also consistent with our calculations of the required sample size of both designs given in section 4.1.
It is seen from Table 2 that the values of power are all below the required
90%. As explained in section 2.3, this is due to the use of expected sample size
for the second period that ignores the variability of the actual number of placebo
nonresponders that can be rerandomized. To fix this problem for the design, we
calculate the final sample size with 93% power. The resulting total sample size is 230.
Table 3 provides the simulation results with this new sample size. From both Table 2
and Table 3, it is clear that a loss of power also occurs when the dropout rate is
higher than the expected dropout rate. This problem could be resolved by using a
larger dropout rate in the sample size calculation. However, for the actual trial, the
dropout rate can be lower than expected, and therefore, using a larger dropout rate
Table 3 Type 1 error rate and power with = 07
opt adp
1
fix
2
fix
3
fix
1 = 0, 2 = 0, d = 1
.02705 .02760 .02700 .02700 .02645
1 = 0, 2 = 0, d = 2
.02775 .02835 .02785 .02765 .02740
1 = 0, 2 = 0, d = 3
.02740 .02785 .02705 .02675 .02670
1 = 2 5, 2 = 3 5, d = 1
.92270 .92335 .92145 .91390 .89945
1 = 2 5, 2 = 3 5, d = 2
.90600 .90910 .90800 .90245 .88675
1 = 2 5, 2 = 3 5, d = 3
.88735 .89145 .88930 .88480 .87455
in the sample size calculation may unnecessarily increase the cost and duration of
the trial. A possible solution to this problem may be to adjust the sample size to be
randomized for the first stage according to the actual observed dropout rate.
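As a rough illustration of why the expected-count calculation falls short, the Monte Carlo sketch below treats the number of placebo nonresponders available for the second randomization as binomial rather than fixed, and approximates the period-wise test statistics by their normal-theory means rather than the conditional ANCOVA of section 3.2. All inputs are hypothetical; setting gamma_true above gamma_plan mimics the additional power loss when dropout exceeds its planning value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, n_sims = 0.025, 20_000
n1 = 103                              # hypothetical per-arm sample size for period 1
theta1, theta2 = 0.35, 1.0            # hypothetical standardized effects for the two periods
gamma_plan, gamma_true = 0.45, 0.45   # planned vs. actual placebo attrition (dropouts + responders)
z_alpha = norm.ppf(1 - alpha)

# Design-stage weight from the *expected* number rerandomized to the new drug (r1 = r2 = 1)
n2_expected = n1 * (1 - gamma_plan) / 2
a, b = n1 / 2 * theta1**2, n2_expected / 2 * theta2**2
w1 = a / (a + b)

rejections = 0
for _ in range(n_sims):
    # The number of placebo nonresponders available for the second randomization is random
    n_available = rng.binomial(n1, 1 - gamma_true)
    n2_drug = n_available // 2        # 1:1 rerandomization in period 2
    z1 = rng.normal(np.sqrt(n1 / 2) * theta1, 1)
    z2 = rng.normal(np.sqrt(n2_drug / 2) * theta2, 1)
    rejections += np.sqrt(w1) * z1 + np.sqrt(1 - w1) * z2 >= z_alpha

print("empirical power:", rejections / n_sims)   # falls below the nominal target
```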
6. DISCUSSION
6.1. Summary
We propose an adaptive doubly randomized delayed-start design that offers
greater flexibility and interpretability than the sequential parallel design of Fava
et al. (2003). The key differences are the addition of the randomization for placebo
patients who meet the enriched entry criteria for the second period, and the use
of an adaptive combination test. The design allows the adaptive weight to be
based on the actual number of patients randomized to the second period. Through
simulation studies, we show that the combination test with the adaptive weight
is more powerful than using a fixed weight, which must be prespecified for the
sequential parallel design. The design can handle the simple setting for comparing
one dose of a new drug against placebo, as well as a complex setting of multiple
doses, including possibly an active control with an existing approved drug. The
latter is important to allow for comparative effectiveness research.
For drug development in mood disorders, the proposed design consists of
a placebo lead-in phase and a drug testing phase with two periods. The placebo
nonresponders from either the placebo lead-in phase or the first period are
randomized to receive the new test drug or placebo.
Chen et al. (2011) raise concerns about rerandomization, stating that patients
may feel the drug effect soon after receiving the new drug, which may introduce bias
by undermining the integrity of blinding. But if patients must not feel the drug
effect, how has the FDA managed to approve many effective drugs whose clinical
endpoints rely on patients' own subjective evaluations? A major reason for conducting a
randomized, double-blind trial, especially with an enrichment design (Temple, 1994),
rather than an open-label trial, is precisely so that patients feel the drug effect when in fact
the drug is efficacious (under the alternative hypothesis). Difficulties only arise when
there are noticeable differential side effects in favor of an ineffective new drug.
Issues with partially unblinding the data are addressed by Liu and Pledger (2005)
and Liu and Chi (2010b).
6.2. Future Research
There are several ways to improve the simulation model. First, there is
evidence from existing data that clinical endpoint measures can follow a distribution
with different characteristics (e.g., standard deviation or correlation) for patients
receiving an active drug. Thus, a simulation model that reflects these differences
is useful to evaluate the robustness of the underlying statistical analysis method
(e.g., ANCOVA). Although the combination test is shown to control the type 1
error rates, it must be realized that the underlying assumption is that the p-value
for a given realization is not stochastically smaller than the uniform distribution.
Thus, an inappropriate analysis method for deriving the period-wise test statistics
Z1 or Z2 can still lead to an inflation of the type 1 error rate. Chen et al. (2011)
specifically require that the correlation coefficient is constant. When it is not, their
procedure may inflate the type 1 error rate as well. However, we point out that the
problem is not particular to the design in question. The concern is for the statistical
method that is used to derive the test statistic or p-value, irrespective of the design
employed.
Another improvement is to incorporate longitudinal models in both the
simulation model and the statistical analysis. This has the advantage of addressing
dropouts in both periods as well as further increasing the power of the test. Chen
et al. (2011) evaluated the robustness of the analysis based on a mixed-effect model
when dropouts are missing at random (MAR). It is also necessary to evaluate the
mixed-effects model analysis when there are differential and informative dropouts
(i.e., missing not at random, or MNAR). We notice that Chen et al. (2011) state
that there is no evidence of MNAR from the 25 New Drug Applications (NDAs).
However, it would not be possible to establish evidence of MNAR unless these trials
actually collected the missing data.
Chen et al. (2011) showed that the sequential parallel design of Fava et al.
(2003) does not inflate the unconditional type 1 error rates over 2,000 hypothetical
replications of their simulation studies. In reality, only a few trials are conducted
in a single clinical development program. For trials with imbalance of known and
unknown prognostic factors, only conditional type 1 error rates are relevant. We are
not aware of methods that allow evaluations of the conditional type 1 error rates.
There are many areas for adding adaptive features to the doubly randomized
delayed-start design. A simple setting is sample size adjustment based on blinded
data review of the dropout rate. As the design is naturally adaptive, it also allows
other adaptive features such as sample size adjustment based on unblinded review of
the period 1 variability, effect size, and so on, or adaptive dose-finding. The design
can also be expanded to include randomized withdrawal for patients who respond
to the new test drug in period 1. This addition allows study of the new test drug’s
effect in many ways, including a reduced dose or frequency for maintenance of the
response.
APPENDIX: PROOF OF THE THEOREM
(a) Since the σ-field generated by the first period data of the placebo patients is contained in the σ-field of all first period data, the selection rule $g$ is also measurable with respect to the latter. Thus $P_{H_{02}}\{p_{2g} \le C\} \le C$ follows from the Adaptation Lemma of Liu et al. (2002).

(b) Let $X$ be a random variable, measurable with respect to the first period data, that follows a standard
normal distribution under the null hypothesis $H_{01}: \delta_1 = 0$. Let
$$\tilde{Z}_1(Z_2) = \frac{z_\alpha - \omega_2^{1/2} Z_2}{\omega_1^{1/2}}.$$
Then $P_{H_0}\{Z \ge z_\alpha\} = P_{H_0}\{Z_1 \ge \tilde{Z}_1(Z_2)\} \le P_{H_0}\{X \ge \tilde{Z}_1(Z_2)\}$. The inequality for the
probability $P_{H_0}\{\cdot\}$ holds because, by construction, $Z_1$ is not stochastically larger
than $X$. Now let
$$\tilde{Z}_2(X) = \frac{z_\alpha - \omega_1^{1/2} X}{\omega_2^{1/2}}.$$
Then $X \ge \tilde{Z}_1(Z_2)$ if and only if $Z_2 \ge \tilde{Z}_2(X)$, which is equivalent to $p_{2g} \le C(X)$, where
$C(X) = 1 - \Phi\{\tilde{Z}_2(X)\}$. Thus,
$$P_{H_0}\{X \ge \tilde{Z}_1(Z_2)\} = E\big[P_{H_0}\{p_{2g} \le C(X) \mid X\}\big] \le E\{C(X)\},$$
for which the last inequality follows from (a). Because $X$ follows a standard normal
distribution, it is easy to show that $E\{C(X)\} = \alpha$. Chaining all the inequalities
together, we have
$$P_{H_0}\{Z \ge z_\alpha\} \le \alpha.$$
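The final identity $E\{C(X)\} = \alpha$ has an elementary interpretation: for an independent standard normal $X'$, $E[1 - \Phi\{(z_\alpha - \omega_1^{1/2} X)/\omega_2^{1/2}\}] = P(\omega_1^{1/2} X + \omega_2^{1/2} X' \ge z_\alpha) = \alpha$, since $\omega_1 + \omega_2 = 1$. The short Monte Carlo check below is our own illustration, with an arbitrary weight, confirming the identity numerically.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, w1 = 0.025, 0.5069                    # arbitrary significance level and weight
z_alpha = norm.ppf(1 - alpha)

x = rng.standard_normal(2_000_000)
# C(X) = 1 - Phi(Z2_tilde(X)) with Z2_tilde(X) = (z_alpha - sqrt(w1) * X) / sqrt(1 - w1)
c_x = 1 - norm.cdf((z_alpha - np.sqrt(w1) * x) / np.sqrt(1 - w1))
print(c_x.mean())                            # approximately 0.025 = alpha
```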
ACKNOWLEDGMENTS
The authors thank Trevor McMullan for the analysis of the MDD dataset.
They are also grateful to Dr. Yevgen Tymofyeyev for reviewing the article, and
suggesting the use of the adaptive weight.
REFERENCES
Capizzi, T., Survill, T. T., Heyes, J. F. (1992). An empirical and simulated comparison
of some tests for detecting progressiveness of response with increasing doses of a
compound. Biometrical Journal 3:275–289.
Chen, Y. F., Yang, Y., Hung, H. M. J., Wang, S. J. (2011). Evaluation of performance
of some enrichment designs dealing with high placebo response in psychiatric clinical
trials. Contemporary Clinical Trials 32:592–604.
Chi, Y. G., Liu, Q. (1999). The attractiveness of the concept of a prospectively designed
two-stage clinical trial. Journal of Biopharmaceutical Statistics 9(4):537–547.
Cui, L., Hung, H. M. J., Wang, S. J. (1999). Modification of sample size in group sequential
clinical trials. Biometrics 55:853–857.
Fava, M., Evins, A. E., Dorer, D. J., Schoenfeld, D. A. (2003). The problem of the placebo
response in clinical trials for psychiatric disorders: culprits, possible remedies, and a
novel study design approach. Psychotherapy and Psychosomatics 72:115–127.
Habermann, T. M., Weller, E. A., Morrison, V. A., et al. (2006). Rituximab-CHOP versus
CHOP alone or with maintenance rituximab in older patients with diffuse large B-cell
lymphoma. Journal of Clinical Oncology 24:3121–3127.
Heyn, R. M., Joo, P., Karon, M., et al. (1974). BCG in the treatment of acute lymphocytic
leukemia. Blood 46:431–442.
Jennison, C., Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical
Trials. Boca Raton, FL: Chapman & Hall.
Khin, N. A., Chen, Y. (2011). Exploratory analyses of efficacy data from major depressive
disorder trials submitted to the US Food and Drug Administration in support of new
drug applications. Journal of Clinical Psychiatry 72:464–472.
Leber, P. (1996). Observations and suggestions for antidementia drug development. Alzheimer
Disease and Associated Disorders 10(suppl. 1):31–34.
Leber, P. (1997). Slowing the progression of Alzheimer disease: Methodological issues.
Alzheimer Disease and Associated Disorders 11(suppl. 5):10–21.
Liu, Q., Chi, G. Y. H. (2010a). Fundamental theory of adaptive designs with unplanned
design change in clinical trials with blinded data. In: Pong, A., Chow, S. C., eds.
Handbook of Adaptive Designs in Pharmaceutical and Clinical Development. Boca Raton,
FL: Chapman & Hall, pp.2-1–2-8.
Liu, Q., Chi, G. Y. H. (2010b). Understanding the FDA guidance on adaptive designs:
historical, legal and statistical perspectives. Journal of Biopharmaceutical Statistics
20(special issue):1178–1219.
Liu, Q., Pledger, G. W. (2005). Interim analysis and bias in clinical trials: The adaptive
design perspective. In: Buncher, R., Tsay, J.-Y., eds. Statistics in the Pharmaceutical
Industry. 3rd ed. Revised and Expanded, New York: Taylor & Francis, pp. 231–244.
Liu, Q., Proschan, M. A., Pledger, G. W. (2002). A unified theory of two-stage adaptive
designs. Journal of the American Statistical Association 97:1034–1041.
McDermott, M. P., Hall, W. J., Oakes, D., Eberly, S. (2002). Design and analysis of two-
period studies of potentially disease-modifying treatments. Controlled Clinical Trials
23:635–649.
Mills, E. J., Kelly, S., Wu, P., Guyatt, G. H. (2007). Epidemiology and reporting of
randomized trials employing rerandomization of patient groups: a systematic survey.
Contemporary Clinical Trials 28:268–275.
Papakostas, G. I., Fava, M. (2009). Does the probability of receiving placebo influence
clinical trial outcome? A meta-regression of double-blind, randomized clinical trials in
MDD. European Neuropsychopharmacology 19:34–40.
Sinyor, M., Levitt, A. J., et al. (2010). Does inclusion of a placebo arm influence response to
active antidepressant treatment in randomized controlled trials? Results from pooled
and meta-analyses. Journal of Clinical Psychiatry 71:270–279.
Tamura, R., Huang, X. (2007). An examination of the efficiency of the sequential parallel
design in psychiatric clinical trials. Clinical Trials 4:309–317.
Tedeschini, E., Fava, M., Goodness, T. M., Papakostas, G. I. (2010). Relationship
between probability of receiving placebo and probability of prematurely discontinuing
treatment in double-blind, randomized clinical trials for MDD: a meta-analysis.
European Neuropsychopharmacology 20:562–567.
Temple, R. J. (1994). Special study designs: early escape, enrichment, studies in non-
responders. Communications in Statistics – Theory and Methods 23:499–531.
Downloadedby[AmgenInc]at11:0206August2015

More Related Content

What's hot

Recommendations on Evidence Needed to Support Measurement Equivalence between...
Recommendations on Evidence Needed to Support Measurement Equivalence between...Recommendations on Evidence Needed to Support Measurement Equivalence between...
Recommendations on Evidence Needed to Support Measurement Equivalence between...CRF Health
 
Automated versus non automated weaning
Automated versus non automated weaningAutomated versus non automated weaning
Automated versus non automated weaningAngelo Roncalli
 
Evidence Based Practice Lecture 6_slides
Evidence Based Practice Lecture 6_slidesEvidence Based Practice Lecture 6_slides
Evidence Based Practice Lecture 6_slidesZakCooper1
 
Importance of clinical trials in drug discovery
Importance of clinical trials in drug discoveryImportance of clinical trials in drug discovery
Importance of clinical trials in drug discoveryKarunaMane1
 
PRP for Wound healing
PRP for Wound healingPRP for Wound healing
PRP for Wound healingSchuco
 
Evidence based medicine, by prof Badr Mesbah
Evidence based medicine, by prof Badr MesbahEvidence based medicine, by prof Badr Mesbah
Evidence based medicine, by prof Badr Mesbahmohamed osama hussein
 
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims DataMore Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims DataM. Christopher Roebuck
 
Observational study an overview
Observational study  an overviewObservational study  an overview
Observational study an overviewDrSatyabrataSahoo
 
Clinical trials and evidence
Clinical trials and evidenceClinical trials and evidence
Clinical trials and evidencePratik patil
 
Mlab 97-04-308
Mlab 97-04-308Mlab 97-04-308
Mlab 97-04-308kerpil
 
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...hiij
 
A review of use of enantiomers in homeopathy
A review of use of enantiomers in homeopathyA review of use of enantiomers in homeopathy
A review of use of enantiomers in homeopathyhome
 
# 5th lect clinical trial process
# 5th lect clinical trial process# 5th lect clinical trial process
# 5th lect clinical trial processDr. Eman M. Mortada
 
JOURNAL CLUB PRESENTATION
JOURNAL CLUB PRESENTATIONJOURNAL CLUB PRESENTATION
JOURNAL CLUB PRESENTATIONKAVIYA AP
 

What's hot (18)

EBM ppt by ANN
EBM ppt by ANNEBM ppt by ANN
EBM ppt by ANN
 
Recommendations on Evidence Needed to Support Measurement Equivalence between...
Recommendations on Evidence Needed to Support Measurement Equivalence between...Recommendations on Evidence Needed to Support Measurement Equivalence between...
Recommendations on Evidence Needed to Support Measurement Equivalence between...
 
Automated versus non automated weaning
Automated versus non automated weaningAutomated versus non automated weaning
Automated versus non automated weaning
 
Evidence Based Practice Lecture 6_slides
Evidence Based Practice Lecture 6_slidesEvidence Based Practice Lecture 6_slides
Evidence Based Practice Lecture 6_slides
 
Importance of clinical trials in drug discovery
Importance of clinical trials in drug discoveryImportance of clinical trials in drug discovery
Importance of clinical trials in drug discovery
 
PRP for Wound healing
PRP for Wound healingPRP for Wound healing
PRP for Wound healing
 
Evidence based medicine, by prof Badr Mesbah
Evidence based medicine, by prof Badr MesbahEvidence based medicine, by prof Badr Mesbah
Evidence based medicine, by prof Badr Mesbah
 
Evidence Based Medicine by Dr. Harmanjit Singh, GMC, Patiala
Evidence Based Medicine by Dr. Harmanjit Singh, GMC, PatialaEvidence Based Medicine by Dr. Harmanjit Singh, GMC, Patiala
Evidence Based Medicine by Dr. Harmanjit Singh, GMC, Patiala
 
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims DataMore Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data
More Predictive Modeling of Total Healthcare Costs Using Pharmacy Claims Data
 
Observational study an overview
Observational study  an overviewObservational study  an overview
Observational study an overview
 
Clinical trials and evidence
Clinical trials and evidenceClinical trials and evidence
Clinical trials and evidence
 
Bigtown simulation model
Bigtown simulation modelBigtown simulation model
Bigtown simulation model
 
Mlab 97-04-308
Mlab 97-04-308Mlab 97-04-308
Mlab 97-04-308
 
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...
PERSONALIZED MEDICINE SUPPORT SYSTEM: RESOLVING CONFLICT IN ALLOCATION TO RIS...
 
A review of use of enantiomers in homeopathy
A review of use of enantiomers in homeopathyA review of use of enantiomers in homeopathy
A review of use of enantiomers in homeopathy
 
# 5th lect clinical trial process
# 5th lect clinical trial process# 5th lect clinical trial process
# 5th lect clinical trial process
 
JOURNAL CLUB PRESENTATION
JOURNAL CLUB PRESENTATIONJOURNAL CLUB PRESENTATION
JOURNAL CLUB PRESENTATION
 
Lilford. Barach.Research_Methods_Reporting (2)
Lilford. Barach.Research_Methods_Reporting (2)Lilford. Barach.Research_Methods_Reporting (2)
Lilford. Barach.Research_Methods_Reporting (2)
 

Similar to Delayed Randomized Design

Patient Recruitment in Clinical Trials
Patient Recruitment in Clinical TrialsPatient Recruitment in Clinical Trials
Patient Recruitment in Clinical TrialsRaymond Panas
 
Module 4 Submodule 4. 2 Final June 2007
Module 4 Submodule 4. 2 Final June 2007Module 4 Submodule 4. 2 Final June 2007
Module 4 Submodule 4. 2 Final June 2007Flavio Guzmán
 
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...OARSI
 
A Model For Pharmacological Research Treatment Of Cocaine Dependence
A Model For Pharmacological Research Treatment Of Cocaine DependenceA Model For Pharmacological Research Treatment Of Cocaine Dependence
A Model For Pharmacological Research Treatment Of Cocaine DependenceRichard Hogue
 
Clinical trial design, Trial Size, and Study Population
Clinical trial design, Trial Size, and Study Population Clinical trial design, Trial Size, and Study Population
Clinical trial design, Trial Size, and Study Population Shubham Chinchulkar
 
Clinical trial phases, requirements and regulations
Clinical trial  phases, requirements and regulationsClinical trial  phases, requirements and regulations
Clinical trial phases, requirements and regulationsDr. Siddhartha Dutta
 
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docx
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docxMildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docx
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docxARIV4
 
Patient Centered Medical Home (PCMH) is not a pill Kevin Grumbach 2013
Patient Centered Medical Home (PCMH)  is not a pill Kevin Grumbach 2013Patient Centered Medical Home (PCMH)  is not a pill Kevin Grumbach 2013
Patient Centered Medical Home (PCMH) is not a pill Kevin Grumbach 2013Paul Grundy
 
Pcori final writeup
Pcori final writeupPcori final writeup
Pcori final writeuphbocian14
 
Pharmacokinetic Studies in Patients
Pharmacokinetic Studies in PatientsPharmacokinetic Studies in Patients
Pharmacokinetic Studies in PatientsQPS Holdings, LLC
 
Development of clinical trail protocol
Development of clinical trail protocolDevelopment of clinical trail protocol
Development of clinical trail protocolPradnya Shirude
 
Cross sectional study overview
Cross sectional study overviewCross sectional study overview
Cross sectional study overviewherunyu
 
NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015Sitta Sittampalam
 

Similar to Delayed Randomized Design (20)

OBE Impact Measurement.ppt
OBE Impact Measurement.pptOBE Impact Measurement.ppt
OBE Impact Measurement.ppt
 
Patient Recruitment in Clinical Trials
Patient Recruitment in Clinical TrialsPatient Recruitment in Clinical Trials
Patient Recruitment in Clinical Trials
 
Module 4 Submodule 4. 2 Final June 2007
Module 4 Submodule 4. 2 Final June 2007Module 4 Submodule 4. 2 Final June 2007
Module 4 Submodule 4. 2 Final June 2007
 
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...
Osteoarthritis: Structural Endpoints for the Development of Drugs, Devices, a...
 
Towse NDDP implications for drug development
Towse NDDP implications for drug developmentTowse NDDP implications for drug development
Towse NDDP implications for drug development
 
Mpharm RA 103.pdf
Mpharm RA 103.pdfMpharm RA 103.pdf
Mpharm RA 103.pdf
 
A Model For Pharmacological Research Treatment Of Cocaine Dependence
A Model For Pharmacological Research Treatment Of Cocaine DependenceA Model For Pharmacological Research Treatment Of Cocaine Dependence
A Model For Pharmacological Research Treatment Of Cocaine Dependence
 
Clinical trial design, Trial Size, and Study Population
Clinical trial design, Trial Size, and Study Population Clinical trial design, Trial Size, and Study Population
Clinical trial design, Trial Size, and Study Population
 
The Impact of Resident Duty Hour Reform on Hospital Readmission Rates 10.27.09
The Impact of Resident Duty Hour Reform on Hospital Readmission Rates 10.27.09The Impact of Resident Duty Hour Reform on Hospital Readmission Rates 10.27.09
The Impact of Resident Duty Hour Reform on Hospital Readmission Rates 10.27.09
 
Clinical trial phases, requirements and regulations
Clinical trial  phases, requirements and regulationsClinical trial  phases, requirements and regulations
Clinical trial phases, requirements and regulations
 
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docx
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docxMildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docx
Mildrey Silverio Week 13 Mildrey SilverioCOLLAPSETop of Fo.docx
 
Patient Centered Medical Home (PCMH) is not a pill Kevin Grumbach 2013
Patient Centered Medical Home (PCMH)  is not a pill Kevin Grumbach 2013Patient Centered Medical Home (PCMH)  is not a pill Kevin Grumbach 2013
Patient Centered Medical Home (PCMH) is not a pill Kevin Grumbach 2013
 
Pcori final writeup
Pcori final writeupPcori final writeup
Pcori final writeup
 
NYP EBP Cohort 8 Under Pressure
NYP EBP Cohort 8 Under PressureNYP EBP Cohort 8 Under Pressure
NYP EBP Cohort 8 Under Pressure
 
Pharmacokinetic Studies in Patients
Pharmacokinetic Studies in PatientsPharmacokinetic Studies in Patients
Pharmacokinetic Studies in Patients
 
Clinical trials article
Clinical trials articleClinical trials article
Clinical trials article
 
Development of clinical trail protocol
Development of clinical trail protocolDevelopment of clinical trail protocol
Development of clinical trail protocol
 
Cra helwan
Cra helwanCra helwan
Cra helwan
 
Cross sectional study overview
Cross sectional study overviewCross sectional study overview
Cross sectional study overview
 
NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015
 

Delayed Randomized Design

  • 1. This article was downloaded by: [Amgen Inc] On: 06 August 2015, At: 11:02 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: 5 Howick Place, London, SW1P 1WG Journal of Biopharmaceutical Statistics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20 Doubly Randomized Delayed-Start Design for Enrichment Studies with Responders or Nonresponders Qing Liu a , Pilar Lim a , Jaskaran Singh a , David Lewin a , Barry Schwab a & Justine Kent a a Janssen Research & Development, LLC , Raritan , New Jersey , USA Published online: 31 May 2012. To cite this article: Qing Liu , Pilar Lim , Jaskaran Singh , David Lewin , Barry Schwab & Justine Kent (2012) Doubly Randomized Delayed-Start Design for Enrichment Studies with Responders or Nonresponders, Journal of Biopharmaceutical Statistics, 22:4, 737-757, DOI: 10.1080/10543406.2012.678234 To link to this article: http://dx.doi.org/10.1080/10543406.2012.678234 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http:// www.tandfonline.com/page/terms-and-conditions
  • 2. Journal of Biopharmaceutical Statistics, 22: 737–757, 2012 Copyright © Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2012.678234 DOUBLY RANDOMIZED DELAYED-START DESIGN FOR ENRICHMENT STUDIES WITH RESPONDERS OR NONRESPONDERS Qing Liu, Pilar Lim, Jaskaran Singh, David Lewin, Barry Schwab, and Justine Kent Janssen Research & Development, LLC, Raritan, New Jersey, USA High placebo response has been a major source of bias and is difficult to deal with in many central nervous system (CNS) clinical trials. This bias has led to a high failure rate in mood disorder trials even with known effective drugs. For cancer trials, the traditional parallel group design biases the inference on the maintenance effect of the new drug with the traditional time-to-treatment failure analysis. To minimize bias, we propose a doubly randomized delayed-start design for clinical trials with enrichment. The design consists of two periods. In the first period, patients can be randomized to receive several doses of a new drug or a control. In the second period, control patients of the first period of an enriched population can be rerandomized to receive the same or fewer doses of the new drug or to continue on the control. Depending on the clinical needs, different randomization ratios can be applied to the two periods. The essential feature is that the design is naturally adaptive because of the randomization for the second period. As a result, other aspects of the second period, such as the sample size, can be modified adaptively when an interim analysis is set up for the first period. At the end of the trial, response data from both randomizations are combined in an integrated analysis. Because of the enrichment in the second period, the design increases the probability of trial success and, in addition, reduces the required sample size. Thus, for clinical development, the design offers greater efficiency. Key Words: Adaptive design; Enrichment design; Maintenance effect; Proof-of-concept trials; Randomized start design; Sequential parallel design. 1. INTRODUCTION High placebo response in many central nervous system (CNS) clinical trials leads to reduced sensitivity to distinguish an effective therapeutic agent from a placebo control. For example, clinical trials employing effective antidepressants are widely known for their high failure rates (Khin and Chen, 2011). There are many causes for high placebo response, not all well understood. One of the key factors is an expectation bias of the likelihood of receiving placebo (Papakostas and Fava, 2009; Sinyor et al., 2010; Tedeschini et al., 2010). Therefore, there is a need to develop new study designs that aim to reduce the placebo response. For cancer Received October 31, 2011; Accepted February 15, 2012 Address correspondence to Qing Liu, Janssen Research & Development, LLC, Route 202, PO Box 300, Raritan, NJ 08869, USA; E-mail: QLiu2@its.jnj.com 737 Downloadedby[AmgenInc]at11:0206August2015
  • 3. 738 LIU ET AL. trials, it is often important to study the maintenance effect of a new treatment regimen for patients who have responded to an effective control induction therapy. However, with the traditional parallel group design, the analysis of the maintenance effect with the standard time-to-treatment failure may be biased by the differential induction response rate. To address the issue of bias, we propose a doubly randomized delayed- start design that is accomplished in two periods. In the first period, patients are randomized to receive several doses of a new drug or placebo. At the end of the first period, an assessment is made to determine whether patients who had been randomized to receive placebo meet the reentry criteria for randomization in the second period. Those who meet the reentry criteria are rerandomized into the second period to receive the same or fewer doses of the new drug, depending on results of the first period, or to continue on placebo. The reentry criteria can consist of key enrichment elements, in terms of either efficacy or safety. For example, for antidepressant trials, enrichment is often made with placebo nonresponders from the first period. For oncology trials, enrichment can be defined to include treatment responders or patients whose disease has not progressed. To ensure interpretability of statistical inference, enrichment criteria using specific cutoffs on predetermined outcome measures or an adjudication process need to be specified in advance in the study protocol. Depending on the need of the trial design or clinical considerations, different randomization ratios can be applied to the two periods. The essential feature is that the design is naturally adaptive because of the randomization for the second period. As a result, other aspects of the second period, such as the sample size, choice of the doses of the new drug, duration of the treatment, and follow-up, can be modified adaptively when an interim analysis is set up for the first period. At the end of the trial, efficacy data from both randomizations are integrated via an optimal combination test. There is a rich history of employing designs with double randomizations for clinical trials in different therapeutic areas (see the survey article by Mills et al., 2007). An early application was described by Heyn et al. (1974) for pediatric acute lymphocytic leukemia, where patients who were initially randomized to receive a control regimen and who continued without CNS or marrow relapse were rerandomized to receive a new treatment regimen or one of the two controls. In modern cancer trials, double randomization schemes are often used by cooperative oncology groups funded by the National Cancer Institute (NCI) for new drug development of biotech companies. For example, the Eastern Cooperative Oncology Group (ECOG) trial ECOG 4494 on the maintenance effect of rituximab for patients with diffuse large B-cell lymphoma employed a double randomization scheme, in which patients who responded to induction treatment with either an investigational regimen or standard regimen were rerandomized to different maintenance treatments (Habermann et al., 2006). The results of this trial, along with supportive data from other trials, led to the maintenance indication for rituximab. According to the clinical trial registry www.ClinicalTrials.gov, the National Cancer Institute (NCI) provided trial details in November 1999 (with identifier NCT00003150), indicating that the trial was initiated in December 1997. 
The idea of randomized delayed start was initiated by Dr. Leber of the U.S. Food and Drug Administration (FDA) Division of Neurophamacological Drug Products during the period between 1994 and 1996 for degenerative neurologic Downloadedby[AmgenInc]at11:0206August2015
  • 4. DOUBLY RANDOMIZED DELAYED-START DESIGN 739 diseases. Leber (1996) describes a randomized start design in which patients are randomized to receive a given treatment sequence (i.e., drug/drug, placebo/drug, or placebo/placebo) at baseline but are not actively rerandomized before entering period 2. More details on the regulatory background, scientific rational and motivation, and further description of the design are provided in the discussion article by Leber (1997). The sequential parallel design has been proposed by Fava et al. (2003), which, in comparison to the randomized start design of Leber (1996), uses enrichment with the placebo nonresponders of the first period for initiation of treatment with the new drug in the second period. However, enrichment designs with nonresponders critically require randomization to ensure valid statistical inference and unbiased clinical conclusions (Temple, 1994). This design feature is not included in the sequential parallel design. The randomized start design also motivated the concept of what we now call the (adaptive) doubly randomized delayed-start design in late 1997 for use in the study of a variety of neurologic and psychiatric drugs. During this time, weighted combination tests were developed for the simpler problem of sample size adjustment (Chi and Liu, 1999; Cui et al., 1999). The details for the doubly randomized delayed- start design, however, including sample size calculation and statistical analysis, have not been developed and reported as originally intended. A slight modification of the randomized start design was described by McDermott et al. (2002) for which the placebo patients in the first period are rerandomized to receive the drug or to continue on placebo. McDermott et al. (2002) provide a weighted method for statistical analysis for a general two-period factorial design, which includes the special case of the randomized start design of Leber (1996). In 2009, we proposed to the FDA at an end of phase 2 meeting the idea of rerandomizing placebo nonresponders for a confirmatory phase 3 clinical trial to receive one of two doses of an antidepressant or the placebo control, due to concerns of potential bias in sequential parallel design with possible unbalanced known or unknown prognostic factors at baseline. In an article by the FDA staff, Chen et al. (2011) note that in sequential parallel designs, treatment assignment in period 2 is predetermined at the randomization of period 1 and raise concerns of bias “due to randomness and possible unbalanced dropouts among placebo non-responders” prior to period 2. They also show through theoretical development and simulation work that there is no need for the complex, seemingly unrelated regression analysis proposed by Tamura and Huang (2007) when rerandomization of placebo nonresponders for the second period is in place. As mentioned earlier, the proposed doubly randomized delayed-start design is by nature an adaptive design, which is different from the nonadaptive design by Chen et al. (2011). For applications in nonadaptive trial settings, the proposed design is also distinct in terms of the statistical method, sample size calculation, and flexibility in randomization. This article uses antidepressant drug development to illustrate the basic construction of clinical trials with the doubly randomized delayed-start design. Therefore, throughout the remaining article, we only consider enrichment with placebo nonresponders. 
The general theory and methods apply to both clinical settings. We show in section 3.1 that enrichment prior to the second randomization can greatly enhance the usefulness of the clinical outcomes for the resulting placebo nonresponders. Specifically, compared to patients randomized in the first period, Downloadedby[AmgenInc]at11:0206August2015
  • 5. 740 LIU ET AL. clinical outcomes of the placebo nonresponders for the second period are not only far less variable but also have a higher correlation. As a result, the placebo nonresponders are expected to be more sensitive to treatment with an effective new drug, or to be nonresponsive if they are randomized to continue with placebo. Central to the proposed design is the second randomization. The enrichment process can introduce unexpected selection bias in comparing the effect of the new drug with placebo for the placebo nonresponders if the second randomization is not put in place. Therefore, this randomized delayed-start enrichment for the second period would effectively address the placebo response issue and, consequently, increase the effect size while avoiding the potential bias in the statistical analysis and clinical conclusions. In section 2, we develop an optimal combination test statistic, as well as a simple closed-form formula for sample size calculation. Both fully take advantage of the improved sensitivity and predictability of the enrichment. In section 3.2, we describe a conditional analysis of covariance (ANCOVA) model for each period. We construct the test statistics for the two periods, which are then used in the optimal combination test statistic for hypothesis testing. The proposed test method only relies on a standard ANCOVA, and therefore, no special statistical procedure (e.g., the seemingly unrelated regression analysis) is needed. For settings with binary or time-to-event endpoint, we use the standard logistic regression model or the Cox proportional hazards model for analysis. As alluded to earlier, the second randomization also brings in additional benefits that are not possible with the sequential parallel design. As shown in section 2.2, the proposed doubly randomized delayed-start design is naturally adaptive and utilizes the combination test statistic proposed by Cui et al. (1999). Following the general measure-theoretic framework of Liu et al. (2002) for adaptive designs, we provide the justification for the proposed design in the appendix for any type of clinical endpoints, not just for continuous endpoints following normal distributions. For clinical development in general, this approach can be used for early clinical programs such as (Phase 2a) proof-of-concept trials or (Phase 2b) dose- finding trials, as well as confirmatory Phase 2b/3 combination or Phase 3 trials, which may include adaptive features such as sample size modification, or adaptive dose-finding. The design can increase the probability of trial success via a reduced sample size as compared to a standard parallel group design. In addition, the design can be further expanded to include a randomized withdrawal for patients who are randomized to receive the new drug in the first period (see the brief description in section 6). Because of all these features, the doubly randomized delayed-start design can substantially increase the efficiency (i.e., in cost and resource) and effectiveness (i.e., in probability of technical and regulatory success) of antidepressant clinical development. In section 4.1 we provide an illustrative example of a proof of concept trial where a single dose of a new drug is compared to placebo. We also compare the required sample size to that of the sequential parallel design, as well as the traditional parallel group design. 
In section 4.2, we also present a Phase 2b trial with two doses of a new drug and illustrate how to apply the proposed sample size procedure as well as existing multiple trend test and closed testing procedures. In section 5, we provide simulation studies to confirm the adaptive measure-theoretic theory for the combination test and, in addition, show that the weights in the Downloadedby[AmgenInc]at11:0206August2015
  • 6. DOUBLY RANDOMIZED DELAYED-START DESIGN 741 combination test can also depend on the actual sample size randomized to each period without inflating the type 1 error rates. The proposed design avoids the potential bias that may be introduced if the placebo nonresponders in the placebo–drug and placebo–placebo sequences are not comparable. At a minimum, the proposed design is more efficient and offers greater inferential interpretability. Because of its adaptive nature, the design also offers greater flexibility that benefits clinical development. 2. DESIGN 2.1. Description To develop the theory and method, we consider the simple setting where patients are randomized in both periods to receive either one dose of the new drug or placebo. We then illustrate in section 4.2 how to apply the theory and method for an application with two doses of a new drug. The design consists of two periods. At the beginning of period 1, patients are assessed for their baseline variables and then randomized to receive either placebo or treatment(s) with the new drug. Patients are then treated during the first period. During the first period, patients are allowed to drop out due to lack of efficacy or safety concerns. Patients who are treated with the new drug in the first period may continue with their treatment(s) in the second period. At the end of the first period, patients are evaluated for their response to the assigned treatment and are classified as responders or nonresponders. For the second period, patients who received placebo and have not dropped out in the first period and who are nonresponders at the end of the first period are randomized to receive the treatment with the new drug or to continue on placebo in the second period. The rerandomized placebo patients are then treated during the second period and at the end of the second period, they are evaluated for their final clinical outcomes. Note that the durations of the two periods may be the same but it is not required. Also, the randomization ratios may not be balanced. 2.2. General Theory For each period, patients are evaluated with respect to a clinical endpoint. The endpoints for the two periods are not required to be the same for the proposed design. Let 1 be the parameter comparing the onset effect of the new drug to placebo with the period 1 endpoint. For the second period, let 2 be the parameter with the period 2 endpoint for comparing the delayed-start effect of the new drug to placebo. We are interested in testing against the global null hypothesis 0 1 ≤ 0 and 2 ≤ 0 in favor of the alternative hypothesis A 1 > 0 or 2 > 0 For 1 and 2, the test statistics against the individual null hypothesis 01 1 ≤ 0 and 02 2 ≤ 0 are denoted by Z1 and Z2, respectively. To establish the efficacy of Downloadedby[AmgenInc]at11:0206August2015
  • 7. 742 LIU ET AL. the new drug, we combine Z1 and Z2 via Z = 1 1/2 Z1 + 2 1/2 Z2 (1) for certain prespecified weights 1 and 2 such that 1 + 2 = 1. Note that the combination test statistic is widely used for two-stage adaptive designs in the literature (see Cui et al., 1999) with a single randomization. The difference here is that Z1 and Z2 correspond to two different randomizations. In the following, we define Z1 and Z2, and establish that the test Z ≥ z , for which z is the critical value of the standard normal distribution at the significance level , controls the type 1 error rate at . Let be the -field of the first period data from all randomized patients. Assume for testing against the null hypothesis 1 ≤ 0 in favor of the alternative hypothesis 1 > 0, there is a p-value p1, which is measurable, such that PH01 p1 ≤ ≤ for all ∈ 0 1 where H01 1 = 0. Let · be the cumulative distribution function of the standard normal distribution. Define Z1 to be the normal inverse of p1, that is, Z1 = −1 1 − p1 . Then under the null hypothesis H01, the normal inverse test statistic Z1 is not stochastically larger than a standard normal distribution. The construction of Z2 is more involved as the trial involves a selection process for patients who are randomized to receive placebo in the first period. Let be the -field of the first period data of all patients who are randomized to receive placebo in the first period. Let g represent the process for selecting patients from those who are randomized to receive placebo in the first period to be randomized for the second period. Following the description of the design, g involves excluding patients for various reasons of dropout or patients whose outcome meet the criteria for treatment responders at the end of the first period. In general, g cannot be fully specified. However, it can be assumed that g is a measurable function with range M whose elements represent various choices of subsets of patients to be rerandomized. In Liu et al. (2002), g is known as an adaptation rule. For each m ∈ M, let p2m be a p-value for testing against the null hypothesis 02 2 ≤ 0 in favor of the alternative hypothesis A2 2 > 0, such that PH02 p2m ≤ C ≤ C for any measurable function C ∈ 0 1 where H02 2 = 0. Following Liu et al. (2002), the adaptive p-value is given by p2g = m∈M p2mI g=m (2) where I g=m is the indicator for the event g = m . By the adaptation theory of Liu et al. (2002), we establish the following theorem. Theorem. (a) PH02 p2g ≤ C ≤ C. (b) Let Z2 = −1 1 − p2g . Then under the null hypothesis H0 1 = 2 = 0, the test statistic Z = 1 1/2 Z1 + 2 1/2 Z2 (3) is not stochastically larger than the standard normal distribution. Downloadedby[AmgenInc]at11:0206August2015
  • 8. DOUBLY RANDOMIZED DELAYED-START DESIGN 743 The proof of the theorem is given in the appendix. Following the theorem, the combination test statistic Z can be used to test against the null hypothesis 0 in favor of a more specific alternative hypothesis at a specified significance level . By construction, the type 1 error rate of the combination test is controlled even though the sample size for the second period is random and is irrespective of whether dropout at the end of period 1 is informative or not. The results of the theorem remain valid even if the selection process g is expanded to depend on comparative first period data, that is, g is expanded to be a measurable function. The theorem relies on the p-values p1, and p2m for m ∈ M, that are not stochastically smaller than uniform distributions. This allows various types of endpoints, including continuous, binary, and time-to-event endpoints. Therefore, for clinical investigations in general, this approach can be used for confirmatory Phase 3 trials, as well as for early clinical development such as (Phase 2a) proof-of-concept trials or (Phase 2b) dose- finding trials. Either an asymptotic or randomization-based exact justification for this assumption rests on randomizations at the beginning of each period. Without the second randomization, such as the design by Fava et al. (2003), there is no guarantee of uniformity of p2m for m ∈ M. For continuous endpoints following normal distributions, Chen et al. (2011) develop a test procedure based on a weighted average of estimates for designs incorporating double randomizations. They show that the estimates from two periods have (asymptotic) zero correlation under a constancy assumption of the correlation between the continuous endpoint measures. The proof also requires that the sample size for each period is fixed in advance. With the proposed doubly randomized delayed-start design, the lack of the constancy assumption of the correlation as well as lack of the normality assumption can be easily handled with exact randomization tests. Also note that the theorem assumes the weights k for k = 1 2 are prespecified. This assumption, however, can be relaxed to allow the weights to depend on blinded first period data. We can easily justify this following the blinded adaptation theory by Liu and Chi (2010a). The prespecified weights k for k = 1 2 are given in section 2.3. Standard analysis for two-group comparisons provides estimates of the onset and delayed-start effects, 1 and 2. 2.3. Sample Size Let rk be the randomization ratio of patients receiving placebo to patients receiving the treatment with the new drug for period k where k = 1 2. The numbers of patients for the treatment group with the new drug and the placebo group are nk and rknk, respectively. Assume that for k = 1 or 2, Zk follows asymptotically a normal distribution with mean E Zk = nkRk 1/2 k for Rk = rk/ 1 + rk , k = k/ k with some standard deviation k, and variance Var Zk = 1. With this canonical form (see Jennison and Turnbull, 2000, p. 49), we provide a uniform approach to sample size calculation. For a normally distributed endpoint, Downloadedby[AmgenInc]at11:0206August2015
  • 9. 744 LIU ET AL. k is the standard deviation of the normal distribution. For a binary endpoint, k is the standard deviation based on the pooled success rate of the two treatment groups. Fava et al. (2003) provide the sample size calculation for the sequential parallel design with binary endpoints; however, there is no sample size calculation method for continuous endpoints; determination of sample size has to rely on time- consuming simulation studies (Tamura and Huang, 2007). Under an alternative hypothesis, Z1 and Z2 may not be independent in general. To simplify sample size calculation, we follow Chen et al. (2011) and assume the correlation coefficients between measurements at the first and second periods are identical for both treatment groups. Then Z1 and Z2 are asymptotically independent. As a result, the test statistic Z given in equation (3) follows an asymptotic normal distribution with mean E Z ≈ 1/2 1 n1R1 1/2 1 + 1/2 2 n∗ 2R2 1/2 2 (4) and variance Var Z = 1, where n∗ 2 is the expected number of placebo nonresponders who are randomized to receive the treatment with the new drug. Now assume further that k ≥ 0 for k = 1 2. Then maximizing the power of the test Z ≥ z is equivalent to maximizing the expectation E Z given in equation (4) with respect to k for k = 1 2 subject to the constraint 1 + 2 = 1. This leads to optimal weights ∗ k for k = 1 2. By the method of Lagrange multipliers, we obtain ∗ 1 = n1R1 2 1 n1R1 2 1 + n∗ 2R2 2 2 (5) and ∗ 2 = 1 − ∗ 1. Let be the rate of attrition due to dropout or exclusion of responders of the placebo patients from period 1. Then, for the second period, the expected number of placebo nonresponders who are randomized to receive the treatment with the new drug is n∗ 2 = r1n1 1 − / 1 + r2 and the expected number of placebo nonresponders who are randomized to continue on placebo is r2n∗ 2 = r1r2n1 1 − / 1 + r2 Using the optimal weights given in equation (5), E Z in equation (4) becomes E Z = n1R 1/2 1 (6) for R = R1 1 + R2 1 − R2 / 1 − R1 1 − 2 Downloadedby[AmgenInc]at11:0206August2015
  • 10. DOUBLY RANDOMIZED DELAYED-START DESIGN 745 where = 2/ 1. For given type 1 and 2 error rates and , the required sample size n1 follows the equation n1R = z + z 2 / 2 1 (7) where z and z are critical values of the standard normal distribution at and . Note that the n∗ 2 and r2n∗ 2 are expected numbers of placebo nonresponders to be randomized to the new drug or placebo. The actual number of placebo nonresponders to be randomized, that is, n2 1 + r2 , is a random variable. As this variability is not accounted for in the sample size calculation procedure, it is expected that the actual power is less than the nominal level. This is illustrated in section 5.3 through simulation studies. An ad hoc fix for this problem is to identify the amount of power loss through simulation studies under different designs and then use an adjusted power, which is also illustrated in section 5.3. 3. STATISTICAL MODELS 3.1. Enrichment with Nonresponders Enrichment with nonresponders is an appealing clinical concept as patients who are responders tend to continue to remain as responders and, as a result, reduce the ability to detect a treatment difference of an effective drug from placebo (Temple, 1994). However, less is known about the characteristics of the nonresponders in terms of their clinical outcome trajectory, the variability, and the correlation between clinical outcome measures. We present results of an analysis of the 17-item Hamilton Depression Rating Scale (HAMD17) total scores of placebo patients from a randomized, double-blind parallel group trial to study the efficacy and safety of a new drug in patients with major depressive disorder (MDD). A balanced randomization was planned with 75 patients allocated to each treatment group. Study duration included 1 week of screening, 1 week of washout, 7 weeks of treatment with patients monitored weekly, and 1 week of follow-up. While the primary endpoint for the study was week 7 change from baseline in MADRS total score, the week 7 change from baseline in HAMD17 total score was a key secondary endpoint. This dataset is used to illustrate the design of the proof-of-concept trial in section 4.1. In Table 1, we provide the means and standard deviations of changes in HAMD17 total scores and correlations between HAMD17 total scores for all randomized placebo patients as well as placebo nonresponders at week 4. For all randomized patients, a change in the HAMD17 total score is calculated as the difference between the weekly HAMD17 total score and baseline. A treatment nonresponder is defined as a patient with the HAMD17 total score greater than 18 at week 4. For treatment nonresponders, a change in the HAMD17 total score is the difference of the HAMD17 total scores at week 5, 6, or 7 from week 4. Pearson’s correlation coefficient is used to assess potential relationships between the baseline or week 4 and subsequent week HAMD17 total score. A full analysis of this dataset was performed, as mentioned later in sections 4.1 and 5.2. Due to its scale, it was decided to not present it here in this article. Downloadedby[AmgenInc]at11:0206August2015
  • 11. 746 LIU ET AL. Table 1 Properties of enrichment with placebo nonresponders All patients Nonresponders Week N Mean SD N Mean SD 1 69 −4 1 4.2 .504 2 71 −7 1 6.3 .452 3 71 −9 3 7.3 .356 4 71 −10 5 7.6 .245 5 71 −11 7 8.1 .062 33 −1 76 4.1 .582 6 71 −11 3 7.7 .191 33 − 91 3.5 .554 7 71 −12 4 8.3 .184 33 −1 42 3.8 .628 It is seen that for all placebo patients the mean change score decreases over time. This observation is consistent with widely known reports of analysis databases conducted by the FDA. A reference of the reports is provided in Chen et al. (2011). In addition to the decrease of the mean change scores, we also see that the standard deviation increases over time while the correlation decreases. More than 50% of the patients responded to treatment with placebo at week 4. By enriching with placebo nonresponders, the mean change scores are stabilized, the standard deviations are reduced by more than 50%, and the correlations are increased by roughly 200% or more. While this dramatic result testifies to the essence of enrichment with placebo nonresponders, it also raises concerns of the selection bias in designs (e.g., the sequential parallel design) where rerandomization to treatment with the drug or placebo is not in place for placebo nonresponders. This is the case when the initial randomization fails to balance known or unknown prognostic factors, which is exacerbated during period 1 by differential and informative dropouts (Chen et al., 2011). We note that an assessment of the extent of bias is not available, as simulation studies of the sequential parallel design are only limited to settings without an imbalance of the prognostic factors. 3.2. Conditional ANCOVA Model The general theory developed in section 2.2 does not impose a specific distribution requirement on the endpoint in question. To reflect the trial examples given in section 4, we construct statistical models that allow continuous endpoints whose measures can be analyzed via a traditional ANCOVA. We show how to use the models to derive the standard deviations k for k = 1 2 in section 2.3 for sample size calculation. The models also provide the basis for simulation studies to evaluate the operating characteristics of the doubly-randomized design. The ANCOVA approach can easily be extended to settings with more sophisticated random effect models for continuous endpoints, logistic regression models for binary endpoints, or Cox’s proportional hazards model for time-to-event endpoints. Let the random vector Y0 Y1 Y2 represent the patient measurements at baseline, period 1, and period 2, respectively. The baseline measure Y0 follows a normal distribution with mean 0 and standard deviation . To reflect patient entry criteria, Y0 is truncated by a lower cutoff yL from below. Thus, for patients who are Downloadedby[AmgenInc]at11:0206August2015
enrolled into the trial, the resulting baseline measure $Y_0$ follows a truncated normal density. For periods 1 and 2, we use conditional normal distributions for $Y_1$ given $Y_0 = y_0$ and for $Y_2$ given $Y_1 = y_1$. This serves two purposes. First, patients must meet entry criteria (i.e., baseline $Y_0$ above $y_L$, or being nonresponders) to be randomized in each period; the conditional distributions are unaffected by these entry criteria. Second, because the analysis for each period is based on the change score from the respective baseline, the conditional normal distributions ensure that the change scores are also conditionally normally distributed. As shown below, the conditional normal models motivate the traditional ANCOVA.

Let $Y_{1i}$ be the response for period 1, where $i = 0$ or $1$ indicates that the patient receives placebo or the new drug. Assume that, without the truncation of $Y_0$ by the entry criteria, $Y_0$ and $Y_{1i}$ follow a bivariate normal distribution with mean vector $(\mu_0, \mu_{1i})$, correlation $\rho_1$, and variances $\sigma_0^2$ and $\sigma_1^2$. It is well known that the conditional distribution of $Y_{1i}$ given $Y_0 = y_0$ is

$$Y_{1i} \mid y_0 \sim N\big(\mu_{1i} + \rho_1 (y_0 - \mu_0)\,\sigma_1/\sigma_0,\; \sigma_1^2 (1 - \rho_1^2)\big). \qquad (8)$$

Let $\Delta Y_{1i} = Y_{1i} - Y_0$; then

$$\Delta Y_{1i} \mid y_0 \sim N\big(\mu_{1i} + \rho_1 (y_0 - \mu_0)\,\sigma_1/\sigma_0 - y_0,\; \sigma_1^2 (1 - \rho_1^2)\big). \qquad (9)$$

During period 1, patients can drop out owing to lack of efficacy or safety concerns. At the end of period 1, the placebo nonresponders are rerandomized to receive the new drug or placebo for period 2. To ensure that this process does not introduce selection bias into the inference, we consider the conditional distribution of $Y_{2j}$ given $Y_{10} = y_{10}$, where $j = 1$ or $0$ indicates whether the patient is rerandomized to the new drug or to placebo. Similarly, assume that, without the truncation of $Y_{10}$ by the reentry criteria, $Y_{10}$ and $Y_{2j}$ follow a bivariate normal distribution with mean vector $(\mu_{10}, \mu_{2j})$, correlation $\rho_2$, and variances $\sigma_1^2$ and $\sigma_2^2$. Then

$$Y_{2j} \mid y_{10} \sim N\big(\mu_{2j} + \rho_2 (y_{10} - \mu_{10})\,\sigma_2/\sigma_1,\; \sigma_2^2 (1 - \rho_2^2)\big). \qquad (10)$$

For $\Delta Y_{2j} = Y_{2j} - Y_{10}$,

$$\Delta Y_{2j} \mid y_{10} \sim N\big(\mu_{2j} + \rho_2 (y_{10} - \mu_{10})\,\sigma_2/\sigma_1 - y_{10},\; \sigma_2^2 (1 - \rho_2^2)\big). \qquad (11)$$

The conditional models in equations (9) and (11) provide the basis for the conditional ANCOVA models used to construct the test statistics $Z_1$ and $Z_2$ for the combination test in equation (3). For the first period, $\delta_1 = \mu_{11} - \mu_{10}$ is the treatment effect of the new drug relative to placebo. Equation (9) suggests a conditional ANCOVA model for $\Delta Y_{1i}$ that includes a treatment contrast for $\delta_1$, the baseline measure $y_0$, and other potential factors as predictors. It is immediately apparent that the conditional ANCOVA model, following the conditional model in equation (9), reduces the standard deviation to $\tau_1 = \sigma_1 (1 - \rho_1^2)^{1/2}$, resulting in a more powerful analysis. In case the randomization for the first period does not balance the baseline values $y_0$, the conditional ANCOVA model also reduces bias in the inference.
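The following sketch simulates period 1 data from the conditional normal model (8) and fits the conditional ANCOVA on the change score to obtain the Wald statistic for the treatment contrast. The baseline mean, standard deviation, entry cutoff, and effect size are illustrative assumptions; $\sigma_1 = 8$ and $\rho_1 = .25$ echo the proof-of-concept example in section 4.1, and statsmodels OLS stands in for whatever software the authors used.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2024)
mu0, sigma0, y_L = 22.0, 6.0, 18.0            # baseline mean, SD, entry cutoff (assumed)
mu10, sigma1, rho1, delta1 = 20.0, 8.0, 0.25, 3.0

n_per_arm = 56
# Baseline values truncated from below by the entry criterion y0 >= y_L.
y0 = rng.normal(mu0, sigma0, 10 * n_per_arm)
y0 = y0[y0 >= y_L][: 2 * n_per_arm]
trt = np.repeat([0, 1], n_per_arm)            # 0 = placebo, 1 = new drug

# Conditional normal draw for Y1 given y0, following equation (8).
mu1 = np.where(trt == 1, mu10 + delta1, mu10)
cond_mean = mu1 + rho1 * (y0 - mu0) * sigma1 / sigma0
y1 = rng.normal(cond_mean, sigma1 * np.sqrt(1 - rho1 ** 2))

# Conditional ANCOVA on the change score with baseline and treatment contrast.
change = y1 - y0
X = sm.add_constant(np.column_stack([y0, trt]))
fit = sm.OLS(change, X).fit()
z1 = fit.params[2] / fit.bse[2]               # Wald statistic for the treatment contrast
print(f"estimated delta1 = {fit.params[2]:.2f}, Z1 = {z1:.2f}")
```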
The test statistic $Z_1$ for $\delta_1$ can be based on the Wald statistic for the treatment contrast, which is easily constructed from its reported point estimate and standard error.

The parameter of interest for the second period is $\delta_2 = \mu_{21} - \mu_{20}$. The conditional ANCOVA model for $\Delta Y_{2j}$ includes the treatment contrast for $\delta_2$, the baseline measure $y_{10}$, and other potential factors. It is for this conditional ANCOVA model that the need for the second randomization stands out: any imbalance in either $y_0$ or $y_{10}$ is adjusted for, providing the basis for an unbiased inference on $\delta_2$. The sequential parallel design of Fava et al. (2003) lacks this critical feature. It is unclear from Tamura and Huang (2007) whether a change score from the period 1 baseline is used for both periods, or whether the seemingly unrelated regression model necessarily includes any baseline values. Chen et al. (2011) do not explicitly define the endpoint; however, it is clear that their ANCOVA model includes the period 2 baseline values when placebo nonresponders are rerandomized.

An important benefit of the enrichment with placebo nonresponders is also reflected in the conditional ANCOVA model for period 2, for which the standard deviation is dramatically reduced to $\tau_2 = \sigma_2 (1 - \rho_2^2)^{1/2}$ because of the large increase in the correlation from $\rho_1$ to $\rho_2$ (see section 4). Note that the standard deviations for the conditional ANCOVA models of periods 1 and 2 are $\tau_1 = \sigma_1 (1 - \rho_1^2)^{1/2}$ and $\tau_2 = \sigma_2 (1 - \rho_2^2)^{1/2}$, respectively. These standard deviations, rather than those of unadjusted change-from-baseline scores, are used to calculate the sample sizes for the examples in section 4.

4. APPLICATION

4.1. Proof of Concept

Clinical trials in mood disorders are known to have large failure rates, mainly because a large percentage of patients in the trial respond to placebo treatment. With a doubly randomized delayed-start design, the effect of a new drug can be further evaluated in placebo nonresponders. The design consists of a 2-week placebo lead-in phase and a drug testing phase with two 4-week periods, where both phases are double-blind. The purpose of the placebo lead-in phase is to screen out potential placebo responders. This is important because, without the placebo lead-in, the placebo response rate at the end of period 1 (or week 4), as seen in Table 1, can be as high as 50%. In the drug testing phase, eligible patients are randomized in a one-to-one ratio in period 1 to receive either the new test drug or placebo. Treatment in the second period is based on the patient's period 1 treatment and response status. At the end of period 1, all patients are evaluated for efficacy and safety. In particular, patients are classified as responders if their HAMD17 total scores are 18 or less. For period 2, the one-to-one ratio is also used to rerandomize placebo nonresponders to receive the new test drug or to continue on placebo. Efficacy is based on the change from baseline in the HAMD17 total score for the new test drug compared with placebo after 4 weeks of treatment in each period. This is carried out with the conditional ANCOVA models, each including the respective baseline HAMD17 total score and the treatment contrast between the new test drug and placebo. The combination test in equation (3) is then used to detect a possible efficacy signal in the HAMD17 total score at the one-sided type 1 error rate $\alpha = .1$.
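Equation (3) is defined earlier in the article and is not reproduced in this excerpt. The sketch below assumes a weighted inverse-normal combination $Z = w_1^{1/2} Z_1 + w_2^{1/2} Z_2$ with $w_1 + w_2 = 1$, consistent with the weighted form that appears in the Appendix; the weight and the period-wise statistics used in the example call are placeholders.

```python
import numpy as np
from scipy.stats import norm

def combination_test(z1: float, z2: float, w1: float, alpha: float = 0.1) -> bool:
    """Weighted inverse-normal combination of the two period-wise Wald statistics.

    Assumes Z = sqrt(w1)*Z1 + sqrt(w2)*Z2 with w2 = 1 - w1; Z1 and Z2 come from
    the conditional ANCOVA fits described in section 3.2. One-sided test at alpha.
    """
    w2 = 1.0 - w1
    z = np.sqrt(w1) * z1 + np.sqrt(w2) * z2
    return z >= norm.ppf(1.0 - alpha)

# Hypothetical period-wise statistics; .4857 is the first-period weight
# reported for the proof-of-concept example in section 4.1.
print(combination_test(z1=1.40, z2=2.10, w1=0.4857, alpha=0.1))
```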
The design parameters for the sample size calculation are based on a full analysis of the MDD dataset used in section 3.1, followed by hierarchical longitudinal disease modeling. The resulting models provide, among other things, the mean and standard deviation of the HAMD17 total score for each week and the correlations of the HAMD17 total scores between weeks. Through this modeling, it is determined that 30% of the patients would respond by the end of 2 weeks after enrollment, which suggests a 2-week placebo lead-in phase; this reduces the responder rate for the drug testing phase. The placebo responder rate for period 1 is also 30%. We assume for the first period a treatment difference of $\delta_1 = 3$ in the mean change from baseline in HAMD17 total score between the new test drug and placebo. The full analysis also suggests an increased effect size of at least 1 point (i.e., $\Delta = 1$) in placebo nonresponders in period 2; thus, we choose $\delta_2 = \delta_1 + \Delta = 4$. With $\sigma_1 = 8$ and $\rho_1 = .25$, $\tau_1 = \sigma_1 (1 - \rho_1^2)^{1/2}$ is approximately 7.75. For the second period, $\sigma_2 = 9$ and $\rho_2 = .8$, which leads to $\tau_2 = \sigma_2 (1 - \rho_2^2)^{1/2} \approx 5.5$. Based on a literature review, we use 10% as the dropout rate during period 1. With the responder rate of 30% for period 1, the attrition rate is .4. To achieve a minimum of 90% power, we use a 95% nominal power, for which the required total number of patients to be randomized for period 1 is 112. Based on a simulation study with 20,000 runs, the simulated power is .92925 with the combination test that uses adaptive weights (see section 5). The optimal weight for the first period in the combination test is .4857.

To determine the sample size of the sequential parallel design of Fava et al. (2003), we performed a series of simulation studies as suggested by Tamura and Huang (2007). For the seemingly unrelated regression analysis, we used the change from the respective baseline HAMD17 total score as the dependent variable. Because of the rerandomization, the correlation between the dependent variables of the two periods is set to zero, following Chen et al. (2011). The required sample size is 128 using the midpoint .7 of the weight range .6 to .8. In comparison, the required total sample size of a parallel-group design with a treatment difference of 3 points and a standard deviation of 7.75 would be 225 patients after adjusting for 10% dropouts.
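As a quick arithmetic check of the two adjusted standard deviations quoted above, the following snippet applies the formula $\tau_k = \sigma_k (1 - \rho_k^2)^{1/2}$ from section 3.2 to the parameter values stated in the text; only the computation itself is added here.

```python
import numpy as np

# Period-wise ANCOVA standard deviations, tau_k = sigma_k * sqrt(1 - rho_k^2),
# using the values quoted in section 4.1.
sigma1, rho1 = 8.0, 0.25
sigma2, rho2 = 9.0, 0.80
tau1 = sigma1 * np.sqrt(1 - rho1 ** 2)
tau2 = sigma2 * np.sqrt(1 - rho2 ** 2)
print(f"tau1 = {tau1:.2f}")   # about 7.75, as stated
print(f"tau2 = {tau2:.2f}")   # about 5.4, quoted as roughly 5.5 in the text
```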
4.2. Dose Finding

Consider a Phase 2b dose-finding trial in patients with MDD in which two doses of a new test drug are compared with a placebo control. The design again consists of a 2-week placebo lead-in phase and a drug testing phase with two 4-week periods, and both phases are double-blind. In the drug testing phase, eligible patients are randomized in a 1:1:2 ratio into period 1 to receive one of the two doses of the new test drug or placebo. Treatment in the second period is based on the patient's period 1 treatment and response status. At the end of period 1, all patients are evaluated for efficacy and safety. In particular, patients are classified as responders if there is a 50% or greater reduction in their Montgomery–Asberg Depression Rating Scale (MADRS) total scores. Placebo nonresponders from period 1 are rerandomized in a 1:1:1 ratio to receive one of the two doses of the new test drug or to continue with placebo. A key feature of this design is the use of an unequal randomization ratio during the first period of the double-blind phase. Based on past clinical trial
experience, designs using an equal randomization ratio for multiple doses tend to increase the placebo response rate, largely because of the high probability of receiving a potentially effective new test drug. For the current trial, the primary endpoint for each period is the change from the respective baseline in the MADRS total score. With the 1:1:2 randomization ratio for the first period, the treatment difference in the primary endpoint between a dose of the new test drug and placebo is assumed to be 4.5 points; in contrast, a smaller treatment difference of 4 points would be used if the randomization ratio were 1:1:1 (Sinyor et al., 2010). For the second period, we still use a treatment difference of 4.5, because the randomization ratio is expected to play a much smaller role in the placebo response rate for the enriched placebo nonresponders. The standard deviations used for periods 1 and 2 are 9 and 6, respectively. Again, we assume that 30% of placebo patients are responders at the end of period 1 and that 10% of patients drop out during period 1. For the Phase 2b trial, we choose the one-sided type 1 error rate $\alpha = .05$ in order to control the probability of failure of the Phase 3 program.

An objective of the Phase 2b trial is to identify the dose(s) to carry forward into a Phase 3 clinical development program. Thus, an analysis comparing each individual dose to placebo is necessary. The sample size is therefore based on a pairwise comparison of a dose of the new test drug with placebo at the type 1 error rate $\alpha = .05$ with 95% nominal power. The resulting sample size for the first randomization is 33 patients for each dose of the new test drug and 66 patients for placebo. The weight for the first period is .4969.

To control the multiple type 1 error rates at the specified $\alpha = .05$ level, we use the following closed testing procedure. We first perform an overall test of the global null hypothesis that neither dose is effective (compared with placebo) at the $\alpha = .05$ level. If the global null hypothesis is rejected, we then test the individual null hypothesis that a particular dose of the new test drug is not effective, also at the $\alpha = .05$ level. The overall test employs the combination test given in equation (3), for which triple trend test statistics (Capizzi et al., 1992) are used for both periods. For each triple trend test statistic, the trend scores are derived from three different sigmoid Emax models; each trend statistic is based on the ANCOVA model with the baseline MADRS total score and the trend score. We use the same weight, .4969, derived from the pairwise sample size calculation already shown, to combine the two triple trend test statistics. For the pairwise comparison, the ANCOVA model includes the baseline MADRS total score and the treatment contrast between the dose in question and placebo. The pairwise test statistics from the two periods are then combined via equation (3) with the first-period weight .4969 to test the individual null hypothesis.

This example illustrates the flexibility of the doubly randomized delayed-start design, which is not present in the previous example. There are two important design features: first, there are only three treatment groups in each period; second, different randomization ratios are employed in the two periods to take full advantage of the enrichment with placebo nonresponders.
Analytically, we can apply the closed testing procedure to control the multiple type 1 error rates, and we can easily apply the existing triple trend test to test the global null hypothesis.
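A minimal sketch of the closed testing logic described above, with placeholder p-values. The global p-value would come from the combined triple trend statistics and the pairwise p-values from the combined dose-versus-placebo ANCOVA contrasts; neither computation is reproduced here.

```python
from typing import Dict

def closed_test(p_global: float, p_dose: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Closed testing procedure sketched in section 4.2.

    A dose is declared effective only if the global null hypothesis (neither
    dose effective) is rejected at level alpha AND the dose's own pairwise
    null hypothesis is rejected at level alpha.
    """
    if p_global > alpha:
        return {dose: False for dose in p_dose}
    return {dose: p <= alpha for dose, p in p_dose.items()}

# Hypothetical p-values for a two-dose trial.
print(closed_test(p_global=0.012, p_dose={"low dose": 0.060, "high dose": 0.018}))
```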
In comparison, the sequential parallel design of Fava et al. (2003) does not have these features. Five treatment sequences would be needed, which requires a large randomization block. There is also a limitation in choosing the randomization ratio, and there is no existing procedure for triple trend tests using the seemingly unrelated regression analysis.

5. SIMULATION STUDIES

5.1. Principle

Monte Carlo simulation plays an important role in clinical trial design and analysis. It can be used to verify results established by statistical theory or to evaluate operating characteristics that are difficult to study theoretically. The latter includes either investigations of the robustness of the trial design and analysis or an acknowledgment of the limitations of the simulation studies. Unfortunately, there have been many instances of abuse or fraud in practice, which are documented in great detail by Liu and Chi (2010b).

Prior to carrying out simulation studies, the first task is to construct a simulation model, which consists of two critical components: (i) a set of procedures that mimics the trial design and strictly follows the prespecified statistical analysis of the proposed methods, and (ii) a set of mathematical models that approximates the state of nature. Without (i), it is difficult to regard the simulation results as relevant to the trial design or analysis in question; as the trial design and analysis are specified, it is both possible and necessary to construct this set of procedures in great detail. For (ii), it is easy to specify distribution models that follow the assumptions required by the statistical theory; the challenge is to come up with models for the state of nature, which is not fully understood and is difficult to predict. With this conviction in mind, we construct our simulation model for the simplest design described in section 2.1, comparing treatment with a single dose of the new drug to placebo.

5.2. Simulation Model

For the doubly randomized delayed-start design, our simulation model is constructed in the following steps, which reflect the actual programming code (a condensed sketch of the resulting simulation loop is given after the list).

1. As seen in section 3.1 and alluded to in section 4.1, we first performed a full analysis of a historical dataset from a double-blind, placebo-controlled trial of a new test drug and an active control with an existing drug. We then fit the data with a set of hierarchical longitudinal models, from which important parameters for the doubly randomized delayed-start design are derived. These parameters include the placebo means ($\mu_0$, $\mu_{10}$, and $\mu_{20}$), the standard deviations ($\sigma_0$, $\sigma_1$, and $\sigma_2$), the correlation coefficients ($\rho_1$ and $\rho_2$), and the increased treatment effect for placebo nonresponders.

2. For the placebo lead-in phase, we generate a large number of baseline values $y_0$ from a normal distribution with mean $\mu_0$ and standard deviation $\sigma_0$. To qualify for randomization into period 1 of the drug testing phase, we subset the baseline values according to the nonresponder criterion $y_0 \ge y_L$. From the subset,
we randomly select $n_{11}$ values for the treatment group and $n_{10} = r_1 n_{11}$ values for the placebo group of period 1.

3. For the active treatment group in period 1, we generate $n_{11}$ scores $y_{11}$ from the conditional normal distribution in equation (8) with $\mu_{11} = \mu_{10} + \delta_1$ for a specified value of the effect size $\delta_1$; for the placebo group, we generate $n_{10}$ scores $y_{10}$ from the conditional normal distribution in equation (8) with mean $\mu_{10}$.

4. For the period 1 analysis, we use the conditional ANCOVA model for the change scores $y_{11} - y_0$ or $y_{10} - y_0$, with the baseline value $y_0$ and the treatment contrast for the new test drug versus placebo as predictors. From the ANCOVA analysis, we extract the estimate and standard error and compute the Wald test statistic $Z_1$.

5. For period 2, we select the period 1 subset of placebo scores $y_{10}$ by the nonresponder criterion $y_{10} \ge y_L$. This subset and its size are denoted by $S^*_{10}$ and $n^*_{10}$, respectively. Let $d$ be the dropout rate of the placebo nonresponders. We generate a number of dropouts, say $n_d$, from a binomial distribution with size $n^*_{10}$ and probability $d$. We then randomly select $n^*_{10} - n_d$ scores $y_{10}$ from $S^*_{10}$, which are used as baseline values for period 2; the resulting subset is denoted by $S_{10}$. From $S_{10}$ we randomly select $n_{21}$ scores $y_{10}$ for the period 2 treatment group with the new drug, and the remaining $n_{20}$ scores $y_{10}$ are for the placebo group; $n_{21}$ and $n_{20}$ are chosen according to the randomization ratio $r_2 = n_{20}/n_{21}$ for period 2.

6. For the active treatment group in period 2, we generate $n_{21}$ scores $y_{21}$ from the conditional normal distribution in equation (10) with $\mu_{21} = \mu_{20} + \delta_2$, for which $\delta_2 = \delta_1 + \Delta$; for the placebo group, we generate $n_{20}$ scores $y_{20}$ from the conditional normal distribution in equation (10) with mean $\mu_{20}$.

7. For the period 2 analysis, we use the conditional ANCOVA model for the change scores $y_{21} - y_{10}$ or $y_{20} - y_{10}$, with the baseline value $y_{10}$ and the treatment contrast for the new test drug versus placebo as predictors. From the ANCOVA analysis, we extract the estimate and standard error and compute the Wald test statistic $Z_2$.

8. We compute the combination test statistic in equation (3) with three choices of weights. The first is the optimal weight given by equation (5) with the expected sample size for the second period. The second is the weight given by equation (5) with the actual second-period sample size $n_{21}$. The third is an arbitrarily specified weight, say .6, .7, or .8, as suggested by Tamura and Huang (2007).

Steps 2 to 8 are repeated a specified number of times under the null hypothesis $\delta_1 = \delta_2 = 0$ and under various alternative hypotheses with different values of the standard deviations and correlation coefficients.
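The condensed sketch below, referenced in section 5.2, implements steps 2 through 8 for one simulation replicate. All numerical values (means, standard deviations, correlations, cutoff, sample sizes, dropout rate, and weight) are placeholders rather than the parameters fitted from the historical MDD data, and the combination statistic is again assumed to take the weighted inverse-normal form.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)

def ancova_z(change, baseline, trt):
    """Wald statistic for the treatment contrast in an ANCOVA on the change score."""
    X = sm.add_constant(np.column_stack([baseline, trt]))
    fit = sm.OLS(change, X).fit()
    return fit.params[2] / fit.bse[2]

def simulate_once(mu0=22, sigma0=6, mu10=20, sigma1=8, rho1=.25,
                  mu20=19, sigma2=9, rho2=.8, delta1=2.5, inc=1.0,
                  y_L=18, n11=56, r1=1.0, r2=1.0, d=.2, w1=.5, alpha=.025):
    # Step 2: truncated baselines and first randomization.
    n10 = int(r1 * n11)
    y0 = rng.normal(mu0, sigma0, 20 * (n11 + n10))
    y0 = y0[y0 >= y_L][: n11 + n10]
    trt1 = np.repeat([1, 0], [n11, n10])
    # Step 3: period 1 responses from the conditional normal model (8).
    mu1 = np.where(trt1 == 1, mu10 + delta1, mu10)
    y1 = rng.normal(mu1 + rho1 * (y0 - mu0) * sigma1 / sigma0,
                    sigma1 * np.sqrt(1 - rho1 ** 2))
    # Step 4: period 1 conditional ANCOVA.
    z1 = ancova_z(y1 - y0, y0, trt1)
    # Step 5: placebo nonresponders, binomial dropouts, second randomization.
    s10 = y1[(trt1 == 0) & (y1 >= y_L)]
    nd = rng.binomial(s10.size, d)
    s10 = rng.permutation(s10)[: s10.size - nd]
    n21 = int(round(s10.size / (1 + r2)))
    trt2 = np.repeat([1, 0], [n21, s10.size - n21])
    # Step 6: period 2 responses from the conditional normal model (10).
    delta2 = delta1 + inc
    mu2 = np.where(trt2 == 1, mu20 + delta2, mu20)
    y2 = rng.normal(mu2 + rho2 * (s10 - mu10) * sigma2 / sigma1,
                    sigma2 * np.sqrt(1 - rho2 ** 2))
    # Step 7: period 2 conditional ANCOVA.
    z2 = ancova_z(y2 - s10, s10, trt2)
    # Step 8: weighted combination of the two period-wise statistics.
    z = np.sqrt(w1) * z1 + np.sqrt(1 - w1) * z2
    return z >= norm.ppf(1 - alpha)

# Repeating the replicate estimates power (or the type 1 error rate if delta1 = inc = 0).
print(np.mean([simulate_once() for _ in range(200)]))
```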
5.3. Results

We consider a phase 3 trial with the same doubly randomized delayed-start design as in the proof-of-concept example of section 4.1. The parameter settings are identical except for the type 1 error rate $\alpha = .025$, 90% power, effect sizes $\delta_1 = 2.5$ and $\delta_2 = 3.5$, and the dropout rate $d = .2$. The required total sample size is 206 patients. The number of simulation runs is 20,000. We simulate the type 1 error rates and values of power for dropout rates $d = .1$, $.2$, and $.3$. The results are summarized in Tables 2 and 3, where $w_{opt} = .5069$ is the optimal fixed weight, $w_{adp}$ is the adaptive weight, and $w_{fix,1} = .6$, $w_{fix,2} = .7$, and $w_{fix,3} = .8$ are arbitrary fixed weights.

Table 2 Type 1 error rate and power with beta = .1

                                     w_opt     w_adp     w_fix,1   w_fix,2   w_fix,3
delta1 = 0, delta2 = 0, d = .1      .02640    .02695    .02650    .02675    .02600
delta1 = 0, delta2 = 0, d = .2      .02655    .02680    .02650    .02610    .02625
delta1 = 0, delta2 = 0, d = .3      .02570    .02680    .02625    .02615    .02545
delta1 = 2.5, delta2 = 3.5, d = .1  .88445    .88680    .88525    .87835    .86195
delta1 = 2.5, delta2 = 3.5, d = .2  .87290    .87680    .87350    .86985    .85245
delta1 = 2.5, delta2 = 3.5, d = .3  .84710    .85460    .85255    .84830    .83545

From Table 2, it is seen that the simulated type 1 error rates are very close to the theoretical value $\alpha = .025$. None of the simulated type 1 error rates is statistically different from .025 at the two-sided significance level .05. In particular, the combination test using the adaptive weight does not yield type 1 error rates that are significantly different from .025, which is consistent with the unblinded trial modification theory of Liu and Chi (2010a).

It is noted, as expected, that the proposed combination test, using either the optimal weight or the adaptive weight, is more powerful than the test using the prefixed weight .8. According to the simulation studies conducted by Chen et al. (2011), the weighted estimate procedure with rerandomization of the placebo nonresponders has nearly the same power as the sequential parallel design when the weight for the estimate, based on Tamura and Huang (2007), is chosen between .6 and .8. When a small period 2 standard deviation is used to reflect the enrichment, the resulting optimal weight of the combination test in equation (3) is substantially smaller than .7 or .8. From this, we can infer that the proposed method is more powerful than the sequential parallel design of Fava et al. (2003). This conclusion is also consistent with our calculations of the required sample sizes of both designs in section 4.1.

It is also seen from Table 2 that the values of power are all below the required 90%. As explained in section 2.3, this is due to the use of the expected sample size for the second period, which ignores the variability in the actual number of placebo nonresponders who can be rerandomized. To fix this problem for the design, we calculate the final sample size with 93% power; the resulting total sample size is 230. Table 3 provides the simulation results with this new sample size. From both Table 2 and Table 3, it is clear that a loss of power also occurs when the dropout rate is higher than expected. This problem could be resolved by using a larger dropout rate in the sample size calculation. However, for the actual trial, the dropout rate can be lower than expected, and therefore using a larger dropout rate
in the sample size calculation may unnecessarily increase the cost and duration of the trial. A possible solution to this problem may be to adjust the sample size to be randomized for the first stage according to the observed dropout rate.

Table 3 Type 1 error rate and power with beta = .07

                                     w_opt     w_adp     w_fix,1   w_fix,2   w_fix,3
delta1 = 0, delta2 = 0, d = .1      .02705    .02760    .02700    .02700    .02645
delta1 = 0, delta2 = 0, d = .2      .02775    .02835    .02785    .02765    .02740
delta1 = 0, delta2 = 0, d = .3      .02740    .02785    .02705    .02675    .02670
delta1 = 2.5, delta2 = 3.5, d = .1  .92270    .92335    .92145    .91390    .89945
delta1 = 2.5, delta2 = 3.5, d = .2  .90600    .90910    .90800    .90245    .88675
delta1 = 2.5, delta2 = 3.5, d = .3  .88735    .89145    .88930    .88480    .87455

6. DISCUSSION

6.1. Summary

We propose an adaptive doubly randomized delayed-start design that offers greater flexibility and interpretability than the sequential parallel design of Fava et al. (2003). The key differences are the addition of a randomization for placebo patients who meet the enriched entry criteria for the second period and the use of an adaptive combination test. The design allows the adaptive weight to be based on the actual number of patients randomized into the second period. Through simulation studies, we show that the combination test with the adaptive weight is more powerful than a test using a fixed weight, which must be prespecified for the sequential parallel design. The design can handle the simple setting of comparing one dose of a new drug against placebo as well as complex settings with multiple doses, possibly including an active control with an existing approved drug. The latter is important for comparative effectiveness research.

For drug development in mood disorders, the proposed design consists of a placebo lead-in phase and a drug testing phase with two periods. The placebo nonresponders from either the placebo lead-in phase or the first period are randomized to receive the new test drug or placebo. Chen et al. (2011) raise concerns about rerandomization, stating that patients may feel the drug effect right after receiving the new drug, which may introduce bias by undermining the integrity of blinding. But if patients must not feel the drug effect, how has the FDA managed to approve many effective drugs whose clinical endpoints rely on patients' own subjective evaluations? A major reason for conducting a randomized, double-blind trial, especially with an enrichment design (Temple, 1994), as opposed to an open-label trial, is for patients to feel the drug effect when in fact
the drug is efficacious (under the alternative hypothesis). Difficulties arise only when there are noticeable differential side effects in favor of an ineffective new drug. Issues with partially unblinding the data are addressed by Liu and Pledger (2005) and Liu and Chi (2010b).

6.2. Future Research

There are several ways to improve the simulation model. First, there is evidence from existing data that clinical endpoint measures can follow a distribution with different characteristics (e.g., standard deviation or correlation) for patients receiving an active drug. Thus, a simulation model that reflects these differences is useful for evaluating the robustness of the underlying statistical analysis method (e.g., ANCOVA). Although the combination test is shown to control the type 1 error rate, the underlying assumption is that the p-value for a given realization is not stochastically smaller than the uniform distribution. Thus, an inappropriate analysis method for deriving the period-wise test statistics $Z_1$ or $Z_2$ can still lead to an inflation of the type 1 error rate. Chen et al. (2011) specifically require that the correlation coefficient be constant; when it is not, their procedure may inflate the type 1 error rate as well. We point out, however, that this problem is not particular to the design in question: the concern applies to the statistical method used to derive the test statistic or p-value, irrespective of the design employed.

Another improvement is to incorporate longitudinal models in both the simulation model and the statistical analysis. This has the advantage of addressing dropouts in both periods as well as further increasing the power of the test. Chen et al. (2011) evaluated the robustness of an analysis based on a mixed-effects model when dropouts are missing at random (MAR). It is also necessary to evaluate the mixed-effects model analysis when there are differential and informative dropouts (i.e., missing not at random, or MNAR). We notice that Chen et al. (2011) state that there is no evidence of MNAR from the 25 New Drug Applications (NDAs). However, it would not be possible to establish evidence of MNAR unless these trials actually collected the missing data.

Chen et al. (2011) showed that the sequential parallel design of Fava et al. (2003) does not inflate the unconditional type 1 error rates over 2,000 hypothetical replications of their simulation studies. In reality, only a few trials are conducted in a single clinical development program. For trials with imbalance of known and unknown prognostic factors, only conditional type 1 error rates are relevant. We are not aware of methods that allow evaluation of the conditional type 1 error rates.

There are many areas for adding adaptive features to the doubly randomized delayed-start design. A simple setting is sample size adjustment based on a blinded review of the dropout rate. As the design is naturally adaptive, it also allows other adaptive features, such as sample size adjustment based on an unblinded review of the period 1 variability, effect size, and so on, or adaptive dose finding. The design can also be expanded to include randomized withdrawal for patients who respond to the new test drug in period 1. This addition allows study of the new test drug's effect in many ways, including a reduced dose or frequency for maintenance of the response.
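As one illustration of the simplest adaptive feature mentioned above, the following sketch rescales the period 1 randomization target when a blinded review shows a dropout rate different from the planning assumption. The inflation rule (dividing by the completion rate) and the numbers are illustrative assumptions, not a procedure specified in the article.

```python
import math

def adjust_period1_sample_size(n_planned: int, d_planned: float, d_observed: float) -> int:
    """Blinded sample size adjustment for the period 1 randomization.

    Rescales the planned number of randomized patients so that the expected
    number of completers is preserved when the blinded dropout rate differs
    from the planning assumption. A simple illustrative rule only.
    """
    return math.ceil(n_planned * (1 - d_planned) / (1 - d_observed))

# Planned for 206 patients at a 20% dropout rate (section 5.3); a blinded
# review suggesting 30% dropout raises the target, 10% lowers it.
print(adjust_period1_sample_size(206, 0.20, 0.30))   # -> 236
print(adjust_period1_sample_size(206, 0.20, 0.10))   # -> 184
```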
APPENDIX: PROOF OF THE THEOREM

(a) Since $\mathcal{G} \subset \mathcal{F}$, $g$ is also $\mathcal{F}$-measurable. Thus $P_{H_{02}}(p_{2g} \le C) \le C$ holds by the Adaptation Lemma of Liu et al. (2002).

(b) Let $X$ be a measurable random variable that follows a standard normal distribution under the null hypothesis $H_{01}: \delta_1 = 0$. Let

$$\tilde{Z}_1(Z_2) = \big(z_\alpha - w_2^{1/2} Z_2\big)\big/ w_1^{1/2}.$$

Then

$$P_H(Z \ge z_\alpha) = P_H\big(Z_1 \ge \tilde{Z}_1(Z_2)\big) \le P_H\big(X \ge \tilde{Z}_1(Z_2)\big).$$

The inequality for the probability $P_H(\cdot)$ holds because, by construction, $Z_1$ is not stochastically larger than $X$. Now let

$$\tilde{Z}_2(X) = \big(z_\alpha - w_1^{1/2} X\big)\big/ w_2^{1/2}.$$

Then $X \ge \tilde{Z}_1(Z_2)$ if and only if $Z_2 \ge \tilde{Z}_2(X)$, which is equivalent to $p_{2g} \le C(X)$, where $C(X) = 1 - \Phi\big(\tilde{Z}_2(X)\big)$. Thus,

$$P_H\big(X \ge \tilde{Z}_1(Z_2)\big) = E\big[P_H\big(p_{2g} \le C(X) \mid X\big)\big] \le E\big[C(X)\big],$$

where the last inequality follows from (a). Because $X$ follows a standard normal distribution, it is easy to show that $E[C(X)] = \alpha$. Chaining all the inequalities together, we have $P_H(Z \ge z_\alpha) \le \alpha$.

ACKNOWLEDGMENTS

The authors thank Trevor McMullan for the analysis of the MDD dataset. They are also grateful to Dr. Yevgen Tymofyeyev for reviewing the article and suggesting the use of the adaptive weight.

REFERENCES

Capizzi, T., Survill, T. T., Heyes, J. F. (1992). An empirical and simulated comparison of some tests for detecting progressiveness of response with increasing doses of a compound. Biometrical Journal 3:275–289.

Chen, Y. F., Yang, Y., Hung, H. M. J., Wang, S. J. (2011). Evaluation of performance of some enrichment designs dealing with high placebo response in psychiatric clinical trials. Contemporary Clinical Trials 32:592–604.

Chi, Y. G., Liu, Q. (1999). The attractiveness of the concept of a prospectively designed two-stage clinical trial. Journal of Biopharmaceutical Statistics 9(4):537–547.

Cui, L., Hung, H. M. J., Wang, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics 55:853–857.

Fava, M., Evins, A. E., Dorer, D. J., Schoenfeld, D. A. (2003). The problem of the placebo response in clinical trials for psychiatric disorders: culprits, possible remedies, and a novel study design approach. Psychotherapy and Psychosomatics 72:115–127.

Habermann, T. M., Weller, E. A., Morrison, V. A., et al. (2006). Rituximab-CHOP versus CHOP alone or with maintenance rituximab in older patients with diffuse large B-cell lymphoma. Journal of Clinical Oncology 24:3121–3127.

Heyn, R. M., Joo, P., Karon, M., et al. (1974). BCG in the treatment of acute lymphocytic leukemia. Blood 46:431–442.
Jennison, C., Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Boca Raton, FL: Chapman & Hall.

Khin, N. A., Chen, Y. (2011). Exploratory analyses of efficacy data from major depressive disorder trials submitted to the US Food and Drug Administration in support of new drug applications. Journal of Clinical Psychiatry 72:464–472.

Leber, P. (1996). Observations and suggestions for antidementia drug development. Alzheimer Disease and Associated Disorders 10(suppl. 1):31–34.

Leber, P. (1997). Slowing the progression of Alzheimer disease: Methodological issues. Alzheimer Disease and Associated Disorders 11(suppl. 5):10–21.

Liu, Q., Chi, G. Y. H. (2010a). Fundamental theory of adaptive designs with unplanned design change in clinical trials with blinded data. In: Pong, A., Chow, S. C., eds. Handbook of Adaptive Designs in Pharmaceutical and Clinical Development. Boca Raton, FL: Chapman & Hall, pp. 2-1–2-8.

Liu, Q., Chi, G. Y. H. (2010b). Understanding the FDA guidance on adaptive designs: historical, legal and statistical perspectives. Journal of Biopharmaceutical Statistics 20(special issue):1178–1219.

Liu, Q., Pledger, G. W. (2005). Interim analysis and bias in clinical trials: The adaptive design perspective. In: Buncher, R., Tsay, J.-Y., eds. Statistics in the Pharmaceutical Industry. 3rd ed., Revised and Expanded. New York: Taylor & Francis, pp. 231–244.

Liu, Q., Proschan, M. A., Pledger, G. W. (2002). A unified theory of two-stage adaptive designs. Journal of the American Statistical Association 97:1034–1041.

McDermott, M. P., Hall, W. J., Oakes, D., Eberly, S. (2002). Design and analysis of two-period studies of potentially disease-modifying treatments. Controlled Clinical Trials 23:635–649.

Mills, E. J., Kelly, S., Wu, P., Guyatt, G. H. (2007). Epidemiology and reporting of randomized trials employing rerandomization of patient groups: a systematic survey. Contemporary Clinical Trials 28:268–275.

Papakostas, G. I., Fava, M. (2009). Does the probability of receiving placebo influence clinical trial outcome? A meta-regression of double-blind, randomized clinical trials in MDD. European Neuropsychopharmacology 19:34–40.

Sinyor, M., Levitt, A. J., et al. (2010). Does inclusion of a placebo arm influence response to active antidepressant treatment in randomized controlled trials? Results from pooled and meta-analyses. Journal of Clinical Psychiatry 71:270–279.

Tamura, R., Huang, X. (2007). An examination of the efficiency of the sequential parallel design in psychiatric clinical trials. Clinical Trials 4:309–317.

Tedeschini, E., Fava, M., Goodness, T. M., Papakostas, G. I. (2010). Relationship between probability of receiving placebo and probability of prematurely discontinuing treatment in double-blind, randomized clinical trials for MDD: a meta-analysis. European Neuropsychopharmacology 20:562–567.

Temple, R. J. (1994). Special study designs: early escape, enrichment, studies in non-responders. Communications in Statistics – Theory and Methods 23:499–531.