Generating survival data with a clustered, multi-state structure is useful for studying multi-state models, competing risks models, and frailty models. Simulating such data is not straightforward, because one must introduce dependence between the times of different transitions while controlling the probability of each competing event, the median sojourn time in each state, the effects of covariates, and the type and magnitude of heterogeneity.
Here we propose a simulation procedure based on Clayton copulas for the joint distribution of the times within each block of competing events. It allows the marginal distributions of the time variables to be specified, while their dependence is induced by the copula. Moreover, although dependence is obtained among all the time variables, only a few joint distributions need to be handled. The simulation parameters are chosen by numerical minimization of a criterion function based on the ratios of target to observed values of the median times and of the probabilities of the competing events.
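The core sampling step can be sketched as follows. This is a generic Marshall-Olkin sampler for a bivariate Clayton copula with Weibull margins, not the authors' actual implementation; the shape and scale values, and the variable names t_relapse and t_metastasis, are illustrative assumptions only.

```python
import numpy as np

def clayton_copula_sample(n, theta, rng):
    """Marshall-Olkin sampling: V ~ Gamma(1/theta), U_j = (1 + E_j/V)^(-1/theta)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
    e = rng.exponential(size=(n, 2))
    return (1.0 + e / v[:, None]) ** (-1.0 / theta)

def weibull_inverse_cdf(u, shape, scale):
    """Map copula uniforms to event times via the Weibull inverse CDF."""
    return scale * (-np.log(1.0 - u)) ** (1.0 / shape)

rng = np.random.default_rng(0)
u = clayton_copula_sample(100_000, theta=2.0, rng=rng)   # Kendall's tau = theta/(theta+2) = 0.5
t_relapse = weibull_inverse_cdf(u[:, 0], shape=1.2, scale=24.0)     # hypothetical margins
t_metastasis = weibull_inverse_cdf(u[:, 1], shape=0.9, scale=36.0)
```

Each margin can be specified freely, as in the text, while the copula parameter theta alone controls the strength of dependence between the two times.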
The proposed method further allows discrete and continuous covariates to be simulated, with their effect on each transition specified in a proportional hazards fashion. A frailty term can also be added in order to induce clustering. No particular restrictions are placed on the covariate distributions, the frailty distribution, or the number and sizes of clusters.
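A minimal sketch of this covariate-and-frailty layer, assuming a Weibull baseline hazard, one binary covariate, and a mean-one gamma frailty shared within clusters; all parameter values here are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, cluster_size = 50, 20
shape, scale = 1.3, 30.0           # Weibull baseline hazard, hypothetical values
beta = 0.7                         # log hazard ratio of a binary covariate
frailty_var = 0.4

# one shared gamma frailty (mean 1, variance frailty_var) per cluster
w = rng.gamma(1 / frailty_var, frailty_var, size=n_clusters)
frailty = np.repeat(w, cluster_size)
x = rng.binomial(1, 0.5, size=n_clusters * cluster_size)

# invert the cumulative hazard H(t) = frailty * exp(beta*x) * (t/scale)^shape
u = rng.uniform(size=x.size)
t = scale * (-np.log(u) / (frailty * np.exp(beta * x))) ** (1 / shape)
```

Because the frailty multiplies the hazard of every subject in a cluster, times within a cluster are positively associated, while the covariate acts proportionally on each transition hazard.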
An example is provided in which we simulate data mimicking those from an Italian multi-center study on head and neck cancer. The multi-state structure of these data arises from the interest in studying both time to local relapse and time to distant metastasis before death. We show that the proposed method converges very closely to the target values.
The document discusses inferring gene regulatory networks from time-course gene expression data. It presents the problem of inferring interactions between genes from high-dimensional and sparse time-course microarray data. It proposes using a Gaussian graphical model and introducing biologically grounded priors, such as sparsity and latent clustering of networks, to help address the scarcity of data. Several statistical models and algorithms are described for performing regularized inference on the networks with and without using a known or inferred latent clustering structure. The methods are evaluated on simulated time-course data and a real E. coli S.O.S DNA repair network.
Analyzing the effectiveness of different treatments for cancer typically involves collecting data from large clinical studies or animal testing. Such testing is always needed as a final validation for establishing treatment safety, but can be very costly and labor-intensive. Developing alternative testing approaches to be used in preliminary stages of evaluating new treatment strategies would be a great aid in speeding up research and development. We consider the use of mathematical models to describe the progression of cancer and how the influence of anti-cancer drugs can be incorporated into these models.
There are many different forms of cancer, but several types share similar mechanisms for how they start and spread. The basic understanding of metastatic cancer consists of the following general stages:
1. The disease starts from a single primary tumor which grows in one location.
2. The primary tumor will start to shed cancer cells which get carried to other parts of the body by the circulatory or lymphatic systems.
3. These cells will attach to other organs and start new secondary tumors, called metastases (metastatic tumors).
4. The metastases grow and will shed cancer cells to produce more tumors. Such rapid spreading of cancer (also called “progression”) typically leads to multiple organ failure and fatality.
While there are many different types of clinical studies of cancer, standards (RECIST) have been defined for many aspects of such studies, including how to measure tumors and what data should be recorded. These articles are also good sources for descriptions of the stages of progression of cancer and of the practical limitations on what data can be collected in clinical trials.

Figure 1: Two schematic representations of the spread of metastatic cancer: (left) from trialx.com; (right) from www.cancer8.com.
An important question about the RECIST standards is whether collecting more clinical data could provide better assessments of progression and lead to more accurate models and better treatment protocols. Statistics and mathematical analysis can be applied to address these issues. Practical factors (effort, expense, record keeping, intrusiveness) have shaped the current standards and limited the amount and type of data collected in current studies. If the benefits of increased data collection for guiding treatments could be demonstrated, this might lead to valuable improvements in the standards. Studies differ in their conclusions about how strongly the probability of fatality correlates with tumor growth, but for our work we will focus on the increase in total tumor mass as a general descriptor of the progression of the disease.
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
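A toy ABC rejection sampler makes the idea concrete; the model, prior, summary statistic, and tolerance below are all illustrative choices, not taken from the document.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.0, size=100)   # "observed" data, true mean 3.0
obs_mean = data.mean()

# ABC rejection: draw theta from the prior, simulate data of the same size,
# and keep theta whenever the simulated summary is close to the observed one
n_draws, eps = 50_000, 0.05
prior_draws = rng.uniform(-10.0, 10.0, size=n_draws)
sim_means = rng.normal(loc=prior_draws, scale=1.0,
                       size=(100, n_draws)).mean(axis=0)
accepted = prior_draws[np.abs(sim_means - obs_mean) < eps]
```

The likelihood is never evaluated: only the ability to simulate from the model is required. With a flat prior, the accepted draws concentrate around the observed sample mean, approximating the analytic posterior as the tolerance shrinks.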
This document provides an overview of digital modulation and coding fundamentals. It introduces key concepts such as lowpass and bandpass signals, signal space concepts, and orthogonal expansion of signals. Modulation and demodulation of bandpass signals is discussed through translating a baseband signal to a higher frequency bandpass signal and vice versa.
The document describes Bayesian inference for chemical reaction networks using approximate models. It discusses representing networks with reactions and rate constants, and using a Gibbs sampler to infer rate constants from time course data. However, exact inference does not scale well. The document proposes using a particle filter to perform approximate Bayesian filtering by simulating reaction paths between observations. This allows inference for realistic systems by treating concentrations continuously and using approximate simulators like SDEs or ODEs.
This document discusses an online EM algorithm and some extensions. It begins by outlining the goals of maximum likelihood estimation, good scaling, processing data incrementally without storage, and simple implementation. It then provides an overview of the topics covered, which include the EM algorithm in exponential families, the limiting EM recursion, the online EM algorithm, using online EM for batch maximum likelihood estimation, and extensions. The document uses a Poisson mixture model as a running example to illustrate the E and M steps of the EM algorithm.
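For the Poisson-mixture running example, the online E and M steps might look like the following sketch; the step-size schedule n^{-0.6} and the short burn-in before re-maximizing are common stochastic-approximation choices, not necessarily the document's.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
# synthetic stream from a two-component Poisson mixture
true_w, true_lam = np.array([0.4, 0.6]), np.array([2.0, 10.0])
z = rng.choice(2, size=20_000, p=true_w)
ys = rng.poisson(true_lam[z])

w, lam = np.array([0.5, 0.5]), np.array([1.0, 5.0])   # initial guess
s1, s2 = w.copy(), w * lam                             # running sufficient statistics

for n, y in enumerate(ys, start=1):
    # E-step for one observation: posterior responsibilities under current params
    logp = np.log(w) + y * np.log(lam) - lam - math.lgamma(y + 1)
    r = np.exp(logp - logp.max())
    r /= r.sum()
    # stochastic-approximation update of the sufficient statistics
    gamma = n ** -0.6
    s1 += gamma * (r - s1)
    s2 += gamma * (r * y - s2)
    if n > 20:                                         # M-step after a short burn-in
        w, lam = s1 / s1.sum(), s2 / s1
```

Each observation is processed once and discarded, so memory use is constant in the length of the stream, matching the stated goals of incremental processing without storage.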
This document contains the homework assignment for Dr. Ashu Sabharwal's ELEC 430 class at Rice University due on February 19, 2009. It includes 3 exercises on topics related to signaling and detection:
1. The likelihood ratio test is derived for a binary communication system with additive white Gaussian noise. Conditional probability density functions are found and used to determine the probability of error as a function of threshold.
2. Matched filters are discussed for an antipodal signaling system. The impulse response and output of the matched filter are sketched. Expressions are derived for the noise variance and probability of error.
3. Orthogonal signal properties are explored for a set of signals that are modified by subtracting
This document presents a Bayesian joint model for longitudinal and time-to-event outcomes that allows for subpopulation heterogeneity using latent variables. It begins with motivational examples in HIV and prostate cancer research. It then reviews existing joint modeling approaches before introducing a new latent process model that models the longitudinal and survival outcomes conditional on a shared latent process. The document describes prior distributions, a Gibbs sampling algorithm, and simulation studies to evaluate the model under both Gaussian and non-Gaussian longitudinal distributions.
This document discusses applying renewal theorems to analyze the exponential moments of local times of Markov processes. It contains three main points:
1) If γ is greater than 1/G∞(i,i), the expected exponential moment grows exponentially over time.
2) If γ equals 1/G∞(i,i), the expected exponential moment grows linearly over time if H∞(i,i) is finite, and sublinearly otherwise.
3) If γ is less than 1/G∞(i,i), the expected exponential moment converges to a constant as time increases.
The analysis simplifies and strengthens previous results by framing the problem as a renewal
1) This document describes an optimal monetary policy model with 7 endogenous variables and 5 equilibrium conditions, leaving two degrees of freedom.
2) The model maximizes social welfare as the sum of period utilities from consumption and labor, subject to the equilibrium conditions.
3) The first order optimality conditions result in a system of 7 equations that can be solved using log-linearization methods around the non-stochastic steady state, similarly to previous examples.
1. The document provides an overview of Fourier analysis techniques including Fourier series, Fourier transforms, and their applications to signal representation and analysis.
2. Key concepts covered include representing periodic and aperiodic signals in the time and frequency domains, properties of linear and time-invariant systems, Parseval's theorem relating signal energy in the time and frequency domains, and the Fourier transforms of basic functions like impulses and complex exponentials.
3. The document establishes essential mathematical foundations for further study of analog and digital communications techniques that involve signal processing and transmission in the frequency domain.
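Parseval's theorem, mentioned above, is easy to verify numerically. Note that NumPy's unnormalized DFT convention puts a 1/N factor on the frequency side; the signal here is an arbitrary random test vector.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1024)
X = np.fft.fft(x)

# Parseval: energy in the time domain equals energy in the frequency domain
time_energy = np.sum(np.abs(x) ** 2)
freq_energy = np.sum(np.abs(X) ** 2) / x.size

# a circular time shift changes only the phase, not the magnitude spectrum
shifted_mag = np.abs(np.fft.fft(np.roll(x, 5)))
```

The second computation illustrates the time-shifting property: delaying a signal multiplies its transform by a complex exponential, leaving the energy at each frequency unchanged.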
The document discusses adaptive Markov chain Monte Carlo (MCMC) for Bayesian inference of spatial autologistic models. It notes that standard MCMC cannot be implemented when the likelihood function is unavailable or the completion step is too costly due to high dimensionality. Adaptive MCMC is proposed as an alternative that bypasses computation of the normalizing constant. Questions are raised about how to combine adaptations of the proposal distribution, tuning parameters, and sample sizes to improve the method.
Aristidis Likas, Associate Professor, and Christoforos Nikou, Assistant Professor, University of Ioannina, Department of Computer Science, Mixture Models for Image Analysis
The document introduces perturbation methods as a way to solve functional equations that describe economic problems. It presents a basic real business cycle model as an example problem that can be solved using perturbation methods. Specifically, it:
1) Defines the real business cycle model as a functional equation system that is difficult to solve directly.
2) Proposes using perturbation methods by introducing a small perturbation parameter (the standard deviation of technology shocks) and solving the problem when this parameter equals zero.
3) Expands the decision rules as Taylor series in terms of the state variables and perturbation parameter to build a local approximation around the deterministic steady state. This leads to a system of equations that can be solved order-by-order for
This document discusses heterogeneous agent models without aggregate uncertainty. It introduces a model with a continuum of agents who face idiosyncratic income fluctuations but no aggregate shocks. There is a unique stationary equilibrium with constant interest rates and wages. The document discusses the recursive competitive equilibrium, existence and uniqueness of the stationary equilibrium, transition functions, computation methods, and some qualitative results from calibrating the model.
The document discusses three examples of nonlinear and non-Gaussian DSGE models. The first example features Epstein-Zin preferences to allow for a separation between risk aversion and the intertemporal elasticity of substitution. The second example models volatility shocks using time-varying variances. The third example aims to distinguish between the effects of stochastic volatility ("fortune") versus parameter drifting ("virtue") in explaining time-varying volatility in macroeconomic variables. The document outlines the motivation, structure, and solution methods for these three nonlinear DSGE models.
This document discusses likelihood methods for continuous-time models in finance. It describes approximating the transition density function pX of a continuous-time process through a series of transformations to get closer to a normal distribution. This allows representing pX as a series expansion involving Hermite polynomials. Computing the expansion coefficients allows obtaining an explicit closed-form approximation to pX. Maximizing the approximate likelihood results in an estimator that converges to the true MLE as the number of terms increases.
Optimal control of coupled PDE networks with automated code generation - Delta Pi Systems
This document summarizes an approach for optimal control of coupled partial differential equation (PDE) networks using automated code generation. It discusses representing PDE networks as graphs, formulating the optimal control problem, deriving adjoint equations to compute gradients, discretizing control variables, and generating code to solve the direct and adjoint problems. Tools used include the DOT language for graph representation, SymPy for symbolic math, Cog for code generation, SfePy for PDE solvers, and SciPy for numerics.
The document summarizes the Wang-Landau algorithm and some of its improvements. The Wang-Landau algorithm is an adaptive Markov chain Monte Carlo method that iteratively estimates the density of states of a system. It partitions the state space into bins and iteratively adjusts estimates of the density within each bin so that the generated samples spend an equal amount of time in each bin. The algorithm has been improved through automatic binning methods, adaptive proposal distributions, and using parallel interacting chains. An example application to variable selection is also discussed.
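A toy Wang-Landau run makes the flat-histogram mechanics concrete. Here the "system" is just 8 independent binary spins with energy equal to the number of up spins, so the true density of states is the binomial coefficients; the 80% flatness criterion and the final modification factor are illustrative choices, not the document's.

```python
import numpy as np

rng = np.random.default_rng(5)
n_spins = 8
spins = rng.integers(0, 2, n_spins)      # toy system: 8 binary spins
e = int(spins.sum())                     # "energy" = number of up spins

log_g = np.zeros(n_spins + 1)            # running log density-of-states estimate
hist = np.zeros(n_spins + 1)
log_f = 1.0                              # modification factor, halved when flat

while log_f > 1e-4:
    i = rng.integers(n_spins)            # propose a single spin flip
    e_new = e + (1 - 2 * int(spins[i]))
    # accept with probability min(1, g(e_old) / g(e_new))
    if np.log(rng.uniform()) < log_g[e] - log_g[e_new]:
        spins[i] ^= 1
        e = e_new
    log_g[e] += log_f                    # penalize whichever bin is visited
    hist[e] += 1
    if hist.min() > 0.8 * hist.mean():   # flat-histogram check
        log_f /= 2.0
        hist[:] = 0

# normalize so the estimated state counts sum to 2**n_spins
g = np.exp(log_g - log_g.max())
g *= 2 ** n_spins / g.sum()
```

Because visits are penalized in proportion to the current estimate, the walk is pushed toward rarely visited energies, which is exactly what makes the samples spend roughly equal time in each bin.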
EM algorithm and its application in probabilistic latent semantic analysis - zukun
The document discusses the EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA). It begins by introducing the parameter estimation problem and comparing frequentist and Bayesian approaches. It then describes the EM algorithm, which iteratively computes lower bounds to the log-likelihood function. Finally, it applies the EM algorithm to pLSA by modeling documents and words as arising from a mixture of latent topics.
This document discusses filtering and likelihood inference. It begins by introducing filtering problems in economics, such as evaluating DSGE models. It then presents the state space representation approach, which models the transition and measurement equations with stochastic shocks. The goal of filtering is to compute the conditional densities of states given observed data over time using tools like the Chapman-Kolmogorov equation and Bayes' theorem. Filtering provides a recursive way to make predictions and updates estimates as new data arrives.
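In the linear-Gaussian case this prediction/update recursion is the Kalman filter. A scalar sketch follows; the AR(1) state model and its parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
# state space model: x_t = a x_{t-1} + w_t, w ~ N(0, q);  y_t = x_t + v_t, v ~ N(0, r)
a, q, r = 0.9, 0.5, 1.0
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(scale=np.sqrt(q))
y = x + rng.normal(scale=np.sqrt(r), size=T)

# Kalman recursion: Chapman-Kolmogorov prediction, then Bayes update
m, p = 0.0, 1.0                  # filtered mean and variance
means = np.zeros(T)
for t in range(T):
    m_pred, p_pred = a * m, a * a * p + q       # predict the next state
    k = p_pred / (p_pred + r)                   # Kalman gain
    m = m_pred + k * (y[t] - m_pred)            # update with the new observation
    p = (1 - k) * p_pred
    means[t] = m
```

The filtered estimates are a strictly better reconstruction of the latent state than the raw observations, and the filtered variance settles below the observation noise variance.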
This document reviews the Fourier transform and its properties. It defines the Fourier transform and inverse Fourier transform. The Fourier transform of a signal decomposes it into its frequency components. Properties covered include linearity, time/frequency shifting, modulation, convolution, and more. Examples of Fourier transforms are given for rectangular pulses and Dirac delta functions. Applications to signals like DC, complex exponentials, and sinusoids are described. Proofs can be found in the referenced textbook.
EGLA's cloud-based platform called MEDIAMPLIFY merges cloud and cable TV by bringing music and video/TV content from the cloud to devices like smart TVs, phones, and set-top boxes. Their solutions MEDIAMPLIFY and MEVIA provide entertainment and learning experiences for operators, universities, hotels, and businesses. MEDIAMPLIFY is a robust cloud platform for TV, video, and music content distribution to cable and telecom operators, while MEVIA is the consumer application available on smart devices and TVs. The company projects growing quarterly revenues from licensing MEVIA to subscribers.
Applying the Scientific Method to Simulation Experiments - Frank Bergmann
In this talk I would like to explore how to apply the scientific method to in silico experiments. How can we design these experiments so that they are independent of the software tool that gave rise to them? Over the past decade we have seen the rise of model exchange formats such as the Systems Biology Markup Language (SBML), which enable us to share models readily with colleagues and between applications.
Here I present the Simulation Experiment Description Markup Language (SED-ML) that aims to do the same thing for in silico experiments. After detailing its history, and where it currently stands, I will give a short overview of the growing tool support.
Eswaran Subrahmanian - Serious Games in Complex Design of Urban Systems and P...SeriousGamesAssoc
Presenter: Eswaran Subrahmanian, Research Professor, Carnegie Mellon University and Fields of View
The goal of this talk is to illustrate the use of Serious games for inclusive design of systems. The talk will take examples from a developing country context: India. The talk will illustrate how games enhances awareness and participation in the design process. The talk will also use a game on the design of operation of railways in a developed country. The talk will make the case that games are ideal way to deal with participation and design across functional divisions and also linguistic and social boundaries.
This document discusses using cellular automata (CA) and artificial neural networks (ANN) to model urban growth. It describes the structures of CA, ANN, and fuzzy sets and reviews previous studies applying CA, ANN, and combinations of the two (CA-ANN models) to simulation and forecasting of land use change. The document also presents a case study that develops a CA-ANN model for urban simulation, calibrates and validates the model, and achieves 87% goodness of fit when comparing model results to observed data.
The document reviews the use of computational fluid dynamics (CFD) simulation in urban design to analyze outdoor thermal comfort in hot and dry climates. CFD simulation can be used to understand factors like airflow, wind patterns, heat distribution and radiation that affect pedestrian thermal comfort. It allows designers to evaluate different design options and incorporate features to improve thermal comfort like shelters, vegetation and materials. While CFD provides accurate analysis of microclimate factors, its results require assumptions and simplifications that can limit effectiveness.
Simulation As a Method To Support Complex Organizational Transformations in H...Jos van Hillegersberg
How to Support Complex Organizational Transformations in Healthcare?
Using Realistic Simulations in Participatory Workshops
Contributions:
Functionalistic Approaches Dominate in design and implementation
-> Integration of Functionalistic and Interpretive approach
Simulation is traditionally used as a method to quantitatively analyse performance issues and improvement opportunities
-> The use of simulation in participative and collaborative workshop settings
Simulation traditionally uses mathematical models of reality with several assumptions and constraints
-> Higly accurate models based on real world data from planning systems and future organization layout
Based on:
Rothengatter, Diederik; Katsma, Christiaan; and Hillegersberg, Jos van, "Simulation As a Method To Support Complex
Organizational Transformations in Healthcare" (2010). AMCIS 2010 Proceedings. Paper 554.
http://aisel.aisnet.org/amcis2010/554
Simulation of Urban Mobility (Sumo) For Evaluating Qos Parameters For Vehicul...iosrjce
IOSR Journal of Electronics and Communication Engineering(IOSR-JECE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of electronics and communication engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in electronics and communication engineering. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
CFMS is a not-for-profit organization committed to accelerating simulation design processes. It has established the CFMS Limited company and Advanced Simulation Research Centre facility to achieve its vision. The facility provides high-performance computing, visualization capabilities, and collaborative work spaces to support research and technology projects. CFMS engages organizations through its Associates Scheme, which aims to improve awareness of new technologies and methods through demonstrations, networking, and other activities.
1) The document discusses using the simulation game SimCity to teach sustainable urban planning. Students play SimCity as part of a course project to design sustainable cities.
2) Data shows students who used SimCity scored higher on learning outcomes related to evaluating sustainability and producing urban development plans compared to previous cohorts.
3) Students reported being immersed in their projects through SimCity and receiving immediate feedback to improve their designs, compared to traditional classroom instruction.
A High-speed Verilog HDL Simulation Method using a Lightweight TranslatorRyohei Kobayashi
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
Games and Serious Games in Urban Planning: Study CasesBeniamino Murgante
The document discusses a serious game called the B3 Game that was developed to support public participation in urban planning. The game allows citizens of Billstedt, Hamburg to design their local marketplace virtually. It addresses issues like rational ignorance in participation. Research goals were to design a game that could support playful public participation and test it with users. Comparable examples and open questions about integrating games into planning processes are also mentioned.
1. The charge simulation method (CSM) simulates an actual electric field with a field formed by discrete charges placed outside the region where the field solution is desired.
2. CSM determines the values of discrete charges by satisfying boundary conditions at selected contour points on electrodes. Once charge values and positions are known, the potential and field distribution can be easily computed.
3. CSM describes surface charge on an electrode boundary using fictitious point, line, or ring charges in the electrode interior. Charge types and positions are determined first, then magnitudes are calculated to satisfy boundary conditions.
Architecture and urban planning (3 d) representationMaria Bostenaru
This document discusses the representation of architecture and urban planning in games and toys. It begins with an introduction on 3D viewing toys and board games that model construction management. It then reviews different types of games that feature architecture and urban planning, including playcards, toys, puzzle games, board games, role-playing games at the city scale, and computer games. Examples are provided for many of these categories. The document concludes by noting how games can be used for educational purposes and involve societal participation in urban planning decisions.
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...Beniamino Murgante
1) The document discusses parallelizing urban simulation models on GPUs to reduce long computing times required for large-scale and complex models.
2) Two cellular automata models (constrained and unconstrained) for simulating land use change are implemented on GPUs using CUDA.
3) Computational results show the parallel GPU implementations achieve significant speedups over sequential CPU versions, enabling more accurate calibration of models operating at regional or larger scales.
Keynote address for the Community College of Aurora Faculty in-service on the use of simulation and educational games in immersive environments.
For more information, visit: my research and class blog at http://ctusoftware.blogspot.com
This document provides an overview and instructions for installing and using the open source traffic simulation software SUMO (Simulation of Urban MObility). It describes how to download and install SUMO, create road networks manually or using the NETGENERATE tool, import networks from formats like OSM, and model vehicle demand. Road networks consist of nodes and edges, and SUMO tools like NETCONVERT are used to convert files into the SUMO network format. Vehicle routes and types are defined to simulate traffic.
Empowering Stakeholders – Simulation Games As a Participatory Method - Jan Ec...ServDes
This document describes a simulation game called "Work A Round" developed by Dr. Jan Eckert to address challenges related to distributed and mobile knowledge work. The game aims to empower stakeholders like workers, companies, transportation and design firms. It uses a board with locations and task/action cards for players to match tasks and workplaces. The game is followed by a two-part debriefing to discuss strategies, work patterns, and insights. The game has been tested with companies and can provide research findings on distributed work practices and space needs.
The document describes a three-parameter generalized inverse Weibull (GIW) distribution that can model failure rates. Key properties of the GIW distribution include:
- It reduces to the inverse Weibull distribution when the shape parameter γ equals 1.
- Its probability density function, survival function, and hazard function are defined.
- Formulas are provided for the moments, moment generating function, and Shannon entropy of the GIW distribution.
- Methods are described for maximum likelihood estimation of the GIW distribution parameters from censored lifetime data.
lecture1 on survival analysis HRP 262 classTroyTeo1
1. Survival analysis is a set of statistical methods used to analyze longitudinal data on the occurrence of events such as death, disease onset, or recovery. It can accommodate data from randomized clinical trials or cohort studies.
2. Key concepts in survival analysis include the survival function, which gives the probability of surviving past a particular time, and the hazard function, which provides the instantaneous risk of an event at a particular time given survival up to that time.
3. Common distributions used in parametric survival analysis to model event times include the exponential distribution, which assumes a constant hazard over time, and the Weibull distribution, which allows the hazard to increase or decrease over time.
1. Consider experiments with the following censoring mechanism A gr.docxstilliegeorgiana
1. Consider experiments with the following censoring mechanism: A group of n units is observed from time 0; observation stops at the time of the rth failure or at time C, whatever occurs first. Show by direct calculation that the likelihood function is of the form L = Yn i=1 f(ti) δiS(ti+)1−δi , assuming that the units gave failure times which are i.i.d. with survivor function S(t) and p.d.f. f(t). (Hint: first define ti and δi .)
2. Suppose that T is a survival random variable with survival function S and cumulative hazard function H(t) = − log S(t). Show that H(T) ∼ exp(1).
3. Suppose that the lifetime Ti has hazard function hi(t) and that Ci is a random censoring time associated with Ti . Define λi(t) = lim ∆t→0 P(t ≤ Ti ≤ t + ∆t|Ti ≥ t, Ci ≥ t) ∆t (a) Show that if Ti is independent of Ci , hi(t) = λi(t). (b) Suppose that there exists an unobserved covariate Zi which affects both Ti and Ci , as follows: P(Ti ≥ t|Zi) = exp(−Ziθt), P(Ci ≥ t|Zi) = exp(−Ziρt), and Ti , Ci are independent, given Zi . Assume that Zi has a gamma distribution with density function g(z) = φ φ Γ(φ) z φ−1 e −φz(z > 0). Show that the joint survivor function for Ti , Ci is P(Ti ≥ t, Ci ≥ s) = 1 + 1 φ θt + 1 φ ρs−φ .
4. The lifetime of an article is thought to have an exponential distribution. Twelve such articles were selected at random and tested until nine of them failed. The nine observed failure times were 8, 14, 23, 32, 46, 57, 69, 88, 109. Assume that the data follow the exponential distribution. (a) Compute the maximum likelihood estimate of mean µ. (b) Compute the Fisher information for ˆµ. (c) Obtain a 90% confidence interval for µ by using the quantity Z = (ˆµ−µ)/se(ˆµ) where se(ˆµ) is the standard error for the estimate ˆµ.
.
call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, research and review articles, engineering journal, International Journal of Engineering Inventions, hard copy of journal, hard copy of certificates, journal of engineering, online Submission, where to publish research paper, journal publishing, international journal, publishing a paper, hard copy journal, engineering journal
Sequential experimentation in clinical trialsSpringer
This chapter discusses time-sequential clinical trial designs where the primary endpoint is survival time. It begins with an overview of survival analysis methodology, which must be extended to account for the sequential nature of interim analyses in time-sequential trials. The seminal Beta-Blocker Heart Attack Trial (BHAT) is described as an example of an early time-sequential trial. Key developments following BHAT include methods that account for two time scales: the information accumulated over time and calendar time of interim analyses. Nelson-Aalen and Kaplan-Meier estimators are also summarized as tools for survival analysis in time-sequential settings.
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Umberto Picchini
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent states dynamics, as well as (ii) the variability between individuals, and also (iii) account for measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is is flexible, general and is able to deal with a large class of nonlinear SDEMEMs [1]. In a more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by other stochastic dynamic models than SDEs, eg Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
On estimating the integrated co volatility usingkkislas
This document proposes a method to estimate the integrated co-volatility of two asset prices using high-frequency data that contains both microstructure noise and jumps.
It considers two cases - when the jump processes of the two assets are independent, and when they are dependent. For the independent case, it proposes an estimator that is robust to jumps. For the dependent case, it proposes a threshold estimator that combines pre-averaging to remove noise with a threshold method to reduce the effect of jumps. It proves the estimators are consistent and establishes their central limit theorems. Simulation results are also presented to illustrate the performance of the proposed methods.
On Some Measures of Genetic Distance Based on Rates of Nucleotide SubstitutionJustine Leon Uro
The document presents a general DNA base-nucleotide substitution model and discusses three special cases: the three-substitution-type (3ST) model, two-substitution-type (2ST) model, and the Jukes-Cantor model. The 3ST model considers transitions and two types of transversions, while the 2ST model and Jukes-Cantor model further simplify the substitution rates. Differential equations are derived to model the change in base probabilities over time under each model.
Show that Greenwoods formula reduces to the binomial variance formul.pdfakshitent
Show that Greenwoods formula reduces to the binomial variance formula.
Solution
Let S(t) be the probability that an item from a given population will have a lifetime
exceeding t. For a sample from this population of size N let the observed times until death of N
sample members be Corresponding to each ti is ni, the number \"at risk\" just prior to time ti,
and di, the number of deaths at time ti. Note that the intervals between each time typically are not
uniform. For example, a small data set might begin with 10 cases, have a death at Day 3, a loss
(censored case) at Day 9, and another death at Day 11. Then we have (t1 = 3, t2 = 11), (n1 = 10,
n2 = 8), and (d1 = 1, d2 = 1). The Kaplan–Meier estimator is the nonparametric maximum
likelihood estimate of S(t). It is a product of the form When there is no censoring, ni is just the
number of survivors just prior to time ti. With censoring, ni is the number of survivors less the
number of losses (censored cases). It is only those surviving cases that are still being observed
(have not yet been censored) that are \"at risk\" of an (observed) death.[3] There is an alternative
definition that is sometimes used, namely The two definitions differ only at the observed event
times. The latter definition is right-continuous whereas the former definition is left-continuous.
Let T be the random variable that measures the time of failure and let F(t) be its cumulative
distribution function. Note that Consequently, the right-continuous definition of may be
preferred in order to make the estimate compatible with a right-continuous estimate of F(t)..
Poster for Information, probability and inference in systems biology (IPISB 2...Colin Gillespie
Interest lies in inference for the rate parameters in a complex stochastic biological model describing the aggregation of proteins within human cells. Protein aggregation is a factor in many age-related diseases such as Alzheimer's disease. Ideally time-course measurements on all chemical species in the model would be available. However, current experimental techniques only allow noisy observations on the proportions of cell death at a few time points.
Although the model has a large state space and is analytically intractable, realisations from the model can be obtained using a stochastic simulator. The time evolution of a cell can be repeatedly simulated giving an estimate of the proportion of cell death. Unfortunately, simulation from the model is too slow to be used in an MCMC inference scheme. A Gaussian process emulator, which is very fast, can be used to approximate the simulator.
An MCMC scheme can be constructed targeting the posterior distribution of interest, however evaluating the marginal likelihood is challenging. A pseudo-marginal approach replaces the marginal likelihood with an easy to construct unbiased estimate while still
targeting the true posterior.
The methods will be illustrated using a toy birth-death model, allowing comparison with the exact model.
The document presents a dynamic discrete choice model of demand for insecticide treated nets (ITNs) that accounts for time inconsistent preferences and unobserved heterogeneity. The model has three periods where agents make ITN purchase and retreatment decisions. Agents are either time consistent, "naive" time inconsistent, or "sophisticated" time inconsistent. The model is identified in two steps - first when types are directly observed using survey responses, and second when types are unobserved. Identification exploits variation from elicited beliefs about malaria risk. The model can point identify time preference parameters and utility functions up to a normalization.
This document discusses various statistical methods for analyzing survival data, including censoring, methods to assess survival like the Kaplan-Meier method, and models like the Cox proportional hazards model. It begins with definitions of survival analysis and censoring. It then describes the Kaplan-Meier method for estimating survival functions from censored data and the log-rank test for comparing survival curves. Finally, it discusses the Cox proportional hazards model for assessing the effects of covariates on the hazard function while leaving the baseline hazard unspecified.
Research Inventy : International Journal of Engineering and Scienceresearchinventy
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
This document discusses regression with frailty in survival analysis using the Cox proportional hazards model. It introduces survival analysis concepts like the hazard function and survival function. It then describes how to incorporate frailty, a random effect, into the Cox model to account for clustering in survival times. The Newton-Raphson method is used to estimate model parameters by maximizing the penalized partial likelihood. A simulation study applies this approach to data on infections in kidney patients.
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...Carlos M Martínez M
This document proposes statistical tests to compare k survival curves for recurrent event data. It begins with background on survival analysis and models for recurrent events. It then proposes a statistic to test whether k survival curves are equal, which is a linear combination of differences in observed and expected numbers of events between groups at different time points. Various choices of weights in this statistic generate tests that generalize classical survival analysis tests to the recurrent events setting, such as log-rank, Gehan, Peto-Peto, and Tarone-Ware tests. The proposal is applied to a dataset on tumor recurrence in bladder cancer patients under different treatments.
Dependent processes in Bayesian NonparametricsJulyan Arbel
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
The tau-leap method for simulating stochastic kinetic modelsColin Gillespie
This document discusses approximate methods for simulating chemically reacting systems stochastically in a more computationally efficient manner than the direct method. It introduces the τ-leap method, where reactions are simulated in fixed time intervals (τ) by assuming reaction rates are constant over τ. It describes how τ can be chosen to satisfy a "leap condition" and minimize errors. The midpoint estimation technique is introduced to further reduce errors by estimating propensities at the midpoint of each τ interval. Examples applying these methods to a Lotka-Volterra system are provided to illustrate the techniques.
Considerate Approaches to ABC Model SelectionMichael Stumpf
The document discusses using approximate Bayesian computation (ABC) for model selection when directly evaluating likelihoods is computationally intractable, noting that ABC involves simulating data from models and comparing simulated and observed summary statistics, and that constructing minimally sufficient summary statistics is important for accurate ABC model selection.
Double Occupancy as a Probe of the Mott Transition for Fermions in One-dimens...Jorge Quintanilla
1) This document proposes measuring double occupancy as a probe of the Mott transition in a one-dimensional fermionic Hubbard model with an optical lattice.
2) It finds that the Mott phase exhibits inherent fluctuations in double occupancy that can be used to detect the Mott phase.
3) The double occupancy in the bulk can be determined from measurements in a trapped system using the local density approximation.
Sequential Monte Carlo algorithms for agent-based models of disease transmissionJeremyHeng10
This document discusses agent-based models for disease transmission and sequential Monte Carlo algorithms for statistical inference of these models. It begins with an overview of agent-based models and their use in epidemiology. It then describes an agent-based SIS model where each agent can be susceptible or infected. Observations are the number of reported infections over time. The likelihood of the model involves a sum over all possible state sequences, which is intractable for large populations. The document proposes using sequential Monte Carlo methods to approximate the likelihood, including the bootstrap particle filter and auxiliary particle filter.
Similar to A copula-based Simulation Method for Clustered Multi-State Survival Data (20)
Sequential Monte Carlo algorithms for agent-based models of disease transmission
A copula-based Simulation Method for Clustered Multi-State Survival Data
1. A copula-based simulation method for clustered multi-state survival data
F. Rotolo•, C. Legrand◦, I. Van Keilegom◦, M. Chiogna•
• Dipartimento di Scienze Statistiche, Università degli Studi di Padova
◦ Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain
September 23, 2011
3. Clustered Multi-State Survival Data (F. Rotolo)
Survival Data
Time from an origin event until an event of interest.
Example: from birth to death, from the start of therapy until remission, etc.
[Timeline figure: an event observed at T = 5 on a 0–5 time axis.]
Censoring: for some subjects the event cannot be observed; the only available information is a lower bound on the event time.
Example: migration, change of therapy, loss to follow-up, etc.
[Timeline figure: follow-up ends at 3.25 before the event occurs, so only T > 3.25 is known.]
A copula-based simulation method for clustered multi-state survival data 2/22
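Right censoring as defined above is easy to emulate when generating data: draw a latent event time and an independent censoring time, then keep the minimum together with an event indicator. A minimal sketch (not from the slides; the distributions and names are illustrative), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def right_censor(t, c):
    """Apply independent right censoring: observe y = min(T, C)
    and the event indicator delta = 1{T <= C}."""
    y = np.minimum(t, c)
    delta = (t <= c).astype(int)
    return y, delta

t = rng.exponential(scale=5.0, size=10_000)  # latent event times
c = rng.exponential(scale=5.0, size=10_000)  # censoring times (e.g. loss to follow-up)
y, delta = right_censor(t, c)
```

With equal exponential scales, P(T <= C) = 1/2, so roughly half of the observations end up censored.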
5. Modeling Survival Data
Because of this peculiarity, instead of modeling the density f(t) of T, the hazard is considered:
    h(t) = lim_{Δt→0} P[t ≤ T < t + Δt | T ≥ t] / Δt = f(t) / S(t) = −(d/dt) log S(t),
with S(t) = ∫_t^∞ f(u) du = P[T > t].
Note: S(t) = exp{−∫_0^t h(u) du}.
The basic regression model for the hazard is the Proportional Hazards (PH) Model (Cox, 1972):
    h(t|X) = h0(t) exp{β′X}.
A copula-based simulation method for clustered multi-state survival data 3/22
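The relation S(t) = exp{−∫ h(u) du} gives a standard inversion recipe for simulating from a PH model: if U is uniform on (0, 1), then T = H0⁻¹(−log U / exp(β′x)) has the required hazard. A hedged sketch with a Weibull baseline H0(t) = (t/scale)^shape (not part of the slides; function name and parameter values are illustrative), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(42)

def sim_ph_weibull(x, beta, shape=1.5, scale=2.0, rng=rng):
    """Draw survival times from a PH model with Weibull baseline,
    via the inversion T = H0^{-1}(-log U / exp(beta'x)),
    where H0(t) = (t / scale)^shape is the baseline cumulative hazard."""
    u = rng.uniform(size=len(x))
    lin_pred = np.asarray(x) @ np.asarray(beta)
    return scale * (-np.log(u) / np.exp(lin_pred)) ** (1.0 / shape)

# one binary covariate doubling the hazard (beta = log 2)
x = rng.integers(0, 2, size=10_000).reshape(-1, 1)
t = sim_ph_weibull(x, beta=[np.log(2.0)])
```

Subjects with x = 1 face twice the hazard, so their simulated times are stochastically shorter than those of the x = 0 group.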
9. Survival Models
Extensions of the Cox model have been developed.
Frailty Models (FMs) account for overdispersion or clustering by means of random effects:
    h(t|Xij) = h0(t) Zi exp{β′Xij},
similar to a GLMM on the log scale:
    log h(t|Xij) = log h0(t) + Wi + β′Xij,  with Zi = e^{Wi}
(Duchateau & Janssen, 2008; Wienke, 2010).
Multi-State Models (MSMs) consider several events and their interactions
(Putter et al., 2007; de Wreede et al., 2010).
[Diagram: multi-state structure with states NED (no evidence of disease), LR (local relapse), DM (distant metastasis) and De (death); transitions T1: NED→LR, T2: NED→DM, T3: NED→De, T4: LR→De, T5: DM→De.]
Possible integration? → Simulation studies
A copula-based simulation method for clustered multi-state survival data 4/22
Simulation of data

A simulation method should be able to generate
the dependence of times of competing events,
the dependence of times of subsequent events,
the dependence between clustered observations,
the censoring due to competing events occurrence,
the censoring due to end of the study or loss to follow-up,
the event-specific covariate effects.

(State diagram: NED → LR (T1), NED → DM (T2), NED → De (T3), LR → De (T4), DM → De (T5).)
Outline
Clustered Multi-State Survival Data
Simulation Algorithm
Clustering
Choice of Parameters
Example
Copula Model

Marginal survival functions freely chosen: S1(t), S2(t) and S3(t).

Joint survival function by Clayton copula:
S123(t) = [S1(t1)^(−θ) + S2(t2)^(−θ) + S3(t3)^(−θ) − 2]^(−1/θ)

Conditional survivals from the joint:
S2|1(t2|t1) = [1 + S1(t1)^θ (S2(t2)^(−θ) − 1)]^(−1/θ−1)
S3|12(t3|t1, t2) = [1 + (S3(t3)^(−θ) − 1) / (S1(t1)^(−θ) + S2(t2)^(−θ) − 1)]^(−1/θ−2)
Algorithm

Data from the copula model (Kpanzou, 2007) are simulated as follows:
1. T1 = S1^(−1)(U1)
2. T2|t1 = S2|1^(−1)(U2|t1) = S2^(−1)( [ (U2^(−θ/(1+θ)) − 1) S1(t1)^(−θ) + 1 ]^(−1/θ) )
3. T3|t1, t2 = S3|12^(−1)(U3|t1, t2) = S3^(−1)( [ (U3^(−θ/(1+2θ)) − 1) (S1(t1)^(−θ) + S2(t2)^(−θ) − 1) + 1 ]^(−1/θ) )
C. TC = FC^(−1)(UC)
T = min(TC, T1, T2, T3),
with U1, U2, U3, UC i.i.d. U(0, 1).
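The steps above can be sketched numerically. A minimal Python sketch (not the author's code), using Weibull marginals as the freely chosen Si and placeholder parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)

def weibull_inv_surv(s, lam, rho):
    # Invert S(t) = exp(-lam * t**rho): returns t such that S(t) = s
    return (-np.log(s) / lam) ** (1.0 / rho)

def simulate_first_transitions(n, theta, lam, rho, lam_c):
    """Steps 1-3 and C: sample (T1, T2, T3) from the trivariate
    Clayton copula by successive conditional inversion, plus an
    independent exponential censoring time TC."""
    u1, u2, u3, uc = rng.uniform(size=(4, n))
    s1 = u1                                   # since S1(T1) = U1
    t1 = weibull_inv_surv(s1, lam[0], rho[0])
    # Step 2: S2(T2) from inverting S_{2|1}(.|t1) at U2
    s2 = ((u2 ** (-theta / (1 + theta)) - 1) * s1 ** (-theta) + 1) ** (-1 / theta)
    t2 = weibull_inv_surv(s2, lam[1], rho[1])
    # Step 3: S3(T3) from inverting S_{3|12}(.|t1,t2) at U3
    s3 = ((u3 ** (-theta / (1 + 2 * theta)) - 1)
          * (s1 ** (-theta) + s2 ** (-theta) - 1) + 1) ** (-1 / theta)
    t3 = weibull_inv_surv(s3, lam[2], rho[2])
    tc = -np.log(uc) / lam_c                  # TC = FC^{-1}(UC), exponential
    t_obs = np.minimum.reduce([t1, t2, t3, tc])
    return t1, t2, t3, tc, t_obs
```

Each simulated survival value is passed through the inverse of the chosen marginal, so any marginal with a computable inverse survival function can be plugged in.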
Second transitions

For patients with a transition into state LR or DM, an analogous copula model is used for the second transition to state De.

The following conditional survivals can be obtained:
S4|1(t4|t1) = [1 + S1(t1)^θ (S4(t4)^(−θ) − 1)]^(−1/θ−1)
S5|2(t5|t2) = [1 + S2(t2)^θ (S5(t5)^(−θ) − 1)]^(−1/θ−1)

and the same algorithm is used to simulate second-transition times, conditionally on the first-transition ones.
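Both conditionals have the same bivariate Clayton form, so second-transition sampling reduces to the same inversion used in step 2 of the algorithm. A small sketch, with hypothetical S1(t1) values and illustrative Weibull parameters for S4:

```python
import numpy as np

rng = np.random.default_rng(2)

def clayton_cond_inverse(u, s_given, theta):
    """Invert the bivariate Clayton conditional S_{j|i}(t_j|t_i) = u
    for S_j(t_j), given s_given = S_i(t_i)."""
    return ((u ** (-theta / (1 + theta)) - 1) * s_given ** (-theta) + 1) ** (-1 / theta)

# Hypothetical S1(t1) values for patients who moved NED -> LR
s1_at_t1 = rng.uniform(0.2, 0.9, size=1000)
u4 = rng.uniform(size=1000)
s4 = clayton_cond_inverse(u4, s1_at_t1, theta=1.0)
# Illustrative Weibull inverse survival for S4 (parameters are placeholders)
t4 = (-np.log(s4) / 0.03) ** (1 / 1.1)
```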
Clustering

The algorithm allows the marginal survivals Si(t) to be freely specified. How can we insert clustering?

In a PH way:
hi(t|Z) = Z h0i(t),
with h0i(t) the baseline hazard for transition i.

Since S0i(t) = exp{−∫0^t h0i(u) du}, then
Si(t|Z) = exp{−Z ∫0^t h0i(u) du} = [S0i(t)]^Z.

The copula model can be used for the conditional survivals {Si(t|Z)}i∈{1,2,3,4,5}, and the same algorithm can be applied, conditionally on Z.
Clustering and covariates

The effect of covariates X can be inserted in an analogous way. The marginals are then
Si(t|X, Z) = S0i(t)^(Z exp{βi′X})
and simulation via the copula model is done conditionally on (X, Z).
The Clayton–Weibull model

Although the model is quite general, in the following we consider a particular case:
Ti ~ Wei(λi, ρi), i ∈ {1, 2, 3, 4, 5}
TC ~ Wei(λC, 1) ~ Exp(λC)
72 months (6 years) of administrative censoring.

This model
1. gives simple forms of the conditional distributions;
2. implies that Si|X,Z(t|x, z) = exp{−λi z exp{βi′x} t^ρi},
i.e. Ti|X, Z ~ Wei(λi z exp{βi′x}, ρi) is still a Weibull r.v.
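Because conditioning on (X, Z) only rescales the Weibull rate, frailties and covariate effects can be folded into per-subject rates before running the copula algorithm. A minimal sketch; the gamma parametrization of Z, the baseline Weibull parameters, and the fixed cluster size are illustrative assumptions, while the transition-1 coefficients are taken from the example later in the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

n_clusters, cluster_size = 40, 25
# Shared gamma frailty per cluster (shape/scale parametrization assumed)
z = rng.gamma(shape=1.0, scale=0.5, size=n_clusters)
z_subj = np.repeat(z, cluster_size)          # same Z for all subjects in a cluster

age = rng.normal(60, 7, size=n_clusters * cluster_size)
treat = rng.binomial(1, 0.5, size=n_clusters * cluster_size)
x = np.column_stack([age, treat])

# Transition-1 effects from the example: beta_Age = log(0.8)/10, beta_Treat = log(1/3)
beta1 = np.array([np.log(0.8) / 10, np.log(1 / 3)])

lam1_base, rho1 = 0.05, 1.0                  # illustrative baseline parameters
lam1 = lam1_base * z_subj * np.exp(x @ beta1)  # Ti|X,Z ~ Wei(lam1, rho1)
t1 = (-np.log(rng.uniform(size=lam1.size)) / lam1) ** (1 / rho1)
```

The per-subject rates `lam1` then replace the constant rate in the copula sampling step, so the dependence structure is untouched while heterogeneity enters through the marginals.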
Choice of parameters

When simulating a dataset, one should be able to choose the parameters so as to obtain particular target values for
pi: the probabilities of LR, DM, De and censoring from NED;
mi: the medians of uncensored LR, DM and De times from NED;
pi: the probabilities of De and censoring from LR and from DM;
mi: the medians of uncensored De times from LR and from DM.

It is not possible to analytically express these quantities as functions of the parameters.
Criterion function

In order to find appropriate parameters for given target values {pi, mi}, we want to minimize the criterion function

Υ(Π) = Σ i∈{1,2,3,4,5} [ log(pi / p̂i(Π))² + log(mi / m̂i(Π))² ]
     = Υ123(Π123) + Υ4(Π4) + Υ5(Π5) ≥ 0,

with {p̂i(Π), m̂i(Π)} the values observed in a dataset simulated with parameters Π, and

Π = {λi}i∈{1,2,3,C,4,C4,5,C5} ∪ {ρi}i∈{1,2,3,4,5} ∈ R+^13
Π = Π123 ∪ Π4 ∪ Π5 ∈ R+^7 × R+^3 × R+^3.

A further reduction of the problem dimension gives
Π = Π123 ∪ Π4 ∪ Π5 ∈ R+^(4+3) × R+^(2+1) × R+^(2+1).
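The criterion is straightforward to evaluate from target and simulated summaries. A sketch; plugging in the first-transition targets and simulated values reported in the Results reproduces the Υ123(Π123) figure given there:

```python
import numpy as np

def upsilon(p_target, m_target, p_sim, m_sim):
    """Criterion: sum of squared log-ratios of target to observed
    probabilities and median times (always >= 0, zero at a perfect fit)."""
    lp = np.log(np.asarray(p_target, float) / np.asarray(p_sim, float))
    lm = np.log(np.asarray(m_target, float) / np.asarray(m_sim, float))
    return float(np.sum(lp ** 2) + np.sum(lm ** 2))

# First-transition targets vs. simulated values from the example
u123 = upsilon([0.34, 0.09, 0.07, 0.50], [6.00, 10.00, 3.00],
               [0.33, 0.12, 0.09, 0.46], [5.41, 9.33, 2.29])  # about 0.24
```

The log-ratio form makes over- and under-shooting a target by the same factor equally costly, which is why targets and observed values enter only through their ratio.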
Minimization of the criterion function

In order to further reduce the dimension of the problem, each of the parameter sets ΠK, K ∈ {{123}, {4}, {5}}, is split into the scale parameters {λi} and the shape parameters {ρi}. The optimization of the criterion function ΥK(ΠK) is iterated on each subset.

Example: algorithm for K = {123}
Set J = 1
λ(0) = {λi(0)}i∈{C,1,2,3} = {1, 1, 1, 1}
ρ(0) = {ρi(0)}i∈{1,2,3} = {1, 1, 1}
Repeat until J = maxit or Υ123(λ(J−1), ρ(J−1)) < th:
  obtain λ(J) by minimizing Υ123(λ, ρ(J−1)) over λ;
  obtain ρ(J) by minimizing Υ123(λ(J), ρ) over ρ;
  set J = J + 1,
where maxit and th are arbitrary termination parameters.
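The alternating scheme can be sketched as follows. The per-block optimizer here is a crude multiplicative coordinate search, a stand-in for whatever numerical minimizer is actually used; in practice Υ is evaluated on freshly simulated data, so the objective is noisy:

```python
import numpy as np

def block_search(f, x, factors=(0.5, 0.8, 1.0, 1.25, 2.0), sweeps=3):
    """Crude coordinate-wise search over multiplicative steps;
    a stand-in for a real numerical optimizer on one block."""
    x = np.array(x, dtype=float)
    for _ in range(sweeps):
        for i in range(x.size):
            cands = [x[i] * c for c in factors]
            vals = []
            for c in cands:
                x[i] = c
                vals.append(f(x))
            x[i] = cands[int(np.argmin(vals))]
    return x

def alternate_minimize(ups, lam0, rho0, maxit=10, th=0.1):
    """Alternate: minimize ups over the scale block lam with rho fixed,
    then over the shape block rho with lam fixed, until the criterion
    drops below th or maxit iterations are reached."""
    lam, rho = np.array(lam0, float), np.array(rho0, float)
    for _ in range(maxit):
        if ups(lam, rho) < th:
            break
        lam = block_search(lambda l: ups(l, rho), lam)
        rho = block_search(lambda r: ups(lam, r), rho)
    return lam, rho
```

The same structure applies to each block K ∈ {{123}, {4}, {5}}, with starting values λ(0) = ρ(0) = 1 as on the slide.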
An example

A dataset of size 44 is available from a multi-center study on head and neck cancer. Target values {pi} and {mi} are taken from the observed data.

(State diagram with observed counts out of 44 patients: NED → LR: 15, NED → DM: 4, NED → De: 3, censored in NED: 22; from LR: 8 deaths, 7 censored; from DM: 4 deaths, 0 censored.)

Frailty term: 40 hospitals of random sizes, Z ~ Gam(1, 0.5).

Covariates:
Age ~ N(60, 7), with βi,Age = log(0.8)/10 for i = 1, log(0.9)/10 for i = 2, log(1.2)/10 for i = 3, 4, 5;
Treat ~ Bin(0.5), with βi,Treat = log(1/3) for i = 1, 0 for i = 2, log(1.2) for i = 3, 4, 5.
Results

First transitions. The algorithm is run with datasets of size 10^4, maxit = 10 and th = 0.1. The execution time was 11 h 57 min.

NED → {LR, DM, De}: λ1 = 0.276, λ2 = 0.019, λ3 = 0.013, λC = 0.031; ρ1 = 0.851, ρ2 = 1.076, ρ3 = 0.569.

            pi                            mi
            LR     DM     De     C        LR     DM      De
Target      0.34   0.09   0.07   0.50     6.00   10.00   3.00
Simulated   0.33   0.12   0.09   0.46     5.41    9.33   2.29

Υ123(Π123) = 0.24
Second transitions. Conditionally on the first-transition data, the algorithm is run for the second transitions from LR and DM with maxit = 6 and th = 0.05. The execution times were 4 h 31 min and 3 h 57 min, respectively.

LR → De: λ4 = 0.029, λC4 = 0.099, ρ4 = 1.078
DM → De: λ5 = 0.192, λC5 = 0.039, ρ5 = 1.000

            LR → De                   DM → De
            pi             mi         pi             mi
            De     C       De         De     C       De
Target      0.53   0.47    3.25       0.95   0.05    0.50
Simulated   0.50   0.50    3.32       0.97   0.03    0.54

Υ4(Π4) = 0.0043, Υ5(Π5) = 0.0064
Conclusion

The proposed simulation procedure for clustered multi-state survival data allows one to
MSMs: generate dependence between the times of the same subject (between both competing and subsequent event times);
FMs: generate dependence between the times of clustered subjects (with an arbitrary number and size of groups and a free frailty distribution);
PH: insert covariates via proportional hazards;
parMod: choose the marginal distributions of the time variables;
automatically find appropriate parameters, given arbitrary target values for the probabilities of censoring and of competing events and for the medians of uncensored times;
generate censoring, both random and administrative.
References

Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological) 34, 187–220.
de Wreede, L. C., Fiocco, M. & Putter, H. (2010). The mstate package for estimation and prediction in non- and semi-parametric multi-state and competing risks models. Computer Methods and Programs in Biomedicine 99, 261–274.
Duchateau, L. & Janssen, P. (2008). The Frailty Model. Springer.
Kpanzou, T. A. (2007). Copulas in statistics. African Institute for Mathematical Sciences (AIMS).
Putter, H., Fiocco, M. & Geskus, R. B. (2007). Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine 26, 2389–2430.
Wienke, A. (2010). Frailty Models in Survival Analysis. Chapman & Hall/CRC Biostatistics Series. Taylor and Francis.
F. Rotolo [federico.rotolo@stat.unipd.it – federico.rotolo@uclouvain.be]
PhD Student at the University of Padova and Visiting PhD Student at UCL,
under the supervision of
prof. C. Legrand, UCL
prof. I. Van Keilegom, UCL
prof. M. Chiogna, UniPd