PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Survival Outcomes in the Presence of Subpopulation Heterogeneity - Elizabeth Slate, May 21, 2019

A Bayesian model for joint longitudinal and survival outcomes in the presence of subpopulation heterogeneity
Pengpeng Wang, Elizabeth H. Slate and Jonathan Bradley
Department of Statistics
Florida State University

Example: HIV
Longitudinal marker: CD4 cell count
Survival time: time to death or censoring

Example: NPCT Prostate Cancer
[Figure: ln(PSA+1) vs. years since randomization, with panels for Select PCa Noncases and PCa cases]

Approaches to joint modeling
[Survival | Marker] [Marker] (selection or measurement error model)
e.g., CD4 counts and progression to AIDS
Tsiatis, DeGruttola and Wulfsohn (1995)
Faucett and Thomas (1996)
Shi, Taylor and Muñoz (1996)
Wulfsohn and Tsiatis (1997)
Law, Taylor and Sandler (2002)

[Marker | Survival] [Survival] (pattern mixture model)
Little (1993), Little and Wang (1996)
Pawitan and Self (1993) in AIDS context

Latent Variables to simplify dependence
Shared random effects . . .
e.g., DeGruttola and Tu (1994), Schluchter (1992)
Latent processes . . .
e.g., Xu and Zeger (2001), Henderson, Diggle and Dobson (2000)
Latent classes
Lin et al. (2002), Han et al. (2007)

Latent variable joint modeling
Marker $Y_i(t)$ observed at times $t_{i1}, t_{i2}, \dots, t_{im_i}$
Event time $T_i$: event process (may be censored, etc.)
Shared random effects
$[Y_i, T_i] = \int [Y_i \mid b_i][T_i \mid b_i][b_i]\, db_i$
Latent classes: $c_{ik} = 1$ if subject $i$ is in class $k$
$[Y_i, T_i] = \sum_{k=1}^{K} [Y_i \mid c_{ik} = 1][T_i \mid c_{ik} = 1][c_{ik} = 1]$
Latent processes
$[Y_i(t), T_i] = \int [Y_i(t) \mid \eta_i(t)][T_i \mid \eta_i(t)][\eta_i(t)]\, d\eta_i(t)$

A Latent Process Model
Brown and Ibrahim (2003, Biometrics)
$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t)\exp\{\gamma\,\psi_{\beta_i}(t) + X\alpha\}$

$Y_{ij}$ is the longitudinal observation for the $i$-th subject at time $t_{ij}$
$\psi_{\beta_i}(t_{ij})$ is a trajectory function
e.g. linear: $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$
e.g. quadratic: $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2$
Longitudinal error $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
$\lambda(t)$ is the baseline hazard, piecewise constant
$\gamma$ associates the two outcomes
$\alpha$ associates covariates $X$ to the survival time
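To make the model concrete, here is a minimal Python sketch that simulates one subject under it. It assumes a linear trajectory, Gaussian errors, a piecewise-constant baseline hazard with hypothetical left endpoints `cuts` (the last piece extends to infinity), and administrative censoring at a hypothetical time `c_time`; `xa` stands for the scalar $X_i\alpha$. Because the log-hazard is linear in $t$, the cumulative hazard has a closed form on each piece, and the event time is found by bisection on $H(T) = E$ with $E \sim \text{Exp}(1)$. All function names are illustrative, not taken from the authors' code.

```python
import math
import random


def cumulative_hazard(t, beta0, beta1, gamma, xa, cuts, lams):
    """H(t) = int_0^t lam(u) * exp{gamma * (beta0 + beta1 * u) + xa} du,
    where lam(u) = lams[l] on [cuts[l], cuts[l + 1]) and cuts[0] = 0."""
    slope = gamma * beta1
    scale = math.exp(gamma * beta0 + xa)
    total = 0.0
    for l, lam in enumerate(lams):
        a = cuts[l]
        b = min(t, cuts[l + 1]) if l + 1 < len(cuts) else t
        if b <= a:
            break
        if abs(slope) < 1e-12:  # constant integrand on this piece
            total += lam * scale * (b - a)
        else:
            total += lam * scale * (math.exp(slope * b) - math.exp(slope * a)) / slope
    return total


def simulate_subject(beta0, beta1, gamma, xa, sigma, cuts, lams, obs_times, c_time, rng):
    """Return (marker times, marker values, observed time s, event indicator nu)."""
    def H(t):
        return cumulative_hazard(t, beta0, beta1, gamma, xa, cuts, lams)

    e = rng.expovariate(1.0)  # event time T solves H(T) = E with E ~ Exp(1)
    if H(c_time) < e:
        s, nu = c_time, 0  # no event before administrative censoring
    else:
        lo, hi = 0.0, c_time
        for _ in range(60):  # bisection works because H is increasing in t
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if H(mid) < e else (lo, mid)
        s, nu = hi, 1
    # markers are only observed up to the event/censoring time
    times = [t for t in obs_times if t <= s]
    y = [beta0 + beta1 * t + rng.gauss(0.0, sigma) for t in times]
    return times, y, s, nu


rng = random.Random(1)
times, y, s, nu = simulate_subject(
    beta0=3.0, beta1=-0.5, gamma=-1.0, xa=0.0, sigma=0.3,
    cuts=[0.0, 1.0, 2.0], lams=[0.5, 0.5, 1.0],
    obs_times=[0.0, 0.5, 1.0, 1.5, 2.0, 3.0], c_time=10.0, rng=rng)
```

Bisection is used instead of a closed-form inverse only to keep the sketch short; on each piece the inverse could also be written explicitly.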

Prior Distributions
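Dirichlet process prior on $\beta_i \equiv (\beta_{0i}, \beta_{1i})^T$, $i = 1, 2, \dots, n$
$\beta_i \mid G \sim G$
$G \sim DP(M, G_0)$
$G_0 = N_2(b_0, V_0)$
Priors and hyperpriors:
$\lambda_l \sim \text{Gamma}(a_l, b_l)$, $l = 1, 2, \dots, L$
$\gamma \sim N(\mu_\gamma, \sigma_\gamma)$
$\alpha \sim N_p(\mu_\alpha, \Sigma_\alpha)$
$\sigma^2 \sim IG(a_\sigma, b_\sigma)$
$b_0 \sim N_2(\mu_b, \Sigma_b)$
$V_0^{-1} \sim \text{Wishart}(S_v, n_v)$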
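The clustering behaviour of the Dirichlet process prior can be seen by drawing $\beta_1, \dots, \beta_n$ from its marginal (Pólya urn, or Chinese restaurant) representation: subject $i$ joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to $M$, taking a fresh atom from $G_0 = N_2(b_0, V_0)$. The ties among the $\beta_i$ are what produce subgroups. This is a minimal NumPy sketch for illustration only, not the sampler used in the talk; the values of `M`, `b0`, and `V0` are arbitrary.

```python
import numpy as np


def polya_urn_draws(n, M, b0, V0, rng):
    """Marginal draws beta_1..beta_n with beta_i | G ~ G, G ~ DP(M, G0),
    G0 = N2(b0, V0), via the Blackwell-MacQueen urn scheme."""
    atoms, counts = [], []          # unique values phi_k and their cluster sizes
    z = np.empty(n, dtype=int)      # cluster indicator for each subject
    for i in range(n):
        weights = np.array(counts + [M], dtype=float)  # sizes..., then M for "new"
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(atoms):         # open a new cluster with an atom drawn from G0
            atoms.append(rng.multivariate_normal(b0, V0))
            counts.append(1)
        else:
            counts[k] += 1
        z[i] = k
    beta = np.array([atoms[k] for k in z])
    return beta, z, np.array(atoms)


rng = np.random.default_rng(0)
beta, z, atoms = polya_urn_draws(
    n=300, M=1.0, b0=np.array([3.0, -0.5]), V0=np.eye(2), rng=rng)
```

Under this prior the expected number of distinct clusters among $n$ subjects is $\sum_{i=1}^{n} M/(M + i - 1)$, which grows roughly like $M \log n$, so $M$ governs how many subgroups the model tends to propose.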

Why Gaussian?
Nice properties!
Approximate normality is widespread.
Easy to interpret.
What if something is not nice?

Why Gaussian?
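With the Gaussian distributional assumption, the contribution to the joint likelihood from subject $i$ is
$$f(Y_i, s_i, \nu_i \mid \beta_i, \sigma^2, \gamma, \alpha) = \frac{1}{(2\pi\sigma^2)^{m_i/2}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{j=1}^{m_i}\left[Y_{ij} - \psi_{\beta_i}(t_{ij})\right]^2\right\} \times \lambda(s_i)^{\nu_i} \exp\left\{\nu_i\left[\gamma\,\psi_{\beta_i}(s_i) + X_i\alpha\right] - \int_0^{s_i} \lambda(u)\exp\{\gamma\,\psi_{\beta_i}(u) + X_i\alpha\}\,du\right\}$$
where $s_i$ is the observed (event or censoring) time and $\nu_i$ is the event indicator.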
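The logarithm of this contribution can be evaluated directly. The sketch below assumes a linear trajectory and a piecewise-constant baseline hazard with hypothetical left endpoints `cuts`, for which the integral in the survival term has a closed form on each piece; `xa` stands for the scalar $X_i\alpha$, and $s_i$, $\nu_i$ are treated as the observed time and event indicator. Function names are illustrative.

```python
import math


def cumulative_hazard(t, beta0, beta1, gamma, xa, cuts, lams):
    """int_0^t lam(u) exp{gamma*(beta0 + beta1*u) + xa} du, lam = lams[l] on [cuts[l], cuts[l+1])."""
    slope, scale, total = gamma * beta1, math.exp(gamma * beta0 + xa), 0.0
    for l, lam in enumerate(lams):
        a = cuts[l]
        b = min(t, cuts[l + 1]) if l + 1 < len(cuts) else t
        if b <= a:
            break
        if abs(slope) < 1e-12:
            total += lam * scale * (b - a)
        else:
            total += lam * scale * (math.exp(slope * b) - math.exp(slope * a)) / slope
    return total


def piece_index(s, cuts):
    """Index l of the hazard piece [cuts[l], cuts[l+1]) that contains s."""
    return max(l for l, c in enumerate(cuts) if c <= s)


def loglik_gaussian(y, t, s, nu, beta0, beta1, sigma2, gamma, xa, cuts, lams):
    """log f(Y_i, s_i, nu_i | beta_i, sigma^2, gamma, alpha) for one subject."""
    m = len(y)
    rss = sum((yj - (beta0 + beta1 * tj)) ** 2 for yj, tj in zip(y, t))
    ll_long = -0.5 * m * math.log(2.0 * math.pi * sigma2) - rss / (2.0 * sigma2)
    lam_s = lams[piece_index(s, cuts)]
    # nu * log lambda(s) + nu * [gamma * psi(s) + X alpha] - cumulative hazard at s
    ll_event = nu * (math.log(lam_s) + gamma * (beta0 + beta1 * s) + xa)
    return ll_long + ll_event - cumulative_hazard(s, beta0, beta1, gamma, xa, cuts, lams)
```

Summing this over subjects gives the log-likelihood that the Metropolis-Hastings steps in a sampler would need to evaluate.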

Gibbs Sampler
1. Update cluster indicators $z^{(t)}$ using Neal's Algorithm 8 (Neal 2000);
2. Simulate the unique values $\beta^{(t)}$ for each cluster using the Metropolis-Hastings method or ARS (adaptive rejection sampling);
3. Simulate $b_0^{(t)}$ from $[b_0^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t-1)}, \sigma^{2(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
4. Simulate $V_0^{(t)}$ from $[V_0^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, \sigma^{2(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
5. Simulate $\sigma^{2(t)}$ from $[\sigma^{2(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, V_0^{(t)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
6. Simulate $\gamma^{(t)}$ using the Metropolis-Hastings method or ARS;
7. Simulate $\lambda_l^{(t)}$ ($l = 1, 2, \dots, L$) from $[\lambda_l^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, V_0^{(t)}, \sigma^{2(t)}, \gamma^{(t)}, \alpha^{(t-1)}]$;
8. Simulate $\alpha^{(t)}$ using the Metropolis-Hastings method or ARS.
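Step 7 is conjugate if $b_l$ is read as a rate parameter (the slides do not state the parameterization, so this is an assumption). Given everything else, the likelihood in $\lambda_l$ is $\prod_i \lambda_l^{\nu_i 1\{s_i \in I_l\}} \exp(-\lambda_l E_{il})$ with $E_{il} = \int_{I_l \cap [0, s_i]} \exp\{\gamma\,\psi_{\beta_i}(u) + X_i\alpha\}\,du$, so $\lambda_l \mid \cdot \sim \text{Gamma}(a_l + d_l,\ b_l + \sum_i E_{il})$, where $d_l$ counts the events in the interval $I_l$. A minimal sketch for a linear trajectory; the subject dictionary keys and function names are hypothetical.

```python
import math
import random


def exposure(s, l, beta0, beta1, gamma, xa, cuts):
    """E_il: integral of exp{gamma*(beta0 + beta1*u) + xa} over [cuts[l], cuts[l+1]) ∩ [0, s]."""
    a = cuts[l]
    b = min(s, cuts[l + 1]) if l + 1 < len(cuts) else s
    if b <= a:
        return 0.0
    slope, scale = gamma * beta1, math.exp(gamma * beta0 + xa)
    if abs(slope) < 1e-12:
        return scale * (b - a)
    return scale * (math.exp(slope * b) - math.exp(slope * a)) / slope


def lambda_posterior(subjects, gamma, cuts, a, b):
    """(shape, rate) of each lambda_l full conditional under a Gamma(a_l, rate b_l) prior."""
    params = []
    for l in range(len(cuts)):
        hi = cuts[l + 1] if l + 1 < len(cuts) else math.inf
        d = sum(sub["nu"] for sub in subjects if cuts[l] <= sub["s"] < hi)  # events in I_l
        E = sum(exposure(sub["s"], l, sub["beta0"], sub["beta1"], gamma, sub["xa"], cuts)
                for sub in subjects)
        params.append((a[l] + d, b[l] + E))
    return params


def update_lambda(subjects, gamma, cuts, a, b, rng):
    """One Gibbs draw of (lambda_1, ..., lambda_L)."""
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
    return [rng.gammavariate(shape, 1.0 / rate)
            for shape, rate in lambda_posterior(subjects, gamma, cuts, a, b)]
```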

Log-Gamma Distribution
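$q \sim LG(\alpha, \kappa)$ if $q = \log z$ for $z \sim \text{Gamma}(\alpha, \kappa)$.
$\alpha > 0$ is the shape parameter; $\kappa > 0$ is the rate parameter.
The pdf of the log-gamma distribution is given by
$$f(q \mid \alpha, \kappa) = \frac{\kappa^{\alpha}}{\Gamma(\alpha)} \exp\{\alpha q - \kappa \exp(q)\}, \quad q \in \mathbb{R}.$$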
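The density follows from the change of variables $z = e^q$, i.e. $f_q(q) = f_z(e^q)\,e^q$. The short sketch below (function names are illustrative) evaluates the log-density, draws from $LG(\alpha, \kappa)$ by taking the log of a Gamma draw, and includes the Gamma density so the change of variables can be checked numerically. Python's `random.gammavariate` is parameterized by shape and scale, so the rate $\kappa$ enters as $1/\kappa$.

```python
import math
import random


def lg_logpdf(q, alpha, kappa):
    """log f(q | alpha, kappa) = alpha*log(kappa) - log Gamma(alpha) + alpha*q - kappa*exp(q)."""
    return alpha * math.log(kappa) - math.lgamma(alpha) + alpha * q - kappa * math.exp(q)


def lg_sample(alpha, kappa, rng):
    """q = log z with z ~ Gamma(shape=alpha, rate=kappa)."""
    return math.log(rng.gammavariate(alpha, 1.0 / kappa))


def gamma_pdf(z, alpha, kappa):
    """Gamma(shape=alpha, rate=kappa) density, used to check the change of variables."""
    return math.exp(alpha * math.log(kappa) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(z) - kappa * z)
```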

Proposition 1 (Bradley et al., 2018)
Let $q \sim LG(\alpha, \alpha)$ and $q^+ = \alpha^{1/2} q$. Then $q^+$ converges in distribution to the standard normal distribution as $\alpha \to \infty$.
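A quick Monte Carlo check of Proposition 1: as $\alpha$ grows, $q^+ = \alpha^{1/2} q$ with $q \sim LG(\alpha, \alpha)$ should have mean near 0, variance near 1, and skewness near 0. The sketch uses only the standard library; the sample size and the $\alpha$ values are arbitrary choices, and the function names are illustrative.

```python
import math
import random
import statistics


def standardized_lg_draws(alpha, n, rng):
    """Draws of q+ = sqrt(alpha) * q with q = log z, z ~ Gamma(shape=alpha, rate=alpha)."""
    return [math.sqrt(alpha) * math.log(rng.gammavariate(alpha, 1.0 / alpha))
            for _ in range(n)]


def summary(xs):
    """Sample mean, variance, and skewness."""
    mu = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu)
    skew = sum((x - mu) ** 3 for x in xs) / (len(xs) * var ** 1.5)
    return mu, var, skew


rng = random.Random(2019)
for alpha in (1.0, 10.0, 1000.0):
    mu, var, skew = summary(standardized_lg_draws(alpha, 20000, rng))
    # mean and skewness should shrink toward 0, variance toward 1, as alpha grows
    print(f"alpha={alpha:>7}: mean={mu:+.3f} var={var:.3f} skew={skew:+.3f}")
```

For small $\alpha$ the log-gamma is clearly left-skewed (for $\alpha = 1$ it is the log of an exponential variable), which is the kind of departure from normality this model family can accommodate.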

Multivariate Log-Gamma Distribution (MLG)
Take $w = (w_1, w_2, \dots, w_m)^T$ with $w_i \sim LG(\alpha_i, \kappa_i)$ independently, $i = 1, 2, \dots, m$. Then
$$q = c + Vw$$
has an MLG distribution, denoted by
$$q \sim MLG(c, V, \alpha, \kappa),$$
where $c \in \mathbb{R}^m$, $V \in \mathbb{R}^{m \times m}$ is an invertible square matrix, $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_m)^T$, and $\kappa = (\kappa_1, \kappa_2, \dots, \kappa_m)^T$.

Multivariate Log-Gamma Distribution
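The pdf of the MLG random variable $q$ is given by
$$f(q \mid c, V, \alpha, \kappa) = C_1 \exp\left\{\alpha^T V^{-1}(q - c) - \kappa^T \exp\{V^{-1}(q - c)\}\right\},$$
where $C_1$ is the normalizing constant given by
$$C_1 = \frac{1}{\det(VV^T)^{1/2}} \prod_{i=1}^{m} \frac{\kappa_i^{\alpha_i}}{\Gamma(\alpha_i)}.$$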
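The MLG log-density is straightforward to evaluate once $V^{-1}(q - c)$ is available. The NumPy sketch below computes it with a linear solve rather than an explicit inverse and uses `slogdet` for the $\det(VV^T)$ term; the function name is illustrative.

```python
import math

import numpy as np


def mlg_logpdf(q, c, V, alpha, kappa):
    """log f(q | c, V, alpha, kappa) for q = c + V w, w_i ~ LG(alpha_i, kappa_i)."""
    q, c, V = np.asarray(q, float), np.asarray(c, float), np.asarray(V, float)
    alpha, kappa = np.asarray(alpha, float), np.asarray(kappa, float)
    u = np.linalg.solve(V, q - c)                      # u = V^{-1}(q - c)
    _, logdet = np.linalg.slogdet(V @ V.T)             # log det(V V^T)
    log_c1 = (-0.5 * logdet + np.sum(alpha * np.log(kappa))
              - sum(math.lgamma(a) for a in alpha))   # log C_1
    return float(log_c1 + alpha @ u - kappa @ np.exp(u))
```

When $V$ is diagonal, this reduces to a sum of univariate log-gamma log-densities of $(q_i - c_i)/v_{ii}$ minus $\log v_{ii}$, which is a convenient sanity check.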

Multivariate Log-Gamma Distribution
Proposition 2 (Bradley et al., 2018)
Let $q \sim MLG(c, \alpha^{1/2}V, \alpha\mathbf{1}_m, \alpha\mathbf{1}_m)$. Then $q$ converges in distribution to a multivariate normal distribution with mean $c$ and covariance matrix $VV^T$ as $\alpha \to \infty$.
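Sampling from the MLG follows its definition directly: draw independent log-gamma variates and apply $q = c + Vw$. The NumPy sketch below also illustrates Proposition 2 by drawing from $MLG(c, \alpha^{1/2}V, \alpha\mathbf{1}_m, \alpha\mathbf{1}_m)$ with a large $\alpha$ and comparing the sample mean and covariance with $c$ and $VV^T$; the particular $c$, $V$, and $\alpha$ are arbitrary, and `sample_mlg` is an illustrative name.

```python
import numpy as np


def sample_mlg(c, V, alpha, kappa, n, rng):
    """n draws of q = c + V w with w_i ~ LG(alpha_i, kappa_i) independently."""
    c, V = np.asarray(c, float), np.asarray(V, float)
    alpha = np.broadcast_to(np.asarray(alpha, float), c.shape)
    kappa = np.broadcast_to(np.asarray(kappa, float), c.shape)
    # numpy's gamma takes (shape, scale), so scale = 1 / rate; log gives LG draws
    w = np.log(rng.gamma(alpha, 1.0 / kappa, size=(n, c.size)))
    return c + w @ V.T              # each row is (c + V w_i)^T


rng = np.random.default_rng(0)
a = 2000.0
c = np.array([1.0, -2.0])
V = np.array([[1.0, 0.0], [0.5, 1.0]])
q = sample_mlg(c, np.sqrt(a) * V, a, a, 20000, rng)
print(q.mean(axis=0))   # should be close to c
print(np.cov(q.T))      # should be close to V @ V.T = [[1.0, 0.5], [0.5, 1.25]]
```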

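Conditional Multivariate Log-Gamma Distribution (cMLG)
Let $q \sim MLG(c, V, \alpha, \kappa)$, and partition $q$ as follows:
$$q = \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} \text{ with sizes } \begin{pmatrix} g \\ m - g \end{pmatrix},$$
and accordingly partition $V^{-1}$ as follows:
$$V^{-1} = \begin{pmatrix} H_{m \times g} & B_{m \times (m-g)} \end{pmatrix}.$$
Then $(q_1 \mid q_2 = d)$ has a cMLG distribution denoted by
$$(q_1 \mid q_2 = d) \sim cMLG(H, \alpha, \kappa_c),$$
where $\kappa_c \equiv \exp\{Bd - V^{-1}c + \log(\kappa)\}$.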
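The definition of $\kappa_c$ comes from splitting $V^{-1}(q - c) = Hq_1 + Bd - V^{-1}c$ when $q_2 = d$, so that $\kappa^T\exp\{V^{-1}(q - c)\} = \kappa_c^T \exp(Hq_1)$ and the remaining $\alpha^T(Bd - V^{-1}c)$ term is constant in $q_1$. The NumPy sketch below computes $H$ and $\kappa_c$ and checks this identity on an arbitrary example; function names are illustrative.

```python
import numpy as np


def cmlg_params(c, V, kappa, d, g):
    """Return H (m x g) and kappa_c for (q1 | q2 = d) when q ~ MLG(c, V, alpha, kappa)."""
    Vinv = np.linalg.inv(V)
    H, B = Vinv[:, :g], Vinv[:, g:]                  # V^{-1} = [H  B]
    kappa_c = np.exp(B @ d - Vinv @ c + np.log(kappa))
    return H, kappa_c


def cmlg_log_kernel(q1, alpha, H, kappa_c):
    """Unnormalized log cMLG density: alpha^T H q1 - kappa_c^T exp(H q1)."""
    Hq1 = H @ q1
    return alpha @ Hq1 - kappa_c @ np.exp(Hq1)


rng = np.random.default_rng(1)
m, g = 4, 2
V = np.eye(m) + 0.2 * rng.standard_normal((m, m))
c, alpha, kappa = rng.standard_normal(m), rng.uniform(1, 3, m), rng.uniform(0.5, 2, m)
q1, d = rng.standard_normal(g), rng.standard_normal(m - g)
H, kappa_c = cmlg_params(c, V, kappa, d, g)
```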

Conditional Multivariate Log-Gamma Distribution
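The pdf of the cMLG distribution is given by
$$f(q_1 \mid q_2 = d, c, V, \alpha, \kappa) = C_2 \exp\left\{\alpha^T H q_1 - \kappa_c^T \exp(H q_1)\right\},$$
where $C_2$ is the normalizing constant given by
$$C_2 = \frac{\frac{1}{\det(VV^T)^{1/2}} \prod_{i=1}^{m} \frac{\kappa_i^{\alpha_i}}{\Gamma(\alpha_i)} \exp(\alpha^T B d - \alpha^T V^{-1} c)}{\left[\int f(q \mid c, V, \alpha, \kappa)\, dq_1\right]_{q_2 = d}}.$$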
The m × g matrix H is not square!
The cMLG distributions do not belong to the class of
MLG distributions.
Additional care is needed for simulation from cMLG.
(Bradley et al., 2018)

Our Semiparametric Joint Model
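Model formulation:
$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t)\exp\{\gamma\,\psi_{\beta_i}(t) + X\alpha\}$
Linear trajectory function, i.e. $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$.
Assume that the longitudinal error has a log-gamma distribution, i.e. $\epsilon_{ij} \overset{iid}{\sim} LG(\alpha_\epsilon, \kappa_\epsilon)$.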

Our Semiparametric Joint Model
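With the log-gamma distributional form, the contribution to the joint likelihood from subject $i$ is
$$f(Y_i, s_i, \nu_i \mid \beta_i, \alpha_\epsilon, \kappa_\epsilon, \lambda, \gamma, \alpha) = \prod_{j=1}^{m_i} \frac{\kappa_\epsilon^{\alpha_\epsilon}}{\Gamma(\alpha_\epsilon)} \exp\left\{\alpha_\epsilon\left[Y_{ij} - \psi_{\beta_i}(t_{ij})\right] - \kappa_\epsilon \exp\left\{Y_{ij} - \psi_{\beta_i}(t_{ij})\right\}\right\} \times \lambda(s_i)^{\nu_i} \exp\left\{\nu_i\left[\gamma\,\psi_{\beta_i}(s_i) + X_i\alpha\right] - \int_0^{s_i} \lambda(u)\exp\{\gamma\,\psi_{\beta_i}(u) + X_i\alpha\}\,du\right\}$$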

Prior Distributions
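Dirichlet process prior on $\beta_i \equiv (\beta_{0i}, \beta_{1i})^T$, $i = 1, 2, \dots, n$
$\beta_i \mid G \sim G$
$G \sim DP(M, G_0)$
$G_0 = cMLG(H_G, \alpha_G, \kappa_G)$
Priors on other parameters:
$\lambda_l \sim \text{Gamma}(a_l, b_l)$, $l = 1, 2, \dots, L$
$\gamma \sim LG(\alpha_2, \kappa_2)$
$\alpha \sim cMLG\left(\begin{pmatrix} X \\ \delta_5 \mathbf{1}_p^T \end{pmatrix}, \begin{pmatrix} \delta_4 \mathbf{1}_n \\ \alpha_3 \end{pmatrix}, \begin{pmatrix} \sigma_4 \mathbf{1}_n \\ \kappa_3 \end{pmatrix}\right)$
$\alpha_\epsilon \sim \text{Gamma}(\theta_1, \tau_1)$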

Gibbs Sampler
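1. Update cluster indicators $z^{(t)}$ using Neal's Algorithm 8 (Neal 2000);
2. Simulate the unique values $\beta^{(t)}$ for each class from $[\phi_z^{(t)} \mid Y, z^{(t)}, V_0^{(t-1)}, \alpha_\epsilon^{(t-1)}, \kappa_\epsilon^{(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
3. Simulate $V_0$ from $[v_{21} \mid Y, z^{(t)}, \phi_z^{(t)}, \alpha_\epsilon^{(t-1)}, \kappa_\epsilon^{(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
4. Simulate $\alpha_\epsilon^{(t)}$ using the Metropolis-Hastings method or ARS;
5. Simulate $\gamma^{(t)}$ from $[\gamma^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
6. Simulate $\lambda_l^{(t)}$ ($l = 1, 2, \dots, L$) from $[\lambda_l^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \gamma^{(t)}, \alpha^{(t-1)}]$;
7. Simulate $\alpha^{(t)}$ from $[\alpha^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \gamma^{(t)}, \lambda^{(t)}]$;
8. Update hyperparameters $\kappa_0, \kappa_1, \kappa_2, \kappa_3$ using conjugate priors;
9. Update hyperparameters $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ using uniform priors.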

Recall Joint Model Framework
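Model formulation:
$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t)\exp\{\gamma\,\psi_{\beta_i}(t) + X\alpha\}$
Linear trajectory function, i.e. $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$
Gaussian joint model: $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
Log-gamma joint model: $\epsilon_{ij} \overset{iid}{\sim} LG(\alpha_\epsilon, \kappa_\epsilon)$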

Simulation Results
Gibbs sampler with 2,000 burn-in and 3,000 inferential iterations
Model evaluation at individual level (MSE)
Model evaluation at cluster level
Number of clusters
Cluster assignment (adjusted Rand index)
Longitudinal trajectory and hazard rate at cluster level
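The adjusted Rand index compares an estimated partition with the true one and corrects the Rand index for chance agreement: 1 means identical partitions up to relabeling, and values near 0 mean chance-level agreement. Below is a standard-library sketch of the Hubert-Arabie formula computed from the contingency table; returning 1.0 in the degenerate case where both partitions are a single cluster follows common software convention and is an assumption here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie adjusted Rand index between two partitions of the same subjects."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))            # contingency table n_ij
    sum_ij = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_true).values())   # row sums a_i
    sum_b = sum(comb(v, 2) for v in Counter(labels_pred).values())   # column sums b_j
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:   # degenerate case, e.g. both partitions are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

With a single true cluster, this index is 1 when the estimate also has one cluster and 0 otherwise, so in one-cluster scenarios each replicate contributes either 0 or 1.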

Simulation: Parameter Estimates (Gaus, Gaus) 3 cluster
True value Bias (SE) 95% Credible Interval
Lower Bound (SE) Upper Bound (SE)
β01 = 3 −0.03 (0.05) 2.54 (0.25) 3.19 (0.06)
β11 = −1 0.03 (0.07) −1.11 (0.05) −0.76 (0.17)
β0,101 = 3 −0.02 (0.05) 2.73 (0.25) 3.16 (0.06)
β1,101 = −0.5 −0.04 (0.07) −0.66 (0.05) −0.42 (0.17)
β0,201 = 2 0.00 (0.05) 1.88 (0.25) 2.31 (0.06)
β1,201 = −0.5 −0.01 (0.07) −0.65 (0.05) −0.45 (0.17)
γ = −1 0.35 (0.02) −0.67 (0.02) −0.63 (0.02)
α = −1 0.06 (0.09) −1.16 (0.10) −0.74 (0.07)
λ1 = 0.5 −0.26 (0.06) 0.14 (0.04) 0.35 (0.08)
λ2 = 0.5 −0.18 (0.06) 0.22 (0.05) 0.44 (0.08)
λ3 = 0.5 −0.11 (0.05) 0.28 (0.05) 0.51 (0.06)
λ4 = 0.5 −0.02 (0.06) 0.36 (0.05) 0.61 (0.07)
λ5 = 1 0.03 (0.09) 0.79 (0.07) 1.29 (0.12)
λ6 = 1 0.00 (0.02) 0.03 (0.002) 3.62 (0.09)
Table 1: Parameter estimates for the simulation study of 3-cluster Gaussian-distributed
data using the Gaussian joint model with slice sampler; n = 100 per cluster.

Simulation: Individual Trajectories (Gaus, Gaus) 3 cluster
Figure 2: True longitudinal observations vs. estimated longitudinal trajectories from
Gaussian joint model with slice sampler for three-cluster Gaussian data.

Simulation: Individual hazard (Gaus, Gaus) 3 cluster
Figure 3: True hazard vs. estimated hazard from Gaussian joint model with
slice sampler for three-cluster Gaussian data.

Simulation: Parameter Estimates (Gaus data, LG) 3 cluster
True value Bias (SE) 95% Credible Interval
Lower Bound (SE) Upper Bound (SE)
β01 = 3 −0.03 (0.02) 2.92 (0.02) 3.01 (0.02)
β11 = −1 0.02 (0.01) −0.10 (0.01) −0.96 (0.01)
β0,101 = 3 −0.01 (0.02) 2.95 (0.02) 3.03 (0.02)
β1,101 = −0.5 0.00 (0.01) −0.52 (0.01) −0.49 (0.01)
β0,201 = 2 0.12 (0.02) 2.04 (0.02) 2.22 (0.02)
β1,201 = −0.5 −0.05 (0.01) −0.61 (0.01) −0.53 (0.01)
γ = −1 −0.02 (0.04) −1.14 (0.05) −0.91 (0.04)
α = −1 0.08 (0.08) −1.06 (0.08) −0.77 (0.08)
λ1 = 0.5 0.12 (0.10) 0.40 (0.08) 0.90 (0.13)
λ2 = 0.5 0.10 (0.07) 0.42 (0.05) 0.83 (0.09)
λ3 = 0.5 0.08 (0.07) 0.43 (0.06) 0.75 (0.09)
λ4 = 0.5 0.07 (0.08) 0.44 (0.07) 0.72 (0.10)
λ5 = 1 0.02 (0.11) 0.80 (0.09) 1.26 (0.13)
λ6 = 1 0.00 (0.004) 0.48 (0.01) 1.71 (0.02)
Table 2: Parameter estimates for the simulation study of 3-cluster Gaussian-distributed
data using the log-gamma joint model; n = 100 per cluster.

Simulation Results – Individual Level Inference
# Clusters Data Distribution Joint Model Longitudinal MSE (SE) Hazard MSE (SE)
1 Gaussian Gaussian 0.099 (0.004) 10.38 (7.85)
LG 0.101 (0.004) 4.46 (7.74)
LG 0.096 (0.003) 4.97 (2.97)
LG 0.095 (0.004) 6.30 (5.18)
1 LG Gaussian 0.095 (0.005) 12.04 (15.50)
LG 0.100 (0.004) 21.49 (25.61)
2 LG Gaussian 0.090 (0.003) 11.65 (1.55)
LG 0.093 (0.003) 3.62 (2.07)
3 LG Gaussian 0.088 (0.003) 13.19 (13.32)
LG 0.095 (0.004) 6.30 (5.18)
Table 3: MSE of the longitudinal observations and MSE of the hazard rates at
individual level.

Simulation Results – Cluster Level Inference
Data Distribution Joint Model Number of Clusters ARI (SE)
Truth Estimate (SE)
Gaussian Gaussian 1 1.6 (0.70) 0.9 (0.32)
LG 1 1.1 (0.32) 0.9 (0.32)
LG 2 2.1 (0.32) 0.89 (0.04)
LG 3 3 (0) 0.85 (0.03)
LG Gaussian 1 2.2 (1.03) 0.7 (0.48)
LG 1 1 (0) 1 (0)
LG Gaussian 2 2.9 (0.74) 0.87 (0.03)
LG 2 2 (0) 0.90 (0.03)
LG Gaussian 3 4.5 (1.08) 0.83 (0.04)
LG 3 3 (0) 0.85 (0.03)
Table 4: Estimated number of clusters and adjusted Rand index.

Simulation Results – Cluster Level (Gaussian Data)
Figure 4: Gaussian joint model
Figure 5: Log-gamma joint model

Simulation Results – Cluster Level (Log-gamma Data)
Figure 6: Gaussian joint model
Figure 7: Log-gamma joint model

Computational Efficiency
Effective Sample Size, e.g., 3-cluster scenario
Data Joint Model β01 β11
Gaussian Gaussian 2080 2298
Gaussian LG 3000 3000
LG Gaussian 1913 2360
LG LG 3000 3000
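The slides do not say which effective sample size estimator was used. As one common choice, the sketch below uses Geyer's initial positive sequence: ESS $= N/\tau$ with $\tau = 1 + 2\sum_{k \ge 1}\rho_k$, truncating the sum at the first pair $\rho_{2k} + \rho_{2k+1}$ that is not positive. Autocorrelations are computed by FFT with zero padding. NumPy is assumed, and the i.i.d. and AR(1) test chains are illustrative.

```python
import numpy as np


def autocorrelation(x):
    """Sample autocorrelations rho_0, ..., rho_{N-1} of a 1-D chain (FFT, zero-padded)."""
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    f = np.fft.rfft(x, 2 * n)                       # padding avoids circular wrap-around
    acov = np.fft.irfft(np.abs(f) ** 2, 2 * n)[:n] / n
    return acov / acov[0]


def ess_initial_positive(x):
    """ESS = N / tau, tau = -1 + 2 * sum_k (rho_{2k} + rho_{2k+1}) over initial positive pairs."""
    rho = autocorrelation(x)
    n = rho.size
    total, k = 0.0, 0
    while 2 * k + 1 < n:
        pair = rho[2 * k] + rho[2 * k + 1]
        if pair <= 0.0:
            break
        total += pair
        k += 1
    return n / (-1.0 + 2.0 * total)


rng = np.random.default_rng(3)
iid = rng.standard_normal(3000)
ar = np.empty(20000)        # AR(1) chain with phi = 0.9, so tau = (1 + phi) / (1 - phi) = 19
ar[0] = rng.standard_normal()
for t in range(1, ar.size):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()
print(ess_initial_positive(iid), ess_initial_positive(ar))
```

An ESS equal to the number of retained draws (3,000 inferential iterations in these simulations) indicates essentially no autocorrelation in the chain.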

Simulation Conclusion
Gaussian Model Log-Gamma Model
Individual
Longitudinal Better Good
Hazard OK Better
Clustering
No. Clusters Overestimated Good
ARI Good Better
Cluster
Longitudinal Good Good
Hazard OK Better

Applications – HIV Data
HIV data has both longitudinal and survival outcomes.
Longitudinal marker: CD4 cell count
Survival time: time to death or censoring

Figure 8: True trajectories vs estimated trajectories for 4 randomly selected patients
by using Gaussian joint model with slice sampler (MSE=0.08) and log-gamma
joint model (MSE=0.15).

Figure 9: Estimated trajectories and estimated hazard rate for each cluster by using
Gaussian joint model with slice sampler (38 clusters) and log-gamma joint
model (7 clusters)

A Bayesian Semiparametric Joint Model
Table 7: Parameter estimates for the HIV data using the Gaussian joint model with slice
sampler and the log-gamma joint model.
Gaussian Joint Model Log-gamma Joint Model
Parameter Estimate 95% Credible Interval Estimate 95% Credible Interval
β01 3.27 (2.69, 3.75) 3.02 (2.86, 3.76)
β11 −0.03 (−0.09, 0.03) −0.04 (−0.05, −0.01)
Var(error) 0.13 (0.12, 0.15) 0.10 (0.09, 0.16)
γ −0.64 (−0.66, −0.61) −0.91 (−0.96, −0.87)
Drug −0.50 (−0.80, −0.21) −0.41 (−0.68, −0.13)
Gender −1.03 (−1.46, −0.58) −0.78 (−1.11, −0.43)
PrevOI 0.07 (−0.41, 0.51) 0.22 (−0.13, 0.57)
AZT −0.69 (−1.11, −0.33) −0.43 (−0.72, −0.16)
λ1 0.26 (0.11, 0.54) 0.37 (0.21, 0.59)
λ2 0.50 (0.24, 0.95) 0.55 (0.35, 0.81)
λ3 0.65 (0.32, 1.21) 0.70 (0.47, 1.03)
λ4 0.61 (0.28, 1.15) 0.69 (0.43, 1.04)
λ5 5.05 (1.61, 10.43) 1.39 (0.75, 2.25)

Applications – PSA Data
Longitudinal marker: PSA readings
Survival time: time to diagnosis of prostate cancer or censoring

Table 6.2: Parameter estimates for the PSA data using the Gaussian joint model with slice sampler
and the log-gamma joint model.
Gaussian Joint Model Log-gamma Joint Model
Parameter Estimate 95% Credible Interval Estimate 95% Credible Interval
β01 0.96 (0.64, 1.27) 1.11 (0.81, 1.36)
β11 0.07 (−0.002, 0.15) 0.04 (0.01, 0.06)
Var(error) 0.05 (0.04, 0.05) 0.04 (0.04, 0.05)
γ 1.17 (1.13, 1.17) 0.81 (0.53, 1.07)
Figure 6.13 shows the predicted PSA trajectories, predicted hazard rate, and predicted survival
probability for each cluster. Different colors represent different clusters. In the bottom left panel
we truncate the trajectory of the cluster labeled with yellow at the largest observed time of PSA
readings. We do this because a high percentage of individuals in this cluster were diagnosed with
prostate cancer early, and hence we are less confident in making estimates after this time point.

Figure 10: True trajectories vs estimated trajectories for 4 randomly selected
subjects by using Gaussian joint model with slice sampler (MSE=1.70) and
log-gamma joint model (MSE=4.94).

Figure 11: Estimated trajectories and estimated hazard rate for each cluster by
using Gaussian joint model with Metropolis-Hastings (56 clusters) and log-gamma
joint model (7 clusters).

Summary
Jointly modeling associated longitudinal and survival time
outcomes improves precision
Dirichlet process prior for longitudinal trajectory parameters
accommodates heterogeneity and enables discovery of subgroup
structure
Common population parameter γ associating marker trajectory
and survival time hazard
Log-gamma formulation increases flexibility and provides conjugate
structure
Sampling still requires some care, but is more efficient with respect to ESS

References
Bradley, J. R., Holan, S. H., and Wikle, C. K. (2018). Computationally efficient
multivariate spatio-temporal models for high-dimensional count-valued data (with
discussion). Bayesian Analysis, 13(1):253–310.
Brown, E. R. and Ibrahim, J. G. (2003). A Bayesian semiparametric joint hierarchical
model for longitudinal and survival data. Biometrics, 59(2):221–228.
Goldman, A. I., Carlin, B. P., Crane, L. R., Launer, C., Korvick, J. A., Deyton, L., and
Abrams, D. I. (1996). Response of CD4 lymphocytes and clinical consequences of
treatment using ddI or ddC in patients with advanced HIV infection. JAIDS Journal of
Acquired Immune Deficiency Syndromes, 11(2):161–169.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1):97–109.
Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models.
Journal of Computational and Graphical Statistics, 9(2):249–265.
Neal, R. M. (2003). Slice sampling. Annals of Statistics, pages 705–741.
Van Dyk, D. A. and Park, T. (2008). Partially collapsed Gibbs samplers: Theory and
methods. Journal of the American Statistical Association, 103(482):790–796.
Wang, Y. and Taylor, J. M. G. (2001). Jointly modeling longitudinal and event time data
with application to acquired immunodeficiency syndrome. Journal of the American
Statistical Association, 96(455):895–905.
