PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Survival Outcomes in the Presence of Subpopulation Heterogeneity - Elizabeth Slate, May 21, 2019
1. A Bayesian model for joint longitudinal and survival
outcomes in the presence of subpopulation
heterogeneity
Pengpeng Wang, Elizabeth H. Slate and Jonathan Bradley
Department of Statistics
Florida State University
4. Example: NPCT Prostate Cancer
[Figure: ln(PSA+1) versus years since randomization, in two panels: Select PCa Noncases and PCa cases]
5. Example: NPCT Prostate Cancer
[Figure: ln(PSA+1) versus years since randomization, in two pairs of panels, each pair showing Select PCa Noncases and PCa cases]
8. Approaches to joint modeling
[Survival | Marker] [Marker] (selection or measurement error model)
e.g., CD4 counts and progression to AIDS
Tsiatis, DeGruttola and Wulfsohn (1995)
Faucett and Thomas (1996)
Shi, Taylor and Muñoz (1996)
Wulfsohn and Tsiatis (1997)
Law, Taylor and Sandler (2002)
[Marker | Survival] [Survival] (pattern mixture model)
Little (1993), Little and Wang (1996)
Pawitan and Self (1993) in AIDS context
Latent Variables to simplify dependence
Shared random effects . . .
e.g., DeGruttola and Tu (1994), Schluchter (1992)
Latent processes . . .
e.g., Xu and Zeger (2001), Henderson, Diggle and Dobson (2000)
Latent classes
Lin et al. (2002), Han et al. (2007)
9. Latent variable joint modeling
Marker $Y_i(t)$ observed at times $t_{i1}, t_{i2}, \ldots, t_{im_i}$
Event time $T_i$: event process (may be censored, etc.)
Shared random effects
$[Y_i, T_i] = \int [Y_i \mid b_i]\,[T_i \mid b_i]\,[b_i]\, db_i$
Latent classes: $c_{ik} = 1$ if subject $i$ is in class $k$
$[Y_i, T_i] = \sum_{k=1}^{K} [Y_i \mid c_{ik} = 1]\,[T_i \mid c_{ik} = 1]\,[c_{ik} = 1]$
Latent processes
$[Y_i(t), T_i] = \int [Y_i(t) \mid \eta_i(t)]\,[T_i \mid \eta_i(t)]\,[\eta_i(t)]\, d\eta_i(t)$
12. A Latent Process Model
Brown and Ibrahim (2003, Biometrics)
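$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t) \exp\{\gamma \psi_{\beta_i}(t) + X\alpha\}$
$Y_{ij}$ is the longitudinal observation for the $i$-th subject at time $t_{ij}$
$\psi_{\beta_i}(t_{ij})$ is a trajectory function
e.g. linear: $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$
e.g. quadratic: $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2$
Longitudinal error $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
$\lambda(t)$ is the baseline hazard, piecewise constant
$\gamma$ associates the two outcomes
$\alpha$ associates covariates $X$ to the survival time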
13. Prior Distributions
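Dirichlet process prior on $\beta_i \equiv (\beta_{0i}, \beta_{1i})^T$, $i = 1, 2, \ldots, n$
$\beta_i \mid G \sim G$
$G \sim DP(M, G_0)$
$G_0 = N_2(b_0, V_0)$
Priors and hyperpriors:
$\lambda_l \sim \mathrm{Gamma}(a_l, b_l)$, $l = 1, 2, \ldots, L$
$\gamma \sim N(\mu_\gamma, \sigma_\gamma)$
$\alpha \sim N_p(\mu_\alpha, \Sigma_\alpha)$
$\sigma^2 \sim IG(a_\sigma, b_\sigma)$
$b_0 \sim N_2(\mu_b, \Sigma_b)$
$V_0^{-1} \sim \mathrm{Wishart}(S_v, n_v)$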
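To see why a Dirichlet process prior produces clusters among the $\beta_i$, here is a minimal sketch of the Pólya urn (Chinese restaurant) representation of draws from $G \sim DP(M, G_0)$ with $G_0 = N_2(b_0, V_0)$. The function name `polya_urn_draws` and the values chosen for $M$, $b_0$ and $V_0$ are illustrative assumptions, not settings from the talk.

```python
import numpy as np

def polya_urn_draws(n, M, b0, V0, rng):
    """Draw beta_1..beta_n from a DP(M, N2(b0, V0)) prior via the Polya urn.

    Subject i joins existing cluster k with probability n_k / (i + M) and
    opens a new cluster, with a fresh atom from G0, with probability M / (i + M).
    """
    atoms, counts, labels = [], [], []
    for i in range(n):
        probs = np.array(counts + [M], dtype=float) / (i + M)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(atoms):                      # new cluster
            atoms.append(rng.multivariate_normal(b0, V0))
            counts.append(1)
        else:                                    # existing cluster
            counts[k] += 1
        labels.append(k)
    return np.array(atoms), np.array(labels)

# Hypothetical settings, for illustration only.
rng = np.random.default_rng(0)
atoms, labels = polya_urn_draws(n=500, M=1.0, b0=np.zeros(2), V0=np.eye(2), rng=rng)
betas = atoms[labels]          # one (beta_0i, beta_1i) row per subject
print("clusters among 500 subjects:", atoms.shape[0])
```

Subjects sharing a label share the same intercept and slope, which is how the prior accommodates subpopulation heterogeneity; larger $M$ tends to give more clusters.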
16. Why Gaussian?
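With the Gaussian distributional assumption, the contribution to the joint likelihood from subject $i$ is
$$
f(Y_i, s_i, \nu_i \mid \beta_i, \sigma^2, \gamma, \alpha)
= \frac{1}{(2\pi\sigma^2)^{m_i/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^{m_i} [Y_{ij} - \psi_{\beta_i}(t_{ij})]^2 \right\}
\times \lambda(s_i)^{\nu_i} \exp\left\{ \nu_i [\gamma \psi_{\beta_i}(s_i) + X_i\alpha] - \int_0^{s_i} \lambda(u) \exp\{\gamma \psi_{\beta_i}(u) + X_i\alpha\}\, du \right\}
$$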
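The likelihood contribution above can be evaluated in closed form when the trajectory is linear and the baseline hazard is piecewise constant. The sketch below is one way to do that, reading $s_i$ as the observed time and $\nu_i$ as the event indicator (an interpretation of the notation, not stated on the slide). The function names, the `cuts` argument (interval endpoints starting at 0 and reaching at least $s_i$) and the scalar `x_alpha` standing for $X_i\alpha$ are assumptions made for illustration.

```python
import math
import numpy as np

def cumulative_hazard(s, beta, gamma, x_alpha, cuts, lam):
    """Integral over [0, s] of lam(u) * exp(gamma * (b0 + b1 * u) + x_alpha) du,
    with lam(u) = lam[l] on [cuts[l], cuts[l + 1])."""
    b0, b1 = beta
    rate = gamma * b1
    total = 0.0
    for l in range(len(lam)):
        lo, hi = cuts[l], min(cuts[l + 1], s)
        if hi <= lo:
            break
        if abs(rate) < 1e-12:
            piece = hi - lo                      # integrand is constant in u
        else:
            piece = (math.exp(rate * hi) - math.exp(rate * lo)) / rate
        total += lam[l] * piece
    return math.exp(gamma * b0 + x_alpha) * total

def baseline_hazard(s, cuts, lam):
    """Value of the piecewise-constant baseline hazard at time s."""
    idx = int(np.searchsorted(cuts, s, side="right")) - 1
    return lam[min(max(idx, 0), len(lam) - 1)]

def gaussian_joint_loglik(y, t, s, nu, beta, sigma2, gamma, x_alpha, cuts, lam):
    """Log of subject i's contribution under the Gaussian joint model."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    resid = y - (beta[0] + beta[1] * t)
    long_part = (-0.5 * y.size * math.log(2 * math.pi * sigma2)
                 - float(np.sum(resid ** 2)) / (2 * sigma2))
    psi_s = beta[0] + beta[1] * s
    surv_part = (nu * (math.log(baseline_hazard(s, cuts, lam)) + gamma * psi_s + x_alpha)
                 - cumulative_hazard(s, beta, gamma, x_alpha, cuts, lam))
    return long_part + surv_part

# Hypothetical subject: three marker readings, event observed at s = 5.
print(gaussian_joint_loglik(y=[1.1, 1.4, 2.2], t=[0.0, 1.0, 2.0], s=5.0, nu=1,
                            beta=(1.0, 0.5), sigma2=0.1, gamma=0.8, x_alpha=-0.3,
                            cuts=[0.0, 2.0, 4.0, 10.0], lam=[0.1, 0.2, 0.3]))
```

Working on the log scale avoids underflow when contributions from many subjects are multiplied together.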
17. Gibbs Sampler
1. Update cluster indicator $z^{(t)}$ using Neal's algorithm 8 (Neal 2000);
2. Simulate unique values of $\beta^{(t)}$ for each cluster using Metropolis-Hastings method or ARS;
3. Simulate $b_0^{(t)}$ from
$[b_0^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t-1)}, \sigma^{2(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
4. Simulate $V_0^{(t)}$ from
$[V_0^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, \sigma^{2(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
5. Simulate $\sigma^{2(t)}$ from
$[\sigma^{2(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, V_0^{(t)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
6. Simulate $\gamma^{(t)}$ using Metropolis-Hastings method or ARS;
7. Simulate $\lambda_l^{(t)}$ ($l = 1, 2, \ldots, L$) from
$[\lambda_l^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, b_0^{(t)}, V_0^{(t)}, \sigma^{2(t)}, \gamma^{(t)}, \alpha^{(t-1)}]$;
8. Simulate $\alpha^{(t)}$ using Metropolis-Hastings method or ARS.
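As an illustration of one of the closed-form steps, the sketch below draws $\sigma^2$ from its full conditional. It assumes the $IG(a_\sigma, b_\sigma)$ prior is in the shape/scale form with density proportional to $x^{-(a_\sigma + 1)} e^{-b_\sigma / x}$, in which case the conditional is $IG(a_\sigma + N/2,\; b_\sigma + \tfrac{1}{2}\sum_{i,j} [Y_{ij} - \psi_{\beta_i}(t_{ij})]^2)$ with $N = \sum_i m_i$. The function name and the simulated residuals are hypothetical; the slides do not show the authors' implementation.

```python
import numpy as np

def draw_sigma2(residuals, a_sigma, b_sigma, rng):
    """One Gibbs draw of sigma^2 given residuals Y_ij - psi_{beta_i}(t_ij).

    Assumes an inverse-gamma prior in shape/scale form, so that
    1 / sigma^2 is Gamma(shape, rate) a posteriori.
    """
    r = np.asarray(residuals, dtype=float)
    shape = a_sigma + r.size / 2.0
    rate = b_sigma + 0.5 * float(np.sum(r ** 2))
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy's gamma takes a scale argument
    return 1.0 / precision

# Hypothetical residuals with true error standard deviation 0.3.
rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 0.3, size=5000)
draws = np.array([draw_sigma2(residuals, 2.0, 1.0, rng) for _ in range(2000)])
print("mean of sigma^2 draws:", draws.mean())
```

Steps 2, 6 and 8 have no such closed form under the Gaussian model, which is why they rely on Metropolis-Hastings or ARS.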
18. Log-Gamma Distribution
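$q \sim LG(\alpha, \kappa)$ if $q = \log z$ for $z \sim \mathrm{Gamma}(\alpha, \kappa)$.
$\alpha > 0$ is the shape parameter; $\kappa > 0$ is the rate parameter.
The pdf of the log-gamma distribution is given by
$$
f(q \mid \alpha, \kappa) = \frac{\kappa^{\alpha}}{\Gamma(\alpha)} \exp\{\alpha q - \kappa \exp(q)\}, \quad q \in \mathbb{R}.
$$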
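The definition translates directly into code: sample a gamma variate with shape $\alpha$ and rate $\kappa$ and take its log. The following is a small sketch with hypothetical function names; it also checks numerically that the stated density integrates to one and agrees with the sampled mean.

```python
import math
import numpy as np

def loggamma_logpdf(q, alpha, kappa):
    """Log density of LG(alpha, kappa) with shape alpha and rate kappa."""
    q = np.asarray(q, dtype=float)
    return alpha * math.log(kappa) - math.lgamma(alpha) + alpha * q - kappa * np.exp(q)

def sample_loggamma(alpha, kappa, size, rng):
    """Draw q = log z with z ~ Gamma(shape=alpha, rate=kappa)."""
    return np.log(rng.gamma(shape=alpha, scale=1.0 / kappa, size=size))

def trapezoid(values, grid):
    """Plain trapezoid rule, written out to avoid version-specific NumPy names."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

rng = np.random.default_rng(0)
alpha, kappa = 2.0, 3.0                      # illustrative values
grid = np.linspace(-30.0, 5.0, 200001)
pdf = np.exp(loggamma_logpdf(grid, alpha, kappa))
print("integral of pdf:", trapezoid(pdf, grid))
print("mean from pdf:  ", trapezoid(grid * pdf, grid))
print("mean of samples:", sample_loggamma(alpha, kappa, 200000, rng).mean())
```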
19. Log-Gamma Distribution
Proposition 1 (Bradley et al., 2018)
Let $q \sim LG(\alpha, \alpha)$, and $q^{+} = \alpha^{1/2} q$. Then $q^{+}$ converges in
distribution to the standard normal distribution as $\alpha$ goes to
infinity.
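Proposition 1 can be checked informally by simulation. The sketch below (function name and the grid of $\alpha$ values are illustrative choices) draws $q \sim LG(\alpha, \alpha)$, rescales by $\alpha^{1/2}$, and reports the sample mean and variance, which should approach 0 and 1 as $\alpha$ grows.

```python
import numpy as np

def scaled_loggamma_draws(alpha, size, rng):
    """Draw q+ = sqrt(alpha) * q with q ~ LG(alpha, alpha) (shape alpha, rate alpha)."""
    q = np.log(rng.gamma(shape=alpha, scale=1.0 / alpha, size=size))
    return np.sqrt(alpha) * q

rng = np.random.default_rng(0)
for alpha in (1.0, 10.0, 1e4):
    draws = scaled_loggamma_draws(alpha, 200000, rng)
    print(f"alpha={alpha:g}: mean={draws.mean():.3f}, var={draws.var():.3f}")
```

For small $\alpha$ the rescaled draws are visibly skewed and over-dispersed relative to a standard normal; this is the extra flexibility the log-gamma error model offers, with the Gaussian model recovered as a limit.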
20. Multivariate Log-Gamma Distribution (MLG)
Take $w = (w_1, w_2, \ldots, w_m)^T$ with $w_i \sim LG(\alpha_i, \kappa_i)$ independently,
$i = 1, 2, \ldots, m$. Then
$q = c + Vw$
has an MLG distribution denoted by
$q \sim \mathrm{MLG}(c, V, \alpha, \kappa)$,
where $c \in \mathbb{R}^m$, $V \in \mathbb{R}^{m \times m}$ is an invertible square matrix,
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)^T$, and $\kappa = (\kappa_1, \kappa_2, \ldots, \kappa_m)^T$.
21. Multivariate Log-Gamma Distribution
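The pdf of the MLG random variable $q$ is given by
$$
f(q \mid c, V, \alpha, \kappa) = C_1 \exp\left[ \alpha^T V^{-1}(q - c) - \kappa^T \exp\{V^{-1}(q - c)\} \right],
$$
where $C_1$ is the normalizing constant given by
$$
C_1 = \frac{1}{\det(VV^T)^{1/2}} \prod_{i=1}^{m} \frac{\kappa_i^{\alpha_i}}{\Gamma(\alpha_i)}.
$$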
22. Multivariate Log-Gamma Distribution
Proposition 2 (Bradley et al., 2018)
Let $q \sim \mathrm{MLG}(c, \alpha^{1/2} V, \alpha 1_m, \alpha 1_m)$. Then $q$ converges in distribution
to a multivariate normal distribution with mean $c$ and covariance
matrix $VV^T$ as $\alpha$ goes to infinity.
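Because an MLG vector is an affine transformation of independent log-gamma variates, sampling it takes only a few lines. The sketch below (the function name `sample_mlg` and the particular $c$ and $V$ are hypothetical) draws from $\mathrm{MLG}(c, \alpha^{1/2} V, \alpha 1_m, \alpha 1_m)$ with a large $\alpha$ and compares the sample mean and covariance with the limits $c$ and $VV^T$ stated in Proposition 2.

```python
import numpy as np

def sample_mlg(c, V, alpha, kappa, size, rng):
    """Draw `size` vectors q = c + V w with independent w_i ~ LG(alpha_i, kappa_i)."""
    c = np.asarray(c, dtype=float)
    V = np.asarray(V, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    w = np.log(rng.gamma(shape=alpha, scale=1.0 / kappa, size=(size, alpha.size)))
    return c + w @ V.T                       # each row is one draw of q

rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])                    # illustrative location
V = np.array([[1.0, 0.0], [0.5, 2.0]])       # illustrative invertible matrix
a = 1e4                                      # large alpha for the normal limit
q = sample_mlg(c, np.sqrt(a) * V, a * np.ones(2), a * np.ones(2), 200000, rng)
print("sample mean:", q.mean(axis=0))
print("sample covariance:\n", np.cov(q, rowvar=False))
print("V V^T:\n", V @ V.T)
```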
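23. Conditional Multivariate Log-Gamma Distribution (cMLG)
Let $q \sim \mathrm{MLG}(c, V, \alpha, \kappa)$, and partition $q$ as
$q = (q_1^T, q_2^T)^T$, where $q_1$ has size $g$ and $q_2$ has size $m - g$,
and accordingly partition $V^{-1}$ as
$V^{-1} = [\, H \;\; B \,]$, where $H$ is $m \times g$ and $B$ is $m \times (m - g)$.
Then $(q_1 \mid q_2 = d)$ has a cMLG distribution denoted by
$(q_1 \mid q_2 = d) \sim \mathrm{cMLG}(H, \alpha, \kappa_c)$,
where $\kappa_c \equiv \exp\{Bd - V^{-1}c + \log(\kappa)\}$.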
24. Conditional Multivariate Log-Gamma Distribution
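The pdf of the cMLG distribution is given by
$$
f(q_1 \mid q_2 = d, c, V, \alpha, \kappa) = C_2 \exp\left\{ \alpha^T H q_1 - \kappa_c^T \exp(H q_1) \right\},
$$
where $C_2$ is the normalizing constant given by
$$
C_2 = \frac{1}{\det(VV^T)^{1/2}} \left[ \prod_{i=1}^{m} \frac{\kappa_i^{\alpha_i}}{\Gamma(\alpha_i)} \right]
\frac{\exp(\alpha^T B d - \alpha^T V^{-1} c)}{\left[ \int f(q \mid c, V, \alpha, \kappa)\, dq_1 \right]_{q_2 = d}}.
$$
The $m \times g$ matrix $H$ is not square!
The cMLG distributions do not fall within the same class as the
MLG distributions.
Additional care is needed for simulation from cMLG.
(Bradley et al., 2018)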
25. Our Semiparametric Joint Model
Model formulation:
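$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t) \exp\{\gamma \psi_{\beta_i}(t) + X\alpha\}$
Linear trajectory function, i.e. $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$.
Assume that the longitudinal error has a log-gamma distribution,
i.e. $\epsilon_{ij} \overset{iid}{\sim} LG(\alpha_\epsilon, \kappa_\epsilon)$.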
26. Our Semiparametric Joint Model
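With the log-gamma distributional form, the contribution to the joint likelihood from subject $i$ is
$$
f(Y_i, s_i, \nu_i \mid \beta_i, \alpha_\epsilon, \kappa_\epsilon, \lambda, \gamma, \alpha)
= \prod_{j=1}^{m_i} \frac{\kappa_\epsilon^{\alpha_\epsilon}}{\Gamma(\alpha_\epsilon)} \exp\left\{ \alpha_\epsilon [Y_{ij} - \psi_{\beta_i}(t_{ij})] - \kappa_\epsilon \exp\{Y_{ij} - \psi_{\beta_i}(t_{ij})\} \right\}
\times \lambda(s_i)^{\nu_i} \exp\left\{ \nu_i [\gamma \psi_{\beta_i}(s_i) + X_i\alpha] - \int_0^{s_i} \lambda(u) \exp\{\gamma \psi_{\beta_i}(u) + X_i\alpha\}\, du \right\}
$$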
27. Prior Distributions
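Dirichlet process prior on $\beta_i \equiv (\beta_{0i}, \beta_{1i})^T$, $i = 1, 2, \ldots, n$
$\beta_i \mid G \sim G$
$G \sim DP(M, G_0)$
$G_0 = \mathrm{cMLG}(H_G, \alpha_G, \kappa_G)$
Priors on other parameters:
$\lambda_l \sim \mathrm{Gamma}(a_l, b_l)$, $l = 1, 2, \ldots, L$
$\gamma \sim LG(\alpha_2, \kappa_2)$
$\alpha \sim \mathrm{cMLG}\left( \begin{bmatrix} X \\ \delta_5 1_p^T \end{bmatrix}, \begin{bmatrix} \delta_4 1_n \\ \alpha_3 \end{bmatrix}, \begin{bmatrix} \sigma_4 1_n \\ \kappa_3 \end{bmatrix} \right)$
$\alpha_\epsilon \sim \mathrm{Gamma}(\theta_1, \tau_1)$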
28. Gibbs Sampler
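1. Update cluster indicator $z^{(t)}$ using Neal's algorithm 8 (Neal 2000);
2. Simulate unique values of $\beta^{(t)}$ for each class from
$[\phi_z^{(t)} \mid Y, z^{(t)}, V_0^{(t-1)}, \alpha_\epsilon^{(t-1)}, \kappa_\epsilon^{(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
3. Simulate $V_0$ from
$[v_{21} \mid Y, z^{(t)}, \phi_z^{(t)}, \alpha_\epsilon^{(t-1)}, \kappa_\epsilon^{(t-1)}, \gamma^{(t-1)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
4. Simulate $\alpha_\epsilon^{(t)}$ using Metropolis-Hastings method or ARS;
5. Simulate $\gamma^{(t)}$ from
$[\gamma^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \lambda^{(t-1)}, \alpha^{(t-1)}]$;
6. Simulate $\lambda_l^{(t)}$ ($l = 1, 2, \ldots, L$) from
$[\lambda_l^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \gamma^{(t)}, \alpha^{(t-1)}]$;
7. Simulate $\alpha^{(t)}$ from
$[\alpha^{(t)} \mid Y, z^{(t)}, \phi_z^{(t)}, V_0^{(t)}, \alpha_\epsilon^{(t)}, \kappa_\epsilon^{(t)}, \gamma^{(t)}, \lambda^{(t)}]$;
8. Update hyper-parameters $\kappa_0, \kappa_1, \kappa_2, \kappa_3$ using conjugate priors;
9. Update hyper-parameters $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ using uniform priors.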
34. Recall Joint Model Framework
Model formulation:
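$Y_{ij} = \psi_{\beta_i}(t_{ij}) + \epsilon_{ij}$
$h(t \mid Y) = \lambda(t) \exp\{\gamma \psi_{\beta_i}(t) + X\alpha\}$
Linear trajectory function, i.e. $\psi_{\beta_i}(t_{ij}) = \beta_{0i} + \beta_{1i} t_{ij}$
Gaussian joint model: $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
Log-gamma joint model: $\epsilon_{ij} \overset{iid}{\sim} LG(\alpha_\epsilon, \kappa_\epsilon)$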
35. Simulation Results
Gibbs sampler with 2,000 burn-in and 3,000 inferential iterations
Model evaluation at individual level (MSE)
Model evaluation at cluster level
Number of clusters
Cluster assignment (adjusted Rand index)
Longitudinal trajectory and hazard rate at cluster level
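For readers unfamiliar with the adjusted Rand index (ARI) used to score cluster assignment, the sketch below computes it from two label vectors using the standard pair-counting formula. The function name is hypothetical, and this is not necessarily the software used for the results in the talk.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same subjects."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))     # contingency table cells
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(v, 2) for v in joint.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)        # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                          # e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # partitions that disagree
print(adjusted_rand_index([0, 0, 0, 0], [0, 0, 1, 1]))   # one true cluster split in two
```

The index is invariant to label switching, which matters because cluster labels from a Dirichlet process sampler are arbitrary. When the truth is a single cluster, the formula gives 1 if the estimate is also a single cluster and 0 otherwise.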
37. Simulation: Individual Trajectories (Gaus, Gaus) 3 cluster
Figure 2: True longitudinal observations vs. estimated longitudinal trajectories from
Gaussian joint model with slice sampler for three-cluster Gaussian data.
38. Simulation: Individual hazard (Gaus, Gaus) 3 cluster
Figure 3: True hazard vs. estimated hazard from Gaussian joint model with
slice sampler for three-cluster Gaussian data.
40. Simulation Results – Individual Level Inference
# Clusters  Data Distribution  Joint Model  Longitudinal MSE (SE)  Hazard MSE (SE)
1           Gaussian           Gaussian     0.099 (0.004)          10.38 (7.85)
1           Gaussian           LG           0.101 (0.004)          4.46 (7.74)
2           Gaussian           Gaussian     0.091 (0.002)          13.68 (3.07)
2           Gaussian           LG           0.096 (0.003)          4.97 (2.97)
3           Gaussian           Gaussian     0.090 (0.004)          9.21 (3.35)
3           Gaussian           LG           0.095 (0.004)          6.30 (5.18)
1           LG                 Gaussian     0.095 (0.005)          12.04 (15.50)
1           LG                 LG           0.100 (0.004)          21.49 (25.61)
2           LG                 Gaussian     0.090 (0.003)          11.65 (1.55)
2           LG                 LG           0.093 (0.003)          3.62 (2.07)
3           LG                 Gaussian     0.088 (0.003)          13.19 (13.32)
3           LG                 LG           0.095 (0.004)          6.30 (5.18)
Table 3: MSE of the longitudinal observations and MSE of the hazard rates at
the individual level.
41. Simulation Results – Cluster Level Inference
Data Distribution  Joint Model  True No. Clusters  Estimated No. Clusters (SE)  ARI (SE)
Gaussian           Gaussian     1                  1.6 (0.70)                   0.9 (0.32)
Gaussian           LG           1                  1.1 (0.32)                   0.9 (0.32)
Gaussian           Gaussian     2                  2.2 (0.63)                   0.88 (0.03)
Gaussian           LG           2                  2.1 (0.32)                   0.89 (0.04)
Gaussian           Gaussian     3                  4.4 (0.70)                   0.82 (0.04)
Gaussian           LG           3                  3 (0)                        0.85 (0.03)
LG                 Gaussian     1                  2.2 (1.03)                   0.7 (0.48)
LG                 LG           1                  1 (0)                        1 (0)
LG                 Gaussian     2                  2.9 (0.74)                   0.87 (0.03)
LG                 LG           2                  2 (0)                        0.90 (0.03)
LG                 Gaussian     3                  4.5 (1.08)                   0.83 (0.04)
LG                 LG           3                  3 (0)                        0.85 (0.03)
Table 4: Estimated number of clusters and adjusted Rand index.
42. Simulation Results – Cluster Level (Gaussian Data)
Figure 4: Gaussian joint model
Figure 5: Log-gamma joint model
43. Simulation Results – Cluster Level (Log-gamma Data)
Figure 6: Gaussian joint model
Figure 7: Log-gamma joint model
44. Computational Efficiency
Effective Sample Size, e.g., 3-cluster scenario
Data      Joint Model  β01   β11
Gaussian  Gaussian     2080  2298
Gaussian  LG           3000  3000
LG        Gaussian     1913  2360
LG        LG           3000  3000
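The slides do not say which effective sample size (ESS) estimator was used. As a rough illustration of what the numbers mean, the sketch below uses one common choice: $n / (1 + 2\sum_k \hat\rho_k)$, summing sample autocorrelations until the first non-positive one. The function name and the AR(1) test chain are assumptions made for this example.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = float(np.dot(xc, xc)) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        rho = float(np.dot(xc[:-lag], xc[lag:])) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = np.random.default_rng(0)
iid = rng.normal(size=3000)                  # nearly independent draws
ar = np.empty(20000)                         # AR(1) chain with coefficient 0.5
ar[0] = rng.normal()
for i in range(1, ar.size):
    ar[i] = 0.5 * ar[i - 1] + rng.normal()
print("ESS, independent draws:", effective_sample_size(iid))
print("ESS, AR(1) chain:", effective_sample_size(ar))
```

An ESS equal to the number of retained iterations, as reported for the log-gamma model with 3,000 inferential iterations, indicates draws that are essentially uncorrelated.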
45. Simulation Conclusion
Level       Criterion     Gaussian Model  Log-Gamma Model
Individual  Longitudinal  Better          Good
Individual  Hazard        OK              Better
Clustering  No. Clusters  Overestimated   Good
Clustering  ARI           Good            Better
Cluster     Longitudinal  Good            Good
Cluster     Hazard        OK              Better
46. Applications – HIV Data
The HIV data have both longitudinal and survival outcomes.
Longitudinal marker: CD4 cell count
Survival time: time to death or censoring
47. Applications – HIV Data
Figure 8: True trajectories vs estimated trajectories for 4 randomly selected patients
by using Gaussian joint model with slice sampler (MSE=0.08) and log-gamma
joint model (MSE=0.15).
48. Applications – HIV Data
Figure 9: Estimated trajectories and estimated hazard rate for each cluster by using
Gaussian joint model with slice sampler (38 clusters) and log-gamma joint
model (7 clusters)
49. Applications – HIV Data
Table 7: Parameter estimates for the HIV data using the Gaussian joint model with slice
sampler and the log-gamma joint model.
            Gaussian Joint Model                Log-gamma Joint Model
Parameter   Estimate  95% Credible Interval     Estimate  95% Credible Interval
β01         3.27      (2.69, 3.75)              3.02      (2.86, 3.76)
β11         −0.03     (−0.09, 0.03)             −0.04     (−0.05, −0.01)
Var(error)  0.13      (0.12, 0.15)              0.10      (0.09, 0.16)
γ           −0.64     (−0.66, −0.61)            −0.91     (−0.96, −0.87)
Drug        −0.50     (−0.80, −0.21)            −0.41     (−0.68, −0.13)
Gender      −1.03     (−1.46, −0.58)            −0.78     (−1.11, −0.43)
PrevOI      0.07      (−0.41, 0.51)             0.22      (−0.13, 0.57)
AZT         −0.69     (−1.11, −0.33)            −0.43     (−0.72, −0.16)
λ1          0.26      (0.11, 0.54)              0.37      (0.21, 0.59)
λ2          0.50      (0.24, 0.95)              0.55      (0.35, 0.81)
λ3          0.65      (0.32, 1.21)              0.70      (0.47, 1.03)
λ4          0.61      (0.28, 1.15)              0.69      (0.43, 1.04)
λ5          5.05      (1.61, 10.43)             1.39      (0.75, 2.25)
50. Applications – PSA Data
Longitudinal marker: PSA readings
Survival time: time to diagnosis of prostate cancer or censoring
51. Applications – PSA Data
Table 6.2: Parameter estimates for the PSA data using the Gaussian joint model with slice sampler
and the log-gamma joint model.
            Gaussian Joint Model                Log-gamma Joint Model
Parameter   Estimate  95% Credible Interval     Estimate  95% Credible Interval
β01         0.96      (0.64, 1.27)              1.11      (0.81, 1.36)
β11         0.07      (−0.002, 0.15)            0.04      (0.01, 0.06)
Var(error)  0.05      (0.04, 0.05)              0.04      (0.04, 0.05)
γ           1.17      (1.13, 1.17)              0.81      (0.53, 1.07)
Figure 6.13 shows the predicted PSA trajectories, predicted hazard rate and predicted survival
probability for each cluster. Different colors represent different clusters. In the bottom left panel
we truncate the trajectory of the cluster labeled in yellow at the largest observed time of PSA
readings. We do this because a high percentage of individuals in this cluster were diagnosed with
prostate cancer early, and hence we feel less confident in making estimates after this time point.
52. Applications – PSA Data
Figure 10: True trajectories vs estimated trajectories for 4 randomly selected
subjects by using Gaussian joint model with slice sampler (MSE=1.70) and
log-gamma joint model (MSE=4.94).
53. Applications – PSA Data
Figure 11: Estimated trajectories and estimated hazard rate for each cluster by
using Gaussian joint model with Metropolis-Hastings (56 clusters) and log-gamma
joint model (7 clusters).
54. Summary
Jointly modeling associated longitudinal and survival time
outcomes improves precision
Dirichlet process prior for longitudinal trajectory parameters
accommodates heterogeneity and enables discovery of subgroup
structure
Common population parameter γ associating marker trajectory
and survival time hazard
Log-gamma formulation increases flexibility, provides conjugate
structure
Sampling still requires some care, but more efficient wrt ESS
55. References
Bradley, J. R., Holan, S. H., and Wikle, C. K. (2018). Computationally efficient
multivariate spatio-temporal models for high-dimensional count-valued data (with
discussion). Bayesian Analysis, 13(1):253–310.
Brown, E. R. and Ibrahim, J. G. (2003). A Bayesian semiparametric joint hierarchical
model for longitudinal and survival data. Biometrics, 59(2):221–228.
Goldman, A. I., Carlin, B. P., Crane, L. R., Launer, C., Korvick, J. A., Deyton, L., and
Abrams, D. I. (1996). Response of CD4 lymphocytes and clinical consequences of
treatment using ddI or ddC in patients with advanced HIV infection. JAIDS Journal of
Acquired Immune Deficiency Syndromes, 11(2):161–169.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1):97–109.
Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models.
Journal of Computational and Graphical Statistics, 9(2):249–265.
Neal, R. M. (2003). Slice sampling. The Annals of Statistics, 31(3):705–741.
Van Dyk, D. A. and Park, T. (2008). Partially collapsed Gibbs samplers: Theory and
methods. Journal of the American Statistical Association, 103(482):790–796.
Wang, Y. and Taylor, J. M. G. (2001). Jointly modeling longitudinal and event time data
with application to acquired immunodeficiency syndrome. Journal of the American
Statistical Association, 96(455):895–905.