Uncertainty quantification for
Bayesian survival analysis
Ismaël Castillo (1), Stéphanie van der Pas (2)
(1) Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université
(2) Mathematical Institute, Leiden University
BFF Conference, April 29, 2019
Survival model
$T$: event time; $C$: censoring time.
$Y = \min\{T, C\}$; $\delta = 1\{T \le C\}$.
Independent right censoring.
We observe $n$ independent pairs $(Y_1, \delta_1), \dots, (Y_n, \delta_n)$.
Survival function: $S(t) = P(T > t)$.
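As a minimal sketch of the observation scheme above, the following simulates right-censored pairs. The exponential distributions, the rates, and the name `simulate_censored` are illustrative assumptions, not taken from the slides.

```python
import random

def simulate_censored(n, event_rate=1.0, censor_rate=0.5, seed=0):
    """Draw n pairs (Y_i, delta_i) with Y = min(T, C) and delta = 1{T <= C}.

    Exponential event and censoring times are an illustrative assumption.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # event time T
        c = rng.expovariate(censor_rate)  # censoring time C, drawn independently of T
        pairs.append((min(t, c), 1 if t <= c else 0))
    return pairs

data = simulate_censored(1000)
uncensored_fraction = sum(delta for _, delta in data) / len(data)
```

With these rates the probability of observing the event is 1.0 / (1.0 + 0.5), so roughly two thirds of the pairs should have delta equal to 1.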
Survival objects
Hazard function: $\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$.
Cumulative hazard: $\Lambda(t) = \int_0^t \lambda(u)\,du$.
Survival: $S(t) = e^{-\Lambda(t)}$.
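The relation between hazard, cumulative hazard and survival can be checked numerically. The sketch below assumes a Weibull hazard with unit scale (a choice made here for illustration only), integrates it with the midpoint rule, and exponentiates.

```python
import math

def hazard(t, shape=2.0):
    # Weibull hazard with unit scale: an illustrative choice, not from the slides.
    return shape * t ** (shape - 1.0)

def cumulative_hazard(t, shape=2.0, steps=10_000):
    # Midpoint rule for the integral of the hazard over [0, t].
    h = t / steps
    return sum(hazard((i + 0.5) * h, shape) for i in range(steps)) * h

def survival(t, shape=2.0):
    # S(t) = exp(-Lambda(t))
    return math.exp(-cumulative_hazard(t, shape))
```

For this hazard the cumulative hazard is t ** shape in closed form, so `survival(1.5)` should agree with `math.exp(-1.5 ** 2)` up to numerical error.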
Goal
[Figure: two panels, "Pointwise intervals" and "Credible band"]
The piecewise exponential model
[Figure; axis label: Hazard]
E.g. Gamerman (1991, 1994), Ibrahim, Chen and Sinha
(2013), Kalbfleisch (1978), Arjas and Gasbarra (1994),
Nieto-Barajas and Walker (2002).
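In the piecewise exponential model the hazard is constant on each interval of a partition, so the cumulative hazard is piecewise linear and the survival function follows in closed form. A minimal sketch, assuming equal-width intervals on [0, 1] and hypothetical hazard levels:

```python
import math

def piecewise_cumulative_hazard(t, heights):
    """Cumulative hazard at t in [0, 1] when the hazard equals heights[k]
    on the k-th of len(heights) equal-width intervals (zero-based k)."""
    width = 1.0 / len(heights)
    total = 0.0
    for k, height in enumerate(heights):
        left = k * width
        if t <= left:
            break
        # Add the hazard level times the part of this interval that lies below t.
        total += height * (min(t, left + width) - left)
    return total

def piecewise_survival(t, heights):
    return math.exp(-piecewise_cumulative_hazard(t, heights))

heights = [0.5, 1.0, 2.0, 1.5]  # hypothetical hazard levels
```

For example, at t = 0.375 the cumulative hazard is 0.25 * 0.5 + 0.125 * 1.0 = 0.25.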
Bayesian survival analysis
Kim and Lee (2004), Kim (2006), Bernstein-von Mises (BvM) theorems for survival
objects using neutral to the right processes.
De Blasi, Peccati, Prünster (2009), CLTs for linear and
quadratic functionals of the hazard, using kernel mixtures
with respect to a completely random measure.
De Blasi and Hjort (2009), BvM in competing risks setting,
using a beta process prior.
Castillo (2012), semiparametric BvM for the Cox model,
using a Riemann-Liouville type process.
Donnet, Rivoirard, Rousseau, Scricciolo (2017),
$L^1$-concentration for priors where the normalized hazard
and its integral are specified independently.
Assumptions
[Figure: three panels, "Hazard function", "Cumulative hazard" and "Survival"]
There exists $\tau > 0$ such that $S_0(\tau) > 0$ and $P_{\lambda_0}(C \ge \tau) > 0$. We take $\tau = 1$.
There exists $\rho > 0$ such that $\lambda_0(t) \ge \rho$ for all $t \in [0, \tau]$.
$C$ admits a continuous density $g$ with respect to the Lebesgue measure on $[0, \tau)$.
There exists $\rho > 0$ such that $g(t) \ge \rho$ for all $t \in [0, \tau)$.
Our approach
Our setting is non-conjugate and the techniques potentially
apply to many priors.
Key result: establish convergence at the minimax rate in
$\|\cdot\|_\infty$-norm.
Using tools from Castillo (2012), Castillo (2014), Castillo &
Rousseau (2015).
Next, obtain a nonparametric BvM in an appropriately weighted
multiscale space, and use continuous mapping and the
functional delta method.
As in Castillo & Nickl (2014), Gill (1989).
Haar wavelets
[Figure: plots of the Haar wavelets $\psi_{00}$; $\psi_{10}$, $\psi_{11}$; $\psi_{20}$, $\psi_{21}$, $\psi_{22}$, $\psi_{23}$]
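For reference, a minimal sketch of the Haar wavelets on [0, 1), assuming the usual convention $\psi_{lk}(x) = 2^{l/2}\psi(2^l x - k)$ with mother wavelet equal to +1 on [0, 1/2) and -1 on [1/2, 1). The sign convention and the helper `inner` are assumptions made here; the slides only show the plots.

```python
def haar(l, k, x):
    """Haar wavelet psi_lk(x) = 2^(l/2) psi(2^l x - k) for x in [0, 1)."""
    u = (2 ** l) * x - k
    if 0.0 <= u < 0.5:
        return 2 ** (l / 2)
    if 0.5 <= u < 1.0:
        return -(2 ** (l / 2))
    return 0.0

def inner(f, g, steps=4096):
    # Midpoint rule for the L2([0, 1]) inner product of f and g.
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps)) * h

norm_sq = inner(lambda x: haar(1, 0, x), lambda x: haar(1, 0, x))
cross_same_level = inner(lambda x: haar(1, 0, x), lambda x: haar(1, 1, x))
cross_levels = inner(lambda x: haar(0, 0, x), lambda x: haar(2, 1, x))
```

The three quantities illustrate orthonormality: the squared norm is 1, and wavelets with different indices are orthogonal.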
Smoothness
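For $\beta > 0$, $M > 0$ and a set $\psi = \{\psi_{lk}\}$:
\[
C^{\psi}_{\beta,M} = \Big\{ \lambda \in C([0, 1]) : \sup_{l \ge 0} \max_{0 \le k < 2^l} 2^{l(\beta+1/2)} |\langle \psi_{lk}, \lambda \rangle_2| \le M \Big\}.
\]
For the standard Hölder class $H(\beta, M)$ and any $B > 0$:
$H(\beta, M) \subset C^{\psi}_{\beta,M}$ for $\beta \le 1$ when $\{\psi_{lk}\}$ are the Haar wavelets.
$H(\beta, M) \subset C^{\psi}_{\beta,M}$ for all $\beta \le B$ if $\{\psi_{lk}\}$ is smooth enough.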
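The weighted supremum in the definition above can be approximated numerically for a given function. The sketch below assumes the usual Haar convention (+1 on the first half of the support, -1 on the second) and only scans levels up to `max_level`, so it gives a lower bound on the true supremum. The function names are hypothetical.

```python
def integral(f, a, b, steps=200):
    # Midpoint rule for the integral of f over [a, b].
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def haar_coefficient(f, l, k):
    """<psi_lk, f>_2 for the Haar wavelet supported on [k 2^-l, (k+1) 2^-l)."""
    width = 2.0 ** -l
    left = k * width
    mid = left + width / 2
    # psi_lk equals +2^(l/2) on the first half and -2^(l/2) on the second half.
    return 2 ** (l / 2) * (integral(f, left, mid) - integral(f, mid, left + width))

def weighted_sup(f, beta, max_level):
    """Maximum over l <= max_level, 0 <= k < 2^l of 2^(l(beta+1/2)) |<psi_lk, f>_2|."""
    return max(
        2 ** (l * (beta + 0.5)) * abs(haar_coefficient(f, l, k))
        for l in range(max_level + 1)
        for k in range(2 ** l)
    )

value = weighted_sup(lambda x: x, beta=1.0, max_level=6)
```

For the Lipschitz function f(x) = x, every Haar coefficient equals -2^(-3l/2) / 4, so the weighted quantity with beta = 1 is 1/4 at every level and the function belongs to the class for any M of at least 1/4.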
Haar-histogram priors
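Defined on $r = \log \lambda$, with given cut-off $L_n$ such that for some
$\delta > 0$, $2^{L_n} \lesssim n^{\frac{1}{1+2\delta}}$:
\[
r = \sum_{l=-1}^{L_n} \sum_{k=0}^{2^l - 1} Z_{lk} \psi_{lk}, \qquad Z_{lk} \sim N(0, \sigma_l^2).
\]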
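A draw from a Haar-histogram prior can be sketched as follows. Several choices here are assumptions made for illustration: the standard deviations $\sigma_l = 2^{-l(1/2+\alpha)}$ (the form used in the rate theorem later in the talk), a constant function with unit-variance coefficient for the $l = -1$ term, and the usual Haar sign convention.

```python
import math
import random

def haar(l, k, x):
    """Haar wavelet psi_lk(x) = 2^(l/2) psi(2^l x - k), psi = +1 then -1."""
    u = (2 ** l) * x - k
    if 0.0 <= u < 0.5:
        return 2 ** (l / 2)
    if 0.5 <= u < 1.0:
        return -(2 ** (l / 2))
    return 0.0

def sample_log_hazard(max_level, alpha, rng):
    """Draw r = sum over (l, k) of Z_lk psi_lk with independent Gaussian Z_lk."""
    constant = rng.gauss(0.0, 1.0)  # l = -1 term; unit variance is an assumption
    coefficients = {}
    for l in range(max_level + 1):
        sigma_l = 2 ** (-l * (0.5 + alpha))
        for k in range(2 ** l):
            coefficients[(l, k)] = rng.gauss(0.0, sigma_l)

    def r(x):
        return constant + sum(z * haar(l, k, x) for (l, k), z in coefficients.items())

    return r

rng = random.Random(1)
log_hazard = sample_log_hazard(max_level=3, alpha=1.0, rng=rng)
# The draw is constant on dyadic intervals of width 2^-(max_level+1) = 1/16.
hazard_draw = [math.exp(log_hazard((i + 0.5) / 16)) for i in range(16)]
```

Exponentiating the draw gives a positive histogram on the hazard scale, which is why the prior is placed on $r = \log \lambda$.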
[Figure: three panels titled "Prior on log of hazard" and three panels titled "Prior on hazard"]
Dyadic histogram priors
With $I_k^J = [(k-1)2^{-J}, k2^{-J})$, $k = 1, \dots, 2^J$:
\[
r = \sum_{k=1}^{2^J} Z_k 1_{I_k^J}.
\]
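A dyadic histogram draw is simply a vector of $2^J$ heights, one per interval. The slide does not specify the distribution of the heights $Z_k$; the sketch below assumes independent centred Gaussians purely for illustration.

```python
import math
import random

def sample_dyadic_log_hazard(J, rng, sd=1.0):
    """Draw heights Z_1, ..., Z_{2^J} and return them with the step function r.

    Independent N(0, sd^2) heights are an assumption made for this sketch.
    """
    bins = 2 ** J
    heights = [rng.gauss(0.0, sd) for _ in range(bins)]

    def r(x):
        # Zero-based position of the dyadic interval of width 2^-J containing x.
        index = min(int(x * bins), bins - 1)
        return heights[index]

    return heights, r

heights, log_hazard = sample_dyadic_log_hazard(J=3, rng=random.Random(0))
hazard_values = [math.exp(z) for z in heights]
```

Note that the code indexes intervals from 0, whereas the slide indexes them from 1.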
[Figure: three panels titled "Prior on log of hazard" and three panels titled "Prior on hazard"]
Autoregressive histogram priors
Building on the autoregressive idea in Arjas and Gasbarra
(1994), we construct dependent histograms such that, with $\lambda_k$
the value on interval $I_k^J$ on the hazard scale:
$E[\lambda_k \mid \lambda_{k-1}, \dots, \lambda_1] = \lambda_{k-1}$.
$\mathrm{Var}(\lambda_k \mid \lambda_{k-1}, \dots, \lambda_1) = \sigma^2 (\lambda_{k-1})^2$.
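One way to realise these two conditional moments is a gamma distribution with shape $1/\sigma^2$ and scale $\sigma^2 \lambda_{k-1}$, which has mean $\lambda_{k-1}$ and variance $\sigma^2 \lambda_{k-1}^2$. The gamma family and the fixed starting value are assumptions of this sketch; the slide states only the moments.

```python
import random

def sample_autoregressive_hazard(num_intervals, sigma, rng, first_value=1.0):
    """Draw lambda_1, ..., lambda_K with E[lambda_k | past] = lambda_{k-1}
    and Var(lambda_k | past) = sigma^2 * lambda_{k-1}^2.

    Gamma conditionals and a fixed first value are illustrative assumptions.
    """
    shape = 1.0 / sigma ** 2
    values = [first_value]
    for _ in range(num_intervals - 1):
        scale = sigma ** 2 * values[-1]
        # random.gammavariate(shape, scale) has mean shape * scale.
        values.append(rng.gammavariate(shape, scale))
    return values

rng = random.Random(0)
draw = sample_autoregressive_hazard(num_intervals=8, sigma=0.5, rng=rng)
```

Because each height is centred at the previous one, neighbouring intervals have similar hazard levels, which smooths the histogram compared with independent heights.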
[Figure: three panels titled "Prior on hazard"]
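$\|\cdot\|_\infty$-rate result for Haar-histogram priors
Theorem. Let $X^{(n)} = ((Y_1, \delta_1), \dots, (Y_n, \delta_n))$ be the observations.
Suppose $\lambda_0 \in C^{\psi}_{\beta,M}$ with $\beta > 1/2$, some $M > 0$, and $\psi$ the Haar
wavelets. Set $\alpha = \min\{\beta, 1\}$. Let the prior on the hazard be the
Gaussian Haar-histogram:
\[
r = \sum_{l=-1}^{L_n} \sum_{k=0}^{2^l - 1} Z_{lk} \psi_{lk}, \qquad Z_{lk} \sim N(0, \sigma_l^2),
\]
where $\sigma_l = 2^{-l(1/2+\alpha)}$ and $L_n = \log_2\{(n/\log n)^{1/(2\alpha+1)}\}$.
Then, with $\varepsilon^*_{n,\alpha} = \left(\frac{\log n}{n}\right)^{\alpha/(2\alpha+1)}$, there exists $M > 0$ such that:
\[
E^{n}_{\lambda_0} \int \|\lambda - \lambda_0\|_\infty \, d\Pi(\lambda \mid X^{(n)}) \le M \varepsilon^*_{n,\alpha}.
\]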
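To get a feel for the quantities in the theorem, the sketch below evaluates the cut-off and the rate for a given sample size and smoothness. Rounding the cut-off down to an integer is an assumption made here (the formula as it appears on the slide shows no rounding), and the name `haar_prior_tuning` is hypothetical.

```python
import math

def haar_prior_tuning(n, beta):
    """Return (alpha, cut-off L_n, rate eps*_{n,alpha}) for sample size n."""
    alpha = min(beta, 1.0)
    ratio = n / math.log(n)
    # Rounding down to an integer level is an assumption of this sketch.
    cutoff = math.floor(math.log2(ratio ** (1.0 / (2 * alpha + 1))))
    rate = (math.log(n) / n) ** (alpha / (2 * alpha + 1))
    return alpha, cutoff, rate

alpha, cutoff, rate = haar_prior_tuning(n=10_000, beta=2.0)
```

For n = 10,000 and beta = 2 this gives alpha = 1, a cut-off of 3 (so 16 histogram bins), and a rate just under 0.1. Smoothness beyond 1 does not improve the rate, since Haar wavelets only capture regularity up to 1.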
Proof idea
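\[
\|\lambda - \lambda_0\|_\infty \le \|\lambda - \lambda^*_{L_n}\|_\infty + \|\lambda^*_{L_n} - \lambda_{0,L_n}\|_\infty + \|\lambda_{0,L_n} - \lambda_0\|_\infty,
\]
where
$\lambda^*_{L_n}$ is a 'frequentist centering';
$\lambda_{0,L_n}$ is the $L^2$-projection of $\lambda_0$ on $\mathrm{Vect}\{\psi_{lk}\}$.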
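The last term of the decomposition is a pure approximation error. Assuming the span consists of the constant function together with the Haar wavelets up to level L, the $L^2$-projection is the histogram of bin averages on the $2^{L+1}$ dyadic intervals, so the bias can be computed directly. The function names and the example hazard are hypothetical.

```python
def haar_projection(f, level, steps_per_bin=200):
    """Averages of f over the 2^(level+1) dyadic intervals of [0, 1)."""
    bins = 2 ** (level + 1)
    width = 1.0 / bins
    h = width / steps_per_bin
    return [
        # Midpoint rule for the integral over the bin, divided by the bin width.
        sum(f(k * width + (i + 0.5) * h) for i in range(steps_per_bin)) * h / width
        for k in range(bins)
    ]

def sup_bias(f, level, grid=4096):
    """Maximum over a grid of |f(x) - projection(x)|."""
    averages = haar_projection(f, level)
    bins = len(averages)
    return max(
        abs(f(x) - averages[min(int(x * bins), bins - 1)])
        for x in (j / grid for j in range(grid))
    )

bias_level_2 = sup_bias(lambda x: 1.0 + x, level=2)
bias_level_3 = sup_bias(lambda x: 1.0 + x, level=3)
```

For the Lipschitz hazard 1 + x the bias is half a bin width, so it halves each time the level increases by one.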
Maximal inequality
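\[
\|\lambda - \lambda^*_{L_n}\|_\infty \le \frac{1}{\sqrt{n}} \sum_{l=0}^{L_n} 2^{l/2} \sqrt{n} \max_{0 \le k < 2^l} |\langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2|.
\]
For any $t > 0$:
\[
E_{\lambda_0} E^{\Pi_n} \Big[ \max_{0 \le k < 2^l} t \sqrt{n}\, |\langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2| \Big]
\le \log \Big[ \sum_{k=0}^{2^l - 1} E_{\lambda_0} E^{\Pi_n} \Big( e^{t \sqrt{n} \langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2} + e^{-t \sqrt{n} \langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2} \Big) \Big].
\]
On sets $A_n = \{\lambda : \|\lambda - \lambda_0\|_1 \le \varepsilon_n\}$, there exists a $C > 0$
independent of $l, k$ such that:
\[
E\Big[ e^{t \sqrt{n} \langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2} \,\Big|\, X^{(n)}, A_n \Big] \le e^{\frac{t^2}{2} C} (1 + o_p(1)).
\]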
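The logarithmic bound for the maximum follows from a standard argument combining Jensen's inequality with a union-type bound. The slides state only the result, so the steps below are a reconstruction; $X_k$ is shorthand introduced here, and $E$ stands for the iterated expectation $E_{\lambda_0} E^{\Pi_n}$.

```latex
% X_k is shorthand introduced here for \sqrt{n}\,\langle \lambda - \lambda^*_{L_n}, \psi_{lk} \rangle_2.
\begin{align*}
E\Big[\max_{0 \le k < 2^l} t |X_k|\Big]
  &= E\Big[\log \exp\Big(\max_{k} t |X_k|\Big)\Big] \\
  &\le \log E\Big[\max_{k} e^{t |X_k|}\Big]
     && \text{(Jensen, since $\log$ is concave)} \\
  &\le \log \sum_{k=0}^{2^l - 1} E\big[e^{t |X_k|}\big]
     && \text{(a maximum of nonnegative terms is at most their sum)} \\
  &\le \log \sum_{k=0}^{2^l - 1} E\big[e^{t X_k} + e^{-t X_k}\big]
     && \text{(since $e^{|x|} \le e^{x} + e^{-x}$)}.
\end{align*}
```

As a rough indication of how this is used: if each exponential moment were bounded by $e^{C t^2 / 2}$ (ignoring the conditioning on $A_n$ and the $1 + o_p(1)$ factor), the right-hand side would be at most $(l+1)\log 2 + C t^2 / 2$. Dividing by $t$ and choosing $t = \sqrt{2 (l+1) \log 2 / C}$ would then give $E[\max_k |X_k|] \le \sqrt{2 C (l+1) \log 2}$, so the maximum over the $2^l$ coefficients at level $l$ grows only like $\sqrt{l}$.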
References
Arjas E and Gasbarra D (1994). Nonparametric Bayesian inference from right censored survival data, using the
Gibbs sampler. Statistica Sinica, 505–24.
Castillo I (2012). A semiparametric Bernstein-von Mises theorem for Gaussian process priors. Probability Theory
and Related Fields 152, 53–99.
Castillo I (2014). On Bayesian supremum norm contraction rates. The Annals of Statistics 42, 2058–91.
Castillo I and Nickl R (2014). On the Bernstein-von Mises phenomenon for nonparametric Bayes procedures.
The Annals of Statistics 42, 1941–69.
Castillo I and Rousseau J (2015). A Bernstein-von Mises theorem for smooth functionals in semiparametric
models. The Annals of Statistics 43, 2353–83.
De Blasi P and Hjort N (2009). The Bernstein-von Mises theorem in semiparametric competing risks models.
Journal of Statistical Planning and Inference 139, 2316–28.
De Blasi P, Peccati G and Prünster I (2009). Asymptotics for posterior hazards. The Annals of Statistics 37,
1906–45.
Donnet S, Rivoirard V, Rousseau J and Scricciolo C (2017). Posterior concentration rates for counting processes
with Aalen multiplicative intensities. Bayesian Analysis 12, 53–87.
Gamerman D (1991). Dynamic Bayesian models for survival data. Applied Statistics, 63–79.
Gamerman D (1994). Bayes estimation of the piece-wise exponential distribution. IEEE Transactions on
Reliability 43, 128–31.
Gill R (1989). Non- and semi-parametric maximum likelihood estimators and the Von Mises method, Part 1.
Scandinavian Journal of Statistics 16, 97–128.
Ibrahim JG, Chen MH, Sinha D (2013). Bayesian survival analysis. Springer Science & Business Media.
Kalbfleisch JD (1978). Non-parametric Bayesian analysis of survival time data. Journal of the Royal Statistical
Society. Series B (Methodological), 214–21.
Kim Y (2006). The Bernstein-von Mises theorem for the proportional hazard model. The Annals of Statistics 34,
1678–1700.
Kim Y and Lee J (2004). A Bernstein-von Mises theorem in the nonparametric right-censoring model. The
Annals of Statistics 32, 1492–1512.
Nieto-Barajas LE and Walker SG (2002). Markov beta and gamma processes for modelling hazard rates.
Scandinavian Journal of Statistics 29, 413–24.

MUMS: Bayesian, Fiducial, and Frequentist Conference - Uncertainty Quantification for Bayesian Survival Analysis, Stéphanie van der Pas, April 29, 2019
