Survival analysis 1

Regression with Frailty in Survival analysis
STATS 756: Topics in Biostatistics - Fall 2021
Lim, Kyuson
Department of Mathematics and Statistics
McMaster University
October 29th, 2021
Kyuson Lim 1 / 25

Outline
1 Introduction/Motivation
2 Partial Likelihood
3 Penalized Partial Likelihood (PPL)
4 Newton-Raphson Method
5 Simulation Study
6 Expansion/Appendix
7 Bibliography

Survival analysis - How to survive?
The Cox proportional-hazards model (Cox, 1972) is a regression model commonly used in medical research to investigate the association between the survival time of patients and one or more predictor variables.
Figure: A graph of survival analysis
The empirical hazard function is a step function, evaluated at each observed event time.
The survival probability S(t) is the probability that an individual survives from the time origin to a specified future time t.
The hazard h(t) is the instantaneous rate at which an individual who is under observation at time t has an event at that time.

Survival analysis - Introduction
For a continuous survival time T, the survival function decreases from S(0) = 1 toward 0 as t goes from 0 to ∞, and the distribution function is F(t) = P(T ≤ t).
Hence,
$$S(t) = P(T > t) = \int_t^\infty f(u)\,du = 1 - F(t).$$
For survival time T, the hazard function is h(t) = f(t)/S(t); taking the instantaneous limit (Δt → 0) describes the event rate just after time t:
$$h(t) = -\frac{d \log S(t)}{dt} = \frac{f(t)}{1 - F(t)}, \qquad H(t) = \int_0^t h(u)\,du,$$
where H(t) is the cumulative hazard.
Therefore,
$$H(t) = -\log S(t) \quad \text{and} \quad S(t) = \exp(-H(t)).$$
If one of S(t), h(t) and H(t) is known, the other two are easily determined.
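As a quick numerical check of these identities, here is a sketch in R, the language used for the code later in these slides. The Weibull distribution with shape k = 1.5 and scale 10 is an assumption made only because its S(t), h(t) and H(t) have closed forms; dweibull, pweibull and integrate are base R functions.

```r
# Assumed example distribution: Weibull(shape = k, scale = lambda)
k <- 1.5
lambda <- 10
t <- c(1, 5, 10, 20)

f <- dweibull(t, shape = k, scale = lambda)                      # density f(t)
S <- pweibull(t, shape = k, scale = lambda, lower.tail = FALSE)  # S(t) = 1 - F(t)
h <- f / S                                                       # h(t) = f(t) / S(t)
H <- -log(S)                                                     # H(t) = -log S(t)

# Weibull closed forms: h(t) = (k/lambda)(t/lambda)^(k-1), H(t) = (t/lambda)^k
h_closed <- (k / lambda) * (t / lambda)^(k - 1)
H_closed <- (t / lambda)^k

# H(10) as the integral of h(u) over (0, 10); the closed form gives (10/10)^1.5 = 1
H_int <- integrate(function(u) (k / lambda) * (u / lambda)^(k - 1),
                   lower = 0, upper = 10, rel.tol = 1e-8)$value

print(cbind(t, S, h, h_closed, H, H_closed))
```

Any other parametric family with a known density would serve equally well; the point is that h, H and S computed from one another agree.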

Survival analysis - Regression with frailty in survival analysis
Infections of kidney patients
Figure: A graph of survival analysis for the kidney data
The survival function is the probability of surviving beyond time t, S(t) = P(T > t).
The hazard function is the instantaneous failure rate, given that the subject has survived up to time t and fails in the next small interval of time:
$$h(t) = \lim_{\delta \to 0} \frac{P(t < T < t + \delta \mid T > t)}{\delta}.$$
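The limit can be illustrated by shrinking δ. A sketch in R, assuming an exponential survival time with rate 0.2 (chosen because its hazard is the constant 0.2); the helper hazard_approx is hypothetical and simply evaluates the ratio inside the limit.

```r
# Assumed example: exponential survival time with rate 0.2, so h(t) = 0.2 for all t
rate <- 0.2
S <- function(t) pexp(t, rate = rate, lower.tail = FALSE)

# P(t < T < t + delta | T > t) / delta = (S(t) - S(t + delta)) / (S(t) * delta)
hazard_approx <- function(t, delta) (S(t) - S(t + delta)) / (S(t) * delta)

deltas <- c(1, 0.1, 0.001)
approx <- sapply(deltas, function(d) hazard_approx(t = 3, delta = d))
print(data.frame(delta = deltas, approx_hazard = approx))  # approaches 0.2 as delta shrinks
```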

Survival analysis - Goal and Ideas
The Cox proportional hazards model analyzes several influential covariates simultaneously, where h_0(t) is the non-parametric part and exp(β'x_i) is the parametric part.
It fits a regression model to censored survival data, and the partial likelihood allows us to compare two groups of survival data.
The estimates of β maximize the log partial likelihood and are computed with the Newton-Raphson algorithm.

Cox proportional hazard model
The standard model assumes a single endpoint (a single cause of death) and independent survival times across cases.
Methods for analyzing such survival data are not sufficient if cases are not independent or if the event can occur repeatedly.
Proportional hazards model: $h(t \mid x_i) = h_0(t)\exp(\beta' x_i(t))$, $\beta' = (\beta_1, \dots, \beta_p)$
Covariates, and later cluster-specific random effect terms, act relatively (multiplicatively) on the baseline hazard function h_0(t), which reflects the underlying hazard for subjects with all covariates x_1, ..., x_p equal to 0.
For a single binary covariate (x_i = 1 for the treated group, x_i = 0 for the control group), the hazard rate for the treated group is
$$h_1(t) = h(t \mid x_i = 1) = h_0(t)\exp(\beta).$$
The ratio of the two hazards is the constant exp(β), which does not depend on time t, so the hazards of the two groups remain proportional over time:
$$\frac{h_1(t)}{h_0(t)} = \exp(\beta)$$
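A small sketch in R of the proportionality property: whatever the shape of h_0(t), the ratio stays at exp(β). The Weibull baseline hazard and β = log(2) are hypothetical choices for illustration only.

```r
# Assumed (hypothetical) Weibull baseline hazard and beta = log(2)
h0 <- function(t, shape = 1.5, scale = 10) (shape / scale) * (t / scale)^(shape - 1)
beta <- log(2)

t <- c(0.5, 2, 8, 30)
h_control <- h0(t)               # x = 0: h0(t)
h_treated <- h0(t) * exp(beta)   # x = 1: h0(t) exp(beta)
ratio <- h_treated / h_control   # equals exp(beta) = 2 at every t

print(data.frame(t, h_control, h_treated, ratio))
```

Both hazards increase with t, but their ratio does not; this is exactly the assumption that cox.zph checks later in the slides.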

Parametric Cox model with frailty term
In clustered data, individuals in the same unit or family tend to have similar survival times, so the independence assumption no longer holds.
One way to accommodate this structure is to assign every individual in a cluster a common factor, known as a frailty or random effect.
A random effect is incorporated to capture within-cluster homogeneity in outcomes.¹
Shared frailty model: $h(t \mid x_{ij}) = h_0(t)\exp(\beta' x_{ij}(t) + w_i)$
w_i: random effect for the ith cluster, shared by all individuals in that cluster and varying across clusters.
Subjects in the same cluster all share the same frailty factor.
A frailty model refers to a survival model with only a random intercept.
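In the paper, the log-frailties w_i are assumed to be normally distributed with $E(w_i) = 0$ and $\mathrm{Var}(w) = \sigma^2 (I - M^{-1} \mathbf{1}\mathbf{1}')$.²
¹ The frailty increases or decreases the hazard for a whole cluster (class).
² Here M is the number of clusters and $\mathbf{1}'w = 0$, i.e. the frailties are constrained to sum to zero.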
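To make the shared frailty model concrete, here is a sketch in R that simulates clustered data from h(t | x_ij) = h_0(t) exp(β x_ij + w_i). All settings are assumptions for illustration, not taken from the paper: a constant baseline hazard h_0(t) = λ (so each conditional survival time is exponential), one binary covariate, normal log-frailties and uniform censoring. simulate_shared_frailty is a hypothetical helper.

```r
simulate_shared_frailty <- function(n_clusters = 50, cluster_size = 2,
                                    beta = -1, sigma2 = 0.5, lambda = 0.1,
                                    max_censor = 30) {
  id  <- rep(seq_len(n_clusters), each = cluster_size)
  w   <- rnorm(n_clusters, mean = 0, sd = sqrt(sigma2))  # one log-frailty per cluster
  x   <- rbinom(length(id), size = 1, prob = 0.5)         # binary covariate
  eta <- beta * x + w[id]                                 # members of a cluster share w_i
  # With constant h0(t) = lambda, T | x, w is exponential with rate lambda * exp(eta)
  event_time  <- rexp(length(id), rate = lambda * exp(eta))
  censor_time <- runif(length(id), 0, max_censor)
  data.frame(id = id, x = x, w = w[id],
             time = pmin(event_time, censor_time),
             status = as.integer(event_time <= censor_time))
}

set.seed(1)
sim <- simulate_shared_frailty()
head(sim)
```

Data simulated this way could then be fitted with coxph(Surv(time, status) ~ x + frailty(id, dist = "gauss"), data = sim), mirroring the kidney example; the estimates will vary with the seed.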

Partial likelihood
Goal: estimate β without depending on h_0(t), using the ordered death times of r individuals, t_(1) < · · · < t_(r).
Define the risk set R(t_(j)) as the group of individuals who are alive and uncensored at a time just prior to t_(j).
P(individual i dies at t_(j) | one death from the risk set R(t_(j)) at t_(j)) = P(individual i dies at t_(j)) / P(one death at t_(j))
$$\Leftrightarrow \quad \frac{h_i(t_{(j)} \mid x_i)}{\sum_{k \in R(t_{(j)})} h_k(t_{(j)} \mid x_k)} = \frac{h_0(t_{(j)}) \exp(\beta' x_i)}{\sum_{k \in R(t_{(j)})} h_0(t_{(j)}) \exp(\beta' x_k)} = \frac{\exp(\beta' x_i)}{\sum_{k \in R(t_{(j)})} \exp(\beta' x_k)}$$
Partial likelihood:
$$L(\beta) = \prod_{j=1}^{r} \frac{\exp(\beta' x_{(j)})}{\sum_{k \in R(t_{(j)})} \exp(\beta' x_k)},$$
where the product runs over the r death times t_(1), ..., t_(r), and x_(j) is the vector of covariates of the individual who dies at t_(j).
The partial likelihood leaves the baseline survival distribution unspecified while still describing how the survival of subjects depends on their covariates.
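A direct translation of the product above into R, as a sketch for a single covariate and no tied event times (both assumptions); log_partial_lik is a hypothetical helper. The data are the six patients of the worked example in these slides, with x = 1 for patients 3, 5 and 6, as in that example's group assignment exp(x_3β) = exp(x_5β) = exp(x_6β) = exp(β).

```r
# Log partial likelihood for one covariate, assuming no tied event times.
# time: observed times; status: 1 = event, 0 = censored; x: covariate values.
log_partial_lik <- function(beta, time, status, x) {
  eta <- beta * x
  ll <- 0
  for (i in which(status == 1)) {        # one factor per observed death
    at_risk <- time >= time[i]           # risk set R(t_i): still under observation
    ll <- ll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  ll
}

time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)

log_partial_lik(0, time, status, x)          # value at the null beta = 0
log_partial_lik(-1.326129, time, status, x)  # value at the example's estimate
```

Censored patients (2 and 5) never contribute a factor of their own; they only appear in the denominators through the risk sets.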

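Breslow Partial likelihood
Only uncensored individuals contribute factors to the likelihood; censored individuals enter only through the risk sets.
$$L(\beta) = \prod_{i=1}^{n} \left( \frac{\exp(\beta' x_i)}{\sum_{k \in R(t_i)} \exp(\beta' x_k)} \right)^{d_i}$$
t_1, ..., t_n: observed survival times for n individuals.
d_i: event indicator, 1 if the ith survival time is an observed event and 0 if it is censored.
$$\begin{aligned} \log L(\beta) &= \sum_{i=1}^{n} d_i \log\left( \frac{\exp(\beta' x_i)}{\sum_{k \in R(t_i)} \exp(\beta' x_k)} \right) \\ &= \sum_{i=1}^{n} d_i \log\big(\exp(\beta' x_i)\big) - \sum_{i=1}^{n} d_i \log \sum_{k \in R(t_i)} \exp(\beta' x_k) \\ &= \sum_{i=1}^{n} d_i \beta' x_i - \sum_{i=1}^{n} d_i \log \sum_{k \in R(t_i)} \exp(\beta' x_k) \end{aligned}$$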
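This form assumes that no two subjects share the same event time. With tied event times, Breslow's approximation keeps the same expression, with each tied event using the full risk set R(t_i) = {k : t_k ≥ t_i}.
Variation in the hazard rate is attributed to the risk variables or to the frailty terms; hence the frailty is treated as a random component.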
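Differentiating the last line of log L(β) gives the score U(β) = Σ d_i (x_i − x̄_i), where x̄_i is the exp(βx)-weighted mean of x over R(t_i), and the information I(β) = Σ d_i (weighted variance of x over R(t_i)). These are the ingredients Newton-Raphson needs. A sketch in R for one covariate, using the worked example's six patients; cox_score_info is a hypothetical helper, and risk sets use time ≥ t_i so that tied events would share a denominator as in Breslow's approximation.

```r
cox_score_info <- function(beta, time, status, x) {
  U <- 0
  I <- 0
  for (i in which(status == 1)) {
    r     <- time >= time[i]               # risk set R(t_i)
    wt    <- exp(beta * x[r])              # exp(beta * x_k), k in R(t_i)
    xbar  <- sum(wt * x[r]) / sum(wt)      # risk-weighted mean of x
    x2bar <- sum(wt * x[r]^2) / sum(wt)    # risk-weighted mean of x^2
    U <- U + (x[i] - xbar)                 # first derivative of log L
    I <- I + (x2bar - xbar^2)              # minus the second derivative
  }
  c(score = U, information = I)
}

time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)
res <- cox_score_info(0, time, status, x)
res
```

At β = 0 the score is −11/12 and the information 95/144, which a finite-difference derivative of the example's l(β) confirms.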

Example for computing Partial likelihood
At time 0, 6 patients are at risk of experiencing an event; this group of patients forms the initial risk set R_1.
1 Before the first failure at time t = 6, all 6 patients are at risk and any of them could experience the event.
2 By group, exp(x_1β) = exp(x_2β) = exp(x_4β) = 1 (control, x = 0) and exp(x_3β) = exp(x_5β) = exp(x_6β) = exp(β) (treated, x = 1).
Patient  Survtime  Censor  Group
1        6         1       C
2        7         0       C
3        10        1       T
4        15        1       C
5        19        0       T
6        25        1       T
Table 1. Survival data (Censor: 1 = event observed, 0 = censored; Group: C = control, T = treated)
3 Substituting into $p_1 = \frac{h_0(t_1)\exp(x_1\beta)}{\sum_{k \in R_1} h_0(t_1)\exp(x_k\beta)}$, where h_0(t_1) is the hazard for a subject from the control group, yields
$$p_1 = \frac{h_0(t_1)}{3h_0(t_1)\exp(\beta) + 3h_0(t_1)} = \frac{1}{3\exp(\beta) + 3}.$$
4 At time 7 a control patient drops out (censored). At t = 10, $p_2 = \frac{\exp(\beta)}{3\exp(\beta) + 1}$, and at t = 15 three patients are at risk, giving $p_3 = \frac{1}{2\exp(\beta) + 1}$.
5 At the last event, t = 25, only one subject is at risk (so p_4 = 1), and the partial likelihood is the product of all factors:
$$L(\beta) = \frac{\exp(\beta)}{(3\exp(\beta) + 3)(3\exp(\beta) + 1)(2\exp(\beta) + 1)}.$$
6 Taking logs, $l(\beta) = \beta - \log(3\exp(\beta) + 3) - \log(3\exp(\beta) + 1) - \log(2\exp(\beta) + 1)$.

Example for computing Partial likelihood
The partial likelihood is maximized to obtain the estimate of β.
plsimple <- function(beta) {
  psi <- exp(beta)
  # log partial likelihood l(beta) of the example above
  result <- log(psi) - log(3*psi + 3) - log(3*psi + 1) - log(2*psi + 1)
  result
}
# fnscale = -1 makes optim maximize instead of minimize
result <- optim(par = 0, fn = plsimple, method = "L-BFGS-B",
                control = list(fnscale = -1), lower = -3, upper = 1)
result$par
# [1] -1.326129
Figure: Maximum partial likelihood estimate (the code above uses the quasi-Newton L-BFGS-B method; Newton-Raphson reaches the same maximum)
The solid black curve is a plot of the log partial likelihood over a range of values of β.
The maximum is indicated by the vertical dashed blue line, where the log partial likelihood equals -3.672.
At the null hypothesis value β = 0, the log partial likelihood equals -4.277.

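Penalized Partial Likelihood (PPL)
On the log scale, the random effects are treated as a penalty term, following the Best Linear Unbiased Prediction (BLUP) approach for generalized linear models with random effects.
Maximization of the PPL is a doubly iterative process: an inner Newton-Raphson loop on l_part (with the penalty at the current θ) alternates with an outer loop that updates θ in l_penalize, until convergence.
The penalty term lowers the penalized partial likelihood when random effects lie far from their mean value 0, which shrinks them toward 0.
If w ∼ N(0, σ²D), where D is a known matrix, then BLUP consists of maximizing the combination of two log-likelihood terms:
$$l_{PPL}(\theta, \beta, w) = l_{part}(\beta, w) - l_{penalize}(\theta, w)$$
l_part(β, w): the conditional log partial likelihood of the data given the frailties.
l_penalize(θ, w): the term arising from the distribution of the frailties; for w ∼ N(0, σ²D) it equals w'D^{-1}w / (2σ²) up to terms not involving w.
Within the inner loop, Newton-Raphson uses local quadratic approximations of the penalized log-likelihood.
β and w are estimated iteratively, using the derivatives of the log-likelihood and the matrix V of negative second derivatives, whose inverse approximates the variance.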

Newton-Raphson algorithm
The method originates from Taylor's series,
$$f(x) \approx f(x_k) + (x - x_k) f'(x_k) + \frac{1}{2!}(x - x_k)^2 f''(x_k) + \cdots + \frac{1}{n!}(x - x_k)^n f^{(n)}(x_k),$$
and the Newton-Raphson method solves a system of non-linear equations by keeping only the first-order term.
Approximate roots of f(β) = 0:
1 Start with an initial value β^(0) of β.
2 Take the first-order linear approximation of f at β^(0) + h:
$$f(\beta^{(0)} + h) \approx f(\beta^{(0)}) + h f'(\beta^{(0)})$$
3 Set this approximation to zero and solve for h, so that the updated value β^(1) = β^(0) + h approximately satisfies f(β^(1)) = 0:
$$h = -\{f'(\beta^{(0)})\}^{-1} f(\beta^{(0)}), \qquad \beta^{(1)} = \beta^{(0)} - \{f'(\beta^{(0)})\}^{-1} f(\beta^{(0)})$$
4 Iterate until the process converges, β^(k+1) ≈ β^(k).
GLMs (Poisson, logistic) use the same kind of iteration to estimate their coefficients.
The Newton-Raphson procedure converges if the measured risk variables vary sufficiently within each patient.
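Applied to the worked example, Newton-Raphson solves f(β) = U(β) = 0, where U is the derivative of l(β); since f'(β) = −I(β), step 3 becomes β^(k+1) = β^(k) + U(β^(k)) / I(β^(k)). A sketch in R: the analytic score and information below are derived by hand from l(β) = β − log(3e^β + 3) − log(3e^β + 1) − log(2e^β + 1), and newton_raphson is a hypothetical helper.

```r
score <- function(b) {                     # U(beta) = l'(beta)
  p <- exp(b)
  1 - 3*p / (3*p + 3) - 3*p / (3*p + 1) - 2*p / (2*p + 1)
}
information <- function(b) {               # I(beta) = -U'(beta)
  p <- exp(b)
  9*p / (3*p + 3)^2 + 3*p / (3*p + 1)^2 + 2*p / (2*p + 1)^2
}

newton_raphson <- function(beta0, tol = 1e-10, max_iter = 50) {
  beta <- beta0
  for (k in seq_len(max_iter)) {
    step <- score(beta) / information(beta)  # h = -{f'(beta)}^{-1} f(beta)
    beta <- beta + step
    if (abs(step) < tol) break               # beta^(k+1) ~ beta^(k)
  }
  list(estimate = beta, iterations = k, se = 1 / sqrt(information(beta)))
}

fit <- newton_raphson(0)
fit$estimate
```

Starting from β^(0) = 0, the first step already lands near −1.39 and a few more steps settle on the same maximizer that optim found, −1.326; the reciprocal square root of the information gives an approximate standard error.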

Newton-Raphson algorithm - Visual illustration
It produces successively better approximations to the roots of a real-valued function:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
For example, to approximate √612, take f(x) = x² − a with a = 612, so that f'(x) = 2x. Starting from the initial guess x_0 = 10, iterate until successive values differ by less than a small tolerance.
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 10 - \frac{10^2 - 612}{2 \times 10} = 35.6$$
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = 35.6 - \frac{35.6^2 - 612}{2 \times 35.6} = 26.395$$
x_3 = · · · = 24.790
x_4 = · · · = 24.7387
x_5 = · · · = 24.738633753
Figure: Newton-Raphson iterations for the quadratic f(x) = x² − a
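The iterations above can be reproduced with a few lines of R; a sketch in which newton_sqrt is a hypothetical helper and the stopping tolerance 1e-12 is an arbitrary choice.

```r
newton_sqrt <- function(a, x0, tol = 1e-12, max_iter = 100) {
  x <- x0
  path <- x0                                # keep every iterate x_0, x_1, ...
  for (i in seq_len(max_iter)) {
    x_new <- x - (x^2 - a) / (2 * x)        # x_{n+1} = x_n - f(x_n) / f'(x_n)
    path <- c(path, x_new)
    if (abs(x_new - x) < tol) break         # successive values agree
    x <- x_new
  }
  path
}

iterates <- newton_sqrt(612, 10)
print(iterates, digits = 12)
```

The first step overshoots to 35.6 because the starting value is far from the root, after which the number of correct digits roughly doubles at each iteration.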

Newton-Raphson iterative procedure
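Starting from initial estimates β_0 and w_0, the goal is to estimate β iteratively with the PPL.
The log-likelihood is approximately quadratic in the region of the true values.
Maximizing l_part − l_penalize, the joint log-likelihood l_PPL, gives the estimators (derivatives evaluated at β_0, w_0)
$$\begin{pmatrix} \hat\beta \\ \hat w \end{pmatrix} = \begin{pmatrix} \beta_0 \\ w_0 \end{pmatrix} + V^{-1} \begin{pmatrix} \partial l_{part}/\partial \beta \\ \partial l_{part}/\partial w \end{pmatrix} - V^{-1} \begin{pmatrix} 0 \\ \sigma^{-2} w_0 \end{pmatrix}, \qquad V = \begin{pmatrix} -\partial^2 l_{part}/\partial\beta\,\partial\beta' & -\partial^2 l_{part}/\partial\beta\,\partial w' \\ -\partial^2 l_{part}/\partial w\,\partial\beta' & -\partial^2 l_{part}/\partial w\,\partial w' + \sigma^{-2} I \end{pmatrix}$$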
The variance matrix of w is taken to be σ²(I − M^{-1} 1 1'), which corresponds to the constraint 1'w = 0.
(β̂, ŵ) is approximately jointly normal with mean (β, w) and variance matrix V^{-1}.
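The update above can be sketched in R for the inner loop only. Assumptions, none of them from the paper: one covariate, Gaussian frailties with a fixed, known σ² (penalty σ^{-2}I, as in V), no tied event times, and no step-halving safeguard. The outer loop that re-estimates σ², as coxph does, is omitted, and ppl_newton is a hypothetical helper, not a package API.

```r
ppl_newton <- function(time, status, x, cluster, sigma2, tol = 1e-9, max_iter = 100) {
  m <- max(cluster)                                      # clusters labelled 1..m
  Z <- cbind(x, outer(cluster, seq_len(m), "==") * 1)    # columns: beta, w_1..w_m
  theta <- rep(0, ncol(Z))                               # (beta_0, w_0) = 0
  P <- diag(c(0, rep(1 / sigma2, m)))                    # sigma^-2 I on the w block
  for (iter in seq_len(max_iter)) {
    eta <- drop(Z %*% theta)                             # linear predictor beta*x + w
    grad_eta <- status                                   # becomes d l_part / d eta
    A <- matrix(0, length(eta), length(eta))             # -d2 l_part / d eta d eta'
    for (i in which(status == 1)) {
      r <- time >= time[i]                               # risk set R(t_i)
      p <- ifelse(r, exp(eta), 0) / sum(exp(eta[r]))     # risk-set probabilities
      grad_eta <- grad_eta - p
      A <- A + diag(p, nrow = length(p)) - tcrossprod(p)
    }
    score <- drop(crossprod(Z, grad_eta) - P %*% theta)  # penalized score
    V <- crossprod(Z, A %*% Z) + P                       # V from the slide
    step <- drop(solve(V, score))                        # V^{-1} times the score
    theta <- theta + step
    if (max(abs(step)) < tol) break
  }
  list(beta = theta[1], w = theta[-1], V = V, iterations = iter)
}

# Usage on the six-patient example, with a hypothetical pairing into 3 clusters
time    <- c(6, 7, 10, 15, 19, 25)
status  <- c(1, 0, 1, 1, 0, 1)
x       <- c(0, 0, 1, 0, 1, 1)
cluster <- c(1, 1, 2, 2, 3, 3)
fit <- ppl_newton(time, status, x, cluster, sigma2 = 0.5)
fit$beta
fit$w
```

At the solution the frailty part of the penalized score gives Σ w_c = σ² Σ_i ∂l_part/∂η_i = 0, so the fitted frailties sum to zero, matching the constraint 1'w = 0. With a very small σ² the frailties are shrunk to about 0 and β̂ approaches the unpenalized estimate −1.326 of the worked example.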

Simulation study: Infections in Kidney Patients
The data contain recurrence times to infection, at the point of insertion of the catheter, for kidney patients using portable dialysis equipment.
Disease types are classified as 0 = GN, 1 = AN, 2 = PKD, 3 = other.
Each patient has two recurrence times, with status 1 = infection occurred and 0 = censored.
Five regression variables are fitted: age, sex, and three indicators for the presence or absence of a disease type.
The package used is 'survival' (p. 55 of its reference manual), which preserves the data of the original paper for public access.
The functions 'ggsurvplot' (survival plots) and 'ggforest' (hazard ratios), from the survminer package, are used to present the model and run model diagnostics.

Simulation study: specified distribution
The kidney data contain 76 observations (38 patients with two recurrence times each):
library(survival)
data(kidney)
kfitm1 <- coxph(Surv(time, status) ~ age + sex + disease +
                  frailty(id, dist = "gauss"), data = kidney)
kfitm1
Call:
coxph(formula = Surv(time, status) ~ age + sex + disease + frailty(id,
    dist = "gauss"), data = kidney)
coef se(coef) se2 Chisq DF p
age 0.00489 0.01497 0.01059 0.10678 1.0 0.74384
sex -1.69728 0.46101 0.36170 13.55454 1.0 0.00023
diseaseGN 0.17986 0.54485 0.39273 0.10897 1.0 0.74131
diseaseAN 0.39294 0.54482 0.39816 0.52016 1.0 0.47077
diseasePKD -1.13631 0.82519 0.61728 1.89621 1.0 0.16850
frailty(id, dist = gauss 17.89195 12.1 0.12376
Iterations: 7 outer, 42 Newton-Raphson
Variance of random effect= 0.493
Degrees of freedom for terms= 0.5 0.6 1.7 12.1
Likelihood ratio test=47.5 on 14.9 df, p=3e-05
n= 76, number of events= 58
The only regression coefficient that is large compared to its standard error is that of the sex variable (-1.697, standard error 0.461), indicating a lower infection rate for female patients.
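The Chisq column for sex can be reproduced by hand from the printed coef and se(coef). A sketch, assuming the usual Wald statistic (coef/se)² on 1 df and a normal-theory 95% interval; the hazard ratio and interval are not printed on the slide and are computed here only for interpretation.

```r
coef_sex <- -1.69728   # printed coef for sex
se_sex   <- 0.46101    # printed se(coef) for sex

hr    <- exp(coef_sex)                                 # hazard ratio
chisq <- (coef_sex / se_sex)^2                         # Wald chi-square, 1 df
p     <- pchisq(chisq, df = 1, lower.tail = FALSE)     # upper-tail p-value
ci    <- exp(coef_sex + c(-1, 1) * qnorm(0.975) * se_sex)

round(c(hr = hr, chisq = chisq, p = p, lower = ci[1], upper = ci[2]), 4)
```

The hazard ratio of about 0.18 (95% interval roughly 0.07 to 0.45) is what the "lower infection rate" statement means in multiplicative terms.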

Simulation study: Infections in Kidney patients
The estimate of σ² is 0.3821. In general, the prior distribution on the frailty terms shrinks the estimates toward the origin, which biases them.
# frailty() uses a gamma frailty distribution by default
kfit <- coxph(Surv(time, status) ~ age + sex + disease + frailty(id), data = kidney)
kfit
Iterations: 6 outer, 35 Newton-Raphson
Variance of random effect= 5e-07 I-likelihood = -179.1
Degrees of freedom for terms= 1 1 3 0
Likelihood ratio test=17.6 on 5 df, p=0.003
n= 76, number of events= 58
round(kfit$coefficients, 3)
age sex diseaseGN diseaseAN diseasePKD
0.003 -1.483 0.088 0.351 -1.431

Simulation study: Infections in Kidney patients
cox.zph tests the proportional hazards assumption for a Cox regression model fit (coxph):
cox.zph(kfit)
chisq df p
age 0.105 1 0.746
sex 5.953 1 0.015
disease 1.985 3 0.576
GLOBAL 7.869 5 0.164
Figure: A graph of coefficient vs. time
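The plot gives an estimate of the time-dependent coefficient β(t). If the proportional hazards assumption holds, the true β(t) is a horizontal line with slope 0.
In the table above, the test for sex (p = 0.015) suggests a departure from proportional hazards for that covariate, while the global test (p = 0.164) does not.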

Simulation study: Linearity
Martingale residuals: a discrepancy between the observed value of a subject’s failure indicator and its expected
value, integrated over the time for which that patient was at risk.
The martingale residuals are plotted against covariates to detect nonlinearity.
Plots of martingale residuals and partial residuals are examined against the covariates age and sex.
Smooths are produced by local linear regression (using the lowess function).
No non-linearity is observed.
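A sketch of how martingale residuals can be computed by hand, using the six-patient partial likelihood example and its estimate β̂ = −1.326129 rather than the kidney fit. The Breslow estimator of the baseline cumulative hazard and the absence of ties are assumptions of this sketch. For a coxph fit, residuals(fit, type = "martingale") in the survival package returns these residuals; for this untied example the two should agree closely.

```r
# M_i = d_i - exp(beta * x_i) * H0(t_i), with the Breslow estimate of H0
time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)
beta   <- -1.326129

risk_score <- exp(beta * x)
# Breslow: H0(t) = sum over event times t_j <= t of 1 / sum_{k in R(t_j)} exp(beta * x_k)
H0 <- vapply(time, function(t) {
  events <- which(status == 1 & time <= t)
  sum(vapply(events, function(j) 1 / sum(risk_score[time >= time[j]]), numeric(1)))
}, numeric(1))
mart_resid <- status - risk_score * H0   # observed minus expected events

round(mart_resid, 3)
```

Residuals close to 1 mark subjects who failed earlier than the model expects, and large negative values mark long survivors; with the Breslow estimator they always sum to zero.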

Simulation study: Influential points
Comparing the magnitudes of the largest influence values with the regression coefficients suggests that one observation is individually influential.
One of the males (id 21) is a large outlier, with much longer survival than his peers. If this observation is removed no
evidence remains for a random subject effect.

Cox proportional hazard model
The predicted survival profiles for patients 5 and 12 are modeled.
Figure: Predicted survival curves for three patients using the penalized model
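The curves on the slide come from the penalized fit to the kidney data. As a small sketch of the same idea, the six-patient example gives predicted curves S(t | x) = exp(−H_0(t) exp(βx)), with H_0 estimated by the Breslow estimator. Assumptions: β fixed at that example's estimate −1.326129 and no frailty term.

```r
time   <- c(6, 7, 10, 15, 19, 25)
status <- c(1, 0, 1, 1, 0, 1)
x      <- c(0, 0, 1, 0, 1, 1)
beta   <- -1.326129
risk_score <- exp(beta * x)

event_times <- sort(unique(time[status == 1]))
# Breslow jump at each event time: d(t_j) / sum_{k in R(t_j)} exp(beta * x_k)
jumps <- vapply(event_times, function(t) {
  sum(status[time == t]) / sum(risk_score[time >= t])
}, numeric(1))
H0 <- cumsum(jumps)                          # baseline cumulative hazard

surv_control <- exp(-H0 * exp(beta * 0))     # S(t | x = 0)
surv_treated <- exp(-H0 * exp(beta * 1))     # S(t | x = 1)
data.frame(time = event_times,
           control = round(surv_control, 3),
           treated = round(surv_treated, 3))
```

Because β̂ < 0, the treated curve lies above the control curve at every event time, the proportional hazards model's version of a protective effect.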

Generalized gamma frailty model
The paper presents a frailty model that uses the generalized gamma distribution as the frailty distribution, with the lognormal and Weibull frailty models as special cases.
Written by Dr. N. Balakrishnan (with Yingwei Peng), the paper carries the BLUP method of McGilchrist and Aisbett over to a new frailty model with a generalized gamma distribution, which has more parameters and is therefore less restrictive and more flexible.
Instead of the EM algorithm, the Newton-Raphson algorithm is applied to obtain the MLEs of the parameters.
Using the generalized gamma distribution as the frailty distribution substantially improved the goodness-of-fit of the frailty model.
The model is particularly useful in reducing errors in frailty variance estimation. The performance of the likelihood ratio test also depends on the cluster size.

References
Regression with Frailty in Survival Analysis [C. A. McGilchrist and C. W. Aisbett]
https://www.jstor.org/stable/2532138
Generalized gamma frailty model [N. Balakrishnan and Yingwei Peng]
https://pubmed.ncbi.nlm.nih.gov/16220516/
(R) Package ‘survival’ [Terry M. Therneau et al.]
https://cran.r-project.org/web/packages/survival/survival.pdf
Applied Survival Analysis Using R [Dirk F Moore]
https://link.springer.com/book/10.1007/978-3-319-31245-3
Thank you for your participation and understanding!
