430 PROJJ

University of Illinois
Statistics 430
Survival Analysis
Non-Parametric Estimation of Summary
Curves for Competing Risks
Authors:
McClelland Kemp & Michael Smith
December 20, 2012

Introduction
In survival analysis, a participant in a study may experience an event other than the one of
interest, altering the probability of experiencing the event of interest. Such events are known
as competing risk events. Competing risks commonly occur in medical studies, and methods
were developed specifically to deal with them in cancer research, where both
treatment-related mortality and disease recurrence are important events of interest.
The standard methodology for this kind of time-to-event data was developed by Robert Gray
in the 1980s and is still used today. Statistical software such as SAS and R provides
procedures and packages for this specific purpose. The R package cmprsk was written by Gray
for data with competing risks and is used extensively in this report.
Method
This topic deviates from the main focus of the course in that treating every other event as
non-informative censoring is no longer acceptable. Since the Kaplan-Meier estimation procedure
accounts only for the primary event of interest, it is not appropriate in a competing risks
situation. Similarly, the Mantel-Haenszel log-rank test for comparing curves and the standard
Cox model for the assessment of covariates lead to incorrect and biased results. This bias
arises because the Kaplan-Meier estimator assumes that the competing events are independent
of the event of interest and consequently censors events other than the event of interest.
We wish, rather, to find the cumulative incidence of a specific event of interest. A subject
who experiences a competing event instead of the event of interest could be treated as
censored, but that censoring is informative. The cumulative incidence function for an event of
interest must therefore be calculated by appropriately accounting for the presence of competing
risk events. To compare the cumulative incidence curves of a particular type of failure among
different groups in the presence of competing risks, Robert Gray proposed a test that compares
weighted averages of the hazards of the cumulative incidence functions. Under the null
hypothesis the cumulative incidence curves of all the groups are equal, and the test statistic
follows a χ² distribution with ng − 1 degrees of freedom, where ng is the number of groups.
The cumulative incidence function estimates the probability that the event of interest occurs
by time t and before any of the competing causes of failure. Let t1 < t2 < · · · < tK be the
distinct times at which one of the competing risks occurs, where at time ti, Yi is the number
of subjects at risk, ri is the number of subjects experiencing the event of interest, and di
is the number of subjects experiencing any of the other competing risks.
The cumulative incidence function is then defined by:
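\[
CI(t) =
\begin{cases}
0 & \text{if } t < t_1 \\
\sum_{t_i \le t} \left\{ \prod_{j=1}^{i-1} \left[ 1 - \frac{d_j + r_j}{Y_j} \right] \right\} \frac{r_i}{Y_i} & \text{if } t_1 \le t
\end{cases}
\tag{0.1}
\]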
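As a minimal sketch of equation (0.1), the following base R snippet computes the cumulative incidence by hand. The toy times and status codes are made up for illustration (they are not the bmt data), and the helper name cif_by_hand is hypothetical.

```r
# Toy data (hypothetical): status 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 2, 1, 0, 1)

# Hypothetical helper implementing equation (0.1) for one cause
cif_by_hand <- function(time, status, cause = 1) {
  tk <- sort(unique(time[status != 0]))         # distinct event times t_1 < ... < t_K
  Y  <- sapply(tk, function(s) sum(time >= s))  # Y_i: number at risk at t_i
  r  <- sapply(tk, function(s) sum(time == s & status == cause))                # r_i
  d  <- sapply(tk, function(s) sum(time == s & status != 0 & status != cause))  # d_i
  # prod_{j < i} [1 - (d_j + r_j) / Y_j]: probability of being event-free just before t_i
  surv_before <- c(1, cumprod(1 - (d + r) / Y))[seq_along(tk)]
  data.frame(time = tk, est = cumsum(surv_before * r / Y))
}

ci1 <- cif_by_hand(time, status, cause = 1)
ci2 <- cif_by_hand(time, status, cause = 2)
print(ci1)
print(ci2)
```

In this toy example no subject remains at risk after the last event time, so the two cumulative incidence estimates sum to one there.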

With variance:
\[
V[CI(t)] = \sum_{t_i \le t} \hat{S}(t_i)^2 \left\{ [CI(t) - CI(t_i)]^2 \frac{r_i + d_i}{Y_i^2} + [1 - 2(CI(t) - CI(t_i))] \frac{r_i}{Y_i^2} \right\}, \tag{0.2}
\]
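where \hat{S}(t) is the Kaplan-Meier estimator of the survival curve, obtained by treating any
one of the competing risks as an event:
\[
\hat{S}(t) =
\begin{cases}
1 & \text{if } t < t_1 \\
\prod_{t_i \le t} \left[ 1 - \frac{d_i + r_i}{Y_i} \right] & \text{if } t_1 \le t
\end{cases}
\tag{0.3}
\]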
Gray's test evaluates H0 : CI1 = CI2 = · · · = CIng. A score is calculated for each of the
first ng − 1 groups and placed in a vector. The form of the score for the ng = 2 situation is
given below; the derivation for the ng > 2 situation is omitted.
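\[
\int_0^{\tau_k} K_n(t) \left\{ \left[ 1 - \widehat{CI}_{11}(t-) \right]^{-1} d\widehat{CI}_{11}(t) - \left[ 1 - \widehat{CI}_{12}(t-) \right]^{-1} d\widehat{CI}_{12}(t) \right\}, \tag{0.4}
\]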
where \widehat{CI}_{1n}(t-) is the cumulative incidence function for the event of interest in
group n evaluated just before time t, and K_n(t) is the weight function for the nth group. We
would like to choose a weight function that maximizes the power of the test against the
alternatives of interest. Gray suggests that experience from ordinary survival analysis gives
a good indication of a suitable weight function, and in general recommends the Harrington and
Fleming family of weight functions, indexed by a power ρ. Here, we use ρ = 0 for simplicity.
The test statistic is calculated as follows:
\[
Z \Sigma^{-1} Z^{t} \tag{0.5}
\]
where Z is the vector of the ng − 1 scores for the corresponding groups (linear dependence
lets us exclude the last score) and Σ is the estimated (ng − 1) × (ng − 1) covariance matrix.
In the ng = 2 case, Z is a scalar and the estimated covariance matrix is just the variance of
the score for the first group.
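To make the quadratic form in (0.5) concrete, here is a small sketch for ng = 3 groups. The score vector and covariance matrix below are invented numbers chosen only to show the arithmetic; they are not output from our data.

```r
# Hypothetical scores for the first ng - 1 = 2 groups and their covariance matrix
Z     <- c(2, -1)
Sigma <- matrix(c(2.0, 0.5,
                  0.5, 1.0), nrow = 2, byrow = TRUE)

# Quadratic form Z Sigma^{-1} Z^t; solve(Sigma, Z) avoids forming the inverse explicitly
stat <- drop(crossprod(Z, solve(Sigma, Z)))

# Reference distribution: chi-square with ng - 1 degrees of freedom
df <- length(Z)
pv <- pchisq(stat, df = df, lower.tail = FALSE)
print(c(stat = stat, df = df, pv = pv))
```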
Another quantity of interest is the conditional probability function for each competing risk,
which estimates the conditional probability that event i has occurred by time t given that
none of the other causes has occurred by t. It can be calculated by:
\[
CP_i(t) = \frac{CI_i(t)}{1 - CI_{i^c}(t)} \tag{0.6}
\]
where CI_i(t) denotes the cumulative incidence function for the ith competing risk at time t,
and CI_{i^c}(t) denotes the sum of the cumulative incidence functions of all other risks.
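As a quick worked instance of (0.6) with made-up values (not estimates from our data), suppose that at some time t the cumulative incidence of relapse is 0.4 and that of death is 0.2. Then:

```latex
\[
CP_{\text{relapse}}(t) = \frac{0.4}{1 - 0.2} = 0.5,
\qquad
CP_{\text{death}}(t) = \frac{0.2}{1 - 0.4} \approx 0.33 .
\]
```

Each conditional probability is at least as large as the corresponding cumulative incidence, because the denominator never exceeds one.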

Objectives
We will compare the estimated cumulative incidence curves with those obtained using the
Kaplan-Meier approach to demonstrate the importance of appropriately estimating the
cumulative incidence of an event of interest in the presence of competing risk events. We will
also interpret Gray's tests for equality of cumulative incidence curves, and the conditional
probability functions, to draw conclusions about our data.
We recognize that using a log-rank test to compare the Kaplan-Meier curves with the estimated
cumulative incidence function is not appropriate for competing risks. The proper method of
testing requires competing risks regression and, because of its complexity, is not included
in this paper. We will rely on the 'cuminc' function to perform Gray's test.
Relevance of Data
The data used for the analysis are the 'bmt' data, which document bone marrow transplants
for leukemia patients. The transplantation was considered a failure when the patient's
leukemia returned (relapse) or the patient died in remission. This makes it an ideal case for
competing risks, with K = 2 competing risks and ng = 3 groups.

Analysis
The bmt data include three groups: ALL, AML Low Risk and AML High Risk. The variable
'group' indicates which group the subject is in. The variable t2 contains the number of days
until the subject died, relapsed, or left the study and was censored. The variable d1
indicates whether a death was observed, and the variable d2 indicates whether a relapse was
observed. Since we are only concerned with the competing risks of death and relapse, we did
not use the rest of the data set. To create a usable data set, we created a new variable delta:
\[
\text{delta}[i] =
\begin{cases}
0, & \text{if no relapse or death was observed} \\
1, & \text{if a relapse was observed} \\
2, & \text{if a death was observed}
\end{cases}
\tag{0.7}
\]
Subjects who experienced a relapse and then died were coded as a relapse, since the relapse
necessarily occurred first. Our new data set contained three variables: group, t2, and delta.
We then rescaled the variable t2 so that it describes the time to event in years.
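The coding rule in (0.7) and the rescaling to years can be sketched in vectorized form. The four rows below are invented to cover every combination of the two indicators; they are not rows of the bmt data, and t2_days is a hypothetical name.

```r
# Hypothetical indicator values: d1 = death observed, d2 = relapse observed
d1      <- c(0, 1, 0, 1)
d2      <- c(0, 0, 1, 1)
t2_days <- c(730, 365, 146, 73)

# Relapse takes precedence: a relapse followed by death is coded as relapse (1)
delta <- ifelse(d2 == 1, 1, ifelse(d1 == 1, 2, 0))

# Rescale time from days to years
t2_years <- t2_days / 365

print(data.frame(d1, d2, delta, t2_years))
```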
Our first point of analysis was Kaplan-Meier estimation. One minus the Kaplan-Meier estimate,
plotted against time, can be compared with the cumulative incidence and conditional
probability summary curves. For each group we plotted the estimated probabilities of death
and of relapse after the transplant against years on study.
To examine the cumulative incidence curves, we used the function 'cuminc' from Gray's
package 'cmprsk', which also includes a function for competing risks regression on
subdistribution functions. The function 'cuminc' takes a time variable, a nominal group
variable, a nominal status variable (k for an observed kth competing risk), a value for rho
(the power of the weight function), and an optional stratification variable. It returns the
results of Gray's test, the event times, the cumulative incidence estimates and their
variances. We plotted the summary curves and interpreted Gray's test where appropriate.
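A minimal sketch of such a call on simulated data follows. The data are randomly generated for illustration and are not the bmt data. The call is guarded so that the snippet still runs when cmprsk is not installed.

```r
# Simulated data (hypothetical): status 0 = censored, 1 and 2 = the two competing risks
set.seed(1)
n       <- 60
ftime   <- rexp(n)
fstatus <- sample(0:2, n, replace = TRUE)
grp     <- rep(c("A", "B"), each = n / 2)

if (requireNamespace("cmprsk", quietly = TRUE)) {
  # rho = 0 gives the unweighted version of the test; cencode = 0 marks censored subjects
  fit <- cmprsk::cuminc(ftime, fstatus, grp, rho = 0, cencode = 0)
  print(names(fit))  # one element per group/cause pair, plus "Tests"
  print(fit$Tests)   # Gray's test for each cause: stat, pv, df
}
```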
To calculate the conditional probabilities we again used the 'cuminc' function. It stores the
times and estimates for group n and event m in object[["n m"]]$time and object[["n m"]]$est,
where 'object' is the name assigned to the value returned by 'cuminc'. To calculate the
estimates, we applied equation (0.6) to each pair of estimates and then plotted the results
against the times.
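One detail worth illustrating is that the curves for two causes generally jump at different times, so they need to be evaluated on a common time grid before applying (0.6). The sketch below uses a hand-built list that only mimics the "n m" naming of the 'cuminc' output, with made-up values. The helper step_value is hypothetical and is not part of cmprsk.

```r
# Mock object mimicking the structure described above (values are made up)
obj <- list(
  "1 1" = list(time = c(1, 3, 5), est = c(0.2, 0.4, 0.8)),  # group 1, cause 1
  "1 2" = list(time = 2,          est = 0.2)                # group 1, cause 2
)

# Hypothetical helper: value of a right-continuous step curve at times t (0 before first jump)
step_value <- function(curve, t) {
  vapply(t, function(s) {
    idx <- which(curve$time <= s)
    if (length(idx) > 0) curve$est[max(idx)] else 0
  }, numeric(1))
}

# Common grid of all jump times, then equation (0.6) for each cause
grid <- sort(unique(c(obj[["1 1"]]$time, obj[["1 2"]]$time)))
ci_1 <- step_value(obj[["1 1"]], grid)
ci_2 <- step_value(obj[["1 2"]], grid)
cp_1 <- ci_1 / (1 - ci_2)
cp_2 <- ci_2 / (1 - ci_1)

print(data.frame(time = grid, ci_1, ci_2, cp_1, cp_2))
```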

Kaplan-Meier Estimation
[Figure: one minus the Kaplan-Meier estimate (Probability) against Time, with curves for
Death and Relapse. Panels: Disease Group (ALL), Disease Group (AML Low Risk), Disease Group
(AML High Risk).]
One minus the survival estimates found using the 'survfit' function are plotted above for
each disease group. In all groups except AML Low Risk, study subjects appear to have a higher
probability of relapsing than of dying for most of the study. The large difference in the AML
Low Risk group could be attributed to the low number of events observed in that group.
Again, these curves are based on naive estimates: they were computed under the assumption
that no competing risk exists. We calculate them only for comparison with the summary curves
that are more appropriate for representing event probability when competing risks are present.
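The size of this naive bias can be seen on a toy example. The sketch below assumes one common naive approach, in which competing events are simply treated as censored, and compares one minus that Kaplan-Meier estimate with the cumulative incidence. The data are made up and are not the bmt data.

```r
# Toy data (hypothetical): status 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 3, 4, 5)
status <- c(1, 2, 1, 0, 1)

tk <- sort(unique(time[status != 0]))                       # distinct event times
Y  <- sapply(tk, function(s) sum(time >= s))                # number at risk
r  <- sapply(tk, function(s) sum(time == s & status == 1))  # events of interest
d  <- sapply(tk, function(s) sum(time == s & status == 2))  # competing events

# Naive estimate: one minus Kaplan-Meier with competing events treated as censored
naive_km <- 1 - cumprod(1 - r / Y)

# Cumulative incidence, which accounts for the competing events
overall_before <- c(1, cumprod(1 - (r + d) / Y))[seq_along(tk)]
cif <- cumsum(overall_before * r / Y)

print(data.frame(time = tk, naive_km, cif))
```

In this example the naive estimate is never below the cumulative incidence, and it reaches one at the last event time even though one subject had already failed from the competing cause.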

Cumulative Incidence
[Figure: cumulative incidence (Probability) against Time, with curves for Death and Relapse.
Panels: Cumulative Incidence (ALL Group), Cumulative Incidence (AML Low Risk Group),
Cumulative Incidence (AML High Risk Group).]
While Gray's test is not performed here, these cumulative incidence curves illustrate the
difference between Kaplan-Meier estimation and the procedure that accounts for the competing
risks. In every set of curves there is a reversal from the KM curves. For example, the KM
curves for the AML Low Risk group gave a higher event probability for death, but when the
competing risk is taken into account, relapse has the higher probability.
The results confirm that the Kaplan-Meier estimator is inadequate for a situation with
competing risks. This is an extreme case, in which the conclusions from the cumulative
incidence curves directly contradict those from the KM curves, but in general one would expect
accounting for the competing risks to have a substantial effect on the estimates.
As stated above, Gray's test was not performed here because it tests for equality of
cumulative incidence curves between groups, not between risks. We will, however, test the
equality of the cumulative incidence functions of the three groups, risk by risk, in the next
section.

Cumulative Incidence (cont.)
[Figure: cumulative incidence (Probability) against Time for the groups ALL, AML Low Risk and
AML High Risk. Panels: Cumulative Incidence − Relapse Probabilities (curves 1 1, 2 1, 3 1),
Cumulative Incidence − Death Probabilities (curves 1 2, 2 2, 3 2).]
Tests (relapse):
      stat           pv df
1 15.19568 0.0005015337  2
Tests (death):
      stat        pv df
2 3.128035 0.2092935  2
When comparing the risks of relapse and death among the three groups, we see clear differences
in the relapse curves and a less clear difference in the death curves. This is confirmed by
Gray's test, conducted as part of the 'cuminc' function. The p-value is significant for
relapse (p ≈ 0.0005) and not significant for death (p ≈ 0.21), which leads to the conclusion
that the probability of relapse depends on the group to which one belongs, while the
probability of death is not demonstrably different across the three groups.
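As a check on how the reported p-values relate to the test statistics, the upper tail of a χ² distribution with 2 degrees of freedom can be evaluated directly. The statistics below are copied from the output above; the data frame layout is only for illustration.

```r
# Test statistics reported by Gray's test above (df = ng - 1 = 2)
tests <- data.frame(cause = c("relapse", "death"),
                    stat  = c(15.19568, 3.128035),
                    df    = c(2, 2))

# Upper-tail chi-square probability for each statistic
tests$pv <- pchisq(tests$stat, df = tests$df, lower.tail = FALSE)
print(tests)
```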

Conditional Probability
Plotting the Kaplan-Meier, cumulative incidence, and conditional probability curves together
gives more insight into the event probabilities. As seen in the plot, the conditional
probability curve changes value at the occurrence of either death or relapse, because a change
in the likelihood of one event changes the probabilities of future events. The conditional
probability should generally be the largest of the three and the cumulative incidence the
smallest.
Conclusions
The 'cuminc' function enabled us to show the importance of adjusting event probabilities for
the presence of competing risks. The analysis of our data confirmed the need for a less naive
form of estimation than the Kaplan-Meier procedure, which produced contradictory results.
When comparing the summary curves of the three groups ALL, AML Low Risk and AML High Risk,
we found that the cumulative incidence curves for relapse were significantly different from
each other. This indicates that the probability of relapse should be estimated separately for
each group. Conversely, the cumulative incidence curves for death were not significantly
different across groups, so we found no evidence that the probability of death needs to be
estimated separately by group.

Sources
Gray, Robert J. "A Class of K-Sample Tests for Comparing the Cumulative Incidence of a
Competing Risk." The Annals of Statistics 16 (1988) 1141-1154
Klein, John P., and Moeschberger, Melvin L. "Survival Analysis: Techniques for Censored and
Truncated Data" (2003) 127-133
R Code
#### Packages #####
install.packages(c("mstate","survival","KMsurv","cmprsk"))
library(mstate)
library(survival)
library(KMsurv)
library(cmprsk)
setwd("C:/Users/User/Dropbox/430 Proj")
data(bmt)  # bone marrow transplant data from the KMsurv package
#### Data Prep ####
# delta: 0 = censored, 1 = relapse (including relapse followed by death), 2 = death
delta <- ifelse(bmt$d2 == 1, 1, ifelse(bmt$d1 == 1, 2, 0))
# Map the numeric group codes 1, 2, 3 to descriptive labels
group <- c("ALL", "AML Low Risk", "AML High Risk")[bmt$group]
Marrow <- data.frame(group, delta, bmt[, -1])
Marrow$t2 <- bmt$t2 / 365  # rescale time from days to years
#### Kaplan - Meier Estimate ####
ALL<-subset(Marrow,group=="ALL") # rows 1 - 38
AMLL<-subset(Marrow,group=="AML Low Risk") # rows 39 - 92
AMLH<-subset(Marrow,group=="AML High Risk") # rows 93 - 137
ALLDeath<-subset(ALL,delta != 1)
ALLRelapse<-subset(ALL,delta != 2)
KMALL1<-survfit(Surv(t2,d1)~1,data=ALLDeath,type="kaplan-meier")
KMALL2<-survfit(Surv(t2,d2)~1,data=ALLRelapse,type="kaplan-meier")
plot(KMALL1$time,1-KMALL1$surv,col=1,type="s",ylab="Probability",xlab="Time",mai
lines(KMALL2$time,1-KMALL2$surv,type="s",lty=1,col=2)
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
AMLLDeath<-subset(AMLL,delta != 1)
AMLLRelapse<-subset(AMLL,delta != 2)
KMAMLL1<-survfit(Surv(t2,d1)~1,data=AMLLDeath,type="kaplan-meier")
KMAMLL2<-survfit(Surv(t2,d2)~1,data=AMLLRelapse,type="kaplan-meier")
plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",ylab="Probability",xlab="Time",main="
lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
AMLHDeath<-subset(AMLH,delta != 1)
AMLHRelapse<-subset(AMLH,delta != 2)
KMAMLH1<-survfit(Surv(t2,d1)~1,data=AMLHDeath,type="kaplan-meier")
KMAMLH2<-survfit(Surv(t2,d2)~1,data=AMLHRelapse,type="kaplan-meier")
plot(KMAMLH1$time,1-KMAMLH1$surv,type="s",ylab="Probability",ylim=c(0,max(1-KMA
lines(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=2)
legend(locator(1), legend=c("Death","Relapse"), col=1:2,lty=1)
#plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMALL1$time,1-KMALL1$surv,type="s",col=2)
#lines(KMAMLH1$time,1-KMAMLH1$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)
#plot(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
#lines(KMALL2$time,1-KMALL2$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)
#### Cumulative Incidence ####
# The cuminc function returns an option "tests" that gives the test statistics and
# comparing the subdistribution for each cause across groups.
CI1<-cuminc(Marrow$t2[1:38],Marrow$delta[1:38],Marrow$group[1:38]); # ALL Death vs.
plot(CI1,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
CI2<-cuminc(Marrow$t2[39:92],Marrow$delta[39:92],Marrow$group[39:92]); # AMLL Death
plot(CI2,col=1:2,lty=1:2,ylab="Probability",xlab="Time",main="Cumulative Incidenc
legend(locator(1), legend=c("Death","Relapse"), lty=1, col=1:2)
CI3<-cuminc(Marrow$t2[93:137],Marrow$delta[93:137],Marrow$group[93:137]); # AMLH De
plot(CI3,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
Death<-subset(Marrow,delta!=1)
CI4<-cuminc(Death$t2,Death$delta,Death$group); name<-names(CI4) # Death for all g
plot(CI4,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
Relapse<-subset(Marrow,delta!=2)
CI5<-cuminc(Relapse$t2,Relapse$delta,Relapse$group);name<-names(CI5) # Relapse fo
plot(CI5,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
CI6<-cuminc(Marrow$t2,Marrow$delta,Marrow$group);name<-names(CI6)
plot(CI6,name[1:6],col=1:6,lty=1:6,ylab="Probability",xlab="Time",main="Cumulative
#### Conditional Probability ####
sourced = ifelse(source==1,1,0)
sourcer = ifelse(source==2,1,0)
#ALL group:
#Cumulative Incidence
cirall = cuminc(t2[1:38],sourcer[1:38],group[1:38])
cidall = cuminc(t2[1:38],sourced[1:38],group[1:38])
cpktrall = (cirall[["1 1"]]$est)/(1-cidall[["1 1"]]$est)
cpktdall = (cidall[["1 1"]]$est)/(1-cirall[["1 1"]]$est)
plot(cuminc(t2[1:38],sourcer[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL1$time,1-KMALL1$surv,type="s",col="2")
lines(cirall[["1 1"]]$time,cpktrall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional Pr
plot(cuminc(t2[1:38],sourced[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL1$time,1-KMALL1$surv,type="s",col="2")
lines(cidall[["1 1"]]$time,cpktdall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional P