University of Illinois
Statistics 430
Survival Analysis
Non-Parametric Estimation of Summary
Curves for Competing Risks
Authors:
McClelland Kemp & Michael Smith
December 20, 2012
Introduction
In survival analysis a participant in a study may experience an event other than the one of
interest, altering the probability of experiencing the event of interest. These events are known
as competing risk events. Competing risks commonly occur in medical studies, and meth-
ods were developed specifically to deal with their occurrence in cancer research since both
treatment-related mortality and disease recurrence are important events of interest.
The standard methodology for approaching this kind of time-to-event data was developed
by Robert Gray in the 1980s and is still used today. Statistical software such as SAS and R
provides procedures and packages for this specific purpose. The R package cmprsk was
written by Gray for data with competing risks and is used extensively in this report.
Method
This topic deviates from the main focus of the course in that treating competing events as
non-informatively censored observations is no longer acceptable. Because the Kaplan-Meier
estimation procedure accounts only for the primary event of interest, it is unsuitable in a
competing risks situation. Similarly, the Mantel-Haenszel log-rank test for comparison of
cumulative incidence curves and the standard Cox model for the assessment of covariates lead
to incorrect and biased results. This bias arises because Kaplan-Meier estimation assumes
that all events are independent and consequently censors events other than the event of interest.
We wish, rather, to find the cumulative incidence of a specific event of interest. Any subject
who fails to experience the event of interest can be treated as censored, but in an informative
manner. The cumulative incidence function for an event of interest must be calculated
by appropriately accounting for the presence of competing risk events. To compare
the cumulative incidence curves of a particular type of failure among different groups in the
presence of competing risks, Robert Gray proposed a test that compares weighted averages of
the hazards of the cumulative incidence functions. The null hypothesis states that the
cumulative incidence curves for all the groups are equal, and the test statistic follows a
$\chi^2$ distribution with $n_g - 1$ degrees of freedom, where $n_g$ is the number of groups.
The cumulative incidence function estimates the probability that the event of interest occurs
by time $t$ and that it occurs before any of the competing causes of failure. Let
$t_1 < t_2 < \cdots < t_K$ be the distinct times at which any of the competing risks occurs.
At time $t_i$, $Y_i$ is the number of subjects at risk, $r_i$ is the number of subjects with an
occurrence of the event of interest, and $d_i$ is the number of subjects with an occurrence of
any of the other competing risks. The cumulative incidence function is then defined by:
\[
CI(t) =
\begin{cases}
0 & \text{if } t < t_1 \\
\sum_{t_i \le t} \left\{ \prod_{j=1}^{i-1} \left[ 1 - \frac{d_j + r_j}{Y_j} \right] \right\} \frac{r_i}{Y_i} & \text{if } t_1 \le t
\end{cases}
\tag{0.1}
\]
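As an illustration of equation (0.1), the following is a minimal base R sketch that evaluates the estimator by hand. The toy data and the helper name `cif_by_hand` are hypothetical and are not part of the original analysis; status 0 is assumed to mean censored, 1 the event of interest, and 2 a competing event.

```r
# Toy data (hypothetical): status 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 2, 3, 4, 5, 6, 7)
status <- c(1, 2, 1, 0, 1, 2, 0, 1)

# Hypothetical helper: cumulative incidence of `cause` following equation (0.1)
cif_by_hand <- function(time, status, cause = 1) {
  ti <- sort(unique(time[status != 0]))                  # distinct event times t_1 < ... < t_K
  Y  <- sapply(ti, function(s) sum(time >= s))           # Y_i: number at risk at t_i
  r  <- sapply(ti, function(s) sum(time == s & status == cause))                 # r_i
  d  <- sapply(ti, function(s) sum(time == s & status != 0 & status != cause))   # d_i
  # prod_{j < i} [1 - (d_j + r_j) / Y_j]: probability of no event of any kind before t_i
  surv_before <- c(1, cumprod(1 - (d + r) / Y))[seq_along(ti)]
  data.frame(time = ti, Y = Y, r = r, d = d, CI = cumsum(surv_before * r / Y))
}

out <- cif_by_hand(time, status)
print(out)
```

Each increment of the curve is the probability of having had no event of any kind before $t_i$ multiplied by $r_i / Y_i$, which is why the curve for one cause can never exceed one minus the overall event-free probability.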
With variance:
\[
V[CI(t)] = \sum_{t_i \le t} \hat{S}(t_i)^2 \left\{ [CI(t) - CI(t_i)]^2 \, \frac{r_i + d_i}{Y_i^2} + [1 - 2 (CI(t) - CI(t_i))] \, \frac{r_i}{Y_i^2} \right\},
\tag{0.2}
\]
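where $\hat{S}(t_i)$ is the Kaplan-Meier estimator for the survival curve, evaluated at $t_i$, defined by:
\[
\hat{S}(t) =
\begin{cases}
1 & \text{if } t < t_1 \\
\prod_{t_i \le t} \left[ 1 - \frac{d_i}{Y_i} \right] & \text{if } t_1 \le t
\end{cases}
\tag{0.3}
\]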
Gray's test evaluates $H_0 : CI_1 = CI_2 = \cdots = CI_{n_g}$. A score is calculated for each of
the first $n_g - 1$ groups and placed in a vector. The form of the score for the $n_g = 2$
situation is given below, while the derivation of the test statistic for the $n_g > 2$
situation is omitted.
\[
\int_0^{\tau_k} K_n(t) \left\{ \left[ 1 - \widehat{CI}_{11}(t-) \right]^{-1} d\widehat{CI}_{11}(t) - \left[ 1 - \widehat{CI}_{12}(t-) \right]^{-1} d\widehat{CI}_{12}(t) \right\},
\tag{0.4}
\]
where $\widehat{CI}_{1n}(t-)$ is the cumulative incidence function for the nth group evaluated just
before time $t$, and $K_n(t)$ is the weight function for the nth group. We would like to choose a
weight function that maximizes the power of the test against particular alternatives of
interest. Gray suggests that results from ordinary survival analysis can indicate a good
choice of weight function, and that in general the Harrington and Fleming family of
weight functions should be used. Here, we use $\rho = 0$ for simplicity. The test statistic is
calculated as follows:
\[
Z \Sigma^{-1} Z^{t}
\tag{0.5}
\]
where $Z$ is the vector of the $n_g - 1$ scores for the corresponding groups (linear dependence
lets us exclude the last score) and $\Sigma$ is the estimated $(n_g - 1) \times (n_g - 1)$
covariance matrix. In the $n_g = 2$ case, $Z$ is a scalar and the estimated covariance matrix is
just the variance of the single score.
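The quadratic form in equation (0.5) can be sketched in a few lines of base R. The score vector and covariance matrix below are hypothetical numbers chosen only to show the arithmetic for three groups (two scores); they are not values from the bmt analysis.

```r
# Hypothetical scores for the first two of three groups, and their covariance matrix
Z     <- c(1.2, -0.5)
Sigma <- matrix(c(2.0, 0.3,
                  0.3, 1.0), nrow = 2, byrow = TRUE)

# Test statistic Z Sigma^{-1} Z^t, equation (0.5)
stat <- drop(t(Z) %*% solve(Sigma) %*% Z)

# Under the null hypothesis the statistic is compared to a chi-square
# distribution with (number of groups - 1) degrees of freedom
pval <- pchisq(stat, df = length(Z), lower.tail = FALSE)

print(c(stat = stat, pval = pval))
```

With only two groups the same code reduces to a single squared score divided by its variance.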
Another quantity of interest is the conditional probability function for each competing risk,
which gives an estimate of the conditional probability of event $i$ occurring by $t$ given that
none of the other causes have occurred by $t$. It can be calculated by:
\[
CP_i(t) = \frac{CI_i(t)}{1 - CI_{i^c}(t)}
\tag{0.6}
\]
where $CI_i(t)$ denotes the cumulative incidence function for the ith competing risk at time $t$,
and $CI_{i^c}(t)$ denotes the sum of the cumulative incidence functions of all other risks.
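A small worked example of equation (0.6) may help. The cumulative incidence values below are hypothetical and are assumed to be evaluated on one common time grid, which is what makes the element-wise division valid.

```r
# Hypothetical cumulative incidence estimates on a common time grid (years)
t_grid     <- c(0.5, 1, 2, 3)
ci_relapse <- c(0.10, 0.20, 0.30, 0.35)
ci_death   <- c(0.05, 0.10, 0.20, 0.25)

# Equation (0.6): with two risks, the "other risks" term is the other curve
cp_relapse <- ci_relapse / (1 - ci_death)
cp_death   <- ci_death   / (1 - ci_relapse)

print(data.frame(t_grid, ci_relapse, cp_relapse, ci_death, cp_death))
```

Because the denominator is at most one, each conditional probability is at least as large as the corresponding cumulative incidence.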
Objectives
We will compare the estimated cumulative incidence curves with those obtained using the
Kaplan-Meier approach to demonstrate the importance of appropriately estimating the
cumulative incidence of an event of interest in the presence of competing risk events. We will
also interpret Gray's tests for equality of cumulative incidence curves and examine the
conditional probability functions to draw conclusions about our data.
We recognize that using a log-rank test to test the equality of the Kaplan-Meier curves and
the estimated cumulative incidence functions is not appropriate for competing risks. The
proper method of testing requires competing risks regression, which is not included in this
paper due to its complexity. We will rely on the 'cuminc' function to perform Gray's test.
Relevance of Data
The data used for analysis are the 'bmt' data, which document bone marrow transplants
for leukemia patients. The transplantation was considered to be a failure when the
patient's leukemia returned (relapse) or the patient died in remission. This makes it an ideal
case for competing risks, with K = 2 competing risks and ng = 3 groups.
Analysis
The bmt data include three groups: ALL, AML Low Risk and AML High Risk. The variable
'group' contains factors that indicate which group the subject is in. The variable t2 contains
numeric values indicating the number of days before the subject died, relapsed or left the study
and was censored. The variable d1 indicates whether or not a death was observed. The variable
d2 indicates whether or not a relapse was observed. Since we are only concerned with the
competing risks of death or relapse, we did not use the rest of the data set. To create a
usable data set, we created a new variable delta:
\[
\text{delta}[i] =
\begin{cases}
0, & \text{if no relapse or death was observed} \\
1, & \text{if a relapse was observed} \\
2, & \text{if a death was observed}
\end{cases}
\tag{0.7}
\]
Subjects who experienced a relapse and then died were coded as a relapse, since the relapse
necessarily occurred before the death. Our new data set contained three variables: group, t2,
and delta. We then rescaled the variable t2 so that it described the time to event in years.
Our first point of analysis was Kaplan-Meier estimation. One minus the Kaplan-Meier estimate,
plotted against time, can be compared to the cumulative incidence and conditional probability
summary curves. For each group we plotted the curves for the event probability of death and
relapse after the transplant against years on study.
To examine the cumulative incidence curves, we used the function 'cuminc' from Gray's
package 'cmprsk', which also includes a function for competing risks regression on
sub-distribution functions. The function 'cuminc' takes arguments for a time variable, a
nominal indicator variable (k for an observed kth competing risk), a nominal group variable,
an optional stratification variable, and a value for rho (the power of the weight function).
It yields the results of Gray's test, the times of events, the cumulative incidence estimates
and their variances. We created plots to examine the summary curves and drew inferences from
Gray's test when appropriate.
To calculate the conditional probabilities we again used the 'cuminc' function. It stores the
times and estimates for group n and event m at object[["n m"]]$time and object[["n m"]]$est,
where 'object' is the name assigned to the result of the 'cuminc' function. To calculate the
actual estimates, we applied equation (0.6) to all estimates, then plotted them against the
times.
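The call pattern described above can be sketched as follows. The data here are simulated and purely hypothetical (they are not the bmt data), and the `cmprsk` call is guarded so that the snippet still runs where the package is not installed.

```r
# Simulated (hypothetical) competing risks data: 0 = censored, 1 and 2 = event types
set.seed(1)
n <- 90
sim <- data.frame(
  time   = rexp(n),
  status = sample(0:2, n, replace = TRUE),
  grp    = rep(c("A", "B", "C"), each = n / 3)
)

if (requireNamespace("cmprsk", quietly = TRUE)) {
  # time variable, event indicator, group variable; rho = 0 and cencode = 0 are the defaults
  fit <- cmprsk::cuminc(sim$time, sim$status, sim$grp, rho = 0, cencode = 0)
  print(names(fit))        # one "<group> <event>" entry per curve, plus "Tests"
  print(fit$Tests)         # Gray's test: one row per event type
  curve <- fit[["A 1"]]    # group A, event type 1
  print(head(cbind(time = curve$time, est = curve$est, var = curve$var)))
}
```

The entry names are built from the group label and the event code, so the exact strings depend on how the group variable is coded.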
Kaplan-Meier Estimation
[Figure: one minus the Kaplan-Meier survival estimate for Death and Relapse, one panel per disease group (ALL, AML Low Risk, AML High Risk); x-axis: Time, y-axis: Probability.]
One minus the survival estimates found using the 'survfit' function is plotted above for each
disease group. It appears that in all groups except AML Low Risk, study subjects have a
higher probability of relapsing than of dying for the majority of the study. The large
difference in the AML Low Risk subjects could be attributed to the low number of events
observed in the group.
Again, it should be noted that these curves are based on naive estimates. That is, they were
estimated under the assumption that no other competing risk exists, and we calculate them
merely for comparison with other summary curves that are more appropriate for representing
event probability when competing risks are present.
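To make the contrast concrete, here is a minimal base R sketch on hypothetical toy data (not the bmt data) that computes the naive one-minus-Kaplan-Meier curve, with competing events treated as censored, alongside the cumulative incidence of equation (0.1).

```r
# Toy data (hypothetical): 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 2, 3, 4, 5, 6, 7)
status <- c(1, 2, 1, 0, 1, 2, 0, 1)

ti <- sort(unique(time[status != 0]))                        # distinct event times
Y  <- sapply(ti, function(s) sum(time >= s))                 # number at risk
r  <- sapply(ti, function(s) sum(time == s & status == 1))   # events of interest
d  <- sapply(ti, function(s) sum(time == s & status == 2))   # competing events

naive <- 1 - cumprod(1 - r / Y)                              # competing events treated as censored
efs   <- c(1, cumprod(1 - (r + d) / Y))[seq_along(ti)]       # event-free probability before t_i
cif   <- cumsum(efs * r / Y)                                 # cumulative incidence, equation (0.1)

comparison <- data.frame(time = ti, naive = naive, cif = cif)
print(comparison)
```

In this toy example the two curves agree until the first competing event has had an effect, after which the naive curve lies above the cumulative incidence.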
Cumulative Incidence
[Figure: cumulative incidence curves for Death and Relapse, one panel per disease group (ALL, AML Low Risk, AML High Risk); x-axis: Time, y-axis: Probability.]
While Gray's test is not performed here, these cumulative incidence curves illustrate the
difference between Kaplan-Meier estimation and the procedure that accounts for the
competing risks. In every set of curves there is a reversal from the KM curves. For example,
the AML Low Risk group was estimated by the KM curves to have a higher event probability for
death, but when the competing risk is taken into account, relapse has the higher probability.
The results confirm the inadequacy of the Kaplan-Meier estimator for modeling a situation
with competing risks. This is an extreme case, in which the conclusions from the cumulative
incidence curves directly contradict the KM curves, but in general one would expect
accounting for the competing risks to have a substantial effect on the estimates.
As stated above, Gray's test was not performed because it only tests for equality of
cumulative incidence curves between groups, not between risks. We will, however, test the
equality of the cumulative incidence functions for the three groups by risk below.
Cumulative Incidence (cont.)
[Figure: cumulative incidence curves by disease group (ALL, AML Low Risk, AML High Risk), in two panels: Relapse Probabilities and Death Probabilities; x-axis: Time, y-axis: Probability.]
Tests (relapse):
      stat           pv df
1 15.19568 0.0005015337  2
Tests (death):
      stat        pv df
2 3.128035 0.2092935  2
When comparing the risk of relapse and death among the three groups, we see clear differences
in the relapse probabilities and a less clear difference in the death probabilities. This is
confirmed by Gray's test, conducted as part of the 'cuminc' function. The test for relapse
yields a significant p-value (p ≈ 0.0005), while the test for death does not (p ≈ 0.21). We
conclude that the probability of relapse depends on the group to which a subject belongs, but
the probability of death is not shown to differ among the three groups.
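As a quick consistency check, the reported p-values can be reproduced from the reported test statistics, assuming the chi-square reference distribution with $n_g - 1 = 2$ degrees of freedom described in the Method section. The variable names below are illustrative only.

```r
# Test statistics as reported in the Tests output above
stat <- c(relapse = 15.19568, death = 3.128035)
n_groups <- 3

# Upper-tail chi-square probability with n_groups - 1 degrees of freedom
pv <- pchisq(stat, df = n_groups - 1, lower.tail = FALSE)
print(signif(pv, 4))
```

The values agree with the 'cuminc' output up to the rounding of the printed statistics.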
Conditional Probability
Plotting the Kaplan-Meier, cumulative incidence, and the conditional probability curves to-
gether gives us some more insight about the cumulative probability. As seen in the plot, the
conditional probability curve changes value at the occurrence of either death or relapse. This
is the result of a change in likelihood of one event causing changes in the probabilities of
future events. The conditional probability should generally be greater than the other two and
the cumulative incidence generally smaller.
Conclusions
The use of the ’cuminc’ function enabled us to exhibit the importance of adjusting event prob-
ability based on the presence of competing risks. The results of the analysis of our data
confirmed the need for a less naive form of estimation than was offered by the Kaplan-Meier
procedure, which produced completely contradictory results.
When comparing the summary curves of the three groups ALL, AML Low Risk and AML
High Risk, we found that the cumulative incidence curves for relapse were significantly
different from each other. This implies that the event probability for a subject in each
group should be estimated separately from the other groups. Conversely, we found that the
groups' cumulative incidence curves for death were not significantly different from
each other. This suggests that, for death, a single curve may adequately describe the event
probability for a subject in any group.
Sources
Gray, Robert. "A class of K sample tests for comparing the cumulative incidence of a
competing risk" (1986), 1-15.
Klein, John P. and Moeschberger, Melvin L. "Survival Analysis: Techniques for Censored
and Truncated Data" (2003), 127-133.
R Code
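#### Packages #####
# install.packages(c("mstate","survival","KMsurv","cmprsk"))  # run once if needed
library(mstate)
library(survival)   # Surv, survfit
library(KMsurv)     # bmt data
library(cmprsk)     # cuminc
# setwd("C:/Users/User/Dropbox/430 Proj")  # machine-specific path; adjust or omit
data(bmt)           # load the bone marrow transplant data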
#### Data Prep ####
# Event code: 1 = relapse (takes priority over death), 2 = death, 0 = censored
delta<-ifelse(bmt$d2==1,1,ifelse(bmt$d1==1,2,0))
# Disease group labels; bmt$group is coded 1, 2, 3
group<-c("ALL","AML Low Risk","AML High Risk")[bmt$group]
Marrow<-cbind(group,delta,bmt[,-1])
Marrow$t2=as.numeric(bmt$t2/365)   # days to years
Marrow<-as.data.frame(Marrow)
#### Kaplan - Meier Estimate ####
ALL<-subset(Marrow,group=="ALL") # rows 1 - 38
AMLL<-subset(Marrow,group=="AML Low Risk") # rows 39 - 92
AMLH<-subset(Marrow,group=="AML High Risk") # rows 93 - 137
ALLDeath<-subset(ALL,delta != 1)
ALLRelapse<-subset(ALL,delta != 2)
KMALL1<-survfit(Surv(t2,d1)~1,data=ALLDeath,type="kaplan-meier")
KMALL2<-survfit(Surv(t2,d2)~1,data=ALLRelapse,type="kaplan-meier")
plot(KMALL1$time,1-KMALL1$surv,col=1,type="s",ylab="Probability",xlab="Time",mai
lines(KMALL2$time,1-KMALL2$surv,type="s",lty=1,col=2)
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
AMLLDeath<-subset(AMLL,delta != 1)
AMLLRelapse<-subset(AMLL,delta != 2)
KMAMLL1<-survfit(Surv(t2,d1)~1,data=AMLLDeath,type="kaplan-meier")
KMAMLL2<-survfit(Surv(t2,d2)~1,data=AMLLRelapse,type="kaplan-meier")
plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",ylab="Probability",xlab="Time",main="
lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
AMLHDeath<-subset(AMLH,delta != 1)
AMLHRelapse<-subset(AMLH,delta != 2)
KMAMLH1<-survfit(Surv(t2,d1)~1,data=AMLHDeath,type="kaplan-meier")
KMAMLH2<-survfit(Surv(t2,d2)~1,data=AMLHRelapse,type="kaplan-meier")
plot(KMAMLH1$time,1-KMAMLH1$surv,type="s",ylab="Probability",ylim=c(0,max(1-KMA
lines(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=2)
legend(locator(1), legend=c("Death","Relapse"), col=1:2,lty=1)
#plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMALL1$time,1-KMALL1$surv,type="s",col=2)
#lines(KMAMLH1$time,1-KMAMLH1$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)
#plot(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
#lines(KMALL2$time,1-KMALL2$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)
#### Cumulative Incidence ####
# The cuminc function returns an option "tests" that gives the test statistics and
# comparing the subdistribution for each cause across groups.
CI1<-cuminc(Marrow$t2[1:38],Marrow$delta[1:38],Marrow$group[1:38]); # ALL Death vs.
plot(CI1,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
CI2<-cuminc(Marrow$t2[39:92],Marrow$delta[39:92],Marrow$group[39:92]); # AMLL Death
plot(CI2,col=1:2,lty=1:2,ylab="Probability",xlab="Time",main="Cumulative Incidenc
legend(locator(1), legend=c("Death","Relapse"), lty=1, col=1:2)
CI3<-cuminc(Marrow$t2[93:137],Marrow$delta[93:137],Marrow$group[93:137]); # AMLH De
plot(CI3,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
Death<-subset(Marrow,delta!=1)
CI4<-cuminc(Death$t2,Death$delta,Death$group); name<-names(CI4) # Death for all g
plot(CI4,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
Relapse<-subset(Marrow,delta!=2)
CI5<-cuminc(Relapse$t2,Relapse$delta,Relapse$group);name<-names(CI5) # Relapse fo
plot(CI5,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
CI6<-cuminc(Marrow$t2,Marrow$delta,Marrow$group);name<-names(CI6)
plot(CI6,name[1:6],col=1:6,lty=1:6,ylab="Probability",xlab="Time",main="Cumulative
#### Conditional Probability ####
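# Assumes 'source' in the original code was the event code stored in Marrow$delta
# (1 = relapse, 2 = death); t2 is the time on study in years.
t2 = Marrow$t2
sourced = ifelse(Marrow$delta==2,1,0)   # death indicator
sourcer = ifelse(Marrow$delta==1,1,0)   # relapse indicator
#ALL group:
#Cumulative Incidence
grp1 = rep(1,38)                        # one group, so the curve is named "1 1"
cirall = cuminc(t2[1:38],sourcer[1:38],grp1)
cidall = cuminc(t2[1:38],sourced[1:38],grp1)
# value of a step curve at times tt, so that both curves share one time grid
stepval = function(cv,tt) sapply(tt,function(s) max(0,cv$est[cv$time<=s]))
cpktrall = (cirall[["1 1"]]$est)/(1-stepval(cidall[["1 1"]],cirall[["1 1"]]$time))
cpktdall = (cidall[["1 1"]]$est)/(1-stepval(cirall[["1 1"]],cidall[["1 1"]]$time))
plot(cuminc(t2[1:38],sourcer[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL2$time,1-KMALL2$surv,type="s",col="2")   # relapse KM curve for the relapse plot
lines(cirall[["1 1"]]$time,cpktrall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional Pr
plot(cuminc(t2[1:38],sourced[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL1$time,1-KMALL1$surv,type="s",col="2")   # death KM curve for the death plot
lines(cidall[["1 1"]]$time,cpktdall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional P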
9

More Related Content

Viewers also liked

ประวัติส่วนตัว
ประวัติส่วนตัวประวัติส่วนตัว
ประวัติส่วนตัว
suphalak khamsunan
 
Natal
NatalNatal
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Data Con LA
 
What Marketing is in just 3 slides!
What Marketing is in just 3 slides!What Marketing is in just 3 slides!
What Marketing is in just 3 slides!harshad_test
 
Modalidades educativas
Modalidades educativasModalidades educativas
Modalidades educativas
Elizabeth Monserrat García Ramírez
 
História da arte.ppsx jogo damemoria constble
História da arte.ppsx   jogo damemoria constbleHistória da arte.ppsx   jogo damemoria constble
História da arte.ppsx jogo damemoria constble
deasilvia
 
Fiche competence fourmentraux olivier pdf1
Fiche competence fourmentraux olivier pdf1Fiche competence fourmentraux olivier pdf1
Fiche competence fourmentraux olivier pdf1
OLIVIER FOURMENTRAUX
 
ЦТК організував екскурсію до м.Києва
ЦТК організував екскурсію до м.КиєваЦТК організував екскурсію до м.Києва
ЦТК організував екскурсію до м.Києва
Marina Tkachuk
 
Classical Hodgkin’s lymphoma
Classical Hodgkin’s lymphomaClassical Hodgkin’s lymphoma
Classical Hodgkin’s lymphoma
Ankit Raiyani
 
I.i
I.iI.i

Viewers also liked (11)

ประวัติส่วนตัว
ประวัติส่วนตัวประวัติส่วนตัว
ประวัติส่วนตัว
 
Natal
NatalNatal
Natal
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
 
What Marketing is in just 3 slides!
What Marketing is in just 3 slides!What Marketing is in just 3 slides!
What Marketing is in just 3 slides!
 
Modalidades educativas
Modalidades educativasModalidades educativas
Modalidades educativas
 
História da arte.ppsx jogo damemoria constble
História da arte.ppsx   jogo damemoria constbleHistória da arte.ppsx   jogo damemoria constble
História da arte.ppsx jogo damemoria constble
 
Los tiristores
Los tiristoresLos tiristores
Los tiristores
 
Fiche competence fourmentraux olivier pdf1
Fiche competence fourmentraux olivier pdf1Fiche competence fourmentraux olivier pdf1
Fiche competence fourmentraux olivier pdf1
 
ЦТК організував екскурсію до м.Києва
ЦТК організував екскурсію до м.КиєваЦТК організував екскурсію до м.Києва
ЦТК організував екскурсію до м.Києва
 
Classical Hodgkin’s lymphoma
Classical Hodgkin’s lymphomaClassical Hodgkin’s lymphoma
Classical Hodgkin’s lymphoma
 
I.i
I.iI.i
I.i
 

Similar to 430 PROJJ

TestSurvRec manual
TestSurvRec manualTestSurvRec manual
TestSurvRec manual
Carlos M Martínez M
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
Chandan Reddy
 
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
Carlos M Martínez M
 
Basic survival analysis
Basic survival analysisBasic survival analysis
Basic survival analysis
Mike LaValley
 
Non-Parametric Survival Models
Non-Parametric Survival ModelsNon-Parametric Survival Models
Non-Parametric Survival Models
MangaiK4
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppt
habtamu biazin
 
Risk and interdependencies in critical infrastructures
Risk and interdependencies in critical infrastructuresRisk and interdependencies in critical infrastructures
Risk and interdependencies in critical infrastructuresSpringer
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
Chandan Reddy
 
Time Series Project
Time Series Project Time Series Project
Time Series Project Sean Cahill
 
Art01
Art01Art01
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAYMATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
cscpconf
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions www.ijeijournal.com
 
Data science
Data scienceData science
Data science
Rakibul Hasan Pranto
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...
BenVanCalster
 
V. pacáková, d. brebera
V. pacáková, d. breberaV. pacáková, d. brebera
V. pacáková, d. brebera
logyalaa
 
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
Dr. Amir Mosavi, PhD., P.Eng.
 
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
Waqas Tariq
 

Similar to 430 PROJJ (20)

TestSurvRec manual
TestSurvRec manualTestSurvRec manual
TestSurvRec manual
 
Bachelor_thesis
Bachelor_thesisBachelor_thesis
Bachelor_thesis
 
Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
 
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
STATISTICAL TESTS TO COMPARE k SURVIVAL ANALYSIS FUNCTIONS INVOLVING RECURREN...
 
Basic survival analysis
Basic survival analysisBasic survival analysis
Basic survival analysis
 
Non-Parametric Survival Models
Non-Parametric Survival ModelsNon-Parametric Survival Models
Non-Parametric Survival Models
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppt
 
Risk and interdependencies in critical infrastructures
Risk and interdependencies in critical infrastructuresRisk and interdependencies in critical infrastructures
Risk and interdependencies in critical infrastructures
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
Time Series Project
Time Series Project Time Series Project
Time Series Project
 
Art01
Art01Art01
Art01
 
Part 1 Survival Analysis
Part 1 Survival AnalysisPart 1 Survival Analysis
Part 1 Survival Analysis
 
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAYMATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
MATHEMATICAL MODELLING OF EPIDEMIOLOGY IN PRESENCE OF VACCINATION AND DELAY
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
 
Data science
Data scienceData science
Data science
 
Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...Calibration of risk prediction models: decision making with the lights on or ...
Calibration of risk prediction models: decision making with the lights on or ...
 
Sampling theory
Sampling theorySampling theory
Sampling theory
 
V. pacáková, d. brebera
V. pacáková, d. breberaV. pacáková, d. brebera
V. pacáková, d. brebera
 
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
Predicting Trends of Coronavirus Disease (COVID-19) Using SIRD and Gaussian-S...
 
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
Use Proportional Hazards Regression Method To Analyze The Survival of Patient...
 

430 PROJJ

  • 1. University of Illinois Statistics 430 Survival Analysis Non-Parametric Estimation of Summary Curves for Competing Risks Authors: McClelland Kemp & Michael Smith December 20, 2012
  • 2. Introduction In survival analysis a participant in a study may experience an event other than the one of interest, altering the probability of experiencing the event of interest. These events are known as competing risk events. Competing risks commonly occur in medical studies, and meth- ods were developed specifically to deal with their occurrence in cancer research since both treatment-related mortality and disease recurrence are important events of interest. The standard methodology for approaching this kind of time-to-event data was developed by Robert Gray in the 1980’s, and is still used today. Statistical software such as SAS and R have PROC’s and packages developed for this specific purpose. The R package cmprsk was written by Gray for data with competing risks and is used extensively in this report. Method This topic deviates from the main focus of the course in that the practice of censoring data in a non-informative way is no longer acceptable. Since the Kaplan-Meier estimation proce- dure observes only the primary event of interest, it is rendered useless in a competing risks situation. Similarly, the Mantel-Haenzel log-rank test for comparison of cumulative incidence curves, and the standard Cox model for the assessment of covariates lead to incorrect and bi- ased results. This bias arises because the Kaplan-Meier estimation assumes that all events are independent and consequently censors events other than the event of interest. We wish, rather, to find the cumulative incidence of a specific event of interest. Any subject who fails to experience the event of interest can be treated as censored, but in an informa- tive manner. The cumulative incidence function for an event of interest must be calculated by appropriately accounting for the presence of competing risk events. 
In order to compare the cumulative incidence curves of a particular type of failure among different groups in the presence of competing risks, Robert Gray proposed a test that compares weighted averages of the hazards of cumulative incidence function using the cumulative incidence estimation equa- tion. The null hypothesis predicts that all the cumulative incidence curves for all the groups are equivalent, and the test statistic follows a χ2 distribution with degrees of freedom ng − 1 where ng is the number of groups. The cumulative incidence function estimates the probability that the event of interest occurs before time t and that it occurs before any of the competing causes of failure. Let t1 < t2 < · · · < tK be the distinct times where one of the competing risks occur, where at time ti, Yi is the number of subjects at risk, ri is the number of subjects with an occurrence at time ti and di is the number of subjects with an occurrence of any other of the competing risks at this time. The cumulative incidence function is then defined by: CIi(t) =    0 if t ≤ t1 t1≤t i−1 j=1 1 − [dj+rj] Yj ri Yi if t1 ≤ t (0.1) i
  • 3. With variance: V [CI(t)] = t1≤t ˆS (ti)2 [CI(t) − CI(ti)]2 ri + di Y2 i + [1 + 2 (CI(t) − CI(ti))] ri Y2 i , (0.2) where ˆS (ti) is the Kaplan-Meier estimator for the survival curve defined by: ˆS (ti) = 1 if t < t1 t1≤t 1 − di Yi if t1 ≤ t (0.3) Gray’s test evaluates H0 : CI1 = CI2 = · · · = CIng . A score is calculated for each of the ng-1 groups and put in a vector. The form of the test statistic for the ng=2 situation is below, while the derivation of the test statistic for the ng > 2 groups situation is omitted. τk 0 Kn(t) 1 − ˆCI11(t−) −1 d ˆCI11(t) − 1 − ˆCI12(t−) −1 d ˆCI12(t) , (0.4) where ˆCI1n(t−) is the cumulative incidence function evaluated just before time t and Kn(t) is the weight function for the nth group. We would like to choose a weight function that maximizes the power of the test against particular alternatives of interest. Gray suggests that regular survival analysis for competing risks can give a good indication of what a good choice for the weight function should be, and in general should use the Harrington and Flemming family of weight functions. Here, we use ρ = 0 for simplicity. The calculation of the test statistic is as follows: ZΣ−1 Zt (0.5) Where Z is the vector of the K − 1 scores for the corresponding groups (Linear dependence lets us exclude the last score) and Σ is the estimated (K-1× K-1) covariance matrix. In the K=2 case, the vector Z is a scalar and the estimated covariance matrix is just the variance of the score for K=1. Another topic of interest is the conditional probability function for each competing risk, which gives an estimate of the conditional probability of event K’s occuring by t given that none of the other causes have occurred by t. It can be calculated by: CPi(t) = CIi(t) 1 − CIic (t) (0.6) Where CIi(t) denotes the cumulative incidence function for the ith competing risk at time t, and CIic (t) denotes the sum of all other risks. ii
Objectives
We will compare the estimated cumulative incidence curves with those obtained using the Kaplan-Meier approach to demonstrate the importance of appropriately estimating the cumulative incidence of an event of interest in the presence of competing risk events. We will also interpret Gray's tests for equality of cumulative incidence curves, and the conditional probability functions, to draw conclusions about our data.
We recognize that testing the equality of the Kaplan-Meier curves and the estimated cumulative incidence function using a log-rank test is not appropriate for competing risks. Due to its complexity, the proper method of testing, which requires competing risks regression, is not included in this paper. We will rely on the 'cuminc' function to perform Gray's test.
Relevance of Data
The data used for analysis is the 'bmt' data, which documents bone marrow transplants for leukemia patients. The transplantation was considered to be a failure when the patient's leukemia returned (relapse) or the patient died in remission. This makes it an ideal case for competing risks with K = 2 competing risks and ng = 3 groups.
Analysis
The bmt data includes three groups: ALL, AML Low Risk and AML High Risk. The variable 'group' contains factors that indicate which group the subject is in. The variable t2 contains numeric values indicating the number of days before the subject died, relapsed, or left the study and was censored. The variable d1 indicates whether or not a death was observed. The variable d2 indicates whether or not a relapse was observed. Since we are only concerned with the competing risks of death and relapse, we did not use the rest of the data set. To create a usable data set, we created a new variable delta:

$$
\text{delta}[i] =
\begin{cases}
0, & \text{if no relapse or death was observed} \\
1, & \text{if a relapse was observed} \\
2, & \text{if a death was observed}
\end{cases}
\tag{0.7}
$$

Subjects who experienced a relapse and then death were coded as a relapse, since the relapse necessarily occurred first. Our new data set contained three variables: group, t2, and delta. We then rescaled t2 so that it describes the time to event in years.
Our first point of analysis was estimation using the Kaplan-Meier procedure. When one minus the Kaplan-Meier estimate is plotted against time, it can be compared to the cumulative incidence and conditional probability summary curves. For each group we plotted the event probability curves for death and relapse after the transplant against years on study.
To examine the cumulative incidence curves, we used the function 'cuminc' from the package 'cmprsk' created by Gray, which also includes a function for competing risks regression for sub-distribution functions. The function 'cuminc' takes arguments for a time variable, a nominal group variable, a nominal indicator variable (k for an observed kth competing risk), a value for rho (the power of the weight function), and an optional stratification variable. It yields the results of Gray's test, the times of events, the cumulative incidence estimates and their variances. We created plots to examine the summary curves and drew inferences from Gray's test when appropriate.
To calculate the conditional probabilities we again used the 'cuminc' function. It stores the times and estimates for group n and event m at object[["n m"]]$time and object[["n m"]]$est, where 'object' is the name assigned to the output of the 'cuminc' function. To calculate the actual estimates, we used equation (0.6) and looped over all estimates, then plotted them against the times.
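The coding in equation (0.7) and the rescaling of t2 can be sketched in a vectorised form as follows. The data frame `toy` is made up for illustration and only mimics the three bmt columns described above (t2, d1, d2); the column name `years` is likewise an assumption of this sketch.

```r
# Hypothetical rows mimicking the bmt columns used in the text:
# t2 = days on study, d1 = death indicator, d2 = relapse indicator
toy <- data.frame(
  t2 = c(365, 730, 100, 1460),
  d1 = c(0, 1, 1, 0),
  d2 = c(0, 0, 1, 1)
)

# Relapse takes precedence over death, as in equation (0.7):
# 0 = censored, 1 = relapse, 2 = death without relapse
toy$delta <- ifelse(toy$d2 == 1, 1, ifelse(toy$d1 == 1, 2, 0))

# Time on study in years
toy$years <- toy$t2 / 365

print(toy)
```

The nested `ifelse` checks the relapse indicator first, so a subject with both indicators set is coded as a relapse, which matches the rule stated above.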
Kaplan-Meier Estimation
[Figure: one minus the Kaplan-Meier estimate for Death and Relapse, in three panels: Disease Group (ALL), Disease Group (AML Low Risk), Disease Group (AML High Risk). x-axis: Time; y-axis: Probability.]
One minus the survival estimates found using the 'survfit' function is plotted above for each disease group. It appears that in all groups except AML Low Risk, study subjects have a higher probability of relapsing than of dying for the majority of the study. The large difference among the AML Low Risk subjects could be attributed to the low number of events observed in that group. Again, it should be noted that these curves are based on naive estimates. That is, they were estimated under the assumption that no other competing risk exists, and we calculate them only for comparison with the summary curves that are deemed more appropriate for representing event probability when competing risks are present.
Cumulative Incidence
[Figure: cumulative incidence of Death and Relapse, in three panels: Cumulative Incidence (ALL Group), Cumulative Incidence (AML Low Risk Group), Cumulative Incidence (AML High Risk Group). x-axis: Time; y-axis: Probability.]
While Gray's test is not performed here, these cumulative incidence curves illustrate the difference between the Kaplan-Meier estimation and the procedure that accounts for the competing risks. In every set of curves there is a reversal from the KM curves. For example, the AML Low Risk group was estimated by the KM curves to have a higher event probability for death, but when the competing risk is taken into account, relapse has the higher probability.
The results confirm the inadequacy of the Kaplan-Meier estimator for accurately modeling a situation with competing risks. This is an extreme case, where the conclusions of the cumulative incidence curves directly contradicted the KM curves, but in general one would expect to see a substantial effect from including the competing risks in the estimation.
As stated above, Gray's test was not performed because it only tests for equality of cumulative incidence curves between groups, not between risks. We will, however, test the equality of the cumulative incidence functions for the three groups, by risk, in the next section.
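A general property that helps when reading such comparisons is that the naive one-minus-Kaplan-Meier curve for a cause, which treats competing events as censored, can never lie below the cumulative incidence curve for that cause. The base-R sketch below shows this on a small made-up data set; the data and variable names are assumptions for illustration and are unrelated to the bmt results.

```r
# Hypothetical toy data: 0 = censored, 1 = event of interest, 2 = competing event
time   <- c(1, 2, 2, 3, 4, 5, 6, 7)
status <- c(1, 2, 1, 0, 2, 1, 0, 1)

event_times <- sort(unique(time[status != 0]))
at_risk <- sapply(event_times, function(s) sum(time >= s))
r <- sapply(event_times, function(s) sum(time == s & status == 1))
d <- sapply(event_times, function(s) sum(time == s & status == 2))

# Naive estimate: competing events are treated as censored observations
naive <- 1 - cumprod(1 - r / at_risk)

# Cumulative incidence: weight each increment by the probability of
# being free of both event types just before the event time
surv_all    <- cumprod(1 - (r + d) / at_risk)
surv_before <- c(1, head(surv_all, -1))
cif <- cumsum(surv_before * r / at_risk)

comparison <- data.frame(time = event_times, naive = naive, cif = cif)
print(comparison)
```

The two columns agree until the first competing event has removed subjects from the risk set, after which the naive estimate grows faster than the cumulative incidence.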
Cumulative Incidence (cont.)
[Figure: two panels, Cumulative Incidence - Relapse Probabilities and Cumulative Incidence - Death Probabilities, each with curves for ALL, AML Low Risk and AML High Risk. x-axis: Time; y-axis: Probability.]

Tests (relapse):
        stat           pv df
  1 15.19568 0.0005015337  2

Tests (death):
        stat        pv df
  2 3.128035 0.2092935  2

When comparing the overall risk of relapse and death among the three groups, we see clear differences in the trends for relapse probabilities and a less clear difference in death probabilities. This is confirmed by Gray's test, conducted as part of the 'cuminc' function. We see a significant p-value in the test for relapse (about 0.0005) and a non-significant p-value in the test for death (about 0.21). This leads to the conclusion that the probability of relapse depends on the group to which one belongs, while the probability of death is not necessarily different among the three groups.
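The reported p-values can be checked against the chi-square reference distribution described in the Method section, with ng - 1 = 2 degrees of freedom. The short sketch below uses only the test statistics printed above; the variable names are chosen for this example.

```r
# Test statistics as reported by 'cuminc' for the two causes
stat <- c(relapse = 15.19568, death = 3.128035)

# Three groups, so the reference distribution has 3 - 1 = 2 degrees of freedom
df <- 3 - 1

# Upper-tail chi-square probabilities
pv <- pchisq(stat, df = df, lower.tail = FALSE)
print(pv)
```

The values obtained this way should agree with the 'pv' column of the test output up to rounding of the printed statistics.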
Conditional Probability
Plotting the Kaplan-Meier, cumulative incidence, and conditional probability curves together gives us more insight into the cumulative probability. As seen in the plot, the conditional probability curve changes value at the occurrence of either death or relapse. This is because a change in the likelihood of one event changes the probabilities of future events. The conditional probability should generally be greater than the other two, and the cumulative incidence generally smaller.
Conclusions
The use of the 'cuminc' function enabled us to show the importance of adjusting event probabilities for the presence of competing risks. The results of the analysis of our data confirmed the need for a less naive form of estimation than that offered by the Kaplan-Meier procedure, which produced contradictory results.
When comparing the summary curves of the three groups ALL, AML Low Risk and AML High Risk, we found that the cumulative incidence curves for relapse were significantly different from each other. This indicates that the probability of relapse should be estimated separately for each group. Conversely, we found that the cumulative incidence curves for death were not significantly different among the groups. This suggests that the probability of death for a subject in any group could be estimated reasonably well by any of the three curves.
Sources
Gray, Robert. "A class of K sample tests for comparing the cumulative incidence of a competing risk" (1986), 1-15.
Klein, John P. and Moeschberger, Melvin L. "Survival Analysis: Techniques for Censored and Truncated Data" (2003), 127-133.

R Code
#### Packages #####
install.packages(c("mstate","survival","KMsurv","cmprsk"))
library(mstate)
library(survival)
library(KMsurv)
library(cmprsk)
setwd("C:/Users/User/Dropbox/430 Proj")

#### Data Prep ####
data(bmt) # bone marrow transplant data from KMsurv

# Event code: 0 = censored, 1 = relapse, 2 = death.
# A subject with both a relapse and a death is coded as a relapse.
delta=vector()
attach(bmt)
for(i in 1:length(bmt$t1)){
  if (d1[i]==1 & d2[i]==1) {delta[i]=1}
  if (d1[i]==1 & d2[i]==0) {delta[i]=2}
  if (d1[i]==0 & d2[i]==1) {delta[i]=1}
  if (d1[i]==0 & d2[i]==0) {delta[i]=0}
}
detach(bmt)

# Group labels from the numeric group code
group=vector()
for(i in 1:length(bmt$group)){
  if (bmt$group[i]==1) {group[i]="ALL"}
  if (bmt$group[i]==2) {group[i]="AML Low Risk"}
  if (bmt$group[i]==3) {group[i]="AML High Risk"}
}
# Same labels built from the row order of bmt (38, 54 and 45 rows per group)
group<-rep(c("ALL","AML Low Risk","AML High Risk"),c(38,54,45))

Marrow<-cbind(group,delta,bmt[,-1])
Marrow$t2=as.numeric(bmt$t2/365) # days to years
Marrow<-as.data.frame(Marrow)

#### Kaplan - Meier Estimate ####
ALL<-subset(Marrow,group=="ALL") # rows 1 - 38
AMLL<-subset(Marrow,group=="AML Low Risk") # rows 39 - 92
AMLH<-subset(Marrow,group=="AML High Risk") # rows 93 - 137

ALLDeath<-subset(ALL,delta != 1)
ALLRelapse<-subset(ALL,delta != 2)
KMALL1<-survfit(Surv(t2,d1)~1,data=ALLDeath,type="kaplan-meier")
KMALL2<-survfit(Surv(t2,d2)~1,data=ALLRelapse,type="kaplan-meier")
plot(KMALL1$time,1-KMALL1$surv,col=1,type="s",ylab="Probability",xlab="Time",mai
lines(KMALL2$time,1-KMALL2$surv,type="s",lty=1,col=2)
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)

AMLLDeath<-subset(AMLL,delta != 1)
AMLLRelapse<-subset(AMLL,delta != 2)
KMAMLL1<-survfit(Surv(t2,d1)~1,data=AMLLDeath,type="kaplan-meier")
KMAMLL2<-survfit(Surv(t2,d2)~1,data=AMLLRelapse,type="kaplan-meier")
plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",ylab="Probability",xlab="Time",main="
lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)

AMLHDeath<-subset(AMLH,delta != 1)
AMLHRelapse<-subset(AMLH,delta != 2)
KMAMLH1<-survfit(Surv(t2,d1)~1,data=AMLHDeath,type="kaplan-meier")
KMAMLH2<-survfit(Surv(t2,d2)~1,data=AMLHRelapse,type="kaplan-meier")
plot(KMAMLH1$time,1-KMAMLH1$surv,type="s",ylab="Probability",ylim=c(0,max(1-KMA
lines(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=2)
legend(locator(1), legend=c("Death","Relapse"), col=1:2,lty=1)

#plot(KMAMLL1$time,1-KMAMLL1$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMALL1$time,1-KMALL1$surv,type="s",col=2)
#lines(KMAMLH1$time,1-KMAMLH1$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)

#plot(KMAMLH2$time,1-KMAMLH2$surv,type="s",col=1,ylab="Probability",xlab="Time",mai
#lines(KMAMLL2$time,1-KMAMLL2$surv,type="s",col=2)
#lines(KMALL2$time,1-KMALL2$surv,type="s",col=3)
#legend(locator(1), legend=c("AML Low Risk","ALL","AML High Risk"), col=1:3,lty=1)
#### Cumulative Incidence ####
# The cuminc function returns an option "tests" that gives the test statistics and
# comparing the subdistribution for each cause across groups.
CI1<-cuminc(Marrow$t2[1:38],Marrow$delta[1:38],Marrow$group[1:38]); # ALL Death vs.
plot(CI1,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)

CI2<-cuminc(Marrow$t2[39:92],Marrow$delta[39:92],Marrow$group[39:92]); # AMLL Death
plot(CI2,col=1:2,lty=1:2,ylab="Probability",xlab="Time",main="Cumulative Incidenc
legend(locator(1), legend=c("Death","Relapse"), lty=1, col=1:2)

CI3<-cuminc(Marrow$t2[93:137],Marrow$delta[93:137],Marrow$group[93:137]); # AMLH De
plot(CI3,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)

Death<-subset(Marrow,delta!=1)
CI4<-cuminc(Death$t2,Death$delta,Death$group); name<-names(CI4) # Death for all g
plot(CI4,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt

Relapse<-subset(Marrow,delta!=2)
CI5<-cuminc(Relapse$t2,Relapse$delta,Relapse$group);name<-names(CI5) # Relapse fo
plot(CI5,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt

CI6<-cuminc(Marrow$t2,Marrow$delta,Marrow$group);name<-names(CI6)
plot(CI6,name[1:6],col=1:6,lty=1:6,ylab="Probability",xlab="Time",main="Cumulative

#### Conditional Probability ####
sourced = ifelse(source==1,1,0)
sourcer = ifelse(source==2,1,0)

#ALL group:
#Cumulative Incidence
cirall = cuminc(t2[1:38],sourcer[1:38],group[1:38])
cidall = cuminc(t2[1:38],sourced[1:38],group[1:38])
cpktrall = (cirall[["1 1"]]$est)/(1-cidall[["1 1"]]$est)
cpktdall = (cidall[["1 1"]]$est)/(1-cirall[["1 1"]]$est)

plot(cuminc(t2[1:38],sourcer[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL1$time,1-KMALL1$surv,type="s",col="2")
lines(cirall[["1 1"]]$time,cpktrall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional Pr

plot(cuminc(t2[1:38],sourced[1:38],group[1:38]),ylab="Probability",xlab="Time",main
lines(KMALL1$time,1-KMALL1$surv,type="s",col="2")
lines(cidall[["1 1"]]$time,cpktdall,type="s",col="3")
legend(locator(1), legend=c("Cumulative-Incidence","Kaplan-Meier","Conditional P