This document summarizes competing risk analysis methods for estimating event probabilities in the presence of competing risks. It analyzes bone marrow transplant data with relapse and death as competing risks using Kaplan-Meier, cumulative incidence, and conditional probability estimates. Cumulative incidence curves, estimated with the 'cuminc' function from Gray's cmprsk package, are more appropriate than Kaplan-Meier curves when competing risks are present. Gray's test is used to compare cumulative incidence curves between patient groups. Conditional probabilities give the probability that an event has occurred by a given time, given that none of the competing events has occurred by then.
430 PROJJ
University of Illinois
Statistics 430
Survival Analysis
Non-Parametric Estimation of Summary Curves for Competing Risks
Authors: McClelland Kemp & Michael Smith
December 20, 2012
Introduction
In survival analysis, a participant in a study may experience an event other than the one of interest, which alters the probability of experiencing the event of interest. Such events are known as competing risk events. Competing risks commonly occur in medical studies, and methods were developed specifically to deal with them in cancer research, where treatment-related mortality and disease recurrence are both important events of interest.
The standard methodology for this kind of time-to-event data was developed by Robert Gray in the 1980s and is still used today. Statistical software such as SAS and R provides procedures and packages for this purpose. The R package cmprsk was written by Gray for data with competing risks and is used extensively in this report.
Method
This topic deviates from the main focus of the course in that a competing event can no longer be treated as non-informative censoring. The Kaplan-Meier procedure considers only the primary event of interest, so its estimates of event probability are biased in a competing risks situation. Similarly, the Mantel-Haenszel log-rank test for comparing curves and the standard Cox model for assessing covariates lead to incorrect and biased results when they are used to draw conclusions about cumulative incidence. This bias arises because Kaplan-Meier estimation assumes that all events are independent and consequently censors events other than the event of interest.
We wish, rather, to find the cumulative incidence of a specific event of interest. Any subject who fails to experience the event of interest can be treated as censored, but in an informative manner. The cumulative incidence function for an event of interest must be calculated by appropriately accounting for the presence of competing risk events. To compare the cumulative incidence curves of a particular type of failure among different groups in the presence of competing risks, Robert Gray proposed a test that compares weighted averages of the hazards of the cumulative incidence functions, computed from the cumulative incidence estimates. The null hypothesis states that the cumulative incidence curves are the same in all groups, and the test statistic follows a $\chi^2$ distribution with $n_g - 1$ degrees of freedom, where $n_g$ is the number of groups.
The cumulative incidence function estimates the probability that the event of interest occurs by time $t$ and before any of the competing causes of failure. Let $t_1 < t_2 < \cdots < t_K$ be the distinct times at which one of the competing risks occurs. At time $t_i$, let $Y_i$ be the number of subjects at risk, $r_i$ the number of subjects who experience the event of interest, and $d_i$ the number of subjects who experience any of the other competing risks. The cumulative incidence function is then defined by:
$$
\mathrm{CI}(t) =
\begin{cases}
0 & \text{if } t < t_1, \\
\displaystyle \sum_{t_i \le t} \left\{ \prod_{j=1}^{i-1} \left[ 1 - \frac{d_j + r_j}{Y_j} \right] \right\} \frac{r_i}{Y_i} & \text{if } t_1 \le t.
\end{cases}
\tag{0.1}
$$
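Equation (0.1) can be evaluated directly from event times and event codes. The following is a minimal sketch in plain Python, not the cmprsk code used in this report; the function name, the status coding (0 for censored, positive integers for event types) and the toy data are assumptions made for illustration.

```python
def cumulative_incidence(times, status, cause):
    """Evaluate equation (0.1) at every distinct event time.

    times  : follow-up time of each subject
    status : 0 if censored, otherwise the code of the event that occurred
    cause  : code of the event of interest
    Returns a list of (t_i, CI(t_i)) pairs.
    """
    event_times = sorted({t for t, s in zip(times, status) if s != 0})
    surv = 1.0  # running product of 1 - (d_j + r_j) / Y_j over earlier times
    ci = 0.0
    curve = []
    for ti in event_times:
        at_risk = sum(1 for t in times if t >= ti)                      # Y_i
        r = sum(1 for t, s in zip(times, status)
                if t == ti and s == cause)                              # r_i
        d = sum(1 for t, s in zip(times, status)
                if t == ti and s not in (0, cause))                     # d_i
        ci += surv * r / at_risk
        surv *= 1.0 - (r + d) / at_risk
        curve.append((ti, ci))
    return curve


# Hypothetical toy data: five subjects, one of them censored at time 3.
toy_times = [1, 2, 3, 4, 5]
toy_status = [1, 2, 0, 1, 2]
ci_cause1 = cumulative_incidence(toy_times, toy_status, cause=1)
ci_cause2 = cumulative_incidence(toy_times, toy_status, cause=2)
```

In this toy example the two cumulative incidence curves end at values that add up to one minus the all-cause survival estimate, which is the property that separate Kaplan-Meier curves for each cause do not have.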
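With variance:
$$
V[\mathrm{CI}(t)] = \sum_{t_i \le t} \hat{S}(t_i)^2 \left\{ \left[ \mathrm{CI}(t) - \mathrm{CI}(t_i) \right]^2 \frac{r_i + d_i}{Y_i^2} + \left[ 1 - 2 \left( \mathrm{CI}(t) - \mathrm{CI}(t_i) \right) \right] \frac{r_i}{Y_i^2} \right\},
\tag{0.2}
$$
where $\hat{S}(t)$ is the Kaplan-Meier estimator of the survival curve, obtained by treating an occurrence of any of the competing risks as an event:
$$
\hat{S}(t) =
\begin{cases}
1 & \text{if } t < t_1, \\
\displaystyle \prod_{t_i \le t} \left[ 1 - \frac{d_i + r_i}{Y_i} \right] & \text{if } t_1 \le t.
\end{cases}
\tag{0.3}
$$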
Gray's test evaluates $H_0 : \mathrm{CI}_1 = \mathrm{CI}_2 = \cdots = \mathrm{CI}_{n_g}$. A score is calculated for each of the first $n_g - 1$ groups and collected in a vector. The form of the score for the $n_g = 2$ situation is given below; the derivation for the $n_g > 2$ situation is omitted.
$$
\int_0^{\tau_k} K_n(t) \left\{ \left[ 1 - \widehat{\mathrm{CI}}_{11}(t-) \right]^{-1} d\widehat{\mathrm{CI}}_{11}(t) - \left[ 1 - \widehat{\mathrm{CI}}_{12}(t-) \right]^{-1} d\widehat{\mathrm{CI}}_{12}(t) \right\},
\tag{0.4}
$$
where $\widehat{\mathrm{CI}}_{1n}(t-)$ is the estimated cumulative incidence function of the event of interest in group $n$, evaluated just before time $t$, and $K_n(t)$ is the weight function for the $n$th group. We would like to choose a weight function that maximizes the power of the test against particular alternatives of interest. Gray suggests that experience from ordinary survival analysis gives a good indication of a suitable weight function for competing risks, and that in general the Harrington and Fleming family of weight functions should be used. Here, we use $\rho = 0$ for simplicity. The test statistic is calculated as:
$$
Z \Sigma^{-1} Z^{t},
\tag{0.5}
$$
where $Z$ is the vector of the $n_g - 1$ scores for the corresponding groups (linear dependence lets us exclude the last score) and $\Sigma$ is the estimated $(n_g - 1) \times (n_g - 1)$ covariance matrix. In the $n_g = 2$ case, $Z$ is a scalar and the estimated covariance matrix is just the variance of that single score.
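For three groups, as in the data analyzed later, the quadratic form in (0.5) involves two scores and a 2 x 2 covariance matrix. The sketch below, in plain Python, only illustrates that arithmetic; the function name and the numbers are hypothetical and are not output from cmprsk.

```python
def quadratic_form_2x2(z, sigma):
    """Return Z * Sigma^{-1} * Z^t for a length-2 score vector and a 2x2 matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Explicit inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # w = Sigma^{-1} Z^t
    w = [inv[0][0] * z[0] + inv[0][1] * z[1],
         inv[1][0] * z[0] + inv[1][1] * z[1]]
    return z[0] * w[0] + z[1] * w[1]


# Hypothetical scores and covariance matrix, chosen to keep the arithmetic easy.
scores = [1.0, 2.0]
covariance = [[2.0, 1.0], [1.0, 2.0]]
statistic = quadratic_form_2x2(scores, covariance)
```

With these made-up inputs the statistic equals 2; under the null hypothesis it would be compared with a chi-square distribution with two degrees of freedom.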
Another topic of interest is the conditional probability function for each competing risk, which gives an estimate of the conditional probability that event $i$ has occurred by time $t$ given that none of the other causes has occurred by $t$. It can be calculated by:
$$
\mathrm{CP}_i(t) = \frac{\mathrm{CI}_i(t)}{1 - \mathrm{CI}_{i^c}(t)},
\tag{0.6}
$$
where $\mathrm{CI}_i(t)$ denotes the cumulative incidence function for the $i$th competing risk at time $t$, and $\mathrm{CI}_{i^c}(t)$ denotes the sum of the cumulative incidence functions of all other risks.
Objectives
We will compare the estimated cumulative incidence curves with those obtained using the Kaplan-Meier approach to demonstrate the importance of appropriately estimating the cumulative incidence of an event of interest in the presence of competing risk events. We will also interpret Gray's tests for equality of cumulative incidence curves, together with the conditional probability functions, to draw conclusions about our data.
We recognize that using a log-rank test to compare the Kaplan-Meier curves with the estimated cumulative incidence functions is not appropriate for competing risks. The proper method of testing requires competing risks regression which, because of its complexity, is not included in this paper. We will rely on the 'cuminc' function to perform Gray's test.
Relevance of Data
The data used for analysis is the 'bmt' data set, which documents bone marrow transplants for leukemia patients. The transplantation was considered a failure when the patient's leukemia returned (relapse) or the patient died in remission. This makes it an ideal case for competing risks, with K = 2 competing risks and ng = 3 groups.
Analysis
The bmt data includes three groups: ALL, AML Low Risk and AML High Risk. The variable 'group' is a factor indicating which group the subject is in. The variable t2 contains numeric values giving the number of days until the subject died, relapsed, or left the study and was censored. The variable d1 indicates whether a death was observed. The variable d2 indicates whether a relapse was observed. Since we are only concerned with the competing risks of death and relapse, we did not use the rest of the data set. To create a usable data set, we created a new variable delta:
$$
\mathrm{delta}[i] =
\begin{cases}
0, & \text{if no relapse or death was observed} \\
1, & \text{if a relapse was observed} \\
2, & \text{if a death was observed}
\end{cases}
\tag{0.7}
$$
Subjects who experienced a relapse and then died were coded as a relapse, since a relapse necessarily precedes death. Our new data set contained three variables: group, t2, and delta. We then rescaled t2 so that it describes the time to event in years.
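The coding rule in (0.7), with relapse taking priority over a later death, can be written out as follows. This is a sketch in plain Python rather than the R code used for the report; the function names, the example rows and the divisor of 365.25 days per year are assumptions, since the report does not state how t2 was rescaled.

```python
DAYS_PER_YEAR = 365.25  # assumption: the report does not give the divisor it used


def code_delta(death, relapse):
    """Return 0 (censored), 1 (relapse) or 2 (death), following equation (0.7)."""
    if relapse:   # a relapse is recorded even when a death follows it
        return 1
    if death:
        return 2
    return 0


def prepare(records):
    """Map (group, t2 in days, d1, d2) rows to (group, time in years, delta)."""
    return [(group, days / DAYS_PER_YEAR, code_delta(d1, d2))
            for group, days, d1, d2 in records]


# Hypothetical rows, not taken from the bmt data: d1 flags death, d2 flags relapse.
rows = [
    ("ALL", 730.5, 1, 0),            # death without relapse
    ("ALL", 100.0, 1, 1),            # relapse followed by death
    ("AML Low Risk", 365.25, 0, 0),  # censored
]
prepared = prepare(rows)
```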
Our first point of analysis was estimation using the Kaplan-Meier procedure. Plotting one minus the Kaplan-Meier estimate against time allows comparison with the cumulative incidence and conditional probability summary curves. For each group we plotted the event probability curves for death and relapse after the transplant against years on study.
To examine the cumulative incidence curves, we used the function 'cuminc' from the package 'cmprsk' created by Gray, which also includes a function for competing risks regression on subdistribution functions. The function 'cuminc' takes arguments for a time variable, a nominal group variable, a nominal indicator variable (k for an observed kth competing risk), a value for rho (the power of the weight function), and an optional stratification variable. It yields the results of Gray's test, the event times, the cumulative incidence estimates and their variances. We created plots to examine the summary curves and drew inferences from Gray's test when appropriate.
To calculate the conditional probabilities we again used the 'cuminc' function. It stores the times and estimates for group n and event m at object[["n m"]]$time and object[["n m"]]$est, where 'object' is the name of the object returned by the 'cuminc' function. To calculate the actual estimates, we applied equation (0.6) to every estimate in a loop, then plotted the results against the times.
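The loop over estimates can be sketched as below. This is plain Python rather than the R code used for the report: the dictionary only imitates the "n m" naming of the cuminc output, its values are hypothetical, and the helper names are assumptions. Because the curves for different events need not share the same time points, each curve is read as a right-continuous step function on the union of the time points.

```python
from bisect import bisect_right


def step_value(times, est, t):
    """Value at time t of a right-continuous step function (0 before the first time)."""
    k = bisect_right(times, t)
    return est[k - 1] if k > 0 else 0.0


def conditional_probability(curves, group, event, other_events):
    """Apply equation (0.6) at every time point of the curves involved."""
    own = curves[f"{group} {event}"]
    others = [curves[f"{group} {m}"] for m in other_events]
    grid = sorted(set(own["time"]).union(*(c["time"] for c in others)))
    result = []
    for t in grid:
        ci_own = step_value(own["time"], own["est"], t)
        ci_other = sum(step_value(c["time"], c["est"], t) for c in others)
        # Undefined if the competing events already account for all subjects.
        result.append((t, ci_own / (1.0 - ci_other)))
    return result


# Hypothetical estimates for group 1, events 1 and 2.
curves = {
    "1 1": {"time": [0.5, 1.0, 2.0], "est": [0.10, 0.20, 0.30]},
    "1 2": {"time": [0.7, 1.5], "est": [0.05, 0.20]},
}
cp = conditional_probability(curves, group=1, event=1, other_events=[2])
```

In this made-up example the conditional probability rises at time 1.5 even though event 1 does not occur then, because the cumulative incidence of the competing event increases.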
Kaplan-Meier Estimation
[Figure: one minus the Kaplan-Meier estimate (Probability) against Time for Death and Relapse, one panel per disease group: ALL, AML Low Risk, AML High Risk.]
One minus the survival estimates found using the 'survfit' function is plotted above for each disease group. It appears that in all groups except AML Low Risk, study subjects have a higher probability of relapsing than of dying for the majority of the study. The large difference in the AML Low Risk subjects could be attributed to the low number of events observed in that group.
Again, it should be noted that these curves are naive estimates. That is, they were estimated under the assumption that no other competing risk exists, and we calculate them only for comparison with the summary curves that are more appropriate for representing event probability when competing risks are present.
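A small numerical illustration of why these estimates are naive: when every other event is treated as censoring, the resulting probabilities for two mutually exclusive events can add up to more than one. The sketch below is plain Python, not the 'survfit' call used for the plots, and the function name and toy data are assumptions.

```python
def one_minus_km(times, status, cause):
    """Naive event probability at the end of follow-up.

    Every status other than `cause` (censoring and competing events alike)
    is treated as censoring, as in a Kaplan-Meier analysis of a single cause.
    """
    surv = 1.0
    for ti in sorted({t for t, s in zip(times, status) if s == cause}):
        at_risk = sum(1 for t in times if t >= ti)
        events = sum(1 for t, s in zip(times, status) if t == ti and s == cause)
        surv *= 1.0 - events / at_risk
    return 1.0 - surv


# Hypothetical toy data: 0 = censored, 1 and 2 = two competing events.
toy_times = [1, 2, 3, 4, 5]
toy_status = [1, 2, 0, 1, 2]
naive_1 = one_minus_km(toy_times, toy_status, cause=1)
naive_2 = one_minus_km(toy_times, toy_status, cause=2)
naive_total = naive_1 + naive_2
```

Here the two naive probabilities sum to 1.6, which cannot be a valid probability that one or the other event occurs; the cumulative incidence estimates do not have this problem.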
Cumulative Incidence
[Figure: cumulative incidence estimates (Probability) against Time for Death and Relapse, one panel per disease group: ALL Group, AML Low Risk Group, AML High Risk Group.]
While Gray's test is not performed here, these cumulative incidence curves illustrate the difference between Kaplan-Meier estimation and the procedure that accounts for the competing risks. In every set of curves there is a reversal from the KM curves. For example, the KM curves estimated a higher event probability for death in the AML Low Risk group, but when the competing risk is taken into account, relapse has the higher probability.
The results confirm the inadequacy of the Kaplan-Meier estimator for modeling a situation with competing risks. This is an extreme case, in which the conclusions of the cumulative incidence curves directly contradicted those of the KM curves, but in general one would expect including the competing risks to have a drastic effect on the estimates.
As stated above, Gray's test was not performed because it only tests for equality of cumulative incidence curves between groups, not between risks. We will, however, test the equality of the cumulative incidence functions for the three groups by risk below.
Cumulative Incidence (cont.)
[Figure: two panels of cumulative incidence (Probability) against Time comparing the ALL, AML Low Risk and AML High Risk groups: Relapse Probabilities and Death Probabilities.]
Tests:
  event          stat       pv             df
  1 (relapse)    15.19568   0.0005015337   2
  2 (death)      3.128035   0.2092935      2
When comparing the overall risk of relapse and death among the three groups, we see clear
differences in the trends for relapse probabilities and a less clear difference in death
probabilities. This is confirmed by Gray’s test, conducted as part of the ’cuminc’ function.
The test for relapse gives a significant p-value (about 0.0005) and the test for death does not
(about 0.21). We conclude that the probability of relapse depends on the group to which one
belongs, while the probability of death is not shown to differ among the three groups.
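As an illustration of where these test results come from, here is a small sketch on simulated data (not the transplant data) that calls `cuminc` from the cmprsk package and reads the `Tests` component of its result. The variable names `ftime`, `fstat` and `grp` and the simulation settings are assumptions made for the example.

```r
library(cmprsk)

set.seed(1)                                  # simulated data, for illustration only
n     <- 150
grp   <- rep(c("A", "B", "C"), each = 50)    # three hypothetical groups
ftime <- rexp(n, rate = ifelse(grp == "A", 1.5, 0.7))
fstat <- sample(0:2, n, replace = TRUE,      # 0 = censored, 1 and 2 = competing causes
                prob = c(0.2, 0.4, 0.4))

fit <- cuminc(ftime, fstat, grp)             # censoring code is 0 by default

# One row per cause, with columns stat, pv and df.
print(fit$Tests)
```

Each row of `fit$Tests` corresponds to one cause and compares its cumulative incidence across the groups, so with three groups each test has 2 degrees of freedom.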
Conditional Probability
Plotting the Kaplan-Meier, cumulative incidence, and conditional probability curves together
gives more insight into the cumulative probability. As seen in the plot, the conditional
probability curve changes value at the occurrence of either death or relapse, because a change
in the likelihood of one event changes the probabilities of future events. The conditional
probability should generally be the largest of the three estimates and the cumulative
incidence the smallest.
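One common definition of the conditional probability of relapse at time t is the cumulative incidence of relapse divided by one minus the cumulative incidence of death, that is, the probability of relapse by t given that death has not occurred by t. The sketch below applies that definition to hypothetical cumulative incidence values on a shared time grid. The numbers and variable names are assumptions for illustration, not results from this analysis.

```r
# Hypothetical cumulative incidence values at common time points.
t_grid     <- c(1, 2, 3, 5, 6)
ci_relapse <- c(0.125, 0.125, 0.250, 0.250,   0.40625)
ci_death   <- c(0.000, 0.125, 0.125, 0.28125, 0.28125)

# Conditional probability of relapse by t, given no death before t.
cp_relapse <- ci_relapse / (1 - ci_death)

print(data.frame(t_grid, ci_relapse, ci_death, cp_relapse))
```

At the time points where only the death curve jumps, the conditional probability of relapse still increases even though the cumulative incidence of relapse is flat. The conditional probability is also never smaller than the cumulative incidence, because the denominator is at most one.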
Conclusions
The ’cuminc’ function enabled us to demonstrate the importance of adjusting event probabilities
for the presence of competing risks. The analysis of our data confirmed the need for a less
naive form of estimation than the Kaplan-Meier procedure, which produced completely
contradictory results.
When comparing the summary curves of the three groups (ALL, AML Low Risk and AML High Risk),
we found that the cumulative incidence curves for relapse were significantly different from
each other. This means that the probability of relapse should be estimated separately for each
group. Conversely, we found that the cumulative incidence curves for death were not
significantly different among the groups. This suggests that the probability of death for a
subject in any group could be estimated reasonably well by any of the three curves, although
failing to reject equality does not show that the curves are identical.
Sources
Gray, Robert. "A class of K sample tests for comparing the cumulative incidence of a
competing risk" (1986), 1-15.
Klein, John P. and Moeschberger, Melvin L. "Survival Analysis: Techniques for Censored and
Truncated Data" (2003), 127-133.
R Code
#### Packages #####
# Install once if needed, then load.
install.packages(c("mstate","survival","KMsurv","cmprsk"))
library(mstate)
library(survival)
library(KMsurv)
library(cmprsk)
setwd("C:/Users/User/Dropbox/430 Proj")
# Note: a bare detach() here would detach the most recently attached package (cmprsk),
# so it has been dropped.
# Assumption: bmt is the bone marrow transplant data set shipped with KMsurv.
data(bmt)
#### Data Prep ####
# Event indicator: 1 = relapse (d2 == 1), 2 = death without relapse, 0 = censored.
delta <- ifelse(bmt$d2 == 1, 1, ifelse(bmt$d1 == 1, 2, 0))
# Group labels; bmt$group is coded 1, 2, 3 and the rows are ordered by group (38, 54, 45).
group <- c("ALL", "AML Low Risk", "AML High Risk")[bmt$group]
# Replace the numeric group column with the labels and add the event indicator.
Marrow <- cbind(group, delta, bmt[, -1])
# Disease-free survival time, converted from days to years.
Marrow$t2 <- as.numeric(bmt$t2 / 365)
#### Kaplan - Meier Estimate ####
#### Cumulative Incidence ####
# The cuminc function returns a component "Tests" that gives the test statistics and p-values
# for comparing the subdistribution for each cause across groups.
CI1<-cuminc(Marrow$t2[1:38],Marrow$delta[1:38],Marrow$group[1:38]); # ALL Death vs.
plot(CI1,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
CI2<-cuminc(Marrow$t2[39:92],Marrow$delta[39:92],Marrow$group[39:92]); # AMLL Death
plot(CI2,col=1:2,lty=1:2,ylab="Probability",xlab="Time",main="Cumulative Incidenc
legend(locator(1), legend=c("Death","Relapse"), lty=1, col=1:2)
CI3<-cuminc(Marrow$t2[93:137],Marrow$delta[93:137],Marrow$group[93:137]); # AMLH De
plot(CI3,col=1:2,lty=1,ylab="Probability",xlab="Time",main="Cumulative Incidence
legend(locator(1), legend=c("Death","Relapse"), lty=1,col=1:2)
Death<-subset(Marrow,delta!=1)
CI4<-cuminc(Death$t2,Death$delta,Death$group); name<-names(CI4) # Death for all g
plot(CI4,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
Relapse<-subset(Marrow,delta!=2)
CI5<-cuminc(Relapse$t2,Relapse$delta,Relapse$group);name<-names(CI5) # Relapse fo
plot(CI5,name[1:3],col=1:3,lty=1:3,ylab="Probability",xlab="Time",main="Cumulat
legend(locator(1), legend=c("ALL","AML Low Risk","AML High Risk"),col=1:3, lt
CI6<-cuminc(Marrow$t2,Marrow$delta,Marrow$group);name<-names(CI6)
plot(CI6,name[1:6],col=1:6,lty=1:6,ylab="Probability",xlab="Time",main="Cumulative
#### Conditional Probability ####
sourced = ifelse(source==1,1,0)
sourcer = ifelse(source==2,1,0)
#ALL group:
#Cumulative Incidence
cirall = cuminc(t2[1:38],sourcer[1:38],group[1:38])
cidall = cuminc(t2[1:38],sourced[1:38],group[1:38])