Introduction Proposed method Simulation studies Real example Conclusion References
K-Adaptive Partitioning for Survival Data
with an Application to SEER
Soo-Heang Eo
with Sungwan Bang, Seung-Mo Hong, HyungJun Cho
Department of Statistics
Korea University
April 25, 2014
Table of Contents
Introduction
Motivation example
Previous studies
Proposed method
Finding the best split set
Finding optimal number of subgroups
Simulation studies
Simulation design
Simulation results
Real example
Conclusion
References
Cancer staging
A cancer staging system describes the extent of cancer in a
patient's body [2].
TNM staging system
The TNM Classification of Malignant Tumors (TNM) is a cancer
staging system that describes the extent of cancer in a patient's
body [2]:
T describes the size of the tumor and whether it has invaded
nearby tissue,
N describes the regional lymph nodes that are involved,
M describes distant metastasis (spread of cancer from one body
part to another).
Previous studies
• Hilsenbeck and Clark (1996) conducted simulation studies to
examine the effects of the number of cutpoints and the true
prognostic effect size of a marker [3].
• Contal and O'Quigley (1999) derived the asymptotic distribution
of a re-scaled rank statistic [1].
• Lausen et al. (1994) and Hong et al. (2007) employed decision
tree methodology [8, 4].
• Hothorn and Lausen (2003) used maximally selected rank
statistics [6] to find the best cutpoints, an approach later extended to CTree [5].
• Ishwaran et al. (2009) adopted the idea of random survival forests
for cancer staging [7].
Problem is ...
[Figure: survival tree on the number of metastatic lymph nodes (meta),
n = 65,186 (root median survival 77 months), with Kaplan–Meier curves
(survival probability vs. survival months) for the seven terminal nodes:
Node 8 (meta = 0): n = 36,086, median = 115 months
Node 9 (meta = 1): n = 7,331, median = 80
Node 20 (meta = 2): n = 4,693, median = 57
Node 21 (meta = 3): n = 3,288, median = 47
Node 11 (meta = 4, 5): n = 4,401, median = 38
Node 6 (meta = 6, 7, . . . , 11): n = 6,183, median = 23
Node 7 (meta = 12, . . . , 80): n = 3,204, median = 13]
Problem is ...
[Figure: Kaplan–Meier curves for two candidate partitions:
(a) TSK = 148, p < 0.0001; (b) TSK = 132, p < 0.0001.]
Our aim is to
• Divide the whole data D into K heterogeneous subgroups
D1, . . . , DK based on the information of X .
• Overcome the limitation that some subgroups differ
substantially in survival, but others may differ barely or
insignificantly.
• Evaluate multi-way split points simultaneously and find an
optimal set of cutpoints.
• Implement the proposed algorithm in an R package, kaps.
Finding the best split set
Let Ti be a survival time, Ci a censoring time, and Xi an
ordered covariate for the ith observation. We observe the triples
(Yi, δi, Xi) and define
Yi = min(Ti, Ci) and δi = I(Ti ≤ Ci),
which represent the observed response variable and the censoring
indicator, respectively.
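As a plain-code restatement of these definitions (an illustrative Python sketch, not part of the slides; `observe` is a hypothetical helper name):

```python
# Sketch: forming the observed data (Y_i, delta_i) from latent survival
# times T_i and censoring times C_i, per the definitions above.
def observe(T, C):
    """Y_i = min(T_i, C_i); delta_i = I(T_i <= C_i)."""
    Y = [min(t, c) for t, c in zip(T, C)]
    delta = [1 if t <= c else 0 for t, c in zip(T, C)]
    return Y, delta

# Example: the second subject is censored (T > C), so delta = 0 there.
Y, delta = observe([5.0, 12.0, 3.0], [8.0, 10.0, 9.0])
# Y = [5.0, 10.0, 3.0], delta = [1, 0, 1]
```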
Finding the best split set
Let χ²1 be the χ² statistic with one degree of freedom (df) for
comparing the gth and hth of the K subgroups created by a split set
sK when K is given. For a split set sK dividing D into D1, D2, . . . , DK,
the test statistic used as a measure of deviance is defined as

T1(sK) = min_{1 ≤ g < h ≤ K} χ²1   for sK ∈ SK,   (1)

where SK is the collection of split sets sK generating K disjoint
subgroups.
Finding the best split set
Then, take s*K as the best split set such that

T*1(s*K) = max_{sK ∈ SK} T1(sK).   (2)

The best split set s*K consists of (K − 1) cutpoints which clearly
separate the data D into K disjoint subsets of the data:
D1, D2, . . . , DK.
Finding the best split set - Illustration
• We assume that the patients are divided into 3 heterogeneous
groups by the number of lymph nodes (X = {0, 1, . . . , 6}).
• The groups are labelled D1, D2, and D3.
• There are 3 pairs of groups (D1 vs. D2, D2 vs. D3, and D1 vs. D3).
• Suppose there are 3 candidate cutpoint sets, each composed of
two cutpoints: {0, 1}, {0, 2}, and {1, 2}.
• For example, the candidate {0, 1} means that D1 = {X = 0},
D2 = {0 < X ≤ 1}, and D3 = {X > 1}.
• For each candidate, the smallest of the 3 pairwise test statistics is
selected as its representative statistic.
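The mapping from a cutpoint set to subgroups can be sketched as follows (illustrative Python, not from the slides; `assign_subgroup` is a hypothetical name):

```python
def assign_subgroup(x, cuts):
    """Map a covariate value x to subgroup 1..K given sorted cutpoints:
    D1 = {x <= cuts[0]}, ..., D_K = {x > cuts[-1]}."""
    return 1 + sum(x > c for c in cuts)

# Candidate {0, 1}: D1 = {X = 0}, D2 = {0 < X <= 1}, D3 = {X > 1}
print([assign_subgroup(x, (0, 1)) for x in range(7)])  # [1, 2, 3, 3, 3, 3, 3]
```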
Algorithm 1. Finding the best split set for given K
Step 1: Compute chi-squared test statistics χ²1 for all possible pairs (g, h) of the K
subgroups created by sK, where 1 ≤ g < h ≤ K and sK is a split set of (K − 1)
cutpoints generating K disjoint subgroups.
Step 2: Obtain the minimum pairwise statistic T1(sK) = min_{1 ≤ g < h ≤ K} χ²1
for sK ∈ SK, where SK is the collection of split sets sK generating K disjoint
subsets of the data.
Step 3: Repeat Steps 1 and 2 for all possible split sets in SK.
Step 4: Take the best split set s*K such that T*1(s*K) = max_{sK ∈ SK} T1(sK).
When two or more split sets attain the maximum T*1 of the minimum pairwise
statistics, choose the one with the largest overall statistic T*_{K−1}.
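A brute-force sketch of Algorithm 1 (illustrative Python, not the kaps implementation; the 1-df chi-squared comparison of Step 1 is realized here as the standard two-sample log-rank statistic, and `logrank_chi2`/`best_split_set` are hypothetical names):

```python
from itertools import combinations

def logrank_chi2(y1, d1, y2, d2):
    """Two-sample log-rank chi-square statistic (1 df)."""
    data = [(y, d, 0) for y, d in zip(y1, d1)] + [(y, d, 1) for y, d in zip(y2, d2)]
    event_times = sorted({y for y, d, _ in data if d == 1})
    num = var = 0.0
    for t in event_times:
        n1 = sum(1 for y, _, g in data if y >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for y, _, g in data if y >= t and g == 1)   # at risk, group 2
        o1 = sum(1 for y, d, g in data if y == t and d == 1 and g == 0)
        o2 = sum(1 for y, d, g in data if y == t and d == 1 and g == 1)
        n, o = n1 + n2, o1 + o2
        if n < 2:
            continue
        num += o1 - o * n1 / n                                  # O1 - E1
        var += o * (n1 / n) * (1 - n1 / n) * (n - o) / (n - 1)  # hypergeometric var
    return num * num / var if var > 0 else 0.0

def best_split_set(y, d, x, K):
    """Maximize the minimum pairwise statistic T1(sK) over all
    (K-1)-cutpoint split sets of the ordered covariate x (Steps 1-4)."""
    values = sorted(set(x))
    best, best_T1 = None, -1.0
    for cuts in combinations(values[:-1], K - 1):   # cutpoint c splits at X <= c
        groups = [[] for _ in range(K)]
        for yi, di, xi in zip(y, d, x):
            groups[sum(xi > c for c in cuts)].append((yi, di))
        if any(not g for g in groups):
            continue                                # every subgroup must be non-empty
        T1 = min(logrank_chi2([p[0] for p in groups[a]], [p[1] for p in groups[a]],
                              [p[0] for p in groups[b]], [p[1] for p in groups[b]])
                 for a, b in combinations(range(K), 2))
        if T1 > best_T1:
            best, best_T1 = cuts, T1
    return best, best_T1
```

This exhaustive search is feasible here because X is an ordered covariate with few distinct values, so the number of candidate split sets stays small.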
Finding optimal subgroups
• One of the important issues is to determine a reasonable
number of subgroups, i.e. the selection of an optimal K.
• We find an optimal multi-way split in a single step for a given
number of subgroups.
• We then need to choose one among the candidate numbers of
subgroups.
• For a data-driven objective choice, we here suggest a
statistical procedure to choose an optimal number of
subgroups.
Finding optimal subgroups
From the minimum pairwise statistics of the permuted data, the
permutation p-value is

pK = Σ_{r=1}^{R} I( T1^(r)(s*K) ≥ T*1(s*K) ) / R,   K = 2, 3, . . . ,

where T1^(r)(s*K) is the minimum pairwise statistic for the rth
permuted data set.

• The permuted data can be reconstructed by permuting the labels
of X while retaining the labels of (Y, δ) and re-matching them.
• When the permuted data are allocated into each subgroup by s*K,
there should be no significant differences in survival among the
subgroups.
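The permutation p-value above is simply a tail proportion; as a minimal illustrative sketch (hypothetical helper name, not from the kaps package):

```python
def permutation_pvalue(perm_stats, observed):
    """p_K = #{r : T1^(r)(s*_K) >= T1*(s*_K)} / R, where perm_stats holds
    the R minimum pairwise statistics from the permuted data sets."""
    R = len(perm_stats)
    return sum(1 for t in perm_stats if t >= observed) / R

# Example: 2 of the 4 permuted statistics reach the observed value 4.0.
print(permutation_pvalue([1.0, 5.0, 9.0, 2.0], 4.0))  # 0.5
```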
Finding optimal subgroups
We choose the largest number of subgroups, to discover as many
significantly different subgroups as possible, given that the
corrected p-values are smaller than or equal to a pre-determined
significance level α:

K̂ = max{ K | p^c_K ≤ α, K = 2, 3, . . . },

where p^c_K is the p-value corrected for multiple comparisons,
since there are (K − 1) comparisons between adjacent subgroups.
Finding optimal subgroups - Illustration
[Figure: Kaplan–Meier curves (survival probability vs. survival months)
for the chosen partitions:
(a) K = 2 (G1, G2): TSK = 26.37, TS1 = 26.37, pK < 0.0001, p1 < 0.0001
(b) K = 3 (G1–G3): TSK = 36.80, TS1 = 7.2, pK < 0.0001, p1 = 0.0073
(c) K = 4 (G1–G4): TSK = 38.00, TS1 = 1.895, pK < 0.0001, p1 = 0.1686]
Finding optimal subgroups - Illustration
[Figure: (left) survival months against the number of metastatic lymph
nodes (meta) and the Kaplan–Meier curve of subgroup G3; (right)
permutation p-values plotted against K = 2, 3, 4.]
Algorithm 2. Selecting the optimal number of subgroups
Step 1: Find s*K and T1(s*K) with the raw data for each K using Algorithm 1.
Step 2: Construct the permuted data by permuting the labels of X whilst
retaining the labels of (Y, δ).
Step 3: Allocate the permuted data into each subgroup by s*K.
Step 4: Obtain the minimum pairwise statistic T1^(r)(s*K) for the permuted data.
Step 5: Repeat Steps 2 to 4 R times, and then obtain
T1^(1)(s*K), T1^(2)(s*K), . . . , T1^(R)(s*K).
Step 6: Compute the permutation p-value pK for each K, i.e.,
pK = Σ_{r=1}^{R} I( T1^(r)(s*K) ≥ T1(s*K) ) / R,   K = 2, 3, . . . .
Step 7: Correct the permutation p-value pK for multiple comparisons,
e.g., corrected p-value p^c_K = pK / (K − 1), K = 2, 3, . . . .
Step 8: Select the largest K whose corrected p-value is less than or equal to α,
i.e., K̂ = max{ K | p^c_K ≤ α, K = 2, 3, . . . }.
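Algorithm 2 can be sketched with the split search abstracted away (illustrative Python, not the kaps implementation; `select_K` and `perm_stat` are hypothetical names, and the slide's correction p_K/(K − 1) is used exactly as stated):

```python
import random

def select_K(T1_star, perm_stat, alpha=0.05, Ks=(2, 3, 4), R=200, seed=1):
    """T1_star: dict mapping K to the observed minimum pairwise statistic
    T1(s*_K) from Algorithm 1.  perm_stat: callable (K, rng) -> minimum
    pairwise statistic recomputed on permuted data (Steps 2-4).
    Returns the largest K whose corrected p-value p_K/(K-1) is <= alpha."""
    rng = random.Random(seed)
    chosen = None
    for K in Ks:
        # Steps 5-6: R permutations, then the permutation p-value p_K.
        count = sum(perm_stat(K, rng) >= T1_star[K] for _ in range(R))
        p_corrected = (count / R) / (K - 1)   # Step 7
        if p_corrected <= alpha:              # Step 8: keep the largest such K
            chosen = K
    return chosen
```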
Simulation
Simulation study: kaps vs. binary split (rpart and ctree)
Design:
• survival time Ti ∼ exp(λi )
• censoring time Ci ∼ exp(c0)
• response Yi = min(Ti , Ci ) and δi = I (Ti ≤ Ci ), where i = 1, 2, . . . , n
• a prognostic factor Xi ∼ DU (1, 20)
• KAPS vs. Survival CART [9] and CTREE [5]
• training and test samples: 200 observations each with censoring rates of
15% or 30%
• repeated 100 times independently
Setting 1 - stepwise model
We consider the following stepwise model (SM) for the parameter λi:

λi = 0.02,  Xi ≤ 7,
     0.04,  7 < Xi ≤ 14,
     0.08,  Xi > 14.        (3)

This model has three different hazard rates, distinguished by the
two cutpoints 7 and 14.
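One replication of Setting 1 might be generated as follows (illustrative Python sketch; `simulate_sm` is a hypothetical name, and c0 is set arbitrarily here, whereas the slides tune it to hit a 15% or 30% censoring rate):

```python
import random

def simulate_sm(n=200, c0=0.01, seed=42):
    """Draw (Y_i, delta_i, X_i): X_i ~ DU(1, 20), T_i ~ Exp(lambda_i) with
    the stepwise hazards of model (3), and censoring C_i ~ Exp(c0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(1, 20)
        lam = 0.02 if x <= 7 else 0.04 if x <= 14 else 0.08
        t = rng.expovariate(lam)    # latent survival time
        c = rng.expovariate(c0)     # latent censoring time
        data.append((min(t, c), 1 if t <= c else 0, x))
    return data
```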
Setting 2 - linear model
In addition, we consider the following linear model (LM) for the
parameter λi:

λi = 0.1 Xi.   (4)

In this model, λi depends linearly on Xi, so Yi depends on Xi
nonlinearly, and there is a distinct hazard rate for each value of Xi.
Results of the stepwise model (SM)
Selection of cut-off points when K is known
[Figure: (top) distributions of the selected cutpoints (true cutpoints
7 and 14) for kaps, ctree, and rpart; (bottom) scatterplots of the
selected cutpoint pairs (C1, C2) for each method.]
Results of the stepwise model (SM)
Selection of the optimal number of subgroups (K = 3)
[Figure: distributions of the selected number of subgroups (true
K = 3) for rpart, ctree, and kaps under (a) SM with CR = 15% and
(b) SM with CR = 30%.]
Results of the linear model (LM)
Selection of cut-off points when K is unknown
                           CR = 15%                   CR = 30%
Model  K  Method   Overall       Pairwise      Overall       Pairwise
LM     2  rpart    43.57 (1.17)  43.57 (1.17)  38.73 (1.03)  38.73 (1.03)
          ctree    38.94 (1.16)  38.94 (1.16)  36.29 (1.01)  36.29 (1.01)
          kaps     44.52 (1.15)  44.52 (1.15)  38.74 (1.11)  38.74 (1.11)
       3  rpart    48.07 (1.47)   6.87 (0.50)  43.21 (1.11)   6.32 (0.44)
          ctree    55.64 (1.39)  12.94 (1.14)  47.99 (1.40)   8.33 (0.64)
          kaps     54.96 (1.39)  13.83 (0.63)  47.95 (1.36)  11.33 (0.59)
       4  rpart    59.60 (1.59)   2.59 (0.26)  53.01 (1.33)   2.30 (0.22)
          ctree    59.82 (1.35)   3.17 (0.37)  52.05 (1.34)   2.10 (0.21)
          kaps     61.27 (1.39)   3.22 (0.24)  53.48 (1.34)   2.70 (0.23)
Results of the linear model (LM)
Selection of the optimal number of subgroups (K =?)
[Figure: distributions of the selected number of subgroups
(K = 2, . . . , 5) for rpart, ctree, and kaps under (a) LM with
CR = 15% and (b) LM with CR = 30%.]
Surveillance, Epidemiology, and End Results Program
• Population-based database by NCI and CDC
• Collects data on cancer cases from various locations and
sources throughout the US
• Data collection began in 1973 with a limited number of
registries and continues to expand to cover more areas
and demographics today
Colon cancer in SEER database
Colon cancer analysis
Table: Numbers of metastasis lymph nodes for the staging systems by
AJCC, kaps, ctree, and rpart
Subgroup                  AJCC    rpart   ctree   kaps
Subgroup 1                0       0       0       0
Subgroup 2                1       1–3     1       1
Subgroup 3                2–3     4–10    2–3     2–3
Subgroup 4                4–6     ≥ 11    4–5     4–6
Subgroup 5                ≥ 7     —       6–8     7–10
Subgroup 6                —       —       ≥ 9     ≥ 11
Min. pairwise statistic   131.23  932.30  78.35   131.23
Corresponding pair        (2, 3)  (3, 4)  (3, 4)  (2, 3)
Colon cancer analysis
[Figure: Kaplan–Meier curves (survival probability vs. survival months).
(a) kaps: No LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥ 11 LNs.
(b) AJCC: No LN; 1 LN; 2–3 LNs; 4–6 LNs; ≥ 7 LNs.]
Colon cancer analysis
[Figure: Kaplan–Meier curves (survival probability vs. survival months).
(a) kaps: No LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥ 11 LNs.
(b) rpart: No LN; 1–3 LNs; 4–10 LNs; ≥ 11 LNs.]
Colon cancer analysis
[Figure: Kaplan–Meier curves (survival probability vs. survival months).
(a) kaps: No LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥ 11 LNs.
(b) ctree: No LN; 1 LN; 2–3 LNs; 4–5 LNs; 6–8 LNs; ≥ 9 LNs.]
Conclusion
We propose a multi-way partitioning algorithm for survival data, designed to:
• Divide the data into K heterogeneous subgroups based on the
information of a prognostic factor.
• Generate only fairly well-separated subgroups rather than a
mixture of extremely poorly and well-separated subgroups.
• Identify a multi-way partition which maximizes the minimum
of the pairwise test statistics among subgroups.
• Utilize a permutation test to choose the optimal number of
subgroups, i.e., the number of cutpoints.
• Implement the algorithm in the add-on R package kaps, available
on CRAN.
Conclusion
• HJ Kang*, S-H Eo*, SC Kim, KM Park, YJ Lee, SK Lee, E Yu, H Cho,
and S-M Hong (2014). Increased Number of Metastatic Lymph Nodes in
Adenocarcinoma of the Ampulla of Vater as a Prognostic Factor: A
Proposal of New Nodal Classification, Surgery, 155:1, 74-84. (*:co-first
author)
• S-H Eo, S-M Hong, and H Cho (2014). K-adaptive partitioning for
survival data: the kaps add-on package for R, arXiv, 1306.4015v2.
• S-H Eo, S-M Hong, and H Cho. K-adaptive partitioning for survival
data, submitted to Statistics in Medicine.
• Park et al. Survival effect of tumor size and extrapancreatic extension in
surgically resected pancreas cancer patients: Proposal for improved T
classification, submitted to Surgery.
References

[1] Cécile Contal and John O'Quigley. An application of changepoint methods in studying the effect of age on survival in breast cancer. Computational Statistics and Data Analysis, 30:253–270, 1999.

[2] S.B. Edge, D.R. Byrd, C.C. Compton, A.G. Fritz, F.L. Greene, and A. Trotti III. AJCC Cancer Staging Manual. Springer, New York, 2010.

[3] Susan Galloway Hilsenbeck and Gary M. Clark. Practical p-value adjustment for optimally selected cutpoints. Statistics in Medicine, 15(1):103–112, 1996.

[4] S.M. Hong, H. Cho, C. Moskaluk, and E. Yu. Measurement of the invasion depth of extrahepatic bile duct carcinoma: an alternative method overcoming the current T classification problems of the AJCC staging system. American Journal of Surgical Pathology, 31:199–206, 2007.

[5] Torsten Hothorn, Kurt Hornik, and Achim Zeileis. Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 2006.

[6] Torsten Hothorn and Berthold Lausen. On the exact distribution of maximally selected rank statistics. Computational Statistics and Data Analysis, 43:121–137, 2003.

[7] Hemant Ishwaran, Eugene H. Blackstone, Carolyn Apperson-Hansen, and Thomas W. Rice. A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10(4):603–620, 2009.

[8] Berthold Lausen, Willi Sauerbrei, and Martin Schumacher. Classification and regression trees (CART) used for the exploration of prognostic factors measured on different scales. In P. Dirschedl and R. Ostermann, editors, Computational Statistics. Physica Verlag, Heidelberg, 1994.

[9] Michael LeBlanc and John Crowley. Relative risk trees for censored survival data. Biometrics, pages 411–425, 1992.
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

K-adaptive partitioning for survival data

  • 1. Introduction Proposed method Simulation studies Real example Conclusion References K-Adaptive Partitioning for Survival Data with an Application to SEER Soo-Heang Eo with Sungwan Bang, Seung-Mo Hong, HyungJun Cho Department of Statistics Korea University April 25, 2014
  • 2. Table of Contents: Introduction (Motivation example; Previous studies), Proposed method (Finding the best split set; Finding optimal number of subgroups), Simulation studies (Simulation design; Simulation results), Real example, Conclusion, References
  • 3. Cancer staging: a staging system describes the extent of cancer in a patient’s body [2].
  • 4. TNM staging system. The TNM Classification of Malignant Tumors (TNM) is a cancer staging system that describes the extent of cancer in a patient’s body [2]. T describes the size of the tumor and whether it has invaded nearby tissue; N describes the regional lymph nodes that are involved; M describes distant metastasis (spread of cancer from one body part to another).
  • 5. Previous studies. • Hilsenbeck and Clark (1996) conducted simulation studies to examine the effects of the number of cutpoints and the true prognostic effect size of a marker [3]. • Contal and O’Quigley (1999) derived the asymptotic distribution of a re-scaled rank statistic [1]. • Lausen et al. (1994) and Hong et al. (2007) employed decision tree methodology [8, 4]. • Hothorn and Lausen (2003) used maximally selected rank statistics [6] to find the best cutpoints, later extended to CTree [5]. • Ishwaran et al. (2009) adopted the idea of random survival forests [7].
  • 6. Problem is ... [Figure: a survival tree splitting on the number of metastatic lymph nodes (meta), with seven terminal nodes (n from 3,204 to 36,086; median survival from 13 to 115 months), and the corresponding Kaplan–Meier curves by number of metastases.]
  • 7. Problem is ... [Figure: two example partitions with overall statistics (a) TS_K = 148 and (b) TS_K = 132, both p < 0.0001.]
  • 8. Our aim is to • Divide the whole data D into K heterogeneous subgroups D1, . . . , DK based on the information of X. • Overcome the limitation that some subgroups differ substantially in survival, but others may differ barely or insignificantly. • Evaluate multi-way split points simultaneously and find an optimal set of cutpoints. • Implement the proposed algorithm into an R package kaps.
  • 9. Finding the best split set. Let Ti be a survival time, Ci a censoring time, and Xi an ordered covariate for the ith observation. We observe the triples (Yi, δi, Xi), where Yi = min(Ti, Ci) and δi = I(Ti ≤ Ci) denote the observed response variable and the censoring indicator, respectively.
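The observed-data construction above can be sketched in a few lines. This is a minimal Python illustration (the authors' own software is an R package; the variable names here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
T = rng.exponential(50.0, n)   # latent survival times T_i
C = rng.exponential(80.0, n)   # latent censoring times C_i

Y = np.minimum(T, C)           # observed response Y_i = min(T_i, C_i)
delta = (T <= C).astype(int)   # censoring indicator delta_i = I(T_i <= C_i)
```

Only the pair (Y, delta), together with the covariate X, is visible to the partitioning procedure; T and C themselves are never observed jointly.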
  • 10. Finding the best split set. Let χ²(g, h) denote the chi-squared statistic with one degree of freedom (df) for comparing the gth and hth of the K subgroups created by a split set sK when K is given. For a split set sK dividing D into D1, D2, . . . , DK, the test statistic for a measure of deviance is defined as T1(sK) = min{χ²(g, h) : 1 ≤ g < h ≤ K} for sK ∈ SK, (1) where SK is the collection of split sets sK generating K disjoint subgroups.
  • 11. Finding the best split set. Then, take s*K as the best split set such that T*1(s*K) = max{T1(sK) : sK ∈ SK}. (2) The best split set s*K is a set of (K − 1) cutpoints which clearly separates the data D into K disjoint subsets D1, D2, . . . , DK.
  • 12. Finding the best split set - Illustration • We assumed that the patients were divided into 3 heterogeneous groups by the number of lymph nodes (X = {0, 1, . . . , 6}). • Each group was categorised as D1, D2, and D3. • There are 3 pairs of groups (D1 vs. D2, D2 vs. D3, and D1 vs. D3). • We considered 3 candidate cut-off sets, each composed of two cutpoints: {0,1}, {0,2}, and {1,2}. • For example, the candidate {0, 1} means that D1 = {X = 0}, D2 = {0 < X ≤ 1}, and D3 = {X > 1}. • Among the 3 pairwise statistics for each candidate, the smallest was selected as the representative statistic for that candidate.
  • 13. Finding the best split set - Illustration
  • 14. Finding the best split set
  • 15. Algorithm 1. Finding the best split set for given K. Step 1: Compute chi-squared test statistics χ²(g, h) for all possible pairs, g and h, of the K subgroups created by sK, where 1 ≤ g < h ≤ K and sK is a split set of (K − 1) cutpoints generating K disjoint subgroups. Step 2: Obtain the minimum pairwise statistic T1(sK) by minimizing χ²(g, h) over all possible pairs, i.e., T1(sK) = min{χ²(g, h) : 1 ≤ g < h ≤ K} for sK ∈ SK, where SK is the collection of split sets sK generating K disjoint subsets of the data. Step 3: Repeat Steps 1 and 2 for all possible split sets in SK. Step 4: Take the best split set s*K such that T*1(s*K) = max{T1(sK) : sK ∈ SK}. When two or more split sets attain the maximum of the minimum pairwise statistics, choose the split set with the largest overall statistic.
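Algorithm 1 is, at heart, an exhaustive search. The sketch below illustrates it in Python under stated assumptions: it is not the kaps implementation, it uses a hand-rolled two-sample log-rank chi-square as the pairwise statistic, and `logrank_chisq` and `best_split_set` are names of our own invention:

```python
import numpy as np
from itertools import combinations

def logrank_chisq(y1, d1, y2, d2):
    """Two-sample log-rank chi-square statistic (1 df)."""
    y = np.concatenate([y1, y2])
    d = np.concatenate([d1, d2])
    g = np.concatenate([np.zeros(len(y1)), np.ones(len(y2))])
    O1 = E1 = V = 0.0
    for t in np.unique(y[d == 1]):         # distinct event times
        at_risk = y >= t
        n_t = at_risk.sum()
        n1_t = (at_risk & (g == 0)).sum()
        d_t = ((y == t) & (d == 1)).sum()
        d1_t = ((y == t) & (d == 1) & (g == 0)).sum()
        O1 += d1_t                          # observed events in group 1
        E1 += d_t * n1_t / n_t              # expected events under H0
        if n_t > 1:                         # hypergeometric variance term
            V += d_t * (n1_t / n_t) * (1 - n1_t / n_t) * (n_t - d_t) / (n_t - 1)
    return (O1 - E1) ** 2 / V if V > 0 else 0.0

def best_split_set(y, d, x, K):
    """Steps 1-4: maximise the minimum pairwise statistic over split sets."""
    best, best_stat = None, -np.inf
    for s in combinations(np.unique(x)[:-1], K - 1):   # candidate cutpoint sets
        bins = np.digitize(x, s, right=True)           # subgroup labels 0..K-1
        if len(np.unique(bins)) < K:
            continue                                    # some subgroup is empty
        t1 = min(logrank_chisq(y[bins == a], d[bins == a],
                               y[bins == b], d[bins == b])
                 for a, b in combinations(range(K), 2))
        if t1 > best_stat:
            best, best_stat = s, t1
    return best, best_stat
```

Because every (K − 1)-subset of candidate cutpoints is evaluated, the cost grows combinatorially in the number of distinct covariate values; the sketch is meant only to make the maximin criterion concrete.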
  • 16. Finding optimal subgroups • One of the important issues is to determine a reasonable number of subgroups, i.e. the selection of an optimal K. • We find an optimal multi-way split at a time for the given number of subgroups. • We then need to choose one among the possible numbers of subgroups. • For a data-driven, objective choice, we suggest a statistical procedure to choose an optimal number of subgroups.
  • 17. Finding optimal subgroups. The permutation p-value based on the minimum pairwise statistics of the permuted data: pK = Σ_{r=1}^{R} I(T1^(r)(s*K) ≥ T*1(s*K)) / R, K = 2, 3, . . . , where T1^(r)(s*K) is the minimum pairwise statistic for the rth permuted data set. • The data can be reconstructed by matching their labels after permuting the labels of X while retaining the labels of (Y, δ). • When the permuted data are allocated into each subgroup by s*K, there should be no significant differences in survival among the subgroups.
  • 18. Finding optimal subgroups. We choose the largest number of subgroups, to discover as many significantly different subgroups as possible, such that the corrected p-values are smaller than or equal to a pre-determined significance level: K̂ = max{K | pc_K ≤ α, K = 2, 3, . . .}, where pc_K is the p-value corrected for multiple comparisons, since there are (K − 1) comparisons between adjacent subgroups.
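The selection rule itself is a one-liner once the permutation p-values are in hand. A numerical sketch in Python, using hypothetical p-values (the numbers below are illustrative inputs, not results from the paper):

```python
alpha = 0.05
# hypothetical permutation p-values p_K (illustrative numbers only)
p = {2: 0.0001, 3: 0.0073, 4: 0.1686}

# corrected p-values pc_K = p_K / (K - 1), as defined above
pc = {K: pK / (K - 1) for K, pK in p.items()}

# K-hat: the largest K with pc_K <= alpha
K_hat = max(K for K, v in pc.items() if v <= alpha)
print(K_hat)  # -> 3
```

Here K = 2 and K = 3 pass the threshold but K = 4 does not, so the rule settles on three subgroups.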
  • 19. Finding optimal subgroups - Illustration [Figure: Kaplan–Meier curves for (a) K = 2 with TS_K = 26.37, TS_1 = 26.37, p_K < 0.0001, p_1 < 0.0001; (b) K = 3 with TS_K = 36.80, TS_1 = 7.2, p_K < 0.0001, p_1 = 0.0073; (c) K = 4 with TS_K = 38.00, TS_1 = 1.895, p_K < 0.0001, p_1 = 0.1686.]
  • 20. Finding optimal subgroups - Illustration [Figure: survival months by number of metastases, the corresponding Kaplan–Meier curves, and the overall and minimum pairwise p-values across K = 2, 3, 4.]
  • 21. Algorithm 2. Selecting the optimal number of subgroups. Step 1: Find s*K and T1(s*K) with the raw data for each K using Algorithm 1. Step 2: Construct the permuted data by permuting the labels of X whilst retaining the labels of (Y, δ). Step 3: Allocate the permuted data into each subgroup by s*K. Step 4: Obtain the minimum pairwise statistic T1^(r)(s*K) for the permuted data. Step 5: Repeat Steps 2 to 4 R times to obtain T1^(1)(s*K), T1^(2)(s*K), . . . , T1^(R)(s*K). Step 6: Compute the permutation p-value pK for each K, i.e., pK = Σ_{r=1}^{R} I(T1^(r)(s*K) ≥ T1(s*K)) / R, K = 2, 3, . . . . Step 7: Adjust pK for multiple comparisons, e.g., corrected p-value pc_K = pK / (K − 1), K = 2, 3, . . . . Step 8: Select the largest K whose corrected p-value is less than or equal to α, i.e., K̂ = max{K | pc_K ≤ α, K = 2, 3, . . .}.
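Steps 2 to 6 of the permutation scheme can be sketched generically. In this Python sketch, `stat_fn` stands in for the minimum pairwise statistic of Algorithm 1 and is an assumption of the sketch, not an interface of the kaps package:

```python
import numpy as np

def permutation_pvalue(y, d, x, split, stat_fn, R=200, seed=0):
    """Permutation p-value p_K: permute X, keep (Y, delta) fixed (Steps 2-6).

    stat_fn(y, d, x, split) -> minimum pairwise statistic for the split set.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(y, d, x, split)          # statistic on the raw data
    exceed = 0
    for _ in range(R):
        x_perm = rng.permutation(x)             # Step 2: permute labels of X
        if stat_fn(y, d, x_perm, split) >= observed:   # Step 4 statistic
            exceed += 1                         # Step 6 numerator
    return exceed / R
```

Any pairwise-separation statistic can be plugged in for `stat_fn`; under the null the permuted covariate carries no survival information, so large observed statistics become rare events.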
  • 22. Simulation. Simulation study: kaps vs. binary split (rpart and ctree). Design: • survival time Ti ∼ Exp(λi) • censoring time Ci ∼ Exp(c0) • response Yi = min(Ti, Ci) and δi = I(Ti ≤ Ci), where i = 1, 2, . . . , n • a prognostic factor Xi ∼ DU(1, 20) • KAPS vs. survival CART [9] and CTREE [5] • training and test samples: 200 observations each, with censoring rates of 15% or 30% • repeated 100 times independently
  • 23. Setting 1 - stepwise model. We consider the following stepwise model (SM) defining the parameter λi: λi = 0.02 if Xi ≤ 7; 0.04 if 7 < Xi ≤ 14; 0.08 if Xi > 14. (3) This model has three different hazard rates, distinguished by the two cutpoints 7 and 14.
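The SM design can be simulated directly; a Python sketch, where the censoring-rate constant c0 below is our own rough guess rather than the value used in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.integers(1, 21, n)                 # X_i ~ DU(1, 20)

# stepwise hazard (3): 0.02 / 0.04 / 0.08, split at cutpoints 7 and 14
lam = np.select([x <= 7, x <= 14], [0.02, 0.04], default=0.08)

t = rng.exponential(1.0 / lam)             # T_i ~ Exp(lambda_i), scale = 1/rate
c = rng.exponential(1.0 / 0.008, n)        # C_i ~ Exp(c0); c0 = 0.008 is a guess
y = np.minimum(t, c)                       # observed response Y_i
delta = (t <= c).astype(int)               # censoring indicator
```

Tuning c0 up or down moves the realised censoring rate toward the 15% or 30% targets in the design.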
  • 24. Setting 2 - linear model. In addition, we consider the following linear model (LM) defining the parameter λi: λi = 0.1 Xi. (4) In this model, λi depends on Xi linearly; it follows that Yi depends on Xi nonlinearly. This model has many different hazard rates.
  • 25. Results of the stepwise model (SM). Selection of cut-off points when K is known. [Figure: distributions of the selected cutpoints C1 and C2 for kaps, ctree, and rpart.]
  • 26. Results of the stepwise model (SM). Selection of the optimal number of subgroups (K = 3). [Figure: proportions of the selected number of subgroups (1–4) for rpart, ctree, and kaps under (a) CR = 15% and (b) CR = 30%.]
  • 27. Results of the linear model (LM). Selection of cut-off points when K is unknown.
Model  K  Method   CR = 15%                      CR = 30%
                   Overall        Pairwise       Overall        Pairwise
LM     2  rpart    43.57 (1.17)   43.57 (1.17)   38.73 (1.03)   38.73 (1.03)
          ctree    38.94 (1.16)   38.94 (1.16)   36.29 (1.01)   36.29 (1.01)
          kaps     44.52 (1.15)   44.52 (1.15)   38.74 (1.11)   38.74 (1.11)
       3  rpart    48.07 (1.47)    6.87 (0.50)   43.21 (1.11)    6.32 (0.44)
          ctree    55.64 (1.39)   12.94 (1.14)   47.99 (1.40)    8.33 (0.64)
          kaps     54.96 (1.39)   13.83 (0.63)   47.95 (1.36)   11.33 (0.59)
       4  rpart    59.60 (1.59)    2.59 (0.26)   53.01 (1.33)    2.30 (0.22)
          ctree    59.82 (1.35)    3.17 (0.37)   52.05 (1.34)    2.10 (0.21)
          kaps     61.27 (1.39)    3.22 (0.24)   53.48 (1.34)    2.70 (0.23)
  • 28. Results of the linear model (LM). Selection of the optimal number of subgroups (K = ?). [Figure: proportions of the selected number of subgroups (2–5) for rpart, ctree, and kaps under (a) CR = 15% and (b) CR = 30%.]
  • 29. Surveillance, Epidemiology, and End Results Program
  • 30. Surveillance, Epidemiology, and End Results Program • Population-based database by NCI and CDC • Collects data on cancer cases from various locations and sources throughout the US • Data collection began in 1973 with a limited number of registries and continues to expand to cover more areas and demographics today
  • 31. Colon cancer in SEER database
  • 32. Colon cancer analysis. Table: Numbers of metastatic lymph nodes defining the staging systems by AJCC, rpart, ctree, and kaps.
Subgroup                 AJCC     rpart    ctree    kaps
Subgroup 1               0        0        0        0
Subgroup 2               1        1,2,3    1        1
Subgroup 3               2,3      4–10     2,3      2,3
Subgroup 4               4,5,6    ≥11      4,5      4,5,6
Subgroup 5               ≥7       —        6,7,8    7,8,9,10
Subgroup 6               —        —        ≥9       ≥11
Min. pairwise statistic  131.23   932.30   78.35    131.23
Corresponding pair       (2, 3)   (3, 4)   (3, 4)   (2, 3)
  • 33. Colon cancer analysis. [Figure: Kaplan–Meier curves by subgroup for (a) kaps (six subgroups: no LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥11 LNs) and (b) AJCC (five subgroups: no LN; 1 LN; 2–3 LNs; 4–6 LNs; ≥7 LNs).]
  • 34. Colon cancer analysis. [Figure: Kaplan–Meier curves by subgroup for (a) kaps (six subgroups: no LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥11 LNs) and (b) rpart (four subgroups: no LN; 1–3 LNs; 4–10 LNs; ≥11 LNs).]
  • 35. Colon cancer analysis. [Figure: Kaplan–Meier curves by subgroup for (a) kaps (six subgroups: no LN; 1 LN; 2–3 LNs; 4–6 LNs; 7–10 LNs; ≥11 LNs) and (b) ctree (six subgroups: no LN; 1 LN; 2–3 LNs; 4–5 LNs; 6–8 LNs; ≥9 LNs).]
  • 36. Conclusion. We propose a multi-way partitioning algorithm for survival data: • Divide the data into K heterogeneous subgroups based on the information of a prognostic factor. • Generate only fairly well-separated subgroups rather than a mixture of extremely poorly and well-separated subgroups. • Identify a multi-way partition which maximizes the minimum of the pairwise test statistics among subgroups. • Utilize a permutation test to select the optimal number of subgroups when the partition consists of two or more cutpoints. • Implement the proposed algorithm in the add-on R package kaps, available on CRAN.
  • 37. Conclusion • HJ Kang*, S-H Eo*, SC Kim, KM Park, YJ Lee, SK Lee, E Yu, H Cho, and S-M Hong (2014). Increased Number of Metastatic Lymph Nodes in Adenocarcinoma of the Ampulla of Vater as a Prognostic Factor: A Proposal of New Nodal Classification, Surgery, 155:1, 74-84. (*: co-first author) • S-H Eo, S-M Hong, and H Cho (2014). K-adaptive partitioning for survival data: the kaps add-on package for R, arXiv, 1306.4015v2. • S-H Eo, S-M Hong, and H Cho. K-adaptive partitioning for survival data, submitted to Statistics in Medicine. • Park et al. Survival effect of tumor size and extrapancreatic extension in surgically resected pancreas cancer patients: Proposal for improved T classification, submitted to Surgery.
  • 38. References. [1] Cécile Contal and John O’Quigley. An application of changepoint methods in studying the effect of age on survival in breast cancer. Computational Statistics and Data Analysis, 30:253–270, 1999. [2] S.B. Edge, D.R. Byrd, C.C. Compton, A.G. Fritz, F.L. Greene, and A. Trotti III. AJCC Cancer Staging Manual. Springer, New York, 2010. [3] Susan Galloway Hilsenbeck and Gary M Clark. Practical p-value adjustment for optimally selected cutpoints. Statistics in Medicine, 15(1):103–112, 1996. [4] SM Hong, H Cho, C. Moskaluk, and E Yu. Measurement of the invasion depth of extrahepatic bile duct carcinoma: an alternative method overcoming the current T classification problems of the AJCC staging system. American Journal of Surgical Pathology, 31:199–206, 2007. [5] Torsten Hothorn, Kurt Hornik, and Achim Zeileis. Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 2006. [6] Torsten Hothorn and Berthold Lausen. On the exact distribution of maximally selected rank statistics. Computational Statistics and Data Analysis, 43:121–137, 2003. [7] Hemant Ishwaran, Eugene H Blackstone, Carolyn Apperson-Hansen, and Thomas W Rice. A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10(4):603–620, 2009.
  • 39. [8] Berthold Lausen, Willi Sauerbrei, and Martin Schumacher. Classification and regression trees (CART) used for the exploration of prognostic factors measured on different scales. In P Dirschedl and R Ostermann, editors, Computational Statistics. Physica Verlag, Heidelberg, 1994. [9] Michael LeBlanc and John Crowley. Relative risk trees for censored survival data. Biometrics, pages 411–425, 1992.