K-adaptive partitioning for survival data

K-Adaptive Partitioning for Survival Data
with an Application to SEER
Soo-Heang Eo
with Sungwan Bang, Seung-Mo Hong, HyungJun Cho
Department of Statistics
Korea University
April 25, 2014

Table of Contents
Introduction
Motivation example
Previous studies
Proposed method
Finding the best split set
Finding optimal number of subgroups
Simulation studies
Simulation design
Simulation results
Real example
Conclusion
References

Cancer staging
A cancer staging system describes the extent of cancer in a patient's body. [2]

TNM staging system
The TNM Classification of Malignant Tumors (TNM) is a cancer staging system that describes the extent of cancer in a patient's body. [2]
• T describes the size of the tumor and whether it has invaded nearby tissue.
• N describes the regional lymph nodes that are involved.
• M describes distant metastasis (spread of cancer from one body part to another).

Previous studies
• Hilsenbeck and Clark (1996) conducted simulation studies to examine the effects of the number of cutpoints and the true prognostic effect size of the marker [3].
• Contal and O'Quigley (1999) derived the asymptotic distribution of a re-scaled rank statistic [1].
• Lausen et al. (1994) and Hong et al. (2007) employed decision tree methodology [8, 4].
• Hothorn and Lausen (2003) used maximally selected rank statistics [6] to find the best cutpoints, an approach later extended to CTree [5].
• Ishwaran et al. (2009) adopted the idea of random survival forests [7].

Problem is ...
[Figure: binary survival tree splitting on the number of metastatic lymph nodes (meta), with split thresholds <=5, <=1, <=0, <=3, <=2, and <=11. Root node (Node 1): n = 65186, med = 77. Terminal nodes (n, med): Node 8 (36086, 115), Node 9 (7331, 80), Node 20 (4693, 57), Node 21 (3288, 47), Node 11 (4401, 38), Node 6 (6183, 23), Node 7 (3204, 13).]
[Figure: survival curves by terminal node; x-axis: survival months (0 to 192), y-axis: survival probability. Legend entries include Node 8 (# of Meta = 0), Node 11 (# of Meta = 4, 5), Node 6 (# of Meta = 6, 7, ..., 11), and Node 7 (# of Meta = 12, ..., 80).]

Problem is ...
[Figure, two panels: (a) TSK = 148, p < 0.0001; (b) TSK = 132, p < 0.0001.]

Our aim is to
• Divide the whole data D into K heterogeneous subgroups
D1, . . . , DK based on the information of X .
• Overcome the limitation that some subgroups differ substantially in survival while others differ only barely or insignificantly.
• Evaluate multi-way split points simultaneously and find an optimal set of cutpoints.
• Implement the proposed algorithm in an R package, kaps.

Let Ti be a survival time, Ci a censoring time, and Xi an ordered covariate for the ith observation. We observe the triples (Yi, δi, Xi) and define
Yi = min(Ti, Ci) and δi = I(Ti ≤ Ci),
which represent the observed response variable and the censoring indicator, respectively.
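
As a minimal sketch of these definitions, the observed pair for one subject can be computed as follows. The function name `observe` and the three example subjects are hypothetical and only serve as an illustration.

```python
def observe(t, c):
    """Return (y, delta): the observed time and the event indicator for one subject."""
    y = min(t, c)                 # Y = min(T, C)
    delta = 1 if t <= c else 0    # delta = I(T <= C): 1 = event observed, 0 = censored
    return y, delta

# Hypothetical survival and censoring times for three subjects.
survival = [5.0, 12.0, 7.5]
censoring = [8.0, 10.0, 7.5]
observed = [observe(t, c) for t, c in zip(survival, censoring)]
print(observed)  # [(5.0, 1), (10.0, 0), (7.5, 1)]
```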

Let $\chi^2_1$ be the $\chi^2$ statistic with one degree of freedom (df) for comparing the $g$th and $h$th of $K$ subgroups created by a split set $s_K$ when $K$ is given. For a split set $s_K$ of $D$ into $D_1, D_2, \ldots, D_K$, the test statistic for a measure of deviance can be defined as
$$T_1(s_K) = \min_{1 \le g < h \le K} \chi^2_1 \quad \text{for } s_K \in S_K, \tag{1}$$
where $S_K$ is a collection of split sets $s_K$ generating $K$ disjoint subgroups.

Then, take $s^*_K$ as the best split set such that
$$T^*_1(s^*_K) = \max_{s_K \in S_K} T_1(s_K). \tag{2}$$
The best split $s^*_K$ is a set of $(K - 1)$ cutpoints which clearly separate the data $D$ into $K$ disjoint subsets of the data: $D_1, D_2, \ldots, D_K$.

Finding the best split set - Illustration
• Suppose the patients are divided into 3 heterogeneous groups by the number of lymph nodes (X = {0, 1, . . . , 6}).
• The groups are labeled D1, D2, and D3.
• There are 3 pairs of groups (D1 vs. D2, D2 vs. D3, and D1 vs. D3).
• Suppose there are 3 candidate split sets, each composed of two cutpoints: {0, 1}, {0, 2}, and {1, 2}.
• For example, the candidate {0, 1} means that D1 = {X = 0}, D2 = {0 < X ≤ 1}, and D3 = {X > 1}.
• For each candidate, the smallest of the 3 pairwise test statistics is taken as the representative statistic for that candidate.
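
The mapping from a candidate set of cutpoints to the groups D1, D2, and D3 in this illustration can be sketched as below. The helper name `partition` is hypothetical; the candidate list is restricted to the three candidates named in the illustration.

```python
from itertools import combinations

def partition(values, cutpoints):
    """Split ordered values into len(cutpoints) + 1 groups: x <= c1, c1 < x <= c2, ..."""
    groups = [[] for _ in range(len(cutpoints) + 1)]
    for x in values:
        index = sum(x > c for c in cutpoints)  # number of cutpoints strictly below x
        groups[index].append(x)
    return groups

X = range(7)  # number of lymph nodes: 0, 1, ..., 6
candidates = list(combinations([0, 1, 2], 2))  # {0, 1}, {0, 2}, {1, 2}
for cuts in candidates:
    print(cuts, partition(X, cuts))
```

For the candidate {0, 1} this gives D1 = [0], D2 = [1], and D3 = [2, 3, 4, 5, 6], matching the example in the list above.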

Algorithm 1. Finding the best split set for given K
Step 1: Compute the chi-squared test statistics $\chi^2_1$ for all possible pairs, $g$ and $h$, of the $K$ subgroups generated by $s_K$, where $1 \le g < h \le K$ and $s_K$ is a split set of $(K - 1)$ cutpoints generating $K$ disjoint subgroups.
Step 2: Obtain the minimum pairwise statistic $T_1(s_K)$ by minimizing $\chi^2_1$ over all possible pairs, i.e., $T_1(s_K) = \min_{1 \le g < h \le K} \chi^2_1$ for $s_K \in S_K$, where $S_K$ is a collection of split sets $s_K$ generating $K$ disjoint subsets of the data.
Step 3: Repeat Steps 1 and 2 for all possible split sets in $S_K$.
Step 4: Take the best split set $s^*_K$ such that $T^*_1(s^*_K) = \max_{s_K \in S_K} T_1(s_K)$. When two or more split sets attain the maximum $T^*_1$ of the minimum pairwise statistics, choose the split set with the largest overall statistic $T^*_{K-1}$.
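
The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the kaps implementation: it assumes the pairwise chi-squared statistic is the two-sample log-rank statistic, the function names are hypothetical, and the tie-breaking rule of Step 4 is omitted (ties are resolved arbitrarily by tuple ordering).

```python
from bisect import bisect_left
from itertools import combinations

def logrank_chisq(times, events, in_first):
    """Two-sample log-rank chi-squared statistic (1 df); an assumed choice of test."""
    diff = var = 0.0
    for t in sorted({y for y, e in zip(times, events) if e}):
        at_risk = [g for y, g in zip(times, in_first) if y >= t]
        n, n1 = len(at_risk), sum(at_risk)
        died = [g for y, e, g in zip(times, events, in_first) if y == t and e]
        d, d1 = len(died), sum(died)
        diff += d1 - d * n1 / n                      # observed minus expected events
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return diff * diff / var if var > 0 else 0.0

def min_pairwise_stat(y, delta, x, cuts):
    """Steps 1-2: T1(s_K), the smallest pairwise statistic for the split set `cuts`."""
    group = [bisect_left(cuts, xi) for xi in x]      # x <= c1 -> 0, c1 < x <= c2 -> 1, ...
    stats = []
    for g, h in combinations(range(len(cuts) + 1), 2):
        rows = [i for i, gi in enumerate(group) if gi in (g, h)]
        stats.append(logrank_chisq([y[i] for i in rows],
                                   [delta[i] for i in rows],
                                   [group[i] == g for i in rows]))
    return min(stats)

def best_split(y, delta, x, k):
    """Steps 3-4: the split set maximizing the minimum pairwise statistic."""
    levels = sorted(set(x))[:-1]                     # candidate cutpoints
    stat, cuts = max((min_pairwise_stat(y, delta, x, c), c)
                     for c in combinations(levels, k - 1))
    return cuts, stat

# Hypothetical toy data: x = 1 survives longest, x = 2 is intermediate,
# and x = 3 and x = 4 have interleaved short times. All events are observed.
y = [20, 21, 22, 10, 11, 12, 1, 3, 5, 2, 4, 6]
x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
delta = [1] * 12
print(best_split(y, delta, x, 3))  # cutpoints (1, 2), statistic about 5.05
```

On this toy data the candidates (1, 3) and (2, 3) each contain a poorly separated pair of groups, so their minimum pairwise statistics are small and the split (1, 2) is selected.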

Finding optimal subgroups
• An important issue is to determine a reasonable number of subgroups, i.e., to select an optimal K.
• For a given number of subgroups, we find an optimal multi-way split in a single step.
• We then need to choose one among the possible numbers of subgroups.
• For an objective, data-driven choice, we suggest a statistical procedure to choose an optimal number of subgroups.

The minimum pairwise statistics of the permuted data
The permutation p-value for each K is
$$p_K = \sum_{r=1}^{R} I\left(T_1^{(r)}(s^*_K) \ge T^*_1(s^*_K)\right) / R, \quad K = 2, 3, \ldots,$$
where $T_1^{(r)}(s^*_K)$ is the $r$th repeated minimum pairwise statistic for the permuted data.
• The permuted data are constructed by permuting the labels of X while retaining the labels of (Y, δ), and then re-matching them.
• When the permuted data are allocated into each subgroup by $s^*_K$, there should be no significant differences in survival among the subgroups.
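
A minimal sketch of this permutation p-value follows. The function `permutation_p_value` is hypothetical and takes the statistic as a callable `stat_fn(y, delta, x)`, which in practice would be the minimum pairwise statistic under the fixed split set. To keep the example short, the demonstration uses a simple stand-in statistic (the gap between group means of the observed times, ignoring censoring), which is an assumption made only for illustration.

```python
import random

def permutation_p_value(stat_fn, y, delta, x, n_perm=1000, seed=0):
    """p_K = (1/R) * sum_r I(T^(r) >= T), permuting X while keeping (Y, delta) fixed."""
    rng = random.Random(seed)
    observed = stat_fn(y, delta, x)      # statistic on the raw data
    x_perm = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)              # permute the labels of X only
        if stat_fn(y, delta, x_perm) >= observed:
            hits += 1
    return hits / n_perm

def mean_gap(y, delta, x):
    """Stand-in statistic (NOT a survival test): gap between the two group means."""
    low = [yi for yi, xi in zip(y, x) if xi == 0]
    high = [yi for yi, xi in zip(y, x) if xi == 1]
    return abs(sum(low) / len(low) - sum(high) / len(high))

# Hypothetical data in which X separates the observed times perfectly.
y = list(range(1, 21))
delta = [1] * 20
x = [0] * 10 + [1] * 10
print(permutation_p_value(mean_gap, y, delta, x, n_perm=500))
```

Because only X is shuffled, the pairing of Y and δ is preserved in every permuted data set, as the first bullet above requires.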

We choose the largest number of subgroups in order to discover as many significantly different subgroups as possible, given that the corrected p-values are smaller than or equal to a pre-determined significance level $\alpha$:
$$\hat{K} = \max\{K \mid p^c_K \le \alpha, \; K = 2, 3, \ldots\},$$
where $p^c_K$ is the p-value corrected for multiple comparisons, because there are $(K - 1)$ comparisons between two adjacent subgroups.
Finding optimal subgroups - Illustration
[Figure, three panels: survival curves (survival probability against survival months, 0 to 192) for the subgroups G1, ..., GK at each K.]
(a) K = 2: TSK = 26.37, TS1 = 26.37, pK < 0.0001, p1 < 0.0001
(b) K = 3: TSK = 36.80, TS1 = 7.2, pK < 0.0001, p1 = 0.0073
(c) K = 4: TSK = 38.00, TS1 = 1.895, pK < 0.0001, p1 = 0.1686

Finding optimal subgroups - Illustration
[Figure, four panels: survival months against meta; survival probability against survival months; p-values of Xk against K (K = 2, 3, 4); p-values of X1 against K (K = 2, 3, 4).]

Algorithm 2. Selecting the optimal number of subgroups
Step 1: Find $s^*_K$ and $T_1(s^*_K)$ with the raw data for each $K$ using Algorithm 1.
Step 2: Construct the permuted data by permuting the labels of $X$ whilst retaining the labels of $(Y, \delta)$.
Step 3: Allocate the permuted data into each subgroup by $s^*_K$.
Step 4: Obtain the minimum pairwise statistic $T_1^{(r)}(s^*_K)$ for the permuted data.
Step 5: Repeat Steps 2 to 4 $R$ times, and then obtain $T_1^{(1)}(s^*_K), T_1^{(2)}(s^*_K), \ldots, T_1^{(R)}(s^*_K)$.
Step 6: Compute the permutation p-value $p_K$ for each $K$, i.e., $p_K = \sum_{r=1}^{R} I\left(T_1^{(r)}(s^*_K) \ge T_1(s^*_K)\right) / R$, $K = 2, 3, \ldots$.
Step 7: Correct the permutation p-value $p_K$ for multiple comparisons, e.g., by the Bonferroni correction $p^c_K = \min\{1, (K - 1)\, p_K\}$, $K = 2, 3, \ldots$.
Step 8: Select the largest $K$ for which the corrected p-value is less than or equal to $\alpha$, i.e., $\hat{K} = \max\{K \mid p^c_K \le \alpha, \; K = 2, 3, \ldots\}$.
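
The last two steps (correction and selection) can be sketched as below. The function names are hypothetical. The default correction is the standard Bonferroni adjustment, min(1, (K − 1) p), which is an assumption of this sketch; it is passed in as a parameter so that another adjustment can be substituted. The input p-values are the p1 values from the illustration panels, used here only as stand-in inputs, with "< 0.0001" entered as 0.0001.

```python
def bonferroni(p, k):
    """Bonferroni adjustment for the (k - 1) comparisons between adjacent subgroups."""
    return min(1.0, (k - 1) * p)

def select_k(p_values, alpha=0.05, correct=bonferroni):
    """Return the largest K whose corrected p-value is <= alpha, or None if there is none."""
    chosen = None
    for k in sorted(p_values):
        if correct(p_values[k], k) <= alpha:
            chosen = k
    return chosen

# Stand-in p-values keyed by K, taken from the illustration panels.
p_values = {2: 0.0001, 3: 0.0073, 4: 0.1686}
print(select_k(p_values))  # 3
```

With these inputs the corrected values are 0.0001, 0.0146, and 0.5058, so K = 4 is rejected at α = 0.05 and K = 3 is selected.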

Simulation
Simulation study: kaps vs. binary split (rpart and ctree)
Design:
• survival time Ti ∼ exp(λi )
• censoring time Ci ∼ exp(c0)
• response Yi = min(Ti , Ci ) and δi = I (Ti ≤ Ci ), where i = 1, 2, . . . , n
• a prognostic factor Xi ∼ DU (1, 20)
• KAPS vs. Survival CART [9] and CTREE [5]
• training and test samples: 200 observations each with censoring rates of
15% or 30%
• repeated 100 times independently
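
One replicate of this design can be generated along the following lines. The function `simulate` is a hypothetical sketch: it assumes that exp(λ) denotes the exponential distribution with rate λ, and the values of the rate function and of c0 in the demonstration are placeholders, since the slides do not state the c0 used to reach the 15% or 30% censoring rates.

```python
import random

def simulate(n, rate_fn, c0, seed=0):
    """Draw n triples (y, delta, x) following the simulation design."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(1, 20)            # X ~ DU(1, 20), both endpoints included
        t = rng.expovariate(rate_fn(x))   # survival time T ~ exp(lambda_i)
        c = rng.expovariate(c0)           # censoring time C ~ exp(c0)
        data.append((min(t, c), int(t <= c), x))
    return data

# Placeholder constant rate and censoring parameter, for illustration only.
train = simulate(200, lambda x: 0.05, c0=0.01, seed=1)
test = simulate(200, lambda x: 0.05, c0=0.01, seed=2)
censoring_rate = 1 - sum(delta for _, delta, _ in train) / len(train)
print(len(train), len(test), round(censoring_rate, 3))
```

For a constant rate λ, the probability that an observation is censored is c0 / (λ + c0), which gives one way to reason about the c0 needed for a target censoring rate.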

Setting 1 - stepwise model
We consider the following stepwise model (SM), which defines the parameter $\lambda_i$ as
$$\lambda_i = \begin{cases} 0.02, & X_i \le 7, \\ 0.04, & 7 < X_i \le 14, \\ 0.08, & 14 < X_i. \end{cases} \tag{3}$$
This model has three different hazard rates that are distinguished by two cutpoints, 7 and 14.
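
The stepwise rate can be written as a small function. The name `lambda_sm` is hypothetical, and the mean survival time 1/λ shown alongside assumes that λ is the rate of the exponential distribution.

```python
def lambda_sm(x):
    """Hazard rate of the stepwise model (SM) for covariate value x."""
    if x <= 7:
        return 0.02
    if x <= 14:
        return 0.04
    return 0.08

# Rates and implied mean survival times on either side of the two cutpoints.
for x in (7, 8, 14, 15):
    rate = lambda_sm(x)
    print(x, rate, round(1 / rate, 1))
```

Each step doubles the hazard, so the implied mean survival time halves from 50 to 25 to 12.5 across the three groups.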

Setting 2 - linear model
In addition, we consider the following linear model (LM), which defines the parameter $\lambda_i$ as
$$\lambda_i = 0.1 X_i. \tag{4}$$
In this model, $\lambda_i$ depends on $X_i$ linearly.
It follows that $Y_i$ depends on $X_i$ nonlinearly.
This model has a number of different hazard rates.
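
As a short worked step, and assuming exp(λ_i) denotes the exponential distribution with rate λ_i (consistent with calling λ_i a hazard rate), the nonlinearity can be made explicit through the survival function, mean, and median of T_i:

```latex
S(t \mid X_i) = \exp(-\lambda_i t) = \exp(-0.1\, X_i\, t), \qquad
E[T_i \mid X_i] = \frac{1}{\lambda_i} = \frac{10}{X_i}, \qquad
\operatorname{median}(T_i \mid X_i) = \frac{\ln 2}{\lambda_i} = \frac{10 \ln 2}{X_i}.
```

For instance, the mean survival time falls from 10 at X_i = 1 to 0.5 at X_i = 20, so equal steps in X_i produce unequal changes in survival time.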

Results of the stepwise model (SM)
Selection of cut-off points when K is known
[Figure, top row: proportion (%) of selected cutpoints (0 to 20) for kaps, ctree, and rpart. Bottom row: selected cutpoint pairs (C1 against C2, each 0 to 20) for kaps, ctree, and rpart.]

Results of the stepwise model (SM)
Selection of the optimal number of subgroups (K = 3)
[Figure: proportion (%) of the selected number of subgroups (1 to 4) for rpart, ctree, and kaps. (a) SM model with CR = 15%; (b) SM model with CR = 30%.]

Results of the linear model (LM)
Selection of cut-off points when K is unknown

Model  K  Method  CR = 15%                      CR = 30%
                  Overall       Pairwise        Overall       Pairwise
LM     2  rpart   43.57 (1.17)  43.57 (1.17)    38.73 (1.03)  38.73 (1.03)
LM     2  ctree   38.94 (1.16)  38.94 (1.16)    36.29 (1.01)  36.29 (1.01)
LM     2  kaps    44.52 (1.15)  44.52 (1.15)    38.74 (1.11)  38.74 (1.11)
LM     3  rpart   48.07 (1.47)   6.87 (0.50)    43.21 (1.11)   6.32 (0.44)
LM     3  ctree   55.64 (1.39)  12.94 (1.14)    47.99 (1.40)   8.33 (0.64)
LM     3  kaps    54.96 (1.39)  13.83 (0.63)    47.95 (1.36)  11.33 (0.59)
LM     4  rpart   59.60 (1.59)   2.59 (0.26)    53.01 (1.33)   2.30 (0.22)
LM     4  ctree   59.82 (1.35)   3.17 (0.37)    52.05 (1.34)   2.10 (0.21)
LM     4  kaps    61.27 (1.39)   3.22 (0.24)    53.48 (1.34)   2.70 (0.23)

Results of the linear model (LM)
Selection of the optimal number of subgroups (K =?)
[Figure: proportion (%) of the selected number of subgroups (2 to 5) for rpart, ctree, and kaps. (a) LM model with CR = 15%; (b) LM model with CR = 30%.]

Surveillance, Epidemiology, and End Results Program

• Population-based database by NCI and CDC
• Collects data on cancer cases from various locations and
sources throughout the US
• Data collection began in 1973 with a limited number of registries and continues to expand to include more areas and demographics today

Colon cancer in SEER database

Colon cancer analysis
Table: Numbers of metastatic lymph nodes for the staging systems by AJCC, kaps, ctree, and rpart

                          AJCC     rpart    ctree    kaps
Subgroup 1                0        0        0        0
Subgroup 2                1        1,2,3    1        1
Subgroup 3                2,3      4 ∼ 10   2,3      2,3
Subgroup 4                4,5,6    ≥ 11     4,5      4,5,6
Subgroup 5                ≥ 7      -        6,7,8    7,8,9,10
Subgroup 6                -        -        ≥ 9      ≥ 11
Min. pairwise statistic   131.23   932.30   78.35    131.23
Corresponding pair        (2, 3)   (3, 4)   (3, 4)   (2, 3)

[Figure: survival curves by subgroup (survival probability against survival months, 0 to 192). (a) kaps; (b) AJCC. Legend entries include Subgroup 1 (No LN), Subgroup 2 (1 LN), Subgroup 3 (2-3 LNs), and Subgroup 6 (>=11 LNs).]

[Figure: survival curves by subgroup (survival probability against survival months, 0 to 192). (a) kaps; (b) rpart.]

[Figure: survival curves by subgroup (survival probability against survival months, 0 to 192). (a) kaps; (b) ctree.]

Conclusion
We propose a multi-way partitioning algorithm for survival data that can:
• Divide the data into K heterogeneous subgroups based on the information of a prognostic factor.
• Generate only fairly well-separated subgroups rather than a mixture of extremely poorly and well-separated subgroups.
• Identify a multi-way partition which maximizes the minimum of the pairwise test statistics among subgroups.
• Utilize a permutation test to select the optimal number of subgroups, with a split set that may consist of two or more cutpoints.
• Be used through an add-on R package, kaps, available on CRAN.

Conclusion
• HJ Kang*, S-H Eo*, SC Kim, KM Park, YJ Lee, SK Lee, E Yu, H Cho,
and S-M Hong (2014). Increased Number of Metastatic Lymph Nodes in
Adenocarcinoma of the Ampulla of Vater as a Prognostic Factor: A
Proposal of New Nodal Classification, Surgery, 155:1, 74-84. (*:co-first
author)
• S-H Eo, S-M Hong, and H Cho (2014). K-adaptive partitioning for
survival data: the kaps add-on package for R, arXiv, 1306.4015v2.
• S-H Eo, S-M Hong, and H Cho. K-adaptive partitioning for survival
data, submitted to Statistics in Medicine.
• Park et al. Survival effect of tumor size and extrapancreatic extension in
surgically resected pancreas cancer patients: Proposal for improved T
classification, submitted to Surgery.

[1] Cécile Contal and John O'Quigley. An application of changepoint methods in studying the effect of age on survival in breast cancer. Computational Statistics and Data Analysis, 30:253–270, 1999.
[2] S.B. Edge, D.R. Byrd, C.C. Compton, A.G. Fritz, F.L. Greene, and A. Trotti III. AJCC Cancer Staging Manual. Springer, New York, 2010.
[3] Susan Galloway Hilsenbeck and Gary M. Clark. Practical p-value adjustment for optimally selected cutpoints. Statistics in Medicine, 15(1):103–112, 1996.
[4] S.M. Hong, H. Cho, C. Moskaluk, and E. Yu. Measurement of the invasion depth of extrahepatic bile duct carcinoma: An alternative method overcoming the current T classification problems of the AJCC staging system. American Journal of Surgical Pathology, 31:199–206, 2007.
[5] Torsten Hothorn, Kurt Hornik, and Achim Zeileis. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 2006.
[6] Torsten Hothorn and Berthold Lausen. On the exact distribution of maximally selected rank statistics. Computational Statistics and Data Analysis, 43:121–137, 2003.
[7] Hemant Ishwaran, Eugene H. Blackstone, Carolyn Apperson-Hansen, and Thomas W. Rice. A novel approach to cancer staging: application to esophageal cancer. Biostatistics, 10(4):603–620, 2009.
[8] Berthold Lausen, Willi Sauerbrei, and Martin Schumacher. Classification and regression trees (CART) used for the exploration of prognostic factors measured on different scales. In P. Dirschedl and R. Ostermann, editors, Computational Statistics. Physica Verlag, Heidelberg, 1994.
[9] Michael LeBlanc and John Crowley. Relative risk trees for censored survival data. Biometrics, pages 411–425, 1992.
