Biostatistics – (SA408/SE408)
Spring 2023
Instructor
Dr. Heba Ahmed Emera
Chapter 6
Survival Analysis (Cont.)
Kaplan-Meier estimator (Cont.)
Kaplan-Meier estimator
• The Kaplan–Meier approach, also called the product-limit approach, is a very popular method that re-estimates the survival probability each time an event occurs.
• The estimator is the product, over the failure times, of the conditional probabilities of surviving to the next failure time.
• Appropriate use of the Kaplan–Meier approach rests on several assumptions: (1) censoring is independent of the probability of developing the event of interest; (2) survival probabilities are comparable in participants recruited earlier and later in the study; and (3) when several groups are compared, these assumptions hold in each comparison group so that, for example, censoring is not more likely in one group than in another.
First, we define the notation used in deriving the Kaplan-Meier estimator.
• Let $t_0 < t_1 < t_2 < \dots < t_k$ denote the ordered observed times (failure times as well as censoring times) in a sample of size $n = n_0$, where $n_0$ is the number of participants at baseline.
• $d_j$: the number of failures at $t_j$.
• $c_j$: the number of cases censored during the interval $[t_j, t_{j+1})$.
• $n_j$: the number of individuals at risk just prior to $t_j$, computed as
$$n_j = n_{j-1} - (d_{j-1} + c_{j-1}).$$
• The probability of surviving the $j$th interval is estimated as
$$\hat{p}_j = \frac{n_j - d_j}{n_j} = 1 - \frac{d_j}{n_j}.$$
• The probability of surviving up to $t_j$ is the product of the probabilities of surviving all intervals up to and including the $j$th interval.
• The survivor function is then estimated by
$$\hat{S}_t = \prod_{j \,:\, t_j \le t} \frac{n_j - d_j}{n_j} = \hat{S}_{t-1} \times \frac{n_t - d_t}{n_t},$$
where $\hat{S}_{t-1}$ denotes the estimate at the preceding failure time.
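As a rough illustration of the recursion above, the following Python sketch computes the product-limit estimate from raw observations. The function name `kaplan_meier` and the five-subject data set are hypothetical (they do not come from the lecture), and the sketch assumes that a subject censored at a failure time is still counted as at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_j, n_j, d_j, S_j)] at each failure time.

    times  : follow-up time of each subject
    events : 1 if the subject failed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)          # each subject leaves the risk set at its own time
    n_at_risk = len(times)            # n_0
    surv = 1.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= (n_at_risk - d) / n_at_risk   # p_j = 1 - d_j / n_j
            rows.append((t, n_at_risk, d, surv))
        n_at_risk -= leaving[t]       # drop failures and censored cases observed at t
    return rows

# Hypothetical toy data: 5 subjects, censored at times 2 and 5.
toy_times = [1, 2, 3, 3, 5]
toy_events = [1, 0, 1, 1, 0]
km_rows = kaplan_meier(toy_times, toy_events)
for t, n, d, s in km_rows:
    print(f"t={t}  n={n}  d={d}  S={s:.4f}")
```

Only failure times produce a row, which mirrors how the Kaplan-Meier table is laid out: censored cases change the number at risk but never the estimate itself.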
Kaplan-Meier estimator - Example-1
Consider a small prospective cohort study designed to study time to death. The study involves participants aged 65 years and older who are followed for up to 24 years. Twenty participants are followed until they die, until the study ends, or until they drop out of the study. The data obtained from the study are presented in the following table (a + marks the year in which a subject was lost to follow-up, i.e. censored).
1. Derive the Kaplan-Meier estimate of the survivor function at the failure times, and compute the standard errors of those estimates.
2. Draw the Kaplan-Meier survivor function curve. Based on the curve, find: (a) the probability that a participant survives past 10 years, (b) the time at which the estimated survival first falls to 75% or below, and (c) the estimate of the median survival time.

Participant ID                 1    2    3    4    5    6    7    8    9    10
Year of death / last contact   24+  3    11+  19+  24   13   14   2+   18   17+

Participant ID                 11   12   13   14   15   16   17   18   19   20
Year of death / last contact   24   21+  12   1    10+  23   6+   5    9+   17
Kaplan-Meier estimator – Example-1
[Figure: event plot of the follow-up of the 20 participants]
Kaplan-Meier estimator – Example-1
To ease construction of the Kaplan-Meier table, the data are first sorted in ascending order, keeping the + that marks the censored cases:
Years to death: 1, 2+, 3, 5, 6+, 9+, 10+, 11+, 12, 13, 14, 17+, 17, 18, 19+, 21+, 23, 24+, 24, 24
(+ indicates a censored case)
This ordering can be obtained easily in R.
When a death and a censoring fall in the same year (years 17 and 24 here), the censored participant is still counted as at risk in that year.
Kaplan-Meier estimator – Example-1
Note that the Kaplan-Meier table shows computations only at the failure times (not at the censoring times), because the survival estimate stays constant at censoring times.

Columns: tj = time; nj = number alive (at risk) just prior to tj; dj = number of deaths (failures) at tj; cj = number censored during the interval [tj, tj+1); pj = (nj - dj)/nj = proportion surviving at tj; Sj = pj * Sj-1 = survival probability estimate up to tj; Sum = cumulative sum of (1-pj)/(nj*pj); SE(Sj) = standard error of Sj.

tj     nj    dj    cj    pj      Sj      (1-pj)/(nj*pj)   Sum      SE(Sj)
0      20    0     0     1.000   1.000
1      20    1     1     0.950   0.950   0.003            0.003    0.0487
3      18    1     0     0.944   0.897   0.003            0.006    0.0689
5      17    1     4     0.941   0.844   0.004            0.010    0.0826
12     12    1     0     0.917   0.774   0.008            0.017    0.1014
13     11    1     0     0.909   0.704   0.009            0.026    0.1140
14     10    1     0     0.900   0.633   0.011            0.037    0.1224
17     9     1     1     0.889   0.563   0.014            0.051    0.1274
18     7     1     2     0.857   0.483   0.024            0.075    0.1322
23     4     1     0     0.750   0.362   0.083            0.158    0.1440
24     3     2     1     0.333   0.121   0.667            0.825    0.1096
Total        11    9

Here $n_j = n_{j-1} - (d_{j-1} + c_{j-1})$ and $\hat{p}_j = 1 - \dfrac{d_j}{n_j}$.
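The first six columns of the table can be checked with a short Python sketch. The data below are the 20 observations of Example-1; the variable names are illustrative only, and the code is a hedged cross-check rather than the R procedure used in the lecture.

```python
# Example-1 data as listed in the slides: (year, died), with died = 0 for the "+" cases.
data = [(24, 0), (3, 1), (11, 0), (19, 0), (24, 1), (13, 1), (14, 1), (2, 0),
        (18, 1), (17, 0), (24, 1), (21, 0), (12, 1), (1, 1), (10, 0), (23, 1),
        (6, 0), (5, 1), (9, 0), (17, 1)]

failure_times = sorted({t for t, died in data if died})

table = []          # rows: (t_j, n_j, d_j, c_j, p_j, S_j)
surv = 1.0
for i, t in enumerate(failure_times):
    nxt = failure_times[i + 1] if i + 1 < len(failure_times) else float("inf")
    n = sum(1 for u, _ in data if u >= t)                        # at risk just prior to t_j
    d = sum(1 for u, died in data if u == t and died)            # deaths at t_j
    c = sum(1 for u, died in data if t <= u < nxt and not died)  # censored in [t_j, t_{j+1})
    p = (n - d) / n
    surv *= p
    table.append((t, n, d, c, round(p, 3), round(surv, 3)))

for row in table:
    print(row)
```

Counting the risk set directly as "everyone with follow-up of at least t_j" gives the same n_j as the recursion n_j = n_{j-1} - (d_{j-1} + c_{j-1}), which is a useful consistency check when building the table by hand.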
Kaplan-Meier estimator – Example-1 (Cont.)
• The K-M plot is drawn as a step function; + marks indicate where censoring occurred.
Kaplan-Meier estimator Standard Error (SE) Estimates:
▪ A popular formula for estimating the standard error of the survival estimates is Greenwood's formula:
$$\widehat{Var}(\hat{S}_t) = \hat{S}_t^{\,2} \sum_{j \,:\, t_j \le t} \frac{1 - \hat{p}_j}{n_j \hat{p}_j}, \qquad \widehat{SE}(\hat{S}_t) = \sqrt{\widehat{Var}(\hat{S}_t)},$$
where the sum is accumulated over all failure times up to and including the time point of interest.
▪ A $(1-\alpha)100\%$ CI for $S_t$ is obtained as $\hat{S}_t \pm Z_{\alpha/2}\, \widehat{SE}(\hat{S}_t)$.
▪ Unfortunately, confidence intervals based on the above variance may extend above one or below zero. A more satisfactory approach is to build the confidence interval on the log-log transformation of the survival estimate (obtained only through R).
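Greenwood's formula can be sketched in a few lines of Python. The counts below are the (tj, nj, dj) values of the Example-1 table; the constant `Z = 1.96` and the truncation of the interval to [0, 1] are assumptions of this sketch, not part of the lecture.

```python
import math

# (t_j, n_j, d_j) at the failure times, taken from the Example-1 K-M table.
counts = [(1, 20, 1), (3, 18, 1), (5, 17, 1), (12, 12, 1), (13, 11, 1),
          (14, 10, 1), (17, 9, 1), (18, 7, 1), (23, 4, 1), (24, 3, 2)]

Z = 1.96   # approximate 97.5th percentile of the standard normal (95% CI)

surv, cum = 1.0, 0.0
greenwood = []      # rows: (t_j, S_j, SE_j, lower, upper)
for t, n, d in counts:
    p = 1 - d / n
    surv *= p
    cum += (1 - p) / (n * p)            # running Greenwood sum
    se = surv * math.sqrt(cum)
    lower = max(0.0, surv - Z * se)     # plain interval, truncated to [0, 1]
    upper = min(1.0, surv + Z * se)
    greenwood.append((t, surv, se, lower, upper))

for t, s, se, lo, hi in greenwood:
    print(f"t={t:2d}  S={s:.3f}  SE={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
```

The untruncated interval would exceed one at year 1 and fall below zero at year 24, which is the weakness of the plain interval noted above and the motivation for the log-log interval.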
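Kaplan-Meier estimator – Example-1 (Cont.)
[Figure: R output for Example-1]
Kaplan-Meier estimator – Example-1 (Cont.)
[Figure: K-M survival curve annotated with the survival levels 85%, 75% and 50% and the times 13 and 18 years]
Reading the curve together with the K-M table: (a) the probability of surviving past 10 years is about 85% (S = 0.844 from year 5 until year 12); (b) the estimated survival first falls to 75% or below at 13 years; (c) the median survival time is 18 years, the first time at which the estimate falls to 50% or below.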
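The three readings can also be taken from the table rather than from the plot. The sketch below is one hedged way to do this in Python; the helper names `survival_at` and `time_when_survival_reaches` are hypothetical, and the (tj, Sj) pairs are copied from the Example-1 table.

```python
# (t_j, S_j) at the failure times, from the Example-1 K-M table.
km = [(0, 1.000), (1, 0.950), (3, 0.897), (5, 0.844), (12, 0.774), (13, 0.704),
      (14, 0.633), (17, 0.563), (18, 0.483), (23, 0.362), (24, 0.121)]

def survival_at(t):
    """Step-function value S(t): the estimate at the last failure time <= t."""
    s = 1.0
    for tj, sj in km:
        if tj <= t:
            s = sj
    return s

def time_when_survival_reaches(level):
    """First tabulated time at which S falls to `level` or below (None if never)."""
    for tj, sj in km:
        if sj <= level:
            return tj
    return None

print(survival_at(10))                     # probability of surviving past 10 years
print(time_when_survival_reaches(0.75))    # time at which survival falls to 75%
print(time_when_survival_reaches(0.50))    # median survival time
```

Because the estimate is a step function, a percentile is read at the first failure time where the curve crosses the chosen level, not by interpolating between steps.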
Kaplan-Meier estimator - Example-2
The table shows the results of a clinical trial of a treatment (the drug 6-mercaptopurine, 6-MP) versus a placebo in 42 children with acute leukemia. Patients were followed until their leukemia returned (relapse, i.e. going out of remission) or until the end of the study.
1. Use the Kaplan-Meier method to estimate the survival function for each group.
2. Draw the survival curve for each group. Comment on the results.
Kaplan-Meier estimator - Example-2 (Cont.)
Group = Treatment

Columns: tj = time; nj = number alive (at risk) just prior to tj; dj = number of deaths (failures) at tj; cj = number censored during the interval [tj, tj+1); pj = (nj - dj)/nj = proportion surviving at tj; Sj = pj * Sj-1 = survival probability estimate up to tj; Sum = cumulative sum of (1-pj)/(nj*pj); SE(Sj) = standard error of Sj.

tj     nj    dj    cj    pj       Sj       (1-pj)/(nj*pj)   Sum      SE(Sj)
0      21    0     0     1.0000   1.0000
6      21    3     1     0.8571   0.8571   0.008            0.008    0.0764
7      17    1     1     0.9412   0.8067   0.004            0.012    0.0869
10     15    1     2     0.9333   0.7529   0.005            0.016    0.0963
13     12    1     0     0.9167   0.6902   0.008            0.024    0.1068
16     11    1     3     0.9091   0.6275   0.009            0.033    0.1141
22     7     1     0     0.8571   0.5378   0.024            0.057    0.1282
23     6     1     5     0.8333   0.4482   0.033            0.090    0.1346
Total        9     12
Kaplan-Meier estimator - Example-2 (Cont.)
Group = Placebo

Columns: tj = time; nj = number alive (at risk) just prior to tj; dj = number of deaths (failures) at tj; cj = number censored during the interval [tj, tj+1); pj = (nj - dj)/nj = proportion surviving at tj; Sj = pj * Sj-1 = survival probability estimate up to tj; Sum = cumulative sum of (1-pj)/(nj*pj); SE(Sj) = standard error of Sj.

tj     nj    dj    cj    pj       Sj       (1-pj)/(nj*pj)   Sum      SE(Sj)
0      21    0     0     1.0000   1.0000
1      21    2     0     0.9048   0.9048   0.005            0.005    0.0641
2      19    2     1     0.8947   0.8095   0.006            0.011    0.0857
4      16    1     1     0.9375   0.7589   0.004            0.015    0.0941
5      14    1     1     0.9286   0.7047   0.005            0.021    0.1018
8      12    3     1     0.7500   0.5285   0.028            0.049    0.1166
11     8     1     1     0.8750   0.4625   0.018            0.067    0.1193
12     6     1     1     0.8333   0.3854   0.033            0.100    0.1218
15     4     1     0     0.7500   0.2890   0.083            0.183    0.1237
17     3     1     0     0.6667   0.1927   0.167            0.350    0.1140
22     2     1     1     0.5000   0.0963   0.500            0.850    0.0888
Total        14    7
Kaplan-Meier estimator - Example-2
[Figure: R output for Example-2]
Kaplan-Meier estimator - Example-2 (Cont.)
Interpretation: Based on the figure, the survival probabilities for the treatment group are higher than those for the placebo group throughout follow-up. That is, the placebo group goes out of remission faster than the treatment group. The K-M curves indicate that 6-MP (treatment) maintains longer remissions in acute leukemia than the placebo.
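The comparison of the two curves can be made concrete by evaluating both step functions at a few common time points. The sketch below uses the S values of the two Example-2 tables; the helper `step_survival` and the chosen time points are illustrative assumptions.

```python
from bisect import bisect_right

# (failure times, S estimates) copied from the two Example-2 K-M tables.
treatment = ([6, 7, 10, 13, 16, 22, 23],
             [0.8571, 0.8067, 0.7529, 0.6902, 0.6275, 0.5378, 0.4482])
placebo = ([1, 2, 4, 5, 8, 11, 12, 15, 17, 22],
           [0.9048, 0.8095, 0.7589, 0.7047, 0.5285, 0.4625, 0.3854,
            0.2890, 0.1927, 0.0963])

def step_survival(curve, t):
    """Evaluate the K-M step function at time t (S = 1 before the first failure)."""
    times, surv = curve
    k = bisect_right(times, t)        # number of failure times <= t
    return 1.0 if k == 0 else surv[k - 1]

for t in (5, 10, 15, 20):
    print(t, step_survival(treatment, t), step_survival(placebo, t))
```

A visual gap between curves is only descriptive; whether the difference is larger than chance would produce is the question the log-rank test addresses.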
Comparing survival functions: Log-Rank Test
Comparing survival functions: Log-rank test
▪ One of the important goals of survival analysis is to assess whether there are
differences in survival among different groups (two or more) of participants.
For example:
(1) In a clinical trial with a survival outcome, we are interested in comparing
survival between participants receiving a new drug as compared to a placebo.
(2) In an observational study, we might be interested in comparing survival
between men and women or between participants with and without a
particular risk factor (e.g., hypertension or diabetes).
▪ Of the many tests available, the log-rank test is the most popular for testing the null hypothesis of no difference in survival between two or more independent groups.
Log-rank test
Test hypotheses:
H0: The survival functions of the two (or more, say r) independent groups are identical ($S_{1t} = S_{2t}$ at all times t).
H1: The survival functions of the two (or more, say r) independent groups are not identical ($S_{1t} \ne S_{2t}$ for at least one time t).
Test statistic
▪ There are several versions of this test. The one presented here is based on a $\chi^2$ statistic and is known as the Cox-Mantel log-rank test. The test statistic is derived by comparing the observed numbers of events with the expected numbers of events at each time point over the follow-up period.
▪ First, the Kaplan-Meier (K-M) table is constructed for each group. For the data of Example 2 above, the values taken from the K-M tables of the treatment and placebo groups are shown below.
Log-rank test (test statistic computation)
1. For each group, at each event time, compute the number at risk and the observed number of failure events. These can be extracted from the Kaplan-Meier table of each group.
2. Rank the survival times (event times) for the combined data (over all groups).
3. When H0 is true, the log-rank statistic approximately follows a $\chi^2$ distribution with df = r - 1. It is computed as follows:
$$\chi^2 = \sum_{i=1}^{r} \frac{\left(\sum_t O_{it} - \sum_t E_{it}\right)^2}{\sum_t E_{it}} \qquad (1)$$
$\sum_t O_{it}$: the sum over time of the observed number of events in the ith group.
$\sum_t E_{it}$: the sum over time of the expected number of events in the ith group, where
$$E_{it} = N_{it} \times \frac{O_t}{N_t}, \qquad i = 1, 2, \dots, r$$
$N_{it}$: number of subjects at risk in group i at time point t.
$N_t$: total number of subjects at risk (over all groups) at time point t ($N_t = N_{1t} + N_{2t} + \dots + N_{rt}$).
$O_t$: total number of observed events (over all groups) at time point t ($O_t = O_{1t} + O_{2t} + \dots + O_{rt}$).
Log-rank test (example-2)
1. For each group, at each event time, compute the number at risk and the observed number of events. Then rank the survival times (event times) for the combined data (over all groups).

Values per group (extracted from the Kaplan-Meier table of each group); N = number at risk, O = number of observed events:

Treatment group            Placebo group
t     N1    O1             t     N2    O2
6     21    3              1     21    2
7     17    1              2     19    2
10    15    1              4     16    1
13    12    1              5     14    1
16    11    1              8     12    3
22    7     1              11    8     1
23    6     1              12    6     1
                           15    4     1
                           17    3     1
                           22    2     1
Total       9              Total       14

Combined table, ordered by event time:

t     N1    O1    N2    O2
1     21    0     21    2
2     21    0     19    2
4     21    0     16    1
5     21    0     14    1
6     21    3     14    0
7     17    1     14    0
8     17    0     12    3
10    15    1     12    0
11    15    0     8     1
12    15    0     6     1
13    12    1     6     0
15    12    0     4     1
16    11    1     4     0
17    11    0     3     1
22    7     1     2     1
23    6     1     2     0
Total       9           14

Note: at a time where a group has no event, this table carries that group's number at risk forward from its most recent event time (for example, N1 = 21 at t = 1 to 5). Strictly, the risk set at time t excludes everyone who failed or was censored before t, so carried-forward counts can overstate the number at risk.
Log-rank test (example-2) (cont.)
2. In the combined table, add the number of observed events and the number at risk for each group (from the previous table), then compute $E_{1t}$ and $E_{2t}$ at each time, and finally apply formula (1):
$$\chi^2 = \sum_{i=1}^{r} \frac{\left(\sum_t O_{it} - \sum_t E_{it}\right)^2}{\sum_t E_{it}}$$

Columns: t = time to event; N1, O1 = number at risk and number of observed events in the treatment group; N2, O2 = the same for the placebo group; N = N1 + N2 = total number at risk; O = O1 + O2 = total number of observed events; E1 = N1 * (O/N) and E2 = N2 * (O/N) = expected number of events in the treatment and placebo groups.

t     N1    O1    N2    O2    N     O     E1       E2
1     21    0     21    2     42    2     1.000    1.000
2     21    0     19    2     40    2     1.050    0.950
4     21    0     16    1     37    1     0.568    0.432
5     21    0     14    1     35    1     0.600    0.400
6     21    3     14    0     35    3     1.800    1.200
7     17    1     14    0     31    1     0.548    0.452
8     17    0     12    3     29    3     1.759    1.241
10    15    1     12    0     27    1     0.556    0.444
11    15    0     8     1     23    1     0.652    0.348
12    15    0     6     1     21    1     0.714    0.286
13    12    1     6     0     18    1     0.667    0.333
15    12    0     4     1     16    1     0.750    0.250
16    11    1     4     0     15    1     0.733    0.267
17    11    0     3     1     14    1     0.786    0.214
22    7     1     2     1     9     2     1.556    0.444
23    6     1     2     0     8     1     0.750    0.250
Total       9           14                14.488   8.512

$$\chi^2 = \frac{(9 - 14.488)^2}{14.488} + \frac{(14 - 8.512)^2}{8.512} = 5.617$$
The p-value and the critical value are obtained in R as follows:
1 - pchisq(5.617, 1)
[1] 0.01778707
qchisq(0.95, 1)
[1] 3.841459
Since 0.018 < 0.05 (equivalently, 5.617 > 3.841), we reject the null hypothesis that the two survival functions are identical.
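The computation of formula (1) for Example 2 can be reproduced with a short Python sketch. The rows are the (t, N1, O1, N2, O2) values of the combined table; the variable names are illustrative, and the p-value uses the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable with 1 df in place of R's `pchisq`.

```python
import math

# Combined table from the slides: (t, N1, O1, N2, O2); group 1 = treatment, 2 = placebo.
rows = [(1, 21, 0, 21, 2), (2, 21, 0, 19, 2), (4, 21, 0, 16, 1), (5, 21, 0, 14, 1),
        (6, 21, 3, 14, 0), (7, 17, 1, 14, 0), (8, 17, 0, 12, 3), (10, 15, 1, 12, 0),
        (11, 15, 0, 8, 1), (12, 15, 0, 6, 1), (13, 12, 1, 6, 0), (15, 12, 0, 4, 1),
        (16, 11, 1, 4, 0), (17, 11, 0, 3, 1), (22, 7, 1, 2, 1), (23, 6, 1, 2, 0)]

O1 = sum(o1 for _, _, o1, _, _ in rows)
O2 = sum(o2 for _, _, _, _, o2 in rows)
# Expected events at each time: E_it = N_it * O_t / N_t, summed over time.
E1 = sum(n1 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in rows)
E2 = sum(n2 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in rows)

chi2 = (O1 - E1) ** 2 / E1 + (O2 - E2) ** 2 / E2      # formula (1), df = 1

# Upper-tail probability of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"O1={O1}, E1={E1:.3f}, O2={O2}, E2={E2:.3f}")
print(f"chi2={chi2:.3f}, p={p_value:.4f}")
```

Since E1 + E2 always equals the total number of observed events, checking that identity is a quick way to catch arithmetic slips in the hand-built table.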
Log-rank test (example-2) (R output)
[Figure: R output of the log-rank test for Example-2]
The log-rank test statistic (for two groups) used by R and by many other packages such as Stata is:
$$\chi^2 = \frac{(O_1 - E_1)^2}{\widehat{Var}(O_1 - E_1)} = \frac{\left[\sum_t (O_{1t} - E_{1t})\right]^2}{\hat{V}} \qquad (2)$$
When H0 is true, this statistic is distributed as $\chi^2$ with 1 df, where:
$$\hat{V} = \sum_t \frac{N_{1t} N_{2t} O_t (N_t - O_t)}{N_t^{2} (N_t - 1)}.$$
Formula (2) is the original one, while (1) is an approximation to it. The value given by formula (1) is slightly smaller than the log-rank statistic given by formula (2).
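Formula (2) can be evaluated on the same combined table. The Python sketch below is a hedged illustration of the variance-based statistic; it reuses the slide's table values as given and is not a substitute for the R output.

```python
# Combined table from the slides: (t, N1, O1, N2, O2); group 1 = treatment, 2 = placebo.
rows = [(1, 21, 0, 21, 2), (2, 21, 0, 19, 2), (4, 21, 0, 16, 1), (5, 21, 0, 14, 1),
        (6, 21, 3, 14, 0), (7, 17, 1, 14, 0), (8, 17, 0, 12, 3), (10, 15, 1, 12, 0),
        (11, 15, 0, 8, 1), (12, 15, 0, 6, 1), (13, 12, 1, 6, 0), (15, 12, 0, 4, 1),
        (16, 11, 1, 4, 0), (17, 11, 0, 3, 1), (22, 7, 1, 2, 1), (23, 6, 1, 2, 0)]

diff = 0.0   # sum over t of (O_1t - E_1t)
var = 0.0    # V-hat: sum over t of N1 N2 O (N - O) / (N^2 (N - 1))
for _, n1, o1, n2, o2 in rows:
    n, o = n1 + n2, o1 + o2
    diff += o1 - n1 * o / n
    var += n1 * n2 * o * (n - o) / (n ** 2 * (n - 1))

chi2_v2 = diff ** 2 / var                            # formula (2), df = 1

# Formula (1) on the same table, for comparison.
e1 = sum(n1 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in rows)
e2 = sum(n2 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in rows)
chi2_v1 = (9 - e1) ** 2 / e1 + (14 - e2) ** 2 / e2

print(f"V = {var:.4f}")
print(f"formula (2): {chi2_v2:.3f}   formula (1): {chi2_v1:.3f}")
```

On these table values formula (2) gives roughly 6.0, larger than the 5.617 from formula (1), in line with the remark that (1) is the slightly smaller of the two.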
Comparing survival functions: Hazard Ratio
Comparing survival functions: Hazard Ratio (HR)
▪ To compare two independent groups (exposed vs. unexposed, or treatment vs. control), the hazard ratio (HR) is used. The hazard ratio is estimated from the data organized for the log-rank test.
▪ Specifically, the hazard ratio is the ratio of the observed-to-expected event totals in the two independent comparison groups:
$$HR = \frac{\sum_t O_{\text{Exposed},t} \,/\, \sum_t E_{\text{Exposed},t}}{\sum_t O_{\text{Unexposed},t} \,/\, \sum_t E_{\text{Unexposed},t}} = \frac{\sum_t O_{\text{Treatment},t} \,/\, \sum_t E_{\text{Treatment},t}}{\sum_t O_{\text{Control},t} \,/\, \sum_t E_{\text{Control},t}}$$
▪ In Example 2, to compare the survival functions of the placebo and treatment groups, one can use the hazard ratio with the treatment group as the reference:
$$HR = \frac{\sum_t O_{\text{Placebo},t} \,/\, \sum_t E_{\text{Placebo},t}}{\sum_t O_{\text{Treatment},t} \,/\, \sum_t E_{\text{Treatment},t}} = \frac{14/8.512}{9/14.488} = 2.65$$
Thus, participants in the placebo group have 2.65 times the risk of going out of remission (relapse) as compared to participants in the treatment group.
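The arithmetic of the hazard ratio is easy to check in Python. The function name `hazard_ratio` is hypothetical; the observed and expected totals are those of the Example-2 log-rank table.

```python
def hazard_ratio(obs_num, exp_num, obs_ref, exp_ref):
    """Ratio of observed/expected events in one group to that of a reference group."""
    return (obs_num / exp_num) / (obs_ref / exp_ref)

# Totals from the Example-2 log-rank table.
hr_placebo_vs_treatment = hazard_ratio(14, 8.512, 9, 14.488)
hr_treatment_vs_placebo = hazard_ratio(9, 14.488, 14, 8.512)

print(round(hr_placebo_vs_treatment, 2))
print(round(hr_treatment_vs_placebo, 2))
```

Swapping the reference group inverts the ratio, so it matters to state which group is in the numerator when reporting an HR.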
Semi-parametric methods: Cox proportional hazards model
• Kaplan-Meier curves and log-rank tests are examples of univariate analysis. They describe survival according to one factor under investigation, but ignore the impact of any others.
• Additionally, Kaplan-Meier curves and log-rank tests are useful only when the
predictor variable is categorical (e.g.: treatment A vs treatment B; males vs females).
They don’t work easily for quantitative predictors such as gene expression, weight,
or age.
• The Cox PH model is the most commonly used survival data analysis technique that
simultaneously allows one to include and to assess the effect of multiple covariates.
Procedures for analyzing time-to-event
▪ When the research interest lies in modeling the relationship between the time to event and a set of explanatory variables (or risk factors), regression models are needed.
▪ Two types of regression models can be used:
1. Semiparametric models: do not require us to specify a parametric form for the baseline hazard (defined as the hazard at time t for observations with all predictors equal to zero).
2. Parametric models: parametric distributional assumptions are made about the hazard function.
▪ One of the most popular models of the first type is the proportional hazards model, or Cox regression:
$$h(t, \mathbf{X}) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$$
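The role of the baseline hazard in this model can be illustrated with a minimal Python sketch. The coefficient values and the two covariate vectors below are hypothetical (they are not estimated from the leukemia data or any other data); the point is only that the ratio of two hazards does not involve h0(t).

```python
import math

def relative_hazard(beta, x):
    """exp(beta_1 x_1 + ... + beta_p x_p): hazard relative to the baseline h0(t)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def cox_hazard_ratio(beta, x_a, x_b):
    """h(t, x_a) / h(t, x_b); the baseline hazard h0(t) cancels in the ratio."""
    return relative_hazard(beta, x_a) / relative_hazard(beta, x_b)

# Hypothetical coefficients for (placebo indicator, log WBC); not estimated from any data.
beta = [1.0, 0.5]
placebo_patient = [1, 3.0]
treated_patient = [0, 3.0]

hr = cox_hazard_ratio(beta, placebo_patient, treated_patient)
print(hr)
```

With the second covariate held equal, the ratio reduces to exp(beta_1), which is why exponentiated Cox coefficients are read as hazard ratios adjusted for the other covariates.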
Cox Regression Model
Example:
These data involve two groups of
leukemia patients, with 21
patients in each group. Group 1 is
the treatment group, and group 2
is the placebo group.
The data set also contains the
variable log WBC, which is a
well-known prognostic indicator
of survival for leukemia patients
Sum – Up
Biostat_Chapter6_Part3.pdf Bio Statistics

More Related Content

Similar to Biostat_Chapter6_Part3.pdf Bio Statistics

Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Waqas Tariq
 
Chapter 11Survival AnalysisLearning Objectives.docx
Chapter 11Survival AnalysisLearning Objectives.docxChapter 11Survival AnalysisLearning Objectives.docx
Chapter 11Survival AnalysisLearning Objectives.docxketurahhazelhurst
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for HealthcareChandan Reddy
 
Looking at data
Looking at dataLooking at data
Looking at datapcalabri
 
Answer the following.   (5 pts ea)A study is conducted to estimate.docx
Answer the following.   (5 pts ea)A study is conducted to estimate.docxAnswer the following.   (5 pts ea)A study is conducted to estimate.docx
Answer the following.   (5 pts ea)A study is conducted to estimate.docxboyfieldhouse
 
7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCEL7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCELrajesh1655
 
Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Hau Pham
 
Test of Random Walk Hypothesis: Before & After the 2007-09 Crisis
Test of Random Walk Hypothesis: Before & After the 2007-09 CrisisTest of Random Walk Hypothesis: Before & After the 2007-09 Crisis
Test of Random Walk Hypothesis: Before & After the 2007-09 CrisisGabriel Koh
 
QSIR knowledge exchange - Matt Tite presentation
QSIR knowledge exchange   -  Matt Tite presentationQSIR knowledge exchange   -  Matt Tite presentation
QSIR knowledge exchange - Matt Tite presentationNHS Improving Quality
 
Presentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremPresentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremAshikAminPrem
 

Similar to Biostat_Chapter6_Part3.pdf Bio Statistics (20)

Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
 
Chapter07.pdf
Chapter07.pdfChapter07.pdf
Chapter07.pdf
 
Estimating a Population Mean
Estimating a Population MeanEstimating a Population Mean
Estimating a Population Mean
 
Chapter 11Survival AnalysisLearning Objectives.docx
Chapter 11Survival AnalysisLearning Objectives.docxChapter 11Survival AnalysisLearning Objectives.docx
Chapter 11Survival AnalysisLearning Objectives.docx
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
MUMS Undergraduate Workshop - Introduction to Bayesian Inference & Uncertaint...
MUMS Undergraduate Workshop - Introduction to Bayesian Inference & Uncertaint...MUMS Undergraduate Workshop - Introduction to Bayesian Inference & Uncertaint...
MUMS Undergraduate Workshop - Introduction to Bayesian Inference & Uncertaint...
 
Looking at data
Looking at dataLooking at data
Looking at data
 
Qc tools
Qc toolsQc tools
Qc tools
 
Qc tools
Qc toolsQc tools
Qc tools
 
Answer the following.   (5 pts ea)A study is conducted to estimate.docx
Answer the following.   (5 pts ea)A study is conducted to estimate.docxAnswer the following.   (5 pts ea)A study is conducted to estimate.docx
Answer the following.   (5 pts ea)A study is conducted to estimate.docx
 
7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCEL7 qc toools LEARN and KNOW how to BUILD IN EXCEL
7 qc toools LEARN and KNOW how to BUILD IN EXCEL
 
Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)
 
C070409013
C070409013C070409013
C070409013
 
Test of Random Walk Hypothesis: Before & After the 2007-09 Crisis
Test of Random Walk Hypothesis: Before & After the 2007-09 CrisisTest of Random Walk Hypothesis: Before & After the 2007-09 Crisis
Test of Random Walk Hypothesis: Before & After the 2007-09 Crisis
 
Informe experimentos # 2
Informe experimentos # 2Informe experimentos # 2
Informe experimentos # 2
 
Chapter15
Chapter15Chapter15
Chapter15
 
QSIR knowledge exchange - Matt Tite presentation
QSIR knowledge exchange   -  Matt Tite presentationQSIR knowledge exchange   -  Matt Tite presentation
QSIR knowledge exchange - Matt Tite presentation
 
Presentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin PremPresentation on Hypothesis Test by Ashik Amin Prem
Presentation on Hypothesis Test by Ashik Amin Prem
 
BIIntro.ppt
BIIntro.pptBIIntro.ppt
BIIntro.ppt
 
The Right Way
The Right WayThe Right Way
The Right Way
 

Recently uploaded

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Recently uploaded (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
Biostat_Chapter6_Part3.pdf Bio Statistics

  • 1. Biostatistics – (SA408/SE408) Spring 2023 Instructor Dr. Heba Ahmed Emera
  • 4. Kaplan-Meier estimator
    • The Kaplan–Meier approach, also called the product-limit approach, is a very popular method that re-estimates the survival probability each time an event occurs.
    • The estimator is the product, over the failure times, of the conditional probabilities of surviving to the next failure time.
    • Appropriate use of the Kaplan–Meier approach rests on several assumptions: (1) censoring is independent of the probability of developing the event of interest; (2) survival probabilities are comparable in participants recruited earlier and later in the study; (3) when several groups are compared, these assumptions hold in each comparison group so that, for example, censoring is not more likely in one group than in another.
  • 5. Kaplan-Meier estimator
    First, we define the notation used in deriving the Kaplan-Meier estimator.
    • Let $t_0 < t_1 < t_2 < \dots < t_k$ represent the observed failure times as well as the censored times in a sample of size $n = n_0$ (where $n_0$ is the number of participants at baseline).
    • $d_j$: the number of failures at $t_j$.
    • $c_j$: the number of cases censored during the interval $[t_j, t_{j+1})$.
    • $n_j$: the number of individuals at risk just prior to $t_j$, computed as $n_j = n_{j-1} - (d_{j-1} + c_{j-1})$.
    • The probability of surviving the $j$th interval is estimated as $\hat{p}_j = \frac{n_j - d_j}{n_j} = 1 - \frac{d_j}{n_j}$.
    • The probability of surviving up to $t_j$ is the product of the probabilities of surviving all the intervals up to the $j$th interval.
    • The survivor function is then estimated by $\hat{S}_t = \prod_{j:\, t_j \le t} \frac{n_j - d_j}{n_j} = \hat{S}_{t-1} \times \frac{n_t - d_t}{n_t}$, where $\hat{S}_{t-1}$ is the estimate at the preceding failure time.
  • 6. Kaplan-Meier estimator - Example-1
    Consider a small prospective cohort study designed to study time to death. The study involves participants who are 65+ years of age and who are followed for up to 24 years. Twenty participants are followed until they die, until the study ends, or until they drop out of the study. Data obtained from the study are presented in the following table (a year at which the subject was lost to follow-up is marked with (+)).
    1. Derive the Kaplan-Meier estimate of the survivor function at the times of failure, and compute the standard errors of those estimates.
    2. Draw the Kaplan-Meier survivor function curve. Based on the curve, find: (a) the probability that a participant survives past 10 years, (b) the minimum number of years at which 75% of participants will survive, and (c) the estimate of the median survival time.

    | Participant ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | Year of death / last contact | 24+ | 3 | 11+ | 19+ | 24 | 13 | 14 | 2+ | 18 | 17+ | 24 | 21+ | 12 | 1 | 10+ | 23 | 6+ | 5 | 9+ | 17 |
  • 7. Kaplan-Meier estimator – Example-1 event plot
  • 8. Kaplan-Meier estimator – Example-1
    Note that the data are re-ordered in ascending order, with the censored cases marked, to ease construction of the Kaplan-Meier table from which the survival estimates are obtained. This ordering can easily be obtained using R.
    Years to death: 1, 2+, 3, 5, 6+, 9+, 10+, 11+, 12, 13, 14, 17+, 17, 18, 19+, 21+, 23, 24+, 24, 24 (+ indicates a censored case)
  • 9. Kaplan-Meier estimator – Example-1
    Note that, in the Kaplan-Meier table, computations are displayed only at failure times (not at censored times), because the survival estimates remain constant at censored times. The table uses $n_j = n_{j-1} - (d_{j-1} + c_{j-1})$ and $\hat{p}_j = 1 - \frac{d_j}{n_j}$.

    | Time $t_j$ | Number at risk just prior to $t_j$, $n_j$ | Number of deaths at $t_j$, $d_j$ | Number censored during $[t_j, t_{j+1})$, $c_j$ | Proportion surviving at $t_j$, $\hat{p}_j = (n_j - d_j)/n_j$ | Survival probability estimate up to $t_j$, $\hat{S}_t = \hat{p}_j \times \hat{S}_{t-1}$ | $(1-\hat{p}_j)/(n_j \hat{p}_j)$ | $\sum (1-\hat{p}_j)/(n_j \hat{p}_j)$ | $SE(\hat{S}_t)$ |
    |---|---|---|---|---|---|---|---|---|
    | 0 | 20 | 0 | 0 | 1.000 | 1 | | | |
    | 1 | 20 | 1 | 1 | 0.950 | 0.950 | 0.003 | 0.003 | 0.0487 |
    | 3 | 18 | 1 | 0 | 0.944 | 0.897 | 0.003 | 0.006 | 0.0689 |
    | 5 | 17 | 1 | 4 | 0.941 | 0.844 | 0.004 | 0.010 | 0.0826 |
    | 12 | 12 | 1 | 0 | 0.917 | 0.774 | 0.008 | 0.017 | 0.1014 |
    | 13 | 11 | 1 | 0 | 0.909 | 0.704 | 0.009 | 0.026 | 0.1140 |
    | 14 | 10 | 1 | 0 | 0.900 | 0.633 | 0.011 | 0.037 | 0.1224 |
    | 17 | 9 | 1 | 1 | 0.889 | 0.563 | 0.014 | 0.051 | 0.1274 |
    | 18 | 7 | 1 | 2 | 0.857 | 0.483 | 0.024 | 0.075 | 0.1322 |
    | 23 | 4 | 1 | 0 | 0.750 | 0.362 | 0.083 | 0.158 | 0.1440 |
    | 24 | 3 | 2 | 1 | 0.333 | 0.121 | 0.667 | 0.825 | 0.1096 |
    | Total | | 11 | 9 | | | | | |
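The columns $n_j$, $d_j$, $c_j$, $\hat{p}_j$ and $\hat{S}_t$ of the Example-1 table can be rebuilt from the raw data. The slides do this in R; what follows is only a minimal sketch in plain Python, where the function name `km_table` and the `(year, event)` encoding (1 = death, 0 = censored) are assumptions made for illustration.

```python
from collections import Counter

# Example-1 data in participant order: (year, event); event 1 = death, 0 = censored ("+").
data = [(24, 0), (3, 1), (11, 0), (19, 0), (24, 1), (13, 1), (14, 1), (2, 0),
        (18, 1), (17, 0), (24, 1), (21, 0), (12, 1), (1, 1), (10, 0), (23, 1),
        (6, 0), (5, 1), (9, 0), (17, 1)]

def km_table(data):
    """Return one row (t_j, n_j, d_j, c_j, p_j, S_j) per distinct failure time."""
    deaths = Counter(t for t, e in data if e == 1)
    censored = Counter(t for t, e in data if e == 0)
    failure_times = sorted(deaths)
    rows, surv = [], 1.0
    for j, t in enumerate(failure_times):
        # At risk just prior to t_j: everyone whose recorded time is >= t_j.
        n = sum(1 for time, _ in data if time >= t)
        d = deaths[t]
        nxt = failure_times[j + 1] if j + 1 < len(failure_times) else float("inf")
        # Censored during [t_j, t_{j+1}).
        c = sum(k for time, k in censored.items() if t <= time < nxt)
        p = (n - d) / n          # conditional probability of surviving t_j
        surv *= p                # product-limit update
        rows.append((t, n, d, c, p, surv))
    return rows

for t, n, d, c, p, s in km_table(data):
    print(f"{t:>2} {n:>2} {d} {c} {p:.3f} {s:.3f}")
```

The at-risk count is recomputed directly from the data here rather than through the recursion $n_j = n_{j-1} - (d_{j-1} + c_{j-1})$; for this data set the two give the same values.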
  • 10. Kaplan-Meier estimator – Example-1 (Cont.) • The K-M plot is drawn as a step function; the + marks indicate where censoring occurred.
  • 11. Kaplan-Meier estimator: Greenwood’s formula
    Standard error (SE) estimates:
    ▪ A popular formula for estimating the standard error of the survival estimates is Greenwood’s formula.
    ▪ It is given by $\widehat{Var}(\hat{S}_t) = \hat{S}_t^2 \sum_{j \le t} \frac{1 - \hat{p}_j}{n_j \hat{p}_j}$, where the sum is accumulated over all failure times up to the time point of interest, and $\widehat{SE}(\hat{S}_t) = \sqrt{\widehat{Var}(\hat{S}_t)}$.
    ▪ A $(1-\alpha)100\%$ CI for $S_t$ is obtained as $\hat{S}_t \pm Z_{\alpha/2}\, \widehat{SE}(\hat{S}_t)$.
    ▪ Unfortunately, confidence intervals computed from the above variance may extend above one or below zero. A more satisfactory approach is to find confidence intervals for the log-log transformation (only in R).
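Greenwood's formula can be checked numerically against the Example-1 table. The sketch below is plain Python, not the R code used in the slides; the function name `greenwood` is hypothetical, `z = 1.96` assumes a 95% interval, and the loop assumes $d_j < n_j$ at every failure time so that $\hat{p}_j > 0$.

```python
import math

# (t_j, n_j, d_j) at the failure times of Example-1, taken from the Kaplan-Meier table.
counts = [(1, 20, 1), (3, 18, 1), (5, 17, 1), (12, 12, 1), (13, 11, 1),
          (14, 10, 1), (17, 9, 1), (18, 7, 1), (23, 4, 1), (24, 3, 2)]

def greenwood(counts, z=1.96):
    """Return rows (t, S, SE, lower, upper) using Greenwood's variance formula."""
    rows, surv, cum = [], 1.0, 0.0
    for t, n, d in counts:
        p = 1 - d / n
        surv *= p
        cum += (1 - p) / (n * p)         # running sum of (1 - p_j) / (n_j p_j)
        se = surv * math.sqrt(cum)       # SE = S * sqrt(cumulative sum)
        rows.append((t, surv, se, surv - z * se, surv + z * se))
    return rows

for t, s, se, lo, hi in greenwood(counts):
    print(f"t={t:>2}  S={s:.3f}  SE={se:.4f}  CI=({lo:.3f}, {hi:.3f})")
```

At the last failure time the plain interval has a negative lower limit, which is the boundary problem that motivates the log-log transformation.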
  • 13. Kaplan-Meier estimator – Example-1 (Cont.) The Kaplan-Meier table of slide 9 is shown again with its columns numbered (1) to (8). The last three columns, $(1-\hat{p}_j)/(n_j \hat{p}_j)$, its cumulative sum, and $SE(\hat{S}_t)$, follow from Greenwood’s formula $\widehat{Var}(\hat{S}_t) = \hat{S}_t^2 \sum_{j \le t} \frac{1 - \hat{p}_j}{n_j \hat{p}_j}$.
  • 14. Kaplan-Meier estimator – Example-1 (Cont.) R output
  • 15. Kaplan-Meier estimator – Example-1 (Cont.) R output
  • 16. Kaplan-Meier estimator – Example-1 (Cont.) [Figure: K-M curve with reference marks at survival levels 85%, 75%, and 50% and at times 13 and 18 years.]
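Parts (a) to (c) of Example-1 can also be read off the step function programmatically. The following is a minimal Python sketch that assumes the rounded estimates of the Kaplan-Meier table on slide 9; the helper names `survival_at` and `first_time_at_or_below` are hypothetical.

```python
# Kaplan-Meier step function for Example-1: (failure time, estimate), rounded as on slide 9.
km_steps = [(1, 0.950), (3, 0.897), (5, 0.844), (12, 0.774), (13, 0.704),
            (14, 0.633), (17, 0.563), (18, 0.483), (23, 0.362), (24, 0.121)]

def survival_at(steps, t):
    """S(t): the estimate at the last failure time <= t (1.0 before the first failure)."""
    s = 1.0
    for time, est in steps:
        if time > t:
            break
        s = est
    return s

def first_time_at_or_below(steps, level):
    """Earliest failure time at which the curve has dropped to `level` or lower."""
    for time, est in steps:
        if est <= level:
            return time
    return None  # the curve never reaches this level

print(survival_at(km_steps, 10))               # (a) 0.844, about 85%
print(first_time_at_or_below(km_steps, 0.75))  # (b) 13
print(first_time_at_or_below(km_steps, 0.50))  # (c) median survival time: 18
```

The median is taken here as the first time the curve reaches 0.5 or below, which is the usual convention for a step function.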
  • 17. Kaplan-Meier estimator - Example-2
    The table shows the results of a clinical trial of a treatment (the drug 6-mercaptopurine, or 6-MP) versus a placebo in 42 children with acute leukemia. Patients were followed until their leukemia returned (relapse, i.e., going out of remission) or until the end of the study.
    1. Use the Kaplan-Meier method to estimate the survival function for each group.
    2. Draw the survival curve for each group. Comment on the results.
  • 18. Kaplan-Meier estimator - Example-2 (Cont.)
    Group = Treatment

    | Time $t_j$ | Number at risk just prior to $t_j$, $n_j$ | Number of failures at $t_j$, $d_j$ | Number censored during $[t_j, t_{j+1})$, $c_j$ | Proportion surviving at $t_j$, $\hat{p}_j = (n_j - d_j)/n_j$ | Survival probability estimate up to $t_j$, $\hat{S}_t = \hat{p}_j \times \hat{S}_{t-1}$ | $(1-\hat{p}_j)/(n_j \hat{p}_j)$ | $\sum (1-\hat{p}_j)/(n_j \hat{p}_j)$ | $SE(\hat{S}_t)$ |
    |---|---|---|---|---|---|---|---|---|
    | 0 | 21 | 0 | 0 | 1 | 1 | | | |
    | 6 | 21 | 3 | 1 | 0.8571 | 0.8571 | 0.008 | 0.008 | 0.0764 |
    | 7 | 17 | 1 | 1 | 0.9412 | 0.8067 | 0.004 | 0.012 | 0.0869 |
    | 10 | 15 | 1 | 2 | 0.9333 | 0.7529 | 0.005 | 0.016 | 0.0963 |
    | 13 | 12 | 1 | 0 | 0.9167 | 0.6902 | 0.008 | 0.024 | 0.1068 |
    | 16 | 11 | 1 | 3 | 0.9091 | 0.6275 | 0.009 | 0.033 | 0.1141 |
    | 22 | 7 | 1 | 0 | 0.8571 | 0.5378 | 0.024 | 0.057 | 0.1282 |
    | 23 | 6 | 1 | 5 | 0.8333 | 0.4482 | 0.033 | 0.090 | 0.1346 |
    | Total | | 9 | 12 | | | | | |
  • 19. Kaplan-Meier estimator - Example-2 (Cont.)
    Group = Placebo

    | Time $t_j$ | Number at risk just prior to $t_j$, $n_j$ | Number of failures at $t_j$, $d_j$ | Number censored during $[t_j, t_{j+1})$, $c_j$ | Proportion surviving at $t_j$, $\hat{p}_j = (n_j - d_j)/n_j$ | Survival probability estimate up to $t_j$, $\hat{S}_t = \hat{p}_j \times \hat{S}_{t-1}$ | $(1-\hat{p}_j)/(n_j \hat{p}_j)$ | $\sum (1-\hat{p}_j)/(n_j \hat{p}_j)$ | $SE(\hat{S}_t)$ |
    |---|---|---|---|---|---|---|---|---|
    | 0 | 21 | 0 | 0 | 1.000 | 1 | | | |
    | 1 | 21 | 2 | 0 | 0.9048 | 0.9048 | 0.005 | 0.005 | 0.0641 |
    | 2 | 19 | 2 | 1 | 0.8947 | 0.8095 | 0.006 | 0.011 | 0.0857 |
    | 4 | 16 | 1 | 1 | 0.9375 | 0.7589 | 0.004 | 0.015 | 0.0941 |
    | 5 | 14 | 1 | 1 | 0.9286 | 0.7047 | 0.005 | 0.021 | 0.1018 |
    | 8 | 12 | 3 | 1 | 0.7500 | 0.5285 | 0.028 | 0.049 | 0.1166 |
    | 11 | 8 | 1 | 1 | 0.8750 | 0.4625 | 0.018 | 0.067 | 0.1193 |
    | 12 | 6 | 1 | 1 | 0.8333 | 0.3854 | 0.033 | 0.100 | 0.1218 |
    | 15 | 4 | 1 | 0 | 0.7500 | 0.2890 | 0.083 | 0.183 | 0.1237 |
    | 17 | 3 | 1 | 0 | 0.6667 | 0.1927 | 0.167 | 0.350 | 0.1140 |
    | 22 | 2 | 1 | 1 | 0.5000 | 0.0963 | 0.500 | 0.850 | 0.0888 |
    | Total | | 14 | 7 | | | | | |
  • 20. Kaplan-Meier estimator - Example-2 (R output)
  • 21. Kaplan-Meier estimator - Example-2 (Cont.) Interpretation: Based on the figure, the survival probabilities for the treatment group are higher than those for the placebo group. That is, the placebo group goes out of remission faster than the treatment group. The K-M curves indicate that 6-MP (treatment) maintains longer remissions in acute leukemia than the placebo.
  • 23. Comparing survival functions: Log-rank test
    ▪ One of the important goals of survival analysis is to assess whether survival differs among two or more groups of participants. For example: (1) in a clinical trial with a survival outcome, we are interested in comparing survival between participants receiving a new drug and those receiving a placebo; (2) in an observational study, we might be interested in comparing survival between men and women, or between participants with and without a particular risk factor (e.g., hypertension or diabetes).
    ▪ Among the many available tests, the log-rank test is the most popular for testing the null hypothesis of no difference in survival between two or more independent groups.
  • 24. Log-rank test
    Test hypotheses:
    H0: The survival functions of the two (or more, say r) independent groups are identical ($S_{1t} = S_{2t}$ at all times t).
    H1: The survival functions of the two (or more, say r) independent groups are not identical ($S_{1t} \ne S_{2t}$ at some time t).
    Test statistic:
    ▪ There are many versions of this test; the one presented here is based on a $\chi^2$ test statistic and is known as the Cox-Mantel log-rank test. The test statistic is derived by comparing the observed numbers of events with the expected numbers of events at each time point over the follow-up period.
    ▪ First, the Kaplan-Meier (K-M) table is constructed for each group. For the data from Example 2 above, these are the K-M tables for the treatment and placebo groups on slides 18 and 19.
  • 25. Log-rank test (test statistic computation)
    1. For each group, at each event time, compute the number at risk and the observed number of failure events. These can be extracted from the Kaplan-Meier table for each group.
    2. Rank the survival times (event times) for the combined data (over all groups).
    3. Under H0, the log-rank statistic follows a $\chi^2$ distribution with df = r − 1 and is computed as follows:
       $\chi^2 = \sum_{i=1}^{r} \frac{\left(\sum_t O_{it} - \sum_t E_{it}\right)^2}{\sum_t E_{it}}$    (1)
    where
    $\sum_t O_{it}$: the sum of the observed number of events in the ith group over time;
    $\sum_t E_{it}$: the sum of the expected number of events in the ith group over time, where $E_{it} = N_{it} \times \frac{O_t}{N_t}$, $i = 1, 2, \dots, r$;
    $N_{it}$: the number of subjects at risk in group i at time point t;
    $N_t$: the total number of subjects at risk (across all groups) at time point t ($N_t = N_{1t} + N_{2t} + \dots + N_{rt}$);
    $O_t$: the total number of observed events (across all groups) at time point t ($O_t = O_{1t} + O_{2t} + \dots + O_{rt}$).
  • 26. Log-rank test (example-2)
    1- For each group, at each event time, compute the number at risk and the observed number of events (these should be extracted from the Kaplan-Meier table of each group). Then rank the survival times (event times) for the combined data (over all groups).

    Treatment group:

    | Time to event t | Number at risk N1 | Number of observed events O1 |
    |---|---|---|
    | 6 | 21 | 3 |
    | 7 | 17 | 1 |
    | 10 | 15 | 1 |
    | 13 | 12 | 1 |
    | 16 | 11 | 1 |
    | 22 | 7 | 1 |
    | 23 | 6 | 1 |
    | Total | | 9 |

    Placebo group:

    | Time to event t | Number at risk N2 | Number of observed events O2 |
    |---|---|---|
    | 1 | 21 | 2 |
    | 2 | 19 | 2 |
    | 4 | 16 | 1 |
    | 5 | 14 | 1 |
    | 8 | 12 | 3 |
    | 11 | 8 | 1 |
    | 12 | 6 | 1 |
    | 15 | 4 | 1 |
    | 17 | 3 | 1 |
    | 22 | 2 | 1 |
    | Total | | 14 |

    Combined data, ranked by event time:

    | Time to event t | Number at risk in treatment group N1 | Number of observed events in treatment group O1 | Number at risk in placebo group N2 | Number of observed events in placebo group O2 |
    |---|---|---|---|---|
    | 1 | 21 | 0 | 21 | 2 |
    | 2 | 21 | 0 | 19 | 2 |
    | 4 | 21 | 0 | 16 | 1 |
    | 5 | 21 | 0 | 14 | 1 |
    | 6 | 21 | 3 | 14 | 0 |
    | 7 | 17 | 1 | 14 | 0 |
    | 8 | 17 | 0 | 12 | 3 |
    | 10 | 15 | 1 | 12 | 0 |
    | 11 | 15 | 0 | 8 | 1 |
    | 12 | 15 | 0 | 6 | 1 |
    | 13 | 12 | 1 | 6 | 0 |
    | 15 | 12 | 0 | 4 | 1 |
    | 16 | 11 | 1 | 4 | 0 |
    | 17 | 11 | 0 | 3 | 1 |
    | 22 | 7 | 1 | 2 | 1 |
    | 23 | 6 | 1 | 2 | 0 |
    | Total | | 9 | | 14 |
  • 27. Log-rank test (example-2) (cont.)
    2- In the combined table, add the number of observed events and the number at risk for each group (from the previous table), then compute $E_{1t}$ and $E_{2t}$ over time, and finally apply the formula $\chi^2 = \sum_{i=1}^{r} \frac{\left(\sum_t O_{it} - \sum_t E_{it}\right)^2}{\sum_t E_{it}}$.

    | t | N1 | O1 | N2 | O2 | Total at risk N = N1 + N2 | Total observed events O = O1 + O2 | Expected events in treatment group E1 = N1 × (O/N) | Expected events in placebo group E2 = N2 × (O/N) |
    |---|---|---|---|---|---|---|---|---|
    | 1 | 21 | 0 | 21 | 2 | 42 | 2 | 1.000 | 1.000 |
    | 2 | 21 | 0 | 19 | 2 | 40 | 2 | 1.050 | 0.950 |
    | 4 | 21 | 0 | 16 | 1 | 37 | 1 | 0.568 | 0.432 |
    | 5 | 21 | 0 | 14 | 1 | 35 | 1 | 0.600 | 0.400 |
    | 6 | 21 | 3 | 14 | 0 | 35 | 3 | 1.800 | 1.200 |
    | 7 | 17 | 1 | 14 | 0 | 31 | 1 | 0.548 | 0.452 |
    | 8 | 17 | 0 | 12 | 3 | 29 | 3 | 1.759 | 1.241 |
    | 10 | 15 | 1 | 12 | 0 | 27 | 1 | 0.556 | 0.444 |
    | 11 | 15 | 0 | 8 | 1 | 23 | 1 | 0.652 | 0.348 |
    | 12 | 15 | 0 | 6 | 1 | 21 | 1 | 0.714 | 0.286 |
    | 13 | 12 | 1 | 6 | 0 | 18 | 1 | 0.667 | 0.333 |
    | 15 | 12 | 0 | 4 | 1 | 16 | 1 | 0.750 | 0.250 |
    | 16 | 11 | 1 | 4 | 0 | 15 | 1 | 0.733 | 0.267 |
    | 17 | 11 | 0 | 3 | 1 | 14 | 1 | 0.786 | 0.214 |
    | 22 | 7 | 1 | 2 | 1 | 9 | 2 | 1.556 | 0.444 |
    | 23 | 6 | 1 | 2 | 0 | 8 | 1 | 0.750 | 0.250 |
    | Total | | 9 | | 14 | | | 14.488 | 8.512 |

    $\chi^2 = \frac{(9 - 14.488)^2}{14.488} + \frac{(14 - 8.512)^2}{8.512} = 5.617$

    The p-value and the critical value are obtained in R as follows: 1 - pchisq(5.617, 1) returns [1] 0.01778707, and qchisq(0.95, 1) returns [1] 3.841459.
    Since 0.018 < 0.05 (equivalently, 5.617 > 3.841), we reject the null hypothesis that the two survival functions are identical.
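The expected counts, the statistic of formula (1), and the p-value for Example-2 can be reproduced from the combined table. This is a minimal sketch in plain Python rather than the R calls of the slide; the variable names are illustrative, and the p-value line relies on the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which holds for 1 df only.

```python
import math

# (t, N1, O1, N2, O2) rows copied from the combined table of Example-2;
# group 1 = treatment (6-MP), group 2 = placebo.
table = [(1, 21, 0, 21, 2), (2, 21, 0, 19, 2), (4, 21, 0, 16, 1), (5, 21, 0, 14, 1),
         (6, 21, 3, 14, 0), (7, 17, 1, 14, 0), (8, 17, 0, 12, 3), (10, 15, 1, 12, 0),
         (11, 15, 0, 8, 1), (12, 15, 0, 6, 1), (13, 12, 1, 6, 0), (15, 12, 0, 4, 1),
         (16, 11, 1, 4, 0), (17, 11, 0, 3, 1), (22, 7, 1, 2, 1), (23, 6, 1, 2, 0)]

obs1 = sum(o1 for _, _, o1, _, _ in table)
obs2 = sum(o2 for _, _, _, _, o2 in table)
# Expected events per group at each time: E_it = N_it * O_t / N_t, summed over t.
exp1 = sum(n1 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in table)
exp2 = sum(n2 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in table)

# Formula (1): sum over groups of (observed - expected)^2 / expected.
chi2 = (obs1 - exp1) ** 2 / exp1 + (obs2 - exp2) ** 2 / exp2
# Upper-tail probability of a chi-square variable with 1 df.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"O1={obs1}, E1={exp1:.3f}, O2={obs2}, E2={exp2:.3f}")
print(f"chi2={chi2:.3f}, p={p_value:.4f}")
```

With more than two groups the statistic has r − 1 df and the `erfc` shortcut no longer applies; a general chi-square tail function would be needed instead.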
  • 28. Log-rank test (example-2) (R output)
    The formula of the log-rank test statistic (in the case of two groups) used by R and many other software packages such as STATA is:
    $\chi^2 = \frac{(O_1 - E_1)^2}{\widehat{Var}(O_1 - E_1)} = \frac{\left[\sum_t (O_{1t} - E_{1t})\right]^2}{\hat{V}}$    (2)
    This statistic is distributed as $\chi^2$ with 1 df under H0, where:
    $\hat{V} = \sum_t \frac{N_{1t} N_{2t} O_t (N_t - O_t)}{N_t^2 (N_t - 1)}$.
    Formula (2) is the original one, while (1) is an approximate version. The statistic given by formula (1) is slightly smaller than the log-rank statistic given by formula (2).
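Formula (2) can be applied to the same at-risk and event counts that were tabulated for formula (1) on slide 27. The sketch below is plain Python with illustrative variable names; it works from the slide's combined table, not from the raw data that R would use, so it need not match the R output digit for digit.

```python
# (t, N1, O1, N2, O2) rows copied from the combined table of Example-2 (slide 27).
table = [(1, 21, 0, 21, 2), (2, 21, 0, 19, 2), (4, 21, 0, 16, 1), (5, 21, 0, 14, 1),
         (6, 21, 3, 14, 0), (7, 17, 1, 14, 0), (8, 17, 0, 12, 3), (10, 15, 1, 12, 0),
         (11, 15, 0, 8, 1), (12, 15, 0, 6, 1), (13, 12, 1, 6, 0), (15, 12, 0, 4, 1),
         (16, 11, 1, 4, 0), (17, 11, 0, 3, 1), (22, 7, 1, 2, 1), (23, 6, 1, 2, 0)]

# Numerator of formula (2): sum over t of (O_1t - E_1t), with E_1t = N_1t * O_t / N_t.
diff = sum(o1 - n1 * (o1 + o2) / (n1 + n2) for _, n1, o1, n2, o2 in table)

# V-hat: sum over t of N_1t * N_2t * O_t * (N_t - O_t) / (N_t^2 * (N_t - 1)).
var = sum(n1 * n2 * (o1 + o2) * (n1 + n2 - o1 - o2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
          for _, n1, o1, n2, o2 in table)
chi2_v2 = diff ** 2 / var

# Formula (1) for comparison; total expected equals total observed, so
# E1 = O1 - diff and E2 = O2 + diff.
obs1 = sum(o1 for _, _, o1, _, _ in table)
obs2 = sum(o2 for _, _, _, _, o2 in table)
chi2_v1 = diff ** 2 / (obs1 - diff) + diff ** 2 / (obs2 + diff)

print(f"formula (1): {chi2_v1:.3f}   formula (2): {chi2_v2:.3f}")
```

On these counts formula (1) gives the smaller value, in line with the remark that it is an approximation of formula (2).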
  • 30. Comparing survival functions: Hazard Ratio (HR)
    ▪ To compare two independent groups (exposed vs. unexposed, or treatment vs. control), the hazard ratio (HR) is used. The hazard ratio is estimated from the data organized for the log-rank test.
    ▪ Specifically, the hazard ratio is the ratio of the total number of observed to expected events in the two independent comparison groups:
      $HR = \frac{\sum_t O_{Exposed,t} / \sum_t E_{Exposed,t}}{\sum_t O_{Unexposed,t} / \sum_t E_{Unexposed,t}} = \frac{\sum_t O_{Treatment,t} / \sum_t E_{Treatment,t}}{\sum_t O_{Control,t} / \sum_t E_{Control,t}}$
    ▪ In Example 2, to compare the survival functions of the treatment and placebo groups, one can use the hazard ratio as follows:
      $HR = \frac{\sum_t O_{Placebo,t} / \sum_t E_{Placebo,t}}{\sum_t O_{Treatment,t} / \sum_t E_{Treatment,t}} = \frac{14/8.512}{9/14.488} = 2.65$
      Thus, participants in the placebo group have 2.65 times the risk of going out of remission (relapse) compared with participants in the treatment group.
  • 32. Procedures for analyzing time-to-event
    • Kaplan-Meier curves and log-rank tests are examples of univariate analysis. They describe survival according to the one factor under investigation but ignore the impact of any others.
    • Additionally, Kaplan-Meier curves and log-rank tests are useful only when the predictor variable is categorical (e.g., treatment A vs. treatment B; males vs. females). They do not work easily for quantitative predictors such as gene expression, weight, or age.
    • The Cox PH model is the most commonly used technique for analyzing survival data; it allows one to include multiple covariates simultaneously and to assess their effects.
  • 33. Procedures for analyzing time-to-event
    ▪ When the research interest lies in modeling the relationship between the time to event and a set of explanatory variables (or risk factors), regression models are needed.
    ▪ Two types of regression models can be used:
      1. Semiparametric models: these do not require us to specify a parametric form for the baseline hazard (defined as the hazard at time t for observations with all predictors equal to zero).
      2. Parametric models: parametric distributional assumptions are made about the hazard function.
    ▪ One of the most popular models of the first type is the proportional hazards model, or Cox regression:
      $h(t, \mathbf{X}) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$
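A consequence of the Cox model form is that the ratio of hazards for two covariate profiles does not depend on the baseline hazard $h_0(t)$, which cancels. The short Python sketch below illustrates this; the function name `hazard_ratio` and the coefficient values are hypothetical and are not fitted to any data in these slides.

```python
import math

def hazard_ratio(beta, x_a, x_b):
    """h(t, x_a) / h(t, x_b) under the Cox model; the baseline hazard h0(t) cancels."""
    return math.exp(sum(b * (xa - xb) for b, xa, xb in zip(beta, x_a, x_b)))

# Hypothetical coefficients for two covariates (illustration only, not fitted values).
beta = [0.7, 0.3]

# Two profiles that differ only in X1 (1 vs. 0) and share the same X2.
print(hazard_ratio(beta, [1, 2.0], [0, 2.0]))  # equals exp(0.7), whatever t is
```

Because time does not enter the ratio, the hazards of the two profiles stay proportional over the whole follow-up, which is the proportional hazards assumption.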
  • 34. Cox Regression Model Example: These data involve two groups of leukemia patients, with 21 patients in each group. Group 1 is the treatment group, and group 2 is the placebo group. The data set also contains the variable log WBC, which is a well-known prognostic indicator of survival for leukemia patients