This document discusses analyzing data from a study where rats were exposed to different doses of a potential carcinogen. Rats were followed until a tumor developed or until day 104. The study aims to determine whether exposed rats develop tumors faster than unexposed rats. The document covers challenges in analyzing time-to-event data, such as censoring and lack of symmetry, and key methods including hazard ratios, log-rank tests, and comparing tumor rates between groups while accounting for varying follow-up times.
2. Rats Data
Rats treated with a potential carcinogen
Control rats – no exposure
Different doses: low dose, medium dose, high dose
Rats followed until a tumor developed or until day 104 (15 weeks)
3. Rats Data
Experiment terminated at 104 days, and any tumor-free rats are sacrificed – censoring
If a rat escapes or is hurt during follow-up, follow-up is terminated – censoring
Question: do exposed rats develop tumors at a faster rate than unexposed rats?
4. Time-to-Event Data
Many different types of studies collect data on the length of time until a particular event happens:
Time to heart attack or stroke in the Framingham Heart Study
Time to AIDS in treatment studies with HIV-infected persons
Time to return to work following a workplace injury
Time to disease onset in hamsters following exposure to C. difficile
5. Time-to-Event Data
Time-to-event data has two characteristics that complicate its analysis:
Times to event are not symmetrically distributed
Not every person/hamster/rat/specimen has the event under observation
6. Lack of Symmetry
[Figure: four bar charts labeled temp.norm, temp.logs, temp.exp, and temp.weib. The top 2 bar charts show samples from symmetric distributions; the bottom 2 bar charts show samples from right-skewed distributions.]
7. Lack of Symmetry
When data aren't symmetric, the usual statistics don't work well:
The mean is no longer in the center of the sample; it is shifted to the right
The standard deviation isn't very useful: measuring one standard deviation to the right of the mean covers a lower percentage of the population than measuring one standard deviation to the left
9. Lack of Symmetry
Ways to cope with a lack of symmetry (a short sketch follows this list):
Transform the data using a log or square root – this tends to make the distribution more symmetric
Use robust or non-parametric methods: medians instead of means, inter-quartile ranges instead of standard deviations, rank-based Wilcoxon tests instead of t-tests
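As an illustration (mine, not from the slides), here is a minimal Python sketch of both coping strategies on a simulated right-skewed sample; the distribution and seed are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=2.0, sigma=0.8, size=1000)  # a right-skewed sample

# The mean is pulled to the right of the median by the long right tail
print(np.mean(x), np.median(x))

# Robust summaries: median and inter-quartile range
q1, q3 = np.percentile(x, [25, 75])
print(np.median(x), q3 - q1)

# A log transform makes this distribution symmetric:
# sample skewness drops from strongly positive to near 0
print(stats.skew(x), stats.skew(np.log(x)))
```

For the testing side, scipy's `ranksums` (the Wilcoxon rank-sum test) is the rank-based alternative to the two-sample t-test mentioned in the list.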
10. Lack of Symmetry
In survival analysis, there is an emphasis on the use of non-parametric methods because of the skewness
However, it is also possible to use non-symmetric distributions to cope with the lack of symmetry:
Exponential distribution
Weibull distribution
11. Censoring
That not every person/hamster/rat/specimen has the event under observation is a much harder problem
This is called censoring
Censoring is what prevents us from being able to use standard methods on time-to-event data
12. Censoring
Censoring creates a difference between:
What we get to see on everyone – the follow-up time (time to the event or censoring)
What we want to see on everyone – the time to event
13. Censoring
The follow-up times are always less than or equal to the times-to-event
This is a big problem – the follow-up times are systematically too short:
The mean follow-up time is less than the mean time-to-event
The median follow-up time is less than the median time-to-event
(A simulation sketch follows.)
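A quick simulation (my own sketch, with arbitrary distributions) makes this bias concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
event_time = 5.0 * rng.weibull(1.5, size=n)   # true times-to-event (years)
censor_time = rng.uniform(0, 8, size=n)       # censoring times (dropout or study end)

# What we actually observe is the earlier of the two
follow_up = np.minimum(event_time, censor_time)

# Follow-up times are systematically too short
print(np.mean(follow_up), np.mean(event_time))      # mean follow-up < mean time-to-event
print(np.median(follow_up), np.median(event_time))  # median follow-up < median time-to-event
```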
14. Censoring
[Figure: a study with 7 subjects. Circles show when subjects had events and squares show when they were censored. The solid blue lines are the follow-up times, and the red dotted lines are the times from censoring until the event.]
Median follow-up time is 2.5 years; median survival time is 4.0 years
15. Censoring
The more censoring there is, the worse the discrepancy between follow-up time and time-to-event
We need to remove the effect of censoring from the follow-up times to get a sense of the time-to-event
This is what survival analysis is all about
16. Hazard
In longitudinal studies, data are collected on a sample of individuals over time
Often we are interested in the occurrence of an important event in our sample:
Death
Heart attack
Tumor
17. Risk
If our interest is in whether subjects ever experience the event in the study:
Analysis of the risk of the event
Based on the proportion of subjects who have the event
Unadjusted analysis done using contingency tables and chi-square tests (see the sketch after this list)
Adjusted analysis using logistic regression
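For the unadjusted risk analysis, here is a minimal sketch with scipy; the 2x2 counts are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = event / no event
table = np.array([[12, 38],
                  [ 5, 45]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Risk in each group = events / group size
print(table[:, 0] / table.sum(axis=1))
```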
18. Hazard
If our interest is in how long until subjects experience the event in the study:
Analysis of the hazard of the event
Based on the rate of occurrence of the event
The difference between risk and hazard is that the hazard incorporates time
19. Hazard
A risk analysis is based on the number of events per person
A hazard analysis is based on the number of events per person-year (or other person-time measure)
A hazard analysis would be more appropriate if the length of follow-up varies
20. Hazard
Risk: 2 out of 6 subjects have the event (33%)
Hazard: 2 events per 38 person-weeks (0.05 events per person-week)
(A worked sketch of this calculation follows.)
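Only the totals on this slide (6 subjects, 2 events, 38 person-weeks) are given; the individual follow-up times below are hypothetical values chosen to match them:

```python
import numpy as np

# Hypothetical follow-up times (weeks); they sum to the slide's 38 person-weeks
follow_up_weeks = np.array([5, 6, 7, 8, 4, 8])
had_event       = np.array([1, 0, 1, 0, 0, 0])   # the slide's 2 events

risk   = had_event.sum() / len(had_event)          # 2/6  = 0.33
hazard = had_event.sum() / follow_up_weeks.sum()   # 2/38 = 0.05 per person-week

print(f"risk = {risk:.2f}, hazard = {hazard:.3f} events per person-week")
```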
21. Censoring
In the example we just saw, one subject withdrew and did not complete the study or have the event
Some other subjects completed the study but never had the event
In a hazard analysis, both types of subject are considered censored, and their follow-up time is used in the denominator
In a risk analysis, incorporating the first type of subject requires some arbitrary rules:
Restricting analysis to subjects who do not withdraw
Assigning the event (or absence of event) to subjects who withdraw
22. Hazard
A person-years analysis is fine as long as we assume that the hazard rate is constant
Constant hazard -> exponential distribution
Most modern survival analysis is done in a way that avoids assuming the hazard rate is constant over time:
Makes the notation and terminology more abstract
Analyses are more complicated than taking the ratio of events to person-time
23. Hazard
To move away from having a constant hazard over time, to calculate the hazard at a particular time t:
Look only at subjects eligible to have the event at t
The hazard at t is the person-time event rate for a very short period of time starting at t (the limit as the period of time shrinks to 0)
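In standard notation, the limit this slide describes is the hazard function of the event time T:

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P\left(t \le T < t + \Delta t \mid T \ge t\right)}{\Delta t}
```

The conditioning on T >= t is the 'look only at subjects still eligible at t' step.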
24. Hazard Ratio
If we follow two groups over time, we may want to compare their hazard rates
The hazard ratio is the hazard (at time t) in group 1 divided by the hazard (at time t) in group 2:
Hazard Ratio(t) = Group 1 hazard(t) / Group 2 hazard(t)
25. Hazard Ratio
The hazard ratio has a similar construction to the relative risk, except that the hazards can change with time
To simplify the models, we often assume proportional hazards, i.e., the hazard ratio is constant over time
26. Hazard Ratio
Under proportional hazards:
Although the hazards within each group may change over time
The ratio of the hazards between groups stays the same over time:
Hazard Ratio = Group 1 hazard(t) / Group 2 hazard(t), constant in t
(A numeric sketch follows.)
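A numeric sketch (mine) using Weibull hazards shows what this looks like: each group's hazard changes with t, but their ratio does not. With a common shape parameter, the ratio works out to (scale2/scale1)**shape:

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Weibull hazard: h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

t = np.array([0.5, 1.0, 2.0, 5.0])
h1 = weibull_hazard(t, shape=1.5, scale=2.0)  # group 1: hazard rises with t
h2 = weibull_hazard(t, shape=1.5, scale=3.0)  # group 2: hazard also rises with t

# Each hazard changes over time, but their ratio is constant:
print(h1 / h2)  # the same value at every t: (3.0/2.0)**1.5, about 1.84
```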
28. Testing Survival Difference
Nonparametric tests are typically used to test for survival differences between groups:
Logrank test – the best test if the hazards are proportional between groups; gives equal weight to events that happen throughout follow-up
Wilcoxon test – gives more weight to events early in follow-up
(A code sketch of the logrank test follows.)
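If the data were in Python, the logrank test could be run with the lifelines package; this sketch assumes durations and event indicators are available as arrays, and all the values are hypothetical:

```python
import numpy as np
from lifelines.statistics import logrank_test

# Hypothetical follow-up times (days) and event indicators (1 = tumor, 0 = censored)
time_exposed  = np.array([34, 55, 70, 104, 104, 89])
event_exposed = np.array([ 1,  1,  1,   0,   0,  1])
time_control  = np.array([104, 104, 90, 104, 76, 104])
event_control = np.array([  0,   0,  1,   0,  1,   0])

result = logrank_test(time_exposed, time_control,
                      event_observed_A=event_exposed,
                      event_observed_B=event_control)
print(result.p_value)
```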
29. Testing Survival Difference
A parametric test assuming exponential survival is also sometimes used
It is the best test if the hazard is constant within the groups being compared
It goes along with the person-time analysis (sketch below)
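One simple version of such a test (not necessarily the exact variant used in these slides) compares the two constant-hazard estimates d/T on the log scale, using the standard approximation Var(log rate) = 1/d; the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical events and total person-time in each group
d1, T1 = 14, 950.0    # exposed:   events, person-days
d2, T2 = 7, 1200.0    # unexposed: events, person-days

rate1, rate2 = d1 / T1, d2 / T2   # constant-hazard (exponential) estimates
z = (np.log(rate1) - np.log(rate2)) / np.sqrt(1 / d1 + 1 / d2)
p = 2 * norm.sf(abs(z))           # two-sided p-value
print(f"hazard ratio = {rate1 / rate2:.2f}, p = {p:.3f}")
```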
36. Survival Comparison
Comparing the 4 groups:
Logrank test p=0.095
Wilcoxon test p=0.311
Exponential test p=0.171
So we don't find a difference between groups that couldn't be attributed to chance
But the logrank result is suggestive that with a larger experiment we might attain statistical significance
(A sketch of the 4-group logrank comparison follows.)
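A 4-group logrank comparison like this one can be run in lifelines with `multivariate_logrank_test`; the rows below are hypothetical stand-ins for the actual rats data:

```python
import pandas as pd
from lifelines.statistics import multivariate_logrank_test

# Hypothetical long-format data: one row per rat
df = pd.DataFrame({
    "time":  [104, 88, 104, 62, 104, 70, 45, 104],  # days of follow-up
    "event": [  0,  1,   0,  1,   0,  1,  1,   0],  # 1 = tumor, 0 = censored
    "group": ["control", "control", "low", "low",
              "medium", "medium", "high", "high"],
})

result = multivariate_logrank_test(df["time"], df["group"], df["event"])
print(result.p_value)
```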
37. Analyzing Time-to-Event Outcomes
Becoming more and more common
The challenge is mostly due to censoring
Most basic analysis – exponential: the hazard is estimated by #events / total person-time
Modern methods avoid assuming exponential data:
Logrank test