This document discusses the analysis of data from a study in which rats were exposed to different doses of a potential carcinogen. Rats were followed until a tumor developed or until day 104. The study asks whether exposed rats develop tumors faster than unexposed rats. The document covers the two main challenges of time-to-event data, censoring and lack of symmetry, and the key methods: hazards and hazard ratios, logrank tests, and comparison of tumor rates between groups while accounting for varying follow-up times.
2. Rats Data
Rats treated with a potential
carcinogen
Control rats – no exposure
Different doses
Low dose
Medium dose
High dose
Rats followed until a tumor developed or
until day 104 (about 15 weeks)
3. Rats Data
Experiment terminated at 104 days
and any tumor free rats are sacrificed
Censoring
If a rat escapes or is hurt during
follow-up, follow-up is terminated
Censoring
Question – do exposed rats develop
tumors at a faster rate than
unexposed?
4. Time-to-Event Data
Many different types of studies collect data
on the length of time until a particular
event happens
Time to heart attack or stroke in the
Framingham Heart Study
Time to AIDS in treatment studies with HIV-
infected persons
Time to return to work following a workplace
injury
Time to disease onset in hamsters following
exposure to C. difficile
5. Time-to-Event Data
Time-to-event data has 2
characteristics that complicate its
analysis
Times to event are not symmetrically
distributed
Not every person/hamster/rat/specimen
has the event while under observation
6. Lack of Symmetry
[Figure: four histograms of samples, with panels temp.norm, temp.logs, temp.exp and temp.weib]
The top 2 histograms (temp.norm, temp.logs) show
samples from symmetric distributions
The bottom 2 histograms (temp.exp, temp.weib) show
samples from right-skewed distributions
7. Lack of Symmetry
When data aren’t symmetric the usual
statistics don’t work well
The mean is no longer in the center of the
sample
With right-skewed data it is pulled to the right,
toward the long tail
The standard deviation isn’t very useful
Measuring one standard deviation to the right
from the mean captures a lower percentage of the
population than measuring one standard
deviation to the left
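A small simulation can make these two points concrete. The sketch below draws a right-skewed sample and checks where the mean falls relative to the median, and how much of the sample lies within one standard deviation on each side of the mean. The exponential distribution with mean 10, the seed and the sample size are all assumptions chosen for illustration; none of this is data from the rat study.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

# Illustrative right-skewed sample: exponential with mean 10 (an assumption,
# not data from the rat study).
sample = [random.expovariate(1 / 10) for _ in range(10_000)]

mean = statistics.fmean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)

# Share of the sample within one SD to the left and to the right of the mean.
left_share = sum(mean - sd <= x < mean for x in sample) / len(sample)
right_share = sum(mean <= x < mean + sd for x in sample) / len(sample)

print(f"mean={mean:.2f}  median={median:.2f}  sd={sd:.2f}")
print(f"within 1 SD left of mean:  {left_share:.1%}")
print(f"within 1 SD right of mean: {right_share:.1%}")
```

For an exponential distribution the theoretical shares are roughly 63% on the left and 23% on the right, so the simulated shares should land close to those values.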
9. Lack of Symmetry
Ways to cope with a lack of symmetry
Transform the data using a log or square root
This tends to make the distribution more
symmetric
Use robust or non-parametric methods
Medians instead of means
Inter-quartile ranges instead of standard
deviations
Rank-based Wilcoxon tests instead of t-tests
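As a rough illustration of the first two coping strategies, the sketch below log-transforms a skewed sample and compares the skewness before and after, then reports the median and inter-quartile range. The log-normal sample and the helper names `skewness` and `iqr` are assumptions made for this example only.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable

# Illustrative right-skewed sample: log-normal times (assumed, not study data).
times = [random.lognormvariate(3.0, 0.8) for _ in range(5_000)]
log_times = [math.log(t) for t in times]


def skewness(values):
    """Sample skewness: the average cubed z-score (near 0 if symmetric)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def iqr(values):
    """Inter-quartile range computed from the sample quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


print(f"skewness of raw times: {skewness(times):.2f}")
print(f"skewness of log times: {skewness(log_times):.2f}")
print(f"median={statistics.median(times):.1f}  IQR={iqr(times):.1f}")
```

The raw times should show a clearly positive skewness, while the logged times should be close to zero, which is the sense in which the transformation makes the distribution more symmetric.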
10. Lack of Symmetry
In survival analysis, there is an
emphasis on use of non-parametric
methods because of the skewness
However, it is also possible to use
non-symmetric distributions to cope
with the lack of symmetry
Exponential distribution
Weibull distribution
11. Censoring
The fact that not every
person/hamster/rat/specimen has the
event while under observation is a much
harder problem
This is called censoring
Censoring is what prevents us from
being able to use standard methods
on time-to-event data
12. Censoring
Censoring creates a difference
between
What we get to see on everyone – the
follow-up time
Time to the event or censoring
What we want to see on everyone – the
time to event
13. Censoring
The follow-up times are always less
than or equal to the time-to-event
This is a big problem
The follow-up times are systematically
too short
The mean follow-up time is less than the
mean time-to-event
The median follow-up time is less than
the median time-to-event
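The shortfall can be seen in a small simulation. The sketch below generates event times and independent censoring times, records the follow-up time as whichever comes first, and compares the summaries. The exponential event times (mean 50 days) and the uniform censoring times on 0 to 104 days are hypothetical choices, not estimates from the rat study.

```python
import random
import statistics

random.seed(3)  # arbitrary seed so the run is repeatable
n = 5_000

# Hypothetical setup: exponential event times (mean 50 days) and independent
# uniform censoring times between day 0 and day 104.
event_times = [random.expovariate(1 / 50) for _ in range(n)]
censor_times = [random.uniform(0, 104) for _ in range(n)]

# What is actually observed: the earlier of the event and the censoring time,
# plus a flag saying which of the two it was.
follow_up = [min(t, c) for t, c in zip(event_times, censor_times)]
had_event = [t <= c for t, c in zip(event_times, censor_times)]

mean_event = statistics.fmean(event_times)
mean_follow_up = statistics.fmean(follow_up)
median_event = statistics.median(event_times)
median_follow_up = statistics.median(follow_up)

print(f"events observed: {sum(had_event)} of {n}")
print(f"mean:   time-to-event={mean_event:.1f}  follow-up={mean_follow_up:.1f}")
print(f"median: time-to-event={median_event:.1f}  follow-up={median_follow_up:.1f}")
```

In a real study only `follow_up` and `had_event` are available; `event_times` is visible here only because the data are simulated.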
14. Censoring
The figure shows a
study with 7 subjects.
Circles show when
subjects had events
and squares show
when they were
censored. The solid
blue lines are the
follow-up times, and
the red dotted lines
are the times from
censoring until the
event.
median follow-up time is 2.5 years
median survival time is 4.0 years
15. Censoring
The more censoring there is, the
worse the discrepancy between
follow-up time and time to event
Need to remove the effect of
censoring from the follow-up times to
get a sense of time-to-event
This is what survival analysis is all
about
16. Hazard
In longitudinal studies, data is collected on
a sample of individuals over time
Often we are interested in the occurrence
of an important event in our sample
Death
Heart attack
Tumor
17. Risk
If our interest is in whether subjects
ever experience the event in the
study
Analysis of risk of the event
Based on the proportion of subjects
who have the event
Unadjusted analysis done using
contingency tables and chi-square tests
Adjusted analysis using logistic
regression
18. Hazard
If our interest is in how long until
subjects experience the event in the
study
Analysis of the hazard of the event
Based on the rate of occurrence of the
event
The difference between risk and hazard
is that the hazard incorporates time
19. Hazard
A risk analysis is based on the
number of events per person
A hazard analysis is based on the
number of events per person-year (or
other person-time measure)
A hazard analysis would be more
appropriate if the length of follow-up
varies
20. Hazard
Risk: 2 out of 6
subjects have the
event (33%)
Hazard: 2 events
per 38 person-
weeks (0.05 events
per person-week)
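The arithmetic behind these two numbers can be sketched as follows. The totals (2 events, 6 subjects, 38 person-weeks) come from the slide; the per-subject records are hypothetical, invented only so that they add up to the same totals and show how the person-time denominator is built. The function names are also illustrative.

```python
def risk(n_events, n_subjects):
    """Risk: events per person."""
    return n_events / n_subjects


def hazard_rate(n_events, person_time):
    """Hazard under a constant rate: events per unit of person-time."""
    return n_events / person_time


# Totals taken from the slide.
slide_risk = risk(2, 6)
slide_hazard = hazard_rate(2, 38)
print(f"risk   = {slide_risk:.0%}")
print(f"hazard = {slide_hazard:.2f} events per person-week")

# Hypothetical per-subject records (weeks followed, had event?), chosen only
# to reproduce the slide totals: two events, one withdrawal, three completers.
records = [(3, True), (5, True), (6, False), (8, False), (8, False), (8, False)]

total_person_weeks = sum(weeks for weeks, _ in records)
total_events = sum(1 for _, event in records if event)
print(f"{total_events} events over {total_person_weeks} person-weeks")
```

Every subject contributes follow-up time to the denominator of the hazard, whether or not the event was observed.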
21. Censoring
In the example we just saw, one subject withdrew
and did not complete the study or have the event
Some other subjects completed the study but never
had the event
In a hazard analysis, both types of subject are
considered censored and their follow-up time is used
in the denominator
In a risk analysis, incorporating the first type of
subject requires some arbitrary rules
Restricting analysis to subjects who do not
withdraw
Assigning the event (or absence of event) to
subjects who withdraw
22. Hazard
A person-years analysis is fine as long
as we assume that the hazard rate is
constant
Constant hazard -> exponential
distribution
Most modern survival analysis is done in
a way to avoid assuming that the hazard
rate is constant over time
Makes the notation and terminology more abstract
Analyses are more complicated than taking the ratio
of events to person-time
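For reference, the link between a constant hazard and the exponential distribution can be written out. The symbols are introduced here for illustration and do not appear in the slides: $\lambda$ is the constant hazard, $S(t)$ is the probability of remaining event-free past time $t$, $d$ is the number of observed events, and $t_i$ is the follow-up time of subject $i$.

```latex
h(t) = \lambda
\quad\Longrightarrow\quad
S(t) = \exp\!\left(-\int_0^t \lambda \, du\right) = e^{-\lambda t},
\qquad
\hat{\lambda} = \frac{d}{\sum_{i=1}^{n} t_i}
             = \frac{\text{number of events}}{\text{total person-time}}
```

The estimate on the right is the usual maximum-likelihood estimate under exponential survival with censoring, which is why the person-time analysis and the exponential model go together.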
23. Hazard
To move away from assuming a
constant hazard over time,
calculate the hazard at a particular
time t
Look only at subjects still eligible to have the
event at t (no event or censoring before t)
The hazard at t is the person-time event
rate for a very short period of time
starting at t (limit as the period of time
shrinks to 0)
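In symbols, writing $T$ for a subject's time to event and $\Delta t$ for the length of the short period (notation introduced here, not taken from the slides), the definition above reads:

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr\left(t \le T < t + \Delta t \,\middle|\, T \ge t\right)}{\Delta t}
```

The conditioning on $T \ge t$ is the "eligible to have the event at t" restriction, and dividing by $\Delta t$ turns the probability into a rate per unit of time.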
24. Hazard Ratio
If we follow two groups over time, we
may want to compare their hazard
rates
The hazard ratio is the hazard (at
time t) in group 1 divided by the
hazard (at time t) in group 2
$$\text{Hazard Ratio}(t) = \frac{\text{Group 1 hazard}(t)}{\text{Group 2 hazard}(t)}$$
25. Hazard Ratio
The hazard ratio has similar
construction to the relative risk,
Except that the hazards can change
with time
To simplify the models, we often
assume Proportional Hazards
i.e. the hazard ratio is constant over
time
26. Hazard Ratio
Under proportional hazards
Although the hazards within each group
may change over time
The ratio of the hazards between groups
stays the same over time
$$\text{Hazard Ratio} = \frac{\text{Group 1 hazard}(t)}{\text{Group 2 hazard}(t)} \quad \text{(the same value at every time } t\text{)}$$
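A quick numerical check can show what proportional hazards looks like. The sketch below uses two hypothetical groups with Weibull hazards that share a shape parameter but have different scales; these parameter values are assumptions picked for illustration, not estimates from the rat data. Each group's hazard rises over time, yet the ratio between them stays fixed.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical groups: same shape (hazard increases with time), different scale.
shape = 1.5
scale_group1 = 60.0
scale_group2 = 90.0

days = [10, 40, 70, 100]
hazards1 = [weibull_hazard(t, shape, scale_group1) for t in days]
hazards2 = [weibull_hazard(t, shape, scale_group2) for t in days]
ratios = [h1 / h2 for h1, h2 in zip(hazards1, hazards2)]

for t, h1, h2, hr in zip(days, hazards1, hazards2, ratios):
    print(f"day {t:3d}: group 1 hazard={h1:.4f}  group 2 hazard={h2:.4f}  ratio={hr:.3f}")
```

If the two groups had different shape parameters, the ratio would drift with time and the proportional hazards assumption would not hold.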
28. Testing Survival Difference
Nonparametric tests are typically
used to test for survival differences
between groups
Logrank test – best test if the hazards
are proportional between groups
Gives equal weight to events that happen
throughout follow-up
Wilcoxon test – gives more weight to
events early in follow-up
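To show the mechanics of the logrank test, here is a minimal two-group sketch in plain Python. At each distinct event time it compares the observed number of events in group 1 with the number expected if both groups shared the same hazard, then combines the differences into a chi-square statistic with 1 degree of freedom. The function name and the example data are hypothetical, and the sketch covers two groups only.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-group logrank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e})

    observed1 = expected1 = variance = 0.0
    for t in event_times:
        # Subjects still being followed just before time t (the risk set).
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)
        n2 = sum(1 for u, _, g in data if u >= t and g == 2)
        n = n1 + n2
        # Events occurring exactly at time t, in group 1 and overall.
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)
        d = sum(1 for u, e, _ in data if u == t and e)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no informative event times to compare")
    chi2 = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Hypothetical data: follow-up in days, event flag 1 = tumor, 0 = censored.
group_a_times = [12, 25, 31, 40, 47, 60, 60, 60]
group_a_events = [1, 1, 0, 1, 1, 0, 0, 0]
group_b_times = [30, 44, 52, 60, 60, 60, 60, 60]
group_b_events = [1, 0, 1, 0, 0, 0, 0, 0]

chi2, p_value = logrank_test(group_a_times, group_a_events,
                             group_b_times, group_b_events)
print(f"logrank chi-square={chi2:.3f}  p={p_value:.3f}")
```

Every event time gets the same weight in this statistic. A Wilcoxon-type version would instead weight each time's contribution by the number of subjects still at risk, which emphasizes early events.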
29. Testing Survival Difference
Parametric test assuming exponential
survival is also sometimes used
Best test if the hazard is constant within
the groups being compared
Goes along with the person-time analysis
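One common form of such a parametric test is a likelihood-ratio test comparing the two groups' person-time event rates. The slides do not say which form was used, so the sketch below is an assumption about one reasonable choice; the function name and the example counts are hypothetical.

```python
import math


def exponential_lr_test(events1, person_time1, events2, person_time2):
    """Likelihood-ratio test that two groups share one constant hazard.

    Returns (statistic, p-value); the statistic is referred to a chi-square
    distribution with 1 degree of freedom.
    """
    def loglik_at_mle(d, pt):
        # Exponential log-likelihood d*log(rate) - rate*pt at rate = d/pt.
        return d * math.log(d / pt) - d if d > 0 else 0.0

    separate = loglik_at_mle(events1, person_time1) + loglik_at_mle(events2, person_time2)
    pooled = loglik_at_mle(events1 + events2, person_time1 + person_time2)
    stat = max(2 * (separate - pooled), 0.0)  # guard against rounding below 0
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 df
    return stat, p_value


# Hypothetical counts: events and person-weeks of follow-up in each group.
stat, p_value = exponential_lr_test(12, 400.0, 5, 450.0)
print(f"rate 1={12 / 400.0:.4f}  rate 2={5 / 450.0:.4f} events per person-week")
print(f"LR statistic={stat:.3f}  p={p_value:.3f}")
```

The test uses only the event count and total person-time in each group, which is why it goes along with the person-time analysis and shares its constant-hazard assumption.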
36. Survival Comparison
Comparing the 4 groups
Logrank test p=0.095
Wilcoxon test p=0.311
Exponential test p=0.171
So, we don’t find a difference between
groups that couldn’t be attributed to
chance
But the logrank result (p=0.095) suggests
that a larger experiment might reach
statistical significance
37. Analyzing Time-to-Event Outcomes
Becoming more and more common
Challenge mostly due to censoring
Most basic analysis – assume exponential
survival; the hazard is estimated by
number of events / total person-time
Modern methods avoid assuming
exponential data
Logrank test