Introduction To Survival Analysis

An introduction to the analysis of duration data, with particular insight into biomedical use

Published in: Health & Medicine

  1. Special Topics in Biostatistics: An Introduction to Survival Data Analysis
     Federico Rotolo (federico.rotolo@stat.unipd.it, federico.rotolo@uclouvain.be)
     PhD student at Dipartimento di Scienze Statistiche, Università degli Studi di Padova
     Visiting PhD student at Institut de Statistique, Biostatistique et Sciences Actuarielles, Université Catholique de Louvain
     March 30, 2011
  2. Outline (F. Rotolo, STiB: Survival Data Analysis)
     - Survival Analysis
       - An example
       - Peculiarities of Survival Data
       - Notation and Basic Functions
       - Survival Likelihood
       - Parametric models
       - Non-Parametric models
       - Regression
     - Complications of Survival Models
       - Non-proportional hazards
       - Informative censoring
       - Dependent observations
       - Multi-state phenomena
       - Competing Risks
     - References
  5. What is Survival Analysis?
     The field of statistics that provides tools for handling duration data, i.e. continuous, positive numerical variables measuring the time from an origin event until the occurrence of an event of interest.
     Why "Survival" Analysis?
     The first works on this topic originated from the problem of studying death times, that is, times from birth to death.
     Many ad hoc statistical tools have been developed for survival data (Cox model, Kaplan–Meier estimator, Mantel–Haenszel test, etc.), and research interest in such problems has been increasing.
     Why is Survival Data Analysis so peculiar?
  7. An example
     Consider a clinical trial on patients who have undergone surgical removal of a tumour. One can be interested in
     - M: the level of a tumour marker after 6 months
     - T: the time until recurrence of the disease
     In both cases the measured variable is continuous, numerical and positive, so there is no apparent difference.
  9. An example
     In practice, other events can disturb the experiment before the variable of interest is observed: the patient dies, drops out of the study, migrates, develops another disease, the study ends, etc. In such cases
     - M is missing
     - T is missing, but we know that $T > s$, with $s$ the time of the "disturbing event"
  12. Peculiarities of Survival Data: Censoring
      The most distinctive feature of survival data is therefore censoring.
      - Right censoring ($T > t$) is very frequent and often unavoidable; all survival methods account for it.
      - Interval censoring ($T \in (l, r]$) is very frequent, too, but is much more often ignored in practice.
      - Left censoring ($T \le t$) is very infrequent.
      Left truncation is a different concept: it concerns the selection bias introduced by including in the study only subjects with a survival time greater than a certain value, say $t^*$; then we do not observe $T$ but $T \mid T > t^*$.
  15. Peculiarities of Survival Data: Conditioning
      The second important feature of survival data is conditioning, which according to some authors is even more important than censoring (Hougaard, 2000).
      As time passes, new information becomes available, not only on the subjects who die but also on those who survive.
      It is therefore useful to consider, rather than the density $f(t)$ of $T$, its hazard function
      $$h(t) = \frac{f(t)}{1 - F(t)}.$$
  17. Notation and Basic Functions
      Consider the event time variable $T$ with distribution function $F(t)$ and density $f(t) = dF(t)/dt$. The survival function is defined as
      $$S(t) = P(T > t) = 1 - F(t). \tag{1}$$
      Then the hazard function is
      $$h(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}. \tag{2}$$
      If the censoring time $C$ is independent of the event time $T$, then $h(t)$ coincides with the crude hazard function (Fleming & Harrington, 1991, Theorem 1.3.1)
      $$h^{\#}(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t,\, C \ge t)}{\Delta t}.$$
  19. Notation and Basic Functions
      The cumulative hazard function is defined as
      $$H(t) = \int_0^t h(u)\,du. \tag{3}$$
      Since $f(t) = -dS(t)/dt$, it follows that
      $$S(t) = e^{-H(t)} \tag{4}$$
      or, equivalently,
      $$h(t) = -\frac{d}{dt} \log\{S(t)\}.$$
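As a quick numerical check of relation (4), the sketch below integrates a hazard with the midpoint rule and compares $e^{-H(t)}$ with a closed-form survival function. The Weibull hazard $h(t) = \lambda \rho t^{\rho - 1}$, the parameter values and the function names are illustrative assumptions, not part of the slides.

```python
import math

def weibull_hazard(t, lam, rho):
    """Weibull hazard h(t) = lam * rho * t**(rho - 1) (illustrative choice)."""
    return lam * rho * t ** (rho - 1)

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t) = integral_0^t h(u) du with the midpoint rule."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width

lam, rho, t = 0.5, 1.5, 2.0           # arbitrary illustrative values

H_numeric = cumulative_hazard(lambda u: weibull_hazard(u, lam, rho), t)
S_from_hazard = math.exp(-H_numeric)  # relation (4): S(t) = exp(-H(t))
S_closed = math.exp(-lam * t ** rho)  # Weibull survival function exp(-lam * t**rho)

print(S_from_hazard, S_closed)        # the two values should agree closely
```

The midpoint rule is used only because it needs no external library; any quadrature routine would do.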
  20. Hazard and Conditioning
      The hazard function already incorporates conditioning, which makes it particularly advantageous in a survival context, as shown by Hougaard (1999) in the following table.

      Quantity            In full distribution   In truncated distribution, given survival to time v
      Survival function   S(t)                   S(t)/S(v)
      Density             f(t)                   f(t)/S(v)
      Hazard function     h(t)                   h(t)

      Conditioning corresponds to considering only the events that are actually still possible, taking the past as fixed and known.
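The last row of the table can be checked in one line. Taking the truncated distribution to be that of $T$ given $T > v$, with survival function $S(t)/S(v)$ and density $f(t)/S(v)$ for $t > v$ as listed in the table, the ratio of the two gives back the original hazard:

```latex
h(t \mid T > v) \;=\; \frac{f(t)/S(v)}{S(t)/S(v)} \;=\; \frac{f(t)}{S(t)} \;=\; h(t), \qquad t > v.
```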
  23. Survival Likelihood
      Since right censoring is almost unavoidable, the observable variable is not the time $T$ but the pair
      $$(Y, \delta), \qquad Y = \min(T, C), \qquad \delta = I_{(T \le C)},$$
      with $C \sim G(\cdot)$ the censoring time variable and $I_A$ the indicator variable of the set $A$.
      We are interested in inference on the survival distribution and its parameters, the vector $\zeta$.
      What is the survival likelihood $L(\zeta; y)$?
  24. Survival Likelihood
      Assume $T \perp\!\!\!\perp C$, and let $g$ denote the density of $C$. The contribution of an event time $y_i$ to the likelihood is
      $$L(\zeta; y_i) = (1 - G(y_i))\, f(y_i) \propto f(y_i) = h(y_i)\, S(y_i).$$
      The contribution of a right-censored time $y_i$ is
      $$L(\zeta; y_i) = g(y_i)\,(1 - F(y_i)) \propto 1 - F(y_i) = S(y_i).$$
      Under i.i.d. sampling of size $n$ with $T \perp\!\!\!\perp C$, the total likelihood is
      $$L(\zeta; y) = \prod_{i=1}^{n} \{h(y_i)\}^{\delta_i}\, S(y_i). \tag{5}$$
  25. Parametric models
      A parametric form can be assumed for the hazard function, and its parameters can be estimated by maximizing the likelihood (5). The most common models are:
      - Exponential, with constant hazard $h(t) = \lambda > 0$
      - Weibull, with monotone hazard $h(t) = \lambda \rho t^{\rho - 1}$ ($\lambda > 0$, $\rho > 0$)
      - Gompertz, with monotone hazard $h(t) = \lambda \exp(\gamma t)$ ($\lambda > 0$, $\gamma \in \mathbb{R}$) and a fraction $e^{\lambda/\gamma}$ of long-term survivors if $\gamma < 0$
      - Piecewise constant over $m$ intervals with fixed end points $\{x_q\}$, and hazard $h(t) = \sum_{q=1}^{m} \lambda_q I_{(x_{q-1} < t \le x_q)}$
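To make the maximization of (5) concrete, here is a minimal sketch for the exponential model. With $h(t) = \lambda$ and $H(t) = \lambda t$, the log-likelihood is $\sum_i \delta_i \log \lambda - \lambda \sum_i y_i$, which is maximized at $\hat\lambda = \sum_i \delta_i / \sum_i y_i$. The data values and function names are hypothetical.

```python
import math

def log_likelihood(hazard, cum_hazard, times, events):
    """Log of likelihood (5): sum_i [delta_i * log h(y_i) - H(y_i)]."""
    return sum(d * math.log(hazard(y)) - cum_hazard(y)
               for y, d in zip(times, events))

def exponential_loglik(lam, times, events):
    """Log-likelihood (5) for the exponential model h(t) = lam, H(t) = lam * t."""
    return log_likelihood(lambda y: lam, lambda y: lam * y, times, events)

def exponential_mle(times, events):
    """Closed-form maximizer: number of events over total observed time."""
    return sum(events) / sum(times)

# Hypothetical data: y_i = min(T_i, C_i); delta_i = 1 for an event, 0 if censored.
times = [2.0, 3.5, 1.2, 6.0, 4.1]
events = [1, 0, 1, 1, 0]

lam_hat = exponential_mle(times, events)
print(lam_hat, exponential_loglik(lam_hat, times, events))
```

For the Weibull and Gompertz models no closed form exists, so the same `log_likelihood` function would have to be passed to a numerical optimizer.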
  26. Parametric models
      Comparison of parametric models (Hougaard, 2000, Table 2.6)

      Property                     Exponential   Weibull   Gompertz   Piecewise constant
      Increasing hazard possible   No            Yes       Yes        Yes
      Continuous hazard            Yes           Yes       Yes        No
      Estimate monotone            (Constant)    Yes       Yes        No
      Non-zero initial hazard      Yes           No        Yes        Yes
      Minimum stable               Yes           Yes       No         No
      Explicit estimation          Yes           No        No         Yes
      Needs choice of intervals    No            No        No         Yes
      No. of parameters            1             2         2          m
      Dim. of suff. stat.
        Complete data              1             n         n          2m - 1
        Censored data              2             2n        2n         2m

      n = number of observations; m + 1 = number of intervals in the piecewise constant model
  28. Non-Parametric models
      Non-parametric (NP) methods require no assumption on the form of the survival function.
      In general, the most common NP estimator is the empirical distribution function $\hat F(t)$, but censoring prevents its use.
      Two methods are very widely used:
      - the Kaplan–Meier estimator $\hat S_{KM}(t)$ of the survival function
      - the Nelson–Aalen estimator $\hat H_{NA}(t)$ of the cumulative hazard
      Note that, unlike $S(t)$ and $H(t)$ in (4), the two estimators are not linked exactly: $\hat S_{KM}(t) \ne \exp\{-\hat H_{NA}(t)\}$ in general, although the two sides are close when the risk sets are large.
  30. Kaplan–Meier estimator
      The Kaplan–Meier product-limit estimator (Kaplan & Meier, 1958) of the survival function is
      $$\hat S_{KM}(t) = \prod_{i \mid t_i \le t} \left(1 - \frac{N_i}{R_i}\right), \tag{6}$$
      with $\{t_i\}_i$ the observed event times, $N_i$ the number of events at time $t_i$ and $R_i$ the number of subjects still at risk (alive and uncensored) just before $t_i$.
      Its variance can be estimated by Greenwood's formula (Greenwood, 1926; Meier, 1975):
      $$\hat V\big[\hat S_{KM}(t)\big] = \big[\hat S_{KM}(t)\big]^2 \sum_{i \mid t_i \le t} \frac{N_i}{R_i (R_i - N_i)}.$$
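Estimator (6) and Greenwood's formula can be computed directly from the pairs $(y_i, \delta_i)$. The sketch below is a plain-Python illustration on hypothetical data. It assumes the common convention that a subject censored at $t_i$ is still counted as at risk at $t_i$, and it reports the variance as undefined once the estimate reaches 0.

```python
def kaplan_meier(times, events):
    """Return (t_i, S_KM(t_i), Greenwood variance) at each distinct event time."""
    event_times = sorted({y for y, d in zip(times, events) if d == 1})
    surv, greenwood_sum, rows = 1.0, 0.0, []
    for t in event_times:
        at_risk = sum(1 for y in times if y >= t)               # R_i
        n_events = sum(1 for y, d in zip(times, events)
                       if y == t and d == 1)                    # N_i
        surv *= 1.0 - n_events / at_risk                        # product in (6)
        if at_risk > n_events:
            greenwood_sum += n_events / (at_risk * (at_risk - n_events))
            variance = surv ** 2 * greenwood_sum
        else:
            variance = float("nan")  # Greenwood term is undefined when S_KM = 0
        rows.append((t, surv, variance))
    return rows

# Hypothetical data: observed times and event indicators (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

km = kaplan_meier(times, events)
for t, s, v in km:
    print(t, round(s, 4), v)
```

On these data the estimate steps through 5/6, 2/3, 4/9 and 0 at times 1, 2, 3 and 5.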
  32. Nelson–Aalen estimator
      The Nelson–Aalen estimator (Nelson, 1969; Aalen, 1976) of the cumulative hazard function is
      $$\hat H_{NA}(t) = \sum_{i \mid t_i \le t} \frac{N_i}{R_i}, \tag{7}$$
      with $\{t_i\}_i$ the observed event times, $N_i$ the number of events at time $t_i$ and $R_i$ the number of subjects still at risk just before $t_i$.
      Its variance, evaluated along the lines of Greenwood's formula, is
      $$\hat V\big[\hat H_{NA}(t)\big] = \sum_{i \mid t_i \le t} \frac{N_i}{R_i^2}.$$
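A matching sketch for estimator (7) and its variance, again on hypothetical data and with hypothetical function names. It also computes $\exp\{-\hat H_{NA}(t)\}$, which can never fall below the Kaplan–Meier estimate because $1 - x \le e^{-x}$ for every factor of the product.

```python
import math

def nelson_aalen(times, events):
    """Return (t_i, H_NA(t_i), variance estimate) at each distinct event time."""
    event_times = sorted({y for y, d in zip(times, events) if d == 1})
    cum_hazard, variance, rows = 0.0, 0.0, []
    for t in event_times:
        at_risk = sum(1 for y in times if y >= t)               # R_i
        n_events = sum(1 for y, d in zip(times, events)
                       if y == t and d == 1)                    # N_i
        cum_hazard += n_events / at_risk                        # sum in (7)
        variance += n_events / at_risk ** 2
        rows.append((t, cum_hazard, variance))
    return rows

# Hypothetical data: observed times and event indicators (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

na = nelson_aalen(times, events)
surv_from_na = [(t, math.exp(-h)) for t, h, _ in na]
print(na)
print(surv_from_na)
```

At time 3 the cumulative hazard estimate is 1/6 + 1/5 + 1/3 = 0.7, so $\exp(-0.7) \approx 0.497$, compared with a Kaplan–Meier value of 4/9 on the same data.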
  34. Cox proportional hazards model
      By far the most popular model in survival analysis is the Cox regression model (Cox, 1972).
      For a subject with covariate vector $x$, the hazard is expressed as
      $$h(t; x) = h_0(t)\, e^{x^T \beta}, \tag{8}$$
      with $\beta$ the vector of regression parameters and $h_0(t)$ the so-called baseline hazard function, corresponding to the hazard of a (hypothetical) reference subject with $x = (0, \dots, 0)$.
  36. Cox proportional hazards model
      For any two subjects $i$ and $j$ with covariates $x_i$ and $x_j$, the hazard ratio
      $$\frac{h(t; x_i)}{h(t; x_j)} = \frac{h_0(t) \exp(x_i^T \beta)}{h_0(t) \exp(x_j^T \beta)} = \exp\{(x_i - x_j)^T \beta\}$$
      is constant over time, so the two hazard functions are proportional.
      The hypothesis of proportional hazards (PH) is quite strong! On the other hand, the regression parameters have a very straightforward meaning. Indeed, if $x_{i(k)} = x_{j(k)} + 1$ and $x_{i(l)} = x_{j(l)}$ for all $l \ne k$, then
      $$\beta_{(k)} = \log \frac{h(t; x_i)}{h(t; x_j)}.$$
  38. Cox proportional hazards model: semiparametric approach
      Under the PH assumption, the likelihood (5) is
      $$L(\beta, \xi; y) = \prod_{i=1}^{n} \{h_0(y_i) \exp(x_i^T \beta)\}^{\delta_i} \exp\{-H_0(y_i) \exp(x_i^T \beta)\}, \tag{9}$$
      where $\xi$ denotes the baseline parameters and $(\beta, \xi)$ corresponds to $\zeta$.
      If the interest is in the covariate effects, the baseline hazard can be left unspecified and the likelihood can be profiled (Duchateau & Janssen, 2008, pp. 24–26), reducing to the partial likelihood
      $$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i^T \beta)}{\sum_{j \in R(y_i)} \exp(x_j^T \beta)} \right]^{\delta_i}, \tag{10}$$
      where $R(t) = \{r \mid y_r \ge t\}$ is the risk set at $t$ and the exponent $\delta_i$ restricts the product to observed event times.
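The partial likelihood (10) is easy to evaluate by hand on a small data set. The sketch below does so for a single covariate and maximizes it with a few Newton–Raphson steps. It assumes there are no tied event times and that only subjects with an observed event contribute a factor; the data, the function names and the choice of Newton–Raphson are illustrative assumptions rather than the slides' method.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Log of (10) for one covariate; only observed events contribute a term."""
    total = 0.0
    for y_i, d_i, x_i in zip(times, events, x):
        if d_i == 1:
            risk_sum = sum(math.exp(beta * x_j)
                           for y_j, x_j in zip(times, x) if y_j >= y_i)
            total += beta * x_i - math.log(risk_sum)
    return total

def fit_cox_newton(times, events, x, iterations=25):
    """Maximize the log partial likelihood over a scalar beta by Newton-Raphson."""
    beta = 0.0
    for _ in range(iterations):
        score, information = 0.0, 0.0
        for y_i, d_i, x_i in zip(times, events, x):
            if d_i == 0:
                continue
            risk_x = [x_j for y_j, x_j in zip(times, x) if y_j >= y_i]
            weights = [math.exp(beta * x_j) for x_j in risk_x]
            s0 = sum(weights)
            mean_x = sum(w * x_j for w, x_j in zip(weights, risk_x)) / s0
            mean_x2 = sum(w * x_j ** 2 for w, x_j in zip(weights, risk_x)) / s0
            score += x_i - mean_x                  # first derivative
            information += mean_x2 - mean_x ** 2   # minus second derivative
        beta += score / information
    return beta

# Hypothetical data: times, event indicators and a binary treatment covariate.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]

beta_hat = fit_cox_newton(times, events, x)
print(beta_hat, math.exp(beta_hat))  # log-hazard ratio and hazard ratio
```

Note that the baseline hazard $h_0(t)$ never appears in the code, which is the practical meaning of leaving it unspecified.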
  42. Accelerated failure time model
      Much less used is the accelerated failure time (AFT) model, where the covariates act directly on time via a scale factor. In this case the probability of surviving is
      $$S(t) = S_0(\exp(x^T \beta)\, t).$$
      Consequently the density and the hazard functions are
      $$f(t) = \exp(x^T \beta)\, f_0(\exp(x^T \beta)\, t), \qquad h(t) = \exp(x^T \beta)\, h_0(\exp(x^T \beta)\, t).$$
      The usual way of representing an AFT model is as a log-linear model for the times,
      $$\log T = x^T \alpha + \varepsilon.$$
      Only when $T$ follows a Weibull distribution does the model also correspond to a PH regression.
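The Weibull case can be verified by substitution. Taking the Weibull baseline hazard $h_0(t) = \lambda \rho t^{\rho - 1}$ in the AFT hazard above gives a hazard of PH form, with regression coefficients $\rho\beta$ instead of $\beta$:

```latex
h(t) \;=\; e^{x^T \beta}\, h_0\!\left(e^{x^T \beta} t\right)
     \;=\; e^{x^T \beta}\, \lambda \rho \left(e^{x^T \beta} t\right)^{\rho - 1}
     \;=\; \lambda \rho\, t^{\rho - 1}\, e^{\rho\, x^T \beta}
     \;=\; h_0(t)\, \exp\{x^T (\rho \beta)\}.
```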
  43. Outline
      - Survival Analysis
      - Complications of Survival Models
        - Non-proportional hazards
        - Informative censoring
        - Dependent observations
        - Multi-state phenomena
        - Competing Risks
          - Incidence
          - Covariates effect
      - References
  46. Complications of Survival Models
      Most of the methods for survival data analysis rest on some hypotheses, notably
      - proportional hazards
      - uninformative censoring
      - independent observations
      - one type of unavoidable event
      How can these assumptions be tested? How can data that do not satisfy them be handled?
  49. Non-proportional hazards
      Although most survival methods are based on the Cox model, hazards may turn out not to be proportional.
      The simplest case to handle is when hazards are proportional within subgroups, but not globally.
      [Figure: Proportional hazards within subgroups (Collett, 2003, p. 316)]
  52. Non-proportional hazards
      The effect of the treatment in the whole population is not multiplicative, although it is within each centre.
      One can then use a stratified PH model
      $$h_{ij}(t) = h_{0j}(t) \exp(x_{ij}^T \beta),$$
      where the hazard of patient $i$ from centre $j$ is, at each time point, $\exp(x_{ij}^T \beta)$ times the baseline $h_{0j}(t)$ of the stratum (centre).
      Since a different baseline is allowed for each stratum, the covariate effect is multiplicative and can be estimated with the usual methods for PH Cox models.
  54. Non-proportional hazards
      A more complex situation arises when hazards are non-proportional between the levels of a dichotomous variable.
      [Figure: Non-proportional hazards (Collett, 2003, p. 317)]
      [Figure: Non-proportional hazards modelled as PH (Collett, 2003, p. 317)]
  57. Non-proportional hazards
      Hazards can be modelled as proportional within each of a series of $k$ consecutive time intervals, giving the piecewise PH model
      $$h_i(t) = h_0(t) \exp\Big\{ x_i \Big[ \beta_1 + \sum_{j=2}^{k} \beta_j z_j(t) \Big] \Big\},$$
      where $x_i$ is 0 for the standard treatment and 1 for the new treatment, and the $z_j(t)$ are (time-varying) indicators of being in the $j$th interval.
      The log-hazard ratio between treatments now differs across intervals:
      - $\beta_1$ for interval 1
      - $\beta_1 + \beta_j$ for interval $j > 1$.
      Testing the PH assumption: if none of the $\beta_j$, $j = 2, \dots, k$, is significantly different from 0, then there is no evidence of non-PH.
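The time-varying indicators $z_j(t)$ amount to looking up which interval contains $t$. The sketch below builds them from a list of cut points and evaluates the treatment log-hazard ratio $\beta_1 + \sum_{j \ge 2} \beta_j z_j(t)$. The cut points, coefficient values, function names and the convention that intervals are closed on the right are all illustrative assumptions.

```python
import bisect

def interval_index(t, cut_points):
    """1-based index of the interval containing t, with intervals (c_{j-1}, c_j]."""
    return bisect.bisect_left(cut_points, t) + 1

def indicators(t, cut_points):
    """[z_2(t), ..., z_k(t)], where k = len(cut_points) + 1 intervals."""
    j = interval_index(t, cut_points)
    return [1 if j == m else 0 for m in range(2, len(cut_points) + 2)]

def log_hazard_ratio(t, cut_points, betas):
    """beta_1 + sum_{j>=2} beta_j * z_j(t), with betas = [beta_1, ..., beta_k]."""
    z = indicators(t, cut_points)
    return betas[0] + sum(b * z_j for b, z_j in zip(betas[1:], z))

cut_points = [1.0, 3.0]        # hypothetical: intervals (0, 1], (1, 3], (3, inf)
betas = [-0.5, 0.3, 0.8]       # hypothetical beta_1, beta_2, beta_3

for t in (0.5, 2.0, 5.0):
    print(t, indicators(t, cut_points), log_hazard_ratio(t, cut_points, betas))
```

In an actual fit the indicators would enter the partial likelihood as time-dependent covariates multiplied by the treatment indicator, and the $\beta_j$ would be estimated rather than fixed.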
  60. Informative censoring
      Most survival analysis methods are valid only under the independent censoring hypothesis $C_i \perp\!\!\!\perp T_i$.
      - For censoring due to the end of the study, independence is reasonable.
      - For censoring due to loss to follow-up or to a competing risk, it is much more questionable.
  62. Informative censoring
      Two typical situations (Putter et al., 2007):
      - Healthy participants feel less need for the medical services offered by the study, and therefore quit.
        → $C$ is negatively correlated with $T$
        → Overestimation of event risk
      - Persons with advanced disease progression become too ill for further follow-up, or they return to their country to spend their last period with their family.
        → $C$ is positively correlated with $T$
        → Underestimation of event risk
  64. Informative censoring: empirical evaluation
      An empirical way to check the uninformative censoring assumption is to plot the observed survival times against each regressor, distinguishing censored times from event times.
      [Figure: two panels, (a) and (b), plotting Time against Age at diagnosis; o = censored, + = event. Example of data not suggesting (a) and suggesting (b) informative censoring]
  69. Informative censoring: bounding unobserved event times
      A more formal way to investigate the plausibility of the independent censoring hypothesis is a kind of robustness study that compares the conclusions from two extreme situations, in which censored times are treated as event times
      - with the same value as the censoring time
      - with the value of the largest event time in the data set
      [Figure: observations plotted against Time; o = censored, + = event]
      If essentially the same conclusions can be drawn from the original model and from these two models, then the censoring times can be safely treated as independent of the event times.
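Building the two extreme data sets is a simple recoding of the pairs $(y_i, \delta_i)$. The sketch below follows the description above literally; the data and function name are hypothetical, and the example is chosen so that no censoring time exceeds the largest event time.

```python
def extreme_scenarios(times, events):
    """Return the two recoded data sets, each as (times, events)."""
    n = len(times)
    # Scenario 1: each censored subject is assumed to fail at its censoring time.
    early = (list(times), [1] * n)
    # Scenario 2: each censored subject is assumed to fail at the largest
    # observed event time in the data set.
    largest_event = max(y for y, d in zip(times, events) if d == 1)
    late_times = [y if d == 1 else largest_event
                  for y, d in zip(times, events)]
    late = (late_times, [1] * n)
    return early, late

# Hypothetical data: observed times and event indicators (1 = event, 0 = censored).
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]

early, late = extreme_scenarios(times, events)
print(early)
print(late)
```

The same model would then be fitted to the original data and to each recoded data set, and the three sets of conclusions compared.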
  72. Informative censoring: logistic regression
      The most formal way of testing the independent censoring hypothesis is to fit a linear logistic model with the censoring indicator as the response.
      If any covariate turns out to be significant in predicting whether the event time is observed or censored, then the independence hypothesis is quite unlikely to hold.
      What to do?
  75. Informative censoring
      Solutions are quite limited and no satisfactory way to overcome the problem exists.
      Censoring all observations at the time of the first censored observation makes the censoring truly independent of the event times, but it is of little use if this occurs early.
      [Figure: observations plotted against Time; o = censored, + = event]
  79. 79. Complications of Survival Models F. Rotolo Dependent observations Cox models and most of the survival analysis models assume that, conditionally on possible regressors, event times are i.i.d. This is an unreasonable assumption in many situations: multi-centre studies repeated measures on the same subject inclusion of relatives in the same study measures on similar organs from the same organism paired samples ... If the group effect is of interest, the factor is inserted in the model as usual. More often one is only interested in controlling its effect in a parsimonious way in term of parameters.STiB: Survival Data Analysis 36/ 57
  82. 82. Complications of Survival Models: Dependent observations. The most common way to account for clustering in hazard regression models is a mixed-model formulation (McCullagh & Nelder, 1989) with a random effect: $\log\{h_{ij}(t)\} = \log\{h_0(t)\} + w_j + x_{ij}^T \beta, \quad w_j \sim \mathrm{IID}(0, \sigma_w^2)$. The random effect $w_j$ is unobservable and common to all elements of a cluster. Its actual realizations are not that important; rather, its distribution is of primary interest, in order to eliminate the variability it introduces.
  85. 85. Complications of Survival Models: Dependent observations. In survival analysis, the model is usually expressed in the form $h_{ij}(t) = h_0(t)\, z_j \exp\{x_{ij}^T \beta\}, \quad z_j \sim \mathrm{IID}(1, \sigma_z^2)$ (11), with $z_j = e^{w_j} > 0$, and is called the Frailty Model (Duchateau & Janssen, 2008; Wienke, 2009). The random variable $z_j$ was named frailty (term) by Vaupel et al. (1979), because subjects with larger values have an increased hazard and are therefore more likely to die sooner. Note that the frailty is constant over time, so the hazard is scaled up or down by the same factor at all times.
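A small numerical sketch of model (11) may help. The Weibull baseline hazard, the value of beta and the two random effects below are hypothetical choices made only for illustration; the slides do not specify them.

```python
import math

def weibull_baseline(t, lam=0.1, rho=1.5):
    """Hypothetical Weibull baseline hazard h0(t) = lam * rho * t**(rho - 1)."""
    return lam * rho * t ** (rho - 1)

def frailty_hazard(t, z, x, beta, h0=weibull_baseline):
    """Conditional hazard h_ij(t) = h0(t) * z_j * exp(x_ij' beta)."""
    linear_predictor = sum(xk * bk for xk, bk in zip(x, beta))
    return h0(t) * z * math.exp(linear_predictor)

beta = [0.7]                       # hypothetical regression coefficient
w = {"A": -0.5, "B": 0.8}          # hypothetical random effects w_j
z = {j: math.exp(wj) for j, wj in w.items()}   # frailties z_j = exp(w_j) > 0

grid = (1.0, 5.0, 20.0)
# Same cluster, x = 1 versus x = 0: the frailty cancels out of the ratio.
within = [frailty_hazard(t, z["B"], [1.0], beta)
          / frailty_hazard(t, z["B"], [0.0], beta) for t in grid]
# Same covariate, cluster B versus cluster A: the ratio is z_B / z_A.
between = [frailty_hazard(t, z["B"], [0.0], beta)
           / frailty_hazard(t, z["A"], [0.0], beta) for t in grid]
print(within, between)
```

Both ratios are the same at every time point: exp(beta) within a cluster, and z_B / z_A between clusters, which is what a time-constant frailty implies.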
  87. 87. Complications of Survival Models: Dependent observations. This approach has two main consequences: (1) dependence between event times in the same cluster, which is what allows Frailty Models to account for dependency; (2) non-proportionality of the marginal hazards in general, although hazards are still proportional conditionally on the frailty values. Clusters can also have size 1, in which case all methods are unchanged but their meaning and interpretation are quite different (univariate frailty models for overdispersion: Wienke, 2009, Ch. 3).
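The non-proportionality of the marginal hazards can be made explicit in one common special case. The derivation below assumes a gamma-distributed frailty with mean 1 and variance $\sigma_z^2$; this distributional choice is an assumption made here for illustration, since the slides only specify the first two moments. $H_0(t) = \int_0^t h_0(u)\,du$ denotes the cumulative baseline hazard.

```latex
% Assumption (not in the slides): z_j ~ Gamma with mean 1 and variance \sigma_z^2
\begin{align*}
S(t \mid x) &= \mathrm{E}_z\!\left[\exp\{-z\,H_0(t)\,e^{x^T\beta}\}\right]
             = \left(1 + \sigma_z^2\,H_0(t)\,e^{x^T\beta}\right)^{-1/\sigma_z^2},\\
h(t \mid x) &= -\frac{d}{dt}\log S(t \mid x)
             = \frac{h_0(t)\,e^{x^T\beta}}{1 + \sigma_z^2\,H_0(t)\,e^{x^T\beta}},\\
\frac{h(t \mid x_1)}{h(t \mid x_0)} &= e^{(x_1 - x_0)^T\beta}\,
   \frac{1 + \sigma_z^2\,H_0(t)\,e^{x_0^T\beta}}{1 + \sigma_z^2\,H_0(t)\,e^{x_1^T\beta}}.
\end{align*}
```

Under this assumption the marginal hazard ratio equals $e^{(x_1 - x_0)^T\beta}$ at $t = 0$ and shrinks towards 1 as $H_0(t)$ grows, so it is not constant over time unless $\sigma_z^2 = 0$.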
