
Application of survival data analysis introduction and discussion

This webinar, Application of Survival Data Analysis: Introduction and Discussion (存活数据分析及应用- 简介和讨论), gives an overview of survival data analysis, including parametric and non-parametric approaches and the proportional hazards model, and provides a real-life example of field return analysis based on survival data. Several common issues in survival data analysis are also discussed.

Transcript

  • 1. Application of Survival Data Analysis: Introduction and Discussion (存活数据分析及应用- 简介和讨论). Shaoang Zhang, Ph.D. (张少昂博士). ©2012 ASQ & Presentation Xing. Presented live on Dec 15th, 2012. http://reliabilitycalendar.org/The_Reliability Calendar/Webinars_‐ y_ /_Chinese/Webinars_‐_Chinese.html
  • 2. ASQ Reliability Division Chinese Webinar Series. One of the monthly webinars on topics of interest to reliability engineers. To view the recorded webinar (available to ASQ Reliability Division members only), visit asq.org/reliability. To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events.
  • 3. Survival Analysis: Introduction and Discussion (存活数据分析及应用- 简介和讨论). Shaoang Zhang, Ph.D., Biostatistics, OptumRX. December 16, 2012
  • 4. Outline
      • Introduction
          • Measurements of ARR and reliability
          • Survival data: a glance
          • Special features in survival data
      • Overview of Statistical Methods
          • Parametric approach
              • Distribution based approach
          • Semi-parametric approach: Cox PH model
          • Accelerated Failure Time Model
          • Frailty Model
          • Non-parametric approach
              • Kaplan-Meier curve
              • Log-Rank Test
      • Examples
      • Discussion
      • Summary
  • 5. Measurements of Field Failure and Reliability
      • ARR (Annual Return Rate): based on field returns in one year.
          • How to define one year in the field? Shipments can go out at different times, so one year in the field may mean a different starting date on the calendar. Is it one year from the first shipment, one year for every shipment, or one year of continuous operation for every unit included in the shipments considered?
          • Many different ARR calculations exist, each applying different adjustments:
              • Linear extrapolation
              • Prediction based on survival curve
      • Reliability Prediction
          • MTTF: MTTF is estimated based on reliability tests. For example, the MTTF of a hard disk drive can be millions of hours. However, the observed life in the field may cover only thousands of hours. How accurate is the estimation?
          • Multiple distributions for failure time
              • Multiple failure modes may govern failures at different stages of life.
              • Example: bathtub hazard curve
  • 6. ARR Calculation
      [Figure: annual returns for Shipments 1 to 6, with the actual and estimated portions of each shipment's first year marked.]
      How to estimate or predict survival at a future time point?
  • 7. Survival Data: A Glance
      • What is survival data?
          • Data measuring the time to event
          • Events: death, failure, occurrence of a complication, etc.
          • Incomplete data in terms of event time
      • An example:

        Year since   Number alive and under   Number dying   Number       Mortality   1 - Mortality   Survival
        entry into   observation at the       during         censored     rate        rate            function
        study        beginning of interval    interval       or withdrawn
        [0,1)        146                      27             3            0.18        0.82            0.82
        [1,2)        116                      18             10           0.16        0.84            0.69
        [2,3)        88                       21             10           0.24        0.76            0.52
        [3,4)        57                       9              3            0.16        0.84            0.44
        [4,5)        45                       1              3            0.02        0.98            0.43
        [5,6)        41                       2              11           0.05        0.95            0.41
        [6,7)        28                       3              5            0.11        0.89            0.37
        [7,8)        20                       1              8            0.05        0.95            0.35
        [8,9)        11                       2              1            0.18        0.82            0.28
        [9,10)       8                        2              6            0.25        0.75            0.21

        [Figure: survival function (0 to 1) plotted against years after entry into the study (0 to 10).]
        Data cited from a clinical trial on myocardial infarction (MI) (Svetlana, S., 2002).
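The rate columns in the table above can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as the published figures suggest, that the interval mortality rate is simply deaths divided by the number at risk at the start of the interval (with no adjustment for censoring) and that the survival function is the running product of the interval survival rates. The names `rows` and `life_table` are illustrative, not from the slides.

```python
# Life-table rows from the slide:
# (interval, at risk at start of interval, deaths in interval, censored or withdrawn)
rows = [
    ("[0,1)", 146, 27, 3),
    ("[1,2)", 116, 18, 10),
    ("[2,3)", 88, 21, 10),
    ("[3,4)", 57, 9, 3),
    ("[4,5)", 45, 1, 3),
    ("[5,6)", 41, 2, 11),
    ("[6,7)", 28, 3, 5),
    ("[7,8)", 20, 1, 8),
    ("[8,9)", 11, 2, 1),
    ("[9,10)", 8, 2, 6),
]


def life_table(rows):
    """Return (interval, mortality rate, interval survival, cumulative survival)."""
    cumulative = 1.0
    out = []
    for interval, at_risk, deaths, _censored in rows:
        mortality = deaths / at_risk          # assumed: no censoring adjustment
        cumulative *= 1.0 - mortality         # running product of interval survival
        out.append((interval, mortality, 1.0 - mortality, cumulative))
    return out


table = life_table(rows)
for interval, q, p, s in table:
    print(f"{interval:>7}  {q:.2f}  {p:.2f}  {s:.2f}")
```

Censored units affect the result only through the number at risk at the start of the next interval, which is why the at-risk column drops by deaths plus censored from one row to the next.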
  • 8. Special Features of Survival Data
      • Time-to-event: the primary interest of survival analysis is the time to event.
          • Time to event can be modeled by a distribution function.
          • It is a random variable.
          • The 'time to event' for every unit becomes available as time goes to infinity (or approaches a limit).
          • The time to event is usually not normally distributed.
      • Censored: with incomplete information about the 'time to event'.
      • General issues in survival data analysis
          • The non-normality of survival data violates the normality assumption of the most commonly used statistical models, such as regression or ANOVA.
          • Incompleteness may cause issues such as:
              • Estimation bias.
              • Difficulty in validating the assumptions.
  • 9. Censoring
      • A censored observation is defined as an observation with incomplete information about the 'time-to-event'.
      • There are different types of censoring, such as right censoring, left censoring, and interval censoring.
      • Right censoring: the information about time to event is incomplete because the subject did not have an event during the time when the subject was studied.
  • 10. Overview of Statistical Methods
      • Objectives:
          • Characterize and estimate the distribution of the failure time;
          • Compare failure times among different groups, e.g. generations of products (old vs. new), treatment vs. control, etc.
          • Assess the relationship of covariates to time-to-event, e.g. which factors significantly affect the distribution of time-to-event?
      • Approaches:
          • To estimate the survival (hazard) function:
              • parametric approach: specify a parametric model, i.e. a specific distribution (exponential, Weibull, etc.)
              • empirical approach: use nonparametric or semi-parametric estimation (more popular in biomedical sciences), such as the Kaplan-Meier estimator
          • To compare two survival functions:
              • Log-rank test
          • To model the relationship between failure time and covariates:
              • Cox proportional hazard model
              • Accelerated failure-time model
              • Frailty model
  • 11. Parametric Survival Model
      • Parametric survival model
          • Assumption on underlying distribution
          • The hazard function, $h(t)$, and the survival function, $S(t)$, are completely specified
          • Continuous process
          • Prediction possible
      • Main assumption
          • The survival time $t$ is assumed to follow a distribution with density function $f(t)$. Specifying one of the three functions $f(t)$, $S(t)$, or $h(t)$ specifies the other two:

            $S(t) = P(T > t) = \int_t^{\infty} f(u)\,du$

            $f(t) = -\dfrac{d\,S(t)}{dt}$

            $h(t) = \dfrac{f(t)}{S(t)} = -\dfrac{dS(t)/dt}{S(t)}$

            $S(t) = \exp\left(-\int_0^{t} h(u)\,du\right)$
  • 12. Shapes of Hazard Function
  • 13. Weibull Model
      • Assumption:
          • Time to event, $t$, follows Weibull($\lambda$, $\gamma$) with probability density function:

            $f(t) = \lambda\gamma t^{\gamma-1}\exp(-\lambda t^{\gamma})$, where $\lambda, \gamma > 0$

          • The hazard function is given by:

            $h(t) = \lambda\gamma t^{\gamma-1}$

          • The survival function:

            $S(t) = \exp(-\lambda t^{\gamma})$

            $-\log(S(t)) = \lambda t^{\gamma}$

            $\log(-\log(S(t))) = \log(\lambda) + \gamma\log(t)$

      • Exponential distribution: nice properties
      • Flexible
      • Graphical evaluation
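The graphical evaluation mentioned on the Weibull slide rests on the last identity: plotting $\log(-\log S(t))$ against $\log t$ gives a straight line with slope $\gamma$ and intercept $\log\lambda$. The sketch below checks this numerically with the Weibull parameterization used on the slide. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math


def weibull_hazard(t, lam, gamma):
    """h(t) = lam * gamma * t^(gamma - 1)."""
    return lam * gamma * t ** (gamma - 1)


def weibull_survival(t, lam, gamma):
    """S(t) = exp(-lam * t^gamma)."""
    return math.exp(-lam * t ** gamma)


def weibull_density(t, lam, gamma):
    """f(t) = h(t) * S(t)."""
    return weibull_hazard(t, lam, gamma) * weibull_survival(t, lam, gamma)


# Hypothetical parameter values, for illustration only.
lam, gamma = 0.05, 1.5
times = [0.5, 1.0, 2.0, 4.0]

# Coordinates of the Weibull probability plot.
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(weibull_survival(t, lam, gamma))) for t in times]

# The points are collinear, so two of them are enough to recover the line.
slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
intercept = ys[0] - slope * xs[0]
print(f"slope = {slope:.4f} (gamma), intercept = {intercept:.4f} (log lambda)")
```

With real data, $S(t)$ would be replaced by an empirical estimate such as the Kaplan-Meier curve; clear curvature in the plot then suggests that a single Weibull distribution does not fit.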
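  • 14. Likelihood and Censored Survival Data
      • Likelihood estimate (right censored data):
          • The likelihood function of the parameter(s) $\theta$, where $\delta_i = 1$ if the event was observed for unit $i$ and $\delta_i = 0$ if the unit was censored:

            $L(\theta; t) = \prod_{i=1}^{n} f(t_i, \theta)^{\delta_i}\,[S(t_i, \theta)]^{1-\delta_i}$

          • MLE $\hat{\theta}$ of $\theta$:

            $U(\theta; t) = \dfrac{\partial \ell(\theta; t)}{\partial \theta} = 0$, where $\ell(\theta; t)$ is the log likelihood function

            $\hat{\theta} \sim N(\theta, V)$, where $V = J^{-1}$ and $J$ denotes the Fisher information matrix

      • Hypothesis tests
          • Score test
          • Likelihood ratio test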
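For the exponential distribution the censored likelihood above has a closed-form maximum, which makes it a convenient worked case: with $f(t) = \lambda e^{-\lambda t}$ and $S(t) = e^{-\lambda t}$, the log likelihood is $\sum_i (\delta_i \log\lambda - \lambda t_i)$ and the MLE is the number of observed events divided by the total time on test. The sketch below uses a small made-up data set; the data and function names are assumptions for illustration, not taken from the slides.

```python
import math


def exp_log_likelihood(lam, times, events):
    """Right-censored exponential log likelihood.

    An observed event (delta = 1) contributes log f(t) = log(lam) - lam * t;
    a censored unit (delta = 0) contributes log S(t) = -lam * t.
    """
    return sum(d * math.log(lam) - lam * t for t, d in zip(times, events))


def exp_mle(times, events):
    """Closed-form MLE: observed events divided by total time on test."""
    return sum(events) / sum(times)


# Hypothetical data: four failures and two units still running at time 20.
times = [5.0, 8.0, 12.0, 3.0, 20.0, 20.0]
events = [1, 1, 1, 1, 0, 0]

lam_hat = exp_mle(times, events)
# Observed information is J = (number of events) / lam^2, so V = J^-1 gives
# the standard error lam_hat / sqrt(number of events).
std_err = lam_hat / math.sqrt(sum(events))
print(f"lambda_hat = {lam_hat:.4f}, SE = {std_err:.4f}")
```

Dropping the censored units, or treating their censoring times as failure times, would both bias the estimate upward, which is the estimation bias that incomplete data can cause.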
  • 15. Semi-Parametric Model
      • Cox PH Model: a very popular model in biostatistics
          • The distribution of time-to-event is unknown, but a proportional hazard ratio is assumed.
          • The baseline hazard is not needed in the estimation of the hazard ratio.
          • Semi-parametric: the baseline hazard can take any form, while the covariates enter the model linearly.
      • Proportional hazard assumption:

        $h(t \mid X) = h_0(t)\exp(X\beta)$

        $\dfrac{h(t \mid X_1)}{h(t \mid X_0)} = \dfrac{h_0(t)\exp(X_1\beta)}{h_0(t)\exp(X_0\beta)} = \exp((X_1 - X_0)\beta)$

      • Parameter estimation: based on the partial likelihood function

        $L(\beta) = \prod_{j=1}^{k} \dfrac{\exp(X_{[j]}\beta)}{\sum_{l \in R_j} \exp(X_l\beta)}$

        where $X_{[j]}$ denotes the covariate vector for the observation which actually experienced the event at $t_j$; $R_j$ denotes the risk set at time $t_j$; and $k$ denotes the number of distinct event times.
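The partial likelihood can be evaluated directly for a small data set, which shows that the baseline hazard never enters the calculation. The sketch below assumes a single covariate and no tied event times; the data and the function name `cox_partial_log_likelihood` are hypothetical, and the crude grid search stands in for the Newton-type optimizer that statistical software would normally use.

```python
import math


def cox_partial_log_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue  # censored observations contribute only through risk sets
        # Risk set R_j: every unit still under observation just before t_j.
        risk_set = [l for l, t_l in enumerate(times) if t_l >= t_j]
        denom = sum(math.exp(beta * x[l]) for l in risk_set)
        total += beta * x[j] - math.log(denom)
    return total


# Hypothetical data: follow-up time, event indicator (1 = event), covariate.
times = [2.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 1, 0, 1, 1]
x = [1, 0, 1, 0, 1]

# Crude grid search over beta in [-4, 4], for illustration only.
grid = [b / 100.0 for b in range(-400, 401)]
beta_hat = max(grid, key=lambda b: cox_partial_log_likelihood(b, times, events, x))
print(f"beta_hat ~ {beta_hat:.2f}, hazard ratio ~ {math.exp(beta_hat):.2f}")
```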
  • 16. Cox PH Model
      • Effect of treatment vs. control ($X = 1$ vs. $X = 0$):

        $\widehat{HR} = \exp(\hat{\beta})$

        $\exp(\hat{\beta})$ is the relative hazard of observations from the treatment group, relative to observations from the control group.
      • An intuitive way of understanding the influence of covariates on the hazard.
      • Weibull model and proportional hazard
          • If the shape parameter does not change but the scale parameter is influenced by the covariates, the Weibull model implies that the proportional hazard assumption holds. Letting $\lambda = \exp(X\beta)$ in the Weibull model, we have

            $\dfrac{h(t \mid X_1)}{h(t \mid X_0)} = \dfrac{\exp(X_1\beta)\,\gamma t^{\gamma-1}}{\exp(X_0\beta)\,\gamma t^{\gamma-1}} = \exp((X_1 - X_0)\beta)$
  • 17. Accelerated Failure Time Model
      • Accelerated failure time model (AFT)
          • A parametric model that describes covariate effects in terms of survival time instead of relative hazard, as in the Cox PH model. The distribution has a scale parameter.
              • Log-logistic distribution
              • Other distributions, such as the Weibull distribution, Gamma distribution, etc.
          • Assumption:
              • The influence of a covariate is to multiply the predicted time to event (not the hazard) by some constant. Therefore, it can be expressed as a linear model for the logarithm of the survival time.
          • Model:

            $S(t \mid X_1) = S(\phi t \mid X_2)$, where $\phi$ is the acceleration factor

            $\log(t) = X\beta + \varepsilon$

          • Weibull distribution and AFT. Assume $\lambda = \exp(X\beta)$ in the Weibull model; we have:

            $\log(t) = -\dfrac{1}{\gamma}X\beta + \dfrac{1}{\gamma}\varepsilon$

            where $\varepsilon$ follows the standard (minimum) extreme value distribution, so the scale parameter of the log-linear model is $1/\gamma$.
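The acceleration factor interpretation can be checked numerically for the Weibull case. The sketch below assumes the parameterization used earlier in the slides, $S(t \mid X) = \exp(-\exp(X\beta)\,t^{\gamma})$, under which $S(t \mid X_1) = S(\phi t \mid X_2)$ holds with $\phi = \exp((X_1 - X_2)\beta/\gamma)$. The parameter values and names are hypothetical.

```python
import math


def weibull_survival(t, x, beta, gamma):
    """S(t | x) = exp(-exp(x * beta) * t^gamma), one covariate."""
    return math.exp(-math.exp(x * beta) * t ** gamma)


# Hypothetical coefficient, shape parameter, and covariate values.
beta, gamma = 0.7, 1.8
x1, x2 = 1.0, 0.0

# Acceleration factor implied by the Weibull model.
phi = math.exp((x1 - x2) * beta / gamma)

# Time runs phi times faster for x1 than for x2: compare both sides.
checks = []
for t in [0.2, 0.5, 1.0, 2.0]:
    left = weibull_survival(t, x1, beta, gamma)
    right = weibull_survival(phi * t, x2, beta, gamma)
    checks.append((t, left, right))
    print(f"t = {t:.1f}: S(t|x1) = {left:.6f}, S(phi*t|x2) = {right:.6f}")
```

The Weibull distribution is the one family that satisfies both the proportional hazards and the accelerated failure time formulations, which is why the same fitted model can be reported either as a hazard ratio or as an acceleration factor.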
  • 18. Frailty Model
      • Model assumption:

        $h_j(t \mid X_j, \alpha_j) = h_0(t)\,\alpha_j \exp(X_j\beta)$

          • It is assumed that the frailty factor $\alpha_j$ follows a distribution (such as the Gamma or inverse Gaussian) with a mean of 1 and an unknown variance that can be specified by a parameter.
      • The frailty model is usually applied to a population that is likely to have a mixture of hazards (heterogeneity). Some subjects are more failure-prone and therefore more 'frail'.
      • A random effect model: to account for unmeasured or unobserved 'frailties'.
      • Weibull model: for the Weibull model with a simple gamma frailty assumption, $\alpha \sim g(1/\theta, \theta)$, we have:

        $h(t) = \lambda\gamma t^{\gamma-1} S(t)^{\theta}$, where $S(t) = \left[1 + \theta\lambda t^{\gamma}\right]^{-1/\theta}$
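The two Weibull gamma-frailty formulas are linked by $h(t) = -\,d\log S(t)/dt$, and this can be verified numerically. The sketch below compares the closed-form population hazard with a central-difference derivative of $-\log S(t)$. Parameter values and function names are hypothetical.

```python
import math


def marginal_survival(t, lam, gamma, theta):
    """S(t) = [1 + theta * lam * t^gamma]^(-1/theta)."""
    return (1.0 + theta * lam * t ** gamma) ** (-1.0 / theta)


def marginal_hazard(t, lam, gamma, theta):
    """h(t) = lam * gamma * t^(gamma - 1) * S(t)^theta."""
    return lam * gamma * t ** (gamma - 1) * marginal_survival(t, lam, gamma, theta) ** theta


def numeric_hazard(t, lam, gamma, theta, eps=1e-6):
    """Central-difference approximation of -d log S(t) / dt."""
    upper = math.log(marginal_survival(t + eps, lam, gamma, theta))
    lower = math.log(marginal_survival(t - eps, lam, gamma, theta))
    return -(upper - lower) / (2.0 * eps)


# Hypothetical scale, shape, and frailty variance.
lam, gamma, theta = 0.1, 2.0, 0.8

pairs = []
for t in [0.5, 1.0, 3.0, 10.0]:
    pairs.append((marginal_hazard(t, lam, gamma, theta), numeric_hazard(t, lam, gamma, theta)))
    print(f"t = {t:>4}: closed form = {pairs[-1][0]:.6f}, numeric = {pairs[-1][1]:.6f}")
```

Because $S(t)^{\theta}$ shrinks over time, the population hazard falls below the Weibull hazard of a unit with frailty 1: the frailest units fail first, leaving a more robust surviving population.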
  • 19. Non-Parametric Approach
      • Kaplan-Meier survival curve
          • The approach was published in 1958 by Edward L. Kaplan and Paul Meier in their paper, "Non-parametric estimation from incomplete observations", J. Am. Stat. Assoc. 53:457-481. Kaplan and Meier were interested in the lifetime of vacuum tubes and the duration of cancer, respectively.
          • Also called the product limit method, since

            $\hat{S}(t) = \prod_{t_i \le t}\left(1 - \dfrac{d_i}{n_i}\right)$

            where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number of subjects at risk just prior to time $t_i$.
          • Confidence interval: Kalbfleisch and Prentice (2002) suggested using

            $\hat{V}\left[\log(-\log(\hat{S}(t)))\right] = \dfrac{1}{(\log \hat{S}(t))^2} \sum_{t_i \le t} \dfrac{d_i}{n_i(n_i - d_i)}$

            to get a confidence interval for $\log(-\log(S(t)))$. The confidence interval for $S(t)$ can be derived accordingly.
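The product limit estimator and the log(-log) confidence interval take only a few lines to compute. The sketch below is a minimal illustration on a made-up data set, not a replacement for a tested statistics package. It back-transforms the interval for $\log(-\log S(t))$ to the survival scale as $\hat{S}(t)^{\exp(\pm z\,\mathrm{se})}$; the function name and data are assumptions.

```python
import math


def kaplan_meier(times, events, z=1.96):
    """Return (t_i, S_hat, lower, upper) at each distinct event time."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv, var_sum, out = 1.0, 0.0, []
    for t in event_times:
        n_i = sum(1 for u in times if u >= t)                  # at risk just before t
        d_i = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        surv *= 1.0 - d_i / n_i
        if 0.0 < surv < 1.0:
            var_sum += d_i / (n_i * (n_i - d_i))
            se = math.sqrt(var_sum) / abs(math.log(surv))      # SE of log(-log S)
            lower = surv ** math.exp(z * se)
            upper = surv ** math.exp(-z * se)
        else:
            lower = upper = surv                               # interval undefined at 0 or 1
        out.append((t, surv, lower, upper))
    return out


# Hypothetical data: event indicator 1 = failure observed, 0 = right censored.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]

km = kaplan_meier(times, events)
for t, s, lo, hi in km:
    print(f"t = {t}: S = {s:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Working on the log(-log) scale keeps the interval inside (0, 1), which the plain Greenwood interval on the survival scale does not guarantee.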
  • 20. Non-Parametric Approach
      • The log-rank test is used to test the equality of two survival functions.
      • For comparing two survival curves, we have:

        $Z = \dfrac{\sum_j (o_{1j} - e_{1j})}{\sqrt{\sum_j v_{1j}}}, \qquad Z^2 \sim \chi^2_1$

        where $v_{1j}$ is estimated based on a hypergeometric distribution.
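The log-rank statistic can be sketched as follows for two groups. At each distinct event time $t_j$, with $n_{1j}$ and $n_{2j}$ units at risk and $d_j$ events in total, the expected count is $e_{1j} = d_j n_{1j}/n_j$ and the hypergeometric variance is $v_{1j} = n_{1j} n_{2j} d_j (n_j - d_j) / (n_j^2 (n_j - 1))$. The data and the function name are hypothetical, and the p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def log_rank_test(times1, events1, times2, events2):
    """Two-sample log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times1) + list(times2)
    all_events = list(events1) + list(events2)
    event_times = sorted({t for t, d in zip(all_times, all_events) if d == 1})

    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)                  # at risk in group 1
        n2 = sum(1 for u in times2 if u >= t)                  # at risk in group 2
        d1 = sum(1 for u, d in zip(times1, events1) if u == t and d == 1)
        d2 = sum(1 for u, d in zip(times2, events2) if u == t and d == 1)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n                           # observed minus expected
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))

    chi_square = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))           # chi-square with 1 df
    return chi_square, p_value


# Hypothetical data: all failures observed, two units per group.
chi_square, p_value = log_rank_test([1, 3], [1, 1], [2, 4], [1, 1])
print(f"chi-square = {chi_square:.4f}, p = {p_value:.4f}")
```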
  • 21. Example 1
      • Field survival data can be used to further evaluate product quality and may indicate possible quality-related issues.
      • The hazard function for hard disk drive field returns (or failures) shows a significant peak in early life.
      • Commonly used parametric distribution models such as the Weibull, lognormal, or logistic model fit such a hazard function poorly.
      • Therefore, the Kaplan-Meier estimator and the log-rank test are used, respectively, to describe the survival functions and to evaluate the effects of two factors of interest on the drives' field survival.
      [Figure: hazard function of field returns with three fitted curves, labeled Weibull fit, Lognormal fit, and Logistic fit.]
  • 22. Example 1
      In addition, field survival data is observational. Propensity score matching is applied to balance out possible effects from other factors (covariates). Results both before and after matching are presented here.

      Original Data
        Test       Chi-Square   DF   ProbChiSq
        Log-Rank   138.5724     1    <.0001

        Description         HazardRatio   WaldLower   WaldUpper
        GROUP1 vs. GROUP2   2.287         1.971       2.653

      Matched Sample
        Test       Chi-Square   DF   ProbChiSq
        Log-Rank   1.2565       1    0.2613

        Description         HazardRatio   WaldLower   WaldUpper
        GROUP1 vs. GROUP2   1.151         0.643       2.060
  • 23. Example 2
      This example demonstrates an application of the Cox PH model. The time to event is the disease-free time for an Acute Myelocytic Leukemia (AML) patient after a special treatment. The aim is to evaluate whether the disease-free time after the treatment varies by gender and by age.

        Obs   Group              gender   age   Time   Status
        1     AML-Low Risk       M        24    3395   0
        2     AML-Low Risk       F        26    3471   0
        3     AML-Low Risk       F        26    3618   0
        4     AML-Low Risk       M        27    3286   0
        5     AML-Mediate Risk   F        29    3034   0
        6     AML-Mediate Risk   F        31    3676   0
        7     AML-Low Risk       M        31    2547   0
        8     AML-Low Risk       M        32    3183   0
        9     AML-High Risk      F        32    4123   0
        10    AML-Low Risk       M        33    2569   0
        11    AML-Low Risk       M        33    2900   0
        12    AML-Low Risk       F        33    2805   1
        13    AML-Low Risk       M        34    3691   0
        14    AML-Low Risk       F        34    3179   0
        15    AML-Low Risk       F        34    2246   0
        16    AML-High Risk      F        34    3328   0
        17    AML-High Risk      F        35    2640   0
        18    AML-Low Risk       M        39    1760   1
        …     …                  …        …     …      …
        273   AML-High Risk      M        74    16     1

      Test of Equality over Strata
        Test       Chi-Square   DF   Pr > Chi-Square
        Log-Rank   26.9998      5    <.0001

      Part of the data used in this example is from an example published by SAS.
  • 24. Example 2
      • SAS code:

        proc phreg data=Example2;
          class gender group;
          model Time*Status(0)=age group gender / selection=stepwise;
        run;

      Analysis of Maximum Likelihood Estimates
        Parameter             DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
        age                   1    0.15180              0.01229          152.5961     <.0001       1.164
        Group AML-High Risk   1    0.46243              0.19063          5.8844       0.0153       1.588
        Group AML-Low Risk    1    -0.18436             0.20569          0.8034       0.3701       0.832

      Summary of Stepwise Selection
        Step   Effect Entered   Effect Removed   DF   Number In   Score Chi-Square   Wald Chi-Square   Pr > ChiSq
        1      age                               1    1           169.3010                             <.0001
        2      Group                             2    2           13.1022                              0.0014

      Test of Equality over Strata
        Test       Chi-Square   DF   Pr > Chi-Square
        Log-Rank   17.1657      2    <.0002

      The modeling result suggests that the effect of gender on the survival function after the transplant is not statistically significant, but the effects of age and severity group are significant.
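The hazard ratio column in the SAS output is the exponential of the parameter estimate, as stated on the Cox PH slide. The sketch below recomputes it from the estimates and standard errors reported above and adds Wald 95% limits, $\exp(\hat{\beta} \pm 1.96\,\mathrm{SE})$. The limits are a standard calculation but do not appear in the slide's output, so they are shown here only as an illustration.

```python
import math

# (parameter estimate, standard error) as reported in the SAS output above.
estimates = {
    "age": (0.15180, 0.01229),
    "Group AML-High Risk": (0.46243, 0.19063),
    "Group AML-Low Risk": (-0.18436, 0.20569),
}

hazard_ratios = {}
for name, (beta_hat, std_err) in estimates.items():
    hr = math.exp(beta_hat)
    lower = math.exp(beta_hat - 1.96 * std_err)   # Wald 95% limits, not in the slide
    upper = math.exp(beta_hat + 1.96 * std_err)
    hazard_ratios[name] = (hr, lower, upper)
    print(f"{name:<20} HR = {hr:.3f}  (95% CI {lower:.3f} to {upper:.3f})")
```

For a continuous covariate such as age, the hazard ratio applies per unit increase, so each additional year multiplies the hazard by about 1.164 with the other covariates held fixed.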
  • 25. Discussion: Parametric Models
      • Nice properties
          • Efficient data reduction: a function with a few parameters completely describes a survival pattern.
          • Enables standardized comparison: evaluation and comparison based on statistics such as MTTF.
          • Prediction into the future is possible.
      • Possible issues
          • Assumptions
              • Non-informative censoring
              • Parametric distribution
                  • Exponential family: flexible enough?
                  • One vs. multiple distributions: three Weibull distributions for describing a bathtub hazard?
                  • How confident are we about the future survival path?
          • Estimation
              • Distribution: usually not symmetric
              • Sample size and time period covered by observations
              • Censoring
  • 26. Discussion: Cox PH Model
      • Nice properties:
          • A parametric distribution assumption is not needed.
          • Easy to evaluate or test hypotheses about the effect of a covariate on survival.
          • Very popular in clinical trial analysis and outcome studies.
      • Possible issues:
          • Proportional hazard: a strong assumption
              • When violated, stratified or extended Cox models may be used.
              • Tests of the assumption
                  • log(-log(S(t))) plot
                  • Including interactions with time in the model
                  • Scaled Schoenfeld residuals plot
          • Estimation
              • Censored observations must be non-informative.
              • Similar issues as seen in a multivariate regression model.
  • 27. Discussion: Non-Parametric Approach
      • Nice properties:
          • Distribution free
          • Graphical and intuitive
          • Describes observed survival well
      • Possible issues:
          • Not continuous
          • Estimates can be biased when improperly stratified. For example, survival function estimates on the tail can be poor.
          • Smoothing is usually needed when estimating the hazard function.
          • Not informative in terms of the future survival function.
          • In cases with crossing survival or hazard curves, the log-rank test is not appropriate.
  • 28. Discussion: Estimation Improvement
      • Bayesian based survival analysis approaches
          • Introducing prior knowledge to improve parameter estimation.
      • Application of multiple imputation to survival analysis
          • May reduce the effect of censored observations.
          • The availability of large sets of historical observations may be informative to the imputation.
  • 29. Summary
      • Survival analysis has found applications in many fields. It can provide insightful information for evaluating product reliability, monitoring field quality, assisting in setting warranty policy, validating new drug efficacy, etc.
      • The parametric, distribution based approach is the most popular survival analysis approach in reliability engineering, while the Cox PH model and the non-parametric approach are usually favored in biostatistical survival analysis.
      • Each approach comes with its own assumptions and is designed to meet a specified purpose. These assumptions should always be validated to ensure that an approach is applied appropriately.
      • Censored data is a major characteristic of survival data. It contributes to the uniqueness of survival data analysis and to possible issues in model estimation, and should always be kept in mind when designing related experiments and analyzing survival data.
  • 30. Questions? Thanks! Contact Email: shao_zhang100@yahoo.com