- 1. Application of Survival Data Analysis: Introduction and Discussion (in Chinese: Survival Data Analysis and Applications: Introduction and Discussion). Shaoang Zhang, Ph.D. ©2012 ASQ & Presentation Xing. Presented live on Dec 15th, 2012. http://reliabilitycalendar.org/The_Reliability_Calendar/Webinars_-_Chinese/Webinars_-_Chinese.html
- 2. ASQ Reliability Division Chinese Webinar Series. One of the monthly webinars on topics of interest to reliability engineers. To view recorded webinars (available to ASQ Reliability Division members only), visit asq.org/reliability. To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events.
- 3. Survival Analysis: Introduction and Discussion (in Chinese: Survival Data Analysis and Applications: Introduction and Discussion). Shaoang Zhang, Ph.D., Biostatistics, OptumRX. December 16, 2012
- 4. Outline
  - Introduction
    - Measurements of ARR and reliability
    - Survival data: a glance
    - Special features in survival data
  - Overview of statistical methods
    - Parametric approach: distribution-based approach
    - Semi-parametric approach: Cox PH model
    - Accelerated failure time model
    - Frailty model
    - Non-parametric approach: Kaplan–Meier curve, log-rank test
  - Examples
  - Discussion
  - Summary
- 5. Measurements of Field Failure and Reliability
  - ARR (annual return rate): based on field returns within one year.
    - How should one year in the field be defined? Shipments go out at different times, so one year in the field may start on different calendar dates. Is it one year from the first shipment, one year for every shipment, or one year of continuous operation for every unit included in the shipments considered?
    - Many different ARR calculations exist, depending on the adjustments applied:
      - Linear extrapolation
      - Prediction based on a survival curve
  - Reliability prediction
    - MTTF is estimated from reliability tests. For example, the MTTF of a hard disk drive can be millions of hours, yet the reliability data (in the field) may cover only thousands of hours. How accurate is the estimate?
    - Multiple distributions for failure time: different failure modes may govern failures at different stages of life, as in the bathtub hazard curve.
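The gap between a multi-million-hour MTTF and an annual return figure can be made concrete with a short calculation. The sketch below converts an MTTF into the fraction expected to fail within one year, and it also shows the naive linear extrapolation listed above. It assumes exponentially distributed failure times (a constant hazard) and 8,760 hours of continuous operation per year. The function names and the return counts are hypothetical. Neither assumption holds when early-life or wear-out failures dominate, which is one reason prediction from a survival curve is listed as an alternative.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 h of continuous operation (assumption)

def afr_from_mttf(mttf_hours, hours=HOURS_PER_YEAR):
    """Fraction failing within `hours` if failure times are exponential with mean `mttf_hours`."""
    return 1.0 - math.exp(-hours / mttf_hours)

def linear_arr(returns, units, months_observed):
    """Naive ARR: scale a partial-year return fraction linearly to 12 months."""
    return returns / units * 12.0 / months_observed

# An MTTF of 1,000,000 hours implies roughly 0.87% failing per year of continuous use.
print(f"{afr_from_mttf(1_000_000):.2%}")
# Hypothetical: 30 returns from 10,000 units after 3 months, extrapolated to 12 months.
print(f"{linear_arr(30, 10_000, 3):.2%}")
```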
- 6. ARR Calculation
  - [Figure: actual and estimated annual returns for Shipments 1 to 6.]
  - How can survival at a future time point be estimated or predicted?
- 7. Survival Data: A Glance
  - What is survival data? Data measuring the time to an event.
    - Events: death, failure, received, a complication, etc.
    - The data are incomplete in terms of event time.
  - An example, with data cited from a clinical trial on myocardial infarction (MI) (Svetlana, S., 2002):

  | Year since entry into study | Number alive and under observation at beginning of interval | Number dying during interval | Number censored or withdrawn | Mortality rate | 1 - mortality rate | Survival function |
  |---|---|---|---|---|---|---|
  | [0,1) | 146 | 27 | 3 | 0.18 | 0.82 | 0.82 |
  | [1,2) | 116 | 18 | 10 | 0.16 | 0.84 | 0.69 |
  | [2,3) | 88 | 21 | 10 | 0.24 | 0.76 | 0.52 |
  | [3,4) | 57 | 9 | 3 | 0.16 | 0.84 | 0.44 |
  | [4,5) | 45 | 1 | 3 | 0.02 | 0.98 | 0.43 |
  | [5,6) | 41 | 2 | 11 | 0.05 | 0.95 | 0.41 |
  | [6,7) | 28 | 3 | 5 | 0.11 | 0.89 | 0.37 |
  | [7,8) | 20 | 1 | 8 | 0.05 | 0.95 | 0.35 |
  | [8,9) | 11 | 2 | 1 | 0.18 | 0.82 | 0.28 |
  | [9,10) | 8 | 2 | 6 | 0.25 | 0.75 | 0.21 |

  - [Figure: survival function plotted against year after entry into study, 0 to 10 years.]
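The survival column of the MI life table can be reproduced from the at-risk, death and withdrawal counts. The minimal sketch below computes the interval mortality rate as deaths divided by the number at risk at the start of the interval, which reproduces every tabulated value to two decimals. The classical actuarial life table instead subtracts half of the withdrawals from the denominator, and that variant is offered as an option. The function name is hypothetical.

```python
# Counts per interval [k, k+1) from the MI example (Svetlana, S., 2002).
at_risk  = [146, 116, 88, 57, 45, 41, 28, 20, 11, 8]
deaths   = [27, 18, 21, 9, 1, 2, 3, 1, 2, 2]
censored = [3, 10, 10, 3, 3, 11, 5, 8, 1, 6]

# Sanity check: each at-risk count is the previous one minus deaths and withdrawals.
assert all(at_risk[k + 1] == at_risk[k] - deaths[k] - censored[k]
           for k in range(len(at_risk) - 1))

def life_table(at_risk, deaths, censored, actuarial=False):
    """Return (mortality rate q, cumulative survival S) for each interval.

    actuarial=False uses q = d / n (matches the slide's table);
    actuarial=True uses the classical adjustment q = d / (n - c / 2).
    """
    rows, surv = [], 1.0
    for n, d, c in zip(at_risk, deaths, censored):
        effective_n = n - c / 2 if actuarial else n
        q = d / effective_n
        surv *= 1.0 - q          # survival is the running product of (1 - q)
        rows.append((q, surv))
    return rows

for k, (q, s) in enumerate(life_table(at_risk, deaths, censored)):
    print(f"[{k},{k + 1})  q={q:.2f}  S={s:.2f}")
```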
- 8. Special Features of Survival Data
  - Time to event: the primary interest of survival analysis is the time to an event.
    - The time to event is a random variable that can be modeled by a distribution function.
    - The time to event would be observed for every unit only as time goes to infinity (or approaches a limit).
    - The time to event is usually not normally distributed.
  - Censoring: some observations carry incomplete information about the time to event.
  - General issues in survival data analysis:
    - The non-normality of survival data violates the normality assumption of the most commonly used statistical models, such as regression or ANOVA.
    - Incompleteness may cause issues such as estimation bias and difficulty in validating assumptions.
- 9. Censoring
  - A censored observation is an observation with incomplete information about the time to event.
  - There are different types of censoring, such as right censoring, left censoring, and interval censoring.
  - Right censoring: the information about the time to event is incomplete because the subject did not have an event during the period in which it was studied.
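A small simulation shows why right-censored observations cannot simply be ignored or treated as failures. It assumes exponential event times with a hypothetical mean of 10 and administrative right censoring at time 8. Both naive estimates of the mean are biased downward: their expected values are about 5.5 and 3.5. The exponential maximum likelihood estimate (total observed time divided by the number of observed events) uses the censored units correctly.

```python
import random

random.seed(1)
TRUE_MEAN = 10.0   # hypothetical mean time to event (exponential)
STUDY_END = 8.0    # hypothetical end of follow-up: administrative right censoring

times, events = [], []
for _ in range(10_000):
    t = random.expovariate(1.0 / TRUE_MEAN)
    times.append(min(t, STUDY_END))              # observed time
    events.append(1 if t <= STUDY_END else 0)    # 1 = event observed, 0 = right censored

n_events = sum(events)
# Naive estimate 1: treat censoring times as if they were event times.
naive_all = sum(times) / len(times)
# Naive estimate 2: drop the censored units altogether.
naive_events_only = sum(t for t, e in zip(times, events) if e) / n_events
# Exponential MLE: total observed time divided by the number of observed events.
mle_mean = sum(times) / n_events

print(f"censored as events: {naive_all:.2f}")
print(f"censored dropped:   {naive_events_only:.2f}")
print(f"exponential MLE:    {mle_mean:.2f} (true mean {TRUE_MEAN})")
```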
- 10. Overview of Statistical Methods
  - Objectives:
    - Characterize and estimate the distribution of the failure time.
    - Compare failure times among groups, e.g., product generations (old vs. new), treatment vs. control.
    - Assess the relationship of covariates to time to event, e.g., which factors significantly affect the distribution of time to event?
  - Approaches:
    - To estimate the survival (hazard) function:
      - Parametric approach: specify a parametric model, i.e., a specific distribution (exponential, Weibull, etc.).
      - Empirical approach: use nonparametric or semi-parametric estimation (more popular in the biomedical sciences), such as the Kaplan–Meier estimator.
    - To compare two survival functions: log-rank test.
    - To model the relationship between failure time and covariates: Cox proportional hazards model, accelerated failure time model, frailty model.
- 11. Parametric Survival Model
  - Assumption on the underlying distribution: the hazard function h(t) and the survival function S(t) are completely specified; the process is continuous; prediction is possible.
  - Main assumption: the survival time T follows a distribution with density function f(t). Specifying one of the three functions f(t), S(t), or h(t) determines the other two:
    $S(t) = P(T > t) = \int_t^{\infty} f(u)\,du, \qquad f(t) = -\frac{dS(t)}{dt}, \qquad h(t) = \frac{f(t)}{S(t)}, \qquad S(t) = \exp\left(-\int_0^t h(u)\,du\right)$
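The three relationships can be checked numerically. The sketch below assumes a hypothetical, linearly increasing hazard h(t) = 0.1 + 0.02t. It builds S(t) = exp(-H(t)) by trapezoidal integration of the hazard. It then confirms that f(t) = h(t)S(t) matches -dS/dt (central difference) and that S(t) matches the closed form exp(-(0.1t + 0.01t²)).

```python
import math

def hazard(t):
    """Hypothetical wear-out hazard h(t) = 0.1 + 0.02 t."""
    return 0.1 + 0.02 * t

def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h(u) from 0 to t by the trapezoidal rule."""
    dt = t / steps
    return sum((hazard(i * dt) + hazard((i + 1) * dt)) * dt / 2 for i in range(steps))

def survival(t):
    return math.exp(-cumulative_hazard(t))   # S(t) = exp(-H(t))

def density(t):
    return hazard(t) * survival(t)           # f(t) = h(t) S(t)

t, eps = 5.0, 1e-4
# f(t) should agree with -dS/dt ...
print(density(t), -(survival(t + eps) - survival(t - eps)) / (2 * eps))
# ... and S(t) with the closed form exp(-(0.1 t + 0.01 t^2)).
print(survival(t), math.exp(-(0.1 * t + 0.01 * t ** 2)))
```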
- 12. Shapes of Hazard Function
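The figure for this slide did not survive extraction. As a stand-in, the sketch below evaluates the Weibull hazard h(t) = λαt^(α-1) (for S(t) = exp(-λt^α)) in its three shape regimes: α < 1 decreasing, α = 1 constant, α > 1 increasing. It then sums one hazard from each regime to produce a bathtub curve, which is the "three Weibull distributions" idea raised in the discussion of parametric models. Summing hazards assumes independent competing failure modes. All parameter values are hypothetical.

```python
def weibull_hazard(t, lam, alpha):
    """h(t) = lam * alpha * t**(alpha - 1) for S(t) = exp(-lam * t**alpha)."""
    return lam * alpha * t ** (alpha - 1)

# Hypothetical parameters for three failure modes.
MODES = [
    (0.05, 0.5),   # alpha < 1: decreasing hazard (early-life failures)
    (0.01, 1.0),   # alpha = 1: constant hazard (random failures, exponential)
    (1e-5, 4.0),   # alpha > 1: increasing hazard (wear-out)
]

def bathtub_hazard(t):
    """Independent competing failure modes: the overall hazard is the sum of mode hazards."""
    return sum(weibull_hazard(t, lam, a) for lam, a in MODES)

for t in [0.1, 1, 5, 10, 20]:
    print(f"t={t:>4}: h={bathtub_hazard(t):.4f}")
```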
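- 13. Weibull Model
  - Assumption: the time to event t follows a Weibull(λ, α) distribution with density function $f(t) = \lambda\alpha t^{\alpha-1}\exp(-\lambda t^{\alpha})$, where $\lambda, \alpha > 0$.
  - The hazard function is $h(t) = \lambda\alpha t^{\alpha-1}$.
  - The survival function is $S(t) = \exp(-\lambda t^{\alpha})$, so $-\log S(t) = \lambda t^{\alpha}$ and $\log(-\log S(t)) = \log\lambda + \alpha\log t$.
  - Nice properties: includes the exponential distribution as the special case α = 1; flexible; allows graphical evaluation.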
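The linearization log(-log S(t)) = log λ + α log t is what makes graphical evaluation possible. The minimal sketch below assumes survival estimates are supplied as (t, S(t)) pairs with 0 < S(t) < 1. It fits a least-squares line and reads α from the slope and λ from the intercept. The points come from an exact, hypothetical Weibull curve so the fit can be checked. In practice they would be Kaplan-Meier or life-table estimates, and a clearly curved plot argues against the Weibull model.

```python
import math

def weibull_plot_fit(times, surv):
    """Least squares on (log t, log(-log S)): slope = alpha, intercept = log(lambda)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in surv]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    lam = math.exp(my - alpha * mx)
    return lam, alpha

# Points from an exact Weibull(lambda=0.02, alpha=1.5) curve (hypothetical values).
ts = [1, 2, 4, 8, 16]
ss = [math.exp(-0.02 * t ** 1.5) for t in ts]
print(weibull_plot_fit(ts, ss))   # recovers (0.02, 1.5) up to rounding
```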
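- 14. Likelihood and Censored Survival Data
  - Likelihood for right-censored data, with $\delta_i = 1$ if the event is observed for unit i and $\delta_i = 0$ if it is censored:
    $L(\theta; t) = \prod_{i=1}^{n} f(t_i; \theta)^{\delta_i}\,[S(t_i; \theta)]^{1-\delta_i}$
  - The MLE $\hat\theta$ of $\theta$ solves $U(\theta; t) = \partial\ell(\theta; t)/\partial\theta = 0$, where $\ell(\theta; t)$ is the log-likelihood function.
  - Approximately $\hat\theta \sim N(\theta, V)$, where $V = J^{-1}$ and $J$ denotes the Fisher information matrix.
  - Hypothesis tests: score test, likelihood ratio test.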
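For the Weibull model with S(t) = exp(-λt^α), the censored log-likelihood is Σδ_i[log λ + log α + (α-1)log t_i] - λΣt_i^α. For fixed α, the score equation for λ gives λ̂ = d/Σt_i^α, where d is the number of events. The sketch below maximizes the resulting profile likelihood over α with a golden-section search. The simulated data, the search bounds and the function names are assumptions for illustration, and standard errors (V = J⁻¹) are not computed here.

```python
import math
import random

def profile_loglik(alpha, times, events):
    """Censored Weibull log-likelihood with lambda set to its MLE d / sum(t**alpha).

    Returns (log-likelihood, lambda_hat) for the given shape alpha.
    """
    d = sum(events)
    lam = d / sum(t ** alpha for t in times)
    sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)
    # Event terms d*(log lam + log alpha) + (alpha-1)*sum log t_i, minus lam*sum(t**alpha) = d.
    return d * (math.log(lam) + math.log(alpha)) + (alpha - 1) * sum_log_t - d, lam

def fit_weibull_censored(times, events, lo=0.05, hi=20.0, tol=1e-8):
    """Golden-section search for the alpha maximizing the profile log-likelihood."""
    g = (math.sqrt(5) - 1) / 2            # inverse golden ratio, about 0.618
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1 = profile_loglik(x1, times, events)[0]
    f2 = profile_loglik(x2, times, events)[0]
    while b - a > tol:
        if f1 > f2:                       # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = profile_loglik(x1, times, events)[0]
        else:                             # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = profile_loglik(x2, times, events)[0]
    alpha = (a + b) / 2
    return profile_loglik(alpha, times, events)[1], alpha

# Hypothetical data: Weibull(lambda=0.01, alpha=1.5), right censored at t = 25.
random.seed(7)
LAM, ALPHA, CENSOR_AT = 0.01, 1.5, 25.0
times, events = [], []
for _ in range(5000):
    # Inverse-transform sampling from S(t) = exp(-lam * t**alpha).
    t = (-math.log(1.0 - random.random()) / LAM) ** (1.0 / ALPHA)
    times.append(min(t, CENSOR_AT))
    events.append(1 if t <= CENSOR_AT else 0)

lam_hat, alpha_hat = fit_weibull_censored(times, events)
print(f"lambda_hat={lam_hat:.4f}, alpha_hat={alpha_hat:.3f}")
```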
- 15. Semi-Parametric Model: Cox PH Model
  - A very popular model in biostatistics. The distribution of time to event is unknown, but proportional hazards are assumed. The baseline hazard is not needed to estimate the hazard ratio.
  - Semi-parametric: the baseline hazard can take any form, while the covariates enter the model linearly.
  - Proportional hazards assumption:
    $h(t \mid X) = h_0(t)\exp(X\beta), \qquad \frac{h(t \mid X_1)}{h(t \mid X_0)} = \frac{h_0(t)\exp(X_1\beta)}{h_0(t)\exp(X_0\beta)} = \exp((X_1 - X_0)\beta)$
  - Parameter estimation is based on the partial likelihood function
    $L(\beta) = \prod_{j=1}^{k}\frac{\exp(X_{[j]}\beta)}{\sum_{l \in R_j}\exp(X_l\beta)}$
    where $X_{[j]}$ is the covariate vector of the observation that actually experienced the event at $t_j$, $R_j$ is the risk set at time $t_j$, and $k$ is the number of distinct event times.
- 16. Cox PH Model
  - Effect of treatment vs. control (X = 1 vs. X = 0): $\widehat{HR} = \exp(\hat\beta)$ is the hazard of observations in the treatment group relative to observations in the control group. This gives an intuitive way to understand the influence of covariates on the hazard.
  - Weibull model and proportional hazards: if the shape parameter α does not change but the scale parameter λ depends on the covariates, the Weibull model satisfies the proportional hazards assumption. Letting $\lambda = \exp(X\beta)$ in the Weibull model gives
    $\frac{h(t \mid X_1)}{h(t \mid X_0)} = \frac{\exp(X_1\beta)\,\alpha t^{\alpha-1}}{\exp(X_0\beta)\,\alpha t^{\alpha-1}} = \exp((X_1 - X_0)\beta)$
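The Cox partial likelihood can be maximized with Newton-Raphson. The sketch below handles a single covariate, for example X = 1 for treatment and X = 0 for control. It uses the Breslow approximation when event times are tied; with no ties it reduces to the partial likelihood shown for the Cox PH model. The data and function names are hypothetical. The reported standard error is the square root of the inverse information, which gives a Wald interval for the hazard ratio exp(β̂).

```python
import math

def cox_terms(beta, times, events, x):
    """Breslow partial log-likelihood, score and information for one covariate."""
    loglik = score = info = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        risk = [xi for ti, xi in zip(times, x) if ti >= tj]                 # risk set R_j
        dead = [xi for ti, ei, xi in zip(times, events, x) if ti == tj and ei]
        d = len(dead)
        w = [math.exp(beta * xi) for xi in risk]
        s0 = sum(w)
        s1 = sum(xi * wi for xi, wi in zip(risk, w))
        s2 = sum(xi * xi * wi for xi, wi in zip(risk, w))
        loglik += beta * sum(dead) - d * math.log(s0)
        score += sum(dead) - d * s1 / s0
        info += d * (s2 / s0 - (s1 / s0) ** 2)
    return loglik, score, info

def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson on the partial likelihood; returns (beta_hat, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_terms(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, _, info = cox_terms(beta, times, events, x)
    return beta, math.sqrt(1.0 / info)

# Hypothetical data.
times  = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # 0 = censored
x      = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]   # 1 = treatment, 0 = control
beta_hat, se = fit_cox(times, events, x)
print(f"HR = {math.exp(beta_hat):.2f}, 95% Wald CI "
      f"({math.exp(beta_hat - 1.96 * se):.2f}, {math.exp(beta_hat + 1.96 * se):.2f})")
```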
- 17. Accelerated Failure Time Model
  - The accelerated failure time (AFT) model is a parametric model that describes covariate effects in terms of survival time rather than relative hazard, as in the Cox PH model.
  - It requires a distribution with a scale parameter: the log-logistic distribution, or others such as the Weibull and gamma distributions.
  - Assumption: a covariate acts by multiplying the predicted time to event (not the hazard) by some constant. Therefore, the model can be expressed as a linear model for the logarithm of survival time.
  - Model: $S(t \mid X_1) = S(\phi t \mid X_2)$, where $\phi$ is the acceleration factor.
  - Weibull distribution and AFT: assuming $\lambda = \exp(X\beta)$, we have $\log t = -\frac{1}{\alpha}X\beta + \frac{1}{\alpha}\log(-\log S(t))$.
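The Weibull model is the case where the PH and AFT descriptions coincide. The sketch assumes λ = λ0·exp(Xβ) with a fixed shape α; λ0, α and β are hypothetical values. It checks numerically that the hazard ratio for X1 = 1 vs. X2 = 0 is exp(β) at every t. It also checks that S(t | X1) = S(φt | X2) with acceleration factor φ = exp(β/α).

```python
import math

LAM0, ALPHA, BETA = 0.01, 1.5, 0.7   # hypothetical baseline scale, shape, coefficient

def surv(t, x):
    """Weibull survival with scale lambda = LAM0 * exp(x * BETA) and fixed shape ALPHA."""
    return math.exp(-LAM0 * math.exp(x * BETA) * t ** ALPHA)

def hazard(t, x):
    """Weibull hazard lambda * ALPHA * t**(ALPHA - 1) with the same covariate-dependent scale."""
    return LAM0 * math.exp(x * BETA) * ALPHA * t ** (ALPHA - 1)

phi = math.exp(BETA / ALPHA)   # acceleration factor for X1 = 1 vs. X2 = 0

for t in [1.0, 5.0, 20.0]:
    hr = hazard(t, 1) / hazard(t, 0)                # PH view: exp(BETA) at every t
    s1, s2_scaled = surv(t, 1), surv(phi * t, 0)    # AFT view: S(t | X1) = S(phi * t | X2)
    print(f"t={t:5.1f}  HR={hr:.4f}  S(t|X1)={s1:.4f}  S(phi*t|X2)={s2_scaled:.4f}")
```

Because φ > 1 when β > 0, units with X1 = 1 reach any given survival level at times divided by φ. In other words, their time to event is shortened by a constant factor, as the AFT assumption states.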
- 18. Frailty Model
  - Model assumption: $h_j(t \mid X_j, \omega_j) = h_0(t)\,\omega_j\exp(X_j\beta)$, where the frailty factor $\omega_j$ is assumed to follow a distribution (such as gamma or inverse Gaussian) with mean 1 and an unknown variance specified by a parameter.
  - A frailty model is usually applied to a population that is likely to have a mixture of hazards (heterogeneity): some subjects are more failure-prone, that is, more "frail".
  - It is a random-effects model that accounts for unmeasured or unobserved "frailties".
  - Weibull model with a simple gamma frailty assumption, $\omega \sim \mathrm{Gamma}(1/\theta, \theta)$ (shape 1/θ, scale θ):
    $h(t) = \lambda\alpha t^{\alpha-1}S(t)^{\theta}, \qquad S(t) = \left(1 + \theta\lambda t^{\alpha}\right)^{-1/\theta}$
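The gamma-frailty result for the Weibull model can be checked by simulation. The sketch below draws each unit's frailty ω from a gamma distribution with shape 1/θ and scale θ (mean 1, variance θ). It then draws a Weibull time with conditional survival exp(-ωλt^α). The fraction of units surviving past t = 10 is compared with the closed form S(t) = (1 + θλt^α)^(-1/θ). Parameter values are hypothetical. Python's `random.gammavariate` takes (shape, scale).

```python
import math
import random

LAM, ALPHA, THETA = 0.02, 1.3, 0.8   # hypothetical Weibull baseline and frailty variance

def marginal_survival(t):
    """Population survival under gamma frailty: (1 + theta*lam*t**alpha) ** (-1/theta)."""
    return (1.0 + THETA * LAM * t ** ALPHA) ** (-1.0 / THETA)

def marginal_hazard(t):
    """Population hazard: lam * alpha * t**(alpha - 1) * S(t)**theta."""
    return LAM * ALPHA * t ** (ALPHA - 1) * marginal_survival(t) ** THETA

random.seed(3)
t_check, n, survivors = 10.0, 20_000, 0
for _ in range(n):
    w = random.gammavariate(1.0 / THETA, THETA)      # frailty: mean 1, variance THETA
    # Inverse transform for the conditional survival exp(-w * lam * t**alpha).
    t = (-math.log(1.0 - random.random()) / (w * LAM)) ** (1.0 / ALPHA)
    survivors += t > t_check
print(survivors / n, marginal_survival(t_check))
```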
- 19. Non-Parametric Approach: Kaplan-Meier Survival Curve
  - The approach was published in 1958 by Edward L. Kaplan and Paul Meier in their paper "Non-parametric estimation from incomplete observations", J. Am. Stat. Assoc. 53:457-481. Kaplan and Meier were interested in the lifetime of vacuum tubes and the duration of cancer, respectively.
  - Also called the product-limit method, since
    $\hat S(t) = \prod_{t_i \le t}\left(1 - \frac{d_i}{n_i}\right)$
    where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number of subjects at risk just prior to time $t_i$.
  - Confidence interval: Kalbfleisch and Prentice (2002) suggested using
    $\hat V\left[\log(-\log\hat S(t))\right] = \frac{1}{(\log\hat S(t))^2}\sum_{t_i \le t}\frac{d_i}{n_i(n_i - d_i)}$
    to get a confidence interval for $\log(-\log\hat S(t))$. The confidence interval for $S(t)$ can then be derived accordingly.
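Below is a minimal product-limit implementation with the Kalbfleisch and Prentice log(-log) interval. It assumes times and 0/1 event indicators are given as plain lists. A unit censored at t_i is treated as still at risk at t_i, which is the usual convention. The interval for log(-log Ŝ(t)) is mapped back to the survival scale. The function name and data are hypothetical, and z = 1.96 gives an approximate 95% interval.

```python
import math

def kaplan_meier(times, events, z=1.96):
    """Product-limit estimate with log(-log) confidence limits.

    Returns a list of (t_i, S(t_i), lower, upper) at each distinct event time.
    """
    out, surv, var_sum = [], 1.0, 0.0
    for ti in sorted({t for t, e in zip(times, events) if e}):
        n_i = sum(1 for t in times if t >= ti)                        # at risk just before t_i
        d_i = sum(1 for t, e in zip(times, events) if t == ti and e)  # events at t_i
        surv *= 1.0 - d_i / n_i
        if n_i > d_i:
            var_sum += d_i / (n_i * (n_i - d_i))
        if 0.0 < surv < 1.0:
            se = math.sqrt(var_sum) / abs(math.log(surv))   # SE of log(-log S)
            # S**exp(+z*se) is the lower limit, S**exp(-z*se) the upper limit.
            lower, upper = surv ** math.exp(z * se), surv ** math.exp(-z * se)
        else:
            lower = upper = float("nan")                    # interval undefined at S = 0 or 1
        out.append((ti, surv, lower, upper))
    return out

# Hypothetical data: events at 1, 2, 3, 4; censored at 2 and 5.
for row in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 1, 0]):
    print("t=%d  S=%.3f  (%.3f, %.3f)" % row)
```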
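- 20. Non-Parametric Approach: Log-Rank Test
  - The log-rank test is used to test the equality of two survival functions. For comparing two survival curves:
    $Z = \frac{\sum_j (o_{1j} - e_{1j})}{\sqrt{\sum_j v_{1j}}}, \qquad Z^2 \sim \chi^2_1$
    where $o_{1j}$ and $e_{1j}$ are the observed and expected numbers of events in group 1 at the j-th event time, and $v_{1j}$ is estimated from a hypergeometric distribution.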
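The log-rank statistic can be computed directly from the risk sets. The sketch below works through each distinct event time. It uses e_1j = d_j·n_1j/n_j and the hypergeometric variance v_1j = d_j(n_1j/n_j)(1 - n_1j/n_j)(n_j - d_j)/(n_j - 1). The p-value uses the fact that Z² has a chi-square distribution with 1 degree of freedom, so P(χ²₁ > x) = erfc(√(x/2)). The function name and the 0/1 group coding are assumptions.

```python
import math

def log_rank(times, events, group):
    """Two-sample log-rank test; group is 1 for sample 1, 0 for sample 2.

    Returns (Z**2, p-value).
    """
    o_minus_e, var = 0.0, 0.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for t in times if t >= tj)
        n1 = sum(1 for t, g in zip(times, group) if t >= tj and g == 1)
        d = sum(1 for t, e in zip(times, events) if t == tj and e)
        o1 = sum(1 for t, e, g in zip(times, events, group) if t == tj and e and g == 1)
        o_minus_e += o1 - d * n1 / n                                 # o_1j - e_1j
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)   # v_1j
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df

# Hypothetical data for two groups.
print(log_rank([3, 5, 6, 8, 9, 4, 7, 10, 12, 14],
               [1, 1, 0, 1, 1, 1, 1, 0, 1, 0],
               [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))
```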
- 21. Example 1
  - Field survival data can be used to further evaluate product quality and may indicate possible quality-related issues.
  - The hazard function for hard disk drive field returns shows a significant peak early in life. Commonly used parametric distribution models such as Weibull, lognormal, or logistic models fit such a hazard function poorly. [Figure: hazard function of field failures with Weibull, lognormal, and logistic fits.]
  - Therefore, Kaplan-Meier curves and the log-rank test are used, respectively, to describe the survival functions and to evaluate the effects of two factors of interest on the drives' field survival.
- 22. Example 1 (continued)
  - In addition, field survival data is observational, so propensity score matching is applied to balance out possible effects of other factors (covariates). Results both before and after matching are presented.

  | Data | Log-rank chi-square | DF | Pr > ChiSq | Hazard ratio, GROUP1 vs. GROUP2 | Wald lower | Wald upper |
  |---|---|---|---|---|---|---|
  | Original data | 138.5724 | 1 | <.0001 | 2.287 | 1.971 | 2.653 |
  | Matched sample | 1.2565 | 1 | 0.2613 | 1.151 | 0.643 | 2.060 |
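The slides do not say how the propensity scores were estimated or matched. As a hypothetical illustration, the sketch below performs 1:1 greedy nearest-neighbor matching without replacement, within a caliper. It works on scores assumed to have been estimated already, for example by logistic regression of group membership on the covariates. The matched pairs would then be re-analyzed with Kaplan-Meier curves, the log-rank test and the Cox model, as in the matched-sample results above. The function name, caliper and scores are assumptions.

```python
def greedy_match(treated, control, caliper=0.05):
    """1:1 greedy nearest-neighbour matching on propensity scores, without replacement.

    treated, control: dicts {unit_id: propensity_score}.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)
    pairs = []
    # Match treated units in descending order of score (one common convention).
    for tid, ps in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - ps))
        if abs(available[cid] - ps) <= caliper:    # unmatched if no control is close enough
            pairs.append((tid, cid))
            del available[cid]
    return pairs

# Hypothetical scores: unit "C" has no control within the caliper and stays unmatched.
print(greedy_match({"A": 0.62, "B": 0.35, "C": 0.90},
                   {"x": 0.60, "y": 0.33, "z": 0.40, "w": 0.10}))
```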
- 23. Example 2
  - This example demonstrates an application of the Cox PH model. The time to event is the disease-free time of acute myelocytic leukemia (AML) patients after a special treatment. The question of interest is whether the disease-free time after treatment varies by gender and by age.
  - Test of equality over strata: log-rank chi-square 26.9998, DF 5, Pr > Chi-Square <.0001.
  - Part of the data used in this example is from an example published by SAS:

  | Obs | Group | Gender | Age | Time | Status |
  |---|---|---|---|---|---|
  | 1 | AML-Low Risk | M | 24 | 3395 | 0 |
  | 2 | AML-Low Risk | F | 26 | 3471 | 0 |
  | 3 | AML-Low Risk | F | 26 | 3618 | 0 |
  | 4 | AML-Low Risk | M | 27 | 3286 | 0 |
  | 5 | AML-Mediate Risk | F | 29 | 3034 | 0 |
  | 6 | AML-Mediate Risk | F | 31 | 3676 | 0 |
  | 7 | AML-Low Risk | M | 31 | 2547 | 0 |
  | 8 | AML-Low Risk | M | 32 | 3183 | 0 |
  | 9 | AML-High Risk | F | 32 | 4123 | 0 |
  | 10 | AML-Low Risk | M | 33 | 2569 | 0 |
  | 11 | AML-Low Risk | M | 33 | 2900 | 0 |
  | 12 | AML-Low Risk | F | 33 | 2805 | 1 |
  | 13 | AML-Low Risk | M | 34 | 3691 | 0 |
  | 14 | AML-Low Risk | F | 34 | 3179 | 0 |
  | 15 | AML-Low Risk | F | 34 | 2246 | 0 |
  | 16 | AML-High Risk | F | 34 | 3328 | 0 |
  | 17 | AML-High Risk | F | 35 | 2640 | 0 |
  | 18 | AML-Low Risk | M | 39 | 1760 | 1 |
  | … | … | … | … | … | … |
  | 273 | AML-High Risk | M | 74 | 16 | 1 |
- 24. Example 2 (continued)
  - SAS code:

        proc phreg data=Example2;
           class gender group;                      /* categorical effects */
           model Time*Status(0) = age group gender  /* Status=0: censored */
                 / selection=stepwise;
        run;

  - Analysis of maximum likelihood estimates:

  | Parameter | DF | Parameter estimate | Standard error | Chi-square | Pr > ChiSq | Hazard ratio |
  |---|---|---|---|---|---|---|
  | age | 1 | 0.15180 | 0.01229 | 152.5961 | <.0001 | 1.164 |
  | Group AML-High Risk | 1 | 0.46243 | 0.19063 | 5.8844 | 0.0153 | 1.588 |
  | Group AML-Low Risk | 1 | -0.18436 | 0.20569 | 0.8034 | 0.3701 | 0.832 |

  - Summary of stepwise selection:

  | Step | Effect entered | DF | Number in | Score chi-square | Pr > ChiSq |
  |---|---|---|---|---|---|
  | 1 | age | 1 | 1 | 169.3010 | <.0001 |
  | 2 | Group | 2 | 2 | 13.1022 | 0.0014 |

  - Test of equality over strata: log-rank chi-square 17.1657, DF 2, Pr > Chi-Square <.0002.
  - The modeling result suggests that the effect of gender on the survival function after the transplant is not statistically significant, but the effects of age and severity group are significant.
- 25. Discussion: Parametric Models
  - Nice properties:
    - Efficient data reduction: a function with a few parameters completely describes a survival pattern.
    - Enables standardized comparison: evaluation and comparison based on statistics such as MTTF.
    - Prediction into the future is possible.
  - Possible issues:
    - Assumptions:
      - Non-informative censoring.
      - Parametric distribution: is the exponential family flexible enough? One vs. multiple distributions: are three Weibull distributions needed to describe a bathtub hazard? How confident are we about the future survival path?
    - Estimation:
      - The distribution is usually non-symmetric.
      - Sample size and the time period covered by the observations.
      - Censoring.
- 26. Discussion: Cox PH Model
  - Nice properties:
    - No parametric distribution assumption is needed.
    - Hypotheses about the effect of a covariate on survival are easy to evaluate or test.
    - Very popular in clinical trial analysis and outcome studies.
  - Possible issues:
    - Proportional hazards is a strong assumption. When it is violated, stratified or extended Cox models may be used. Tests of the assumption:
      - log(-log(S(t))) plot
      - Including interactions with time in the model
      - Scaled Schoenfeld residuals plot
    - Estimation:
      - Censored observations are assumed to be non-informative.
      - Similar issues to those seen in a multivariate regression model.
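The log(-log(S(t))) check works because, under proportional hazards, log(-log S(t | X1)) - log(-log S(t | X0)) = (X1 - X0)β at every t, so the curves for two groups are parallel. The sketch below contrasts two hypothetical Weibull pairs. In the first pair the shapes are equal, PH holds, and the gap is a constant log 2. In the second pair the shapes differ, PH is violated, and the gap shrinks with log t and eventually changes sign. With real data, each group's Kaplan-Meier estimate would be used in place of the closed-form curves.

```python
import math

def cloglog(s):
    """Complementary log-log transform log(-log S)."""
    return math.log(-math.log(s))

def s_weibull(t, lam, alpha):
    return math.exp(-lam * t ** alpha)

ts = [1, 2, 5, 10, 20]
# PH holds: same shape, different scale -> constant vertical gap log(0.02 / 0.01).
gap_ph = [cloglog(s_weibull(t, 0.02, 1.5)) - cloglog(s_weibull(t, 0.01, 1.5)) for t in ts]
# PH violated: different shapes -> the gap changes with log t (the curves cross).
gap_non_ph = [cloglog(s_weibull(t, 0.05, 0.8)) - cloglog(s_weibull(t, 0.01, 1.5)) for t in ts]

for t, g1, g2 in zip(ts, gap_ph, gap_non_ph):
    print(f"t={t:>2}: PH gap={g1:.3f}  non-PH gap={g2:.3f}")
```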
- 27. Discussion: Non-Parametric Approach
  - Nice properties: distribution free; graphical and intuitive; describes the observed survival well.
  - Possible issues:
    - Not continuous (a step function).
    - Estimates can be biased when improperly stratified; for example, survival function estimates in the tail can be poor.
    - Smoothing is usually needed when estimating the hazard function.
    - Not informative about the future survival function.
    - When survival or hazard curves cross, the log-rank test is not appropriate.
- 28. Discussion: Estimation Improvement
  - Bayesian survival analysis approaches: introduce prior knowledge to improve parameter estimation.
  - Multiple imputation applied to survival analysis: may reduce the effect of censored observations. A large body of historical observations may inform the imputation.
- 29. Summary
  - Survival analysis has found applications in many fields. It can provide insightful information for evaluating product reliability, monitoring field quality, making warranty policy decisions, and validating new drug efficacy.
  - The parametric, distribution-based approach is probably the most popular survival analysis approach in reliability engineering, while the Cox PH model and non-parametric approaches are usually favored in biostatistical survival analysis.
  - Each approach comes with its own assumptions and is designed for a specific purpose. These assumptions should always be validated to ensure that an approach is applied appropriately.
  - Censored data is a major characteristic of survival data. It contributes to the uniqueness of survival data analysis and to possible issues in model estimation, and it should always be kept in mind when designing related experiments and analyzing survival data.
- 30. Questions? Thanks! Contact email: shao_zhang100@yahoo.com
