Brd project


  • Bovine Respiratory Disease (BRD) is a severe disease of cattle. Common symptoms include coughing, fever and dehydration, and the disease can be fatal. BRD accounts for “approximately 75 percent of feedlot morbidity and 50 percent to 70 percent of all feedlot deaths” in the United States. It is therefore of interest to study the relationship between BRD occurrence and clinical measurements such as temperature and haptoglobin. Survival analysis is suited to this task.
  • Survival analysis has long been used to understand and model time-to-event data. Its distinguishing feature is that it can handle censoring and time-dependent covariates. Here censoring means incomplete observation due to death, withdrawal, etc. Among the generalized linear models, linear regression deals with a normally distributed response variable, and logistic regression is used when the response variable is binary. If a predictor is time-dependent and censoring happens, survival analysis is the right choice.
  • To describe the distribution of survival data, besides the common pdf and CDF, we can use two functions: the survival proportion and the hazard function. The survival proportion, S(t), gives the probability of surviving beyond time t. The hazard function, h(t), gives the event rate at time t conditional on survival up to time t.
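The two functions above can be written out in standard textbook notation. The density f(t) and the cumulative hazard integral are not on the slide; they are included here only to show how S(t) and h(t) relate to each other.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```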
  • There are many methods available for survival analysis. If there are no covariates, the Kaplan-Meier method is a popular way to estimate the survival proportion. Here d_j is the number of deaths and n_j is the number of animals at risk at time j. If there are covariates, Cox's proportional hazards regression is a very successful model. Here h_i(t) is the hazard function for individual i, and λ_0(t) is the baseline hazard function. We can then fit the regression and estimate the parameters for the covariates.
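As a rough illustration of the Kaplan-Meier idea described above, the product-limit estimate multiplies (1 - d_j / n_j) over the event times up to t. The sketch below is a minimal pure-Python version; the function name and the toy data are hypothetical and are not taken from the BRD studies.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t_j, S(t_j))] at each distinct time with at least one event.

    times  -- follow-up time for each animal
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each animal leaves the risk set at its own time
    at_risk = len(times)      # n_j before the first time point
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)  # d_j: events observed at time t
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # drop events and censored animals alike
    return curve


# Hypothetical toy data: seven animals, three of them censored.
toy_times = [2, 3, 3, 5, 7, 7, 10]
toy_events = [1, 1, 0, 1, 0, 1, 0]
for day, s in kaplan_meier(toy_times, toy_events):
    print(day, round(s, 4))
```

Censored animals still count in n_j up to the time they leave, which is how the estimator uses incomplete observations instead of discarding them.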
  • We have some BRD data from the OSU Animal Science Department. There are three studies. Study I has 137 cattle and lasted 21 days. It has covariates. Study II has 265 cattle and lasted 42 days. It also has covariates. Study III has 347 cattle and lasted 56 days. It does not have useful covariates.
  • Last year, Xuesong used survival analysis to explore the first two datasets. She drew a number of useful conclusions.
  • Then we can move to the next step. As we all know, a bigger sample size brings more power. Since we have three studies here, why not combine them statistically?
  • To answer such a question, the first idea is of course meta-analysis. Meta-analysis is a statistical method to combine the results of several studies targeting the same or similar hypotheses. It has two advantages: first, it controls between-study variation; second, it increases statistical power. Since the 1990s, meta-analysis has become popular in many fields. A typical meta-analysis has five steps: 1. Define the research question; 2. Search and select the literature; 3. Compute the effect size and its variance for each study; 4. Calculate the summary effect by inverse-variance weighting; 5. Report and interpret the result.
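Step 4 above can be sketched in a few lines. This is the simplest fixed-effect form of inverse-variance weighting, where each study's weight is the reciprocal of its variance; a random-effects version would add a between-study variance term to each weight. The function name and the numbers are hypothetical.

```python
import math


def pool_inverse_variance(effects, variances):
    """Fixed-effect summary: weighted mean with weights 1 / variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    standard_error = math.sqrt(1.0 / total)
    return pooled, standard_error


# Hypothetical log hazard ratios and their variances from three studies.
log_hr = [math.log(0.80), math.log(0.95), math.log(0.70)]
var_log_hr = [0.04, 0.01, 0.09]

pooled, se = pool_inverse_variance(log_hr, var_log_hr)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print("summary HR:", round(math.exp(pooled), 3))
print("95% CI:", round(math.exp(low), 3), round(math.exp(high), 3))
```

Pooling is done on the log scale because the log hazard ratio is closer to normally distributed than the ratio itself; the result is exponentiated only for reporting.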
  • Here we have one example of how to use meta-analysis on survival data. First step, the purpose of this research is to investigate the effects of psychosocial interventions on survival time in cancer patients; second step, the authors selected some qualified publications; third step, the hazard ratios between the treatment and the control were calculated; fourth step, the summary hazard ratio was computed by inverse-variance weighting; last step, all results were displayed in this forest plot.
  • However, our data have some practical problems. First, the data are actually very messy. The three studies don’t have the same duration. Second, our data are observational. Unlike a designed experiment, there is no control or treatment group. So it is impossible to calculate the effect size and the within-study variance. This means that we cannot follow the procedure of the traditional meta-analysis. Then the question is: if we cannot use the traditional meta-analysis, how can we combine these three studies?
  • Fortunately, besides the traditional meta-analysis, there are some statistical models which can combine multiple studies. In 2000, Earle and Wells summarized five methods to combine multiple studies on survival data. They are IGLS, MFD, NLR, LRR and W-LRR. All five methods can produce an accurate summary survival curve.
  • One of the five methods is IGLS, iterative generalized least-squares. Since 1988, many statisticians have continued to improve this method. The latest version is a multivariate random-effects model by Arends and her coworkers in 2008.
  • Arends’ model can be described by this equation. On the left, the log-negative-log transformation of the survival proportion estimates is the response variable; i indexes the studies included. On the right, β is the parameter vector of fixed effects and b_i is the parameter vector of random effects. With iterative generalized linear regression, all coefficients and covariances can be estimated.
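The equation itself did not survive in the transcript. A generic linear mixed-model form consistent with the description above would look like the sketch below. The design matrices X_i and Z_i, the error term ε_i, and the covariance symbols D and Σ_i are notation assumed here for illustration; only the response, β and b_i are named in the notes.

```latex
\ln\!\bigl(-\ln \hat{S}_i\bigr) = X_i \beta + Z_i b_i + \varepsilon_i,
\qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i)
```

Here the vector of estimated survival proportions for study i is transformed elementwise, the fixed effects β are shared by all studies, and the random effects b_i let each study deviate from the common curve.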
  • OK, let’s talk about the second part: methodology. In this report, we followed a three-step process. The first step is data preparation. The second step is modeling. The last step is reporting the results. In the first step, we did data transformation. Because the temperature in Study I is reticular temperature, we used the equation from a recent paper (Bewley et al. 2008) to convert reticular temperature to rectal temperature. Then both Study I and Study II have rectal temperatures. Then we cleaned the raw data. Some invalid observations in Study I and Study III were excluded.
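The temperature conversion is a single linear equation, given on the methodology slide as RECT = 15.88 + 0.587*RETT. A minimal sketch follows; the function name is hypothetical, and degrees Celsius is an assumption, since the slide does not state the unit.

```python
def reticular_to_rectal(rett):
    """Convert reticular temperature (RETT) to rectal temperature (RECT).

    Coefficients are the ones quoted on the slide from Bewley et al. (2008).
    The unit is assumed to be degrees Celsius.
    """
    return 15.88 + 0.587 * rett


# Hypothetical reticular readings.
for rett in (38.5, 39.5, 40.5):
    print(rett, "->", round(reticular_to_rectal(rett), 2))
```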
  • Then in the modeling step, we applied two different methods: the no covariates method and the covariate method. This step can be further divided into four stages: target variable generation, model selection, model fitting and model assessment. For each stage, we used a number of procedures in SAS 9.2 for the two methods separately.
  • Now we can talk about the results and discussion. The first method is the no covariates method. In this step, we don’t use any covariate. We just used the Kaplan-Meier method to generate the estimated survival proportions for all three datasets.
  • These are the Kaplan-Meier survival curves we introduced a moment ago. We can see that as time goes on, the survival proportion decreases at different rates. Study III usually has the highest survival proportion, followed by Study II and Study I.
  • If we apply the negative log transformation to the survival proportion, the survival curves are turned upside down, and we can see an increasing trend in this plot. Still, they are not in an ideal shape. So we apply a second transformation. The time has been log transformed, and the y axis is the log-negative-log transformed survival proportion. Now they look better and we can try to fit those dots. Without considering the interaction among the three studies, we can fit those dots with linear least-squares lines and quadratic least-squares lines independently. Comparing the two plots, the quadratic ones seem better. But we are not very sure.
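The two transformations and the independent least-squares fits can be sketched as follows. This is a pure-Python illustration on made-up survival proportions, not the SAS procedure used in the report; the helper names `cloglog`, `polyfit` and `sse` are hypothetical.

```python
import math


def cloglog(s):
    """Log-negative-log transform ln(-ln(s)) for 0 < s < 1."""
    return math.log(-math.log(s))


def polyfit(x, y, degree):
    """Ordinary least squares for y = c0 + c1*x + ... via normal equations."""
    n = degree + 1
    # Augmented matrix [X'X | X'y] for the polynomial design matrix.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(n)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def sse(x, y, coef):
    """Sum of squared residuals of the fitted polynomial."""
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))


# Hypothetical Kaplan-Meier estimates for one study.
days = [1, 3, 7, 14, 21]
surv = [0.97, 0.88, 0.70, 0.58, 0.52]

x = [math.log(d) for d in days]   # log-transformed time
y = [cloglog(s) for s in surv]    # log-negative-log survival

linear = polyfit(x, y, 1)
quadratic = polyfit(x, y, 2)
print("linear SSE:", round(sse(x, y, linear), 4))
print("quadratic SSE:", round(sse(x, y, quadratic), 4))
```

A quadratic fit can never have a larger residual sum of squares than the linear fit on the same points, so a lower SSE alone does not settle which model is better; that is one reason the comparison is left to fit statistics from the mixed model.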
  • Thus, based on these observations, we proposed two models. Equation (6) does not have a quadratic term, while equation (7) does. The source study is treated as a random effect. Through the model fitting, the coefficients and covariances are estimated. All of them are significant. The fit statistics also show that the model in equation (7) is better than the model in equation (6).
  • OK. Let us have a look at the performance of those two models. The first model is the model in equation (6). The original survival curves and their 95% confidence intervals are displayed in the first plot. After the modeling, we have narrower confidence intervals. But the study-specific result underestimates the survival proportions. The combined survival curve has the same problem.
  • As for the model in equation (7), after the modeling, we can see that this time we have much narrower confidence intervals for individual studies. The combined survival curve combines the three studies pretty well. So according to the results, we may say that with this method we can combine the three studies. And the model in equation (7) is clearly better than the model in equation (6).
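To turn a fitted model back into a survival curve, the log-negative-log transformation is inverted: S(t) = exp(-exp(η)). Equations (6) and (7) are not reproduced in the transcript, so the sketch below assumes the linear predictor η is a polynomial in ln(day), and the coefficients are invented for illustration.

```python
import math


def survival_from_cloglog(coef, day):
    """Invert ln(-ln S) = c0 + c1*ln(day) + c2*ln(day)^2 + ... to get S."""
    x = math.log(day)
    eta = sum(c * x ** k for k, c in enumerate(coef))
    return math.exp(-math.exp(eta))


# Hypothetical coefficients: one without and one with a quadratic term.
without_quadratic = [-3.0, 0.8]
with_quadratic = [-3.2, 1.4, -0.2]

for day in (1, 7, 21, 42):
    print(day,
          round(survival_from_cloglog(without_quadratic, day), 3),
          round(survival_from_cloglog(with_quadratic, day), 3))
```

The back-transformed value always lies between 0 and 1, but with a quadratic term nothing forces the curve to be non-increasing over all times, which may be one reading of the "not a real survival curve" weakness listed in the conclusion.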
  • Then we start to explore a covariate method. This time we introduce temperature as a covariate. Because Study III has no temperature information, we can only use Study I and Study II. At the very beginning, we used Cox's proportional hazards regression to produce the survival proportions.
  • Because this time we have temperature, the survival proportions and their 95% confidence intervals can be displayed in three dimensions. The left axis is the temperature and the right axis is the time. The first row is Study I and the second row is Study II. The first column is the estimated survival proportion, and the second column is the 95% confidence interval. The upper layer is the upper confidence limit and the lower layer is the lower confidence limit.
  • If we slice the 3D image into 2D images, we can see the survival curves at different temperatures. Here we have several temperature values. Overall, as the temperature increases, the survival proportion declines faster.
  • Now we have time and also temperature. There are many possible input variables, because they could include combinations or higher-order terms. We used a stepwise selection process to choose the valid input variables. In the end, three input variables, temperature, ln(day) and [ln(day)]^2, were selected.
  • Then we used those input variables to construct a model. After the fitting, the coefficients and covariances are estimated. All coefficients are significant.
  • After the modeling, the updated survival proportions and corresponding 95% confidence intervals for individual studies are displayed. As we can see here, the confidence intervals are very narrow. The upper and lower confidence limits nearly overlap, so that with temperature as a covariate they look like a single layer.
  • The sliced results are quite similar. The 95% confidence intervals are quite narrow, very close to the estimated survival proportions.
  • The combined result is displayed here. In this way, we are finally able to combine Study I and Study II.
  • In this report, we used Arends’ model to successfully combine survival data from multiple studies. The model has some strengths: it handles observational data; it is very simple and can be easily explained; we can use SAS to finish the whole process; and the statistical power could be increased after combining. It also has weaknesses: sometimes the result is not a real survival curve; it requires the assumption that the random effects are normally distributed; over-fitting may occur, as we saw in the covariate method; published results in journal papers may not have time information, which may cause some problem for .
  • In the future, this model can be further improved. First, the ln(-ln) transformation may not be necessary. We can also use other types of transformation. Second, the normal distribution assumption for the random effects may be replaced, for example, by a gamma distribution.
  • Brd project

    1. 1. Multi-study Analysis of Survival Data for Bovine Respiratory Disease. Reporter: Chao ‘Charlie’ Huang. Project presentation
    2. 2. OUTLINE: 1. Introduction (Bovine Respiratory Disease; Survival analysis; Meta-analysis; Statistical models combining multi-study; Arends’ multivariate random-effects model). 2. Methodology (Data manipulation; Modeling). 3. Results and discussion (No covariates method; Covariate method). 4. Conclusion
    3. 3. 1. INTRODUCTION: 1.1 Bovine Respiratory Disease (BRD): a severe cattle disease; coughing, fever, dehydration and death; accounting for “approximately 75 percent of feedlot morbidity and 50 percent to 70 percent of all feedlot deaths” in the United States (Stotts 2010). [Diagram: BRD occurrence; clinical diagnosis (temperature, haptoglobin, etc.); survival analysis]
    4. 4. 1.2 Survival analysis: models time-to-event data; censoring (incomplete observation due to death, withdrawal, etc.); time-dependent covariates. The table is modified based on Brian F. Gage, 2004:
        Generalized linear model | Type of predictor variable | Type of response variable | Censor?
        Linear regression | Categorical or continuous | Normally distributed | No
        Logistic regression | Categorical or continuous | Binary | No
        Survival analysis | Categorical or continuous (maybe time-dependent) | Binary | Allowed
    5. 5. h(t) = P{ t < T < (t + Δt) | T > t };  S(t) = P{ T > t }
    6. 7. BRD data from OSU Animal Science Department. Study I: 137 cattle; 21 days; covariates (reticular temperature, haptoglobin, etc.). Study II: 265 cattle; 42 days; covariates (rectal temperature, haptoglobin, etc.). Study III: 347 cattle; 56 days
    7. 8. Using Study I and II, Li (2009) finished survival analysis with the Kaplan-Meier method and Cox's proportional hazards regression. Overall “nearly half of the sick animals developed the disease in the first 7 days after arrival” and “when temperature is higher, the hazard of developing BRD is higher for both data sets”. “when the haptoglobin level is higher, the hazard for developing BRD also increases” for Study I, and “the two coefficients, temperature and the interaction between temperature and time, are significant” for Study II.
    8. 9. Next step: increased sample size → more power. How about we combine the three studies together?
    9. 10. 1.3 Meta-analysis: a statistical method to combine several studies’ results targeting the same or similar hypotheses; controls between-study variation; increases statistical power
    10. 11. An example: meta-analysis of the effects of psychosocial interventions on survival time in cancer patients
    11. 12. However, our data: has messy structure (missing or invalid variable; different duration); is observational data (no randomization; no control vs. treatment). If we cannot use the traditional meta-analysis, how can we combine these three studies?
    12. 13. 1.4 Statistical models combining multi-study
    13. 14. Iterative generalized least-squares
    14. 15. 1.5 Arends’ multivariate random-effects model. [Equation labels: survival proportion estimated by survival analysis methods; parameter vector of fixed effects; parameter vector of random effects.] Coefficient and covariance are estimated by iterative generalized linear regression
    15. 16. 2. METHODOLOGY: Data transformation: Study I: reticular temperature (RETT) → rectal temperature (RECT), RECT = 15.88 + 0.587*RETT by Bewley et al. (2008). Data cleaning: Study I: 137 animals → 129 animals; Study III: 347 animals → 230 animals
    16. 17. 2. METHODOLOGY: No covariates method; Covariate method
    17. 18. 3. RESULTS AND DISCUSSION: 3.1 No covariates method [Figure; axis: Time]
    18. 19. 3. RESULTS AND DISCUSSION: 3.1 No covariates method
    19. 20. 3. RESULTS AND DISCUSSION: 3.1 No covariates method
    20. 21. 3. RESULTS AND DISCUSSION: 3.1 No covariates method
    21. 22. 3. RESULTS AND DISCUSSION: 3.1 No covariates method. After the model in equation (6) [Figure panels: Study-specific result; Combined result]
    22. 23. 3. RESULTS AND DISCUSSION: 3.1 No covariates method. After the model in equation (7) [Figure panels: Study-specific result; Combined result]
    23. 24. 3. RESULTS AND DISCUSSION: 3.2 Covariate method [Figure; axes: Time, Temperature]
    24. 25. 3. RESULTS AND DISCUSSION: 3.2 Covariate method [Figure panels: Study I, Study II; Survival proportion, 95% confidence interval]
    25. 26. 3. RESULTS AND DISCUSSION: 3.2 Covariate method
    26. 27. 3. RESULTS AND DISCUSSION: 3.2 Covariate method. The selected fixed effects: temperature, ln(day), [ln(day)]^2
    27. 28. 3. RESULTS AND DISCUSSION: 3.2 Covariate method
    28. 29. 3. RESULTS AND DISCUSSION: 3.2 Covariate method. Study-specific results [Figure panels: Study I, Study II; Survival proportion, 95% confidence interval]
    29. 30. 3. RESULTS AND DISCUSSION: 3.2 Covariate method
    30. 31. 3. RESULTS AND DISCUSSION: 3.2 Covariate method. Combined result [Figure panels: Survival proportion; 95% confidence interval]
    31. 32. 4. CONCLUSION: Strength: handles the observational data; simple and robust; easy to be programmed in SAS®. Weakness: not a real survival curve; random effects have the normal distributions; over-fitting may occur; journal papers?
    32. 33. Future improvement: ln(-ln) transformation (regression splines, fractional polynomials, etc.; simulation test may decide the best transformation); normal distribution assumption (a gamma distribution by Fiocco, Putter and van Houwelingen (2009))