Brd project

Chao Huang

Multi-study Analysis of Survival Data for Bovine Respiratory Disease
Reporter: Chao ‘Charlie’ Huang
Project presentation

1. INTRODUCTION
BRD occurrence; clinical diagnosis (temperature, haptoglobin, etc.); survival analysis

Generalized linear model | Type of predictor variable                       | Type of response variable | Censor?
Linear regression        | Categorical or continuous                        | Normally distributed      | No
Logistic regression      | Categorical or continuous                        | Binary                    | No
Survival analysis        | Categorical or continuous (maybe time-dependent) | Binary                    | Allowed

The table is modified based on Brian F. Gage, 2004.

h(t) = lim(Δt→0) P{ t < T < (t + Δt) | T > t } / Δt
S(t) = P{ T > t }

Meta-analysis of the effects of psychosocial interventions on survival time in cancer patients

Model equation labels: survival proportion estimated by survival analysis methods; parameter vector of fixed effects; parameter vector of random effects. Coefficients and covariances are estimated by iterative generalized linear regression.

2. METHODOLOGY
No covariates method; covariate method

3. RESULTS AND DISCUSSION

[Figure. Axis: Time]

[Figure. Study-specific result and combined result after the model in equation (6)]

[Figure. Study-specific result and combined result after the model in equation (7)]

[Figure. Axes: Time, Temperature]

[Figure. Study I and Study II: survival proportion and 95% confidence interval]

Selected fixed effects: temperature, ln(day), [ln(day)]^2

[Figure. Study-specific results, Study I and Study II: survival proportion and 95% confidence interval]

[Figure. Combined result: survival proportion and 95% confidence interval]

4. CONCLUSION

Editor's Notes

Bovine Respiratory Disease (BRD) is a severe disease of cattle. Common symptoms include coughing, fever and dehydration, and the disease can be fatal. BRD accounts for “approximately 75 percent of feedlot morbidity and 50 percent to 70 percent of all feedlot deaths” in the United States. It is therefore of interest to study the relationship between BRD occurrence and clinical measurements such as temperature and haptoglobin. Survival analysis is well suited to this task.
Survival analysis has long been used extensively for understanding and modeling time-to-event data. It has a distinctive feature: it can handle censoring and time-dependent covariates. Here censoring means incomplete observation due to death, withdrawal, etc. Among the models in the table, linear regression deals with a normally distributed response variable. If the response variable is binary, we have to use logistic regression. If predictors may be time-dependent and censoring occurs, survival analysis is the right choice.
To describe the distribution of survival data, besides the usual pdf and CDF, we can use two functions: the survival proportion and the hazard function. The survival proportion, S(t), gives the probability of surviving beyond time t. The hazard function, h(t), gives the instantaneous event rate at time t, conditional on survival up to time t.
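As a short worked note, the standard relationships between these functions can be written out as follows. Here f(t) and F(t) denote the pdf and CDF of the event time T; these two symbols are notation assumed for this sketch, not taken from the slides.

```latex
\begin{align}
S(t) &= P\{T > t\} = 1 - F(t), \\
h(t) &= \lim_{\Delta t \to 0} \frac{P\{t < T < t + \Delta t \mid T > t\}}{\Delta t}
      = \frac{f(t)}{S(t)} = -\frac{d}{dt}\,\ln S(t), \\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right).
\end{align}
```

The last line shows that knowing the hazard function is enough to recover the survival proportion, which is why models are often specified through h(t).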
Many methods are available for survival analysis. When there are no covariates, the Kaplan-Meier method is a very popular way to estimate the survival proportion. Here D_j is the number of deaths at time j and N_j is the total number at risk at time j. When covariates are present, Cox's proportional hazards regression is a very successful model. Here h_i(t) is the hazard function for individual i and λ_0(t) is the baseline hazard function. Fitting the regression then lets us estimate the parameters for the covariates.
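To make the Kaplan-Meier product concrete, here is a minimal sketch in plain Python. The function name `kaplan_meier` and the toy data are made up for illustration; the project itself used SAS procedures, so this is not the code behind the results.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S(t_j))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)   # N_j: subjects still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0          # D_j: events observed exactly at time t
        removed = 0         # events plus censored subjects leaving at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional survival (1 - D_j / N_j).
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical toy data: five animals, two of them censored.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for day, s in toy_curve:
    print(day, round(s, 3))
```

Subjects censored at the same time as an event are still counted as at risk at that time, which is the usual convention.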
We now have BRD data from the OSU animal science department, comprising three studies. Study I has 137 cattle and a duration of 21 days, and it has covariates. Study II has 265 cattle and a duration of 42 days, and it also has covariates. Study III has 347 cattle and a duration of 56 days, but it has no useful covariates.
Last year, Xuesong used survival analysis to explore the first two datasets and drew many valuable conclusions.
We can now move to the next step. As we all know, a larger sample size brings more statistical power. Since we have three studies here, can we combine them statistically?
To answer this question, the first idea is of course meta-analysis. Meta-analysis is a statistical method for combining the results of several studies that address the same hypothesis. It has two advantages: first, it controls for between-study variation; second, it increases statistical power. Since the 1990s, meta-analysis has become popular in many fields. A typical meta-analysis has five steps: 1. Define the research question; 2. Search and select the literature; 3. Compute the effect size and its variance for each study; 4. Calculate the summary effect by inverse-variance weighting; 5. Report and interpret the results.
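Step 4 can be sketched in a few lines of Python. This is the simple fixed-effect version of inverse-variance weighting; the function name and the three log hazard ratios with their variances are invented toy values, not data from any study mentioned here.

```python
import math


def inverse_variance_summary(effects, variances):
    """Fixed-effect summary: weight each study by 1 / variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    summary = sum(w * e for w, e in zip(weights, effects)) / total
    summary_var = 1.0 / total   # variance of the weighted mean
    return summary, summary_var


# Hypothetical effect sizes on the log hazard ratio scale.
log_hr = [math.log(0.80), math.log(1.10), math.log(0.90)]
var = [0.04, 0.09, 0.02]

est, est_var = inverse_variance_summary(log_hr, var)
half_width = 1.96 * math.sqrt(est_var)
print("summary HR:", math.exp(est))
print("95% CI:", math.exp(est - half_width), math.exp(est + half_width))
```

Studies with smaller variance get larger weights, so precise studies dominate the summary effect.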
Here is an example of meta-analysis applied to survival data. First, the purpose of the research was to investigate the effects of psychosocial interventions on survival time in cancer patients; second, the authors selected qualifying publications; third, the hazard ratios between treatment and control were calculated; fourth, the summary hazard ratio was computed by inverse-variance weighting; last, all results were presented in this forest plot.
However, our project has some practical problems. First, the data are quite messy: the three studies do not have the same duration. Second, our data are observational. Unlike a designed experiment, there is no control or treatment group, so it is impossible to calculate an effect size and its within-study variance. This means we cannot follow the procedure of traditional meta-analysis. The question then is: if we cannot use traditional meta-analysis, how can we combine these three studies?
Fortunately, besides traditional meta-analysis, there are statistical models that can combine multiple studies. In 2000, Earle and Wells summarized five methods for combining multiple studies of survival data: IGLS, MFD, NLR, LRR and W-LRR. All five methods can produce an accurate summary survival curve.
One of the five methods is IGLS, iterative generalized least squares. Since 1988, many statisticians have steadily improved this method. The most recent version is a multivariate random-effects model by Arends and her coworkers in 2008.
Arends’ model can be described by this equation. On the left, the log-negative-log transformation of the survival proportion estimates is the response variable; i indexes the studies included. On the right, β is the parameter vector of fixed effects and b_i is the parameter vector of random effects. With iterative generalized linear regression, all coefficients and covariances can be estimated.
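For orientation, a model of the kind described above can be written in standard linear mixed model notation. This is a hedged sketch, not a transcription of the slide: the design matrices X_i and Z_i, the error term ε_i and the covariance matrices D and Σ_i are notational assumptions added here.

```latex
\begin{equation}
\ln\!\bigl(-\ln \hat{S}_i\bigr) = X_i\,\beta + Z_i\,b_i + \varepsilon_i,
\qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i),
\end{equation}
```

Here \hat{S}_i is the vector of estimated survival proportions for study i, β holds the fixed effects shared by all studies, and b_i holds the study-specific random effects.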
Now let us turn to the second part: methodology. In this report, we followed a three-step process: data preparation, modeling, and reporting of results. In the first step, we transformed the data. Because the temperature in Study I is reticular temperature, we used the equation from a recent paper to convert reticular temperature to rectal temperature, so that both Study I and Study II have rectal temperatures. Then we cleaned the raw data: some invalid observations in Studies I and III were excluded.
In the modeling step, we applied two different methods: a no-covariates method and a covariate method. This step can be further divided into four stages: target variable generation, model selection, model fitting and model assessment. For each stage, we used a number of procedures in SAS 9.2, separately for the two methods.
Now we turn to the results and discussion. The first method is the no-covariates method, which does not use any covariate. We simply used the Kaplan-Meier method to generate the estimated survival proportions for all three datasets.
These are the Kaplan-Meier survival curves introduced a moment ago. As time goes on, the survival proportion decreases at different rates. Study III usually has the highest survival proportion, followed by Study II and Study I.
If we apply the negative log transformation to the survival proportion, the survival curves are flipped upside down and show an increasing trend in this plot. Still, they are not in an ideal shape, so we apply a second transformation: time is log-transformed, and the y-axis is the log-negative-log transformed survival proportion. Now the curves look better and we can try to fit the points. Ignoring any interaction among the three studies, we can fit the points for each study independently with linear and quadratic least-squares curves. Comparing the two plots, the quadratic fits seem better, but we cannot be sure.
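The two transformations and the independent least-squares fits can be sketched as follows. This is an illustration only: the helper names and the toy survival proportions are hypothetical, and the fit is ordinary least squares for a single study, without the random study effect used in the later models.

```python
import math


def cloglog(s):
    """ln(-ln(s)) for a survival proportion 0 < s < 1."""
    return math.log(-math.log(s))


def inv_cloglog(y):
    """Back-transform to a survival proportion: s = exp(-exp(y))."""
    return math.exp(-math.exp(y))


def polyfit_ls(x, y, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] for c0 + c1*x + ...
    """
    m = degree + 1
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(xi ** (r + c) for xi in x) for c in range(m)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[r][m] / a[r][r] for r in range(m)]


# Hypothetical Kaplan-Meier estimates for one study.
days = [1, 3, 7, 14, 21]
surv = [0.98, 0.93, 0.85, 0.78, 0.74]

x = [math.log(d) for d in days]      # ln(day)
y = [cloglog(s) for s in surv]       # ln(-ln(S))

linear = polyfit_ls(x, y, 1)         # intercept and slope
quadratic = polyfit_ls(x, y, 2)      # adds a [ln(day)]^2 term

# Fitted survival proportion at day 10 from the quadratic curve.
lx = math.log(10)
fitted = inv_cloglog(quadratic[0] + quadratic[1] * lx + quadratic[2] * lx ** 2)
print(linear, quadratic, fitted)
```

Because the back-transform exp(-exp(y)) always lies between 0 and 1, any fitted curve on the transformed scale maps to a valid proportion, although it is not guaranteed to be monotonically decreasing in time.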
Based on these observations, we proposed two models. Equation (6) has no quadratic term, while equation (7) has one. The study is taken as a random effect. Through model fitting, the coefficients and covariances are estimated, and all of them are significant. The fit statistics also show that the model in equation (7) is better than the model in equation (6).
Let us look at the performance of these two models. The first is the model in equation (6). The original survival curves and their 95% confidence intervals are displayed in the first plot. After modeling, the confidence intervals are narrower, but the study-specific result underestimates the survival proportions. The combined survival curve has the same problem.
For the model in equation (7), the confidence intervals for the individual studies are much narrower after modeling, and the combined survival curve represents the three studies well. According to these results, this method lets us combine the three studies, and the model in equation (7) is clearly better than the model in equation (6).
Next we explore a covariate method, introducing temperature as a covariate. Because Study III has no temperature information, we can only use Study I and Study II. To begin, we used Cox's proportional hazards regression to produce the survival proportions.
Because we now have temperature, the survival proportions and their 95% confidence intervals can be displayed in three dimensions. The left axis is temperature and the right axis is time. The first row is Study I and the second row is Study II. The first column is the estimated survival proportion and the second column is the 95% confidence interval: the upper layer is the upper confidence limit and the lower layer is the lower confidence limit.
If we slice the 3-D image into 2-D sections, we can see the survival curves at different temperatures. Here we show several temperatures. Overall, as temperature increases, the survival proportion decreases faster.
Now we have both time and temperature. There are many candidate input variables, because they can enter in combination or as higher-order terms. We used a stepwise selection process to choose valid input variables. In the end, three input variables were retained: temperature, ln(day) and [ln(day)]^2.
We then used these input variables to construct a model. After fitting, the coefficients and covariances are estimated. All coefficients are significant.
After modeling, the updated survival proportions and the corresponding 95% confidence intervals for the individual studies are shown. As we can see, the confidence intervals are very narrow: the upper and lower confidence limits nearly coincide, so they look like a single layer when plotted with temperature as a covariate.
The sectional results are similar. The 95% confidence intervals are narrow and very close to the estimated survival proportions.
The combined result is displayed here. By this means, we are finally able to combine Study I and Study II.
In this report, we used Arends’ model to successfully combine survival analyses from multiple studies. The model has several strengths: it handles observational data; it is simple and easily explained; the whole process can be carried out in SAS; and statistical power could be increased by combining studies. It also has weaknesses: the fitted curve is sometimes not a true survival curve; it assumes that the random effects are normally distributed; over-fitting may occur, as we saw in the covariate method; and published results in journal papers may not include time information, which may cause problems.
In the future, this model can be further improved. First, the ln(-ln) transformation may not be necessary; other types of transformation could be used. Second, the normal distribution assumed for the random effects may be replaced by, for example, a gamma distribution.
