The use of Prediction Intervals in Meta-Analysis
  • 1. The use of Prediction Intervals in Meta-Analysis Nikesh Patel March 28, 2013
  • 2. Abstract
    Background: Systematic reviews containing meta-analyses of randomised controlled trials provide the best and most reliable information on health care interventions. Meta-analysis combines treatment effects from included studies to produce overall summary results. A fixed-effect analysis assumes a common effect, whereas a random-effects analysis allows for between-study heterogeneity. The goal of analysing heterogeneous studies is not only to report a summary estimate but to explain the observed differences. Whilst a random-effects model remains the gold standard for analysing heterogeneous studies, solely reporting the summary estimate and its 95% confidence interval masks the potential effects of heterogeneity. A 95% prediction interval, which takes into account the full uncertainty surrounding the summary estimate, describes the whole distribution of effects in a random-effects model and the degree of between-study heterogeneity, and conveniently gives a range within which we are 95% sure that the treatment effect in a new study will lie.
    Aims: I aim to apply a 95% prediction interval to a collection of meta-analyses of randomised controlled trials and observe the impact it has on their outcomes. I also aim to apply a 95% prediction interval to meta-epidemiological studies, which assess the influence of trial characteristics on the treatment effect estimates in meta-analyses.
    Results: I carried out an empirical review to look at the impact of 95% prediction intervals on existing meta-analyses of randomised controlled trials in the Lancet. From 26 studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight, interquartile range seven) and reproduced each using a random-effects model with a 95% prediction interval. I found 19 (52.8%) had significant 95% confidence intervals, of which 10 (27.8%) had insignificant 95% prediction intervals and 9 (25%) had significant 95% prediction intervals. In addition, 95% prediction intervals were applied to 4 meta-epidemiological studies, revealing extra information concerning the summary estimates.
  • 3. Conclusion
    Every random-effects meta-analysis should include a 95% prediction interval but, for best performance, the analysis should include a sufficient number of good quality, unbiased randomised controlled trials. To enhance the quality and robustness of meta-epidemiological studies, a 95% prediction interval should be included.
  • 4. Contents
    1 Introduction
      1.1 Systematic Review
      1.2 Meta-Analysis
      1.3 Fixed-Effect Meta-Analysis
      1.4 Carrying out a Fixed-Effect Meta-Analysis
      1.5 Heterogeneity
      1.6 Random-Effects Meta-Analysis
      1.7 Carrying out a Random-Effects Meta-Analysis
      1.8 Fixed-Effect v Random-Effects
    2 Prediction Interval
      2.1 95% Prediction Interval
      2.2 Calculating a Prediction Interval
      2.3 Discussion
    3 Empirical review of the impact of using prediction intervals on existing meta-analyses
      3.1 Introduction
      3.2 Methods
        3.2.1 Search Strategy and Selection Criteria
        3.2.2 Data Calculations
        3.2.3 Software
      3.3 Results
      3.4 Discussion
        3.4.1 Principal Findings
        3.4.2 Limitations
        3.4.3 Comparison with other studies
        3.4.4 Final Remarks and Implications
    4 Prediction intervals in Meta-Epidemiological studies
      4.1 Meta-Epidemiological Study
      4.2 Prediction Intervals in Meta-Epidemiological Studies
  • 5.
        4.2.1 Example 1
        4.2.2 Example 2
        4.2.3 Example 3
        4.2.4 Example 4
      4.3 Discussion
    5 Final Discussion and Conclusion
    A STATA Codes
  • 6. Chapter 1 Introduction
    In health care and medicine, clinicians, researchers and other important figures require quality, accurate information to assist them in making the best possible decisions on health care interventions. Such information is normally found in systematic reviews containing meta-analyses of randomised controlled trials. 1 The aim of this paper is to investigate the use of prediction intervals in meta-analysis, a typical statistical component of a systematic review, and how their application can help aid interpretation of meta-analysis results to a higher degree of quality and accuracy.
    1.1 Systematic Review
    Since the 1990s, systematic reviews have become very important in medicine and health care. The reasons for this are the sheer volume of medical literature produced annually and the requirement for clinicians and other health care officials to have up-to-date, quality, accurate information on health care interventions. 1 The objective of a systematic review is to present a balanced and impartial summary of all the available research on a well-defined research question. 1 It uses systematic and explicit methods to identify, assess, select and synthesise all the evidence that is relevant to answering that question in an objective and unbiased manner. Systematic reviews have replaced traditional narrative reviews, since the latter do not follow a pre-defined protocol, do not use any kind of rigorous methods and tend to lack transparency, causing bias; a systematic review corrects these issues. 2 A systematic review begins by clearly defining a research question of interest; this may include what treatments are being compared, what outcomes are being measured, what the population of interest is, and so on. The next step is to search for studies that are relevant to the research question; this is done by searching all of the published and unpublished information against a well-defined quality search criterion, which can
  • 7. involve searching databases such as MEDLINE, PubMed etc. The studies which pass through the search criterion go through further quality assessment to remove any irrelevant studies. The next step is to extract all the relevant data from the included studies and then carry out a statistical synthesis of the data, which is done using meta-analysis (see Meta-Analysis). The final step is to present all the findings from the analysis as well as analysing any possible heterogeneity between the studies, commenting on the quality of the studies (e.g. bias) and identifying areas of further research. 1 Examples of systematic reviews can be found easily on the internet, on the websites of the British Medical Journal (BMJ), the Cochrane Collaboration and many more. These websites dedicate themselves to providing information on health care interventions to the health care and medicine industry. A robust methodology for preparing and producing systematic reviews can be found on these websites, for example The Cochrane handbook for systematic reviews of interventions. 3
    1.2 Meta-Analysis
    "The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings"
    Gene V. Glass's definition of meta-analysis
    A meta-analysis is a statistical technique whereby results from the studies included in the analysis are combined to produce a complete summary of the studies. In epidemiology, a typical systematic review of randomised controlled trials will use meta-analysis as its statistical component, whereby treatment effects from individual trials are synthesised with the aim of assessing the clinical effectiveness of health care interventions. 4 Meta-analysis is based on one of two models, the fixed-effect and the random-effects model. In this chapter, I discuss both models and when each type should be used.
It first seems appropriate to address the reasons why we would want to use a meta-analysis and not the traditional narrative approach. In a narrative approach, the focus tends to be on the p-values of individual studies and on observing whether there is a significant effect in each study. Since there is no rigorous way of synthesising p-values, the findings from a narrative approach tend to lack transparency and, in many cases, the researchers may only include studies that support their own opinions, which leads to the results being biased towards those opinions. 1;2 A meta-analysis, on the other hand, works directly with the treatment effects of each study and their respective standard errors, and performs one single synthesis of all the data to produce an aggregate summary estimate, which I denote as θ̂. 4 Since we are combining all the information across the studies, we reduce uncertainty compared to any individual study: we increase the sample size and, in turn, increase the power to detect clinically meaningful
  • 8. results. 2 A meta-analysis also addresses the consistency of treatment effects across the studies, something a narrative approach fails to do. If the treatment effects are consistent, then the focus is on the summary estimate and making sure we estimate this as accurately as possible. If the treatment effects are not consistent, then not only should we estimate the summary effect but also explain the differences that exist between the studies. 2;5 Treatment effects are generally much more important to clinicians and other health care officials than p-values. The effect size tells us not only whether the treatment effect is better/worse (i.e. greater or less than the null value), but also the magnitude of the effect. Also, p-values can easily be misinterpreted, as some researchers may deem a non-significant p-value to suggest the treatment has no effect. 2 I later return to the argument of a narrative approach against a meta-analysis when I consider an example (see page 8). A vital requirement for a strong meta-analysis is a well-conducted systematic review. If the underlying systematic review isn't carried out under good conduct, the meta-analysis will produce results that may lead to misleading conclusions. 1;2 A meta-analysis should also be carried out under good conduct; once again, I recommend the Cochrane handbook for systematic reviews on how to conduct a good meta-analysis. 3
    1.3 Fixed-Effect Meta-Analysis
    The first type of meta-analysis I discuss is the fixed-effect meta-analysis. The fixed-effect model assumes that all the studies included in the analysis are estimating the same underlying treatment effect; in other words, we believe the true treatment effect is common across all the studies and each study is estimating that same true treatment effect. The repercussion of this model is that any differences observed between the individual treatment effects are down solely to random sampling error (within-study error).
If we had an infinite number of studies, each with an infinitely large sample size, we would expect the within-study error in each study to tend to zero and the individual treatment effects to equal the true common treatment effect. 2 In the fixed-effect model, we can express the observed treatment effects in the following way,

    Yk = θ + εk    (1.1)

where Yk is the observed treatment effect in study k, θ is the common treatment effect and εk is the random sampling error in study k. We can assume that the errors follow a normal distribution with mean 0 and variance equal to the variance of the treatment effect in study k, i.e. that εk ∼ N(0, Var(Yk)). Here the
  • 9. errors account for the within-study error in each study since, in the fixed-effect model, we assume this is the only source of variation. 2 For the fixed-effect meta-analysis, the aim is to compute the summary estimate θ̂, which is interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval.
    1.4 Carrying out a Fixed-Effect Meta-Analysis
    A general approach to meta-analysis is given by the inverse-variance method; this method works for any type of data as long as we can obtain a treatment effect and its standard error. 2 For continuous data, we need a mean difference (or any kind of difference); for survival data, we need a log hazard ratio; and for binary outcomes, we need a log odds ratio or log relative risk, along with their respective standard errors (standard errors on the log scale for ratios). In the fixed-effect model, the weight assigned to each study is one over the variance of the study, hence the term inverse-variance method. Studies with smaller variances are assigned larger weights than studies with larger variances. The fixed-effect inverse-variance weighting is therefore given by

    Wk = 1 / Var(Yk)    (1.2)

    where Var(Yk) is the variance of the observed treatment effect in study k. The formula for θ̂ using a fixed-effect model is given by

    θ̂ = (Σk Yk Wk) / (Σk Wk)    (1.3)

    (summing over the n studies, k = 1, …, n), which has variance given by

    Var(θ̂) = 1 / Σk Wk .    (1.4)

    Here Wk is the inverse-variance weighting given by (1.2). I note that θ̂ is the maximum likelihood estimate for θ and it is asymptotically unbiased, efficient and normal. 6 I reiterate that θ̂ should be interpreted as the best estimate of the common treatment effect, since the fixed-effect model assumes that each of the studies in the analysis is estimating the same treatment effect.
  • 10. We also calculate a 95% confidence interval to express our uncertainty around our summary estimate θ̂, assuming that θ̂ is approximately normally distributed, using the following formula

    θ̂ ± 1.96 s.e.(θ̂) .    (1.5)

    If we are working on the log scale, i.e. we are using some type of ratio, we must remember to exponentiate θ̂ in (1.3) and the end points of the confidence interval in (1.5). I could also present a 100(1 − α)% confidence interval but, by convention, I am only going to calculate 95% confidence intervals in this paper.
    Example 1
    Table 1.1, presented below, shows the results from ten randomised controlled trials, each comparing the benefit of an anti-hypertensive treatment, treatment A, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), variance and a 95% confidence interval. 7

    Trial (k)    Yk      Vark    95% C.I.
    1           -0.49    0.12    [-1.17, 0.19]
    2           -0.17    0.05    [-0.61, 0.27]
    3           -0.52    0.06    [-1.00, -0.04]
    4           -0.48    0.14    [-1.21, 0.25]
    5           -0.26    0.06    [-0.74, 0.22]
    6           -0.36    0.08    [-0.91, 0.19]
    7           -0.47    0.05    [-0.91, 0.03]
    8           -0.30    0.02    [-0.58, -0.02]
    9           -0.15    0.07    [-0.67, 0.37]
    10          -0.28    0.25    [-1.26, 0.70]

    Table 1.1: Results of trials comparing treatment A against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

    Using a fixed-effect model, we weight each study using (1.2) and then obtain a summary estimate for the treatment effect along with a 95% confidence interval. Using (1.3), we calculate our summary estimate θ̂ to be -0.33, so we expect treatment A to consistently reduce systolic blood pressure by 0.33 mmHg. Our 95% confidence interval, calculated using (1.5), is [-0.48, -0.18]. Since the null value of 0 is not in the 95% confidence interval for θ̂, there is strong evidence at the 5% level that treatment A is effective in reducing systolic blood pressure. The results are presented in a forest plot given below in figure 1.1.
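    For concreteness, the fixed-effect calculation in equations (1.2) to (1.5) can be reproduced for the Table 1.1 data in a few lines. The sketch below is my own illustration in Python; the thesis' analyses themselves were carried out in STATA (see Appendix A):

```python
import math

# Observed mean differences and variances from Table 1.1 (treatment A v placebo)
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]

w = [1 / v for v in var]                                   # inverse-variance weights, eq. (1.2)
theta = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)      # summary estimate, eq. (1.3)
se = math.sqrt(1 / sum(w))                                 # standard error, from eq. (1.4)
ci = (theta - 1.96 * se, theta + 1.96 * se)                # 95% confidence interval, eq. (1.5)

print(round(theta, 2), [round(x, 2) for x in ci])          # -0.33 [-0.48, -0.18]
```

    This reproduces the values quoted in the text: a summary estimate of -0.33 with 95% confidence interval [-0.48, -0.18].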
  • 11. Figure 1.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment A on reducing systolic blood pressure (SMD = standardised mean difference) 7
    On the forest plot, the squares represent the weight assigned to the corresponding study, with the centre of the square depicting the observed treatment effect for that study. The 95% confidence interval for each study is represented by the line going through the square, beginning and ending at the end points of the interval. The diamond at the bottom of the forest plot represents the 95% confidence interval of the summary estimate, with its centre representing the summary estimate. I now return to the argument for using meta-analysis over a narrative approach. If we observe the forest plot in figure 1.1, eight trials have a confidence interval that contains the null value 0 and therefore have insignificant p-values. If we took a narrative approach and considered each study separately, we would most likely conclude that, since 80% of the studies produced insignificant p-values, the treatment isn't beneficial. When we perform a meta-analysis, the 95% confidence interval for the summary estimate doesn't contain the null value and we therefore obtain a significant p-value, since we have increased the power to detect significant results. 2
  • 12. 1.5 Heterogeneity
    In the fixed-effect model, we assumed that all the studies in the analysis are estimating the same treatment effect and the only error we allow for is random sampling error (within-study error), but is this always a plausible assumption? In general, studies looking at the same treatment may differ in many ways, such as patient characteristics (age, patient health etc.), location of study, intervention applied (dosage etc.) and many more known and unknown factors, causing the treatment effects across the studies to no longer remain consistent. 2 If the treatment effects are no longer consistent, then there exist real differences between the studies (between-study heterogeneity) and the aim of a meta-analysis should be to assess the heterogeneity between the treatment effects as well as to calculate a summary estimate. 2;5;8 If we used a fixed-effect method in the presence of between-study heterogeneity, we would be wrongly implying that a common effect exists and hence be led to misleading conclusions about the treatment. I now discuss ways in which we can assess heterogeneity. Since the observed variation is made up of real differences (between-study heterogeneity) and random sampling error (within-study error), we need some tools to help us see whether between-study heterogeneity is present. I first introduce the Q-statistic, which is based on the result of the Q-test. This test is useful if we believe the presence of between-study heterogeneity is causing more variation in the treatment effects than is expected from random sampling error alone. 2;9 The Q-test is then defined as follows:
    H0 : Y1 = Y2 = · · · = Yk (for all k studies)
    H1 : At least one Yk differs,
    where Yk is the observed treatment effect in study k and Wk is the fixed-effect weighting of study k. The Q-statistic, which is given by the following formula

    Q = Σk Wk Yk² − (Σk Wk Yk)² / Σk Wk ,    (1.6)

    is compared to χ²_{n−1}(α), the critical value of a χ² distribution with n − 1 degrees of freedom.
If we find Q > χ²_{n−1}(α), then we reject the null hypothesis at significance level α, which suggests that there is evidence of between-study heterogeneity. If Q < χ²_{n−1}(α), then we do not reject the null hypothesis at significance level α, which suggests there is no evidence of between-study heterogeneity. 2;9 Another useful statistic is the I²-statistic; this measures approximately the percentage of the total variation that is down to between-study heterogeneity. 9 It is given by the following formula:
  • 13. I² = 100% × (Q − (k − 1)) / Q    (1.7)

    where Q is the Q-statistic worked out using (1.6). If our I² is 0%, this suggests that all the variability in our summary estimate is down to random sampling error (within-study error) and not down to between-study variation, and therefore it could make sense to use a fixed-effect model. I² values are considered by Higgins et al. to be low, moderate and high at the values of 25%, 50% and 75% respectively. 2;9 If we obtain a negative value for I², the value is set to 0 and interpreted in the same way as 0. I must stress that both the Q-test and the I²-statistic should be used as tools to help us decide which model to use; the decision shouldn't be based solely on the performance of the Q-test and the I²-statistic, since they aren't precise. If we consider the Q-test, while a significant p-value suggests that there exists variation in the individual treatment effects, a non-significant p-value doesn't necessarily mean a common effect exists. The lack of significance can be a result of a lack of power. If there are few trials, or we have lots of within-study error as a result of trials having small sample sizes, then even the presence of a large amount of between-study heterogeneity may result in a non-significant p-value. 2 If there are few studies, a significance level of 10% is often used because of the lack of power, so a p-value strictly less than 0.1 would be enough to reject the null hypothesis of no between-study heterogeneity. The I²-statistic itself is dependent on the Q-statistic, therefore if the Q-test lacks power, then I² will be imprecise. Also, I² may tell us what proportion of the variation is down to real differences, but what it doesn't tell us is how spread out that variation is. A high value of I² implies a high proportion of the variation is down to real differences, but this variation may only be spread out narrowly if the studies have high precision.
Conversely, a low I² only implies that a low proportion of the variation is down to real differences; it doesn't imply the effects are grouped together in a narrow range, as they could easily vary over a wide range if the studies lack precision. 2 Higgins, in his paper, 10 talks about the misunderstanding of the I²-statistic and believes it should only be used as a descriptive statistic.
Example 1 (Continued)
I now apply both the Q-test and the I²-statistic to example 1 to see whether conducting a fixed-effect analysis in that example was appropriate. Conducting a Q-test leads to a Q-statistic of 2.490 using (1.6); this is compared to χ²_{9}(0.10) = 14.684 (we use a 10% level of significance, since we only have a few studies). Since our test statistic of
  • 14. 2.490 < 14.684, there is no statistical evidence against H0 at the 10% level of significance. This suggests that there is no sign of between-study heterogeneity. I also work out the I²-statistic; here our I² value is −261.385% using (1.7), which is set to 0, suggesting that the total variation across the studies is down only to within-study error. If we observe the forest plot in figure 1.1, it's fairly clear to see that the observed treatment effects do not deviate too far from the summary estimate, so using a fixed-effect model seems appropriate, and I can regard our summary estimate as the common effect. If we conclude that between-study heterogeneity is present, we cannot use the fixed-effect model; we instead use the random-effects model, which is discussed in the next section. I briefly discuss two alternatives that try to eradicate all presence of between-study heterogeneity, which can be ideal from a researcher's perspective. The first is sub-group analysis; in this case, a series of fixed-effect meta-analyses are performed, one on each sub-group, where the studies in each group are deemed similar enough to assume a common effect. The problems with sub-group analysis are that each sub-group contains fewer studies, so we have a loss of power, and instead of carrying out one synthesis we are doing several, yet we still aren't guaranteed that a sufficient amount of between-study heterogeneity will be removed. 2 The second option is meta-regression, where the covariates in the model explain the variation in the data and we can obtain the treatment effect for each covariate while adjusting for the others. A problem with this method is that unidentified sources of heterogeneity aren't accounted for. 11 A problem inherent in both alternatives is that with few studies, neither is useful since there is a loss of power; in the case of meta-regression, for example, we have low power to detect which covariates explain heterogeneity. 2;11
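    The Q and I² values quoted for example 1 can be checked numerically. The sketch below is my own Python illustration of equations (1.6) and (1.7) (the thesis' own computations use STATA, Appendix A):

```python
# Data from Table 1.1 (treatment A v placebo)
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]
w = [1 / v for v in var]                           # fixed-effect weights, eq. (1.2)

sw   = sum(w)
swy  = sum(wi * yi for wi, yi in zip(w, y))
swy2 = sum(wi * yi ** 2 for wi, yi in zip(w, y))

q = swy2 - swy ** 2 / sw                           # Q-statistic, eq. (1.6)
i2 = max(0.0, 100 * (q - (len(y) - 1)) / q)        # I², eq. (1.7); negative values set to 0

print(round(q, 2), i2)                             # 2.49 0.0
```

    The raw I² here is negative (Q is well below its degrees of freedom), so it is set to 0, matching the conclusion in the text that there is no sign of between-study heterogeneity.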
1.6 Random-Effects Meta-Analysis
The second type of meta-analysis I discuss is the random-effects meta-analysis. This model assumes that the individual treatment effects vary across the studies because of the presence of real differences (between-study heterogeneity) as well as random sampling error. A random-effects model assumes that the true effects from the individual studies come from a distribution of true effects with mean θ and variance equal to the magnitude of the between-study heterogeneity, which I denote as τ² and term the between-study variance (we can usually assume a normal distribution). The repercussion of this model is that if we had an infinite number of studies, each with an infinitely large sample size, we would expect the random sampling error to tend to zero but expect the individual treatment effects to still differ because of the real differences that exist between them. 2;5 In the random-effects model, we can express the observed treatment effects in the following way,
  • 15. Yk = θ + ζk + εk    (1.8)

    where θ is the mean of the distribution of true effects, εk is the sampling error in study k and ζk is the between-study error in study k. We again assume that εk ∼ N(0, Var(Yk)) and assume that ζk ∼ N(0, τ²). Here the errors account for the within-study error and the between-study error, since in the random-effects model we allow for two sources of variation. 2 For the fixed-effect meta-analysis, the aim was to compute the summary estimate θ̂, interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval. For the random-effects meta-analysis, computing the summary estimate and its 95% confidence interval alone is insufficient. Since we assume there exist real differences between the treatment effects, the aim of a random-effects meta-analysis is not only to compute the summary estimate but also to explain the differences that exist between the trials and to learn how the individual treatment effects are distributed about the summary estimate. 2;5 I note that the summary estimate θ̂ is now interpreted as the average effect.
    1.7 Carrying out a Random-Effects Meta-Analysis
    To carry out a random-effects meta-analysis, we first need to estimate the between-study variance, since it describes the magnitude of the between-study heterogeneity and has to be incorporated into the calculation of the summary estimate θ̂. To estimate τ², we use the DerSimonian and Laird method, which provides a simple moment-based point estimate of τ². 12 This is given by the following formula,

    τ̂² = (Q − (k − 1)) / (Σk Wk − Σk Wk² / Σk Wk)    (1.9)

    where Q is the Q-statistic calculated using (1.6) and Wk are the weights for each study from the fixed-effect meta-analysis calculated using (1.2). I note that should Q < (k − 1), then we set τ̂² = 0.
If our point estimate of the between-study variance is zero (implying no between-study heterogeneity), then the random-effects model reduces to the fixed-effect model. Similar to the fixed-effect model, we use the inverse-variance method to weight the individual studies. In the fixed-effect model, since we assume each study is estimating the same common effect, the study with the highest precision is given the largest weighting, since it contains the most information about the true summary effect
  • 16. θ. In a random-effects model, the weighting has to be given more care, since each study is no longer estimating the same treatment effect. 2 The weighting must now take into account the estimate of the between-study variance τ̂², so that the study with the largest precision doesn't have as much influence as it would if a fixed-effect model were assumed. So, in a random-effects model, the weight given to each study is

    Wk* = 1 / (Var(Yk) + τ̂²) .    (1.10)

    The formula for θ̂ using a random-effects model is given by

    θ̂ = (Σk Yk Wk*) / (Σk Wk*)    (1.11)

    and has variance

    Var(θ̂) = 1 / Σk Wk* .    (1.12)

    I reiterate that θ̂ should be interpreted as the average or mean treatment effect and not the common effect, since by using a random-effects model I am assuming that the true effects from each of the studies are distributed about the mean of a distribution of true effects, and θ̂ is the estimate of this mean. I also note that the true treatment effect in an individual study could be lower or higher than this average effect. A 95% confidence interval for θ̂ is given by

    θ̂ ± 1.96 s.e.(θ̂) .    (1.13)

    Example 2
    Table 1.2, presented below, shows the results from ten randomised trials, each comparing the benefit of another anti-hypertensive treatment, treatment B, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), variance and a 95% confidence interval. 7
  • 17.
    Trial (k)    θk      Vark    95% C.I.
    1            0.00    0.423   [-0.829, 0.829]
    2            0.10    0.219   [-0.329, 0.529]
    3           -0.40    0.026   [-0.451, -0.349]
    4           -0.80    0.199   [-1.190, -0.410]
    5           -0.63    0.301   [-1.220, -0.040]
    6           -0.22    0.301   [-0.370, 0.810]
    7           -0.34    0.071   [-0.480, -0.201]
    8           -0.51    0.102   [-0.710, -0.310]
    9           -0.03    0.122   [-0.209, 0.269]
    10          -0.81    0.301   [-1.340, -0.220]

    Table 1.2: Results of trials comparing treatment B against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

    I first test for heterogeneity to help us decide what type of meta-analysis we should use. We obtain a Q-statistic of 30.876 > χ²_{9}(0.10) = 14.684 using (1.6), which suggests evidence of heterogeneity at the 10% level of significance. I also obtained an I² value of 70.85% using (1.7), which suggests that 70.85% of the variation in treatment effects is due to between-study heterogeneity and the rest is due to chance. This is considered a high level of between-study heterogeneity, and therefore a random-effects meta-analysis seems appropriate. Using formulas (1.9) through to (1.13), I obtained a τ̂² of 0.029 and a summary estimate of -0.33 along with a 95% confidence interval of [-0.48, -0.18]. So, on average, treatment B reduces systolic blood pressure by 0.33 mmHg, but in an individual study the treatment effect can vary from this average; since the null value of 0 is not in the 95% confidence interval, there is strong evidence at the 5% level that treatment B, on average, is beneficial. A forest plot of the results from the meta-analysis is shown in figure 1.2. We can see that, unlike in figure 1.1, there are clear deviations of the individual treatment effects from the summary estimate, so it seems appropriate to assume that each trial is estimating a different treatment effect and to use a random-effects model to account for this.
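    The random-effects recipe of equations (1.9) to (1.13) can be collected into a single routine. The sketch below is my own Python illustration (the thesis' analyses use STATA, Appendix A); I demonstrate it on the Example 1 data, where Q < k − 1 forces τ̂² = 0 and, as noted in section 1.7, the analysis reduces to the fixed-effect result:

```python
import math

def dersimonian_laird(y, var):
    """Random-effects meta-analysis using the DerSimonian and Laird estimate of tau^2."""
    k = len(y)
    w = [1 / v for v in var]                                   # fixed-effect weights, eq. (1.2)
    sw, sw2 = sum(w), sum(wi ** 2 for wi in w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    q = sum(wi * yi ** 2 for wi, yi in zip(w, y)) - swy ** 2 / sw   # Q-statistic, eq. (1.6)
    tau2 = max(0.0, (q - (k - 1)) / (sw - sw2 / sw))           # eq. (1.9), truncated at 0
    ws = [1 / (v + tau2) for v in var]                         # random-effects weights, eq. (1.10)
    theta = sum(wi * yi for wi, yi in zip(ws, y)) / sum(ws)    # summary estimate, eq. (1.11)
    se = math.sqrt(1 / sum(ws))                                # from eq. (1.12)
    return tau2, theta, (theta - 1.96 * se, theta + 1.96 * se) # CI from eq. (1.13)

# With the Table 1.1 data, Q < k - 1, so tau^2 is truncated to zero and the
# random-effects analysis reduces to the fixed-effect one.
y   = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
var = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]
tau2, theta, ci = dersimonian_laird(y, var)
print(tau2, round(theta, 2))                                   # 0.0 -0.33
```

    Running the same routine on heterogeneous data such as Table 1.2 would give a positive τ̂², flattening the weights so that the most precise trial no longer dominates the average.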
1.8 Fixed-Effect v Random-Effects

It is imperative that the right model is chosen when conducting a meta-analysis, since it influences how we interpret the results. If we look at examples 1 (figure 1.1 on page 8) and 2 (figure 1.2 on page 15), both produce the same summary estimate of -0.33 and the same 95% confidence interval of [-0.48, -0.18]. Despite these similarities, the way in which they are interpreted is very different. In example 1, I used
  • 18. Figure 1.2: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure (SMD = standardised mean difference) 7

a fixed-effect model, which I justified because I believed there were no real differences between the studies, so the summary estimate is the common effect across the studies. In example 2, I decided to use a random-effects model since I believed the variation between the individual treatment effects was down to real differences as well as random sampling error; I therefore regard the summary estimate as the average across the studies, though in an individual study the treatment effect can vary from this average effect. Despite these differences, there still seems to be some misunderstanding when it comes to choosing which model to use and how to interpret the results. Riley et al. 7 reviewed 44 Cochrane reviews that wrongly interpreted θ̂ as the common effect rather than the average effect when using a random-effects approach. They also reviewed 31 Cochrane reviews that used a fixed-effect meta-analysis and found that 26 of these had I² values of 25% or more without justifying why a fixed-effect model was used. Using a fixed-effect model in these situations must be justified, otherwise we end up drawing inaccurate conclusions from the results, since we are suggesting there is
  • 19. a single common effect when actually no common treatment effect exists because of real differences amongst the studies. One reason for misinterpretation is that the forest plots for examples 1 and 2 present the results in the same way, which causes confusion. Skipka et al. 13 point this out and also note that the point estimate of τ² is never displayed on the forest plot. I have already commented that the choice of model shouldn't be based solely on the Q-test and the I²-statistic, but how should we go about choosing which model to use? Let's say we wish to carry out a meta-analysis on a sufficient number of studies looking at some treatment against placebo. If we know there are a sufficient number of properties that these studies have in common, for example similar age range, similar dosage, similar follow-up time and so on, then it would seem appropriate to use a fixed-effect model, since we believe there are negligible real differences between the studies and any factors that do affect the treatment effects are the same across the studies. A common procedure is to carry out a fixed-effect meta-analysis and observe the forest plot to see if the observed treatment effects are similar. 2 There are two problems with this: firstly, it isn't clear whether the observed differences are down only to random sampling error; and secondly, if this was the incorrect model, then carrying it out was a waste of time. If we believe there are real differences, then a random-effects model should be implemented. In this model, each study is expected to be estimating a different treatment effect, and the job of this type of meta-analysis is to make sense of the differences between the studies and how the true individual treatment effects are distributed about the summary estimate.
2;5 A clear advantage of a random-effects meta-analysis is that we can generalise our results to a range of populations not included in the analysis, given that the analysis includes a sufficient number of studies; this may be one of the goals of the underlying systematic review. 2;5 If we wanted to estimate what the treatment effect will be in a new study, we can draw it from our results as long as we can describe, with adequate precision, how the individual treatment effects are distributed about the summary estimate. 5 In a fixed-effect model, we cannot generalise, since our results are exclusive to certain properties, for example a particular population. 2
  • 20. Chapter 2 Prediction Interval

In the presence of between-study heterogeneity, the aim of a meta-analysis isn't just to calculate the summary estimate but also to make sense of the heterogeneity. I have already pointed out that eradicating all presence of heterogeneity can be difficult because of unknown sources of heterogeneity, so it seems better to assess heterogeneity rather than try to remove it. Higgins 10 believes any amount of heterogeneity is acceptable provided there is a "sound predefined eligibility criteria" and that the "data is correct", but stresses that a meta-analysis must provide a stern assessment of heterogeneity. Since a random-effects meta-analysis accounts for unidentified sources of heterogeneity 7 , I believe it should be the gold standard for explaining heterogeneous data. Unfortunately, once researchers have carried out a random-effects meta-analysis, they tend to focus on the summary estimate and its 95% confidence interval. This isn't sufficient since, by the assumption of a random-effects model, we allow for real differences between the individual studies. 2;7 If we were using a fixed-effect model, then focusing on the summary estimate, which gives the best estimate of the common effect, and its 95% confidence interval, which describes the impact of within-study heterogeneity on the summary estimate, would be adequate. The random-effects summary estimate tells us the average effect across the studies, and its 95% confidence interval indicates the region in which we are 95% sure that our estimate lies; neither tells us how the individual treatment effects are distributed about the random-effects summary estimate. 5 This leads us to the prediction interval, which is discussed in this chapter.
  • 21. 2.1 95% Prediction Interval

A 95% prediction interval gives the range in which we are 95% sure that the potential treatment effect of a brand new individual study lies. The beauty of a prediction interval is that not only does it quantitatively give a range for the treatment effect in a new study, allowing researchers and clinicians to apply the results to future applications, but it also offers a suitable way to express the full uncertainty around the summary estimate in a way that acknowledges heterogeneity. A prediction interval can also describe how the true individual treatment effects are distributed about the summary estimate. 2;5;7;13 For these reasons, the inclusion of a prediction interval in a random-effects meta-analysis can make its conclusions more robust, provide a more complete summary of the results and therefore make the results more relevant to clinical practice. 14 The notion of a prediction interval was first proposed by Ades et al. 8 , who propose a predictive distribution of a future treatment effect in a brand new study using a Bayesian approach to meta-analysis. A further push for the prediction interval in meta-analysis is seen in a paper by Higgins et al. 5 . The authors acknowledge the small amount of attention that has been given to prediction in meta-analysis and present the prediction interval in a classical (frequentist) framework. Higgins et al. 10;5 believe that a prediction interval is the most convenient way to present the findings of a random-effects meta-analysis in a way that acknowledges heterogeneity, since it takes into account the full distribution of effects in the analysis.

2.2 Calculating a Prediction Interval

When calculating a prediction interval, we account not only for the between-study and within-study heterogeneity, but also for the uncertainty of the summary estimate θ̂ and the uncertainty of the between-study variance τ̂².
2 Let's say we knew the true values of the summary effect θ and the between-study variance τ². If we made the assumption that the treatment effects across the studies are normally distributed, the 95% prediction interval would be given by

θ ± 1.96 √τ² .   (2.1)

The problem with (2.1) is that we do not know the exact values of θ and τ²; rather, we are estimating them, and because of this there is uncertainty surrounding these estimates. 2 To account for this, we use the following formula provided by Higgins et al. 5 for a 95% prediction interval:
  • 22.

θ̂ ± t_{n−2}^{0.05} √( τ̂² + Var(θ̂) ) .   (2.2)

Here, θ̂ is the summary estimate from the random-effects meta-analysis; Var(θ̂) is the variance of the summary estimate, accounting for the uncertainty of the estimate of θ; τ̂² is the estimate of the between-study variance; and t_{n−2}^{0.05} is the critical value of the t-distribution for a two-sided 5% significance level with n − 2 degrees of freedom (where n is the number of studies), which accounts for the uncertainty of the estimate of τ². 2;5 We require at least three studies to calculate a prediction interval 7 and we must also remember to exponentiate the end points of (2.2) if we are working on the log scale.

Example 2 with a Prediction Interval

In example 2, I used a random-effects model and found the summary estimate to be -0.33 mmHg, the between-study variance τ̂² to be 0.029 and the 95% confidence interval for the summary estimate to be [-0.48, -0.18] (see figure 1.2). Using (2.2), I obtained a 95% prediction interval of [-0.76, 0.09]. We notice that the null value of 0 is now inside the prediction interval, so the result isn't statistically significant at the 5% level. So, in a brand new individual study setting, we are 95% sure that the potential treatment effect for this study will lie between -0.76 mmHg and 0.09 mmHg. Although on average the treatment is beneficial (as indicated by the 95% confidence interval), in a single study setting we cannot rule out that the treatment may actually be harmful (since the 95% prediction interval contains values > 0). The prediction interval therefore acknowledges the impact of heterogeneity that was masked by focusing on the random-effects summary estimate and its 95% confidence interval by themselves. A forest plot for example 2 is given in figure 2.1, but now includes a 95% prediction interval, shown at the diamond at the bottom of the forest plot in figure 2.1.
The centre of the diamond represents the random-effects summary estimate, the width of the diamond represents the 95% confidence interval for the summary estimate, and the width of the lines extending through the diamond represents the 95% prediction interval. Skipka et al. 13 discuss different methods that have been proposed for how a prediction interval should be presented in a forest plot. They also suggest that the inclusion of a prediction interval in a forest plot is a good way of distinguishing between a random-effects and a fixed-effect forest plot. Throughout this paper, I will present 95% prediction intervals in forest plots as in figure 2.1.
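As a check on the arithmetic, formula (2.2) can be applied directly to the Example 2 summary statistics in a few lines of Python (the dissertation's own calculations used Stata; the two-sided 5% t critical value for 8 degrees of freedom is hard-coded here to keep the sketch dependency-free). Because the inputs are rounded, the upper end differs in the last digit from the quoted [-0.76, 0.09].

```python
import math

def prediction_interval(theta_hat, var_theta_hat, tau2, t_crit):
    """95% prediction interval per (2.2):
    theta_hat +/- t_{n-2} * sqrt(tau2 + Var(theta_hat))."""
    half = t_crit * math.sqrt(tau2 + var_theta_hat)
    return theta_hat - half, theta_hat + half

# Example 2: theta_hat = -0.33, 95% CI [-0.48, -0.18] -> s.e. = 0.15 / 1.96
se_theta = 0.15 / 1.96
t_8 = 2.306   # two-sided 5% critical value, t with n - 2 = 8 df
lo, hi = prediction_interval(-0.33, se_theta**2, 0.029, t_8)
```

Running this gives approximately [-0.76, 0.10]; the interval is wider than the confidence interval and crosses zero, matching the interpretation in the text.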
  • 23. Figure 2.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure with a 95% prediction interval (SMD = standardised mean difference) 7

2.3 Discussion

It is important that I address a few issues that arise when working with a prediction interval. A problem that exists for both the prediction interval and the random-effects meta-analysis itself arises when the analysis has few studies. If we have few studies, regardless of how large they are, the prediction interval will be wide because of the lack of precision in the DerSimonian and Laird estimate of τ² (using (1.9)). 2;5 If our meta-analysis contains few studies and has substantial between-study heterogeneity, a random-effects meta-analysis remains the correct option, but an alternative approach could be to estimate τ² in a Bayesian framework instead of using the DerSimonian and Laird method, which is sensitive to the number of studies in the analysis. A Bayesian approach uses prior information from outside the studies to calculate an estimate of τ². This approach has the advantage of naturally allowing for the full uncertainty around all the parameters in the model and of incorporating information that may not be considered in a frequentist model. The approach, however, can be difficult to
  • 24. implement and could be prone to bias. I refer to papers by Higgins et al. 5 and Ades et al. 8 , which provide a more thorough description of the Bayesian approach to prediction intervals. Another problem that occurs with a small number of studies concerns the validity of the assumption, made when calculating a prediction interval, that the population in a new study is "sufficiently similar" to those in the studies already included in the analysis. In a random-effects meta-analysis, since we allow for real differences, each study will be different in many ways; the more studies we have, the broader the range of populations we cover, thus supporting this assumption. 5 We also assume that each study has a low risk of bias, i.e. that each study included in the analysis has been carried out under good conduct. If this isn't the case, the prediction interval will inherit heterogeneity caused by these biases. 7 Finally, it seems worthwhile to make absolutely clear the difference between a random-effects 95% confidence interval and a 95% prediction interval. A 95% confidence interval in a random-effects meta-analysis contains the region in which we are 95% sure that the summary estimate (regarded as the average effect) lies. The width of the confidence interval accounts for the error in the summary estimate, and with an infinite number of infinitely large studies, the end points of the confidence interval would tend to the summary estimate. 2 A common mistake is to treat the 95% confidence interval from a random-effects meta-analysis as a measure of the extent of heterogeneity, but this is wrong since it only considers the error in the summary estimate. 5 A 95% prediction interval contains the region in which we are 95% sure that the potential treatment effect in a brand new individual study lies.
Another way to describe a 95% prediction interval is that we can draw the potential treatment effect in a new study, denoted y_new, from the prediction interval with 95% certainty, since the prediction interval describes how the true individual treatment effects are distributed about the summary estimate. 5 If we had an infinite number of infinitely large studies, we would expect the width of the prediction interval to reflect the actual variation between the true treatment effects. 2 Since the 95% prediction interval accounts for all the uncertainty, it will never be narrower than its corresponding 95% random-effects confidence interval, so we can regard the 95% random-effects confidence interval as a subset of the 95% prediction interval.
  • 25. Chapter 3 Empirical review of the impact of using prediction intervals on existing meta-analyses

3.1 Introduction

A random-effects meta-analysis should remain the gold standard for analysing heterogeneous studies, but solely presenting the summary estimate from the random-effects meta-analysis and its 95% confidence interval masks the potential effects of heterogeneity. 7 The addition of a prediction interval gives a more complete summary of the results from a random-effects meta-analysis in a way that acknowledges heterogeneity, making it easier to apply to clinical practice. 5 A 95% prediction interval, with enough studies, can describe the distribution of true treatment effects and therefore gives a range within which we can be 95% sure that the potential treatment effect in a brand new study, y_new, lies. 2;5 The aim of this review is to assess the impact of a 95% prediction interval on the outcomes of existing meta-analyses of randomised controlled trials. I want to see if the inclusion of a 95% prediction interval can help interpret the results of a random-effects meta-analysis to a higher degree of accuracy, and therefore recommend whether or not a random-effects meta-analysis should always include a 95% prediction interval in its analysis.
  • 26. 3.2 Methods

3.2.1 Search Strategy and Selection Criteria

To find the studies for the review, I electronically searched for studies on the Lancet website (www.lancet.com). I used the Lancet since it is one of the oldest and most respected medical journals and has vast amounts of medical literature. I used the advanced search toolbar on the Lancet website with the key words “RANDOMISED TRIAL” and “META ANALYSIS” in the abstract of all research, reviews and seminars in all years in all Lancet journals. The search was carried out on 20/12/2011 and produced 61 studies. For each study, I initially obtained a PDF file of the study plus any supplementary material using ScienceDirect via access through the University of Birmingham student portal. The eligibility criterion for a study to enter the review was that it must include at least one meta-analysis of three or more randomised controlled trials on its primary outcomes as defined by the authors of the study. Of the 61 studies, I reviewed their abstracts to remove any irrelevant studies. I excluded studies that only contained a meta-analysis of non-randomised controlled trials (e.g. observational studies), since I am only interested in meta-analyses of randomised controlled trials, in which patients are randomly assigned to the treatment or control group. Randomised controlled trials cancel the effects of known and unknown confounding factors as well as selection bias. 2 I also excluded studies that had a meta-analysis of fewer than three randomised controlled trials, three being the minimum number of trials required to calculate a prediction interval.
7 In the case where the meta-analysis contained a mixture of randomised and non-randomised controlled trials, I took the meta-analysis of the randomised controlled trials only if the authors had explicitly presented it alongside the overall meta-analysis; if they only presented a meta-analysis covering all randomised and non-randomised trials, the study was excluded. I also excluded any studies that didn't display data by trial. Other reasons for study exclusion were that some of the studies were only randomised controlled trials and not meta-analyses, some studies were informative studies or research papers on meta-analysis, and a couple of studies were network meta-analyses, which were removed since they are potentially more subject to error than typical meta-analyses. I also came across studies that were duplicates, for which I only considered the most recent study. The flow chart given below in figure 3.1 describes the process. The boxes contain the reasons for excluding the studies and the numbers represent the studies that were removed for those reasons.
  • 27. Figure 3.1: Flow chart describing the process of excluding studies for the review

3.2.2 Data Calculations

I had a total of 26 studies that passed my eligibility criteria to enter the review. From these studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials. For each meta-analysis, I reproduced the analysis using a random-effects model (using formulas (1.9) to (1.13)) with a 95% prediction interval (using formula (2.2)), as well as calculating the I²-statistic (using formula (1.7)). For 20 of the studies, from which 26 meta-analyses were extracted, I could directly calculate the individual trial treatment effects and their variances (the variance of the log effect if the effect size of interest is a ratio). For these, the individual treatment effects are calculated using
  • 28. the following formulas, depending on the relevant outcome of interest. We define the following:

a = number of events in the treatment group
b = number of events in the control group
N_T = total number of patients in the treatment group
N_C = total number of patients in the control group
c = N_T − a
d = N_C − b

Odds Ratio

The odds ratio for trial k is given by 2

Y_k^OR = (a · d) / (b · c)   (3.1)

and its logarithm has variance

Var( ln(Y_k^OR) ) = 1/a + 1/b + 1/c + 1/d .   (3.2)

A 95% confidence interval for the odds ratio in the k-th trial is given by

exp( ln(Y_k^OR) ± 1.96 √Var( ln(Y_k^OR) ) ) .   (3.3)

Relative Risk

The relative risk for trial k is given by 2

Y_k^RR = (a · N_C) / (b · N_T)   (3.4)

and its logarithm has variance

Var( ln(Y_k^RR) ) = 1/a + 1/b − 1/N_T − 1/N_C .   (3.5)

A 95% confidence interval for the relative risk in the k-th trial is given by
  • 29.

exp( ln(Y_k^RR) ± 1.96 √Var( ln(Y_k^RR) ) ) .   (3.6)

Risk Difference

The risk difference for trial k is given by 2

Y_k^RD = a/N_T − b/N_C   (3.7)

and has variance

Var(Y_k^RD) = (a/N_T)(1 − a/N_T) / N_T + (b/N_C)(1 − b/N_C) / N_C .   (3.8)

A 95% confidence interval for the risk difference in the k-th trial is given by

Y_k^RD ± 1.96 √Var(Y_k^RD) .   (3.9)

Hazard Ratio

To calculate the hazard ratio for the k-th trial, we require the difference between the observed and expected deaths (O − E) and the variance Var(O − E). 15

Y_k^HR = exp( (O − E) / Var(O − E) )   (3.10)

and its logarithm has variance

Var( ln(Y_k^HR) ) = 1 / Var(O − E) .   (3.11)

A 95% confidence interval for the hazard ratio in the k-th trial is given by

exp( ln(Y_k^HR) ± 1.96 √Var( ln(Y_k^HR) ) ) .   (3.12)
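The two-by-two-table formulas above translate directly into code. The sketch below (the helper names are mine, not from the dissertation) implements the odds-ratio and relative-risk calculations with their log-scale 95% confidence intervals:

```python
import math

def odds_ratio_ci(a, b, n_t, n_c):
    """Odds ratio (3.1) with 95% CI (3.3); a, b = events in treatment/control."""
    c, d = n_t - a, n_c - b
    or_k = (a * d) / (b * c)                 # (3.1)
    var_ln = 1/a + 1/b + 1/c + 1/d           # Var(ln OR), (3.2)
    half = 1.96 * math.sqrt(var_ln)
    return or_k, (math.exp(math.log(or_k) - half),
                  math.exp(math.log(or_k) + half))

def relative_risk_ci(a, b, n_t, n_c):
    """Relative risk (3.4) with 95% CI (3.6)."""
    rr_k = (a * n_c) / (b * n_t)             # (3.4)
    var_ln = 1/a + 1/b - 1/n_t - 1/n_c       # Var(ln RR), (3.5)
    half = 1.96 * math.sqrt(var_ln)
    return rr_k, (math.exp(math.log(rr_k) - half),
                  math.exp(math.log(rr_k) + half))

# Illustrative trial: 10/100 events on treatment vs 20/100 on control
or_k, (or_lo, or_hi) = odds_ratio_ci(10, 20, 100, 100)
rr_k, (rr_lo, rr_hi) = relative_risk_ci(10, 20, 100, 100)
```

For these illustrative counts, the odds ratio is 4/9 and the relative risk is 0.5; both intervals are built on the log scale and exponentiated, as in (3.3) and (3.6).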
  • 30. Extra Formulas

For 6 of the studies, from which 10 meta-analyses were extracted, only the individual trial treatment effects along with their 95% confidence intervals were reported. For these studies, I couldn't directly calculate the individual trial standard errors, so the standard errors are estimated using the following formulas. We let x⁻ and x⁺ be the lower and upper bounds respectively of the 95% confidence interval for θ_k. For effect sizes that require us to work on the log scale, i.e. odds ratios, relative risks and hazard ratios, the standard error in the k-th trial is calculated using

s.e.( Y_k^{HR,RR,OR} ) = (1/2) ( ln(x⁺) − ln(x⁻) ) / 1.96 .   (3.13)

For differences (continuous outcomes), the standard error in the k-th trial is calculated using

s.e.( Y_k^RD ) = (1/2) ( x⁺ − x⁻ ) / 1.96 .   (3.14)

3.2.3 Software

I used the statistical software STATA v10.1 to perform a random-effects meta-analysis with a 95% prediction interval on each meta-analysis included in the review. The software incorporates formulas (1.7), (1.9) to (1.13), (2.2) and any of the relevant formulas from (3.1) to (3.12). All forest plots produced in this paper are created using STATA (see Appendix for STATA code).

3.3 Results

From 26 studies, I took 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight trials, IQ range seven trials) and reproduced each meta-analysis using a random-effects model with a 95% prediction interval. The results of all 36 random-effects meta-analyses with a 95% prediction interval are presented in the table in figure 3.2.
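The standard-error reconstructions (3.13) and (3.14) can be verified with a quick round trip: build a confidence interval from a known standard error, then recover it. This is a small illustrative check, not part of the original analysis:

```python
import math

def se_from_log_ci(lo, hi):
    """Back-calculate a log-scale standard error from a 95% CI, as in (3.13)."""
    return 0.5 * (math.log(hi) - math.log(lo)) / 1.96

def se_from_ci(lo, hi):
    """Back-calculate a standard error for a difference outcome, as in (3.14)."""
    return 0.5 * (hi - lo) / 1.96

# Round trip on the log scale: a ratio of 2.0 with s.e. 0.25 on the log scale
se_true = 0.25
lo = math.exp(math.log(2.0) - 1.96 * se_true)
hi = math.exp(math.log(2.0) + 1.96 * se_true)
se_back = se_from_log_ci(lo, hi)   # recovers 0.25
```

The same round trip applies to (3.14) on the natural scale, since a symmetric 95% interval always has half-width 1.96 times the standard error.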
  • 31. Figure 3.2: Main characteristics of studies included in the review (Note: outcome of interest defined as given by authors; HR = hazard ratio, OR = odds ratio, RD = risk difference, RR = relative risk; θ̂ is the random-effects summary estimate; 95% C.I. = 95% confidence interval; I² is the percentage of heterogeneity down to real differences; τ̂² is the estimate of between-study variance; 95% P.I. = 95% prediction interval)
  • 32. I classified each study into the following groups:

1. Their 95% confidence and prediction intervals contained their null value
2. Their 95% confidence and prediction intervals excluded their null value
3. Their 95% confidence interval excluded the null value but their 95% prediction interval included the null value

For the first type, I found that 17 (47.2%) of the meta-analyses had their 95% confidence interval contain their respective null values. For these meta-analyses, the 95% prediction interval will also contain the null value, since the 95% confidence interval is a subset of the 95% prediction interval. Focusing on these studies, 6 had only three trials, which is the minimum required to calculate a prediction interval. In fact, 11 of these 17 meta-analyses had fewer than ten trials in their analysis, which may explain why their 95% confidence intervals contain their null value, since a random-effects meta-analysis will have low power to detect significant results when there are few studies in the analysis. 2 In study ID 15 30 , the meta-analysis contains only three trials (there were originally four trials but no events occurred in one of the trials, so that trial was discarded from the analysis), yet there is a significant amount of between-study heterogeneity, as indicated by the large I² value of 49.4% (suggesting that almost half of the variation in treatment effects is down to real differences) and τ̂² value of 0.3369. The study itself is primarily a randomised controlled trial assessing whether granulocyte-macrophage colony stimulating factor (GM-CSF), administered as prophylaxis to preterm neonates at high risk of neutropenia, reduces sepsis, mortality and morbidity. The authors also carried out a meta-analysis of their trial along with two other published randomised controlled trials to see if there is a treatment benefit. Each trial estimated an odds ratio, with an odds ratio < 1 indicating the treatment is beneficial.
The authors used a fixed-effect model, stating “there was no evidence of between-trial heterogeneity”, yet the large τ̂² and I² values suggest otherwise, so a random-effects model would be better suited to analysing the data. I obtained a summary estimate of 0.84 (the authors obtained 0.94) and a 95% confidence interval of [0.32,2.17] (the authors obtained [0.55,1.60]). In both cases, the 95% confidence intervals included the null value, so on average there isn't any evidence at the 5% level that the treatment is beneficial. The authors look to subgroup analysis to analyse the data, but a prediction interval can further explain the results in a way that acknowledges heterogeneity. A 95% prediction interval was calculated to be (0,12655.86]. All the results are presented in a forest plot in figure 3.3. The 95% prediction interval is extremely wide in this case. This occurs because we are using the t-distribution, which accounts for the uncertainty in τ̂², with few studies, resulting in a large value of t_{n−2}, as well as accounting for large between-study heterogeneity. When using a random-effects meta-analysis, we make the assumption
  • 33. Figure 3.3: Forest plot showing a meta-analysis of randomised controlled trials of GM-CSF for preventing neonatal infections 30

that each study is estimating a different treatment effect; if we have few studies in the presence of substantial between-study heterogeneity, irrespective of how large they are, we have low power to detect significant results. 2;5 Study ID 17 32 , a meta-analysis of three randomised controlled trials, also has a large 95% prediction interval, given by (0,91064.69], but unlike study ID 15 30 has no evidence of between-study heterogeneity, as suggested by I² and τ̂² values of 0. In this case, the large prediction interval is attributed to the uncertainty in the estimate of τ², since there are too few trials. In such cases, a Bayesian approach to calculating τ̂² may work better. 5;8 The studies with more than 10 trials that had both their 95% confidence and prediction intervals contain the null value tended to have narrower 95% confidence intervals and, apart from study ID 3c 18 , only slightly included their respective null value. For the second type, 9 (25%) meta-analyses had both their 95% confidence and prediction intervals exclude their respective null value. In these cases, the prediction interval remains significant at the 5% level even after we have considered the whole distribution of effects. Of these 9 meta-analyses, 7 had I² and τ̂² values of 0 (or very close to 0) and 1 other meta-analysis had an I² value of 6.1% and a τ̂²
  • 34. value of 0.0027. In the case of these 8 meta-analyses, the 95% prediction intervals are only slightly wider than the 95% confidence intervals. In the general case where a prediction interval only slightly increases the width of a random-effects confidence interval and I² and τ̂² are 0 (suggesting no evidence of between-study heterogeneity), a common effect may be assumed, since the impact of heterogeneity is negligible and the extra width in the prediction interval is only attributable to the uncertainty surrounding the estimate of τ² (which is 0 or very close to 0 in these cases). In study ID 11a 26 , the authors carried out two meta-analyses of individual patient data to investigate the effect of adjuvant chemotherapy in operable non-small-cell lung cancer. The first meta-analysis examined the effect of surgery and chemotherapy against surgery alone on survival by type of chemotherapy, and the second the effect of surgery, radiotherapy and chemotherapy versus surgery and radiotherapy on survival by type of chemotherapy. Both meta-analyses were extracted for the review, but the first is the one of interest. The analysis included thirty-four randomised controlled trials, each estimating a hazard ratio, where a hazard ratio < 1 indicates survival is better with surgery and chemotherapy. I calculated I² and τ̂² values of 6.1% (the authors calculated 4%) and 0.0027 respectively, indicating little between-study heterogeneity across the trials despite the trials differing by number of patients, drug used, number of cycles, etc. The authors used a fixed-effect model to analyse the data and used a χ² test to investigate any differences in treatment effects across the trials. Using a random-effects meta-analysis, I obtained a summary estimate of 0.86 (the authors also obtained 0.86), a 95% confidence interval of [0.80,0.92] (the authors obtained [0.81,0.92]) and a 95% prediction interval of [0.75,0.97]; the results are displayed in figure 3.4.
The summary estimate suggests that on average, survival is better with surgery and chemotherapy compared to surgery alone. The 95% confidence interval doesn't contain the null value and is entirely < 1, so there is strong evidence that on average, survival is better with surgery and chemotherapy. The authors acknowledge this and state, along with their second meta-analysis, “The results showed a clear benefit of chemotherapy with little heterogeneity”, but is this always the case? The 95% prediction interval is also entirely < 1, so now having considered the whole distribution of effects, we can say that surgery and chemotherapy will increase survival in at least 95% of brand new individual study settings. I point out that the authors' results, using a fixed-effect meta-analysis, were very similar to my results using a random-effects meta-analysis. Furthermore, the 95% prediction interval is only slightly wider than the 95% confidence interval, which indicates that the impact of between-study heterogeneity is small across all the trials, so there may be justification for using a fixed-effect model. Despite this, a random-effects model is still useful since it accounts for all uncertainty. 5 We have already seen how a prediction interval can be wide (e.g. study ID 15 30 , study ID 17 32 ) if there is uncertainty in the estimates, regardless of whether there is evidence of between-study heterogeneity or not.
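That blow-up can be reproduced from the reported numbers for study ID 15: a summary odds ratio of 0.84 with 95% CI [0.32, 2.17], τ̂² = 0.3369 and three trials, so the t-value in (2.2) has a single degree of freedom. A sketch of the calculation (exact agreement with the quoted (0, 12655.86] isn't expected, because the inputs are rounded):

```python
import math

# Reported values for study ID 15 (three trials, odds-ratio scale)
theta_hat = math.log(0.84)          # summary estimate on the log-odds scale
se_theta = (math.log(2.17) - math.log(0.32)) / (2 * 1.96)   # from the 95% CI
tau2 = 0.3369
t_1 = 12.706    # two-sided 5% critical value, t with n - 2 = 1 df

half = t_1 * math.sqrt(tau2 + se_theta**2)
pi = (math.exp(theta_hat - half), math.exp(theta_hat + half))  # back to OR scale
```

With only one degree of freedom the t critical value is 12.706 rather than 1.96, which, combined with the large τ̂², stretches the interval over several orders of magnitude on the odds-ratio scale.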
  • 35. Figure 3.4: Forest plot showing a meta-analysis of randomised controlled trials assessing the effect of surgery (S) and chemotherapy (CT) versus surgery alone 26

The 1 other meta-analysis yet unaccounted for is study ID 3d 18 . The authors assess the use of recombinant tissue plasminogen activator (rt-Pa) for acute ischaemic stroke. They had updated a previous systematic review by adding a new large randomised controlled trial to the analysis. The review contained four meta-analyses, all of which were extracted for the review, but the meta-analysis of interest (study ID 3d) looks at the effect of rt-Pa on symptomatic intracranial haemorrhage (SICH) within 7 days in patients who have suffered an acute ischaemic stroke. The
  • 36. analysis included twelve randomised controlled trials, each estimating an odds ratio where an odds ratio < 1 indicates rt-Pa reduced the development of SICH. The trials used in this study differed by dosage, final follow-up time, stroke type etc., which has resulted in large I 2 and τ̂ 2 values of 43.4% and 0.2320 respectively. The authors used a standard fixed-effect model and assessed heterogeneity using the χ2 -statistic to check for the presence of substantial heterogeneity. Given the large values of I 2 and τ̂ 2 , and taking into account the differences between the trials, a random-effects meta-analysis seems more appropriate. So, using a random-effects meta-analysis, I obtained a summary estimate of 3.93 (the authors obtained 3.72), a 95% confidence interval of [3.44,6.35] (the authors obtained [2.98,4.64]) and a 95% prediction interval of [1.18,13.10]; the results are displayed in figure 3.5. Figure 3.5: Forest plot showing a meta-analysis of randomised controlled trials assessing the effects of SICH within 7 days (treatment up to 6 hours) 18 The summary estimate suggests that on average, the odds of developing SICH in the treatment group are 3.93 times the odds of developing SICH in the control group. The 95% confidence interval did not contain the null value and is entirely > 1, so provides
  • 37. strong evidence that on average, the treatment is more likely to increase the odds of SICH, but it does not indicate whether this will always be the case. The 95% prediction interval is entirely > 1, suggesting that the treatment will increase the odds of SICH when carried out in at least 95% of brand new individual settings. Like study ID 11a 26 , the 95% prediction interval remains significant, but unlike study ID 11a, the 95% prediction interval in study ID 3d is much wider than its 95% random-effects confidence interval. Here the impact of between-study heterogeneity is large (in study ID 11a, the impact is low); this can also be seen in the large I 2 and τ̂ 2 values, which result in the large width of the 95% prediction interval. The impact is such that in some cases, the odds of SICH when rt-Pa is given could be as low as 1.18 times the odds in the control group but could be as high as 13.1 times the odds in the control group. The authors, by using a fixed-effect method, fail to acknowledge the potential effects of heterogeneity. They report that “42 more patients were alive and independent, 55 more were alive with a favourable outcome at the end of follow up despite an increase in the number of early symptomatic intracranial haemorrhages and early deaths”. Since the odds of SICH in the treatment group could be as high as 13.1, further research could be carried out to identify scenarios where this may occur, since such events reduce the number of patients that have favourable results by the end of follow-up.
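The calculations behind these intervals can be sketched in a few lines. The code below is an illustrative implementation assuming the standard DerSimonian and Laird estimate of τ² and the t-based 95% prediction interval of Higgins et al.; the log odds ratios and standard errors are made-up values for six hypothetical trials, not the rt-Pa trial data.

```python
# Illustrative random-effects meta-analysis with a 95% prediction interval.
# The log odds ratios (y) and standard errors (se) are invented values for
# six hypothetical trials, NOT the data from study ID 3d.
import math

y  = [0.2, 1.6, 0.5, 1.8, 0.9, 1.4]       # log odds ratios
se = [0.40, 0.55, 0.35, 0.60, 0.45, 0.50]  # their standard errors
k = len(y)

# fixed-effect (inverse-variance) weights and pooled estimate
w = [1 / s**2 for s in se]
y_fix = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2
Q = sum(wi * (yi - y_fix)**2 for wi, yi in zip(w, y))
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# random-effects summary estimate and its standard error
w_re = [1 / (s**2 + tau2) for s in se]
mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
se_mu = math.sqrt(1 / sum(w_re))

# 95% confidence interval for the average effect (normal quantile)
ci = (mu - 1.96 * se_mu, mu + 1.96 * se_mu)

# 95% prediction interval for the effect in a brand new study:
# t quantile with k-2 = 4 degrees of freedom (2.776 from standard tables)
t_crit = 2.776
half = t_crit * math.sqrt(tau2 + se_mu**2)
pi = (mu - half, mu + half)

print("summary OR:", round(math.exp(mu), 2))
print("95% CI (OR):", [round(math.exp(x), 2) for x in ci])
print("95% PI (OR):", [round(math.exp(x), 2) for x in pi])
```

Because the prediction interval uses a t quantile and adds τ̂² to the variance, it is always at least as wide as the random-effects confidence interval, mirroring the contrast seen between study IDs 11a and 3d.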
For the third type, 10 (27.8%) of the meta-analyses had their 95% confidence intervals exclude the null value but had their 95% prediction intervals include the null. In these cases, the 95% prediction intervals are not significant at the 5% level after we have considered the whole distribution of effects. Most of the studies, apart from two, tended to have a significant amount of between-study heterogeneity, with I 2 values ranging from 22.3% to 62.7% and τ̂ 2 values ranging from 0.022 to 0.098. Two studies had I 2 and τ̂ 2 values of 0. These were study ID 9 24 , which had 3 trials and justifiably used a fixed-effect method, and study ID 16b 31 , which had 9 trials and used a random-effects meta-analysis, though caution is needed since few trials can result in summary estimates carrying large uncertainty. In study ID 20 35 , the authors look at the efficacy of probiotics in the prevention of acute diarrhoea. They carried out a meta-analysis of thirty-four randomised controlled trials, each estimating a relative risk, with a relative risk < 1 indicating the probiotic has a beneficial effect. The authors used a random-effects meta-analysis, acknowledging the potential effects of heterogeneity since the studies differed in many ways, such as study setting, age group, follow-up duration, probiotic administered, dosage etc., which resulted in a large I 2 value of 62.7% and τ̂ 2 value of 0.0980. I obtained identical results to the authors: a summary estimate of 0.65 and a 95% confidence interval of [0.55,0.78]. Additionally, I obtained a 95% prediction interval of [0.34,1.27],
  • 38. the results are displayed in figure 3.6. Figure 3.6: Forest plot of a meta-analysis of randomised controlled trials assessing the effects of probiotics on diarrhoeal morbidity 35 The summary estimate of 0.65 indicates that on average, the risk of diarrhoeal morbidity in the probiotic group is 0.65 times the risk in the placebo group. The 95% confidence interval is entirely < 1, providing strong evidence that on average the probiotics are beneficial, but is this always the case? The authors acknowledge heterogeneity first by using a random-effects model and then by carrying out a subgroup and stratified
  • 39. analysis assessing the effect of age, setting of trial, type of diarrhoea, probiotic strains used, formulation of probiotics administered and quality score of trials. A more formal way of acknowledging heterogeneity is to consider a 95% prediction interval, which I calculated to be [0.34,1.27]. This interval now contains the null value and contains values > 1, so although on average the use of probiotics is beneficial, this may not always be the case in a brand new individual setting; in fact in some cases it may be harmful, and further research is required to identify these scenarios. In study ID 23 38 , the authors look at the efficacy and safety of electroconvulsive therapy (ECT) in depressive disorders. They carried out a meta-analysis of twenty-two randomised controlled trials, each estimating a standardised risk difference where a risk difference > 0 favoured unilateral ECT and a risk difference < 0 favoured bilateral ECT. The authors reported both fixed-effect and random-effects results and acknowledge heterogeneity since the trials differ by dosage, methods of administration etc.; this can be seen in the I 2 value of 24.00% and τ̂ 2 value of 0.0286. I obtained slightly different results to the authors when using a random-effects meta-analysis: a summary estimate of -0.34 (the authors obtained -0.32) and a 95% confidence interval of [-0.49,-0.20] (the authors obtained [-0.46,-0.19]). I also obtained a 95% prediction interval of [-0.73,0.04]; the results are displayed in figure 3.7. The summary estimate suggests that on average, out of 100 patients, 34 more had favourable results in the bilateral group compared to the unilateral group. The 95% confidence interval is entirely < 0, providing strong evidence that on average the bilateral group is better, but is this always the case?
The authors acknowledge heterogeneity firstly by reporting random-effects results and then by carrying out a meta-regression analysis, but considering a prediction interval would be a more formal way of acknowledging heterogeneity. The 95% prediction interval is [-0.73,0.04], which now contains the null value 0 and slightly exceeds it. This suggests that although on average the bilateral group is better, in a brand new individual study setting the bilateral group may not be better, and further research is required to identify such scenarios. 3.4 Discussion From the 26 studies that entered my review, 36 meta-analyses were extracted and each was reproduced using a random-effects model with a 95% prediction interval. My aim was to see whether or not these intervals had a significant impact on the conclusions of these studies. Most of the studies that I found reported a summary estimate (fixed or random-effects) along with a 95% confidence interval and carried out some type of analysis to assess heterogeneity. An observation worth noting is that none of the studies post 2005 mentioned the idea of prediction in the context of meta-analysis.
  • 40. Figure 3.7: Forest plot of a meta-analysis of randomised controlled trials assessing the effect of bilateral versus unilateral electrode placement on depressive symptoms 38 Papers by Ades et al. 8 and Higgins et al. 5 set the foundations for the use of prediction intervals in traditional and Bayesian meta-analysis, showing how presenting one can describe the extent of heterogeneity and how the true individual treatment effects are distributed about the random-effects summary estimate, as well as giving a range within which the true treatment effect in a brand new individual study setting is expected to lie. 2;5 3.4.1 Principal Findings I found that 17 (47.2%) of the 36 meta-analyses had their 95% confidence interval contain the null value. In these cases, the average effect across the trials is not significant at the 5% level and the 95% prediction interval will also include the null value. Presenting a 95% prediction interval in these cases is still useful since it helps describe
  • 41. the distribution of effects across the studies given there is between-study heterogeneity. The other 19 (52.8%) meta-analyses had their 95% confidence interval exclude the null value. In these cases, the average effect is significant at the 5% level, and the aim is to see how many of their 95% prediction intervals now include the null value. I found that 9 of these meta-analyses had their 95% prediction interval exclude the null value whilst the other 10 included it. In terms of clinical practice, a prediction interval excluding the null indicates that in at least 95% of the times the treatment is applied in brand new study settings, the treatment will be beneficial (or worse), which is much more useful to clinicians than just reporting the average effect and the uncertainty around it. If the prediction interval includes the null, then although the average effect is beneficial (or worse), in some brand new individual study settings the effect may go the other way. Again, this is much more useful to clinicians and researchers since it reveals the impact of heterogeneity and can motivate further research to identify such cases. Another way of discussing the results is to consider the size of heterogeneity across the meta-analyses. I reiterate that describing heterogeneity is a key motivation for a prediction interval. If heterogeneity were not a problem, then we could use a fixed-effect model in all cases, but even the slightest differences between studies must be considered. 2 I found 12 meta-analyses had no evidence of between-study heterogeneity (I 2 and τ̂ 2 values of 0); only in two of these cases 20;26 did they have more than ten trials. In many of these cases, the authors would tend to use a fixed-effect model, but since there are few studies, we have low power to detect heterogeneity and therefore there may be uncertainty around the I 2 and τ̂ 2 values.
A common effect may be assumed 2 if there is no evidence of between-study heterogeneity and the 95% confidence and prediction intervals are close, suggesting that the impact of heterogeneity is negligible and the uncertainty around the parameters is low (e.g. Study ID 11b 26 ). In some cases, there may seem to be no evidence of heterogeneity, but if there are few studies, the uncertainty around τ̂ 2 can be large, resulting in wide prediction intervals (e.g. Study ID 17 32 ). The other 24 meta-analyses had evidence of between-study heterogeneity (I 2 ranging from 0.30% to 62.90% and τ̂ 2 ranging from 0.0001 to 0.3369). Whilst the random-effects model was not always used in these cases, in most of them the authors did carry out some analysis of heterogeneity (e.g. subgroup analysis, meta-regression etc.). The problem that occurs is that if there are few trials in the analysis, the power to detect sources of heterogeneity is low and therefore the analysis lacks precision. 2;11 A prediction interval calculated with few studies will be large (e.g. Study ID 15 30 ) and may not be useful from a clinician's point of view since the range of effects is so wide. On the other hand, in study ID 3d 18 , the 95% prediction interval is large yet entirely above the null value, so even though there is uncertainty about what the effect could be in an individual study setting, we know that in at least 95% of settings the treatment will have a negative effect (in that case); we just do not know how bad of
  • 42. an effect it could be. From a researcher's point of view, large prediction intervals can still have meaning since they reveal the uncertainty surrounding the parameters and may simply indicate that more trials, further research or other information (e.g. incorporating a Bayesian approach 5;8 ) is required, whereas a 95% confidence interval only tells us whether the average effect is significant, and that result may be imprecise due to the lack of trials. 3.4.2 Limitations It is important that potential limitations of this review are acknowledged. I decided to only use the Lancet database to search for studies since it is regarded as one of the world's most respected medical journals. I expected each study to be of a high standard in terms of methodology and conduct. Unfortunately, I cannot be sure that this is the case; flaws in procedure at trial level and meta-analysis level can result in error-prone results that may not reflect the true performance of the intervention. 42 In these cases, the prediction interval will be wider since it mixes heterogeneity caused by real differences with heterogeneity resulting from methodological errors. 7 I also only included meta-analyses of randomised controlled trials since such trials cancel the effects of known and unknown confounding factors. I did come across meta-analyses of non-randomised trials (mainly observational studies) but excluded them since they are more influenced by confounders. Whilst randomised controlled trials are held in higher regard relative to observational studies, the jury remains out on whether we would take randomised trials of low or even average quality over high quality observational studies. Stroup et al. 44 found that “inclusion of sufficient detail to allow a reader to replicate meta-analytic methods was the only characteristic related to acceptance for publication”, suggesting that high quality observational studies could be considered.
I could have extended the search beyond the Lancet to other databases, but I felt the Lancet already covered a wide variety of studies. There are also technical limitations to the review that must be addressed. Whilst there was a criterion that every meta-analysis must have at least three randomised controlled trials, with few studies the assumptions made when calculating a prediction interval may be violated. We assume a normal distribution, but with few studies this may be an inappropriate choice. 5 When considering the true treatment effect of a brand new study, I assume the population in this new study is “sufficiently similar” to those already covered in the analysis. If we have few studies, we fail to cover a sufficient range of populations, resulting in a wider prediction interval accounting for large uncertainty. 2;5 I also was not specific about what types of outcomes were allowed into the review. There is evidence suggesting that certain biases are more likely to arise with subjective outcomes (e.g. favourable outcome (Study ID 3d 18 ), poor outcome (Study ID 2 17 ) or any outcome that requires human input). 45 It may have been more prudent to only consider outcomes such as survival, mortality or continuous outcomes that have no
  • 43. chance of being influenced by an external source. 3.4.3 Comparison with other studies A related study compiled by Graham et al. 14 explored prediction intervals in meta-analysis. They performed a meta-epidemiological study of binary outcomes from meta-analyses published between 2002 and 2010. Their study included 72 meta-analyses from 70 studies, each containing between 3 and 80 studies, and for each they calculated a random-effects meta-analysis using the DerSimonian and Laird 12 method and calculated traditional and Bayesian 95% prediction intervals for odds ratios and risk ratios. They found that 50 out of 72 meta-analyses had their 95% random-effects confidence interval for odds ratios exclude the null value; of these, 18 had their 95% prediction intervals exclude the null. They also found that 46 out of the 72 meta-analyses had their 95% random-effects confidence interval for risk ratios exclude the null value; of these, 19 had their 95% prediction intervals exclude the null. They concluded “meta-analytic conclusions may be appropriately signaled by consideration of initial interval estimates with prediction intervals” but also stress that increasing heterogeneity can result in wide prediction intervals and caution must be taken when writing conclusions on a meta-analysis. 14 Comparing my results to theirs, I found fewer meta-analyses had their 95% prediction interval include the null when their 95% confidence interval excluded it. Their study was larger than mine and they were also able to directly calculate odds ratios and relative risks for each meta-analysis. I worked out the effect size according to the authors of the studies and in some cases could not directly work out the summary estimate, since the relevant data were not available; only the individual treatment effects along with their 95% confidence intervals were reported.
3.4.4 Final Remarks and Implications Perhaps focusing only on cases where prediction intervals include the null when their corresponding 95% confidence intervals did not may somewhat detract from why a prediction interval is useful. Since I was able to apply a 95% prediction interval to all cases (whether the analysis had high between-study heterogeneity or none, and whether it had few or many trials), I was able to describe the results of each random-effects meta-analysis more accurately by considering the whole distribution of effects, even if what I deduce is that the authors require more trials or further research/information in cases where there are few studies. In the case where there is no evidence of between-study heterogeneity (indicated by I 2 and τ̂ 2 equal to 0), if we use a random-effects model with a prediction interval and the prediction interval is significantly wider than the random-effects
  • 44. confidence interval, then this suggests there is uncertainty in the parameters (e.g. a lack of power if there are few studies). If the prediction interval is fairly close to the confidence interval, then this suggests a common effect may exist, since we have considered the whole distribution of effects and the impact of heterogeneity is negligible. If there is evidence of between-study heterogeneity, then a prediction interval can reveal the impact of that heterogeneity, which is useful to clinicians and researchers regardless of whether the average effect is significant. I therefore believe a 95% prediction interval should be presented in every random-effects meta-analysis to enhance the interpretation of its results, but I stress the need for the analysis to have a sufficient number of good quality, unbiased randomised controlled trials.
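The few-studies effect discussed above can be made concrete. The short sketch below holds the summary estimate, its standard error and τ̂² fixed at assumed illustrative values (not taken from any study in the review) and shows how the t quantile with k − 2 degrees of freedom alone widens the 95% prediction interval when a meta-analysis contains few trials.

```python
# How the number of trials k affects the width of a 95% prediction
# interval through the t quantile with k-2 degrees of freedom.
# mu, se_mu and tau2 are assumed illustrative values.
import math

# two-sided 95% t quantiles for k-2 degrees of freedom (standard tables)
t_crit = {3: 12.706, 4: 4.303, 5: 3.182, 10: 2.306, 30: 2.048}

mu, se_mu, tau2 = 0.0, 0.10, 0.04
for k in sorted(t_crit):
    half = t_crit[k] * math.sqrt(tau2 + se_mu**2)
    print(f"k = {k:2d} trials: 95% PI = [{mu - half:+.2f}, {mu + half:+.2f}]")
```

Even before the extra uncertainty in τ̂² itself is considered, a three-trial analysis uses a multiplier of 12.7 instead of roughly 2, so its prediction interval is several times wider than that of a thirty-trial analysis with identical heterogeneity.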
  • 45. Chapter 4 Prediction intervals in Meta-Epidemiological studies It seems widely agreed that systematic reviews which contain a meta-analysis of randomised controlled trials provide the strongest and most reliable evidence of the effects of health care interventions, since they use systematic and explicit methods to summarise all the evidence to answer a research question of interest. 1;42;46 Unfortunately, they are not impervious to bias: if the meta-analysis is biased or includes biased trials, the results from the meta-analysis will incorporate these biases, resulting in either an over- or underestimation of the summary treatment effect, which can lead to misleading conclusions about how well the intervention works. 42;46 In the process of a systematic review, when the relevant trials are searched for, we must make sure that all of the evidence (published and unpublished) is sought so we can get the most accurate results. There is evidence that published studies are more likely to report statistically significant results and larger treatment effects; moreover, published studies are more likely to be used in a systematic review and therefore a meta-analysis, which can lead to a biased summary treatment effect (publication bias). 2;47 Furthermore, randomised controlled trials themselves are in danger of bias if there are imperfections in their methodological properties, i.e. there was not proper allocation concealment, a lack of blinding etc. 46 If we were to calculate a prediction interval in the presence of bias, heterogeneity accounting for real differences mixes with heterogeneity caused by these biases, resulting in a much wider prediction interval. 7 Other biases that can arise are citation bias, language bias, cost bias etc. 2 The fundamental idea here is that bias must be assessed to make the conclusions of a meta-analysis more robust; failure to acknowledge it can result in misleading results.
  • 46. 4.1 Meta-Epidemiological Study A way in which we can inspect bias is to carry out a meta-epidemiological study, which assesses the influence of trial characteristics on the treatment effect estimates in a meta-analysis. 43;42;46 A meta-epidemiological study will assess a specific trial characteristic by carrying out a meta-analysis on summary effects from a collection of meta-analyses (essentially a ‘meta-analysis of meta-analyses’). 43;42;46 Like a normal meta-analysis, a meta-epidemiological study should describe the distribution of all evidence, describe any heterogeneity between the meta-analyses, inspect associated risk factors and identify and control bias. Meta-epidemiology first surfaced in an editorial in the BMJ by David Naylor 48 in 1997, where cautions were raised concerning the summary effect of a meta-analysis. The author mentions how meta-analyses can generate “inflated and unduly precise” estimates if biases exist. He also refers to evidence stating statistically significant outcomes were more likely to be published than non-significant studies and adds “readers need to examine any meta-analyses critically to see whether researchers have overlooked important sources of clinical heterogeneity among the included trials”. In 2002, meta-epidemiology was defined by Sterne et al. 46 as a statistical method to “identify and quantify the influence of study level characteristics”. In 2007, the method was generalised in a systematic review conducted by OARSI (Osteoarthritis Research Society International). 49 This has resulted in many published meta-epidemiological studies, which can be found online, for example on the BMJ website. These types of studies have provided strong evidence that flaws in trial characteristics lead, on average, to exaggeration of intervention effect estimates and in turn increase heterogeneity.
42 4.2 Prediction Intervals in Meta-Epidemiological Studies The aim of this chapter is to apply a 95% prediction interval to meta-epidemiological studies. Meta-epidemiological studies will use either a fixed-effect or a random-effects model and report a summary estimate with a 95% confidence interval. They still, however, need to describe the extent of heterogeneity that exists across all the evidence, so the inclusion of a prediction interval can help formally describe it. We searched for meta-epidemiological studies on the website of the British Medical Journal (www.bmj.com). We used the advanced search toolbar with the keyword “META EPIDEMIOLOGICAL” in text, abstract and title in all articles in all years. Any meta-epidemiological study looking at a trial characteristic was eligible as long as we were able to carry out its meta-analysis ourselves. We took 4 studies at random and carried out their meta-epidemiological meta-analysis using a random-effects meta-
  • 47. analysis with a 95% prediction interval using the formulas (1.9 to 1.13) and (2.2). In all 4 of the examples, we estimated the standard errors using formulas (3.13) or (3.14) depending on the outcome of interest, since we could not work them out directly. 4.2.1 Example 1 A trial characteristic that can influence the estimates of individual trial treatment effects is the status of the study centre, i.e. whether it is carried out in a single centre or in multiple centres. Bafeta et al. 50 carried out a meta-epidemiological study with the aim of comparing estimates of intervention effects between single centre and multicentre randomised controlled trials on continuous outcomes. They address a previous study that concluded the effects of interventions using binary outcomes are larger in single centre randomised controlled trials compared to multicentre ones 51 and a paper by Bellomo et al. 52 who state single centre trials often contradict multicentre trials. The authors included 26 meta-analyses with a total of 292 randomised controlled trials (177 in single centres and 115 in multicentres) with continuous outcomes that were published between January 2007 and January 2010 in the Cochrane Database of Systematic Reviews (which they describe as having “high methodological quality”). They excluded meta-analyses of non-randomised trials, IPD meta-analyses, meta-analyses where all trials were only single centre or only multicentre and any meta-analysis that had fewer than 5 randomised controlled trials. They used the risk of bias tool recommended by the Cochrane Collaboration 3 to assess risk of bias from the individual reports for each trial. For each meta-analysis, they used a random-effects meta-analysis incorporating the DerSimonian and Laird estimate for τ 2 to combine treatment effects across the trials and assessed heterogeneity using χ2 and I 2 .
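When a paper reports only each effect estimate with its 95% confidence interval, the standard error can be recovered by reversing the interval construction. The sketch below shows the standard back-calculation; I assume this is what formulas (3.13) and (3.14), defined earlier in the thesis, correspond to, and the example numbers are invented.

```python
# Back-calculating a standard error from a reported 95% confidence
# interval (invented example values; assumed to match the thesis's
# formulas (3.13) and (3.14)).
import math

def se_from_ci_ratio(lower, upper, z=1.96):
    # SE of the log effect for ratio measures (odds ratio, risk ratio):
    # the reported CI is symmetric on the log scale.
    return (math.log(upper) - math.log(lower)) / (2 * z)

def se_from_ci_difference(lower, upper, z=1.96):
    # SE for difference measures (mean or risk difference):
    # the reported CI is symmetric on the natural scale.
    return (upper - lower) / (2 * z)

# e.g. a trial reporting an odds ratio of 0.75 with 95% CI [0.60, 0.94]
print(round(se_from_ci_ratio(0.60, 0.94), 4))
# e.g. a standardised mean difference with 95% CI [-0.31, 0.05]
print(round(se_from_ci_difference(-0.31, 0.05), 4))
```

With these recovered standard errors, each extracted meta-analysis can be re-run with the random-effects and prediction interval formulas even when the raw trial data are unavailable.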
The authors then estimated a standardised mean difference between single centre and multicentre trials using a random-effects meta-regression to incorporate potential heterogeneity between trials. They then synthesised these using a random-effects model and used I 2 and the Q-test to assess between-meta-analysis heterogeneity. A standardised mean difference < 0 indicates that single centre trials, on average, showed larger treatment effects than multicentre trials. They calculated a summary estimate of -0.09 with a 95% confidence interval of [-0.17,-0.01] with low between-meta-analysis heterogeneity (I 2 and τ̂ 2 values of 0). We obtained the same random-effects summary estimate of -0.09 and the same 95% confidence interval of [-0.17,-0.01]; additionally, we calculated a 95% prediction interval of [-0.18,0.00]. The results are shown in the forest plot in figure 4.1. The summary estimate (-0.09) indicates that on average, single centre trials produced a larger estimate of the intervention effect than multicentre trials. Since the 95% confidence interval ([-0.17,-0.01]) is entirely < 0, there is strong evidence that on average, single centre trials show a larger effect than multicentre trials looking at the
  • 48. same intervention, but is this always the case? Figure 4.1: Forest plot of a meta-epidemiological analysis assessing the difference in intervention effect estimates between single centre and multicentre randomised controlled trials 50 The authors report “on average single centre trials with continuous outcomes showed slightly larger intervention effects than multicentre” and acknowledge between-meta-analysis heterogeneity and risk of bias by using subgroup and sensitivity analyses, but a 95% prediction interval can describe all the uncertainty more formally. The calculated 95% prediction interval ([-0.18,0.00]) now includes the null value 0 but does not exceed it, and is only slightly wider than the 95% random-effects confidence interval, revealing that the impact of heterogeneity is low. We can say that, after considering the whole distribution of effects, in at least 95% of settings the effect in a multicentre trial will not be strictly larger than the corresponding effect in a single centre trial, but we cannot rule out that the effects might be the same. We mirror the authors' view that further research is needed to investigate
potential causes of these differences. 4.2.2 Example 2 Nuesch et al. 53 carried out a meta-epidemiological study to examine whether or not excluding patients from the analysis of randomised controlled trials is associated with biased estimates of treatment effects and whether or not it causes heterogeneity between trials. They address evidence that departures from protocol and losses to follow-up in randomised controlled trials can lead to exclusion of patients from the final analysis, and such handling of these patients leads to treatment effects that differ systematically from the true treatment effects. 54;55 Such bias is termed attrition bias 56 or selection bias, and this study aims to see how it affects the summary effects in a meta-analysis and whether it increases between-study heterogeneity. The authors include 14 meta-analyses, with a total of 167 trials (39 with all patients in the analysis, 128 where some patients were excluded). Eligible meta-analyses were those of randomised or quasi-randomised trials in patients with osteoarthritis of the knee or hip that reported a non-binary patient-reported outcome (e.g. pain intensity) and assessed any intervention against placebo or a non-intervention control. If a meta-analysis only included trials that had patient exclusions, or only trials where there were no exclusions, it was ignored. Within each meta-analysis, the authors used a random-effects meta-analysis to calculate a summary effect for trials with and trials without exclusions before deriving the difference between them. A difference < 0 suggests trials with exclusions have a more beneficial treatment effect. These differences were then synthesised using a random-effects meta-analysis, which the authors state “fully accounted for variability in bias between meta-analyses”, and they estimated τ 2 as a measure of between-study heterogeneity.
They obtained a summary estimate of -0.13 with a 95% confidence interval of [-0.29,0.04] with what they consider to be high between-meta-analysis heterogeneity, indicated by a τ̂ 2 value of 0.07. We obtained the same random-effects summary estimate of -0.13 but a different confidence interval of [-0.31,0.05], having noticed an error in the 3rd meta-analysis in the forest plot presented in the paper. We also obtained an I 2 value of 78.2% and a slightly larger τ̂ 2 value of 0.0811, as well as a 95% prediction interval of [-0.78,0.52]. The results are shown in the forest plot in figure 4.2. The summary estimate (-0.13) indicates that on average, trials with exclusions produce a larger estimate of the treatment effect compared to those without exclusions. The 95% confidence interval ([-0.31,0.05]) contains the null value, so the average is not significant (nor is the authors' 95% confidence interval). However, both our and the authors' 95% confidence intervals suggest there is evidence (albeit non-significant at the 5% level) that on average, patient exclusion leads to more beneficial treatment effects. This may have led the authors to report that “excluding patients from the analysis of randomised trials often resulted in biased estimates of treatment effects, but the
  • 50. extent and direction of bias remained unpredictable in a specific situation” and recommend “results from intention to treat analysis should always be described in reports of randomised trials”. Figure 4.2: Forest plot of a meta-epidemiological analysis assessing the difference in effect sizes between trials with and without exclusions of patients from analysis 53 They acknowledge the large between-meta-analysis heterogeneity by carrying out stratified analyses, but a 95% prediction interval can reveal the full uncertainty around the summary estimate. The calculated 95% prediction interval ([-0.78,0.52]) is fairly wide since it accounts for the large between-meta-analysis heterogeneity (indicated by the I 2 and τ̂ 2 values of 78.2% and 0.0811 respectively). I can say that, after considering the whole distribution of effects, although on average it seems as though studies with exclusions lead to a more beneficial treatment effect, an analysis where the trials have no patient exclusions could quite easily show a more beneficial treatment effect than one where there are exclusions. Here, the impact of heterogeneity is much more evident than the 95% confidence interval alone suggests, and it further reveals that in a brand new situation, the chance of a trial with exclusions appearing better than a trial without exclusions is unpredictable. A possible reason for such unpredictability could be that the analysis had a combined 39 trials without
  • 51. any exclusions compared to a combined 167 trials that did have exclusions. Further research is required but, like the authors, I believe an intention to treat analysis should be reported to account for all patients.

4.2.3 Example 3

Pildal et al. 57 carried out a meta-epidemiological study to assess the impact that removing randomised controlled trials without reported adequate allocation concealment has on the conclusions drawn from a meta-analysis. The study also looks at how trials without double blinding affect conclusions, but the study of interest concerns reported adequate allocation concealment. There is evidence that without adequate allocation concealment, which conceals what treatment the next patient will receive, selection bias and an exaggerated treatment effect may result. 55 They state that “without concealment, person in charge might channel patients with better prognosis into his or her preferred treatment”. The authors searched for reviews in the Cochrane Library and PubMed and included reviews containing a meta-analysis of randomised controlled trials that reported a binary outcome; this had to be their first statistically significant result that supported a conclusion in favour of an intervention. They excluded any non-binary outcome meta-analyses, any that had more than 40 trials and any based on non-randomised trials. For each meta-analysis, they reproduced it using the author's original method and then redid it using only the trials that had reported adequate allocation concealment. They included 34 meta-analyses, of which 29 (covering 284 trials) went through to the analysis. For each, they estimated the ratio of odds ratios using a univariate random-effects meta-regression, combined these using a random-effects meta-analysis and calculated I² as a measure of heterogeneity. A ratio of odds ratios < 1 indicates that trials with inadequate allocation concealment show a more favourable treatment effect.
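To make the ratio-of-odds-ratios scale concrete, the sketch below shows how a ratio of odds ratios and its 95% confidence interval can be formed from two subgroup odds ratios on the log scale. This is only an illustration: the numbers are made up, and the study itself estimated rations of odds ratios via random-effects meta-regression rather than this simple two-group contrast.

```python
import math

def ratio_of_odds_ratios(or_inadeq, se_ln_inadeq, or_adeq, se_ln_adeq):
    """Ratio of odds ratios (inadequate vs adequate concealment) with a 95%
    confidence interval, formed on the log scale. All four input values used
    below are hypothetical, not figures from Pildal et al."""
    ln_ror = math.log(or_inadeq) - math.log(or_adeq)
    # Subgroups treated as independent, so the log-scale variances add:
    se = math.sqrt(se_ln_inadeq ** 2 + se_ln_adeq ** 2)
    lo, hi = math.exp(ln_ror - 1.96 * se), math.exp(ln_ror + 1.96 * se)
    return math.exp(ln_ror), (lo, hi)

# Hypothetical subgroup odds ratios within one meta-analysis:
ror, ci = ratio_of_odds_ratios(0.60, 0.15, 0.75, 0.12)
# ror < 1: trials with inadequate concealment show the more favourable effect.
```

A ratio of odds ratios below 1 then reads exactly as in the text: the trials with inadequate allocation concealment show the more favourable (smaller) odds ratio for the experimental treatment.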
They calculated a summary estimate of 0.90 with a 95% confidence interval of [0.81,1.01] and I² to be 0. We obtained the exact same summary estimate and 95% confidence interval; we also retrieved the same I² value of 0 and calculated τ̂² to be 0. We also calculated a 95% prediction interval of [0.80,1.02]. The results are shown in the forest plot below in figure 4.3. The summary estimate (0.90) indicates that on average, trials without reported adequate allocation concealment showed a more favourable treatment effect. The 95% confidence interval ([0.81,1.01]) slightly goes over the null value so the average effect isn't significant at the 5% level, but there does seem to be evidence that inadequate allocation concealment exaggerates the treatment benefit. The authors state “There was a non-significant trend towards seemingly more beneficial effect of the experimental treatment in the trials without reported adequate allocation concealment” and “The impact of reported allocation concealment and double-blinding on the treatment effect estimate is smaller and less consistent than previously thought”. This can be
  • 52. Figure 4.3: Forest plot of a meta-epidemiological analysis assessing the ratio of odds ratios of trials with and without adequate allocation concealment 57

further fortified by a 95% prediction interval, which we calculated to be [0.80,1.02]. The 95% prediction interval is only slightly wider than the 95% confidence interval, revealing the impact of heterogeneity to be low. After considering the whole distribution of effects, whilst on average there is evidence that inadequate allocation concealment leads to a more beneficial treatment effect (albeit non-significant at the 5% level), there are situations when trials with adequate allocation concealment may produce a more beneficial treatment effect compared to those with inadequate allocation concealment.
  • 53. 4.2.4 Example 4

Tzoulaki et al. 58 carried out a meta-epidemiological study to compare reported effect sizes of cardiovascular biomarkers in datasets from observational studies with those in datasets from randomised controlled trials. The authors state that biomarkers are becoming prominently used as predictors of cardiovascular outcomes but address concerns that reported effect sizes may have been exaggerated, since “several biases may inflate the observed association”. 59 They point out that evidence testing prognostic associations comes mainly from observational epidemiological studies (cohort studies, case-control studies etc.) but that the same type of information may be available from randomised controlled trials. They report how epidemiological studies differ from randomised controlled trials and how these differences may result in larger treatment effect estimates, 60 but remain unsure whether such differences exist when biomarkers, rather than treatment effects, are assessed; hence the motivation for this study. The authors included 31 meta-analyses with a total of 555 studies (472 observational studies and 83 randomised controlled trials), each examining the association between pre-specified eligible biomarkers and an eligible outcome. For each meta-analysis, a random-effects model is used, firstly to combine reported relative risks from all the data and then for randomised controlled trials and observational studies separately. The authors were also recommended to calculate prediction intervals for the summary estimate. 7 The authors then calculated a design difference which “represents the difference between datasets from observational studies and from randomised controlled trials as a proportion of the summary effect of each meta-analysis”. A design difference < 0 indicates the effect is stronger in randomised controlled trials.
The design differences were then combined using both a fixed-effect and a random-effects meta-analysis along with a 95% prediction interval. They calculated a random-effects summary estimate of 0.24 with a 95% confidence interval of [0.07,0.40], an I² value of 39% and a 95% prediction interval of [-0.29,0.76]. We obtained the exact same random-effects summary estimate and 95% confidence interval as the authors, an I² value of 39.1%, a τ̂² value of 0.0578 and a 95% prediction interval of [-0.28,0.76]. The results are shown in the forest plot below in figure 4.4. The summary estimate (0.24) indicates that on average, there was a stronger effect observed in observational studies compared to randomised controlled trials. Since the 95% confidence interval ([0.07,0.40]) is entirely > 0, there is strong evidence that on average, observational studies have more favourable results than randomised controlled trials, but is this always the case? The authors acknowledge what they believe to be “modest” heterogeneity, indicated by an I² value of 39% (we obtained 39.1%, as well as a τ̂² value of 0.0578, suggesting moderate to high between-meta-analysis heterogeneity), by carrying out subgroup analysis. They were also recommended to calculate a 95% prediction interval, which they obtained to be [-0.29,0.76] (we obtained [-0.28,0.76]), which is a much more formal way of describing heterogeneity. The 95% prediction interval in both mine and their cases now includes
  • 54. Figure 4.4: Forest plot of a meta-epidemiological analysis assessing the design difference in effect sizes in datasets from observational studies versus those from randomised controlled trials 58

the null value and is quite wide, revealing the impact of heterogeneity. The authors state “typically, observational studies are expected to show larger or even much larger effects than randomised controlled trials, but exceptions can exist where larger effects are seen in the randomised controlled trials”. Since we have both considered the whole distribution of effects, although on average cardiovascular biomarkers seem to have more favourable results in observational studies, we cannot rule out that they could have a better effect in randomised controlled trials.
  • 55. 4.3 Discussion

Meta-epidemiological studies, which assess how trial characteristics may influence treatment effects in a meta-analysis, are becoming more and more prominent in evidence-based clinical practice. Published meta-epidemiological studies that are readily available (for example in the BMJ) provide strong evidence that imperfections in trial protocol lead to biases that may cause over/underestimation of the true summary effect of the intervention. 42 We took four meta-epidemiological studies at random and applied a 95% prediction interval. In all cases, the 95% prediction interval was able to add extra important information about the meta-epidemiological summary estimate which may be overlooked if the focus is solely on the summary estimate. Interestingly, only two meta-epidemiological studies that we found in the BMJ included a 95% prediction interval, 58;61 one of which I reproduced; this came about after a recommendation from Riley et al. 7 upon reviewing the original study. Zhang 43 anticipates that meta-epidemiology will “further evolve to improve evidence based clinical practice”. I believe the inclusion of a 95% prediction interval in a meta-epidemiological study, which takes into account the whole distribution of effects in a way that acknowledges heterogeneity, will further increase the quality and stature of meta-epidemiological studies and lead to more accurate and robust conclusions being drawn from them.
  • 56. Chapter 5 Final Discussion and Conclusion

The aim of this paper was to investigate the use of prediction intervals in meta-analysis and whether or not their inclusion can improve the interpretation of the results of the analysis. The importance of meta-analysis has grown over the last 20 years in medicine and health care, but on-going research continues to try to improve the method. Ades et al. 8 and Higgins et al. 5 set the scene for the use of predictions in a traditional meta-analysis and further publications 7;13;10;14 provide strong evidence that including a 95% prediction interval will enhance the quality and robustness of conclusions drawn from meta-analysis. Despite this, recent meta-analyses of randomised controlled trials still do not include a prediction interval in their analysis. My aim was to take a collection of meta-analyses, apply a prediction interval and examine how this affects the outcomes of the analysis. In chapter 1, I introduced meta-analysis as the statistical component of a systematic review; these remain the best sources of information on health care interventions. 1 The first point I made was why we use meta-analysis over traditional narrative reviews. This is not to say that narrative reviews should be abolished: if we are trying to compare studies that differ by outcome, effect size or study design, it may be more sensible to have two or more experts write a report rather than use meta-analysis. I discussed both fixed-effect and random-effects meta-analysis in this chapter, provided the formulas to carry out both types as well as an example for each. The goal of that chapter was to stress the differences between the two types of model. A fixed-effect model assumes a common effect exists amongst the trials included in the analysis.
However, when information is extracted from published and unpublished data, trials will differ in many ways, so a common effect is unlikely to exist because of between-study heterogeneity (real differences), a critical theme in this paper. To account for this, a random-effects model is used, which assumes the treatment effects are normally distributed about the summary estimate. The goal of a random-effects model is not only to estimate the summary effect but also to explain the differences that exist. 2;5 To see if
  • 57. heterogeneity is present, I presented the I² statistic and the Q-statistic (based on the Q-test) as tools to help us see if between-study heterogeneity is present, and the DerSimonian and Laird 12 estimate of the between-study variance, τ̂², which is incorporated into the random-effects model. I also cited evidence provided by Riley et al. 7 that there is misunderstanding of when to use a fixed-effect or random-effects model as well as misinterpretation of the summary estimate. In my view, random-effects models can be used without any real justification, but a fixed-effect model requires firm justification since we are being specific in the assumption that a common effect exists.

In chapter 2, we presented the 95% prediction interval as given by Higgins et al. 5 A 95% prediction interval describes the whole distribution of effects in a random-effects meta-analysis, describes the degree of heterogeneity since it takes account of all the uncertainty, and gives a range for which we can be 95% sure that the treatment effect in a brand new study lies within. 2;5;7;13 In that chapter, the goal was not only to introduce the prediction interval but to stress why it is useful clinically. In a random-effects meta-analysis, the summary estimate and its 95% confidence interval only provide inferences about the average effect; since a random-effects model assumes that the individual treatment effects are normally distributed around the summary estimate, we need to describe that distribution, and this motivates the 95% prediction interval. If the 95% prediction interval contains the associated null value, then in some settings the intervention will be beneficial, in others unbeneficial, and in others it may have no effect. If the 95% prediction interval excludes the null value, then when the intervention is applied, at least 95% of the time the effect will be beneficial/unbeneficial; this is much more useful to clinicians.
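As an illustration of the calculations described above, here is a minimal sketch of the DerSimonian and Laird random-effects analysis with the Higgins et al. 95% prediction interval, μ̂ ± t(k-2)·√(τ̂² + SE(μ̂)²). The analyses in this paper were run in STATA with metan; this Python version, the made-up data, and the choice to pass the t critical value in by hand are all assumptions of the sketch, not the paper's implementation.

```python
import math

def dl_random_effects(y, v, t_crit):
    """DerSimonian-Laird random-effects meta-analysis with the Higgins et al.
    95% prediction interval. y: study effects (e.g. log odds ratios); v: their
    within-study variances; t_crit: the 97.5% point of the t distribution on
    k - 2 degrees of freedom, supplied by hand (e.g. from tables)."""
    k = len(y)
    w = [1.0 / vi for vi in v]                               # fixed-effect weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    wstar = [1.0 / (vi + tau2) for vi in v]                  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wstar, y)) / sum(wstar)
    se = math.sqrt(1.0 / sum(wstar))
    ci = (mu - 1.96 * se, mu + 1.96 * se)                    # 95% CI for the mean
    half = t_crit * math.sqrt(tau2 + se ** 2)                # PI half-width
    return mu, se, tau2, i2, ci, (mu - half, mu + half)

# Hypothetical log odds ratios from 4 trials; t(0.975, df=2) = 4.303.
mu, se, tau2, i2, ci, pi = dl_random_effects([0.5, -0.2, 0.8, 0.1],
                                             [0.04] * 4, 4.303)
```

The prediction interval is always at least as wide as the confidence interval, because its half-width replaces 1.96·SE with t(k-2)·√(τ̂² + SE²).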
I also describe the differences between a 95% confidence interval and a 95% prediction interval since they may be misinterpreted. In chapter 3, I carried out an empirical review of the impact of a 95% prediction interval on existing meta-analyses of three or more randomised controlled trials from the Lancet. We discuss the results, limitations and implications in chapter 3. Over a quarter (27.8%) of the meta-analyses which had significant 95% confidence intervals had insignificant 95% prediction intervals. In all these cases, their respective authors generally stated there was evidence that the intervention was beneficial/unbeneficial and, where there was evidence of between-study heterogeneity, they carried out some type of analysis (e.g. subgroup analysis, meta-regression etc.). I pointed out the limitations of such types of analysis in chapter 1 and instead used a 95% prediction interval to describe the extent of heterogeneity present. In these cases, although on average the treatment effect is beneficial/unbeneficial, we cannot rule out that in some cases the effect may not be. I believe 27.8% is a fairly substantial proportion of meta-analyses which may have to acknowledge the fact that in some cases, their conclusions may not be valid. However, this review reveals more than just the above. If a 95% prediction interval is only slightly wider than a 95% confidence interval, there is no evidence of between-study heterogeneity and we have a sufficient number of good quality randomised trials, a common effect may exist (e.g. Study 11b 26 ).

  • 58. This is a much more powerful way of deducing a common effect since we have considered the whole distribution of effects. Since the 95% prediction interval is only slightly wider, this may be attributed to within-study variation and slight uncertainty in the parameter estimates (given we have sufficient good quality trials). For the meta-analyses whose 95% prediction interval excluded the null (25%), the prediction interval reveals that after considering the whole distribution of effects, when the treatment is applied in at least 95% of brand new study settings, the effect will always be beneficial/unbeneficial. These cases generally had no evidence of between-study heterogeneity (I² and τ̂² values of 0 or close to 0), apart from study ID 3d. 18 If we have few studies, a 95% prediction interval can reveal the impact of uncertainty around the estimates; e.g. study 17 32 had no evidence of between-study heterogeneity (I² and τ̂² values of 0) yet had an extremely large 95% prediction interval, attributable to the vast uncertainty in the parameter estimates. In chapter 3, it is clear to see that with few studies, the 95% prediction interval is generally wider since there is uncertainty in the parameter estimates as well as accounting for between-study heterogeneity. Whilst a 95% prediction interval should still be considered, its clinical meaningfulness may become nullified. The problem lies in the DerSimonian and Laird 12 estimate for τ̂², which is sensitive to the number of studies. The better we are able to estimate τ̂², the more accurate the 95% prediction interval and, moreover, the more accurate the random-effects results are. Brockwell et al. 63 and Hardy et al. 64 discuss further problems of the DerSimonian and Laird estimate. I have so far restricted myself to only mentioning the Bayesian approach that exists for estimating τ̂². Graham et al.
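The sensitivity to the number of studies shows up directly in the prediction interval's half-width, t(k-2)·√(τ̂² + SE²): with k = 3 studies the relevant t quantile is 12.706, against roughly 2.10 for k = 20. The sketch below uses made-up τ̂² and SE values purely to illustrate how the interval balloons with few studies, even with no estimated heterogeneity at all:

```python
import math

# 97.5% points of Student's t distribution (standard tables), keyed by df.
T975 = {1: 12.706, 2: 4.303, 5: 2.571, 18: 2.101}

def pi_half_width(tau2, se, k):
    """Half-width of the 95% prediction interval for a meta-analysis of k
    studies (Higgins et al. formula); degrees of freedom = k - 2."""
    return T975[k - 2] * math.sqrt(tau2 + se ** 2)

# Identical tau2 (= 0) and summary SE, differing only in the number of studies:
wide = pi_half_width(0.0, 0.10, 3)     # k = 3:  12.706 * 0.10
narrow = pi_half_width(0.0, 0.10, 20)  # k = 20:  2.101 * 0.10
```

Here the k = 3 interval is about six times wider than the k = 20 interval from the t quantile alone, mirroring the behaviour of study 17 above.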
14 not only calculate traditional prediction intervals in their study but also estimate 95% Bayesian prediction intervals. They state that “Bayesian methods incorporate uncertainty into estimates more readily than frequentist; Bayesian posterior probabilities indicate clinically relevant effects in an immediate and tangible manner”. It would therefore seem wise to recommend this type of approach in the case of few studies. We refer to 8;5 for more information. We also refer to Knapp et al., 62 who describe other methods for estimating τ̂².

In chapter 4, we applied a 95% prediction interval to four meta-epidemiological studies. Meta-epidemiological studies assess the influence of trial characteristics on the treatment effect estimates in a meta-analysis. 43;42;46 These types of studies are relatively new in the medical and health care literature but they have already gained importance since they reveal strong evidence that imperfections in trial protocol lead to biases that cause over/underestimation of the true summary effect of the intervention. 46 They will use a fixed-effect or a random-effects model, report a summary estimate and 95% confidence interval and use some type of analysis to describe between-meta-analysis heterogeneity. Riley et al. 7 have recommended that authors include a prediction interval, and one of the studies we reproduced 58 had included a 95% prediction interval; the added benefit of it was clear to see. As in chapter 3, the 95% prediction interval enhanced the interpretation of the results of each of the
  • 59. meta-epidemiological studies we took (see chapter 4). Since the method itself is in its early stages, adding a 95% prediction interval to its analysis would further enhance the method's reputation and make the results easier to apply to clinical practice since we consider the whole distribution of effects.

In chapter 3, we have seen how the inclusion of a prediction interval can enhance the interpretation of results in a traditional random-effects meta-analysis. In chapter 4, we applied a prediction interval to meta-epidemiological studies, which suggests that we can further expand the uses of prediction intervals to other study designs in epidemiology. It could be possible to apply a prediction interval to meta-regression, since meta-regression will still incorporate τ̂². A prediction interval could also be applied to meta-analysis of cluster randomised controlled trials, whereby groups of patients are randomised rather than individual patients. In all these cases, further research is required to assess its applications. I also recommend further research into the Bayesian approach to meta-analysis when there are few studies or, more generally, when the DerSimonian and Laird estimate for τ̂² lacks precision. Borenstein et al. 2 state “This is probably the best option, but the problem is that relatively few researchers have expertise in Bayesian meta-analysis”. This makes it fairly clear that when τ̂² lacks precision, more emphasis must be given to a Bayesian approach, and more research is required to educate researchers on the procedures of Bayesian meta-analysis.

In conclusion, I believe that every single random-effects meta-analysis should present a 95% prediction interval in its analysis but, for best performance, the meta-analysis should be of good quality, unbiased randomised controlled trials.
This gives the 95% prediction interval the best chance of describing the full uncertainty around the random-effects summary estimate as well as the impact of between-study heterogeneity, and of giving an accurate, clinically meaningful range for which we are 95% sure that the treatment effect in a brand new study lies within. If we have few studies, I recommend a Bayesian approach; whilst a random-effects meta-analysis may still be used, the uncertainty must be addressed in the conclusions. Furthermore, I recommend that every meta-epidemiological study include a 95% prediction interval to further enhance the quality and robustness of its conclusions and to enhance its reputation as a method.
  • 60. Appendix A

STATA Codes

I used STATA v10.1 to perform a random-effects meta-analysis with a 95% prediction interval on each of the meta-analyses included in the empirical review as well as the meta-epidemiological studies. All forest plots presented in the paper are produced using STATA v10.1. The codes for each figure are presented below.

Figure 1.1:
metan smd se, effect(SMD) xlabel(-1.5,-1,-0.5,0,0.5,1) xtitle(Mean Difference (Treatment A minus Placebo)) favours( Favours Treatment # Favours Placebo)

Figure 1.2:
metan smd se, random effect(SMD) xlabel(-1.5,-1,-0.5,0,0.5,1) xtitle(Mean Difference (Treatment B minus Placebo)) favours( Favours Treatment # Favours Placebo)

Figure 2.1:
metan smd se, random rfdist effect(SMD) xlabel(-1.5,-1,-0.5,0,0.5,1) xtitle(Mean Difference (Treatment B minus Placebo)) favours( Favours Treatment # Favours Placebo)

Figure 3.3:
metan a b c d, or random rfdist label(namevar=trial) xlabel(0.001,0.01,0.1,1.0,10,100,1000) xtitle(Odds Ratio) favours( Favours Treatment # Favours Control)

Figure 3.4:
metan lnhr lnse, random rfdist eform effect(HR) label(namevar=trial) xlabel(0.1,0.2,0.5,1,2,5,10) xtitle(Hazard Ratio) favours( S+CT better # S alone better)

Figure 3.5:
metan a b c d, or random rfdist label(namevar=trial) xlabel(0.25,1,2,5,10) xtitle(Odds Ratio) favours( Thrombolysis Decreases # Thrombolysis Increases) boxsca(45)

Figure 3.6:
metan a b c d, rr random rfdist label(namevar=trial) xlabel(0.1,0.25,1,2,5,10) xtitle(Relative Risk) favours( “Probiotic has” “protective effect” # “Probiotic has” “non-protective effect”) boxsca(30)

  • 61. Figure 3.7:
metan rd se, random rfdist effect(RD) label(namevar=trial) xlabel(-2,-1,0,1,2) xtitle(Risk Difference) favours( “Favours Bilateral” # “Favours Unilateral”) boxsca(30)

Figure 4.1:
metan dsmd se, random rfdist effect(Difference in SMD) label(namevar=ma) xlabel(-2,-1.5,-1,-0.5,0,0.5,1,1.5) xtitle(Difference in standardised mean difference) favours( “Single centre trials” “show larger effect” # “Multicentre trials” “show larger effect”) boxsca(30)

Figure 4.2:
metan des se, random rfdist effect(Difference in Effect Size) label(namevar=ma) xlabel(-1.5,-1,-0.5,0,0.5,1,1.5) xtitle(Difference in effect size) favours( “Trials with” “exclusions” “more beneficial” # “Trials without” “exclusions” “more beneficial”)

Figure 4.3:
metan ror se, random rfdist eform effect(Ratio of Odds Ratio) label(namevar=ma) xlabel(0.1,0.2,0.5,1,2,5,10) xtitle(“Ratio of” “Odds Ratios”) favours( “Trials with unclear or inadequate” “concealment show a more favourable” “effect of the experimental treatment”) boxsca(30)

Figure 4.4:
metan dd se, random rfdist effect(Design Difference) label(namevar=ma) xlabel(-15,-5,-1,0,1,5,15) xtitle(Design Difference) favours( “Stronger effect” “in RCTs” # “Stronger effect in” “observational studies”) boxsca(30)
  • 62. Bibliography

[1] Hemingway P, Brereton N. What is a systematic review? Hayward Medical Communications NPR09/1111, 2009.
[2] Borenstein M, Hedges L.V, Higgins J.P.T, et al. Introduction to Meta-Analysis. Wiley, 2009.
[3] Higgins J.P.T, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. Available from www.cochrane-handbook.org.
[4] Crombie I, Davies H. What is meta-analysis? Hayward Medical Communications NPR09/1112, 2009.
[5] Higgins J.P.T, Thompson S.G, Spiegelhalter D.J. A re-evaluation of random-effects meta-analysis. J R Stat Soc A 2009;172:137-159.
[6] Lin D.Y, Zeng D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 2010;97(2):321-332.
[7] Riley R.D, Higgins J.P.T, Deeks J.J. Interpretation of random effects meta-analysis. BMJ 2011;342:d549.
[8] Ades A.E, Lu G, Higgins J.P.T. The Interpretation of Random-Effects Meta-Analysis in Decision Models. Med Decis Making 2005;25:646-54.
[9] Higgins J.P.T, Thompson S.G. Quantifying heterogeneity in a meta-analysis. Statist Med 2002;21:1539-1558.
[10] Higgins J.P.T. Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified. International Journal of Epidemiology 2008;37:1158-1160.
[11] Thompson S.G, Higgins J.P.T. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559-74.
[12] DerSimonian R, Laird N. Meta-Analysis in Clinical Trials. Controlled Clinical Trials 1986;7:177-188.
  • 63. [13] Guddat C, Grouven U, Skipka G, et al. A note on the graphical presentation of prediction intervals in random-effects meta-analyses. BioMed Central 10.1186/2046-4053-1-34, 2012.
[14] Graham P.L, Moran J.L. Robust meta-analytic conclusions mandate the provision of prediction intervals in meta-analysis summaries. J Clin Epidemiol 2012;65:503-510.
[15] Pignon J, Hill C. Meta-analyses of randomised clinical trials in oncology. Lancet Oncol 2001;2:475-82.
[16] Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: an independent review. Lancet 2012;380:1778-86.
[17] Mees S.M.D, Algra A, Vandertop W.P, et al. Magnesium for aneurysmal subarachnoid haemorrhage (MASH-2): a randomised placebo-controlled trial. Lancet 2012;380:44-49.
[18] Wardlaw J.M, Murray V, Berge E, et al. Recombinant tissue plasminogen activator for acute ischaemic stroke: an updated systematic review and meta-analysis. Lancet 2012;379:2364-72.
[19] Jozwiak M, Rengerink K.O, Benthem M, et al. Foley catheter versus vaginal prostaglandin E2 gel for induction of labour at term (PROBAAT trial): an open label, randomised controlled trial. Lancet 2011;378:2095-103.
[20] Tasina E, Haidich A, Kokkali S, et al. Efficacy and safety of tigecycline for the treatment of infectious diseases: a meta-analysis. Lancet Infect Dis 2011;11:834-44.
[21] Holmes M.V, Newcombe P, Hubacek J, et al. Effect modification by population dietary folate on the association between MTHFR genotype, homocysteine, and stroke risk: a meta-analysis of genetic studies and randomised trials. Lancet 2011;378:584-94.
[22] Sjoquist K.M, Burmeister B.H, Smithers B.M, et al. Survival after neoadjuvant chemotherapy or chemoradiotherapy for resectable oesophageal carcinoma: an updated meta-analysis. Lancet Oncol 2011;12:681-92.
[23] Dondorp A.M, Fanello C, Hendriksen I.C.E, et al.
Artesunate versus quinine in the treatment of severe falciparum malaria in African children (AQUAMAT): an open-label, randomised trial. Lancet 2010;376:1647-57.
[24] Hopfl M, Selig H.F, Nagele P. Chest-compression-only versus standard cardiopulmonary resuscitation: a meta-analysis. Lancet 2010;376:1552-57.
[25] Carotid Stenting Trialists' Collaboration. Short-term outcome after stenting
  • 64. versus endarterectomy for symptomatic carotid stenosis: a preplanned meta-analysis of individual patient data. Lancet 2010;376:1062-73.
[26] NSCLC Meta-analyses Collaborative Group. Adjuvant chemotherapy, with or without postoperative radiotherapy, in operable non-small-cell lung cancer: two meta-analyses of individual patient data. Lancet 2010;375:1267-77.
[27] Norman J.E, Mackenzie F, Owen P, et al. Progesterone for the prevention of preterm birth in twin pregnancy (STOPPIT): a randomised, double-blind, placebo-controlled study and meta-analysis. Lancet 2009;373:2034-40.
[28] Ray K.K, Seshasai S.R.K.S, Wijesuriya S, et al. Effect of intensive control of glucose on cardiovascular outcomes and death in patients with diabetes mellitus: a meta-analysis of randomised controlled trials. Lancet 2009;373:1765-72.
[29] Hofmeijer J, Kappelle L.J, Algra A, et al. Surgical decompression for space-occupying cerebral infarction (the Hemicraniectomy After Middle Cerebral Artery infarction with Life-threatening Edema Trial [HAMLET]): a multicentre, open, randomised trial. Lancet Neurol 2009;8:326-33.
[30] Carr R, Brocklehurst P, Dare C.J, et al. Granulocyte-macrophage colony stimulating factor administered as prophylaxis for reduction of sepsis in extremely preterm, small for gestational age neonates (the PROGRAMS trial): a single-blind, multicentre, randomised controlled trial. Lancet 2009;373:226-33.
[31] Golfinopoulos V, Salanti G, Pavlidis N, et al. Survival and disease-progression benefits with treatment regimens for advanced colorectal cancer: a meta-analysis. Lancet Oncol 2007;8:898-911.
[32] Barker A, Maratos E.C, Edmonds L, et al. Recurrence rates of video-assisted thoracoscopic versus open surgery in the prevention of recurrent pneumothoraces: a systematic review of randomised and non-randomised trials. Lancet 2007;370:329-35.
[33] Catovsky D, Richards S, Matutes E, et al.
Assessment of fludarabine plus cyclophosphamide for patients with chronic lymphocytic leukaemia (the LRF CLL4 Trial): a randomised controlled trial. Lancet 2007;370:230-39.
[34] Phromminitkul A, Haas S.J, Elsik M, et al. Mortality and target haemoglobin concentrations in anaemic patients with chronic kidney disease treated with erythropoietin: a meta-analysis. Lancet 2007;369:381-88.
[35] Sazawal S, Hiremath G, Dhingra U, et al. Efficacy of probiotics in prevention of acute diarrhoea: a meta-analysis of masked, randomised, placebo-controlled trials. Lancet Infect Dis 2006;6:374-82.
[36] The ESPRIT Study Group. Aspirin plus dipyridamole versus aspirin alone after cerebral ischaemia of arterial origin (ESPRIT): randomised controlled trial. Lancet 2006;367:1665-73.
  • 65. [37] Bjelakovic G, Nikolova D, Simonetti R.G, et al. Antioxidant supplements for prevention of gastrointestinal cancers: a systematic review and meta-analysis. Lancet 2004;364:1219-28.
[38] The UK ECT Review Group. Efficacy and safety of electroconvulsive therapy in depressive disorders: a systematic review and meta-analysis. Lancet 2003;361:799-808.
[39] Shepherd J, Blauw G.J, Murphy M, et al. Pravastatin in elderly individuals at risk of vascular disease (PROSPER): a randomised controlled trial. Lancet 2002;360:1623-30.
[40] Mehta S.R, Eikelboom J.W, Yusuf S. Risk of intracranial haemorrhage with bolus versus infusion thrombolytic therapy: a meta-analysis. Lancet 2000;356:449-54.
[41] Gueyffier F, Bulpitt C, Boissel J, et al. Antihypertensive drugs in very old people: a subgroup meta-analysis of randomised controlled trials. Lancet 1999;353:793-96.
[42] Savovic J, Harris R, Wood L, et al. Development of a combined database for meta-epidemiological research. Res Syn Meth 2010;1:212-225.
[43] Zhang W. Abstracts from Invited Speakers 1-01. Osteoarthritis and Cartilage 2010;18(Suppl 2):S1-S8.
[44] Stroup D.F, Thacker S.B, Olson C.M, et al. Characteristics of meta-analyses related to acceptance for publication in a medical journal. J Clin Epidemiol 2001;54:655-660.
[45] Wood L, Egger M, Gluud L.L, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601-5.
[46] Sterne J, Juni P, Schulz K, et al. Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research. Statist Med 2002;21:1513-1524.
[47] Easterbrook P.J, Berlin J.A, Gopalan R, et al. Publication bias in clinical research. Lancet 1991;337:867-872.
[48] Naylor D.
Meta-analysis and the meta-epidemiology of clinical research. BMJ 1997;315:6171. [49] Zhang W, Moskowitz R, Nuki G, et al. ARSI recommendations for the management of hip and knee osteoarthritis, Part I: Critical appraisal of existing 62
  • 66. treatment guidelines and systematic review of current research evidences. Osteoarthritis and Cartilage 2007;15:981-1000. [50] Bafeta A, Dechartres A, Trinquart L, et al. Impact of single centre status on estimates of intervention effects in trials with continuous: meta-epidemiological study BMJ 2012;344:e813. [51] Deschartres A, Boutron I, Trinquart L, et al. Single-centre trials show larger treatment effects than multicentre trials: evidence from a meta-epidemiologic study. Ann Intern Med 20011;155:39-51. [52] Bellomo R, Warrillow S.J, Reade M.C. Why we should be wary of single-centre trials. Crit Care Med 2009;37:31 14-9. [53] Nuesch E, Trelle S, Reichenbach S, et al. The effects of excluding patients from the analysis in randomised controlled trials: meta-epidemiological study BMJ 2009;339:b3244. [54] Tierne J.F, Stewart L.A, et al. Investigating patient exclusion bias in metaanalysis. Int J Epidemiol 2005;34:79-87. [55] Juni P, Altman D.G, Egger M. Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323:42-6. [56] Juni P, Egger M. Commentary: empirical evidence of attrition bias in clinical trials. Int J Epidemiol 2005;34:87-8. [57] Pildal J, Hrobjartsson A, Jorgensen K, et al. Impact of allocation concealment on conclusions drawn from meta-analysis of randomized trials. International Journal of Epidemiology 2007;36:847857. [58] Tzoulaki I, Siontis K, Ioannidis J. Prognostic effect size of cardiovascular biomarkers in datasets from observational studies versus randomised trials: meta-epidemiology study. BMJ 2011;343:d6829. [59] Tzoulaki I, Liberopoulos G, Ioannidis JP. Use of reclassifiation for assessmen of improved prediction beyond the Framingham risk score. Int J Epidemiol 2011;40:1094-105. [60] Vandenbroucke J.P. When are observational studies as credible as randomised trials? Lancet 2004;363:1728-31. [61] Nuesch E, Trelle S, Reichenbach S, et al. 
Small study effects in meta-analyses of osteoarthritis trials: meta-epidemiological study. BMJ 2010;341:c3515. [62] Knapp G, Biggerstaff B, Hartung J. Assessing the amount of heterogeneity in random-effects meta-analysis. Biom J 2006, 48:271285. 63
[63] Brockwell S.E, Gordon I.R. A comparison of statistical methods for meta-analysis. Stat Med 2001; 20: 825–840. [64] Hardy R.J, Thompson S.G. A likelihood approach to meta-analysis with random effects. Stat Med 1996; 15: 619–629.