- 1. The use of Prediction Intervals in Meta-Analysis Nikesh Patel March 28, 2013
- 2. Abstract

Background

Systematic reviews containing meta-analyses of randomised controlled trials provide the best and most reliable information on health care interventions. Meta-analysis combines treatment effects from included studies to produce overall summary results. In a fixed-effect analysis, a common effect is assumed, whereas a random-effects analysis allows for between-study heterogeneity. The goal of analysing heterogeneous studies is not only to report a summary estimate but to explain the observed differences. Whilst a random-effects model remains the gold standard for analysing heterogeneous studies, solely reporting the summary estimate and its 95% confidence interval masks the potential effects of heterogeneity. A 95% prediction interval, which takes into account the full uncertainty surrounding the summary estimate, describes the whole distribution of effects in a random-effects model and the degree of between-study heterogeneity, and conveniently gives a range within which we are 95% sure that the treatment effect in a brand new study lies.

Aims

I aim to apply a 95% prediction interval to a collection of meta-analyses of randomised controlled trials and observe the impact it has on their outcomes. I also aim to apply a 95% prediction interval to meta-epidemiological studies, which assess the influence of trial characteristics on the treatment effect estimates in meta-analyses.

Results

I carried out an empirical review to look at the impact of 95% prediction intervals on existing meta-analyses of randomised controlled trials published in The Lancet. From 26 studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight, interquartile range seven) and reproduced each using a random-effects model with a 95% prediction interval. I found 19 (52.8%) had significant 95% confidence intervals, of which 10 (27.8%) had non-significant 95% prediction intervals and 9 (25%) had significant 95% prediction intervals. Also, 95% prediction intervals were applied to 4 meta-epidemiological studies, revealing extra information concerning the summary estimates.
- 3. Conclusion

Every random-effects meta-analysis should include a 95% prediction interval but, for best performance, the analysis should include a sufficient number of good quality, unbiased randomised controlled trials. To enhance the quality and robustness of meta-epidemiological studies, a 95% prediction interval should be included.
- 4. Contents
  1 Introduction
    1.1 Systematic Review
    1.2 Meta-Analysis
    1.3 Fixed-Effect Meta-Analysis
    1.4 Carrying out a Fixed-Effect Meta-Analysis
    1.5 Heterogeneity
    1.6 Random-Effects Meta-Analysis
    1.7 Carrying out a Random-Effects Meta-Analysis
    1.8 Fixed-Effect v Random-Effects
  2 Prediction Interval
    2.1 95% Prediction Interval
    2.2 Calculating a Prediction Interval
    2.3 Discussion
  3 Empirical review of the impact of using prediction intervals on existing meta-analyses
    3.1 Introduction
    3.2 Methods
      3.2.1 Search Strategy and Selection Criteria
      3.2.2 Data Calculations
      3.2.3 Software
    3.3 Results
    3.4 Discussion
      3.4.1 Principal Findings
      3.4.2 Limitations
      3.4.3 Comparison with other studies
      3.4.4 Final Remarks and Implications
  4 Prediction intervals in Meta-Epidemiological studies
    4.1 Meta-Epidemiological Study
    4.2 Prediction Intervals in Meta-Epidemiological Studies
- 5. Contents (continued)
      4.2.1 Example 1
      4.2.2 Example 2
      4.2.3 Example 3
      4.2.4 Example 4
    4.3 Discussion
  5 Final Discussion and Conclusion
  A STATA Codes
- 6. Chapter 1

Introduction

In health care and medicine, clinicians, researchers and other important figures require accurate, high-quality information to help them make the best possible decisions on health care interventions. Such information is normally found in systematic reviews containing meta-analyses of randomised controlled trials. [1] The aim of this paper is to investigate the use of prediction intervals in meta-analysis, a typical statistical component of a systematic review, and how their application can help interpret meta-analysis results more accurately.

1.1 Systematic Review

Since the 1990s, systematic reviews have become very important in medicine and health care. This is down to the sheer volume of medical literature produced annually and the need for clinicians and other health care officials to have up-to-date, accurate, high-quality information on health care interventions. [1] The objective of a systematic review is to present a balanced and impartial summary of all the available research on a well-defined research question. [1] It uses systematic and explicit methods to identify, assess, select and synthesise all the evidence that is relevant to answering a well-defined research question in an objective and unbiased manner. Systematic reviews have replaced traditional narrative reviews since the latter do not follow a peer-reviewed protocol, do not use rigorous methods and tend to lack transparency, causing bias; a systematic review corrects these issues. [2]

A systematic review begins by clearly defining a research question of interest; this may include what treatments are being compared, what outcomes are being measured, what the population of interest is, etc. The next step is to search for studies that are relevant to the research question. This is done by searching all of the published and unpublished information against a well-defined, quality search criterion, which can
- 7. involve searching databases such as MEDLINE, PubMed etc. The studies which pass the search criterion go through further quality assessment to remove any irrelevant studies. The next step is to extract all the relevant data from the included studies and then carry out a statistical synthesis of the data, which is done using meta-analysis (see section 1.2). The final step is to present all the findings from the analysis, as well as analysing any possible heterogeneity between the studies, commenting on the quality of the studies (e.g. bias) and identifying areas of further research. [1]

Examples of systematic reviews can be found easily on the websites of the British Medical Journal (BMJ), the Cochrane Collaboration and many more. These websites are dedicated to providing information on health care interventions to the health care and medicine industry. A robust methodology for preparing and producing systematic reviews can also be found on these websites, for example the Cochrane Handbook for Systematic Reviews of Interventions. [3]

1.2 Meta-Analysis

“The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” (Gene V. Glass's definition of meta-analysis)

A meta-analysis is a statistical technique whereby results from the studies included in the analysis are combined to produce an overall summary of those studies. In epidemiology, a typical systematic review of randomised controlled trials will use meta-analysis as its statistical component, whereby treatment effects from individual trials are synthesised with the aim of assessing the clinical effectiveness of health care interventions. [4] Meta-analysis is based on one of two models, the fixed-effect model and the random-effects model. In this chapter, I discuss both models and when each type should be used.

It first seems appropriate to address the reasons why we would want to use a meta-analysis and not the traditional narrative approach.
In a narrative approach, the focus tends to be on the p-values of individual studies and on whether there is a significant effect in each study. Since there is no rigorous way of synthesising p-values, the findings from a narrative approach tend to lack transparency and, in many cases, the researchers may only include studies that support their own opinions, which biases the results towards those opinions. [1, 2] A meta-analysis, on the other hand, works directly with the treatment effects of each study and their respective standard errors and performs one single synthesis of all the data to produce an aggregate summary estimate, which I denote as $\hat{\theta}$. [4] Since we are combining all the information across the studies, we reduce uncertainty compared to any individual study, since we are increasing the sample size and, in turn, increasing the power to detect clinically meaningful
- 8. results. [2] A meta-analysis also addresses the consistency of treatment effects across the studies, something a narrative approach fails to do. If the treatment effects are consistent, then the focus is on the summary estimate and on estimating it as accurately as possible. If the treatment effects are not consistent, then not only should we estimate the summary effect but we should also explain the differences that exist between the studies. [2, 5]

Treatment effects are generally much more important to clinicians and other health care officials than p-values. The effect size tells us not only whether the treatment is better or worse (i.e. greater or less than the null value), but also the magnitude of the effect. Also, p-values can be easily misinterpreted, as some researchers may take a non-significant p-value to suggest that the treatment has no effect. [2] I later return to the argument of a narrative approach against a meta-analysis when I consider an example (see Example 1).

A vital requirement for a strong meta-analysis is a well-conducted systematic review. If the underlying systematic review isn't carried out well, the meta-analysis may produce results that lead to misleading conclusions. [1, 2] A meta-analysis should also be carried out well; once again I recommend the Cochrane Handbook for Systematic Reviews of Interventions on how to conduct a good meta-analysis. [3]

1.3 Fixed-Effect Meta-Analysis

The first type of meta-analysis I discuss is the fixed-effect meta-analysis. The fixed-effect model assumes that all the studies included in the analysis are estimating the same underlying treatment effect. In other words, we believe the true treatment effect is common across all the studies and each study is estimating that same true treatment effect. The implication of this model is that any differences observed between the individual treatment effects are down solely to random sampling error (within-study error).
If we had an infinite number of studies, each with an infinitely large sample size, we would expect the within-study error in each study to tend to zero and the individual treatment effects to equal the true common treatment effect. [2]

In the fixed-effect model, we can express the observed treatment effects in the following way,

$$Y_k = \hat{\theta} + \varepsilon_k \tag{1.1}$$

where $Y_k$ is the observed treatment effect in study $k$, $\hat{\theta}$ is the estimate of the summary treatment effect and $\varepsilon_k$ is the random sampling error in study $k$. We can assume that the errors follow a normal distribution with mean 0 and variance equal to the variance of the treatment effect in study $k$, i.e. that $\varepsilon_k \sim N(0, \mathrm{Var}(Y_k))$. Here the
- 9. errors account for the within-study error in each study since, in the fixed-effect model, we assume this is the only source of variation. [2]

For the fixed-effect meta-analysis, the aim is to compute the summary estimate $\hat{\theta}$, which is interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval.

1.4 Carrying out a Fixed-Effect Meta-Analysis

A general approach to meta-analysis is given by the inverse-variance method; this method works for any type of data as long as we can obtain a treatment effect and its standard error. [2] For continuous data, we need a mean difference (or any kind of difference); for survival data, we need a log hazard ratio; and for binary outcomes, we need a log odds ratio or log relative risk, each along with its standard error (for ratios, the standard error on the log scale).

In the fixed-effect model, the weight assigned to each study is one over the variance of the study, hence the term inverse-variance method. Studies with smaller variances are assigned larger weights than studies with larger variances. The fixed-effect inverse-variance weighting is therefore given by

$$W_k = \frac{1}{\mathrm{Var}(Y_k)} \tag{1.2}$$

where $\mathrm{Var}(Y_k)$ is the variance of the observed treatment effect in study $k$.

The formula for $\hat{\theta}$ using a fixed-effect model is given by

$$\hat{\theta} = \frac{\sum_{k=1}^{n} Y_k W_k}{\sum_{k=1}^{n} W_k} \tag{1.3}$$

which has variance given by

$$\mathrm{Var}(\hat{\theta}) = \frac{1}{\sum_{k=1}^{n} W_k}. \tag{1.4}$$

Here $W_k$ is the inverse-variance weight given by (1.2) and $n$ is the number of studies. I note that $\hat{\theta}$ is the maximum likelihood estimate for $\theta$ and it is asymptotically unbiased, efficient and normal. [6] I reiterate that $\hat{\theta}$ should be interpreted as the best estimate of the common treatment effect, since the fixed-effect model assumes that each of the studies in the analysis is estimating the same treatment effect.
- 10. We also calculate a 95% confidence interval to express our uncertainty around our summary estimate $\hat{\theta}$, assuming that $\hat{\theta}$ is approximately normally distributed, using the following formula

$$\hat{\theta} \pm 1.96 \, \mathrm{s.e.}(\hat{\theta}). \tag{1.5}$$

If we are working on the log scale, i.e. we are using some type of ratio, we must remember to exponentiate $\hat{\theta}$ in (1.3) and the end points of the confidence interval in (1.5). I could also present a $100(1 - \alpha)\%$ confidence interval but, by convention, I am only going to calculate 95% confidence intervals in this paper.

Example 1

Table 1.1, presented below, shows the results from ten randomised controlled trials, each comparing the benefit of an antihypertensive treatment, treatment A, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), variance and a 95% confidence interval. [7]

Trial (k)   Y_k     Var_k   95% C.I.
1           -0.49   0.12    [-1.17, 0.19]
2           -0.17   0.05    [-0.61, 0.27]
3           -0.52   0.06    [-1.00, -0.04]
4           -0.48   0.14    [-1.21, 0.25]
5           -0.26   0.06    [-0.74, 0.22]
6           -0.36   0.08    [-0.91, 0.19]
7           -0.47   0.05    [-0.91, 0.03]
8           -0.30   0.02    [-0.58, -0.02]
9           -0.15   0.07    [-0.67, 0.37]
10          -0.28   0.25    [-1.26, 0.70]

Table 1.1: Results of trials comparing treatment A against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

Using a fixed-effect model, we weight each study using (1.2) and then obtain a summary estimate for the treatment effect along with a 95% confidence interval. Using (1.3), we calculated our summary estimate $\hat{\theta}$ to be -0.33, so we expect treatment A to consistently reduce systolic blood pressure by 0.33 mmHg. Our 95% confidence interval calculated using (1.5) is [-0.48, -0.18]. Since the null value of 0 is not in the 95% confidence interval for $\hat{\theta}$, there is strong evidence at the 5% level that treatment A is effective in reducing systolic blood pressure. The results are presented in a forest plot given below in figure 1.1.
- 11. Figure 1.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment A on reducing systolic blood pressure (SMD = standardised mean difference) [7]

On the forest plot, the size of each square represents the weight assigned to the corresponding study, with the centre of the square depicting the observed treatment effect for that study. The 95% confidence interval for each study is represented by the line through the square, beginning and ending at the end points of the interval. The diamond at the bottom of the forest plot represents the 95% confidence interval of the summary estimate, with its centre representing the summary estimate.

I now return to the argument for using meta-analysis over a narrative approach. If we observe the forest plot in figure 1.1, eight trials have a confidence interval that contains the null value 0 and therefore have non-significant p-values. If we took a narrative approach and considered each study separately, we would most likely conclude that, since 80% of the studies produced non-significant p-values, the treatment isn't beneficial. When we perform a meta-analysis, the 95% confidence interval for the summary estimate doesn't contain the null value and therefore we obtain a significant p-value, since we have increased the power to detect significant results. [2]
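The calculation in Example 1 can be checked with a few lines of code. The following is a minimal Python sketch, not the STATA code used in this paper; the function name `fixed_effect` and the variable names are illustrative assumptions, and the data are the effects and variances from Table 1.1.

```python
import math

# Table 1.1: observed mean differences (mmHg) and their variances.
effects = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
variances = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]


def fixed_effect(effects, variances, z=1.96):
    """Inverse-variance fixed-effect pooling (hypothetical helper).

    Follows equations (1.2) to (1.5): weights, summary estimate,
    standard error and a normal-approximation confidence interval.
    """
    weights = [1.0 / v for v in variances]                               # (1.2)
    theta = sum(w * y for w, y in zip(weights, effects)) / sum(weights)  # (1.3)
    se = math.sqrt(1.0 / sum(weights))                                   # (1.4)
    return theta, se, (theta - z * se, theta + z * se)                   # (1.5)


theta_hat, se_hat, ci = fixed_effect(effects, variances)
print(f"summary estimate = {theta_hat:.2f}")        # summary estimate = -0.33
print(f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")       # 95% CI = [-0.48, -0.18]
```

The printed values agree with the summary estimate and confidence interval quoted in Example 1. For ratio measures, the same function would be applied on the log scale and the results exponentiated afterwards.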
- 12. 1.5 Heterogeneity

In the fixed-effect model, we assumed that all the studies in the analysis are estimating the same treatment effect and the only error we allow for is random sampling error (within-study error), but is this always a plausible assumption? In general, studies looking at the same treatment may differ in many ways, such as patient characteristics (age, patient health etc), location of study, intervention applied (dosage etc) and many more known and unknown factors, causing the treatment effects across the studies to no longer remain consistent. [2] If the treatment effects are no longer consistent, then there exist real differences between the studies (between-study heterogeneity) and the aim of a meta-analysis should be to assess the heterogeneity between the treatment effects as well as to calculate a summary estimate. [2, 5, 8] If we used a fixed-effect method in the presence of between-study heterogeneity, we would be wrongly implying that a common effect exists and hence reach misleading conclusions about the treatment.

I now discuss ways in which we can assess heterogeneity. Since heterogeneity is made up of real differences (between-study heterogeneity) and random sampling error (within-study error), we need some tools to help us see if between-study heterogeneity is present. I first introduce the Q-statistic, on which the Q-test is based. This test is useful if we believe the presence of between-study heterogeneity is causing more variation in the treatment effects than is expected from random sampling error alone. [2, 9] The Q-test is defined as follows:

$H_0$: $\theta_1 = \theta_2 = \cdots = \theta_n$ (all $n$ studies share the same true treatment effect)
$H_1$: at least one $\theta_k$ differs,

where $\theta_k$ is the true treatment effect in study $k$. The Q-statistic, which is given by the following formula

$$Q = \sum_{k=1}^{n} W_k Y_k^2 - \frac{\left(\sum_{k=1}^{n} W_k Y_k\right)^2}{\sum_{k=1}^{n} W_k}, \tag{1.6}$$

where $Y_k$ is the observed treatment effect in study $k$ and $W_k$ is the fixed-effect weighting of study $k$, is compared to $\chi^2_{n-1}(\alpha)$, the upper $\alpha$ critical value of a chi-squared distribution with $n - 1$ degrees of freedom. If we find $Q > \chi^2_{n-1}(\alpha)$, then we reject the null hypothesis at the $100\alpha\%$ significance level and this suggests that there is evidence of between-study heterogeneity. If $Q < \chi^2_{n-1}(\alpha)$, then we do not reject the null hypothesis at the $100\alpha\%$ level and this suggests there is no evidence of between-study heterogeneity. [2, 9]

Another useful statistic is the $I^2$-statistic, which measures approximately the percentage of total variation that is down to between-study heterogeneity. [9] It is given by the following formula:
- 13.

$$I^2 = 100\% \times \frac{Q - (n - 1)}{Q} \tag{1.7}$$

where $Q$ is the Q-statistic worked out using (1.6) and $n$ is the number of studies. If our $I^2$ is 0%, this suggests that all the variability in the treatment effects is down to random sampling error (within-study error) and not to between-study variation, and therefore it could make sense to use a fixed-effect model. Higgins et al. consider $I^2$ values of 25%, 50% and 75% to be low, moderate and high respectively. [2, 9] If we obtain a negative value for $I^2$, the value is set to 0 and interpreted in the same way as 0.

I must stress that both the Q-test and the $I^2$-statistic should be used as tools to help us decide what model to use; the decision shouldn't be based solely on the Q-test and the $I^2$-statistic, since they aren't precise. If we consider the Q-test, while a significant p-value suggests that there is variation in the individual treatment effects, a non-significant p-value doesn't necessarily mean a common effect exists. The lack of significance can be the result of a lack of power. If there are few trials, or we have a lot of within-study error as a result of trials having small sample sizes, then even a large amount of between-study heterogeneity may result in a non-significant p-value. [2] If there are few studies, a significance level of 10% is often used because of the lack of power, so a p-value strictly less than 0.1 would be enough to reject the null hypothesis that there is no between-study heterogeneity.

The $I^2$-statistic itself depends on the Q-statistic, so if the Q-test lacks power, then $I^2$ will be imprecise. Also, $I^2$ may tell us what proportion of the variation is down to real differences, but it doesn't tell us how widely the true effects are spread. A high value of $I^2$ implies a high proportion of the variation is down to real differences, but the effects may still lie in a narrow range since the studies may have high precision.
Conversely, a low $I^2$ only implies a low proportion of variation is down to real differences but doesn't imply the effects are grouped together in a narrow range; they could easily vary over a wide range if the studies used lack precision. [2] Higgins, in his paper [10], discusses the misunderstanding of the $I^2$-statistic and believes it should only be used as a descriptive statistic.

Example 1 (Continued)

I now apply both the Q-test and the $I^2$-statistic to example 1 to see if conducting a fixed-effect analysis for that example was appropriate. Conducting a Q-test leads to a Q-statistic of 2.490 using (1.6); this is compared to $\chi^2_9(0.10) = 14.684$ (we use a 10% level of significance, since we only have a few studies). Since our test statistic of
- 14. 2.490 < 14.684, there is no statistical evidence against $H_0$ at the 10% level of significance. This suggests that there is no sign of between-study heterogeneity. I also work out the $I^2$-statistic; here our $I^2$ value is −261.385% using (1.7), which is set to 0, suggesting that the total variation across the studies is only down to within-study error. If we observe the forest plot in figure 1.1, it's fairly clear that the observed treatment effects do not deviate too far from the summary estimate, so using a fixed-effect model seems appropriate and I can regard our summary estimate as the common effect.

If we conclude that between-study heterogeneity is present, we cannot use the fixed-effect model; we instead use the random-effects model, which is discussed in section 1.6. I briefly discuss two alternatives that try to remove between-study heterogeneity, which can be ideal from a researcher's perspective. The first is sub-group analysis. In this case, a series of fixed-effect meta-analyses is performed, one on each sub-group, where the studies in each group are deemed similar enough to assume a common effect. The problems with sub-group analysis are that each sub-group contains fewer studies, so we lose power; that instead of carrying out one synthesis, we are doing several; and that we still aren't guaranteed that a sufficient amount of between-study heterogeneity will be removed. [2] The second option is meta-regression, where the covariates in the model explain the variation in the data and we can obtain the treatment effect for each covariate while adjusting for the others. A problem with this method is that unidentified sources of heterogeneity aren't accounted for. [11] A problem inherent in both alternatives is that, with only a few studies, neither is useful since there is a loss of power, i.e. in the case of meta-regression, we have low power to detect which covariates explain heterogeneity.
[2, 11]

1.6 Random-Effects Meta-Analysis

The second type of meta-analysis I discuss is the random-effects meta-analysis. This model assumes that the individual treatment effects vary across the studies because of the presence of real differences (between-study heterogeneity) as well as random sampling error. A random-effects model assumes that the true effects from the individual studies come from a distribution of true effects with mean $\theta$ and variance equal to the magnitude of the between-study heterogeneity, which I denote as $\tau^2$ and term the between-study variance (we can usually assume a normal distribution). The implication of this model is that, if we had an infinite number of studies, each with an infinitely large sample size, we would expect the random sampling error to tend to zero but the individual treatment effects to still differ because of real differences that exist between them. [2, 5]

In the random-effects model, we can express the observed treatment effects in the following way,
- 15.

$$Y_k = \hat{\theta} + \zeta_k + \varepsilon_k \tag{1.8}$$

where $\hat{\theta}$ is the summary estimate, $\varepsilon_k$ is the sampling error in study $k$ and $\zeta_k$ is the between-study error in study $k$. We again assume that $\varepsilon_k \sim N(0, \mathrm{Var}(Y_k))$ and assume that $\zeta_k \sim N(0, \hat{\tau}^2)$. Here the errors account for the within-study error and the between-study error since, in the random-effects model, we allow for two sources of variation. [2]

Recall that, for the fixed-effect meta-analysis, the aim is to compute the summary estimate $\hat{\theta}$, which is interpreted as the best estimate of the common treatment effect that underlies each of the studies in the analysis, along with a 95% confidence interval. For the random-effects meta-analysis, computing the summary estimate and its 95% confidence interval alone is insufficient. Since we assume there exist real differences between the treatment effects, the aim of a random-effects meta-analysis is not only to compute the summary estimate but also to explain the differences that exist between the trials and learn how the individual treatment effects are distributed about the summary estimate. [2, 5] I note that the summary estimate $\hat{\theta}$ is now interpreted as the average effect.

1.7 Carrying out a Random-Effects Meta-Analysis

To carry out a random-effects meta-analysis, we first need to estimate the between-study variance, since it describes the magnitude of the between-study heterogeneity and has to be incorporated into the calculation of the summary estimate $\hat{\theta}$. To estimate $\tau^2$, we use the DerSimonian and Laird method, which provides a moment-based point estimate for $\tau^2$. [12] This is given by the following formula,

$$\hat{\tau}^2 = \frac{Q - (n - 1)}{\sum_{k=1}^{n} W_k - \dfrac{\sum_{k=1}^{n} W_k^2}{\sum_{k=1}^{n} W_k}} \tag{1.9}$$

where $Q$ is the Q-statistic calculated using (1.6), $n$ is the number of studies and $W_k$ are the weights for each study from the fixed-effect meta-analysis calculated using (1.2). I note that should $Q < (n - 1)$, then we set $\hat{\tau}^2 = 0$.
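The heterogeneity statistics and the DerSimonian and Laird estimate can be computed together, since all three depend on the same fixed-effect weights. The following is a minimal Python sketch under the assumption that (1.6), (1.7) and (1.9) are applied exactly as written; the function name `heterogeneity` is illustrative, and the data are those of Table 1.1 from Example 1.

```python
# Table 1.1 (Example 1): observed mean differences and their variances.
effects = [-0.49, -0.17, -0.52, -0.48, -0.26, -0.36, -0.47, -0.30, -0.15, -0.28]
variances = [0.12, 0.05, 0.06, 0.14, 0.06, 0.08, 0.05, 0.02, 0.07, 0.25]


def heterogeneity(effects, variances):
    """Return (Q, I^2 in percent, DerSimonian-Laird tau^2).

    Hypothetical helper: Q follows (1.6), I^2 follows (1.7) and tau^2
    follows (1.9); negative I^2 and tau^2 values are set to zero.
    """
    n = len(effects)
    w = [1.0 / v for v in variances]                  # fixed-effect weights (1.2)
    sum_w = sum(w)
    sum_wy = sum(wi * y for wi, y in zip(w, effects))
    sum_wy2 = sum(wi * y * y for wi, y in zip(w, effects))
    q = sum_wy2 - sum_wy ** 2 / sum_w                 # (1.6)
    i2 = max(0.0, 100.0 * (q - (n - 1)) / q) if q > 0 else 0.0   # (1.7)
    denom = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (n - 1)) / denom)            # (1.9)
    return q, i2, tau2


q, i2, tau2 = heterogeneity(effects, variances)
CRITICAL_VALUE = 14.684  # chi-squared, 9 d.f., 10% level, as quoted in the text

print(f"Q = {q:.3f}")                                 # Q = 2.490
print(f"reject H0 at 10% level: {q > CRITICAL_VALUE}")  # False
print(f"I^2 = {i2:.1f}%, tau^2 = {tau2:.3f}")         # I^2 = 0.0%, tau^2 = 0.000
```

For Example 1, $Q$ is smaller than $n - 1 = 9$, so both $I^2$ and $\hat{\tau}^2$ are set to zero.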
If our point estimate of the between-study variance is zero (implying no between-study heterogeneity), then the random-effects model reduces to the fixed-effect model.

Similar to the fixed-effect model, we use the inverse-variance method to weight the individual studies. In the fixed-effect model, since we assume each study is estimating the same common effect, the study with the highest precision is given the largest weighting since it will contain the most information about the true summary effect
- 16. $\theta$. In a random-effects model, the weighting has to be given more care since each study is no longer estimating the same treatment effect. [2] The weighting must now take into account the estimate of the between-study variance $\hat{\tau}^2$, so the study with the largest precision doesn't have as much influence as it would if a fixed-effect model was assumed. So, in a random-effects model, the weight given to each study is given by

$$W_k^* = \frac{1}{\mathrm{Var}(Y_k) + \hat{\tau}^2}. \tag{1.10}$$

The formula for $\hat{\theta}$ using a random-effects model is given by

$$\hat{\theta} = \frac{\sum_{k=1}^{n} Y_k W_k^*}{\sum_{k=1}^{n} W_k^*} \tag{1.11}$$

and has variance

$$\mathrm{Var}(\hat{\theta}) = \frac{1}{\sum_{k=1}^{n} W_k^*}. \tag{1.12}$$

I reiterate that $\hat{\theta}$ should be interpreted as the average or mean treatment effect and not the common effect since, by using a random-effects model, I am assuming that the true effects from each of the studies are distributed about the mean of a distribution of true effects and $\hat{\theta}$ is the estimate of this mean. I also note that the true treatment effect in an individual study could be lower or higher than this average effect.

A 95% confidence interval for $\hat{\theta}$ is given by

$$\hat{\theta} \pm 1.96 \, \mathrm{s.e.}(\hat{\theta}). \tag{1.13}$$

Example 2

Table 1.2, presented below, shows the results from ten randomised trials, each comparing the benefit of another antihypertensive treatment, treatment B, against placebo. Each trial is presented with its unbiased estimated mean difference in change in systolic blood pressure (mmHg), variance and a 95% confidence interval. [7]
- 17.

Trial (k)   Y_k     Var_k   95% C.I.
1           0.00    0.423   [-0.829, 0.829]
2           0.10    0.219   [-0.329, 0.529]
3           -0.40   0.026   [-0.451, -0.349]
4           -0.80   0.199   [-1.190, -0.410]
5           -0.63   0.301   [-1.220, -0.040]
6           -0.22   0.301   [-0.370, 0.810]
7           -0.34   0.071   [-0.480, -0.201]
8           -0.51   0.102   [-0.710, -0.310]
9           -0.03   0.122   [-0.209, 0.269]
10          -0.81   0.301   [-1.340, -0.220]

Table 1.2: Results of trials comparing treatment B against placebo (a value < 0 represents a reduction in blood pressure and is therefore beneficial)

I first test for heterogeneity to help us decide what type of meta-analysis we should use. We obtain a Q-statistic of 30.876 > $\chi^2_9(0.10)$ = 14.684 using (1.6), which suggests evidence of heterogeneity at the 10% level of significance. I also obtained an $I^2$ value of 70.85% using (1.7), which suggests that 70.85% of the variation in treatment effects is due to between-study heterogeneity and the rest is due to chance. This is considered a high level of between-study heterogeneity and therefore a random-effects meta-analysis would seem appropriate.

Using the formulas (1.9) through to (1.13), I obtained $\hat{\tau}^2$ to be 0.029 and a summary estimate of -0.33 along with a 95% confidence interval of [-0.48, -0.18]. So, on average, treatment B reduced systolic blood pressure by 0.33 mmHg, but in an individual study the treatment effect can vary from this average. Since the null value of 0 is not in the 95% confidence interval, there is strong evidence at the 5% level that treatment B, on average, is beneficial. A forest plot of the results from the meta-analysis is shown in figure 1.2. We can see that, unlike in figure 1.1, the individual treatment effects clearly deviate from the summary estimate, so it would seem appropriate to assume that each trial is estimating a different treatment effect and to use a random-effects model to account for it.

1.8 Fixed-Effect v Random-Effects

It is imperative that, when conducting a meta-analysis, the right model is chosen since it influences how we interpret the results.
If we look at examples 1 (figure 1.1) and 2 (figure 1.2), both produce the same summary estimate of -0.33 and the same 95% confidence interval of [-0.48,-0.18]. Despite these similarities, the ways in which they are interpreted are very different. In example 1, I used
- 18. Figure 1.2: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure (SMD = standardised mean difference) [7]
a fixed-effect model, which I justified because I believed there were no real differences between the studies, so the summary estimate is the common effect across the studies. In example 2, I decided to use a random-effects model since I believed the variation between the individual treatment effects was down to real differences as well as random sampling error. I therefore regard the summary estimate as the average across the studies, but in an individual study the treatment effect can vary from this average effect. Despite these differences, there still seems to be some misunderstanding when it comes to choosing which model to use and in interpreting the results. Riley et al. [7] reviewed 44 Cochrane reviews that wrongly interpreted $\hat{\theta}$ as the common effect rather than the average effect when using a random-effects approach. They also reviewed 31 Cochrane reviews that used a fixed-effect meta-analysis and found that 26 of these had $I^2$ values of 25% or more without justifying why a fixed-effect model was used. Using a fixed-effect model in these situations must be justified, otherwise we end up drawing inaccurate conclusions from the results since we are suggesting there is
- 19. a single common effect when actually no common treatment effect exists because of real differences amongst the studies. One reason for misinterpretation is that the forest plots for examples 1 and 2 present the results in the same way, which causes confusion. Skipka et al. [13] point this out and also note that the point estimate of $\tau^2$ is never displayed on the forest plot. I have already commented that the choice of model shouldn't be based solely on the Q-test and the $I^2$-statistic, but how should we go about choosing which model to use? Let us say we wish to carry out a meta-analysis on a sufficient number of studies looking at some treatment against placebo. If we know there are a sufficient number of properties that these studies have in common, for example similar age range, similar dosage, similar follow-up time etc., then it would seem appropriate to use a fixed-effect model, since we believe there are negligible real differences between the studies and any factors that do affect the treatment effects are the same across the studies. A common procedure is to carry out a fixed-effect meta-analysis and observe the forest plot to see if the observed treatment effects are similar. [2] There are two problems with this: first, it isn't clear whether the observed differences are only down to random sampling error, and second, if this was the incorrect model, then carrying it out was a waste of time. If we believe there are real differences, then a random-effects model should be implemented. In this model, each study is expected to be estimating a different treatment effect, and the job of this type of meta-analysis is to make sense of the differences between the studies and of how the true individual treatment effects are distributed about the summary estimate. [2,5] A clear advantage of a random-effects meta-analysis is that we can generalise our results to a range of populations not included in the analysis, given that the analysis includes a sufficient number of studies; this may be one of the goals of the underlying systematic review. [2,5] If we wanted to estimate what the treatment effect will be in a new study, we can draw it from our results as long as we can describe, with adequate precision, how the individual treatment effects are distributed about the summary estimate. [5] In a fixed-effect model, we cannot generalise since our results are exclusive to certain properties, for example a particular population. [2]
- 20. Chapter 2 Prediction Interval
In the presence of between-study heterogeneity, the aim of a meta-analysis isn't just to calculate the summary estimate but also to make sense of the heterogeneity. I have already pointed out that eradicating all heterogeneity can be difficult because of unknown sources of heterogeneity, so it would seem better to assess heterogeneity rather than try to remove it. Higgins [10] believes any amount of heterogeneity is acceptable provided there is a "sound predefined eligibility criteria" and that the "data is correct", but stresses that a meta-analysis must provide a stern assessment of heterogeneity. Since a random-effects meta-analysis accounts for unidentified sources of heterogeneity [7], I believe it should be the gold standard for explaining heterogeneous data. Unfortunately, once researchers have carried out a random-effects meta-analysis, they tend to focus on the summary estimate and its 95% confidence interval. This, however, isn't sufficient since, by the assumption of a random-effects model, we allow for real differences between the individual studies. [2,7] If we were using a fixed-effect model, then focusing on the summary estimate, which gives the best estimate of the common effect, and its 95% confidence interval, which describes the impact of within-study heterogeneity on the summary estimate, would be adequate. The random-effects summary estimate tells us the average effect across the studies, and its 95% confidence interval indicates the region in which we are 95% sure that this average effect lies; neither tells us how the individual treatment effects are distributed about the random-effects summary estimate. [5] This leads us to the prediction interval, which is discussed in this chapter.
- 21. 2.1 95% Prediction Interval
A 95% prediction interval gives the range in which we are 95% sure that the potential treatment effect of a brand new individual study lies. The beauty of a prediction interval is that not only does it quantitatively give a range for a treatment effect in a new study, thus allowing researchers, clinicians etc. to apply the results to future applications, but it also offers a suitable way to express the full uncertainty around the summary estimate in a way which acknowledges heterogeneity. A prediction interval can also describe how the true individual treatment effects are distributed about the summary estimate. [2,5,7,13] For these reasons, the inclusion of a prediction interval in a random-effects meta-analysis can make its conclusions more robust and provide a more complete summary of the results, making the results more relevant to clinical practice. [14] The notion of a prediction interval was first proposed by Ades et al. [8], who propose a predictive distribution of a future treatment effect in a brand new study using a Bayesian approach to meta-analysis. A further push for the prediction interval in meta-analysis is seen in a paper by Higgins et al. [5]. The authors acknowledge the small attention that has been given to predictions in meta-analysis and present the prediction interval in a classical (frequentist) framework. Higgins et al. [5,10] believe that a prediction interval is the most convenient way to present the findings of a random-effects meta-analysis in a way that acknowledges heterogeneity, since it takes into account the full distribution of effects in the analysis.
2.2 Calculating a Prediction Interval
When calculating a prediction interval, we not only account for the between-study and within-study heterogeneity, but also for the uncertainty of the summary estimate $\hat{\theta}$ and the uncertainty of the between-study variance $\hat{\tau}^2$. [2] Let us say we knew the true values of the summary effect $\theta$ and the between-study variance $\tau^2$. If we made the assumption that the treatment effects across the studies are normally distributed, the 95% prediction interval would be given by
$$\theta \pm 1.96\sqrt{\tau^2}. \tag{2.1}$$
The problem with (2.1) is that we do not know the exact values of $\theta$ and $\tau^2$; rather, we are estimating them, and because of this there is uncertainty surrounding these estimates. [2] To account for this, we use the following formula provided by Higgins et al. [5] for a 95% prediction interval, which is given by
- 22.
$$\hat{\theta} \pm t_{n-2}^{0.05}\sqrt{\hat{\tau}^2 + \mathrm{Var}(\hat{\theta})}. \tag{2.2}$$
Here, $\hat{\theta}$ is the summary estimate from the random-effects meta-analysis, $\mathrm{Var}(\hat{\theta})$ is the variance of the summary estimate, accounting for the uncertainty of the estimate of $\theta$, $\hat{\tau}^2$ is the estimate of the between-study variance, and $t_{n-2}^{0.05}$ is the two-sided 5% critical value (the 97.5th percentile) of the t-distribution with $n - 2$ degrees of freedom (where $n$ is the number of studies), which accounts for the uncertainty of the estimate of $\tau^2$. [2,5] We require at least three studies to calculate a prediction interval [7], and we must also remember to exponentiate the end points of (2.2) if we are working on the log scale.
Example 2 with a Prediction Interval
In example 2, I used a random-effects model and found the summary estimate to be -0.33mmHg, the between-study variance $\hat{\tau}^2$ to be 0.029 and the 95% confidence interval for the summary estimate to be [-0.48,-0.18] (see figure 1.2). Calculating a prediction interval for example 2 using (2.2), I obtained the interval [-0.76,0.09]. We notice that the null value of 0 is now in the prediction interval, so it isn't statistically significant at the 5% level. So, in a brand new individual study setting, we are 95% sure that the potential treatment effect will be between -0.76mmHg and 0.09mmHg. Although on average the treatment will be beneficial (as indicated by the 95% confidence interval), in a single study setting we cannot rule out that the treatment may actually be harmful (since the 95% prediction interval contains values > 0). The prediction interval therefore acknowledges the impact of heterogeneity that was masked by focusing only on the random-effects summary estimate and its 95% confidence interval. A forest plot for example 2 is given in figure 2.1, which now includes a 95% prediction interval. The prediction interval is drawn through the diamond at the bottom of the forest plot in figure 2.1.
The centre of the diamond represents the random-effects summary estimate, the width of the diamond represents the 95% confidence interval for the summary estimate and the width of the lines going through the diamond represents the 95% prediction interval. Skipka et al. [13] discuss different methods that have been proposed for presenting a prediction interval in a forest plot. They also suggest that the inclusion of a prediction interval in a forest plot is a good way of distinguishing between a random-effects and a fixed-effect forest plot. Throughout this paper, I will present a 95% prediction interval in a forest plot as seen in figure 2.1.
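As a rough check of formula (2.2), the sketch below recomputes the prediction interval for example 2 from the summary values quoted above. It rests on a few assumptions: the function name is hypothetical, the variance of the summary estimate is backed out from the reported 95% confidence interval rather than taken from the original analysis, and the critical value 2.306 (the 97.5th percentile of the t-distribution with 8 degrees of freedom) is hard-coded to avoid depending on a statistics library.

```python
import math

def prediction_interval(theta, var_theta, tau2, t_crit):
    """95% prediction interval from formula (2.2)."""
    half_width = t_crit * math.sqrt(tau2 + var_theta)
    return theta - half_width, theta + half_width

# Summary values reported for example 2 (ten trials).
theta_hat = -0.33
tau2_hat = 0.029
ci_lower, ci_upper = -0.48, -0.18

# Assumption: recover s.e.(theta_hat) from the width of the 95% CI.
se_theta = (ci_upper - ci_lower) / (2 * 1.96)
var_theta = se_theta ** 2

# Assumption: t critical value for n - 2 = 8 degrees of freedom.
T_CRIT_DF8 = 2.306

pi_lower, pi_upper = prediction_interval(theta_hat, var_theta,
                                         tau2_hat, T_CRIT_DF8)
print(f"95% PI = [{pi_lower:.2f}, {pi_upper:.2f}]")
```

Because the inputs are rounded to two or three decimal places, the end points agree with the interval reported above only up to rounding; the qualitative conclusion, that the interval crosses 0, does not depend on it.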
- 23. Figure 2.1: Forest plot of a meta-analysis of randomised controlled trials showing the effects of treatment B on reducing systolic blood pressure with a 95% prediction interval (SMD = standardised mean difference) [7]
2.3 Discussion
It is important that I address a few issues that arise when working with a prediction interval. A problem that affects both the prediction interval and the random-effects meta-analysis arises when the analysis has few studies. If we have few studies, regardless of how large they are, the prediction interval will be wide because of the lack of precision in the DerSimonian and Laird estimate of $\tau^2$ (using (1.9)). [2,5] If our meta-analysis contains few studies and has substantial between-study heterogeneity, a random-effects meta-analysis remains the correct option, but an alternative could be to use a Bayesian approach to estimate $\tau^2$ instead of the DerSimonian and Laird method, which is sensitive to the number of studies in the analysis. A Bayesian approach uses prior information from outside the studies to calculate an estimate of $\tau^2$. This approach has the advantage of naturally allowing for the full uncertainty around all the parameters in the model and of incorporating information that may not be considered in a frequentist model. The approach, however, can be difficult to implement
- 24. and could be prone to bias. I refer to papers by Higgins et al. [5] and Ades et al. [8], which provide a more thorough description of the Bayesian approach to prediction intervals. Another problem that occurs because of having a small number of studies concerns the validity of the assumption, made when calculating a prediction interval, that the population in a new study is "sufficiently similar" to those in the studies already included in the analysis. In a random-effects meta-analysis, since we allow for real differences, each study will be different in many ways; the more studies we have, the broader the range of populations we cover, thus validating this assumption. [5] We also assume that each study has a low risk of bias, i.e. that each study included in the analysis has been carried out under good conduct. If this wasn't the case, the prediction interval would inherit heterogeneity caused by these biases. [7] Finally, it seems meaningful to make absolutely clear the differences between a random-effects 95% confidence interval and a 95% prediction interval. A 95% confidence interval in a random-effects meta-analysis contains the region in which we are 95% sure that our summary estimate (regarded as the average effect) lies. The width of the confidence interval accounts for the error in the summary estimate and, with an infinite number of infinitely large studies, the end points of the confidence interval will tend to the summary estimate. [2] A common mistake is to assume that the 95% confidence interval from a random-effects meta-analysis measures the extent of heterogeneity, but this is wrong since it only reflects the error in the summary estimate. [5] A 95% prediction interval contains the region in which we are 95% sure that the potential treatment effect in a brand new individual study lies.
Another way to describe a 95% prediction interval is that we can draw the potential treatment effect, denoted $y_{new}$, with 95% precision from the prediction interval, since the prediction interval describes how the true individual treatment effects are distributed about the summary estimate. [5] If we had an infinite number of infinitely large studies, we would expect the width of the prediction interval to reflect the actual variation between the true treatment effects. [2] Since the 95% prediction interval accounts for all the uncertainty, it will never be narrower than its corresponding 95% random-effects confidence interval, so we can regard the 95% random-effects confidence interval as a subset of the 95% prediction interval.
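The limiting behaviour and the subset property described above can be read off (1.13) and (2.2) directly. The following is a short sketch, assuming that $\mathrm{Var}(\hat{\theta}) \to 0$ and $\hat{\tau}^2 \to \tau^2$ as the number and size of the studies grow:

```latex
\begin{align*}
\text{CI half-width} &= 1.96\sqrt{\mathrm{Var}(\hat{\theta})}
  \;\longrightarrow\; 0, \\
\text{PI half-width} &= t_{n-2}^{0.05}\sqrt{\hat{\tau}^2 + \mathrm{Var}(\hat{\theta})}
  \;\longrightarrow\; 1.96\sqrt{\tau^2}.
\end{align*}
```

For any finite $n$, the critical value $t_{n-2}^{0.05}$ exceeds 1.96 and $\hat{\tau}^2 \geq 0$, so the half-width in (2.2) is at least the half-width in (1.13). Both intervals are centred on $\hat{\theta}$, which is why the confidence interval sits inside the prediction interval, and the limit of the prediction interval is exactly (2.1).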
- 25. Chapter 3 Empirical review of the impact of using prediction intervals on existing meta-analyses
3.1 Introduction
A random-effects meta-analysis should remain the gold standard for analysing heterogeneous studies, but solely presenting the summary estimate from the random-effects meta-analysis and its 95% confidence interval masks the potential effects of heterogeneity. [7] The addition of a prediction interval gives a more complete summary of the results from a random-effects meta-analysis in a way that acknowledges heterogeneity, making the results easier to apply to clinical practice. [5] A 95% prediction interval, with enough studies, can describe the distribution of true treatment effects and therefore gives a range within which we can be 95% sure that the potential treatment effect in a brand new study, $y_{new}$, lies. [2,5] The aim of this review is to assess the impact of a 95% prediction interval on the outcomes of existing meta-analyses of randomised controlled trials. I want to see if the inclusion of a 95% prediction interval can help interpret the results of a random-effects meta-analysis to a higher degree of accuracy, and therefore recommend whether or not a random-effects meta-analysis should always include a 95% prediction interval.
- 26. 3.2 Methods
3.2.1 Search Strategy and Selection Criteria
To find the studies for the review, I electronically searched for studies on the Lancet website (www.lancet.com). I used the Lancet since it is one of the oldest and most respected medical journals and has vast amounts of medical literature. I used the advanced search toolbar on the Lancet website with the key words "RANDOMISED TRIAL" and "META ANALYSIS" in the abstract of all research, reviews and seminars in all years in all Lancet journals. The search was carried out on 20/12/2011 and produced 61 studies. For each study, I initially obtained a PDF file of the study plus any supplementary material using Sciencedirect via access through the University of Birmingham student portal. The eligibility criterion for entering the review is that each study must include at least one meta-analysis of three or more randomised controlled trials on its primary outcomes as defined by the authors of the study. I reviewed the abstracts of the 61 studies to remove any irrelevant studies. I excluded studies that only contained a meta-analysis of non-randomised controlled trials (e.g. observational studies) since I am only interested in meta-analyses of randomised controlled trials, whereby patients are randomly assigned to the treatment or control group. Randomised controlled trials cancel the effects of known and unknown confounding factors as well as selection bias. [2] I also excluded studies that had a meta-analysis of fewer than three randomised controlled trials, which is seen as the minimum number of trials required to calculate a prediction interval. [7] Where a meta-analysis contained a mixture of randomised and non-randomised controlled trials, I took the meta-analysis of the randomised controlled trials only if the authors had explicitly presented it along with the overall meta-analysis; if they only presented a meta-analysis covering all randomised and non-randomised trials, the study was excluded. I also excluded any studies that didn't display data by trial. Other reasons for exclusion were that some of the studies were only randomised controlled trials and not meta-analyses, some were informative studies or research papers on meta-analysis, and a couple were network meta-analyses, which were removed since they are potentially more subject to error than typical meta-analyses. I also came across duplicate studies, for which I only considered the most recent study. The flow chart in figure 3.1 describes the process. The boxes contain the reasons for excluding studies and the numbers represent the studies that were removed for each reason.
- 27. Figure 3.1: Flow chart describing the process of excluding studies for the review
3.2.2 Data Calculations
A total of 26 studies passed my eligibility criteria to enter the review. From these studies, I extracted 36 meta-analyses containing between three and thirty-four randomised controlled trials. For each meta-analysis, I reproduced the analysis using a random-effects model (using formulas (1.9) to (1.13)) with a 95% prediction interval (using formula (2.2)), and also calculated the $I^2$-statistic (using formula (1.7)). For 20 of the studies, from which 26 meta-analyses were extracted, I could directly calculate the individual trial treatment effects and their variances (the variance of the log effect if the effect size of interest is a ratio). For these, the individual treatment effects are calculated using
- 28. the following formulas depending on the relevant outcome of interest. We define the following:
a = Number of events in the treatment group
b = Number of events in the control group
$N_T$ = Total number of patients in the treatment group
$N_C$ = Total number of patients in the control group
c = $N_T - a$
d = $N_C - b$
Odds Ratio
The odds ratio for trial k is given by [2]
$$Y_k^{OR} = \frac{a \cdot d}{b \cdot c} \tag{3.1}$$
and its logarithm has variance
$$\mathrm{Var}(\ln Y_k^{OR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}. \tag{3.2}$$
A 95% confidence interval for the odds ratio in the k-th trial is given by
$$\exp\left(\ln(Y_k^{OR}) \pm 1.96\sqrt{\mathrm{Var}(\ln Y_k^{OR})}\right). \tag{3.3}$$
Relative Risk
The relative risk for trial k is given by [2]
$$Y_k^{RR} = \frac{a \cdot N_C}{b \cdot N_T} \tag{3.4}$$
and its logarithm has variance
$$\mathrm{Var}(\ln Y_k^{RR}) = \frac{1}{a} + \frac{1}{b} - \frac{1}{N_T} - \frac{1}{N_C}. \tag{3.5}$$
A 95% confidence interval for the relative risk in the k-th trial is given by
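- 29.
$$\exp\left(\ln(Y_k^{RR}) \pm 1.96\sqrt{\mathrm{Var}(\ln Y_k^{RR})}\right). \tag{3.6}$$
Risk Difference
The risk difference for trial k is given by [2]
$$Y_k^{RD} = \frac{a}{N_T} - \frac{b}{N_C} \tag{3.7}$$
and has variance
$$\mathrm{Var}(Y_k^{RD}) = \frac{\frac{a}{N_T}\left(1 - \frac{a}{N_T}\right)}{N_T} + \frac{\frac{b}{N_C}\left(1 - \frac{b}{N_C}\right)}{N_C}. \tag{3.8}$$
A 95% confidence interval for the risk difference in the k-th trial is given by
$$Y_k^{RD} \pm 1.96\sqrt{\mathrm{Var}(Y_k^{RD})}. \tag{3.9}$$
Hazard Ratio
To calculate the hazard ratio for the k-th trial, we require the difference between the observed deaths and expected deaths $(O - E)$ and its variance $\mathrm{Var}(O - E)$. [15] The hazard ratio is given by
$$Y_k^{HR} = \exp\left(\frac{O - E}{\mathrm{Var}(O - E)}\right) \tag{3.10}$$
and its logarithm has variance
$$\mathrm{Var}(\ln Y_k^{HR}) = \frac{1}{\mathrm{Var}(O - E)}. \tag{3.11}$$
A 95% confidence interval for the hazard ratio in the k-th trial is given by
$$\exp\left(\ln(Y_k^{HR}) \pm 1.96\sqrt{\mathrm{Var}(\ln Y_k^{HR})}\right). \tag{3.12}$$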
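Formulas (3.1) to (3.12) can be collected into a few small functions. This is a sketch with hypothetical function names and made-up counts, not the STATA code used for the review. For the ratio measures, the second value returned is the variance of the log effect size, and no continuity correction for zero cells is attempted.

```python
import math

def odds_ratio(a, b, n_t, n_c):
    """Odds ratio (3.1) and the variance of its logarithm (3.2)."""
    c, d = n_t - a, n_c - b
    return (a * d) / (b * c), 1 / a + 1 / b + 1 / c + 1 / d

def relative_risk(a, b, n_t, n_c):
    """Relative risk (3.4) and the variance of its logarithm (3.5)."""
    return (a * n_c) / (b * n_t), 1 / a + 1 / b - 1 / n_t - 1 / n_c

def risk_difference(a, b, n_t, n_c):
    """Risk difference (3.7) and its variance (3.8)."""
    p_t, p_c = a / n_t, b / n_c
    return p_t - p_c, p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c

def hazard_ratio(o_minus_e, var_o_minus_e):
    """Hazard ratio (3.10) and the variance of its logarithm (3.11)."""
    return math.exp(o_minus_e / var_o_minus_e), 1 / var_o_minus_e

def ratio_ci(estimate, log_variance):
    """95% CI for a ratio measure, as in (3.3), (3.6) and (3.12)."""
    half = 1.96 * math.sqrt(log_variance)
    centre = math.log(estimate)
    return math.exp(centre - half), math.exp(centre + half)

def difference_ci(estimate, variance):
    """95% CI for a difference measure, as in (3.9)."""
    half = 1.96 * math.sqrt(variance)
    return estimate - half, estimate + half

# Hypothetical trial: 10/100 events on treatment, 20/100 on control.
or_k, var_log_or = odds_ratio(10, 20, 100, 100)
print("OR:", round(or_k, 3), "95% CI:", ratio_ci(or_k, var_log_or))
```

Returning the estimate together with its (log) variance keeps each function aligned with the pair of formulas it implements, and the variance is what the random-effects weights in (1.10) need.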
- 30. Extra Formulas
For 6 of the studies, from which 10 meta-analyses were extracted, only the individual trial treatment effects along with their 95% confidence intervals were reported. For these studies, I couldn't directly calculate the individual trial standard errors, so the standard errors are estimated using the following formulas. We let $x_-$ and $x_+$ be the lower and upper bounds respectively of the 95% confidence interval for $\theta_k$. For effect sizes that require us to work on the log scale, i.e. odds ratios, relative risks and hazard ratios, the standard error in the k-th trial is calculated using the formula
$$\mathrm{s.e.}\left(\ln Y_k^{HR,RR,OR}\right) = \frac{1}{2} \cdot \frac{\ln(x_+) - \ln(x_-)}{1.96}. \tag{3.13}$$
For differences (continuous outcomes), the standard error in the k-th trial is calculated using the formula
$$\mathrm{s.e.}\left(Y_k^{RD}\right) = \frac{1}{2} \cdot \frac{x_+ - x_-}{1.96}. \tag{3.14}$$
3.2.3 Software
I used the statistical software STATA v10.1 to perform a random-effects meta-analysis with a 95% prediction interval on each meta-analysis included in the study. The software incorporates formulas (1.7), (1.9) to (1.13), (2.2) and any of the relevant formulas from (3.1) to (3.12). All forest plots in this paper were created using STATA (see Appendix for STATA codes).
3.3 Results
From 26 studies, I took 36 meta-analyses containing between three and thirty-four randomised controlled trials (median eight trials, IQ range seven trials) and reproduced each meta-analysis using a random-effects model with a 95% prediction interval. The results of all 36 random-effects meta-analyses with a 95% prediction interval are presented in the table in figure 3.2.
- 31. Figure 3.2: Main characteristics of studies included in the review (Note: Outcome of interest defined as given by authors, HR = Hazard Ratio, OR = Odds Ratio, RD = Risk Difference, RR = Relative Risk, $\hat{\theta}$ is the random-effects summary estimate, 95% C.I. = 95% confidence interval, $I^2$ is the percentage of heterogeneity down to real differences, $\hat{\tau}^2$ is the estimate of between-study variance, 95% P.I. = 95% prediction interval)
- 32. I classified each meta-analysis into one of the following groups:
1. Their 95% confidence and prediction intervals contained their null value
2. Their 95% confidence and prediction intervals excluded their null value
3. Their 95% confidence interval excluded the null value but their 95% prediction interval included the null value
For the first type, I found that 17 (47.2%) of the meta-analyses had their 95% confidence interval contain their respective null values. For these meta-analyses, the 95% prediction interval will also contain the null value since the 95% confidence interval is a subset of the 95% prediction interval. Of these, 6 had only three trials, which is the minimum required to calculate a prediction interval. In fact, 11 of these 17 meta-analyses had fewer than ten trials, which may explain why their 95% confidence intervals contain their null value, since a random-effects meta-analysis will have low power to detect significant results when there are few studies in the analysis. [2] In study ID 15 [30], the meta-analysis contains only three trials (there were originally four trials, but no events occurred in one of them so it was discarded from the analysis), yet there is a substantial amount of between-study heterogeneity, as indicated by the large $I^2$ value of 49.4% (suggesting that almost half of the variation in treatment effects is down to real differences) and $\hat{\tau}^2$ value of 0.3369. The study itself is primarily a randomised controlled trial assessing whether granulocyte-macrophage colony stimulating factor (GM-CSF) administered as prophylaxis to preterm neonates at high risk of neutropenia reduces sepsis, mortality and morbidity. The authors also carried out a meta-analysis of their trial along with two other published randomised controlled trials to see if there is a treatment benefit. Each trial estimated an odds ratio, with an odds ratio < 1 indicating that treatment is beneficial.
The authors used a fixed-effect model, stating "there was no evidence of between-trial heterogeneity", yet the large $\hat{\tau}^2$ and $I^2$ values suggest otherwise, so a random-effects model would be better suited to analyse the data. I obtained a summary estimate of 0.84 (authors obtained 0.94) and a 95% confidence interval of [0.32,2.17] (authors obtained [0.55,1.60]). In both cases, the 95% confidence interval included the null value, so on average there isn't any evidence at the 5% level that the treatment is beneficial. The authors look to subgroup analysis to analyse the data, but a prediction interval can further explain the results in a way that acknowledges heterogeneity. The 95% prediction interval was calculated to be (0,12655.86]. All the results are presented in a forest plot in figure 3.3. The 95% prediction interval is extremely wide in this case. This occurs because we are using the t-distribution, which accounts for the uncertainty in $\hat{\tau}^2$, with few studies, which results in a large value of $t_{n-2}$, as well as accounting for large between-study heterogeneity. When using a random-effects meta-analysis, we make the assumption
- 33. Figure 3.3: Forest plot showing a meta-analysis of randomised controlled trials of GM-CSF for preventing neonatal infections [30]
that each study is estimating a different treatment effect; if we have few studies in the presence of substantial between-study heterogeneity, irrespective of how large they are, we have low power to detect significant results. [2,5] Study ID 17 [32], a meta-analysis of three randomised controlled trials, also has a wide 95% prediction interval, given by (0,91064.69], but unlike study ID 15 [30] has no evidence of between-study heterogeneity, as suggested by $I^2$ and $\hat{\tau}^2$ values of 0. In this case, the wide prediction interval is attributed to the uncertainty in the estimate of $\tau^2$ since there are too few trials. In these cases, a Bayesian approach to calculating $\hat{\tau}^2$ may work better. [5,8] The meta-analyses with more than 10 trials that had both their 95% confidence and prediction intervals contain the null value tended to have narrower 95% confidence intervals and, apart from study ID 3c [18], only slightly include their respective null value. For the second type, 9 (25%) meta-analyses had both their 95% confidence and prediction intervals exclude their respective null value. In these cases, the prediction interval remains significant at the 5% level even after we have considered the whole distribution of effects. Out of these 9 meta-analyses, 7 of these had $I^2$ and $\hat{\tau}^2$ values of 0 (or very close to 0) and 1 other meta-analysis had an $I^2$ value of 6.1% and $\hat{\tau}^2$
- 34. value of 0.0027. For these 8 meta-analyses, the 95% prediction intervals are only slightly wider than the 95% confidence intervals. In the general case where a prediction interval is only slightly wider than the random-effects confidence interval and $I^2$ and $\hat{\tau}^2$ are 0 (suggesting no evidence of between-study heterogeneity), a common effect may be assumed, since the impact of heterogeneity is negligible and the extra width in the prediction interval is attributable only to the uncertainty surrounding the estimate of $\tau^2$ (which is 0 or very close to 0 in these cases). In study ID 11a [26], the authors carried out two meta-analyses of individual patient data to investigate the effect of adjuvant chemotherapy in operable non-small-cell lung cancer. The first meta-analysis observed the effect of surgery and chemotherapy against surgery alone on survival by type of chemotherapy, and the second the effect of surgery, radiotherapy and chemotherapy versus surgery and radiotherapy on survival by type of chemotherapy. Both meta-analyses were extracted for the review, but the first is the one of interest. The analysis included thirty-four randomised controlled trials, each estimating a hazard ratio, where a hazard ratio < 1 indicates better survival with surgery and chemotherapy. I calculated $I^2$ and $\hat{\tau}^2$ values of 6.1% (authors calculated 4%) and 0.0027 respectively, indicating little between-study heterogeneity across the trials despite the trials differing by number of patients, drug used, number of cycles, etc. The authors used a fixed-effect model to analyse the data and used a $\chi^2$ test to investigate any differences in treatment effects across the trials. Using a random-effects meta-analysis, I obtained a summary estimate of 0.86 (authors also obtained 0.86), a 95% confidence interval of [0.80,0.92] (authors obtained [0.81,0.92]) and a 95% prediction interval of [0.75,0.97]; the results are displayed in figure 3.4.
The summary estimate suggests that on average, survival is better with surgery and chemotherapy compared to surgery alone. The 95% confidence interval didn't contain the null value and is entirely < 1, so there is strong evidence that on average, survival is better with surgery and chemotherapy. The authors acknowledge this and state, along with their second meta-analysis, "The results showed a clear benefit of chemotherapy with little heterogeneity", but is this always the case? The 95% prediction interval is also entirely < 1, so now having considered the whole distribution of effects, we can say that surgery and chemotherapy will increase survival when carried out in at least 95% of brand new individual study settings. I point out that the authors' results, using a fixed-effect meta-analysis, were very similar to my results using a random-effects meta-analysis. Furthermore, the 95% prediction interval is only slightly wider than the 95% confidence interval, which indicates that the impact of between-study heterogeneity is small across all the trials and there may be justification for using a fixed-effect model. Despite this, a random-effects model is still useful since it accounts for all uncertainty [5]. We've seen already how a prediction interval can be wide (e.g. study ID 15 [30], study ID 17 [32]) if there is uncertainty in the actual estimates, regardless of whether there is evidence of between-study heterogeneity or not.
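The contrast between a three-trial and a thirty-four-trial prediction interval can be illustrated by applying (2.2) on the log scale to the summary values reported for study ID 15 and study ID 11a. This is a sketch under assumptions: the function name is hypothetical, the standard error of the log summary estimate is recovered from the reported confidence limits as in (3.13), and the t critical values (12.706 for 1 degree of freedom, 2.037 for 32 degrees of freedom) are hard-coded.

```python
import math

def log_scale_prediction_interval(estimate, ci_lower, ci_upper, tau2, t_crit):
    """Prediction interval (2.2) for a ratio measure, computed on the log scale."""
    # Assumption: back out s.e.(ln estimate) from the 95% CI, as in (3.13).
    se_log = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)
    half_width = t_crit * math.sqrt(tau2 + se_log ** 2)
    centre = math.log(estimate)
    # Exponentiate the end points to return to the ratio scale.
    return math.exp(centre - half_width), math.exp(centre + half_width)

# Study ID 15: three trials, so n - 2 = 1 degree of freedom.
pi_15 = log_scale_prediction_interval(0.84, 0.32, 2.17, 0.3369, t_crit=12.706)

# Study ID 11a: thirty-four trials, so n - 2 = 32 degrees of freedom.
pi_11a = log_scale_prediction_interval(0.86, 0.80, 0.92, 0.0027, t_crit=2.037)

print(f"Study ID 15:  [{pi_15[0]:.5f}, {pi_15[1]:.0f}]")
print(f"Study ID 11a: [{pi_11a[0]:.2f}, {pi_11a[1]:.2f}]")
```

Because the inputs are rounded published values, the output only approximates the intervals reported above. The point is the contrast between a critical value of 12.706 with one degree of freedom and a value close to 2 with thirty-two.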
- 35. Figure 3.4: Forest plot showing a meta-analysis of randomised controlled trials assessing the effect of surgery (S) and chemotherapy (CT) versus surgery alone [26]
The one other meta-analysis that is as yet unaccounted for is study ID 3d [18]. The authors are assessing the use of recombinant tissue plasminogen activator (rt-PA) for acute ischaemic stroke. They had updated a previous systematic review by adding a new large randomised controlled trial to the analysis. The review contained four meta-analyses, all of which were extracted for the review, but the meta-analysis of interest (study ID 3d) looks at the effect of rt-PA on symptomatic intracranial haemorrhage (SICH) within 7 days in patients who have suffered an acute ischaemic stroke. The
- 36. analysis included twelve randomised controlled trials, each estimating an odds ratio, where an odds ratio < 1 indicates that rt-Pa reduced the development of SICH. The trials used in this study differed by dosage, final follow-up time, stroke type etc., which resulted in large I² and τ̂² values of 43.4% and 0.2320 respectively. The authors used a standard fixed-effect model and assessed heterogeneity using the χ² statistic to test for the presence of substantial heterogeneity. Given the large values of I² and τ̂², the observed treatment effects and the differences between the trials, a random-effects meta-analysis seems more appropriate. So, using a random-effects meta-analysis, I obtained a summary estimate of 3.93 (authors obtained 3.72), a 95% confidence interval of [3.44,6.35] (authors obtained [2.98,4.64]) and a 95% prediction interval of [1.18,13.10]; the results are displayed in figure 3.5. Figure 3.5: Forest plot showing a meta-analysis of randomised controlled trials assessing the effects of SICH within 7 days (treatment up to 6 hours). 18 The summary estimate suggests that, on average, the odds of developing SICH in the treatment group are 3.93 times the odds of developing SICH in the control group. The 95% confidence interval doesn’t contain the null value and is entirely > 1, so provides
- 37. strong evidence that, on average, the treatment increases the odds of SICH, but it doesn’t indicate whether this will always be the case. The 95% prediction interval is entirely > 1, suggesting that the treatment will increase the odds of SICH in at least 95% of brand new individual settings. Like study ID 11a 26, the 95% prediction interval remains significant, but unlike study ID 11a, the 95% prediction interval in study ID 3d is much wider than its 95% random-effects confidence interval. Here the impact of between-study heterogeneity is large (in study ID 11a, the impact is low); this can also be seen in the large I² and τ̂² values, which result in the large width of the 95% prediction interval. The impact is such that in some cases the odds of SICH when rt-Pa is given could be as low as 1.18 times the odds in the control group but could be as high as 13.1 times the odds in the control group. The authors, by using a fixed-effect method, fail to acknowledge the potential effects of heterogeneity. They report that “42 more patients were alive and independent, 55 more were alive with a favourable outcome at the end of follow up despite an increase in the number of early symptomatic intracranial haemorrhages and early deaths”. Since the odds of SICH in the treatment group could be as high as 13.1 times the odds in the control group, further research could be carried out to identify scenarios in which this may occur, since this could reduce the number of patients who have favourable results at the end of follow up. For the third type, 10 (27.8%) of the meta-analyses had 95% confidence intervals that excluded the null value but 95% prediction intervals that included it.
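The prediction interval reported for study ID 3d can be approximately reproduced from the summary statistics given above. The sketch below is a minimal illustration in Python, not the calculation actually used in this review. It assumes the usual form of the interval on the log odds ratio scale: the summary estimate plus or minus a t quantile (k - 2 degrees of freedom) times the square root of τ̂² plus the squared standard error. The function name, the t quantile taken from tables and the standard error backed out from the reported upper confidence limit are all assumptions made for illustration. Only the upper limit is used because the reported lower limit (3.44) is not symmetric with it about 3.93 on the log scale.

```python
import math
from statistics import NormalDist


def prediction_interval(mu, se, tau2, t_crit):
    """Approximate 95% prediction interval: mu +/- t_crit * sqrt(tau2 + se^2).

    mu and se are on the analysis scale (log scale for odds or risk ratios).
    t_crit is the 0.975 quantile of a t distribution with k - 2 degrees of
    freedom, where k is the number of trials (looked up from tables here).
    """
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mu - half_width, mu + half_width


# Study ID 3d as reported above: summary odds ratio 3.93, tau^2 = 0.2320, k = 12.
# Assumption: the standard error is backed out from the reported upper
# confidence limit (6.35), treating the interval as symmetric on the log scale.
z = NormalDist().inv_cdf(0.975)
log_or = math.log(3.93)
se_log_or = (math.log(6.35) - log_or) / z
T_CRIT_10_DF = 2.228  # tabulated t quantile (0.975) for k - 2 = 10 degrees of freedom

low, high = prediction_interval(log_or, se_log_or, 0.2320, T_CRIT_10_DF)
pi_low, pi_high = math.exp(low), math.exp(high)  # back to the odds ratio scale
print(round(pi_low, 2), round(pi_high, 2))
```

Under these assumptions the interval comes out at roughly 1.18 to 13.1, in line with the values reported for study ID 3d.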
In these cases, the 95% prediction intervals are not significant at the 5% level after we have considered the whole distribution of effects. Most of the studies, apart from two, tended to have a significant amount of between-study heterogeneity, with I² values ranging from 22.3% to 62.7% and τ̂² values ranging from 0.022 to 0.098. Two studies had I² and τ̂² values of 0. These were study ID 9 24, which had 3 trials and justifiably used a fixed-effect method, and study ID 16b 31, which had 9 trials and used a random-effects meta-analysis, but caution is needed since there are few trials, which can result in the summary estimates carrying large uncertainty. In study ID 20 35, the authors look at the efficacy of probiotics in the prevention of acute diarrhoea. They carried out a meta-analysis of thirty-four randomised controlled trials, each estimating a relative risk, with a relative risk < 1 indicating that the probiotic has a beneficial effect. The authors used a random-effects meta-analysis, acknowledging the potential effects of heterogeneity since the studies differed in many respects, such as study setting, age group, follow-up duration, probiotic administered, dosage etc., which resulted in a large I² value of 62.7% and τ̂² value of 0.0980. I obtained identical results to the authors: a summary estimate of 0.65 and a 95% confidence interval of [0.55,0.78]. Additionally, I obtained a 95% prediction interval of [0.34,1.27],
- 38. the results are displayed in figure 3.6. Figure 3.6: Forest plot of a meta-analysis of randomised controlled trials assessing the effects of probiotics on diarrhoeal morbidity. 35 The summary estimate of 0.65 indicates that, on average, the risk of diarrhoeal morbidity in the probiotic group is 0.65 times the risk in the placebo group. The 95% confidence interval is entirely < 1, providing strong evidence that, on average, the probiotics are beneficial, but is this always the case? The authors acknowledge heterogeneity first by using a random-effects model and then by carrying out a subgroup and stratified
- 39. analysis assessing the effect of age, setting of trial, type of diarrhoea, probiotic strains used, formulation of probiotics administered, influence of setting and quality score of trials. A more formal way of acknowledging heterogeneity is to consider a 95% prediction interval, which I calculated to be [0.34,1.27]. This interval contains the null value and values > 1, so although on average the use of probiotics is beneficial, this may not always be the case in a brand new individual setting; in fact, in some cases it may be harmful, and further research is required to identify these scenarios. In study ID 23 38, the authors look at the efficacy and safety of electroconvulsive therapy (ECT) in depressive disorders. They carried out a meta-analysis of twenty-two randomised controlled trials, each estimating a standardised risk difference, where a risk difference > 0 favoured unilateral ECT and a risk difference < 0 favoured bilateral ECT. The authors reported both fixed-effect and random-effects results and acknowledge heterogeneity since the trials differ by dosage, methods of administration etc.; this can be seen in the I² value of 24.00% and τ̂² value of 0.0286. I obtained slightly different results to the authors when using a random-effects meta-analysis: a summary estimate of -0.34 (authors obtained -0.32) and a 95% confidence interval of [-0.49,-0.20] (authors obtained [-0.46,-0.19]). I also obtained a 95% prediction interval of [-0.73,0.04]; the results are displayed in figure 3.7. The summary estimate suggests that, on average, out of 100 patients, 34 more patients had favourable results in the bilateral group compared to the unilateral group. The 95% confidence interval is entirely < 0, providing strong evidence that, on average, the bilateral group is better, but is this always the case?
The authors acknowledge heterogeneity firstly by reporting random-effects results and then by carrying out a meta-regression analysis, but considering a prediction interval would be a more formal way of acknowledging heterogeneity. The 95% prediction interval is [-0.73,0.04], which now contains the null value 0 and slightly exceeds it. This suggests that although on average the bilateral group is better, in a brand new individual study setting the bilateral group may not be better, and further research is required to identify such scenarios. 3.4 Discussion From the 26 studies that entered my review, 36 meta-analyses were extracted and each reproduced using a random-effects model with a 95% prediction interval. My aim was to see whether or not these intervals had a significant impact on the conclusions of these studies. Most of the studies that I found reported a summary estimate (fixed or random-effects) along with a 95% confidence interval and carried out some type of analysis to assess heterogeneity. An observation worth noting is that none of the studies published after 2005 mentioned the idea of prediction in the context of meta-analysis.
- 40. Figure 3.7: Forest plot of a meta-analysis of randomised controlled trials assessing the effect of bilateral versus unilateral electrode placement on depressive symptoms. 38 Papers by Ades et al. 8 and Higgins et al. 5 set the foundations for the use of prediction intervals in traditional and Bayesian meta-analysis, showing how presenting one can describe the extent of heterogeneity and how the true individual treatment effects are distributed about the random-effects summary estimate, as well as giving a range within which the true treatment effect in a brand new individual study setting lies. 2;5 3.4.1 Principal Findings I found that 17 (47.2%) of the 36 meta-analyses had their 95% confidence interval contain the null value. In these cases, the average effect across the trials is not significant at the 5% level and the 95% prediction interval will also include the null value. Presenting a 95% prediction interval in these cases is still useful since it helps describe
- 41. the distribution of effects across the studies, given there is between-study heterogeneity. The other 19 (52.8%) meta-analyses had their 95% confidence interval exclude the null value. In these cases the average effect is significant at the 5% level, and the aim is to see how many of their 95% prediction intervals now include the null value. I found that 9 of the meta-analyses had their 95% prediction interval exclude the null value whilst the other 10 included it. In terms of clinical practice, a prediction interval excluding the null indicates that, in 95% of the brand new study settings in which the treatment is applied, the treatment will be beneficial (or harmful), which is much more useful to clinicians than just reporting the average effect and the uncertainty around it. If the prediction interval includes the null, then although the average effect is beneficial (or harmful), in some brand new individual study settings the effect may be the opposite. Again, this is much more useful to clinicians and researchers since it reveals the impact of heterogeneity and can motivate further research to identify such cases. Another way of discussing the results is to consider the size of heterogeneity across the meta-analyses. I reiterate that describing heterogeneity is a key motivation for a prediction interval. If heterogeneity wasn’t a problem, then we could use a fixed-effect model in all cases, but even the slightest differences between studies must be considered. 2 I found that 12 meta-analyses had no evidence of between-study heterogeneity (I² and τ̂² values of 0); only two of these 20;26 had more than ten trials. In many of these cases, the authors would tend to use a fixed-effect model, but since there are few studies, we have low power to detect heterogeneity and therefore there may be uncertainty around the I² and τ̂² values.
2 A common effect should be assumed if there is no evidence of between-study heterogeneity and the 95% confidence and prediction intervals are close, suggesting that the impact of heterogeneity is negligible and the uncertainty around the parameters is low (e.g. Study ID 11b 26). In some cases, there may seem to be no evidence of heterogeneity, but if there are few studies, the uncertainty around τ̂² can be large, resulting in wide prediction intervals (e.g. Study ID 17 32). The other 24 meta-analyses had evidence of between-study heterogeneity (I² ranging from 0.30% to 62.90% and τ̂² ranging from 0.0001 to 0.3369). Whilst the random-effects model wasn’t always used in these cases, in most of them the authors did carry out some analysis of heterogeneity (e.g. subgroup analysis, meta-regression etc.). The problem is that if there are few trials in the analysis, the power to detect sources of heterogeneity is low and therefore the analysis lacks precision. 2;11 A prediction interval calculated with few studies will be wide (e.g. study ID 15 30) and may not be useful from a clinician’s point of view since the range of effects is so wide. On the other hand, in study ID 3d 18, the 95% prediction interval is wide yet entirely above the null value, so even though there is uncertainty about what the effect could be in an individual study setting, we know that 95% of the time the treatment will have a negative effect (in that case); we just don’t know how bad
- 42. an effect it could be. From a researcher’s point of view, wide prediction intervals can still have meaning since they reveal the uncertainty surrounding the parameters and therefore may just indicate that more trials, further research or other information (e.g. incorporating a Bayesian approach 5;8) are required, whereas a 95% confidence interval only tells us whether the average effect is significant or not, and this result may be imprecise due to the lack of trials. 3.4.2 Limitations It is important that potential limitations of this review are acknowledged. I decided to use only the Lancet database to search for studies since it is regarded as one of the world’s most respected medical journals. I expected each study to be of a high standard in terms of methodology and conduct. Unfortunately, I cannot be sure that this is the case; flaws in procedure at trial level and meta-analysis level can produce error-prone results that may not reflect the true performance of the intervention. 42 In these cases, the prediction interval will be wider since it mixes heterogeneity caused by real differences with heterogeneity resulting from methodological errors. 7 I also only included meta-analyses of randomised controlled trials since such trials cancel the effects of known and unknown confounding factors. I did come across meta-analyses of non-randomised trials (mainly observational studies) but excluded them since they are more influenced by confounders. Whilst randomised controlled trials are held in higher regard than observational studies, the jury remains out on whether we would take randomised trials of low or even average quality over high quality observational studies. Stroup et al. 44 state that “inclusion of sufficient detail to allow a reader to replicate meta-analytic methods was the only characteristic related to acceptance for publication”, suggesting that high quality observational studies could be considered.
I could have extended my search beyond the Lancet to other databases, but I felt the Lancet already covered a wide variety of studies. There are also technical limitations to the review that must be addressed. Whilst there was a criterion that every meta-analysis must have at least three randomised controlled trials, with few studies the assumptions made when calculating a prediction interval may be violated. We assume a normal distribution, but with few studies this may be an inappropriate choice. 5 When considering the true treatment effect of a brand new study, I assume the population in this new study is “sufficiently similar” to those already covered in the analysis. If we have few studies, we fail to cover a sufficient range of populations, resulting in a wider prediction interval accounting for large uncertainty. 2;5 I also wasn’t specific about what types of outcomes were allowed into the review. There is evidence suggesting that certain biases are more likely to arise when subjective outcomes are used (e.g. favourable outcome (Study ID 3d 18), poor outcome (Study ID 2 17) or any outcome that requires human input). 45 It may have been more prudent to consider only outcomes such as survival, mortality or continuous outcomes that have no
- 43. chance of being influenced by an external source. 3.4.3 Comparison with other studies A related study compiled by Graham et al. 14 explored prediction intervals in meta-analysis. They performed a meta-epidemiological study of binary outcomes from meta-analyses published between 2002 and 2010. Their study included 72 meta-analyses from 70 studies, each containing between 3 and 80 studies, and for each they carried out a random-effects meta-analysis incorporating the DerSimonian and Laird 12 method and calculated traditional and Bayesian 95% prediction intervals for odds ratios and risk ratios. They found that 50 out of 72 meta-analyses had their 95% random-effects confidence interval for odds ratios exclude the null value; of these, 18 had their 95% prediction intervals exclude the null. They also found that 46 out of the 72 meta-analyses had their 95% random-effects confidence interval for risk ratios exclude the null value; of these, 19 had their 95% prediction intervals exclude the null. They concluded “meta-analytic conclusions may be appropriately signaled by consideration of initial interval estimates with prediction intervals” but also stress that increasing heterogeneity can result in wide prediction intervals and that caution must be taken when writing conclusions on a meta-analysis. 14 Comparing my results to theirs, I found that proportionally fewer meta-analyses had their 95% prediction interval include the null when their 95% confidence interval had excluded it. Their study was larger than mine and they were also able to directly calculate odds ratios and relative risks for each meta-analysis. I worked out the effect size according to the authors of the studies and in some cases couldn’t directly work out the summary estimate since the relevant data weren’t available; only the individual treatment effects along with their 95% confidence intervals were reported.
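When only an effect estimate and its 95% confidence interval are reported, a standard error can still be recovered under a simple assumption. The sketch below assumes a symmetric Wald-type interval (on the log scale for odds ratios and relative risks) and divides the interval width by twice the normal quantile. The function name and the `ratio_scale` flag are hypothetical, and this is not necessarily the formula used in the review. The two calls use the confidence intervals reported earlier for the probiotics and ECT meta-analyses.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, ratio_scale=False, level=0.95):
    """Back-calculate a standard error from a reported confidence interval.

    Assumes a symmetric Wald-type interval. For odds ratios and relative
    risks (ratio_scale=True) the symmetry is assumed to hold on the log scale.
    """
    if ratio_scale:
        lower, upper = math.log(lower), math.log(upper)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


# Relative risk interval [0.55, 0.78] (probiotics): standard error of log(RR).
se_log_rr = se_from_ci(0.55, 0.78, ratio_scale=True)

# Risk difference interval [-0.49, -0.20] (ECT): standard error on the raw scale.
se_diff = se_from_ci(-0.49, -0.20)

print(round(se_log_rr, 3), round(se_diff, 3))
```

A standard error recovered this way is only as precise as the rounding of the published interval, so small discrepancies from the authors’ own results are to be expected.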
3.4.4 Final Remarks and Implications Perhaps focusing only on cases where prediction intervals include the null when their corresponding 95% confidence intervals didn’t may somewhat deviate from why a prediction interval is useful. Since I was able to apply a 95% prediction interval to all cases, whether the analysis had high between-study heterogeneity or none, and whether it had few or many trials, I was able to describe the results of a random-effects meta-analysis more accurately, since we are considering the whole distribution of effects, even if what I am deducing is that the authors require more trials or further research or information in cases where there are few studies. In the case where there is no evidence of between-study heterogeneity (indicated by I² and τ̂² equal to 0), if we use a random-effects model with a prediction interval and the prediction interval is significantly wider than the random-effects
- 44. confidence interval, then this suggests there is uncertainty amongst the parameters (e.g. lack of power if there are few studies). If the prediction interval is fairly close to the confidence interval, then this suggests a common effect may exist, since we have considered the whole distribution of effects and the impact of heterogeneity is negligible. If there is evidence of between-study heterogeneity, then a prediction interval can reveal its impact, which is useful to clinicians and researchers regardless of whether the average effect is significant. I therefore believe a 95% prediction interval should be presented in every random-effects meta-analysis to enhance the interpretation of its results, but I stress the need for the analysis to have a sufficient number of good quality, unbiased randomised controlled trials.
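The three patterns used throughout this chapter (confidence interval includes the null; confidence interval excludes the null but the prediction interval includes it; both exclude the null) can be written down as a small rule. The following Python sketch is only an illustration: the function name and labels are hypothetical, and interval end points equal to the null value are treated as containing it. It is applied to two of the examples reported above and to the counts from the principal findings.

```python
def classify(ci, pi, null_value):
    """Return which of the three patterns a meta-analysis falls into."""
    def contains(interval, value):
        low, high = interval
        return low <= value <= high  # end points count as containing the value

    if contains(ci, null_value):
        return "CI includes null"
    if contains(pi, null_value):
        return "CI excludes null, PI includes null"
    return "CI and PI exclude null"


# Intervals reported earlier in the chapter. Ratio outcomes have null value 1,
# difference outcomes have null value 0.
probiotics = classify((0.55, 0.78), (0.34, 1.27), null_value=1.0)
ect = classify((-0.49, -0.20), (-0.73, 0.04), null_value=0.0)

# Counts reported in the principal findings (36 meta-analyses in total).
counts = {
    "CI includes null": 17,
    "CI excludes null, PI includes null": 10,
    "CI and PI exclude null": 9,
}
total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(probiotics, ect, shares)
```

Both worked examples fall into the second pattern, and the shares recover the 47.2%, 27.8% and 25% quoted in the findings.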
- 45. Chapter 4 Prediction intervals in Meta-Epidemiological studies It seems widely agreed that systematic reviews which contain a meta-analysis of randomised controlled trials provide the strongest and most reliable evidence of the effects of health care interventions, since they use systematic and explicit methods to summarise all the evidence to answer a research question of interest. 1;42;46 Unfortunately, they are not impervious to bias: if the meta-analysis is biased or includes biased trials, its results will incorporate these biases, resulting in an over- or underestimation of the summary treatment effect, which can lead to misleading conclusions about how well the intervention works. 42;46 In the process of a systematic review, when the relevant trials are searched for, we must make sure that all of the evidence (published and unpublished) is searched for so that we can get the most accurate results. There is evidence that published studies are more likely to report statistically significant results and larger treatment effects; moreover, published studies are more likely to be used in a systematic review and therefore in a meta-analysis, which can lead to a biased summary treatment effect (publication bias). 2;47 Furthermore, randomised controlled trials themselves are in danger of bias if there are imperfections in their methodological properties, e.g. improper allocation concealment, lack of blinding etc. 46 If we were to calculate a prediction interval in the presence of bias, heterogeneity accounting for real differences mixes with heterogeneity caused by these biases, resulting in a much wider prediction interval. 7 Other biases that can arise are citation bias, language bias, cost bias etc. 2 The fundamental idea here is that bias must be assessed to make the conclusions of a meta-analysis more robust; failure to acknowledge it can result in misleading results.
- 46. 4.1 Meta-Epidemiological Study One way in which we can inspect bias is to carry out a meta-epidemiological study, which assesses the influence of trial characteristics on the treatment effect estimates in a meta-analysis. 43;42;46 A meta-epidemiological study will assess a specific trial characteristic by carrying out a meta-analysis on summary effects from a collection of meta-analyses (essentially a ‘meta-analysis of meta-analyses’). 43;42;46 Like a normal meta-analysis, a meta-epidemiological study should describe the distribution of all evidence, describe any heterogeneity between the meta-analyses, inspect associated risk factors, and identify and control bias. Meta-epidemiology first surfaced in an editorial in the BMJ by David Naylor 48 in 1997, where cautions are raised concerning the summary effect of a meta-analysis. The author mentions how meta-analyses can generate “inflated and unduly precise” estimates if biases exist. He also refers to evidence that statistically significant outcomes were more likely to be published than non-significant ones and adds “readers need to examine any meta-analyses critically to see whether researchers have overlooked important sources of clinical heterogeneity among the included trials”. In 2002, meta-epidemiology was defined by Sterne et al. 46 as a statistical method to “identify and quantify the influence of study level characteristics”. In 2007, the method was generalised in a systematic review conducted by OARSI (Osteoarthritis Research Society International). 49 This has resulted in many published meta-epidemiological studies, which can be found on the internet, for example on the BMJ website. These types of studies have provided strong evidence that flaws in trial characteristics lead, on average, to exaggeration of intervention effect estimates and in turn increase heterogeneity.
42 4.2 Prediction Intervals in Meta-Epidemiological Studies The aim of this chapter is to apply a 95% prediction interval to meta-epidemiological studies. Meta-epidemiological studies will use either a fixed-effect or a random-effects model and report a summary estimate with a 95% confidence interval. They still, however, need to describe the extent of heterogeneity that exists across all the evidence, so the inclusion of a prediction interval can help describe it formally. We searched for meta-epidemiological studies on the website of the British Medical Journal (www.bmj.com). We used the advanced search toolbar with the keyword “META EPIDEMIOLOGICAL” in text, abstract and title, in all articles in all years. Any meta-epidemiological study looking at a trial characteristic was eligible as long as we were able to carry out its meta-analysis ourselves. We took 4 studies at random and carried out their meta-epidemiological meta-analysis using a random-effects meta-analysis
- 47. with a 95% prediction interval using the formulas (1.9 to 1.13) and (2.2). In all 4 of the examples we use, we estimated the standard errors using the formulas (3.13) or (3.14), depending on the outcome of interest, since we couldn’t work them out directly. 4.2.1 Example 1 A trial characteristic that can influence the estimates of individual trial treatment effects is the status of the study centre, i.e. whether the trial is carried out in a single centre or in multiple centres. Bafeta et al. 50 carried out a meta-epidemiological study with the aim of comparing estimates of intervention effects between single centre and multicentre randomised controlled trials on continuous outcomes. They address a previous study that concluded that the effects of interventions using binary outcomes are larger in single centre randomised controlled trials compared with multicentre ones 51 and a paper by Bellomo et al. 52, who state that single centre trials often contradict multicentre trials. The authors included 26 meta-analyses with a total of 292 randomised controlled trials (177 in single centres and 115 in multicentres) with continuous outcomes that were published between January 2007 and January 2010 in the Cochrane database for systematic reviews (which they state as having “high methodological quality”). They ignored meta-analyses of non-randomised trials, IPD meta-analyses, meta-analyses where all trials were only single centre or only multicentre, and any meta-analysis that had fewer than 5 randomised controlled trials. They used the risk of bias tool recommended by the Cochrane Collaboration 3 to assess risk of bias from individual reports for each trial. For each meta-analysis, they used a random-effects meta-analysis incorporating the DerSimonian and Laird estimate for τ² to combine treatment effects across the trials and assessed heterogeneity using χ² and I².
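The DerSimonian and Laird approach mentioned above can be sketched as follows. This is the textbook moment estimator written from its usual definition, not code from Bafeta et al. or from this review. The function name and the toy inputs are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects summary using the DerSimonian and Laird moment estimator.

    effects: per-study effect estimates (e.g. log odds ratios, mean differences).
    variances: their within-study variances (squared standard errors).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and the moment estimate of the between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"mu": mu, "se": se, "tau2": tau2, "i2": i2, "q": q}


# Hypothetical toy data (not taken from any study in the review).
result = dersimonian_laird([0.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(result)
```

For these toy inputs Q is 8 on 2 degrees of freedom, so the estimator gives τ̂² = 3 and I² = 75%, and the summary estimate stays at 2 because all studies carry equal weight.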
The authors then estimated a standardised mean difference between single centre and multicentre trials using a random-effects meta-regression to incorporate potential heterogeneity between trials. They then synthesised these using a random-effects model and used I² and the Q test to assess between-meta-analysis heterogeneity. A standardised mean difference < 0 indicates that single centre trials, on average, showed larger treatment effects than multicentre trials. They calculated a summary estimate of -0.09 with a 95% confidence interval of [-0.17,-0.01] and low between-meta-analysis heterogeneity (I² and τ̂² values of 0). We obtained the same random-effects summary estimate of -0.09 and the same 95% confidence interval of [-0.17,-0.01]; additionally, we calculated a 95% prediction interval of [-0.18,0.00]. The results are shown in the forest plot in figure 4.1. The summary estimate (-0.09) indicates that, on average, single centre trials produced a larger estimate of the intervention effect than multicentre trials. Since the 95% confidence interval ([-0.17,-0.01]) is entirely < 0, there is strong evidence that, on average, single centre trials show a larger effect than multicentre trials looking at the
- 48. Figure 4.1: Forest plot of a meta-epidemiological analysis assessing the difference in intervention effect estimates between single centre and multicentre randomised controlled trials. 50 same intervention, but is this always the case? The authors report “on average single centre trials with continuous outcomes showed slightly larger intervention effects than multicentre” and acknowledge between-meta-analysis heterogeneity and risk of bias by using subgroup and sensitivity analysis, but a 95% prediction interval can describe all the uncertainty more formally. The calculated 95% prediction interval ([-0.18,0.00]) now includes the null value 0 but doesn’t exceed it and is only slightly wider than the 95% random-effects confidence interval, revealing that the impact of heterogeneity is low. We can say that, after considering the whole distribution of effects, in at least 95% of cases the effect in a multicentre trial will not be strictly larger than the corresponding effect in a single centre trial, but we cannot rule out that the effects might be the same. We mirror the authors’ view that further research is needed to investigate
- 49. potential causes of these differences. 4.2.2 Example 2 Nuesch et al. 53 carried out a meta-epidemiological study to examine whether or not excluding patients from the analysis of randomised controlled trials is associated with biased estimates of treatment effects and whether or not it causes heterogeneity between trials. They address evidence that departures from protocol and losses to follow-up in randomised controlled trials can lead to exclusion of patients from the final analysis, and that such handling of these patients leads to treatment effects that differ systematically from the true treatment effects. 54;55 Such bias is termed attrition bias 56 or selection bias, and this study aims to see how it affects the summary effects in a meta-analysis and whether it increases between-study heterogeneity. The authors included 14 meta-analyses with a total of 167 trials (39 with all patients in the analysis, 128 where some patients were excluded). Eligible meta-analyses were those of randomised or quasi-randomised trials in patients with osteoarthritis of the knee or hip that reported a non-binary patient-reported outcome (e.g. pain intensity) and compared any intervention with placebo or a non-intervention control. If a meta-analysis included only trials with patient exclusions or only trials without exclusions, it was ignored. Within each meta-analysis, the authors used a random-effects meta-analysis to calculate a summary effect for trials with and trials without exclusions before deriving differences between them. A difference of < 0 suggests trials with exclusions have a more beneficial treatment effect. These differences were then synthesised using a random-effects meta-analysis, which the authors state “fully accounted for variability in bias between meta-analysis”, and they estimated τ² as a measure of between-study heterogeneity.
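The step of deriving a difference between the two subgroup summaries within one meta-analysis can be illustrated with a short sketch. It assumes the two summary effects come from disjoint sets of trials and are therefore treated as independent, so their variances add. This is an illustrative simplification, not necessarily the exact procedure of Nuesch et al., and the function name and input values are hypothetical.

```python
import math


def subgroup_difference(est_with, se_with, est_without, se_without):
    """Difference in summary effects (with minus without exclusions) and its SE.

    Assumes the two subgroup summaries are independent, so the variance of
    the difference is the sum of the two variances.
    """
    diff = est_with - est_without
    se = math.sqrt(se_with ** 2 + se_without ** 2)
    return diff, se


# Hypothetical subgroup summaries within a single meta-analysis.
diff, se_diff = subgroup_difference(est_with=-0.5, se_with=0.1,
                                    est_without=-0.3, se_without=0.2)
print(round(diff, 2), round(se_diff, 3))
```

In this made-up case the difference is negative, which under the convention above would suggest a more beneficial effect in trials with exclusions. The resulting differences and standard errors, one pair per meta-analysis, are what the second-stage random-effects synthesis would combine.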
They obtained a summary estimate of -0.13 with a 95% confidence interval of [-0.29,0.04] and what they consider high between-meta-analysis heterogeneity, indicated by a τ̂² value of 0.07. We obtained the same random-effects summary estimate of -0.13 but a different confidence interval of [-0.31,0.05], having noticed an error in the 3rd meta-analysis in the forest plot presented in the paper. We also obtained an I² value of 78.2% and a slightly larger τ̂² value of 0.0811, as well as a 95% prediction interval of [-0.78,0.52]. The results are shown in the forest plot in figure 4.2. The summary estimate (-0.13) indicates that, on average, trials with exclusions produce a larger estimate of the treatment effect compared to those without exclusions. The 95% confidence interval ([-0.31,0.05]) contains the null value, so the average effect isn’t significant (nor is it under the authors’ 95% confidence interval). However, both our and the authors’ 95% confidence intervals suggest there is evidence (albeit non-significant at the 5% level) that, on average, patient exclusion leads to more beneficial treatment effects. This may have led the authors to report that “excluding patients from the analysis of randomised trials often resulted in biased estimates of treatment effects, but the
- 50. Figure 4.2: Forest plot of a meta-epidemiological analysis assessing the difference in effect sizes between trials with and without exclusions of patients from analysis. 50 extent and direction of bias remained unpredictable in a specific situation” and recommend “results from intention to treat analysis should always be described in reports of randomised trials”. They acknowledge the large between-meta-analysis heterogeneity by carrying out stratified analysis, but a 95% prediction interval can reveal the full uncertainty around the summary estimate. The calculated 95% prediction interval ([-0.78,0.52]) is fairly wide since it accounts for the large between-meta-analysis heterogeneity (indicated by I² and τ̂² values of 78.2% and 0.0811 respectively). I can say that, after considering the whole distribution of effects, although on average it seems as though trials with exclusions lead to a more beneficial treatment effect, analyses where the trials have no patient exclusions could quite easily show a more beneficial treatment effect than those where there are exclusions. Here, the prediction interval makes the impact of heterogeneity much more evident than the 95% confidence interval does and further reveals that, in a brand new situation, whether a trial with exclusions shows a more beneficial effect than a trial without exclusions is unpredictable. Possible reasons for such unpredictability could be down to the fact that the analysis had a combined 39 trials without
