An illustrated guide to the methods of meta-analysis

  • Journal of Evaluation in Clinical Practice, 7, 2, 135–148

    An illustrated guide to the methods of meta-analysis

    Alexander J. Sutton BSc MSc¹, Keith R. Abrams BSc MSc PhD² and David R. Jones BA MSc PhD CStat CMath DipTCDHE³

    ¹ Lecturer in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
    ² Reader in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
    ³ Professor of Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK

    Correspondence: Mr Alex J Sutton, Department of Epidemiology and Public Health, University of Leicester, 22-28 Princess Road West, Leicester LE1 6TP, UK

    Keywords: Bayesian methods, hospital discharge, meta-analysis, methods, re-admission, review

    Accepted for publication: 22 July 2000

    Abstract

    Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine to evaluate a wide spectrum of health care interventions and policies. This paper has three broad aims: (1) to describe the basic principles of meta-analysis, using a meta-analysis of interventions intended to reduce hospital re-admission rates for illustration; (2) to consider threats to the internal validity of meta-analysis, and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods for synthesizing data, with the intention of outlining the directions meta-analysis may take in the future. The methods used to synthesize studies, which take ‘weighted averages’ of effect sizes, have been refined to a high degree, while the methods for dealing with threats to the validity of meta-analyses, such as publication bias and variations in quality of the primary studies, are at a less advanced stage. However, many consider this standard ‘weighted average’ approach to meta-analysis not to be ‘state of the art’ in at least some situations, where the use of more sophisticated methods, generally to explain variation in estimates from different studies and synthesize a broader base of evidence, would be advantageous. Currently, approaches which attempt to do this are mainly still in the experimental stage and, unfortunately, ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made.

    1 Introduction

    Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine, to evaluate a wide spectrum of health-care interventions and policies. The primary aim of many meta-analyses is to produce a more accurate estimate of the effect of a particular intervention, or group of interventions, than is possible using only a single study. Since different studies are carried out using different populations, different designs and a whole range of other study-specific factors, it has been suggested that combining them will produce an estimate that has broader generalizability than any single study. Additionally, it may be possible to explain the differences between results from individual studies by carrying out a meta-analysis. Such an assessment may even provide further insight into the intervention, and develop our understanding of how it works.

    © 2001 Blackwell Science
  • Concurrent with the explosion in the use of meta-analysis is the continued development and refinement of the methods used to carry out such analyses. This is an important endeavour, because the science of meta-analysis is still in its infancy, and in the past over-simplistic methods have led to misleading conclusions (Hunt 1997). A systematic review of methodology for meta-analysis carried out by the authors (Sutton et al. 1998) informed the writing of this paper, and is recommended further reading for more technical details on the material presented here. The reader should note, however, that several important developments which are noted here have been published in the short time since the review was written, confirming the speed with which this field continues to develop.

    This paper has three broad aims: (1) to describe the basic principles of meta-analysis using a worked example; (2) to consider the threats to the validity of meta-analysis and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods, with the intention of outlining the directions meta-analysis may take in the future. The term ‘meta-analysis’ is used to describe different aspects of research synthesis by different people. In some contexts it is used to indicate the whole review process, including aspects such as literature searching and data extraction, as well as the statistical combination of quantitative results. We prefer to use the term ‘systematic review’ to indicate the whole review process, restricting the term ‘meta-analysis’ to describe the synthesis of quantitative data from multiple studies. Although many recent advances in pre-synthesis review methods have been made, such as the development of sophisticated searching methods (Sutton et al. 1998; Dickersin et al. 1994), this paper focuses solely on aspects of quantitative data synthesis, or meta-analysis. [Note: very often a systematic review will include a meta-analysis; however, if no quantitative data are available from the primary reports, or that which is available is deemed too heterogeneous to be meaningfully combined, then only a narrative description of the studies may be carried out (Sutton et al. 1998).] Guidelines for good practice for the pre-synthesis aspects of systematic reviews have been described comprehensively elsewhere (Deeks et al. 1996; Oxman 1996). Very importantly, a recent initiative has produced a checklist addressing the quality of reporting of meta-analyses (QUORUM) (Moher et al. 1999b). This statement is in the same spirit as the CONSORT statement for reporting randomized clinical trials (RCTs) (Begg et al. 1996) and is recommended reading for those preparing reports of meta-analyses of RCTs.

    2 The synthesis of estimates of effectiveness from multiple primary studies

    This section focuses on pooling results from a number of studies investigating the relative effectiveness of an intervention. Often, meta-analyses of this sort include only RCTs, typically with two arms: one arm receiving experimental treatment and the other control, placebo or standard treatment. (The issues of variable quality of studies and the synthesis of studies with different designs are considered in sections 3 and 4, respectively.) Data from a meta-analysis of interventions intended to improve the process of hospital discharge of older people, published elsewhere (Parker et al. 2001), are used to illustrate the methods. Thirty two-arm RCTs are included in the meta-analysis, and the outcome focused on here is the re-admission rate to hospital following discharge. In the remainder of this section the principal ideas involved in performing a meta-analysis are explained and, where possible, the calculations required are reproduced to aid understanding. In practice, the use of computer software greatly facilitates the analyses required. The meta-analysis capabilities of many common statistical analysis packages are limited; however, much specialist software has been developed recently (Sutton et al. 2000b; Sterne et al. 2001).

    Calculation of an effect size for each study

    Broadly speaking, quantitative outcomes from any study can be classified as belonging to one of three data types: (i) binary, e.g. often indicating the presence or absence of the event of interest in each patient; (ii) continuous, where outcome is measured on a continuous scale, e.g. change in blood pressure; or (iii) ordinal, where outcome is measured on an ordered categorical scale, e.g. a disease severity scale, where a patient can be classified as belonging to one of several distinct categories.
  • The approaches used to combine either binary or continuous outcomes are often similar, while ordinal data is somewhat more complex and requires specialist methods, discussed elsewhere (Whitehead & Jones 1994).

    Table 1 provides a sample of the data extracted from reports of 30 RCTs to be included in the meta-analysis (for a list of references for these RCTs see the original report (Parker et al. 2001); the numbers used to identify these RCTs in that report are provided here in the final column of Table 1). Columns three and six provide the number of patients randomized to the experimental and control arms of each study, respectively. [Note: analysis should usually be calculated on the basis of intention to treat (Hollis & Campbell 1999); if the analysis in the original study report was not performed using this method it may still be possible to extract the correct figures for the purposes of the meta-analysis.] Columns four and seven indicate the number of re-admission episodes. Note that an individual can have multiple re-admissions; for example, the new intervention arm of study 8 included 142 patients, while 554 events were reported. [Note: the fact that more than one re-admission is permitted for each patient means that an individual’s outcome is not binary.] Column two indicates the length of follow-up of the studies, which ranges from 1 to 12 months; it is necessary to account for follow-up when calculating effect sizes, since the number of re-admissions may be critically dependent on the length of the observation period of the trial.

    An outcome measure which takes into account length of follow-up is the re-admission rate ratio (RRR). As the name suggests, this is the ratio of the re-admission rates (per month) in the two arms. The re-admission rate (RR) in each arm is calculated by:

    $$\mathrm{RR} = \frac{\text{number of re-admissions}}{\text{number of patients} \times \text{length of follow-up}}$$

    For example, there are two re-admissions in 37 patients over 1.5 months in trial 1, so the RR is 2/(37 × 1.5) = 0.036. [Note: more decimal places are used in the working of the calculations in this paper than are printed.] Similarly, the RR in the control group is 0.162. The outcome of interest can now be calculated by dividing the RR in the treatment arm by the RR in the control arm, 0.036/0.162, which produces an RRR of 0.222. This RRR is less than one, which indicates the re-admission rate is lower in the treatment arm, suggesting that the intervention is beneficial. In this instance the estimated effect is large (a long way from 1). The RRs for each arm are provided in columns 5 and 8, and the RRR in column 9.

    Although the RRR is the measure of interest, due to theoretical statistical considerations (including improved approximate normality), a natural logarithm transformation, ln(RRR), is used for the purpose of combining studies via a meta-analysis (Fleiss 1994). The pooled result can be back-transformed afterwards by taking the exponential of the pooled ln(RRR), $e^{\ln(\mathrm{RRR})}$, to convert the answer back to the RRR scale, allowing easier interpretation. The ln(RRR) estimates for each study are given in column 10 of Table 1.

    A further value, the standard error (SE) of the ln(RRR), is required for the meta-analysis calculation. The SE gives an indication of the degree of precision to which each study estimates the effect size; a small SE indicates a precise estimate, usually from a large study. The SE for the ln(RRR) is calculated by:

    $$\mathrm{SE}(\ln(\mathrm{RRR})) = \sqrt{\frac{1}{\text{number of re-admissions in experimental group}} + \frac{1}{\text{number of re-admissions in control group}}}$$

    Hence, for study 1 the SE(ln(RRR)) is √(1/2 + 1/9) = 0.782. Standard errors for the remaining studies are provided in column 11 of Table 1.

    It is common practice to calculate 95% confidence intervals for each study; these indicate the interval in which the estimate of effect size would be expected to fall 95 times out of every 100 replications of the trial. Hence, a 95% confidence interval provides a range in which one can be reasonably sure the true effect size lies. The formula for calculating a 95% confidence interval for a ln(RRR) is:

    $$\ln(\mathrm{RRR}) \pm 1.96 \times \mathrm{SE}(\ln(\mathrm{RRR}))$$

    For study 1 the ln(RRR) 95% confidence interval is given by -1.504 ± 1.96(0.782) = (-3.04 to 0.03). Confidence intervals for RRR are obtained by taking the exponential of this ln(RRR) interval; hence, the RRR 95% confidence interval for study 1 is (0.05–1.03). This interval includes 1, which indicates that on its own the trial is inconclusive, because both beneficial and harmful effect size estimates are included in the interval and are in some sense plausible. This highlights the need to consider the precision of the estimate; the study estimated a very large treatment effect, but did so very imprecisely; the true effect could be much smaller (or larger) than the point estimate. The 95% confidence intervals for ln(RRR) and RRR for the remaining studies are provided in columns 12 and 13, respectively. To aid examination of the results of the individual studies, these intervals can be plotted on the same axis, as in Fig. 1. The RRR estimate for each study is plotted, with the size of the plotting symbol proportional to the precision of the estimate. The 95% confidence interval for each RRR estimate is also plotted (the more precise estimates having the smaller confidence intervals); other features of this figure will be explained in due course. This plot highlights the variability in the estimates and in the precisions between studies. The issue of variability between estimates from individual studies is considered further in later sections.

    Table 1 Data and calculations for the hospital re-admissions meta-analysis. Columns 3 to 5 refer to the experimental group and columns 6 to 8 to the control group.

    | Study ID (1) | Length of follow-up, months (2) | Exp. patients, n (3) | Exp. re-admissions, n (4) | Exp. re-admission rate (5) | Control patients, n (6) | Control re-admissions, n (7) | Control re-admission rate (8) | Re-admission rate ratio, RRR (9) | ln(RRR) (10) | SE(ln(RRR)) (11) | 95% CI ln(RRR) (12) | 95% CI RRR (13) | Weight (14) | Intervention administration (15) | EPOC quality measure (16) | Number used in original report* (17) |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | 1 | 1.5 | 37 | 2 | 0.036 | 37 | 9 | 0.162 | 0.222 | -1.504 | 0.782 | -3.04 to 0.03 | 0.05 to 1.03 | 1.64 | Single | 5 | 53 |
    | 2 | 3 | 464 | 102 | 0.073 | 439 | 102 | 0.077 | 0.946 | -0.055 | 0.140 | -0.33 to 0.22 | 0.72 to 1.24 | 51.00 | Single | 3 | 59 |
    | 3 | 6 | 499 | 347 | 0.116 | 502 | 340 | 0.113 | 1.027 | 0.026 | 0.076 | -0.12 to 0.18 | 0.88 to 1.19 | 171.73 | Single | 6 | 60 |
    | 4 | 6 | 86 | 36 | 0.070 | 87 | 26 | 0.050 | 1.401 | 0.337 | 0.257 | -0.17 to 0.84 | 0.85 to 2.32 | 15.10 | Single | 4 | 69 |
    | 5 | 12 | 57 | 9 | 0.013 | 56 | 6 | 0.009 | 1.474 | 0.388 | 0.527 | -0.65 to 1.42 | 0.52 to 4.14 | 3.60 | Team | 3 | 82 |
    | 6 | 2 | 39 | 29 | 0.372 | 41 | 35 | 0.427 | 0.871 | -0.138 | 0.251 | -0.63 to 0.35 | 0.53 to 1.42 | 15.86 | Single | 3 | 88 |
    | 7 | 3 | 20 | 3 | 0.050 | 20 | 13 | 0.217 | 0.231 | -1.466 | 0.641 | -2.72 to -0.21 | 0.07 to 0.81 | 2.44 | Single | 6 | 177 |
    | 8 | 3 | 142 | 554 | 1.300 | 140 | 868 | 2.067 | 0.629 | -0.463 | 0.054 | -0.57 to -0.36 | 0.57 to 0.70 | 338.16 | Team | 4 | 187 |
    | 9 | 6 | 695 | 343 | 0.082 | 701 | 310 | 0.074 | 1.116 | 0.110 | 0.078 | -0.04 to 0.26 | 0.96 to 1.30 | 162.83 | Team | 6 | 222 |
    | 10 | 2 | 178 | 43 | 0.121 | 176 | 37 | 0.105 | 1.149 | 0.139 | 0.224 | -0.30 to 0.58 | 0.74 to 1.78 | 19.89 | Single | 2 | 228 |
    | 11 | 6 | 30 | 9 | 0.050 | 30 | 6 | 0.033 | 1.500 | 0.405 | 0.527 | -0.63 to 1.44 | 0.53 to 4.21 | 3.60 | Team | 5 | 231 |
    | 12 | 6 | 96 | 42 | 0.073 | 97 | 62 | 0.107 | 0.684 | -0.379 | 0.200 | -0.77 to 0.01 | 0.46 to 1.01 | 25.04 | Team | 3 | 236 |
    | 13 | 3 | 303 | 104 | 0.114 | 300 | 109 | 0.121 | 0.945 | -0.057 | 0.137 | -0.33 to 0.21 | 0.72 to 1.24 | 53.22 | Team | 4 | 275 |
    | 14 | 6 | 150 | 51 | 0.057 | 99 | 32 | 0.054 | 1.052 | 0.051 | 0.226 | -0.39 to 0.49 | 0.68 to 1.64 | 19.66 | Team | 4 | 283 |
    | 15 | 1 | 20 | 4 | 0.200 | 20 | 6 | 0.300 | 0.667 | -0.405 | 0.645 | -1.67 to 0.86 | 0.19 to 2.36 | 2.40 | Team | 1 | 312 |
    | 16 | 1.5 | 29 | 4 | 0.092 | 25 | 9 | 0.240 | 0.383 | -0.959 | 0.601 | -2.14 to 0.22 | 0.12 to 1.24 | 2.77 | Single | 4 | 334 |
    | 17 | 12 | 333 | 396 | 0.099 | 335 | 410 | 0.102 | 0.972 | -0.029 | 0.070 | -0.17 to 0.11 | 0.85 to 1.12 | 201.44 | Single | 3 | 339 |
    | 18 | 3 | 140 | 18 | 0.043 | 136 | 16 | 0.039 | 1.093 | 0.089 | 0.344 | -0.58 to 0.76 | 0.56 to 2.14 | 8.47 | Single | 4 | 351 |
    | 19 | 9 | 418 | 495 | 0.132 | 417 | 549 | 0.146 | 0.899 | -0.106 | 0.062 | -0.23 to 0.02 | 0.80 to 1.02 | 260.31 | Single | 3 | 397 |
    | 20 | 6 | 62 | 21 | 0.056 | 58 | 35 | 0.101 | 0.561 | -0.578 | 0.276 | -1.12 to -0.04 | 0.33 to 0.96 | 13.13 | Team | 4 | 403 |
    | 21 | 12 | 199 | 107 | 0.045 | 205 | 111 | 0.045 | 0.993 | -0.007 | 0.135 | -0.27 to 0.26 | 0.76 to 1.30 | 54.48 | Team | 3 | 416 |
    | 22 | 12 | 63 | 22 | 0.029 | 60 | 30 | 0.042 | 0.698 | -0.359 | 0.281 | -0.91 to 0.19 | 0.40 to 1.21 | 12.69 | Team | 4 | 691 |
    | 23 | 6 | 35 | 10 | 0.048 | 40 | 51 | 0.213 | 0.224 | -1.496 | 0.346 | -2.17 to -0.82 | 0.11 to 0.44 | 8.36 | Single | 4 | 1793 |
    | 24 | 6 | 102 | 49 | 0.080 | 102 | 51 | 0.083 | 0.961 | -0.040 | 0.200 | -0.43 to 0.35 | 0.65 to 1.42 | 24.99 | Single | 7 | 1796 |
    | 25 | 6 | 140 | 24 | 0.029 | 97 | 29 | 0.050 | 0.573 | -0.556 | 0.276 | -1.10 to -0.02 | 0.33 to 0.98 | 13.13 | Team | 3.5 | 2211 |
    | 26 | 3 | 45 | 5 | 0.037 | 46 | 5 | 0.036 | 1.022 | 0.022 | 0.632 | -1.22 to 1.26 | 0.30 to 3.53 | 2.50 | Team | 6 | 2229 |
    | 27 | 4 | 49 | 11 | 0.056 | 51 | 7 | 0.034 | 1.636 | 0.492 | 0.483 | -0.46 to 1.44 | 0.63 to 4.22 | 4.28 | Single | 3.5 | 2657 |
    | 28 | 6 | 177 | 49 | 0.046 | 186 | 107 | 0.096 | 0.481 | -0.731 | 0.172 | -1.07 to -0.39 | 0.34 to 0.67 | 33.61 | Single | 3 | 3632 |
    | 29 | 3 | 381 | 154 | 0.135 | 381 | 197 | 0.172 | 0.782 | -0.246 | 0.108 | -0.46 to -0.04 | 0.63 to 0.97 | 86.43 | Team | 4 | 3636 |
    | 30 | 2 | 96 | 22 | 0.019 | 110 | 43 | 0.033 | 0.586 | -0.534 | 0.262 | -1.05 to -0.02 | 0.35 to 0.98 | 14.55 | Single | 6 | 4460 |

    *Parker et al. 2000. n = number.

    Figure 1 Forest plot of 30 RCTs examining the effect on re-admission rates of interventions aimed at modifying the hospital discharge process for elderly people.

    Combining effect sizes – calculating weighted averages

    The previous section illustrated how an RRR estimate and corresponding standard error could be calculated from summary data extracted from individual study reports. In other instances different effect measures may be more appropriate, but the general principle that an estimate and SE are required from each study remains. When outcomes are reported on a binary scale, the odds ratio, risk ratio or risk difference measures are commonly used, while outcomes measured on a continuous scale can be combined directly, or standardized if different scales of measurement have been used in the individual studies. Descriptions and formulae for each of these outcome measures and others are available elsewhere (Fleiss 1993; Sutton et al. 2000c).

    The simplest way to combine estimates is to average them. Since different studies estimate the true effect size with varying degrees of precision, a weighted average is used. The weight given to each study in the re-admissions meta-analysis is calculated by:

    $$\text{weight} = \frac{1}{\mathrm{SE}(\ln(\mathrm{RRR}))^{2}}$$
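The per-study calculations and the weighting described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software: the function names and the `STUDIES` layout are assumptions made for the example, and the counts are transcribed from columns 2, 3, 4, 6 and 7 of Table 1. The last step takes the inverse-variance weighted average of the ln(RRR) values and uses the reciprocal of the summed weights as its variance, which is the usual companion to this weighting.

```python
import math

# One tuple per study, in Table 1 order (study 1 to 30):
# (follow-up in months, experimental patients, experimental re-admissions,
#  control patients, control re-admissions)
STUDIES = [
    (1.5, 37, 2, 37, 9), (3, 464, 102, 439, 102), (6, 499, 347, 502, 340),
    (6, 86, 36, 87, 26), (12, 57, 9, 56, 6), (2, 39, 29, 41, 35),
    (3, 20, 3, 20, 13), (3, 142, 554, 140, 868), (6, 695, 343, 701, 310),
    (2, 178, 43, 176, 37), (6, 30, 9, 30, 6), (6, 96, 42, 97, 62),
    (3, 303, 104, 300, 109), (6, 150, 51, 99, 32), (1, 20, 4, 20, 6),
    (1.5, 29, 4, 25, 9), (12, 333, 396, 335, 410), (3, 140, 18, 136, 16),
    (9, 418, 495, 417, 549), (6, 62, 21, 58, 35), (12, 199, 107, 205, 111),
    (12, 63, 22, 60, 30), (6, 35, 10, 40, 51), (6, 102, 49, 102, 51),
    (6, 140, 24, 97, 29), (3, 45, 5, 46, 5), (4, 49, 11, 51, 7),
    (6, 177, 49, 186, 107), (3, 381, 154, 381, 197), (2, 96, 22, 110, 43),
]


def re_admission_rate(events, patients, months):
    """Re-admissions per patient-month in one arm (the RR of the text)."""
    return events / (patients * months)


def study_summary(months, n_exp, events_exp, n_ctl, events_ctl):
    """Return (ln(RRR), SE of ln(RRR), inverse-variance weight) for one study."""
    # Follow-up is the same in both arms of a study, so it cancels in the ratio.
    rrr = (re_admission_rate(events_exp, n_exp, months)
           / re_admission_rate(events_ctl, n_ctl, months))
    se = math.sqrt(1 / events_exp + 1 / events_ctl)
    return math.log(rrr), se, 1 / se ** 2


def interval_95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 1.96 * SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se


def weighted_average(summaries):
    """Inverse-variance weighted mean of ln(RRR) and its standard error."""
    total_weight = sum(weight for _, _, weight in summaries)
    pooled = sum(weight * log_rrr for log_rrr, _, weight in summaries) / total_weight
    return pooled, math.sqrt(1 / total_weight)


summaries = [study_summary(*study) for study in STUDIES]
log_rrr_1, se_1, weight_1 = summaries[0]
ci_1 = interval_95(log_rrr_1, se_1)
pooled_log_rrr, pooled_se = weighted_average(summaries)
pooled_ci = tuple(math.exp(bound) for bound in interval_95(pooled_log_rrr, pooled_se))

print(f"study 1: ln(RRR)={log_rrr_1:.3f}, SE={se_1:.3f}, weight={weight_1:.2f}")
print(f"study 1: 95% CI for RRR = ({math.exp(ci_1[0]):.2f}, {math.exp(ci_1[1]):.2f})")
print(f"pooled RRR = {math.exp(pooled_log_rrr):.2f}, "
      f"95% CI = ({pooled_ci[0]:.2f}, {pooled_ci[1]:.2f})")
```

Run as a script, the study 1 line should agree with the worked figures above (ln(RRR) of -1.504, SE of 0.782). Working from the raw counts rather than the rounded columns mirrors the paper's note that more decimal places are used in the calculations than are printed.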
  • A.J. Sutton et al.The square of the standard error is often known as treatment effect. Many people feel that in medicalthe variance, so combining studies using this weight- and related research such an assumption is unrealis-ing is often called the inverse variance-weighted tic (Thompson 1993) because studies are never iden-method (Fleiss 1993). The weightings for each study tical replications of one another, and study designare provided in Table 1, column 14. If an effect and conduct differences will inevitably have somemeasure other than the RRR is being used, then the degree of influence on study outcome. Models whichweightings are calculated by the same principle, using account for underlying variability in the treatmentthe inverse of the variance of that effect measure. effect estimates are considered in the next section. Once weight for each study has been calculated, apooled estimate of ln(RRR) is calculated by multi- Heterogeneity and random effect modelsplying each study’s weight by its ln(RRR) andsumming the resulting values, and then dividing this When performing a meta-analysis, although thevalue by the sum of the weights. Using figures from overall aim may be to produce an overall pooled esti-Table 1, the outline calculation for the re-admissions mate of treatment effect, it is crucial to assess thedata is: variation between results of the primary studies and, if possible, to investigate why they differ. Clearly, it ln( pooled RRR) would be remarkable if all studies being meta- [(1.64 ¥ (-1.504)) + . . . + (14.55 ¥ (-1.504))] analysed produced exactly the same treatment effect = (1.654 + . . . + 14.55) estimate. Some variation in results is expected, due = -0.164 simply to the play of chance; this is often called random variation. 
However, if effect size estimatesThe variance for ln(pooled RRR) (or any other vary between studies to a greater extent thaneffect measure used) is then calculated by taking the expected on the basis of chance alone the studies arereciprocal of the sum of the weights (1/sum of considered to be heterogeneous, and it is necessaryweights): to account for the extra variation, above that ex- var ( pooled RRR) = 1 (1.64 + . . . + 14.55) pected by chance, in the meta-analysis model. The = 0.0006 way this is usually performed is through the use of a random-effect model. Essentially, this relaxes theUsing these figures, an approximate 95% confidence assumption that each study is estimating exactly theinterval for the pooled estimate can be calculated same underlying treatment effect, and insteadin the same manner as confidence intervals were assumes that the underlying effect sizes are drawnproduced for the individual study estimates above. from a distribution of effect sizes. This distribution isThe pooled estimate of RRR for the re-admissions usually assumed to be Normal, with a variance deter-dataset is 0.85 with 95% CI (0.81–0.89), indicating a mined by the data. In practical terms, accounting formodest, statistically significant treatment benefit at between study heterogeneity in this way produces athe 5% level.This estimate is plotted using a diamond pooled point estimate which is often (but not always)shape in Fig. 1 directly below the 30 individual similar to the one produced by fixed-effect methods.studies. Figure 1 is often called a forest plot and is However, taking into account between study hetero-commonly used to display the results of a meta- geneity produces a wider 95% confidence interval, soanalysis. the estimate is more conservative. This approach is often known as a fixed-effect The whole issue of appropriateness and suitabilityapproach, to distinguish it from the random-effect of fixed- and random-effect models for meta-analysismodels described below. 
It can be used to combine has been much discussed (Thompson 1993; Petooutcomes on any scale; however, other related fixed- 1987). A test for heterogeneity exists (Fleiss 1993),effect methods specifically for combining odds ratios and the result of this test can then be used to informalso exist (Fleiss 1993; Sutton et al. 2000c). These model choice. If it is non-significant a fixed-effectfixed-effect methods all make the strong assumption model is to be used, and if it is significant a random-that each study is estimating the same underlying effect model should be used. This seemingly sensible140 © 2001 Blackwell Science, Journal of Evaluation in Clinical Practice, 7, 2, 135–148
  • Meta-analysis methodsapproach has a flaw because the test has low power. desirable than using random-effect models to allowThis implies that heterogeneity may exist even when for heterogeneity is to try to explain the heterogene-it produces a non-significant result (Boissel et al. ity. This may lead to the identification of associations1989). An alternative approach is to always use a between study or patient characteristics and therandom-effect model. The inflation of the confidence outcome measure, which would not have been pos-interval is dictated by the degree of variation sible in single studies. This may lead in turn to clini-between studies, so when between-study variation is cally important findings and may eventually assist insmall the inflation will be negligible, producing a individualizing treatment regimes (Lau et al. 1998).result which would be very similar to the fixed-effect Both subgroup analyses and regression methods canapproach. be used to do this. A detailed description of the random-effect meta- Potential study level factors, pertaining to eitheranalysis model is beyond the scope of this paper, but study design or patient characteristics which couldclear accounts are given elsewhere (DerSimonian & affect study results should ideally be identified beforeLaird 1986; Shadish & Haddock 1994). Combining a meta-analysis is conducted. If this is carried out,the 30 studies evaluating interventions to prevent re- data on these factors can then be obtained at the dataadmission using a random-effect model produces a extraction stage of a review, and such explicit a prioriRRR of 0.83 (0.73–0.93). This estimate is plotted specification also reduces the temptation of ‘databelow the fixed-effect one in Fig. 1. 
The estimate of dredging’.the between-study variance is 0.057, which is quite Returning to the re-admission dataset, one poten-small but non-negligible (the test for between- tial factor which could affect results is whether thestudy heterogeneity is highly significant (P < 0.001)). intervention was administered by a team or anAccounting for this heterogeneity has produced a individual. This information is given for each studywider confidence interval compared to the fixed- in column 15 of Table 1. In 16 of the studies theeffect approach, which is a typical finding. Modifica- intervention was administered by an individualtions to the way the parameters in a random-effect and in 14 it was administered by a team. Separatemeta-analysis model are calculated have been devel- meta-analyses can be performed for these two sub-oped (Hardy & Thompson 1996; Biggerstaff & groups in an attempt to see if the effectiveness ofTweedie 1997). One of these should be used if the the intervention depends on whether an individualnumber of studies in the meta-analysis is small or team implements it, and whether between study(approximately less than 10) as it overcomes prob- heterogeneity is reduced in the subgroups. Pooledlems with a previous simplification in the model cal- estimates for these subgroups turn out to be almostculations, which can be important in meta-analyses of identical. The intervention administered by indi-small numbers of studies. vidual subgroup has a RRR of 0.83 (0.70–0.97) and A final point concerning between study hetero- the estimate of the between-study heterogeneitygeneity is that there is little explicit guidance to offer of 0.056 (test for heterogeneity highly significant atregarding the point at which studies estimates should P < 0.001). 
For the studies where the interventionnot be pooled at all because heterogeneity is deemed was administered by a team the RRR was 0.83too great, but alternative approaches are discussed (0.69–0.99) and the estimate of between-studybelow. heterogeneity 0.062 (test for heterogeneity highly significant at P < 0.001). Hence, it would appear that whether the intervention is administered by an individual or a team makes very little difference toExploring and explaining heterogeneity the effectiveness of the intervention and, hence, doesUntil now, the impression has been given that het- not explain any of the variation between studyerogeneity is a nuisance factor which needs account- results.ing for when performing a meta-analysis. However, If the factor of interest is measured on a continu-investigating why between-study variation exists ous scale, or dummy indicator variables are createdoffers the meta-analyst unique opportunities. More for the levels of categorical factors, then meta-© 2001 Blackwell Science, Journal of Evaluation in Clinical Practice, 7, 2, 135–148 141
regression can be used to explore their impact. Meta-regression models are very similar in principle to ordinary simple linear regression models, the main differences being that individual observations (the primary studies), unlike individual patients, are not given equal weight in the analysis (i.e. each study should be weighted according to its precision). Additionally, it may be desirable to include a random-effect term to account for residual heterogeneity not explained by the covariate(s); such a model can be thought of as an extension to the random-effect model described above (Berkey et al. 1995). An example of a meta-regression analysis is given in section 3.

Meta-regression techniques are currently used relatively rarely, and the authors believe not to their full potential, but examples are emerging (Freemantle et al. 1999; von Dadelszen et al. 2000). Although a powerful tool, they do have their limitations. Regression analyses of this type are also susceptible to aggregation bias, which occurs if the relation between study means of patient characteristics and outcomes does not directly reflect the relation between individuals' values and individuals' outcomes (Greenland 1987). Additionally, meta-regression type analyses are often limited by the number of studies included in the meta-analysis. Special regression models have also been developed to explore the effect of patients' underlying risk on intervention effect (Senn et al. 1996; Walter 1997) which are necessary to avoid producing incorrect results when exploring the effect of such a factor (Schmid et al. 1998; Senn et al. 1996).

3 Threats to the validity of a meta-analysis

Although meta-analyses are often considered to provide the highest grade of evidence available regarding the effectiveness of an intervention, higher than an individual trial, it should not be forgotten that they are a type of observational study, and as such are open to biases which may threaten their validity. Perhaps the two most serious problems which can potentially lead to biased estimates are publication bias and variable study quality of the primary studies. These two issues are considered further below.

Publication and related biases

Publication bias exists because research with statistically significant or interesting results is potentially more likely to be submitted, published or published more rapidly than work with null or non-significant results (Song et al. 2000). When only the published literature is included in a meta-analysis, this can potentially lead to biased, over-optimistic conclusions. Related biases which can also bias the results of a meta-analysis include (i) pipeline bias, when significant results are published quicker than non-significant ones; and (ii) language bias, when researchers whose native tongue is not English are more likely to publish their non-significant results in non-English-language journals, but are more likely to publish their significant results in English. If this happens, a meta-analysis including only study reports in English may be based on a biased collection of studies. Perhaps an appropriate term which includes all these sources of bias is 'dissemination bias' (Song et al. 2000).

Long-term initiatives to alleviate the problem of publication bias have commenced, including trial amnesties (Horton 1997) to encourage publication of previously unpublished trials, and the creation of registries for prospective registration of trials (Horton & Smith 1999). However, the issue is currently still a big concern for researchers carrying out meta-analyses. There are certain measures which can be taken to assess the presence and minimize the impact of publication bias in a meta-analysis dataset. Currently, however, there is much debate, and some dispute, as to the approach researchers should take to deal with publication bias in meta-analyses.

The presence of publication bias in a meta-analysis dataset can be assessed informally by inspection of a funnel plot (Light & Pillemer 1984). This plots the effect size for each study against some measure of its precision, e.g. 1/standard error of the effect size. The resulting plot should be shaped like a funnel if no publication bias is present. This shape is expected because trials of decreasing size have increasingly large variation in their effect size estimates due to random variation becoming increasingly influential. However, if the chance of publication is greater for larger trials or trials with statistically significant results, some small non-significant studies may not appear in the literature.
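The funnel-plot reasoning above can be illustrated with a small simulation. The sketch below is not the authors' analysis and does not use the re-admissions data: the function names, the range of standard errors and the selection rule (every significantly beneficial result is published, other results only with probability 0.2) are all assumptions made purely for illustration. Effects are on a log scale, so a negative value is taken to mean benefit.

```python
import random


def funnel_points(studies):
    """Map (effect, standard_error) pairs to funnel-plot coordinates.

    x is the effect size (e.g. a log rate ratio); y is precision = 1/SE.
    """
    return [(effect, 1.0 / se) for effect, se in studies]


def pooled_estimate(studies):
    """Inverse-variance weighted average of the study effect sizes."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(w * effect for w, (effect, _) in zip(weights, studies))
    return total / sum(weights)


def simulate_studies(n_studies, true_effect=0.0, seed=1):
    """Simulate studies; small trials (large SE) scatter more widely."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        se = rng.uniform(0.05, 0.5)  # assumed range of standard errors
        studies.append((rng.gauss(true_effect, se), se))
    return studies


def select_published(studies, p_nonsignificant=0.2, seed=2):
    """Hypothetical selection rule standing in for publication bias.

    Significantly beneficial results are always kept; all other
    results are kept only with probability p_nonsignificant.
    """
    rng = random.Random(seed)
    kept = []
    for effect, se in studies:
        significant_benefit = effect / se < -1.96
        if significant_benefit or rng.random() < p_nonsignificant:
            kept.append((effect, se))
    return kept


if __name__ == "__main__":
    all_studies = simulate_studies(20000)
    published = select_published(all_studies)
    print("pooled estimate, all studies:      ", pooled_estimate(all_studies))
    print("pooled estimate, published studies:", pooled_estimate(published))
```

With a true effect of zero, the pooled estimate from all simulated studies stays close to zero, whereas the estimate from the 'published' subset is pulled towards apparent benefit. Plotting the output of funnel_points for the two sets would show the corresponding gap in one lower corner of the funnel.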
Such selective publication leads to omission of trials in one corner of the plot (the bottom right-hand corner when an 'undesirable' outcome such as the re-admission rate is being considered), and hence to a degree of asymmetry in the funnel. A funnel plot for the 30 RCTs in the re-admissions dataset is provided in Fig. 2. Visual inspection would suggest that there is little evidence of publication bias in this dataset; however, there are a few small studies with extremely beneficial RRRs at the bottom left-hand corner of the plot, for which there are no symmetric counterparts with extreme positive RRRs in the bottom right-hand corner.

Figure 2 Funnel plot of studies included in the hospital discharge meta-analysis examining the effect of interventions on re-admission rates. (Horizontal axis: re-admission rate ratio, log scale; vertical axis: 1/standard error of ln(RRR).)

Publication bias can be tested for more formally using statistical tests which are based on the same symmetry assumptions as a funnel plot assessment (Begg & Mazumdar 1994; Egger et al. 1997; Duval & Tweedie 1998). One formal test (Egger et al. 1997) produces a non-significant P-value of 0.57 for the re-admissions dataset, which is consistent with the inconclusive visual assessment.

Disagreement exists about how to proceed if publication bias is suspected, after an assessment for its presence has been made. Methods to assess the likely impact of publication bias on the pooled outcome estimate have been developed (Duval & Tweedie 1998; Givens et al. 1997; Copas 1999; Song et al. 2000) but they are not widely used, due partly to the fact that many are complex and hence difficult to implement, and due partly to concerns about their applicability. We believe that the use of such methods as part of a sensitivity analysis is sensible (Sutton et al. 2000a) but more research is needed in this area.

Study quality

It is rare that all the studies available for a meta-analysis are of a uniformly high quality. More likely there will be a range in the quality of the research pertaining to the intervention of interest. Restricting a meta-analysis to include only RCTs is a safeguard taken by such groups as the Cochrane Collaboration, in an attempt to include only evidence which potentially produces the least biased results. Restricting analyses only to RCTs does not guarantee the meta-analysis will produce an unbiased result, however, as there can still be methodological flaws in the design, conduct and analysis of a trial. Clearly, the inclusion of poor or flawed studies in a meta-analysis may be problematic because their influence may bias the pooled result and even mean the meta-analysis comes to the wrong qualitative conclusions. Unfortunately, most studies are flawed to some degree, and excluding all but 'perfect' studies (which may not be possible to conduct due to ethical or practical constraints in some fields) may leave the meta-analyst with few if any data. The problem of dealing with study quality in a meta-analysis is similar to that for publication bias, in the sense that there is agreement that some assessment of quality should always be made, but little consensus on how to make such an assessment, or how to incorporate the results into the meta-analysis.

There have been many scales and checklists developed to aid in the assessment of study quality (Moher et al. 1995) but many of them have come under heavy criticism for not being constructed scientifically (Moher et al. 1999a). Further, recent work has demonstrated that different results can be obtained in a meta-analysis depending on the checklist used (Juni et al. 1999). A further problem is the fact that it is often difficult to ascertain all the required details of the trial from a study report (Begg et al. 1996). Often, this means that an assessment of the trial report and not of the trial itself is in effect being made. The underlying problem with the use of a scale or checklist is that it is impossible to predict which design aspects cause the most bias and, more fundamentally, it is often impossible to predict even the direction in
which any bias will be acting (Schulz et al. 1995). This makes the direct adjustment of study estimates for study quality impossible.

Several ways in which study quality can be incorporated into a meta-analysis have been suggested. Perhaps the simplest is to use a quality threshold to include or exclude studies. This could be defined using a cut-off value on a particular quality scale, or as a requirement of having several design aspects present. A further possibility is to use a quality score to weight study results, or incorporate such a score into the standard precision weightings (Berard & Bravo 1998). Finally, an approach which appears to be gaining support is the exploration of quality via meta-regression. In such an approach a quality score, or individual markers of study quality, such as the degree of blinding or method of treatment allocation, are included in a regression model as explanatory variables. Examining individual markers of quality separately eliminates the problems with the somewhat arbitrary construction of the quality scale scoring systems (Detsky et al. 1992).

Returning to the re-admissions meta-analysis, study quality was rated crudely using a count of effective practice and organization of care (EPOC) quality criteria (Cochrane Effective Practice & Organization of Care Review Group 1998) that were satisfied for each study. The scores obtained by each trial using this method are given in the penultimate column of Table 1. When these scores are included in a random effect regression model, the equation ln(RRR) = -0.22 + 0.007 × quality score is obtained. This regression line, together with the primary studies (the size of the plotting symbol is proportional to the precision of the effect size estimate), is plotted in Fig. 3. The quality score coefficient is small and not statistically significant (P = 0.88). This means study quality, at least as measured in this way, would not appear to affect the study results systematically, or to explain the between-study heterogeneity.

Figure 3 Regression line examining the impact of quality score, using a random effect meta-regression model for re-admission rate in the hospital discharge meta-analysis. (Horizontal axis: EPOC quality score; vertical axis: re-admission rate ratio, log scale.)

4 Further developments in methods of meta-analysis

Specialist meta-analysis methods

While section 2 provided a summary of the most commonly used methods in meta-analysis, many further developments have been made. A proportion of these focus on the synthesis of less standard data types. For example, specialist methods are required to pool the results of diagnostic tests because two outcomes, specificity and sensitivity, require simultaneous consideration (Irwig et al. 1995). Another area which requires special methods is the analysis of survival data, because account has to be made of censored observations (Dear 1994). Other data types for which specialist methods have been developed are dose–response data (Tweedie & Mengersen 1995) and economic data (Jefferson et al. 1996). Meta-analysis of individual patient data (Stewart & Clarke 1995), where original study datasets are pooled rather than relying on published summary data, has been described as the gold standard. It is considered by some to be the only way to carry out a meta-analysis of survival data, but it is much more time consuming and costly than meta-analysis of summary data. It is currently unclear whether the extra effort required is worthwhile. For an overview of these and further meta-analytical developments see Sutton et al. (2000c).

New directions for meta-analysis using Bayesian statistics

In addition to the above developments, more advanced methods for synthesis of information have been developed.
Although not currently used routinely, these provide potentially more powerful and flexible tools for synthesizing evidence. Many of these methods use Bayesian statistics, in contrast to the more commonly employed classical approach.

A full description of Bayesian methods is not possible here, but for a recent review of their use in assessing health technologies see Spiegelhalter et al. (2000a). The key element of the Bayesian approach is that it introduces the idea of subjective probability (O'Hagan 1988) in contrast to the objective probabilities traditionally attached to specific, often repeatable, events. Before carrying out a piece of research, an investigator would have formed some prior beliefs regarding its outcome, possibly derived from results of previous research in the same field. These a priori beliefs are combined with the data from the current investigation to produce results which reflect the researchers' beliefs having conducted the research. These posterior beliefs are calculated by combining the prior beliefs with the new data using Bayes' Theorem, which forms the backbone of all Bayesian analysis.

The advantages of using such an approach are often subtle, but important. Perhaps most notable from a health-care context is the ability to make direct probability statements regarding quantities of interest, for example, the probability that patients receiving drug A have better survival than those who receive drug B. There are good reasons, however, why the Bayesian approach has largely been neglected in routine use. The most serious is that, generally, the computations required in Bayesian models are very complex. Additionally, the expressing of prior beliefs in a form which can be included in analysis is a non-trivial task. Excitingly, many of the computational difficulties have been addressed recently, with the development of specialist software, most notably WinBUGS (Spiegelhalter et al. 2000b). The problem of expressing prior beliefs remains; however, there are practical 'solutions', including using 'off-the-shelf' priors, which can express the presence of a range of degrees of prior knowledge, and can be used in a sensitivity analysis. Use of 'vague priors', which essentially means prior information is ignored, is also possible.

The new WinBUGS software is able to perform the calculations required for a wide range of Bayesian analyses. The user has freedom to implement many existing meta-analysis methods developed classically and, more importantly, develop models not possible using more traditional classical software. This has potentially huge benefits for synthesizing information and builds on earlier pioneering work by Eddy et al. (1992), whose 'new' graphical approach to meta-analysis can now be implemented using WinBUGS (Spiegelhalter et al. 2000b). Issues being addressed by these methods are outlined below:

1 Data from an RCT may be of direct interest, but not of a form which can simply be included in a meta-analysis. For example, data from an RCT which uses the intervention of interest in the treatment arm, but a different intervention from the other studies in the control arm, may be available. Methods to include such data have been developed (Higgins & Whitehead 1996).

2 In some assessments considering only randomized evidence may not be the optimal approach. Observational studies, which could potentially be very large, providing valuable data on thousands of patients, may be available. It may sometimes seem unjust to exclude these from a meta-analysis, particularly if they are of high quality, as they may have particular strengths and weaknesses, different from those of randomized studies (Droitcour et al. 1993). Special methods have been developed to account for different study designs in a meta-analysis (Prevost et al. 2000; Larose & Dey 1997). In other instances data on the effect of a drug of interest in animals may be available and provide valuable information which can be incorporated (DuMouchel & Harris 1983).

3 There may be benefits to including information contained in previous trials or meta-analyses on similar topics using similar interventions and outcome measures (Higgins & Whitehead 1996).

4 A study may not provide any quantitative data at all, being qualitative in design, but this qualitative data may be of direct relevance to the topic under assessment (Roberts et al. 1998).

Bayesian modelling gives us the potential to include all these types of data in a variety of ways, including direct input into the model, or incorporation through the specification of prior beliefs.

Other new approaches to meta-analysis have been suggested, but the corresponding methodology is at the conceptual rather than practical stage of development. The extension of meta-regression to the simultaneous modelling of multiple scientific factors with the intention of producing a response surface of treatment effects, rather than a single pooled result, has been advocated (Rubin 1992). This may allow a more detailed examination of the science underlying the results synthesized (Rubin 1992; Lau et al. 1998). Further, it may be possible to model different aspects of the processes under study separately. For example, if one were interested in the effect of lowering cholesterol on clinical outcomes, in a first stage of the analysis data relating to the degree to which different interventions lower cholesterol levels could be synthesized. Then, in a second stage, the relationship between cholesterol level and various clinical outcomes could be modelled (Katerndahl & Lawler 1999). A further utilization of Bayesian modelling could allow meta-analysis to be placed within a decision theoretical framework (Berger 1980) which can also take into account utilities when making health care or policy decisions (Midgette et al. 1994).

However, there is no magic wand to make all this happen. While Bayesian modelling provides flexibility and a framework, it does not dictate how models should be specified, how data should be incorporated, or how priors should be elicited. There is much methodological work required to further develop the ideas outlined above.

5 Conclusion

Much has been written on meta-analysis and the synthesis of evidence within the medical literature over the past two decades. During this time, the basic synthesizing of effect measures using weighted averages has been refined to a high degree, and much of the methodology required to do so is in place for most situations encountered. Threats to the validity of meta-analysis exist, and the methods for dealing with problems such as publication bias and variations in quality of the primary studies are at a less refined stage. Additionally, many consider the standard 'weighted average approach' to meta-analysis not to be 'state of the art' in at least some situations, where the use of more sophisticated methods, generally to synthesize a broader base of evidence, would be advantageous. Currently, such approaches are still firmly in the experimental stage and unfortunately ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made. Moving the synthesis of evidence beyond calculating simple averages is timely, feasible and, indeed, essential.

Acknowledgements

The research on which this paper is based was funded, in part, by the NHS Research and Development Health Technology Assessment Programme (Methodology Project Numbers 93/52/3 & 95/09/03).

References

Begg C., Cho M., Eastwood S., Horton R., Moher D. & Olkin I. (1996) Improving the quality of reporting of randomised controlled trials: the CONSORT statement. Journal of the American Medical Association 276, 637–639.
Begg C.B. & Mazumdar M. (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50, 1088–1101.
Berard A. & Bravo G. (1998) Combining studies using effect sizes and quality scores: application to bone loss in postmenopausal women. Journal of Clinical Epidemiology 51, 801–807.
Berger J.O. (1980) Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New York.
Berkey C.S., Hoaglin D.C., Mosteller F. & Colditz G.A. (1995) A random-effects regression model for meta-analysis. Statistics in Medicine 14, 395–411.
Biggerstaff B.J. & Tweedie R.L. (1997) Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16, 753–768.
Boissel J.P., Blanchard J., Panak E., Peyrieux J.C. & Sacks H. (1989) Considerations for the meta-analysis of randomized clinical trials: summary of a panel discussion. Controlled Clinical Trials 10, 254–281.
Cochrane Effective Practice and Organisation of Care Review Group (1998) The Data Collection Checklist. University of Aberdeen, HSRU, Aberdeen.
Copas J. (1999) What works?: selectivity models and meta-analysis. Journal of the Royal Statistical Society, Series A 161, 95–105.
von Dadelszen P., Ornstein M.P., Bull S.B., Logan A.G., Koren G. & Magee L.A. (2000) Fall in mean arterial pressure and fetal growth restriction in pregnancy hypertension: a meta-analysis. Lancet 355, 87–92.
Dear K.B.G. (1994) Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics 50, 989–1002.
Deeks J., Glanville J. & Sheldon T. (1996) Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. Report no. 4. Centre for Reviews and Dissemination. York Publishing Services Ltd, York.
DerSimonian R. & Laird N. (1986) Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.
Detsky A.S., Naylor C.D., O'Rourke K., McGeer A.J. & L'Abbe K.A. (1992) Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology 45, 255–265.
Dickersin K., Scherer R. & Lefebvre C. (1994) Systematic reviews: identifying relevant studies for systematic reviews. British Medical Journal 309, 1286–1291.
Droitcour J., Silberman G. & Chelimsky E. (1993) Cross-design synthesis: a new form of meta-analysis for combining results from randomized clinical trials and medical-practice databases. International Journal of Technology Assessment in Health Care 9, 440–449.
DuMouchel W.H. & Harris J.E. (1983) Bayes methods for combining the results of cancer studies in humans and other species (with comment). Journal of the American Statistical Association 78, 293–308.
Duval S. & Tweedie R. (1998) Practical estimates of the effect of publication bias in meta-analysis. Australasian Epidemiologist 5, 14–17.
Eddy D.M., Hasselblad V. & Shachter R. (1992) Meta-Analysis by the Confidence Profile Method. Academic Press, San Diego.
Egger M., Smith G.D., Schneider M. & Minder C. (1997) Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 629–634.
Fleiss J.L. (1993) The statistical basis of meta-analysis. Statistical Methods in Medical Research 2, 121–145.
Fleiss J.L. (1994) Measures of effect size for categorical data. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 245–260. Russell Sage Foundation, New York.
Freemantle N., Cleland J., Young P., Mason J. & Harrison J. (1999) β-Blockade after myocardial infarction: systematic review and meta regression analysis. British Medical Journal 318, 1730–1737.
Givens G.H., Smith D.D. & Tweedie R.L. (1997) Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science 12, 221–250.
Greenland S. (1987) Quantitative methods in the review of epidemiological literature. Epidemiological Review 9, 1–30.
Hardy R.J. & Thompson S.G. (1996) A likelihood approach to meta-analysis with random effects. Statistics in Medicine 15, 619–629.
Higgins J.P.T. & Whitehead A. (1996) Borrowing strength from external trials in a meta-analysis. Statistics in Medicine 15, 2733–2749.
Hollis S. & Campbell F. (1999) What is meant by intention to treat analysis? Survey of published randomised controlled trials. British Medical Journal 319, 670–674.
Horton R. (1997) Medical editors trial amnesty. Lancet 350, 756.
Horton R. & Smith R. (1999) Time to register randomised trials: the case is now unanswerable. British Medical Journal 319, 865–866.
Hunt M. (1997) How Science Takes Stock: the story of meta-analysis. Russell Sage Foundation, New York.
Irwig L., Macaskill P., Glasziou P. & Fahey M. (1995) Meta-analytic methods for diagnostic test accuracy. Journal of Clinical Epidemiology 48, 119–130.
Jefferson T., Mugford M., Gray A. & DeMicheli V. (1996) An exercise in the feasibility of carrying out secondary economic analysis. Health Economics 5, 155–165.
Juni P., Witschi A., Bloch R. & Egger M. (1999) The hazards of scoring the quality of clinical trials for meta-analysis. Journal of the American Medical Association 282, 1054–1060.
Katerndahl D.A. & Lawler W.R. (1999) Variability in meta-analytic results concerning the value of cholesterol reduction in coronary heart disease: a meta-meta-analysis. American Journal of Epidemiology 149, 429–441.
Larose D.T. & Dey D.K. (1997) Grouped random effects models for Bayesian meta-analysis. Statistics in Medicine 16, 1817–1829.
Lau J., Ioannidis J.P. & Schmid C.H. (1998) Summing up evidence: one answer is not always enough. Lancet 351, 123–127.
Light R.J. & Pillemer D.B. (1984) Summing Up: the science of reviewing research. Harvard University Press, Cambridge, MA.
Midgette A.S., Wong J.B., Beshansky J.R., Porath A., Fleming C. & Pauker S.G. (1994) Cost-effectiveness of streptokinase for acute myocardial-infarction: a combined metaanalysis and decision-analysis of the effects of infarct location and of likelihood of infarction. Medical Decision Making 14, 108–117.
Moher D., Cook D.J., Eastwood S., Olkin I., Rennie D. & Stroup D. for the QUORUM Group (1999b) Improving the quality of reporting of meta-analysis of randomised controlled trials: the QUORUM statement. Lancet 354, 1896–1900.
Moher D., Jadad A.R., Nichol G., Penman M., Tugwell P. & Walsh S. (1995) Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Controlled Clinical Trials 12, 62–73.
Moher D., Klassen T.P., Jadad A.R., Tugwell P., Moher M. & Jones A.L. (1999a) Assessing the quality of randomised controlled trials: implications for the conduct of meta-analyses. Health Technology Assessment 3(12), 1–98.
O'Hagan A. (1988) Probability: methods and measurement. Chapman & Hall, London.
Oxman A.D. (1996) The Cochrane Collaboration Handbook: preparing and maintaining systematic reviews, 2nd edn. Cochrane Collaboration, Oxford.
Parker S.G., Peet S.M., McPherson A., Cannaby A.M., Baker R., Wilson A., Lindesay J., Parker G., Abrams K.R. & Jones D.R. (2001) A systematic review of discharge arrangements for older people. Health Technology Assessment (in press).
Peto R. (1987) Why do we need systematic overviews of randomised trials? Statistics in Medicine 6, 233–240.
Prevost T.C., Abrams K.R. & Jones D.R. (2000) Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Statistics in Medicine 19, 3359–3376.
Roberts K.A., Jones D.R., Abrams K.R., Dixon-Woods M. & Fitzpatrick R. (1998) Meta-analysis of qualitative and quantitative evidence: an example based on studies of patient satisfaction. Technical Report 98–01, University of Leicester: Department of Epidemiology and Public Health, Leicester.
Rubin D. (1992) A new perspective. In The Future of Meta-Analysis (eds K.W. Wachter & M.L. Straf), pp. 155–165. Russell Sage Foundation, New York.
Schmid C.H., Lau J., McIntosh M.W. & Cappelleri J.C. (1998) An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Statistics in Medicine 17, 1923–1942.
Schulz K.F., Chalmers I., Hayes R.J. & Altman D.G. (1995) Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Journal of the American Medical Association 273, 408–412.
Senn S., Sharp S., Thompson S. & Altman D. (1996) Relation between treatment benefit and underlying risk in meta-analysis. British Medical Journal 313, 1550–1551.
Shadish W.R. & Haddock C.K. (1994) Combining estimates of effect size. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 261–284. Russell Sage Foundation, New York.
Song F., Eastwood A., Gilbody S., Duley L. & Sutton A.J. (2000) Publication and other selection biases in systematic reviews. Health Technology Assessment 4(10), 1–115.
Spiegelhalter D.J., Myles J.P., Jones D.R. & Abrams K.R. (2000a) Bayesian methods in health technology assessment. Health Technology Assessment 4(38), 1–142.
Spiegelhalter D.J., Thomas A. & Best N.G. (2000b) WinBUGS, Version 1.2, User Manual. MRC Biostatistics Unit, Cambridge.
Sterne J.A.C., Egger M. & Sutton A.J. (2001) Meta-analysis software. In Systematic Reviews in Health Care: meta-analysis in context, 2nd edn (eds M. Egger, G. Davey Smith & D.G. Altman), pp. 336–346. BMJ Books, London.
Stewart L.A. & Clarke M.J. (1995) Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Statistics in Medicine 14, 2057–2079.
Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (1998) Systematic reviews of trials and other studies. Health Technology Assessment 2(19), 1–310.
Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (2000c) Methods for Meta-Analysis in Medical Research. John Wiley, London.
Sutton A.J., Duval S.J., Tweedie R.L., Abrams K.R. & Jones D.R. (2000a) Empirical assessment of effect of publication bias on meta-analyses. British Medical Journal 320, 1574–1577.
Sutton A.J., Lambert P.C., Hellmich M., Abrams K.R. & Jones D.R. (2000b) Meta-analysis in practice: a critical review of available software. In Meta-Analysis in Medicine and Health Policy (eds D.A. Berry & D.K. Stangl). Marcel Dekker, New York.
Thompson S.G. (1993) Controversies in meta-analysis: the case of the trials of serum cholesterol reduction. Statistical Methods in Medical Research 2, 173–192.
Tweedie R.L. & Mengersen K.L. (1995) Meta-analytic approaches to dose–response relationships, with application in studies of lung cancer and exposure to environmental tobacco smoke. Statistics in Medicine 14, 545–569.
Walter S.D. (1997) Variation in baseline risk as an explanation of heterogeneity in meta-analysis. Statistics in Medicine 16, 2883–2900.
Whitehead A. & Jones N.M.B. (1994) A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 13, 2503–2515.