Quantitative Synthesis I

  • Quantitative Synthesis I
  • Systematic Review Process Overview This slide illustrates the steps in the systematic review process. This module focuses on quantitative synthesis.
  • Learning Objectives
  • Synonyms for Meta-Analysis
  • Reasons To Conduct Meta-Analyses
  • Commonly Encountered Comparative Effect Measures This table shows the type of data and corresponding effect measures that are commonly encountered in meta-analyses. For continuous data, the mean difference is most often used. The standardized mean difference, or “effect size,” is a dimensionless standardized unit; it is seldom used in medical meta-analyses because it lacks direct clinical interpretation. Corresponding metrics for dichotomous data include the odds ratio, risk ratio, and risk difference. The hazard ratio is used in time-to-event meta-analyses.
  • Principles of Combining Data for Basic Meta-Analyses
  • Things To Know About the Data Before Combining Them The biological and clinical plausibility of the research question should be addressed first when deciding whether or not to carry out a meta-analysis. Will the answer to the research question be clinically meaningful? As will be discussed later, the scale of the effect measure may influence the decision of whether or how to combine data. Estimates from studies with small numbers of patients or studies with few events are unstable. The results of a meta-analysis based only on small studies, even if statistically significant, should be interpreted cautiously. Additional points about data being considered for meta-analysis are illustrated with hypothetical examples in the next five slides.
  • True Associations May Disappear When Data Are Combined Inappropriately In this figure, the two hypothetical groups of data show a strong correlation between the variable of interest and the effect of interest. Any relationship between variables can be obscured by inappropriate aggregation of data. Imagine that the oblongs represent two studies; if one were to conduct a meta-analysis by using only the mean values, the correlation would be lost. The correct approach would be to take the correlation (or effect) in each of the studies and use the weighted average.
  • An Association May Be Seen When There Is None In contrast with the last slide, there is no correlation of the variable with the effect within each data group. A correlation appears, however, if we inappropriately aggregate the data by ignoring the fact that the data are from two separate studies that may have addressed different research questions or that there is no clinical rationale to combine the data.
  • Changes in the Same Scale May Have Different Meanings This hypothetical example illustrates the importance of appreciating baseline values and the scale used to measure them when interpreting changes in data. The same absolute change of one unit at both locations (A to B, C to D) has a different meaning in terms of relative change: A to B represents a 100% relative change, whereas C to D represents only a 14% relative change. Meta-analysis of studies that have different baseline values may not be appropriate. Serum creatinine is such an example; an absolute increase from 1.0 to 1.5 mg/dL has very different clinical implications than a change from 6.0 to 6.5 mg/dL.
  • Effect of the Choice of Metric on Meta-analysis In this hypothetical example, two studies with the same number of patients but different event rates have identical relative risks but very different absolute risks (risk differences). A meta-analysis on the relative risk scale would be appropriate, but a meta-analysis on the absolute risk scale would probably not be.
  • Effect of Small Changes on the Estimate In this hypothetical example of three studies with different numbers of patients, a decrease or increase of one event (perhaps due to a misclassification error) has a very different effect on the estimate. The greatest effect is seen in the smallest study.
  • Binary Outcomes Sometimes continuous variables are dichotomized into binary outcomes. A threshold value may be used to categorize pain scores as improved or not improved. This approach is suboptimal because of information loss and because the choice of the threshold for defining success is often arbitrary.
  • A Sample 2x2 Table Reference: ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Randomized trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 1988;2:349-60. http://www.ncbi.nlm.nih.gov/pubmed/2899772
  • Treatment Effect Metrics That Can Be Calculated From a 2x2 Table Three treatment effect metrics can be calculated from a 2x2 table: the risk difference, odds ratio, and risk ratio. By convention, the risk difference is the treatment-group event rate minus the control-group event rate. If the outcome is undesirable, a negative risk difference is interpreted as a reduction of the bad outcome. The odds ratio is the ratio of the odds of having an event in the treatment group to the odds of having an event in the control group. The risk ratio is equal to the treatment-group event rate divided by the control-group event rate.
  • Some Characteristics and Uses of the Risk Difference
  • Some Characteristics and Uses of the Odds Ratio
  • Some Characteristics and Uses of the Risk Ratio
  • When the Complementary Outcome of the Risk Ratio Is Asymmetric This slide demonstrates that the complementary outcome for the odds ratio is symmetrical but the complementary outcome for the risk ratio is not. Thus, one needs to be careful when using the risk ratio as the outcome measure.
  • Calculation of Treatment Effects in the Second International Study of Infarct Survival (ISIS-2) This is an example of how metrics were calculated for one study. The metrics for all studies included in a meta-analysis are similarly calculated.
  • Treatment Effects Estimates in Different Metrics: Second International Study of Infarct Survival (ISIS-2) The methods of calculating the standard error and confidence interval for individual studies are not shown in this presentation. Many books include these formulae, and meta-analysis computer software typically performs and reports these calculations.
  • Example: Meta-Analysis Data Set In a meta-analysis, we are interested in an overall estimate of the effect across studies that meet the eligibility criteria for a research question of interest. This slide depicts the outcome data of 17 studies assembled for a meta-analysis. The next few slides address how we should combine the data.
  • Simpson’s Paradox This example illustrates why pooling counts from 2x2 tables across studies into a single table may yield misleading results. This phenomenon is known as Simpson’s Paradox. Reference: Charig CR, Webb DR, Payne SR, et al. Comparison of treatment of renal calculi by operative surgery, percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy. BMJ 1986;292:879-82. http://www.ncbi.nlm.nih.gov/pubmed/3083922
  • Simpson’s Paradox (II) Each stone-size category reveals that open surgery leads to a higher rate of success, but pooling the raw data yields the opposite, paradoxical result. In this example, the paradox is due to the different rates of open surgery and percutaneous nephrolithotomy in patients with different stone sizes. Applying this concept to meta-analysis, calculating each study’s effect first and then combining the effect estimates across studies eliminates this problem. Reference: Charig CR, Webb DR, Payne SR, et al. Comparison of treatment of renal calculi by operative surgery, percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy. BMJ 1986;292:879-82. http://www.ncbi.nlm.nih.gov/pubmed/3083922
  • Combining Effect Estimates This slide and the next few discuss the basic concept of combining the effect estimates from studies included in a meta-analysis. We can use the diastolic blood pressure reduction data from three studies to illustrate how an average (overall) effect is estimated. Notice that the three studies differ along three parameters: 1. Studies 1 and 2 have similar clinical effects: reductions in blood pressure of 6.2 and 7.7 mm Hg, respectively. 2. Study 3 shows no clinical effect and has a much smaller sample size than the first two. 3. Study 3 also has a wider confidence interval and is, therefore, less precise in its estimate of the treatment effect. There are two ways to average these data: a simple average and a weighted average. We explore the implications of these approaches in the following slides.
  • Simple Average The simple average for the three blood pressure studies is -4.7 mmHg. However, this simple average is not a useful average for the three studies. A simple average weights the contribution of each study equally. When there are important differences in the sample size or in the results across studies, the simple average does not provide a reasonable answer that reflects the contribution of the data. Intuitively, we believe that larger studies should be trusted more than smaller ones.
  • Weighted Average The weighted average for blood pressure reduction, when weighted by sample size, is -6.4 mm Hg. This average is more consistent with the results of the first two studies, which were more precise. Although weighting by sample size is appealing in its simplicity, it offers only a rough approximation of the precision of the estimates. In this case, where the outcome is continuous, the better weight to use is the inverse of the variance of the mean (i.e., the inverse of variance/N). Consider two studies with equal sample sizes: it is common for the samples in the two studies to have different amounts of dispersion, so the estimates of the mean would have different degrees of precision. Later, we will examine methods for weighting that incorporate both sample size and precision. The approach to calculating the weighted average for a binary outcome is similar. The variance formula for binary outcomes depends upon the metric selected for the meta-analysis and uses the number of events and the sample size in the calculation.
  • General Formula: Weighted Average Effect Size (d+) This general formula is almost identical to the weighted average formula we used earlier for the three blood pressure studies. The one exception is that the letter “d” is used to represent the measure of effect in this formula, whereas the previous formula used the letter “x.” This formula can be used to calculate weighted averages for any type of data, even data from one-arm studies. In some cases, when using relative measures such as the relative risk and the odds ratio, the effect must first be transformed; for the relative risk and the odds ratio, the transformation is the natural logarithm. The primary difference in using this formula for various meta-analytic data sets is how the weights, represented by “w_i,” are calculated.
  • Calculation of Weights
  • Heterogeneity (Diversity) It may be easier to think of homogeneity rather than heterogeneity. Homogeneity is the degree to which studies are sufficiently similar, such that a single average effect would be meaningful. Homogeneity is the basis for the “apples and oranges” debate on which many critics of this science base their argument. A useful way to consider homogeneity is from two perspectives: clinical and methodological diversity, and statistical heterogeneity.
  • Clinical Diversity
  • Example: A Meta-analysis With a Large Degree of Clinical Diversity This is an example of a meta-analysis with a large degree of clinical diversity in the included studies. Many different combinations of endoscopic treatments, conditions, and outcomes are possible. Given the wide range of clinical diversity, one might ask, “Should a meta-analysis be carried out?” If one uses a narrow set of criteria (e.g., a comparison of argon laser treatment with placebo in patients with active bleeding, using mortality as the outcome), none of the 25 randomized controlled trials might be suitable for a meta-analysis. However, if one frames a broader question that is nonetheless clinically meaningful (combining multiple categories into one), meta-analysis may be appropriate. For example, if we accept that all three listed outcomes are undesirable and that reducing their rates is the key question, then it may make sense to combine the outcomes into one category called “undesirable outcomes.” This decision requires clinical judgment and input from content experts. Managing clinical diversity is a key issue in meta-analysis and will be discussed further in the next module. Reference: Sacks HS, Chalmers TC, Blum AL, et al. Endoscopic hemostasis: an effective therapy for bleeding peptic ulcers. JAMA 1990;264:494-9. http://www.ncbi.nlm.nih.gov/pubmed/2142225
  • Methodological Diversity Methodological diversity refers to variations in the design and conduct of clinical trials. It includes issues such as (but not limited to) blinded or unblinded studies, method of random allocation, and choice of measurement methods. The impact of methodological diversity can be explored by examining the differential effects of subgroups.
  • Statistical Heterogeneity The next several slides discuss the statistical models used to combine data in a meta-analysis. The key concepts discussed are the fixed effect model and the random effects model.
  • Cochran’s Q Statistic: Chi-Square (χ²) Test for Homogeneity This slide depicts the formula for the Q statistic. Under the null hypothesis of homogeneity, the Q statistic follows a χ² distribution with k-1 degrees of freedom. When the value of the Q statistic is too high, the null hypothesis of homogeneity is rejected. The χ² test, however, has low sensitivity to detect heterogeneity, so a significance level of 0.1 instead of 0.05 is generally recommended. Any of the effect measures described earlier can be used with the appropriate transformation and corresponding inverse variance weights. The purpose of showing the formula is not to teach reviewers how to calculate it but to convey the idea that the Q statistic measures between-study variation.
  • The I² Index and Its Interpretation Reference: Higgins JP, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557-60.
  • Example: A Fixed Effect Model
  • Random Sampling From a Container With a Fixed Number of White and Black Balls (Equal Sample Size) In this illustration, to estimate the proportion of black and white balls, we performed 6 draws of equal sample size (10 balls each). If the sample size is small (as shown here), there will likely be wide variations of results across samples. Each sample will not be a good estimate of the fixed proportion. However, if we combine the results (i.e., do a meta-analysis) across the samples, we will get a more reliable estimate.
  • Random Sampling From a Container With a Fixed Number of Black and White Balls (Different Sample Size) In this illustration, we drew five samples, but the sample size of each draw is different. A weighted average of the samples should be used to estimate the proportion of black and white balls.
  • Different Containers With Different Proportions of Black and White Balls (Random Effects Model) This slide illustrates the meaning of a random effects model. Instead of a single container, now we have seven containers with different proportions of black and white balls. We are interested in estimating the overall proportion of the black and white balls in all the containers. This illustration is analogous to a clinical scenario of many different population groups, each with a different response rate to a treatment. Nonetheless, we are interested in knowing the overall effect of an intervention in the entire population.
  • Random Sampling From Containers To Get an Overall Estimate of the Proportion of Black and White Balls If we are interested in an overall estimate of the proportion of black and white balls in all the containers, we would randomly sample from these containers and then calculate a weighted average. Note that not all the containers are sampled, because there may be too many of them or because, by chance or by design, we do not draw balls from some of them. We assume that the proportions of black and white balls across the containers follow a normal distribution. The weight used for each study in this setting must account for both within-sample variation and between-sample variation.
  • Statistical Models of Combining 2x2 Tables
  • Example Meta-analysis Where the Fixed and Random Effects Models Yield Identical Results This slide presents a forest plot of a meta-analysis that compared newer antibiotics with amoxicillin for acute sinusitis. The risk ratios of 13 studies are plotted with their respective 95% confidence intervals. The estimated risk of clinical failure hovers around 1 for all studies, suggesting no difference in results across these studies (the studies are homogeneous). Not surprisingly, the overall estimate of the meta-analysis reflects this observation. In this data set, the DerSimonian and Laird random effects model’s estimate of between-study heterogeneity is 0; in this case, the results from the random effects model are identical to those of the fixed effect model. The chi-square (χ²) test discussed earlier could be used to test for between-study heterogeneity; in this particular example, it is not significant.
  • Example Meta-analysis Where Results From the Fixed and Random Effects Models Will Differ This slide presents a forest plot of 23 studies used in a meta-analysis of observational studies on the efficacy of influenza vaccination in elderly people. There is a large degree of heterogeneity in the odds ratios of overall mortality (ranging from 0.04 to 1.1). The 95% confidence intervals of many studies do not overlap each other, which suggests that a fixed effect model will fit these data poorly. In this data set, the DerSimonian and Laird random effects model’s estimate of between-study heterogeneity is nonzero, and therefore its results will differ from those of the fixed effect model. Further, the studies were conducted in different years (thus involving different strains of viruses) and in different settings. The efficacy of vaccination is likely to vary with different viruses and in different populations; therefore, there is probably a lot of clinical diversity, and the assumption made by the fixed effect model (a single summary effect) is probably not true. So which model should be picked? In most meta-analyses, clinical and methodological diversity are abundant, and therefore the basic assumption of the fixed effect model, that there is a single treatment effect, is probably too strong. For that reason, in most meta-analyses the random effects model is more justifiable. There are exceptions, but we will not discuss them here. In the meta-analysis in this example, a random effects model should be used, and the I² index is useful to quantify the extent of heterogeneity. Reference: Gross PA, Hermogenes AW, Sacks HS, et al. The efficacy of influenza vaccine in elderly persons: a meta-analysis and review of the literature. Ann Intern Med 1995;123:518-27. http://www.ncbi.nlm.nih.gov/pubmed/7661497
  • Weights of the Fixed Effect and Random Effects Models This slide depicts two weight formulas: one for the fixed effect model and one for the random effects model. The difference between the fixed effect weight and the random effects weight is that the latter incorporates the between-study variance. When a group of studies is statistically homogeneous, the between-study variance is small (or zero); as the studies become increasingly heterogeneous, the between-study variance increases. The implications of this relationship are as follows: (1) When studies are homogeneous (i.e., when the between-study variance is small), the fixed effect and random effects study-specific weights are essentially equal, which results in similar weighted average effects and confidence intervals. (2) As the between-study variance increases (increasing heterogeneity), the two weights become increasingly different. If the between-study variance is large enough, it becomes dominant and the random effects weights for every study move toward equality. In cases of extreme heterogeneity, the random effects model may give what appear to be counterintuitive results; however, if one bears in mind the effect of the between-study variance component in the formula, this is entirely consistent with the formulation of this approach. In situations that give rise to counterintuitive results, the wisdom of doing a meta-analysis should be questioned, and alternative approaches to quantitatively synthesizing the data, such as meta-regression, should be considered.
  • Commonly Used Statistical Methods for Combining 2x2 Tables
  • Dealing With Heterogeneity This block diagram depicts four general approaches and the specific methods that can be used to handle heterogeneity that may be encountered in a meta-analysis. The first approach is to ignore heterogeneity and combine data using a fixed effect model. The second approach is to test for heterogeneity using a chi-squared test and to not combine the data if the test is significant. In the third approach, if after assessing heterogeneity of the data it is deemed appropriate to combine results to provide an overall effect, then the random effects model is used. The fourth approach is to seek to explain heterogeneity by performing subgroup analyses or meta-regressions. The next module focuses on exploring and identifying reasons for heterogeneity. Reference: Lau J, Ioannidis JPA, Schmid CH. Quantitative synthesis in systematic review. Ann Intern Med 1997;127:820-6. http://www.ncbi.nlm.nih.gov/pubmed/9382404
  • Statistical Models of Combining 2x2 Tables: Summary
  • Caveats
  • Key Messages
  • References (I)
  • References (II)
  • Authors
  • Quantitative Synthesis I

    1. Quantitative Synthesis I
       Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide. www.ahrq.gov
    2. Systematic Review Process Overview
    3. Learning Objectives
       • To list the basic principles of combining data
       • To recognize common metrics for meta-analysis
       • To describe the role of weights to combine results across studies
       • To distinguish between clinical and methodological diversity and statistical heterogeneity
       • To define fixed effect model and random effects model
    4. Synonyms for Meta-Analysis
       • Quantitative overview/synthesis
       • Pooling
         – Less precise
         – Suggests that data from multiple sources are simply lumped together
       • Combining
         – Preferred by some
         – Suggests applying statistical procedures to data
    5. Reasons To Conduct Meta-Analyses
       • Improve the power to detect a small difference if the individual studies are small
       • Improve the precision of the effect measure
       • Compare the efficacy of alternative interventions and assess consistency of effects across study and patient characteristics
       • Gain insights into statistical heterogeneity
       • Help to understand controversy arising from conflicting studies or generate new hypotheses to explain these conflicts
       • Force rigorous assessment of the data
    6. Commonly Encountered Comparative Effect Measures
       • Continuous data: mean difference (e.g., mmol, mmHg), standardized mean difference (effect size), correlation
       • Dichotomous data: odds ratio, risk ratio, risk difference
       • Time-to-event data: hazard ratio
    7. Principles of Combining Data for Basic Meta-Analyses
       • For each analysis, one study should contribute only one treatment effect.
       • The effect estimate may be for a single outcome or a composite.
       • The outcome being combined should be the same — or similar, based on clinical plausibility — across studies.
       • Know the research question. The question drives study selection, data synthesis, and interpretation of the results.
    8. Things To Know About the Data Before Combining Them
       • Biological and clinical plausibility
       • Scale of effect measure
       • Studies with small numbers of events do not give reliable estimates
    9. True Associations May Disappear When Data Are Combined Inappropriately
    10. An Association May Be Seen When There Is None
    11. Changes in the Same Scale May Have Different Meanings
       • Both A–B and C–D involve a change of one absolute unit
       • A–B change (1 to 2) represents a 100% relative change
       • C–D change (7 to 8) represents only a 14% relative change
    12. Effect of the Choice of Metric on Meta-analysis
       Study | Treatment (events/total, rate) | Control (events/total, rate) | Relative Risk | Risk Difference
       A     | 100/1000, 10%                  | 200/1000, 20%                | 0.5           | 10%
       B     | 1/1000, 0.1%                   | 2/1000, 0.2%                 | 0.5           | 0.1%
    13. Effect of Small Changes on the Estimate
       Baseline case   | Effect of decrease of 1 event | Effect of increase of 1 event | Relative change of estimate
       2/10 (20%)      | 1/10 (10%)                    | 3/10 (30%)                    | ±50%
       20/100 (20%)    | 19/100 (19%)                  | 21/100 (21%)                  | ±5%
       200/1,000 (20%) | 199/1,000 (19.9%)             | 201/1,000 (20.1%)             | ±0.5%
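The arithmetic behind the table on slide 13 can be verified directly. A minimal Python sketch, using only the hypothetical counts shown above (the loop and formatting are illustrative, not part of the module):

```python
# Show how adding or removing a single event shifts the estimated rate,
# and how the relative shift shrinks as the study gets larger.
for events, total in [(2, 10), (20, 100), (200, 1000)]:
    baseline = events / total
    minus_one = (events - 1) / total
    plus_one = (events + 1) / total
    rel_change = (plus_one - baseline) / baseline  # same magnitude as the -1 case
    print(f"{events}/{total}: {baseline:.1%} -> {minus_one:.1%} or {plus_one:.1%} "
          f"(relative change of about ±{rel_change:.1%})")
```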
    14. Binary Outcomes
       • Outcomes that have two states (e.g., dead or alive, success or failure)
       • The most common type of outcome reported in clinical trials
       • 2x2 tables commonly used to report binary outcomes
    15. A Sample 2x2 Table
       Binary outcomes data to be extracted from studies:
                     | Vascular deaths | Survival | Total
       Streptokinase | 791             | 7,801    | 8,592
       Placebo       | 1,029           | 7,566    | 8,595
       ISIS-2 Collaborative Group. Lancet 1988;2:349-60.
    16. Treatment Effect Metrics That Can Be Calculated From a 2x2 Table
       OR = (a / b) / (c / d)
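As described in the notes above, the risk difference, risk ratio, and odds ratio can all be computed from the four cells of a 2x2 table. A minimal Python sketch, assuming the usual cell labels a, b (events and non-events in the treatment group) and c, d (events and non-events in the control group); the function name is illustrative:

```python
def two_by_two_metrics(a, b, c, d):
    """a, b = events / non-events in the treatment group;
       c, d = events / non-events in the control group."""
    p_treat = a / (a + b)          # treatment-group event rate
    p_ctrl = c / (c + d)           # control-group event rate
    return {
        "risk_difference": p_treat - p_ctrl,
        "risk_ratio": p_treat / p_ctrl,
        "odds_ratio": (a / b) / (c / d),
    }

# ISIS-2 counts from the sample 2x2 table above
print(two_by_two_metrics(791, 7801, 1029, 7566))
# roughly: RD = -0.028, RR = 0.77, OR = 0.75
```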
    17. Some Characteristics and Uses of the Risk Difference
       • Value ranges from -1 to +1
       • Magnitude of effect is directly interpretable
       • Has the same meaning for the complementary outcome (e.g., 5% more people dying is 5% fewer living)
       • Across studies in many settings, tends to be more heterogeneous than relative measures
       • Inverse is the number needed to treat (NNT) and may be clinically useful
       • If heterogeneity is present, a single NNT derived from the overall risk difference could be misleading
    18. Some Characteristics and Uses of the Odds Ratio
       • Value ranges from 1/∞ to +∞
       • Has desirable statistical properties; better normality approximation in log scale than risk ratio
       • Symmetrical meaning for complementary outcome (the odds ratio of dying is equal to the opposite [inverse] of the odds ratio of living)
       • Ratio of two odds is not intuitive to interpret
       • Often used to approximate risk ratio (but gives inflated values at high event rates)
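The last bullet on slide 18 (the odds ratio approximates the risk ratio only when events are rare) is easy to demonstrate with a quick calculation. A small Python sketch using made-up event rates; the specific rates are illustrative, not from the module:

```python
def or_vs_rr(p_treat, p_ctrl):
    rr = p_treat / p_ctrl
    odds_ratio = (p_treat / (1 - p_treat)) / (p_ctrl / (1 - p_ctrl))
    print(f"rates {p_treat:.3f} vs {p_ctrl:.3f}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")

or_vs_rr(0.005, 0.004)   # rare events: OR ~ RR (both about 1.25)
or_vs_rr(0.50, 0.40)     # common events: OR (1.50) is inflated relative to RR (1.25)
```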
    19. Some Characteristics and Uses of the Risk Ratio
       • Value ranges from 0 to +∞
       • Like its derivative, the relative risk reduction, it is easy to understand and is preferred by clinicians
         – Example: a risk ratio of 0.75 is a 25% relative reduction of the risk
       • Requires a baseline rate for proper interpretation
         – Example: an identical risk ratio for a study with a low event rate and another study with a higher event rate may have very different clinical and public health implications
       • Asymmetric meaning for the complementary outcome
         – Example: the risk ratio of dying is not the same as the inverse of the risk ratio of living
    20. When the Complementary Outcome of the Risk Ratio Is Asymmetric
                 | Dead | Alive | Total
       Treatment | 20   | 80    | 100
       Control   | 40   | 60    | 100
       • Odds Ratio (Dead) = (20 x 60) / (40 x 80) = 3/8 = 0.375
       • Odds Ratio (Alive) = (80 x 40) / (20 x 60) = 8/3 = 2.67
       • Risk Ratio (Dead) = (20/100) / (40/100) = 1/2 = 0.5
       • Risk Ratio (Alive) = (80/100) / (60/100) = 4/3 = 1.33
    21. Calculation of Treatment Effects in the Second International Study of Infarct Survival (ISIS-2)
                     | Vascular deaths | Survival | Total
       Streptokinase | 791             | 7,801    | 8,592
       Placebo       | 1,029           | 7,566    | 8,595
       • Treatment-Group Event Rate = 791 / 8,592 = 0.0921
       • Control-Group Event Rate = 1,029 / 8,595 = 0.1197
       • Risk Ratio = 0.0921 / 0.1197 = 0.77
       • Odds Ratio = (791 x 7,566) / (1,029 x 7,801) = 0.75
       • Risk Difference = 0.0921 - 0.1197 = -0.028
       ISIS-2 Collaborative Group. Lancet 1988;2:349-60.
    22. Treatment Effect Estimates in Different Metrics: Second International Study of Infarct Survival (ISIS-2)
       Streptokinase vs. Placebo, Vascular Death | Estimate | 95% Confidence Interval
       Risk ratio             | 0.77   | 0.70 to 0.84
       Odds ratio             | 0.75   | 0.68 to 0.82
       Risk difference        | -0.028 | -0.037 to -0.019
       Number needed to treat | 36     | 27 to 54
       ISIS-2 Collaborative Group. Lancet 1988;2:349-60.
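The notes above say that the standard error and confidence interval formulas are not shown in this module and are typically handled by meta-analysis software. As a rough illustration of where the intervals in slide 22 come from, here is a hedged Python sketch using the standard large-sample formulas (log-scale standard errors for the ratio measures); it approximately reproduces the ISIS-2 values:

```python
import math

a, b = 791, 7801    # ISIS-2: streptokinase vascular deaths / survivors
c, d = 1029, 7566   # placebo vascular deaths / survivors
n1, n2 = a + b, c + d
z = 1.96            # ~95% confidence

# Risk ratio: CI built on the log scale
rr = (a / n1) / (c / n2)
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
rr_ci = (math.exp(math.log(rr) - z*se_log_rr), math.exp(math.log(rr) + z*se_log_rr))

# Odds ratio: CI also built on the log scale
orx = (a*d) / (b*c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
or_ci = (math.exp(math.log(orx) - z*se_log_or), math.exp(math.log(orx) + z*se_log_or))

# Risk difference: CI on the natural scale
p1, p2 = a/n1, c/n2
rd = p1 - p2
se_rd = math.sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
rd_ci = (rd - z*se_rd, rd + z*se_rd)

print(f"RR {rr:.2f} ({rr_ci[0]:.2f} to {rr_ci[1]:.2f})")
print(f"OR {orx:.2f} ({or_ci[0]:.2f} to {or_ci[1]:.2f})")
print(f"RD {rd:.3f} ({rd_ci[0]:.3f} to {rd_ci[1]:.3f}), NNT ~ {1/abs(rd):.0f}")
```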
    23. Example: Meta-Analysis Data Set
       Beta-Blockers after Myocardial Infarction - Secondary Prevention
       N  Study         Year  Experiment Obs/Tot  Control Obs/Tot  Odds Ratio  95% CI (Low, High)
       1  Reynolds      1972    3/38                 3/39          1.03        0.19, 5.45
       2  Wilhelmsson   1974    7/114               14/116         0.48        0.18, 1.23
       3  Ahlmark       1974    5/69                11/93          0.58        0.19, 1.76
       4  Multctr. Int  1977  102/1533             127/1520        0.78        0.60, 1.03
       5  Baber         1980   28/355               27/365         1.07        0.62, 1.86
       6  Rehnqvist     1980    4/59                 6/52          0.56        0.15, 2.10
       7  Norweg.Multr  1981   98/945              152/939         0.60        0.46, 0.79
       8  Taylor        1982   60/632               48/471         0.92        0.62, 1.38
       9  BHAT          1982  138/1916             188/1921        0.72        0.57, 0.90
       10 Julian        1982   64/873               52/583         0.81        0.55, 1.18
       11 Hansteen      1982   25/278               37/282         0.65        0.38, 1.12
       12 Manger Cats   1983    9/291               16/293         0.55        0.24, 1.27
       13 Rehnqvist     1983   25/154               31/147         0.73        0.40, 1.30
       14 ASPS          1983   45/263               47/266         0.96        0.61, 1.51
       15 EIS           1984   57/858               45/883         1.33        0.89, 1.98
       16 LITRG         1987   86/1195              93/1200        0.92        0.68, 1.25
       17 Herlitz       1988  169/698              179/697         0.92        0.73, 1.18
    24. Simpson’s Paradox (I)
       • A 1986 study by Charig et al. compared the treatment of renal calculi by open surgery and percutaneous nephrolithotomy.
       • The authors reported that success was achieved in 78% of patients after open surgery and in 83% after percutaneous nephrolithotomy.
       • When the size of the stones was taken into account, the apparent higher success rate of percutaneous nephrolithotomy was reversed.
       Charig CR, et al. BMJ 1986;292:879-82.
    25. Simpson’s Paradox (II)
       PN = percutaneous nephrolithotomy
       Stones < 2 cm:          Open 81 success / 6 failure (93%) vs. PN 234 success / 36 failure (87%); Open > PN
       Stones ≥ 2 cm:          Open 192 success / 71 failure (73%) vs. PN 55 success / 25 failure (69%); Open > PN
       Pooling the two tables: Open 273 success / 77 failure (78%) vs. PN 289 success / 61 failure (83%); Open < PN
       Charig CR, et al. BMJ 1986;292:879-82.
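The reversal on slide 25 can be checked with a few lines of Python. This sketch simply recomputes the stratum-specific and pooled success rates from the counts shown on the slide (the data structure and names are illustrative):

```python
# (successes, failures) per treatment within each stone-size stratum
strata = {
    "stones < 2 cm": {"open": (81, 6), "pn": (234, 36)},
    "stones >= 2 cm": {"open": (192, 71), "pn": (55, 25)},
}

def rate(successes, failures):
    return successes / (successes + failures)

totals = {"open": [0, 0], "pn": [0, 0]}
for name, arms in strata.items():
    for arm, (s, f) in arms.items():
        totals[arm][0] += s
        totals[arm][1] += f
    print(f"{name}: open {rate(*arms['open']):.0%} vs PN {rate(*arms['pn']):.0%}")

# Pooling the raw counts reverses the within-stratum ordering (Simpson's paradox)
print(f"pooled: open {rate(*totals['open']):.0%} vs PN {rate(*totals['pn']):.0%}")
```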
    26. Combining Effect Estimates
       What is the average (overall) treatment-control difference in blood pressure?
       Study | N   | Mean difference (mm Hg) | 95% Confidence Interval
       A     | 554 | -6.2                    | -6.9 to -5.5
       B     | 304 | -7.7                    | -10.2 to -5.2
       C     | 39  | -0.1                    | -6.5 to 6.3
    27. Simple Average
       What is the average (overall) treatment-control difference in blood pressure?
       [(-6.2) + (-7.7) + (-0.1)] / 3 = -4.7 mm Hg
       Study | N   | Mean difference (mm Hg) | 95% CI
       A     | 554 | -6.2                    | -6.9 to -5.5
       B     | 304 | -7.7                    | -10.2 to -5.2
       C     | 39  | -0.1                    | -6.5 to 6.3
    28. Weighted Average
       What is the average (overall) treatment-control difference in blood pressure?
       [(554 x -6.2) + (304 x -7.7) + (39 x -0.1)] / (554 + 304 + 39) = -6.4 mm Hg
       Study | N   | Mean difference (mm Hg) | 95% CI
       A     | 554 | -6.2                    | -6.9 to -5.5
       B     | 304 | -7.7                    | -10.2 to -5.2
       C     | 39  | -0.1                    | -6.5 to 6.3
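A short Python sketch reproducing the two averages on slides 27 and 28 from the three blood pressure studies (the sample sizes and mean differences are the ones on the slide; the variable names are illustrative):

```python
# (sample size, mean treatment-control difference in mm Hg) for studies A, B, C
studies = [(554, -6.2), (304, -7.7), (39, -0.1)]

simple = sum(md for _, md in studies) / len(studies)
weighted = sum(n * md for n, md in studies) / sum(n for n, _ in studies)

print(f"simple average:   {simple:.1f} mm Hg")    # about -4.7
print(f"weighted average: {weighted:.1f} mm Hg")  # about -6.4 (weighted by sample size)
```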
    29. General Formula: Weighted Average Effect Size (d+)
       d+ = [sum of (w_i x d_i) over the k studies] / [sum of w_i over the k studies]
       Where:
       • d_i = effect size of the i-th study
       • w_i = weight of the i-th study
       • k = number of studies
    30. Calculation of Weights
       • Generally is the inverse of the variance of the treatment effect (which captures both study size and precision)
       • Different formula for odds ratio, risk ratio, and risk difference
       • Readily available in books and software
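As a rough sketch of the inverse-variance weighting described on the last two slides, the standard error of each blood pressure study can be backed out of its 95% confidence interval (half-width divided by 1.96), and each weight is then 1/SE². The numbers come from the three studies shown earlier; the approach, not the exact output, is the point:

```python
# (mean difference, lower 95% CI, upper 95% CI) for studies A, B, C
studies = [(-6.2, -6.9, -5.5), (-7.7, -10.2, -5.2), (-0.1, -6.5, 6.3)]

weights, effects = [], []
for md, lo, hi in studies:
    se = (hi - lo) / (2 * 1.96)   # standard error recovered from the 95% CI
    weights.append(1 / se**2)     # inverse-variance weight
    effects.append(md)

pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
print(f"inverse-variance weighted average: {pooled:.1f} mm Hg")
```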
    31. Heterogeneity (Diversity)
       • Is it reasonable?
         – Are the characteristics and effects of studies sufficiently similar to estimate an average effect?
       • Types of heterogeneity:
         – Clinical diversity
         – Methodological diversity
         – Statistical heterogeneity
    32. Clinical Diversity
       • Are the studies of similar treatments, populations, settings, design, et cetera, such that an average effect would be clinically meaningful?
    33. Example: A Meta-analysis With a Large Degree of Clinical Diversity
       • 25 randomized controlled trials compared endoscopic hemostasis with standard therapy for bleeding peptic ulcer.
       • 5 different types of treatment were used: monopolar electrode, bipolar electrode, argon laser, neodymium-YAG laser, and sclerosant injection.
       • 4 different conditions were treated: active bleeding, a nonspurting blood vessel, no blood vessels seen, and undesignated.
       • 3 different outcomes were assessed: emergency surgery, overall mortality, and recurrent bleeding.
       Sacks HS, et al. JAMA 1990;264:494-9.
    34. Methodological Diversity
       • Are the studies of similar design and conduct such that an average effect would be clinically meaningful?
    35. Statistical Heterogeneity
       • Is the observed variability of effects greater than that expected by chance alone?
       • Two statistical measures are commonly used to assess statistical heterogeneity:
         – Cochran’s Q statistic
         – The I² index
    36. Cochran’s Q Statistic: Chi-Square (χ²) Test for Homogeneity
       Q = Σ w_i (d_i - d+)², summed over the k studies
       where d_i = effect measure of the i-th study and d+ = weighted average
       The Q statistic measures between-study variation.
    37. The I² Index and Its Interpretation
       • Describes the percentage of total variation in study estimates that is due to heterogeneity rather than to chance
       • Value ranges from 0 to 100 percent
         – A value of 25 percent is considered to be low heterogeneity, 50 percent to be moderate, and 75 percent to be large
       • Is independent of the number of studies in the meta-analysis; it could be compared directly between meta-analyses
       Higgins JP, et al. BMJ 2003;327:557-60.
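To make the two heterogeneity measures above concrete, here is a hedged Python sketch that computes Cochran's Q, its p-value, and I² = max(0, (Q - df)/Q) x 100% from a set of study effects and inverse-variance weights. The three blood pressure studies from the earlier slides are reused purely as illustrative input, and SciPy is assumed to be available for the chi-square tail probability:

```python
from scipy.stats import chi2

def q_and_i2(effects, weights):
    """Cochran's Q and the I-squared index for inverse-variance weighted effects."""
    k = len(effects)
    d_plus = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - d_plus) ** 2 for w, d in zip(weights, effects))
    p_value = chi2.sf(q, df=k - 1)          # the test has low sensitivity; alpha = 0.1 is often used
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return q, p_value, i2

# Illustrative input: mean differences and inverse-variance weights (1/SE^2)
# for the three blood pressure studies shown earlier.
effects = [-6.2, -7.7, -0.1]
ses = [0.357, 1.276, 3.265]                  # SEs backed out of the 95% CIs
weights = [1 / se ** 2 for se in ses]
q, p, i2 = q_and_i2(effects, weights)
print(f"Q = {q:.2f}, p = {p:.3f}, I^2 = {i2:.0f}%")
```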
    38. Example: A Fixed Effect Model
       • Suppose that we have a container with a very large number of black and white balls.
       • The ratio of white to black balls is predetermined and fixed.
       • We wish to estimate this ratio.
       • Now, imagine that the container represents a clinical condition and the balls represent outcomes.
    39. Random Sampling From a Container With a Fixed Number of White and Black Balls (Equal Sample Size)
    40. Random Sampling From a Container With a Fixed Number of Black and White Balls (Different Sample Size)
    41. Different Containers With Different Proportions of Black and White Balls (Random Effects Model)
    42. Random Sampling From Containers To Get an Overall Estimate of the Proportion of Black and White Balls
    43. Statistical Models of Combining 2x2 Tables
       • Fixed effect model: assumes a common treatment effect.
         – For the inverse variance weighted method, the precision of the estimate determines the importance of the study.
         – The Peto and Mantel-Haenszel methods are noninverse variance weighted fixed effect models.
       • Random effects model: in contrast to the fixed effect model, accounts for between-study variation in addition to within-study variation.
         – The most popular random effects model in use is the DerSimonian and Laird inverse variance weighted method, which calculates the sum of the within-study variation and the among-study variation.
         – Random effects models can also be implemented with Bayesian methods.
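For readers who want to see the mechanics, here is a hedged Python sketch of the DerSimonian and Laird approach named above: it estimates the between-study variance from Cochran's Q and then re-weights each study by the inverse of (within-study variance + between-study variance). It assumes the effects are on a scale where a normal approximation is reasonable (e.g., log odds ratios); the input values at the bottom are made up for illustration.

```python
def dersimonian_laird(effects, variances):
    """Random effects pooled estimate via the DerSimonian and Laird method.
    effects: study effect sizes (e.g., log odds ratios)
    variances: within-study variances of those effects"""
    w = [1 / v for v in variances]                      # fixed effect (inverse variance) weights
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    k = len(effects)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance estimate
    w_star = [1 / (v + tau2) for v in variances]        # random effects weights
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    return pooled, se, tau2

# Illustrative use with made-up log odds ratios and within-study variances:
pooled, se, tau2 = dersimonian_laird([-0.3, -0.1, -0.5], [0.01, 0.02, 0.015])
print(f"pooled = {pooled:.2f}, 95% CI {pooled - 1.96*se:.2f} to {pooled + 1.96*se:.2f}, tau^2 = {tau2:.3f}")
```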
    44. Example Meta-analysis Where the Fixed and Random Effects Models Yield Identical Results
    45. Example Meta-analysis Where Results From the Fixed and Random Effects Models Will Differ
       Gross PA, et al. Ann Intern Med 1995;123:518-27. Reprinted with permission from the American College of Physicians.
    46. Weights of the Fixed Effect and Random Effects Models
       Fixed effect weight: w_i = 1 / v_i
       Random effects weight: w_i* = 1 / (v_i + v*)
       where: v_i = within-study variance; v* = between-study variance
    47. Commonly Used Statistical Methods for Combining 2x2 Tables
       Fixed effect model:
         – Odds ratio: Mantel-Haenszel, Peto, exact, inverse variance weighted
         – Risk ratio: Mantel-Haenszel, inverse variance weighted
         – Risk difference: inverse variance weighted
       Random effects model:
         – Odds ratio: DerSimonian and Laird
         – Risk ratio: DerSimonian and Laird
         – Risk difference: DerSimonian and Laird
    48. Dealing With Heterogeneity
       Lau J, et al. Ann Intern Med 1997;127:820-6. Reprinted with permission from the American College of Physicians.
    49. Summary: Statistical Models of Combining 2x2 Tables
       • Most meta-analyses of clinical trials combine treatment effects (risk ratio, odds ratio, risk difference) across studies to produce a common estimate, by using either a fixed effect or random effects model.
       • In practice, the results from using these two models are similar when there is little or no heterogeneity.
       • When heterogeneity is present, the random effects model generally produces a more conservative result (smaller Z-score) with a similar estimate but also a wider confidence interval; however, there are rare exceptions of extreme heterogeneity where the random effects model may yield counterintuitive results.
    50. Caveats
       • Many assumptions are made in meta-analyses, so care is needed in the conduct and interpretation.
       • Most meta-analyses are retrospective exercises, suffering from all the problems of being an observational design.
       • Researchers cannot make up missing information or fix poorly collected, analyzed, or reported data.
    51. Key Messages
       • Basic meta-analyses can be easily carried out with readily available statistical software.
       • Relative measures are more likely to be homogeneous across studies and are generally preferred.
       • The random effects model is the appropriate statistical model in most instances.
       • The decision to conduct a meta-analysis should be based on:
         – a well-formulated question,
         – appreciation of the heterogeneity of the data, and
         – understanding of how the results will be used.
    52. References (I)
       • Charig CR, Webb DR, Payne SR, et al. Comparison of treatment of renal calculi by operative surgery, percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy. BMJ 1986;292:879-82.
       • Gross PA, Hermogenes AW, Sacks HS, et al. The efficacy of influenza vaccine in elderly persons: a meta-analysis and review of the literature. Ann Intern Med 1995;123:518-27.
       • Higgins JPT, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557-60.
       • Lau J, Ioannidis JPA, Schmid CH. Quantitative synthesis in systematic review. Ann Intern Med 1997;127:820-6.
    53. References (II)
       • ISIS-2 (Second International Study of Infarct Survival) Collaborative Group. Randomized trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 1988;2:349-60.
       • Sacks HS, Chalmers TC, Blum AL, et al. Endoscopic hemostasis: an effective therapy for bleeding peptic ulcers. JAMA 1990;264:494-9.
    54. Authors
       • This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts Medical Center Evidence-based Practice Center.
       • The information in this module is based on Chapter 9 in Version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (available at: http://www.effectivehealthcare.ahrq.gov/repFiles/2007_10DraftMethodsGuide.pdf).
