ANNALS, AAPSS, 578, November 2001

Meta-Analytic Methods for Criminology

By DAVID B. WILSON

ABSTRACT: Meta-analysis was designed to synthesize empirical relationships across studies, such as the effects of a specific crime prevention intervention on criminal offending behavior. Meta-analysis focuses on the size and direction of effects across studies, examining the consistency of effects and the relationship between study features and observed effects. The findings from meta-analysis not only reveal robust empirical relationships but also identify existing weaknesses in the knowledge base. Furthermore, meta-analytic results can easily be translated into summary statistics useful for informing public policy regarding effective crime prevention efforts.

David B. Wilson is an assistant professor of the administration of justice at George Mason University. His research interests include program evaluation research methodology, meta-analysis, crime and general problem behavior prevention programs, and juvenile delinquency intervention effectiveness.

NOTE: This work was supported by the Jerry Lee Foundation.
IMAGINE you are given the task of synthesizing what is currently known about the effectiveness of correctional boot camps for reducing future criminal behavior among juvenile and adult offenders. An exhaustive search for all relevant evaluations of boot camp programs compared with more traditional forms of punishment and rehabilitation identifies 29 unique studies. The findings from these studies range from large positive to large negative statistically significant effects. To complicate matters, the studies vary in the evaluation methods used, including the definition of recidivism (for example, rearrest, reconviction, and reinstitutionalization), offender populations, and program characteristics. How will you meaningfully make sense of this array of information?

The statistical methods of meta-analysis were designed specifically to address this situation. Meta-analysis represents a statistical and systematic approach to reviewing research findings across multiple independent studies. As such, meta-analyses are systematic reviews (Petrosino et al. 2001 [this issue]). However, not all criminological intervention research literatures can be successfully meta-analyzed, and thus not all systematic reviews will use the statistical methods of meta-analysis.

The basic idea behind meta-analysis dates back almost 100 years and is simple. Karl Pearson, the developer of the Pearson product-moment correlation coefficient, synthesized the findings from multiple studies of the effectiveness of inoculation for typhoid fever (Pearson 1904). His method involved computing the correlation between inoculation and mortality within each study and then averaging the correlations across studies, producing a composite correlation. By today's standards, this was a meta-analysis, although the term was not introduced until the 1970s (Glass 1976).

The logical framework of meta-analysis is based on the assumption that the averaging of findings across studies will produce a more valid estimate of the effect of interest than that of any individual study. Typically, the finding from any individual study is imprecise due to sampling error. Thus some studies of a specific phenomenon, such as the effectiveness of correctional boot camps, will overestimate and others will underestimate the size of the true effect. Instability in observed effects due to sampling error is an assumption at the core of statistical inference testing, such as a t test between an intervention and comparison condition. Averaging across studies is analogous to averaging across individuals within a single study or averaging across multiple test items.

For a collection of pure replications, the logic behind meta-analysis is indisputable if one accepts the logic and assumptions of the standard statistical practices of the social and medical sciences. Meta-analysis as it is applied in criminology and the other social sciences extends this logic to collections of studies that are conceptual replications, that is, studies that examine the same relationship of interest but differ from one
another in other respects, such as the research design or elements of the intervention. Conceptual replications are assumed to be estimating the same fundamental relationship, despite differences in methodology and other substantive features. This variability in study features can be viewed as a strength, however, because a synthesis of conceptual replications can show that a relationship is observed across a range of methodological and substantive variability. Unlike sampling error, however, errors in estimates of the relationship of interest that arise from poor study design will not necessarily cancel out as a result of aggregation. Therefore the meta-analyst must carefully assess the influence of methodological variation on observed effects (Wilson and Lipsey, in press).

WHY META-ANALYSIS?

Meta-analysis is not the only method of synthesizing or reviewing results across studies. Other approaches include the narrative and vote-count review. The narrative review relies on a researcher's ability to digest the array of findings across studies and arrive at a pronouncement regarding the evidence for or against a hypothesis using some unknown and unknowable (that is, subjective) mental calculus.

The vote-count method imposes discipline on this process by tallying the number of studies with statistically significant findings in favor of the hypothesis and the number contrary to the hypothesis (null findings). This approach is appealing, for it is objective and systematic, yet simple. Furthermore it upholds the long-standing tradition in the social sciences of allowing the statistical significance test to be the arbiter of the validity of a scientific hypothesis.

The intuitive appeal of the vote count obscures its weaknesses. First, the vote count fails to account for the differential precision of the studies being reviewed. Larger studies, all else being equal, provide more precise estimates of the relationship of interest and thus should be given greater weight in a review.

Second, the vote count fails to recognize the fundamental asymmetry of the statistical significance test. A statistically significant finding is a strong conclusion, whereas a statistically nonsignificant (null) finding is a weak conclusion. In the vote-count review, null findings are typically interpreted as evidence that the relationship of interest does not exist (for example, the intervention is not effective). This is an incorrect interpretation. Failure to reject a null hypothesis is not support for the null, merely suspended judgment. Enough null findings in the same direction are evidence that the null is false. This possibility was recognized by Fisher (1944), a strong proponent of significance testing.

Third, the vote count ignores the size of the observed effects. By focusing on statistical significance, and not the size and direction of the effect, a study with a small but statistically significant effect would be viewed as evidence favoring the hypothesis, and a study with a large nonsignificant effect would be viewed as evidence against the
hypothesis. Both studies provide evidence that the relationship is nonzero, although the strength of that evidence is weak in one of the studies. The benefits of a null hypothesis statistical significance test for interpreting a finding from an individual study do not translate into benefits when evaluating a collection of related studies.

Furthermore a counterintuitive feature of the vote-count method is that the likelihood of arriving at an incorrect conclusion increases as the number of studies on a topic increases, if the typical statistical power of the studies in that area is low. This is a common situation in criminology. For example, Lipsey and colleagues (1985) estimated that the typical power of evaluations of juvenile delinquency interventions was less than .50. A vote-count review of that literature is sure to yield misleading conclusions.

Meta-analysis avoids the pitfalls of the vote-count method by focusing on the size and direction of effects across studies, not whether the individual effects were statistically significant. The latter largely depends on the sample size of the study. Furthermore focusing on the size and direction of the effect makes better use of the data available in the primary studies, providing a mechanism for analyzing differences across studies and drawing inferences about the likely size of the true population effect of interest. The statistical methods of meta-analysis allow for an assessment of both the consistency of findings across studies and the relationship of study features with variability in effects.

As a method, meta-analysis includes all of the essential features of a systematic review (see Petrosino et al. 2001), including an exhaustive search for all relevant studies (published or not), explicit inclusion and exclusion criteria, and a coding protocol for extracting data from the studies. The distinctive feature of meta-analysis is the application of statistical techniques to the analysis of the study findings, where study findings are encoded on a common metric. The section below presents an overview of the analytic methods of meta-analysis. Several articles in this issue (MacKenzie, Wilson, and Kider 2001 [this issue]; Lipsey, Chapman, and Landenberger 2001 [this issue]) provide examples of meta-analytic methods. This article concludes with a discussion of the strengths and weaknesses of meta-analysis and guidance on when not to use meta-analysis.

A FRAMEWORK FOR META-ANALYSIS

A defining feature of meta-analysis is the effect size, that is, any index of the effect of interest that is comparable across studies. The effect size might index the effects of a treatment group relative to a comparison group or the relationship between two observed variables, such as gender and mathematical achievement or attachment to parents and delinquent behavior. In the analysis of meta-analytic data, the effect size is the dependent variable.

The need for an effect size places restrictions on what research can be meta-analyzed. The collection of
studies of interest to the reviewer must examine the same basic relationship, even if at a broad level of abstraction. At the broad end of the continuum would be a group of studies examining the effects of school-based prevention programs on delinquent behavior. At the narrow end of the continuum would be a set of replications of a study on the effects of the drug Depo-Provera on the perpetration of sexual offenses. The research designs of a collection of studies would all need to be sufficiently similar such that a comparable effect size could be computed from each. Thus most meta-analyses of intervention studies will stipulate that eligible studies use a comparison group design.

The specific effect size index used in a given meta-analysis will depend on the nature of the research being synthesized. Commonly used effect size indices for intervention research are the standardized mean difference, odds ratio, and correlation coefficient. The standardized mean difference–type effect size is well suited to two-group comparison studies (for example, a treatment versus a comparison condition) with continuous or dichotomous dependent measures. The odds ratio is well suited to these same research domains with the exception that the dependent measures must be dichotomous, such as whether the participants recidivated within 12 months of leaving the program. The correlation coefficient can be applied to the broadest range of research designs, including all designs for which standardized mean difference and odds ratio effect sizes can be computed. Because of this, it has been argued that the correlation coefficient is the ideal effect size (Rosenthal 1991). However, the standardized mean difference and odds ratio effect sizes have distinct statistical advantages over the correlation coefficient for intervention research and are more natural indices of program effects.

Standardized mean difference

The standardized mean difference, d, represents the effect of an intervention as the difference between the intervention and comparison group means on the dependent variable of interest, standardized by the pooled within-groups standard deviation. Thus findings based on different operationalizations of the dependent variable of interest (for example, delinquency) are standardized to a common metric: standard deviation units for the population. An advantage of d is that it can be computed from a wide range of statistical data, including means and standard deviations, t tests, F tests, correlation coefficients, and 2 × 2 contingency tables (see Lipsey and Wilson 2001). Although conceptualized as the difference between two groups on a continuous dependent variable, d can also be computed from dichotomous data.

Odds ratio

The odds ratio, o, represents the effect of an intervention as the odds of a favorable (or unfavorable) outcome for the intervention group relative to the comparison group. It is used when the outcome is measured
dichotomously, such as is common in medicine and criminology. The odds ratio is easy to compute from either the raw frequencies of a 2 × 2 contingency table or the proportions of successes or failures in each condition. As a ratio of two odds, a value of 1 indicates an equal likelihood of a successful outcome, whereas values between 1 and 0 indicate a negative effect and values greater than 1 indicate a positive effect. Unlike the correlation coefficient, the odds ratio is unaffected by differential base rates (the marginal distribution) for the outcome across studies (see Farrington and Loeber 2000), thus eliminating a potential source of effect variability across studies.

Correlation coefficient

The correlation coefficient is a widely used and widely understood statistic within the social sciences. It can be used to represent the relationship between two dichotomous variables, a dichotomous and a continuous variable, and two continuous variables. The correlation coefficient has a distinct disadvantage, however, when one or both of the variables on which it is based are dichotomous (Farrington and Loeber 2000). For example, the correlation coefficient is restricted to less than +1 in absolute value if the percentage of participants in the intervention and comparison conditions is not split fifty-fifty. Thus it is recommended that it only be used for meta-analyses of correlational research and that meta-analyses of intervention studies use either the standardized mean difference, the odds ratio, or a more specialized effect size (for a discussion of other alternatives, see Lipsey and Wilson 2001).

ANALYSIS OF META-ANALYTIC DATA

A typical meta-analysis extracts one or more effect sizes per study and codes a variety of study characteristics to represent the important substantive and methodological differences across studies. Before analysis of the data, statistical transformations and adjustments may need to be applied to the effect size. If multiple effect sizes were extracted per study, then a method of including only a single effect size per study (or sample within a study) per analysis will need to be adopted. The analysis of effect size data typically examines the central tendency of the effect size distribution and the consistency of effects across studies. Additional analyses test for the ability of study features to explain inconsistencies in effects across studies. Meta-analytic methods for performing these analyses are summarized below.

Transformations and adjustments

There are standard adjustments and transformations that are routinely applied to effect sizes, and optional adjustments may be applied depending on the purpose of the meta-analysis. For example, Hedges (1982; Hedges and Olkin 1985) showed that the standardized mean difference effect size is positively biased when based on a small sample; that is, it is too large in absolute value, and the bias increases as sample size decreases. The size of bias is
very modest for all but very small sample sizes, but the adjustment is easy to perform and routinely done when using d as the effect size index (for formulas, see the appendix).

When using the odds ratio, one encounters a complication that is also easily rectified. The odds ratio is asymmetric, with negative relationships represented as values between 0 and 1 and positive relationships represented as values between 1 and infinity. This complicates analysis. Fortunately, the natural logarithm of the odds ratio is symmetric about 0 with a well-defined standard error. The importance of the latter is discussed below. Thus, for purposes of analysis, the odds ratio is transformed into the logged odds ratio. Results can be transformed back into odds ratios for purposes of interpretation using the antilogarithm.

Similarly the correlation coefficient has a distributional shape that is less than ideal for purposes of computing averages. Furthermore the standard error is asymmetric, particularly as the correlation approaches –1 or +1. This is easily solved by applying Fisher's Zr transformation, which normalizes the correlation and results in a standard error that is remarkably simple. As with the odds ratio, final results can be transformed back into correlation coefficients for interpretative purposes.

Hunter and Schmidt (1990) proposed adjusting effect sizes for measurement unreliability and invalidity, range restriction, and artificial dichotomization. These adjustments, however, depend on information that is rarely reported for outcome measures in crime and justice evaluation studies, such as reliability and validity coefficients. The logic of these adjustments is to estimate what would have been observed under more ideal research conditions. These adjustments, while common in meta-analyses of measurement generalizability studies, are rarely used in meta-analyses of intervention research. If they are used, it is recommended that a sensitivity analysis be performed to assess the effect the adjustments have on the results.

Statistical independence among effect sizes

A complication with effect size data is the often numerous effect sizes of interest available from each study. Effect sizes that are based on the same sample of individuals (or other units of analysis, such as city blocks and so forth) are statistically dependent, that is, correlated with each other. Meta-analytic analysis assumes that each data point (effect size in this case) is statistically independent of all other data points.¹ Thus we can include only one effect size per sample in any given analysis. An independent set of effect sizes can be obtained through several strategies. First, each major outcome construct of interest can, and should, be analyzed separately. For example, effect sizes representing employment success should be analyzed separately from those representing criminal behavior. Second, multiple effect sizes within each outcome construct can be averaged to produce one effect size per study or sample within a study. Alternatively, a meta-analyst may choose a single effect size
based on an explicit criterion. That is, the meta-analyst may prefer rearrest data over reinstitutionalization data if the former are available. Finally, the meta-analyst may randomly select among those effect sizes that are of interest to a given analysis. Note that several analyses can be performed, each with a different set of independent effect sizes.

The inverse variance weight

An additional complication of meta-analytic data is the differential precision in effect sizes across studies. Effect sizes based on large samples, all other things being equal, are more precise than effect sizes based on small samples. A simple solution to this problem would be to weight each effect size by its sample size. Hedges (1982) showed, however, that the optimal weight is based on the variance (squared standard error) of each effect size. This is intuitively appealing as well, for the standard error is a statistical expression of the precision of a parameter, such as an effect size. The smaller the standard error, the more precise is the effect size. Thus, in all meta-analytic analyses, weights are computed from the inverse of the squared standard error of the effect size. This is called the inverse variance weight method. Equations for the inverse variance weight for each of the three effect size indices discussed above are presented in the appendix.

The mean effect size and related statistics

A starting point for the analysis of effect size data is the computation of the overall mean effect size, computed as a weighted mean, weighting by the inverse variance weight. A z test can be performed to assess whether the mean effect size is statistically greater than (or less than) 0, and a confidence interval can be constructed around the mean effect size. Both statistics rely on the standard error of the mean effect size, computed from the sum of the weights. Thus both the precision and number of the individual effect sizes influence the precision of the mean effect size. (For equations, see the appendix.)

The mean effect size is meaningful only if the effects are consistent across studies, that is, statistically homogeneous. If the effects are highly heterogeneous, then a single overall mean effect size does not adequately represent the effects observed by the collection of studies. In meta-analysis, consistency in effects is assessed with the homogeneity statistic Q. A statistically significant Q indicates that the observed variability in effect sizes exceeds statistical expectations regarding the variability that would be observed across pure replications, that is, if the collection of studies were indeed estimating a common population effect size. A statistically nonsignificant Q suggests that the variability in effects across studies is no greater than expected due to sampling error.

A heterogeneous distribution (a significant Q) is often the desired outcome of a homogeneity analysis. Heterogeneity justifies the exploration of the relationship between study features and effects, an important
aspect of meta-analysis. The analytic approaches available to the meta-analyst for examining between-studies effects are an analysis of mean effect sizes by a categorical study feature, analogous to a one-way ANOVA, and a meta-analytic regression analysis approach. Both approaches rely on inverse variance weighting, and both can be implemented under the assumptions of a fixed- or random-effects model. The assumptions of these models will be discussed below.

Categorical analysis of effect sizes: The analog to the ANOVA

The analog to the ANOVA-type analysis is used to examine the relationship between a single categorical variable, such as treatment type or research method, and effect size. There may be as few as two categories, in which case the analysis is conceptually similar to a t test, or many categories. A separate mean effect size and associated statistics, such as a z test and confidence interval, are computed for each category of the variable of interest. To test whether the mean effect sizes differ across categories, a Q between groups is calculated (see the appendix). Although this statistic is distributed as a chi-square, it is interpreted in the same fashion as an F from a one-way ANOVA. A significant Q between groups indicates that the variability in the mean effect sizes across categories is greater than expected due to sampling error. Thus the category is related to effect size. Examination of confidence intervals provides evidence of the source of the important difference(s).

As with the overall distribution, the residual distribution of effects within categories may be homogeneous or heterogeneous. This is tested with the Q within statistic (see the appendix). A homogeneous Q within indicates that the categorical variable explained the excess variability detected by the overall homogeneity test. In this case, the categorical variable provides an explanation for the variability in effects across studies. Alternatively, additional sources of variability in effects exist if the Q within is significant.

The computation of the analog to the ANOVA can be tedious. Macros that work with existing statistical software packages exist for performing this analysis (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). BioStat (2000) has created a meta-analysis program that among other features performs the analog to the ANOVA analysis.

Meta-analytic regression analysis

The analog to the ANOVA is limited to a single categorical variable. A more flexible and general analytic strategy for assessing the relationship between study features and effect size is regression analysis. Regression analysis can incorporate multiple independent variables (study features) in a single analysis, including continuous variables and categorical variables (via dummy coding). The differences between ordinary least squares regression and meta-analytic regression are the weighting by the inverse variance and a modification to the standard error of the regression coefficients,
necessitating the use of specialized software (for example, Lipsey and Wilson 2001; Wang and Bushman 1998). As with the analog to the ANOVA, two Q values are calculated as part of meta-analytic regression: a Q for the model and a Q for the residual or error variance. The former is a test of the predictive ability of the study features in explaining between-studies variability in effects. The regression model accounts for significant variability in the effect size distribution if the Q for the model is significant. As with the Q within for the analog to the ANOVA, a significant Q for the error variance indicates that excess variability remains in the effects across studies after accounting for the variability explained by the regression model. That is, the residual distribution in effect sizes is heterogeneous.

Recognizing the correlational nature of the above analyses of the relationship between study features and effect size is critical. Study features are often correlated with one another and, as such, a moderating relationship may be the result of confounded between-studies features. For example, the mean effect size for treatment type A may be higher than the mean effect size for treatment type B. The studies examining treatment type B, however, may have used a less sensitive measure of the outcome construct, thus confounding treatment type with characteristics of the dependent variable. Multivariate analyses can help assess the interrelationships between study features, but these analyses cannot account for unmeasured study characteristics.

Fixed- and random-effects models

The statistical model presented above assumes that the collection of effect sizes being analyzed is estimating a common population effect size. In statistical terms, this is a fixed-effects model. Stated differently, a fixed-effects model assumes that each effect size differs from the true population effect size solely due to subject-level sampling error. Each observed effect size is viewed as an imperfect estimate of the true, single population effect for the intervention of interest. This provides the theoretical basis for incorporating the standard error of the effect size (an estimate of subject-level sampling error) into the analysis as the inverse variance weight.

This assumption is restrictive and likely to be untenable in many syntheses of criminological intervention research where studies of a common research hypothesis differ on many dimensions, some of which are likely to be related to effect size. Thus each effect size has variability (that is, instability) due to subject-level sampling error and study-level variability. The random-effects model assumes that at least some portion of the study-level variability is unexplained by the study features included in the statistical models of effect size. These study differences may simply be unmeasured, or they may be unmeasurable. In both cases, each effect size is assumed to estimate a true population effect size for that study, and the collection of true population effect sizes represents a random distribution of effects. In
statistical terms, this is a random-effects model.

Methods for estimating random-effects models in meta-analysis are well developed. The basic method involves modifying the definition of the inverse variance weight such that it incorporates both the subject- and study-level estimates of instability. The inverse variance weight is thus based on both the standard error of the effect size and an estimate of the variability in the distribution of population effects. The latter is computed from the observed distribution of effects. Random-effects models are more conservative than fixed-effects models. Confidence intervals will be larger, and regression coefficients that were statistically significant under a fixed-effects model may no longer be significant under a random-effects model. It is recommended that meta-analyses of criminological literatures use a random-effects model of analysis unless a clear justification to do otherwise exists.

Sensitivity analysis

A final analytic issue is the sensitivity of the results to unusual study effects and decisions made by the meta-analyst. First, it is wise to examine the influence of outliers in the distribution of effect sizes and the distribution of inverse variance weights. A modest effect size outlier with a large weight can drive an analysis. Rerunning an important analysis with and without highly influential studies can help verify that the observed result is not solely a function of a single unusual study. Second, the method of selecting one effect size per study for any given analysis may also affect the meta-analytic findings. For example, in the boot camp systematic review by MacKenzie, Wilson, and Kider (2001), the analyses were performed on a single effect size selected from each study based on a set of decision rules. A sensitivity analysis showed that using a composite of all recidivism effect sizes produced the same results, bolstering the authors' confidence in the findings. Third, if the meta-analysis has included methodologically weak studies, analyses examining the relationship between method features and observed effects are essential.

Illustration: Cognitive-behavioral programs for sex offenders

To illustrate the methods outlined above, I have selected a subset of studies included in a meta-analysis of sex offender programs (Gallagher, Wilson, and MacKenzie no date). Presented below are the programs based on cognitive-behavioral principles. Studies were included if they used a comparison group design and the comparison received either no treatment or non-sex-offender-specific treatment. Studies also had to report a measure of sex offense recidivism at some point following termination of the program.

A total of 13 studies met the eligibility criteria for this meta-analysis. The recidivism data were dichotomous and as such, the odds ratio was selected as the effect size index. The odds ratio and 95 percent confidence interval for these 13 studies are presented in Figure 1. Visual inspection of these odds ratios shows a distinct
The odds ratio and 95 percent confidence interval for these 13 studies are presented in Figure 1.

FIGURE 1
ODDS RATIO AND 95 PERCENT CONFIDENCE INTERVAL FOR EACH OF THE 13 COGNITIVE-BEHAVIORAL SEX OFFENDER EVALUATION STUDIES

[Forest plot; odds ratios on a logarithmic scale from .02 to 200, with values left of 1 favoring the comparison condition and values right of 1 favoring the intervention. Studies shown: Borduin, Henggeler, Blaske & Stein (N = 16); McGrath, Hoke & Vojtisek (N = 103); Hildebran & Pithers (N = 90); Marshall, Eccles & Barbaree (N = 38); Studer, Reddon, Roper & Estrada (N = 220); Nicholaichuk, Gordon, Andre & Gu (N = 579); Gordon & Nicholaichuk (N = 206); Guarino & Kimball (N = 75); Marques, Day, Nelson & West (N = 229); Huot (N = 224); Gordon & Nicholaichuk (N = 1248); Song & Lieb (N = 278); Nicholaichuk (N = 65); and the overall mean odds ratio.]
NOTE: Sources of programs are available from the author.

Visual inspection of these odds ratios shows a distinct positive trend, with 12 of the 13 studies observing lower recidivism rates (and hence odds ratios greater than 1) for the sex offender treatment condition than for the comparison condition. The sole study with a negative effect (an odds ratio between 0 and 1) had a large confidence interval that extended well into the positive range and was from a study of poor methodological quality.

The weighted mean odds ratio for this collection of 13 studies was 2.33, and the 95 percent confidence interval was 1.57 to 3.42. The z test indicates that this odds ratio was statistically significant at conventional levels, z = 4.26, p < .001. This collection of studies supports the conclusion that cognitive-behavioral programs for sex offenders reduce the risk of a sexual reoffense. The homogeneity statistic was significant, indicating that the findings are not consistent across studies and may be related to study features, Q = 21.99, df = 12, p < .05.

This collection of studies differed in many ways, both in the research methods used and in the specifics of the sex offender treatment program. Many of these 13 studies evaluated a cognitive-behavioral approach called relapse prevention. Relapse prevention programs may be more (or less) effective than other cognitive-behavioral programs. To explore this, the mean effect size was calculated separately for relapse prevention programs and for other cognitive-behavioral programs (2.41 and 1.73, respectively). Also calculated were the Q between and the Q within. The Q between was 0.87, p > .05, indicating that the observed difference between these two means was not statistically significant. The Q within was statistically significant, Q-within = 21.12, df = 11, p = .03, indicating that significant variability remained after accounting for treatment type.
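The summary statistics just reported (the weighted mean, its z test, the overall Q, and the Q-between/Q-within partition) all follow the formulas collected in the appendix. A sketch of the computations, using invented log odds ratios and variances rather than the actual 13-study data:

```python
import math

def fixed_effects_summary(es, v):
    """Weighted mean effect size, z test, and homogeneity Q
    (appendix equations 12 through 18)."""
    w = [1.0 / vi for vi in v]             # inverse variance weights
    sw = sum(w)
    swe = sum(wi * ei for wi, ei in zip(w, es))
    mean = swe / sw
    z = mean / math.sqrt(1.0 / sw)         # mean divided by its SE
    q = sum(wi * ei * ei for wi, ei in zip(w, es)) - swe ** 2 / sw
    return mean, z, q                      # Q has k - 1 degrees of freedom

def q_partition(groups):
    """Analog to the ANOVA: split total Q into Q-between and Q-within
    across moderator categories (appendix equations 21 and 22).

    groups maps a category label to (effect sizes, variances)."""
    all_es, all_v, fitted = [], [], 0.0
    for es, v in groups.values():
        w = [1.0 / vi for vi in v]
        fitted += sum(wi * ei for wi, ei in zip(w, es)) ** 2 / sum(w)
        all_es += es
        all_v += v
    _, _, q_total = fixed_effects_summary(all_es, all_v)
    w_all = [1.0 / vi for vi in all_v]
    grand = sum(wi * ei for wi, ei in zip(w_all, all_es)) ** 2 / sum(w_all)
    q_between = fitted - grand
    return q_between, q_total - q_between

# Invented log odds ratios and variances for two treatment types.
groups = {
    "relapse prevention": ([0.9, 0.7, 1.1], [0.20, 0.15, 0.30]),
    "other cognitive-behavioral": ([0.5, 0.6], [0.25, 0.10]),
}
q_between, q_within = q_partition(groups)
```

Note that the averaging is done on the log odds ratio; the mean is transformed back to an odds ratio with the exponential function (appendix equation 7) before reporting.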
A regression analysis was performed to test whether the differential lengths of follow-up across studies and the different definitions of recidivism could account for the heterogeneity. The regression coefficient for whether the recidivism was measured at least five years posttreatment was statistically significant and positive, B = 1.58, p = .01, suggesting that studies with longer follow-up periods observed larger differences in the rates of sexual offending between the treated and nontreated groups. Either the effects of sex offender programs increase over time, or the length of follow-up was related to an unmeasured program characteristic that led to greater effectiveness. The regression coefficient for whether the recidivism measure was an indicator of arrest or reconviction was also statistically significant, B = 1.25, p = .04, suggesting that arrest may be a more sensitive measure of the program effects. Significant variability in the effect size distribution was accounted for by this regression model, Q-model = 7.05, df = 3, p = .03. Furthermore, the Q associated with the residual variability in effect sizes was not statistically significant, Q-residual = 14.9, df = 10, p = .13, indicating that the residual variability in effects is not greater than would be expected due to sampling error.

INTERPRETATION OF META-ANALYTIC FINDINGS

A researcher who finds a statistically significant effect is presented with the difficult task of deciding whether the effect is meaningful from a practical or clinical perspective. That is, is the effect "significant" in the everyday meaning of that word? Meta-analysts are confronted with the same problem. What is the practical significance of an observed mean effect size? A common approach to addressing this problem is the translation of the effect size into a success rate differential for the intervention and comparison conditions, such as using the binomial effect size display (Rosenthal and Rubin 1983). For example, a standardized mean difference effect size of .40 is equivalent to a success rate differential of 20 percent (that is, 40 percent recidivism in the intervention condition and 60 percent recidivism in the comparison condition). If the audience for the meta-analysis is not familiar with standardized mean difference effect sizes, then the success rate differential provides a useful method of understanding the practical significance of the observed findings.

The odds ratio has a natural interpretation without transformation: the odds ratio is the odds of a successful outcome in the treated condition relative to the comparison condition. Thinking about odds is, however, odd for all but the more mathematically inclined. As with the standardized mean difference, a mean odds ratio can be translated into percentages of successes (or failures). This translation requires "fixing" the failure rate for one of the conditions. For example, if we assume a 50 percent recidivism rate for the comparison condition, then an odds ratio of 1.5 translates into a recidivism rate of 40 percent in the treatment condition.
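Both translations are simple enough to script. The sketch below implements the binomial effect size display for a standardized mean difference (Rosenthal and Rubin 1983) and the fixed-rate translation of an odds ratio described above; the function names are illustrative, not from any standard library.

```python
import math

def besd_rates(d):
    """Binomial effect size display: convert a standardized mean
    difference d to a correlation r, then to success rates of
    .5 + r/2 and .5 - r/2 for the two conditions."""
    r = d / math.sqrt(d * d + 4)
    return 0.5 + r / 2, 0.5 - r / 2

def treatment_success_rate(odds_ratio, comparison_rate):
    """Success rate in the treatment condition implied by an odds
    ratio, holding the comparison condition's success rate fixed."""
    odds = odds_ratio * comparison_rate / (1 - comparison_rate)
    return odds / (1 + odds)

# A d of .40 yields success rates near 60 and 40 percent, the
# 20-point success rate differential noted in the text.
high, low = besd_rates(0.40)

# The article's odds ratio example: OR = 1.5 with a 50 percent
# comparison recidivism rate. Success (staying offense free) rises
# from .50 to .60, so treatment recidivism is 40 percent.
recidivism = 1 - treatment_success_rate(1.5, 0.50)
print(round(recidivism, 2))   # 0.4
```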
Presenting the results of a meta-analysis of odds ratios as percentages provides a means of assessing the magnitude of the observed program effects.

ADVANTAGES AND DISADVANTAGES OF META-ANALYSIS

Meta-analysis has several distinct advantages over alternative forms of reviewing empirical research. As a systematic method of review, meta-analysis is replicable by independent researchers. The methods are explicit and open to the scrutiny of other scholars, who may question the inclusion and exclusion criteria and critique the variables used to examine between-studies differences. This can lead to productive debates and competing analyses of the meta-analytic data. In addition, meta-analysis makes efficient use of the information contained in the primary studies. Focusing on the direction and magnitude of the findings across studies using a common statistical benchmark allows for the exploration of relationships between study features and effects that would not otherwise be observable. The statistical methods of meta-analysis help guard against interpreting the dispersion in results as meaningful when it can just as easily be explained as sampling error. Finally, meta-analysis can handle a much larger number of studies than could effectively be summarized with alternative methods. There is no theoretical limit to the number of studies that can be incorporated into a single meta-analysis, yet as a method it can also be applied to a small number of similar studies.

As a practitioner of meta-analysis, I see few justified disadvantages to its use. This does not mean that meta-analysis is without disadvantages. On the practical side, meta-analysis is far more time-consuming than traditional forms of review and requires a moderate level of statistical sophistication. Meta-analysis also simplifies the findings of the individual studies, often representing each study as a single effect size and a small set of descriptor variables. Complex patterns of effects often found in individual studies, such as the results from individual growth-curve modeling, do not lend themselves to synthesis. To accommodate this, a reviewer may wish to augment a meta-analytic review with narrative descriptions of important studies and of interesting study-level findings obscured in the meta-analytic synthesis. Finally, the methods of meta-analysis cannot overcome weaknesses in the primary studies. If the research base that examines the hypothesis of interest is methodologically weak, then the findings from the meta-analysis will also be weak. Even in these situations, however, meta-analysis creates a solid foundation for the next generation of studies by clearly identifying the weaknesses of the current knowledge base on a given issue.

WHEN NOT TO DO META-ANALYSIS

Meta-analysis is the preferred method of systematically reviewing a collection of empirical studies
examining a common research hypothesis. However, meta-analysis is not appropriate for the synthesis of all empirical research literatures. First, meta-analysis cannot be used when a common effect size index cannot be computed across the studies of interest. For example, the appropriate effect size for area studies (that is, studies that have a geographic area as the unit of analysis) is currently being discussed among members of the Campbell Collaboration. Second, the research designs across a collection of studies examining the relationship of interest may be too disparate for meaningful synthesis. For example, studies with different units of analysis cannot be readily meta-analyzed unless sufficient data are presented to compute an effect size at a common level of analysis. Studies with fundamentally different research designs, such as one-group longitudinal studies and comparison group studies, also should not be combined in the same meta-analysis. Third, the research question for a meta-analysis may involve a multivariate relationship. Although methods have been developed for meta-analyzing multivariate research studies (for example, Becker 1992; Becker 1996; Premack and Hunter 1988), these methods have rarely been applied and are still not well developed. It is unlikely that the more elaborate research designs will ever easily lend themselves to synthesis. Thus some research questions addressed by primary studies are not easily meta-analyzed. Finally, meta-analysis does not address broad theoretical issues that may be important to a debate regarding the value of various crime prevention efforts. Meta-analysis is designed to synthesize the evidence regarding the strength of a relationship across distinct research studies. This is a very specific task that may be embedded in a larger scholarly endeavor.

CONCLUSIONS

Systematic reviews approach the task of summarizing the findings of a collection of research studies as a research task in its own right. As a method of systematic reviewing, meta-analysis takes this a step further by quantifying the direction and magnitude of the findings of interest across studies and by using specialized statistical methods to analyze the relationship between findings and study features. Properly executed, meta-analysis provides a firm foundation for future research: empirical relationships that are well established, and areas that are underresearched or that have equivocal findings, are identified through the meta-analytic process. In addition, meta-analysis provides a defensible strategy for summarizing crime prevention and intervention efforts for informing public policy. Although the methods are technical, the findings can be translated into summary statistics readily understandable by non–social science researchers.
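As a worked companion to the random-effects model recommended earlier, the sketch below implements the method-of-moments between-study variance component and the random-effects weights given in the appendix (equations 19 and 20). The effect sizes and variances are invented for illustration.

```python
import math

def random_effects_mean(es, v):
    """Random-effects mean effect size and 95 percent confidence
    interval, using the method-of-moments between-study variance
    component (appendix equations 19 and 20).

    es: effect sizes; v: their sampling (fixed-effects) variances."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    swe = sum(wi * ei for wi, ei in zip(w, es))
    q = sum(wi * ei * ei for wi, ei in zip(w, es)) - swe ** 2 / sw
    k = len(es)
    # Between-study variance component; truncated at zero when Q < k - 1.
    v_theta = max(0.0, (q - (k - 1)) /
                  (sw - sum(wi * wi for wi in w) / sw))
    w_re = [1.0 / (vi + v_theta) for vi in v]   # random-effects weights
    mean = sum(wi * ei for wi, ei in zip(w_re, es)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Invented log odds ratios from five heterogeneous studies.
es = [0.1, 1.5, 0.3, 1.8, 0.6]
v = [0.15, 0.20, 0.10, 0.30, 0.12]
mean, lower, upper = random_effects_mean(es, v)
```

Because the variance component is added to every study's sampling variance, the random-effects confidence interval is never narrower than its fixed-effects counterpart, which is the conservatism noted in the text.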
APPENDIX
EQUATIONS FOR THE CALCULATION OF EFFECT SIZES AND META-ANALYTIC SUMMARY STATISTICS

Common effect size indices

(1) d = (X1 − X2) / s_pooled. Standardized mean difference effect size; X1 is the mean of the intervention condition, X2 is the mean of the comparison condition, and s_pooled is the pooled within-groups standard deviation.

(2) o = ad / (bc). Odds ratio effect size; a and c are the numbers of successful outcomes in the intervention and comparison conditions, and b and d are the numbers of failures in the intervention and comparison conditions (based on a 2 × 2 contingency table).

(3) r = r. Correlation coefficient effect size; r is the Pearson product-moment correlation coefficient between the two variables of interest.

Common transformations of effect sizes

(4) d′ = [1 − 3/(4N − 9)] d. Small sample size bias correction; d is the standardized mean difference effect size and N is the total sample size.

(5) lor = log(o). Log transformation of the odds ratio.

(6) z = .5 log[(1 + r)/(1 − r)]. Fisher's transformation of the correlation effect size.

(7) o = e^lor. Logged odds ratio (lor) transformed back into an odds ratio (o); e is the constant 2.7183.

(8) r = (e^(2z) − 1)/(e^(2z) + 1). Transforms the effect size z from equation 6 back into a correlation; e is the constant 2.7183.

Fixed-effects model inverse variance weights

(9) v_d = (n1 + n2)/(n1 n2) + d′²/[2(n1 + n2)]. The variance of the standardized mean difference; n1 and n2 are the sample sizes for the intervention and comparison conditions.

(10) v_lor = 1/a + 1/b + 1/c + 1/d. The variance of the logged odds ratio; a, b, c, and d are the cell frequencies of a 2 × 2 contingency table.

(11) v_z = 1/(N − 3). The variance of the Fisher's transformed correlation coefficient; N is the total sample size.

(12) w = 1/v. The inverse variance weight; v is the variance from equation 9, 10, or 11.

Mean effect size and related statistics

(13) mean ES = Σ(ES · w) / Σw. Weighted mean effect size, where ES is the effect size index (equation 4, 5, or 6) and w is the inverse variance weight (equation 12).

(14) se = sqrt(1/Σw). The standard error of the mean effect size.

(15) z = mean ES / se. A z test; tests whether the mean ES is statistically greater than or less than 0.

(16) LowerCI = mean ES − 1.96 se. Lower bound of the 95 percent confidence interval.

(17) UpperCI = mean ES + 1.96 se. Upper bound of the 95 percent confidence interval.

Homogeneity test Q

(18) Q = Σ(ES² · w) − [Σ(ES · w)]² / Σw. Homogeneity test Q; distributed as a chi-square with degrees of freedom equal to the number of effect sizes less 1.

Random-effects variance component and weight

(19) v_θ = [Q − (k − 1)] / [Σw − (Σw² / Σw)]. The random-effects variance component, where k is the number of effect sizes; the variance component has a more complex form when used as part of the analog to the ANOVA or regression models.

(20) w = 1/(v + v_θ). The random-effects inverse variance weight, where v is defined as in equations 9 through 11.

Analog to the ANOVA

(21) Q_B = Σ_j [ (Σ(ES_j · w_j))² / Σw_j ] − [Σ(ES · w)]² / Σw. Q between groups, where j runs from 1 to the number of categories of the independent variable; distributed as a chi-square with j − 1 degrees of freedom.

(22) Q_W = Q − Q_B. Q within groups, where Q is the overall homogeneity statistic defined in equation 18 and Q_B is defined in equation 21; distributed as a chi-square with the number of effect sizes minus the number of categories of the independent variable as the degrees of freedom.

Meta-analytic regression analysis

(23) Use specialized software; for example, the SAS, SPSS, and Stata macros by Lipsey and Wilson (2001) or the SAS macros by Wang and Bushman (1998).
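Equation 23 defers to specialized software, but for a single moderator the underlying computation is compact. The sketch below is a simplified fixed-effects meta-regression with invented data; the key departure from ordinary weighted least squares output is that the slope's standard error comes directly from the inverse-variance weights rather than being rescaled by the mean squared error (see Lipsey and Wilson 2001 for the general, multi-predictor case).

```python
import math

def meta_regression(es, v, x):
    """Fixed-effects meta-regression of effect sizes on one moderator.

    Returns the slope, its standard error, and the Q-model /
    Q-residual partition of the homogeneity statistic."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, es)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, es))
    b = sxy / sxx                        # weighted slope
    se_b = math.sqrt(1.0 / sxx)          # meta-analytic SE of the slope
    a = ybar - b * xbar                  # intercept
    q_model = sum(wi * (a + b * xi - ybar) ** 2
                  for wi, xi in zip(w, x))
    q_resid = sum(wi * (yi - a - b * xi) ** 2
                  for wi, xi, yi in zip(w, x, es))
    return b, se_b, q_model, q_resid

# Invented log odds ratios regressed on a follow-up dummy
# (1 = recidivism measured at five or more years posttreatment).
es = [0.3, 0.5, 1.2, 1.5, 0.8, 1.4]
v = [0.12, 0.20, 0.15, 0.25, 0.10, 0.18]
x = [0, 0, 1, 1, 0, 1]
b, se_b, q_model, q_resid = meta_regression(es, v, x)
```

Q-model and Q-residual sum to the overall Q of equation 18, and each can be referred to a chi-square distribution, as in the regression example reported in the text.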
Note

1. Methods have been developed for handling dependent effect sizes in a single analysis, but these methods are beyond the scope of this article. (For details, see Gleser and Olkin 1994; Kalaian and Raudenbush 1996.)

References

Becker, Betsy J. 1992. Models of Science Achievement: Forces Affecting Performance in School Science. In Meta-analysis for Explanation: A Casebook, ed. Thomas D. Cook, Harris Cooper, David S. Cordray, Heidi Hartmann, Larry V. Hedges, Richard J. Light, Thomas A. Louis, and Frederick Mosteller. New York: Russell Sage.
Becker, G. 1996. The Meta-Analysis of Factor Analyses: An Illustration Based on the Cumulation of Correlation Matrices. Psychological Methods 1:341-53.
BioStat. 2000. Comprehensive Meta-Analysis (Software Program, Version 1.0.9). Englewood, NJ: BioStat. Available: www.metaanalysis.com.
Farrington, David P. and Rolf Loeber. 2000. Some Benefits of Dichotomization in Psychiatric and Criminological Research. Criminal Behaviour and Mental Health 10:100-122.
Fisher, Ronald A. 1944. Statistical Methods for Research Workers. 9th ed. London: Oliver and Boyd.
Gallagher, Catherine A., David B. Wilson, and Doris Layton MacKenzie. N.d. A Meta-Analysis of the Effectiveness of Sexual Offender Treatment Programs. Unpublished manuscript, University of Maryland at College Park.
Glass, Gene V. 1976. Primary, Secondary and Meta-Analysis of Research. Educational Researcher 5:3-8.
Gleser, Leon J. and Ingram Olkin. 1994. Stochastically Dependent Effect Sizes. In The Handbook of Research Synthesis, ed. Harris Cooper and Larry V. Hedges. New York: Russell Sage.
Hedges, Larry V. 1982. Estimating Effect Size from a Series of Independent Experiments. Psychological Bulletin 92:490-99.
Hedges, Larry V. and Ingram Olkin. 1985. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.
Hunter, John E. and Frank L. Schmidt. 1990. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Kalaian, H. A. and Stephen W. Raudenbush. 1996. A Multivariate Mixed Linear Model for Meta-Analysis. Psychological Methods 1:227-35.
Lipsey, Mark W., Gabrielle L. Chapman, and Nana A. Landenberger. 2001. Cognitive-Behavioral Programs for Offenders. Annals of the American Academy of Political and Social Science 578:144-57.
Lipsey, Mark W., Scott Crosse, J. Dunkle, J. Pollard, and G. Stobart. 1985. Evaluation: The State of the Art and the Sorry State of the Science. New Directions for Program Evaluation 27:7-28.
Lipsey, Mark W. and David B. Wilson. 2001. Practical Meta-Analysis. Thousand Oaks, CA: Sage.
MacKenzie, Doris Layton, David B. Wilson, and Suzanne B. Kider. 2001. Effects of Correctional Boot Camps on Offending. Annals of the American Academy of Political and Social Science 578:126-43.
Pearson, Karl. 1904. Report on Certain Enteric Fever Inoculation Statistics. British Medical Journal 3:1243-46. Quoted in Morton Hunt, How Science Takes Stock: The Story of Meta-Analysis (New York: Russell Sage, 1997).
Petrosino, Anthony, Robert F. Boruch, Haluk Soydan, Lorna Duggan, and Julio Sanchez-Meca. 2001. Meeting the Challenges of Evidence-Based Policy: The Campbell Collaboration. Annals of the American Academy of Political and Social Science 578:14-34.
Premack, Steven L. and John E. Hunter. 1988. Individual Unionization Decisions. Psychological Bulletin 103:223-34.
Rosenthal, Robert. 1991. Meta-Analytic Procedures for Social Research. Applied Social Research Methods Series. Vol. 6. Newbury Park, CA: Sage.
Rosenthal, Robert and Donald B. Rubin. 1983. A Simple, General Purpose Display of Magnitude of Experimental Effect. Journal of Educational Psychology 74:166-69.
Wang, Morgan C. and Brad J. Bushman. 1998. Integrating Results Through Meta-Analytic Review Using SAS Software. Cary, NC: SAS Institute.
Wilson, David B. and Mark W. Lipsey. In press. The Role of Method in Treatment Effect Estimates: Evidence from Psychological, Behavioral, and Educational Treatment Intervention Meta-Analyses. Psychological Methods.