Systematic Review Process Overview This slide illustrates the steps in the systematic review process. This module focuses on data extraction.
Why Is Data Extraction Important? A structured and organized data extraction is necessary for a systematic review or meta-analysis. To perform a systematic review or meta-analysis, one has to summarize studies to facilitate synthesis. A major value of extracting data into tables that have the same format is that comparisons across studies become easier, and the reader gets a better gestalt of the evidence. Furthermore, the structured format makes it simple to transfer data to other programs for analysis. In particular, one has to identify numerical data for meta-analyses. If the necessary numbers are not directly available, one would have to calculate them from sufficient statistics that are reported or from graphs. In practice, data extraction, in-depth review, and quality assessment (assessment of bias risk) happen at the same time. It is not uncommon that studies with overt methodological problems also report inconsistent data, and this becomes evident only when one actually tries to extract data. Often, during the data extraction phase, the reviewer realizes that some information is systematically missing or incorrectly assessed; some outcomes are never studied; or some populations are underrepresented in the existing evidence.
On Data Extraction (I) The extracted data should accurately reflect information reported in the publication, remain in a form close to the original reporting so that disputes can be easily resolved, and provide sufficient information to understand the studies and to perform analyses. Extract what is needed, but avoid overdoing it; data extraction is labor intensive and can be costly and error prone. Also, keep in mind that different research questions may have different data needs.
On Data Extraction (II) Data extraction is an intellectual exercise. It involves more than just copying words and numbers from a publication to a form. Often one needs specific knowledge of the clinical domain to correctly identify the information that needs to be extracted. For example, some familiarity with sleep medicine terminology is important when trying to classify sleep monitors according to the neurophysiologic channels they record. Furthermore, methodological and statistical knowledge is necessary, especially when the numerical data that one needs are not reported directly and need to be calculated based on other reported statistics. An example is the calculation of standard errors from confidence intervals or p-values; another is the calculation of hazard ratios from log-rank statistics or survival graphs. To some extent, interpretation of published data is needed (i.e., the extractor must be able to understand what the author meant). A very common occurrence pertains to the description of statistical methods in observational studies. Because of space limitations, and in an effort to avoid statistical jargon, authors may be less than clear in describing what they did. It is also important to appreciate that what is reported is sometimes not what was carried out. Finally, data extraction and evaluation of risk of bias and of applicability typically occur at the same time.
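As a rough illustration of the back-calculations mentioned above, the following sketch recovers a standard error from a reported confidence interval or from a two-sided p-value. It assumes a normal approximation, and for ratio measures (odds ratios, hazard ratios) it assumes the interval is symmetric on the log scale. The function names are hypothetical and are not taken from any particular package.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95, log_scale=False):
    """Standard error implied by a confidence interval (normal approximation).

    Set log_scale=True for ratio measures; the result is then the
    standard error of the log ratio.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * z)


def se_from_p(estimate, p_two_sided):
    """Standard error implied by an estimate and its two-sided p-value.

    Assumes the p-value came from a z-test of the estimate against zero
    (for ratio measures, pass the log of the ratio as the estimate).
    """
    if not 0 < p_two_sided < 1:
        raise ValueError("p-value must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(estimate) / z
```

A p-value reported only as a threshold (for example, "p < 0.05") gives at best a bound on the standard error, so it is worth recording that the value was back-calculated and from what.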
Data Extraction: A Boring Task? Data extraction is a tedious task, but one learns a lot about the topic while performing it. In fact, every member of the systematic review team, including the leads and those who will perform the statistical analyses, should perform part of the data extraction on almost every systematic review topic. This way, project leads and the statistical analysts get firsthand knowledge of the idiosyncrasies of a body of literature. In addition, they can appreciate what sort of data are typically available and what data can be extracted, which can help them guide the rest of the team. Reference: Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner AM and Gurevitch J, eds. Design and analysis of ecological experiments. New York: Chapman & Hall; 1993. p. 347-70.
Comparative Effectiveness Review: Clarifying Research Terminology (II) In the parlance of the Evidence-based Practice Center Program, data extraction forms are also called evidence tables. Each study is extracted into its own set of evidence tables. Evidence tables are detailed and are typically not included in the main reports. The main reports organize studies into summary tables, which typically contain selected information. For example, a summary table may describe the characteristics of all studies that answer the first key question, and another summary table may describe the results of the same studies. Summary tables facilitate the presentation of the synthesis. The presentation of findings is discussed in detail in a separate module.
What Data To Collect? There is no single answer to this question. It depends on the key questions being asked, but also on how far the data extractors are willing to go to obtain a piece of information. We discuss more about this in later slides when we refer to digitization of graphs. One should be guided by the key questions and generate a template that captures relevant information. Typically, one would be expected to extract information on populations, interventions or exposures, comparators (when applicable), outcomes and numbers, and characteristics of study design. Of course, it is not necessary to lay out the template of the evidence tables (i.e., the data extraction forms) in this specific order. However, it is very likely that it will be meaningful to extract all of this information.
Data Elements: Population, Intervention, and Comparator Population-generic elements may include patient characteristics such as age, gender distribution, and disease stage. One may need more specific items, depending upon the topic. Choosing what data to extract to describe the intervention or exposure and comparators (when applicable) depends upon the type of study being extracted, which may be a randomized trial, an observational study, a diagnostic test study, a prognostic factor study, a family-based or population-based genetic study, et cetera. Describing the intervention in a trial is different from extracting genotypic distributions in a genetic association study.
Data Elements: Outcome (I) Of course, all outcomes should be determined a priori after discussions with the Technical Expert Panel. In practice, it is not uncommon that the protocol criteria are unclear about which outcomes to include and which to discard. For example, some studies may report mean change in ejection fraction, and other studies may report the proportion of patients with an increase in ejection fraction by at least 5 percent. It may be useful to record different definitions of “outcome” and consult with content experts before making a decision about which definition to use.
Data Elements: Outcome (II) The exact types of data that have to be extracted for meta-analysis depend upon the details of the topic at hand. Sometimes 2x2 cell counts need to be recorded. Other times, one has to record other statistics, including means and standard deviations, p-values or ranked logarithmic odds scores (for nonparametric meta-analyses), correlation coefficients, and distributional parameters. Those interested in more information on quantitative synthesis are referred to the pertinent modules.
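To show why the four cells of a 2x2 table are worth recording, here is a minimal sketch that turns extracted cell counts into an odds ratio and the standard error of its logarithm, using the standard large-sample formula. The function name and the counts in the usage note are hypothetical. The sketch assumes all four cells are nonzero and applies no continuity correction.

```python
import math


def odds_ratio_from_2x2(events_trt, nonevents_trt, events_ctl, nonevents_ctl):
    """Return (odds ratio, standard error of the log odds ratio).

    Cells: events and non-events in the treated arm, then events and
    non-events in the control arm. All four cells must be nonzero.
    """
    cells = (events_trt, nonevents_trt, events_ctl, nonevents_ctl)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    odds_ratio = (events_trt * nonevents_ctl) / (nonevents_trt * events_ctl)
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    return odds_ratio, se_log_or
```

For example, hypothetical counts of 10 events and 20 non-events in one arm versus 5 events and 40 non-events in the other give an odds ratio of 4.0.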
Data Elements: Timing and Study Design The data elements one extracts vary by type of study. For example, if one analyzes randomized trials, it may be useful to extract a series of items including number of centers (for multicenter studies), method of randomization (to assess adequacy of allocation concealment), blinding of patients and outcome assessors, funding source, whether the analysis is by intention to treat, and so on. Again, one may have to use operational definitions for some of these items. For example, the intention-to-treat principle is not perceived identically by researchers who run trials and by study authors.
Always Provide Instructions
Single Versus Double Extraction
Developing Data Extraction Forms (Evidence Tables)
Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (I) Being as thorough as possible when working with the experts/project leads to identify fields for inclusion will help minimize the need for extensive modifications of the extraction form, but it is inevitable that new material will come to light as the studies are reviewed. Reworking the tables will be necessary. The dual review process ensures that nothing is left out if new fields are added after the first round of reviews, which is very common.
Evidence Tables: Example (I) These drafts of tables on traumatic brain injury and depression illustrate how tables can change: columns are renamed or relocated, fields are moved from column to column, fields are added or completely removed, et cetera. It is not important to understand exactly which fields changed place or were removed. The important thing to note is that extraction tables change after pilot testing to suit the perceptions or needs of the extractors.
Evidence Tables: Example (II) Some corrections will be simple: copy-and-paste text edits (e.g., “design” moving from one column to another). Others will require that extractors reread the papers and look for new information (e.g., the addition of “irritability,” “aggression,” and “suicidality” as fields). Again, the dual review process helps when new fields are introduced; but sometimes, core staff members and project leads will be depended upon to ensure tables are complete.
Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (II)
Sample Fields From a Table Guidance Document: Vanderbilt University Evidence-based Practice Center These are examples of fields that will, regardless of the project, be included in the evidence table. The more thorough the guide, the less variation there will be (and the less incorrectly classified material). Nevertheless, tables will always be returned with formatting problems, incorrectly placed data, or omitted fields.
Example: Two Reviewers Extract Different Data In this example, two different reviewers — each with the same evidence table template, the same paper to review, the same guidance document, and the same set of instructions — arrived at different results.
Samples of Final Data Extraction Forms (Evidence Tables) Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
Examples: Differential Data Extraction by Two Reviewers Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
Characteristics of the Index Test and Reference Standard Again, do not spend time reviewing the slide. The aim is to illustrate that there can be a lot of data to be extracted. Here we get information on the characteristics of the index test and the reference standard, as well as the criteria that test readers used. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
Results (Concordance/Accuracy) As before, do not spend time reviewing this slide. These tables would record numerical information. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
Results (Nonquantitative) These tables would record additional information and quality items. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
Tools Available for Data Extraction and Collection Data extractors should use whatever methods they are most comfortable with to record information from each publication. These include pencil and paper, word processing software (e.g., Microsoft Word), spreadsheets (e.g., Microsoft Excel), database software (e.g., Microsoft Access, Epi Info™), dedicated off-the-shelf commercial software, and homegrown software.
Extracting the Data A content expert knows the jargon of the field and can identify subtle differences in treatment options and whether or not the definitions of outcomes across studies are consistent. However, when a content expert has no methodological background, he or she may be unable to back-calculate numerical data that are reported differently (e.g., a 95% confidence interval from a p-value) or may miss important limitations of the studies. Quality assessment can be meaningfully performed by people with good understanding of methodological issues. Independent data extraction in duplicate is the best way to ensure quality in data extraction, but it is also very expensive. Independent verification of extracted data is faster and, therefore, less expensive. However, reviewers may fail to identify mistakes or may make the same mistakes. The rationale for blinded extraction is that extractors will be more objective (especially in their quality assessment) if they are undistracted by the impact factor/fame of the journal, their personal biases about the treatments, or the fame of the authors. Based on limited empirical explorations of whether blinding extractors to authors or journals affects the quality of data extraction and methodological assessment, there is little evidence in favor of blinding extractors. The University of Pennsylvania Meta-analysis Blinding Study Group studied blinding versus nonblinding of extractors to studies, authors, and treatments, and “concluded that blinding is not necessary when conducting meta-analyses of RCTs [randomized controlled trials].” Furthermore, blinding is very impractical and time consuming: reformatting papers to blind them took on average 7.7 hours, and the blinding process itself took on average 1.3 hours per paper. Reference: Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6.
Challenges in Data Extraction We will not discuss the additional common issues in any detail. We will illustrate examples in the following slides, and then briefly go through additional common issues.
Examples of Data Reporting Problems (I) The numbers in this excerpt are full of inconsistencies that are not apparent unless one actually tries to extract simple information by going sentence by sentence: There are at least 40 patients in total. Overall there are 34 men and women. What happened to the other, at least, 6 subjects? There are 43 black and white people. Back-calculating the denominator (total number of people) from each proportion is even more amusing. In the order of appearance, the proportion of men comes from 23 people, women from 43 people, whites from 43 people, and blacks from 42 (or 43) people. Seeing these patterns may enable us to see where the errors occurred, but we cannot actually do a lot about them.
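The back-calculation described above can be sketched in a few lines. The counts and percentages are those quoted in the excerpt (10 (44%) men, 24 (56%) women, 38 (88%) whites, 5 (12%) blacks); the function name is hypothetical.

```python
def implied_total(count, percent):
    """Denominator implied by a reported count and its reported percentage."""
    return count / (percent / 100)


# (count, reported percent) pairs as quoted in the excerpt
reported = {
    "men": (10, 44),
    "women": (24, 56),
    "whites": (38, 88),
    "blacks": (5, 12),
}

# Implied denominator for each reported proportion, rounded to whole people
implied = {group: round(implied_total(n, pct)) for group, (n, pct) in reported.items()}

# Totals obtained by simply adding the reported counts
total_by_sex = reported["men"][0] + reported["women"][0]
total_by_race = reported["whites"][0] + reported["blacks"][0]

for group, total in implied.items():
    print(f"{group}: implied total of about {total}")
print("men + women:", total_by_sex)
print("whites + blacks:", total_by_race)
```

A check of this kind only flags the inconsistency. It cannot say which of the reported numbers is the correct one.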
Examples of Data Reporting Problems (II) This is obviously a typo, but it could be a sign of poor proofreading.
Examples of Data Reporting Problems (III) The mean age is outside the age range. It seems that the weight range was also put as age range.
Inconsistencies in Published Papers This example is from a real study.
Results Text In the results text, we read that “24 deaths occurred in the PCI [percutaneous coronary intervention] group, […] and 25 in the MT [medical treatment] group.”
Overall Mortality Figure Look at the number of survivors at different time points among those receiving percutaneous coronary intervention and among those receiving medical treatment. The paper clearly states that there is no censoring, which means that the difference of those alive at baseline versus at 5 years is the number of all-cause deaths. We now get a different set of numbers. We cross out the previous set, not because we believe the second set of numbers to be more accurate, but to emphasize that these are different sets of numbers on the same outcome.
Clinical Events Table When we look at another major table in this paper, we get another set of numbers. Multiplying the proportions in the table with the total number of patients randomized in the respective study arms, we get a third set of numbers, which we list in the table on the right.
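The conversion from a reported percentage to a count can be sketched as follows. The arm sizes (205 and 203) and the death counts (24 and 25) come from the results text of the example. The table's own percentages are not reproduced in these notes, so the percentages below are simply derived from the results-text counts to show the round trip, and the function name is hypothetical.

```python
def count_from_percent(percent, n_randomized):
    """Whole-number event count implied by a reported percentage of an arm."""
    return round(percent / 100 * n_randomized)


# Arm sizes and deaths as given in the results text of the example
arms = {"PCI": (205, 24), "MED": (203, 25)}

for arm, (n, deaths) in arms.items():
    # Percentage as it would appear in a table reporting one decimal place
    # (derived here for illustration, not copied from the paper)
    pct = round(100 * deaths / n, 1)
    recovered = count_from_percent(pct, n)
    print(f"{arm}: {pct}% of {n} implies {recovered} deaths (text reports {deaths})")
```

When the count recovered from a table differs from the count in the text by more than rounding can explain, the discrepancy should be recorded on the extraction form rather than silently resolved.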
Why Do Such Problems Exist? Reference: Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53.
Using Digitizing Software To extract data from graphs, one may use digitizing software for accuracy and convenience. The example here shows Engauge Digitizer, a free open-source program that works on all common platforms (Windows, Mac, Linux, Unix). The systematic reviewer imports an electronic version of the figure (scanned or copied from the PDF) into the software and, after setting the ranges and scales (linear, logarithmic, or other) of the axes, marks the points in the scatter plot with the red Xs. The software then converts the marks to coordinates. Note that the quality of the scanning or the photocopying can affect the digitizing results. For example, some photocopiers and scanners may distort the image slightly. However, this method should yield acceptable accuracy in most cases. Reference: Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.
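The conversion from marks to coordinates can be illustrated with a minimal sketch. This is not Engauge Digitizer's actual algorithm; it assumes the axes are perfectly aligned with the image (no rotation or distortion) and that two reference positions with known values have been identified on each axis. All names and pixel positions are hypothetical.

```python
import math


def make_axis_map(pix0, pix1, val0, val1, log_scale=False):
    """Build a function mapping a pixel position on one axis to a data value.

    pix0 and pix1 are the pixel positions of two reference marks on the axis;
    val0 and val1 are the data values printed at those marks.
    """
    if log_scale:
        # Interpolate linearly in log10 space for logarithmic axes
        val0, val1 = math.log10(val0), math.log10(val1)
    slope = (val1 - val0) / (pix1 - pix0)

    def to_value(pix):
        value = val0 + (pix - pix0) * slope
        return 10 ** value if log_scale else value

    return to_value


# Hypothetical calibration: x axis runs 0..10 between pixels 100 and 500;
# y axis is logarithmic, 1..1000 between pixel rows 400 (bottom) and 100 (top)
x_map = make_axis_map(100, 500, 0, 10)
y_map = make_axis_map(400, 100, 1, 1000, log_scale=True)

# A mark placed at pixel (300, 300) converts to data coordinates
point = (x_map(300), y_map(300))
```

Because image rows usually count downward from the top, the y axis calibration above deliberately pairs the larger pixel row with the smaller data value.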
Additional Common Issues There are several additional issues to consider when extracting data, which we will not discuss in any detail. Very often, the sought-after information is missing in the published papers, and then one faces the choice of whether or not to contact the authors. Often Evidence-based Practice Center (EPC) reviewers do not contact authors for clarifications or for missing data. Missing data are not peer-reviewed, and supplemental information submitted to address queries may have been suboptimally analyzed. The decision whether or not to contact authors is made by each EPC on a case-by-case basis. Another problem is that, especially for observational studies, several publications may exist, each referring to partially overlapping populations. It can be very difficult to determine the degree of overlap, which is usually the result of one of three things: overlapping data (from preliminary and later reports), the same data but different authors, or similar data (same authors) but a different cohort (which needs to be verified with the authors). Of course, one should always keep in mind the potential for fraudulent data. It may be impossible to distinguish fraudulent research from genuine data; one hopes that fraudulent research is not frequently encountered.
Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov
Data extraction involves more than copying words and numbers from the publication to a form.
Clinical domain, methodological, and statistical knowledge is needed to ensure the right information is captured.
Interpretation of published data is often needed.
What is reported is sometimes not what was carried out.
Data extraction and evaluation of risk of bias and of applicability typically occur at the same time.
On Data Extraction (II)
Data Extraction: A Boring Task? “It is an eye-opening experience to attempt to extract information from a paper that you have read carefully and thoroughly understood only to be confronted with ambiguities, obscurities, and gaps in the data that only an attempt to quantify the results reveals.” Gurevitch and Hedges (1993). Gurevitch J, Hedges LV. In: Design and analysis of ecological experiments; 1993.
“Data for the 40 patients who were given all 4 doses of medications were considered evaluable for efficacy and safety. The overall study population consisted of 10 (44%) men and 24 (56%) women, with a racial composition of 38 (88%) whites and 5 (12%) blacks.”
Let us extract the number of deaths in two study arms, at 5 years of followup . . .
Inconsistencies in Published Papers
Results Text
Overall Mortality
[…] 24 deaths occurred in the PCI group, […] and 25 in the MT group […]
        PCI (205)   MED (203)
Dead    24          25
MED and MT = medical treatment; PCI = percutaneous coronary intervention
Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6.
Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53.
Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner AM and Gurevitch J, eds. Design and analysis of ecological experiments. New York: Chapman & Hall; 1993. p. 347-70.
Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.
Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts–New England Medical Center Evidence-based Practice Center, and Melissa L. McPheeters, Ph.D., M.P.H., and Jeff Seroogy, B.S., members of the Vanderbilt University Evidence-based Practice Center.
The information in this module is currently not included in Version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (available at: http://www.effectivehealthcare.ahrq.gov/repFiles/2007_10DraftMethodsGuide.pdf).