Data Extraction
  • Data Extraction
  • Systematic Review Process Overview This slide illustrates the steps in the systematic review process. This module focuses on data extraction.
  • Learning Objectives
  • Why Is Data Extraction Important? A structured and organized data extraction is necessary for a systematic review or meta-analysis. To perform a systematic review or meta-analysis, one has to summarize studies to facilitate synthesis. A major value of extracting data into tables that have the same format is that comparisons across studies become easier, and the reader gets a better gestalt for the evidence. Furthermore, the structured format makes it simple to transfer data to other programs for analysis. In particular, one has to identify numerical data for meta-analyses. If the necessary numbers are not directly available, one has to calculate them from sufficient statistics that are reported or from graphs. In practice, data extraction, in-depth review, and quality assessment (assessment of the risk of bias) happen at the same time. It is not uncommon that studies with overt methodological problems also report inconsistent data, and this becomes evident only when one actually tries to extract the data. Often, during the data extraction phase, the reviewer realizes that some information is systematically missing or incorrectly assessed, some outcomes are never studied, or some populations are underrepresented in the existing evidence.
  • On Data Extraction (I) The extracted data should accurately reflect information reported in the publication, remain in a form close to the original reporting so that disputes can be easily resolved, and provide sufficient information to understand the studies and to perform analyses. Extract what is needed, but avoid overdoing it; data extraction is labor intensive and can be costly and error prone. Also, keep in mind that different research questions may have different data needs.
  • On Data Extraction (II) Data extraction is an intellectual exercise. It involves more than just copying words and numbers from a publication to a form. Often one needs specific knowledge of the clinical domain to correctly identify the information that needs to be extracted. For example, some familiarity with sleep medicine terminology is important when trying to classify sleep monitors according to the neurophysiologic channels they record. Furthermore, methodological and statistical knowledge is necessary, especially when the numerical data that one needs are not reported directly and need to be calculated based on other reported statistics. An example is the calculation of standard errors from confidence intervals or p-values; another is the calculation of hazard ratios from log-rank statistics or survival graphs. To some extent, interpretation of published data is needed (i.e., the extractor must be able to understand what the author meant). A very common occurrence pertains to the description of statistical methods in observational studies. Because of space limitations, and in an effort to avoid statistical jargon, authors may be less than clear in describing what they did. Another important point is to appreciate that what is reported is sometimes not what was carried out. Finally, data extraction and evaluation of risk of bias and of applicability typically occur at the same time.
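The back-calculations mentioned above are mechanical once a normal approximation is assumed. The following sketch (not part of the original module; the function names are ours) recovers a standard error from a reported symmetric 95% confidence interval or from a two-sided p-value:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric (normal-approximation) CI."""
    z = Z.inv_cdf(0.5 + level / 2)   # 1.96 for a 95% CI
    return (upper - lower) / (2 * z)

def se_from_p(effect, p):
    """Standard error implied by a two-sided p-value for a given estimate."""
    z = Z.inv_cdf(1 - p / 2)         # z = |effect| / SE under the null
    return abs(effect) / z

# A mean difference of 3.0 with 95% CI (1.0, 5.0) implies SE of about 1.02
print(round(se_from_ci(1.0, 5.0), 3))
# An estimate of 2.0 with two-sided p = 0.05 also implies SE of about 1.02
print(round(se_from_p(2.0, 0.05), 3))
```

For ratio measures such as odds or hazard ratios, the same arithmetic would be applied on the log scale.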
  • Data Extraction: A Boring Task? Data extraction is a tedious task, but one learns a lot about the topic while performing it. In fact, every member of the systematic review team, including the leads and those who will perform the statistical analyses, should perform part of the data extraction on almost every systematic review topic. This way, project leads and the statistical analysts get firsthand knowledge of the idiosyncrasies of a body of literature. In addition, they can appreciate what sort of data are typically available and what data can be extracted, which can help them guide the rest of the team. Reference: Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner SM and Gurevitch J, eds. Design and analysis of ecological experiments. New York: Chapman & Hall; 1993. p. 347-70.
  • Comparative Effectiveness Review: Clarifying Research Terminology (II) In the parlance of the Evidence-based Practice Center Program, data extraction forms are also called evidence tables. Each study is extracted into its own set of evidence tables. Evidence tables are detailed and are typically not included in the main reports. The main reports organize studies into summary tables, which typically contain selected information. For example, a summary table may describe the characteristics of all studies that answer the first key question, and another summary table may describe the results of the same studies. Summary tables facilitate the presentation of the synthesis. The presentation of findings is discussed in detail in a separate module.
  • What Data To Collect? There is no single answer to this question. It depends on the key questions being asked, but also on how far the data extractors are willing to go to obtain a piece of information. We discuss this further in later slides, when we refer to the digitization of graphs. One should be guided by the key questions and generate a template that captures relevant information. Typically, one would be expected to extract information on populations, interventions or exposures, comparators (when applicable), outcomes and numbers, and characteristics of study design. Of course, it is not necessary to lay out the template of the evidence tables (i.e., the data extraction forms) in this specific order. However, it is very likely that it will be meaningful to extract all of this information.
  • Data Elements: Population, Intervention, and Comparator Population-generic elements may include patient characteristics such as age, gender distribution, and disease stage. One may need more specific items, depending upon the topic. Choosing what data to extract to describe the intervention or exposure and comparators (when applicable) depends upon the type of study being extracted, which may be a randomized trial, an observational study, a diagnostic test study, a prognostic factor study, a family-based or population-based genetic study, et cetera. Describing the intervention in a trial is different from extracting genotypic distributions in a genetic association study.
  • Data Elements: Outcome (I) Of course, all outcomes should be determined a priori after discussions with the Technical Expert Panel. In practice, it is not uncommon that the protocol criteria are unclear about which outcomes to include and which to discard. For example, some studies may report mean change in ejection fraction, and other studies may report the proportion of patients with an increase in ejection fraction by at least 5 percent. It may be useful to record different definitions of “outcome” and consult with content experts before making a decision about which definition to use.
  • Data Elements: Outcome (II) The exact types of data that have to be extracted for meta-analysis depend upon the details of the topic at hand. Sometimes 2x2 cell counts need to be recorded. Other times, one has to record other statistics, including means and standard deviations, p-values or ranked logarithmic odds scores (for nonparametric meta-analyses), correlation coefficients, and distributional parameters. Those interested in more information on quantitative synthesis are referred to the pertinent modules.
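For instance, when 2x2 cell counts are extracted, an effect size and its uncertainty can be computed directly. A minimal sketch using the standard (Woolf) formulas for the odds ratio and the standard error of its logarithm; the counts below are hypothetical, not from any study in the module:

```python
import math

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = events/non-events (treatment arm),
    c/d = events/non-events (control arm)."""
    return (a * d) / (b * c)

def log_or_se(a, b, c, d):
    """Woolf's standard error of ln(OR): sqrt(1/a + 1/b + 1/c + 1/d)."""
    return math.sqrt(1/a + 1/b + 1/c + 1/d)

# Hypothetical trial: 10/100 events on treatment, 20/100 on control
a, b, c, d = 10, 90, 20, 80
print(round(odds_ratio(a, b, c, d), 3))  # 0.444
print(round(log_or_se(a, b, c, d), 3))   # 0.417
```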
  • Data Elements: Timing and Study Design The data elements one extracts vary by type of study. For example, if one analyzes randomized trials, it may be useful to extract a series of items including number of centers (for multicenter studies), method of randomization (to assess adequacy of allocation concealment), blinding of patients and outcome assessors, funding source, whether the analysis is by intention to treat, and so on. Again, one may have to use operational definitions for some of these items. For example, the intention-to-treat principle is not perceived identically by researchers who run trials and by study authors.
  • Always Provide Instructions
  • Single Versus Double Extraction
  • Developing Data Extraction Forms (Evidence Tables)
  • Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (I) Being as thorough as possible when working with the experts/project leads to identify fields for inclusion will help minimize the need for extensive modifications of the extraction form, but it is inevitable that new material will come to light as the studies are reviewed. Reworking the tables will be necessary. The dual review process ensures that nothing is left out if new fields are added after the first round of reviews, which is very common.
  • Evidence Tables: Example (I) These drafts of tables on traumatic brain injury and depression illustrate how tables can change: columns are renamed or relocated, fields are moved from column to column, fields are added or completely removed, et cetera. It is not important to understand exactly which fields changed place or were removed. The important thing to note is that extraction tables change after pilot testing to suit the perceptions or needs of the extractors.
  • Evidence Tables: Example (II) Some corrections will be simple: copy-and-paste text edits (e.g., “design” moving from one column to another). Others will require that extractors reread the papers and look for new information (e.g., the addition of “irritability,” “aggression,” and “suicidality” as fields). Again, the dual review process helps when new fields are introduced; but sometimes, core staff members and project leads will be depended upon to ensure that the tables are complete.
  • Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (II)
  • Sample Fields From a Table Guidance Document: Vanderbilt University Evidence-based Practice Center These are examples of fields that will, regardless of the project, be included in the evidence table. The more thorough the guide, the less variation (and incorrectly classified material). Nevertheless, tables will always be returned with formatting problems, incorrectly placed data, or omitted fields.
  • Example: Two Reviewers Extract Different Data In this example, two different reviewers — each with the same evidence table template, the same paper to review, the same guidance document, and the same set of instructions — arrived at different results.
  • Samples of Final Data Extraction Forms (Evidence Tables) Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
  • Examples: Differential Data Extraction by Two Reviewers Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
  • Characteristics of the Index Test and Reference Standard Again, do not spend time reviewing the slide. The aim is to illustrate that there can be a lot of data to be extracted. Here we get information on the characteristics of the index test and the reference standard, as well as the criteria that test readers used. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
  • Results (Concordance/Accuracy) As before, do not spend time reviewing this slide. These tables would record numerical information. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
  • Results (Nonquantitative) These tables would record additional information and quality items. Reference: Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
  • Tools Available for Data Extraction and Collection Data extractors should use whatever methods they are most comfortable with to record information from each publication. These include pencil and paper, word processing software (e.g., Microsoft Word), spreadsheet software (e.g., Microsoft Excel), database software (e.g., Microsoft Access, Epi Info™), dedicated off-the-shelf commercial software, and homegrown software.
  • Extracting the Data A content expert knows the jargon of the field and can identify subtle differences in treatment options and whether or not the definitions of outcomes across studies are consistent. However, when a content expert has no methodological background, he or she may be unable to back-calculate numerical data that are reported differently (e.g., a 95% confidence interval from a p-value) or may miss important limitations of the studies. Quality assessment can be meaningfully performed by people with a good understanding of methodological issues. Independent data extraction in duplicate is the best way to ensure quality in data extraction, but it is also very expensive. Independent verification of extracted data is faster and, therefore, less expensive. However, reviewers may fail to identify mistakes or may make the same mistakes. The rationale for blinded extraction is that extractors will be more objective (especially in their quality assessment) if they are undistracted by the impact factor or fame of the journal, their personal biases about the treatments, or the fame of the authors. Based on limited empirical explorations of whether blinding extractors to authors or journals affects the quality of data extraction and methodological assessment, there is little evidence in favor of blinding extractors. The University of Pennsylvania Meta-analysis Blinding Study Group studied blinding versus nonblinding of extractors to studies, authors, and treatments, and “concluded that blinding is not necessary when conducting meta-analyses of RCTs [randomized controlled trials].” Furthermore, blinding is very impractical and time consuming: reformatting papers to blind them took on average 7.7 hours, and the blinding process itself took on average 1.3 hours per paper. Reference: Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6.
  • Challenges in Data Extraction We will illustrate examples in the following slides and then briefly go through additional common issues, without discussing them in detail.
  • Examples of Data Reporting Problems (I) The numbers in this excerpt are full of inconsistencies that are not apparent unless one actually tries to extract simple information by going sentence by sentence: There are at least 40 patients in total. Overall there are 34 men and women; what happened to the other 6 or more subjects? There are 43 black and white people. Back-calculating the denominator (total number of people) from each proportion is even more amusing: in order of appearance, the proportion of men implies a denominator of 23 people, women 43 people, whites 43 people, and blacks 42 (or 43) people. Seeing these patterns may suggest where the errors occurred, but we cannot actually do much about them.
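The denominator back-calculation described here is simple arithmetic: divide each count by its reported (rounded) percentage. A sketch using the counts and percentages from the excerpt:

```python
def implied_n(count, percent):
    """Denominator implied by a reported count and its rounded percentage."""
    return round(count / (percent / 100))

# From the excerpt: 10 (44%) men, 24 (56%) women, 38 (88%) whites, 5 (12%) blacks
denominators = [implied_n(10, 44), implied_n(24, 56),
                implied_n(38, 88), implied_n(5, 12)]
print(denominators)  # [23, 43, 43, 42]: four mutually inconsistent totals
```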
  • Examples of Data Reporting Problems (II) This is obviously a typo, but it could be a sign of poor proofreading.
  • Examples of Data Reporting Problems (III) The mean age is outside the age range. It seems that the weight range was mistakenly reported as the age range.
  • Inconsistencies in Published Papers This example is from a real study.
  • Results Text In the results text, we read that “24 deaths occurred in the PCI [percutaneous coronary intervention] group, […] and 25 in the MT [medical treatment] group.”
  • Overall Mortality Figure Look at the number of survivors at different time points among those receiving percutaneous coronary intervention and among those receiving medical treatment. The paper clearly states that there is no censoring, which means that the difference between the number alive at baseline and the number alive at 5 years is the number of all-cause deaths. We now get a different set of numbers. We cross out the previous set, not because we believe the second set of numbers to be more accurate, but to emphasize that these are different sets of numbers for the same outcome.
  • Clinical Events Table When we look at another major table in this paper, we get another set of numbers. Multiplying the proportions in the table with the total number of patients randomized in the respective study arms, we get a third set of numbers, which we list in the table on the right.
  • Why Do Such Problems Exist? Reference: Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53.
  • Using Digitizing Software To extract data from graphs, one may use digitizing software for accuracy and convenience. The example here shows Engauge Digitizer, a free open-source program that works on all common platforms (Windows, Mac, Linux, Unix). The systematic reviewer imports an electronic version of the figure (scanned or copied from the PDF) into the software — and after setting the ranges and scales (linear, logarithmic, or other) of the axes — marks the points in the scatter plot with red Xs. The software then converts the marks to coordinates. Note that the quality of the scanning or photocopying can affect the digitizing results. For example, some photocopiers and scanners may distort the image slightly. However, this method should yield acceptable accuracy in most cases. Reference: SourceForge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.
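This is not Engauge's own code, but the coordinate transform such digitizers perform is straightforward: calibrate each axis with two known points, then interpolate linearly (in log space for logarithmic axes). A minimal sketch under those assumptions:

```python
import math

def make_axis(p0, v0, p1, v1, log=False):
    """Return a function mapping a pixel coordinate to a data value,
    calibrated by two known points (pixel p0 -> value v0, pixel p1 -> v1)."""
    if log:
        v0, v1 = math.log10(v0), math.log10(v1)

    def to_data(p):
        v = v0 + (p - p0) / (p1 - p0) * (v1 - v0)  # linear interpolation
        return 10 ** v if log else v
    return to_data

# Linear x-axis: pixel 100 corresponds to x = 0, pixel 500 to x = 10
x = make_axis(100, 0, 500, 10)
print(x(300))  # 5.0
# Logarithmic y-axis: pixel 400 corresponds to y = 1, pixel 0 to y = 100
y = make_axis(400, 1, 0, 100, log=True)
print(y(200))  # 10.0
```

Distortion from scanning or photocopying would show up here as miscalibrated pixel coordinates, which is why the calibration points matter.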
  • Additional Common Issues There are several additional issues to consider when extracting data, which we will not discuss in any detail. Very often, the sought-after information is missing from the published papers, and one then faces the choice of whether or not to contact the authors. Often Evidence-based Practice Center (EPC) reviewers do not contact authors for clarifications or for missing data. Missing data are not peer reviewed, and supplemental information submitted to address queries may have been suboptimally analyzed. The decision whether or not to contact authors is made by each EPC on a case-by-case basis. Another problem is that, especially for observational studies, several publications may exist, each referring to partially overlapping populations. It can be very difficult to determine the degree of overlap, which is usually the result of one of three things: overlapping data (from preliminary and later reports), the same data but different authors, or similar data (same authors) but a different cohort (which would need to be verified with the authors). Of course, one should always keep in mind the potential for fraudulent data. It may be impossible to distinguish fraudulent research from genuine data; one hopes that fraudulent research is not frequently encountered.
  • Conclusions
  • Key Messages
  • References (I)
  • References (II)
  • Authors

Data Extraction: Presentation Transcript

  • Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov
  • Systematic Review Process Overview
    • To describe why data extraction is important
    • To identify challenges in data extraction
    • To describe the general layout of a data extraction form
    • To suggest methods for collecting data accurately and efficiently
    • To discuss the pros and cons for querying original authors
    Learning Objectives
    • To summarize studies in a common format to facilitate synthesis and coherent presentation of data
    • To identify numerical data for meta-analyses
    • To obtain information to assess more objectively the risk of bias in and applicability of studies
    • To identify systematically missing or incorrectly assessed data, outcomes that are never studied, and underrepresented populations
    Why Is Data Extraction Important?
    • Extracted data should:
      • Accurately reflect information reported in the publication
      • Remain in a form close to the original reporting, so that disputes can be easily resolved
      • Provide sufficient information to understand the studies and to perform analyses
    • Extract only the data needed, because the extraction process:
      • Is labor intensive
      • Can be costly and error prone
    • Different research questions may have different data needs
    On Data Extraction (I)
    • Data extraction involves more than copying words and numbers from the publication to a form.
    • Clinical domain, methodological, and statistical knowledge is needed to ensure the right information is captured.
    • Interpretation of published data is often needed.
    • What is reported is sometimes not what was carried out.
    • Data extraction and evaluation of risk of bias and of applicability typically occur at the same time.
    On Data Extraction (II)
  • Data Extraction: A Boring Task? “It is an eye-opening experience to attempt to extract information from a paper that you have read carefully and thoroughly understood only to be confronted with ambiguities, obscurities, and gaps in the data that only an attempt to quantify the results reveals.” — Gurevitch and Hedges (1993) Gurevitch J, Hedges LV. In: Design and analysis of ecological experiments; 1993.
    • In the Evidence-based Practice Center Program, we often refer to two types of tables:
    • Evidence Tables
      • Essentially are data extraction forms
      • Typically are study specific, with data from each study extracted into a set of such tables
      • Are detailed and typically not included in main reports
    • Summary Tables
        • Are used in main reports to facilitate the presentation of the synthesis of the studies
        • Typically contain context-relevant pieces of the information included in study-specific evidence tables
        • Address particular research questions
    Comparative Effectiveness Reviews: Clarifying Research Terminology
    • Use key questions and eligibility criteria as a guide
    • Anticipate what data summary tables should include:
      • To describe studies
      • To assess outcomes, risk of bias, and applicability
      • To conduct meta-analyses
    • Use the PICOTS framework to choose data elements:
      • Population
      • Intervention (or exposure)
      • Comparator (when applicable)
      • Outcome (remember numerical data)
      • Timing
      • Study design (study setting)
    What Data To Collect?
    • Population-generic elements may include patient characteristics, such as age, gender distribution, and disease stage.
      • More specific items may be needed, depending upon the topic.
    • Intervention or exposure and comparator items depend upon the extracted study.
      • Study types include randomized trial, observational study, diagnostic test study, prognostic factor study, family-based or population-based genetic study, et cetera.
    Data Elements: Population, Intervention, and Comparator
    • Outcomes should be determined a priori with the Technical Expert Panel.
    • Criteria often are unclear about which outcomes to include and which to discard.
      • Example: mean change in ejection fraction versus the proportion of subjects with an increase in ejection fraction by ≥5 percent
    • Record different definitions of “outcome” and consult with content experts before making a decision about which definition to use.
    Data Elements: Outcome (I)
    • Apart from outcome definitions, quantitative data are needed for meta-analysis:
      • Dichotomous variables (e.g., deaths, patients with at least one stroke)
      • Count data (e.g., number of strokes, counting multiple ones)
      • Continuous variables (e.g., mm Hg, pain score)
      • Survival data
      • Sensitivity, specificity, receiver operating characteristic
      • Correlations
      • Slopes
    Data Elements: Outcome (II)
    • The data elements to be extracted vary by type of study.
    • Consider collecting this information when recording study characteristics for randomized trials:
      • Number of centers (multicenter studies)
      • Method of randomization (adequacy of allocation concealment)
      • Blinding
      • Funding source
      • Whether or not an intention-to-treat analysis was used
    Data Elements: Timing and Study Design
    • Provide “operational definitions” (instructions) indicating exactly what should be extracted in each field of the form.
    • Make sure that all data extractors understand the operational definitions the same way.
      • Pilot-test the forms on several published papers.
      • Encourage communication to clarify even apparently mundane questions.
    Always Provide Instructions
    • Independent extraction of data by at least two experienced reviewers is ideal but is also resource intensive.
    • There is a tradeoff between cost and the quality of data extraction.
      • Data extraction often takes longer than 2 hours per paper.
      • A reduction in the scope of the work may be necessary if independent data extraction is desired.
    • Careful single extraction by experienced reviewers, with or without crosschecking of selected items by a second reviewer, is a good compromise.
    Single Versus Double Extraction
    • To address all needs, a generic data extraction form will have to be very comprehensive.
    • Although there are common generic elements, forms need to be adapted to each topic or study design to be most efficient.
    • Organization of information in the PICOTS (population, intervention, comparator, outcome, timing, and setting) format is highly desirable.
    • Balance the structure of the form with the flexibility of its use.
    • Anticipate the need to capture unanticipated data.
    • Use an iterative process and have several individuals test the form on multiple studies.
    Developing Data Extraction Forms (Evidence Tables)
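One minimal way to realize a PICOTS-organized form with the simple tools discussed later (spreadsheets, CSV files) is a flat record per study arm. This is our own sketch, not a form from the module; the field names and the example study below are purely illustrative:

```python
import csv
import io

# Illustrative PICOTS-organized fields; real forms are topic specific
FIELDS = ["study_id", "population", "intervention", "comparator",
          "outcome", "timing", "setting", "events", "total_n"]

def write_rows(rows):
    """Serialize extraction records to CSV text, one study arm per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# A hypothetical extraction record (not a real study)
row = {"study_id": "Smith 2005", "population": "adults with suspected OSA",
       "intervention": "home monitor", "comparator": "in-lab polysomnography",
       "outcome": "AHI >= 15", "timing": "single night",
       "setting": "sleep clinic", "events": 42, "total_n": 60}
print(write_rows([row]).splitlines()[0])  # prints the header line
```

Fixing the field list up front is what makes rows comparable across studies, which is exactly the point of a structured extraction form.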
    • Forms have to be constructed before any serious data extraction is underway.
      • Original fields may turn out to be inefficient or unusable when coding begins.
    • Reviewers must:
      • be as thorough as possible in the initial set-up,
      • reconfigure the tables as needed, and
      • use a dual review process to fill in gaps.
    Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (I)
  • Evidence Tables: Example (I) First Draft Second Draft
  • Evidence Tables: Example (II) Final Draft
    • Lack of uniformity among outside reviewers:
      • No matter how clear and detailed the instructions are, data will not be entered identically from one reviewer to the next.
    • Solutions:
      • Develop an evidence table guidance document—instructions on how to input data.
      • Limit the number of core members handling the evidence tables to avoid discrepancies in presentation.
    Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (II)
    • In the “country, setting” field, data extractors should list possible settings that could be encountered in the literature:
      • Academic medical center(s), community, database, tertiary care hospital(s), specialty care treatment center(s), substance abuse center(s), level I trauma center(s), et cetera.
    • In the “study design” field, data extractors should list one of the following:
      • Randomized controlled trial, cross-sectional study, longitudinal study, case-control study, et cetera.
    Sample Fields From a Table Guidance Document: Vanderbilt University Evidence-based Practice Center
  • Example: Two Reviewers Extract Different Data Reviewer A Reviewer B
    • For evidence reports or technology assessments that have many key questions, data extraction forms may be several pages long.
    • The next few slides are examples of data extraction forms.
    • Remember, there is more than one way to structure a data extraction form.
    Samples of Final Data Extraction Forms (Evidence Tables) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  • Examples: Differential Data Extraction by Two Reviewers Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  • Characteristics of the Index Test and Reference Standard Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  • Results (Concordance/Accuracy) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  • Results (Nonquantitative) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
    • Pencil and paper
    • Word processing software (e.g., Microsoft Word)
    • Spreadsheet (e.g., Microsoft Excel)
    • Database software (e.g., Microsoft Access, Epi Info™)
    • Dedicated off-the-shelf commercial software
    • Homegrown software
    Tools Available for Data Extraction and Collection
    • Who should extract the data?
        • Domain experts versus methodologists
    • What extraction method should be used?
        • Single or double independent extraction followed by reconciliation versus single extraction and independent verification
    • Should data extraction be blinded (to authors, journal, results)?
    Extracting the Data Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997;350:185-6.
    • Problems in data reporting
    • Inconsistencies in published papers
    • Data reported in graphs
    Challenges in Data Extraction
    • “Data for the 40 patients who were given all 4 doses of medications were considered evaluable for efficacy and safety. The overall study population consisted of 10 (44%) men and 24 (56%) women, with a racial composition of 38 (88%) whites and 5 (12%) blacks.”
    Examples of Data Reporting Problems (I)
  • Examples of Data Reporting Problems (II)
  • Examples of Data Reporting Problems (III)
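The arithmetic problems in the passage above (subgroup counts summing to 34 or 43 rather than 40, and percentages that do not match the counts) are exactly the kind of inconsistency a reviewer must catch during extraction. A minimal consistency check, sketched here for illustration, flags them automatically:

```python
# Sanity check for reported subgroup counts and percentages.
# Example numbers come from the quoted passage (40 evaluable patients).
def check_subgroups(total, subgroups):
    """subgroups: list of (label, count, reported_percent) tuples."""
    problems = []
    count_sum = sum(count for _, count, _ in subgroups)
    if count_sum != total:
        problems.append(f"counts sum to {count_sum}, not {total}")
    for label, count, pct in subgroups:
        actual = round(100 * count / total)
        if actual != pct:
            problems.append(
                f"{label}: {count}/{total} is {actual}%, reported {pct}%")
    return problems

# Sex: 10 + 24 = 34, not 40; neither percentage matches the counts
print(check_subgroups(40, [("men", 10, 44), ("women", 24, 56)]))
# Race: 38 + 5 = 43, not 40
print(check_subgroups(40, [("whites", 38, 88), ("blacks", 5, 12)]))
```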
    • Let us extract the number of deaths in two study arms, at 5 years of follow-up…
    Inconsistencies in Published Papers
  • Results Text: “Overall Mortality […] 24 deaths occurred in the PCI group, […] and 25 in the MT group […]”
               PCI (205)   MED (203)
      Dead        24           25
    MED and MT = medical treatment; PCI = percutaneous coronary intervention
  • Overall Mortality Figure
               PCI (205)   MT (203)
      Dead        24           25
                  28           35
    MT = medical treatment; PCI = percutaneous coronary intervention
  • Clinical Events Table
               PCI (205)   MT (203)
      Dead        24           25
                  28           35
                  32           33
    CABG = coronary artery bypass graft; MT = medical treatment; PCI = percutaneous coronary intervention
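When the same quantity appears in several places in one paper, it helps to tabulate each source and flag disagreements. A toy sketch, using one reading of the mortality counts on the slides above (the assignment of each pair of counts to text, figure, and table is an assumption for illustration):

```python
# Toy example: deaths extracted from three places in the same paper.
# Which counts belong to which source is an illustrative assumption.
reports = {
    "results text":            {"PCI": 24, "MT": 25},
    "overall mortality figure": {"PCI": 28, "MT": 35},
    "clinical events table":   {"PCI": 32, "MT": 33},
}

for arm in ("PCI", "MT"):
    values = {src: counts[arm] for src, counts in reports.items()}
    if len(set(values.values())) > 1:
        # Sources disagree: this must be resolved (or the study flagged)
        print(f"{arm}: inconsistent death counts: {values}")
```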
    • Because so few research reports give effect size, standard normal deviates, or exact p-values, the quantitative reviewer must calculate almost all indices of study outcomes.
    • Little of this calculation is automatic, because results are presented in a bewildering variety of forms and are often obscure.
    Why Do Such Problems Exist? Green BF, Hall JA. Annu Rev Psychol 1984;35:37-53.
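For instance, when a paper reports only event counts per arm, the reviewer must derive the effect estimate. A minimal sketch using standard formulas (not from the slides), applied to the 24/205 versus 25/203 mortality counts quoted above:

```python
# Sketch: log odds ratio and its standard error from a 2x2 table,
# using the standard large-sample (Woolf) formulas.
import math

def log_odds_ratio(a, b, c, d):
    """a/b = events/non-events in arm 1; c/d = events/non-events in arm 2."""
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return lor, se

# Deaths/survivors: PCI 24/181 (of 205), MT 25/178 (of 203)
lor, se = log_odds_ratio(24, 181, 25, 178)
lo, hi = lor - 1.96 * se, lor + 1.96 * se
print(f"log OR = {lor:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```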
  • Using Digitizing Software Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.
    • Engauge Digitizer, an open-source program:
    • Each data point is marked with an “X,” and the coordinates are given in a spreadsheet.
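Digitizers like Engauge work by calibrating pixel coordinates against known axis points and then mapping each clicked point linearly into data coordinates. The underlying mapping can be sketched as follows; all pixel positions and axis values here are hypothetical:

```python
# Sketch of the linear pixel-to-data calibration behind graph digitizing.
# All calibration numbers below are hypothetical.
def make_axis_mapper(px0, val0, px1, val1):
    """Linear map from pixel position to data value, calibrated with
    two known axis points (e.g., two labeled tick marks)."""
    scale = (val1 - val0) / (px1 - px0)
    return lambda px: val0 + (px - px0) * scale

# x axis: ticks at 0 and 60 months sit at pixels 50 and 650
to_months = make_axis_mapper(50, 0, 650, 60)
# y axis: 100% at pixel 40, 0% at pixel 440 (pixel y grows downward)
to_pct = make_axis_mapper(40, 100, 440, 0)

# A point clicked at pixel (350, 240) maps to 30 months, 50%
print(to_months(350), to_pct(240))  # prints 30.0 50.0
```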
    • Missing information in published papers
    • Publications with at least partially overlapping patient subgroups
    • Potentially fraudulent data
    Additional Common Issues
    • Data extraction is laborious and tedious.
    • To err is human: data extractors will identify errors and will err themselves.
    • Interpretation and subjectivity are unavoidable.
    • Data are often not reported in a uniform manner (e.g., quality, location in paper, metrics, outcomes, numerical value vs. graphs).
    Conclusions
    • Key questions will guide reviewers in choosing which information to extract.
    • There is no single correct way to record extracted data.
    • Extracting data requires familiarity with the content and knowledge of epidemiological principles and statistical concepts.
    • Be persistent:
        • Often, one can extract more information than the paper initially appears to contain (e.g., by digitizing graphs).
    • Be comprehensive:
      • Try to verify the same piece of information from different places in the same article.
      • Sometimes there are surprising inconsistencies.
      • Inconsistencies indicate suboptimal reporting quality at least.
    Key Messages
    • Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6.
    • Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53.
    • Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner SM and Gurevitch J, eds. Design and analysis of ecological experiments. New York: Chapman & Hall; 1993. p. 347-70.
    • Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/.
    References (I)
    • Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome. Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf.
    References (II)
    • This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts–New England Medical Center Evidence-based Practice Center, and Melissa L. McPheeters, Ph.D., M.P.H., and Jeff Seroogy, B.S., members of the Vanderbilt University Evidence-based Practice Center.
    • The information in this module is currently not included in Version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (available at: http://www.effectivehealthcare.ahrq.gov/repFiles/2007_10DraftMethodsGuide.pdf).
    Authors