Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Review...
Systematic Review Process Overview
<ul><li>To describe why data extraction is important </li></ul><ul><li>To identify challenges in data extraction  </li></u...
<ul><li>To summarize studies in a common format to facilitate synthesis and coherent presentation of data </li></ul><ul><l...
<ul><li>Extracted data should: </li></ul><ul><ul><li>Accurately reflect information reported in the publication </li></ul>...
<ul><li>Data extraction involves more than copying words and numbers from the publication to a form. </li></ul><ul><li>Cli...
Data Extraction: A Boring Task?  “ It is an eye-opening experience to attempt to extract information from a paper that you...
<ul><li>In the Evidence-based Practice Center Program, we often refer to two types of tables: </li></ul><ul><li>Evidence T...
<ul><li>Use key questions and eligibility criteria as a guide </li></ul><ul><li>Anticipate what data summary tables should...
<ul><li>Population-generic elements may include patient characteristics, such as age, gender distribution, and disease sta...
<ul><li>Outcomes should be determined a priori with the Technical Expert Panel. </li></ul><ul><li>Criteria often are uncle...
<ul><li>Apart from outcome definitions, quantitative data are needed for meta-analysis: </li></ul><ul><ul><li>Dichotomous ...
<ul><li>The data elements to be extracted vary by type of study. </li></ul><ul><li>Consider collecting this information wh...
<ul><li>Provide “operational definitions” (instructions) indicating exactly what should be extracted in each field of the ...
<ul><li>Independent extraction of data by at least two experienced reviewers is ideal but is also resource intensive. </li...
<ul><li>To address all needs, a generic data extraction form will have to be very comprehensive. </li></ul><ul><li>Althoug...
<ul><li>Forms have to be constructed before any serious data extraction is underway. </li></ul><ul><ul><li>Original fields...
Evidence Tables: Example (I) First Draft Second Draft
Evidence Tables: Example (II) Final Draft
<ul><li>Lack of uniformity among outside reviewers: </li></ul><ul><ul><li>No matter how clear and detailed are the instruc...
<ul><li>In the “country, setting” field, data extractors should list possible settings that could be encountered in the li...
Example: Two Reviewers Extract Different Data Reviewer A Reviewer B
<ul><li>For evidence reports or technology assessments that have many key questions, data extraction forms may be several ...
Examples: Differential Data Extraction by Two Reviewers Trikalinos TA, et al. AHRQ Technology Assessment. Available at:  h...
Characteristics of the Index Test and Reference Standard Trikalinos TA, et al. AHRQ Technology Assessment. Available at:  ...
Results (Concordance/Accuracy) Trikalinos TA, et al. AHRQ Technology Assessment. Available at:  http://www.cms.gov/determi...
Results (Nonquantitative) Trikalinos TA, et al. AHRQ Technology Assessment. Available at:  http://www.cms.gov/determinatio...
<ul><li>Pencil and paper </li></ul><ul><li>Word processing software (e.g., Microsoft Word) </li></ul><ul><li>Spreadsheet (...
<ul><li>Who should extract the data? </li></ul><ul><ul><ul><li>Domain experts versus methodologists </li></ul></ul></ul><u...
<ul><li>Problems in data reporting </li></ul><ul><li>Inconsistencies in published papers  </li></ul><ul><li>Data reported ...
<ul><li>“ Data for the 40 patients who were given all 4 doses of medications were considered evaluable for efficacy and sa...
Examples of Data Reporting Problems (II)
Examples of Data Reporting Problems (III)
<ul><li>Let us extract the number of deaths in two study arms, at 5 years of followup . . . </li></ul>Inconsistencies in P...
Results Text  Overall Mortality  […] 24 deaths occurred in the PCI group, […] and  25  in the  MT  group […] MED and MT = ...
Overall Mortality Figure MT = medical treatment PCI = percutaneous coronary intervention PCI (205) MT (203) Dead 24 25 28 35
Clinical Events Table CABG = coronary artery bypass graft MT = medical treatment PCI = percutaneous coronary intervention ...
<ul><li>Because so few research reports give effect size, standard normal deviates, or exact p-values, the quantitative re...
Using Digitizing Software Source Forge Web site. Engauge Digitizer. Available at:  http://sourceforge.net/projects/digitiz...
<ul><li>Missing information in published papers </li></ul><ul><li>Publications with at least partially overlapping patient...
<ul><li>Data extraction is laborious and tedious. </li></ul><ul><li>To err is human: data extractors will identify errors ...
<ul><li>Key questions will guide reviewers in choosing which information to extract.  </li></ul><ul><li>There is no single...
<ul><li>Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect t...
<ul><li>Trikalinos TA, Ip S, Raman G, et al.  Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome . Technology Ass...
<ul><li>This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts–New E...
Upcoming SlideShare
Loading in …5
×

Data Extraction

25,898 views

Published on

Published in: Health & Medicine
  • Dating direct: ♥♥♥ http://bit.ly/2ZDZFYj ♥♥♥
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Doctors Predicted I Would NEVER Stop Snoring But Contrarily to their Prediction, I Did It 100% Naturally! learn more... ■■■ http://ishbv.com/snoringno/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Free Healing Soundscape - Just click "Play" to immerse yourself in a beautiful "Sound Bath" that clears negativity, awakens abundance, and brings miracles into your life! Listen for yourself right now. ■■■ https://bit.ly/30Ju5r6
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Nine Signs Wealth is Coming Your Way... ♣♣♣ http://ishbv.com/manifmagic/pdf
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • You can try the web data scraper tool for extracting data.. http://www.theskysoft.com/web-data-miner.html
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Data Extraction

  1. 1. Data Extraction Prepared for: The Agency for Healthcare Research and Quality (AHRQ) Training Modules for Systematic Reviews Methods Guide www.ahrq.gov
  2. 2. Systematic Review Process Overview
  3. 3. <ul><li>To describe why data extraction is important </li></ul><ul><li>To identify challenges in data extraction </li></ul><ul><li>To describe the general layout of a data extraction form </li></ul><ul><li>To suggest methods for collecting data accurately and efficiently </li></ul><ul><li>To discuss the pros and cons for querying original authors </li></ul>Learning Objectives
  4. 4. <ul><li>To summarize studies in a common format to facilitate synthesis and coherent presentation of data </li></ul><ul><li>To identify numerical data for meta-analyses </li></ul><ul><li>To obtain information to assess more objectively the risk of bias in and applicability of studies </li></ul><ul><li>To identify systematically missing or incorrectly assessed data, outcomes that are never studied, and underrepresented populations </li></ul>Why Is Data Extraction Important?
  5. 5. <ul><li>Extracted data should: </li></ul><ul><ul><li>Accurately reflect information reported in the publication </li></ul></ul><ul><ul><li>Remain in a form close to the original reporting, so that disputes can be easily resolved </li></ul></ul><ul><ul><li>Provide sufficient information to understand the studies and to perform analyses </li></ul></ul><ul><li>Extract only the data needed, because the extraction process: </li></ul><ul><ul><li>Is labor intensive </li></ul></ul><ul><ul><li>Can be costly and error prone </li></ul></ul><ul><li>Different research questions may have different data needs </li></ul>On Data Extraction (I)
  6. 6. <ul><li>Data extraction involves more than copying words and numbers from the publication to a form. </li></ul><ul><li>Clinical domain, methodological, and statistical knowledge is needed to ensure the right information is captured. </li></ul><ul><li>Interpretation of published data is often needed. </li></ul><ul><li>What is reported is sometimes not what was carried out. </li></ul><ul><li>Data extraction and evaluation of risk of bias and of applicability typically occur at the same time. </li></ul>On Data Extraction (II)
  7. 7. Data Extraction: A Boring Task? “ It is an eye-opening experience to attempt to extract information from a paper that you have read carefully and thoroughly understood only to be confronted with ambiguities, obscurities, and gaps in the data that only an attempt to quantify the results reveals.” — Gurevitch and Hedges (1993) Gurevitch J, Hedges LV. In: Design and analysis of ecological experiments ; 1993.
  8. 8. <ul><li>In the Evidence-based Practice Center Program, we often refer to two types of tables: </li></ul><ul><li>Evidence Tables </li></ul><ul><ul><li>Essentially are data extraction forms </li></ul></ul><ul><ul><li>Typically are study specific, with data from each study extracted into a set of such tables </li></ul></ul><ul><ul><li>Are detailed and typically not included in main reports </li></ul></ul><ul><li>Summary Tables </li></ul><ul><ul><ul><li>Are used in main reports facilitate the presentation of the synthesis of the studies </li></ul></ul></ul><ul><ul><ul><li>Typically contain context-relevant pieces of the information included in study-specific evidence tables </li></ul></ul></ul><ul><ul><ul><li>Address particular research questions </li></ul></ul></ul>Comparative Effectiveness Reviews: Clarifying Research Terminology
  9. 9. <ul><li>Use key questions and eligibility criteria as a guide </li></ul><ul><li>Anticipate what data summary tables should include: </li></ul><ul><ul><li>To describe studies </li></ul></ul><ul><ul><li>To assess outcomes, risk of bias, and applicability </li></ul></ul><ul><ul><li>To conduct meta-analyses </li></ul></ul><ul><li>Use the PICOTS framework to choose data elements: </li></ul><ul><ul><li>Population </li></ul></ul><ul><ul><li>Intervention (or exposure) </li></ul></ul><ul><ul><li>Comparator (when applicable) </li></ul></ul><ul><ul><li>Outcome (remember numerical data) </li></ul></ul><ul><ul><li>Timing </li></ul></ul><ul><ul><li>Study design (study setting) </li></ul></ul>What Data To Collect?
  10. 10. <ul><li>Population-generic elements may include patient characteristics, such as age, gender distribution, and disease stage. </li></ul><ul><ul><li>More specific items may be needed, depending upon the topic. </li></ul></ul><ul><li>Intervention or exposure and comparator items depend upon the extracted study. </li></ul><ul><ul><li>Study types include randomized trial, observational study, diagnostic test study, prognostic factor study, family-based or population-based genetic study, et cetera. </li></ul></ul>Data Elements: Population, Intervention, and Comparator
  11. 11. <ul><li>Outcomes should be determined a priori with the Technical Expert Panel. </li></ul><ul><li>Criteria often are unclear about which outcomes to include and which to discard. </li></ul><ul><ul><li>Example: mean change in ejection fraction versus the proportion of subjects with an increase in ejection fraction by ≥5 percent </li></ul></ul><ul><li>Record different definitions of “outcome” and consult with content experts before making a decision about which definition to use. </li></ul>Data Elements: Outcome (I)
  12. 12. <ul><li>Apart from outcome definitions, quantitative data are needed for meta-analysis: </li></ul><ul><ul><li>Dichotomous variables (e.g., deaths, patients with at least one stroke) </li></ul></ul><ul><ul><li>Count data (e.g., number of strokes, counting multiple ones) </li></ul></ul><ul><ul><li>Continuous variables (e.g., mm Hg, pain score) </li></ul></ul><ul><ul><li>Survival data </li></ul></ul><ul><ul><li>Sensitivity, specificity, receiver operating characteristic </li></ul></ul><ul><ul><li>Correlations </li></ul></ul><ul><ul><li>Slopes </li></ul></ul>Data Elements: Outcome (II)
  13. 13. <ul><li>The data elements to be extracted vary by type of study. </li></ul><ul><li>Consider collecting this information when recording study characteristics for randomized trials: </li></ul><ul><ul><li>Number of centers (multicenter studies) </li></ul></ul><ul><ul><li>Method of randomization (adequacy of allocation concealment) </li></ul></ul><ul><ul><li>Blinding </li></ul></ul><ul><ul><li>Funding source </li></ul></ul><ul><ul><li>Whether or not an intention-to-treat analysis was used </li></ul></ul>Data Elements: Timing and Study Design
  14. 14. <ul><li>Provide “operational definitions” (instructions) indicating exactly what should be extracted in each field of the form. </li></ul><ul><li>Make sure that all data extractors understand the operational definitions the same way. </li></ul><ul><ul><li>Pilot-test the forms on several published papers. </li></ul></ul><ul><ul><li>Encourage communication to clarify even apparently mundane questions. </li></ul></ul>Always Provide Instructions
  15. 15. <ul><li>Independent extraction of data by at least two experienced reviewers is ideal but is also resource intensive. </li></ul><ul><li>There is a tradeoff between cost and the quality of data extraction. </li></ul><ul><ul><li>Data extraction often takes longer than 2 hours per paper. </li></ul></ul><ul><ul><li>A reduction in the scope of the work may be necessary if independent data extraction is desired. </li></ul></ul><ul><li>Careful single extraction by experienced reviewers, with or without crosschecking of selected items by a second reviewer, is a good compromise. </li></ul>Single Versus Double Extraction
  16. 16. <ul><li>To address all needs, a generic data extraction form will have to be very comprehensive. </li></ul><ul><li>Although there are common generic elements, forms need to be adapted to each topic or study design to be most efficient. </li></ul><ul><li>Organization of information in the PICOTS (population, intervention, comparator, outcome, timing, and setting) format is highly desirable. </li></ul><ul><li>Balance the structure of the form with the flexibility of its use. </li></ul><ul><li>Anticipate the need to capture unanticipated data. </li></ul><ul><li>Use an iterative process and have several individuals test the form on multiple studies. </li></ul>Developing Data Extraction Forms (Evidence Tables)
  17. 17. <ul><li>Forms have to be constructed before any serious data extraction is underway. </li></ul><ul><ul><li>Original fields may turn out to be inefficient or unusable when coding begins. </li></ul></ul><ul><li>Reviewers must: </li></ul><ul><ul><li>be as thorough as possible in the initial set-up, </li></ul></ul><ul><ul><li>reconfigure the tables as needed, and </li></ul></ul><ul><ul><li>use a dual review process to fill in gaps. </li></ul></ul>Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (I)
  18. 18. Evidence Tables: Example (I) First Draft Second Draft
  19. 19. Evidence Tables: Example (II) Final Draft
  20. 20. <ul><li>Lack of uniformity among outside reviewers: </li></ul><ul><ul><li>No matter how clear and detailed are the instructions, data will not be entered identically by one reviewer to the next. </li></ul></ul><ul><li>Solutions: </li></ul><ul><ul><li>Develop an evidence table guidance document—instructions on how to input data. </li></ul></ul><ul><ul><li>Limit the number of core members handling the evidence tables to avoid discrepancies in presentation. </li></ul></ul>Common Problems Encountered When Creating Data Extraction Forms (Evidence Tables) (II)
  21. 21. <ul><li>In the “country, setting” field, data extractors should list possible settings that could be encountered in the literature: </li></ul><ul><ul><li>Academic medical center(s), community, database, tertiary care hospital(s), specialty care treatment center(s), substance abuse center(s), level I trauma center(s), et cetera. </li></ul></ul><ul><li>In the “study design” field, data extractors should list one of the following: </li></ul><ul><ul><li>Randomized controlled trial, cross-sectional study, longitudinal study, case-control study, et cetera. </li></ul></ul>Sample Fields From a Table Guidance Document: Vanderbilt University Evidence-based Practice Center
  22. 22. Example: Two Reviewers Extract Different Data Reviewer A Reviewer B
  23. 23. <ul><li>For evidence reports or technology assessments that have many key questions, data extraction forms may be several pages long. </li></ul><ul><li>The next few slides are examples of data extraction forms. </li></ul><ul><li>Remember, there is more than one way to structure a data extraction form. </li></ul>Samples of Final Data Extraction Forms (Evidence Tables) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  24. 24. Examples: Differential Data Extraction by Two Reviewers Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  25. 25. Characteristics of the Index Test and Reference Standard Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  26. 26. Results (Concordance/Accuracy) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  27. 27. Results (Nonquantitative) Trikalinos TA, et al. AHRQ Technology Assessment. Available at: http://www.cms.gov/determinationprocess/downloads/id48TA.pdf .
  28. 28. <ul><li>Pencil and paper </li></ul><ul><li>Word processing software (e.g., Microsoft Word) </li></ul><ul><li>Spreadsheet (e.g., Microsoft Excel) </li></ul><ul><li>Database software (e.g., Microsoft Access, Epi Info™) </li></ul><ul><li>Dedicated off-the-shelf commercial software </li></ul><ul><li>Homegrown software </li></ul>Tools Available for Data Extraction and Collection
  29. 29. <ul><li>Who should extract the data? </li></ul><ul><ul><ul><li>Domain experts versus methodologists </li></ul></ul></ul><ul><li>What extraction method should be used? </li></ul><ul><ul><ul><li>Single or double independent extraction followed by reconciliation versus single extraction and independent verification </li></ul></ul></ul><ul><li>Should data extraction be blinded (to authors, journal, results)? </li></ul>Extracting the Data Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997;350:185-6.
  30. 30. <ul><li>Problems in data reporting </li></ul><ul><li>Inconsistencies in published papers </li></ul><ul><li>Data reported in graphs </li></ul>Challenges in Data Extraction
  31. 31. <ul><li>“ Data for the 40 patients who were given all 4 doses of medications were considered evaluable for efficacy and safety. The overall study population consisted of 10 (44%) men and 24 (56%) women, with a racial composition of 38 (88%) whites and 5 (12%) blacks.” </li></ul>Examples of Data Reporting Problems (I)
  32. 32. Examples of Data Reporting Problems (II)
  33. 33. Examples of Data Reporting Problems (III)
  34. 34. <ul><li>Let us extract the number of deaths in two study arms, at 5 years of followup . . . </li></ul>Inconsistencies in Published Papers
  35. 35. Results Text Overall Mortality […] 24 deaths occurred in the PCI group, […] and 25 in the MT group […] MED and MT = medical treatment; PCI = percutaneous coronary intervention PCI (205) MED (203) Dead 24 25
  36. 36. Overall Mortality Figure MT = medical treatment PCI = percutaneous coronary intervention PCI (205) MT (203) Dead 24 25 28 35
  37. 37. Clinical Events Table CABG = coronary artery bypass graft MT = medical treatment PCI = percutaneous coronary intervention PCI (205) MT (203) Dead 24 28 32 25 35 33
  38. 38. <ul><li>Because so few research reports give effect size, standard normal deviates, or exact p-values, the quantitative reviewer must calculate almost all indices of study outcomes. </li></ul><ul><li>Little of this calculation is automatic, because results are presented in a bewildering variety of forms and are often obscure. </li></ul>Why Do Such Problems Exist? Green BF, Hall JA. Annu Rev Psychol 1984;35:37-53.
  39. 39. Using Digitizing Software Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/. <ul><li>Engauge Digitizer, an open-source software: </li></ul><ul><li>Each data point is marked with an “X,” and the coordinates are given in a spreadsheet. </li></ul>
  40. 40. <ul><li>Missing information in published papers </li></ul><ul><li>Publications with at least partially overlapping patient subgroups </li></ul><ul><li>Potentially fraudulent data </li></ul>Additional Common Issues
  41. 41. <ul><li>Data extraction is laborious and tedious. </li></ul><ul><li>To err is human: data extractors will identify errors and will err themselves. </li></ul><ul><li>Interpretation and subjectivity are unavoidable. </li></ul><ul><li>Data are often not reported in a uniform manner (e.g., quality, location in paper, metrics, outcomes, numerical value vs. graphs). </li></ul>Conclusions
  42. 42. <ul><li>Key questions will guide reviewers in choosing which information to extract. </li></ul><ul><li>There is no single correct way to record extracted data. </li></ul><ul><li>Extracting data requires familiarity with the content and knowledge of epidemiological principles and statistical concepts. </li></ul><ul><li>Be persistent: </li></ul><ul><ul><ul><li>Often, one can extract more information than the paper initially appears to contain (e.g., by digitizing graphs). </li></ul></ul></ul><ul><li>Be comprehensive: </li></ul><ul><ul><li>Try to verify the same piece of information from different places in the same article. </li></ul></ul><ul><ul><li>Sometimes there are surprising inconsistencies. </li></ul></ul><ul><ul><li>Inconsistencies indicate suboptimal reporting quality at least. </li></ul></ul>Key Messages
  43. 43. <ul><li>Berlin J, for the University of Pennsylvania Meta-analysis Blinding Study Group. Does blinding of readers affect the results of meta-analysis? Lancet 1997;350:185-6. </li></ul><ul><li>Green BF, Hall JA. Quantitative methods for literature reviews. Annu Rev Psychol 1984;35:37-53. </li></ul><ul><li>Gurevitch J, Hedges LV. Meta-analysis: combining the results of independent experiments. In: Scheiner AM and Gurevich J, eds. Design and analysis of ecological experiments . New York: Chapman & Hall; 1993. p. 347-70. </li></ul><ul><li>Source Forge Web site. Engauge Digitizer. Available at: http://sourceforge.net/projects/digitizer/files/Engauge%20Digitizer/. </li></ul>References (I)
  44. 44. <ul><li>Trikalinos TA, Ip S, Raman G, et al. Home Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome . Technology Assessment (Prepared by Tufts–New England Medical Center Evidence-based Practice Center). Rockville, MD: Agency for Healthcare Research and Quality; August 2007. Available at: http://www.cms.gov/determinationprocess/downloads/ id48TA.pdf. </li></ul>References (II)
  45. 45. <ul><li>This presentation was prepared by Joseph Lau, M.D., and Thomas Trikalinos, M.D., Ph.D., members of the Tufts–New England Medical Center Evidence-based Practice Center, and Melissa L. McPheeters, Ph.D., M.P.H., and Jeff Seroogy, B.S., members of the Vanderbilt University Evidence-based Practice Center. </li></ul><ul><li>The information in this module is currently not included in Version 1.0 of the Methods Guide for Comparative Effectiveness Reviews (available at: http://www.effective healthcare.ahrq.gov/repFiles/2007_10DraftMethodsGuide.pdf). </li></ul>Authors

×