Presentation at 2015 Association for Clinical and Translational Science Meeting on Machine learning approach to classifying publications along the translational research spectrum
1. Classifying Publications Along the
Translational Science Spectrum:
A Machine Learning Approach
April 18, 2015
Alisa Surkis, PhD, MLS
Health Sciences Library
NYU School of Medicine
2. Evaluation Key Function Committee
Definitions Workgroup Bibliometrics Workgroup
Common Metrics
Publication Impact
7. Combining Definitions
IOM ReportHarvard Catalyst
Blumberg et al.
Westfall et al.
Woodcock & Woosley
Sung et al.
Finkbeiner
Waldman & Terzick
Khoury et al.
Marmot et al.
UT Southwestern MC
9. Combined Definitions
T0: BASIC BIOMEDICAL RESEARCH: Identification of opportunities and
approaches to health problems
• Includes preclinical and animal studies
• May or may not consider a particular disease process
• May include human subjects, but does not include interventions with human subjects
• Goal is to understand the human condition and environment as it exists
• Focuses on understanding biological, social and behavioral mechanisms that underlie
health or disease
• Defining mechanisms, biomarkers, targets for therapeutic development; drug discovery (lead
molecule screening, optimization, formulation); prototyping; physical assessments (radiology,
laboratory, biopsy)
• Can include non-interventional, correlational epidemiologic studies using existing large data sets
• Studies mechanisms or derive modifications of cells, proteins, and DNA present in
human disease processes
• Identifies functional significance and mechanisms of genomic polymorphisms identified
by human genome-wide association studies
T1: TRANSLATION TO HUMANS: Seeks to move fundamental discovery
into health application; provide clinical insights
• Involves Proof of Concept studies
• Includes Phase 1 clinical trials
• Healthy subjects or select population of patients
• Small sample size
• Tests for safety
• Focuses on new methods of diagnosis, treatment, and prevention
• Takes place in highly controlled research settings
T2: TRANSLATION TO PATIENTS: Health application to implications for
evidence-based practice guidelines
• Involves controlled clinical research studies which may lead to the basis for clinical
application and evidence-based guidelines
• Yields knowledge about the efficacy of interventions in highly-controlled/protocol-driven
settings
• Goal is to identify and analyze the optimal effects of an intervention on the human
condition or environment
• Phase 2 clinical trials – focus on safety and efficacy (dose-response)
• Select population of patients
• Relatively large sample size
• Phase 3 clinical trials – focus on safety and efficacy
• Select population of patients
• Special groups of patients (ex. renal failure)
T3: TRANSLATION TO PRACTICE: Practice guidelines to health practices
• Includes comparative effectiveness, pragmatic clinical trials, community based
participatory research, dissemination and implementation research, and clinical
outcomes research, post-marketing analysis (phase 4)
• Health services research, including reasons for gaps in care and delivery of
recommended and timely care to the right patient
• Meta-analyses, and systematic reviews involving interventions
• Development and implementation of evidenced-based guidelines, policies, and best
practices
T4: TRANSLATION TO COMMUNITIES: Health practice to population health
impact, providing communities with the optimal intervention
• Includes population-level outcomes research: population monitoring of morbidity,
mortality, benefits, and risks
• Focuses on wider dissemination/implementation of improved practices/interventions
Focuses on impacts of policy and/or environmental change
• Studies focusing on disease prevention through life-style and behavioral modifications
• Documents “real-world” health outcomes of population health practices associated with
improved disease prevention and reduced medical costs
• Results in true benefit to society
10. Algorithm
What is Machine Learning?
Input 1
Input 2
Input 3
Class 2
Class 2
Feature Set 1
Feature Set 2
Feature Set 3
Class 1
19. Combined Definitions
T0: BASIC BIOMEDICAL RESEARCH: Identification of opportunities and
approaches to health problems
• Includes preclinical and animal studies
• May or may not consider a particular disease process
• May include human subjects, but does not include interventions with human subjects
• Goal is to understand the human condition and environment as it exists
• Focuses on understanding biological, social and behavioral mechanisms that underlie
health or disease
• Defining mechanisms, biomarkers, targets for therapeutic development; drug discovery (lead
molecule screening, optimization, formulation); prototyping; physical assessments (radiology,
laboratory, biopsy)
• Can include non-interventional, correlational epidemiologic studies using existing large data sets
• Studies mechanisms or derive modifications of cells, proteins, and DNA present in
human disease processes
• Identifies functional significance and mechanisms of genomic polymorphisms identified
by human genome-wide association studies
T1: TRANSLATION TO HUMANS: Seeks to move fundamental discovery
into health application; provide clinical insights
• Involves Proof of Concept studies
• Includes Phase 1 clinical trials
• Healthy subjects or select population of patients
• Small sample size
• Tests for safety
• Focuses on new methods of diagnosis, treatment, and prevention
• Takes place in highly controlled research settings
T2: TRANSLATION TO PATIENTS: Health application to implications for
evidence-based practice guidelines
• Involves controlled clinical research studies which may lead to the basis for clinical
application and evidence-based guidelines
• Yields knowledge about the efficacy of interventions in highly-controlled/protocol-driven
settings
• Goal is to identify and analyze the optimal effects of an intervention on the human
condition or environment
• Phase 2 clinical trials – focus on safety and efficacy (dose-response)
• Select population of patients
• Relatively large sample size
• Phase 3 clinical trials – focus on safety and efficacy
• Select population of patients
• Special groups of patients (ex. renal failure)
T3: TRANSLATION TO PRACTICE: Practice guidelines to health practices
• Includes comparative effectiveness, pragmatic clinical trials, community based
participatory research, dissemination and implementation research, and clinical
outcomes research, post-marketing analysis (phase 4)
• Health services research, including reasons for gaps in care and delivery of
recommended and timely care to the right patient
• Meta-analyses, and systematic reviews involving interventions
• Development and implementation of evidenced-based guidelines, policies, and best
practices
T4: TRANSLATION TO COMMUNITIES: Health practice to population health
impact, providing communities with the optimal intervention
• Includes population-level outcomes research: population monitoring of morbidity,
mortality, benefits, and risks
• Focuses on wider dissemination/implementation of improved practices/interventions
Focuses on impacts of policy and/or environmental change
• Studies focusing on disease prevention through life-style and behavioral modifications
• Documents “real-world” health outcomes of population health practices associated with
improved disease prevention and reduced medical costs
• Results in true benefit to society
20. • 9 coders from 5 institutions
• Pilot sample of 32 articles
• Wisconsin
• Difficult to categorize
• Consensus reached on ZERO articles
Agreement reached?
21. Try again
• 25 randomly selected articles
• Agreement on 5 articles
• Revisit 11 of the previous articles
• Agreement on 1 article
22. A definition is not as helpful as…
TRANSLATION TO PRACTICE: Practice guidelines to
health practices
• Includes comparative effectiveness, pragmatic clinical trials,
community based participatory research, dissemination and
implementation research, and clinical outcomes research, post-
marketing analysis (phase 4)
• Health services research, including reasons for gaps in care and
delivery of recommended and timely care to the right patient
• Meta-analyses, and systematic reviews involving interventions
• Development and implementation of evidenced-based guidelines,
policies, and best practices
24. Checklist
• Re-visiting the 25 + 11 publications
• 17% agreement -> 31% agreement
• 36% all but one agreed
• Improves inter-rater reliability
25. Preparing the Training Set
• 200 articles
• 40 from each of the 5 CTSAs
• Each article given to 2 institutions to classify
• 123 articles agreement on classification
• 49/77 articles I cast a “deciding vote”
• Still too few T1/T2
• Added 6 articles from pilot data
• Applied PubMed Clinical Trial filter and randomly selected 30 articles
from the 5 CTSAs
26. Results – Area Under the Curve
• T0 - 0.89
• T1 - 0.63
• T2 - 0.66
• T3/T4 - 0.87
T1/T2 - 0.69
29. Definitions
BASIC BIOMEDICAL RESEARCH: Identification of opportunities and
approaches to health problems.
• Includes preclinical and animal studies
• May or may not consider a particular disease process
• May include human subjects, but does not include interventions with human subjects
• Goal is to understand the human condition and environment as it exists
• Focuses on understanding biological, social and behavioral mechanisms that underlie
health or disease
• Defining mechanisms, biomarkers, targets for therapeutic development; drug discovery (lead molecule screening,
optimization, formulation); prototyping; physical assessments (radiology, laboratory, biopsy)
• Can include non-interventional, correlational epidemiologic studies using existing large data sets
• Studies mechanisms or derive modifications of cells, proteins, and DNA present in
human disease processes
• Identifies functional significance and mechanisms of genomic polymorphisms identified
by human genome-wide association studies
Concise Definitions
BASIC BIOMEDICAL RESEARCH: Identification of opportunities and
approaches to health problems.
• Includes: preclinical and animal studies; GWAS studies; studies of cells, proteins, and
DNA; studies on humans or existing datasets that focus on understanding biological,
social and behavioral mechanisms that underlie health or disease
30. Concise Definitions
T0 BASIC BIOMEDICAL RESEARCH
Identification of opportunities and approaches to health problems.
Includes: preclinical and animal studies; GWAS studies; studies of cells, proteins, and DNA; studies on humans or existing
datasets that focus on understanding biological, social and behavioral mechanisms that underlie health or disease
T1 TRANSLATION TO HUMANS
Seeks to move fundamental discovery into health application; provide clinical insights
Includes: Proof of Concept studies; phase 1 clinical trials; studies testing feasibility or safety of a new method of diagnosis,
treatment, or prevention
T2 TRANSLATION TO PATIENTS
Health application to implications for evidence-based practice guidelines.
Includes: Phase 2/3 clinical trials; studies to test efficacy of interventions in highly controlled settings
T3 TRANSLATION TO PRACTICE
Practice guidelines to health practices.
Includes: Phase 4 clinical trials, comparative effectiveness research, community based participatory research, dissemination
and implementation research, clinical outcomes research, health services research, meta-analyses/systematic reviews of
interventions, development/implementation of guidelines
T4 TRANSLATION TO COMMUNITIES
Health practice to population health impact, providing communities with the optimal intervention
Includes: population-level outcomes research; wider implementation and dissemination; policy impacts; disease prevention
through lifestyle/behavior modifications; real-world health outcomes; true benefit to society
31. Results so far
• Checklist
• Concise Definitions
• Initial Classifiers
32. Next Steps
• Improve classifiers
• Larger training set
• Different algorithms
• Input variations
• Test for generalizability
• Apply classifiers to use cases
33. References
• “Translating New Treatments and Prevention Strategies More Efficiently” http://www.ncats.nih.gov/research/cts/cts.html
• “Pathfinder” https://catalyst.harvard.edu/pathfinder/
• Dougherty, D., & Conway, P. H. (2008). The "3T's" road map to transform US health care: the "how" of high-quality care. JAMA,
299(19), 2319-2321.
• Sung, N. S., Crowley, W. F., Jr., Genel, M., Salber, P., Sandy, L., Sherwood, L. M., . . . Rimoin, D. (2003). Central challenges
facing the national clinical research enterprise. JAMA, 289(10), 1278-1287.
• Waldman, S. A., & Terzic, A. (2010). Clinical and translational science: from bench-bedside to global village. Clin Transl Sci,
3(5), 254-257.
• Institute of Medicine. (2013). The CTSA Program at NIH: Opportunities for Advancing Clinical and Translational Research.
Retrieved from http://www.iom.edu/Reports/2013/The-CTSA-Program-at-NIH-Opportunities-for-Advancing-Clinical-and-
Translational-Research.aspx