Build a semantic discovery engine that allows researchers to query… So we built a prototype… Build a semantic search engine that will seamlessly query across many resources.
The number of distinct resources will reach 176 by the end of the month.
Transcript of "Beyond Ontologies: Putting Biomedical Knowledge to Work"
Beyond Ontologies: Putting Biomedical Knowledge to Work
Philip R.O. Payne, Ph.D.
Associate Professor and Chair, Biomedical Informatics (College of Medicine)
Associate Professor, Health Services Management and Policy (College of Public Health)
Associate Director for Data Sciences, Center for Clinical and Translational Science
Executive-in-residence, Office of Technology Transfer and Commercialization
NCBO Project Meeting, March 13, 2013
COI/Disclosures
• Federal Funding: NCI, NLM, NCATS, AHRQ
• Additional Research Funding: SAIC, Rockefeller Philanthropy Associates, Academy Health, Pfizer
• Academic Consulting: CWRU, Cleveland Clinic, University of Cincinnati, Columbia University, Emory University, Virginia Commonwealth University, University of California San Diego, University of California Irvine, University of California San Francisco, University of Minnesota, Northwestern University
• Other Consulting/Honoraria: American Medical Informatics Association (AMIA), Institute of Medicine (IOM)
• Editorial Boards: Journal of the American Medical Informatics Association, Journal of Biomedical Informatics
• Study Sections: NLM (BLIRC), NCATS (formerly NCRR), NIDDK
• Corporate: Oracle, HP, Epic, Accelmatics (interim-CEO)
Outline
• Working definitions and assumptions
• Putting biomedical knowledge to work
  – A practical approach to CDEs
  – Resource discovery
  – Hypothesis generation
• Discussion
  – Knowledge-based systems engineering
  – Big data
The Multiple Dimensions of Biomedical Knowledge Engineering
[Figure: three dimensions converging on solving real world problems and creating dynamic, interoperable systems]
• Technology (lowering barriers to adoption): deployment model; systems integration; extensibility
• Data sharing (facilitating sharing and integration): vocabularies; semantics; knowledge engineering processes
• Empowering knowledge workers: tools and best practices; governance; socio-cultural factors
A Balanced Approach to Realizing the Benefits of Shared Semantics
[Figure: a spectrum from community-wide interoperability (computable vocabularies and semantics, the historical focus) to peer-to-peer, project-specific interoperability (negotiation of working constructs), and the impact on community-wide governance, technologies, and software engineering approaches]
Empowering Knowledge Workers
[Figure: subject matter experts and biological and clinical needs driving solutions to real world interoperability problems]
Critical Issues:
• Workflows that enable engagement by subject matter experts
• Tight coupling of knowledge engineering efforts and research programs that can define “real world” driving problems
• Facilitation and support of interdisciplinary, team science models (including basic and translational scientists, clinical researchers, and informaticians)
Biomedical Informatics ≠ Engineering
Systems-level approaches to knowledge engineering and usability are essential
4 Assumptions Regarding the Current State and Future of the NCBO
1) The tools and knowledge collections created and maintained by the NCBO have become a substrate for a broad spectrum of biomedical informatics innovations, analogous to the role played by NLM-provided resources
2) Future directions for the center and its work will trend towards an applied science focus
3) At the same time, outreach, engagement, and education will remain high priorities
4) The current funding climate presents a significant challenge, of as yet unknown extent, to the preceding three assumptions
A Pragmatic Approach to CDEs: The openMDR Project
Defining Common Data Elements (CDEs)
• Common Data Elements (CDEs) are standardized terms for the collection and exchange of data
• CDEs are metadata: they describe the type of data being collected, not the data itself
• Critical role(s) for CDEs:
  – to identify discrete, defined items for data collection
  – to promote consistent data collection in the field
  – to eliminate unneeded or redundant data collection
  – to promote consistent reporting and analysis
  – to reduce the possibility of error related to data translation and transmission
  – to facilitate data sharing
Source: National Cancer Institute (NCI)
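The following minimal Python sketch makes “CDEs are metadata” concrete. It assumes a simplified ISO/IEC 11179 model, in which a data element pairs a data element concept (object class plus property) with a value domain. The class names, field names, the example element, and its concept codes are all hypothetical. They are not the openMDR or NCI caDSR schema. The record describes what may be collected, and `validate` shows how that description can catch translation errors before data are exchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    """Permissible values for a data element; an empty set means any value of `datatype`."""
    datatype: type
    permissible_values: frozenset = frozenset()


@dataclass(frozen=True)
class CommonDataElement:
    """Simplified ISO/IEC 11179-style data element: a data element concept
    (object class + property) bound to a value domain.

    Field names are illustrative only (hypothetical, not the openMDR schema).
    """
    name: str
    object_class: str
    property_name: str
    concept_codes: tuple  # ontology codes anchoring the meaning (placeholders below)
    value_domain: ValueDomain
    definition: str = ""

    def validate(self, value) -> bool:
        """Check one collected value against the element's value domain."""
        if not isinstance(value, self.value_domain.datatype):
            return False
        allowed = self.value_domain.permissible_values
        return not allowed or value in allowed


# Hypothetical CDE; the concept codes are placeholders, not real NCIt/UMLS codes.
patient_sex = CommonDataElement(
    name="Patient Sex",
    object_class="Patient",
    property_name="Sex",
    concept_codes=("EX:0001", "EX:0002"),
    value_domain=ValueDomain(str, frozenset({"Female", "Male", "Unknown"})),
    definition="Sex of the patient as recorded at enrollment.",
)

print(patient_sex.validate("Female"))  # True
print(patient_sex.validate("F"))       # False: not a permissible value
```

The element itself holds no patient data. Two sites that share `patient_sex` can still collect their own values, and those values stay comparable because both sites check them against the same value domain.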
OpenMDR: A Distributed CDE Platform
• Semantic metadata management suite
• Locally relevant, ontology-anchored data elements
• Rapid and agile development paradigm
• Distributed terminology ecosystem
• Federated queries across multiple deployments
• Interaction with other semantic management systems
• ISO 11179 semantic repository
• Integration with industry-standard tools
http://openmdr.org
OpenMDR Functional Components
• Create and manage terminology
• Discover and reuse concepts
• Annotate models for discovery and interoperability
• Utilize data elements to build semantically anchored services
http://openmdr.org
OpenMDR Is Federated
• Multiple deployments for locally relevant terminology
• DNS-like hierarchy of authority
http://openmdr.org
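One way to read “DNS-like hierarchy of authority” is sketched below. Authority names are dotted, most specific label first, and a lookup strips labels from the left until it reaches a delegated zone, falling back to a root authority. This is an illustrative sketch, not OpenMDR's actual resolution mechanism. The zone names and registry URLs are hypothetical placeholders.

```python
# Hypothetical authority table: delegated namespace -> registry endpoint.
# "" is the root authority; all names and URLs are placeholders.
REGISTRIES = {
    "": "https://root.example.org/mdr",
    "example.edu": "https://mdr.example.edu/",
    "bmi.example.edu": "https://mdr.bmi.example.edu/",
}


def resolve_authority(authority: str, registries: dict) -> tuple:
    """Return (zone, endpoint) for the most specific registry owning `authority`.

    As in DNS, labels are stripped from the left until a delegated zone
    is found; the empty string acts as the root.
    """
    labels = authority.split(".") if authority else []
    for i in range(len(labels) + 1):
        zone = ".".join(labels[i:])  # i == len(labels) yields "" (the root)
        if zone in registries:
            return zone, registries[zone]
    raise LookupError(f"no registry has authority over {authority!r}")


print(resolve_authority("cancer.bmi.example.edu", REGISTRIES))
# ('bmi.example.edu', 'https://mdr.bmi.example.edu/')
print(resolve_authority("partner.org", REGISTRIES))
# ('', 'https://root.example.org/mdr')
```

The design choice mirrors DNS delegation. A local deployment can own its terminology under its own zone, while anything it does not claim is still answerable by a parent authority.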
OpenMDR as Part of an MDA (Model-Driven Architecture) Workflow
• Empowers knowledge workers
• Enterprise Architect plugin
• Formulate searches against local or distributed OpenMDR instances
• Identify semantic terms in detail; concept codes help distinguish similar elements
• Apply annotations to the data model
http://openmdr.org
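The search step of this workflow can be pictured as a federated query whose results are grouped by concept code. The sketch sends one keyword search to several instances concurrently and then groups hits by their concept-code tuple. Elements whose names differ but whose codes match collapse into one group. Elements with similar names but different codes stay apart. The two in-memory instances and their codes are hypothetical stand-ins. A real deployment would call remote services whose API is not described here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for a local and a remote OpenMDR instance.
# Concept codes are placeholders.
INSTANCES = {
    "local": [
        {"name": "Tumor Size", "codes": ("EX:100",)},
        {"name": "Tumor Size Category", "codes": ("EX:100", "EX:200")},
    ],
    "partner": [
        {"name": "Tumour Size", "codes": ("EX:100",)},
        {"name": "Lymph Node Size", "codes": ("EX:300",)},
    ],
}


def search_instance(instance: str, term: str) -> list:
    """Case-insensitive keyword search against one instance (simulated locally)."""
    term = term.lower()
    return [
        dict(element, source=instance)
        for element in INSTANCES[instance]
        if term in element["name"].lower()
    ]


def federated_search(term: str) -> dict:
    """Query every instance concurrently, then group hits by concept codes."""
    with ThreadPoolExecutor() as pool:
        per_instance = pool.map(lambda name: search_instance(name, term), INSTANCES)
        grouped = defaultdict(list)
        for hits in per_instance:
            for hit in hits:
                grouped[hit["codes"]].append((hit["source"], hit["name"]))
    return dict(grouped)


for codes, hits in federated_search("size").items():
    print(codes, hits)
```

Searching for "size" returns three groups. "Tumor Size" and "Tumour Size" merge under the same code. "Tumor Size Category" stays separate because its code tuple differs.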
OpenMDR and the TRIAD SOA
http://openmdr.org
Motivation for the Design of ResearchIQ
• Clinical and translational researchers frequently need to identify and engage:
  – Collaborators
  – Shared resources
  – Data, information, and knowledge collections
• Many sources can support such needs; however, they are usually:
  – Heterogeneous
  – Difficult to find
  – Not linked
How do we overcome these barriers to the efficient planning and conduct of clinical and translational studies?
A Potential Solution: What Is ResearchIQ (Research Integrative Query)?
A single knowledge resource portal for the clinical and translational research community that will provide a “front door” to a variety of resources.
How does ResearchIQ work?
• Knowledge-anchored semantic search
• Leveraging semantic web technologies
• Current project focus is on the development and deployment of an end-user-facing proof-of-concept
Can it be done? How difficult will it be? Can it scale? What kind of coverage would we have?
High-level System Architecture
[Figure: components include a shared resources database, web pages, study search, a caGrid/TRIAD data service, RDFizers, UMLS, MetaMap, a domain model, an RDMZ triple store (database), query performance optimization, web services, a web application, and the browser]
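The sketch below shows, in miniature, how these pieces could combine into knowledge-anchored semantic search. Heterogeneous records are “RDFized” into subject–predicate–object triples and annotated with concepts, and a query is expanded to narrower concepts before matching. Every part is a stand-in. A naive substring lexicon replaces UMLS/MetaMap, a Python set replaces the triple store, and the concept IDs, predicates such as `ex:annotatedWith`, and resource records are hypothetical. A production system would use a real RDF store and SPARQL.

```python
# Hypothetical lexicon standing in for MetaMap/UMLS annotation (placeholder IDs).
LEXICON = {
    "flow cytometry": "EX:C1",
    "leukemia": "EX:C2",
    "chronic lymphocytic leukemia": "EX:C3",
    "cll": "EX:C3",
}
# Hypothetical is-a links: narrower concept -> broader concept.
BROADER = {"EX:C3": "EX:C2"}


def annotate(text: str) -> set:
    """Naive substring concept recognition (MetaMap does far more than this)."""
    lowered = text.lower()
    return {concept for phrase, concept in LEXICON.items() if phrase in lowered}


def rdfize(resource_id: str, resource_type: str, label: str) -> set:
    """Turn one resource record into (subject, predicate, object) triples."""
    triples = {(resource_id, "rdf:type", resource_type), (resource_id, "rdfs:label", label)}
    triples |= {(resource_id, "ex:annotatedWith", c) for c in annotate(label)}
    return triples


class TripleStore:
    """Tiny in-memory triple store with wildcard pattern matching."""

    def __init__(self):
        self.triples = set()

    def add(self, triples: set) -> None:
        self.triples |= triples

    def match(self, s=None, p=None, o=None) -> set:
        return {
            t for t in self.triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
        }


def narrower_closure(concepts: set) -> set:
    """Expand concepts with everything transitively narrower than them."""
    result = set(concepts)
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def semantic_search(store: TripleStore, query: str) -> list:
    """Return resources annotated with the query's concepts or narrower ones."""
    wanted = narrower_closure(annotate(query))
    return sorted({s for s, _, o in store.match(p="ex:annotatedWith") if o in wanted})


store = TripleStore()
store.add(rdfize("core:flow", "ex:SharedResource", "Flow Cytometry Core Facility"))
store.add(rdfize("study:42", "ex:Study", "Novel therapy trial in CLL"))
store.add(rdfize("page:7", "ex:WebPage", "Leukemia tissue bank"))

print(semantic_search(store, "leukemia"))  # ['page:7', 'study:42']
```

A keyword search for “leukemia” would miss the CLL study. The concept expansion finds it because CLL is modeled as narrower than leukemia.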
Putting Conceptual Knowledge to Work: Constructive Induction (CI) & Hypothesis Generation
Conceptual Knowledge Constructs (CKCs)
• Conceptual knowledge-anchored concepts + relationships
• Higher-order constructs (multiple intermediate concepts)
• Controls for concept granularity (search depth)
• Basis for inference of hypotheses concerning relationships between data elements
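Here is a rough illustration of how search depth can control CKC granularity. Each data element is anchored to an ontology concept. A breadth-first search enumerates simple concept paths of at most `max_depth` edges that link two data elements. Each path is a candidate construct from which a hypothesis about the two elements could be stated. This is a sketch under assumed definitions, not the TOKEn implementation. The ontology fragment, the data element names, and the relationships between them are hypothetical and carry no clinical claim.

```python
from collections import deque

# Hypothetical ontology fragment (undirected relationships between concepts).
EDGES = [
    ("17p Deletion", "Chromosome Loss"),
    ("Chromosome Loss", "Cytogenetic Abnormality"),
    ("Cytogenetic Abnormality", "Treatment Response"),
    ("ZAP-70 Expression", "Protein Expression"),
    ("Protein Expression", "Treatment Response"),
]
# Hypothetical data elements and the concepts that anchor them.
DATA_ELEMENTS = {
    "FISH_17p": "17p Deletion",
    "ZAP70_pct": "ZAP-70 Expression",
    "Best_Response": "Treatment Response",
}


def build_graph(edges):
    """Adjacency sets for an undirected concept graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def generate_ckcs(graph, elements, max_depth):
    """Enumerate (element, concept path, element) constructs of at most max_depth edges."""
    concept_to_element = {concept: elem for elem, concept in elements.items()}
    results = []
    for start_elem, start in elements.items():
        queue = deque([[start]])
        while queue:
            path = queue.popleft()
            if len(path) - 1 >= max_depth:
                continue  # depth limit reached: do not extend this path
            for nxt in sorted(graph.get(path[-1], ())):
                if nxt in path:
                    continue  # keep paths simple (no cycles)
                new_path = path + [nxt]
                end_elem = concept_to_element.get(nxt)
                # Each undirected pair is found from both ends; keep one direction.
                if end_elem is not None and start_elem < end_elem:
                    results.append((start_elem, tuple(new_path), end_elem))
                queue.append(new_path)
    return results


graph = build_graph(EDGES)
for depth in (1, 2, 3):
    print(depth, generate_ckcs(graph, DATA_ELEMENTS, depth))
```

With a depth of 1, no constructs appear. A depth of 2 links `Best_Response` to `ZAP70_pct` through one intermediate concept. A depth of 3 also reaches `FISH_17p` through two intermediates. Deeper searches therefore yield more, and coarser, higher-order constructs.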
Experimental Context: CLL Research Consortium
• NCI-funded Program/Project (P01)
• Translational research targeting Chronic Lymphocytic Leukemia (CLL)
• Established in 1999
• Cohort of over 6,000 patients
• Comprehensive phenotypic and bio-molecular data sets, as well as bio-specimens
• 8 participating sites
• Informatics platform:
  – Research networking
  – Clinical trials management
  – Correlative data management
  – Bio-specimen management
Multi-part CI Evaluation Study in CLL
(1) Efficacy, (2) Verification & Validation, (3) Literature Mining
• Domain: CLL; 108 data elements; 822 UMLS concepts; 5,800 CKCs
• CKC evaluation: random sample (50); 5 SMEs; SME “gold standard”; 86% valid; 90% “meaningful”
• Automated literature queries: Medline, 2005–2008; random sample (250); support metric
• Comparison: literature-based CKCs vs. ontology-based CKCs
• Critical relationship: support metric vs. “meaningful” (significant correlation)
• Critical findings: search depth controls; TOKEn browser; no overlap; differing granularity; more timely
References
1. Payne PR, Borlawsky T, Kwok A, Dhaval R, Greaves A. Ontology-anchored Approaches to Conceptual Knowledge Discovery in a Multi-dimensional Research Data Repository. AMIA Translational Bioinformatics Summit Proc. 2008.
2. Payne PR, Borlawsky T, Kwok A, Greaves A. Supporting the Design of Translational Clinical Studies Through the Generation and Verification of Conceptual Knowledge-anchored Hypotheses. AMIA Annu Symp Proc. 2008.
3. Payne PR, Borlawsky T, Lele O, James S, Greaves AW. The TOKEN Project: Knowledge Synthesis for in-silico Science. Journal of the American Medical Informatics Association (JAMIA). 2011.
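The slides name a support metric for literature-based CKCs but do not define it. The sketch below assumes the conventional association-rule definition: the fraction of documents in which two concepts co-occur. It scores and ranks concept pairs over per-abstract concept sets. The abstracts' concept sets are hypothetical, and the study's actual metric may differ.

```python
from itertools import combinations


def support(doc_concepts: list, a: str, b: str) -> float:
    """Fraction of documents whose concept annotations contain both a and b."""
    if not doc_concepts:
        return 0.0
    both = sum(1 for concepts in doc_concepts if a in concepts and b in concepts)
    return both / len(doc_concepts)


def rank_pairs(doc_concepts: list, min_support: float) -> list:
    """Score every concept pair and keep those at or above min_support."""
    vocabulary = sorted(set().union(*doc_concepts))
    scored = [((a, b), support(doc_concepts, a, b)) for a, b in combinations(vocabulary, 2)]
    kept = [item for item in scored if item[1] >= min_support]
    # Highest support first; ties broken alphabetically by pair.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical concept sets, one per Medline abstract.
abstracts = [
    {"17p Deletion", "Treatment Response"},
    {"17p Deletion", "Treatment Response", "ZAP-70 Expression"},
    {"ZAP-70 Expression", "Treatment Response"},
    {"Bone Marrow Morphology"},
]

for pair, score in rank_pairs(abstracts, min_support=0.5):
    print(pair, score)
```

Here each of the two printed pairs co-occurs in 2 of 4 abstracts, giving a support of 0.5. A pair such as 17p Deletion with ZAP-70 Expression (1 of 4, 0.25) falls below the threshold.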
CKC Visualization
[Figure: TOKEn CKC Network: CLL Research Consortium Metadata. Labeled regions: bone marrow morphology; cytogenetic & chromosomal abnormalities; solid tumors; tissues of origin; hematologic malignancies; myelogenous malignancies; bio-molecular products]
[Figure: TOKEn CKC Network: Semantic Partitions. Labeled partitions: cytogenetic abnormalities; chromosome loss; laboratory findings; tissues of origin; protein expression; treatment response; leukemias; molecular abnormalities; bone marrow morphology; lymphomas]
Knowledge-based Systems: Replicating Expert Performance
[Figure adapted from Gaines and Shaw, “Knowledge Acquisition Tools Based On Personal Construct Psychology”, 1993]
The Importance of Knowledge-based Systems Engineering Is Amplified by Our Increased Focus on Big Data
[Figure: volume, velocity, and variability of multi-dimensional data, information, and knowledge, alongside scalability, extensibility, reproducibility, and integration]
• Moving beyond the “hype cycle” and solving real world problems
• Over $100M investment by NIH, including the creation of centers of excellence
Acknowledgements
Collaborators: Peter J. Embi, MD, MS; Albert M. Lai, PhD; Kun Huang, PhD; Po-Yin Yen, RN, PhD; Yang Xiang, PhD; Marcelo Lopetegui, MD; Tara Borlawsky-Payne, MA; Omkar Lele, MS, MBA; Marjorie Kelley; William Stephens; Arka Pattanayak; Caryn Roth; Andrew Greaves
Funding:
• NCI: R01CA134232, R01CA107106, P01CA081534, P50CA140158, P30CA016058
• NCATS: U54RR024384
• NLM: R01LM009533, T15LM011270
• AHRQ: R01HS019908
• Rockefeller Philanthropy Associates
• Academy Health – EDM Forum
Laboratory for Knowledge Based Applications and Systems Engineering (KBASE)
Thank you for your time and attention!
• email@example.com
• http://go.osu.edu/payne