Beyond Ontologies: Putting Biomedical Knowledge to Work
  • Goal: build a semantic search engine that lets researchers seamlessly query across many resources. So we built a prototype.
  • The number of distinct resources will reach 176 by the end of the month.

Presentation Transcript

  • Beyond Ontologies: Putting Biomedical Knowledge to Work
    Philip R.O. Payne, Ph.D.
    – Associate Professor and Chair, Biomedical Informatics (College of Medicine)
    – Associate Professor, Health Services Management and Policy (College of Public Health)
    – Associate Director for Data Sciences, Center for Clinical and Translational Science
    – Executive-in-residence, Office of Technology Transfer and Commercialization
    NCBO Project Meeting, March 13, 2013
  • COI/Disclosures
    – Federal Funding: NCI, NLM, NCATS, AHRQ
    – Additional Research Funding: SAIC, Rockefeller Philanthropy Associates, Academy Health, Pfizer
    – Academic Consulting: CWRU, Cleveland Clinic, University of Cincinnati, Columbia University, Emory University, Virginia Commonwealth University, University of California San Diego, University of California Irvine, University of California San Francisco, University of Minnesota, Northwestern University
    – Other Consulting/Honoraria: American Medical Informatics Association (AMIA), Institute of Medicine (IOM)
    – Editorial Boards: Journal of the American Medical Informatics Association, Journal of Biomedical Informatics
    – Study Sections: NLM (BLIRC), NCATS (formerly NCRR), NIDDK
    – Corporate: Oracle, HP, Epic, Accelmatics (interim-CEO)
  • Outline
    – Working definitions and assumptions
    – Putting biomedical knowledge to work
      – A practical approach to CDEs
      – Resource discovery
      – Hypothesis generation
    – Discussion
      – Knowledge-based systems engineering
      – Big data
  • The Multiple Dimensions of Biomedical Knowledge Engineering (diagram)
    – Technology: deployment model; systems integration; extensibility
    – Sharing: vocabularies; semantics; knowledge engineering processes
    – Empowering knowledge workers: tools and best practices; governance; socio-cultural factors
    – Diagram labels: Lowering Barriers to Adoption; Facilitating Data Sharing and Integration; Solving Real World Problems; Creating Dynamic, Interoperable Systems
  • A Balanced Approach to Realizing the Benefits of Shared Semantics (diagram)
    – Community-wide interoperability: computable vocabularies and semantics (historical focus)
    – Project-specific interoperability: peer-to-peer negotiation of working constructs
    – Impact on community-wide:
      – Governance
      – Technologies
      – Software engineering approaches
  • Empowering Knowledge Workers
    – Diagram labels: Subject Matter Experts; Driving Biological and Clinical Problems; Solutions to Real World Interoperability Needs
    – Critical issues:
      – Workflows that enable engagement by Subject Matter Experts
      – Tight coupling of knowledge engineering efforts and research programs that can define “real world” driving problems
      – Facilitation and support of interdisciplinary, team science models (including basic and translational scientists, clinical researchers, and informaticians)
    – Biomedical Informatics ≠ Engineering
    – Systems-level approaches to knowledge engineering and usability are essential
  • 4 Assumptions Regarding the Current State and Future of the NCBO
    1) The tools and knowledge collections created and maintained by the NCBO have become a substrate for a broad spectrum of biomedical informatics innovations
       – Analogous to the role played by NLM-provided resources
    2) Future directions for the center and its work will trend towards an applied science focus
    3) At the same time, outreach, engagement, and education will remain high priorities
    4) The current funding climate presents a significant and unknown challenge to the preceding 3 assumptions
  • A Pragmatic Approach to CDEs: The OpenMDR Project
  • Defining Common Data Elements (CDEs)
    – Common Data Elements (CDEs) are standardized terms for the collection and exchange of data
    – CDEs are metadata
    – CDEs describe the type of data being collected, not the data itself
    – Critical role(s) for CDEs:
      – to identify discrete, defined items for data collection
      – to promote consistent data collection in the field
      – to eliminate unneeded or redundant data collection
      – to promote consistent reporting and analysis
      – to reduce the possibility of error related to data translation and transmission
      – to facilitate data sharing
    Source: National Cancer Institute (NCI)
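The distinction between a CDE (metadata) and the data it describes can be made concrete with a minimal Python sketch. The class `CommonDataElement`, the function `conforms`, and the `specimen_type` element are all hypothetical names invented for illustration; they are not taken from the NCI or from OpenMDR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Metadata describing one item to collect (not the collected value)."""
    name: str
    definition: str
    datatype: type
    # None means any value of the declared type is acceptable.
    permissible_values: Optional[Tuple[str, ...]] = None


def conforms(cde: CommonDataElement, value: object) -> bool:
    """Return True if a collected value matches the CDE's type and value set."""
    if not isinstance(value, cde.datatype):
        return False
    if cde.permissible_values is not None:
        return value in cde.permissible_values
    return True


# Hypothetical element; the name and value set are illustrative only.
SPECIMEN_TYPE = CommonDataElement(
    name="specimen_type",
    definition="Kind of bio-specimen collected from the participant",
    datatype=str,
    permissible_values=("blood", "bone marrow", "tissue"),
)

if __name__ == "__main__":
    print(conforms(SPECIMEN_TYPE, "blood"))
    print(conforms(SPECIMEN_TYPE, "saliva"))
```

The element itself holds no patient data; it only states what a valid value looks like, which is what allows consistent collection and error checking across sites.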
  • OpenMDR: a Distributed CDE Platform
    – Semantic Metadata Management Suite
    – Locally relevant ontology-anchored data elements
    – Rapid and agile development paradigm
    – Distributed terminology ecosystem
    – Federated queries across multiple deployments
    – Interaction with other semantic management systems
    – ISO 11179 semantic repository
    – Integration with industry standard tools
    http://openmdr.org
  • OpenMDR Functional Components
    – Create and manage terminology
    – Discover and reuse concepts
    – Annotate models for discovery and interoperability
    – Utilize data elements to build semantically anchored services
    http://openmdr.org
  • OpenMDR Is Federated
    – Multiple deployments for locally relevant terminology
    – DNS-like hierarchy of authority
    http://openmdr.org
  • OpenMDR as part of an MDA Workflow
    – Empowers knowledge workers
    – Enterprise Architect plugin
    – Formulate searches against local or distributed OpenMDR instances
    – Identify semantic terms in detail
    – Concept codes help distinguish similar elements
    – Apply annotations to the data model
    http://openmdr.org
  • OpenMDR and the TRIAD SOA
    http://openmdr.org
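One way to picture a "DNS-like hierarchy of authority" is a chain of registries in which a local deployment answers for its own terms and defers to its parent for everything else. The sketch below assumes exactly that lookup rule; the `Registry` class and the registry and term names are hypothetical and do not reflect OpenMDR's actual API or resolution algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Registry:
    """A metadata registry that may defer to a parent authority."""
    name: str
    parent: Optional["Registry"] = None
    terms: Dict[str, str] = field(default_factory=dict)

    def resolve(self, term: str) -> Optional[Tuple[str, str]]:
        """Return (authority name, definition), walking up the hierarchy."""
        node: Optional[Registry] = self
        while node is not None:
            if term in node.terms:
                return node.name, node.terms[term]
            node = node.parent
        return None


# Hypothetical deployments: a shared root and one site-level registry.
ROOT = Registry("root", terms={"diagnosis": "Shared definition of diagnosis"})
SITE = Registry("site-a", parent=ROOT, terms={"visit": "Site-specific visit"})

if __name__ == "__main__":
    print(SITE.resolve("visit"))
    print(SITE.resolve("diagnosis"))
    print(SITE.resolve("unknown"))
```

Under this assumption, locally relevant terminology stays under local control while shared terms are still answered by a single higher authority.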
  • Resource Discovery: ResearchIQ
  • Motivation for the Design of ResearchIQ
    – Clinical and translational researchers frequently need to identify and engage:
      – Collaborators
      – Shared resources
      – Data, information, and knowledge collections
    – Many sources can support such needs; however, they are usually:
      – Heterogeneous
      – Difficult to find
      – Not linked
    – How do we overcome these barriers to the efficient planning and conduct of clinical and translational studies?
  • A Potential Solution: What is ResearchIQ (Research Integrative Query)?
    – A single knowledge resource portal for the clinical and translational research community that will provide a “front door” for a variety of resources
    – How does ResearchIQ work?
      – Knowledge-anchored semantic search
      – Leveraging semantic web technologies
    – Current project focus is on the development and deployment of an end-user-facing proof-of-concept
      – Can it be done?
      – How difficult will it be?
      – Can it scale?
      – What kind of coverage would we have?
  • High-level System Architecture (diagram; components: shared resources, web pages, UMLS, study search, RDFizers, database, MetaMap, domain model, caGrid/TRIAD data service, RDMZ, triple store (database), web services, web application, query performance optimization, browser)
  • Presentation Layer (figure; labels: Syntactic, Semantic)
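A knowledge-anchored search over a triple store can be sketched in a few lines of Python. This is a toy in-memory illustration under stated assumptions, not the ResearchIQ implementation: the `TripleStore` class, the `broader` and `annotated_with` predicates, and all resource and concept names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def narrower_closure(store: TripleStore, concept: str) -> Set[str]:
    """Collect the concept plus everything transitively narrower than it."""
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, _, _ in store.match(p="broader", o=current):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen


def semantic_search(store: TripleStore, concept: str) -> List[str]:
    """Find resources annotated with the concept or any narrower concept."""
    concepts = narrower_closure(store, concept)
    return sorted({s for s, _, o in store.match(p="annotated_with")
                   if o in concepts})


# Hypothetical content; the concepts and resources are illustrative only.
STORE = TripleStore()
STORE.add("CLL", "broader", "Leukemia")
STORE.add("Leukemia", "broader", "Hematologic Malignancy")
STORE.add("resource:biobank", "annotated_with", "CLL")
STORE.add("resource:core_lab", "annotated_with", "Leukemia")
STORE.add("resource:imaging", "annotated_with", "Solid Tumor")

if __name__ == "__main__":
    print(semantic_search(STORE, "Hematologic Malignancy"))
```

A purely syntactic search for "Hematologic Malignancy" would find neither resource here; expanding the query through the concept hierarchy is what makes both annotated resources discoverable.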
  • Knowledge Base Growth (2012) (chart of cumulative total resources; y-axis: log of monthly cumulative resource count, 3 to 4.4; x-axis: months, Jan-12 to Dec-12)
  • Managing Knowledge Base Growth (chart; y-axis: costs/unit; x-axis: quantity of resource instances; labels: high marginal costs, low marginal costs, desired resource mix; resource types shown: GenBank, OAI, Dataset, PubMed, Websites)
  • Hypothesis Generation: TOKEn Knowledge Synthesis Platform
  • Putting Conceptual Knowledge to Work: Constructive Induction (CI) & Hypothesis Generation
    – Conceptual Knowledge Constructs (CKCs)
      – Conceptual knowledge-anchored concepts + relationships
      – Higher order constructs (multiple intermediate concepts)
      – Controls for concept granularity (search depth)
      – Basis for inference of hypotheses concerning relationships between data elements
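The idea of linking data elements through intermediate concepts, with a search-depth control, can be sketched as bounded path enumeration over a concept graph. The sketch below is an assumption-laden illustration, not the TOKEn algorithm: `find_constructs`, the undirected graph model, and the node names (`elem_A`, `c1`, and so on) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from concept-concept relationships."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_constructs(graph: Graph, anchors: Iterable[str],
                    max_intermediate: int) -> List[Tuple[str, ...]]:
    """Enumerate simple paths that join two anchor concepts through at most
    max_intermediate non-anchor concepts (the search-depth control)."""
    anchor_set = set(anchors)
    found: Set[Tuple[str, ...]] = set()

    def extend(path: List[str]) -> None:
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in path:
                continue  # keep paths simple (no repeated concepts)
            if nxt in anchor_set:
                candidate = tuple(path) + (nxt,)
                # Store one orientation so A..B and B..A count once.
                found.add(min(candidate, candidate[::-1]))
            elif len(path) - 1 < max_intermediate:
                extend(path + [nxt])

    for anchor in sorted(anchor_set):
        extend([anchor])
    return sorted(found)


# Hypothetical graph: two data-element concepts joined by two routes.
GRAPH = build_graph([
    ("elem_A", "c1"), ("c1", "c3"), ("c3", "elem_B"),
    ("elem_A", "c2"), ("c2", "elem_B"),
])

if __name__ == "__main__":
    for depth in (1, 2):
        print(depth, find_constructs(GRAPH, ["elem_A", "elem_B"], depth))
```

Raising `max_intermediate` admits longer, higher-order constructs at the cost of more candidates to review, which is the trade-off a search-depth control is meant to manage.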
  • Experimental Context: CLL Research Consortium
    – NCI-funded Program/Project (P01)
      – Translational research targeting Chronic Lymphocytic Leukemia (CLL)
      – Established in 1999
      – Cohort of over 6,000 patients
      – Comprehensive phenotypic and bio-molecular data sets, as well as bio-specimens
    – 8 participating sites
    – Informatics platform:
      – Research networking
      – Clinical trials management
      – Correlative data management
      – Bio-specimen management
  • Multi-part CI Evaluation Study in CLL
    – (1) Efficacy
      – CKC evaluation: 108 data elements; 822 UMLS concepts; 5800 CKCs; 5 SMEs
      – Comparison: literature-based CKCs; ontology-based CKCs
      – Critical findings: no overlap; differing granularity
    – (2) Verification & Validation
      – Automated lit. queries: random sample (50)
      – SME “gold standard”: support metric
      – Critical relationship: support metric; “meaningful”; significant correlation
    – (3) Mining Domain Literature
      – Mining CLL literature: Medline, 2005-2008; random sample (250); 86% valid; 90% “meaningful”
      – Search depth controls
      – TOKEn browser: more timely (SMEs)
    – References:
      1. Payne PR, Borlawsky T, Kwok A, Dhaval R, Greaves A. Ontology-anchored Approaches to Conceptual Knowledge Discovery in a Multi-dimensional Research Data Repository. AMIA Translational Bioinformatics Summit Proc. 2008.
      2. Payne PR, Borlawsky T, Kwok A, Greaves A. Supporting the Design of Translational Clinical Studies Through the Generation and Verification of Conceptual Knowledge-anchored Hypotheses. AMIA Annu Symp Proc. 2008.
      3. Payne PR, Borlawsky T, Lele O, James S, Greaves AW. The TOKEn Project: Knowledge Synthesis for in-silico Science. Journal of the American Medical Informatics Association (JAMIA). 2011.
  • CKC Visualization (figure: TOKEn CKC Network: CLL Research Consortium Metadata; node labels: Bone Marrow Morphology, Cytogenetic & Chromosomal Abnormalities, Solid Tumors, Tissues of Origin, Hematologic Malignancies, Myelogenous Malignancies, Bio-molecular Products)
  • TOKEn CKC Network: Semantic Partitions (figure; partition labels: Cytogenetic Abnormalities, Chromosome Loss, Laboratory Findings, Tissues of Origin, Protein Expression, Treatment Response, Leukemias, Molecular Abnormalities, Bone Marrow Morphology, Lymphomas)
  • Applying Conceptual Knowledge: Building Knowledge-Based Systems
    Payne PR et al. Translational informatics: enabling high-throughput research paradigms. Physiol. Genomics 39: 131-140, 2009
  • Knowledge-based Systems: Replicating Expert Performance
    Adapted from Gaines and Shaw, “Knowledge Acquisition Tools Based On Personal Construct Psychology”, 1993
  • The Importance of Knowledge-based Systems Engineering is Amplified by Our Increased Focus on Big Data
    – Big data dimensions (diagram): Volume; Velocity; Variability
    – Scalability
    – Extensibility
    – Reproducibility
    – Multi-dimensional data, information, and knowledge integration
    – Moving beyond the “hype cycle” and solving real world problems
    – Over $100M investment by NIH, including the creation of centers of excellence
  • Acknowledgements
    – Collaborators:
      – Peter J. Embi, MD, MS
      – Albert M. Lai, PhD
      – Kun Huang, PhD
      – Po-Yin Yen, RN, PhD
      – Yang Xiang, PhD
      – Marcelo Lopetegui, MD
      – Tara Borlawsky-Payne, MA
      – Omkar Lele, MS, MBA
      – Marjorie Kelley
      – William Stephens
    – Laboratory for Knowledge Based Applications and Systems Engineering (KBASE):
      – Arka Pattanayak
      – Caryn Roth
      – Andrew Greaves
    – Funding:
      – NCI: R01CA134232, R01CA107106, P01CA081534, P50CA140158, P30CA016058
      – NCATS: U54RR024384
      – NLM: R01LM009533, T15LM011270
      – AHRQ: R01HS019908
      – Rockefeller Philanthropy Associates
      – Academy Health – EDM Forum
  • Thank you for your time and attention!
    – philip.payne@osumc.edu
    – http://go.osu.edu/payne