How Bio Ontologies Enable Open Science

Presentation Transcript

  • 1. How Bio-Ontologies Enable Open Science. Nigam Shah, [email_address]
  • 2. Ontologies (by Pedro Beltrão)
  • 3. Key Points
    • Open science requires structured content.
    • Structured content acquisition runs into a curation bottleneck.
      • And “controlled manual curation” will not scale
    • For “open science” to really take off:
      • collaborative curation platforms are going to be necessary and,
      • (semi-)automation of curation is going to be necessary.
    • Researchers need to identify exactly what is being mentioned or discussed.
    • NCBO (the National Center for Biomedical Ontology) provides services that support these needs.
  • 4.
    • Currently, the main use of ontologies is for making sense of high-throughput data.
    There are, of course, other uses; see Rubin et al. 'Biomedical Ontologies: A Functional Perspective'. Briefings in Bioinformatics 9(1):75-90, Dec 2007.
  • 5. How does ontology help?
    • An ontology provides an organizing framework for creating “abstractions” of high-throughput data.
    • The simplest ontologies (i.e., terminologies and controlled vocabularies) provide the most bang for the buck.
      • Gene Ontology (GO) is the prime example
    • More structured ontologies, such as those that represent pathways and other higher-order biological concepts, have yet to demonstrate real utility.
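The “abstraction” idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical toy is_a hierarchy and made-up gene names rather than real GO data: each gene annotated to a specific term is also counted under every ancestor of that term (GO's “true path rule”), so a list of experimental hits can be summarized at whatever level of the hierarchy is useful.

```python
from collections import defaultdict

# Hypothetical toy is_a hierarchy (child -> parents); not real GO content.
IS_A = {
    "glucose metabolic process": {"hexose metabolic process"},
    "fructose metabolic process": {"hexose metabolic process"},
    "hexose metabolic process": {"carbohydrate metabolic process"},
    "carbohydrate metabolic process": {"metabolic process"},
    "lipid metabolic process": {"metabolic process"},
    "metabolic process": set(),
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def roll_up(annotations):
    """Map each term to the genes annotated to it or to any of its descendants."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for ancestor in ancestors(term):
                genes_per_term[ancestor].add(gene)
    return dict(genes_per_term)


# Hypothetical "hits" from a high-throughput experiment, with direct annotations.
hits = {
    "geneA": {"glucose metabolic process"},
    "geneB": {"fructose metabolic process"},
    "geneC": {"lipid metabolic process"},
}
summary = roll_up(hits)
# Print terms from most to least general (by gene count), ties broken by name.
for term in sorted(summary, key=lambda t: (-len(summary[t]), t)):
    print(len(summary[term]), term, sep="\t")
```

The three specific hits collapse into one general statement (all three genes fall under "metabolic process"), which is the kind of abstraction a flat controlled vocabulary alone cannot give.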
  • 6. Ontologies and content acquisition
    • First, name ‘things’
    • Then name ‘relationships’
    • Then comes the ‘logic of combining simple relationships’
    • … followed by the realization that all this “structure” is hard to create manually and that manual curation will not scale; lots of projects die.
      • This leads to a newfound love for text mining!
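To make “the logic of combining simple relationships” concrete, here is a naive forward-chaining sketch over (subject, relation, object) triples. The terms and the two rules (transitivity of is_a and part_of, and is_a followed by part_of yielding part_of) are illustrative assumptions, not a description of any specific NCBO tool.

```python
# Hypothetical toy facts as (subject, relation, object) triples.
FACTS = {
    ("mitochondrial inner membrane", "is_a", "mitochondrial membrane"),
    ("mitochondrial membrane", "is_a", "membrane"),
    ("mitochondrial membrane", "part_of", "mitochondrion"),
    ("mitochondrion", "part_of", "cell"),
}


def infer(facts):
    """Naive forward chaining: apply the rules until no new triple is produced."""
    known = set(facts)
    while True:
        new = set()
        for a, r1, b in known:
            for b2, r2, c in known:
                if b != b2:
                    continue
                if r1 == r2 and r1 in ("is_a", "part_of"):
                    # Rule 1: is_a and part_of are each transitive.
                    new.add((a, r1, c))
                elif r1 == "is_a" and r2 == "part_of":
                    # Rule 2: A is_a B and B part_of C => A part_of C.
                    new.add((a, "part_of", c))
        if new <= known:
            return known
        known |= new


derived = infer(FACTS)
for triple in sorted(derived - FACTS):
    print(*triple)
```

Four asserted facts yield four new ones, which hints at why hand-curating every implied statement does not scale, while stating the rules once does.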
  • 7. Emerging trends in content acquisition
    • Increased Structure (in curation and annotations)
    • Collaborative curation platforms
      • Knewco
      • SWAN
      • CBioC
    • Integration of text mining into curation
      • Finding entities
        • BioLit by Phil Bourne’s group
      • Finding relations (facts)
        • Larry Hunter’s group
        • Biolink papers
        • EBI-MED
  • 8. Increasing Structure
    • Until now, the predominant use of ontologies has been as a vocabulary to describe data, with minimal structure in the descriptions.
    • Precise capture of biomedical knowledge in structured form is now considered essential
      • This runs into the manual curation bottleneck.
      • Baumgartner WA Jr. et al. (2007) 'Manual curation is not sufficient for annotation of genomic databases'. Bioinformatics 23(13):i41-i48. Presented at ISMB 2007.
  • 9. Knewco: Concept Web and WikiProfessional
  • 10. The SWAN discourse ontology: Ciccarese P, Wu E, Clark T (2007) 'An Overview of the SWAN 1.0 Ontology of Scientific Discourse'. 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8-12, 2007.
  • 11. Collaborative KB curation: SWAN Knowledge Workbench Copyright 2007 Alzheimer Research Forum and Massachusetts General Hospital
  • 12.
  • 13.
  • 14. The SWAN Team and papers
    • Harvard/MGH : Paolo Ciccarese, Marco Ocana, Tim Clark
    • Alzforum : Elizabeth Wu, Gwen Wong, June Kinoshita
    • References:
      • [1] Gao Y, Kinoshita J, Wu E, Miller E, Lee R, Seaborne A, Cayzer S, Clark T (2006) 'SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research'. Journal of Web Semantics 4(3).
      • [2] Ciccarese P, Wu E, Clark T (2007) 'An Overview of the SWAN 1.0 Ontology of Scientific Discourse'. 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8-12, 2007.
      • [3] Clark T, Kinoshita J (2007) 'Alzforum and SWAN: The Present and Future of Scientific Web Communities'. Briefings in Bioinformatics 8(3).
      • [4] Ciccarese P, Wu E, Kinoshita J, Wong G, Ocana M, Ruttenberg A, Clark T (submitted for publication 9/4/2007) 'The SWAN Ontology of Scientific Discourse'.
  • 15. Integration of Text-mining + Curation
    • Text mining works better if it uses appropriate ontologies.
    • “Model” mismatch between the needs of text mining and the needs of KB builders.
    • Text mining might work much better if:
      • It works in a loop with a curator
      • It leverages the wisdom of the masses
  • 16. Integration of Text-mining + Curation
  • 17.  
  • 18.  
  • 19. Quick recap
    • Use of ontologies in collaborative curation and content acquisition is not widespread, possibly because of:
      • Lack of a one stop shop for bio-ontologies
      • Lack of tools to use ontologies for annotation
        • Manual  will not scale
        • Automatic  can it be ‘good enough’?
      • Lack of a sustainable mechanism to create ontology based annotations
  • 20. NCBO’s efforts
    • The key ingredients needed for collaborative curation platforms to succeed:
      • Proper use of bio-ontologies (just enough ontology!)
      • Appropriate use of Natural Language Processing in the curation workflow.
    • NCBO has created web services that allow the use of ontologies in collaborative platforms.
  • 21. NCBO ontology services
    • Base URL:
    • Documentation:
    • Operations (description: REST URL):
      • List all ontologies: ./ontologies
      • Find a specific ontology: ./ontologies/{ontology version id}
      • Download ontology file: ./ontologies/download/{ontology version id}
      • Get versions of an ontology: ./ontologies/version/{ontology id}
      • Get concept: ./concepts/{ontology version id}/{concept id}
      • Search for concepts: ./search/concepts/{query}?ontologies={ids}
      • Get latest version of an ontology: ./virtual/{ontology_id}
      • Get concept for latest ontology version: ./virtual/{ontology id}/{concept id}
      • List all ontology categories: ./categories
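A small helper that fills in the path templates listed above may help show how a client would call these services. The base URL on the slide was not preserved, so `BASE_URL` below is a hypothetical placeholder, and the operation names are invented labels; only the path templates come from the slide. This sketch only builds request URLs and does not issue any network calls.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the slide's actual base URL was not preserved.
BASE_URL = "https://rest.example.org/bioportal"

# Path templates from the slide; the dictionary keys are made-up labels.
TEMPLATES = {
    "list_ontologies": "ontologies",
    "get_ontology": "ontologies/{version_id}",
    "download_ontology": "ontologies/download/{version_id}",
    "ontology_versions": "ontologies/version/{ontology_id}",
    "get_concept": "concepts/{version_id}/{concept_id}",
    "search_concepts": "search/concepts/{query}",
    "latest_version": "virtual/{ontology_id}",
    "latest_concept": "virtual/{ontology_id}/{concept_id}",
    "list_categories": "categories",
}


def build_url(operation, query_params=None, **path_params):
    """Fill a path template with URL-escaped values and append any query string."""
    # safe="" also escapes "/" and ":" so IDs like "GO:0008150" stay one path segment.
    escaped = {key: quote(str(value), safe="") for key, value in path_params.items()}
    url = f"{BASE_URL}/{TEMPLATES[operation].format(**escaped)}"
    if query_params:
        url += "?" + urlencode(query_params)
    return url


print(build_url("get_concept", version_id=42, concept_id="GO:0008150"))
print(build_url("search_concepts", query_params={"ontologies": "1032"}, query="lung cancer"))
```

The version ID `42` and ontology ID `1032` are hypothetical. Sending the request (for example with `urllib.request.urlopen`) and parsing the response are left out, since the response format is not described on the slide.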
  • 22. NCBO annotation services
    • Open Biomedical Annotator (OBA) web service
      • To automatically process textual metadata to recognize relevant ontology concepts and return the terms as annotations
    • Open Biomedical Resource (OBR) index
      • To index the contents of a few biomedical resources with the biomedical concepts to which they relate, and to allow programmatic access to the indexed data.
    • URL:
  • 23. Using Ontologies to Annotate Your Data
  • 24. Annotator: The Basic Idea
    • Process textual metadata to automatically tag text with as many ontology terms as possible.
  • 25. Annotator: Usage
    • Give your text as input
    • Select your parameters (ontologies to use, semantic type to filter, semantic expansion…)
    • Get your results… in text, tab-delimited, XML, or OWL
    • Paper in AMIA STB 09
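The basic idea behind the Annotator, tagging free text with ontology terms and optionally adding their ancestors, can be illustrated with a simple dictionary-based recognizer. This is a minimal sketch under stated assumptions: the lexicon, concept IDs, and is_a parents are hypothetical, matching is greedy longest-match on word tokens, and results are printed tab-delimited. It is not the OBA service's actual implementation.

```python
import re

# Hypothetical lexicon: lowercase term -> made-up concept ID.
LEXICON = {
    "melanoma": "C:001",
    "skin neoplasm": "C:002",
    "neoplasm": "C:003",
    "braf": "C:004",
}
# Hypothetical is_a parents, used for "semantic expansion".
PARENTS = {"C:001": ["C:002"], "C:002": ["C:003"]}
MAX_TERM_WORDS = max(len(term.split()) for term in LEXICON)


def annotate(text, expand=True):
    """Greedy longest-match lookup; returns (start, end, concept_id, match_type) tuples."""
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    results, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for _, _, tok in tokens[i:i + n])
            if phrase in LEXICON:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                concept = LEXICON[phrase]
                results.append((start, end, concept, "direct"))
                # Semantic expansion: also report every is_a ancestor once.
                stack = list(PARENTS.get(concept, [])) if expand else []
                seen = set()
                while stack:
                    parent = stack.pop()
                    if parent in seen:
                        continue
                    seen.add(parent)
                    results.append((start, end, parent, "is_a ancestor"))
                    stack.extend(PARENTS.get(parent, []))
                i += n
                break
        else:
            # No lexicon term starts at this token; move on.
            i += 1
    return results


for row in annotate("BRAF mutations in melanoma"):
    print(*row, sep="\t")
```

Longest-match means "skin neoplasm" is reported as one concept rather than also tagging "neoplasm" inside it; passing `expand=False` turns off the ancestor rows.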
  • 26. Using Ontologies to Access and Analyze Public Data
  • 27. Open Biomedical Resources index
    • The index can be used for:
      • Search (next few slides)
      • Data mining (paper in AMIA STB 08 on mining relationships between drugs, diseases, and genes from Medline)
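Conceptually, an index like OBR maps each ontology concept to the resource elements annotated with it. The sketch below, with hypothetical element IDs, concept IDs, and hierarchy, inverts per-element annotations into a concept-to-elements index and answers a concept query, optionally including more specific (descendant) concepts. It illustrates the general inverted-index technique only, not OBR's internals.

```python
from collections import defaultdict

# Hypothetical annotations: resource element ID -> concept IDs recognized in it
# (for example, by running a concept recognizer over titles and abstracts).
ELEMENT_CONCEPTS = {
    "trial:1": {"C:001"},
    "trial:2": {"C:003"},
    "article:7": {"C:002", "C:004"},
}
# Hypothetical is_a children, used to expand a query to more specific concepts.
CHILDREN = {"C:003": ["C:002"], "C:002": ["C:001"]}


def build_index(element_concepts):
    """Invert element -> concepts into concept -> elements."""
    index = defaultdict(set)
    for element, concepts in element_concepts.items():
        for concept in concepts:
            index[concept].add(element)
    return index


def search(index, concept, include_descendants=True):
    """Return elements annotated with the concept (and optionally its descendants)."""
    frontier, hits, seen = [concept], set(), set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        hits |= index.get(current, set())
        if include_descendants:
            frontier.extend(CHILDREN.get(current, []))
    return hits


index = build_index(ELEMENT_CONCEPTS)
print(sorted(search(index, "C:003")))
```

Querying the general concept also retrieves elements tagged only with more specific ones, which is what makes ontology-based search broader than keyword matching.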
  • 28. Example
  • 29.  
  • 30.  
  • 31.  
  • 32. NCBO services (diagram)
    • Services: Ontology services (OBS), UMLS services, BioPortal services, Data service (OBR), Annotation service (OBA)
    • Users: UCSF, Laboratree, CollabRx, PharmGKB, JAX, HGMD
    • Users: BioPortal UI, PDB/PLoS, I2B2, NextBio, IO informatics
    • Users: “Resources” tab, Knewco, IO informatics
  • 33. Uses of NCBO services
    • For programmatic access to latest versions of ontologies
    • For concept recognition from text
      • For annotation
      • For accelerating curation
    • For data aggregation and summarization
  • 34. BioLit web resource: automated recognition of ontology terms and database IDs after publication
  • 35. Automated recognition of ontology terms and database IDs before publication, with manual curation by the author (Word 2007 add-in)
  • 36. End
  • 37. Annotation: UCSF
    • The task is to decide which trial is relevant for a particular patient.
      • Use the annotator service to map concepts in eligibility rules to UMLS CUIs
      • Use the annotations from the OBR index to create tag clouds in CTExplorer.
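One plausible way to turn concept annotations into a tag cloud, as in the UCSF use case, is to count how many trials mention each concept and scale font size with the logarithm of that count. The scaling rule, the size range, and the example concepts are assumptions made for illustration; the slide does not say how CTExplorer sizes its tags.

```python
import math
from collections import Counter


def tag_cloud(annotations, min_size=10, max_size=32):
    """Map each concept to a font size that grows with the log of its frequency.

    `annotations` is a list with one set of concept labels per trial, so a
    concept is counted at most once per trial.
    """
    counts = Counter(concept for concepts in annotations for concept in concepts)
    if not counts:
        return {}
    low = math.log(min(counts.values()))
    high = math.log(max(counts.values()))
    span = (high - low) or 1.0  # all counts equal: avoid dividing by zero
    return {
        concept: round(min_size + (max_size - min_size) * (math.log(n) - low) / span)
        for concept, n in counts.items()
    }


# Hypothetical per-trial annotations (plain labels stand in for UMLS CUIs).
trials = [
    {"breast cancer", "HER2"},
    {"breast cancer"},
    {"breast cancer", "metastasis"},
    {"HER2"},
]
print(tag_cloud(trials))
```

Log scaling keeps one very frequent concept from dwarfing everything else; linear scaling would be an equally reasonable choice for small collections.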
  • 38. Annotation: Laboratree
  • 39. Annotation: CollabRx
    • caTissue/TIES specimen banking
    • Specimen management is based on ontologies developed by NCI
    • Ontology-based integration to create a virtual specimen bank
  • 40. Curation: JAX, UCHSC, PDB/PLoS
    • JAX – Use concepts recognized in the abstracts of publications to triage papers for curation.
    • UCHSC – Wrap our annotator as a UIMA component and compare performance on full text
    • PDB/PLoS – BioLit and Word-plugin
  • 41. Ontology Access: I2B2
    • Needs a “source” for ontologies in their ontology cell
    • Using our services, we export BioPortal ontologies to the I2B2 format.
  • 42. Ontology Access: IO-informatics
  • 43. Ontology Access: NextBio
    • “Our collaboration with NCBO on adopting public biomedical ontologies throughout NextBio enabled us to create a platform dealing with heterogeneous biological data. These ontology-based search capabilities have resulted in a rapid adoption of NextBio by over 100,000 researchers around the world since our public debut in May of 2008”.
  • 44. Data Summarization: PharmGKB
    • Genes: 1. CYP2C9, 2. VKORC1, 3. CYP2A6, etc.
    • Diseases: 1. Hemorrhage, 2. Venous Thrombosis, etc.
    • Drugs: 1. warfarin, 2. coumarin, 3. phenprocoumon, etc.
    • Panel counts: 34, 5, and 20 scored annotations
  • 45. Data Summarization: Knewco
  • 46. Data Summarization: HGMD
    • Use the disease hierarchy from SNOMED-CT to compute “enrichment” of mutation types in particular types of diseases
    • … playing the GO-based microarray analysis game for disease mutations
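The “GO-based microarray analysis game” usually means an over-representation test. A minimal sketch, assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given all mutations, the subset annotated to a disease category (counting annotations to that node's SNOMED CT descendants), and the mutations of one type, it computes the probability of seeing at least the observed overlap by chance. All numbers in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(total, in_category, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total, in_category, selected).

    total:        all mutations considered
    in_category:  mutations annotated to the disease category or its descendants
    selected:     mutations of the type being tested
    overlap:      mutations of that type that fall in the category
    """
    upper = min(in_category, selected)
    tail = sum(
        comb(in_category, k) * comb(total - in_category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical counts: 15 of 50 selected mutations fall in a category holding
# 100 of 1000 mutations (about 5 would be expected by chance).
print(enrichment_p_value(total=1000, in_category=100, selected=50, overlap=15))
```

When many disease categories are tested at once, the resulting p-values would also need a multiple-testing correction, just as in GO-based microarray analysis.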