How Bio Ontologies Enable Open Science
    • 1. How Bio-Ontologies Enable Open Science (Nigam Shah, [email_address])
    • 2. Ontologies (by Pedro Beltrão)
    • 3. Key Points
      • Open science requires structured content.
      • Structured content acquisition runs into a curation bottleneck.
        • And “controlled manual curation” will not scale
      • For “open science” to really take off:
        • collaborative curation platforms are going to be necessary and,
        • (semi-)automation of curation is going to be necessary.
      • Researchers need to identify exactly what is being mentioned or discussed.
      • NCBO provides services that support these needs
    • 4.
      • Currently, the main use of ontologies is for making sense of high-throughput data.
      • There are other uses, of course; see Rubin et al., “Biomedical Ontologies: A Functional Perspective,” Briefings in Bioinformatics, Dec 2007, 9(1):75-90.
    • 5. How does ontology help?
      • An ontology provides an organizing framework for creating “abstractions” of high-throughput data
      • The simplest ontologies (i.e. terminologies, controlled vocabularies) provide the most bang-for-the-buck
        • Gene Ontology (GO) is the prime example
      • More structured ontologies, such as those that represent pathways and other higher-order biological concepts, still have to demonstrate real utility.
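To make the idea of an “abstraction” concrete, here is a minimal Python sketch, assuming a toy is_a hierarchy and a toy gene list (every term name and gene name is a hypothetical placeholder, not real GO content). Each gene annotated to a specific term is also counted under every ancestor of that term, so a long gene list collapses into counts over a few general terms.

```python
from collections import defaultdict

# Hypothetical toy is_a hierarchy: child term -> list of parent terms.
IS_A = {
    "term_C": ["term_B"],
    "term_D": ["term_B"],
    "term_B": ["term_A"],
    "term_A": [],
}

# Hypothetical gene annotations from a high-throughput experiment.
ANNOTATIONS = {
    "gene1": ["term_C"],
    "gene2": ["term_D"],
    "gene3": ["term_C", "term_D"],
}


def ancestors(term, is_a):
    """Return the term itself plus every term reachable through is_a links."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, []))
    return seen


def roll_up(annotations, is_a):
    """Count distinct genes per term, propagating each annotation to all ancestors."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, is_a):
                genes_per_term[t].add(gene)
    return {t: len(genes) for t, genes in genes_per_term.items()}


counts = roll_up(ANNOTATIONS, IS_A)
```

Counting distinct genes (rather than annotation events) keeps gene3 from being counted twice under term_B even though it reaches term_B through two children.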
    • 6. Ontologies and content acquisition
      • First start naming ‘things’
      • Then name ‘relationships’
      • Then comes the ‘logic of combining simple relationships’
      • … realization that all this “structure” is hard to create manually and manual curation will not scale … lots of dead projects.
        • Leads to a newfound love for text mining!
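The step from naming relationships to “the logic of combining simple relationships” can be sketched as forward chaining over triples. This is a minimal illustration, assuming four composition rules of the kind commonly used with is_a and part_of (stated as assumptions in the code); it is not a specific reasoner's implementation, and the triples are illustrative.

```python
# Illustrative triples: (subject, relation, object).
FACTS = {
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("nucleus", "part_of", "cell"),
}

# Assumed composition rules: (relation of X->Y, relation of Y->Z) -> relation of X->Z.
RULES = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",  # X part_of Y, Y is_a Z => X part_of Z
    ("is_a", "part_of"): "part_of",  # X is_a Y, Y part_of Z => X part_of Z
}


def infer(facts, rules=RULES):
    """Apply composition rules repeatedly until no new triples appear."""
    closed = set(facts)
    while True:
        new = set()
        for (x, r1, y) in closed:
            for (y2, r2, z) in closed:
                if y == y2 and (r1, r2) in rules:
                    triple = (x, rules[(r1, r2)], z)
                    if triple not in closed:
                        new.add(triple)
        if not new:
            return closed
        closed |= new


derived = infer(FACTS) - FACTS
```

Even two simple relations and four rules yield facts nobody curated explicitly, which is exactly the structure that is hard to maintain by hand.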
    • 7. Emerging trends in content acquisition
      • Increased Structure (in curation and annotations)
      • Collaborative curation platforms
        • Knewco
        • SWAN
        • CBioC
      • Integration of Text-mining in curation
        • Finding entities
          • BioLit by Phil Bourne’s group
        • Finding relations … facts.
          • Larry Hunter’s group
          • Biolink papers
          • EBI-MED
    • 8. Increasing Structure
      • Until now the predominant use of ontologies is as a vocabulary to describe data … minimal structure in the descriptions.
      • Precise capture of biomedical knowledge in structured form is now considered essential
        • Hits the manual curation bottleneck.
        • WA Baumgartner Jr. et al, Manual curation is not sufficient for annotation of genomic databases. Bioinformatics 2007 23(13):i41-i48. Presented at ISMB 2007
    • 9. Knewco: Concept Web and WikiProfessional
    • 10. The SWAN discourse ontology. Ciccarese P, Wu E, Clark T (2007) ‘An Overview of the SWAN 1.0 Ontology of Scientific Discourse’, 16th International World Wide Web Conference, Banff, Canada, May 8-12, 2007.
    • 11. Collaborative KB curation: SWAN Knowledge Workbench Copyright 2007 Alzheimer Research Forum and Massachusetts General Hospital
    • 12.
    • 13.
    • 14. The SWAN Team and papers
      • Harvard/MGH : Paolo Ciccarese, Marco Ocana, Tim Clark
      • Alzforum : Elizabeth Wu, Gwen Wong, June Kinoshita
      • Papers:
        • [1] Gao Y, Kinoshita J, Wu E, Miller E, Lee R, Seaborne A, Cayzer S, Clark T (2006) ‘SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research’. Journal of Web Semantics 4(3).
        • [2] Ciccarese P, Wu E, Clark T (2007) ‘An Overview of the SWAN 1.0 Ontology of Scientific Discourse’. 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8-12, 2007.
        • [3] Clark T, Kinoshita J (2007) ‘Alzforum and SWAN: The Present and Future of Scientific Web Communities’. Briefings in Bioinformatics 8(3).
        • [4] Ciccarese P, Wu E, Kinoshita J, Wong G, Ocana M, Ruttenberg A, Clark T (submitted for publication 9/4/2007) ‘The SWAN Ontology of Scientific Discourse’.
    • 15. Integration of Text-mining + Curation
      • Text mining works better if it uses appropriate ontologies.
      • A “model” mismatch between the needs of text mining and the needs of KB builders.
      • Text mining might work much better if:
        • It works in a loop with a curator
        • It leverages the wisdom of the masses
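A curator-in-the-loop workflow that also draws on many curators can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon, concept IDs, and voting rule (suppress a match once at least `min_votes` curators have voted and rejections outnumber acceptances) are all hypothetical, and suppression here ignores context, which a real system would need to consider.

```python
from collections import Counter

# Hypothetical lexicon for a naive text miner: surface word -> concept ID.
DICTIONARY = {"cold": "CONCEPT_common_cold", "fever": "CONCEPT_fever"}


def propose(text, dictionary, suppressed):
    """Propose (word, concept) annotations, skipping pairs curators have rejected."""
    words = text.lower().split()
    return [(w, dictionary[w]) for w in words
            if w in dictionary and (w, dictionary[w]) not in suppressed]


def update_suppressed(votes, suppressed, min_votes=2):
    """Suppress a pair once enough curators have voted and most of them rejected it.

    votes maps (word, concept) -> Counter with "accept" and "reject" tallies.
    """
    for pair, tally in votes.items():
        total = tally["accept"] + tally["reject"]
        if total >= min_votes and tally["reject"] > tally["accept"]:
            suppressed.add(pair)
    return suppressed


# One round of the loop: propose, collect curator votes, re-run with the feedback.
suppressed = set()
text = "samples were kept cold and the patient had fever"
first_pass = propose(text, DICTIONARY, suppressed)
votes = {
    ("cold", "CONCEPT_common_cold"): Counter(reject=2, accept=1),
    ("fever", "CONCEPT_fever"): Counter(accept=3),
}
update_suppressed(votes, suppressed)
second_pass = propose(text, DICTIONARY, suppressed)
```

The text miner stays cheap and recall-oriented; precision comes from the accumulated curator decisions.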
    • 16. Integration of Text-mining + Curation
    • 17.  
    • 18.  
    • 19. Quick recap
      • Use of ontologies in collaborative curation and content acquisition is not widespread, possibly because of:
        • Lack of a one stop shop for bio-ontologies
        • Lack of tools to use ontologies for annotation
          • Manual: will not scale
          • Automatic: can it be ‘good enough’?
        • Lack of a sustainable mechanism to create ontology based annotations
    • 20. NCBO’s efforts
      • The key ingredients needed for collaborative curation platforms to succeed:
        • Proper use of bio-ontologies (just enough ontology!)
        • Appropriate use of Natural Language Processing in the curation workflow.
      • NCBO has created web services that allow ontologies to be used in collaborative platforms
    • 21. NCBO ontology services
      • Base URL:
      • Documentation:
      • REST URLs, by description:
        • List all ontologies: ./ontologies
        • Find a specific ontology: ./ontologies/{ontology version id}
        • Download ontology file: ./ontologies/download/{ontology version id}
        • Get versions of an ontology: ./ontologies/version/{ontology id}
        • Get concept: ./concepts/{ontology version id}/{concept id}
        • Search for concepts: ./search/concepts/{query}?ontologies={ids}
        • Get latest version of an ontology: ./virtual/{ontology_id}
        • Get concept for latest ontology version: ./virtual/{ontology id}/{concept id}
        • List all ontology categories: ./categories
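The REST paths above are templates relative to a base URL that this transcript does not preserve. The sketch below fills a few of them in with Python's standard URL-escaping helpers. `BASE_URL` is a hypothetical placeholder, the comma-separated format for multiple ontology IDs is an assumption, and the service itself may have changed since the talk, so treat this as an illustration of the URL scheme only.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the real base URL is missing from the slide transcript.
BASE_URL = "https://example.org/ncbo-rest"


def _join(*segments):
    """Append URL-escaped path segments to the base URL (':' and '/' are escaped too)."""
    escaped = [quote(str(s), safe="") for s in segments]
    return "/".join([BASE_URL.rstrip("/")] + escaped)


def list_ontologies():
    return _join("ontologies")                       # ./ontologies


def download_ontology(ontology_version_id):
    return _join("ontologies", "download", ontology_version_id)


def get_concept(ontology_version_id, concept_id):
    return _join("concepts", ontology_version_id, concept_id)


def latest_concept(ontology_id, concept_id):
    return _join("virtual", ontology_id, concept_id)


def search_concepts(query, ontology_ids):
    # Assumption: several ontology IDs are passed as one comma-separated value.
    ids = ",".join(str(i) for i in ontology_ids)
    return _join("search", "concepts", query) + "?" + urlencode({"ontologies": ids}, safe=",")
```

For example, `search_concepts("lung cancer", [1, 2])` builds `.../search/concepts/lung%20cancer?ontologies=1,2`; the numeric IDs are made up.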
    • 22. NCBO annotation services
      • Open Biomedical Annotator (OBA) web service
        • To automatically process textual metadata to recognize relevant ontology concepts and return the terms as annotations
      • Open Biomedical Resource (OBR) index
        • To index the contents of a few biomedical resources with the biomedical concepts to which they relate … and allow programmatic access to the indexed data.
      • URL:
    • 23. Using Ontologies to Annotate Your Data
    • 24. Annotator: The Basic Idea
      • Process textual metadata to automatically tag text with as many ontology terms as possible.
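The basic idea of tagging text with as many ontology terms as possible can be sketched as a naive dictionary matcher that reports every lexicon phrase found, including nested ones (“skin” inside “skin cancer”). This is an illustration only, not the Annotator's actual implementation; the lexicon, concept IDs, and ontology names are hypothetical.

```python
import re

# Hypothetical lexicon: lower-cased term -> (concept ID, ontology).
LEXICON = {
    "melanoma": ("C_melanoma", "toy_disease_ontology"),
    "skin": ("C_skin", "toy_anatomy_ontology"),
    "skin cancer": ("C_skin_cancer", "toy_disease_ontology"),
}

MAX_WORDS = max(len(term.split()) for term in LEXICON)


def annotate(text, lexicon=LEXICON, max_words=MAX_WORDS):
    """Return every lexicon match as (start, end, matched text, concept ID, ontology).

    All word n-grams up to max_words long are checked, so overlapping matches are kept.
    """
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    results = []
    for i in range(len(tokens)):
        for n in range(1, min(max_words, len(tokens) - i) + 1):
            phrase = " ".join(tok for _, _, tok in tokens[i:i + n])
            if phrase in lexicon:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                results.append((start, end, text[start:end]) + lexicon[phrase])
    return results


results = annotate("Skin cancer such as melanoma")
```

Keeping overlapping matches favors recall, in line with “as many ontology terms as possible”; a downstream step can filter them.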
    • 25. Annotator: Usage
      • Give your text as input
      • Select your parameters (ontologies to use, semantic type to filter, semantic expansion…)
      • Get your results… in text, tab-delimited, XML, or OWL
      • Paper in AMIA STB 09
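Two of the parameters listed above, restricting results to chosen ontologies and semantic expansion, can be sketched on top of direct matches, with the output written as tab-delimited text. This is a hypothetical illustration: the concept IDs, parent links, and column names are made up, ancestors are assumed to come from the same ontology as the direct match, and filtering by semantic type would work the same way given a type per concept.

```python
import csv
import io

# Hypothetical direct annotations: (concept ID, ontology, matched text).
DIRECT = [
    ("C_melanoma", "toy_disease", "melanoma"),
    ("C_skin", "toy_anatomy", "skin"),
]

# Hypothetical is_a parents used for semantic expansion.
PARENTS = {"C_melanoma": ["C_skin_cancer"], "C_skin_cancer": ["C_cancer"]}


def expand(direct, parents, ontologies=None, max_levels=None):
    """Filter direct matches by ontology, then add ancestors level by level.

    Rows are (concept ID, ontology, matched text, annotation type, level).
    """
    rows = []
    for concept, onto, text in direct:
        if ontologies is not None and onto not in ontologies:
            continue
        rows.append((concept, onto, text, "direct", 0))
        frontier, level, seen = [concept], 0, {concept}
        while frontier and (max_levels is None or level < max_levels):
            level += 1
            next_frontier = []
            for c in frontier:
                for p in parents.get(c, []):
                    if p not in seen:
                        seen.add(p)
                        rows.append((p, onto, text, "is_a expansion", level))
                        next_frontier.append(p)
            frontier = next_frontier
    return rows


def to_tsv(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["concept", "ontology", "text", "type", "level"])
    writer.writerows(rows)
    return buf.getvalue()


rows = expand(DIRECT, PARENTS, ontologies={"toy_disease"})
```

Recording the expansion level lets a consumer down-weight annotations that are far from the literal text.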
    • 26. DATA SERVICE
      • Using Ontologies to Access and Analyze Public Data
    • 27. Open Biomedical Resources index
      • The index can be used for:
        • Search (next few slides)
        • Data mining (paper in AMIA STB 08 on mining relationships between drugs, diseases, and genes from Medline)
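Both uses of the index can be illustrated with a small concept-to-record index: search by concept (optionally including its descendants) and naive co-occurrence counting as a starting point for mining relationships. This is a sketch, not the OBR implementation; the records, concept names, and hierarchy are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotated records: record ID -> concepts recognized in its text.
RECORDS = {
    "rec1": {"drug_A", "gene_G"},
    "rec2": {"drug_A", "disease_D1"},
    "rec3": {"drug_A", "gene_G", "disease_D2"},
}

# Hypothetical hierarchy: parent concept -> child concepts.
CHILDREN = {"disease_D": ["disease_D1", "disease_D2"]}


def build_index(records):
    """Invert records into concept -> set of record IDs."""
    index = defaultdict(set)
    for rec_id, concepts in records.items():
        for c in concepts:
            index[c].add(rec_id)
    return index


def search(index, concept, children=None):
    """Return records annotated with the concept or any of its descendants."""
    children = children or {}
    hits, stack, seen = set(), [concept], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        hits |= index.get(c, set())
        stack.extend(children.get(c, []))
    return hits


def cooccurrence(records):
    """Count how many records mention each (sorted) pair of concepts together."""
    counts = defaultdict(int)
    for concepts in records.values():
        for a, b in combinations(sorted(concepts), 2):
            counts[(a, b)] += 1
    return dict(counts)


index = build_index(RECORDS)
```

Searching for `disease_D` finds rec2 and rec3 even though neither mentions the parent concept directly; that is the payoff of indexing against an ontology rather than plain keywords.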
    • 28. Example
    • 29.  
    • 30.  
    • 31.  
    • 32. NCBO services (diagram): Ontology services (OBS), UMLS services, BioPortal services, Data service (OBR), and Annotation service (OBA), shown with three groups of users: UCSF, Laboratree, CollabRx, PharmGKB, JAX, HGMD; BioPortal UI, PDB/PLoS, I2B2, NextBio, IO informatics; the “Resources” tab, Knewco, IO informatics.
    • 33. Uses of NCBO services
      • For programmatic access to latest versions of ontologies
      • For concept recognition from text
        • For annotation
        • For accelerating curation
      • For data aggregation and summarization
    • 34. BioLit web resource: automated recognition of ontology terms and database IDs after publication
    • 35. Automated recognition of ontology terms and database IDs before publication with manual curation by author Word 2007 add-in
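Recognizing database IDs in text can be sketched with a regular expression. The version below targets PDB identifiers and is a hypothetical illustration, not the method used by BioLit or the Word add-in. It assumes a PDB ID is four characters (a digit 1-9 followed by three letters or digits) and requires a nearby “PDB” keyword, since the bare pattern would also match years such as 2009.

```python
import re

# Assumption: PDB IDs are a digit 1-9 followed by three letters/digits, and we only
# accept them right after a "PDB", "PDB ID", "PDB code", or "PDB entry" keyword.
PDB_PATTERN = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?\s*[:\s]\s*([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def find_pdb_ids(text):
    """Return PDB-like identifiers mentioned after a PDB keyword, upper-cased."""
    return [m.group(1).upper() for m in PDB_PATTERN.finditer(text)]


ids = find_pdb_ids("The structure (PDB ID: 1abc) was solved in 2009; see also PDB 2XYZ.")
```

The example sentence and IDs are made up. Requiring context trades recall for precision; before publication, the author can confirm or add the IDs a pattern like this misses.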
    • 36. End
    • 37. Annotation: UCSF
      • The task is to decide which trial is relevant for a particular patient.
        • Use the annotator service to map concepts in eligibility rules to UMLS CUIs
        • Use the annotations from the OBR index to create tag clouds in CTExplorer.
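A tag cloud like the one described for CTExplorer needs a rule for turning concept counts into font sizes. A common choice, assumed here and not taken from CTExplorer itself, is linear scaling between a minimum and maximum point size. The concept names and counts in the usage below are hypothetical.

```python
def tag_cloud_sizes(counts, min_pt=10, max_pt=32):
    """Map concept -> annotation count onto font sizes by linear scaling.

    The least frequent concept gets min_pt, the most frequent gets max_pt;
    if every count is equal, all concepts get min_pt.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    return {
        concept: min_pt if span == 0 else round(min_pt + (n - lo) * (max_pt - min_pt) / span)
        for concept, n in counts.items()
    }


# Hypothetical counts of concepts annotated across a set of trials.
sizes = tag_cloud_sizes({"concept_a": 1, "concept_b": 5, "concept_c": 3})
```

Logarithmic scaling is a frequent alternative when a few concepts dominate the counts.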
    • 38. Annotation: Laboratree
    • 39. Annotation: CollabRx
      • caTissue/TIES specimen banking
      • Specimen management is based on ontologies developed by NCI
      • Ontology-based integration to create a virtual specimen bank
    • 40. Curation: JAX, UCHSC, PDB/PLoS
      • JAX – Use concepts recognized in the abstracts of publications to triage papers for curation.
      • UCHSC – Wrap our annotator as a UIMA component and compare its performance on full text
      • PDB/PLoS – BioLit and Word plug-in
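Triage of the kind described for JAX can be sketched as scoring each paper by the concepts recognized in its abstract. The weighting scheme, threshold, paper IDs, and concept names below are hypothetical; the slide does not say how recognized concepts were actually used to rank papers.

```python
def triage(papers, target_weights, threshold=1.0):
    """Rank papers by summed weights of target concepts found in their abstracts.

    papers: paper ID -> set of concept IDs recognized in the abstract.
    target_weights: concept ID -> weight for concepts curators care about.
    Returns (paper ID, score) pairs at or above threshold, highest score first.
    """
    scored = [(pid, sum(target_weights.get(c, 0.0) for c in concepts))
              for pid, concepts in papers.items()]
    keep = [(pid, score) for pid, score in scored if score >= threshold]
    return sorted(keep, key=lambda item: (-item[1], item[0]))


# Hypothetical abstracts already run through a concept recognizer.
papers = {
    "p1": {"concept_mouse", "concept_phenotype"},
    "p2": {"concept_yeast"},
    "p3": {"concept_mouse"},
}
ranked = triage(papers, {"concept_mouse": 1.0, "concept_phenotype": 0.5})
```

Ties are broken by paper ID so the queue handed to curators is deterministic.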
    • 41. Ontology Access: I2B2
      • Needs a “source” for ontologies in their ontology cell
      • Using our services, we export BioPortal Ontologies to the I2B2 format.
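Exporting an ontology into a path-based format can be sketched as a tree walk that emits one row per concept with its depth and full path. The backslash-delimited path is modeled loosely on i2b2's ontology table (which, as far as we know, stores fields such as a hierarchy level and a full path); treat the exact layout as an assumption, not the real export. In an ontology with multiple parents, a concept would appear once per path.

```python
def flatten_hierarchy(children, root, labels):
    """Walk an ontology tree depth-first and emit (level, full_path, label) rows.

    children: concept -> list of child concepts; labels: concept -> display name.
    """
    rows = []
    stack = [(root, 0, "\\" + labels[root] + "\\")]
    while stack:
        node, level, path = stack.pop()
        rows.append((level, path, labels[node]))
        # Push children in reverse so they come out in their listed order.
        for child in reversed(children.get(node, [])):
            stack.append((child, level + 1, path + labels[child] + "\\"))
    return rows


# Hypothetical miniature ontology.
rows = flatten_hierarchy(
    {"r": ["a", "b"], "a": ["c"]},
    "r",
    {"r": "Root", "a": "A", "b": "B", "c": "C"},
)
```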
    • 42. Ontology Access: IO-informatics
    • 43. Ontology Access: NextBio
      • “Our collaboration with NCBO on adopting public biomedical ontologies throughout NextBio enabled us to create a platform dealing with heterogeneous biological data. These ontology-based search capabilities have resulted in a rapid adoption of NextBio by over 100,000 researchers around the world since our public debut in May of 2008”.
    • 44. Data Summarization: PharmGKB
      • Genes: 1. CYP2C9, 2. VKORC1, 3. CYP2A6, etc.
      • Diseases: 1. Hemorrhage, 2. Venous Thrombosis, etc.
      • Drugs: 1. warfarin, 2. coumarin, 3. phenprocoumon, etc.
      • Counts shown on the slide: 34 scored annotations, 5 scored annotations, 20 scored annotations
    • 45. Data Summarization: Knewco
    • 46. Data Summarization: HGMD
      • Use the disease hierarchy from SNOMED-CT to compute “enrichment” of mutation types in particular types of diseases
      • … playing the GO-based microarray analysis game for disease mutations
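The “GO-based microarray analysis game” usually means an over-representation test, and a common choice is the hypergeometric test. The sketch below applies it to mutation types within a disease class, counting mutations in the class and all its descendants. The disease hierarchy and mutation records are hypothetical placeholders, not SNOMED CT or HGMD data, and the slide does not say which statistic was actually used.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enrichment(mutations, disease_children, disease_class, mutation_type):
    """Test whether mutation_type is over-represented under disease_class.

    mutations: list of (disease, mutation type) pairs; the class includes descendants.
    Returns (k, n, K, N, p_value).
    """
    members, stack = set(), [disease_class]
    while stack:
        d = stack.pop()
        if d not in members:
            members.add(d)
            stack.extend(disease_children.get(d, []))
    N = len(mutations)
    K = sum(1 for _, t in mutations if t == mutation_type)
    in_class = [t for d, t in mutations if d in members]
    n = len(in_class)
    k = sum(1 for t in in_class if t == mutation_type)
    return k, n, K, N, hypergeom_sf(k, N, K, n)


# Hypothetical data: disease_class_X has two child diseases.
DISEASE_CHILDREN = {"disease_class_X": ["disease_1", "disease_2"]}
MUTATIONS = [
    ("disease_1", "missense"), ("disease_1", "missense"),
    ("disease_2", "missense"), ("disease_2", "nonsense"),
    ("disease_3", "missense"), ("disease_3", "nonsense"),
    ("disease_3", "nonsense"), ("disease_3", "frameshift"),
    ("disease_4", "nonsense"), ("disease_4", "frameshift"),
]
k, n, K, N, p = enrichment(MUTATIONS, DISEASE_CHILDREN, "disease_class_X", "missense")
```

Here 3 of the 4 mutations in the class are missense versus 4 of 10 overall, giving P(X >= 3) = 25/210, about 0.12. Testing many classes at once would also call for a multiple-testing correction, just as in GO analysis.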