BioPortal: ontologies and integrated data resources at the click of a mouse

Invited presentation at the French Medical Semantic Web Workshop 2010 in Nîmes. The presentation has since been given at several seminars.

  • Let’s try to understand the context of this work and what we mean by semantic annotation.
  • Common infrastructure for notes, using the Changes and Annotation Ontology (ChAO)
  • Users create notes in order to:
    discuss class definitions
    suggest changes and corrections
    request new items
    provide additional information about a class (e.g., references, supporting documentation)
  • Mappings are either:
    found by tools (efficient, but far from perfect)
    specified by users (low throughput, but better quality)
  • The discoveries that could be made by mining biomedical data are limited, because most public resources are generally not described using terminologies and ontologies.
    Why is it difficult?
    Processing textual data (NLP, disambiguation, polysemy)
    Leveraging the knowledge in ontologies: graph algorithms (e.g., transitive closure of is_a over ontologies with 300K concepts), semantic distance, alignment between ontologies
    Scale: ontologies (different formats, scattered, overlapping) and huge data resources, e.g., PubMed with 17M citations
    Ontologies and resources evolve over time: a new version of GO every night
    Related work: a lot has been done on gene product annotation, or on recognizing protein, gene, or molecule names, but these are not necessarily ontology-based approaches (although GO is the best example of success).
    Doing this kind of thing for diseases, for example, remains a real challenge, even though diseases are extensively described in ontologies.
  • Elsevier SciVerse
    Karen Dowell, Jackson Lab
    Shai Shen-Orr, Mark Davis’s lab
    Sean Mooney’s group
    Ida Sim, UCSF
    Simon Twigger, Medical College of Wisconsin
    Nathan Baker, Washington Univ.
    Amit Sheth, Wright State Univ.
    Neil Sarkar, University of Vermont
    Larry Hunter, University of Colorado, Denver
  • Ontology-based annotation is not widespread, possibly because of:
    Lack of a one-stop shop for bio-ontologies
    Lack of tools to annotate datasets
      Manual: will not scale
      Automatic: can it be ‘good enough’?
    Lack of a sustainable mechanism to create ontology-based annotations

Transcript

  • 1. BioPortal: biomedical ontologies and data resources at your fingertips…
    Clement Jonquet & BioPortal team
    jonquet@stanford.edu
    Medical Semantic Web Workshop (Atelier Web Sémantique Médical), Nîmes, France, 8 June 2010
    1
  • 2. About this presentation
    Thank you for this opportunity
    Contribution of the whole NCBO group (~20 people)
    Outline
    General overview
    What you can do with BioPortal (demo?)
    Discussion
    Reference article
    N. F. Noy, N. H. Shah, P. L. Whetzel, B. Dai, M. Dorf, N. B. Griffith, C. Jonquet, D. L. Rubin, M. Storey, C. G. Chute, M. A. Musen. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research, 37:170–173, May 2009.
    2
  • 3. Biologists have adopted ontologies
    To provide canonical representation of scientific knowledge
    To annotate experimental data to enable interpretation, comparison, and discovery across databases
    To facilitate knowledge-based applications for
    Decision support
    Natural language processing
    Data integration
    But ontologies are spread out, in different formats, of different sizes, and with different structures
    3
  • 4. What is BioPortal?
    Web repository for biomedical ontologies: a “one-stop shop”
    Makes ontologies accessible and usable by abstracting over format, location, structure, etc.
    Users can publish, download, browse, search, comment, align ontologies and use them for annotations both online and via a web services API.
    Community-based ontology development, alignment, and evaluation
    Figures:
    200+ ontologies (OWL, OBO, UMLS)
    ~ 1.7 million terms
    ~ 2 million mappings
    22 annotated biomedical resources
    ~ 10 billion annotations
    4
  • 5. What are we trying to do?
    You’ve built an ontology, how do you let the world know?
    You need an ontology, where do you go to get it?
    How do you know whether an ontology is any good?
    How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
    How could you leverage your ontology to enable new science?
    5
  • 6. Community-based ontology repository
    http://bioportal.bioontology.org
    6
  • 7. BioPortal features
    Library of ontologies (supports browsing, visualizing, versioning, metrics, views)
    Search ontologies and resources
    Peer review: comments and discussion
    Mappings
    Annotate data
    7
  • 8. Library of biomedical ontologies
    8
  • 9. Ontology metadata
    9
  • 10. Ontology metrics
    10
    Statistics
    Conformance to best practices
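    The slide only names "Statistics". Below is a minimal sketch of how a few structural statistics could be computed over an is_a hierarchy. The toy ontology, the `ontology_metrics` helper, and the chosen metrics (class count, maximum depth, maximum number of children, classes with a single child) are hypothetical illustrations. They are not BioPortal's actual metrics implementation.

```python
from collections import defaultdict

# Hypothetical toy ontology: class -> list of is_a parents (assumed to be a DAG).
IS_A = {
    "disease": [],
    "cancer": ["disease"],
    "lung cancer": ["cancer"],
    "melanoma": ["cancer"],
    "infection": ["disease"],
    "viral infection": ["infection"],
}


def ontology_metrics(is_a):
    """Compute a few structural statistics over an is_a hierarchy."""
    # Invert the parent links so we can count children per class.
    children = defaultdict(list)
    for cls, parents in is_a.items():
        for parent in parents:
            children[parent].append(cls)

    depth_cache = {}

    def depth(cls):
        # Longest is_a path from a root; roots have depth 0.
        if cls not in depth_cache:
            parents = is_a.get(cls, [])
            depth_cache[cls] = 1 + max(map(depth, parents)) if parents else 0
        return depth_cache[cls]

    return {
        "classes": len(is_a),
        "max_depth": max(depth(cls) for cls in is_a),
        "max_children": max((len(c) for c in children.values()), default=0),
        "single_child_classes": sum(1 for c in children.values() if len(c) == 1),
    }


print(ontology_metrics(IS_A))
```

    On a 300K-concept ontology the same idea applies, but the recursion would need to be replaced by an iterative traversal to avoid Python's recursion limit.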
  • 11. Ontology views
    11
    Specific subset
    Other languages
  • 12. Ontology search
    12
    Keywords & options
    Ontologies to use
  • 13. Ontology browsing
    13
  • 14. Ontology visualizing
    14
  • 15. Ontology notes
    15
  • 16. Ontology mappings
    16
  • 17. Mappings in BioPortal
    Ontologies, vocabularies, and terminologies will inevitably overlap in coverage
    Concept-to-concept mappings
    e.g., nostril in NCI Thesaurus is similar to naris in Mouse Anatomy Ontology
    • Found by tools and uploaded in bulk
    • Created by users
    • Provenance
    17
  • 20. How are mappings useful?
    Navigation mechanism, linking one ontology to another
    Annotating & query expansion in search
    Allows including synonyms defined in other ontologies
    Used for finding “important” or “reference” ontologies
    If everyone maps to NCI Thesaurus, it must be important
    Accessible through web services & RDF to be used in other applications
    18
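    The query expansion idea above can be sketched as follows. This minimal Python sketch uses a hypothetical in-memory term table and a single mapping taken from the nostril/naris example on slide 17. The `TERMS`, `MAPPINGS`, and `expand_query` names are illustrative only and are not a BioPortal API. Only direct (one-hop) mappings are followed.

```python
# Hypothetical term table: (ontology, concept) -> terms (preferred name + synonyms).
TERMS = {
    ("NCI Thesaurus", "nostril"): ["nostril"],
    ("Mouse Anatomy", "naris"): ["naris"],
}

# Hypothetical concept-to-concept mappings, treated as symmetric.
MAPPINGS = [(("NCI Thesaurus", "nostril"), ("Mouse Anatomy", "naris"))]


def expand_query(concept):
    """Return the concept's own terms plus the terms of every directly mapped concept."""
    related = {concept}
    for a, b in MAPPINGS:
        if a == concept:
            related.add(b)
        elif b == concept:
            related.add(a)
    terms = set()
    for c in related:
        terms.update(TERMS.get(c, []))
    return sorted(terms)


print(expand_query(("NCI Thesaurus", "nostril")))
```

    A search for "nostril" can then also match text that only says "naris". Following mappings transitively would widen recall further, at the cost of drifting away from the original meaning.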
  • 21. Ontology-based annotation workflow
    19
    First, direct annotations are created by recognizing concepts in raw text,
    Second, annotations are semantically expanded using knowledge of the ontologies,
    Third, all annotations are scored according to the context in which they have been created.
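    The three steps above can be illustrated with a small Python sketch. It assumes a toy dictionary of terms and a toy is_a hierarchy, with hypothetical concept IDs C1 to C3. The weights are arbitrary assumptions and do not reproduce the NCBO Annotator's actual scoring. It shows direct recognition, expansion by is_a transitive closure, and scoring that favours direct annotations over expanded ones.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary (lowercased term -> concept ID) and is_a hierarchy.
DICTIONARY = {"melanoma": "C1", "skin cancer": "C2", "cancer": "C3"}
IS_A = {"C1": ["C2"], "C2": ["C3"], "C3": []}  # concept -> parent concepts


def direct_annotations(text):
    """Step 1: recognize dictionary terms in raw text (whole words, case-insensitive)."""
    lowered = text.lower()
    return {
        concept
        for term, concept in DICTIONARY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def ancestors(concept):
    """Step 2 helper: is_a transitive closure, mapping each ancestor to its distance."""
    distances, frontier, level = {}, [concept], 0
    while frontier:
        level += 1
        next_frontier = []
        for c in frontier:
            for parent in IS_A.get(c, []):
                if parent not in distances:  # breadth-first, so first visit is shortest
                    distances[parent] = level
                    next_frontier.append(parent)
        frontier = next_frontier
    return distances


def annotate(text, direct_weight=10.0):
    """Steps 2 and 3: expand along is_a, then score (the weights are arbitrary)."""
    scores = defaultdict(float)
    for concept in direct_annotations(text):
        scores[concept] += direct_weight
        for ancestor, distance in ancestors(concept).items():
            # Expanded annotations score lower the further up the hierarchy they are.
            scores[ancestor] += direct_weight / (distance + 1)
    return dict(scores)


print(annotate("Patient with a melanoma"))
```

    Here "melanoma" is recognized directly. Its ancestors "skin cancer" and "cancer" are then added with decreasing scores, even though neither term appears in the text.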
  • 22. Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies
    • Hard for biomedical researchers to find the data they need
    • Data integration problem
    • Translational discoveries are prevented
    • Good examples:
      • GO annotations
      • PubMed (biomedical literature) indexed with MeSH headings
    Annotate data with ontology concepts
    Horizontal approach
    Annotation challenge
    20
    RESOURCES
    ONTOLOGIES
  • 28. NCBO Annotator in BioPortal
    21
  • 29. Code
    Word & Firefox add-ins to call the Annotator Service?
    Excel
    UIMA platform
    Specific UI
    NCBO Annotator service
    Multiple ways to access
  • 30. NCBO Biomedical Resources index
    • We have used the workflow to index several important biomedical resources with ontology concepts (22+)
    • The index can be used to enhance search & data integration
    23
    [DILS 08]
    [BMC BioInfo09]
    [IC 10]
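    The benefit of such an index over keyword search can be shown with a minimal Python sketch. It assumes hypothetical resource elements E1 to E3, a toy is_a hierarchy, and naive substring matching of concept names. Unlike the workflow on slide 21, which expands annotations when they are created, this sketch expands the query concept to its descendants at search time. Both choices are simplifications and not the NCBO Resource Index implementation.

```python
from collections import defaultdict

# Hypothetical is_a hierarchy (child -> parents) and resource elements (ID -> text).
IS_A = {"melanoma": ["skin cancer"], "skin cancer": ["cancer"], "cancer": []}
ELEMENTS = {
    "E1": "Gene expression in melanoma cell lines",
    "E2": "Survey of skin cancer incidence",
    "E3": "Protein folding kinetics",
}


def build_index(elements, is_a):
    """Inverted index: concept -> IDs of elements whose text mentions the concept name."""
    index = defaultdict(set)
    for eid, text in elements.items():
        for concept in is_a:
            if concept in text.lower():  # naive substring match, for illustration only
                index[concept].add(eid)
    return index


def descendants(concept, is_a):
    """All concepts below `concept` via is_a, including the concept itself."""
    result, changed = {concept}, True
    while changed:
        changed = False
        for child, parents in is_a.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result


def semantic_search(concept, index, is_a):
    """Elements annotated with the concept or with any of its descendants."""
    hits = set()
    for c in descendants(concept, is_a):
        hits |= index.get(c, set())
    return sorted(hits)


index = build_index(ELEMENTS, IS_A)
keyword_hits = sorted(e for e, t in ELEMENTS.items() if "cancer" in t.lower())
print(keyword_hits)                               # keyword search
print(semantic_search("cancer", index, IS_A))     # ontology-based search
```

    A keyword search for "cancer" only finds E2. The ontology-based search also returns E1, because melanoma is a descendant of cancer. This is the effect that slides 35 and 36 illustrate.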
  • 32. Ex: annotation of a GEO element
    24
  • 33. Ontology-based search (1/2)
    Example of resource available (name and description)
    Number of annotations in the NCBO Resource Index
    Ontology concept/term browsed
    Title and URL link to the original element
    Context in which an element has been annotated
    ID of an element
    25
  • 34. Ontology-based search (2/2)
    26
    Ontology concept(s) to use for search
    Keyword to search
    Biomedical resources to query
    Resource elements found
  • 35. Good use of the semantics (1/2)
    • Simple keyword-based search misses results
    27
  • 36. Good use of the semantics (2/2)
    28
  • 37. Ontology recommendation
    29
  • 38. The BioPortal technology
    All BioPortal data is accessible through REST services
    BioPortal user interface accesses the repository through REST services as well
    For example:
    http://bioportal.bioontology.org/visualize/40401/?conceptid=D008545
    http://rest.bioontology.org/bioportal/concepts/40401/?conceptid=D008545
    The BioPortal technology is domain-independent
    BioPortal code is open-source
    Technology stack includes: Protégé, LexGrid, MySQL, Hibernate, Spring, J2EE, Ruby on Rails
    30
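    The two example URLs above share one pattern: an ontology identifier in the path and a `conceptid` query parameter. A minimal Python sketch that rebuilds both URLs is shown below. It assumes the numeric path segment (40401) identifies the ontology (version). The `concept_urls` helper is hypothetical, and the 2010-era rest.bioontology.org endpoint may no longer be live, so the sketch only builds URLs and makes no requests.

```python
from urllib.parse import urlencode

# Base URLs taken from the slide's examples; the 2010 REST endpoint may no longer be live.
UI_BASE = "http://bioportal.bioontology.org/visualize"
REST_BASE = "http://rest.bioontology.org/bioportal/concepts"


def concept_urls(ontology_id, concept_id):
    """Build the UI and REST URLs for one concept, following the slide's URL pattern."""
    query = urlencode({"conceptid": concept_id})
    return (
        f"{UI_BASE}/{ontology_id}/?{query}",
        f"{REST_BASE}/{ontology_id}/?{query}",
    )


ui_url, rest_url = concept_urls(40401, "D008545")
print(ui_url)
print(rest_url)
```

    Because the user interface itself reads from the REST services, the same pair of identifiers addresses a concept in both worlds.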
  • 39. Other installations of BioPortal
    31
  • 40. BioPortal’s future
    Better support for Semantic Web standards
    Done: provide a URI for every concept in the ontology
    TBD: ontologies & annotations available through a SPARQL endpoint
    Development of a biomedical mega-thesaurus based on ontology mappings
    Merge ontology editing & publishing
    Scalability
    Distributed architecture
    Enhance views/modularization, e.g., different languages
    32
  • 41. Conclusion
    BioPortal is allowing NCBO to experiment with new models for
    Dissemination of knowledge on the Web
    Integration and alignment of online content
    Knowledge visualization and cognitive support
    Peer review of online content
    Exciting context of research & application for both CS and biomedical informatics
    BioPortal is a good illustration of a biomedical Semantic Web application
    Please try it and join us!
    33
  • 42. Collaborators & acknowledgments
    • @ NCBO, Stanford University
    • Natasha Noy, Mark Musen, Nigam Shah, Patricia Whetzel, Adrien Coulet, Paea Le Pendu, Michael Dorf, Cherie Youn, Paul Alexander, Sean Falconer
    • @ NCBO, elsewhere
    • Peggy Storey, Chris Callendar, Christopher Chute, Pradip Kanjamala, Jyoti Pathak, Jim Buntrock
    • and many others
    34
  • 47. Thank you
    National Center for BioMedical Ontology: http://www.bioontology.org
    BioPortal, biomedical ontology repository: http://bioportal.bioontology.org
    Contact me: jonquet@stanford.edu
    35
  • 48. Develop a mega-thesaurus
    Group mapped concepts from different ontologies to create a single concept
    Similar to the approach taken by NLM with the UMLS Metathesaurus
    Manual vs. automatic
    36
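    Grouping mapped concepts into single concepts amounts to finding connected components in the mapping graph. A minimal Python sketch of that idea using union-find is shown below. The concept IDs and mappings are hypothetical (the "X" ontology is invented for illustration), and this is not the NLM or NCBO procedure.

```python
def build_metathesaurus(concepts, mappings):
    """Group concepts connected by mappings into clusters using union-find."""
    parent = {c: c for c in concepts}

    def find(c):
        # Path halving keeps the union-find trees shallow.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in mappings:
        union(a, b)

    clusters = {}
    for c in concepts:
        clusters.setdefault(find(c), []).append(c)
    # Sort for a deterministic result: each cluster becomes one "mega-concept".
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical concept IDs and mappings.
concepts = ["NCIT:nostril", "MA:naris", "X:nose_opening", "NCIT:lung"]
mappings = [("NCIT:nostril", "MA:naris"), ("MA:naris", "X:nose_opening")]
print(build_metathesaurus(concepts, mappings))
```

    Grouping is transitive, so a single wrong automatic mapping can merge two unrelated clusters. This is one reason why the manual vs. automatic trade-off matters here.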
  • 49. Integration of ontology editing and publishing
    Enable users to go seamlessly between ontology editing and publishing
    Notes created in BioPortal are visible in an ontology editor
    User accounts and roles shared among BioPortal and ontology editors
    Users don’t need to be aware of the difference: they just get their work done
    37
  • 50. Annotation & semantic web
    • Part of the vision for the semantic web
    • Web content must be semantically described using ontologies
    • Semantic annotations help to structure the web
    • Annotation is not an easy task
    • Automatic vs. manual
    • Lack of annotation tools (convenient, simple to use, and easily integrated into automatic processes)
    • Today’s web content (& public data available through the web) is mainly composed of unstructured text
    38
  • 57. Annotation is not a common practice
    • High number of ontologies
    • Getting access to all of them is hard: formats, locations, APIs
    • Lack of tools that easily access all ontologies (of a domain)
    • Users do not always know the structure of an ontology’s content or how to use it in order to do the annotations themselves
    • Lack of tools to do the annotations automatically
    • Tedious additional task with no immediate reward for the user
    39
  • 63. The challenge
    • Automatically process a piece of raw text to annotate it with relevant ontologies
    • Large scale: to scale up for many resources and ontologies
    • Automatic, while keeping precision and accuracy
    • Easy to use and to access: to prevent the biomedical community from getting lost
    • Customizable: to fit very specific needs
    • Smart: to leverage the knowledge contained in ontologies
    40