BioPortal: ontologies and integrated data resources at the click of a mouse

Invited presentation at the French Medical Semantic Web workshop 2010 in Nîmes. The presentation has been given at several seminars since then.

  • Let’s try to understand the context of this work and what we mean by semantic annotation.
  • Common infrastructure for Notes using the Changes and Annotation Ontology (ChAO)
  • Users create notes in order to: discuss class definitions; suggest changes and corrections; request new items; provide additional information about a class (e.g., references, supporting documentation).
  • Found by the tools (efficient, but far from perfect); specified by users (low throughput, but better quality).
  • The discoveries that could be made by mining biomedical data are limited because most public resources are generally not described with terminologies and ontologies. Why is this difficult? Processing textual data (NLP, disambiguation, polysemy). Leveraging the knowledge in ontologies: graph algorithms (e.g., is_a transitive closure over ontologies of 300K concepts), semantic distance, ontology alignment. Scale: ontologies (different formats, scattered, overlapping) and huge data resources, e.g., PubMed with 17M citations. Ontologies and resources evolve over time: a new version of GO every night. Reference: a lot of work has been done on annotating gene products, or on recognizing protein, gene, or molecule names, but these are not necessarily ontology-based approaches (although GO is the best example of success). Doing this kind of thing for diseases, for example, remains a real challenge, even though diseases are extensively described in ontologies.
  • Elsevier SciVerse; Karen Dowell, Jackson Lab; Shai-shen Orr, Mark Davis’s lab; Sean Mooney’s group; Ida Sim, UCSF; Simon Twigger, Medical College of Wisconsin; Nathan Baker, Washington Univ.; Amit Seth, Wright State Univ.; Neil Sarkar, University of Vermont; Larry Hunter, University of Colorado, Denver.
  • Ontology-based annotation is not widespread, possibly because of: the lack of a one-stop shop for bio-ontologies; the lack of tools to annotate datasets (manual annotation will not scale; can automatic annotation be ‘good enough’?); the lack of a sustainable mechanism to create ontology-based annotations.

    1. BioPortal: biomedical ontologies and data resources at your fingertips…
       Clement Jonquet & BioPortal team, jonquet@stanford.edu
       Medical Semantic Web Workshop (Atelier Web Sémantique Médical), Nîmes, France, 8 June 2010
    2. About this presentation
       Thank you for this opportunity.
       Contribution of the whole NCBO group (~20 people).
       Outline:
       • General overview
       • What can be done with BioPortal (demo?)
       • Discussion
       Reference article: N. F. Noy, N. H. Shah, P. L. Whetzel, B. Dai, M. Dorf, N. B. Griffith, C. Jonquet, D. L. Rubin, M. Storey, C. G. Chute, M. A. Musen. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research, 37:170–173, May 2009.
    3. Biologists have adopted ontologies
       • To provide a canonical representation of scientific knowledge
       • To annotate experimental data to enable interpretation, comparison, and discovery across databases
       • To facilitate knowledge-based applications for decision support, natural language processing, and data integration
       But ontologies are spread out, in different formats, of different sizes, and with different structures.
    4. What is BioPortal?
       • Web repository for biomedical ontologies: a “one-stop shop”
       • Makes ontologies accessible and usable: abstraction over format, location, structure, etc.
       • Users can publish, download, browse, search, comment on, and align ontologies, and use them for annotations, both online and via a web services API.
       • Community-based ontology development, alignment, and evaluation
       Figures:
       • 200+ ontologies (OWL, OBO, UMLS)
       • ~1.7 million terms
       • ~2 million mappings
       • 22 annotated biomedical resources
       • ~10 billion annotations
    5. What are we trying to do?
       • You’ve built an ontology; how do you let the world know?
       • You need an ontology; where do you go to get it?
       • How do you know whether an ontology is any good?
       • How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
       • How could you leverage your ontology to enable new science?
    6. Community-based ontology repository
       http://bioportal.bioontology.org
    7. BioPortal features
       • Library of ontologies (supports browsing, visualizing, versioning, metrics, views)
       • Search ontologies and resources
       • Peer review: comments and discussion
       • Mapping
       • Annotate data
    8. Library of biomedical ontologies
    9. Ontology metadata
    10. Ontology metrics: statistics, conformance to best practices
    11. Ontology views: specific subset, other languages
    12. Ontology search: keywords & options, ontologies to use
    13. Ontology browsing
    14. Ontology visualizing
    15. Ontology notes
    16. Ontology mappings
    17. Mappings in BioPortal
        Ontologies, vocabularies, and terminologies will inevitably overlap in coverage.
        Concept-to-concept mappings, e.g., nostril in NCI Thesaurus is similar to naris in Mouse Anatomy Ontology:
        • Found by tools and uploaded in bulk
        • Created by users
        • Provenance
    18. How are mappings useful?
        • Navigation mechanism, linking one ontology to another
        • Annotation & query expansion in search: synonyms defined in other ontologies can be included
        • Finding “important” or “reference” ontologies: if everyone maps to NCI Thesaurus, it must be important
        • Accessible through web services & RDF to be used in other applications
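
To make the query-expansion point concrete, here is a minimal sketch, assuming a toy in-memory mapping table. Only the nostril/naris pair comes from the slides; the identifiers, data layout, and function names are hypothetical and do not reflect BioPortal's actual API.

```python
from collections import defaultdict

# Hypothetical toy data: the nostril/naris pair is the slides' example;
# the identifier format is invented for illustration.
MAPPINGS = [("NCIT:nostril", "MA:naris")]
LABELS = {"NCIT:nostril": "nostril", "MA:naris": "naris"}


def build_mapping_index(mappings):
    """Index mappings symmetrically so lookups work in both directions."""
    index = defaultdict(set)
    for source, target in mappings:
        index[source].add(target)
        index[target].add(source)
    return index


def expand_query(term, labels, index):
    """Return the term plus the labels of all concepts mapped to it."""
    wanted = term.lower()
    expanded = {wanted}
    for concept_id, label in labels.items():
        if label.lower() == wanted:
            for mapped_id in index.get(concept_id, ()):
                expanded.add(labels[mapped_id].lower())
    return sorted(expanded)


INDEX = build_mapping_index(MAPPINGS)
print(expand_query("nostril", LABELS, INDEX))  # ['naris', 'nostril']
```

A search for "nostril" would then also retrieve elements that only mention "naris", which is the kind of result a plain keyword search misses.
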
    19. Ontology-based annotation workflow
        • First, direct annotations are created by recognizing concepts in raw text.
        • Second, annotations are semantically expanded using knowledge of the ontologies.
        • Third, all annotations are scored according to the context in which they have been created.
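
The three steps of the workflow can be sketched as follows. This is a toy illustration under stated assumptions, not the NCBO Annotator's implementation: concept recognition is naive substring matching, expansion walks is_a parents (the transitive closure), and the scoring weights are invented. The ontology content and all names are hypothetical.

```python
# Hypothetical toy ontology: label -> concept ID, and is_a parent links.
LABELS = {"melanoma": "C1", "skin neoplasm": "C2", "neoplasm": "C3"}
PARENTS = {"C1": ["C2"], "C2": ["C3"], "C3": []}

# Assumed weights; the slides do not specify a scoring scheme.
DIRECT_SCORE = 10
PENALTY_PER_LEVEL = 2


def direct_annotations(text, labels):
    """Step 1: recognize concepts by (naive) case-insensitive substring match."""
    text_lower = text.lower()
    return [cid for label, cid in labels.items() if label in text_lower]


def ancestors_with_distance(concept, parents):
    """Step 2 helper: breadth-first walk up is_a links, with distances."""
    seen, frontier, distance, result = {concept}, [concept], 0, []
    while frontier:
        distance += 1
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    result.append((parent, distance))
                    next_frontier.append(parent)
        frontier = next_frontier
    return result


def annotate(text, labels, parents):
    """Steps 1-3: direct match, is_a expansion, then context-based scoring."""
    scores = {}
    for cid in direct_annotations(text, labels):
        scores[cid] = max(scores.get(cid, 0), DIRECT_SCORE)
        for ancestor, distance in ancestors_with_distance(cid, parents):
            expanded = max(DIRECT_SCORE - PENALTY_PER_LEVEL * distance, 1)
            scores[ancestor] = max(scores.get(ancestor, 0), expanded)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(annotate("Patient diagnosed with melanoma", LABELS, PARENTS))
```

Here a direct match scores highest and each is_a step away lowers the score, so an element mentioning "melanoma" is also annotated, with less weight, as being about a neoplasm.
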
    20. Annotation challenge
        Explosion of biomedical data: diverse, distributed, unstructured… and not linked to ontologies
        • Hard for biomedical researchers to find the data they need
        • Data integration problem
        • Translational discoveries are prevented
        • Good examples: GO annotations; PubMed (biomedical literature) indexed with MeSH headings
        Annotate data with ontology concepts (horizontal approach)
        [Figure labels: RESOURCES, ONTOLOGIES]
    21. NCBO Annotator in BioPortal
    22. NCBO Annotator service: multiple ways to access
        • Code
        • Word & Firefox add-ins to call the Annotator Service?
        • Excel
        • UIMA platform
        • Specific UI
    23. NCBO Biomedical Resources index
        • We have used the workflow to index several important biomedical resources with ontology concepts (22+)
        • The index can be used to enhance search & data integration
        [DILS 08] [BMC BioInfo09] [IC 10]
    24. Example: annotation of a GEO element
    25. Ontology-based search (1/2)
        • Example of resource available (name and description)
        • Number of annotations in the NCBO Resource Index
        • Ontology concept/term browsed
        • Title and URL link to the original element
        • Context in which an element has been annotated
        • ID of an element
    26. Ontology-based search (2/2)
        • Ontology concept(s) to use for search
        • Keyword to search
        • Biomedical resources to query
        • Resource elements found
    27. Good use of the semantics (1/2)
        • Simple keyword-based search misses results
    28. Good use of the semantics (2/2)
    29. Ontology recommendation
    30. The BioPortal technology
        • All BioPortal data is accessible through REST services
        • The BioPortal user interface accesses the repository through REST services as well
        • For example:
          http://bioportal.bioontology.org/visualize/40401/?conceptid=D008545
          http://rest.bioontology.org/bioportal/concepts/40401/?conceptid=D008545
        • The BioPortal technology is domain-independent
        • BioPortal code is open-source
        • Technology stack includes: Protégé, LexGrid, MySQL, Hibernate, Spring, J2EE, Ruby-on-Rails
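
As an illustration of the REST access pattern, the sketch below only rebuilds the example concept URL shown on the slide; it does not send a request, since an endpoint from a 2010 presentation may no longer be served. The function and parameter names are hypothetical, and the meaning of the numeric path segment (40401) is not stated in the slides.

```python
from urllib.parse import urlencode

# Base taken from the slide's example URL; it may be obsolete today.
REST_BASE = "http://rest.bioontology.org/bioportal"


def concept_url(ontology_id, concept_id, base=REST_BASE):
    """Build a concept URL following the pattern of the slide's example.

    `ontology_id` is a hypothetical name for the numeric path segment.
    """
    query = urlencode({"conceptid": concept_id})
    return f"{base}/concepts/{ontology_id}/?{query}"


print(concept_url(40401, "D008545"))
# http://rest.bioontology.org/bioportal/concepts/40401/?conceptid=D008545
```

Using `urlencode` rather than string concatenation keeps concept identifiers that contain reserved characters safe to place in the query string.
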
    31. Other installations of BioPortal
    32. BioPortal’s future
        • Better support of Semantic Web standards (done: provide a URI for every concept in the ontology; TBD: ontologies & annotations available through a SPARQL endpoint)
        • Development of a biomedical mega-thesaurus based on ontology mappings
        • Merge ontology editing & publishing
        • Scalability
        • Distributed architecture
        • Enhance views/modularization, e.g., different languages
    33. Conclusion
        BioPortal is allowing NCBO to experiment with new models for:
        • Dissemination of knowledge on the Web
        • Integration and alignment of online content
        • Knowledge visualization and cognitive support
        • Peer review of online content
        Exciting context of research & application for both CS and biomedical informatics.
        BioPortal is a good illustration of a biomedical semantic web application.
        Please try it and join us!
    34. Collaborators & acknowledgments
        • @ NCBO, Stanford University: Natasha Noy, Mark Musen, Nigam Shah, Patricia Whetzel, Adrien Coulet, Paea Le Pendu, Michael Dorf, Cherie Youn, Paul Alexander, Sean Falconer
        • @ NCBO, elsewhere: Peggy Storey, Chris Callendar, Christopher Chute, Pradip Kanjamala, Jyoti Pathak, Jim Buntrock
        • and many others
    35. Thank you
        National Center for BioMedical Ontology: http://www.bioontology.org
        BioPortal, biomedical ontology repository: http://bioportal.bioontology.org
        Contact me: jonquet@stanford.edu
    36. Develop a mega-thesaurus
        • Group mapped concepts from different ontologies to create a single concept
        • Similar to the approach taken by NLM with the UMLS Metathesaurus
        • Manual vs. automatic
    37. Integration of ontology editing and publishing
        • Enable users to go seamlessly between ontology editing and publishing
        • Notes created in BioPortal are visible in an ontology editor
        • User accounts and roles shared among BioPortal and ontology editors
        • Users don’t need to be aware of the difference: they just get their work done
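
One way to picture the automatic variant of this grouping is to treat mappings as edges and merge connected concepts with a union-find structure. The sketch below assumes mappings are symmetric and transitive, which is a simplification the slides do not claim; the concept identifiers and the function name are hypothetical.

```python
def group_mapped_concepts(concepts, mappings):
    """Merge concepts linked by mappings into groups (union-find)."""
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent links to the group representative (path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in mappings:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical identifiers; only the nostril/naris pairing is from the slides.
concepts = ["NCIT:nostril", "MA:naris", "FMA:naris", "GO:1"]
mappings = [("NCIT:nostril", "MA:naris"), ("MA:naris", "FMA:naris")]
print(group_mapped_concepts(concepts, mappings))
```

Chaining mappings this way can merge concepts that are only loosely similar, which is one reason the manual vs. automatic question raised on the slide matters.
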
    38. Annotation & semantic web
        • Part of the vision for the semantic web
        • Web content must be semantically described using ontologies
        • Semantic annotations help to structure the web
        • Annotation is not an easy task
        • Automatic vs. manual
        • Lack of annotation tools (convenient, simple to use, and easily integrated into automatic processes)
        • Today’s web content (& public data available through the web) is mainly composed of unstructured text
    39. Annotation is not a common practice
        • High number of ontologies
        • Getting access to all of them is hard: formats, locations, APIs
        • Lack of tools that easily access all ontologies (domain)
        • Users do not always know the structure of an ontology’s content or how to use it in order to do the annotations themselves
        • Lack of tools to do the annotations automatically
        • Boring additional task without immediate reward for the user
    40. The challenge
        • Automatically process a piece of raw text to annotate it with relevant ontologies
        • Large scale: to scale up for many resources and ontologies
        • Automatic: to keep precision and accuracy
        • Easy to use and to access: to prevent the biomedical community from getting lost
        • Customizable: to fit very specific needs
        • Smart: to leverage the knowledge contained in ontologies
