
A few contributions of the SIFR (Semantic Indexing of French biomedical Resources) project and how we reuse NCBO technology


Keynote by Clement Jonquet at the RISE workshop (Atelier Recherche d’Information Sémantique), 30 June 2015

Published in: Health & Medicine

  1. Atelier Recherche d’Information Sémantique, RISE’15, 30 June 2015, Rennes
     Clement Jonquet (jonquet@lirmm.fr)
     A few contributions of the SIFR (Semantic Indexing of French biomedical Resources) project and how we reuse NCBO technology
  2. How is this relevant to RISE?
     • Semantic information retrieval models
     • Information extraction
     • Semantic annotation
     • Semantic indexing
     • Ontology alignment and correspondences for information retrieval
     • Knowledge representation languages for information retrieval
     • Use of semantic distances for information retrieval
     Atelier RISE 2015 30 juin 2015, Rennes
  3. A few introductory words
  4. Biologists have adopted ontologies
     • To provide a canonical representation of scientific knowledge
     • To annotate experimental data to enable interpretation, comparison, and discovery across databases
     • To facilitate knowledge-based applications for:
       - decision support
       - natural language processing
       - data integration
     • But ontologies are spread out, in different formats, of different sizes, with different structures
  5. Working with terminologies & ontologies: a portal, please!
     • You’ve built an ontology: how do you let the world know?
     • You need an ontology: where do you go to get it?
     • How do you know whether an ontology is any good?
     • How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
     • How could you leverage your ontology to enable new science?
     • How could you use ontologies without managing them?
  6. Comparison of the approaches [IWBBIO'14]
  7. Annotation challenge
     • Explosion of biomedical data: diverse, distributed, unstructured… and not linked to ontologies
       - Hard for biomedical researchers to find the data they need
       - Data integration problem
       - Translational discoveries are prevented
     • Good examples:
       - GO annotations
       - PubMed (biomedical literature) indexed with MeSH headings
     • Annotate data with ontology concepts
       - Horizontal approach
     [Figure: Ontologies / Resources]
  8. Good use of the semantics (1/2)
     • Simple keyword-based search misses results
  9. Good use of the semantics (2/2)
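The point of slides 8 and 9, that keyword search misses results an ontology can recover, can be illustrated with a minimal sketch. It assumes a hypothetical toy is-a hierarchy and a hypothetical set of annotated documents (reusing terms from slide 26): a query for a concept also returns documents annotated with any of its descendants, which an exact keyword match cannot do.

```python
from collections import deque

# Hypothetical toy is-a hierarchy: child concept -> parent concept.
PARENT = {
    "cancer": "disease",
    "melanoma": "cancer",
    "spindle cell sarcoma": "cancer",
}

# Hypothetical documents, each annotated with ontology concepts.
ANNOTATIONS = {
    "doc1": {"melanoma"},
    "doc2": {"spindle cell sarcoma"},
    "doc3": {"cancer"},
    "doc4": {"diabetes"},
}


def descendants(concept):
    """Return the concept plus every concept below it in the is-a hierarchy."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    found, queue = {concept}, deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def keyword_search(term):
    """Exact match on annotation labels only."""
    return sorted(d for d, concepts in ANNOTATIONS.items() if term in concepts)


def semantic_search(concept):
    """Match documents annotated with the concept or any of its descendants."""
    targets = descendants(concept)
    return sorted(d for d, concepts in ANNOTATIONS.items() if concepts & targets)


print(keyword_search("cancer"))   # ['doc3']
print(semantic_search("cancer"))  # ['doc1', 'doc2', 'doc3']
```

Real ontologies are usually DAGs rather than trees; the breadth-first walk above already handles multiple parents per concept as long as the child map is built from every is-a edge.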
  10. A few words about the SIFR project
  11. Semantic Indexing of French Biomedical Data Resources project … in collaboration with…
  12. People
     • Young researchers: Clement Jonquet, Mathieu Roche, Sandra Bringay
     • Advisors: Stefano A. Cerri, Maguelonne Teisseire, Pascal Poncelet
     • Staff: Vincent Emonet
     • Students: Juan Antonio Lossio Ventura, Guillaume Surroca, ~3 MSc students per year
     • Close collaborators: Philippe Lemoisson (TETIS), Pierre Larmande (IRD / IBC), Mark Musen (BMIR), Stefan Darmoni (CISMEF), Sebastien Harispe (LGI2P)
  13. Increasing amount of biomedical data + multilingualism
     • Limits of keyword-based indexing
     • The biomedical community has turned to ontologies to describe its data and turn it into structured, formalized knowledge
     • Ontologies are used by means of semantic annotations
     • Crucial need for tools & services for French biomedical data
     • Biomedical data integration challenge
       - New potential scientific discoveries hidden in data
       - Translational research
  14. Use ontologies for indexing, mining and searching (French) biomedical data
     • Obj1: Design, development and deployment of the French Annotator
     • Obj2: Obtain new research results to exploit and enhance ontology-based indexing services:
       - semantic distances
       - ontology alignment
       - ontology enrichment and disambiguation
     • Obj3: Valorization of indexing services
  15. A French biomedical Annotator
  16. Using biomedical ontology-based annotations in end-user applications
  17. Reuse of the NCBO technology
  18. BioPortal: a “one-stop shop” for biomedical ontologies
     • Web repository for biomedical ontologies
     • Makes ontologies accessible and usable: abstraction over format, location, structure, etc.
     • Users can publish, download, browse, search, comment on, and align ontologies, and use them for annotation, both online and via a web services API
     • Online support for ontologies:
       - peer review
       - notes (comments and discussion)
       - versioning
       - mapping
       - search
       - resources
  19. BioPortal Ontology Repository: http://bioportal.bioontology.org
  20. BioPortal services (REST API: http://data.bioontology.org)
     • Ontology services: search, traverse, comment, download
     • Widgets: tree-view, auto-complete, graph-view
     • Annotation: term recognition
     • Data access: search “data” annotated with a given term
     • Mapping services: create, upload, download
     http://bioportal.bioontology.org
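The annotation and search services on slide 20 are reached through plain HTTP GET requests. The sketch below only builds request URLs, without calling the network. The parameter names `text`, `ontologies`, `q` and `apikey` reflect our understanding of the public BioPortal REST API and should be checked against its documentation; `YOUR_API_KEY` is a placeholder, and the ontology acronyms are illustrative.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# REST base URL shown on the slide.
BASE_URL = "http://data.bioontology.org"


def annotator_url(text, ontologies=None, apikey="YOUR_API_KEY"):
    """Build a GET URL for the Annotator (term recognition) service.

    The parameter names text, ontologies and apikey are our reading of the
    public BioPortal REST API; "YOUR_API_KEY" is a placeholder, not a real key.
    """
    params = {"text": text, "apikey": apikey}
    if ontologies:
        # Restrict recognition to some ontologies, given as acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{BASE_URL}/annotator?{urlencode(params)}"


def search_url(query, apikey="YOUR_API_KEY"):
    """Build a GET URL for the term search service (assumed parameter: q)."""
    return f"{BASE_URL}/search?{urlencode({'q': query, 'apikey': apikey})}"


url = annotator_url("melanoma of the skin", ontologies=["MESH", "SNOMEDCT"])
print(url)
# Decode the query string again to check what will be sent.
print(parse_qs(urlparse(url).query))
```

`urlencode` takes care of escaping spaces, commas and accented characters, which matters when the text to annotate is French.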
  21. Current axes of research
  22. SIFR axes of research (1/8): Design of the SIFR (French) Annotator service
     • Deployment of a local instance of BioPortal at LIRMM
       - 16 French terminologies imported from UMLS, EHTOP & BioPortal
       - UTF-8 compliant Mgrep concept recognizer (Univ. of Michigan)
       - http://bioportal.lirmm.fr/annotator
     • New improvements to the annotation workflow
       - Automatic term extraction measures (C-value, LIDF-value, etc.)
       - Scoring of annotations & representation in RDF using the AO [SWAT4LS 2014]
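To show what a UTF-8 compliant dictionary-based concept recognizer has to do with French text, here is a minimal greedy longest-match sketch. It is not Mgrep, which is a separate tool; it is a simpler stand-in. The dictionary and concept identifiers are hypothetical. Labels and input are Unicode-normalized (NFC) and case-folded so that accented labels match whether the input is composed or decomposed.

```python
import re
import unicodedata

# Hypothetical French dictionary: normalized label -> concept identifier.
DICTIONARY = {
    "mélanome": "C_MELANOMA",
    "sarcome à cellules fusiformes": "C_SPINDLE_CELL_SARCOMA",
    "sarcome": "C_SARCOMA",
    "cancer": "C_CANCER",
}


def normalize(text):
    """Unicode-normalize (NFC) and case-fold so accented labels compare safely."""
    return unicodedata.normalize("NFC", text).casefold()


def tokenize(text):
    """Split into word tokens with offsets; \\w matches accented letters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def annotate(text, dictionary=DICTIONARY, max_words=5):
    """Greedy longest-match lookup; offsets refer to the normalized text."""
    tokens = tokenize(normalize(text))
    annotations, i = [], 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if phrase in dictionary:
                match = (tokens[i][1], tokens[i + n - 1][2], dictionary[phrase])
                i += n  # skip past the matched words
                break
        if match:
            annotations.append(match)
        else:
            i += 1
    return annotations


print(annotate("Un Sarcome à cellules fusiformes et un mélanome."))
```

Longest match means the multi-word label wins over the nested "sarcome", which is the same preference the C-value measure encodes statistically.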
  23. Improving the Annotator(s): example with scoring
     • Objective: improve the Annotator(s) results by ranking annotations according to their relevance
       - without changing the service implementation
     • Take into account annotation frequencies (as originally proposed in 2009, and later removed)
     • Add a term extraction measure, called C-value, used to favor annotations generated from matches with multi-word terms
     • 2 new scoring methods to score and rank annotations by their importance in the given input data
     • Interesting results validated against PubMed manual annotations
     • [SWAT4LS 2014]
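The two ingredients on slide 23, annotation frequency and a C-value boost for multi-word matches, can be combined as in the sketch below. The weighting (each match contributes 1 + the C-value of the matched term) is a hypothetical scheme chosen for illustration; it is not the formula of the SWAT4LS 2014 paper. Concept identifiers and C-values are made up.

```python
from collections import defaultdict


def rank_concepts(matches, cvalue):
    """Rank concepts by a hypothetical frequency x C-value weighting.

    matches: list of (concept_id, matched_term) pairs returned by an annotator.
    cvalue: dict matched_term -> C-value computed on the input text.
    Each match contributes 1 + C-value, so single-word matches (C-value 0)
    still count once, while frequent multi-word terms are boosted.
    """
    scores = defaultdict(float)
    for concept, term in matches:
        scores[concept] += 1.0 + cvalue.get(term, 0.0)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


matches = [
    ("C_LENS", "soft contact lens"),
    ("C_LENS", "soft contact lens"),
    ("C_CONTACT", "contact"),
    ("C_CONTACT", "contact"),
    ("C_CONTACT", "contact"),
]
print(rank_concepts(matches, {"soft contact lens": 2.0}))
```

With pure frequency, the single-word concept would rank first (3 matches vs. 2); the C-value term reverses that in favor of the more specific multi-word concept.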
  24. SIFR axes of research (2/8): Dealing with multilingualism within BioPortal
     • Status of multilingualism in BioPortal: a rather negative assessment
     • Set of propositions [MSW 2014]:
       - Representation of the natural language of an ontology
       - Representation of the distinction between ontologies
       - Representation of relations between ontologies
       - Representation of multilingual translation mappings
     • Reconciliation of multilingual mappings (possible PhD collaboration with ESI)
     • Currently being tested/implemented within our local instance
  25. What does being multilingual mean?
     • Interface internationalization = displaying static elements of the user interface (e.g., menu names, help) in different languages
     • Content internationalization = displaying BioPortal content (e.g., ontology labels, mappings) in different languages
     • Multilingual = internationalization (display) + enabling complete use of BioPortal's functionalities and services for multilingual or monolingual ontologies:
       - completely and properly addressed (languages, translations, multilingual mappings, etc.)
       - rich semantic description
       - being able to parse multilingual content in ontologies (from xml:lang to Lemon)
  26. Multilingual ontology vs. language-specific (monolingual) ontologies
     • Multilingual ontology: en:disease / fr:maladie; ...; en:cancer / fr:cancer; en:spindle cell sarcoma / fr:sarcome à cellules fusiformes; en:melanoma / fr:mélanome
     • Language-specific (monolingual) ontologies:
       - disease; ...; cancer; spindle cell sarcoma; melanoma
       - maladie; ...; cancer; sarcome à cellules fusiformes; mélanome
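Slide 26 contrasts one multilingual ontology with two monolingual ones linked by translation mappings. A minimal sketch of going from the second representation to the first, assuming hypothetical concept identifiers and mappings, keys each concept's labels by a language tag, in the spirit of xml:lang-tagged labels such as "maladie"@fr, and falls back to English when a language is missing.

```python
# Hypothetical monolingual ontologies: concept identifier -> preferred label.
ONTO_EN = {"EN:1": "disease", "EN:2": "cancer", "EN:3": "melanoma"}
ONTO_FR = {"FR:1": "maladie", "FR:2": "cancer", "FR:3": "mélanome"}

# Hypothetical translation mappings: (English concept, French concept).
TRANSLATION_MAPPINGS = [("EN:1", "FR:1"), ("EN:2", "FR:2"), ("EN:3", "FR:3")]


def to_multilingual(onto_en, onto_fr, mappings):
    """Merge two monolingual ontologies into multilingual concepts.

    Each mapped pair becomes one concept whose labels are keyed by language tag.
    """
    return {
        en_id: {"en": onto_en[en_id], "fr": onto_fr[fr_id]}
        for en_id, fr_id in mappings
    }


def get_label(labels, lang, fallback="en"):
    """Return the label in the requested language, else in the fallback language."""
    return labels.get(lang, labels.get(fallback))


multi = to_multilingual(ONTO_EN, ONTO_FR, TRANSLATION_MAPPINGS)
print(get_label(multi["EN:3"], "fr"))  # mélanome
print(get_label(multi["EN:3"], "es"))  # melanoma (no Spanish label, falls back)
```

This one-to-one merge is the easy case; reconciling mappings that are partial or one-to-many is exactly the open "reconciliation of multilingual mappings" question listed on slide 24.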
  27. SIFR axes of research (3/8): Automatic extraction of biomedical terminology from text
     • Context of the PhD of Juan Antonio Lossio [LBM 2013][TALN 2014][PolTAL 2014]
     • BioTex software: http://tubo.lirmm.fr/biotex [ISWC 2014]
       - Works in French, English and Spanish
     • Motivations for automatic terminology extraction:
       - Experiment with and validate approaches for French data
       - Contribute to the ontology enrichment process
       - Acquire some NLP expertise for the annotation workflow
  28.
  29. Statistical methods
     • C-value: improves the extraction of the longest terms (e.g., “soft contact” vs. “soft contact lens”)
     • Frantzi, K., Ananiadou, S., & Mima, H. (2000). Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries, 3(2), 115–130.
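In the C-value method of Frantzi et al. (2000), a candidate term a with |a| words and corpus frequency f(a) scores log2|a| · f(a) when it is not nested in a longer candidate. Otherwise the score is log2|a| · (f(a) − (1/P(T_a)) · Σ_{b ∈ T_a} f(b)), where T_a is the set of longer candidates containing a and P(T_a) its size. The sketch below implements that formula, assuming the candidate terms and their frequencies have already been extracted (the original method selects candidates with part-of-speech patterns, which is omitted here). The frequencies are made up.

```python
import math


def is_nested(short, outer):
    """True if word tuple `short` occurs contiguously inside the longer tuple `outer`."""
    n = len(short)
    return len(outer) > n and any(
        outer[i:i + n] == short for i in range(len(outer) - n + 1))


def c_value(freqs):
    """Compute C-value for each candidate term given its corpus frequency."""
    words = {term: tuple(term.split()) for term in freqs}
    scores = {}
    for a, wa in words.items():
        # T_a: longer candidate terms that contain a.
        containers = [b for b, wb in words.items() if is_nested(wa, wb)]
        weight = math.log2(len(wa))  # 0 for single-word terms
        if containers:
            # Discount occurrences of a that are explained by longer terms.
            nested_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (freqs[a] - nested_freq)
        else:
            scores[a] = weight * freqs[a]
    return scores


scores = c_value({"soft contact lens": 5, "soft contact": 7})
print(scores)
```

Here "soft contact" drops to log2(2) · (7 − 5) = 2 because most of its occurrences belong to "soft contact lens", which keeps log2(3) · 5 ≈ 7.92: the longer term is preferred, as the slide states.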
  30.
  31.
  32. Include BioTex into BioPortal
     • Use the BioPortal dictionary for validation
     • New ontology enrichment service: give a corpus of data and see which terms are not yet covered
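The enrichment service on slide 32 boils down to comparing terms extracted from a corpus with the labels already in the dictionary. A minimal sketch, assuming the extracted terms arrive as a ranked list and the dictionary as a list of labels (both hypothetical here), keeps the uncovered terms in rank order after light normalization (NFC, case folding, whitespace collapsing).

```python
import unicodedata


def normalize(term):
    """NFC-normalize, case-fold and collapse whitespace for comparison."""
    return " ".join(unicodedata.normalize("NFC", term).casefold().split())


def uncovered_terms(extracted, dictionary_labels):
    """Return extracted terms absent from the dictionary, preserving rank order."""
    known = {normalize(label) for label in dictionary_labels}
    return [t for t in extracted if normalize(t) not in known]


# Hypothetical ranked output of a term extractor and hypothetical dictionary labels.
extracted = ["cancer du sein", "Mélanome", "thérapie ciblée"]
labels = ["mélanome", "cancer du sein"]
print(uncovered_terms(extracted, labels))  # ['thérapie ciblée']
```

Exact matching after normalization is deliberately strict; a production service would likely also consult synonyms and mappings before declaring a term uncovered.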
  33. SIFR axes of research (4/8): Semantic distance framework
     • Automatically compute existing semantic similarity measures (Rada, Wu & Palmer, Resnik) over BioPortal ontologies
       - For a given concept, get all semantically close concepts
       - Get the semantic distance between 2 concepts
     • Collaboration with LGI2P to reuse the Semantic Measure Library (SML) within BioPortal
       - 1st prototype: http://tubo.lirmm.fr/BioMedicalSemantic/web/app_dev.php
       - Include SML within the BioPortal backend to bring semantic distance services to the ontologies and to the annotated data
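Two of the measures named on slide 33 are purely structural and fit in a few lines. The sketch below assumes a hypothetical single-inheritance is-a tree with the root at depth 1. Rada's distance is the number of is-a edges on the shortest path, which in a tree passes through the least common subsumer (LCS). Wu & Palmer's similarity is 2 · depth(LCS) / (depth(c1) + depth(c2)). Resnik's measure also needs an information-content model and is left out. Real BioPortal ontologies are DAGs, where all paths must be considered; that is what a library such as SML handles.

```python
# Hypothetical toy is-a tree: child -> parent (single inheritance for simplicity).
PARENT = {
    "cancer": "disease",
    "sarcoma": "cancer",
    "melanoma": "cancer",
    "spindle cell sarcoma": "sarcoma",
    "diabetes": "disease",
}


def ancestors(c):
    """Path from c up to the root, c first."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(c):
    """Depth with the root at depth 1."""
    return len(ancestors(c))


def lcs(c1, c2):
    """Least common subsumer: first ancestor of c1 that is also an ancestor of c2."""
    anc2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in anc2)


def rada_distance(c1, c2):
    """Rada et al.: number of is-a edges on the shortest path, via the LCS."""
    common = lcs(c1, c2)
    return (depth(c1) - depth(common)) + (depth(c2) - depth(common))


def wu_palmer(c1, c2):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


print(rada_distance("spindle cell sarcoma", "melanoma"))  # 3
print(wu_palmer("spindle cell sarcoma", "melanoma"))      # 4/7, about 0.571
```

Note that both pairs ("spindle cell sarcoma", "melanoma") and ("melanoma", "diabetes") are 3 edges apart, yet Wu & Palmer rates the first pair as more similar (4/7 vs. 0.4) because their common ancestor lies deeper in the hierarchy.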
  34. SIFR axes of research (5/8): Informal patient data analysis
     • Dealing with public patient data on blogs, forums and tweets (Sandra Bringay)
       - Detection of emotions [EGC 2014][eTELEMED 2014]
       - Patient vocabulary (“crabe” vs. “cancer”)
     • Project “Parlons de nous” (www.lirmm.fr/patient-mind), MSH-M
     • A patient vocabulary currently being constructed [IC 2015]
       - Hosted and available in our local instance of BioPortal
       - Used for annotation, indexing, information retrieval
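One way a patient vocabulary can serve information retrieval, as slide 34 suggests, is query expansion: a lay term such as “crabe” also retrieves documents indexed with “cancer”, and vice versa. This is a minimal sketch under assumptions: the mapping table is hypothetical (the “chimio” entry is an invented example), and query expansion is one possible use, not necessarily the project's method.

```python
# Hypothetical patient-vocabulary mappings: lay term -> medical term.
PATIENT_TO_MEDICAL = {
    "crabe": "cancer",
    "chimio": "chimiothérapie",  # hypothetical example entry
}


def expand_query(query_terms, mapping=PATIENT_TO_MEDICAL):
    """Add the medical equivalent of each lay term, and the lay terms of each medical term."""
    reverse = {}
    for lay, medical in mapping.items():
        reverse.setdefault(medical, []).append(lay)
    expanded = []
    for term in query_terms:
        candidates = [term, mapping.get(term)] + reverse.get(term, [])
        for c in candidates:
            if c and c not in expanded:  # skip missing mappings and duplicates
                expanded.append(c)
    return expanded


print(expand_query(["crabe"]))              # ['crabe', 'cancer']
print(expand_query(["cancer", "fatigue"]))  # ['cancer', 'crabe', 'fatigue']
```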
  35. SIFR axes of research (6/8): Viewpoint, a subjective knowledge representation formalism
     • Collaboration with P. Lemoisson (CIRAD) & PhD of G. Surroca
     • Graph-based knowledge representation formalism
       - Linked data from the Semantic Web and user contributions from the Social Web
       - Unified topological approach
     • First prototype for semantic search over HAL-LIRMM publications [IC 2014]
     • Capture the phenomenon of serendipity (i.e., incidental learning) [IC 2015]
  36. SIFR axes of research (7/8): Pharmacogenomics use case
     • PGx studies how individual gene variations cause variability in drug responses
     • Validation of state-of-the-art pharmacogenomics knowledge on the basis of practice-based evidence
       - Compare pharmacogenomics literature (in English) and electronic health records (in French)
       - EHRs from Paris (HEGP) & St Etienne hospitals
     • Improvement of the Annotators to handle clinical data: negation, disambiguation, modularity, temporality
     • Project submitted to the ANR generic call 2015 (April 27th)
       - Collaborative action led by Adrien Coulet (LORIA)
       - Stanford is in the loop (Russ, Mark, Michel, Nigam)
  37. SIFR axes of research (8/8): Application to agronomy & plants
     • Within the Institute of Computational Biology of Montpellier
     • Design of a semantic annotation workflow for plant data, in collaboration with the IBC project [CO-PDI 2014]
     • AgroLD: building an RDF knowledge base to house plant data resources: SouthGreen, Gramene, OryGeneDB… [RDA 2014]
     • AgroPortal: reference ontology repository for the agronomic domain [IN-OVIVE 2015]
       - Experiment with NCBO technologies for the plant community
       - 4 driving agronomic use cases
  38. Objectives of the AgroPortal project
     • Develop and support a reference ontology repository for the agronomic domain
       - One-stop shop for plant/agronomic-related ontologies
       - Primary focus on the agronomic & plant domain
     • Reuse the NCBO BioPortal technology
       - Avoid re-implementing what has already been done
       - Facilitate interoperability
       - Reuse the scientific outcomes, experience & methods of the biomedical domain
     • Enable straightforward use of agronomic-related ontologies
       - Respect the requirements of the agronomic community
     • Fully Semantic Web compliant infrastructure
  39. 39. AgroPortal  50 ontologies relevant to agronomic and plant Atelier RISE 2015 30 juin 2015, Rennes
  40. A few conclusions
  41. Near future
     • Continue to move the different prototypes into production
       - Release of the French Annotator
     • Find more use cases
       - Collaboration with the plant/agro community
     • Continue reusing and contributing to NCBO technology
  42. Online resources
     • Web page: www.lirmm.fr/sifr
     • https://www.researchgate.net/projects
     • Code repository: https://github.com/sifrproject
       - 13 developers
       - 10 repositories
     • Publications: http://bit.ly/194ImnR
       - Direct link to the HAL-LIRMM platform with advanced search features
     • Portals & services:
       - http://bioportal.lirmm.fr
       - http://agroportal.lirmm.fr
  43. Questions & remarks?
