About the use of biomedical ontologies to play with text in the context of the SIFR project.

Talk given at LGI2P (Conférence Communication Science et Société) in Nîmes on 17 March 2015. Content partly drawn from the work of Juan Antonio Lossio Ventura.

Published in: Technology

  1. About the use of biomedical ontologies to play with text … in the context of the… Clement Jonquet (jonquet@lirmm.fr). Conférence Communication Science et Société, LGI2P, Nîmes, 17 March 2015
  2. A few introductory words. Conference C2S LGI2P, Nîmes, 17 March 2015
  3. Biologists have adopted ontologies
       • To provide a canonical representation of scientific knowledge
       • To annotate experimental data to enable interpretation, comparison, and discovery across databases
       • To facilitate knowledge-based applications for
           • Decision support
           • Natural language processing
           • Data integration
       • But ontologies are spread out, in different formats, of different sizes, with different structures
  4. Working with terminologies & ontologies: a portal please!
       • You’ve built an ontology, how do you let the world know?
       • You need an ontology, where do you go to get it?
       • How do you know whether an ontology is any good?
       • How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
       • How could you leverage your ontology to enable new science?
       • How could you use ontologies without managing them?
  5. Comparison in [IWBBIO'14]
  6. Annotation challenge
       • Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies
       • Hard for biomedical researchers to find the data they need
       • Data integration problem
       • Translational discoveries are prevented
       • Good examples
           • GO annotations
           • PubMed (biomedical literature) indexed with MeSH headings
       • Annotate data with ontology concepts
       • Horizontal approach (diagram labels: ontologies, resources)
  7. A few words about the SIFR project
  8. Semantic Indexing of French Biomedical Data Resources project … in collaboration with…
  9. Context: growing volume of biomedical data + multilingualism
       • Limits of keyword-based indexing
       • The biomedical community has turned to ontologies to describe its data and turn them into structured, formalized knowledge
       • Ontologies are put to use by creating semantic annotations
       • Crucial need for tools & services for French biomedical data
       • Biomedical data integration challenge
       • New potential scientific discoveries hidden in data
       • Translational research
  10. Use ontologies for indexing, mining and searching (French) biomedical data
       • Obj1: Design, development and deployment of the French Annotator.
       • Obj2: Obtain new research results to exploit and enhance ontology-based indexing services.
           • semantic distances
           • ontology alignment
           • ontology enrichment and disambiguation
       • Obj3: Valorization of indexing services
  11. A French biomedical Annotator
  12. Use of biomedical ontology-based annotations in end-user applications
  13. Reuse of the NCBO technology
  14. BioPortal Ontology Repository: http://bioportal.bioontology.org
  15. BioPortal services (web portal: http://bioportal.bioontology.org, REST services: http://data.bioontology.org)
       • Ontology Services: Search, Traverse, Comment, Download
       • Widgets: Tree-view, Auto-complete, Graph-view
       • Annotation: Term recognition
       • Data Access: Search “data” annotated with a given term
       • Mapping Services: Create, Upload, Download
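
The annotation service listed on this slide is exposed over HTTP. The following is only a sketch of how a request URL might be assembled. It assumes an `/annotator` path on `http://data.bioontology.org` with `text`, `ontologies` and `apikey` query parameters; these names come from the public NCBO REST API rather than from the slide, and `build_annotator_url` and `YOUR_API_KEY` are placeholders introduced here.

```python
from urllib.parse import urlencode

# Base URL shown on the slide. The /annotator path and the parameter
# names below are assumptions, not something the slide states.
BASE_URL = "http://data.bioontology.org"


def build_annotator_url(text, ontologies=(), api_key="YOUR_API_KEY"):
    """Build a GET URL asking the annotation service to tag `text`."""
    params = {"text": text, "apikey": api_key}
    if ontologies:
        # Restrict the annotation to the given ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{BASE_URL}/annotator?{urlencode(params)}"


url = build_annotator_url("melanoma of the skin", ontologies=["MESH"])
print(url)
```

The function only builds the URL; sending it (and handling the JSON response) is left out so that the sketch runs without network access or a real API key.
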
  16. SIFR axes of research (1/2)
       • Design of the SIFR (French) Annotator service
           • Deployment of a local instance of BioPortal at LIRMM
           • Scoring of annotations & RDF representation using the AO [SWAT4LS 2014]
           • Dealing with multilingualism within BioPortal [TOTh-w 2014]
       • Automatic extraction of biomedical terminology from text
           • Presented hereafter [LBM 2013][ISWC 2014][TALN 2014][PolTAL 2014]
       • Semantic distance framework
           • Collaboration with LGI2P to reuse Semantic Measure Library (SML)
  17. SIFR axes of research (2/2)
       • Dealing with public patient data on blogs, forums and tweets (Sandra Bringay)
           • Detection of emotion [EGC 2014]
           • Patient vocabulary [eTELEMED 2014]
       • Adverse drug event mining from EHRs
           • Project to compare pharmacogenomics literature and EHRs
       • Design of a semantic annotation workflow for plant data, in collaboration with the IBC project [CO-PDI 2014]
           • AgroLD project [RDA 2014]
           • Cropontology.org
       • Semantic indexing and user feedback: Viewpoint [IC 2014]
           • Collaboration with P. Lemoisson (CIRAD)
           • PhD project of Guillaume Surroca
  18. Biomedical terminology extraction. Work carried out as part of Juan Antonio Lossio Ventura’s PhD, in collaboration with Mathieu Roche & Maguelonne Teisseire (TETIS)
  19. Motivations for automatic terminology extraction
       • Experiment with and validate approaches for French data
       • Offer services for both English and French communities
       • Go beyond the state of the art
       • Contribute to the ontology enrichment process
       • Acquire some NLP expertise to enhance the NCBO Annotation workflow
  20. Combining ATR & AKE
       • ATR (Automatic Term Recognition): input: one large corpus; output: technical terms of a domain (term1, term2, …, termn); domain: very specific; examples: C-value
       • AKE (Automatic Keyword Extraction): input: a single document of a dataset; output: keywords that describe the document (Keyword1, Keyword2, … for each document); domain: none; examples: TF-IDF, Okapi
  21. (1) Part-of-speech tagging; (2) Candidate term extraction; (3) Ranking of candidate terms; (4) Computing of new combination measures; (5) Re-ranking using a web-based measure
  23. (1) Part-of-speech tagging
       • Assign each word in a text to its grammatical category (e.g., noun, adjective).
       • Part-of-speech tagging is applied to the whole corpus
       • Three tools: TreeTagger, Stanford Tagger, Brill’s rules
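
To make the tagging step concrete, here is a toy sketch in Python. It is not TreeTagger, the Stanford Tagger or Brill's tagger: it is a dictionary lookup with a crude suffix fallback, and the lexicon, tag names and `tag` function are all invented for illustration.

```python
import re

# Tiny hand-made lexicon (hypothetical); real taggers learn this from corpora.
LEXICON = {
    "the": "DET", "a": "DET", "of": "PREP", "in": "PREP",
    "is": "VERB", "causes": "VERB",
    "chronic": "ADJ", "infectious": "ADJ",
    "disease": "NOUN", "syphilis": "NOUN", "bacterium": "NOUN",
}


def tag(text):
    """Return (word, tag) pairs using lexicon lookup plus suffix fallbacks."""
    tagged = []
    for word in re.findall(r"[a-zA-Z]+", text.lower()):
        if word in LEXICON:
            pos = LEXICON[word]
        elif word.endswith(("ous", "ic", "al")):
            pos = "ADJ"   # crude suffix heuristic for unknown adjectives
        else:
            pos = "NOUN"  # default guess for any other unknown word
        tagged.append((word, pos))
    return tagged


print(tag("Syphilis is a chronic bacterial disease"))
```

The output is a list of `(word, tag)` pairs, which is the shape of input a pattern-based candidate extraction step would typically consume.
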
  25. (2) Candidate term extraction following patterns. Unified Medical Language System (UMLS): ~5M concepts, 161 sources (MeSH, ICD, SNOMED, …)
  26. (2) Candidate term extraction following patterns
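
Pattern-based extraction can be sketched as a sliding window over tagged tokens. The transcript does not list the patterns actually used, so the `PATTERNS` list below is a hand-written assumption, as are the tag names and the `extract_candidates` helper.

```python
# Hypothetical part-of-speech patterns; the real list is not in the transcript.
PATTERNS = [
    ("NOUN",),
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN", "NOUN"),
    ("NOUN", "PREP", "NOUN"),
]


def extract_candidates(tagged, patterns=PATTERNS):
    """Return every word sequence whose tag sequence equals one of the patterns."""
    candidates = []
    for start in range(len(tagged)):
        for pattern in patterns:
            window = tagged[start:start + len(pattern)]
            # A window cut short by the end of the text never matches.
            if tuple(pos for _, pos in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


tagged = [("chronic", "ADJ"), ("kidney", "NOUN"), ("disease", "NOUN")]
candidates = extract_candidates(tagged)
print(candidates)
```

Every match is kept, including candidates nested inside longer ones; deciding which of them are real terms is left to the ranking step.
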
  28. (3) Ranking of candidate terms using C-value, in order to extract single-word and multi-word terms (the formula shown on the slide is not preserved in this transcript)
  29. (3) Ranking of candidate terms using TF-IDF and Okapi BM25 (keywords ranked per document)
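
The formula from this slide did not survive in the transcript. As a reminder of the general idea, the sketch below follows the widely used C-value definition: a candidate's frequency is discounted by the average frequency of the longer candidates that contain it, then weighted by its length. It is not necessarily the adapted version presented in the talk. Using `log2(length + 1)` so that single-word terms keep a non-zero score is an assumption, and the frequencies are invented.

```python
import math


def c_value(freq):
    """Compute a C-value score for each candidate in `freq` (term -> frequency)."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of longer candidates containing `term` as a word sequence.
        nests = [f_b for other, f_b in freq.items()
                 if other != term and f" {term} " in f" {other} "]
        # The +1 keeps single-word candidates from scoring zero (assumption).
        weight = math.log2(len(term.split()) + 1)
        if nests:
            # Discount occurrences that happen inside longer candidates.
            f -= sum(nests) / len(nests)
        scores[term] = weight * f
    return scores


freq = {"disease": 10, "kidney disease": 6, "chronic kidney disease": 4}
scores = c_value(freq)
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term}: {scores[term]:.2f}")
```

With these made-up counts, the longest candidate ranks first because it is never nested and carries the largest length weight, while "disease" loses half of its frequency to the longer terms that contain it.
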
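
For reference, both keyword measures can be written in a few lines. This is a generic textbook sketch, not the configuration used in the experiments: documents are plain token lists, terms are single tokens, the BM25 idf uses a common non-negative variant, and `k1=1.2`, `b=0.75` are conventional defaults assumed here.

```python
import math


def tf_idf(term, doc, docs):
    """TF-IDF of `term` in `doc`; every document is a list of tokens."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def bm25(term, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 weight of `term` in `doc` (k1 and b are assumed defaults)."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # non-negative idf variant
    tf = doc.count(term)
    avg_len = sum(len(d) for d in docs) / n
    # Term frequency saturates with k1 and is normalized by document length.
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))


docs = [
    ["syphilis", "treponema", "infection"],
    ["infection", "fever"],
    ["fever", "cough"],
]
for term in ("syphilis", "infection"):
    print(term, round(tf_idf(term, docs[0], docs), 3), round(bm25(term, docs[0], docs), 3))
```

Both measures score "syphilis" above "infection" in the first document because it occurs in fewer documents overall.
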
  31. (4) Computing of new combination measures: F-OCapi and F-TFIDF-C (harmonic mean)
  32. (4) Computing of new combination measures: C-Okapi and C-TFIDF
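
The slide only states that F-OCapi and F-TFIDF-C are harmonic means. A plausible reading, assumed here, is that each combines a C-value score with an Okapi or TF-IDF score for the same term. The sketch also assumes both scores were already normalized to a comparable range; the numbers and helper names are invented.

```python
def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores; 0.0 when both are 0."""
    return 2 * x * y / (x + y) if (x + y) > 0 else 0.0


def combine(scores_a, scores_b):
    """Combine two term -> score mappings over the terms they share."""
    return {t: harmonic_mean(scores_a[t], scores_b[t])
            for t in scores_a.keys() & scores_b.keys()}


# Made-up, already normalized scores for two candidate terms.
c_value_scores = {"treponema pallidum": 0.8, "patient": 0.9}
okapi_scores = {"treponema pallidum": 0.6, "patient": 0.1}

combined = combine(c_value_scores, okapi_scores)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

The harmonic mean rewards terms that both measures agree on: "patient" has the higher C-value in this toy data but drops to second place because its Okapi score is low.
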
  34. (5) Re-ranking using a web-based measure: term 1, term 2, …, term n are sent to the Web; example queries: “treponema pallidum” (quoted) and treponema pallidum (unquoted)
  35. Experiments: datasets
       • Drugs and Herbs (EN, FR)
       • Medical Tests (EN, FR)
       • PubMed (EN)
       • Plus automatic validation using UMLS (EN) & MeSH (FR)
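
The transcript keeps only the two example queries from this slide, one quoted and one unquoted. One plausible reading, assumed here, is that the web-based measure compares the number of hits for the exact phrase with the number of hits for the same words in any position. The sketch uses a stubbed hit counter with invented numbers instead of a real search engine; `web_r`, `rerank` and `fake_hit_count` are hypothetical names.

```python
def web_r(term, hit_count):
    """Ratio of exact-phrase hits to hits for the words in any position."""
    loose = hit_count(term)
    return hit_count(f'"{term}"') / loose if loose else 0.0


def rerank(terms, hit_count):
    """Sort candidate terms by decreasing web-based score."""
    return sorted(terms, key=lambda t: web_r(t, hit_count), reverse=True)


# Invented hit counts standing in for a search engine (not real data).
FAKE_HITS = {
    '"treponema pallidum"': 900, "treponema pallidum": 1000,
    '"disease patient"': 50, "disease patient": 5000,
}


def fake_hit_count(query):
    """Look up a stubbed hit count; unknown queries return 0."""
    return FAKE_HITS.get(query, 0)


print(rerank(["disease patient", "treponema pallidum"], fake_hit_count))
```

Words that almost always appear together as a phrase get a ratio close to 1, which is what pushes a true term such as "treponema pallidum" above an incidental word pair.
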
  36. Experiments: results
       • Precision comparison of the best measures for term extraction for English.
       • Precision comparison of the best measures for term extraction for French.
       • Precision comparison between F-OCapiM and WebR with automatic validation for French.
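
The result charts are not part of the transcript. As an illustration of how precision over a ranked term list can be checked automatically against a reference terminology, here is a minimal sketch; the reference set and ranked list are invented, and the exact evaluation protocol of the experiments is not specified here.

```python
def precision_at_k(ranked_terms, reference, k):
    """Fraction of the top-k ranked terms that appear in the reference set."""
    top = ranked_terms[:k]
    return sum(1 for t in top if t in reference) / len(top) if top else 0.0


# Invented example: a ranked candidate list and a reference terminology.
ranked = ["treponema pallidum", "patient group", "syphilis", "study"]
reference = {"treponema pallidum", "syphilis"}

for k in (1, 2, 4):
    print(k, precision_at_k(ranked, reference, k))
```

Reporting precision at several cut-offs shows whether a measure places true terms near the top of the list, which is what re-ranking is meant to improve.
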
  39. Current & future work on term extraction
       • Methodology for term extraction and ranking for two languages, French and English.
       • C-value adapted to extract French biomedical terms.
       • Two new measures obtained by combining three existing methods, plus a new web-based measure.
       • WebR was applied to re-rank the best list, moving the true biomedical terms to the top.
       • Reuse these NLP methods within the SIFR Annotator workflow to enhance semantic annotation
  40. A few words by way of conclusion
  41. Conclusion
       • Terminologies & ontologies are relevant features for knowledge representation
       • But a large majority of the data are texts
       • Go beyond one language
       • Share & pool relevant resources in the domain: ontologies, terminologies, mappings, annotations, technologies
