About the use of biomedical ontologies to play with text in the context of the SIFR project.

Senior Researcher at INRAE (MISTEA) and University of Montpellier (LIRMM)
About the use of biomedical ontologies to play with text
… in the context of the…
Clement Jonquet (jonquet@lirmm.fr)
Conférence Communication Science et Société (C2S)
LGI2P, Nimes – 17 March 2015
A few introductory words
Biologists have adopted ontologies
• To provide canonical representations of scientific knowledge
• To annotate experimental data to enable interpretation, comparison, and discovery across databases
• To facilitate knowledge-based applications for
  • Decision support
  • Natural language processing
  • Data integration
• But ontologies are spread out, in different formats, of different sizes, with different structures
Working with terminologies & ontologies – a portal please!
• You’ve built an ontology, how do you let the world know?
• You need an ontology, where do you go to get it?
• How do you know whether an ontology is any good?
• How do you find resources that are relevant to the domain of the ontology (or to specific terms)?
• How could you leverage your ontology to enable new science?
• How could you use ontologies without managing them?
Comparison in [IWBBIO'14]
Annotation challenge
• Explosion of biomedical data: diverse, distributed, unstructured… not linked to ontologies
  • Hard for biomedical researchers to find the data they need
  • Data integration problem
  • Translational discoveries are prevented
• Good examples
  • GO annotations
  • PubMed (biomedical literature) indexed with MeSH headings
• Annotate data with ontology concepts
  • Horizontal approach
[Figure: ONTOLOGIES and RESOURCES]
A few words about the SIFR project
Semantic Indexing of French Biomedical Data Resources project
… in collaboration with…
Context: increasing amount of biomedical data + multilingualism
• Limits of keyword-based indexing
  • The biomedical community has turned to ontologies to describe its data and turn them into structured and formalized knowledge
  • Ontologies are used by creating semantic annotations
  • Crucial need for tools & services for French biomedical data
• Biomedical data integration challenge
• New potential scientific discoveries hidden in data
• Translational research
Use ontologies for indexing, mining and searching (French) biomedical data
• Obj1: Design, development and deployment of the French Annotator.
• Obj2: Obtain new research results to exploit and enhance ontology-based indexing services.
  • semantic distances
  • ontology alignment
  • ontology enrichment and disambiguation
• Obj3: Valorization of indexing services
A French biomedical Annotator
Use biomedical ontology-based annotations in end-user applications
Reuse of the NCBO technology
http://bioportal.bioontology.org
BioPortal Ontology Repository
http://data.bioontology.org
Ontology Services
• Search
• Traverse
• Comment
• Download
Widgets
• Tree-view
• Auto-complete
• Graph-view
Mapping Services
• Create
• Upload
• Download
Annotation
• Term recognition
Data Access
• Search “data” annotated with a given term
http://bioportal.bioontology.org
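As an illustration of how the annotation (term recognition) service listed above might be called, here is a minimal sketch that only builds a request URL. The base URL comes from the slide; the `/annotator` path and the `text`, `ontologies` and `apikey` parameter names are assumptions about the public REST API and should be checked against its documentation. The function name and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; the path and parameter names are assumptions.
BASE_URL = "http://data.bioontology.org"


def build_annotator_url(text, ontologies=None, apikey="YOUR_API_KEY"):
    """Build a GET URL asking the service to annotate `text`.

    `ontologies` is an optional list of ontology acronyms used to
    restrict the annotation; `apikey` is a placeholder value.
    """
    params = {"text": text, "apikey": apikey}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return f"{BASE_URL}/annotator?{urlencode(params)}"


url = build_annotator_url("melanoma is a skin cancer", ontologies=["MESH"])
print(url)
```

No request is sent here; the URL could be passed to any HTTP client once a real API key is available.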
SIFR axes of research (1/2)
• Design of the SIFR (French) Annotator service
  • Deployment of a local instance of BioPortal at LIRMM
  • Scoring of annotations & RDF representation using the AO [SWAT4LS 2014]
  • Dealing with multilingualism within BioPortal [TOTh-w 2014]
• Automatic extraction of biomedical terminology from text
  • Hereafter [LBM 2013][ISWC 2014][TALN 2014][PolTAL 2014]
• Semantic distance framework
  • Collaboration with LGI2P to reuse the Semantic Measure Library (SML)
SIFR axes of research (2/2)
• Dealing with public patient data on blogs, forums and tweets (Sandra Bringay)
  • Detection of emotion [EGC 2014]
  • Patient vocabulary [eTELEMED 2014]
• Adverse drug event mining from EHRs
  • Project to compare pharmacogenomics literature and EHRs
• Design of a semantic annotation workflow for plant data, in collaboration with the IBC project [CO-PDI 2014]
  • AgroLD project [RDA 2014]
  • Cropontology.org
• Semantic indexing and user feedback – Viewpoint [IC 2014]
  • Collaboration with P. Lemoisson (CIRAD)
  • PhD project of Guillaume Surroca
Biomedical terminology extraction
Work carried out as part of Juan Antonio Lossio Ventura’s PhD
In collaboration with Mathieu Roche & Maguelonne Teisseire (TETIS)
Motivations for automatic terminology extraction
• Experiment with and validate approaches for French data
• Offer services for both English and French communities
• Go beyond the state of the art
• Contribute to the ontology enrichment process
• Acquire some NLP expertise to enhance the NCBO Annotation workflow
Combining ATR & AKE
         | ATR (Automatic Term Recognition) | AKE (Automatic Keyword Extraction)
Input    | one large corpus                 | single document of a dataset
Output   | technical terms of a domain      | keywords that describe the document
Domain   | very specific                    | none
Examples | C-value                          | TFIDF, Okapi
[Figure: ATR yields one term list (term1, term2, …, termn); AKE yields one keyword list (Keyword1, Keyword2, …) per document]
(1) Part-of-Speech Tagging
(2) Candidate terms extraction
(3) Ranking of candidate terms
(4) Computing of new combination measures
(5) Re-ranking using web-based measure
(1) Part-of-speech tagging
Assign each word in a text to its grammatical category (e.g., noun, adjective).
We apply part-of-speech tagging to the whole corpus.
Three tools:
• TreeTagger
• Stanford Tagger
• Brill’s rules
(2) Candidate term extraction following patterns
Unified Medical Language System (UMLS): ~5M concepts, 161 sources (MeSH, ICD, SNOMED, …)
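Steps (1) and (2) can be sketched together as follows. This is a minimal illustration, not the project's implementation: the tokens are tagged by hand instead of by a real tagger, the coarse tag names are invented for the example, and the pattern list is hypothetical (the slides do not give the actual patterns).

```python
# Tokens already tagged with coarse part-of-speech labels. In practice the
# tags would come from a tagger; here they are written by hand.
tagged = [
    ("chronic", "ADJ"), ("kidney", "NOUN"), ("disease", "NOUN"),
    ("is", "VERB"), ("a", "DET"), ("common", "ADJ"), ("condition", "NOUN"),
]

# Hypothetical syntactic patterns; the real list is not given in the slides.
PATTERNS = [
    ("NOUN",),
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN", "NOUN"),
]


def extract_candidates(tagged_tokens, patterns):
    """Return every token span whose tag sequence matches one of the patterns."""
    candidates = []
    for start in range(len(tagged_tokens)):
        for pattern in patterns:
            window = tagged_tokens[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


candidates = extract_candidates(tagged, PATTERNS)
print(candidates)
```

Both nested and longer candidates are kept at this stage (for example "kidney disease" and "chronic kidney disease"); deciding between them is left to the ranking step.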
(3) Ranking of candidate terms
Using C-value (formula shown on the slide), in order to extract single-word and multi-word terms
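Since the formula itself did not survive in this transcript, the sketch below uses the commonly cited C-value definition: a term's frequency is weighted by its length, and a term nested inside longer candidates is penalized by the average frequency of those longer candidates. The `+ 1` inside the logarithm is an assumption, prompted by the remark about single-word terms (a plain `log2` of the length would score every single-word term at zero). Function names and the toy frequencies are hypothetical.

```python
import math


def contains(longer, shorter):
    """True if the word list `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidate terms; `freq` maps each term to its corpus frequency."""
    words = {term: term.split() for term in freq}
    scores = {}
    for a, a_words in words.items():
        # Longer candidates in which `a` appears as a nested term.
        nesting = [b for b, b_words in words.items()
                   if len(b_words) > len(a_words) and contains(b_words, a_words)]
        # Length weight; the +1 keeps single-word terms (assumption).
        weight = math.log2(len(a_words) + 1)
        if nesting:
            penalty = sum(freq[b] for b in nesting) / len(nesting)
            scores[a] = weight * (freq[a] - penalty)
        else:
            scores[a] = weight * freq[a]
    return scores


freq = {"kidney disease": 10, "chronic kidney disease": 4, "disease": 25}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.2f}")
```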
(3) Ranking of candidate terms
Using TF-IDF and Okapi BM25
[Figure: one ranked keyword list (Keyword1, Keyword2, …) per document]
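The two document-level measures named on this slide can be sketched with their textbook formulas. This is an illustration under assumptions: documents are plain token lists, terms are single tokens (multi-word terms would need phrase counting), and the BM25 parameters `k1` and `b` are common default values, not values taken from the slides.

```python
import math

# Toy corpus: each document is a list of tokens.
docs = [
    ["kidney", "disease", "kidney", "failure"],
    ["heart", "disease"],
    ["kidney", "transplant", "outcome", "study", "cohort", "report"],
]


def tf_idf(term, doc, docs):
    """Relative term frequency times log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def okapi_bm25(term, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `term` in `doc`, with document-length normalization."""
    df = sum(1 for d in docs if term in d)
    idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1)
    avg_len = sum(len(d) for d in docs) / len(docs)
    f = doc.count(term)
    return idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avg_len))


for term in ("kidney", "disease", "failure"):
    print(term, round(tf_idf(term, docs[0], docs), 3),
          round(okapi_bm25(term, docs[0], docs), 3))
```

Unlike C-value, both scores are computed per document, which is why the slide shows one keyword list for each document.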
(4) Computing of new combination measures
F-OCapi and F-TFIDF-C (Harmonic mean)
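The slide only states that these measures are harmonic means. A minimal sketch, assuming that F-OCapi combines an Okapi score with a C-value score for the same term and that both scores have already been normalized to a comparable range (both points are assumptions inferred from the measure names; the function names are hypothetical):

```python
def harmonic_mean(x, y):
    """Harmonic mean of two non-negative scores; 0.0 when both are zero."""
    return 2 * x * y / (x + y) if (x + y) else 0.0


def f_ocapi(okapi_score, c_value_score):
    """Assumed form: harmonic mean of normalized Okapi and C-value scores."""
    return harmonic_mean(okapi_score, c_value_score)


# A term must score well on both measures to obtain a high combined score.
print(f_ocapi(0.9, 0.3))
print(f_ocapi(0.6, 0.6))
```

The harmonic mean is pulled toward the lower of the two inputs, so a term that one measure rates highly and the other rates poorly ends up ranked below a term with two moderate scores.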
C-Okapi and C-TFIDF
(5) Re-ranking using web-based measure
[Figure: candidate terms (term 1, term 2, …, term n) are queried on the WEB, e.g. “treponema pallidum” and treponema pallidum]
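The slide does not give the definition of the web-based measure. The sketch below assumes, from the quoted and unquoted example queries, that it compares the number of hits for the exact phrase with the number of hits for the same words in any position. The hit counts are made up and stand in for a search-engine call; all names are hypothetical.

```python
def web_ratio(hits_exact, hits_all_words):
    """Share of pages containing the words that also contain the exact phrase."""
    return hits_exact / hits_all_words if hits_all_words else 0.0


# Made-up (exact-phrase hits, all-words hits) pairs for illustration only.
fake_hits = {
    "treponema pallidum": (900, 1000),
    "pallidum case": (20, 1000),
}


def rerank(terms, hits):
    """Sort candidate terms by decreasing web ratio."""
    return sorted(terms, key=lambda t: web_ratio(*hits[t]), reverse=True)


ranked = rerank(["pallidum case", "treponema pallidum"], fake_hits)
print(ranked)
```

Under this assumption, words that almost always occur together as a phrase on the web are treated as more likely to form a genuine term.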
Experiments: datasets
• Drugs and Herbs (EN, FR)
• Medical Tests (EN, FR)
• PubMed (EN)
• Plus automatic validation using UMLS (EN) & MeSH (FR)
Experiments: results
[Figure: Precision comparison of the best measures for term extraction for English.]
[Figure: Precision comparison of the best measures for term extraction for French.]
[Figure: Precision comparison between F-OCapiM and WebR with automatic validation for French.]
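Precision with automatic validation can be illustrated as precision at rank k against a reference terminology: an extracted term counts as correct if it appears in the reference. This is a sketch under assumptions; the reference set below is a toy stand-in for a terminology such as UMLS or MeSH, matching is by exact lowercase string, and the function name is hypothetical.

```python
def precision_at_k(ranked_terms, reference, k):
    """Fraction of the top-k ranked terms found in the reference terminology."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term.lower() in reference)
    return hits / len(top)


# Toy reference terminology and toy ranked output of a term extractor.
reference = {"kidney disease", "treponema pallidum", "melanoma"}
ranked = ["treponema pallidum", "kidney disease", "case report", "melanoma"]

for k in (2, 3, 4):
    print(k, precision_at_k(ranked, reference, k))
```

A valid term that is missing from the reference is counted as an error, so automatic validation tends to underestimate precision compared with manual validation.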
Current & future work on term extraction
• Methodology for term extraction and ranking for two languages, French and English.
  • C-value adapted to extract French biomedical terms.
  • Two new measures obtained by combining three existing methods, plus a new web-based measure.
  • WebR was applied to re-rank the best list, positioning the true biomedical terms at the top of the list.
• Reuse such NLP within the SIFR Annotator workflow to enhance semantic annotation
A few words by way of conclusion
• Terminologies & ontologies are relevant features for knowledge representation
• But a large majority of the data are texts
  • Go beyond one language
• Share & pool relevant resources in the domain: ontologies, terminologies, mappings, annotations, technologies
