A few contributions of the SIFR (Semantic Indexing of French biomedical Resources project) and how we reuse NCBO technology
1. Semantic Information Retrieval Workshop (Atelier Recherche d’Information Sémantique), RISE’15
June 30, 2015 – Rennes
Clement Jonquet – jonquet@lirmm.fr
2. How is this relevant to RISE?
Semantic information retrieval models
Information extraction
Semantic annotation
Semantic indexing
Ontology alignment and mappings for information retrieval
Knowledge representation languages for information retrieval
Use of semantic distances for information retrieval
RISE 2015 Workshop
June 30, 2015, Rennes
4. Biologists have adopted
ontologies
To provide canonical representation of scientific
knowledge
To annotate experimental data to enable
interpretation, comparison, and discovery across
databases
To facilitate knowledge-based applications for
Decision support
Natural language-processing
Data integration
But ontologies are: spread out, in different formats, of
different size, with different structures
5. Working with terminologies &
ontologies – a portal please!
You’ve built an ontology, how do you let the world know?
You need an ontology, where do you go to get it?
How do you know whether an ontology is any good?
How do you find resources that are relevant to the
domain of the ontology (or to specific terms)?
How could you leverage your ontology to enable new
science?
How could you use ontologies without managing them?
6. Comparison of the approaches
[IWBBIO'14]
7. Annotation challenge
Explosion of biomedical data: diverse,
distributed, unstructured… not linked to
ontologies
Hard for biomedical researchers to find the
data they need
Data integration problem
Translational discoveries are prevented
Good examples
GO annotations
PubMed (biomedical literature) indexed with
MeSH headings
Annotate data with ontology concepts
Horizontal approach
ONTOLOGIES
RESOURCES
8. Good use of the semantics (1/2)
Simple keyword-based search misses results
9. Good use of the semantics (2/2)
10. A few words about SIFR
project
12. People
Young researchers
Clement Jonquet
Mathieu Roche
Sandra Bringay
Advisors
Stefano A. Cerri
Maguelonne Teisseire
Pascal Poncelet
Staff
Vincent Emonet
Students
Juan Antonio Lossio Ventura
Guillaume Surroca
~3 MSc students / year
Close collaborators
Philippe Lemoisson (TETIS)
Pierre Larmande (IRD / IBC)
Mark Musen (BMIR)
Stefan Darmoni (CISMEF)
Sebastien Harispe (LGI2P)
13. Increasing number of biomedical
data + multilingualism
Limits of keyword-based indexing
Biomedical community has turned to ontologies to describe their
data and turn them into structured and formalized knowledge
Ontologies are put to use by creating semantic annotations
Crucial need for tools & services for French biomedical data
Biomedical data integration challenge
New potential scientific discoveries hidden in data
Translational research
14. Use ontologies for indexing, mining
and searching (French) biomedical
data
Obj1: Design, development and deployment
of the French Annotator.
Obj2: Obtain new research results to exploit
and enhance ontology-based indexing
services.
semantic distances
ontology alignment
ontology enrichment and disambiguation
Obj3: Valorization of indexing services
16. Use biomedical ontology-based
annotations in end-user applications
17. Reuse of the NCBO
technology
18. BioPortal: a “one-stop shop”
for Biomedical Ontologies
Web repository for biomedical ontologies
Make ontologies accessible and usable – abstraction on
format, locations, structure, etc.
Users can publish, download, browse, search, comment,
align ontologies and use them for annotations both online
and via a web services API.
Online support for ontology
Peer review
Notes (comments and discussion)
Versioning
Mapping
Search
Resources
20. http://data.bioontology.org
Ontology Services: Search, Traverse, Comment, Download
Widgets: Tree-view, Auto-complete, Graph-view
Annotation Services: Term recognition
Data Access Services: Search “data” annotated with a given term
Mapping Services: Create, Upload, Download
http://bioportal.bioontology.org
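As an illustration of how a client might call the term recognition service over REST, here is a minimal sketch that only builds the request URL. The base URL comes from this slide; the `/annotator` path and the `text`, `apikey` and `ontologies` parameter names are assumptions modeled on the public NCBO REST API, and `build_annotator_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; the "/annotator" path and the parameter
# names below are assumptions modeled on the public NCBO REST API.
BASE_URL = "http://data.bioontology.org"


def build_annotator_url(text, apikey, ontologies=None, base_url=BASE_URL):
    """Return a GET URL asking the Annotator to recognize terms in `text`."""
    params = {"text": text, "apikey": apikey}
    if ontologies:
        # Restrict term recognition to the given ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{base_url}/annotator?{urlencode(params)}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; a real key is needed to send the request.
    print(build_annotator_url("melanoma is a cancer", "YOUR_API_KEY", ["MESH"]))
```

Passing a different `base_url` would point the same request at a local BioPortal instance, assuming it exposes the same interface.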
21. Current axes of research
22. SIFR axes of research (1/8):
Design of the SIFR (French)
Annotator service
Deployment of a local instance of BioPortal at LIRMM
16 French terminologies imported from UMLS, EHTOP & BioPortal
UTF8 compliant Mgrep concept recognizer (Univ. of Michigan)
http://bioportal.lirmm.fr/annotator
New improvements to the annotation workflow
Automatic term extraction measures (C-value, LIDF-value, etc.)
Scoring of annotations & representation in RDF using the Annotation Ontology (AO)
[SWAT4LS 2014]
23. Improving the Annotator(s) –
example with scoring
Objective: improve the Annotator(s) results by ranking
the annotations according to their relevance
Without changing the service implementation
Take annotation frequencies into account (as originally proposed in
2009, then removed)
Add a term extraction measure, C-value, to
favor annotations generated from matches
with multi-word terms
Two new scoring methods that score and rank
annotations by their importance in the given input data
Interesting results, validated against manual PubMed
annotations
[SWAT4LS 2014]
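To make the idea of frequency-based ranking with a bonus for multi-word matches concrete, here is a toy sketch. It is not the scoring method published in [SWAT4LS 2014]: the function `rank_concepts`, the input format and the `1 + log2(words)` weight are all assumptions chosen only to illustrate the principle.

```python
import math
from collections import defaultdict


def rank_concepts(annotations):
    """Rank concepts by a frequency score boosted for multi-word matches.

    `annotations` is a list of (concept_id, matched_term) pairs, one pair
    per match found in the input text (hypothetical input format).
    """
    scores = defaultdict(float)
    for concept_id, matched_term in annotations:
        n_words = len(matched_term.split())
        # Each match counts once (frequency); multi-word matches get a
        # logarithmic bonus, in the spirit of the C-value length factor.
        scores[concept_id] += 1.0 + math.log2(n_words)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    matches = [
        ("C1", "cancer"),
        ("C1", "cancer"),
        ("C2", "spindle cell sarcoma"),
        ("C3", "melanoma"),
    ]
    for concept, score in rank_concepts(matches):
        print(concept, round(score, 3))
```

In this toy input, the single three-word match outranks the concept matched twice by a one-word term, which is the kind of positive discrimination the slide describes.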
24. SIFR axes of research (2/8):
Dealing with multilingualism within
BioPortal
Status of multilingualism in BioPortal – quite negative
Set of propositions [MSW 2014]
Representation of natural language property for an ontology
Representation of the distinction between ontologies
Representation of relation between ontologies
Representation of multilingual translation mappings
Reconciliation of multilingual mappings (possible PhD collaboration with
ESI)
Currently being tested/implemented within our local instance
25. What is being multilingual?
Interface internationalization = displaying static elements of
the user interface (e.g., menu names, help, etc.) in
different languages
Content internationalization = displaying BioPortal content
(e.g., ontology labels, mappings, etc.) in different languages
Multilingual = internationalization (display) + enabling
full use of the functionalities and services of BioPortal
for multilingual ontologies, or for monolingual ontologies
that are completely and properly handled (languages, translations,
multilingual mappings, etc.)
Rich semantic description
Being able to parse multilingual content in ontologies (from
xml:lang to Lemon)
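A minimal sketch of the simplest case mentioned above, reading `xml:lang`-tagged labels from an RDF/XML fragment with the Python standard library. The sample document, the `http://example.org/onto#Melanoma` IRI and the helper `labels_by_language` are invented for illustration; Lemon-based lexical descriptions are not covered here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDFS_LABEL = "{http://www.w3.org/2000/01/rdf-schema#}label"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Invented sample: one concept with an English and a French label.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/onto#Melanoma">
    <rdfs:label xml:lang="en">melanoma</rdfs:label>
    <rdfs:label xml:lang="fr">mélanome</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def labels_by_language(rdf_xml):
    """Map each resource IRI to a {language tag: label} dictionary."""
    labels = defaultdict(dict)
    root = ET.fromstring(rdf_xml)
    for resource in root:
        iri = resource.get(RDF_ABOUT)
        for label in resource.iter(RDFS_LABEL):
            # Labels without a language tag are filed under "und".
            lang = label.get(XML_LANG, "und")
            labels[iri][lang] = label.text
    return dict(labels)


if __name__ == "__main__":
    print(labels_by_language(SAMPLE))
```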
26. Multilingual ontology vs. language-specific (monolingual) ontologies
Multilingual ontology (each concept carries labels in several languages):
en:disease / fr:maladie
...
en:cancer / fr:cancer
en:spindle cell sarcoma / fr:sarcome à cellules fusiformes
en:melanoma / fr:mélanome
Language-specific (monolingual) ontologies:
English: disease > ... > cancer > spindle cell sarcoma, melanoma
French: maladie > ... > cancer > sarcome à cellules fusiformes, mélanome
27. SIFR axes of research (3/8):
Automatic extraction of biomedical
terminology from text
Context of the PhD of Juan Antonio Lossio
[LBM 2013][TALN 2014][PolTAL 2014]
BioTex software:
http://tubo.lirmm.fr/biotex [ISWC 2014]
Works in French, English and Spanish
Motivations for automatic terminology
extraction
Experiment and validate approaches for
French data
Contribute to the ontology enrichment
process
Acquire some NLP expertise for the
annotation workflow
29. Statistical methods
C-value: improves the extraction of the longest terms
Example: “soft contact” vs. “soft contact lens”
Frantzi, K., Ananiadou, S., & Mima, H. (2000). Automatic recognition of multi-word terms: the
C-value/NC-value method. International Journal on Digital Libraries, 3(2), 115-130.
32. Include BioTex into BioPortal
Use BioPortal dictionary for validation
New ontology enrichment service: submit a corpus of data and
see which terms are not yet covered
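The core of such an enrichment check can be sketched as a comparison between extracted candidate terms and the labels already known to the portal. This is only an illustration under simple assumptions (exact, case-insensitive matching); `uncovered_terms` and the sample data are hypothetical and do not describe the BioTex or BioPortal implementation.

```python
def uncovered_terms(candidate_terms, ontology_labels):
    """Return the candidate terms that match no existing ontology label."""
    # Case-insensitive exact matching; real systems would also normalize
    # plurals, accents and word order.
    covered = {label.casefold() for label in ontology_labels}
    return [term for term in candidate_terms if term.casefold() not in covered]


if __name__ == "__main__":
    extracted = ["Melanoma", "spindle cell sarcoma", "soft contact lens"]
    known_labels = ["melanoma", "Spindle cell sarcoma", "cancer"]
    print(uncovered_terms(extracted, known_labels))
```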
33. SIFR axes of research (4/8):
Semantic distance framework
Automatically compute existing (Rada, Wu&Palmer, Resnik)
semantic similarity measures over BioPortal ontologies
For a given concept, get all semantically close concepts
Get the semantic distance between 2 concepts
Collaboration with LGI2P to reuse Semantic Measure Library
(SML) within BioPortal
1st prototype: http://tubo.lirmm.fr/BioMedicalSemantic/web/app_dev.php
Goal: include SML within the BioPortal backend to bring semantic
distance services to the ontologies and annotated data
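Two of the measures named on this slide can be sketched on a toy is-a hierarchy: the Rada distance (number of edges on the shortest path between two concepts) and the Wu & Palmer similarity, 2·depth(lca) / (depth(a) + depth(b)). The code below is a standalone illustration, not the Semantic Measure Library API; it assumes a single-parent tree, and the tiny taxonomy is invented. Resnik is left out because it needs information content estimated from a corpus.

```python
# Toy is-a taxonomy (child -> parent); assumes each concept has one parent.
PARENT = {
    "cancer": "disease",
    "melanoma": "cancer",
    "spindle cell sarcoma": "cancer",
}


def ancestors(concept):
    """Return the path from `concept` up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in ancestors_of_b)


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def rada_distance(a, b):
    """Number of is-a edges on the path from a to b through their LCA."""
    lca = lowest_common_ancestor(a, b)
    return (depth(a) - depth(lca)) + (depth(b) - depth(lca))


def wu_palmer_similarity(a, b):
    """Wu & Palmer similarity in (0, 1]; 1 means identical concepts."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


if __name__ == "__main__":
    pair = ("melanoma", "spindle cell sarcoma")
    print(rada_distance(*pair), round(wu_palmer_similarity(*pair), 3))
```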
34. SIFR axes of research (5/8):
Informal patient data analysis
Dealing with public patient data on blogs, forums and
tweets (Sandra Bringay)
Detection of emotion [EGC 2014][eTELEMED 2014]
Patient vocabulary (crabe vs. cancer)
Project “Parlons de nous” (www.lirmm.fr/patient-mind)
MSH-M
A patient vocabulary currently being constructed [IC 2015]
Hosted and available in our local instance of BioPortal
Used for annotations, indexing, information retrieval
35. SIFR axes of research (6/8):
Viewpoint: a subjective knowledge
representation formalism
Collaboration with P. Lemoisson (CIRAD) & PhD of G. Surroca
Graph based knowledge representation formalism
Linked data from the semantic Web and user contributions
from the social Web.
Unified topological approach
First prototype for semantic search over HAL-LIRMM
publications [IC2014]
Capture the phenomenon of Serendipity
(i.e., incidental learning) [IC 2015]
36. SIFR axes of research (7/8):
Pharmacogenomics use case
PGx studies how individual gene variations cause variability in
drug responses
Validation of state-of-the-art pharmacogenomics knowledge on
the basis of practice-based evidence
Compare pharmacogenomics literature (in English) and electronic
health records (in French)
EHRs from Paris (HEGP) & St Etienne hospitals
Upcoming improvements to the Annotators to handle clinical data:
negation, disambiguation, modularity, temporality
Project submitted to ANR generic call 2015 (April 27th)
Collaborative action led by Adrien Coulet (LORIA)
Stanford is in the loop (Russ, Mark, Michel, Nigam)
37. SIFR axes of research (8/8):
Application to agronomy & plants
Within the Institute of Computational Biology of
Montpellier
Design of a semantic annotation workflow for plant data -
collaboration with IBC project [CO-PDI 2014]
AgroLD: to build an RDF knowledge base to house plant data
resources: SouthGreen, Gramene, OryGeneDB… [RDA 2014]
AgroPortal: reference ontology repository for the agronomic
domain [IN-OVIVE 2015]
Experiment with NCBO technologies for the plant community
4 driving agronomic use cases
38. Objectives of AgroPortal project
Develop and support a reference ontology repository for the
agronomic domain
One-stop-shop for plant/agronomic related ontologies
Primary focus on the agronomic & plant domain
Reusing the NCBO BioPortal technology
Avoid re-implementing what has already been done
Facilitate interoperability
Reusing the scientific outcomes, experience & methods of the
biomedical domain
Enable straightforward use of agronomic related ontologies
Respect the requirements of the agronomic community
Fully semantic web compliant infrastructure
41. Near future
Continue to move different prototypes into production
Release of the French Annotator
Find more use cases
Collaboration with the plant/agro community
Continue reusing and contributing to NCBO technology
42. Online resources
Web page: www.lirmm.fr/sifr
https://www.researchgate.net/projects
Code repository: https://github.com/sifrproject
13 developers
10 repositories
Publications: http://bit.ly/194ImnR
Direct link to HAL-LIRMM platform
with advanced search features
Portals & services:
http://bioportal.lirmm.fr
http://agroportal.lirmm.fr