My keynote at the 20th BioOntologies meeting at ISMB/ECCB 2017. It focuses on discussing the challenges and opportunities of ontology semantic similarity and ontology matching to support semantic interoperability.
Sense and Similarity: making sense of similarity for ontologies
1. SENSE AND SIMILARITY
making sense of similarity for ontologies
Catia Pesquita
LASIGE, Faculdade de Ciências, Universidade de Lisboa
20th Bio-Ontologies@ISMB 2017
5. Outline
Similarity within an ontology
Class similarity
Annotated entities similarity
Challenges and opportunities
Similarity between ontologies
Biomedical Ontology matching
Challenges and opportunities
AgreementMakerLight
7. Why Semantic Similarity for Biomedical Ontologies?
validate protein-protein interactions (Jain & Bader, 2010)
evaluating functional coherence of gene sets (Bastos et al, 2013)
classification of chemical compounds (Ferreira et al, 2013)
calculating similarity of clinical models (Gøeg et al, 2015)
diagnosing patients (Köhler et al, 2009)
suggesting candidate genes involved in diseases (Li et al., 2011)
8. Semantic Similarity in Biomedical Ontologies
lyase activity hydrolase activity
molecular function
catalytic activity binding
ion binding
copper
ion binding
ATP binding
iron
ion binding
Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity
in biomedical ontologies. PLoS computational biology, 5(7), e1000443.
9. Semantic Similarity in Biomedical Ontologies
lyase activity hydrolase activity
molecular function
catalytic activity binding
ion binding
copper
ion binding
ATP binding
iron
ion binding
Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity
in biomedical ontologies. PLoS computational biology, 5(7), e1000443.
How to measure class specificity?
10. Semantic Similarity in Biomedical Ontologies
lyase activity hydrolase activity
molecular function
catalytic activity binding
ion binding
copper
ion binding
ATP binding
iron
ion binding
(Lord et al. 2003)
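A minimal sketch of class similarity over the hierarchy in the figure, using the information content of the most informative common ancestor (Resnik's measure). The `PROB` values are hypothetical annotation probabilities chosen for illustration, not real Gene Ontology statistics, and the placement of "ATP binding" directly under "binding" is an assumption about the figure layout.

```python
import math

# Toy is_a hierarchy read off the slide figure (child -> parents).
# The placement of "ATP binding" under "binding" is an assumption.
PARENTS = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "binding": ["molecular function"],
    "lyase activity": ["catalytic activity"],
    "hydrolase activity": ["catalytic activity"],
    "ion binding": ["binding"],
    "ATP binding": ["binding"],
    "copper ion binding": ["ion binding"],
    "iron ion binding": ["ion binding"],
}

# Hypothetical annotation probabilities p(c); not real GO statistics.
PROB = {
    "molecular function": 1.0,
    "catalytic activity": 0.5,
    "binding": 0.5,
    "lyase activity": 0.125,
    "hydrolase activity": 0.25,
    "ion binding": 0.25,
    "ATP binding": 0.125,
    "copper ion binding": 0.0625,
    "iron ion binding": 0.125,
}


def ancestors(cls):
    """Return cls together with all of its ancestors."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(cls):
    """Information content: log2(1 / p(c))."""
    return math.log2(1.0 / PROB[cls])


def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(ic(c) for c in ancestors(a) & ancestors(b))


print(resnik("copper ion binding", "iron ion binding"))  # 2.0 (via "ion binding")
print(resnik("copper ion binding", "ATP binding"))       # 1.0 (via "binding")
print(resnik("copper ion binding", "lyase activity"))    # 0.0 (only the root is shared)
```

Classes that share a specific ancestor score higher than classes that only share the root, which is the intuition the figure conveys.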
11. Semantic Similarity in Biomedical Ontologies
lyase activity hydrolase activity
molecular function
catalytic activity binding
ion binding
copper
ion binding
ATP binding
iron
ion binding
How to address annotation quality impact?
12. Measuring class specificity with depth
molecular function
toxin activity
(9)
catalytic activity
(369044)
...
... ...
cytochrome-c
oxidase activity
(2066)
...
...
Variable semantic specificity at same depth
13. Measuring term specificity with corpus-based Information Content
Corpus-bias effect of rarely used but generic classes
Not all ontologies have annotations
molecular function
toxin activity
(9)
catalytic activity
(369044)
...
... ...
cytochrome-c
oxidase activity
(2066)
...
...
IC = -log p(c)
(Resnik, 1995)
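A small sketch of the corpus-based formula, assuming the numbers in parentheses in the figure are annotation counts and taking p(c) as count divided by the total for the root. The root total `N_TOTAL` is a hypothetical value; the slide does not give it.

```python
import math


def corpus_ic(n_annotations, n_total):
    """Resnik's IC = -log p(c), with p(c) = n_annotations / n_total."""
    if not 0 < n_annotations <= n_total:
        raise ValueError("need 0 < n_annotations <= n_total")
    return -math.log(n_annotations / n_total)


# Counts as shown in the figure (assumed to be annotation counts).
COUNTS = {
    "toxin activity": 9,
    "catalytic activity": 369044,
    "cytochrome-c oxidase activity": 2066,
}
N_TOTAL = 1_000_000  # hypothetical total for the root; not from the slide

for name, count in COUNTS.items():
    print(f"{name}: IC = {corpus_ic(count, N_TOTAL):.2f}")
```

Under these assumptions "toxin activity" receives a higher IC than "cytochrome-c oxidase activity" although it sits higher in the hierarchy, which is the corpus-bias effect noted on the slide.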
14. Measuring term specificity with structural Information Content
molecular function
toxin activity
(9)
catalytic activity
(369044)
...
... ...
cytochrome-c
oxidase activity
(2066)
...
...
Lack of subclasses may be due to ontology incompleteness
(Seco et al., 2004)
IC = 1 - log(subclass(c) + 1) / log(max(c))
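A sketch of the structural formula on a toy hierarchy, assuming subclass(c) counts all descendants of c and max(c) is the total number of classes in the ontology (the reading used by Seco et al.). The `CHILDREN` map is an illustrative nine-class fragment, not the ontology in the figure.

```python
import math

# Illustrative nine-class hierarchy (parent -> direct subclasses).
CHILDREN = {
    "molecular function": ["catalytic activity", "binding"],
    "catalytic activity": ["lyase activity", "hydrolase activity"],
    "binding": ["ion binding", "ATP binding"],
    "ion binding": ["copper ion binding", "iron ion binding"],
}


def descendants(cls):
    """Return all direct and indirect subclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


ROOT = "molecular function"
N_CLASSES = len(descendants(ROOT)) + 1  # every class, including the root


def structural_ic(cls):
    """IC = 1 - log(subclass(c) + 1) / log(max), in the range [0, 1]."""
    return 1.0 - math.log(len(descendants(cls)) + 1) / math.log(N_CLASSES)


for name in (ROOT, "binding", "ion binding", "copper ion binding"):
    print(f"{name}: IC = {structural_ic(name):.3f}")
```

Leaves get the maximum IC of 1 and the root gets 0. A class that merely lacks subclasses because the ontology is incomplete is therefore scored as maximally specific, which is the limitation the slide points out.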
15. Impact of annotation quality
Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O.
(2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519.
Gene Ontology
98% electronic annotations
64% incomplete annotation
23% inconsistent annotation
16. Impact of annotation quality
Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O.
(2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519.
23%
inconsistent
annotation
cytochrome-c oxidase activity
cytochrome-c oxidase activity
electron carrier activity
cytochrome-c oxidase activity
electron carrier activity
heme binding
cytochrome-c oxidase activity
electron carrier activity
heme binding
copper ion binding
17. Evaluation of Semantic Similarity Measures
22k pairs of proteins
Pre-computed similarities with classical measures
Correlation to sequence, PFam family and EC class
20% of new GO-based SS measures use CESSM
http://xldb.di.fc.ul.pt/biotools/cessm2014/
Gene Ontology
18. Future Directions
Explore growing semantic richness
disjoint axioms
different types of relationships
logical definitions and cross-products
Improve computational efficiency
semantic similarity based searches
Semantic similarity across multiple ontologies
20. Ontology Matching
Input: Two ontologies
Output: Alignment
Alignment: optimal set of mappings between the entities
Mapping: relates two entities and has a score
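The definitions above can be sketched as a small data structure. The greedy one-to-one selection shown here is only one simple way to approximate an "optimal set of mappings"; it is not presented as the selection strategy of any particular matcher, and the class identifiers, scores, and the 0.6 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Relates one source entity to one target entity with a score."""
    source: str
    target: str
    score: float


def select_alignment(candidates, threshold=0.6):
    """Greedy one-to-one selection: best-scoring mappings first."""
    used_sources, used_targets = set(), set()
    alignment = []
    for m in sorted(candidates, key=lambda m: m.score, reverse=True):
        if m.score < threshold:
            break  # everything after this point scores lower still
        if m.source in used_sources or m.target in used_targets:
            continue  # entity already mapped by a better candidate
        alignment.append(m)
        used_sources.add(m.source)
        used_targets.add(m.target)
    return alignment


# Hypothetical candidate mappings between ontologies A and B.
candidates = [
    Mapping("A:leg", "B:hind_limb", 0.55),
    Mapping("A:gum", "B:gingiva", 0.92),
    Mapping("A:gum", "B:gum_disease", 0.70),
    Mapping("A:heart", "B:heart", 1.00),
]
for m in select_alignment(candidates):
    print(m.source, "=", m.target, m.score)
```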
21. Why match Biomedical Ontologies?
Salvadores et al. Semant Web. 2013; 4(3): 277–284.
https://bioportal.bioontology.org/, on July, 2017
22. Simple Lexical Mappings are not enough
High precision but low recall
Mouse Anatomy - NCI Human Anatomy (OAEI Anatomy track)
LOOM: 99% precision, 65% recall
AML: 95% precision, 93.5% recall
leg vs. hind limb
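To compare the two systems with a single number, precision and recall can be combined into the F-measure (their harmonic mean). The sketch below applies it to the figures quoted on the slide; the function names are illustrative.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of mappings against a reference set."""
    true_positives = len(found & reference)
    return true_positives / len(found), true_positives / len(reference)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide (OAEI Anatomy track).
print(f"LOOM: F = {f_measure(0.99, 0.65):.3f}")   # LOOM: F = 0.785
print(f"AML:  F = {f_measure(0.95, 0.935):.3f}")  # AML:  F = 0.942
```

The high precision of purely lexical matching does not compensate for the mappings it misses, such as pairs whose labels share no words.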
23. Simple Lexical Mappings are not enough
Potential incoherences
Chemicals_and_Drugs_Kind
Anatomical_Entity
Anatomy_Kind
Gingiva Gum Gingiva
Faria, Daniel, et al. "Towards annotating potential incoherences in BioPortal mappings." ISWC, 2014.
24. Challenges and Opportunities in Biomedical Ontology Alignment
large size
rich and complex vocabulary
different modeling views
abundant sources of background knowledge
going beyond binary matching
27. Large Size
HashMaps to store Lexicon and Relationships
Hash-based matchers as primary matchers
No similarity matrix
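A minimal sketch of the idea behind hash-based matching: index every normalized label in a hash map and find shared labels by lookup, so no |source| x |target| similarity matrix is ever built. This illustrates the principle only and is not AML's actual code; the class identifiers and labels are hypothetical.

```python
from collections import defaultdict


def normalize(label):
    """Lowercase, treat underscores as spaces, collapse whitespace."""
    return " ".join(label.replace("_", " ").lower().split())


def build_lexicon(labels_by_class):
    """Hash map from normalized label to the set of classes carrying it."""
    lexicon = defaultdict(set)
    for cls, labels in labels_by_class.items():
        for label in labels:
            lexicon[normalize(label)].add(cls)
    return lexicon


def lexical_match(source, target):
    """Map classes that share a label, using one hash lookup per label."""
    target_lexicon = build_lexicon(target)
    mappings = set()
    for label, source_classes in build_lexicon(source).items():
        for t in target_lexicon.get(label, ()):
            mappings.update((s, t) for s in source_classes)
    return mappings


# Hypothetical classes and labels.
source = {"S1": ["Gingiva", "gum"], "S2": ["hind limb"]}
target = {"T1": ["GUM"], "T2": ["Leg"]}
print(lexical_match(source, target))  # {('S1', 'T1')}
```

The cost grows with the number of labels rather than with the product of the two ontology sizes, which is what makes this approach practical for large ontologies.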
28. Rich and complex vocabulary
Uses all labels
Assigns different weights to labels
Extends synonyms through the Thesaurus Matcher
29. stomach secretion
gastric secretion
gall bladder serosa
biliary serosa
stomach serosa
Deriving new synonyms for the Thesaurus Matcher
gastric
stomach
biliary
gall bladder
Synonyms Thesaurus
gastric serosa
gall bladder
biliary
New Synonyms
Pesquita, C., Faria, D., Stroe, C., Santos, E., Cruz, I. F., & Couto, F. M. (2013). What’s in a ‘nym’?
Synonyms in Biomedical Ontology Matching. ISWC
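The example on this slide can be sketched as two steps: derive word-level synonym pairs from labels that differ only in one part, then substitute those pairs into other labels. This is a simplified reading of the slide, not the published Thesaurus Matcher; the function names are illustrative.

```python
def derive_thesaurus(synonym_pairs):
    """Extract the differing parts of synonymous labels as thesaurus pairs."""
    thesaurus = set()
    for a, b in synonym_pairs:
        wa, wb = a.split(), b.split()
        n_words = len(wa)
        # Strip the words both labels share at the start and at the end.
        while wa and wb and wa[0] == wb[0]:
            wa, wb = wa[1:], wb[1:]
        while wa and wb and wa[-1] == wb[-1]:
            wa, wb = wa[:-1], wb[:-1]
        # Keep the pair only if something was shared and something differs.
        if wa and wb and len(wa) < n_words:
            thesaurus.add((" ".join(wa), " ".join(wb)))
    return thesaurus


def expand(label, thesaurus):
    """Generate new synonyms of label by whole-word substitution."""
    padded = f" {label} "
    new_labels = set()
    for x, y in thesaurus:
        for old, new in ((x, y), (y, x)):
            if f" {old} " in padded:
                new_labels.add(padded.replace(f" {old} ", f" {new} ").strip())
    return new_labels


# Synonymous labels from the slide.
pairs = [
    ("stomach secretion", "gastric secretion"),
    ("gall bladder serosa", "biliary serosa"),
]
thesaurus = derive_thesaurus(pairs)
print(expand("stomach serosa", thesaurus))  # {'gastric serosa'}
```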
30. Different modeling views
body part
surface of cell
anatomical entity
anatomical
surface
cardinal cell part
surface of
epithelial cell
cell part
cell surface
31. Different modeling views
Can cause incoherences
body part
surface of cell/
cell surface
anatomical entity
anatomical
surface
cardinal cell part/
cell part
surface of
epithelial cell
32. Different modeling views
Repair by removing mappings
body part
surface of cell/
cell surface
anatomical entity
anatomical
surface
surface of
epithelial cell
cardinal cell part cell part
Santos, Emanuel, Daniel Faria, Catia Pesquita, and Francisco M. Couto. "Ontology alignment repair
through modularization and confidence-based heuristics." PloS one 10, no. 12 (2015)
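A deliberately simplified sketch of confidence-based repair: given sets of mappings that jointly cause an incoherence, remove the least confident mapping from each set that is still unresolved. Computing the conflict sets (the modularization step of the cited paper) is not shown, and the confidence values are hypothetical.

```python
def repair(confidence, conflict_sets):
    """Return the mappings to remove so that every conflict set is broken.

    confidence: dict from mapping to confidence score
    conflict_sets: list of sets of mappings that are jointly incoherent
    """
    removed = set()
    for conflict in conflict_sets:
        if conflict & removed:
            continue  # already resolved by an earlier removal
        # Drop the least confident mapping; ties are broken by mapping order.
        removed.add(min(conflict, key=lambda m: (confidence[m], m)))
    return removed


# The two mappings from the slide, with hypothetical confidences.
surface = ("surface of cell", "cell surface")
cell_part = ("cardinal cell part", "cell part")
confidence = {surface: 0.95, cell_part: 0.80}

print(repair(confidence, [{surface, cell_part}]))
# {('cardinal cell part', 'cell part')}
```

With these values the repair keeps the "surface of cell / cell surface" mapping and drops the other one, matching the outcome drawn on the slide.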
33. To repair or not to repair
Repair can cause loss of information
Information preservation vs. alignment coherence
Pesquita, C. et al. (2013). Proc. of the 8th International Conference on Ontology Matching-Volume 1111 (pp. 13-24).
36. Automated selection of background knowledge
Mapping gain over a baseline alignment
Combine multiple sources
Faria, D., Pesquita, C., Santos, E., Cruz, I. F., & Couto, F. M. (2014). Automatic background knowledge
selection for matching biomedical ontologies. PloS one, 9(11), e111226.
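One plausible reading of "mapping gain over a baseline alignment" is the number of new mappings a background source contributes, relative to the size of the baseline. The sketch below uses that reading to select sources; the exact definition in the cited paper may differ, and the source names, mappings, and 0.25 threshold are hypothetical.

```python
def mapping_gain(baseline, with_source):
    """New mappings contributed by a source, as a fraction of the baseline."""
    if not baseline:
        raise ValueError("baseline alignment must not be empty")
    return len(with_source - baseline) / len(baseline)


def select_sources(baseline, alignments_by_source, threshold=0.25):
    """Keep the sources whose mapping gain reaches the threshold."""
    return sorted(
        name
        for name, alignment in alignments_by_source.items()
        if mapping_gain(baseline, alignment) >= threshold
    )


# Hypothetical alignments: baseline, and baseline extended by each source.
baseline = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
alignments_by_source = {
    "source_A": baseline | {("a5", "b5"), ("a6", "b6")},  # gain 0.5
    "source_B": set(baseline),                            # gain 0.0
}
print(select_sources(baseline, alignments_by_source))  # ['source_A']
```

The selected sources can then be combined, as the slide suggests, by taking the union of the mappings they contribute.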
40. Manual evaluation
Compound Ontology Matching
Evaluated in 6 ontology sets with logical definitions
Precision between 0.82 and 1.0
900 new candidate logical definitions
Applied to Crop Ontology - Plant Ontology - PATO
and Plant Trait Ontology - Plant Ontology - PATO
Oliveira, D. and Pesquita, C. (2015) Compound Matching of Biomedical Ontologies. ICBO
41. AML in action
Life sciences
Global Agricultural Concept Scheme (FAO)
Mapping the Crop Ontology to references
Integration of pharmacological vocabularies (Janssen Pharma)
Comp. of PhenomeNET for ontology matching (Garcia et al, 2016)
Healthcare
Semantic knowledge-base for public healthcare system (India)
Translation of SNOMED-CT (Silva et al. 2015)
Antibiotic resistance monitoring
Geospatial and environmental
Satellite Data Semantic Interoperability (Abburu, 2015)
Mapping SWEET to ENVO
Others
Comp. of eXtreme Design methodology (Dragisic et al. 2015)
Business process matching (Bakhshandeh et al., 2015)
42. Clustering with Semantic Similarity across Multiple Ontologies
https://github.com/csalexandre/SESAME.git
Annotation to
Multiple Ontologies
BioPortal
Match Ontologies
AML
Calculate Semantic
Similarity
SML
Clustering in
Semantic Space
WEKA
SESAME
44. Acknowledgements
Daniel Faria, IGC, Portugal
Francisco Couto, U. Lisboa, Portugal
Isabel Cruz, U. Illinois, USA
Emanuel Santos, RMIT University, Vietnam
Daniela Oliveira, Insight Centre, Ireland
Catarina Martins, University of Manchester, UK
Carlos A. Santos, U. Lisboa, Portugal
and many others