
Sense and Similarity: making sense of similarity for ontologies

My keynote at the 20th BioOntologies meeting at ISMB/ECCB 2017. It focuses on discussing the challenges and opportunities of ontology semantic similarity and ontology matching to support semantic interoperability.

Published in: Data & Analytics

  1. 1. SENSE AND SIMILARITY making sense of similarity for ontologies Catia Pesquita LASIGE, Faculdade de Ciências, Universidade de Lisboa 20th Bio-Ontologies@ISMB 2017 1
  2. 2. Similarity Shepard, 1957 Points in space Distance 2
  3. 3. Similarity Shepard, 1957 Points in space Distance Tversky, 1977 Sets of features Commonalities and differences 3 Ahoj! Hallo!
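The two views of similarity on this slide can be sketched in a few lines: a geometric distance between points, and Tversky's feature-based contrast of commonalities and differences (shown here in its ratio form). The feature sets and the `alpha`/`beta` defaults are illustrative assumptions, not values from the talk.

```python
import math


def euclidean_distance(p, q):
    """Geometric view: objects are points in space, similarity decays with distance."""
    return math.dist(p, q)


def tversky_similarity(a, b, alpha=1.0, beta=1.0):
    """Feature view (Tversky ratio model): shared features raise similarity,
    features unique to either object lower it. alpha and beta weight the
    two kinds of difference; alpha = beta = 1 gives the Jaccard index."""
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    # Two empty feature sets are treated as identical.
    return common / denominator if denominator else 1.0


# Hypothetical feature sets for the two greetings on the slide.
ahoj = {"greeting", "informal", "czech"}
hallo = {"greeting", "informal", "german"}
similarity = tversky_similarity(ahoj, hallo)  # 2 shared, 1 unique on each side
```

Setting `alpha` and `beta` to different values makes the measure asymmetric, which is one of the properties that distinguishes the feature view from a distance metric.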
  4. 4. Representation of objects in Biology 4
  5. 5. Outline Similarity within an ontology Class similarity Annotated entities similarity Challenges and opportunities Similarity between ontologies Biomedical Ontology matching Challenges and opportunities AgreementMakerLight 5
  6. 6. Similarity within an ontology 6
  7. 7. Why Semantic Similarity for Biomedical Ontologies? Validating protein-protein interactions (Jain & Bader, 2010); evaluating functional coherence of gene sets (Bastos et al., 2013); classifying chemical compounds (Ferreira et al., 2013); calculating similarity of clinical models (Gøeg et al., 2015); diagnosing patients (Köhler et al., 2009); suggesting candidate genes involved in diseases (Li et al., 2011). 7
  8. 8. Semantic Similarity in Biomedical Ontologies: lyase activity, hydrolase activity, molecular function, catalytic activity, binding, ion binding, copper ion binding, ATP binding, iron ion binding. 8 Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLoS computational biology, 5(7), e1000443.
  9. 9. Semantic Similarity in Biomedical Ontologies: lyase activity, hydrolase activity, molecular function, catalytic activity, binding, ion binding, copper ion binding, ATP binding, iron ion binding. 9 Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLoS computational biology, 5(7), e1000443. How to measure class specificity?
  10. 10. Semantic Similarity in Biomedical Ontologies: lyase activity, hydrolase activity, molecular function, catalytic activity, binding, ion binding, copper ion binding, ATP binding, iron ion binding. 10 (Lord et al., 2003)
  11. 11. Semantic Similarity in Biomedical Ontologies: lyase activity, hydrolase activity, molecular function, catalytic activity, binding, ion binding, copper ion binding, ATP binding, iron ion binding. 11 How to address annotation quality impact?
  12. 12. Measuring class specificity with depth molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 12 Variable semantic specificity at same depth
  13. 13. Measuring term specificity with corpus-based Information Content Corpus-bias effect of rarely used but generic classes Not all ontologies have annotations molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 13 IC = -log p(c) (Resnik, 1995)
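As a rough illustration of corpus-based information content, the sketch below computes IC = -log p(c) from annotation counts and uses it for Resnik's similarity (the IC of the most informative common ancestor). The three class counts are the ones shown on the slide; the root count and the flattened toy hierarchy are assumptions made only so the example runs.

```python
import math

# Toy hierarchy (class -> parents). Placing cytochrome-c oxidase activity
# directly under catalytic activity is a simplification of the real ontology.
PARENTS = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "toxin activity": ["molecular function"],
    "cytochrome-c oxidase activity": ["catalytic activity"],
}

# Annotation counts per class (including descendants).
ANNOTATION_COUNTS = {
    "molecular function": 1_000_000,  # ASSUMED total, not from the slide
    "catalytic activity": 369_044,
    "toxin activity": 9,
    "cytochrome-c oxidase activity": 2_066,
}


def information_content(cls):
    """IC = -log p(c), with p(c) estimated from annotation frequency."""
    p = ANNOTATION_COUNTS[cls] / ANNOTATION_COUNTS["molecular function"]
    return -math.log(p)


def ancestors(cls):
    """Return the class itself plus all of its ancestors."""
    found = {cls}
    for parent in PARENTS[cls]:
        found |= ancestors(parent)
    return found


def resnik_similarity(c1, c2):
    """IC of the most informative common ancestor of c1 and c2."""
    common = ancestors(c1) & ancestors(c2)
    return max(information_content(c) for c in common)
```

With these counts, toxin activity comes out as far more informative than catalytic activity even though both sit at the same depth, which is the corpus-bias effect the slide warns about: a generic class that is rarely used looks specific.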
  14. 14. Measuring term specificity with structural Information Content molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 14 Lack of subclasses may be due to ontology incompleteness IC = 1 - log(subclass(c) + 1) / log(max(c)) (Seco et al., 2004)
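The structural variant needs no annotation corpus, only the class hierarchy. Below is a minimal sketch in the spirit of Seco et al. (2004), where the normalizing term is taken to be the total number of classes. The toy hierarchy reuses class names from the earlier slides, but its shape is a simplified assumption rather than the real Gene Ontology structure.

```python
import math

# Simplified toy hierarchy (class -> direct subclasses); ASSUMED structure.
SUBCLASSES = {
    "molecular function": ["catalytic activity", "binding"],
    "catalytic activity": ["lyase activity", "hydrolase activity"],
    "binding": ["ion binding", "ATP binding"],
    "ion binding": ["copper ion binding", "iron ion binding"],
    "lyase activity": [],
    "hydrolase activity": [],
    "ATP binding": [],
    "copper ion binding": [],
    "iron ion binding": [],
}


def descendants(cls):
    """All direct and indirect subclasses, as a set (safe for DAGs)."""
    found = set()
    for child in SUBCLASSES[cls]:
        found.add(child)
        found |= descendants(child)
    return found


def structural_ic(cls):
    """IC = 1 - log(subclasses(c) + 1) / log(total number of classes)."""
    total = len(SUBCLASSES)
    return 1 - math.log(len(descendants(cls)) + 1) / math.log(total)
```

Leaves score 1 and the root scores 0. That is also the weakness noted on the slide: a class with no subclasses is scored as maximally specific even when the ontology is simply incomplete below it.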
  15. 15. Impact of annotation quality Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O. (2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519. 64% incomplete annotation 23% inconsistent annotation Gene Ontology 15 98% electronic annotations
  16. 16. Impact of annotation quality Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O. (2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519. 23% inconsistent annotation 16 cytochrome-c oxidase activity cytochrome-c oxidase activity electron carrier activity cytochrome-c oxidase activity electron carrier activity heme binding cytochrome-c oxidase activity electron carrier activity heme binding copper ion binding
  17. 17. Evaluation of Semantic Similarity Measures 22k pairs of proteins Pre-computed similarities with classical measures Correlation to sequence, Pfam family and EC class 20% of new GO-based SS measures use CESSM http://xldb.di.fc.ul.pt/biotools/cessm2014/ Gene Ontology 17
  18. 18. Future Directions Explore growing semantic richness disjoint axioms different types of relationships logical definitions and cross-products Improve computational efficiency semantic similarity based searches Semantic similarity across multiple ontologies 18
  19. 19. Similarity between ontologies 19
  20. 20. Ontology Matching 20 Input: Two ontologies Output: Alignment Alignment: optimal set of mappings between the entities Mapping: relates two entities and has a score
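The definitions on this slide map naturally onto a small data structure. The sketch below is illustrative only: the `Mapping` class, the entity identifiers, the threshold, and the greedy one-to-one selection are all assumptions, and greedy selection is a common heuristic rather than a guaranteed optimal alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Relates one entity from each ontology and carries a score."""
    source: str
    target: str
    score: float


def select_alignment(candidates, threshold=0.6):
    """Greedy one-to-one selection: take mappings in descending score order,
    skipping any whose source or target entity is already mapped."""
    alignment, used_sources, used_targets = [], set(), set()
    for m in sorted(candidates, key=lambda m: m.score, reverse=True):
        if m.score < threshold:
            break  # everything after this point scores lower still
        if m.source in used_sources or m.target in used_targets:
            continue
        alignment.append(m)
        used_sources.add(m.source)
        used_targets.add(m.target)
    return alignment


# Hypothetical candidate mappings between two ontologies.
candidates = [
    Mapping("O1:leg", "O2:hind_limb", 0.95),
    Mapping("O1:leg", "O2:limb", 0.80),
    Mapping("O1:limb", "O2:limb", 0.70),
    Mapping("O1:tail", "O2:trunk", 0.40),
]
alignment = select_alignment(candidates)
```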
  21. 21. Why match Biomedical Ontologies? Salvadores et al. Semant Web. 2013; 4(3): 277–284. https://bioportal.bioontology.org/, on July, 2017 21
  22. 22. Simple Lexical Mappings are not enough High precision but low recall Mouse Anatomy - NCI Human Anatomy (OAEI Anatomy track) LOOM: 99% precision, 65% recall AML: 95% precision, 93.5% recall leg / hind limb 22
  23. 23. Simple Lexical Mappings are not enough Potential incoherences 23 Chemicals_and_Drugs_Kind Anatomical_Entity Anatomy_Kind Gingiva Gum Gingiva Faria, Daniel, et al. "Towards annotating potential incoherences in BioPortal mappings." ISWC, 2014.
  24. 24. Challenges and Opportunities in Biomedical Ontology Alignment large size rich and complex vocabulary different modeling views abundant sources of background knowledge going beyond binary matching 24
  25. 25. AgreementMakerLight Ontology Loading Ontology Matching Filtering Input Ontology 1 Input Ontology 2 Background Knowledge Final Alignment Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I. F., & Couto, F. M. (2013). The AgreementMakerLight ontology matching system. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 527-541). 25
  27. 27. Large Size HashMaps to store Lexicon and Relationships Hash-based matchers as primary matchers No similarity matrix 27
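To see why hash-based matching avoids a similarity matrix, consider the sketch below. It is not AML's implementation; the function names and the toy ontologies are assumptions. Each ontology's labels go into a hash map, and matching becomes an intersection of keys, so no pairwise comparison of all classes is needed.

```python
from collections import defaultdict


def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels collide."""
    return " ".join(label.lower().split())


def build_lexicon(ontology):
    """Map each normalized label to the set of class ids that carry it."""
    lexicon = defaultdict(set)
    for class_id, labels in ontology.items():
        for label in labels:
            lexicon[normalize(label)].add(class_id)
    return lexicon


def hash_match(source, target):
    """Return (source_id, target_id) pairs that share at least one label."""
    source_lexicon = build_lexicon(source)
    target_lexicon = build_lexicon(target)
    mappings = set()
    # Key intersection costs roughly the size of the smaller lexicon,
    # instead of |source classes| x |target classes| comparisons.
    for label in source_lexicon.keys() & target_lexicon.keys():
        for s in source_lexicon[label]:
            for t in target_lexicon[label]:
                mappings.add((s, t))
    return mappings


# Hypothetical ontologies: class id -> list of labels and synonyms.
source = {"S:1": ["Hind limb", "leg"], "S:2": ["stomach"]}
target = {"T:1": ["Leg"], "T:2": ["gastric  serosa"]}
mappings = hash_match(source, target)
```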
  28. 28. Rich and complex vocabulary Uses all labels Assigns different weights to labels Extends synonyms through the Thesaurus Matcher 28
  29. 29. stomach secretion gastric secretion gall bladder serosa biliary serosa stomach serosa Deriving new synonyms for the Thesaurus Matcher gastric stomach biliary gall bladder Synonyms Thesaurus gastric serosa gall bladder biliary New Synonyms Pesquita, C., Faria, D., Stroe, C., Santos, E., Cruz, I. F., & Couto, F. M. (2013). What’s in a ‘nym’? Synonyms in Biomedical Ontology Matching. ISWC 29
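The synonym derivation on this slide can be approximated as follows: when two labels of the same class differ only in one part, the differing parts are treated as synonyms and then substituted into other labels. This is a sketch of the idea under simplifying assumptions (whitespace tokenization, exact word matches), not the Thesaurus Matcher's actual code.

```python
import re


def derive_synonym(label_a, label_b):
    """Strip the words two synonymous labels share at either end and
    return the differing parts as a candidate synonym pair."""
    a, b = label_a.split(), label_b.split()
    while a and b and a[-1] == b[-1]:  # common suffix words
        a.pop()
        b.pop()
    while a and b and a[0] == b[0]:  # common prefix words
        a.pop(0)
        b.pop(0)
    return (" ".join(a), " ".join(b)) if a and b else None


def extend_label(label, thesaurus):
    """Generate new labels by swapping thesaurus terms in either direction."""
    new_labels = set()
    for x, y in thesaurus:
        for old, new in ((x, y), (y, x)):
            pattern = rf"\b{re.escape(old)}\b"
            if re.search(pattern, label):
                new_labels.add(re.sub(pattern, new, label))
    return new_labels


# Label pairs taken from the slide.
thesaurus = [
    derive_synonym("stomach secretion", "gastric secretion"),
    derive_synonym("gall bladder serosa", "biliary serosa"),
]
new_synonyms = extend_label("stomach serosa", thesaurus)
```

Here the thesaurus holds the pairs (stomach, gastric) and (gall bladder, biliary), and "stomach serosa" gains the new synonym "gastric serosa", matching the example on the slide.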
  30. 30. Different modeling views 30 body part surface of cell anatomical entity anatomical surface cardinal cell part surface of epithelial cell cell part cell surface
  31. 31. Different modeling views Can cause incoherences 31 body part surface of cell/ cell surface anatomical entity anatomical surface cardinal cell part/ cell part surface of epithelial cell
  32. 32. Different modeling views Repair by removing mappings 32 body part surface of cell/ cell surface anatomical entity anatomical surface surface of epithelial cell cardinal cell part cell part Santos, Emanuel, Daniel Faria, Catia Pesquita, and Francisco M. Couto. "Ontology alignment repair through modularization and confidence-based heuristics." PloS one 10, no. 12 (2015)
  33. 33. To repair or not to repair Repair can cause loss of information Information preservation vs. alignment coherence Pesquita, C. et al. (2013). Proc. of the 8th International Conference on Ontology Matching-Volume 1111 (pp. 13-24). 33
  34. 34. Visualizing incoherences 34 Catarina Martins, Ernesto Jimenez-Ruiz, Emanuel Santos and Catia Pesquita (2015) Towards visualizing the mapping incoherences in Bioportal, ICBO
  35. 35. Cross-references Mediating matchers Logical definitions Background Knowledge Mouse Anatomy NCI-Human Anatomy UBERON 35
  36. 36. Automated selection of background knowledge Mapping gain over a baseline alignment Combine multiple sources Faria, D., Pesquita, C., Santos, E., Cruz, I. F., & Couto, F. M. (2014). Automatic background knowledge selection for matching biomedical ontologies. PloS one, 9(11), e111226. 36
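One way to read "mapping gain over a baseline alignment" is as the share of new mappings a background source contributes relative to the baseline. The sketch below uses that reading; the exact definition in the cited paper may differ, and the function names, the `min_gain` threshold, and the toy alignments are all assumptions.

```python
def mapping_gain(baseline, with_source):
    """Fraction of mappings found via a background source that are new,
    relative to the size of the baseline alignment (ASSUMED definition)."""
    if not baseline:
        raise ValueError("baseline alignment must not be empty")
    return len(with_source - baseline) / len(baseline)


def select_sources(baseline, alignments_by_source, min_gain=0.02):
    """Keep sources whose gain reaches min_gain, best first."""
    gains = {
        name: mapping_gain(baseline, alignment)
        for name, alignment in alignments_by_source.items()
    }
    selected = [name for name, gain in gains.items() if gain >= min_gain]
    return sorted(selected, key=lambda name: gains[name], reverse=True)


# Hypothetical alignments as sets of (source_id, target_id) pairs.
baseline = {("a", "1"), ("b", "2")}
alignments_by_source = {
    "source_x": {("a", "1"), ("c", "3")},  # one new mapping
    "source_y": {("a", "1")},  # nothing new
}
selected = select_sources(baseline, alignments_by_source)
```

Sources that pass the threshold can then be combined, as the slide suggests, by taking the union of the mappings they contribute.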
  37. 37. Ontology Alignment Evaluation Initiative 2016 37
      Task        Precision  Recall  F-measure  Ranking
      MA-HA       0.950      0.936   0.943      1
      FMA-NCI     0.838      0.872   0.855      1
      FMA-SNOMED  0.882      0.687   0.773      1
      SNOMED-NCI  0.904      0.668   0.768      1
      HP-MP       -          -       -          Top 3
      DOID-ORDO   -          -       -          Top 3
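For readers less familiar with these metrics, the sketch below shows how precision, recall, and F-measure are computed for an alignment against a reference, and checks the F-measure against the MA-HA row. The function names are illustrative.

```python
def precision_recall(alignment, reference):
    """Precision and recall of a set of mappings against a reference set."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MA-HA row: precision 0.950, recall 0.936.
ma_ha_f = round(f_measure(0.950, 0.936), 3)  # 0.943
```

Recomputing from the rounded precision and recall can differ from the reported F-measure in the last digit, since the published values were presumably computed before rounding.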
  38. 38. 38 HP FMA PATO constricted Beyond Binary Matching Compound Ontology Matching aortic stenosis aorta
  39. 39. Compound Matching Algorithm HP:0001650 aortic stenosis PATO:0001847 constricted Step 1 FMA:3734 aorta Step 2 stenosis Remove unmapped source classes and mapped words. Selection 39 Compound Ontology Matching
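The two-step procedure on this slide can be sketched roughly as below: match the source label against the first target ontology, remove the words that were mapped, then match what remains against the second target. The lexicons are hypothetical (in particular, treating "stenosis" as a label for the PATO class and "aortic" as a label for the FMA class is an assumption), and the word-level exact matching is a simplification of the published algorithm.

```python
def match_word(words, lexicon):
    """Return (class_id, word) for the first word found in the lexicon."""
    for word in words:
        if word in lexicon:
            return lexicon[word], word
    return None, None


def compound_match(source_label, lexicon_1, lexicon_2):
    """Map one source class to a pair of target classes, or None."""
    words = source_label.lower().split()
    # Step 1: match against the first target ontology.
    target_1, matched = match_word(words, lexicon_1)
    if target_1 is None:
        return None  # unmapped source classes are dropped
    # Remove the mapped word before the second step.
    remaining = [w for w in words if w != matched]
    # Step 2: match the remaining words against the second target ontology.
    target_2, _ = match_word(remaining, lexicon_2)
    if target_2 is None:
        return None
    return target_1, target_2


# Hypothetical word-to-class lexicons for the slide's example.
pato_lexicon = {"constricted": "PATO:0001847", "stenosis": "PATO:0001847"}
fma_lexicon = {"aorta": "FMA:3734", "aortic": "FMA:3734"}
result = compound_match("aortic stenosis", pato_lexicon, fma_lexicon)
```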
  40. 40. Manual evaluation 40 Compound Ontology Matching Evaluated in 6 ontology sets with logical definitions Precision between 0.82 and 1.0 900 new candidate logical definitions Applied to Crop Ontology - Plant Ontology - PATO and Plant Trait Ontology - Plant Ontology - PATO Oliveira, D. and Pesquita, C. (2015) Compound Matching of Biomedical Ontologies. ICBO
  41. 41. AML in action Life sciences: Global Agricultural Concept Scheme (FAO); Mapping the Crop Ontology to references; Integration of pharmacological vocabularies (Jansen Pharma); Comp. of PhenomeNET for ontology matching (Garcia et al., 2016). Healthcare: Semantic knowledge-base for public healthcare system (India); Translation of SNOMED-CT (Silva et al., 2015); Antibiotic resistance monitoring. Geospatial and environmental: Satellite Data Semantic Interoperability (Abburu, 2015); Mapping SWEET to ENVO. Others: Comp. of eXtreme Design methodology (Dragisic et al., 2015); Business process matching (Bahkshandeh et al., 2015). 41
  42. 42. Clustering with Semantic Similarity across Multiple Ontologies https://github.com/csalexandre/SESAME.git 42 Annotation to Multiple Ontologies BioPortal Match Ontologies AML Calculate Semantic Similarity SML Clustering in Semantic Space WEKA SESAME
  44. 44. Acknowledgements Daniel Faria, IGC, Portugal Francisco Couto, U. Lisboa, Portugal Isabel Cruz, U. Illinois, USA Emanuel Santos, RMIT University, Vietnam Daniela Oliveira, Insight Centre, Ireland Catarina Martins, University of Manchester, UK Carlos A. Santos, U. Lisboa, Portugal and many others 44
  45. 45. https://github.com/AgreementMakerLight clpesquita@fc.ul.pt 45
