SENSE AND SIMILARITY
making sense of similarity for ontologies
Catia Pesquita
LASIGE, Faculdade de Ciências, Universidade de Lisboa
20th Bio-Ontologies@ISMB 2017
1
Similarity
Shepard, 1957
Points in space
Distance
Tversky, 1977
Sets of features
Commonalities and differences
3
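The two views of similarity above can be sketched in a few lines. This is only an illustration: the exponential decay with distance is one common way to turn a distance into a similarity, and the ratio form with weights `alpha` and `beta` is the usual statement of Tversky's feature-based model. The function names and default parameters are assumptions made for this sketch.

```python
import math

def distance_similarity(x, y, sensitivity=1.0):
    """Geometric view: objects are points, similarity decays with distance.

    The exponential decay and the `sensitivity` parameter are illustrative choices.
    """
    return math.exp(-sensitivity * math.dist(x, y))

def tversky_similarity(a, b, alpha=0.5, beta=0.5):
    """Feature view: weigh common features against distinctive ones."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + beta * only_b
    # Two empty feature sets share nothing; return 0.0 by convention here.
    return common / denominator if denominator else 0.0

# Identical points are maximally similar.
print(distance_similarity((0.0, 0.0), (0.0, 0.0)))
# One shared feature, one distinctive feature on each side.
print(tversky_similarity({"wings", "beak"}, {"beak", "fur"}))
```

With `alpha = beta = 1` the feature-based score reduces to the Jaccard index.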
Ahoj!
Hallo!
Representation of objects in Biology
4
Outline
Similarity within an ontology
Class similarity
Annotated entities similarity
Challenges and opportunities
Similarity between ontologies
Biomedical Ontology matching
Challenges and opportunities
AgreementMakerLight
5
Similarity within an ontology
6
Why Semantic Similarity for Biomedical Ontologies?
7
validating protein-protein interactions (Jain & Bader, 2010)
evaluating functional coherence of gene sets (Bastos et al, 2013)
classification of chemical compounds (Ferreira et al, 2013)
calculating similarity of clinical models (Gøeg et al, 2015)
diagnosing patients (Köhler et al, 2009)
suggesting candidate genes involved in diseases (Li et al., 2011)
Semantic Similarity in Biomedical Ontologies
molecular function
    catalytic activity
        lyase activity
        hydrolase activity
    binding
        ion binding
            copper ion binding
            iron ion binding
        ATP binding
8
Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity
in biomedical ontologies. PLoS computational biology, 5(7), e1000443.
How to measure class specificity?
10
(Lord et al. 2003)
How to address annotation quality impact?
Measuring class specificity with depth
molecular function
    toxin activity (9)
    catalytic activity (369044)
        ...
            cytochrome-c oxidase activity (2066)
12
Variable semantic specificity at same depth
Measuring term specificity with corpus-based Information Content
Corpus-bias effect of rarely used but generic classes
Not all ontologies have annotations
13
IC = -log p(c)
(Resnik, 1995)
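A minimal sketch of the corpus-based measure, assuming p(c) is estimated as the share of annotations that fall on c or any of its descendants. The hierarchy and the annotation counts below are hypothetical toy values, not the figures from the slides. The last function shows how the same IC is typically used for class similarity in Resnik's measure: the IC of the most informative common ancestor.

```python
import math

# Hypothetical toy hierarchy (class -> direct parents); not real GO data.
PARENTS = {
    "molecular function": [],
    "binding": ["molecular function"],
    "catalytic activity": ["molecular function"],
    "ion binding": ["binding"],
    "copper ion binding": ["ion binding"],
    "iron ion binding": ["ion binding"],
}
# Hypothetical numbers of annotations made directly to each class.
DIRECT_COUNTS = {
    "molecular function": 0, "binding": 40, "catalytic activity": 100,
    "ion binding": 20, "copper ion binding": 10, "iron ion binding": 30,
}
ROOT = "molecular function"

def ancestors(c):
    """Return c together with all of its ancestors."""
    seen, stack = {c}, [c]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagated_counts():
    """An annotation to a class also counts for every ancestor of that class."""
    totals = dict.fromkeys(PARENTS, 0)
    for c, n in DIRECT_COUNTS.items():
        for a in ancestors(c):
            totals[a] += n
    return totals

def information_content(c, totals):
    """IC = -log p(c); assumes every class has at least one propagated annotation."""
    return -math.log(totals[c] / totals[ROOT])

def resnik(c1, c2, totals):
    """IC of the most informative common ancestor."""
    return max(information_content(a, totals) for a in ancestors(c1) & ancestors(c2))

totals = propagated_counts()
print(information_content("ion binding", totals))
print(resnik("copper ion binding", "iron ion binding", totals))
```

Classes with no annotations at all would make p(c) zero, which is one practical face of the "not all ontologies have annotations" caveat.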
Measuring term specificity with structural Information Content
14
Lack of subclasses may be due to ontology incompleteness
IC = 1 - log(subclass(c) + 1) / log(max(c))
(Seco et al., 2004)
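The structural formula can be sketched as follows, assuming subclass(c) counts all descendants of c and the normalising term is the total number of classes in the ontology. The toy hierarchy is hypothetical.

```python
import math

# Hypothetical toy hierarchy (class -> direct subclasses); not real GO data.
CHILDREN = {
    "molecular function": ["binding", "catalytic activity"],
    "binding": ["ion binding"],
    "catalytic activity": [],
    "ion binding": ["copper ion binding", "iron ion binding"],
    "copper ion binding": [],
    "iron ion binding": [],
}

def descendants(c):
    """All direct and indirect subclasses of c (c itself excluded)."""
    seen, stack = set(), list(CHILDREN[c])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(CHILDREN[node])
    return seen

def structural_ic(c):
    """IC = 1 - log(subclasses + 1) / log(total classes); needs no annotations."""
    return 1 - math.log(len(descendants(c)) + 1) / math.log(len(CHILDREN))

for name in ("molecular function", "ion binding", "copper ion binding"):
    print(name, structural_ic(name))
```

The root scores 0 and every leaf scores 1, so a class that is a leaf only because the ontology is incomplete gets the maximum score, which is the caveat noted above.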
Impact of annotation quality
Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O.
(2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519.
Gene Ontology
64% incomplete annotation
23% inconsistent annotation
98% electronic annotations
15
23% inconsistent annotation
16
cytochrome-c oxidase activity
cytochrome-c oxidase activity
electron carrier activity
cytochrome-c oxidase activity
electron carrier activity
heme binding
cytochrome-c oxidase activity
electron carrier activity
heme binding
copper ion binding
Evaluation of Semantic Similarity Measures
22k pairs of proteins
Pre-computed similarities with classical measures
Correlation to sequence, PFam family and EC class
20% of new GO-based SS measures use CESSM
http://xldb.di.fc.ul.pt/biotools/cessm2014/
17
Gene Ontology
Future Directions
Explore growing semantic richness
disjoint axioms
different types of relationships
logical definitions and cross-products
Improve computational efficiency
semantic similarity based searches
Semantic similarity across multiple ontologies
18
Similarity between ontologies
19
Ontology Matching
20
Input: Two ontologies
Output: Alignment
Alignment: optimal set of mappings between the entities
Mapping: relates two entities and has a score
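These definitions map naturally onto a small data structure. The sketch below is an assumption-laden illustration: the class identifiers and scores are made up, and the greedy one-to-one selection is a simple heuristic standing in for whatever "optimal" selection a real matcher performs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mapping:
    """Relates one source entity to one target entity with a confidence score."""
    source: str
    target: str
    score: float

def select_alignment(candidates, threshold=0.6):
    """Greedy one-to-one selection: best-scoring mappings first.

    The threshold value and the greedy strategy are illustrative assumptions.
    """
    chosen, used_sources, used_targets = [], set(), set()
    for m in sorted(candidates, key=lambda m: m.score, reverse=True):
        if m.score < threshold:
            break
        if m.source in used_sources or m.target in used_targets:
            continue
        chosen.append(m)
        used_sources.add(m.source)
        used_targets.add(m.target)
    return chosen

# Hypothetical candidate mappings between two anatomy ontologies.
candidates = [
    Mapping("src:hind_limb", "tgt:leg", 0.90),
    Mapping("src:hind_limb", "tgt:limb", 0.70),
    Mapping("src:forelimb", "tgt:limb", 0.65),
    Mapping("src:tail", "tgt:leg", 0.30),
]
for m in select_alignment(candidates):
    print(m)
```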
Why match Biomedical Ontologies?
Salvadores et al. Semant Web. 2013; 4(3): 277–284.
https://bioportal.bioontology.org/, on July, 2017
21
Simple Lexical Mappings are not enough
High precision but low recall
Mouse Anatomy - NCI Human Anatomy (OAEI Anatomy track)
LOOM: 99% precision, 65% recall
AML: 95% precision, 93.5% recall
leg / hind limb
22
Simple Lexical Mappings are not enough
Potential incoherences
23
Chemicals_and_Drugs_Kind
Anatomical_Entity
Anatomy_Kind
Gingiva Gum Gingiva
Faria, Daniel, et al. "Towards annotating potential incoherences in BioPortal mappings." ISWC, 2014.
Challenges and Opportunities in Biomedical Ontology Alignment
large size
rich and complex vocabulary
different modeling views
abundant sources of background knowledge
going beyond binary matching
24
AgreementMakerLight
Input Ontology 1 + Input Ontology 2 -> Ontology Loading -> Ontology Matching (with Background Knowledge) -> Filtering -> Final Alignment
Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I. F., & Couto, F. M. (2013). The AgreementMakerLight ontology matching system. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 527-541).
25
26
Large Size
HashMaps to store Lexicon and Relationships
Hash-based matchers as primary matchers
No similarity matrix
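The idea behind hash-based matching can be sketched as an inverted index from normalised labels to classes, so that candidate mappings come from dictionary lookups rather than from an all-against-all similarity matrix. This is not AML's actual code; the function names, the normalisation rule, and the example identifiers are assumptions made for illustration.

```python
from collections import defaultdict

def normalize(label):
    """Lower-case, treat underscores as spaces, collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())

def build_lexicon(ontology):
    """Index an ontology (class id -> list of labels) by normalised label."""
    lexicon = defaultdict(set)
    for class_id, labels in ontology.items():
        for label in labels:
            lexicon[normalize(label)].add(class_id)
    return lexicon

def hash_match(source, target):
    """Return (source id, target id) pairs that share at least one label."""
    source_lexicon = build_lexicon(source)
    target_lexicon = build_lexicon(target)
    mappings = set()
    for label, source_ids in source_lexicon.items():
        # One dictionary lookup per label instead of comparing against every target class.
        for target_id in target_lexicon.get(label, ()):
            for source_id in source_ids:
                mappings.add((source_id, target_id))
    return mappings

# Hypothetical class identifiers and labels.
source = {"S:1": ["hind limb", "leg"], "S:2": ["tail"]}
target = {"T:1": ["Leg"], "T:2": ["Arm"]}
print(hash_match(source, target))
```

The cost grows with the number of labels rather than with the product of the two ontology sizes, which is what makes the approach attractive for large ontologies.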
27
Rich and complex vocabulary
Uses all labels
Assigns different weights to labels
Extends synonyms through the Thesaurus Matcher
28
Deriving new synonyms for the Thesaurus Matcher
Synonyms: stomach secretion / gastric secretion; gall bladder serosa / biliary serosa
Thesaurus: gastric / stomach; biliary / gall bladder
New Synonyms: stomach serosa / gastric serosa
Pesquita, C., Faria, D., Stroe, C., Santos, E., Cruz, I. F., & Couto, F. M. (2013). What’s in a ‘nym’? Synonyms in Biomedical Ontology Matching. ISWC
29
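The synonym derivation on this slide can be sketched roughly as follows: when two labels of the same class differ only in part, the differing parts are recorded as thesaurus entries, which are then substituted into other labels to produce new synonyms. The function names and the prefix/suffix stripping rule are assumptions for this sketch, not the published algorithm.

```python
def derive_thesaurus(synonym_pairs):
    """Extract the differing word spans from pairs of synonymous labels."""
    thesaurus = set()
    for a, b in synonym_pairs:
        words_a, words_b = a.split(), b.split()
        # Strip shared leading words, then shared trailing words.
        while words_a and words_b and words_a[0] == words_b[0]:
            words_a, words_b = words_a[1:], words_b[1:]
        while words_a and words_b and words_a[-1] == words_b[-1]:
            words_a, words_b = words_a[:-1], words_b[:-1]
        if words_a and words_b:
            thesaurus.add((" ".join(words_a), " ".join(words_b)))
    return thesaurus

def extend_label(label, thesaurus):
    """Generate new synonyms by swapping thesaurus terms inside a label."""
    padded = f" {label} "
    new_labels = set()
    for x, y in thesaurus:
        for old, replacement in ((x, y), (y, x)):
            if f" {old} " in padded:
                new_labels.add(padded.replace(f" {old} ", f" {replacement} ").strip())
    new_labels.discard(label)
    return new_labels

pairs = [
    ("stomach secretion", "gastric secretion"),
    ("gall bladder serosa", "biliary serosa"),
]
thesaurus = derive_thesaurus(pairs)
print(thesaurus)
print(extend_label("stomach serosa", thesaurus))
```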
Different modeling views
30
body part
surface of cell
anatomical entity
anatomical surface
cardinal cell part
surface of epithelial cell
cell part
cell surface
Different modeling views
Can cause incoherences
31
body part
surface of cell / cell surface
anatomical entity
anatomical surface
cardinal cell part / cell part
surface of epithelial cell
Different modeling views
Repair by removing mappings
32
body part
surface of cell / cell surface
anatomical entity
anatomical surface
surface of epithelial cell
cardinal cell part
cell part
Santos, Emanuel, Daniel Faria, Catia Pesquita, and Francisco M. Couto. "Ontology alignment repair
through modularization and confidence-based heuristics." PloS one 10, no. 12 (2015)
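A much simplified sketch of confidence-based repair: assuming the sets of mappings that jointly cause an incoherence have already been computed (for example by a reasoner), the lowest-confidence mapping in each unresolved set is removed. The conflict set and the confidence values are hypothetical, and the cited work additionally relies on modularization, which is not shown here.

```python
def repair(alignment, conflict_sets):
    """Remove the weakest mapping from every conflict set that is still fully present.

    alignment: dict of (source, target) -> confidence
    conflict_sets: list of sets of (source, target) keys that are jointly incoherent
    """
    kept = dict(alignment)
    for conflict in conflict_sets:
        # A conflict is already resolved if an earlier removal broke it.
        if all(mapping in kept for mapping in conflict):
            weakest = min(conflict, key=lambda mapping: kept[mapping])
            del kept[weakest]
    return kept

# Hypothetical confidences for the two mappings shown on the slide.
alignment = {
    ("surface of cell", "cell surface"): 0.95,
    ("cardinal cell part", "cell part"): 0.80,
}
conflicts = [{("surface of cell", "cell surface"), ("cardinal cell part", "cell part")}]
print(repair(alignment, conflicts))
```

The result depends on the order in which conflict sets are processed, and every removal trades information for coherence.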
To repair or not to repair
Repair can cause loss of information
Information preservation vs. alignment coherence
Pesquita, C. et al. (2013). Proc. of the 8th International Conference on Ontology Matching, Volume 1111 (pp. 13-24).
33
Visualizing incoherences
34
Catarina Martins, Ernesto Jimenez-Ruiz, Emanuel Santos and Catia Pesquita (2015)
Towards visualizing the mapping incoherences in Bioportal, ICBO
Cross-references
Mediating matchers
Logical definitions
Background Knowledge
Mouse Anatomy
NCI-Human Anatomy
UBERON
35
Automated selection of background knowledge
Mapping gain over a baseline alignment
Combine multiple sources
Faria, D., Pesquita, C., Santos, E., Cruz, I. F., & Couto, F. M. (2014). Automatic background knowledge selection for matching biomedical ontologies. PloS one, 9(11), e111226.
36
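One way to read "mapping gain over a baseline alignment" is the fraction of new mappings that a background source contributes relative to the baseline. The sketch below uses that reading, which is an assumption; the cited paper's exact definition and thresholds may differ. The source names, mappings, and threshold are hypothetical.

```python
def mapping_gain(baseline, with_source):
    """New mappings contributed by a source, relative to the baseline size."""
    new = with_source - baseline
    if not baseline:
        return float("inf") if new else 0.0
    return len(new) / len(baseline)

def select_sources(baseline, alignments_by_source, threshold=0.1):
    """Keep sources whose gain reaches the (illustrative) threshold, best first."""
    gains = {name: mapping_gain(baseline, alignment)
             for name, alignment in alignments_by_source.items()}
    selected = [name for name, gain in gains.items() if gain >= threshold]
    return sorted(selected, key=lambda name: gains[name], reverse=True)

baseline = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")}
by_source = {
    "source_A": baseline | {("e", "5"), ("f", "6")},  # two new mappings
    "source_B": baseline,                              # nothing new
}
print(mapping_gain(baseline, by_source["source_A"]))
print(select_sources(baseline, by_source))
```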
Ontology Alignment Evaluation Initiative 2016
37
Task        Precision  Recall  F-measure  Ranking
MA-HA       0.950      0.936   0.943      1
FMA-NCI     0.838      0.872   0.855      1
FMA-SNOMED  0.882      0.687   0.773      1
SNOMED-NCI  0.904      0.668   0.768      1
HP-MP       -          -       -          Top 3
DOID-ORDO   -          -       -          Top 3
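For reference, the three measures in the table are computed by comparing an alignment with a reference alignment, and the F-measure is the harmonic mean of precision and recall. The mappings in the sketch are hypothetical; only the MA-HA precision and recall values are taken from the table.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def evaluate(alignment, reference):
    """Return (precision, recall, F-measure) for sets of (source, target) mappings."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall, f_measure(precision, recall)

# Hypothetical alignment: 3 of 4 produced mappings are in a reference of 5.
alignment = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "9")}
reference = {("a", "1"), ("b", "2"), ("c", "3"), ("e", "5"), ("f", "6")}
print(evaluate(alignment, reference))

# Recomputing the MA-HA F-measure from the precision and recall in the table.
print(round(f_measure(0.950, 0.936), 3))
```

Because the table reports rounded precision and recall, recomputed F-measures can differ from the reported ones in the last digit.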
38
Beyond Binary Matching
Compound Ontology Matching
HP: aortic stenosis
FMA: aorta
PATO: constricted
Compound Matching Algorithm
Source class: HP:0001650 aortic stenosis
Step 1: match the source class to FMA:3734 aorta
Remove unmapped source classes and mapped words (leaving "stenosis")
Step 2: match the remaining words to PATO:0001847 constricted
Selection
39
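The two-step procedure can be sketched as follows. This is a simplified illustration, not the published algorithm: matching is reduced to whole-word overlap, the final selection step is omitted, and the extra labels "aortic" and "stenosis" attached to the target classes are hypothetical synonyms added so that plain word matching succeeds.

```python
def best_word_match(words, target):
    """Find the target class with the longest label fully contained in `words`.

    target: dict of class id -> list of labels. Returns (class id, matched words) or None.
    """
    best = None
    for class_id, labels in target.items():
        for label in labels:
            label_words = set(label.lower().split())
            if label_words <= words and (best is None or len(label_words) > len(best[1])):
                best = (class_id, label_words)
    return best

def compound_match(source_label, target1, target2):
    """Map one source class to a pair of classes from two target ontologies."""
    words = set(source_label.lower().split())
    step1 = best_word_match(words, target1)
    if step1 is None:
        return None  # unmapped source classes are dropped
    remaining = words - step1[1]  # remove the words already mapped in step 1
    step2 = best_word_match(remaining, target2)
    if step2 is None:
        return None
    return step1[0], step2[0]

# Identifiers come from the slide; the second label of each class is a hypothetical synonym.
fma = {"FMA:3734": ["aorta", "aortic"]}
pato = {"PATO:0001847": ["constricted", "stenosis"]}
print(compound_match("aortic stenosis", fma, pato))
```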
Compound Ontology Matching
Manual evaluation
40
Compound Ontology Matching
Evaluated in 6 ontology sets with logical definitions
Precision between 0.82 and 1.0
900 new candidate logical definitions
Applied to Crop Ontology - Plant Ontology - PATO and Plant Trait Ontology - Plant Ontology - PATO
Oliveira, D. and Pesquita, C. (2015) Compound Matching of Biomedical Ontologies. ICBO
AML in action
Life sciences
Global Agricultural Concept Scheme (FAO)
Mapping the Crop Ontology to references
Integration of pharmacological vocabularies (Jansen Pharma)
Comp. of PhenomeNET for ontology matching (Garcia et al, 2016)
Healthcare
Semantic knowledge-base for public healthcare system (India)
Translation of SNOMED-CT (Silva et al. 2015)
Antibiotic resistance monitoring
Geospatial and environmental
Satellite Data Semantic Interoperability (Abburu, 2015)
Mapping SWEET to ENVO
Others
Comp. of eXtreme Design methodology (Dragisic et al. 2015)
Business process matching (Bahkshandeh et al., 2015)
41
Clustering with Semantic Similarity across Multiple Ontologies
https://github.com/csalexandre/SESAME.git
42
SESAME
Annotation to Multiple Ontologies (BioPortal)
Match Ontologies (AML)
Calculate Semantic Similarity (SML)
Clustering in Semantic Space (WEKA)
Acknowledgements
Daniel Faria, IGC, Portugal
Francisco Couto, U. Lisboa, Portugal
Isabel Cruz, U. Illinois, USA
Emanuel Santos, RMIT University, Vietnam
Daniela Oliveira, Insight Centre, Ireland
Catarina Martins, University of Manchester, UK
Carlos A. Santos, U. Lisboa, Portugal
and many others
44
https://github.com/AgreementMakerLight
clpesquita@fc.ul.pt
45

More Related Content

Similar to Sense and Similarity: making sense of similarity for ontologies

Systems Biology & Pharmacology from a Structural Perspective
Systems Biology & Pharmacology from a Structural PerspectiveSystems Biology & Pharmacology from a Structural Perspective
Systems Biology & Pharmacology from a Structural PerspectivePhilip Bourne
 
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...CSCJournals
 
Knowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsKnowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsCatia Pesquita
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyBarry Smith
 
download
downloaddownload
downloadbutest
 
PublicationsJan_2017
PublicationsJan_2017PublicationsJan_2017
PublicationsJan_2017Peter Rogan
 
Pep Talk San Diego 011311
Pep Talk San Diego 011311Pep Talk San Diego 011311
Pep Talk San Diego 011311Philip Bourne
 
API-Centric Data Integration for Human Genomics Reference Databases: Achieve...
 API-Centric Data Integration for Human Genomics Reference Databases: Achieve... API-Centric Data Integration for Human Genomics Reference Databases: Achieve...
API-Centric Data Integration for Human Genomics Reference Databases: Achieve...Genomika Diagnósticos
 
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...Semantics of and for the diversity of life:
 Opportunities and perils of tryi...
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...Hilmar Lapp
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisJosh Neufeld
 
Identification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesIdentification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesYoann Pageaud
 
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...taxonbytes
 
Protein association networks with STRING
Protein association networks with STRINGProtein association networks with STRING
Protein association networks with STRINGLars Juhl Jensen
 
Quorum sensing in Archaea
Quorum sensing in ArchaeaQuorum sensing in Archaea
Quorum sensing in ArchaeaZahra Naz
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsTim Clark
 
importance of pathogenomics in plant pathology
importance of pathogenomics in plant pathologyimportance of pathogenomics in plant pathology
importance of pathogenomics in plant pathologyvinay ju
 

Similar to Sense and Similarity: making sense of similarity for ontologies (20)

Systems Biology & Pharmacology from a Structural Perspective
Systems Biology & Pharmacology from a Structural PerspectiveSystems Biology & Pharmacology from a Structural Perspective
Systems Biology & Pharmacology from a Structural Perspective
 
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
 
Knowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applicationsKnowledge Science for AI-based biomedical and clinical applications
Knowledge Science for AI-based biomedical and clinical applications
 
Introduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental BiologyIntroduction to Ontologies for Environmental Biology
Introduction to Ontologies for Environmental Biology
 
download
downloaddownload
download
 
PublicationsJan_2017
PublicationsJan_2017PublicationsJan_2017
PublicationsJan_2017
 
Pep Talk San Diego 011311
Pep Talk San Diego 011311Pep Talk San Diego 011311
Pep Talk San Diego 011311
 
API-Centric Data Integration for Human Genomics Reference Databases: Achieve...
 API-Centric Data Integration for Human Genomics Reference Databases: Achieve... API-Centric Data Integration for Human Genomics Reference Databases: Achieve...
API-Centric Data Integration for Human Genomics Reference Databases: Achieve...
 
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...Semantics of and for the diversity of life:
 Opportunities and perils of tryi...
Semantics of and for the diversity of life:
 Opportunities and perils of tryi...
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Identification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesIdentification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databases
 
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...
Franz 2016 Phenotype RCN Representing Taxonomy and Phylogeny as Logically Tra...
 
Text and data integration
Text and data integrationText and data integration
Text and data integration
 
Protein association networks with STRING
Protein association networks with STRINGProtein association networks with STRING
Protein association networks with STRING
 
D03011027030
D03011027030D03011027030
D03011027030
 
Quorum sensing in Archaea
Quorum sensing in ArchaeaQuorum sensing in Archaea
Quorum sensing in Archaea
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical Communications
 
Pathogen Genome Data
Pathogen Genome DataPathogen Genome Data
Pathogen Genome Data
 
bioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics databioinformatics enabling knowledge generation from agricultural omics data
bioinformatics enabling knowledge generation from agricultural omics data
 
importance of pathogenomics in plant pathology
importance of pathogenomics in plant pathologyimportance of pathogenomics in plant pathology
importance of pathogenomics in plant pathology
 

Recently uploaded

SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 

Recently uploaded (17)

SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 

Sense and Similarity: making sense of similarity for ontologies

  • 1. SENSE AND SIMILARITY making sense of similarity for ontologies Catia Pesquita LASIGE, Faculdade de Ciências, Universidade de Lisboa 20th Bio-Ontologies@ISMB 2017 1
  • 3. Similarity Shepherd, 1957 Points in space Distance Tversky, 1977 Sets of features Commonalities and differences 3 Ahoj! Hallo!
  • 5. Outline Similarity within an ontology Class similarity Annotated entities similarity Challenges and opportunities Similarity between ontologies Biomedical Ontology matching Challenges and opportunities AgreementMakerLight 5
  • 6. Similarity within an ontology 6
  • 7. Why Semantic Similarity for Biomedical Ontologies? 7 validate protein-protein interactions (Jain & Bader, 2010) evaluating functional coherence of gene sets (Bastos et al, 2013) classification of chemical compounds (Ferreira et al, 2013) calculating similarity of clinical models (Gøeg et al, 2015) diagnosing patients (Köhler et al, 2009) suggesting candidate genes involved in diseases (Li et al., 2011)
  • 8. Semantic Similarity in Biomedical Ontologies lyase actitvity hydrolase actitvity molecular function catalytic activity binding ion binding copper ion binding ATP binding iron ion binding 8 Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLoS computational biology, 5(7), e1000443.
  • 9. Semantic Similarity in Biomedical Ontologies lyase actitvity hydrolase actitvity molecular function catalytic activity binding ion binding copper ion binding ATP binding iron ion binding 9 Pesquita, C., Faria, D., Falcao, A. O., Lord, P., & Couto, F. M. (2009). Semantic similarity in biomedical ontologies. PLoS computational biology, 5(7), e1000443. How to measure class specificity?
  • 10. Semantic Similarity in Biomedical Ontologies lyase actitvity hydrolase actitvity molecular function catalytic activity binding ion binding copper ion binding ATP binding iron ion binding 10(Lord et al. 2003)
  • 11. Semantic Similarity in Biomedical Ontologies lyase actitvity hydrolase actitvity molecular function catalytic activity binding ion binding copper ion binding ATP binding iron ion binding 11 How to address annotation quality impact?
  • 12. Measuring class specificity with depth molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 12 Variable semantic specificity at same depth
  • 13. Measuring term specificity with corpus-based Information Content Corpus-bias effect of rarely used but generic classes Not all ontologies have annotations molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 13 IC = -log p(c) (Resnik, 1995)
  • 14. Measuring term specificity with structural Information Content molecular function toxin activity (9) catalytic activity (369044) ... ... ... cytochrome-c oxidase activity (2066) ... ... 14 Lack of subclasses may be due to ontology incompleteness (Seco et al., 2004) IC = 1- log(subclass(c) + 1) log(max(c))
  • 15. Impact of annotation quality Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O. (2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519. 64% incomplete annotation 23% inconsistent annotation Gene Ontology 15 98% electronic annotations
  • 16. Impact of annotation quality Faria, D., Schlicker, A., Pesquita, C., Bastos, H., Ferreira, A. E., Albrecht, M., & Falcão, A. O. (2012). Mining GO annotations for improving annotation consistency. PloS one, 7(7), e40519. 23% inconsistent annotation 16 cytochrome-c oxidase activity cytochrome-c oxidase activity electron carrier activity cytochrome-c oxidase activity electron carrier activity heme binding cytochrome-c oxidase activity electron carrier activity heme binding copper ion binding
  • 17. Evaluation of Semantic Similarity Measures 22k pairs of proteins Pre-computed similarities with classical measures Correlation to sequence, PFam family and EC class 20% of new GO-based SS measures use CESSM 17http://xldb.di.fc.ul.pt/biotools/cessm2014/ Gene Ontology
  • 18. Future Directions Explore growing semantic richness disjoint axioms different types of relationships logical definitions and cross-products Improve computational efficiency semantic similarity based searches Semantic similarity across multiple ontologies 18
  • 20. Ontology Matching 20 Input: Two ontologies Output: Alignment Alignment: optimal set of mappings between the entities Mapping: relates two entities and has a score
  • 21. Why match Biomedical Ontologies? Salvadores et al. Semant Web. 2013; 4(3): 277–284. https://bioportal.bioontology.org/, on July, 2017 21
  • 22. Simple Lexical Mappings are not enough High precision but low recall Mouse Anatomy - NCI Human Anatomy (OAEI Anatomy track) LOOM: 99% precision, 65% recall AML: 95% precision, 93.5% recall leghind limb 22
  • 23. Simple Lexical Mappings are not enough Potential incoherences 23 Chemicals_and _Drugs_Kind Anatomical_ Entity Anatomy_Kind Gingiva Gum Gingiva Faria, Daniel, et al. "Towards annotating potential incoherences in BioPortal mappings." ISWC, 2014.
  • 24. Challenges and Opportunities in Biomedical Ontology Alignment large size rich and complex vocabulary different modeling views abundant sources of background knowledge going beyond binary matching 24
  • 25. AgreementMakerLight Ontology Loading Ontology Matching Filtering Input Ontology 1 Input Ontology 2 Background Knowledge Final Alignment Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I. F., & Couto, F. M. (2013). The agreementmakerlight ontology matching system. In OTM Confederated International Conferences" On the Move to Meaningful Internet Systems" (pp. 527-541). 25
  • 26. 26
  • 27. Large Size HashMaps to store Lexicon and Relationships Hash-based matchers as primary matchers No similarity matrix 27
  • 28. Rich and complex vocabulary Uses all labels Assigns different weights to labels Extends synonyms through the Thesaurus Matcher 28
  • 29. stomach secretion gastric secretion gall bladder serosa biliary serosa stomach serosa Deriving new synonyms for the Thesaurus Matcher gastric stomach biliary gall bladder Synonyms Thesaurus gastric serosa gall bladder biliary New Synonyms Pesquita, C., Faria, D., Stroe, C., Santos, E., Cruz, I. F., & Couto, F. M. (2013). What’s in a ‘nym’? Synonyms in Biomedical Ontology Matching. ISWC 29
  • 30. Different modeling views. [Figure: two class hierarchies with the labels body part, surface of cell, anatomical entity, anatomical surface, cardinal cell part, surface of epithelial cell, cell part, cell surface]
  • 31. Different modeling views can cause incoherences. [Figure: the hierarchies joined by the mappings surface of cell / cell surface and cardinal cell part / cell part; other labels: body part, anatomical entity, anatomical surface, surface of epithelial cell]
  • 32. Different modeling views: repair by removing mappings. [Figure: the mapping surface of cell / cell surface is kept; cardinal cell part and cell part are no longer mapped; other labels: body part, anatomical entity, anatomical surface, surface of epithelial cell] Santos, Emanuel, Daniel Faria, Catia Pesquita, and Francisco M. Couto. "Ontology alignment repair through modularization and confidence-based heuristics." PloS one 10, no. 12 (2015).
  • 33. To repair or not to repair: repair can cause loss of information. Information preservation vs. alignment coherence. Pesquita, C. et al. (2013). Proc. of the 8th International Conference on Ontology Matching, Volume 1111 (pp. 13-24).
  • 34. Visualizing incoherences. Catarina Martins, Ernesto Jimenez-Ruiz, Emanuel Santos and Catia Pesquita (2015). Towards visualizing the mapping incoherences in BioPortal. ICBO.
  • 35. Background Knowledge: cross-references; mediating matchers; logical definitions. [Figure: Mouse Anatomy, NCI-Human Anatomy, UBERON]
  • 36. Automated selection of background knowledge: mapping gain over a baseline alignment; combine multiple sources. Faria, D., Pesquita, C., Santos, E., Cruz, I. F., & Couto, F. M. (2014). Automatic background knowledge selection for matching biomedical ontologies. PloS one, 9(11), e111226.
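The slide does not spell out how mapping gain is computed. The sketch below assumes one plausible reading: the number of mappings a background knowledge source adds beyond the baseline alignment, relative to the size of the baseline. The exact definition in the cited paper may differ, and the threshold value, the function names and the source name "OtherSource" are hypothetical.

```python
def mapping_gain(candidate, baseline):
    """Assumed definition: new mappings relative to the baseline alignment size."""
    baseline = set(baseline)
    if not baseline:
        raise ValueError("baseline alignment must not be empty")
    new_mappings = set(candidate) - baseline
    return len(new_mappings) / len(baseline)


def select_sources(alignments_by_source, baseline, threshold):
    """Return the sources whose gain reaches the threshold, best first."""
    gains = {name: mapping_gain(alignment, baseline)
             for name, alignment in alignments_by_source.items()}
    selected = [name for name, gain in gains.items() if gain >= threshold]
    return sorted(selected, key=lambda name: gains[name], reverse=True)


# Toy data with hypothetical identifiers.
baseline = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
alignments_by_source = {
    "UBERON": {("a1", "b1"), ("a5", "b5"), ("a6", "b6")},  # adds two new mappings
    "OtherSource": {("a1", "b1"), ("a2", "b2")},            # adds nothing new
}
selected = select_sources(alignments_by_source, baseline, threshold=0.1)
```

The selected sources can then be combined, for example by taking the union of their alignments with the baseline.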
  • 37. Ontology Alignment Evaluation Initiative 2016
      Task         Precision  Recall  F-measure  Ranking
      MA-HA        0.950      0.936   0.943      1
      FMA-NCI      0.838      0.872   0.855      1
      FMA-SNOMED   0.882      0.687   0.773      1
      SNOMED-NCI   0.904      0.668   0.768      1
      HP-MP        -          -       -          Top 3
      DOID-ORDO    -          -       -          Top 3
  • 38. Beyond Binary Matching: Compound Ontology Matching. [Figure: aortic stenosis (HP), aorta (FMA), constricted (PATO)]
  • 39. Compound Ontology Matching: Compound Matching Algorithm. Source class: HP:0001650 aortic stenosis. Step 1: FMA:3734 aorta. Remove unmapped source classes and mapped words (leaving "stenosis"). Step 2: PATO:0001847 constricted. Selection.
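The two steps on this slide can be sketched as follows. This is a heavily simplified illustration, not the published algorithm: it matches on single words, the final selection step is omitted, and the extra names "aortic" and "stenosis" in the toy lexicons are hypothetical additions so that the example resolves.

```python
def compound_match(source_label, lexicon_1, lexicon_2):
    """Map one source label to a (target 1, target 2) pair of classes, or None."""
    remaining = source_label.lower().split()

    # Step 1: match the source words against the first target ontology.
    first = None
    for class_id, names in lexicon_1.items():
        matched = [word for word in remaining if word in names]
        if matched:
            first = class_id
            # Remove the mapped words before step 2.
            remaining = [word for word in remaining if word not in matched]
            break
    if first is None:
        return None  # unmapped source classes are dropped

    # Step 2: match the leftover words against the second target ontology.
    for class_id, names in lexicon_2.items():
        if any(word in names for word in remaining):
            return first, class_id
    return None


# Toy lexicons: ids and primary labels come from the slide, the second name in
# each set is a hypothetical synonym.
fma = {"FMA:3734": {"aorta", "aortic"}}
pato = {"PATO:0001847": {"constricted", "stenosis"}}
result = compound_match("aortic stenosis", fma, pato)
```

The result pairs the HP class with one FMA class and one PATO class, which is the ternary mapping the slide illustrates.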
  • 40. Compound Ontology Matching: manual evaluation. Evaluated in 6 ontology sets with logical definitions; precision between 0.82 and 1.0; 900 new candidate logical definitions. Applied to Crop Ontology - Plant Ontology - PATO and Plant Trait Ontology - Plant Ontology - PATO. Oliveira, D. and Pesquita, C. (2015). Compound Matching of Biomedical Ontologies. ICBO.
  • 41. AML in action. Life sciences: Global Agricultural Concept Scheme (FAO); mapping the Crop Ontology to references; integration of pharmacological vocabularies (Jansen Pharma); Comp. of PhenomeNET for ontology matching (Garcia et al., 2016). Healthcare: semantic knowledge base from public healthcare system (India); translation of SNOMED-CT (Silva et al., 2015); antibiotic resistance monitoring. Geospatial and environmental: Satellite Data Semantic Interoperability (Abburu, 2015); mapping SWEET to ENVO. Others: Comp. of eXtreme Design methodology (Dragisic et al., 2015); business process matching (Bahkshandeh et al., 2015).
  • 42. SESAME: Clustering with Semantic Similarity across Multiple Ontologies. Pipeline: annotation to multiple ontologies (BioPortal); match ontologies (AML); calculate semantic similarity (SML); clustering in semantic space (WEKA). https://github.com/csalexandre/SESAME.git
  • 44. Acknowledgements: Daniel Faria, IGC, Portugal; Francisco Couto, U. Lisboa, Portugal; Isabel Cruz, U. Illinois, USA; Emanuel Santos, RMIT University, Vietnam; Daniela Oliveira, Insight Centre, Ireland; Catarina Martins, University of Manchester, UK; Carlos A. Santos, U. Lisboa, Portugal; and many others.