Preliminaries Challenges LogMap Max Recall Max Precision Evaluation
LogMap
Large-scale, Logic-based and Interactive
Ontology Matching
Ernesto Jiménez-Ruiz Bernardo Cuenca Grau
Yujiao Zhou Ian Horrocks
Department of Computer Science, University of Oxford
European Conference on Artificial Intelligence (ECAI)
29 August 2012
Outline
Preliminaries
Challenges
LogMap Anatomy
Maximising recall
Maximising precision
Evaluation
Ontologies and OWL (I)
Ontologies
• Formal representation of the knowledge of a domain.
OWL 2 Language
• The Web Ontology Language (OWL) is a World Wide Web
Consortium (W3C) standard.
• OWL 2 corresponds to a decidable fragment of first-order
logic.
Ontologies and OWL (II)
OWL 2 example axioms
• JuvenileArthritis ⊑ JuvenileDisease
• PolyArthritis ≡ Arthritis ⊓ > 5 affects.Joint
• Disease ⊓ Joint ⊑ ⊥
• JuvenileIdiopathicArthritis @ “Juvenile Rheumatoid Arthritis”
Ontology mappings
Mappings are tuples ⟨e1, e2, n, ρ⟩
• e1 and e2 are entities of the ontologies O1 and O2, respectively
• n is a confidence value between 0 and 1
• ρ is the semantic relationship between e1 and e2
Formalized as OWL 2 axioms
• Where the semantic relationship ρ is one of {≡, ⊑, ⊒, ⊥}
• No extra semantics
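The mapping tuple can be pictured as a small record type. The following is only a sketch: the class name Mapping, its field names, the as_axiom helper and the confidence value 0.9 are hypothetical and are not taken from LogMap.

```python
from dataclasses import dataclass

# Relation symbols allowed in a mapping, as listed on the slide.
RELATIONS = {"≡", "⊑", "⊒", "⊥"}


@dataclass(frozen=True)
class Mapping:
    """A mapping tuple <e1, e2, n, rho> (hypothetical helper class)."""
    e1: str            # entity from O1
    e2: str            # entity from O2
    confidence: float  # n, between 0 and 1
    relation: str      # rho, one of RELATIONS

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")

    def as_axiom(self) -> str:
        """Render the mapping as a DL-style axiom string."""
        if self.relation == "⊥":
            # Disjointness: the conjunction of both classes is empty.
            return f"{self.e1} ⊓ {self.e2} ⊑ ⊥"
        return f"{self.e1} {self.relation} {self.e2}"


# Illustrative mapping; the confidence value is made up.
m = Mapping("FMA:Smegma", "NCI:Smegma", 0.9, "≡")
print(m.as_axiom())
```

Since a mapping is just an axiom over the vocabularies of O1 and O2, it can be added to the union of both ontologies without any extra semantics.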
Challenges
Why ontology matching tools?
• Ontologies are being developed by different groups, which
use different classifications and naming schemes.
• (Biomedical) ontologies may contain tens of thousands of
entities.
• FMA (78,989 classes), NCI (66,724 classes) and SNOMED
CT (306,591 classes) are prominent examples.
Challenges
Challenges to be addressed
• Sufficient scalability to deal with large ontologies
• Detect and repair errors
• Reasoning with OU := O1 ∪ O2 ∪ M may lead to (a large
number of) unsatisfiable classes (i.e., OU ⊨ A ⊑ ⊥)
• Reasoning with and repairing OU aggravates the scalability problem
• Logic-based but scalable techniques
• Involve the expert user (if accurate mappings are needed)
• Minimise number of requests
• Reduce delay between requests
Our approach in a nutshell
LogMap . . .
• can efficiently match semantically rich ontologies containing
tens (and even hundreds) of thousands of classes;
• incorporates sophisticated reasoning and repair capabilities;
• provides support for user intervention during the matching
process.
LogMap Anatomy
LogMap can be divided into two stages:
• Stage 1: maximising recall.
• The goal is to reduce search space
• and extract an overestimation of the mappings
• Stage 2: maximising precision.
• The goal is to return a set of (precise) mappings
• not leading to many logical inconsistencies.
Lexical indexation and mapping computation
Inverted Files
• The ontology lexicon is indexed in an inverted file (IF)
• Each entry in the IF is a “set” of words corresponding to
exact or partial entity labels
Inverted index for FMA
Index entry                   Class ids
{ acinus }                    6953, 7661, 8171
{ hepatic, acinus }           8171
{ acinus, mixed }             6953
{ serous, acinus }            7661
{ common, branch, artery }    1170, 7842

Ids for FMA class URIs (namespace omitted)
Class id    Class URI
6953        Mixed acinus
7661        Serous acinus
8171        Hepatic acinus
1170        Branch of common cochlear artery
7842        Branch of common interosseous artery

Inverted index for NCI
Index entry                   Class ids
{ acinus }                    18081
{ liver }                     18081
{ acinus, liver }             18081
{ common, branch, artery }    1204, 8087, 27727

Ids for NCI class URIs (namespace omitted)
Class id    Class URI
18081       Liver acinus
8087        Common iliac artery branch
27727       Common femoral artery branch
1204        Common carotid artery branch
Lexical indexation and mapping computation
Intersection of inverted files
• As a result we obtain an overestimation of the candidate
mappings (M?).
• This step considerably reduces the search space (e.g., to
19,151 candidate mappings for FMA-NCI, with Recall=0.93 and Precision=0.14).
• Note that most of them will turn out to be incorrect (e.g.,
Serous acinus ≡ Liver acinus).
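The indexing and intersection steps can be sketched in a few lines. This is a simplified illustration, not LogMap's implementation: indexing every word subset of a label is an assumption made here to mimic "exact or partial entity labels", and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(labels):
    """Map each non-empty set of label words to the ids of classes containing it."""
    index = defaultdict(set)
    for class_id, label in labels.items():
        words = sorted(set(label.lower().split()))
        # Index every non-empty subset of the label's words (only viable for short labels).
        for size in range(1, len(words) + 1):
            for subset in combinations(words, size):
                index[frozenset(subset)].add(class_id)
    return index


def candidate_mappings(index1, index2):
    """Pair up the classes that share at least one index entry."""
    candidates = set()
    for entry in index1.keys() & index2.keys():
        for id1 in index1[entry]:
            for id2 in index2[entry]:
                candidates.add((id1, id2))
    return candidates


# Class ids and labels taken from the slide tables.
fma = {6953: "Mixed acinus", 8171: "Hepatic acinus"}
nci = {18081: "Liver acinus"}

pairs = candidate_mappings(build_inverted_index(fma), build_inverted_index(nci))
print(sorted(pairs))
```

Both FMA classes are paired with the NCI class because they share the entry { acinus }, which shows why the intersection yields high recall but low precision.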
Overlapping estimation
• LogMap extracts two fragments (O′1 and O′2) representing
the overlap between the input ontologies (via M?)
• Logic-based modularization techniques are used
• Characteristics:
• Correct mappings are unlikely to involve classes outside these
fragments
• The use of fragments is key for the scalability challenge
Identifying reliable mappings
We select Mr ⊆ M?
Based on. . .
• High lexical similarity (using the string matcher ISUB)
• A principle of locality
• Correct mappings (C1 ≡ C2) are likely to have similar scopes
(classes semantically related)
• E.g. 2,281 (out of 19,151) reliable mappings for FMA-NCI
(P=0.91)
Identifying reliable mappings
FMA:Trapezoid ≡ NCI:Trapezoid (not reliable) vs
FMA:Trapezoid ≡ NCI:TrapezoidBone (reliable)
Reasoning with the reliable mappings (Mr )
Detecting and repairing unsatisfiabilities
• The mappings in Mr are typically very precise but may lead to many
unsatisfiabilities (> 600 for FMA-NCI)
• LogMap implements efficient methods to repair most of them
(it only misses two cases for FMA-NCI)
Reasoning with the reliable mappings (Mr )
Propositional Horn representation
• LogMap encodes the (classified) fragments O′1 and O′2 as
propositional Horn theories P′1 and P′2
• This is key to LogMap’s scalability
Propositional FMA (P1)
(1) Smegma → Secretion
(2) Secretion → PortionBodySubstance
(3) PortionBodySubstance → AnatomicalEntity

Propositional NCI (P2)
(8) Smegma → ExocrineGlandFluid
(9) ExocrineGlandFluid → Anatomy
(10) CellularSecretion → TransmembraneTransport
(11) TransmembraneTransport → TransportProcess
(12) TransportProcess → BiologicalProcess
(13) Anatomy ∧ BiologicalProcess → false
(14) ExocrineGlandFluid ∧ ExfolCells → Smegma

Computed mappings (PM)
(m4) FMA:Secretion → NCI:CellularSecretion
(m5) NCI:CellularSecretion → FMA:Secretion
(m6) FMA:Smegma → NCI:Smegma
(m7) NCI:Smegma → FMA:Smegma
Reasoning with the reliable mappings (Mr )
Propositional Horn SAT with Dowling-Gallier (D-G)
• LogMap implements the SAT algorithm D-G
• D-G is called for every class C on the propositional theory
PC, which consists of:
• the rule (true → C);
• the propositional representations P′1 and P′2 of the input
ontologies; and
• the propositional representation PM of the mappings.
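The satisfiability test for a single class can be sketched as counter-based forward chaining in the style of Dowling-Gallier. The rule encoding as (body, head) pairs and the function name horn_unsat are assumptions made for this illustration; the rules are those of the FMA-NCI Smegma example, with prefixes added to keep the two vocabularies apart.

```python
from collections import defaultdict, deque

FALSE = "false"


def horn_unsat(rules, start):
    """Return True if adding (true -> start) to the Horn rules derives false.

    rules: list of (body, head) pairs; body is a non-empty tuple of atoms.
    Each rule keeps a counter of body atoms not yet derived, so every rule
    is touched at most once per body atom.
    """
    remaining = [len(set(body)) for body, _ in rules]
    watchers = defaultdict(list)  # atom -> indices of rules with that atom in the body
    for i, (body, _) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    derived = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == FALSE:
            return True
        for i in watchers[atom]:
            remaining[i] -= 1
            head = rules[i][1]
            if remaining[i] == 0 and head not in derived:
                derived.add(head)
                queue.append(head)
    return False


P1 = [(("FMA:Smegma",), "FMA:Secretion"),
      (("FMA:Secretion",), "FMA:PortionBodySubstance"),
      (("FMA:PortionBodySubstance",), "FMA:AnatomicalEntity")]
P2 = [(("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
      (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
      (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
      (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
      (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
      (("NCI:Anatomy", "NCI:BiologicalProcess"), FALSE),
      (("NCI:ExocrineGlandFluid", "NCI:ExfolCells"), "NCI:Smegma")]
PM = [(("FMA:Secretion",), "NCI:CellularSecretion"),   # m4
      (("NCI:CellularSecretion",), "FMA:Secretion"),   # m5
      (("FMA:Smegma",), "NCI:Smegma"),                 # m6
      (("NCI:Smegma",), "FMA:Smegma")]                 # m7

print(horn_unsat(P1 + P2 + PM, "FMA:Smegma"))  # with the mappings
print(horn_unsat(P1 + P2, "FMA:Smegma"))       # without the mappings
```

With the mappings, FMA:Smegma reaches both NCI:Anatomy (via m6) and NCI:BiologicalProcess (via m4), which triggers rule (13); without them, nothing contradictory is derived.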
Reasoning with the reliable mappings (Mr )
Satisfiability of Smegma
Reasoning with reliable mappings (Mr )
Mapping repair
• LogMap extends D-G to record the conflicting mappings involved
in an unsatisfiability (e.g. {m4, m5, m6, m7}).
• LogMap implements a ‘greedy’ repair algorithm to compute
repairs for each unsatisfiability
• LogMap finds all repairs of “smallest” size.
• E.g.: R1 = {m4} and R2 = {m6}
• The repair with the lowest confidence is selected.
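The notion of a smallest repair can be illustrated on the Smegma example. The sketch below is not LogMap's greedy algorithm: it enumerates mapping subsets by brute force (exponential in the worst case), uses a naive fixpoint instead of D-G, and compares repairs by summed confidence. The confidence values and all function names are hypothetical.

```python
from itertools import combinations

FALSE = "false"


def derives_false(rules, start):
    """Naive forward chaining: does (true -> start) plus the rules derive false?"""
    derived = {start}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(a in derived for a in body):
                derived.add(head)
                changed = True
    return FALSE in derived


def smallest_repairs(ontology_rules, mappings, unsat_class):
    """All minimum-size sets of mapping ids whose removal restores satisfiability."""
    ids = sorted(mappings)
    for size in range(1, len(ids) + 1):
        found = []
        for removed in combinations(ids, size):
            kept = [mappings[m][0] for m in ids if m not in removed]
            if not derives_false(ontology_rules + kept, unsat_class):
                found.append(set(removed))
        if found:
            return found
    return []


def pick_repair(repairs, mappings):
    """Choose the repair whose mappings have the lowest total confidence."""
    return min(repairs, key=lambda r: sum(mappings[m][1] for m in r))


# Only the ontology rules relevant to the conflict are listed here.
rules = [(("FMA:Smegma",), "FMA:Secretion"),
         (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
         (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
         (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
         (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
         (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
         (("NCI:Anatomy", "NCI:BiologicalProcess"), FALSE)]

# mapping id -> (Horn rule, confidence); the confidences are made up.
mappings = {"m4": ((("FMA:Secretion",), "NCI:CellularSecretion"), 0.6),
            "m5": ((("NCI:CellularSecretion",), "FMA:Secretion"), 0.6),
            "m6": ((("FMA:Smegma",), "NCI:Smegma"), 0.9),
            "m7": ((("NCI:Smegma",), "FMA:Smegma"), 0.9)}

repairs = smallest_repairs(rules, mappings, "FMA:Smegma")
print(repairs)
print(pick_repair(repairs, mappings))
```

Under these assumed confidences the two size-one repairs are {m4} and {m6}, and {m4} is selected because it discards the less trusted mapping.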
Reasoning with the reliable mappings (Mr )
Our class satisfiability algorithm is . . .
• sound
• If LogMap finds a class unsatisfiable, it is indeed unsatisfiable.
• worst-case linear in the size of the (classified) ontologies.
• incomplete, but incompleteness is mitigated:
• Most of the relevant non-propositional reasoning is already
performed when classifying input ontologies independently
• Mappings are Horn propositional axioms
• Most new entailments caused by the mappings are likely to be
computable using Horn propositional reasoning only (only 2
cases missed out of 600 for FMA-NCI)
Assessing M? \ Mr
Semantic Index
• P′1, P′2 and the repaired Mr are efficiently
indexed using an interval labelling schema.
• LogMap efficiently discards mappings in M? \ Mr that are
in conflict with the semantic index.
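An interval labelling can be sketched on a small class tree: each class receives an interval from a depth-first traversal, and subsumption becomes interval containment. This is a simplification under stated assumptions: the hierarchy here is a tree with a hypothetical root Thing, whereas real class hierarchies are generally graphs and need a richer labelling; the function names and the disjointness list are also assumptions.

```python
def label_intervals(children, root):
    """Assign (pre, last) intervals by depth-first traversal of a class tree."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        pre = counter
        for child in children.get(node, []):
            visit(child)
        # 'counter' now holds the largest preorder number in this subtree.
        intervals[node] = (pre, counter)

    visit(root)
    return intervals


def is_subclass(intervals, sub, sup):
    """sub is a descendant of (or equal to) sup iff its interval is contained."""
    (a, b), (c, d) = intervals[sub], intervals[sup]
    return c <= a and b <= d


def in_conflict(intervals, disjoint_pairs, x, y):
    """True if x and y fall under a pair of classes declared disjoint."""
    return any(
        (is_subclass(intervals, x, a) and is_subclass(intervals, y, b))
        or (is_subclass(intervals, x, b) and is_subclass(intervals, y, a))
        for a, b in disjoint_pairs
    )


# Fragment of the NCI example hierarchy; "Thing" is an assumed root.
children = {"Thing": ["Anatomy", "BiologicalProcess"],
            "Anatomy": ["ExocrineGlandFluid"],
            "ExocrineGlandFluid": ["Smegma"],
            "BiologicalProcess": ["TransportProcess"]}
intervals = label_intervals(children, "Thing")
disjoint = [("Anatomy", "BiologicalProcess")]

print(is_subclass(intervals, "Smegma", "Anatomy"))
print(in_conflict(intervals, disjoint, "Smegma", "TransportProcess"))
```

Once the intervals are computed, each subsumption or disjointness check is a handful of integer comparisons, which is what makes it cheap to test many candidate mappings against the index.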
Assessing M? \ Mr
Revision of confidence values
• Co-occurrence analysis
• NCI:Hepatic acinus ⊑ FMA:Liver Acinus?
• Principle of locality
User feedback
• Clear-cut mappings in M? \ Mr are either discarded or
included in the output.
• The rest are (optionally) given to the expert user.
Assessing M? \ Mr: user feedback
Feedback requests
• Many candidate mappings are discarded automatically in the
previous steps (more than 16,000 for FMA-NCI).
• The number of non-clear-cut mappings may still be high (852 for
FMA-NCI)
User interaction in LogMap
• LogMap performs automatic actions based on user decisions
to reduce the number of remaining requests. Criteria:
• Ambiguity
• Conflicts with the semantic index
• The delay to compute automatic questions is negligible
Final Diagnosis
Horn propositional reasoning
• We perform a final repair step before returning the output
mappings (M).
OWL 2 reasoning (optional)
• Additionally, we can check how clean M is using an
off-the-shelf OWL 2 reasoner
Evaluation
Ontology Alignment Evaluation Initiative (OAEI)
http://oaei.ontologymatching.org/
• The OAEI is an annual international campaign for the
systematic evaluation of ontology matching systems
• LogMap was one of the top tools in the 2011 and 2011.5 campaigns,
and
• is currently the only matching system that both scales to large
ontologies and performs reasoning over their integration
Evaluation
Matching FMA, NCI and SNOMED with LogMap
• OAEI Large BioMed track:
http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/
                       Upper bound M?          Reliable Mr            Output M
Ontologies   |MGS|     |M?|      P     R       |Mr|     P     R       |M|      P     R     ⊥
FMA-NCI      2,898     19,151    0.14  0.93    2,256    0.91  0.71    2,658    0.87  0.80  2
FMA-SNMD     8,111     67,592    0.09  0.74    4,929    0.84  0.51    6,313    0.80  0.62  0
SNMD-NCI     18,322    102,514   0.13  0.75    10,598   0.86  0.50    12,978   0.81  0.58  *
• GOMMA can also (successfully) cope with FMA-NCI
• P=0.85, R=0.78, F=0.81
• GOMMA mappings lead to more than 5,000 unsatisfiable classes
• GOMMA matches FMA-NCI in 48 min, whereas LogMap takes 4 min
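The reported F value can be checked from P and R, assuming F denotes the balanced F-measure (the harmonic mean of precision and recall), which is consistent with the numbers above. The function name is a hypothetical helper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gomma_f = round(f_measure(0.85, 0.78), 2)   # GOMMA on FMA-NCI
logmap_f = round(f_measure(0.87, 0.80), 2)  # LogMap output M on FMA-NCI
print(gomma_f, logmap_f)
```

The first value reproduces the F=0.81 quoted for GOMMA; the second applies the same formula to the P and R of LogMap's output mappings for FMA-NCI.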
Evaluation
User interaction in LogMap
• We have “simulated” the human expert using Gold Standard
mappings to return the correct answer with a given probability.
• We have matched more than 1,100 medium-sized modules of
NCI to (the whole of) FMA
• The number of feedback requests is manageable
• LogMap without interaction behaves much like LogMap
with interaction and a 30% error rate.
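A simulated expert of this kind can be sketched as an oracle that consults the gold standard and flips its answer with a given probability. The function name, the tuple representation of mappings and the one-element gold standard below are hypothetical.

```python
import random


def simulated_expert(gold_standard, error_rate, rng):
    """Return an oracle answering from the gold standard, erring with probability error_rate."""
    def answer(mapping):
        truth = mapping in gold_standard
        # Flip the correct answer with the given probability.
        return (not truth) if rng.random() < error_rate else truth
    return answer


# Hypothetical gold standard containing a single mapping.
gold = {("FMA:Trapezoid", "NCI:TrapezoidBone")}

perfect = simulated_expert(gold, error_rate=0.0, rng=random.Random(0))
print(perfect(("FMA:Trapezoid", "NCI:TrapezoidBone")))
print(perfect(("FMA:Trapezoid", "NCI:Trapezoid")))
```

Passing a seeded random.Random instance keeps runs repeatable; an error_rate of 0.3 would correspond to the 30% setting mentioned above.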
Evaluation
Conclusions and future work
• We aim at creating a suitable interface for user interaction
• Instance and property matching are already included in
LogMap but are still under development.
• We also intend to implement multilingual features
• LogMap is available for download:
http://www.cs.ox.ac.uk/isg/tools/LogMap/
• It also has a Web interface:
http://csu6325.cs.ox.ac.uk/
Questions?
We want you to. . .
• . . . test LogMap and give us feedback
• . . . provide us with your ontologies and use cases
Thank you for your attention
• LogMap Project:
http://www.cs.ox.ac.uk/isg/projects/LogMap/
• Web interface:
http://csu6325.cs.ox.ac.uk/
• ernesto.jimenez.ruiz@gmail.com

Google AI Hackathon: LLM based Evaluator for RAG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

LogMap: Large-scale, Logic-based and Interactive Ontology Matching

  • 1. Preliminaries Challenges LogMap Max Recall Max Precision Evaluation LogMap Large-scale, Logic-based and Interactive Ontology Matching Ernesto Jiménez-Ruiz Bernardo Cuenca Grau Yujiao Zhou Ian Horrocks Department of Computer Science, University of Oxford European Conference on Artificial Intelligence (ECAI) 29 August 2012
  • 2. Outline
      • Preliminaries
      • Challenges
      • LogMap Anatomy
      • Maximising recall
      • Maximising precision
      • Evaluation
  • 3. Ontologies and OWL (I)
      Ontologies
      • Formal representation of the knowledge of a domain.
      OWL 2 Language
      • The Web Ontology Language (OWL) is a World Wide Web Consortium (W3C) standard.
      • OWL 2 corresponds to a decidable fragment of first-order logic.
  • 4. Ontologies and OWL (II)
      OWL 2 example axioms
      • JuvenileArthritis ⊑ JuvenileDisease
      • PolyArthritis ≡ Arthritis ⊓ >5 affects.Joint
      • Disease ⊓ Joint ⊑ ⊥
      • JuvenileIdiopathicArthritis @ “Juvenile Rheumatoid Arthritis”
  • 5. Ontology mappings
      Mappings are tuples ⟨e1, e2, n, ρ⟩
      • e1 and e2 are entities in O1 and O2, respectively
      • n is a confidence value between 0 and 1
      • ρ is the semantic relationship between e1 and e2
      Formalized as OWL 2 axioms
      • The semantic relationship ρ is one of {≡, ⊑, ⊒, ⊥}
      • No extra semantics
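The mapping tuple above can be written down as a small data structure. This is only an illustrative sketch: the class names `Mapping` and `Relation`, the `as_axiom` helper and the confidence value 0.9 are assumptions made for the example, not part of LogMap.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The four semantic relationships a mapping may carry."""
    EQUIVALENT = "≡"
    SUBSUMED_BY = "⊑"
    SUBSUMES = "⊒"
    DISJOINT = "⊥"


@dataclass(frozen=True)
class Mapping:
    e1: str              # entity from O1
    e2: str              # entity from O2
    confidence: float    # n, expected to lie in [0, 1]
    relation: Relation   # rho

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def as_axiom(self) -> str:
        """Render the mapping as a plain axiom string (no extra semantics)."""
        return f"{self.e1} {self.relation.value} {self.e2}"


# Hypothetical confidence value, chosen only for illustration.
m = Mapping("FMA:Smegma", "NCI:Smegma", 0.9, Relation.EQUIVALENT)
print(m.as_axiom())
```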
  • 7. Challenges
      Why ontology matching tools?
      • Ontologies are developed by different groups, which use different classifications and naming schemes.
      • (Biomedical) ontologies may contain tens of thousands of entities.
      • FMA (78,989 classes), NCI (66,724 classes) and SNOMED CT (306,591 classes) are prominent examples.
  • 8. Challenges
      Challenges to be addressed
      • Sufficient scalability to deal with large ontologies
      • Detect and repair errors
        • Reasoning with OU := O1 ∪ O2 ∪ M may lead to (a large number of) unsatisfiable classes (i.e., OU ⊨ A ⊑ ⊥)
        • Reasoning with and repairing OU aggravates the scalability problem
        • Logic-based but scalable techniques
      • Involve the expert user (if accurate mappings are needed)
        • Minimise the number of requests
        • Reduce the delay between requests
  • 12. Our approach in a nutshell
      LogMap ...
      • can efficiently match semantically rich ontologies containing tens (and even hundreds) of thousands of classes;
      • incorporates sophisticated reasoning and repair capabilities;
      • provides support for user intervention during the matching process.
  • 13. LogMap Anatomy
      LogMap can be divided into ...
      • Stage 1: maximising recall.
        • The goal is to reduce the search space
        • and extract an overestimation of the mappings
      • Stage 2: maximising precision.
        • The goal is to return a set of (precise) mappings
        • that does not lead to many logical inconsistencies.
  • 15. Lexical indexation and mapping computation
      Inverted files
      • The ontology lexicon is indexed in an inverted file (IF)
      • Each entry in the IF is a “set” of words corresponding to exact or partial entity labels

      Inverted index for FMA
        Index entry                  Class ids
        { acinus }                   6953, 7661, 8171
        { hepatic, acinus }          8171
        { acinus, mixed }            6953
        { serious, acinus }          7661
        { common, branch, artery }   1170, 7842

      Ids for FMA class URIs (namespace omitted)
        Class id   Class URI
        6953       Mixed acinus
        7661       Serious acinus
        8171       Hepatic acinus
        1170       Branch of common cochlear artery
        7842       Branch of common interosseous artery

      Inverted index for NCI
        Index entry                  Class ids
        { acinus }                   18081
        { liver }                    18081
        { acinus, liver }            18081
        { common, branch, artery }   1204, 8087, 27727

      Ids for NCI class URIs (namespace omitted)
        Class id   Class URI
        18081      Liver acinus
        8087       Common iliac artery branch
        27727      Common femoral artery branch
        1204       Common carotid artery branch
  • 16. Lexical indexation and mapping computation
      Intersection of inverted files
      • As a result we obtain an overestimation of the candidate mappings (M?).
      • This step considerably reduces the search space (e.g. 19,151 candidate mappings for FMA-NCI, with Recall=0.93 and Precision=0.14)
      • Note that most of them will turn out to be incorrect (e.g. Serious acinus ≡ Liver acinus)
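The intersection of inverted files can be sketched as follows. This is a simplified illustration, not LogMap's indexing code: it assumes that index entries are all non-empty word subsets of a label (which matches the NCI "Liver acinus" entries shown on the slide but is only practical for short labels), and the function names `build_index` and `candidate_mappings` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(labels):
    """Map each word set (exact or partial label) to the ids of classes using it."""
    index = defaultdict(set)
    for class_id, label in labels.items():
        words = sorted(set(label.lower().split()))
        for size in range(1, len(words) + 1):
            for subset in combinations(words, size):
                index[frozenset(subset)].add(class_id)
    return index


def candidate_mappings(index1, index2):
    """Pair up classes that share at least one index entry (overestimation M?)."""
    candidates = set()
    for entry in index1.keys() & index2.keys():
        for id1 in index1[entry]:
            for id2 in index2[entry]:
                candidates.add((id1, id2))
    return candidates


# Class ids and labels taken from the FMA and NCI examples on the slide.
fma = {6953: "Mixed acinus", 7661: "Serious acinus", 8171: "Hepatic acinus"}
nci = {18081: "Liver acinus"}

candidates = candidate_mappings(build_index(fma), build_index(nci))
print(sorted(candidates))
```

All three FMA classes are paired with NCI class 18081 because they share the entry { acinus }, which illustrates why this step has high recall and low precision.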
  • 17. Overlapping estimation
      • LogMap extracts two fragments (O′1 and O′2) representing the overlap between the input ontologies (via M?)
      • Logic-based modularization techniques are used
      • Characteristics:
        • Correct mappings are unlikely to involve classes outside these fragments
        • The use of fragments is key to addressing the scalability challenge
  • 19. Identifying reliable mappings
      We select Mr ⊆ M? based on ...
      • High lexical similarity (using the string matcher ISUB)
      • A principle of locality
        • Correct mappings (C1 ≡ C2) are likely to have similar scopes (semantically related classes)
      • E.g. 2,281 (out of 19,151) reliable mappings for FMA-NCI (P=0.91)
  • 20. Identifying reliable mappings
      • FMA:Trapezoid ≡ NCI:Trapezoid (non-reliable) vs FMA:Trapezoid ≡ NCI:TrapezoidBone (reliable)
  • 21. Reasoning with the reliable mappings (Mr)
      Detecting and repairing unsatisfiabilities
      • The mappings in Mr are typically very precise but may lead to many unsatisfiabilities (> 600 for FMA-NCI)
      • LogMap implements efficient methods to repair most of them (it only misses two cases for FMA-NCI)
  • 22. Reasoning with the reliable mappings (Mr)
      Propositional Horn representation
      • LogMap encodes the (classified) fragments O′1 and O′2 as propositional Horn theories P′1 and P′2
      • This is key to LogMap’s scalability

      Propositional FMA (P1)
        (1)  Smegma → Secretion
        (2)  Secretion → PortionBodySubstance
        (3)  PortionBodySubstance → AnatomicalEntity

      Propositional NCI (P2)
        (8)  Smegma → ExocrineGlandFluid
        (9)  ExocrineGlandFluid → Anatomy
        (10) CellularSecretion → TransmembraneTransport
        (11) TransmembraneTransport → TransportProcess
        (12) TransportProcess → BiologicalProcess
        (13) Anatomy ∧ BiologicalProcess → false
        (14) ExocrineGlandFluid ∧ ExfolCells → Smegma

      Computed mappings (PM)
        (m4) FMA:Secretion → NCI:CellularSecretion
        (m5) NCI:CellularSecretion → FMA:Secretion
        (m6) FMA:Smegma → NCI:Smegma
        (m7) NCI:Smegma → FMA:Smegma
  • 23. Reasoning with the reliable mappings (Mr)
      Propositional Horn SAT with Dowling-Gallier (D-G)
      • LogMap implements the D-G SAT algorithm
      • D-G is called for every class C on the propositional theory PC, which consists of:
        • the rule (true → C);
        • the propositional representations P′1 and P′2 of the input ontologies; and
        • the propositional representation PM of the mappings.
  • 24. Reasoning with the reliable mappings (Mr)
      Satisfiability of Smegma
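The satisfiability check for Smegma can be reproduced with a small counter-based forward-chaining procedure in the style of Dowling-Gallier. This is a minimal sketch rather than LogMap's implementation: the `FMA:`/`NCI:` prefixes, the (body, head) rule encoding and the function name `is_satisfiable` are assumptions; the rules are (1)-(3), (8)-(14) and (m4)-(m7) from the Horn representation slide.

```python
from collections import defaultdict, deque

FALSE = "false"

# Each rule is (body, head): the conjunction of the body atoms implies the head.
P1 = [
    (("FMA:Smegma",), "FMA:Secretion"),                        # (1)
    (("FMA:Secretion",), "FMA:PortionBodySubstance"),          # (2)
    (("FMA:PortionBodySubstance",), "FMA:AnatomicalEntity"),   # (3)
]
P2 = [
    (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),                   # (8)
    (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),                  # (9)
    (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),    # (10)
    (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),     # (11)
    (("NCI:TransportProcess",), "NCI:BiologicalProcess"),          # (12)
    (("NCI:Anatomy", "NCI:BiologicalProcess"), FALSE),             # (13)
    (("NCI:ExocrineGlandFluid", "NCI:ExfolCells"), "NCI:Smegma"),  # (14)
]
PM = [
    (("FMA:Secretion",), "NCI:CellularSecretion"),  # (m4)
    (("NCI:CellularSecretion",), "FMA:Secretion"),  # (m5)
    (("FMA:Smegma",), "NCI:Smegma"),                # (m6)
    (("NCI:Smegma",), "FMA:Smegma"),                # (m7)
]


def is_satisfiable(cls, rules):
    """Seed the theory with (true -> cls) and report whether 'false' is derivable.

    Assumes every rule has a non-empty body. Each rule keeps a counter of
    body atoms not yet derived, so every rule is touched once per body atom.
    """
    remaining = [len(set(body)) for body, _ in rules]
    watchers = defaultdict(list)
    for i, (body, _) in enumerate(rules):
        for atom in set(body):
            watchers[atom].append(i)

    derived = {cls}
    queue = deque([cls])
    while queue:
        atom = queue.popleft()
        if atom == FALSE:
            return False
        for i in watchers[atom]:
            remaining[i] -= 1
            if remaining[i] == 0 and rules[i][1] not in derived:
                derived.add(rules[i][1])
                queue.append(rules[i][1])
    return True


print(is_satisfiable("FMA:Smegma", P1 + P2 + PM))  # with all mappings
print(is_satisfiable("FMA:Smegma", P1 + P2))       # ontologies only
```

With the mappings included, Smegma reaches both Anatomy (via m6, 8, 9) and BiologicalProcess (via 1, m4, 10, 11, 12), so rule (13) fires.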
  • 25. Reasoning with the reliable mappings (Mr)
      Mapping repair
      • LogMap extends D-G to record the conflicting mappings involved in an unsatisfiability (e.g. {m4, m5, m6, m7}).
      • LogMap implements a ‘greedy’ repair algorithm to compute repairs for each unsatisfiability
        • LogMap finds all repairs of “smallest” size.
        • E.g.: R1 = {m4} and R2 = {m6}
        • The repair with the lowest confidence is selected.
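The repair step for the Smegma example can be illustrated with a brute-force search for the smallest sets of mappings whose removal restores satisfiability. This is a sketch under stated assumptions, not LogMap's algorithm: LogMap restricts the search to the conflicting mappings recorded by its extended D-G, whereas this example tries subsets of all mappings; the confidence values and the names `satisfiable`, `smallest_repairs` and `choose_repair` are hypothetical.

```python
from itertools import combinations

ONTOLOGY_RULES = [
    (("FMA:Smegma",), "FMA:Secretion"),
    (("FMA:Secretion",), "FMA:PortionBodySubstance"),
    (("FMA:PortionBodySubstance",), "FMA:AnatomicalEntity"),
    (("NCI:Smegma",), "NCI:ExocrineGlandFluid"),
    (("NCI:ExocrineGlandFluid",), "NCI:Anatomy"),
    (("NCI:CellularSecretion",), "NCI:TransmembraneTransport"),
    (("NCI:TransmembraneTransport",), "NCI:TransportProcess"),
    (("NCI:TransportProcess",), "NCI:BiologicalProcess"),
    (("NCI:Anatomy", "NCI:BiologicalProcess"), "false"),
    (("NCI:ExocrineGlandFluid", "NCI:ExfolCells"), "NCI:Smegma"),
]

# name -> (rule, confidence); the confidence values are made up for illustration.
MAPPINGS = {
    "m4": ((("FMA:Secretion",), "NCI:CellularSecretion"), 0.60),
    "m5": ((("NCI:CellularSecretion",), "FMA:Secretion"), 0.60),
    "m6": ((("FMA:Smegma",), "NCI:Smegma"), 0.95),
    "m7": ((("NCI:Smegma",), "FMA:Smegma"), 0.95),
}


def satisfiable(cls, rules):
    """Naive Horn forward chaining: cls is satisfiable if 'false' is not derived."""
    derived = {cls}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return "false" not in derived


def smallest_repairs(cls, ontology_rules, mappings):
    """Return every minimum-size set of mappings whose removal makes cls satisfiable."""
    names = sorted(mappings)
    for size in range(len(names) + 1):
        repairs = []
        for removed in combinations(names, size):
            kept = [mappings[n][0] for n in names if n not in removed]
            if satisfiable(cls, ontology_rules + kept):
                repairs.append(set(removed))
        if repairs:
            return repairs
    return []


def choose_repair(repairs, mappings):
    """Pick the repair whose mappings have the lowest total confidence."""
    return min(repairs, key=lambda r: sum(mappings[n][1] for n in r))


repairs = smallest_repairs("FMA:Smegma", ONTOLOGY_RULES, MAPPINGS)
chosen = choose_repair(repairs, MAPPINGS)
print(repairs, chosen)
```

On this example the search finds the two single-mapping repairs named on the slide, {m4} and {m6}; with the assumed confidences, {m4} is the one removed.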
  • 26. Reasoning with the reliable mappings (Mr)
      Our class satisfiability algorithm is ...
      • sound
        • If LogMap finds a class unsatisfiable, it is indeed unsatisfiable.
      • worst-case linear in the size of the (classified) ontologies.
      • incomplete, but incompleteness is mitigated:
        • Most of the relevant non-propositional reasoning is already performed when classifying the input ontologies independently
        • Mappings are Horn propositional axioms
        • Most new entailments caused by the mappings are likely to be computable using Horn propositional reasoning only (only 2 cases missed out of 600 for FMA-NCI)
  • 27. Assessing M? \ Mr
      Semantic index
      • P′1, P′2 and the repaired Mr are efficiently indexed using an interval labelling schema.
      • LogMap efficiently discards mappings in M? \ Mr that are in conflict with the semantic index.
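The idea behind an interval labelling schema can be shown on a small class tree: each class receives a preorder number and the interval spanned by its descendants, so a subsumption test becomes two integer comparisons. This is a simplified sketch that assumes a tree-shaped hierarchy (a DAG would need several intervals per class); the function names and the root `Thing` are hypothetical, and the class names are borrowed from the Smegma example.

```python
def label_intervals(children, root):
    """Assign each node the interval (preorder number, last descendant's number)."""
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in children.get(node, []):
            visit(child)
        intervals[node] = (start, counter)

    visit(root)
    return intervals


def is_subsumed(sub, sup, intervals):
    """sub is a descendant of (or equal to) sup iff its number lies in sup's interval."""
    sub_start, _ = intervals[sub]
    sup_start, sup_end = intervals[sup]
    return sup_start <= sub_start <= sup_end


# Toy hierarchy: parent -> list of direct subclasses.
children = {
    "Thing": ["AnatomicalEntity", "BiologicalProcess"],
    "AnatomicalEntity": ["PortionBodySubstance"],
    "PortionBodySubstance": ["Secretion"],
    "Secretion": ["Smegma"],
}

intervals = label_intervals(children, "Thing")
print(intervals["AnatomicalEntity"])
print(is_subsumed("Smegma", "AnatomicalEntity", intervals))
print(is_subsumed("Smegma", "BiologicalProcess", intervals))
```

Once the labels are computed, each query runs in constant time, which is what makes it cheap to test many candidate mappings against the index.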
  • 28. Assessing M? \ Mr
      Revision of confidence values
      • Co-occurrence analysis
        • NCI:Hepatic acinus ⊑ FMA:Liver Acinus?
      • Principle of locality
      User feedback
      • Clear-cut mappings in M? \ Mr are either discarded or included in the output.
      • The rest are (optionally) given to the expert user.
  • 29. Assessing M? \ Mr: user feedback
      Feedback requests
      • Many candidate mappings are discarded automatically in the previous steps (more than 16,000 for FMA-NCI).
      • The number of non-clear-cut mappings may still be high (852 for FMA-NCI)
      User interaction in LogMap
      • LogMap performs automatic actions based on user decisions to reduce the number of remaining requests. Criteria:
        • Ambiguity
        • Conflicts with the semantic index
      • Delay to compute automatic questions is negligible
  • 31. Final Diagnosis
      Horn propositional reasoning
      • We perform a final repair step before returning the output mappings (M).
      OWL 2 reasoning (optional)
      • Additionally, we (optionally) check how clean M is using an off-the-shelf OWL 2 reasoner
  • 34. Evaluation
      Ontology Alignment Evaluation Initiative (OAEI) http://oaei.ontologymatching.org/
      • The OAEI is an annual international campaign for the systematic evaluation of ontology matching systems
      • LogMap was one of the top tools in the 2011 and 2011.5 campaigns, and
      • is currently the only matching system that scales to large ontologies and performs reasoning over their integration
  • 35. Evaluation
      Matching FMA, NCI and SNOMED with LogMap
      • OAEI Large BioMed track: http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/

                            Upper bound M?         Reliable Mr           Output M
        Ontologies  |MGS|   |M?|     P     R       |Mr|    P     R       |M|     P     R     ⊥
        FMA-NCI     2,898   19,151   0.14  0.93    2,256   0.91  0.71    2,658   0.87  0.80  2
        FMA-SNMD    8,111   67,592   0.09  0.74    4,929   0.84  0.51    6,313   0.80  0.62  0
        SNMD-NCI    18,322  102,514  0.13  0.75    10,598  0.86  0.50    12,978  0.81  0.58  *

      • GOMMA can also (successfully) cope with FMA-NCI
        • P=0.85, R=0.78, F=0.81
        • GOMMA mappings lead to > 5,000 unsatisfiable classes
        • GOMMA matches FMA-NCI in 48 min, while LogMap takes 4 min
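The P, R and F values reported above follow the usual definitions of precision, recall and F-measure over sets of mappings. The sketch below shows the computation; the function names and the toy mapping sets are assumptions for illustration, and only the GOMMA figures (P=0.85, R=0.78) come from the slide.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(found, gold):
    """Score a set of computed mappings against a gold standard."""
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Figures reported on the slide for GOMMA on FMA-NCI.
print(round(f_measure(0.85, 0.78), 2))

# Toy example with made-up mappings.
found = {("a", "x"), ("b", "y"), ("c", "z")}
gold = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
p, r, f = precision_recall_f(found, gold)
print(p, r, f)
```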
  • 37. Evaluation
      User interaction in LogMap
      • We have “simulated” the human expert using Gold Standard mappings to return the correct answer with a given probability.
      • We have matched more than 1,100 medium-sized modules of NCI to (the whole of) FMA
      • The number of feedback requests is manageable
      • LogMap with no interaction behaves similarly to LogMap with interaction and a 30% error rate.
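A simulated expert of this kind can be sketched as an oracle that looks up the gold standard and flips its answer with a fixed probability. The function name `simulated_expert`, the toy gold standard and the seed are assumptions for illustration; the evaluation setup actually used by the authors may differ.

```python
import random


def simulated_expert(mapping, gold, error_rate, rng):
    """Answer whether a mapping is correct, erring with probability error_rate."""
    truth = mapping in gold
    if rng.random() < error_rate:
        return not truth
    return truth


# Toy gold standard with made-up mappings.
gold = {("FMA:Trapezoid", "NCI:TrapezoidBone")}
rng = random.Random(0)  # fixed seed so runs are repeatable

perfect = simulated_expert(("FMA:Trapezoid", "NCI:TrapezoidBone"), gold, 0.0, rng)
noisy = simulated_expert(("FMA:Trapezoid", "NCI:Trapezoid"), gold, 0.3, rng)
print(perfect, noisy)
```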
  • 38. Evaluation
  • 39. Conclusions and future work
      • We aim to create a suitable interface for user interaction
      • Instance and property matching are already included in LogMap but are still under development.
      • We also intend to implement multilingual features
      • LogMap is available for download: http://www.cs.ox.ac.uk/isg/tools/LogMap/
      • It also has a Web interface: http://csu6325.cs.ox.ac.uk/
  • 40. Questions?
      We want you to ...
      • ... test LogMap and give us feedback
      • ... provide us with your ontologies and use cases
      Thank you for your attention
      • LogMap Project: http://www.cs.ox.ac.uk/isg/projects/LogMap/
      • Web interface: http://csu6325.cs.ox.ac.uk/
      • ernesto.jimenez.ruiz@gmail.com