Ontology mapping
         needs
context & approximation

        Frank van Harmelen
   Vrije Universiteit Amsterdam
Or:
 How to make ontology-mapping
  less like database integration
 and
  more like a social conversation
Three obvious intuitions
 The Semantic Web needs
  ontology mapping
 Ontology mapping needs
  background knowledge
 Ontology mapping needs approximation
Which Semantic Web?
 Version 1:
  "Semantic Web as Web of Data" (TBL)


 recipe:
  expose databases on the web,
  use RDF, integrate
 meta-data from:
  • expressing DB schema semantics
      in machine-interpretable ways
 enable integration and unexpected re-use
Which Semantic Web?
 Version 2:
  “Enrichment of the current Web”

 recipe:
  Annotate, classify, index
 meta-data from:
  • automatically producing markup:
      named-entity recognition,
      concept extraction, tagging, etc.
 enable personalisation, search, browse, ...
Which Semantic Web?
 Version 1:
  “Semantic Web as Web of Data”

 Version 2:
  “Enrichment of the current Web”

 Different use-cases
 Different techniques
 Different users
 (Version 1: data-oriented; Version 2: user-oriented)
Which Semantic Web?
 Version 1:
  “Semantic Web as Web of Data”

 Version 2:
  “Enrichment of the current Web”

 But both need ontologies
    for semantic agreement
      between sources
      between source & user

Ontology research is
almost done..
 we know what they are
  “consensual, formalised models of a domain”
 we know how to make and maintain them
  (methods, tools, experience)
 we know how to deploy them
  (search, personalisation, data-integration, …)

Main remaining open questions
 Automatic construction (learning)
 Automatic mapping (integration)
Three obvious intuitions
 The Semantic Web needs ontology mapping
 Ontology mapping needs
  background knowledge
     Ph.D. student  =?  AIO

 Ontology mapping needs approximation
     young researcher  ≈?  post-doc
This work with
Zharko Aleksovski &
   Michel Klein
Does context knowledge help
mapping?
The general idea
              background
               knowledge



  anchoring                     anchoring
                    inference

    source                       target
               mapping


A realistic example
 Two Amsterdam hospitals (OLVG, AMC)
 Two Intensive Care Units, different vocabularies
 Want to compare quality of care
 OLVG-1400:
    • 1400 terms in a flat list
    • used in the first 24 hours of stay
    • some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
    • some redundancy (spelling mistakes)
 AMC: similar list, but from a different hospital
Context ontology used
DICE:
 • 2500 concepts (5000 terms), 4500 links
 • Formalised in DL
 • five main categories:
     • tractus (e.g. nervous_system, respiratory_system)
     • aetiology (e.g. virus, poisoning)
     • abnormality (e.g. fracture, tumor)
     • action (e.g. biopsy, observation, removal)
     • anatomic_location (e.g. lungs, skin)
Baseline: Linguistic methods
 Combine lexical analysis with hierarchical structure

 313 suggested matches, around 70 % correct
 209 suggested matches, around 90 % correct

 High precision, low recall (“the easy cases”)




Now use background knowledge
                      DICE
                  (2500 concepts,
                     4500 links)



  anchoring                            anchoring
                           inference

    OLVG                                 AMC
   (1400, flat)                        (1400, flat)
                    mapping


Example found with context
knowledge (beyond lexical)




Anchoring strength
 Anchoring = substring + trivial morphology

  anchored on N aspects           OLVG     AMC
  N=5                              0        2
  N=4                              0      198
  N=3                              4      711
  N=2                            144      285
  N=1                            401      208
  total nr. of anchored terms    549 39% 1404 96%
  total nr. of anchorings       1298     5816
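
A minimal sketch of what "substring + trivial morphology" anchoring could look like. The slides do not spell out the morphology rules, so the `normalise` helper (lowercasing plus crude plural stripping), the word-level matching, and the sample background labels are all assumptions made for illustration.

```python
import re

def normalise(term: str) -> list[str]:
    """Lowercase, split into words, strip a plural 's' (assumed 'trivial morphology')."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    out = []
    for w in words:
        # Keep endings such as 'virus' or 'abscess' intact; only drop a plain plural 's'.
        if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is")):
            w = w[:-1]
        out.append(w)
    return out

def anchors(term: str, concepts: list[str]) -> list[str]:
    """Background concepts whose normalised label occurs as a word sequence in the term."""
    words = normalise(term)
    found = []
    for concept in concepts:
        label = normalise(concept)
        n = len(label)
        if n and any(words[i:i + n] == label for i in range(len(words) - n + 1)):
            found.append(concept)
    return found

# Hypothetical background labels, loosely modelled on the DICE category examples.
background = ["lung", "fracture", "virus", "skin"]
print(anchors("Fractures of the lungs", background))  # ['lung', 'fracture']
```

Under this reading, the number of concepts returned for a term plays the role of the "anchored on N aspects" count in the table above.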

Results
Example matchings discovered
 • OLVG: Acute respiratory failure
     AMC: Asthma cardiale
 • OLVG: Aspergillus fumigatus
     AMC: Aspergilloom
 • OLVG: duodenum perforation
     AMC: Gut perforation
 • OLVG: HIV
     AMC: AIDS
 • OLVG: Aorta thoracalis dissectie type B
     AMC: Dissection of artery
Experimental results
 Source & target =
  flat lists of ±1400 ICU terms each
 Background = DICE (2500 concepts in DL)
 Manual Gold Standard (n=200)
Does more context
knowledge help?
Adding more context
     Only lexical
     DICE (2500 concepts)
     MeSH (22000 concepts)
     ICD-10 (11000 concepts)
 Anchoring strength:
                        DICE    MeSH   ICD-10
            4 aspects      0       8        0
            3 aspects      0      89        0
            2 aspects    135     201        0
            1 aspect     413     694       80
            total        548     992       80
Results with multiple ontologies
   Separate       Lexical   ICD-10   DICE   MeSH
   Recall            64%      64%     76%    88%
   Precision         95%      95%     94%    89%

   Joint (bar chart, scale 0-100, bars for Lexical, ICD-10, DICE, MeSH)
 Monotonic improvement
 Independent of order
 Linear increase of cost
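
The "monotonic" and "independent of order" properties can be illustrated with a small sketch. It assumes that joint use simply takes the union of the matches found with each background source; the slides do not say how the results were combined, and the match identifiers and gold standard below are invented for the example.

```python
from itertools import permutations

def recall(found: set, gold: set) -> float:
    """Fraction of gold-standard matches that were found."""
    return len(found & gold) / len(gold)

# Hypothetical data: five gold matches and the matches each source yields.
gold = {"m1", "m2", "m3", "m4", "m5"}
per_source = {
    "Lexical": {"m1", "m2"},
    "ICD-10": {"m1"},
    "DICE": {"m2", "m3"},
    "MeSH": {"m3", "m4", "x1"},  # x1 is a wrong match
}

def cumulative_recall(order):
    """Recall after adding each source in turn (union of matches so far)."""
    found, curve = set(), []
    for name in order:
        found |= per_source[name]
        curve.append(recall(found, gold))
    return curve

curve = cumulative_recall(["Lexical", "ICD-10", "DICE", "MeSH"])
print(curve)  # [0.4, 0.4, 0.6, 0.8]

# The final recall is the same for every ordering of the sources.
print({cumulative_recall(p)[-1] for p in permutations(per_source)})  # {0.8}
```

With a union, recall can never drop when a source is added, and the end result does not depend on the order. Precision has no such guarantee, since each source may also add wrong matches.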
Does structured context
knowledge help?
Exploiting structure
 CRISP: 700 concepts, broader-than
 MeSH: 1475 concepts, broader-than
 FMA: 75,000 concepts, 160 relation-types
  (we used: is-a & part-of)

                         FMA
                       (75,000)

           anchoring                    anchoring
                            inference

             CRISP                       MeSH
              (738)                      (1475)
                       mapping
Using the structure or not?
 (S <a B) & (B < B’) & (B’ <a T) → (S <i T)
   (<a: anchoring, <i: inferred mapping)

 No use of structure
 Only stated is-a & part-of
 Transitive chains of is-a, and
  transitive chains of part-of
 Transitive chains of is-a and part-of
 One chain of part-of before
  one chain of is-a
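
The variants listed above differ only in which paths through the background ontology count as B < B’. The sketch below is one possible reading, not the authors' implementation: the background fragment is invented, anchoring is treated as exact, and every function name is hypothetical.

```python
from collections import deque

# Hypothetical background fragment: (child, relation, parent).
EDGES = [
    ("left_lung", "is-a", "lung"),
    ("lung", "part-of", "respiratory_system"),
    ("respiratory_system", "is-a", "organ_system"),
]
BOTH = {"is-a", "part-of"}

def reachable(start, relations, max_steps=None):
    """Concepts above `start` via the given relation types (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_steps is not None and dist[node] >= max_steps:
            continue
        for child, rel, parent in EDGES:
            if child == node and rel in relations and parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return set(dist) - {start}

def stated_only(s, t):
    """Only directly stated is-a or part-of links."""
    return t in reachable(s, BOTH, max_steps=1)

def separate_closures(s, t):
    """Transitive chains of is-a, or transitive chains of part-of, never mixed."""
    return t in reachable(s, {"is-a"}) or t in reachable(s, {"part-of"})

def mixed_closure(s, t):
    """Transitive chains that may mix is-a and part-of freely."""
    return t in reachable(s, BOTH)

def part_of_then_is_a(s, t):
    """One (possibly empty) chain of part-of followed by one chain of is-a."""
    mids = reachable(s, {"part-of"}) | {s}
    ends = set(mids)
    for m in mids:
        ends |= reachable(m, {"is-a"})
    return t in ends - {s}

print(stated_only("left_lung", "respiratory_system"))        # False
print(separate_closures("left_lung", "respiratory_system"))  # False
print(mixed_closure("left_lung", "organ_system"))            # True
print(part_of_then_is_a("lung", "organ_system"))             # True
```

Each function answers whether the anchor of a source term S reaches the anchor of a target term T. The more permissive the closure, the more S <i T mappings are inferred.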
Matching results (CRISP to MeSH)
   (Gold Standard n=30)

Recall                               =      ≤      ≥   total   incr.
Exp.1: Direct                      448    417    156    1021       -
Exp.2: Indir. is-a + part-of       395    516    405    1316     29%
Exp.3: Indir. separate closures    395    933   1402    2730    167%
Exp.4: Indir. mixed closures       395   1511   2228    4143    306%
Exp.5: Indir. part-of before is-a  395    972   1800    3167    210%

Precision                            =      ≤      ≥   total   correct
Exp.1: Direct                       17     18      3      38    100%
Exp.4: Indir. mixed closures        14     39     59     112     94%
Exp.5: Indir. part-of before is-a   14     37     50     101    100%
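
The "incr." column can be reproduced from the totals as the relative increase over Exp.1. A short check, using only the totals from the table:

```python
# Totals of matches found per experiment, copied from the recall table.
totals = {"Exp.1": 1021, "Exp.2": 1316, "Exp.3": 2730, "Exp.4": 4143, "Exp.5": 3167}

baseline = totals["Exp.1"]
increase = {
    name: round(100 * (n - baseline) / baseline)
    for name, n in totals.items()
    if name != "Exp.1"
}
print(increase)  # {'Exp.2': 29, 'Exp.3': 167, 'Exp.4': 306, 'Exp.5': 210}
```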
Three obvious intuitions
 The Semantic Web needs ontology mapping
 Ontology mapping needs
  background knowledge

 Ontology mapping needs
  approximation
     young researcher  ≈?  post-doc
This work with
Zharko Aleksovski
  Risto Gligorov
 Warner ten Kate
Approximating subsumptions
  (and hence mappings)
 query: A ⊑ B ?

 B = B1 ⊓ B2 ⊓ B3     A ⊑ B1, A ⊑ B2, A ⊑ B3 ?

 (diagram with regions B1, B2, B3, their intersection B, and A)
Approximating subsumptions
 Use “Google distance” to decide which
  subproblems are reasonable to focus on
 Google distance

   $$\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$$

 where
   f(x) is the number of Google hits for x
   f(x,y) is the number of Google hits for
     the tuple of search items x and y
   M is the number of web pages indexed by Google

 ≈ symmetric conditional probability of co-occurrence
 ≈ estimate of semantic distance
 ≈ estimate of “contribution” of Bi to B1 ⊓ B2 ⊓ B3
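
A direct transcription of the Google distance formula into code, as a sketch. The hit counts are plain function arguments (no search API is called), the numbers in the example are invented, and returning infinity when the two terms never co-occur is a convention assumed here, not something stated on the slide.

```python
import math

def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Normalised Google distance from hit counts f(x), f(y), f(x,y) and index size M."""
    if fxy == 0:
        return math.inf  # assumed convention: no co-occurrence means maximal distance
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))

# Terms that always occur together have distance 0.
print(ngd(1000, 1000, 1000, 10**9))  # 0.0

# Rarer co-occurrence gives a larger distance (hypothetical counts).
print(ngd(1000, 1000, 10, 10**9) > ngd(1000, 1000, 500, 10**9))  # True
```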
Google distance


             animal         plant

     sheep    cow     vegetarian

             mad cow
Google for sloppy matching
 Algorithm for A ⊑ B       (B = B1 ⊓ B2 ⊓ B3)

   determine NGD(B, Bi) = σ_i, i = 1,2,3
   incrementally:
    • increase sloppiness threshold σ
    • allow to ignore A ⊑ Bi with Σ σ_i ≤ σ

   match if remaining A ⊑ Bj hold
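
The algorithm above can be sketched as follows. This is an illustrative reading, not the authors' code: the subsumption results A ⊑ Bi are passed in as booleans (no reasoner is called), and the σ_i are rescaled to sum to 1 so that σ=1 accepts everything, which is an assumption made here.

```python
def sloppy_match(holds: list[bool], sigmas: list[float], threshold: float) -> bool:
    """Accept A ⊑ B1 ⊓ ... ⊓ Bn if the failing conjuncts can be ignored within the threshold.

    holds[i]  -- whether A ⊑ Bi holds classically
    sigmas[i] -- σ_i = NGD(B, Bi), taken as given
    """
    total = sum(sigmas)
    if total <= 0:
        raise ValueError("sigmas must have a positive sum")
    weights = [s / total for s in sigmas]  # assumed normalisation
    ignored = sum(w for ok, w in zip(holds, weights) if not ok)
    return ignored <= threshold + 1e-12  # small tolerance for rounding

# Hypothetical case: A ⊑ B1 and A ⊑ B2 hold, A ⊑ B3 fails.
holds = [True, True, False]
sigmas = [0.2, 0.3, 0.5]
for sigma in (0.0, 0.4, 0.5, 1.0):
    print(sigma, sloppy_match(holds, sigmas, sigma))
# 0.0 False
# 0.4 False
# 0.5 True
# 1.0 True
```

At σ=0 nothing may be ignored, so the result coincides with classical matching. Raising σ can only turn a rejection into an acceptance, never the reverse.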
Properties of sloppy matching
 When sloppiness threshold σ goes up,
  set of matches grows monotonically
 σ=0: classical matching
 σ=1: trivial matching

 Ideally (?): compute σ_i such that:
   • desirable matches become true at low σ
   • undesirable matches become true only at high σ
Experiments in music
domain
 CDNow (Amazon.com): 2410 classes, depth 5 levels
 ArtistGigs: 382 classes, depth 4 levels
 Artist Direct Network: 465 classes, depth 2 levels
 CD baby: 222 classes, depth 2 levels
 All Music Guide: 403 classes, depth 3 levels
 Yahoo: 96 classes, depth 2 levels
 MusicMoz: 1073 classes, depth 7 levels
 (centre of diagram: “very sloppy terms”, “good”)
Experiment
  Manual Gold Standard, N=50, random pairs

  (plot: precision vs. recall for classical, random and NGD matching;
   annotated points σ=0.5 and σ=0.53; axis marks: precision 60 and 97, recall 20)
  16-05-2006
Wrapping up
Three obvious intuitions
 The Semantic Web needs
  ontology mapping

 Ontology mapping needs
  background knowledge

 Ontology mapping needs approximation
So that
 shared context & approximation
  make ontology-mapping
  a bit more like a social conversation




Future: Distributed/P2P setting


              background
               knowledge



  anchoring                     anchoring
                    inference

    source                       target
               mapping
Questions & discussion

 Frank.van.Harmelen@cs.vu.nl
 http://www.cs.vu.nl/~frankh





Build your next Gen AI Breakthrough - April 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 

Ontology mapping needs context & approximation

  • 1. Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
  • 2. Or: how to make ontology mapping less like database integration and more like a social conversation.
  • 3. Two, or rather three, obvious intuitions: the Semantic Web needs ontology mapping; ontology mapping needs background knowledge; ontology mapping needs approximation.
  • 4. Which Semantic Web? Version 1: "Semantic Web as Web of Data" (TBL). Recipe: expose databases on the web, use RDF, integrate. Meta-data from: expressing DB schema semantics in machine-interpretable ways. Enables integration and unexpected re-use.
  • 5. Which Semantic Web? Version 2: “Enrichment of the current Web”. Recipe: annotate, classify, index. Meta-data from: automatically producing markup (named-entity recognition, concept extraction, tagging, etc.). Enables personalisation, search, browse, ...
  • 6. Which Semantic Web? Version 1: “Semantic Web as Web of Data”. Version 2: “Enrichment of the current Web”. Different use-cases, different techniques, different users: data-oriented vs. user-oriented.
  • 7. Which Semantic Web? Version 1: “Semantic Web as Web of Data”. Version 2: “Enrichment of the current Web”. But both need ontologies for semantic agreement: between sources, and between source & user.
  • 8. Ontology research is almost done: we know what they are (“consensual, formalised models of a domain”); we know how to make and maintain them (methods, tools, experience); we know how to deploy them (search, personalisation, data-integration, …). Main remaining open questions: automatic construction (learning); automatic mapping (integration).
  • 9. Three obvious intuitions: the Semantic Web needs ontology mapping; ontology mapping needs background knowledge (e.g. does "Ph.D. student" = "AIO"?); ontology mapping needs approximation (e.g. does "young researcher" ≈ "post-doc"?).
  • 10. This work with Zharko Aleksovski & Michel Klein
  • 11. Does context knowledge help mapping?
  • 12. The general idea. [Diagram: source and target are each anchored in background knowledge; inference over the background knowledge yields the source-target mapping.]
  • 13. A realistic example: two Amsterdam hospitals (OLVG, AMC); two Intensive Care Units with different vocabularies; the aim is to compare quality of care. OLVG-1400: 1400 terms in a flat list; used in the first 24 hours of stay; some implicit hierarchy (e.g. 6 types of Diabetes Mellitus); some redundancy (spelling mistakes). AMC: a similar list, but from a different hospital.
  • 14. Context ontology used: DICE, with 2500 concepts (5000 terms) and 4500 links, formalised in DL, in five main categories: tractus (e.g. nervous_system, respiratory_system); aetiology (e.g. virus, poisoning); abnormality (e.g. fracture, tumor); action (e.g. biopsy, observation, removal); anatomic_location (e.g. lungs, skin).
  • 15. Baseline: linguistic methods. Combine lexical analysis with hierarchical structure: 313 suggested matches, around 70% correct; 209 suggested matches, around 90% correct. High precision, low recall (“the easy cases”).
  • 16. Now use background knowledge. [Diagram: OLVG (1400 terms, flat) and AMC (1400 terms, flat) are each anchored in DICE (2500 concepts, 4500 links); inference over DICE yields the OLVG-AMC mapping.]
  • 17. Example found with context knowledge (beyond lexical)
  • 18. Example 2
  • 19. Anchoring strength. Anchoring = substring + trivial morphology.
       Anchored on N aspects             OLVG           AMC
       N=5                                  0             2
       N=4                                  0           198
       N=3                                  4           711
       N=2                                144           285
       N=1                                401           208
       total nr. of anchored terms   549 (39%)    1404 (96%)
       total nr. of anchorings           1298          5816
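The slide describes anchoring only as "substring + trivial morphology", so the following is a minimal sketch of what such a step could look like, not the authors' implementation. The `normalise` rule (lower-casing and stripping a plural "s"), the function names, and the miniature background ontology are all assumptions made for illustration.

```python
def normalise(term: str) -> str:
    """Lower-case a term and strip a trailing plural 's' from each word.

    This is an assumed stand-in for the slide's 'trivial morphology'.
    """
    words = term.lower().replace("_", " ").split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def anchor(term: str, background: dict[str, str]) -> dict[str, set[str]]:
    """Group, per aspect, the background concepts whose name is a substring of the term."""
    norm = normalise(term)
    hits: dict[str, set[str]] = {}
    for concept, aspect in background.items():
        if normalise(concept) in norm:
            hits.setdefault(aspect, set()).add(concept)
    return hits


# Hypothetical miniature background ontology: concept -> aspect (category).
background = {
    "lungs": "anatomic_location",
    "skin": "anatomic_location",
    "fracture": "abnormality",
    "tumor": "abnormality",
    "biopsy": "action",
}

anchors = anchor("Biopsy of lung tumor", background)
n_aspects = len(anchors)  # number of aspects on which the term is anchored
print(n_aspects, anchors)
```

Plain substring tests can over-match (a short concept name may occur inside an unrelated word), so a real system would presumably tokenise more carefully.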
  • 20. Results: example matchings discovered (OLVG term / AMC term): Acute respiratory failure / Asthma cardiale; Aspergillus fumigatus / Aspergilloom; duodenum perforation / Gut perforation; HIV / AIDS; Aorta thoracalis dissectie type B / Dissection of artery.
  • 21. Experimental results: source & target = flat lists of ±1400 ICU terms each; background = DICE (2300 concepts in DL); manual Gold Standard (n=200).
  • 23. Adding more context: only lexical; DICE (2500 concepts); MeSH (22000 concepts); ICD-10 (11000 concepts). Anchoring strength:
                     DICE   MeSH   ICD10
       4 aspects        0      8       0
       3 aspects        0     89       0
       2 aspects      135    201       0
       1 aspect       413    694      80
       total          548    992      80
  • 24. Results with multiple ontologies.
       Separate      Lexical   ICD-10   DICE   MeSH
       Recall            64%      64%    76%    88%
       Precision         95%      95%    94%    89%
       Joint: [Chart: joint results on a 0-100 scale across Lexical, ICD-10, DICE, MeSH; values not recoverable.]
       Monotonic improvement; independent of order; linear increase of cost.
  • 26. Exploiting structure. CRISP: 700 concepts, broader-than. MeSH: 1475 concepts, broader-than. FMA: 75,000 concepts, 160 relation types (we used is-a & part-of). [Diagram: CRISP (738) and MeSH (1475) are each anchored in FMA (75,000); inference over FMA yields the CRISP-MeSH mapping.]
  • 27. Using the structure or not? (S <a B) & (B < B’) & (B’ <a T) → (S <i T), where the subscript a marks a relation obtained by anchoring and i one obtained by inference.
  • 28. Using the structure or not? (S <a B) & (B < B’) & (B’ <a T) → (S <i T). Options for the step B < B’: no use of structure; only stated is-a & part-of; transitive chains of is-a, and transitive chains of part-of; transitive chains of is-a and part-of; one chain of part-of before one chain of is-a.
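The rule (S <a B) & (B < B’) & (B’ <a T) → (S <i T) can be sketched as a reachability check over the background ontology. This is only an illustration under simplifying assumptions: each term has a single anchor, the edge list is invented, and the `relations` argument is a hypothetical way to switch between following only is-a and following mixed chains of is-a and part-of.

```python
from collections import deque

Edges = dict[str, list[tuple[str, str]]]  # node -> [(relation, broader node)]


def reachable(start: str, edges: Edges, relations: set[str]) -> set[str]:
    """Nodes reachable from start via one or more edges whose relation is allowed."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel, nxt in edges.get(node, []):
            if rel in relations and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def infer(src_anchor: dict[str, str], tgt_anchor: dict[str, str],
          edges: Edges, relations: set[str]) -> set[tuple[str, str]]:
    """Return (S, T) pairs where S's anchor B reaches T's anchor B' in the background."""
    result = set()
    for s, b in src_anchor.items():
        above = reachable(b, edges, relations)
        for t, b2 in tgt_anchor.items():
            if b2 in above:
                result.add((s, t))
    return result


# Hypothetical background fragment and anchors, for illustration only.
edges: Edges = {
    "alveolus": [("part-of", "lung")],
    "lung": [("is-a", "organ")],
}
src_anchor = {"alveolar damage": "alveolus"}
tgt_anchor = {"organ disorders": "organ"}

mixed = infer(src_anchor, tgt_anchor, edges, {"is-a", "part-of"})
isa_only = infer(src_anchor, tgt_anchor, edges, {"is-a"})
print(mixed, isa_only)
```

In this toy fragment the mapping is found only when part-of and is-a edges may be chained together, which is the kind of difference the options listed on the slide are meant to expose.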
  • 29. Examples
  • 30. Examples
  • 31. Matching results (CRISP to MeSH) (Golden Standard n=30)
       Recall                                   =      ≤      ≥   total   incr.
       Exp.1: Direct                          448    417    156    1021       -
       Exp.2: Indir. is-a + part-of           395    516    405    1316     29%
       Exp.3: Indir. separate closures        395    933   1402    2730    167%
       Exp.4: Indir. mixed closures           395   1511   2228    4143    306%
       Exp.5: Indir. part-of before is-a      395    972   1800    3167    210%

       Precision                                =      ≤      ≥   total   correct
       Exp.1: Direct                           17     18      3      38      100%
       Exp.4: Indir. mixed closures            14     39     59     112       94%
       Exp.5: Indir. part-of before is-a       14     37     50     101      100%
  • 32. Three obvious intuitions: the Semantic Web needs ontology mapping; ontology mapping needs background knowledge; ontology mapping needs approximation (e.g. does "young researcher" ≈ "post-doc"?).
  • 33. This work with Zharko Aleksovski, Risto Gligorov, Warner ten Kate
  • 34. Approximating subsumptions (and hence mappings). Query: A ⊑ B? With B = B1 ⊓ B2 ⊓ B3, this becomes: A ⊑ B1, A ⊑ B2, A ⊑ B3? [Diagram with regions labelled A, B, B1, B2, B3.]
  • 35. Approximating subsumptions. Use “Google distance” to decide which subproblems are reasonable to focus on. Google distance: NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log M − min{log f(x), log f(y)}), where f(x) is the number of Google hits for x, f(x,y) is the number of Google hits for the tuple of search items x and y, and M is the number of web pages indexed by Google. Annotations on the slide: ≈ symmetric conditional probability of co-occurrence; ≈ estimate of semantic distance; ≈ estimate of “contribution” of B3 to B1 ⊓ B2 ⊓ B3.
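As a worked illustration of the Google distance formula, here is a small sketch that computes NGD from hit counts. The counts in the example are made up; obtaining real values of f(x), f(y), f(x,y) and M would need a search service, which is outside this sketch. Returning infinity for zero counts is an assumption made here for convenience, not something stated on the slide.

```python
import math


def ngd(fx: float, fy: float, fxy: float, m: float) -> float:
    """Normalised Google Distance from hit counts f(x), f(y), f(x,y) and index size M."""
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Assumed convention: terms that never (co-)occur are maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


# Hypothetical hit counts, chosen so the arithmetic is easy to follow.
always_together = ngd(1000, 1000, 1000, 10**6)  # x and y always co-occur
rarely_together = ngd(1000, 100, 10, 10**6)     # (3 - 1) / (6 - 2) in log10 units
print(always_together, rarely_together)
```

The base of the logarithm cancels in the ratio, so natural logarithms give the same value as base 10.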
  • 36. Google distance
  • 37. Google distance. [Figure: example terms animal, plant, sheep, cow, vegetarian, madcow.]
  • 38. Google for sloppy matching. Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3): determine NGD(B, Bi) = σi, i = 1,2,3; incrementally increase the sloppiness threshold σ, allowing conjuncts A ⊑ Bi to be ignored as long as Σ σi ≤ σ; match if the remaining A ⊑ Bj hold.
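One way to read the sloppy matching step is sketched below. It assumes the subsumption checks A ⊑ Bi have already been carried out (the `holds` dictionary) and that each σi has been computed beforehand; both inputs and all names are hypothetical. Under this reading, a match succeeds when the failing conjuncts are cheap enough to ignore, that is, when their σi sum to at most σ.

```python
def sloppy_match(holds: dict[str, bool], sigma_i: dict[str, float], sigma: float) -> bool:
    """Decide A ⊑ B1 ⊓ ... ⊓ Bn while ignoring conjuncts of total weight at most sigma.

    holds[b]   -- whether A ⊑ b was established by the classical reasoner
    sigma_i[b] -- the weight of conjunct b (on the slide: NGD(B, b))
    """
    # Only failing conjuncts need to be ignored; ignoring more never helps.
    ignored_cost = sum(sigma_i[b] for b, ok in holds.items() if not ok)
    return ignored_cost <= sigma


# Hypothetical example: A ⊑ B1 and A ⊑ B2 hold, A ⊑ B3 does not.
holds = {"B1": True, "B2": True, "B3": False}
sigma_i = {"B1": 0.4, "B2": 0.5, "B3": 0.1}

classical = sloppy_match(holds, sigma_i, 0.0)  # no conjunct may be ignored
sloppy = sloppy_match(holds, sigma_i, 0.2)     # B3 is cheap enough to ignore
print(classical, sloppy)
```

Because the ignored cost is fixed for a given pair, raising σ can only turn non-matches into matches, so the set of matches grows monotonically with σ.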
  • 39. Properties of sloppy matching. When the sloppiness threshold σ goes up, the set of matches grows monotonically. σ=0: classical matching. σ=1: trivial matching. Ideally, compute σi such that desirable matches become true at low σ, and undesirable matches become true only at high σ.
  • 40. Experiments in music domain (very sloppy terms → good).
       Source                    Size (classes)   Depth (levels)
       CDNow (Amazon.com)                  2410                5
       MusicMoz                            1073                7
       Artist Direct Network                465                2
       All Music Guide                      403                3
       ArtistGigs                           382                4
       CD baby                              222                2
       Yahoo                                 96                2
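  • 41. Experiment: manual Gold Standard, N=50, random pairs. [Plot: precision vs. recall for classical, random and NGD matching, with points labelled σ=0.5 and σ=0.53; the values 97, 60, 20 and 7 appear on the plot but cannot be assigned reliably. Slide dated 16-05-2006.]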
  • 43. Three obvious intuitions: the Semantic Web needs ontology mapping; ontology mapping needs background knowledge; ontology mapping needs approximation.
  • 44. So that shared context & approximation make ontology mapping a bit more like a social conversation.
  • 45. Future: distributed/P2P setting. [Diagram: source and target are each anchored in background knowledge; inference over the background knowledge yields the source-target mapping.]
  • 46. Questions & discussion. Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh