Ontology mapping needs context & approximation

Frank van Harmelen
Vrije Universiteit Amsterdam

Or:
• How to make ontology mapping less like database integration
• and more like a social conversation
2

Three obvious intuitions
• The Semantic Web needs ontology mapping
• Ontology mapping needs background knowledge
• Ontology mapping needs approximation

3

Which Semantic Web?
• Version 1: "Semantic Web as Web of Data" (TBL)
• recipe: expose databases on the web, use RDF, integrate
• meta-data from:
  - expressing DB schema semantics in machine-interpretable ways
• enable integration and unexpected re-use
4

Which Semantic Web?
• Version 2: “Enrichment of the current Web”
• recipe: annotate, classify, index
• meta-data from:
  - automatically producing markup: named-entity recognition, concept extraction, tagging, etc.
• enable personalisation, search, browse, ...
5

Which Semantic Web?
• Version 1: “Semantic Web as Web of Data” (data-oriented)
• Version 2: “Enrichment of the current Web” (user-oriented)
• Different use-cases
• Different techniques
• Different users
6

Which Semantic Web?
• Version 1: “Semantic Web as Web of Data”
• Version 2: “Enrichment of the current Web”
• But both need ontologies for semantic agreement
  - Version 1: between sources
  - Version 2: between source & user
7


Ontology research is almost done...
• we know what they are: “consensual, formalised models of a domain”
• we know how to make and maintain them (methods, tools, experience)
• we know how to deploy them (search, personalisation, data-integration, …)

Main remaining open questions
• Automatic construction (learning)
• Automatic mapping (integration)
8

Three obvious intuitions
• The Semantic Web needs ontology mapping
  - Ph.D. student = AIO ?
  - young researcher ≈ post-doc ?

9

This work with
Zharko Aleksovski &
Michel Klein

Does context knowledge help
mapping?

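The general idea
[Diagram: source and target are each anchored to the background knowledge; inference over the background knowledge between the two anchorings yields the mapping from source to target.]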

12

A realistic example
• Two Amsterdam hospitals (OLVG, AMC)
• Two Intensive Care Units, different vocabularies
• Want to compare quality of care
• OLVG-1400:
  - 1400 terms in a flat list
  - used in the first 24 hours of stay
  - some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
  - some redundancy (spelling mistakes)
• AMC: similar list, but from a different hospital

13

Context ontology used
DICE:
  - 2500 concepts (5000 terms), 4500 links
  - Formalised in DL
  - five main categories:
    • tractus (e.g. nervous_system, respiratory_system)
    • aetiology (e.g. virus, poisoning)
    • abnormality (e.g. fracture, tumor)
    • action (e.g. biopsy, observation, removal)
    • anatomic_location (e.g. lungs, skin)
14

Baseline: Linguistic methods
• Combine lexical analysis with hierarchical structure
• 313 suggested matches, around 70% correct
• 209 suggested matches, around 90% correct
• High precision, low recall (“the easy cases”)

15

Now use background knowledge
[Diagram: OLVG (1400 terms, flat) and AMC (1400 terms, flat) are each anchored to DICE (2500 concepts, 4500 links); inference over DICE yields the mapping between OLVG and AMC.]

16

Example found with context
knowledge (beyond lexical)

17

Anchoring strength
• Anchoring = substring + trivial morphology

anchored on N aspects           OLVG          AMC
N=5                                0            2
N=4                                0          198
N=3                                4          711
N=2                              144          285
N=1                              401          208
total nr. of anchored terms      549 (39%)   1404 (96%)
total nr. of anchorings         1298         5816

19
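The "substring + trivial morphology" anchoring could look roughly like the sketch below. It is only an illustration: the toy background fragment, the function names, and the plural-stripping rule are assumptions, not the actual DICE content or the matcher used in this work. It also assumes that an "aspect" corresponds to one of the main categories of the background ontology.

```python
import re

# Hypothetical toy fragment of a background ontology: aspect -> concept -> labels.
# The real DICE content is not reproduced here.
TOY_BACKGROUND = {
    "abnormality": {"tumor": ["tumor", "tumour"], "fracture": ["fracture"]},
    "anatomic_location": {"lung": ["lungs"], "skin": ["skin"]},
}

def normalise(text):
    """Lower-case, split into words, and strip a plural 's' (trivial morphology)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]

def anchor(term, background):
    """Return the (aspect, concept) pairs whose label occurs inside the term."""
    padded = " " + " ".join(normalise(term)) + " "
    found = set()
    for aspect, concepts in background.items():
        for concept, labels in concepts.items():
            for label in labels:
                # Word-level substring test on the normalised forms.
                if " " + " ".join(normalise(label)) + " " in padded:
                    found.add((aspect, concept))
    return found

def anchoring_strength(term, background):
    """Number of distinct aspects on which the term is anchored."""
    return len({aspect for aspect, _ in anchor(term, background)})

anchors = anchor("Tumors of the lungs", TOY_BACKGROUND)
strength = anchoring_strength("Tumors of the lungs", TOY_BACKGROUND)
```

In this toy run the term is anchored on two aspects, so it would be counted in the N=2 row of a table like the one above.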

Results
Example matchings discovered
  - OLVG: Acute respiratory failure
    AMC: Asthma cardiale
  - OLVG: Aspergillus fumigatus
    AMC: Aspergilloom
  - OLVG: duodenum perforation
    AMC: Gut perforation
  - OLVG: HIV
    AMC: AIDS
  - OLVG: Aorta thoracalis dissectie type B
    AMC: Dissection of artery
20

Experimental results
• Source & target = flat lists of ±1400 ICU terms each
• Background = DICE (2500 concepts in DL)
• Manual Gold Standard (n=200)

21

Does more context
knowledge help?

Adding more context
Only lexical
DICE (2500 concepts)
MeSH (22000 concepts)
ICD-10 (11000 concepts)
• Anchoring strength:

             DICE   MeSH   ICD10
4 aspects       0      8       0
3 aspects       0     89       0
2 aspects     135    201       0
1 aspect      413    694      80
total         548    992      80
23

Results with multiple ontologies
Separate:
             Lexical   ICD-10   DICE   MeSH
Recall           64%      64%    76%    88%
Precision        95%      95%    94%    89%

Joint:
[Chart: y-axis 0 to 100; x-axis: Lexical, ICD-10, DICE, MeSH]
• Monotonic improvement
• Independent of order
• Linear increase of cost
24
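As a reminder of how such figures are obtained, the sketch below computes precision and recall of a set of suggested matches against a gold standard, and combines the matches from several background sources by set union. The match pairs and names are made up for illustration; they are not the OLVG/AMC data, and the union is only one plausible reading of "joint".

```python
def precision_recall(suggested, gold):
    """Precision and recall of suggested matches against a gold standard."""
    suggested, gold = set(suggested), set(gold)
    correct = suggested & gold
    precision = len(correct) / len(suggested) if suggested else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold standard and per-source suggestions (source term, target term).
gold = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
lexical = {("a", "x"), ("b", "y")}
with_background = {("b", "y"), ("c", "z"), ("e", "q")}

# Joint use of several sources, modelled here as the union of their matches.
joint = lexical | with_background

p_lexical, r_lexical = precision_recall(lexical, gold)
p_joint, r_joint = precision_recall(joint, gold)
```

Because set union is commutative and can only add matches, recall in this sketch cannot drop when a source is added, and the result does not depend on the order in which sources are added.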

Does structured context
knowledge help?

Exploiting structure
• CRISP: 700 concepts, broader-than
• MeSH: 1475 concepts, broader-than
• FMA: 75,000 concepts, 160 relation-types (we used: is-a & part-of)

[Diagram: CRISP (738) and MeSH (1475) are each anchored to FMA (75,000); inference over FMA yields the mapping between CRISP and MeSH.]
26

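Using the structure or not ?
• (S <a B) & (B < B’) & (B’ <a T) → (S <i T)

[Diagram: S and T are each linked to the background ontology by an anchoring relation (a); the relation between S and T is inferred (i).]
27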

Using the structure or not ?
• (S <a B) & (B < B’) & (B’ <a T) → (S <i T)

No use of structure
Only stated is-a & part-of
Transitive chains of is-a, and transitive chains of part-of
Transitive chains of is-a and part-of
One chain of part-of before one chain of is-a
28
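The rule above, and the effect of the different ways of using structure, can be sketched as follows. The tiny anatomy fragment, the anchorings, and all names are hypothetical; the code only illustrates three of the variants (stated relations, separate transitive closures, mixed closure) and is not the implementation used in the experiments.

```python
from itertools import product

def transitive_closure(edges):
    """Return the transitive closure of a set of (narrower, broader) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def infer(source_anchors, target_anchors, background):
    """(S <a B) & (B < B') & (B' <a T) -> (S <i T)."""
    return {(s, t)
            for (s, b) in source_anchors
            for (b2, t) in target_anchors
            if (b, b2) in background}

# Hypothetical background fragment.
is_a = {("left_lung", "lung")}
part_of = {("lung", "respiratory_system"), ("respiratory_system", "body")}

# Hypothetical anchorings: (S, B) for the source, (B', T) for the target.
source_anchors = {("S_left_lung", "left_lung"), ("S_lung", "lung")}
target_anchors = {("lung", "T_lung"), ("body", "T_body")}

stated = infer(source_anchors, target_anchors, is_a | part_of)
separate = infer(source_anchors, target_anchors,
                 transitive_closure(is_a) | transitive_closure(part_of))
mixed = infer(source_anchors, target_anchors,
              transitive_closure(is_a | part_of))
```

On this toy input each variant finds a superset of the previous one, which mirrors the direction of the recall figures reported for the indirect experiments.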

Matching results (CRISP to MeSH)
(Golden Standard n=30)

Recall                                 =      ≤      ≥   total   incr.
Exp.1: Direct                        448    417    156    1021       -
Exp.2: Indir. is-a + part-of         395    516    405    1316     29%
Exp.3: Indir. separate closures      395    933   1402    2730    167%
Exp.4: Indir. mixed closures         395   1511   2228    4143    306%
Exp.5: Indir. part-of before is-a    395    972   1800    3167    210%

Precision                              =      ≤      ≥   total   correct
Exp.1: Direct                         17     18      3      38    100%
Exp.4: Indir. mixed closures          14     39     59     112     94%
Exp.5: Indir. part-of before is-a     14     37     50     101    100%
31
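The "incr." column appears to be the relative increase of the total number of matches over Exp.1. A small check, using only the totals as printed in the recall table (the variable names are illustrative):

```python
# Totals as printed in the recall table.
baseline_total = 1021  # Exp.1: Direct
totals = {"Exp.2": 1316, "Exp.3": 2730, "Exp.4": 4143, "Exp.5": 3167}

# Relative increase over the direct baseline, rounded to whole percentages.
increase_pct = {name: round(100 * (total / baseline_total - 1))
                for name, total in totals.items()}
```

Under that reading the rounded values agree with the percentages shown in the table.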

• The Semantic Web needs ontology mapping
• Ontology mapping needs approximation
  - young researcher ≈ post-doc ?

32

This work with
Zharko Aleksovski
Risto Gligorov
Warner ten Kate

Approximating subsumptions
(and hence mappings)
• query: A ⊑ B ?
• B = B1 ⊓ B2 ⊓ B3, so: A ⊑ B1, A ⊑ B2, A ⊑ B3 ?

[Diagram: Venn-style picture of A, B, B1, B2 and B3.]

34

Approximating subsumptions
• Use “Google distance” to decide which subproblems are reasonable to focus on
• Google distance:

\[
\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}
\]

where
  - f(x) is the number of Google hits for x
  - f(x,y) is the number of Google hits for the tuple of search items x and y
  - M is the number of web pages indexed by Google
• NGD is:
  - ≈ symmetric conditional probability of co-occurrence
  - ≈ estimate of semantic distance
  - ≈ estimate of “contribution” of B3 to B1 ⊓ B2 ⊓ B3
35
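The Google distance above is straightforward to compute once the hit counts are known. The sketch below takes the counts as plain numbers; how they are obtained from a search engine is left out, and the counts used in the example are invented for illustration.

```python
import math

def ngd(fx, fy, fxy, m):
    """Normalised Google distance from hit counts.

    fx, fy: hits for x and for y; fxy: hits for x and y together;
    m: number of indexed pages. All counts are assumed to be given.
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    log_fx, log_fy = math.log(fx), math.log(fy)
    numerator = max(log_fx, log_fy) - math.log(fxy)
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator

# Hypothetical hit counts, not real search results.
d_same = ngd(1000, 1000, 1000, 10**6)   # terms that always co-occur
d_partial = ngd(100, 1000, 100, 10**6)  # x only ever appears together with y
```

Terms that always occur together get distance 0, and the value grows as the terms co-occur less often; the function is symmetric in x and y.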

Google distance
[Figure: Google distances between the terms animal, plant, sheep, cow, vegetarian and madcow.]

37

Google for sloppy matching
• Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3)
• determine NGD(B, Bi) = σi, i = 1, 2, 3
• incrementally:
  - increase sloppiness threshold σ
  - allow to ignore A ⊑ Bi with Σ σi ≤ σ
• match if remaining A ⊑ Bj hold
38
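One way to read the algorithm above: a match is accepted at threshold σ if the conjuncts that fail classically are cheap enough to ignore, i.e. their σi sum to at most σ. The sketch below follows that reading. The function name, the example weights, and the assumption that the σi are scaled to sum to 1 are illustrative choices, not taken from the slides.

```python
def sloppy_match(holds, sigmas, threshold):
    """Sloppy test for A ⊑ B with B = B1 ⊓ ... ⊓ Bn.

    holds[i]:  whether A ⊑ Bi holds classically
    sigmas[i]: weight σi of conjunct Bi (e.g. derived from NGD(B, Bi))
    Ignoring exactly the failing conjuncts is the cheapest way to make
    the remaining ones hold, so only their total weight matters.
    """
    ignored = sum(s for ok, s in zip(holds, sigmas) if not ok)
    return ignored <= threshold

# Hypothetical weights, assumed to be scaled so that they sum to 1.
sigmas = [0.25, 0.25, 0.5]
holds = [True, False, True]  # only A ⊑ B2 fails

classical = sloppy_match(holds, sigmas, 0.0)
relaxed = sloppy_match(holds, sigmas, 0.25)
trivial = sloppy_match([False, False, False], sigmas, 1.0)
```

With σ = 0 this reduces to the classical test, and raising σ can only turn non-matches into matches, never the reverse.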

Properties of sloppy matching
• When sloppiness threshold σ goes up, set of matches grows monotonically
• σ=0: classical matching
• σ=1: trivial matching
• Ideally: compute σi such that:
  - desirable matches become true at low σ
  - undesirable matches become true only at high σ
39

Experiments in music domain

Source                  Size (classes)   Depth (levels)
CDNow (Amazon.com)                2410                5
ArtistGigs                         382                4
Artist Direct Network              465                2
CD baby                            222                2
All Music Guide                    403                3
Yahoo                               96                2
MusicMoz                          1073                7

very sloppy terms → good
40

Experiment
Manual Gold Standard, N=50, random pairs

[Figure: precision versus recall for classical, random and NGD matching; points labelled σ = 0.5 and σ = 0.53; values shown: 97, 60, 20, 7.]
16-05-2006

• The Semantic Web needs ontology mapping



43

So that
• shared context & approximation make ontology-mapping a bit more like a social conversation

44

Future: Distributed/P2P setting
[Diagram: as in the general idea, source and target are anchored to background knowledge and inference yields the mapping.]
45

Questions & discussion

Frank.van.Harmelen@cs.vu.nl
http://www.cs.vu.nl/~frankh

46
