Ontology mapping requires context, background knowledge, and approximation. Using background knowledge from multiple large ontologies can improve ontology mapping results between two target ontologies by discovering more matches. Exploiting the hierarchical structure in background ontologies through indirect subsumption reasoning can significantly increase the number of matches found. Allowing for approximate matches by introducing a "sloppiness" threshold based on the semantic distance between concepts can further improve results by discovering desirable matches while avoiding undesirable ones except at high sloppiness levels.
1. Ontology mapping
needs
context & approximation
Frank van Harmelen
Vrije Universiteit Amsterdam
2. Or:
How to make ontology-mapping
less like data-base integration
and
more like a social conversation
3. Three obvious intuitions
The Semantic Web needs
ontology mapping
Ontology mapping needs
background knowledge
Ontology mapping needs approximation
4. Which Semantic Web?
Version 1:
"Semantic Web as Web of Data" (TBL)
recipe:
expose databases on the web,
use RDF, integrate
meta-data from:
• expressing DB schema semantics
in machine-interpretable ways
enable integration and unexpected re-use
5. Which Semantic Web?
Version 2:
“Enrichment of the current Web”
recipe:
Annotate, classify, index
meta-data from:
• automatically producing markup:
named-entity recognition,
concept extraction, tagging, etc.
enable personalisation, search, browse,..
6. Which Semantic Web?
Version 1:
“Semantic Web as Web of Data”
Version 2:
“Enrichment of the current Web”
Version 1 is data-oriented, Version 2 is user-oriented:
Different use-cases
Different techniques
Different users
7. Which Semantic Web?
Version 1:
“Semantic Web as Web of Data”
Version 2:
“Enrichment of the current Web”
But both need ontologies
for semantic agreement
between sources
between source & user
8. Ontology research is almost done..
we know what they are
“consensual, formalised models of a domain”
we know how to make and maintain them
(methods, tools, experience)
we know how to deploy them
(search, personalisation, data-integration, …)
Main remaining open questions
Automatic construction (learning)
Automatic mapping (integration)
9. Three obvious intuitions
The Semantic Web needs ontology mapping
Ontology mapping needs
background knowledge
(example: is Ph.D. student = AIO ?)
Ontology mapping needs approximation
(example: is young researcher ≈ post-doc ?)
12. The general idea
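[Diagram: the source and target concepts are each linked by anchoring to concepts in the background knowledge; inference inside the background knowledge then yields the mapping between source and target.]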
13. a realistic example
Two Amsterdam hospitals (OLVG, AMC)
Two Intensive Care Units, different vocabularies
Want to compare quality of care
OLVG-1400:
• 1400 terms in a flat list
• used in the first 24 hours of stay
• some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
• some redundancy (spelling mistakes)
AMC: similar list, but from different hospital
27. Using the structure or not ?
(S <a B) & (B < B’) & (B’ <a T) → (S <i T)
[Diagram: the two anchoring links are labelled a, the inferred link from S to T is labelled i]
28. Using the structure or not ?
(S <a B) & (B < B’) & (B’ <a T) → (S <i T)
Options for deriving B < B’ in the background ontology:
• No use of structure
• Only stated is-a & part-of
• Transitive chains of is-a, and transitive chains of part-of
• Transitive chains of is-a and part-of
• One chain of part-of before one chain of is-a
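
The rule (S <a B) & (B < B’) & (B’ <a T) → (S <i T) can be sketched in a few lines of Python. This is only an illustration of the "transitive chains of is-a" option, not the implementation used in the talk: the function names, the dictionary layout of the background ontology, and all concept labels below are invented for the example.

```python
from collections import deque

def reachable(edges, start):
    """Return every node reachable from `start` via one or more edges (B < B')."""
    seen = set()
    queue = deque(edges.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, ()))
    return seen

def indirect_matches(source_anchors, target_anchors, background_isa):
    """Derive pairs (S, T) with S <i T from (S <a B), (B < B'), (B' <a T)."""
    # Index target anchors by background concept: B' -> {T | B' <a T}
    targets_of = {}
    for b_prime, t in target_anchors:
        targets_of.setdefault(b_prime, set()).add(t)
    matches = set()
    for s, b in source_anchors:
        for b_prime in reachable(background_isa, b):
            for t in targets_of.get(b_prime, ()):
                matches.add((s, t))
    return matches

# Hypothetical toy data (not taken from the OLVG or AMC vocabularies).
background_isa = {
    "Diabetes mellitus type 2": ["Diabetes mellitus"],
    "Diabetes mellitus": ["Endocrine disorder"],
}
source_anchors = [("DM type 2", "Diabetes mellitus type 2")]   # S <a B
target_anchors = [("Endocrine disorder", "Endocrine disease")]  # B' <a T

found = indirect_matches(source_anchors, target_anchors, background_isa)
print(found)
```

Because `reachable` follows chains of any length, this corresponds to using transitive is-a chains; restricting it to direct edges would give the "only stated" variant. The case B = B’ (a match through a single shared background concept) is deliberately left out of this sketch.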
34. Approximating subsumptions
(and hence mappings)
query: A ⊑ B ?
B = B1 ⊓ B2 ⊓ B3
subqueries: A ⊑ B1, A ⊑ B2, A ⊑ B3 ?
[Figure: A shown together with B1, B2 and B3, whose intersection is B]
35. Approximating subsumptions
Use “Google distance” to decide which
subproblems are reasonable to focus on
Google distance:
\[ \mathrm{NGD}(x,y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x,y)}{\log M - \min\{\log f(x), \log f(y)\}} \]
where
• f(x) is the number of Google hits for x
• f(x,y) is the number of Google hits for the tuple of search items x and y
• M is the number of web pages indexed by Google
NGD is:
≈ symmetric conditional probability of co-occurrence
≈ estimate of semantic distance
≈ estimate of “contribution” of Bi to B1 ⊓ B2 ⊓ B3
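
As a minimal sketch, the Google distance can be computed directly from three hit counts and the index size. The function name `ngd` and the hit counts in the usage lines are made up for illustration; no search engine is queried here.

```python
import math

def ngd(fx, fy, fxy, m):
    """Normalised Google distance from hit counts.

    fx, fy : number of hits for x and for y
    fxy    : number of hits for x and y together
    m      : number of pages in the index
    """
    if min(fx, fy, fxy) <= 0:
        raise ValueError("hit counts must be positive to take logarithms")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator

# Invented counts: two terms that always co-occur versus two that almost never do.
always_together = ngd(1000, 1000, 1000, 10**6)
rarely_together = ngd(1000, 1000, 1, 10**6)
print(always_together, rarely_together)
```

Terms that always occur together get distance 0, and the distance grows as the joint hit count shrinks relative to the individual counts.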
38. Google for sloppy matching
Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3)
determine NGD(B, Bi) = σi, i = 1,2,3
incrementally:
• increase sloppiness threshold σ
• allow ignoring A ⊑ Bi with Σ σi ≤ σ
match if remaining A ⊑ Bj hold
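
The algorithm above can be sketched as follows. This is an illustrative reading, not the talk's implementation: it assumes every σi is non-negative, so the cheapest set of conjuncts to ignore is exactly the set of conjuncts for which A ⊑ Bi could not be shown. The names `sloppy_match` and `first_matching_threshold`, and the example values, are invented.

```python
def sloppy_match(holds, sigma_i, sigma):
    """Decide A ⊑ B1 ⊓ ... ⊓ Bn at sloppiness threshold sigma.

    holds[i]   : True if A ⊑ Bi was established
    sigma_i[i] : NGD(B, Bi), assumed non-negative
    """
    if len(holds) != len(sigma_i):
        raise ValueError("need one sigma_i per conjunct")
    # Cost of ignoring every conjunct that fails to hold.
    ignored_cost = sum(s for ok, s in zip(holds, sigma_i) if not ok)
    return ignored_cost <= sigma

def first_matching_threshold(holds, sigma_i, thresholds):
    """Raise sigma step by step; return the first threshold that yields a match."""
    for sigma in sorted(thresholds):
        if sloppy_match(holds, sigma_i, sigma):
            return sigma
    return None

# Hypothetical example: A ⊑ B1 and A ⊑ B3 hold, A ⊑ B2 does not.
holds = [True, False, True]
sigma_i = [0.2, 0.3, 0.5]
steps = [0.0, 0.25, 0.5, 0.75, 1.0]
print(first_matching_threshold(holds, sigma_i, steps))
```

With these values there is no match at σ = 0 or σ = 0.25, and the match appears once σ reaches 0.5, the first step that covers the cost 0.3 of ignoring B2.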
39. Properties of sloppy matching
When sloppiness threshold σ goes up,
set of matches grows monotonically
σ=0: classical matching
σ=1: trivial matching
Ideally (?): compute σi such that:
• desirable matches
become true at low σ
• undesirable matches
become true only at high σ
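
A short derivation of the monotonicity property, under one possible formalisation. The notation M(σ) for the set of matches at threshold σ is introduced here and does not appear on the slides; the σ = 1 endpoint on the slide would correspond to the σi being normalised so that they sum to at most 1, which is an assumption.

```latex
% M(\sigma): set of matches accepted at sloppiness threshold \sigma (notation assumed here)
\[
M(\sigma) = \Bigl\{ (A,B) \;\Bigm|\; \exists I \subseteq \{1,\dots,n\}:\
  \sum_{i \in I} \sigma_i \le \sigma
  \ \text{and}\ A \sqsubseteq B_j \ \text{for all } j \notin I \Bigr\}
\]
% Monotonicity: any ignorable set I that is admissible at \sigma stays admissible at \sigma'
\[
\sigma \le \sigma' \ \text{and}\ \sum_{i \in I} \sigma_i \le \sigma
  \;\Rightarrow\; \sum_{i \in I} \sigma_i \le \sigma'
  \qquad\text{hence}\qquad M(\sigma) \subseteq M(\sigma')
\]
% End points
\[
\sigma = 0,\ \sigma_i > 0 \;\Rightarrow\; I = \emptyset \ \text{(classical matching)},
\qquad
\sigma \ge \sum_{i=1}^{n} \sigma_i \;\Rightarrow\; I = \{1,\dots,n\} \ \text{is admissible (trivial matching)}
\]
```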
40. Experiments in music domain
Ontology                 Size (classes)   Depth (levels)
CDNow (Amazon.com)       2410             5
ArtistGigs               382              4
Artist Direct Network    465              2
CD baby                  222              2
All Music Guide          403              3
Yahoo                    96               2
MusicMoz                 1073             7
Slide annotation: very sloppy terms, good