Ontology mapping
needs
context & approximation
Frank van Harmelen
Vrije Universiteit Amsterdam
Or:
How to make ontology-mapping
less like data-base integration
and
more like a social conversation
Three obvious intuitions
The Semantic Web needs
ontology mapping
Ontology mapping needs
background knowledge
Ontology mapping needs approximation
Which Semantic Web?
Version 1:
"Semantic Web as Web of Data" (TBL)
recipe: expose databases on the web, use RDF, integrate
meta-data from:
• expressing DB schema semantics in machine-interpretable ways
enable integration and unexpected re-use
Which Semantic Web?
Version 2:
“Enrichment of the current Web”
recipe: annotate, classify, index
meta-data from:
• automatically producing markup: named-entity recognition, concept extraction, tagging, etc.
enable personalisation, search, browse, ...
Which Semantic Web?
Version 1:
“Semantic Web as Web of Data”
data-oriented
Version 2:
“Enrichment of the current Web”
user-oriented
Different use-cases
Different techniques
Different users
Which Semantic Web?
Version 1:
“Semantic Web as Web of Data”
Version 2:
“Enrichment of the current Web”
But both need ontologies for semantic agreement
• between sources
• between source & user
Ontology research is
almost done..
we know what they are
“consensual, formalised models of a domain”
we know how to make and maintain them
(methods, tools, experience)
we know how to deploy them
(search, personalisation, data-integration, …)
Main remaining open questions
Automatic construction (learning)
Automatic mapping (integration)
Three obvious intuitions
The Semantic Web needs ontology mapping
Ontology mapping needs background knowledge
• Ph.D. student =? AIO
Ontology mapping needs approximation
• young researcher ≈? post-doc
This work with
Zharko Aleksovski &
Michel Klein
Does context knowledge help
mapping?
The general idea
[Figure: the source and target terms are each anchored in the background knowledge; inference over the background knowledge yields the mapping between source and target.]
a realistic example
Two Amsterdam hospitals (OLVG, AMC)
Two Intensive Care Units, different vocabularies
Want to compare quality of care
OLVG-1400:
• 1400 terms in a flat list
• used in the first 24 hours of stay
• some implicit hierarchy (e.g. 6 types of Diabetes Mellitus)
• some redundancy (spelling mistakes)
AMC: similar list, but from a different hospital
Using the structure or not?
(S <a B) & (B < B’) & (B’ <a T) ⇒ (S <i T)
where <a is an anchoring match, < holds in the background knowledge, and <i is the inferred mapping
No use of structure
Only stated is-a & part-of
Transitive chains of is-a, and transitive chains of part-of
Transitive chains of is-a and part-of
One chain of part-of before one chain of is-a
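The inference rule on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names, the dictionary layout and the example terms are all hypothetical, and the `transitive` flag only contrasts "only stated" relations with "transitive chains" of a single relation type.

```python
from collections import deque


def ancestors(graph, start, transitive=True):
    """Concepts B' with B < B' in the background knowledge.

    graph maps a concept to the set of its directly stated parents.
    With transitive=False only the stated (direct) relations are used.
    """
    direct = set(graph.get(start, ()))
    if not transitive:
        return direct
    seen, queue = set(direct), deque(direct)
    while queue:
        node = queue.popleft()
        for parent in graph.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def infer_mappings(source_anchors, background, target_anchors, transitive=True):
    """Apply (S <a B) & (B < B') & (B' <a T) => (S <i T)."""
    mappings = set()
    for s, anchored_to in source_anchors.items():
        for b in anchored_to:
            for b2 in ancestors(background, b, transitive):
                for t in target_anchors.get(b2, ()):
                    mappings.add((s, t))
    return mappings


# Hypothetical example data, for illustration only.
SOURCE_ANCHORS = {"DM type 2": {"diabetes mellitus type 2"}}
BACKGROUND = {
    "diabetes mellitus type 2": {"diabetes mellitus"},
    "diabetes mellitus": {"endocrine disease"},
}
TARGET_ANCHORS = {"endocrine disease": {"Endocrine disorder"}}

print(infer_mappings(SOURCE_ANCHORS, BACKGROUND, TARGET_ANCHORS))
print(infer_mappings(SOURCE_ANCHORS, BACKGROUND, TARGET_ANCHORS, transitive=False))
```

In this toy data the mapping is only found when transitive chains are followed, which is the kind of difference the variants listed above are meant to expose.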
Three obvious intuitions
The Semantic Web needs ontology mapping
Ontology mapping needs
background knowledge
Ontology mapping needs approximation
• young researcher ≈? post-doc
This work with
Zharko Aleksovski
Risto Gligorov
Warner ten Kate
Approximating subsumptions
(and hence mappings)
query: A ⊑ B ?
B = B1 ⊓ B2 ⊓ B3, so check the subproblems: A ⊑ B1, A ⊑ B2, A ⊑ B3 ?
[Figure: Venn diagram of A, B1, B2 and B3, with B the intersection of B1, B2 and B3]
Approximating subsumptions
Use “Google distance” to decide which subproblems are reasonable to focus on
Google distance
\[ \mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}} \]
where
f(x) is the number of Google hits for x
f(x,y) is the number of Google hits for the tuple of search items x and y
M is the number of web pages indexed by Google
≈ symmetric conditional probability of co-occurrence
≈ estimate of semantic distance
≈ estimate of “contribution” of B3 to B1 ⊓ B2 ⊓ B3
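The Google distance formula translates directly into code. The sketch below assumes the hit counts f(x), f(y), f(x,y) and the index size M have already been obtained somehow; the function name `ngd` and the numbers in the usage lines are hypothetical, not real search results.

```python
import math


def ngd(fx, fy, fxy, m):
    """Normalised Google distance from hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    m: number of indexed pages. All counts must be positive, and m must
    exceed min(fx, fy) so the denominator is non-zero.
    """
    if min(fx, fy, fxy) <= 0:
        raise ValueError("hit counts must be positive")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


# Made-up counts: terms that always co-occur have distance 0.
print(ngd(1000, 1000, 1000, 10**6))
# Made-up counts: terms that rarely co-occur are further apart.
print(ngd(100, 100, 10, 10**4))
```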
Google distance
[Figure: Google distances between the terms animal, plant, sheep, cow, vegetarian and madcow]
Google for sloppy matching
Algorithm for A ⊑ B (B = B1 ⊓ B2 ⊓ B3)
determine NGD(B, Bi) = σi, i = 1,2,3
incrementally:
• increase sloppiness threshold σ
• allow ignoring A ⊑ Bi with Σ σi ≤ σ
match if remaining A ⊑ Bj hold
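One possible reading of this algorithm, as a minimal sketch: a match is accepted at threshold σ when the conjuncts that fail classically have a combined σi of at most σ. The function names and the example values are hypothetical, and the classical subsumption results are assumed to be given as booleans.

```python
def ignored_cost(holds, sigmas):
    """Sum of sigma_i over the conjuncts A ⊑ Bi that do not hold classically."""
    return sum(s for ok, s in zip(holds, sigmas) if not ok)


def sloppy_match(holds, sigmas, threshold):
    """True if A ⊑ B matches once conjuncts with total sigma <= threshold are ignored.

    holds[i]  : whether A ⊑ Bi holds classically
    sigmas[i] : NGD(B, Bi), assumed to be precomputed
    """
    return ignored_cost(holds, sigmas) <= threshold


# Hypothetical example: A ⊑ B1 and A ⊑ B2 hold, A ⊑ B3 does not.
holds = [True, True, False]
sigmas = [0.1, 0.2, 0.3]

for threshold in (0.0, 0.2, 0.3, 1.0):
    print(threshold, sloppy_match(holds, sigmas, threshold))
```

Under this reading, threshold 0 accepts only matches in which every conjunct holds, and raising the threshold can only add matches.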
Properties of sloppy matching
When sloppiness threshold σ goes up, set of matches grows monotonically
σ=0: classical matching
σ=1: trivial matching
Ideally: compute σi such that:
• desirable matches become true at low σ
• undesirable matches become true only at high σ
Experiments in music domain
CDNow (Amazon.com): Size 2410 classes, Depth 5 levels
ArtistGigs: Size 382 classes, Depth 4 levels
Artist Direct Network: Size 465 classes, Depth 2 levels
CD baby: Size 222 classes, Depth 2 levels
All Music Guide: Size 403 classes, Depth 3 levels
Yahoo: Size 96 classes, Depth 2 levels
MusicMoz: Size 1073 classes, Depth 7 levels
Annotations on the slide: “very sloppy terms”, “good”