Ontology mapping needs context & approximation

1,496 views
1,394 views

Published on

Keynote at the Web Intelligence Conference (WI 2006), describing the various approaches in our lab to approximate ontology mapping.

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,496
On SlideShare
0
From Embeds
0
Number of Embeds
110
Actions
Shares
0
Downloads
45
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Ontology mapping needs context & approximation

  1. 1. Ontology mapping needs context & approximation Frank van Harmelen Vrije Universiteit Amsterdam
  2. 2. Or:  How to make ontology-mapping less like data-base integration  and more like a social conversation 2
  3. 3. Three Two obvious intuitions  The Semantic Web needs ontology mapping  Ontology mapping needs background knowledge  Ontology mapping needs approximation 3
  4. 4. Which Semantic Web?  Version 1: "Semantic Web as Web of Data" (TBL)  recipe: expose databases on the web, use RDF, integrate  meta-data from: q expressing DB schema semantics in machine interpretable ways  enable integration and unexpected re-use 4
  5. 5. Which Semantic Web?  Version 2: “Enrichment of the current Web”  recipe: Annotate, classify, index  meta-data from: q automatically producing markup: named-entity recognition, concept extraction, tagging, etc.  enable personalisation, search, browse,.. 5
  6. 6. Which Semantic Web?  Version 1: “Semantic Web as Web of Data”  Version 2: “Enrichment of the current Web”  Different use-cases data-oriented  Different techniques  Different users user-oriented 6
  7. 7. Which Semantic Web?  Version 1: “Semantic Web as Web of Data”  Version 2: “Enrichment of the current Web”  But both need ontologies for semantic agreement between sources between source & user 7
  8. 8.  Ontology research is almost done..  we know what they are “consensual, formalised models of a domain”  we know how to make and maintain them (methods, tools, experience)  we know how to deploy them (search, personalisation, data-integration, …) Main remaining open questions  Automatic construction (learning)  Automatic mapping (integration) 8
  9. 9. Three obvious intuitions  The Semantic Web needs ontology mapping  Ontology mapping needs background knowledge ? Ph.D. student = AIO  Ontology mapping needs approximation young ? researcher ≈ post-doc 9
  10. 10. This work with Zharko Aleksovski & Michel Klein
  11. 11. Does context knowledge help mapping?
  12. 12. The general idea background knowledge anchoring anchoring inference source target mapping 12
  13. 13. a realistic example  Two Amsterdam hospitals (OLVG, AMC)  Two Intensive Care Units, different vocab’s  Want to compare quality of care  OLVG-1400: q 1400 terms in a flat list q used in the first 24 hour of stay q some implicit hierarchy e.g.6 types of Diabetes Mellitus) q some reduncy (spelling mistakes)  AMC: similar list, but from different hospital 13
  14. 14. Context ontology used DICE: q 2500 concepts (5000 terms), 4500 links q Formalised in DL q five main categories: • tractus (e.g. nervous_system, respiratory_system) • aetiology (e.g. virus, poising) • abnormality (e.g. fracture, tumor) • action (e.g. biopsy, observation, removal) • anatomic_location (e.g. lungs, skin) 14
  15. 15. Baseline: Linguistic methods  Combine lexical analysis with hierarchical structure  313 suggested matches, around 70 % correct  209 suggested matches, around 90 % correct  High precision, low recall (“the easy cases”) 15
  16. 16. Now use background knowledge DICE (2500 concepts, 4500 links) anchoring anchoring inference OLVG AMC (1400, flat) (1400, flat) mapping 16
  17. 17. Example found with context knowledge (beyond lexical) 17
  18. 18. Example 2 18
  19. 19. Anchoring strength  Anchoring = substring + trivial morphology anchored on N aspects OLVG AMC N=5 0 2 N=4 0 198 N=3 4 711 N=2 144 285 N=1 401 208 total nr. of anchored terms 549 39% 1404 96% total nr. of anchorings 1298 5816 19
  20. 20. Results Example matchings discovered q OLVG: Acute respiratory failure AMC: Asthma cardiale q OLVG: Aspergillus fumigatus AMC: Aspergilloom q OLVG: duodenum perforation AMC: Gut perforation q OLVG: HIV AMC: AIDS q OLVG: Aorta thoracalis dissectie type B AMC: Dissection of artery 20
  21. 21. Experimental results  Source & target = flat lists of ±1400 ICU terms each  Background = DICE (2300 concepts in DL)  Manual Gold Standard (n=200) 21
  22. 22. Does more context knowledge help?
  23. 23. Adding more context Only lexical DICE (2500 concepts) MeSH (22000 concepts) ICD-10 (11000 concepts)  Anchoring strength: DICE MeSH ICD10 4 aspects 0 8 0 3 aspects 0 89 0 2 aspects 135 201 0 1 aspect 413 694 80 total 548 992 80 23
  24. 24. Results with multiple ontologies Separate Lexical ICD-10 DICE MeSH Recall 64% 64% 76% 88% Precision 95% 95% 94% 89% Joint 100 90  Monotonic improvement 80 70  Independent of order 60 50  Linear increase of cost 40 30 20 10 0 Lexical ICD-10 DICE MeSH 24
  25. 25. does structured context knowledge help?
  26. 26. Exploiting structure  CRISP: 700 concepts, broader-than  MeSH: 1475 concepts, broader-than  FMA: 75.000 concepts, 160 relation-types (we used: is-a & part-of) FMA (75.000) anchoring anchoring inference CRISP MeSH (738) (1475) mapping 26
  27. 27. Using the structure or not ?  (S <a B) & (B < B’) & (B’ <a T) ! (S <i T) a a i 27
  28. 28. Using the structure or not ?  (S <a B) & (B < B’) & (B’ <a T) ! (S <i T) No use of structure Only stated is-a & part-of Transitive chains of is-a, and transitive chains of part-of Transitive chains of is-a and part-of One chain of part-of before one chain of is-a 28
  29. 29. Examples 29
  30. 30. Examples 30
  31. 31. Matching results (CRISP to MeSH) (Golden Standard n=30) Recall = · ¸ total incr. Exp.1:Direct 448 417 156 1021 - Exp.2:Indir. is-a + part-of 395 516 405 1316 29% Exp.3:Indir. separate closures 395 933 1402 2730 167% Exp.4:Indir. mixed closures 395 1511 2228 4143 306% Exp.5:Indir. part-of before is-a 395 972 1800 3167 210% Precision = · ¸ total correct Exp.1:Direct 17 18 3 38 100% Exp.4:Indir. mixed closures 14 39 59 112 94% Exp.5:Indir. part-of before is-a 14 37 50 101 100% 31
  32. 32. Three obvious intuitions  The Semantic Web needs ontology mapping  Ontology mapping needs background knowledge  Ontology mapping needs approximation young ? researcher ≈ post-doc 32
  33. 33. This work with Zharko Aleksovski Risto Gligorov Warner ten Kate
  34. 34. Approximating subsumptions (and hence mappings)  query: A v B ?  B = B1 u B2 u B3 A v B1, A v B2, A v B3 ? B2 B B1 A B3 34
  35. 35. Approximating subsumptions bi lity  Use “Google distance” to decide whichba subproblems are reasonable al pro to focus on  Google distancendit ion B3 e 2u co ce f ( x),stanfc y )} −u B f ( x, y ) NGD( xt, ryc = max{log c di log o B1 log i ) en ( wherey me ccurr log anti min{logt f ( x), log f ( y )} m o M− n” s o - em ibutio ≈ f(x)cis the number ntr Google hits for x fs of at e o “co of ti is te of f(x,y)m the number of Google hits for es ≈ theatuple of search items x and y stim ≈ e M is the number of web pages indexed by Google 35
  36. 36. Google distance HIDDEN 36
  37. 37. Google distance animal plant sheep cow vegeterian madcow 37
  38. 38. Google for sloppy matching  Algorithm for A vB (B=B1 u B2 u B3)  determine NGD(B, Bi)=σ i, i=1,2,3  incrementally: • increase sloppyness threshold σ • allow to ignore A vBi with Σ σ i · σ  match if remaining A v Bj hold 38
  39. 39. Properties of sloppy matching  When sloppyness threshold σ goes up, set of matches grows monotonically  σ=0: classical matching  σ=1: trivial matching  Ideally: compute σ i such that: q desirable matches become true at low σ ? q undesirable matches become true only at high σ 39
  40. 40. Experiments in music domain CDNow (Amazon.com) Size: 2410 classes ArtistGigs Depth: 5 levels Size: 382 classes Depth: 4 levels Artist Direct Network Size: 465 classes CD baby Depth: 2 levels very sloppy terms Size: 222 classes Depth: 2 levels  good All Music Guide Yahoo Size: 403 classes Size: 96 classes Depth: 3 levels Depth: 2 levels MusicMoz Size: 1073 classes Depth: 7 levels 40
  41. 41. Experiment Manual Gold Standard, N=50, random pairs σ =0.53 97 60 σ =0.5 precision classical random NGD 16-05-2006 20 recall 7
  42. 42. wrapping up
  43. 43. Three obvious intuitions  The Semantic Web needs ontology mapping  Ontology mapping needs background knowledge  Ontology mapping needs approximation 43
  44. 44. So that  shared context & approximation make ontology-mapping a bit more like a social conversation 44
  45. 45. Future: Distributed/P2P setting background knowledge anchoring anchoring inference source target mapping 45
  46. 46. Vragen & discussie Frank.van.Harmelen@cs.vu.nl http://www.cs.vu.nl/~frankh 46

×