Instance-based Ontology Matching by Instance Enrichment

  1. Instance-Based Ontology Matching By Instance Enrichment. Balthasar A.C. Schopman; supervisors: Antoine Isaac, Shenghui Wang, Stefan Schlobach. Vrije Universiteit Amsterdam, June 29, 2009. Sections: Ontology matching, Instance-based OM, IBOMbIE, Experiments, Comparison other OM, Conclusions.
  2. Outline: (1) Ontology matching, (2) Instance-based OM, (3) IBOMbIE, (4) Experiments, (5) Comparison other OM, (6) Conclusions.
  3. Research questions. General research questions: How do different algorithm design options of IBOMbIE influence the final result? How does the performance of IBOMbIE relate to other OM algorithms?
  4. Questions from the audience. Crucial questions: please interrupt me. Other questions: after the presentation, please.
  5. Introduction: Ontology. Definition of an ontology (by Euzenat and Shvaiko): an ontology typically (1) defines a vocabulary relevant in a certain domain of interest, (2) specifies the meaning of terms and (3) specifies relations between terms. Ontologies: controlled vocabulary; thesaurus; database schema; canonical semantic web ontology (a set of typed, interrelated concepts defined in a formal language).
  7. Introduction: Ontology Matching (OM). Ontologies facilitate interoperability between parties. They do not solve the heterogeneity problem, but raise it to a higher level: the OM level. Elementary OM techniques: terminological, structure-based, semantic-based, instance-based.
  9. Introduction: Instance-based OM (IBOM). Variants of IBOM: (1) use dually annotated instances (DAI); (2) create DAI; (3) use the extension of concepts (DAI not required). General pros and cons. Con: does not deduce specific relations. Con: suitable instances are rarely available. Pro: focuses on the active part of the ontology. Pro: able to deal with ambiguous linguistic phenomena such as synonyms and homonyms.
  11. Intro: Definitions of the 'instance of' relation. Example definitions: canonical semantic web definition; library definition. [Diagram, canonical semantic web definition: someone:Peter foaf:name "Peter"; someone:Peter foaf:knows someone:Nate; someone:Peter rdf:type foaf:Person.]
  12. Intro: Definitions of the 'instance of' relation. Example definitions: canonical semantic web definition; library definition. [Diagram, library definition: objects o1, o2, ... are each annotated with concepts c1, c2, c3, ... taken from an ontology / vocabulary.]
  13. Intro: Application. Two library scenarios: KB and TEL. Match controlled vocabularies. Data-sets: book catalogs. Multi-lingual.
  14. IBOM: measuring similarity. [Diagram: concepts c1 and c2.]
  18. IBOM: Jaccard coefficient. $J(c_1, c_2) = \frac{|i_1 \cap i_2|}{|i_1 \cup i_2|}$, where $i_1$ and $i_2$ are the sets of instances (the extensions) of concepts $c_1$ and $c_2$. It quantifies the overlap of the extensions of concepts → relatedness between concepts. Con: no multi-sets.
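As a small illustration of the Jaccard coefficient on the slide above, here is a minimal sketch in Python. The function name `jaccard` and the toy instance identifiers are assumptions made for this example; they do not come from the presentation.

```python
def jaccard(ext1: set, ext2: set) -> float:
    """Jaccard coefficient of two concept extensions (sets of instance ids)."""
    union = ext1 | ext2
    if not union:
        # Assumption: two empty extensions are treated as unrelated.
        return 0.0
    return len(ext1 & ext2) / len(union)


# Hypothetical extensions of two concepts c1 and c2.
i1 = {"book1", "book2", "book3"}
i2 = {"book2", "book3", "book4"}
score = jaccard(i1, i2)  # 2 shared instances out of 4 distinct ones
```

Because the extensions are plain sets, an instance counts once no matter how often an annotation was added to it, which is the "no multi-sets" limitation named on the slide.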
  20. IBOM: Creating dually annotated instances (DAI). Jaccard needs DAI. If DAI are unavailable: exact instance matching → merge annotations; approximate instance matching → enrich instances.
  22. Instance matching: Approximate instance matching. Instance similarity measures: Lucene; vector space model (VSM).
  23. Enriching instances: Basic instance enrichment (IE). [Diagram: instance i1 in data-set D1 (annotations a, b) is matched to instance i2 in data-set D2 (annotations A, B).]
  24. Enriching instances: Basic instance enrichment (IE). [Diagram: after enrichment, i1 carries the annotations A, B of i2 in addition to its own a, b.]
  25. Enriching instances: IE parameter: topN. [Diagram: i1 in D1 (annotations a, b) matched against D2: 1st match i2 (A, B), 2nd match i3 (D), 3rd match i4 (A, C).]
  26. Enriching instances: IE parameter: topN. [Diagram: the annotations A, B of the 1st match i2 are added to i1.]
  27. Enriching instances: IE parameter: topN. [Diagram: the annotation D of the 2nd match i3 is added to i1 as well.]
  28. Enriching instances: IE parameter: topN. [Diagram: the annotations A, C of the 3rd match i4 are added to i1 as well.]
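The topN slides above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `enrich_top_n` is hypothetical, and the similarity scores attached to the candidates are borrowed from the similarity-threshold slides purely to produce the ranking 1st i2, 2nd i3, 3rd i4.

```python
def enrich_top_n(annotations, candidates, n):
    """Add the annotations of the n best-matching candidates to an instance.

    candidates: list of (similarity, annotation_set) pairs from the other data-set.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    enriched = set(annotations)
    for _, candidate_annotations in ranked[:n]:
        enriched |= candidate_annotations
    return enriched


# i1 from D1 with annotations a, b; i2, i3, i4 from D2 (scores are assumed).
i1_annotations = {"a", "b"}
d2_candidates = [(0.8, {"A", "B"}), (0.4, {"D"}), (0.2, {"A", "C"})]

top1 = enrich_top_n(i1_annotations, d2_candidates, 1)
top3 = enrich_top_n(i1_annotations, d2_candidates, 3)
```

With `n=1` only the annotations of the best match are added, which corresponds to basic instance enrichment; larger `n` pulls in annotations from progressively weaker matches.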
  29. Enriching instances: IE parameter: similarity threshold (ST). [Diagram: i1 in D1 (annotations a, b) compared against D2: sim(i1,i2) = 0.8 with i2 (A, B); sim(i1,i3) = 0.4 with i3 (D); sim(i1,i4) = 0.2 with i4 (A, C).]
  30. Enriching instances: IE parameter: similarity threshold (ST). [Diagram: the annotations A, B of i2 (sim = 0.8) are added to i1.]
  31. Enriching instances: IE parameter: similarity threshold (ST). [Diagram: the annotation D of i3 (sim = 0.4) is added to i1 as well.]
  32. Enriching instances: IE parameter: similarity threshold (ST). [Diagram: the annotations A, C of i4 (sim = 0.2) are added to i1 as well.]
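A minimal sketch of the similarity-threshold variant, using the similarity values shown on the slides above. The function name `enrich_with_threshold` is hypothetical, and treating a score equal to the threshold as a match (`>=`) is an assumption. The threshold grid uses the mean and standard deviation that the appendix slide on the ST parameter reports for the KB scenario (µ = 0.297, σ = 0.106) and its step size of ½σ.

```python
def enrich_with_threshold(annotations, candidates, threshold):
    """Add annotations of every candidate whose similarity reaches the threshold."""
    enriched = set(annotations)
    for similarity, candidate_annotations in candidates:
        if similarity >= threshold:  # assumption: ties count as matches
            enriched |= candidate_annotations
    return enriched


i1_annotations = {"a", "b"}
d2_candidates = [(0.8, {"A", "B"}), (0.4, {"D"}), (0.2, {"A", "C"})]

strict = enrich_with_threshold(i1_annotations, d2_candidates, 0.5)
loose = enrich_with_threshold(i1_annotations, d2_candidates, 0.3)

# Thresholds from mu - 1.5*sigma to mu + 1.5*sigma in steps of sigma / 2.
mu, sigma = 0.297, 0.106
thresholds = [mu + k * 0.5 * sigma for k in range(-3, 4)]
```

Unlike topN, the number of matches used per instance is not fixed here: an instance with no sufficiently similar counterpart receives no extra annotations at all.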
  33. Experimental questions: instance similarity measure; topN parameter; ST parameter; combining topN + ST parameters; performance as compared to other OM algorithms.
  34. Evaluation: Alignment evaluation. Methods: gold standard (:= good alignment); reindexing. Measures: precision, recall, f-measure.
  35. Results: instance similarity measure, quality. [Figure: precision (P), recall (R) and f-measure (F) of VSM and Lucene; y-axis: performance; x-axis: mapping rank; panels: (a) Gold standard, (b) Reindex.] Result: virtually equal.
  36. Results: instance similarity measure, quality. [Figure: panels (c) Overlap and (d) Manual evaluation, the latter showing precision of VSM and precision of Lucene; axes: overlap / performance against mapping rank.] Result: edge to VSM.
  37. Results: instance similarity measure, run-time. Time to enrich 100K instances (hrs:min):
     amount indexed instances    Lucene    VSM
     524K                        1:04      0:17
     1,457K                      7:20      0:22
     2,506K                      26:15     0:32
     [Figure: increase in run-time against indexed documents (* 100K) for VSM and Lucene.] Optimizations of VSM: pre-calculate the weights of indexed documents; purge insignificant weights (35% + 50%); word-centered indexing approach.
  39. Results: topN parameter (TEL). As N increases, the quality of the mappings decreases. [Figure: f-measure against mapping rank for top1 (baseline) through top6; panels: (i) Gold standard, (j) Reindex.]
  40. Results: similarity threshold parameter (KB). Best performance with ST: ST=µ. Best performance: baseline (topN=1, ST=∞). [Figure: f-measure against mapping rank for the baseline and T = mean-1.5s, mean-s, mean-.5s, mean, mean+.5s, mean+s, mean+1.5s; panels: (k) Gold standard, (l) Reindex.]
  41. Results: combining parameters. Using both parameters performs well in TEL, but not in KB. Possibly due to: a more selective IBOMbIE pays off in TEL, because vocabularies + instance annotations differ more there than in the KB scenario. [Figure: f-measure against mapping rank for the baseline and for topN=1, 2, 3 combined with ST=mu-0.5s, mu, mu+0.5s; panels: (m) KB, (n) TEL; evaluation method: reindexing.]
  42. OAEI: Ontology alignment evaluation initiative (OAEI). [Table: the matchers DSSim, Lily, TaxoMap and IBOMbIE against the techniques terminological, structure-based, semantic-based and instance-based.] DSSim, Lily and TaxoMap: consider the KB ontologies "huge"; feature functionality to deal with large ontologies.
  43. OAEI: Performance comparison, quality. [Figure: precision (P) and recall (R) against mapping rank for IBOMbIE topN=1, DSSim, Lily and TaxoMap; y-axis: performance.]
  44. OAEI: Performance comparison, resources + coverage.
     matcher     run-time    amount mappings
     DSSim       12:00       2930
     Lily        ?           2797
     TaxoMap     2:40        1851
     IBOMbIE     1:54        7000+
     (Amount of lexically equal concepts in the KB vocabularies = 2,895)
  45. Conclusions + discussion. The IBOMbIE algorithm is quite promising: relatively low run-time; able to deal with large ontologies; amount + quality of mappings; pros of IBOM; able to align ontologies using disjoint data-sets. Basic instance enrichment appears to be the best-performing method. Possible cause: the Jaccard coefficient does not support multi-sets.
  46. Fin. Thank you... any questions?
  47. Vocabularies.
     scenario    vocabulary    size
     KB          GTT           35K
     KB          Brinkman      5K
     TEL         LCSH          340K
     TEL         Rameau        155K
     TEL         SWD           805K
  48. IE parameter: similarity threshold (ST).
     scenario    D1 annotated with    D2 annotated with    µ        σ
     KB          O1                   O2                   0.297    0.106
     KB          O2                   O1                   0.279    0.101
     TEL         O1                   O2                   0.260    0.097
     TEL         O2                   O1                   0.232    0.084
     Standard ST: µ. Step-size: $\frac{1}{2}\sigma$.
  49. VSM. Weights are components of vectors: term frequency - inverse document frequency (TF-IDF); e.g. audiovisual features. $\mathrm{tfidf}_{w,d} = \mathrm{tf}_{w,d} \cdot \mathrm{idf}_w$, with $\mathrm{tf}_{w,d} = \frac{\sqrt{n_{w,d}}}{|d|}$ and $\mathrm{idf}_w = \log \frac{|D|}{|\{d \in D : w \in d\}|}$. VSM cosine similarity: $\mathrm{cosine\ sim}(d_1, d_2) = \frac{d_1 \cdot d_2}{|d_1||d_2|} = \frac{\sum_{i=1}^{n} w_{i,d_1} w_{i,d_2}}{\sqrt{\sum_i w_{i,d_1}^2}\,\sqrt{\sum_i w_{i,d_2}^2}}$.
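The TF-IDF weighting and cosine similarity on the VSM slide above can be sketched as follows. This is not the implementation used in the experiments; it is a minimal sketch that assumes the term frequency reads as the square root of the term count divided by the document length, and the function names and toy documents are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dict: word -> weight)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            word: (math.sqrt(count) / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


def cosine_sim(v1, v2):
    """Cosine similarity of two sparse vectors."""
    dot = sum(weight * v2.get(word, 0.0) for word, weight in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical instance descriptions, already tokenized.
docs = [
    ["ontology", "matching", "instance"],
    ["instance", "matching", "library"],
    ["cooking", "recipes"],
]
vecs = tfidf_vectors(docs)
related = cosine_sim(vecs[0], vecs[1])
unrelated = cosine_sim(vecs[0], vecs[2])
```

A word that occurs in every document gets an idf of log(1) = 0 and so contributes nothing to the similarity, which is the effect the weighting is meant to have.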
  50. Evaluation method: gold standard. Gold standard := good alignment. Precision: $P = \frac{|\{reference\} \cap \{retrieved\}|}{|\{retrieved\}|}$. Recall: $R = \frac{|\{reference\} \cap \{retrieved\}|}{|\{reference\}|}$. F-measure: $F = 2 \cdot \frac{P \cdot R}{P + R}$.
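The three gold-standard measures above translate directly into code. The sketch below is illustrative: the function name `evaluate` and the toy mappings are assumptions, and returning 0.0 for empty sets is a convention chosen here to avoid division by zero.

```python
def evaluate(reference, retrieved):
    """Precision, recall and f-measure of retrieved mappings against a reference set."""
    correct = len(reference & retrieved)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Mappings are (concept in ontology 1, concept in ontology 2) pairs.
reference = {("a", "x"), ("b", "y"), ("c", "z")}
retrieved = {("a", "x"), ("b", "z")}
p, r, f = evaluate(reference, retrieved)
```

Here one of the two retrieved mappings is correct and one of the three reference mappings is found, so precision is 1/2 and recall is 1/3.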
  51. Evaluation method: reindexing. [Diagram: ontology o_1 with concepts a, b, c and ontology o_2 with concepts x, y, z; a dually annotated instance i_dual with annotations {a, b} and {x, z} is reindexed from {a, b} to {x}.] Precision: $P = \frac{\sum_{\text{dually annotated instances}} \frac{|\{reference\} \cap \{retrieved\}|}{|\{retrieved\}|}}{|\{\text{reindexed instances}\}|}$. Recall R: based on the per-instance ratio $\frac{|\{reference\} \cap \{retrieved\}|}{|\{reference\}|}$, summed over the dually annotated instances.
  52. IbOM by IM: algorithm overview. Whole algorithm. Start: two data-sets Dx and Dy. (1) Enrich the instances of Dx with annotations of instances of Dy; for every instance a: (1.1) find the N best matching instances {b} in Dy, (1.2) add the annotations of {b} to a. (2) Enrich vice versa. (3) Merge the data-sets into one dually annotated data-set. (4) Apply the Jaccard measure.
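The four steps of the overview above can be put together in one short sketch. It is a toy under stated assumptions, not the IBOMbIE implementation: instances are dicts with hypothetical `text` and `annotations` keys, a simple token-overlap score stands in for the Lucene and VSM similarity measures, and the concept names are invented.

```python
from itertools import product


def token_overlap(tokens1, tokens2):
    """Stand-in instance similarity (assumption): Jaccard overlap of tokens."""
    a, b = set(tokens1), set(tokens2)
    return len(a & b) / len(a | b) if a | b else 0.0


def enrich(target, source, n=1):
    """Steps 1 and 2: pair each target instance's own annotations with the
    annotations of its n best-matching source instances."""
    result = []
    for inst in target:
        ranked = sorted(
            source,
            key=lambda other: token_overlap(inst["text"], other["text"]),
            reverse=True,
        )
        added = set().union(*(other["annotations"] for other in ranked[:n]))
        result.append((set(inst["annotations"]), added))
    return result


def match_concepts(dx, dy, n=1):
    # Step 3: merge both enriched data-sets into one dually annotated data-set,
    # stored as (concepts of ontology x, concepts of ontology y) per instance.
    dual = [(own, added) for own, added in enrich(dx, dy, n)]
    dual += [(added, own) for own, added in enrich(dy, dx, n)]
    # Extension of each concept: the indices of the instances annotated with it.
    ext_x, ext_y = {}, {}
    for idx, (concepts_x, concepts_y) in enumerate(dual):
        for c in concepts_x:
            ext_x.setdefault(c, set()).add(idx)
        for c in concepts_y:
            ext_y.setdefault(c, set()).add(idx)
    # Step 4: Jaccard coefficient for every concept pair with shared instances.
    scores = {}
    for c1, c2 in product(ext_x, ext_y):
        shared = len(ext_x[c1] & ext_y[c2])
        if shared:
            scores[(c1, c2)] = shared / len(ext_x[c1] | ext_y[c2])
    return scores


dx = [
    {"text": ["dutch", "history", "war"], "annotations": {"x:History"}},
    {"text": ["garden", "plants", "flowers"], "annotations": {"x:Gardening"}},
]
dy = [
    {"text": ["history", "of", "war"], "annotations": {"y:Geschiedenis"}},
    {"text": ["flowers", "and", "plants"], "annotations": {"y:Tuinieren"}},
]
scores = match_concepts(dx, dy, n=1)
```

Note that the two data-sets share no instances; the dually annotated data-set exists only because of the enrichment step, which is what lets the Jaccard measure be applied at all.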
