Combining Vocabulary Alignment Techniques


Identifying alignments between vocabularies has become a central knowledge engineering activity. A plethora of alignment techniques has been developed over the past years. In this paper we present a case study in which we examine and evaluate the practical use of three typical alignment techniques. The study involves the alignment of two vocabularies used in a semantic-search engine for cultural-heritage objects. We show that applying alignment techniques in sequence can be beneficial. The case study gives insight into evaluation issues, such as techniques for identification of false positives. We see this work as a step toward a badly needed methodology for alignment.

Published in: Technology, Education

  1. Combining Vocabulary Alignment Techniques
     Anna Tordai, Jacco van Ossenbruggen, Guus Schreiber
     VU University Amsterdam
  2. Vocabulary Alignments
     Many museums, libraries and archives capture their knowledge in structured vocabularies covering similar areas (materials, subject matter)
     One goal in the cultural-heritage (CH) field is data integration by making museum collections and their vocabularies available through common portals
     Our solution is aligning vocabularies to each other and/or aligning to large commonly used resources
  3. Aligning is difficult
     Differences in:
       Lexical conventions
       Structure/metamodel
       Ontological commitments
     Use of:
       Jargon
       Homonyms/polysemes
       Background knowledge/implicit context
  4. What can we achieve with current alignment tools?
     A wide selection of alignment tools exists
     OAEI workshop: alignment tools are tested on benchmark sets and on real-world applications
     A practical methodology that tells us which tool to use in which situation is still lacking
     How can I get better results on MY data?
  5. My Data: E-Culture Cloud
     [Diagram labels: Concepts 11,995; WordNet NL (Cornetto) 70,434]
  6. Research Question
     Does combining alignment techniques have added value?
     If yes, we need a methodology that tells us how to combine alignment techniques.
  7. Case Study Setup
     2 data sets in Dutch:
       RKD subject thesaurus
       Cornetto, a lexical thesaurus linked to WordNet
     Alignment techniques for generating exact-matches
       Baseline technique
       Lexical technique
       Structural technique
     Manual Evaluation
     Techniques for improving precision/recall
       Combining alignment techniques to improve recall and precision
       Disambiguation techniques for improving precision
  8. Data Sets
     RKD subject thesaurus
       3,342 concepts
       3,342 preferred labels
       242 alternative labels
       Broader, narrower and related relations between concepts
     Cornetto
       70,434 synsets
       102,572 sense-labels
       16 relation types including hypernym relation
       One word can be part of multiple synsets
     Rationale: link small to large hub vocabularies
       Small specialized vocabularies are frequent (in the CH field)
       Linking to large vocabularies adds synonyms and relations
  9. Alignment techniques
     Baseline technique: optimizes precision
       Plain string matching
       Ignores ambiguous matches
     Lexical technique (STITCH tool): increases recall
       Matches terms and uses lemmatization and compound splitting
       Returns all (possibly ambiguous) matches
     Structural technique (Falcon-AO): best tool in town (OAEI 2007)
       Uses the structure of vocabularies
       Uses lexical measures, lemmatization and distance metrics
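The difference between the baseline and a recall-oriented lexical matcher can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the case study: the function names, the concept identifiers and the lowercase normalization are assumptions, and the `keep_ambiguous` mode only mimics "returns all matches" without the lemmatization and compound splitting that STITCH applies.

```python
from collections import defaultdict


def build_label_index(target_labels):
    """Map each normalized label to the set of target concept ids carrying it."""
    index = defaultdict(set)
    for concept_id, labels in target_labels.items():
        for label in labels:
            # Assumption: matching is case-insensitive and ignores outer whitespace.
            index[label.strip().lower()].add(concept_id)
    return index


def match_labels(source_labels, target_labels, keep_ambiguous):
    """Return (source_id, target_id) pairs whose labels are string-identical.

    keep_ambiguous=False behaves like a precision-oriented baseline: a source
    concept is aligned only if exactly one target concept matches.
    keep_ambiguous=True returns every match, ambiguous or not.
    """
    index = build_label_index(target_labels)
    alignments = set()
    for source_id, labels in source_labels.items():
        candidates = set()
        for label in labels:
            candidates |= index.get(label.strip().lower(), set())
        if len(candidates) == 1 or (keep_ambiguous and candidates):
            alignments |= {(source_id, target_id) for target_id in candidates}
    return alignments


# Hypothetical toy data: "bank" is ambiguous in the target vocabulary.
source = {"rkd:1": ["Bank"], "rkd:2": ["kerk"]}
target = {"c:1": ["bank"], "c:2": ["bank"], "c:3": ["kerk"]}

baseline = match_labels(source, target, keep_ambiguous=False)
all_matches = match_labels(source, target, keep_ambiguous=True)
print(sorted(baseline))
print(sorted(all_matches))
```

On the toy data the baseline keeps only the unambiguous "kerk" pair, while the second call also returns both candidates for "bank", which is where disambiguation becomes necessary.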
  10. Quantitative Results: 4375 Candidate Alignments
      [Venn diagram: Baseline (30%), STITCH (86%), Falcon (59%); candidate alignments per region: 59, 10, 1726, 1145, 92, 836, 507]
  11. Evaluation
      1 person (me) evaluated the entire set
        2493 concepts with 4375 alignments
        Taking approximately 26 person-hours
      5 (external) people evaluated small samples of alignments to validate the manual evaluation
        50 concepts with around 80 alignments
        Taking 17 minutes on average
  12. Validation of Manual Evaluation
      We measured inter-observer agreement for exact matches between me and the 5 raters using Cohen’s Kappa: κ = 0.70
      Reasons for disagreement:
        Disagreement in the vocabulary interpretation
        Vocabulary error
        Human error
      We will use the list of correct exact-matches as a “Gold Standard” to compare the performance of the tools
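For readers unfamiliar with Cohen's Kappa, the sketch below computes it for two raters who judged the same alignments. It is an illustration only: the slides do not say how the five external raters were combined with the main evaluator, so the pairwise setup, the function name and the toy judgements are assumptions.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        # Both raters used a single identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on four alignments: is it an exact match or not?
evaluator = ["exact", "exact", "other", "other"]
rater = ["exact", "other", "other", "other"]
print(cohens_kappa(evaluator, rater))
```

In the toy example the raters agree on three of four items (p_o = 0.75) while chance agreement is 0.5, giving κ = 0.5.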
  13. Qualitative Results
      The tools found no alignments for 849 concepts
      Recall is based on the correct exact-matches that were found
  14. Overlap in correct exact-match alignments (precision)
      [Venn diagram: Baseline, STITCH, Falcon; correct alignments per region with precision: 53 (90%), 9 (90%), 429 (25%), 1073 (94%), 434 (52%), 87 (95%), 147 (29%)]
      Distinct total: 2232
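The precision and recall figures on these slides can be read as simple set operations against the gold standard. The sketch below assumes that alignments are (source, target) pairs and that recall is relative, i.e. measured against the correct matches found by any tool rather than against all matches that exist. The function names and toy pairs are hypothetical; only the totals 2232 and 4375 come from the slides.

```python
def precision(found, correct):
    """Fraction of the found alignments that are in the gold standard."""
    return len(found & correct) / len(found) if found else 0.0


def relative_recall(found, correct):
    """Fraction of the gold standard that this set of alignments recovers."""
    return len(found & correct) / len(correct) if correct else 0.0


# Hypothetical toy alignments as (source, target) pairs.
gold = {("a", 1), ("b", 2), ("c", 3)}
baseline = {("a", 1)}
lexical = {("a", 1), ("b", 2), ("b", 9), ("c", 3)}

for name, found in [("baseline", baseline), ("lexical", lexical)]:
    print(name, precision(found, gold), relative_recall(found, gold))

# Overall precision of the pooled candidates, using the totals from the slides:
# 2232 distinct correct alignments out of 4375 candidates.
print(round(2232 / 4375, 2))
```

The toy data reproduces the trade-off described in the slides: the baseline is precise but finds little, while the lexical matcher recovers more of the gold standard at the cost of false positives.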
  15. Disambiguation
      Total aligned concepts 2,493 with 4,375 alignments
      860 concepts have more than one alignment with a total of 2712 alignments
      From the manual evaluation we know that many of these alignments are wrong
      We will disambiguate alignments using the structure of the vocabularies (broader/hyponym relations)
        Child match
        Parent match
  16. [Diagram: child match and parent match between the source thesaurus and the target thesaurus]
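One possible reading of the child match is sketched below: for an ambiguous source concept, each candidate target is scored by how many of the source concept's children are themselves aligned to children of that candidate, and only the highest-scoring candidates are kept. The function name, the data layout, the tie handling and the rule of keeping nothing when no candidate has structural support are assumptions, not details taken from the slides. A parent match would use the same function with parent (broader/hypernym) maps instead of child maps.

```python
def structural_match(source_id, candidates, source_neighbours, target_neighbours, alignments):
    """Keep the candidate targets with the most aligned neighbour pairs.

    source_neighbours / target_neighbours map a concept id to the set of its
    children (for a child match) or parents (for a parent match).
    alignments is the full set of candidate (source_id, target_id) pairs.
    """
    scores = {}
    for target_id in candidates:
        scores[target_id] = sum(
            1
            for s_neighbour in source_neighbours.get(source_id, set())
            for t_neighbour in target_neighbours.get(target_id, set())
            if (s_neighbour, t_neighbour) in alignments
        )
    best = max(scores.values(), default=0)
    if best == 0:
        # Assumption: without structural evidence the ambiguity stays unresolved.
        return set()
    return {target_id for target_id, score in scores.items() if score == best}


# Hypothetical example: "tree" is ambiguous between a plant and a diagram sense.
source_children = {"s:tree": {"s:oak"}}
target_children = {"t:tree_plant": {"t:oak"}, "t:tree_diagram": set()}
alignments = {
    ("s:tree", "t:tree_plant"),
    ("s:tree", "t:tree_diagram"),
    ("s:oak", "t:oak"),
}

kept = structural_match(
    "s:tree", {"t:tree_plant", "t:tree_diagram"},
    source_children, target_children, alignments,
)
print(kept)
```

Here the plant sense is kept because the aligned child pair ("s:oak", "t:oak") supports it, while the diagram sense has no aligned children.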
  17. Disambiguation Results
      Child match: 120 out of 449 alignments for 112 concepts have the highest number of child alignments
        with 24% false positives
        and 10% false negatives
      Parent match: 234 out of 561 alignments for 185 concepts had the highest number of parent alignments
        with 22% false positives
        and 12% false negatives
      Small overlap of 59 alignments for 18 concepts
      A third of ambiguous alignments is resolved using the two disambiguation methods: for 279 out of 860 concepts we keep 336 alignments and throw away 615 alignments
  18. Final Results
  19. Conclusion and Future Work
      A methodology is much needed in this area
      Our next step is to see how alignment techniques can be combined with regard to larger vocabularies:
        We are currently working on experiments with Getty’s AAT and Princeton WordNet
  20. Thanks and Acknowledgements
      Cornetto project team
      The Netherlands Institute for Art History (RKD)
      Antoine Isaac and the STITCH team
      Wei Hu (Falcon)
      Mark van Assem, Willem van Hage, Laura Hollink and Jan Wielemaker for their contribution to the alignment evaluation
      Bob Wielinga for comments on earlier versions of the paper
