Combining Vocabulary Alignment Techniques
Anna Tordai, Jacco van Ossenbruggen, Guus Schreiber
VU University Amsterdam
Vocabulary Alignments
• Many museums, libraries and archives capture their knowledge in structured vocabularies covering similar areas (materials, subject matter)
• One goal in the CH field is data integration: making museum collections and their vocabularies available through common portals
• Our solution is aligning vocabularies to each other and/or aligning to large, commonly used resources
Aligning is difficult
• Differences in:
  • Lexical conventions
  • Structure/metamodel
  • Ontological commitments
• Use of:
  • Jargon
  • Homonyms/polysemes
  • Background knowledge/implicit context
What can we achieve with current alignment tools?
• A wide selection of alignment tools exists
• OAEI workshop: alignment tools are tested on benchmark sets and on real-world applications
• A practical methodology that tells us which tool to use in which situation is still lacking
• How can I get better results on MY data?
[Diagram: My Data, the E-Culture Cloud (11,995 concepts), to be aligned with WordNet NL / Cornetto (70,434 synsets)]
Research Question
• Does combining alignment techniques have added value?
• If yes, we need a methodology that tells us how to combine alignment techniques.
Case Study Setup
• 2 data sets in Dutch:
  • RKD subject thesaurus
  • Cornetto, a lexical thesaurus linked to WordNet
• Alignment techniques for generating exact matches:
  • Baseline technique
  • Lexical technique
  • Structural technique
• Manual evaluation
• Techniques for improving precision/recall:
  • Combining alignment techniques to improve recall and precision
  • Disambiguation techniques to improve precision
Data Sets
• RKD subject thesaurus:
  • 3,342 concepts
  • 3,342 preferred labels
  • 242 alternative labels
  • Broader, narrower and related relations between concepts
• Cornetto:
  • 70,434 synsets
  • 102,572 sense labels
  • 16 relation types, including the hypernym relation
  • One word can be part of multiple synsets
• Rationale: link small vocabularies to large hub vocabularies
  • Small specialized vocabularies are frequent in the CH field
  • Linking to large vocabularies adds synonyms and relations
Alignment techniques
• Baseline technique: optimizes precision
  • Plain string matching
  • Ignores ambiguous matches
• Lexical technique (STITCH tool): increases recall
  • Matches terms using lemmatization and compound splitting
  • Returns all (possibly ambiguous) matches
• Structural technique (Falcon-AO): best tool in town (OAEI 2007)
  • Uses the structure of the vocabularies
  • Uses lexical measures, lemmatization and distance metrics
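As an illustration of the baseline technique only (not the actual STITCH or Falcon code), the sketch below performs plain string matching on normalized labels and discards ambiguous matches, as described above. All function and variable names are hypothetical.

```python
# Hypothetical sketch of the baseline technique: plain string matching
# on normalized labels, ignoring ambiguous (and empty) matches.
def baseline_align(source_labels, target_labels):
    """source_labels: {concept_id: label}; target_labels: {synset_id: [labels]}."""
    # Index every normalized target label to the synsets that carry it.
    index = {}
    for synset_id, labels in target_labels.items():
        for label in labels:
            index.setdefault(label.strip().lower(), set()).add(synset_id)
    alignments = {}
    for concept_id, label in source_labels.items():
        candidates = index.get(label.strip().lower(), set())
        if len(candidates) == 1:  # a unique match; ambiguous ones are dropped
            alignments[concept_id] = next(iter(candidates))
    return alignments
```

A homonym such as "bank", appearing in two synsets, would yield two candidates and therefore no alignment, which is exactly why this baseline trades recall for precision.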
Quantitative Results: 4,375 Candidate Alignments
[Venn diagram of candidate alignments per tool: Baseline 30%, STITCH 86%, Falcon 59%]
• Baseline only: 59
• Baseline ∩ STITCH: 10
• STITCH only: 1,726
• Baseline ∩ STITCH ∩ Falcon: 1,145
• Baseline ∩ Falcon: 92
• STITCH ∩ Falcon: 836
• Falcon only: 507
Evaluation
• 1 person (me) evaluated the entire set:
  • 2,493 concepts with 4,375 alignments
  • Taking approximately 26 person-hours
• 5 (external) people evaluated small samples of alignments to validate the manual evaluation:
  • 50 concepts with around 80 alignments
  • Taking 17 minutes on average
Validation of Manual Evaluation
• We measured inter-observer agreement on exact matches between me and the 5 raters using Cohen's kappa: κ = 0.70
• Reasons for disagreement:
  • Disagreement in the interpretation of the vocabulary
  • Vocabulary errors
  • Human error
• We will use the list of correct exact matches as a "Gold Standard" to compare the performance of the tools
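Cohen's kappa corrects raw agreement for the agreement two raters would reach by chance. A small self-contained sketch of the computation, with hypothetical rater labels (1 = correct exact match, 0 = not):

```python
# Cohen's kappa for two raters judging the same items.
def cohens_kappa(rater_a, rater_b):
    """rater_a, rater_b: equal-length lists of categorical labels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    return (observed - expected) / (1 - expected)
```

A κ of 0.70 on this scale is commonly read as substantial but not perfect agreement, consistent with the disagreement sources listed above.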
Qualitative Results
• The tools found no alignments for 849 concepts
• Recall is based on the correct exact matches that were found
Overlap in correct exact-match alignments (precision)
[Venn diagram of correct alignments per tool, with per-region precision]
• Baseline only: 53 (90%)
• Baseline ∩ STITCH: 9 (90%)
• STITCH only: 429 (25%)
• Baseline ∩ STITCH ∩ Falcon: 1,073 (94%)
• Baseline ∩ Falcon: 87 (95%)
• STITCH ∩ Falcon: 434 (52%)
• Falcon only: 147 (29%)
• Distinct total: 2,232
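The per-tool precision and recall figures come from comparing each tool's output against the manually evaluated gold standard of correct exact matches. A minimal sketch of that scoring, with hypothetical names:

```python
# Score one tool's alignments against a gold standard of correct pairs.
def precision_recall(found, gold):
    """found, gold: sets of (source_id, target_id) alignment pairs."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Combining tools simply unions their `found` sets before scoring, which is why overlap regions with low precision (e.g. STITCH-only at 25%) drag the combined precision down while raising recall.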
Disambiguation
• Total aligned concepts: 2,493 with 4,375 alignments
• 860 concepts have more than one alignment, with a total of 2,712 alignments
• From the manual evaluation we know that many of these alignments are wrong
• We will disambiguate alignments using the structure of the vocabularies (broader/hyponym relations):
  • Child match
  • Parent match
[Diagram: child match and parent match between the source thesaurus and the target thesaurus]
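A rough sketch of the parent-match heuristic (child match is symmetric, counting aligned children instead of aligned parents). This is not the code used in the study; all names and data structures are hypothetical.

```python
# Parent-match disambiguation: among the candidate target synsets for one
# ambiguous source concept, keep those with the most aligned parents.
def parent_match(candidates, source_parents, target_parents, alignments):
    """
    candidates:     non-empty list of candidate target ids for one source concept
    source_parents: set of broader concepts of the source concept
    target_parents: {target_id: set of hypernyms of that target}
    alignments:     set of (source_id, target_id) pairs found so far
    """
    scores = {}
    for target_id in candidates:
        # Count parent pairs (source parent, target parent) that are aligned.
        scores[target_id] = sum(
            (sp, tp) in alignments
            for sp in source_parents
            for tp in target_parents.get(target_id, set())
        )
    best = max(scores.values())
    if best == 0:
        return set(candidates)  # no structural evidence: leave unresolved
    return {t for t, s in scores.items() if s == best}
```

The heuristic is computationally cheap: it only looks one level up (or, for child match, one level down) in each hierarchy, which matches the "computationally cheap" characterization in the editor's notes.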
Disambiguation Results
• Child match: 120 out of 449 alignments for 112 concepts have the highest number of child alignments, with 24% false positives and 10% false negatives
• Parent match: 234 out of 561 alignments for 185 concepts have the highest number of parent alignments, with 22% false positives and 12% false negatives
• Small overlap of 59 alignments for 18 concepts
• A third of the ambiguous alignments is resolved using the two disambiguation methods: for 279 out of 860 concepts we keep 336 alignments and throw away 615 alignments
Final Results
Conclusion and Future Work
• A methodology is much needed in this area
• Our next step is to see how alignment techniques can be combined on larger vocabularies:
  • We are currently working on experiments with Getty's AAT and Princeton WordNet
Thanks and Acknowledgements
• Cornetto project team
• The Netherlands Institute for Art History (RKD)
• Antoine Isaac and the STITCH team
• Wei Hu (Falcon)
• Mark van Assem, Willem van Hage, Laura Hollink and Jan Wielemaker for their contribution to the alignment evaluation
• Bob Wielinga for comments on earlier versions of the paper


Editor's Notes

  • #4 Lexical convention example: plural vs singular. Structure example: hypernymy vs broader-than. Ontological commitments: OWL vs SKOS. Jargon: expert terms vs layman terms. Homonyms: bank (financial institution) vs bank (river). Polysemes: to milk (act of) and milk (product). Background knowledge: the application domain can define the meaning of concepts
  • #6 This is what my, well, our data looks like. We have a number of museum datasets, indicated by the blobs with the same color. These datasets contain artwork metadata and vocabularies describing people, locations, concepts and events. We have some domain-specific vocabularies, such as Getty's Art and Architecture Thesaurus, and lexical resources in various languages: English, Dutch and French WordNets. In this talk I will focus on the alignment between these two blobs. Our problem, essentially, is that none of the tools we tried works well enough on our data, which brings us to our main research question…
  • #7 Does combining alignment techniques or tools have added value? If yes, then we need a methodology that tells us how to combine alignment techniques given certain goals. As a first step we performed a case study.
  • #8 RKD: The Netherlands Institute for Art History. We performed a manual evaluation to determine the correctness of the alignments. Because the lexical thesaurus contains multiple homonyms, we also applied disambiguation techniques to improve precision.
  • #9 RKD: The Netherlands Institute for Art History. The RKD subject thesaurus contains fewer than three and a half thousand concepts, each with a preferred label. There are few alternative labels. There are also broader, narrower and related relations between concepts. Cornetto contains over 70 thousand synsets with over 100 thousand labels. A large portion of the synsets has a single label, but there are synsets with over a dozen labels. There are 16 relation types, including the hypernym relation. One word can be part of multiple synsets, which creates a disambiguation problem for automatic alignment techniques. Our rationale is to link small vocabularies to large hub vocabularies, as small vocabularies are frequent in the Cultural Heritage field. Also, linking to large vocabularies adds new synonyms and relations, which makes the data more searchable.
  • #10 We used the following alignment techniques. The baseline technique optimizes precision: it performs plain string matching and simply ignores ambiguous matches. The lexical technique, for which we used the tool from the STITCH project, is geared towards increasing recall. It matches terms and uses lemmatization and compound splitting. It also returns all matches found, even ambiguous ones. Finally we have the structural technique; here we used Falcon-AO, which was the best-performing tool in the 2007 alignment workshop. It uses the structure of the vocabularies as well as lexical measures, lemmatization and distance metrics for finding the best possible alignments. The three tools found overlapping sets of candidate alignments. Note that at this point we do not say anything about the quality of the alignments.
  • #11 So what were the quantitative results? The three tools together returned over 4 thousand candidate alignments. The baseline tool found 30% of all candidate alignments, followed by Falcon with 59%, and the STITCH tool found 86%. There is a large overlap between the three tools, as well as between STITCH and Falcon. Almost half of the alignments found by STITCH were not found by the other two techniques. But how good are these alignments?
  • #12 In order to find out about the quality of the alignments, we performed a manual evaluation. One person (you may guess who) evaluated the entire set, taking approximately 26 hours. We also had 5 external people evaluate separate sets of sample alignments to validate the manual evaluation, each taking 17 minutes on average.
  • #13 We then measured inter-observer agreement on exact matches between me and the 5 raters using Cohen's kappa, and found a kappa of 0.70, which is relatively low. This just goes to show how difficult it is to evaluate alignments, even for humans. The reasons for disagreement were differences in the interpretation of the terms of the vocabularies, differences in dealing with errors in the vocabularies, and plain human error, such as accidentally pushing the wrong button without noticing. We will use the list of correct exact matches as a Gold Standard to compare the performance of the tools.
  • #14 For the baseline tool we have a high number of correct exact matches, as expected. About half of the non-exact-match alignments are some semantic relation like broader, narrower or related. The incorrect alignments were entirely due to homonyms, where one meaning appears in one vocabulary and the other meaning in the other. However, the baseline tool returned approximately half of the total number of correct alignments found. For the STITCH tool the picture is quite different, with only about half of the alignments being correct and most of the non-exact-match alignments entirely incorrect, although it found 750 more correctly aligned concepts than the baseline. The performance of the Falcon tool is somewhere in the middle: a higher percentage of correct exact matches than the STITCH tool but slightly lower coverage; still, Falcon also returned significantly more correct alignments than the baseline tool. Looking at the distinct total, we see that for almost every aligned concept there is at least one correct alignment. In addition, the tools found no alignments for around 850 concepts of the subject thesaurus, so for around 1,000 concepts we have no correct alignment. Our recall is based on the correct exact matches found and could have been called coverage.
  • #15 The most interesting conclusions can be drawn where the STITCH tool and Falcon do not overlap with the baseline. In the overlap between STITCH and Falcon, the precision drops to around 50%, while for alignments found only by Falcon it drops further to 29%, and for those found only by STITCH to 25%. One key observation is that most of the alignments found only by STITCH, and to some extent by Falcon, are ambiguous: for a single source concept we have multiple alignments. To tackle this problem we also perform automatic disambiguation.
  • #17 So here is a small sample of the hierarchies of the two thesauri. The source thesaurus is the RKD subject thesaurus and the target thesaurus is Cornetto. For the child-match technique, if we have two alignments for a single source concept, we look at the bottom of the hierarchy to see whether there are any alignments between the children of these concepts. In this case we have two child alignments for the topmost alignment, so we assume that this is the correct alignment and discard the other one. The parent-match technique works similarly, except that there we look at the top of the hierarchy. Again we have a source concept with multiple alignments, but in this case we look at the parents of the concepts, and if there is at least one parent alignment we consider that alignment to be correct and discard the other one. So what were the results of the disambiguation?
  • #18 The key part of this slide is that there is only a small overlap between the two methods, but with this computationally cheap approach we were able to disambiguate a third of all the ambiguous alignments. About 23% of the alignments we keep are actually false positives, and about 10% of the discarded alignments were correct. For more information I refer you to the paper.
  • #19 So, in general, when it comes to combining the tools and the use of disambiguation, we found the following: by doing an additional manual evaluation of a very selective subset we can boost recall and precision even further. This is described in more detail in the paper.
  • #20 I would like to conclude by saying that a methodology is much needed in this area. With regard to future work, our next step is to see how alignment techniques can be combined on larger vocabularies. We are currently working on experiments with Getty's Art and Architecture Thesaurus and Princeton WordNet.