Intro talk for amalgame test with RCE; Beeld & Geluid

Introduction to alignment for CH vocabulary experts

  1. Interactive vocabulary alignment: a practical session using Amalgame
     Jacco van Ossenbruggen, Michiel Hildebrand
     CWI Amsterdam
     VU University Amsterdam
  2. Schedule
     13:00 Welcome & who is who
     13:10 Introduction to Amalgame (demo)
     13:45 Hands-on session I: practice run & exercises
           GTAA Onderwerpen vs AAT-NED
     14:30 Break & questions
     14:45 Hands-on session II: the real work
           Alignment on self-selected datasets
     16:00 Pitch
     16:30 Wrap up and closing remarks
  3. Introduction: problem context
     – Europeana and many other projects aggregate collections from libraries, archives and museums
     – Collection items come with metadata, often using terms from a pre-defined set of terms
     – These term sets we call vocabularies
       • or authority files, thesauri, ontologies, …
     – How are terms related across vocabularies?
     – A set of such relations is a vocabulary alignment
  4. Alignment issues
     • When is similar sufficiently similar?
       – concepts can be similar along many dimensions, but differ on others
       – answer depends on the data, user, task, …
  5. Data: Two types of concepts
     • Universals (or qualities)
       – kinds or types (painter, president, Cubist)
       – properties (red, high, expensive)
       – relations (teacher of, influenced by)
     • Individuals (or particulars, named entities)
       – Persons (Rembrandt, Obama)
       – Locations (Hilversum, New Amsterdam, Guernica)
       – Works (Mill “De Hoop”, today’s 8 o’clock news, Picasso’s Guernica)
  6. User & task
     • Who is your user?
       – Domain expert versus lay user
       – Familiar with both, one or none of the thesauri?
       – Bilingual problems or challenges?
     • What is their task?
       – few but very good examples (high precision)
       – exhaustive result lists (high recall)
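
The precision versus recall trade-off named on this slide can be made concrete with a small sketch. The mapping sets and concept identifiers below are made up for illustration; only the standard definitions of precision, recall and F-measure are assumed.

```python
# Toy evaluation of a found alignment against a reference alignment.
# All concept identifiers are hypothetical.
found = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:9"), ("a:4", "b:7")}
reference = {("a:1", "b:1"), ("a:2", "b:2"), ("a:5", "b:5")}

correct = found & reference

# Precision: which fraction of what we found is right?
precision = len(correct) / len(found)
# Recall: which fraction of what we should have found did we find?
recall = len(correct) / len(reference)
# F-measure: harmonic mean of the two.
f_measure = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

A task that needs a few very good examples would tune for the first number, a task that needs exhaustive result lists for the second.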
  7. How (1): Too much for humans alone
     • our vocabularies tend to be too large to do alignment without any automation
     • question: how many potential alignments are there between two vocabularies with 100k concepts each?
     • so we need to automate this
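
One way to read the question on this slide is to count every pairwise combination of a source concept with a target concept. The sketch below does that arithmetic; the review speed of one pair per second is an assumption added purely to give a sense of scale.

```python
# Back-of-the-envelope count of candidate concept pairs.
source_size = 100_000  # concepts in the first vocabulary (from the slide)
target_size = 100_000  # concepts in the second vocabulary (from the slide)

candidate_pairs = source_size * target_size
print(f"{candidate_pairs:,} candidate pairs")

# Assumption for illustration only: a human judges one pair per second,
# without breaks.
seconds_per_pair = 1
seconds_per_year = 60 * 60 * 24 * 365
years = candidate_pairs * seconds_per_pair / seconds_per_year
print(f"about {years:.0f} years of non-stop manual review")
```

Ten billion pairs, or roughly three centuries of uninterrupted reviewing under that assumption, is why some automation is unavoidable.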
  8. Example: AAT to WordNet
     • aat:restorer
       altLabels: restaurateur (fr), Restaurator (de), hersteller (nl), ...
       scopeNote: Those engaged in making changes to an object or structure so that it will closely approximate its state at a specific time in its history. (...) When changes made are to prevent further deterioration, see "preservationists." More generally, for those who undertake treatment, preventive care, and research directed toward long-term safekeeping of cultural and natural heritage, see "conservators."
     • wn:restorer
       synonyms: refinisher, renovator, restorer, preserver
       gloss: a skilled worker who is employed to restore or refinish buildings or antique furniture.
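
A minimal sketch of what a purely label-based comparison sees for this pair. The altLabels and synonyms are taken from the slide; the English preferred label "restorer" for the AAT concept is an assumption based on its identifier, and the dictionary layout is hypothetical.

```python
def normalize(label: str) -> str:
    """Compare labels case-insensitively and ignore surrounding whitespace."""
    return label.strip().casefold()

aat_restorer = {
    "id": "aat:restorer",
    # "restorer" as prefLabel is assumed; the other labels are from the slide.
    "labels": ["restorer", "restaurateur", "Restaurator", "hersteller"],
}
wn_restorer = {
    "id": "wn:restorer",
    "labels": ["refinisher", "renovator", "restorer", "preserver"],
}

def shared_labels(a: dict, b: dict) -> list[str]:
    """Return the normalized labels that both concepts have in common."""
    return sorted(
        {normalize(x) for x in a["labels"]} & {normalize(x) for x in b["labels"]}
    )

print(shared_labels(aat_restorer, wn_restorer))
```

The string match finds the pair, but it cannot see what the notes say: the AAT scope note sends preventive work to "preservationists", while WordNet lists "preserver" as a synonym. Whether the two concepts are similar enough still depends on the data, user and task.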
  9. How (2): Too hard for machines alone
     • In our domain, automatic tools tend to perform badly on virtually all datasets that differ from those used for testing
     • The “Anna Karenina problem”:
       – “Happy alignments are all alike; every unhappy alignment is unhappy in its own way”
     • and we need to be able to trust the results
     • or at least know why we cannot trust them
  10. But …
      • Current systems are complex reasoning engines that combine multiple strategies in some “smart” way
      • This “smartness” has major drawbacks:
        – hard to predict if it will work for your data
        – hard to explain results afterwards: what went wrong, why & how to fix it
        – does not scale on large vocabularies
  11. Let’s try something radically different
      • Do not even try to make the tool smart
        – only use simple atomic matching components
        – each has F-measure << state of the art
        – is very fast and scalable
        – with predictable and understandable results
      • but empower domain experts (= you!): results improve by interactive iteration loop:
        – try most promising component(s)
        – analyze the results
        – decide on next step:
          • discard results from last attempt
          • refine current results to improve precision
          • look for more results to improve recall
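
The iteration loop on this slide can be sketched with two simple components. This is an illustration of the idea only, not Amalgame's implementation or API: the function names, the toy vocabularies and the choice of an ambiguity split as the refinement step are all assumptions.

```python
from collections import Counter, defaultdict

def exact_label_match(source: dict, target: dict) -> set:
    """Atomic matcher: pair concepts that share a label, ignoring case."""
    index = defaultdict(set)
    for tid, labels in target.items():
        for label in labels:
            index[label.casefold()].add(tid)
    pairs = set()
    for sid, labels in source.items():
        for label in labels:
            for tid in index.get(label.casefold(), ()):
                pairs.add((sid, tid))
    return pairs

def split_by_ambiguity(pairs: set) -> tuple:
    """Refinement: separate one-to-one mappings from ambiguous ones."""
    per_source = Counter(s for s, _ in pairs)
    per_target = Counter(t for _, t in pairs)
    unambiguous = {
        (s, t) for s, t in pairs if per_source[s] == 1 and per_target[t] == 1
    }
    return unambiguous, pairs - unambiguous

# Hypothetical toy vocabularies: concept id -> labels.
source = {"s:1": ["painter"], "s:2": ["bank"], "s:3": ["mill"]}
target = {"t:1": ["Painter"], "t:2": ["bank"], "t:3": ["bank"], "t:4": ["windmill"]}

# Step 1: try the most promising component.
pairs = exact_label_match(source, target)
# Step 2: analyze and refine to improve precision.
unambiguous, ambiguous = split_by_ambiguity(pairs)
# Step 3: see what is left, to look for more results and improve recall.
unmatched = set(source) - {s for s, _ in pairs}

print(sorted(unambiguous), sorted(ambiguous), sorted(unmatched))
```

Each step is simple enough to predict and to explain afterwards. The expert decides what happens next: accept the one-to-one mappings, inspect the ambiguous ones by hand, or feed the unmatched concepts to another component.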
  12. Demo
  13. Schedule
      13:00 Welcome & who is who
      13:10 Introduction to Amalgame (demo)
      13:45 Hands-on session I: practice run & exercises
            GTAA Onderwerpen vs AAT-NED
            http://eculture.cs.vu.nl/hilversum2012/
      14:30 Break & questions
      14:45 Hands-on session II: the real work
            Alignment on self-selected datasets
      16:00 Pitch
      16:30 Wrap up and closing remarks
  14. Acknowledgements
      We thank A. Isaac, W. van Hage, C. Reverté, A. Tordai and J. Wielemaker for their feedback and help in the development of Amalgame.
      M. van Assem produced the RDF conversions for WordNet 2.0 and 3.0.
      This work was partially supported by the PrestoPRIME and EuropeanaConnect projects.
