Word Sense Disambiguation and Induction

An introduction to WSD and WSI, based on the talks given at ESSLLI 2010.



  1. Word Sense Disambiguation and Induction
     Leon Derczynski, University of Sheffield
     27 January 2011
  2. Origin
     Originally a course at ESSLLI 2010, Copenhagen, by Roberto Navigli and Simone Ponzetto.
  3. Outline
     1 Introduction
     2 WSD
     3 WSI
     4 Evaluation and Issues
     5 Wikipedia
     6 Summary
  4. General Problem
     Being able to disambiguate words in context is a crucial problem, and can potentially help improve many other NLP applications.
     Polysemy is everywhere; our job is to model it. Ambiguity is rampant:
     "I saw a man who is 98 years old and can still walk and tell jokes."
     saw: 26, man: 11, years: 4, old: 8, can: 5, still: 4, walk: 10, tell: 8, jokes: 3
     43,929,600 possible sense assignments for this simple sentence.
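The sense count above is just the product of the per-word sense counts listed on the slide, which a few lines of Python can verify:

```python
# Per-word sense counts as given on the slide; the number of possible
# sense assignments for the sentence is their product.
from math import prod

sense_counts = {"saw": 26, "man": 11, "years": 4, "old": 8,
                "can": 5, "still": 4, "walk": 10, "tell": 8, "jokes": 3}

combinations = prod(sense_counts.values())
print(combinations)  # 43929600
```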
  5. Word Senses
     Monosemous words have only one meaning: plant life, internet.
     Polysemous words have more than one meaning: bar, bass.
     A word sense is a commonly accepted meaning of a word.
     "We are fond of fruit such as the kiwifruit and banana."
  6. Enumerative Approach
     A fixed sense inventory enumerates the range of possible meanings of a word.
     Context is used to select a particular sense: "chop vegetables with a knife" vs. "was stabbed with a knife".
     However, we may want to add senses.
  7. WSD Tasks
     Different representations of senses change the way we think about WSD.
     Lexical sample: disambiguate a restricted set of words.
     All words: disambiguate all content words.
     Cross-lingual WSD: disambiguate a target word by labelling it with the appropriate translation in other languages, e.g. English coach → German Bus/Linienbus/Omnibus/Reisebus.
  8. Representing the Context
     Text is unstructured, and needs to be made machine-readable: flat representations (surface features) vs. structured representations (graphs, trees).
     Local features: the local context of a word usage, e.g. PoS tags and surrounding word forms.
     Topical features: the general topic of a sentence or discourse, represented as a bag of words.
     Syntactic features: argument-head relations between the target and the rest of the sentence.
     Semantic features: previously established word senses.
  9. Knowledge Resources
     Structured: thesauri, machine-readable dictionaries, semantic networks (WordNet); BabelNet, with Babel synsets and semantic relations (is-a, part-of).
     Unstructured: raw corpora; collocation resources (Web1T).
  10. Applications
     Information extraction: acronym expansion, disambiguating people's names, domain-specific IE.
     Information retrieval.
     Machine translation.
     Semantic web.
     Question answering.
  11. Approaches
     Supervised WSD: a classification task using hand-labelled data.
     Knowledge-based WSD: uses knowledge resources; no training.
     Unsupervised WSD: performs WSI.
     Word sense dominance: find the predominant sense of a word.
     Domain-driven WSD: use domain information as vectors to compare with the senses of a word w.
  12. Outline (as slide 3)
  13. Supervised WSD
     Given a set of manually sense-annotated examples (a training set), learn a classifier.
     Features for WSD: bag of words, bigrams, collocations, VP and NP heads, PoS.
     Using WordNet as a sense inventory, SemCor is a readily available source of sense-labelled data.
     Current state-of-the-art performance comes from SVMs.
  14. Knowledge-based WSD
     Exploits knowledge resources (dictionaries, thesauri, collocations) to assign senses.
     Lower performance than supervised methods, but wider coverage.
     No need to train or to be tuned to a task or domain.
  15. Gloss Overlap
     A knowledge-based method proposed by Lesk (1986):
     1. Retrieve all sense definitions of the target word.
     2. Compare each sense definition with the definitions of other words in context.
     3. Choose the sense with the most overlap.
     To disambiguate "pine cone":
     pine: 1. a kind of evergreen tree; 2. to waste away through sorrow.
     cone: 1. a solid body which narrows to a point; 2. something of this shape; 3. fruit of certain evergreen trees.
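A minimal simplified-Lesk sketch, using the toy dictionary from the slide. Overlap here is just the count of shared word types between glosses, which is a simplification of Lesk's original scoring:

```python
# Toy glosses from the slide, indexed by sense number.
GLOSSES = {
    "pine": ["a kind of evergreen tree",
             "to waste away through sorrow"],
    "cone": ["a solid body which narrows to a point",
             "something of this shape",
             "fruit of certain evergreen trees"],
}

def overlap(gloss_a, gloss_b):
    # Number of word types shared between two glosses.
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def lesk(target, context_word):
    """Pick the sense of `target` whose gloss best overlaps any gloss
    of `context_word` (simplified Lesk)."""
    best_sense, best_score = 0, -1
    for i, gloss in enumerate(GLOSSES[target]):
        score = max(overlap(gloss, g) for g in GLOSSES[context_word])
        if score > best_score:
            best_sense, best_score = i, score
    return best_sense

print(lesk("pine", "cone"))  # 0 -> "a kind of evergreen tree"
```

"evergreen" and "of" are shared between pine sense 1 and cone sense 3, so the tree sense wins.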
  16. Lexical Chains
     A knowledge-based method proposed by Hirst and St-Onge (1998).
     A lexical chain is a sequence of semantically related words in a text.
     Assign a score to each sense based on the chain of related words it is in.
  17. PageRank
     A knowledge-based method proposed by Agirre and Soroa (2009):
     1. Build a graph including all synsets of the words in the input text.
     2. Assign an initial low value to each node in the graph.
     3. Apply PageRank (Brin and Page, 1998) to the graph, and select the synsets with the highest PageRank.
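The idea can be sketched with plain PageRank over a toy synset graph. Note this is a simplification: Agirre and Soroa's method uses a personalised PageRank variant over the full WordNet graph, and the graph below (two hypothetical senses of "bank" with made-up neighbours) is purely illustrative:

```python
# Plain PageRank by power iteration over an adjacency-list graph.
def pagerank(graph, damping=0.85, iterations=50):
    """graph: node -> list of neighbours (undirected edges listed both
    ways). Returns node -> PageRank score."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(rank[m] / len(graph[m])
                           for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

# Hypothetical graph: bank#1 (finance) is linked to more context synsets
# than bank#2 (river), so it should receive more rank.
graph = {
    "bank#1": ["money#1", "loan#1", "deposit#1"],
    "money#1": ["bank#1"],
    "loan#1": ["bank#1"],
    "deposit#1": ["bank#1"],
    "bank#2": ["river#1"],
    "river#1": ["bank#2"],
}
ranks = pagerank(graph)
best = max(["bank#1", "bank#2"], key=ranks.get)
print(best)  # bank#1 -- the better-connected sense wins
```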
  18. Knowledge Acquisition Bottleneck
     WSD needs knowledge: corpora, dictionaries, semantic networks.
     More knowledge is required to improve the performance of both:
     supervised systems (more training data) and knowledge-based systems (richer networks).
  19. Minimally Supervised WSD
     Human supervision is expensive, but required for training examples or a knowledge base.
     Minimally supervised approaches aim to learn classifiers from annotated data with minimal human supervision.
  20. Bootstrapping
     Given a set of labelled examples L, a set of unlabelled examples U, and a classifier c:
     1. Choose N examples from U and add them to U′.
     2. Train c on L and use it to label U′.
     3. Select the K most confidently labelled instances from U′ and move them to L.
     Repeat until U is empty.
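The loop above can be sketched generically. The `train`/`classify` pair below is a deliberately crude keyword-overlap "classifier" invented for the example; any confidence-scoring classifier could be plugged in:

```python
# Generic bootstrapping loop: label a batch of n unlabelled examples,
# keep the k most confident, return the rest to the pool.
def bootstrap(labelled, unlabelled, train, classify, n=2, k=1):
    """labelled: list of (example, label); unlabelled: list of examples."""
    while unlabelled:
        model = train(labelled)
        batch, unlabelled = unlabelled[:n], unlabelled[n:]
        scored = []
        for ex in batch:
            conf, lab = classify(model, ex)
            scored.append((conf, ex, lab))
        scored.sort(reverse=True)
        for conf, ex, lab in scored[:k]:
            labelled.append((ex, lab))
        # Less confident examples go back into the pool.
        unlabelled += [ex for _, ex, _ in scored[k:]]
    return labelled

def train(labelled):
    # "Model" = bag of context words seen with each label.
    model = {}
    for ex, lab in labelled:
        model.setdefault(lab, set()).update(ex.split())
    return model

def classify(model, ex):
    # Confidence = word overlap with the best-matching label's bag.
    words = set(ex.split())
    best = max(model, key=lambda lab: len(words & model[lab]))
    return len(words & model[best]), best

seeds = [("play bass guitar", "music"), ("caught a bass fish", "fish")]
pool = ["bass guitar amp", "fresh bass fish",
        "bass line groove", "bass fish lake"]
result = bootstrap(seeds, pool, train, classify)
print(sorted(lab for _, lab in result))
```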
  21. Outline (as slide 3)
  22. Word Sense Induction
     Based on the idea that one sense of a word will have similar neighbouring words.
     Follows the idea that the meaning of a word is given by its usage.
     We induce word senses from input text by clustering word occurrences.
  23. Clustering
     Unsupervised machine learning for grouping similar objects together, with no a priori input (sense labels).
     Context clustering: each occurrence of a word is represented as a context vector; the vectors are clustered into groups.
     Word clustering: cluster words which are semantically similar and thus share a specific meaning.
  24. Word Clustering
     Aims to cluster words which are semantically similar. Lin (1998) proposes this method:
     1. Extract dependency triples from a text corpus, e.g. "John eats a yummy kiwi" → (eat subj John), (kiwi obj-of eat), (kiwi det a), ...
     2. Define a measure of similarity between two words.
     3. Use the similarity scores to create a similarity tree: start with a root node, and recursively add children in descending order of similarity.
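Step 2 can be sketched roughly as below. Lin's actual similarity measure weights dependency features by mutual information; plain feature overlap (a Dice-style score) is used here to keep the sketch short, and the feature sets are hypothetical:

```python
# Word similarity from shared dependency features (a simplification of
# Lin 1998, which weights features by mutual information).
def similarity(features_a, features_b):
    shared = features_a & features_b
    return 2 * len(shared) / (len(features_a) + len(features_b))

# Hypothetical (relation, head/dependent) features per word.
features = {
    "kiwi":   {("obj-of", "eat"), ("det", "a"), ("mod", "yummy")},
    "banana": {("obj-of", "eat"), ("det", "a"), ("mod", "ripe")},
    "stone":  {("obj-of", "throw"), ("det", "a")},
}
print(similarity(features["kiwi"], features["banana"]))  # ~0.67
```

Words appearing in similar dependency contexts (kiwi, banana) score higher than unrelated pairs (kiwi, stone).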
  25. Lin's approach: example (figure)
  26. WSI: pros and cons
     + Actually performs word sense disambiguation.
     + Aims to divide the occurrences of a word into a number of classes.
     - Makes objective evaluation more difficult if not domain-specific.
  27. Outline (as slide 3)
  28. Disambiguation Evaluation
     Disambiguation is easy to evaluate, since we have discrete sense inventories.
     Evaluate with coverage (the proportion of instances answered), precision and recall, and then F1.
     Accuracy = correct answers / total answers.
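These measures can be computed in a few lines. The sketch below uses the standard WSD definitions: precision over attempted instances, recall over all gold instances, so a system may decline to answer:

```python
# Coverage, precision, recall and F1 for a (possibly partial) set of
# system answers against a gold standard.
def score(gold, answers):
    """gold: instance -> sense; answers: instance -> sense (may omit
    instances the system declined to label)."""
    attempted = len(answers)
    correct = sum(1 for i, s in answers.items() if gold.get(i) == s)
    coverage = attempted / len(gold)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return coverage, precision, recall, f1

gold = {1: "a", 2: "b", 3: "a", 4: "c"}
answers = {1: "a", 2: "a", 3: "a"}   # instance 4 left unanswered
cov, p, r, f1 = score(gold, answers)
print(cov, p, r)  # 0.75 0.666... 0.5
```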
  29. Disambiguation Baselines
     MFS: Most Frequent Sense.
     A strong baseline: 50-60% accuracy on the lexical sample task.
     Doesn't take genre into account (e.g. "star" in astrophysics vs. newswire), and is subject to the idiosyncrasies of the corpus.
  30. Evaluation with Gold-standard Clustering
     Given a gold-standard clustering, compare it with the output clustering.
     Can evaluate with set entropy and purity, and also the Rand index (similar to Jaccard) and F-score.
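Purity and the Rand index can be sketched as follows, on a made-up four-item example. Purity is the fraction of items falling in their cluster's majority class; the Rand index is the fraction of item pairs on which the clustering and the gold classes agree:

```python
from itertools import combinations

def purity(clusters, gold):
    """clusters: list of lists of items; gold: item -> class label."""
    total = sum(len(c) for c in clusters)
    majority = sum(max(sum(1 for i in c if gold[i] == lab)
                       for lab in set(gold.values()))
                   for c in clusters)
    return majority / total

def rand_index(clusters, gold):
    # Pairs where clustering and gold agree: both together or both apart.
    assign = {i: k for k, c in enumerate(clusters) for i in c}
    items = sorted(assign)
    agree = sum(1 for a, b in combinations(items, 2)
                if (assign[a] == assign[b]) == (gold[a] == gold[b]))
    return agree / (len(items) * (len(items) - 1) / 2)

gold = {"a": 1, "b": 1, "c": 2, "d": 2}
clusters = [["a", "b"], ["c"], ["d"]]
print(purity(clusters, gold), rand_index(clusters, gold))
```

Note that splitting a gold class ("c" and "d") leaves purity perfect but is penalised by the Rand index, which is why both are reported.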
  31. Discrimination Baselines
     All-in-one: group all words into one big cluster.
     Random: produce a random set of clusters.
  32. Pseudowords
     A discrimination evaluation method that generates new words with artificial ambiguity:
     1. Select two or more monosemous terms from gold-standard data.
     2. Replace all their occurrences in a corpus with a pseudoword formed by joining the terms.
     3. Compare automatic discrimination against the gold standard.
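Pseudoword construction is mechanical enough to sketch directly. The function and the joined-token format below are illustrative choices, not a standard:

```python
# Merge two monosemous words into one artificial ambiguous token,
# keeping the original word as the gold "sense" for each occurrence.
def make_pseudoword_corpus(sentences, word_a, word_b):
    pseudo = f"{word_a}_{word_b}"
    out = []
    for tokens in sentences:
        gold, merged = [], []
        for t in tokens:
            if t in (word_a, word_b):
                gold.append(t)        # original word = gold sense
                merged.append(pseudo)
            else:
                merged.append(t)
        out.append((merged, gold))
    return out

corpus = [["the", "banana", "was", "ripe"],
          ["a", "door", "slammed", "shut"]]
merged = make_pseudoword_corpus(corpus, "banana", "door")
print(merged[0][0])  # ['the', 'banana_door', 'was', 'ripe']
```

A discrimination system is then asked to separate `banana_door` occurrences into clusters, which are scored against the recorded originals.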
  33. SemEval-2007
     Lexical sample and all-words coarse-grained WSD; preposition disambiguation; evaluation of WSD in cross-language information retrieval; WSI; lexical substitution.
     Top systems reach 88.7% accuracy (lexical sample) and 82.5% (all-words).
  34. SemEval-2010
     The fifth event of its kind.
     Includes specific cross-lingual tasks, a combined WSI/WSD task, and a domain-specific all-words task.
  35. Issues
     Representation of word senses: enumerative vs. generative approaches.
     The knowledge acquisition bottleneck: not enough data!
     Benefits for AI/NLP applications.
  36. Alleviating the Knowledge Acquisition Bottleneck
     Weakly supervised algorithms, incorporating bootstrapping or active learning.
     Continuing manual efforts: WordNet, Open Mind Word Expert, OntoNotes.
     Automatic enrichment of knowledge resources: collocation and relation-triple extraction, BabelNet.
  37. Future Challenges
     How can we mine even larger repositories of textual data (e.g. the whole web!) to create huge knowledge repositories?
     How can we design high-performance, scalable algorithms to use this data?
     We need to decide which kinds of word sense are needed for which applications.
     We still need to develop a general representation of word senses.
  38. Outline (as slide 3)
  39. Wikipedia as a Sense Inventory
     Wikipedia articles provide an inventory of disambiguated word senses and entity references.
     Task: use their occurrences in texts, i.e. the internal Wikipedia hyperlinks, as named entity and sense annotations.
     The articles' texts provide a sense-annotated corpus.
  40. Mihalcea (2007)
     Mihalcea proposes a method for automatically generating sense-tagged data using Wikipedia:
     "Rhythm is the arrangement of sounds in time. Meter animates time in regular pulse groupings, called measures or [[bar (music)|bars]]."
     "The nightlife is particularly active around the beachfront promenades because of its many nightclubs and [[bar (establishment)|bars]]."
     1. Extract all paragraphs in Wikipedia containing the word w.
     2. Collect all possible labels l1, ..., ln for w.
     3. Map each label li to its WordNet sense s.
     4. Annotate each occurrence of w labelled with li with its sense s.
     A system trained on Wikipedia significantly outperforms the MFS and Lesk baselines.
  41. Knowledge-rich WSD
     The general aim is to relieve the knowledge acquisition bottleneck of NLP systems, with WSD as a case study.
     Main ideas:
     - Extend WordNet with millions of semantic relations (using Wikipedia).
     - Apply knowledge-based WSD to exploit the extended WordNet.
     Results: integrating many, many semantic relations into knowledge-based systems yields performance competitive with state-of-the-art supervised approaches.
  42. Wikification
     The task of generating hyperlinks to disambiguated Wikipedia concepts.
     Two sub-tasks: automatic keyword extraction, and WSD.
     Wikify! (Csomai and Mihalcea, 2008) performs keyword extraction by extracting candidates and then ranking them.
     The system does both knowledge-based and data-driven WSD, filtering out annotations on which the two disagree.
     Links are disambiguated using relatedness, commonness (the prior probability of a sense), and context quality (context terms).
  43. Outline (as slide 3)
  44. Questions
     Thank you. Are there any questions?
