About the paper: Graph Connectivity Measures for Unsupervised Word Sense Disambiguation


Presentation about the paper
Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
Roberto Navigli and Mirella Lapata

Published in: Engineering, Technology

  1. About the paper: Graph Connectivity Measures for Unsupervised Word Sense Disambiguation
     • Roberto Navigli, Dipartimento di Informatica, Sapienza Università di Roma
     • Mirella Lapata, School of Informatics, University of Edinburgh
     • Giovanni Murru, Seminars in Computational learning methods for Natural Language Processing (Prof. Roberto Basili)
  2. Abstract
     • Development of graph-based unsupervised algorithms for Word Sense Disambiguation
     • Discussion of a variety of measures that analyze the connectivity of the graph structures
     • Test of the performance of these approaches on standard data sets
  3. Word Sense Disambiguation
     • Word Sense Disambiguation (WSD) is an open research topic in Natural Language Processing.
     • Its goal is to identify which sense of a word is intended in a given context, such as a sentence.
     • The sense of the word is selected from a set of predefined possibilities:
        • Sense Inventory (Dictionary, Thesaurus)
     • Knowledge-intensive methods, Supervised Learning
  4. The essentiality of WSD
     • Word Sense Disambiguation is essential for many applications:
        • Machine Translation (e.g. complex translations between natural languages, achieved with corpus techniques)
        • Information Retrieval (used on the Internet)
        • Question Answering
        • Knowledge Acquisition
        • Summarization
  5. Huge Data Sets
     • One of the problems of Word Sense Disambiguation (WSD) is the need to deal with huge data sets, in particular with the supervised approach.
     • While Supervised Disambiguation is based on a labeled training set, Unsupervised Disambiguation uses unlabeled corpora.
     • Corpora are large and structured sets of texts.
     • The supervised approach outperforms the unsupervised one, but requires large amounts of training data.
  6. Limitations of Supervised
     • Supervised Disambiguation can obtain reliable results only for words whose senses have been labeled.
     • These sense-tagged corpora are usually created by hand, which is very expensive and requires a lot of work.
     • Suitable data are scarce for many languages and text genres.
     • POSSIBLE SOLUTION? Unsupervised Disambiguation
  7. Graph vs Similarity (1/2)
     • Unsupervised methods can generally be divided into two categories:
        1. Graph-Based
        2. Similarity-Based
     • No need to label senses → optimal for large-scale sense disambiguation
     • Similarity-Based algorithms assign a sense to an ambiguous word by comparing each of its senses with those of the words in the surrounding context.
     • The sense with the highest similarity is assumed to be the right one.
  8. Graph vs Similarity (2/2)
     • The work developed by Navigli and Lapata takes the Graph-Based approach.
     • Graph-Based steps:
        • Build a graph representing all possible interpretations of the word sequence to be disambiguated.
           • Graph nodes → Word meanings
           • Graph edges → Semantic relations between these senses
        • Estimate the value of each node in order to determine its importance.
     • Sense Disambiguation is about finding the most important node for each word.
  9. Building the Graph (1/2)
     • In the experiments, Navigli and Lapata used the WordNet sense inventory.
     • For each sentence σ they build a graph G.
     • σ = {w1, w2, …, wn} is a set of words.
     • The graph G is composed of a set of vertices Vσ = {v1, v2, …, vn}.
     • Vσ initially contains, for each word wi that belongs to σ, the set of senses associated with that word in the WordNet sense inventory.
     • The set of edges E of the graph G is initially empty.
  10. Building the Graph (2/2)
     • Let V = Vσ.
     • For each word sense vi in Vσ, a depth-first search of the WordNet graph is performed starting from vi, and
        • every time a different word sense vj also contained in Vσ is found,
        • the semantic relations encountered along the path between vi and vj are added to the set of edges E,
        • and the nodes on this path (between vi and vj) are added to the set V of vertices of the graph G.
     • G is hence a representation of the semantic relations between the words of the particular sentence that G represents.
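The construction described on slides 9 and 10 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `lexicon` is a hypothetical toy adjacency map standing in for WordNet, the sense labels are invented, and the `max_len` cap on path length is an assumption added here to keep the search bounded.

```python
def build_sentence_graph(lexicon, sense_sets, max_len=3):
    """Collect the vertices and edges lying on paths between sentence senses.

    lexicon    : dict mapping a node to its neighbours (toy stand-in for WordNet)
    sense_sets : dict mapping each word of the sentence to its set of senses
    max_len    : assumed cap on path length, counted in edges
    """
    targets = set().union(*sense_sets.values())  # V_sigma
    vertices = set(targets)                      # V starts as V_sigma
    edges = set()                                # E starts empty

    def dfs(path):
        if len(path) - 1 >= max_len:             # path already has max_len edges
            return
        for nxt in lexicon.get(path[-1], ()):
            if nxt in path:                      # keep paths simple (no cycles)
                continue
            new_path = path + [nxt]
            if nxt in targets:                   # reached another sense of the sentence
                vertices.update(new_path)
                edges.update(frozenset(pair) for pair in zip(new_path, new_path[1:]))
            dfs(new_path)

    for sense in targets:
        dfs([sense])
    return vertices, edges


# Hypothetical toy lexicon; the sense labels are invented for illustration.
lexicon = {
    "drink#v1": ["beverage#n1"],
    "beverage#n1": ["drink#v1", "milk#n1"],
    "milk#n1": ["beverage#n1"],
    "milk#n2": ["river#n1"],
    "river#n1": ["milk#n2"],
}
sense_sets = {"drink": {"drink#v1"}, "milk": {"milk#n1", "milk#n2"}}
vertices, edges = build_sentence_graph(lexicon, sense_sets)
```

In this toy run the intermediate node `beverage#n1` is pulled into the graph because it lies on a path between two senses of the sentence, while `river#n1` is not, since it leads to no other sense.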
  11. Why is the graph built?
     • G is a subgraph of WordNet whose vertices and relations are reasonably useful for the WSD problem.
     • Remember:
        • The aim of WSD is to find the most appropriate sense for each word that belongs to the sentence σ.
        • This is determined by ranking each vertex in the graph G according to its importance.
     • How can we achieve this ranking? How can we measure the relevance of a word sense?
     • CONNECTIVITY MEASURES
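Once every vertex has a score, choosing a sense reduces to taking the highest-ranked sense of each word. The snippet below is a minimal sketch of that selection step; the `scores` values and sense labels are hypothetical, and breaking ties alphabetically is an assumption made only to keep the result deterministic.

```python
def disambiguate(sense_sets, score):
    """Pick, for every word, the sense whose vertex has the highest score.

    sense_sets : dict mapping each word to its candidate senses
    score      : function mapping a sense to its connectivity score
    """
    # sorted() fixes the iteration order so that ties resolve deterministically.
    return {word: max(sorted(senses), key=score) for word, senses in sense_sets.items()}


# Hypothetical scores, as some connectivity measure might produce them.
scores = {"drink#v1": 0.4, "milk#n1": 0.6, "milk#n2": 0.1}
candidates = {"drink": {"drink#v1"}, "milk": {"milk#n1", "milk#n2"}}
chosen = disambiguate(candidates, scores.get)
```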
  12. Connectivity Measures (1/2)
     • They are used to rank the nodes in order to select the most plausible meaning.
     • Connectivity measures can be of two types:
        • LOCAL
        • GLOBAL
     • While global measures estimate the connectivity of the entire graph structure, local measures capture the degree of connectivity of a single vertex in the graph.
  13. Connectivity Measures (2/2)
     • The graphs are assumed to be undirected.
     • The researchers motivate this choice by noting that semantic relations often have a counterpart, as in the case of hypernymy and hyponymy (IS-A).
     • e.g. RED
        • Hypernymy: something that red is a kind of (e.g. chromatic color)
        • Hyponymy: something that is a kind of red (e.g. scarlet)
     • They define a distance function d as the length of the shortest path between two nodes.
     • If the two nodes are disconnected, d = K, where K is the number of nodes in the graph.
  14. Local Measures (1/2)
     • Local measures used in the experiments are:
        • In-degree centrality
           • Normalized number of edges terminating in a vertex
        • Betweenness centrality
           • The normalized fraction of shortest paths between node pairs that pass through a vertex
        • Key Player Problem (KPP)
           • The normalized sum of the inverse of the distances between the vertex and the remaining nodes of the graph:
             $KPP(v) = \frac{\sum_{u \in V : u \neq v} \frac{1}{d(u,v)}}{|V| - 1}$
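To make two of these measures concrete, here is a minimal sketch of degree centrality and KPP on an undirected graph stored as an adjacency map. It follows the slide's descriptions and the distance convention of slide 13 (d = K for disconnected nodes); the toy graph and the function names are hypothetical, and the graph is assumed to have at least two nodes.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first shortest-path lengths from source; unreachable nodes get K = |V|."""
    dist = {node: len(adj) for node in adj}  # default: treated as disconnected
    dist[source] = 0
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(adj, v):
    """Number of edges at v, normalized by the maximum possible degree |V| - 1."""
    return len(adj[v]) / (len(adj) - 1)


def kpp(adj, v):
    """Normalized sum of inverse distances between v and every other node."""
    dist = distances_from(adj, v)
    return sum(1 / d for u, d in dist.items() if u != v) / (len(adj) - 1)


# Toy graph: a path a - b - c plus an isolated node d (so K = 4).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
kpp_b = kpp(toy, "b")  # (1/1 + 1/1 + 1/4) / 3
kpp_a = kpp(toy, "a")  # (1/1 + 1/2 + 1/4) / 3
```

The middle node `b` scores higher than the end node `a`, which matches the intuition that KPP rewards vertices that are close to all the others.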
  15. Local Measures (2/2)
     • The researchers also used the following local measures:
        • HITS and PageRank
           • Link analysis algorithms normally used to rate web pages; since the web is itself a graph of linked pages, they can be applied to other graphs as well.
        • Maximum Flow
           • Maximum s-t flow: the number of independent paths between a pair of vertices contained in the same partition as s and t, respectively.
           • The flow towards a vertex v is measured as the sum of the maximum flows having v as sink and each of the other vertices of the graph as source.
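As an illustration of how a link analysis algorithm ranks the vertices of an undirected sense graph, the following is a minimal power-iteration sketch of PageRank. It is not taken from the paper: the damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, and isolated nodes are handled naively (they simply receive no incoming rank).

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph given as an adjacency map."""
    n = len(adj)
    rank = {v: 1 / n for v in adj}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {}
        for v in adj:
            # Each neighbour u shares its rank equally among its own neighbours.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy path graph a - b - c: the middle node should receive the highest rank.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
ranks = pagerank(path)
```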
  16. Global Measures (1/2)
     • They characterize the overall graph structure, so they are not particularly helpful in selecting a unique sense for ambiguous words.
     • Navigli and Lapata used these three well-known global measures in their experiments:
        • Compactness
           • High value → vertices are connected by short distances; the graph is compact.
           • Low value → vertices are disconnected or connected by long distances.
  17. Global Measures (2/2)
        • Graph Entropy
           • Low value = few vertices are important
           • High value = vertices are almost equally important
        • Edge Density
           • Computed as the ratio between the number of edges in a graph and the number of edges of a complete graph with the same number of nodes.
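Two of the global measures are simple enough to sketch directly. Edge density follows the definition on the slide. For graph entropy the slide gives no formula, so the version below assumes the common degree-based definition (vertex probability proportional to degree, entropy normalized by log |V|); treat that as an assumption rather than the paper's exact formulation. The graph is assumed to have at least two nodes and one edge.

```python
import math


def edge_density(adj):
    """Edges in the graph divided by edges in a complete graph on the same nodes."""
    n = len(adj)
    num_edges = sum(len(neigh) for neigh in adj.values()) / 2  # each edge counted twice
    return num_edges / (n * (n - 1) / 2)


def graph_entropy(adj):
    """Entropy of the degree distribution, normalized to [0, 1] by log |V| (assumed definition)."""
    total_degree = sum(len(neigh) for neigh in adj.values())   # equals 2|E|
    probs = [len(neigh) / total_degree for neigh in adj.values() if neigh]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(adj))


# Toy graphs: a triangle (all vertices equally important) and a star (one hub).
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
```

The triangle reaches the maximum of both measures, while the star, dominated by a single hub, has lower entropy, in line with the reading "low value = few vertices are important".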
  18. Experiments
     • The experiments run by Navigli and Lapata used a sentence-by-sentence disambiguation approach to evaluate the measures described above.
     • They built a graph for each sentence, ranked the nodes using the measures, and selected the most appropriate meanings.
     • They tested their algorithm using two different sense inventories:
        • WordNet 2.0
        • An extended version of WordNet created by Navigli, adding about 60,000 semantic edges extracted from collocation resources (e.g. Oxford Collocation, etc.), which define restrictions on how words can be used together:
           • e.g. "strong tea" is acceptable, "powerful tea" is not
  19. Experiments
     • Two standard data sets:
        • SemCor Corpus
           • subset of the Brown Corpus
           • 200,000 words manually tagged with WordNet senses
        • Senseval-3 English all-words
           • subset of the Penn Treebank Corpus
           • 2,081 words manually tagged with WordNet senses
     • All the connectivity measures were tested on SemCor.
     • The best-performing measure on SemCor was then tested on Senseval-3 too.
     • The graph-based algorithm developed by the researchers was compared with a naïve baseline that randomly selects a sense for each word.
  20. The tests' results (1/4)
     • The tests were run on words with more than one WordNet sense (polysemous words).
     • They used a chi-square test, a common statistical test.
     • LEGEND:
        • Prec: Precision, a measure of exactness
        • Rec: Recall, a measure of completeness
        • F1: harmonic mean of Precision and Recall, $F1 = \frac{2 \cdot Prec \cdot Rec}{Prec + Rec}$
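The legend can be turned into a small worked computation. The sketch below assumes the usual WSD convention, which the slide does not spell out: precision is the share of attempted words that were disambiguated correctly, and recall is the share of all words that were disambiguated correctly. The counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(correct, attempted, total):
    """Return (precision, recall, F1) from raw counts, guarding against empty inputs."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 100 words to disambiguate, 60 attempted, 30 correct.
prec, rec, f1 = precision_recall_f1(correct=30, attempted=60, total=100)
```

With these counts precision is 0.5 and recall is 0.3; the harmonic mean sits closer to the lower of the two values than the arithmetic mean would.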
  21. The tests' results (2/4)
     • PageRank performs better than HITS; the researchers suggest this may be due to the random surfer model.
     • The best-performing local measure is KPP, with F1 = 31.8% using WordNet and F1 = 40.5% using EnWordNet.
     • The best-performing global measure is Graph Entropy, with F1 = 29.4% (WordNet) and F1 = 30.5% (EnWordNet).
     • EnWordNet performs better than WordNet:
        • A denser lexicon with a large number of semantic relations enhances the measures.
  22. The tests' results (3/4)
     • Since KPP was the best-performing measure on SemCor, the researchers also tested it on Senseval-3, using the enriched version of WordNet.
     • They compared it with the best unsupervised system at the time, which is based on domain-driven disambiguation.
  23. The tests' results (4/4)
     • IRST-DDD compares the domain of the context surrounding the target word with the domain of its senses, and uses a version of WordNet augmented with domain labels (e.g. economy, geography).
     • KPP is comparable to IRST-DDD for nouns and adjectives, but worse for verbs.
     • This can be explained by a lack of sentence relations (related to verbs) in the enriched WordNet used for the tests.
  24. Summary
     • Navigli and Lapata presented a study of graph connectivity measures for unsupervised WSD.
     • A large number of local and global measures have been evaluated.
     • Local measures perform better than global ones.
     • KPP is better than the other connectivity measures at identifying which node in the graph is maximally connected to the others (the same result holds in social network analysis).
     • If the enrichment of WordNet is increased, PageRank and InDegree become comparable to KPP in terms of performance.