A Survey on Unsupervised Graph-based Word Sense Disambiguation

Presents comparative evaluations of graph-based
word sense disambiguation techniques using several measures of
word semantic similarity and several ranking algorithms. Unsupervised
word sense disambiguation has received a lot of attention lately because
of its fast execution time and its ability to make the most of a small
input corpus. Recent state-of-the-art graph-based systems have tried to
close the gap between the supervised and the unsupervised approaches.

Transcript

  • 1. A Survey on Unsupervised Graph-based Word Sense Disambiguation Elena-Oana Tabaranu elena.tabaranu@info.uaic.ro UAIC, Iasi
  • 2. Plan 1. Introduction 2. State of the Art 3. Experiments and Results 4. Conclusions 5. References
  • 3. Introduction ● WSD = automatically assign the most appropriate meaning to a polysemous word within a given context (Sinha et al., 2007) ● Use cases: machine translation; speech processing; boosting the performance of tasks like text retrieval, document classification and document clustering
  • 4. State of the Art ● Supervised WSD vs Unsupervised WSD ● GWSD and Semantic Graph Construction ● SAN Method ● Page-Rank Method ● HITS Method ● P-Rank Method
  • 5. Supervised WSD vs Unsupervised WSD
    Supervised WSD: ● Most approaches transform the sense of the word into a feature vector ● Low execution time ● Accuracy of 60%-70% ● Major disadvantage: knowledge acquisition bottleneck (accuracy is tied to the amount of manually annotated data)
    Unsupervised WSD: ● Identify the best sense candidate for a model of the word sense dependency in text ● Ranking algorithm to choose the most likely combination of senses ● Window- or graph-based representation of the model ● Fast execution time ● Accuracy of 40%-60%
  • 6. Graph-based WSD ● GWSD = graph representation used to model word sense dependencies in text (WSD with graphs, not just a word window) ● Goal: identify the most probable sense (label) for each word ● Advantage: takes into account information drawn from the entire graph
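The graph representation described in slide 6 can be sketched as follows. This is a minimal illustration, not the construction used by any of the surveyed systems: the sense inventory `SENSES`, the `word#label` naming scheme, the `window` parameter, and the gloss-overlap similarity are all assumptions made for the example. Nodes are candidate senses, and an edge links two senses of different nearby words whenever their similarity is positive.

```python
from itertools import combinations

# Hypothetical toy sense inventory: word -> {sense label: gloss}.
SENSES = {
    "bank": {
        "bank#finance": "institution that accepts money deposits",
        "bank#river": "sloping land beside a body of water",
    },
    "money": {
        "money#currency": "medium for exchange accepted by an institution",
    },
}

def gloss_overlap(gloss_a, gloss_b):
    """Toy similarity: number of words shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def build_sense_graph(words, senses, similarity, window=3):
    """Return weighted undirected edges {(sense_a, sense_b): weight}."""
    edges = {}
    for i, j in combinations(range(len(words)), 2):
        # Only connect different words that lie within the window.
        if j - i > window or words[i] == words[j]:
            continue
        for label_a, gloss_a in senses[words[i]].items():
            for label_b, gloss_b in senses[words[j]].items():
                weight = similarity(gloss_a, gloss_b)
                if weight > 0:
                    edges[(label_a, label_b)] = weight
    return edges

edges = build_sense_graph(["bank", "money"], SENSES, gloss_overlap)
print(edges)
```

Senses of the same word are never linked to each other here, so a ranking algorithm run on the resulting graph compares them only through their connections to the senses of neighbouring words.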
  • 7. Semantic Graph Construction (I) ● Example (Sinha et al., 2007)
  • 8. Semantic Graph Construction (II) ● Example (Tsatsaronis et al., 2010)
  • 9. The Page-Rank Method (Brin and Page, 1998) ● Ranking algorithm based on the idea of voting: when one node links to another it offers a vote to that other node ● The higher the number of votes for a node, the higher the importance of the node ● Recursively score the candidate nodes for a weighted undirected graph
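The voting idea in slide 9 can be sketched for a weighted undirected sense graph as below. This uses one common weighted formulation, in which each node passes its score to its neighbours in proportion to edge weight. The damping factor of 0.85, the fixed iteration count, the `word#label` node names, and the toy edge weights are assumptions for illustration only.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Score nodes of a weighted undirected graph given {(u, v): weight}."""
    neighbours = {}
    for (u, v), weight in edges.items():
        neighbours.setdefault(u, {})[v] = weight
        neighbours.setdefault(v, {})[u] = weight
    # Total edge weight attached to each node.
    strength = {u: sum(nbrs.values()) for u, nbrs in neighbours.items()}
    scores = {u: 1.0 for u in neighbours}
    for _ in range(iterations):
        # Each neighbour u votes for v with a share w(u, v) / strength(u).
        scores = {
            v: (1 - damping) + damping * sum(
                weight / strength[u] * scores[u]
                for u, weight in neighbours[v].items()
            )
            for v in neighbours
        }
    return scores

def best_senses(scores):
    """Pick the highest-scoring sense for each word (label format word#sense)."""
    best = {}
    for sense, score in scores.items():
        word = sense.split("#")[0]
        if word not in best or score > scores[best[word]]:
            best[word] = sense
    return best

# Hypothetical similarity weights between candidate senses.
edges = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#river", "money#currency"): 0.1,
    ("bank#finance", "deposit#payment"): 0.8,
    ("bank#river", "deposit#sediment"): 0.3,
    ("money#currency", "deposit#payment"): 0.7,
    ("money#currency", "deposit#sediment"): 0.05,
}
scores = weighted_pagerank(edges)
print(best_senses(scores))
```

With these toy weights the financial reading of "bank" collects more votes than the river reading, because it is strongly tied to other well-connected senses.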
  • 10. The P-Rank Method (Zhao et al., 2009) ● Check the structural similarity of nodes in an information network ● Based on the idea that two nodes are similar if they are referenced by similar nodes and also reference similar nodes ● Represents a generalization of other state-of-the-art measures like CoCitation, Coupling, Amsler, SimRank
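The recursive definition in slide 10 can be sketched as a naive fixed-point iteration. The parameter names `lam` (balance between in-link and out-link evidence) and `decay`, their default values, and the toy graph are assumptions for this example, and the quadratic loop is only practical for very small graphs.

```python
from itertools import product

def p_rank(out_links, lam=0.5, decay=0.8, iterations=10):
    """Naive P-Rank-style similarity for a directed graph {node: [targets]}."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    outs = {n: list(out_links.get(n, ())) for n in nodes}
    ins = {n: [u for u in nodes if n in outs[u]] for n in nodes}
    # Every node is fully similar to itself and, initially, to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def average(xs, ys, prev):
        """Mean pairwise similarity; zero when either neighbour set is empty."""
        if not xs or not ys:
            return 0.0
        return sum(prev[x, y] for x, y in product(xs, ys)) / (len(xs) * len(ys))

    for _ in range(iterations):
        prev = sim
        sim = {
            (a, b): 1.0 if a == b else decay * (
                lam * average(ins[a], ins[b], prev)            # referenced by similar nodes
                + (1 - lam) * average(outs[a], outs[b], prev)  # reference similar nodes
            )
            for a in nodes for b in nodes
        }
    return sim

# Toy graph: "a" and "b" both reference "c", and "c" references "d".
sim = p_rank({"a": ["c"], "b": ["c"], "c": ["d"]})
print(sim["a", "b"])
```

In this sketch, setting `lam` to 1.0 keeps only the in-link term and setting it to 0.0 keeps only the out-link term, which is how a single measure can cover both directions of reference.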
  • 11. The HITS Method (Kleinberg, 1999) ● Identify authorities = the most important nodes in the graph ● Identify hubs = the nodes which point to authorities ● The sense with the highest authority is chosen as the most likely one for each word ● Major disadvantage: densely connected nodes can attract the highest score (clique attack)
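The hub and authority scores in slide 11 can be sketched with the usual mutual-reinforcement iteration. The Euclidean normalisation, the fixed iteration count, and the toy directed links between `word#label` sense nodes are assumptions for this example.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a directed graph {node: [targets]}."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing at it.
        auth = {n: 0.0 for n in nodes}
        for u, targets in links.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of the authority scores of the nodes it points to.
        hub = {u: sum(auth[v] for v in links.get(u, ())) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth

# Hypothetical directed links between candidate senses.
links = {
    "bank#finance": ["money#currency"],
    "deposit#payment": ["money#currency"],
    "money#currency": ["bank#finance"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))
```

For disambiguation, the authority scores would then be compared among the candidate senses of each word, and the highest-scoring sense kept.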
  • 12. Experiments and Results (I) ● Senseval 2 and 3 data sets are often used for testing ● Occurrences for Senseval 2 using WordNet 2 ● Occurrences for Senseval 3 using WordNet 2
  • 13. Experiments and Results (II) ● Accuracies on the Senseval 2 and 3 English All Words Task data sets (Tsatsaronis et al.)
  • 14. Conclusions ● Recent systems minimise the gap between supervised and unsupervised approaches. ● The graph-based methods make the most of the rich semantic model they employ. ● Unsupervised approaches seek the optimal value for the parameters using as little training data as possible and testing on as large a dataset as possible. ● Future work: implement P-Rank using a different representation, for example Sinha et al.
  • 15. References
    1. Tsatsaronis, G., Varlamis, I., Norvag, K.: An Experimental Study on Unsupervised Graph-based Word Sense Disambiguation. In Proc. of CICLing (2010).
    2. Sinha, R., Mihalcea, R.: Unsupervised Graph-based Word Sense Disambiguation Using Measures of Semantic Similarity. In Proc. of ICSC (2007).
    3. Mihalcea, R., Csomai, A.: SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text. In Proc. of ACL, pages 53-56 (2005).
    4. Tsatsaronis, G., Vazirgiannis, M., Androutsopoulos, I.: Word Sense Disambiguation with Spreading Activation Networks Generated from Thesauri. In Proc. of IJCAI (2007).
  • 16. Questions?