CLIN-2015 Presentation
Word Sense Disambiguation is still an unsolved problem in Natural Language Processing.
We claim that most approaches do not model the context correctly, by relying
too much on the local context (the words surrounding the word in question), or on
the most frequent sense of a word. In order to provide evidence for this claim, we
conducted an in-depth analysis of all-words tasks of the competitions that have been
organized (Senseval-2 and Senseval-3, Semeval-2007, Semeval-2010, Semeval-2013). We focused
on the average error rate per competition and across competitions per part of speech,
lemma, relative frequency class, and polysemy class. In addition, we inspected the
“difficulty” of a token (word) by calculating the average polysemy of the words in the
sentence of a token. Finally, we inspected to what extent systems always chose the
most frequent sense. The results from Senseval 2, which are representative of other
competitions, showed that the average error rate for monosemous words was 33.3%
due to part-of-speech errors. This number was 71% for multiwords and phrasal verbs.
In addition, we observe that higher polysemy yields a higher error rate. Moreover, we
do not observe a drop in the error rate if there are multiple occurrences of the same
lemma, which might indicate that systems rely mostly on the sentence itself. Finally,
out of the 799 tokens for which the correct sense was not the most frequent sense, systems
still assigned the most frequent sense in 84% of the cases. For future work, we plan
to develop a strategy in order to determine in which context the predominant sense
should be assigned, and more importantly when it should not be assigned. One of the
most important parts of this strategy would be to not only determine the meaning of
a specific word, but to also know its referential meaning. For example, in the case of
the lemma ‘winner’, we do not only want to know what ‘winner’ means, but we also
want to know what this ‘winner’ won and who this ‘winner’ was.
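The breakdown of error rates by part of speech or polysemy class described above can be sketched as follows. This is only an illustration: the record layout (`pos`, `polysemy`, `gold`, `predicted`) and the toy data are assumptions, and the sketch pools all system decisions together, which may differ from how the averages in the analysis were actually computed.

```python
from collections import defaultdict

def error_rate_by(records, key):
    """Return the error rate (in %) for each value of `key`.

    `records` is a list of dicts, one per (system, token) decision.
    The field names are hypothetical, chosen for this sketch only.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if rec["predicted"] != rec["gold"]:
            errors[group] += 1
    return {group: 100.0 * errors[group] / totals[group] for group in totals}

# Toy data, invented for illustration (not taken from any competition).
records = [
    {"pos": "n", "polysemy": 1, "gold": "s1", "predicted": "s1"},
    {"pos": "n", "polysemy": 5, "gold": "s2", "predicted": "s1"},
    {"pos": "v", "polysemy": 5, "gold": "s3", "predicted": "s1"},
    {"pos": "v", "polysemy": 9, "gold": "s2", "predicted": "s1"},
]

by_pos = error_rate_by(records, "pos")
by_polysemy = error_rate_by(records, "polysemy")
```

The same function can be reused with any other grouping key, for example a lemma or a relative frequency class, as long as each record carries that field.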
2. Motivation
Word Sense Disambiguation is still an unsolved problem
2 Izquierdo, Postma and Vossen VU Amsterdam
3. Error Analysis
Perform error analysis on previous WSD evaluations to prove
our hypothesis
Senseval-2: all-words task
Senseval-3: all-words task
Semeval-2007: all-words task (#17)
Semeval-2010: all-words on specific domain (#17)
Semeval-2013: multilingual all-words WSD and entity linking (#12)
4. Motivation
Some “propagated” errors
Errors on monosemous words
Errors because of POS tags
Multiwords and phrasal verbs
Little attention has been paid to the real problem
WSD is not 1 problem but N problems
Our hypothesis
Context is not modeled properly in general
Systems rely too much on the most frequent sense
8. Most Frequent Sense
When the correct sense is NOT the most frequent sense
Systems still assign mostly the MFS
Senseval-2
For 799 tokens, the correct sense is not the MFS
In 84% of these cases, systems still assign the MFS
Most “failed” words due to MFS bias
Senseval-2, Senseval-3
say.v, find.v, take.v, have.v, cell.n, church.n
Semeval-2010
area.n, nature.n, connection.n, water.n, population.n
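The MFS bias measured on this slide can be illustrated with a small sketch: among the tokens whose correct sense is not the most frequent sense, count how often the system output is the MFS anyway. The field names (`gold`, `mfs`, `predicted`) and the toy decisions are assumptions made for this example, not the format of the actual competition data.

```python
def mfs_bias(decisions):
    """Share (in %) of non-MFS tokens for which the MFS was still assigned.

    Each decision is a dict with hypothetical fields:
      gold      - the correct sense
      mfs       - the most frequent sense of the lemma
      predicted - the sense chosen by a system
    """
    non_mfs = [d for d in decisions if d["gold"] != d["mfs"]]
    if not non_mfs:
        return 0.0
    hits = sum(1 for d in non_mfs if d["predicted"] == d["mfs"])
    return 100.0 * hits / len(non_mfs)

# Toy decisions, invented for illustration only.
decisions = [
    {"gold": "say.2", "mfs": "say.1", "predicted": "say.1"},
    {"gold": "take.4", "mfs": "take.1", "predicted": "take.1"},
    {"gold": "cell.3", "mfs": "cell.1", "predicted": "cell.1"},
    {"gold": "find.2", "mfs": "find.1", "predicted": "find.2"},
    {"gold": "have.1", "mfs": "have.1", "predicted": "have.1"},  # gold is the MFS, ignored
]

bias = mfs_bias(decisions)
```

In this toy set, four tokens have a non-MFS gold sense and three of them receive the MFS, so `bias` is 75.0.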
13. Expected vs. Observed difficulties
Calculate per sentence
The “expected” difficulty
Average polysemy, sentence length, average word length
15. Calculate per sentence
The “expected” difficulty
Average polysemy, sentence length, average word length
The “observed” difficulty
From the real participant outputs, average error rate
We should expect:
harder sentences → higher error rate
easier sentences → lower error rate
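The comparison above can be sketched in a few lines: compute the three “expected” features per sentence, compute the “observed” error rate from system outputs, and check whether the two move together. The polysemy lookup, the default of 1 sense for unknown words, and the use of a Pearson correlation are all assumptions of this sketch; the slides do not state how the features were combined or compared.

```python
from math import sqrt

def expected_difficulty(tokens, polysemy):
    """Per-sentence features: average polysemy, sentence length, average word length.

    `polysemy` is a hypothetical lemma -> number-of-senses lookup;
    words missing from it are assumed to have one sense.
    """
    n = len(tokens)
    return {
        "avg_polysemy": sum(polysemy.get(t, 1) for t in tokens) / n,
        "sentence_length": n,
        "avg_word_length": sum(len(t) for t in tokens) / n,
    }

def observed_difficulty(wrong_flags):
    """Error rate (in %) given one boolean per system decision (True = wrong)."""
    return 100.0 * sum(wrong_flags) / len(wrong_flags)

def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy example, invented for illustration only.
polysemy = {"bank": 10, "run": 40}
features = expected_difficulty(["the", "bank", "run"], polysemy)
observed = observed_difficulty([True, False, False, True])
```

If context were exploited well, one would expect a clearly positive value from `pearson` when it is applied to a list of expected-difficulty values and the matching list of observed error rates.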
18. • The context is (probably) not exploited properly
• Expected “easy” sentences SHOULD show low error rates
• Occurrences of the same word in different contexts have similar error
rates
• The difficulty of a word depends more on its polysemy than on the
context where it appears