Word sense disambiguation is an essential, yet very difficult, task in natural language processing. While several other NLP tasks, such as POS tagging, achieve very good results (highly accurate, with close to 100% of words labeled successfully), disambiguation is far from such performance. Nevertheless, we will demonstrate the need for word sense disambiguation when computing lexical chains on a special kind of text (chats) using a WordNet-based approach. In addition, we will try to identify the bottlenecks of such an approach (mostly with respect to accuracy) and propose possible improvements.
1. Word Sense Disambiguation and Lexical Chains Construction Using WordNet
The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity
September 14th, 2010, Bucharest, Romania
Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea
Politehnica University of Bucharest
2. Summary
The corpus
Lexical Chains
Semantic Distances
WordNet
Word sense disambiguation
Results
Further research
3. The corpus
Chats with 4 or 5 participants, debating subjects related to collaborative learning
High percentage of misspelled words
Keywords – “wiki” and “forum” – not present in WordNet with the proper sense
Utterances not necessarily correct text with respect to English grammar
4. Lexical chains
Lexical chain: a set of words where each word has an acceptable degree of semantic relatedness to every other word in the set
Each word must be fitted into a lexical chain – how?
Criterion: the percentage of words in the chain that are “related” to the word
Over 90% => the word “belongs” to that chain (a sketch of this rule follows below)
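A minimal sketch of this membership rule; the `related` predicate is a hypothetical stand-in for the WordNet-based relatedness test described on the next slides:

```python
def belongs_to_chain(word, chain, related, threshold=0.9):
    """True if `word` is "related" to at least `threshold` (here 90%)
    of the words already in the chain; `related` is assumed to wrap
    the semantic-distance test discussed on the following slides."""
    if not chain:
        return False
    hits = sum(1 for other in chain if related(word, other))
    return hits / len(chain) >= threshold
```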
5. Semantic distance
When are two words considered to be “related”?
Methods of computing a semantic distance between words – a number reflecting the strength of the semantic connection
Most methods use word frequency and the lowest superordinate => the need for an ontology
6. WordNet
Ontology containing information about the semantic relationships between words
Words defining the same concept are grouped into sets – synsets
Directed acyclic graph – synsets are nodes, semantic relationships are edges => consistent with the lowest superordinate concept
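To make this concrete, a minimal sketch of querying WordNet, assuming NLTK’s WordNet interface and the classic ambiguous word “bank” (neither is named in the presentation):

```python
# Minimal sketch, assuming NLTK's WordNet corpus is installed
# (nltk.download('wordnet')); the word "bank" is an illustrative choice.
from nltk.corpus import wordnet as wn

# Every sense of an ambiguous word is a synset
for synset in wn.synsets('bank'):
    print(synset.name(), '-', synset.definition())

river_bank = wn.synset('bank.n.01')   # sloping land beside water
money_bank = wn.synset('bank.n.02')   # financial institution

# Lowest superordinate (least common subsumer) of the two senses
print(river_bank.lowest_common_hypernyms(money_bank))

# Path length between the two senses in the hypernym graph
print(river_bank.shortest_path_distance(money_bank))
```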
7. Semantic distances in WN
Path length
• If an edge exists between two nodes, the two synsets are related through a semantic relationship
• The length of such a path shows the strength of the relationship between two senses
Jiang-Conrath measure
• Uses the lowest superordinate and word frequency
• Word frequency – the number of hits returned by a search engine
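For reference, the standard Jiang-Conrath distance (the slide does not spell it out) combines the information content (IC) of the two senses with that of their lowest superordinate (lso), where P(c) is the corpus probability of concept c, estimated from word frequencies:

```latex
\mathrm{dist}_{JC}(c_1, c_2) = IC(c_1) + IC(c_2) - 2\,IC\bigl(\mathrm{lso}(c_1, c_2)\bigr),
\qquad IC(c) = -\log P(c)
```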
8. Word sense disambiguation
Assigning a sense to an ambiguous word, based on its context
Semantic distance is computed between senses, not between words
The following scenario might occur: two words are found to be semantically close, but not through their correct senses
9. Word sense disambiguation (2)
Context = a window of words
Window size – a trade-off between time and quality of results
Our corpus: the ideas and the subject of the utterances often alternate => a large window size is not necessary
10. Word sense disambiguation (3)
Each word in the context has a list of senses
A set of word-sense pairs: for each word in the window, a sense is assigned
We must choose the best such set => a score must be computed for each set (a sketch of the enumeration follows below)
What evaluation function should compute the score?
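A minimal sketch of this enumeration; `senses_of` and `score` are hypothetical placeholders for the WordNet lookup and the evaluation function discussed next, and the window-size trade-off above is what keeps the combinatorial search feasible:

```python
from itertools import product

def best_assignment(window, senses_of, score):
    """Try every combination of one sense per word in the window
    and keep the word-sense set with the highest score."""
    candidates = (
        list(zip(window, senses))
        for senses in product(*(senses_of(w) for w in window))
    )
    return max(candidates, key=score)
```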
11. Evaluation function for WSD
The degree of a word-sense pair: the number of senses in the set that are related to that sense
We use WN and semantic distances to determine whether two senses are related
High degree – higher probability of a correct word-sense assignment
The average of all degrees in a set – high average = high score
12. Evaluation function for WSD (2)
Problem: many word-sense pairs with a very low degree and few with a very high degree
All degrees should be “packed” around the average => use the standard deviation of all degrees; low standard deviation = high score
Low semantic distances = all senses are closely related => we also need the average and standard deviation of all distances (a combined sketch follows below)
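A minimal sketch of such an evaluation function; only the degree terms are shown (the distance terms would enter analogously), `related` is a hypothetical sense-level predicate, and subtracting the deviation from the average is our assumption, not the authors’ exact formula:

```python
from statistics import mean, stdev

def degree(pair, assignment, related):
    """Number of other senses in the set related to this pair's sense."""
    _, sense = pair
    return sum(1 for other in assignment
               if other is not pair and related(sense, other[1]))

def score(assignment, related):
    """High average degree and low spread of degrees => high score."""
    degrees = [degree(pair, assignment, related) for pair in assignment]
    spread = stdev(degrees) if len(degrees) > 1 else 0.0
    return mean(degrees) - spread
```

Plugged into the earlier enumeration sketch, this could be passed as, e.g., `functools.partial(score, related=...)`.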
13. Results
Window size: 3-4 utterances
High threshold for computing semantic relatedness
The lowest superordinate is part of the shortest path between two senses – we ignore path lengths > 4
A word is included in a chain if it is related to 90% of the words present in that chain
14. Results (2)
Vocabulary size for the corpus – 1696 words
With WSD
Average chain length for the entire corpus = 52.68 words
Number of chains = 649
Longest chain: 95 distinct words
Number of unitary chains (chains with a single distinct word) = 394 => 23.21% of all words are part of such a chain
Without WSD
Average chain length for the entire corpus = 94.48 words
Number of chains = 358, of which 260 are unitary chains
Therefore, some chains are very long and probably inaccurate
Longest chain: 756 distinct words (over 40% of the vocabulary size)
15. Further research
There is a high dependency between the linguistic tool (WN is used now), the corpus and the algorithms for tasks like lexical chaining and WSD
The bottlenecks and key points of this system must be identified
Wikipedia can be used as an ontology (Wikipedia has a category graph), as well as a relevant corpus
Implementing a spell-checker to increase the number of words taken into account
16. Further research (2)
Each word must be fitted into a lexical chain – how?
Ask when a set of chains is stronger, rather than where a word best belongs
• “Strength” of a chain = how closely related its words are
• The output is a set of chains => we need the “strength” of a set of chains
• State-space searching: the output of the current lexical chaining algorithm is the initial state, while the final state is an acceptably strong set of chains (a sketch follows below)
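A minimal sketch of this idea as greedy hill-climbing; `set_strength` and the neighbour generator `moves` (e.g., moving one word to another chain) are hypothetical, and the presentation does not commit to a particular search strategy:

```python
def strengthen(chains, set_strength, moves):
    """Start from the current algorithm's output and greedily accept
    any neighbouring set of chains that is stronger, stopping at a
    local optimum."""
    best, best_score = chains, set_strength(chains)
    improved = True
    while improved:
        improved = False
        for candidate in moves(best):
            candidate_score = set_strength(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                improved = True
                break
    return best
```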