WORD SENSE DISAMBIGUATION AND LEXICAL
CHAINS CONSTRUCTION USING WORDNET
The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and
Connectivity
September 14th 2010, Bucharest,
ROMANIA
Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea
Politehnica University of Bucharest
Summary
 The corpus
 Lexical Chains
 Semantic Distances
 WordNet
 Word sense
disambiguation
 Results
 Further research
The corpus
 Chats consisting of 4 or 5 participants,
debating subjects related to collaborative
learning
 High percentage of misspelled words
 Keywords - “wiki” and “forum” – not present
in WordNet with the proper sense
 Utterances are not necessarily grammatically
correct English text
Lexical chains
 Lexical chain: set of words where each word has
an acceptable degree of semantic relatedness to
every other word in the set
 Each word must be fitted into a lexical chain –
How?
 The percentage of words in the chain that are
“related” to a word
 Over 90% - word “belongs” to that chain
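The 90% inclusion rule can be sketched as follows; `related` is a hypothetical predicate standing in for the semantic-relatedness test described on the next slides:

```python
def belongs_to_chain(word, chain, related, threshold=0.9):
    """Return True if `word` is related to at least `threshold`
    (here 90%) of the words already in the chain."""
    if not chain:
        return False
    hits = sum(1 for w in chain if related(word, w))
    return hits / len(chain) >= threshold

# Toy relatedness for illustration: words sharing a first letter.
related = lambda a, b: a[0] == b[0]
chain = ["forum", "feedback", "file"]
print(belongs_to_chain("format", chain, related))  # True: 3/3 related
print(belongs_to_chain("wiki", chain, related))    # False: 0/3 related
```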
Semantic distance
 When are two words considered to be
“related”?
 Methods of computing a semantic distance
between words – a number reflecting the
strength of the semantic connection
 Most methods use word frequency and the
lowest superordinate => the need for an
ontology
WordNet
 Ontology containing information about the
semantic relationships between words
 Words defining the same concept are
grouped into sets - Synsets
 Directed acyclic graph – synsets are nodes,
semantic relationships are edges =>
consistency with the lowest superordinate
concept
Semantic distances in WN
 Path length
• If an edge exists between two nodes, the two
synsets are related through a semantic
relationship
• The length of such a path shows the strength of
a relationship between two senses.
 Jiang-Conrath measure
• Uses the lowest superordinate and word
frequency
• Word frequency – number of hits returned by a
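Both measures can be sketched on a toy hypernym graph; the synset names, frequency counts, and function names below are illustrative, not the authors' implementation or real WordNet data:

```python
import math

# Toy hypernym graph: child -> parent (names are illustrative).
hypernym = {"dog": "canine", "cat": "feline",
            "canine": "carnivore", "feline": "carnivore",
            "carnivore": "animal", "animal": None}

def ancestors(s):
    """Map each superordinate of s (including s) to its depth from s."""
    chain, d = {}, 0
    while s is not None:
        chain[s] = d
        s, d = hypernym[s], d + 1
    return chain

def path_length(a, b):
    """Shortest path between two senses; it passes through their
    lowest superordinate in a tree-shaped hypernym hierarchy."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    common = [s for s in anc_a if s in anc_b]
    lso = min(common, key=lambda s: anc_a[s] + anc_b[s])
    return anc_a[lso] + anc_b[lso], lso

# Jiang-Conrath distance: IC(a) + IC(b) - 2*IC(lso),
# where IC(s) = -log p(s) from (here invented) frequency counts.
freq = {"dog": 10, "cat": 10, "canine": 15, "feline": 15,
        "carnivore": 40, "animal": 100}
total = sum(freq.values())
IC = lambda s: -math.log(freq[s] / total)

def jiang_conrath(a, b):
    _, lso = path_length(a, b)
    return IC(a) + IC(b) - 2 * IC(lso)

print(path_length("dog", "cat"))            # (4, 'carnivore')
print(round(jiang_conrath("dog", "cat"), 3))  # 2.773
```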
Word sense disambiguation
 Assigning a sense to an ambiguous word,
based on a context
 Semantic distance is computed for senses,
not for words
 The following scenario might occur: two
words are found to be semantically close, but
not through their correct senses
Word sense disambiguation (2)
 Context = a window of words
 Window size – trade-off between time and
quality of results
 Our corpus: ideas and the subjects of the
utterances often alternate => a large window
size is not necessary
Word sense disambiguation (3)
 Each word in the context has a list of senses
 A set of word-sense pairs : for each word in
the window, a sense is assigned
 We must choose the best such set => a score
must be computed for each set
 What evaluation function should compute the
score?
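The search over word-sense sets can be sketched by enumerating every assignment in the window; the sense inventory and the scoring function below are placeholders, not the evaluation function the following slides define:

```python
from itertools import product

def best_assignment(window_senses, score):
    """window_senses: one list of candidate senses per word in the
    window. Enumerate every word-sense assignment and keep the one
    the evaluation function scores highest."""
    return max(product(*window_senses), key=score)

# Hypothetical senses for a 3-word window, tagged with a domain.
senses = [["bank#river", "bank#money"],
          ["deposit#money", "deposit#geology"],
          ["interest#money", "interest#hobby"]]

# Toy score: number of sense pairs sharing the same domain tag.
def score(assignment):
    tags = [s.split("#")[1] for s in assignment]
    return sum(a == b for i, a in enumerate(tags) for b in tags[i+1:])

print(best_assignment(senses, score))
# ('bank#money', 'deposit#money', 'interest#money')
```

Exhaustive enumeration grows exponentially with window size, which is one reason the window-size trade-off on the previous slide matters.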
Evaluation function for WSD
 The degree of a word-sense pair: the number
of senses in the set related to that sense
 We use WN and semantic distances to
determine if two senses are related
 High degree – higher probability of a correct
word-sense assignment
 The average of all degrees in a set – high
average = high score
Evaluation function for WSD (2)
 Problem: many word-sense pairs with very low
degrees and few with very high degrees
 All degrees should be “packed” around the
average => standard deviation of all degrees.
 Low standard deviation = high score
 Low semantic distances = all senses are closely
related => we need the average and standard
deviation of all distances
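Combining the two criteria, a minimal sketch of the set score; the subtraction is an illustrative way of rewarding a high average degree while penalising spread, not the authors' exact formula:

```python
from statistics import mean, pstdev

def set_score(degrees):
    """Score a set of word-sense pairs from its members' degrees:
    high average degree raises the score, high standard deviation
    (degrees not 'packed' around the average) lowers it."""
    return mean(degrees) - pstdev(degrees)

balanced = [3, 3, 4, 3]   # degrees packed around the average
skewed = [0, 0, 1, 12]    # many very low, one very high degree
print(set_score(balanced) > set_score(skewed))  # True
```

The same average-minus-deviation idea applies to the semantic distances within the set.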
Results
 Window size: 3-4 utterances
 High threshold for computing semantic
relatedness
 The lowest superordinate is part of the
shortest path between two senses – we
ignore path lengths > 4
 A word is included in a chain if it is related to
90% of the words present in that chain
Results (2)
 Vocabulary size for corpus - 1696 words
 With WSD
 Average chain length for entire corpus = 52.68 words
 Number of chains = 649
 Longest chain : 95 distinct words
 Number of unitary chains (chains with a single distinct word) =
394 => 23.21 % of all words are part of such a chain
 Without WSD
 Average chain length for entire corpus = 94.48 words
 Number of chains = 358, of which 260 are unitary chains.
Therefore, some chains are very long and probably inaccurate
 Longest chain : 756 distinct words (over 40% of the vocabulary
size)
Further research
 There is a high dependency between the
linguistic tool (currently WN), the corpus, and
the algorithms for tasks like lexical chaining
and WSD
 Bottlenecks and key points of this system must
be identified
 Wikipedia – can be used as an ontology
(Wikipedia has a category graph), as well as a
relevant corpus
 Implementing a spell-checker to increase the
number of words taken into account
Further research (2)
 Each word must be fitted into a lexical chain –
How?
 Ask when a set of chains is stronger, rather
than where a single word best belongs
• “Strength” of a chain = how closely related are
the words
• Output is a set of chains => “strength” of a set of
chains
• State-space searching : the output of the current
lexical chaining algorithm is the initial state, while
the final state is an acceptably strong set of
chains
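The state-space idea can be sketched as greedy hill climbing over chain sets; the toy chains, move generator, and strength function below are placeholders for the real lexical-chain representation:

```python
def hill_climb(initial_chains, neighbours, strength, max_steps=100):
    """Start from the current lexical chainer's output and move to a
    neighbouring set of chains while that improves overall strength."""
    state = initial_chains
    for _ in range(max_steps):
        candidates = neighbours(state)
        if not candidates:
            break
        best = max(candidates, key=strength)
        if strength(best) <= strength(state):
            break  # local optimum: acceptably strong set of chains
        state = best
    return state

# Toy instance: a "chain" is a list of numbers; strength counts
# chains whose members all share the same parity.
def strength(chains):
    return sum(all(x % 2 == c[0] % 2 for x in c) for c in chains if c)

# Moves: reassign one item to another chain; drop emptied chains.
def neighbours(chains):
    out = []
    for i, c in enumerate(chains):
        for x in c:
            for j in range(len(chains)):
                if j != i:
                    new = [list(k) for k in chains]
                    new[i].remove(x)
                    new[j].append(x)
                    out.append([k for k in new if k])
    return out

print(hill_climb([[1, 2], [3, 4]], neighbours, strength))
# [[2, 4], [3, 1]]
```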
Q & A
