An Improved Approach to Word Sense Disambiguation

Published in: Engineering, Education, Technology
  1. Word Sense Disambiguation: A Knowledge-Based Approach
     Submitted by: Pradeep Sachdeva – 10104678, Surbhi Verma – 10104686
     Supervisor: Dr. Sandeep Kumar Singh
  2. Problem Statement
     • Words in the English language often have different meanings in different contexts. Such words are called polysemous words (words having more than one sense).
     • This project presents a knowledge-based algorithm for disambiguating polysemous words in any given sentence using the computational linguistics tool WordNet.
  3. Consider the following sentences:
     The album includes a few instrumental pieces.
     His efforts have been instrumental in solving the problem.
     The word "instrumental" carries a different sense in each sentence.
  4. Relevance of WSD
     The solution to the problem of WSD impacts other computer-related writing, such as:
     • improving the relevance of search engines
     • anaphora resolution
     • coherence and inference
     WSD is an intermediate language engineering technology that could improve applications such as information retrieval (IR).
  5. Different Approaches
     • Supervised methods
     • Unsupervised methods
     • Dictionary or knowledge-based methods
  6. Supervised Methods
     • Supervised methods assume that the context can provide enough evidence on its own to disambiguate words. However, they are subject to a new knowledge acquisition bottleneck, since they rely on substantial amounts of manually sense-tagged corpora for training, which are laborious and expensive to create.
     • They depend crucially on the existence of manually annotated examples for every word sense, a requirement that can so far be met only for a handful of words for testing purposes.
  7. Unsupervised Methods
     • The underlying assumption of this approach is that similar senses occur in similar contexts, so senses can be induced from text by clustering word occurrences using some measure of context similarity. New occurrences of the word can then be classified into the closest induced cluster (sense).
     • The performance of unsupervised methods is lower than that of the other methods.
  8. Knowledge-Based Methods
     • Knowledge-based methods rely primarily on dictionaries, thesauri, and lexical knowledge bases, without using any corpus evidence. These methods therefore do not require any kind of training corpus.
     • The performance of these methods is high, and they do not face the knowledge acquisition bottleneck, since no training data is required.
  9. About WordNet
     • WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
  10. • Every synset contains a group of synonymous words or collocations; different senses of a word are in different synsets.
      • The meaning of each synset is further clarified with a short defining gloss (a definition and/or example sentences).
      • Most synsets are connected to other synsets via a number of semantic relations. A few of them are:
        - hypernym: Y is a hypernym of X if every X is a (kind of) Y (bird is a hypernym of parrot)
        - hyponym: Y is a hyponym of X if every Y is a (kind of) X (parrot is a hyponym of bird)
        - meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)
        - holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window)
  11. An example
      The synsets of the word "sea" are:
      1. sea (synonyms): a division of an ocean or a large body of salt water partially enclosed by land
         - hypernyms: body of water, water
         - hyponyms: south sea
         - meronyms: bay, inlet, recess, embayment, gulf
         - holonyms: hydrosphere
      2. sea, ocean (synonyms): anything apparently limitless in quantity or volume
         - hypernyms: large indefinite amount, large indefinite quantity
      3. sea (synonyms): turbulent water with swells of considerable size
         - hypernyms: turbulent flow
         - hyponyms: head sea
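The "sea" example above can be written down as a small data structure. The sketch below is only an illustration: the `Synset` class, the identifiers such as `sea.1`, and the helper `senses_linked_to` are hypothetical names introduced here, and the data is copied by hand from the slide rather than read from WordNet itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Synset:
    """One sense of a word, with its gloss and related words."""
    name: str                       # hypothetical identifier, e.g. "sea.1"
    synonyms: List[str]
    gloss: str
    hypernyms: List[str] = field(default_factory=list)
    hyponyms: List[str] = field(default_factory=list)
    meronyms: List[str] = field(default_factory=list)
    holonyms: List[str] = field(default_factory=list)


# The three senses of "sea" listed on the slide, entered by hand.
SEA_SENSES = [
    Synset(
        "sea.1", ["sea"],
        "a division of an ocean or a large body of salt water "
        "partially enclosed by land",
        hypernyms=["body of water", "water"],
        hyponyms=["south sea"],
        meronyms=["bay", "inlet", "recess", "embayment", "gulf"],
        holonyms=["hydrosphere"],
    ),
    Synset(
        "sea.2", ["sea", "ocean"],
        "anything apparently limitless in quantity or volume",
        hypernyms=["large indefinite amount", "large indefinite quantity"],
    ),
    Synset(
        "sea.3", ["sea"],
        "turbulent water with swells of considerable size",
        hypernyms=["turbulent flow"],
        hyponyms=["head sea"],
    ),
]


def senses_linked_to(word: str, senses: List[Synset]) -> List[str]:
    """Return the names of the senses that list `word` under any relation."""
    linked = []
    for sense in senses:
        related = (sense.hypernyms + sense.hyponyms
                   + sense.meronyms + sense.holonyms)
        if word in related:
            linked.append(sense.name)
    return linked


print(senses_linked_to("gulf", SEA_SENSES))      # ['sea.1']
print(senses_linked_to("head sea", SEA_SENSES))  # ['sea.3']
```

Keeping each relation in its own list mirrors how the slide presents a synset, and makes it easy to see which sense a context word such as "gulf" points to.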
  12. Algorithm
      The algorithm computes the overall impact of the following parameters on the similarity of two words:
      • Intersection
      • Hierarchical Level
      • Distance
  13. Intersection at Level 1
      [Diagram: sets N, S1 and S2 at LEVEL 1]
      Intersection is computed as the number of overlapping words between the word families of the senses of the target word and of the nearby word, at various levels of the hierarchy.
      At LEVEL 1: Assume the target word has two senses, and let their word families be S1 and S2. Let the word families of all the senses of a nearby word be represented by a single set N.
  14. Intersection at Level 2
      [Diagram: sets N, S1, S2 with PN, PS1, PS2]
      Including the hypernyms at level 2: PS1, PS2 and PN are the parents (hypernyms) of S1, S2 and N respectively.
  15. Intersection at Level 3
      [Diagram: sets N, S1, S2 with PN, PS1, PS2 and P2N, P2S1, P2S2]
      Including the successive hypernyms at Level 3.
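The level-by-level intersection described on the three slides above might be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the slides do not define how a "word family" is built or exactly how the hypernym sets enter the count, so the sketch assumes each level's set is merged into the sets from the levels before it, and all word sets below are made-up toy data. The function name `intersection_by_level` is hypothetical.

```python
from typing import List, Set


def intersection_by_level(sense_levels: List[Set[str]],
                          nearby_levels: List[Set[str]]) -> List[int]:
    """Count overlapping words after including each successive level.

    sense_levels[0] is the word family of one sense (S), sense_levels[1]
    its hypernym set (PS), and so on; nearby_levels holds N, PN, ... for
    the nearby word. ASSUMPTION: levels accumulate, so level 2 compares
    (S | PS) with (N | PN).
    """
    counts = []
    sense_words: Set[str] = set()
    nearby_words: Set[str] = set()
    for sense_set, nearby_set in zip(sense_levels, nearby_levels):
        sense_words |= sense_set
        nearby_words |= nearby_set
        counts.append(len(sense_words & nearby_words))
    return counts


# Toy word families (hypothetical, for illustration only).
S1, PS1 = {"sea", "bay", "gulf"}, {"body of water", "water"}
S2, PS2 = {"sea", "ocean"}, {"large indefinite amount"}
N, PN = {"ship", "vessel", "water"}, {"craft", "vehicle"}

print(intersection_by_level([S1, PS1], [N, PN]))  # [0, 1]
print(intersection_by_level([S2, PS2], [N, PN]))  # [0, 0]
```

In this toy case the first sense only overlaps with the nearby word once the hypernyms are included, which is the situation the higher levels are meant to capture.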
  16. Score
      We compute the overall impact of intersection, hierarchical level and distance on the degree of similarity between the target and nearby words. We have devised the score formula as follows:
      Score = (Intersection)^(1/k1) (Level)^(k2) * (Distance)^(1/k3)
      The values of k1, k2 and k3 have been determined experimentally as: k1 = 3, k2 = 3, k3 = 3
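A small numeric sketch of the score may help. The transcript has lost the layout of the formula, so the code below rests on an assumption that should be checked against the original slide: it reads the formula as a fraction, with Intersection^(1/k1) in the numerator and Level^(k2) * Distance^(1/k3) in the denominator, so that the score falls as the hierarchy level and the distance grow. It also assumes Level and Distance are positive numbers starting at 1. The function name `score` is hypothetical; the constants k1 = k2 = k3 = 3 come from the slide.

```python
def score(intersection: float, level: float, distance: float,
          k1: float = 3, k2: float = 3, k3: float = 3) -> float:
    """Similarity score between a target sense and a nearby word.

    ASSUMED reading of the garbled formula:
        intersection^(1/k1) / (level^k2 * distance^(1/k3))
    """
    if level < 1 or distance < 1:
        raise ValueError("level and distance are assumed to start at 1")
    return intersection ** (1 / k1) / (level ** k2 * distance ** (1 / k3))


# Same overlap found at a deeper level, or farther away, scores lower.
print(score(8, level=1, distance=1))
print(score(8, level=2, distance=1))
print(score(8, level=1, distance=8))
```

Under this reading the cube of the level dominates the denominator, so an overlap found only through second-level hypernyms counts for one eighth of the same overlap found at level 1.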
  17. Evaluation - SemCor
      The algorithm has been evaluated on the SemCor dataset, the largest publicly available sense-tagged corpus, created at Princeton University. It has been automatically mapped to various versions of WordNet. For every polysemous word in a sentence, SemCor provides the WordNet sense it corresponds to.
  18. The algorithm has been evaluated in the following three ways:
      • Top 1: the correct sense, i.e. the sense specified by SemCor, has been given the highest score by the algorithm and is ranked first.
      • Top 2: the correct sense, i.e. the sense specified by SemCor, is one of the top 2 scoring senses given by the algorithm.
      • Top 3: the correct sense, i.e. the sense specified by SemCor, is one of the top 3 scoring senses given by the algorithm.
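The Top 1 / Top 2 / Top 3 measures above amount to a top-k accuracy over the ranked senses. The following is a minimal sketch, assuming the algorithm returns, for each test word, a list of sense labels sorted from highest to lowest score. The function name `top_k_accuracy` and the sense labels are hypothetical, and the data is invented purely to show the calculation; it is not a result from the project.

```python
from typing import List


def top_k_accuracy(ranked_senses: List[List[str]],
                   gold_senses: List[str], k: int) -> float:
    """Fraction of words whose gold sense is among the k best-ranked senses."""
    if len(ranked_senses) != len(gold_senses):
        raise ValueError("one ranking is needed per gold sense")
    hits = sum(1 for ranked, gold in zip(ranked_senses, gold_senses)
               if gold in ranked[:k])
    return hits / len(gold_senses)


# Invented rankings for four words, best-scoring sense first.
ranked = [
    ["a1", "a2", "a3"],
    ["b2", "b1", "b3"],
    ["c3", "c2", "c1"],
    ["d1", "d3", "d2"],
]
gold = ["a1", "b1", "c1", "d2"]   # senses a tagged corpus would supply

for k in (1, 2, 3):
    print(f"Top {k}: {top_k_accuracy(ranked, gold, k):.2f}")
# Top 1: 0.25
# Top 2: 0.50
# Top 3: 1.00
```

Top-k accuracy can only stay the same or rise as k grows, so the three figures are best read together rather than individually.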
  19. Comparison of results
      Therefore, the algorithm performs better than the existing approaches in this area.