SlideShare a Scribd company logo
1 of 8
Download to read offline
Jan Zizka (Eds) : CCSIT, SIPP, AISC, PDCTA - 2013
pp. 517–524, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3656
AN APPROACH TO WORD SENSE
DISAMBIGUATION COMBINING
MODIFIED LESK AND BAG-OF-WORDS
Alok Ranjan Pal1, 3
, Anirban Kundu2, 3
, Abhay Singh1
, Raj Shekhar1
,
Kunal Sinha1
1
College of Engineering and Management, West Bengal, India 721171
{chhaandasik, abhaysingh3185, rajshekharssp,
kunalsameer87}@gmail.com
2
Shenzhen Key Laboratory of Data Science and Modeling, Kuang-Chi Institute
of Advanced Technology, Shenzhen, Guangdong, P. R. China 518057
anirban.kundu@kuang-chi.org
3
Innovation Research Lab (IRL), West Bengal, India 711103
anik76in@gmail.com
ABSTRACT
In this paper, we are going to propose a technique to find meaning of words using Word Sense
Disambiguation using supervised and unsupervised learning. This limitation of information is
main flaw of the supervised approach. Our proposed approach focuses to overcome the
limitation using learning set which is enriched in dynamic way maintaining new data. We
introduce a mixed methodology having “Modified Lesk” approach and “Bag-of-Words” having
enriched bags using learning methods.
KEYWORDS
Word Sense Disambiguation (WSD), Modified Lesk (ML), Bag-of-Words (BOW).
1. INTRODUCTION
Word Sense Disambiguation (WSD) [1-2] is the process for identification of probable meaning of
ambiguous words based on distinct situations. Words with multiple meaning are ambiguous in
nature. The process of identification to decide appropriate meaning of an ambiguous word for a
particular context is known as WSD. People decide the meaning of a word based on the
characteristic points of a discussion or situation using their own merits. Machines have no ability
to decide such an ambiguous situation unless some protocols have been planted into the
machines’ memory.
In supervised learning, a learning set is considered for the system to predict the meaning of
ambiguous words using a few sentences having a specific meaning of particular ambiguous
518 Computer Science & Information Technology (CS & IT)
words. Specific learning set is generated as a result for each instance of different meaning. A
system finds the probable meaning of an ambiguous word for the particular context based on
defined learning set. It shows the result based on information available in database [3-4].
In unsupervised learning, online dictionary is taken as learning set avoiding the inefficiency of
supervised learning. “WordNet” is the most widely used online dictionary [5-7] maintaining
“words and related meanings” as well as “relations among different words”.
Organization of rest of the paper is as follows: Section 2 is about related activities of our paper,
based on the existing methods; Section 3 describes the proposed approach; Section 4 depicts
experimental results along with comparison; Section 5 represents conclusion of the paper.
2. RELATED WORKS
Many algorithms have been designed in WSD based on supervised and unsupervised learning.
“Lesk” and “Bag-of-Words” are two well-known methods which are discussed in this section as
the basis of our proposed approach.
Typical Lesk approach selects a short phrase from the sentence containing an ambiguous word.
Then, dictionary definition (gloss) of each of the senses for ambiguous word is compared with
glosses of other words in that particular phrase. An ambiguous word is being assigned with the
particular sense, whose gloss has highest frequency with the glosses of other words of the phrase.
The Bag-of-Words approach is a model, used in Natural Language Processing (NLP), to find out
the actual meaning of a word having different meaning due to different contexts. In this approach,
there is a bag for each sense of a keyword (disambiguated word) and all the bags are manually
populated. When the meaning of a keyword would be disambiguated, the sentence (containing the
keyword) is picked up and the entire sentence would be broken into separate words. Then, each
word of the sentence (except “stop words”) would be compared with each word of each “sense”
bags searching for the maximum frequency of words in common.
This paper adopts the basic ideas from typical Lesk algorithm and Bag-of-Words algorithm
introducing some modifications. In Modified Lesk Approach, gloss of keyword is only
considered within specific sentence instead of selection of all words. Number of common words
is being calculated between specific sentence and each dictionary based definitions of particular
keyword. A list of distinct words from the “Lesk” approach and “Bag-of-Words” approach is
prepared based on successful disambiguation of the keyword.
Disambiguation probability would be increased based on enrichment of the bag. It means that
learning method is tried to introduce within the typical concept of bags. If the bag grows
infinitely, then disambiguation accuracy would be near to 100% in a typical way. Actual growth
of the bag is limited depending on real-time memory management.
3. PROPOSED APPROACH
Design of our approach is presented in form of flow chart in this section. This approach is
designed to achieve a disambiguated result with higher precision values.
Computer Science & Information Technology (CS & IT) 519
In our approach, “stop words” like ‘a’, ‘an’, ‘the’, etc. are being discarded from input texts as
these words are meaningless to derive the “sense” of the particular sentence. Then, the text
containing meaningful words (excluding the stop words) is passed through “Bag-of-Words” and
“Modified Lesk” algorithms in a parallel fashion. “Bag-of-Words” algorithm is considered as
“Module 1”; and, “Modified Lesk” is considered as “Module 2”. These two algorithms are
responsible to find the actual sense of ambiguous words in the particular context. The unmatched
words in these algorithms are being stored in a temporary database for further usage. After that,
results of “Module 1” and “Module 2” have been being analysed to formulate the particular sense
depending on the context of the sentence in “Module 3”. If at least either of the algorithms (using
“Module 1” or “Module 2”) find the sense applying logical “OR” operation on the projected
results, then particular sense is assigned to the unmatched words in the temporary database.
Correctness of results based on the implemented algorithms is checked in “Module 4”. If both
algorithms derive same result obtained by applying “AND” operation on two results of “Module
1” and “Module 2”, then the sense is considered as disambiguated sense. Therefore, unmatched
words (kept in a temporary database) has to be moved to related sense bag as per the “Bag-of-
Words” algorithm in “Module 1” to participate in disambiguation method now onwards.
Otherwise, the derived senses are considered as the probable senses and unmatched words are
being moved to an anticipated database in “Module 5”. Figure 1 shows the modular division of
our proposed approach. If the occurrence of a word in the anticipated database with a particular
sense crosses specified threshold, the word is considered to be used for decision making and is
moved to the related sense bag of the “Bag-of-Words” algorithm in “Module 1” to participate in
disambiguation.
Fig. 1. Modular Division of Proposed Design.
Fig. 1 describes the overall procedure for the disambiguation of words.
Fig. 2 is based on Module 1 and it shows to find the sense of an ambiguous word using Bag-of-Words
approach.
520 Computer Science & Information Technology (CS & IT)
Fig. 2. Flowchart of Bag-of-Words approach.
Fig. 3. Flowchart of Modified Lesk approach.
Fig. 3 is based on Module 2 and it finds the sense of an ambiguous word using Modified Lesk.
Fig. 4 is designed based on Module 3. It formulates actual sense of the ambiguous word using results from
previous two modules. If at least one of the two approaches can derive the sense, that is considered as the
disambiguated sense.
Computer Science & Information Technology (CS & IT) 521
Fig. 4. Flowchart to Formulate Sense.
Fig. 5. Flowchart for checking correctness of sense.
Fig. 6. Flowchart for Learning Set Enrichment.
Fig. 5 is designed based on Module 4. It finds the correctness of disambiguated sense using
“AND” operation, derived by Module 1 and Module 2. If both approaches derive same sense, the
result of “AND” operation is ‘1’. Otherwise, for all other cases, the result is ‘0’.
Fig. 6 is designed based on Module 5 activities. It enriches the learning set by populating with
words from temporary database.
522 Computer Science & Information Technology (CS & IT)
Key feature of this approach is based on auto enrichment property of the learning set. Initially, if
any word is not present in the learning set, it could not be able for participation in case of
disambiguation. Though, its probable meaning would be stored in the database. When the number
of occurrences of the particular word with a particular sense crosses specific threshold value, the
word is inserted in the learning set to take part in disambiguation procedure. Therefore, the
efficiency of the disambiguation process is increased by this auto increment property of the
learning set.
4. EXPERIMENTAL RESULTS
Typical word sense disambiguation based approaches examine efficiency based on three
parameters such as “Precision”, “Recall”, and “F-measure”. Precision (P) is the ratio of “matched
target words based on human decision” and “number of instances responded by the system based
on the particular words”. Recall value (R) is the ratio of “number of target words for which the
answer matches with the human decided answer” and “total number of target words in the
dataset”. F-Measure is evaluated as “(2*P*R / (P+R))” based on the calculation of Precision and
Recall value. Different types of datasets are being considered in our experimentation to exhibit
the superiority of our proposed design.
Testing has been performed on huge datasets among which a sample is considered for showing
the comparison results between typical approaches and our proposed approach. In Table 1,
“Plant” and “Bank” have considered as target words. Main focus is the precision value as it is the
most dependable parameter in this type of disambiguation tests. Comparison among three
algorithms has been depicted in Table 1.
Sample Data for Test 1:
This is SBI bank. He goes to bank. Ram is a good boy. Smoke is coming out of cement plant. He
deposited Rs. 10,000 in SBI bank account. Are you near the bank of river? He is sitting on bank
of river. We must plant flowers and trees. To maintain environment green, all must plant flowers
and trees in our locality. The police made a plan with a motive to catch thieves with evidence.
Target Words: Bank, Plant.
Table 1. F-Measure Comparison in Test 1.
Algorithms Precision Recall Value F-Measure
Modified Lesk 1.0 0.3 0.5
Bag-Of-Words 1.0 0.67 0.80
Proposed Approach 1.0 0.88 0.94
Sample Data for Test 2:
We live in an era where bank plays an important role in life. Bank provides social security.
Money is an object which makes 90% human beings greedy but still people deposit money in
bank without fear. Reason for above activity is trust. The bank which creates maximum trust in
the hearts of people is considered to be most successful bank. Few such trustful names in India
are SBI, PNB and RBI. RBI is such a big name that people can bank upon it. Here is a small
story, one day a boy found a one rupee coin near the bank of the river. He wanted to keep that
Computer Science & Information Technology (CS & IT) 523
money safe. But he could not found any one upon whom he can bank upon. He thought to deposit
the money under a tree, in the ground, near the bank of river. Moral of the story kids find earth as
the safest bank. Here is another story about a beggar. A beggar deposited lot of money in her hut
which was near the bank of Ganga. One day other beggars found her asset and they planned to
loot that money. When the beggar came to know about the plan she shouted for help. Nobody but
a bank came to rescue and they helped the 80 year old to open an account and keep her money
safe.
Target Word: Bank.
Table 2. F-Measure Comparison in Test 2.
Algorithms Precision Recall Value F-Measure
Modified Lesk 0.83 0.45 0.58
Bag-Of-Words 0.71 0.45 0.55
Proposed Approach 0.77 0.6 0.68
In Table 2, the result is below our expectations as initial database is small for “Bag-of-Words”
approach. “Modified Lesk” (unsupervised) has shown better results than “Bag-of-Words”
(supervised).
Sample Data for Test 3:
This is PNB bank. He goes to bank. He was in PNB bank for money transfer. He deposited Rs
10,000 in PNB bank account. Are you near the bank of river? He is sitting on bank of river. He
was in PNB bank for money transfer. We must plant flowers and trees. He was in PNB bank for
money transfer. This is PNB bank. This is PNB bank. This is PNB bank. He was in PNB bank for
money transfer. He was in PNB bank for money transfer. He was in PNB bank for money
transfer. He was in PNB bank for money transfer. This is PNB bank. This is PNB bank. This is
PNB bank. This is PNB bank. This is his SBI bank.
Target Words: Bank.
Table 3. F-Measure Comparison in Test 3.
Algorithms Precision Recall Value F-Measure
Modified Lesk 1.0 0.15 0.26
Bag-Of-Words 1.0 0.45 0.62
Proposed Approach 1.0 0.85 0.92
In Table 3, the text is long enough to give combined approach more chances to show its
efficiency. Few lines are repeated in order to overcome the threshold value.
Table 4. Average of Test Results.
Algorithm Precision Recall Value F-Measure
Modified Lesk 0.94 0.3 0.45
Bag-of-Words 0.90 0.52 0.66
Proposed Approach 0.90 0.78 0.85
Table 4 shows average values of all the tests performed. Efficiency of an algorithm based on fixed
size learning set is improved in this paper enriching datasets. “Bag-of-Words” and “Modified
524 Computer Science & Information Technology (CS & IT)
Lesk” approaches individually exhibit the “F-Measure” as 0.66 and 0.45 respectively; whereas
proposed approach shows “F-Measure” as 0.85 since learning set is dynamically enriched with
new context sensitive definitions.
5. CONCLUSION
Our approach has established better performance in enhanced WSD based on learning sets.
Disambiguation accuracy is improved using enriched datasets. Higher precision value, recall
value, and F-Measure have achieved.
ACKNOWLEDGMENT
Flow cytometry data analysis system based on GPUs platform (No. JC201005280651A)
REFERENCES
[1] Nameh , M., S., Fakhrahmad, M., Jahromi, M. Z.: A New Approach to Word Sense Disambiguation
Based on Context Similarity, Proceedings of the World Congress on Engineering, Vol. I, 2011
[2] Navigli, R.: Word Sense Disambiguation: a Survey, ACM Computing Surveys, Vol. 41, No. 2, 2009,
ACM Press, pp. 1-69
[3] Kolte, S. G., Bhirud, S. G.: Word Sense Disambiguation Using WordNet Domains, First International
Conference on Digital Object Identifier, 2008, pp. 1187-1191
[4] Xiaojie, W., Matsumoto, Y.: Chinese word sense disambiguation by combining pseudo training data,
Proceedings of The International Conference on Natural Language Processing and Knowledge
Engineering, 2003, pp. 138-143
[5] Liu, Y., Scheuermann, P., Li, X., Zhu, X.: Using WordNet to Disambiguate Word Senses for Text
Classification, Proceedings of the 7th International Conference on Computational Science, Springer-
Verlag, 2007, pp. 781 - 789
[6] Cañas ,A. J., Valerio,A., Lalinde-Pulido,J., Carvalho,M., Arguedas,M.: Using WordNet for Word
Sense Disambiguation to Support Concept Map Construction, String Processing and Information
Retrieval, 2003, pp. 350-359
[7] Seo, H., Chung, H., Rim, H., Myaeng, S. H., Kim, S.: Unsupervised word sense disambiguation using
WordNet relatives, Computer Speech and Language, Vol. 18, No. 3, 2004, pp. 253-273

More Related Content

What's hot

NLP_Project_Paper_up276_vec241
NLP_Project_Paper_up276_vec241NLP_Project_Paper_up276_vec241
NLP_Project_Paper_up276_vec241Urjit Patel
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSijnlc
 
Performance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmsPerformance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmseSAT Journals
 
Performance analysis of linkage learning techniques
Performance analysis of linkage learning techniquesPerformance analysis of linkage learning techniques
Performance analysis of linkage learning techniqueseSAT Publishing House
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONijnlc
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...ijtsrd
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysiskrukob9
 
Semantic analyzer for marathi text
Semantic analyzer for marathi textSemantic analyzer for marathi text
Semantic analyzer for marathi texteSAT Journals
 
MODELLING OF INTELLIGENT AGENTS USING A–PROLOG
MODELLING OF INTELLIGENT AGENTS USING A–PROLOGMODELLING OF INTELLIGENT AGENTS USING A–PROLOG
MODELLING OF INTELLIGENT AGENTS USING A–PROLOGijaia
 
Coreference Resolution using Hybrid Approach
Coreference Resolution using Hybrid ApproachCoreference Resolution using Hybrid Approach
Coreference Resolution using Hybrid Approachbutest
 
Designing of an efficient algorithm for identifying Abbreviation definitions ...
Designing of an efficient algorithm for identifying Abbreviation definitions ...Designing of an efficient algorithm for identifying Abbreviation definitions ...
Designing of an efficient algorithm for identifying Abbreviation definitions ...ijcsit
 
IRJET - Analysis of Paraphrase Detection using NLP Techniques
IRJET - Analysis of Paraphrase Detection using NLP TechniquesIRJET - Analysis of Paraphrase Detection using NLP Techniques
IRJET - Analysis of Paraphrase Detection using NLP TechniquesIRJET Journal
 
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...csandit
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
 
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...csandit
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationijnlc
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
 

What's hot (19)

NLP_Project_Paper_up276_vec241
NLP_Project_Paper_up276_vec241NLP_Project_Paper_up276_vec241
NLP_Project_Paper_up276_vec241
 
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
 
Performance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithmsPerformance analysis of linkage learning techniques in genetic algorithms
Performance analysis of linkage learning techniques in genetic algorithms
 
Performance analysis of linkage learning techniques
Performance analysis of linkage learning techniquesPerformance analysis of linkage learning techniques
Performance analysis of linkage learning techniques
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
Sentence analysis
Sentence analysisSentence analysis
Sentence analysis
 
Semantic analyzer for marathi text
Semantic analyzer for marathi textSemantic analyzer for marathi text
Semantic analyzer for marathi text
 
Semantic analyzer for marathi text
Semantic analyzer for marathi textSemantic analyzer for marathi text
Semantic analyzer for marathi text
 
MODELLING OF INTELLIGENT AGENTS USING A–PROLOG
MODELLING OF INTELLIGENT AGENTS USING A–PROLOGMODELLING OF INTELLIGENT AGENTS USING A–PROLOG
MODELLING OF INTELLIGENT AGENTS USING A–PROLOG
 
Coreference Resolution using Hybrid Approach
Coreference Resolution using Hybrid ApproachCoreference Resolution using Hybrid Approach
Coreference Resolution using Hybrid Approach
 
Designing of an efficient algorithm for identifying Abbreviation definitions ...
Designing of an efficient algorithm for identifying Abbreviation definitions ...Designing of an efficient algorithm for identifying Abbreviation definitions ...
Designing of an efficient algorithm for identifying Abbreviation definitions ...
 
IRJET - Analysis of Paraphrase Detection using NLP Techniques
IRJET - Analysis of Paraphrase Detection using NLP TechniquesIRJET - Analysis of Paraphrase Detection using NLP Techniques
IRJET - Analysis of Paraphrase Detection using NLP Techniques
 
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
AN EFFICIENT APPROACH TO IMPROVE ARABIC DOCUMENTS CLUSTERING BASED ON A NEW K...
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classification
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 

Similar to AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS

G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfNohaGhoweil
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationIOSR Journals
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...ijctcm
 
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGLARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGkevig
 
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGLARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGkevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Textkevig
 
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...IJECEIAES
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
The search engine index
The search engine indexThe search engine index
The search engine indexCJ Jenkins
 
Neural word embedding and language modelling
Neural word embedding and language modellingNeural word embedding and language modelling
Neural word embedding and language modellingRiddhi Jain
 
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...ijnlc
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Miningmlaij
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALijaia
 

Similar to AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS (20)

G04124041046
G04124041046G04124041046
G04124041046
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
An Approach To Automatic Text Summarization Using Simplified Lesk Algorithm A...
 
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGLARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
 
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDINGLARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
LARQS: AN ANALOGICAL REASONING EVALUATION DATASET FOR LEGAL WORD EMBEDDING
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
NBLex: emotion prediction in Kannada-English code-switchtext using naïve baye...
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
The search engine index
The search engine indexThe search engine index
The search engine index
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
Neural word embedding and language modelling
Neural word embedding and language modellingNeural word embedding and language modelling
Neural word embedding and language modelling
 
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
 
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Mining
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 

More from cscpconf

ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
 
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
 
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
 
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIESPROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
 
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICA SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
 
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICTWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
 
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINDETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
 
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
 
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
 
AUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWAUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
 
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKCLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
 
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAPROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
 
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHCHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
 

More from cscpconf (20)

ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR
 
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATION
 
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...
 
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIESPROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES
 
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICA SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC
 
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
 
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICTWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTIC
 
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINDETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAIN
 
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...
 
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMIMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEM
 
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...
 
AUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEWAUTOMATED PENETRATION TESTING: AN OVERVIEW
AUTOMATED PENETRATION TESTING: AN OVERVIEW
 
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKCLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORK
 
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...
 
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAPROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA
 
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHCHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCH
 
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...
 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
 
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTGENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXT
 

Recently uploaded

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Recently uploaded (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 

AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS

  • 1. Jan Zizka (Eds) : CCSIT, SIPP, AISC, PDCTA - 2013 pp. 517–524, 2013. © CS & IT-CSCP 2013 DOI : 10.5121/csit.2013.3656 AN APPROACH TO WORD SENSE DISAMBIGUATION COMBINING MODIFIED LESK AND BAG-OF-WORDS Alok Ranjan Pal1, 3 , Anirban Kundu2, 3 , Abhay Singh1 , Raj Shekhar1 , Kunal Sinha1 1 College of Engineering and Management, West Bengal, India 721171 {chhaandasik, abhaysingh3185, rajshekharssp, kunalsameer87}@gmail.com 2 Shenzhen Key Laboratory of Data Science and Modeling, Kuang-Chi Institute of Advanced Technology, Shenzhen, Guangdong, P. R. China 518057 anirban.kundu@kuang-chi.org 3 Innovation Research Lab (IRL), West Bengal, India 711103 anik76in@gmail.com ABSTRACT In this paper, we are going to propose a technique to find meaning of words using Word Sense Disambiguation using supervised and unsupervised learning. This limitation of information is main flaw of the supervised approach. Our proposed approach focuses to overcome the limitation using learning set which is enriched in dynamic way maintaining new data. We introduce a mixed methodology having “Modified Lesk” approach and “Bag-of-Words” having enriched bags using learning methods. KEYWORDS Word Sense Disambiguation (WSD), Modified Lesk (ML), Bag-of-Words (BOW). 1. INTRODUCTION Word Sense Disambiguation (WSD) [1-2] is the process for identification of probable meaning of ambiguous words based on distinct situations. Words with multiple meaning are ambiguous in nature. The process of identification to decide appropriate meaning of an ambiguous word for a particular context is known as WSD. People decide the meaning of a word based on the characteristic points of a discussion or situation using their own merits. Machines have no ability to decide such an ambiguous situation unless some protocols have been planted into the machines’ memory. In supervised learning, a learning set is considered for the system to predict the meaning of ambiguous words using a few sentences having a specific meaning of particular ambiguous
  • 2. 518 Computer Science & Information Technology (CS & IT) words. Specific learning set is generated as a result for each instance of different meaning. A system finds the probable meaning of an ambiguous word for the particular context based on defined learning set. It shows the result based on information available in database [3-4]. In unsupervised learning, online dictionary is taken as learning set avoiding the inefficiency of supervised learning. “WordNet” is the most widely used online dictionary [5-7] maintaining “words and related meanings” as well as “relations among different words”. Organization of rest of the paper is as follows: Section 2 is about related activities of our paper, based on the existing methods; Section 3 describes the proposed approach; Section 4 depicts experimental results along with comparison; Section 5 represents conclusion of the paper. 2. RELATED WORKS Many algorithms have been designed in WSD based on supervised and unsupervised learning. “Lesk” and “Bag-of-Words” are two well-known methods which are discussed in this section as the basis of our proposed approach. Typical Lesk approach selects a short phrase from the sentence containing an ambiguous word. Then, dictionary definition (gloss) of each of the senses for ambiguous word is compared with glosses of other words in that particular phrase. An ambiguous word is being assigned with the particular sense, whose gloss has highest frequency with the glosses of other words of the phrase. The Bag-of-Words approach is a model, used in Natural Language Processing (NLP), to find out the actual meaning of a word having different meaning due to different contexts. In this approach, there is a bag for each sense of a keyword (disambiguated word) and all the bags are manually populated. When the meaning of a keyword would be disambiguated, the sentence (containing the keyword) is picked up and the entire sentence would be broken into separate words. Then, each word of the sentence (except “stop words”) would be compared with each word of each “sense” bags searching for the maximum frequency of words in common. This paper adopts the basic ideas from typical Lesk algorithm and Bag-of-Words algorithm introducing some modifications. In Modified Lesk Approach, gloss of keyword is only considered within specific sentence instead of selection of all words. Number of common words is being calculated between specific sentence and each dictionary based definitions of particular keyword. A list of distinct words from the “Lesk” approach and “Bag-of-Words” approach is prepared based on successful disambiguation of the keyword. Disambiguation probability would be increased based on enrichment of the bag. It means that learning method is tried to introduce within the typical concept of bags. If the bag grows infinitely, then disambiguation accuracy would be near to 100% in a typical way. Actual growth of the bag is limited depending on real-time memory management. 3. PROPOSED APPROACH Design of our approach is presented in form of flow chart in this section. This approach is designed to achieve a disambiguated result with higher precision values.
  • 3. Computer Science & Information Technology (CS & IT) 519 In our approach, “stop words” like ‘a’, ‘an’, ‘the’, etc. are being discarded from input texts as these words are meaningless to derive the “sense” of the particular sentence. Then, the text containing meaningful words (excluding the stop words) is passed through “Bag-of-Words” and “Modified Lesk” algorithms in a parallel fashion. “Bag-of-Words” algorithm is considered as “Module 1”; and, “Modified Lesk” is considered as “Module 2”. These two algorithms are responsible to find the actual sense of ambiguous words in the particular context. The unmatched words in these algorithms are being stored in a temporary database for further usage. After that, results of “Module 1” and “Module 2” have been being analysed to formulate the particular sense depending on the context of the sentence in “Module 3”. If at least either of the algorithms (using “Module 1” or “Module 2”) find the sense applying logical “OR” operation on the projected results, then particular sense is assigned to the unmatched words in the temporary database. Correctness of results based on the implemented algorithms is checked in “Module 4”. If both algorithms derive same result obtained by applying “AND” operation on two results of “Module 1” and “Module 2”, then the sense is considered as disambiguated sense. Therefore, unmatched words (kept in a temporary database) has to be moved to related sense bag as per the “Bag-of- Words” algorithm in “Module 1” to participate in disambiguation method now onwards. Otherwise, the derived senses are considered as the probable senses and unmatched words are being moved to an anticipated database in “Module 5”. Figure 1 shows the modular division of our proposed approach. If the occurrence of a word in the anticipated database with a particular sense crosses specified threshold, the word is considered to be used for decision making and is moved to the related sense bag of the “Bag-of-Words” algorithm in “Module 1” to participate in disambiguation. Fig. 1. Modular Division of Proposed Design. Fig. 1 describes the overall procedure for the disambiguation of words. Fig. 2 is based on Module 1 and it shows to find the sense of an ambiguous word using Bag-of-Words approach.
  • 4. 520 Computer Science & Information Technology (CS & IT) Fig. 2. Flowchart of Bag-of-Words approach. Fig. 3. Flowchart of Modified Lesk approach. Fig. 3 is based on Module 2 and it finds the sense of an ambiguous word using Modified Lesk. Fig. 4 is designed based on Module 3. It formulates actual sense of the ambiguous word using results from previous two modules. If at least one of the two approaches can derive the sense, that is considered as the disambiguated sense.
  • 5. Computer Science & Information Technology (CS & IT) 521 Fig. 4. Flowchart to Formulate Sense. Fig. 5. Flowchart for checking correctness of sense. Fig. 6. Flowchart for Learning Set Enrichment. Fig. 5 is designed based on Module 4. It finds the correctness of disambiguated sense using “AND” operation, derived by Module 1 and Module 2. If both approaches derive same sense, the result of “AND” operation is ‘1’. Otherwise, for all other cases, the result is ‘0’. Fig. 6 is designed based on Module 5 activities. It enriches the learning set by populating with words from temporary database.
  • 6. 522 Computer Science & Information Technology (CS & IT) Key feature of this approach is based on auto enrichment property of the learning set. Initially, if any word is not present in the learning set, it could not be able for participation in case of disambiguation. Though, its probable meaning would be stored in the database. When the number of occurrences of the particular word with a particular sense crosses specific threshold value, the word is inserted in the learning set to take part in disambiguation procedure. Therefore, the efficiency of the disambiguation process is increased by this auto increment property of the learning set. 4. EXPERIMENTAL RESULTS Typical word sense disambiguation based approaches examine efficiency based on three parameters such as “Precision”, “Recall”, and “F-measure”. Precision (P) is the ratio of “matched target words based on human decision” and “number of instances responded by the system based on the particular words”. Recall value (R) is the ratio of “number of target words for which the answer matches with the human decided answer” and “total number of target words in the dataset”. F-Measure is evaluated as “(2*P*R / (P+R))” based on the calculation of Precision and Recall value. Different types of datasets are being considered in our experimentation to exhibit the superiority of our proposed design. Testing has been performed on huge datasets among which a sample is considered for showing the comparison results between typical approaches and our proposed approach. In Table 1, “Plant” and “Bank” have considered as target words. Main focus is the precision value as it is the most dependable parameter in this type of disambiguation tests. Comparison among three algorithms has been depicted in Table 1. Sample Data for Test 1: This is SBI bank. He goes to bank. Ram is a good boy. Smoke is coming out of cement plant. He deposited Rs. 10,000 in SBI bank account. Are you near the bank of river? He is sitting on bank of river. We must plant flowers and trees. To maintain environment green, all must plant flowers and trees in our locality. The police made a plan with a motive to catch thieves with evidence. Target Words: Bank, Plant. Table 1. F-Measure Comparison in Test 1. Algorithms Precision Recall Value F-Measure Modified Lesk 1.0 0.3 0.5 Bag-Of-Words 1.0 0.67 0.80 Proposed Approach 1.0 0.88 0.94 Sample Data for Test 2: We live in an era where bank plays an important role in life. Bank provides social security. Money is an object which makes 90% human beings greedy but still people deposit money in bank without fear. Reason for above activity is trust. The bank which creates maximum trust in the hearts of people is considered to be most successful bank. Few such trustful names in India are SBI, PNB and RBI. RBI is such a big name that people can bank upon it. Here is a small story, one day a boy found a one rupee coin near the bank of the river. He wanted to keep that
  • 7. Computer Science & Information Technology (CS & IT) 523 money safe. But he could not found any one upon whom he can bank upon. He thought to deposit the money under a tree, in the ground, near the bank of river. Moral of the story kids find earth as the safest bank. Here is another story about a beggar. A beggar deposited lot of money in her hut which was near the bank of Ganga. One day other beggars found her asset and they planned to loot that money. When the beggar came to know about the plan she shouted for help. Nobody but a bank came to rescue and they helped the 80 year old to open an account and keep her money safe. Target Word: Bank. Table 2. F-Measure Comparison in Test 2. Algorithms Precision Recall Value F-Measure Modified Lesk 0.83 0.45 0.58 Bag-Of-Words 0.71 0.45 0.55 Proposed Approach 0.77 0.6 0.68 In Table 2, the result is below our expectations as initial database is small for “Bag-of-Words” approach. “Modified Lesk” (unsupervised) has shown better results than “Bag-of-Words” (supervised). Sample Data for Test 3: This is PNB bank. He goes to bank. He was in PNB bank for money transfer. He deposited Rs 10,000 in PNB bank account. Are you near the bank of river? He is sitting on bank of river. He was in PNB bank for money transfer. We must plant flowers and trees. He was in PNB bank for money transfer. This is PNB bank. This is PNB bank. This is PNB bank. He was in PNB bank for money transfer. He was in PNB bank for money transfer. He was in PNB bank for money transfer. He was in PNB bank for money transfer. This is PNB bank. This is PNB bank. This is PNB bank. This is PNB bank. This is his SBI bank. Target Words: Bank. Table 3. F-Measure Comparison in Test 3. Algorithms Precision Recall Value F-Measure Modified Lesk 1.0 0.15 0.26 Bag-Of-Words 1.0 0.45 0.62 Proposed Approach 1.0 0.85 0.92 In Table 3, the text is long enough to give combined approach more chances to show its efficiency. Few lines are repeated in order to overcome the threshold value. Table 4. Average of Test Results. Algorithm Precision Recall Value F-Measure Modified Lesk 0.94 0.3 0.45 Bag-of-Words 0.90 0.52 0.66 Proposed Approach 0.90 0.78 0.85 Table 4 shows average values of all the tests performed. Efficiency of an algorithm based on fixed size learning set is improved in this paper enriching datasets. “Bag-of-Words” and “Modified
  • 8. 524 Computer Science & Information Technology (CS & IT) Lesk” approaches individually exhibit the “F-Measure” as 0.66 and 0.45 respectively; whereas proposed approach shows “F-Measure” as 0.85 since learning set is dynamically enriched with new context sensitive definitions. 5. CONCLUSION Our approach has established better performance in enhanced WSD based on learning sets. Disambiguation accuracy is improved using enriched datasets. Higher precision value, recall value, and F-Measure have achieved. ACKNOWLEDGMENT Flow cytometry data analysis system based on GPUs platform (No. JC201005280651A) REFERENCES [1] Nameh , M., S., Fakhrahmad, M., Jahromi, M. Z.: A New Approach to Word Sense Disambiguation Based on Context Similarity, Proceedings of the World Congress on Engineering, Vol. I, 2011 [2] Navigli, R.: Word Sense Disambiguation: a Survey, ACM Computing Surveys, Vol. 41, No. 2, 2009, ACM Press, pp. 1-69 [3] Kolte, S. G., Bhirud, S. G.: Word Sense Disambiguation Using WordNet Domains, First International Conference on Digital Object Identifier, 2008, pp. 1187-1191 [4] Xiaojie, W., Matsumoto, Y.: Chinese word sense disambiguation by combining pseudo training data, Proceedings of The International Conference on Natural Language Processing and Knowledge Engineering, 2003, pp. 138-143 [5] Liu, Y., Scheuermann, P., Li, X., Zhu, X.: Using WordNet to Disambiguate Word Senses for Text Classification, Proceedings of the 7th International Conference on Computational Science, Springer- Verlag, 2007, pp. 781 - 789 [6] Cañas ,A. J., Valerio,A., Lalinde-Pulido,J., Carvalho,M., Arguedas,M.: Using WordNet for Word Sense Disambiguation to Support Concept Map Construction, String Processing and Information Retrieval, 2003, pp. 350-359 [7] Seo, H., Chung, H., Rim, H., Myaeng, S. H., Kim, S.: Unsupervised word sense disambiguation using WordNet relatives, Computer Speech and Language, Vol. 18, No. 3, 2004, pp. 253-273