SlideShare a Scribd company logo
1 of 6
Download to read offline
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
DOI:10.5121/acij.2014.5403 17
A DECISION TREE BASED WORD SENSE
DISAMBIGUATION SYSTEM IN MANIPURI
LANGUAGE
Richard Laishram Singh1
, Krishnendu Ghosh1
, Kishorjit Nongmeikapam2
and
Sivaji Bandyopadhyay3
1
School of Computer Engineering, Kalinga Institute of Industrial Technology,
Bhubaneswar, India
2
Department of Computer Science and Engineering, Manipur Institute of Technology
Manipur, India
3
Dept of Computer Science and Engineering, Jadavpur University, Kolkata, India
ABSTRACT
This paper manifests a primary attempt on building a word sense disambiguation system in Manipuri
language. The paper discusses related attempts made in the Manipuri language followed by the proposed
plan. A database, consisting of 650 sentences, is collected in Manipuri language in the course of the study.
Conventional positional and context based features are suggested to capture the sense of the words, which
have ambiguous and multiple senses. The proposed work is expected to predict the senses of the
polysemous words with high accuracy with the help of the suitable knowledge acquisition techniques. The
system produces an accuracy of 71.75 %.
KEYWORDS
Word sense disambiguation(WSD); Classification and regression tree (CART); polysemous word;
Manipuri.
1. INTRODUCTION
Word sense disambiguation (WSD) is the technique to disambiguate between the senses of a
word [1]. It has been recognized as one of the major problems in the field of natural language
processing [2]. Word sense disambiguation is not an isolated system by itself, but can be fitted
with other important tasks such as, information retrieval, machine translation, speech processing,
parts-of-speech tagging and text processing [3, 4]. It has been noted that, the sense of a word is
determined based on its syntactic, positional, contextual and other relevant factors. In different
scenario, the sense or the meaning of a single word can be different. Such words are known as
polysemous words. The current study suggests development of an automatic WSD system [5]
with the help of a knowledge acquisition technique.
The goal of the present work is to develop a system which can disambiguate between the different
senses of a polysemous word in Manipuri language. In European languages, efficient and
automatic WSD systems are present. In Indian languages, such systems are mostly rule-based due
to lack of standardized database and presence of the proper knowledge acquisition tools. So far,
negligible amount of work has been reported in Manipuri language. That motivates the present
attempt to collect a suitable database consisting of every possible contexts and senses, used in
day-to-day life.
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
18
The current study reports mainly the development issues of a WSD system in Manipuri language.
Manipuri is basically a Tibeto-Burman language, spoken mainly in the valley of Manipur, a
North-Eastern state of India. Hence, the syntactic and semantic structures of the language are
different from other Indian languages. The task of disambiguating the senses in Manipuri
language poses a challenge due to its agglutinative, reduplicative, compounding and tonal nature
[6]. The language uses the traditional Meitei-Mayek script for general use. But, that script is not
used in the current study due of presence of limited words and structures. The present work uses
Bengali script for developing the database. The lack of a standard font made the task even more
challenging. Besides, limited amount of works are reported in Manipuri language in the field of
natural language processing such as, POS tagging [7], Transliteration [8] and negligible amount
in WSD.
The paper is organized as follows: related literatures are discussed in section 2. The details of the
corpus are mentioned in section 3 The proposed architecture of the word sense disambiguation
module is discussed in the section 4 followed by the conclusion and direction towards future work
in section 5. Finally a few important references are added.
2. LITERATURE SURVEY
Several works on WSD are reported in English and other European languages. On the
contrary, limited works have been noted for Indian languages. The approaches are classified in
five groups: (i) selection restriction based disambiguation, (ii) machine learning based approach,
(iii) supervised learning approach, (iv) bootstrapping approach and (v) dictionary based
disambiguation.
2.1 Selection restriction based disambiguation
Selection restriction and type hierarchies are the primary knowledge sources used to disambiguate
the senses of the words in most of the integrated approaches. In an integrated rule based approach
for semantic analysis [9], selection restrictions block the formation of component meaning
representations containing the selection restriction violations [1]. For Indian languages, where the
semantic and syntactic structures are not analyzed properly and the required knowledge sources
are missing, rule based processes generally are not suggested.
2.2 Machine learning based approach
The template is used to format your paper and style the text. All margins, column widths, line
spaces, and text fonts are prescribed; please do not alter them. You may note peculiarities. For
example, the head margin in this template measures proportionately more than is customary. This
measurement and others are deliberate, using specifications that anticipate your paper as one part
of the entire proceedings, and not as an independent document. Please do not revise any of the
current designations.
2.3 Supervised learning approach
Supervised learning is a specific category of the machine learning approaches to train the
classifier from supervised (labelled) training data [1]. The classifier is fed with the training data
consisting of the exemplars along with the output labels. The classifier predicts the correct output
value for any valid test input object.
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
19
2.4 Bootstrapping approach
Bootstrapping approach is an incrementive machine learning based approach, widely used for
developing word sense disambiguation systems. Bootstrapping approach eliminates need for a
large training set by relying on a relatively less number of instances called seeds. Seeds are used
for training and testing. The generated pair of test patterns and the predicted output is later added
with the seeds to populate the training data. This process is very useful, if the base database is
comparatively small in size.
2.5 Bootstrapping approach
Dictionary based disambiguation approaches use a dictionary which provides both the means of
constructing a sense tagger and the target sense [1, 10]. In order to perform a large scale
disambiguation, machine readable dictionaries (MRD) are used to automate the task. This process
is useful for the languages having a digitalized machine readable dictionary. Due to the lack of
standard database and related knowledge acquisition tools, such approaches can hardly be used
for the focus Manipuri or other Indian languages.
3. DATABASE COLLECTION
The database used in the current study contains a total of 672 Manipuri sentences. These
sentences are collected from a local newspaper "The Sangai Express". The sentences contain a
total of 13,167 words and around 2,000 polysemous words. Polysemous words are the focus of
the current study as they contain more than one sense based on contexts and other factors. The
sentences are chosen from the news domains to accommodate day-to-day words and senses
mostly. The entire database, due to lack of digital resources and standard font are typed manually
[11]. The database is thoroughly checked for inconsistencies present in spelling, spacing and,
punctuation. The polysemous words are then tagged with the sense manually.
4. PROPOSED ARCHITECTURE
The current study attempts to find a solution to disambiguate between the possible senses of
Manipuri words. A base database is collected with sentences consisting of day-to-day senses. The
database is then preprocessed and suggested features are collected out of it. Then, a classifier is
developed with the knowledge obtained from the features using a learning algorithm. The
classifier finally predicts the sense of the test sentences.
The word sense disambiguation system works in two phases: (i) training phase and (ii) testing
phase. In the training phase, the training data is first preprocessed and features are generated. The
features are used then to train the classifier based on any learning algorithm. On the contrary, in
the testing phase, the test data is preprocessed and features are generated. The features of the test
data are fed to the classifier and the predicted output is compared for performance evaluation of
the system. Hence, the proposed architecture of the word sense disambiguation module for
Manipuri language contains five building blocks: (i) preprocessing, (ii) feature selection and
generation and (iii) training, (iv) testing and (v) performance evaluation.
A. Preprocessing
Preprocessing is the module where the raw data is processed so that the features can be generated
from the training or test data efficiently. Data is converted to corresponding ASCII format after
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
20
the inconsistencies (like spelling, punctuations and spacing mistakes) are removed. When the data
is converted to ASCII format, the selected features are generated semi-automatically.
B. Feature Selection and Generation
In this phase, the ASCII format generated database is used to build the feature. The current
study, as it is in the preliminary attempt, uses a set of very common, widely popular and easily
extractable features. A total of 6 features are taken to build feature:
(i) the focus word for which the sense is to be derived,
(ii) the normalized position of the word in the sentence,
(iii) the previous word,
(iv) the previous-to-previous word,
(v) the next word,
(vi) the next to next word.
A 5-gram window is formed using the pair of the focus word and its context words which forms
the context information. A focus word, based on the context may have different senses. Hence, in
order to disambiguate the sense of the focused word, the contextual information is very much
necessary [13], and helps in predicting the correct one. In the current study positional feature is
suggested because of the lack of other relevant morphological features. As the syntactic and
semantic structures of a sentence remain mostly similar for a particular language, this feature
contains probable morphological information. To generate the final input feature vector, from the
database mentioned above mentioned six features are collected automatically.
C. Training
In the present work, a supervised learning algorithm is proposed based on decision tree algorithm.
In supervised approaches, the classifier is developed with the help of the input feature vector and
the output labels of the data. Hence, the final feature vector is developed using the six features
mentioned in section B and the output sense of the focus word. The sense of the focus word is
derived manually and finally the seven entries are fed to the classifier. The classifier is trained
with the decision tree algorithm.
The decision tree is a simple and linear decision making approach. In the current work,
classification and regression tree (CART) based algorithm is suggested to train the classifier.
Binary decision tree is developed automatically using the seven entries of the focus word present
in training data. CART uses all the instances of the training data and asks binary questions on the
features. CART algorithm selects the most predictive feature from the feature set and the best
possible question to achieve accurate classification of the training. Based on the feature and its
distribution of values, the training data is segmented into two parts. This segmentation is carried
out based on each of the features and finally at the leaves, a output class is achieved. CART has
been reported as one of the suitable learning algorithms for developing supervised learning
models for Indian languages like Manipuri whose characteristics have not studied in detail. An
example of a CART is given in Fig 1.
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
21
Fig 1: Flow Chart of CART based word sense disambiguation
D. Testing
During the testing, CART offers prediction by comparing the features for the test case with the
trained decision tree. While predicting the sense for a test word, the corresponding features are
generated and compared with the trained decision tree. Starting from the root node asking a
question on individual features, the features of the test word will lead to a leaf which offers an
approximated output.
The features are already generated in the feature generation phase. The trained CART is fed with
features are test data and corresponding approximations are noted. The predictions are later
compared with the correct sense tags to perform evaluation of the current system.
E. Performance Evaluation
The accuracy of the proposed word sense disambiguation system is evaluated using an objective
analysis. The analysis is carried out by determining the correct sense prediction percentage for the
test words.
Being trained on 1,600 words and tested on 400 words using 7 features, the word sense
disambiguation system predicts the senses with 71.75% accuracy. The performance of the
proposed model is discussed in TABLE 1.
TABLE I: Objective Evaluation of CART based word sense disambiguation model
5. CONCLUSION AND FUTURE WORK
This paper proposed a preliminary attempt to predict the senses of the words for Manipuri
language. A set of positional and contextual features are suggested for developing the word sense
disambiguation system. Due to unavailability of morphological and other relevant linguistic
Predicted
Sense
Correctly
Predicted Sense
Accuracy
400 287 71.75%
Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014
22
knowledge, the accuracy of the model is lower than the models built for languages like English.
Moreover, the size of training data or the features used are not sufficient to achieve the
classification quality of human transcribers. A relevant and appropriate feature set can achieve
higher accuracy. Further improvements are expected by:
1. Joining a rule-based model with the CART based data-driven model to improve accuracy of the
word sense disambiguation system.
2. Exploring other well-known classifiers such as ANN, SVM or other probabilistic methods like
HMM and N-gram models.
ACKNOWLEDGEMENTS
The work presented in this paper is performed at KIIT University as a project for the fullfilment
of the degree of Masters of Technology. Special thanks to Mr. Kishorjit N. Singh for his help in
data collection.
REFERENCES
[1] D.Jurafsky, J.H.Martin, “Speech and Language Processing,”, Pearson Publishers, December, pp. 658-
672
[2] R.Mihalcea, D. Moldovan, “Word Sense Disambiguation based on Semantic Density”, in proceedings
of Seventh International conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2010, pp.
1492-1496.
[3] M.Sinha, M.K.Reddy, P.Bhattacharyya, P.Pandey and L.Kashyap, “Word Sense Disambiguation,” in
proceedings of International Journal of Computer Applications (IJCA),2010, vol. 5.9, pp. 25-32.
[4] R.Mihalcea and D. Moldovan, “A Method for Disambiguating Word Senses in a Large Corpus,” in
proceedings of COLING/ACL Workshop on Usage of WordNet in Natural Language Processing,
Computers and the Humanities, 1992, volume 26, no. 5-6,pp. 415-439.
[5] C.Leacock, G.Towell and E.Voorhees, “Corpus-Based Statistical Sense Resolution,” ARPA
Workshop on Human Language Technology, 1993.
[6] T.D.Singh and S.Bandyopadhyay, “Word Word Class and Sentence Type Identification in Manipuri
Morphological Analyzer,” in proceedings of MSPIL, 2006.
[7 ]K.Nongmeikapam, L.Nonglenjaoba, Y.Nirmal and S.Bandhyopadhyay, “Improvement of CRF based
Manipuri POS tagger by using Reduplicated MWE (RMWE),” arXiv preprint arXiv: 1111.2399,
2011.
[8] K.Nongmeikapam and S. Bandhyopadhyay, “A Transliteration of CRF Based Manipuri POS
Tagging,” in proceedings of Procedia Technology, Elsevier, 2012, pp. 582-589.
[9] M S. Olsen, “WordNet Word sense Disambiguation using an Automatically Generated Ontology,”
Class of 2003 Senior Conference on Natural Language Processing, 2003.
[10] J.Veronis and M. Nancy, “Word Sense Disambiguation with Very Large Neural Networks Extracted
from Machine Readable Dictionarie,” in Proceedings of 13th conference on Computational
linguistics, 1990, volume 2, pp. 389-394.
[11] M.M. Khapra, S. Shah, P. Kedia and P. Bhattacharyya, “Projecting Parameters for Multilingual Word
Sense Disambiguationin Proceedings of 2009 Conference on Empirical Methods in Natural Language
Processing, 2009,volume 1, pp. 459-467.
[12] K.Ghosh, R. V. Reddy, N. P. Narendra, S. Maity, S. Koolagudi and K. S. Rao, “Grapheme-to-
phoneme Conversion in Bengali for Festival based TTS Framework”, in Proceedings of International
Conference of Natural Language Processing, 2010, pp. 294.
[13] A.Eneko and G.Rigau, “Word Sense Disambiguation using Conceptual Density”, in Proceedings of
the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark,
1996.

More Related Content

What's hot

A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...Eswar Publications
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
 
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATANAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATAijnlc
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...ijnlc
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionaryiosrjce
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...CSCJournals
 
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODPART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODijait
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET Journal
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indianeSAT Publishing House
 
An Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity RemovalAn Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity RemovalWaqas Tariq
 
Marathi Text-To-Speech Synthesis using Natural Language Processing
Marathi Text-To-Speech Synthesis using Natural Language ProcessingMarathi Text-To-Speech Synthesis using Natural Language Processing
Marathi Text-To-Speech Synthesis using Natural Language Processingiosrjce
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir systemcsandit
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMcscpconf
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysisijnlc
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageWaqas Tariq
 
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...IJCI JOURNAL
 
Current state of the art pos tagging for indian languages – a study
Current state of the art pos tagging for indian languages – a studyCurrent state of the art pos tagging for indian languages – a study
Current state of the art pos tagging for indian languages – a studyiaemedu
 

What's hot (20)

Cf32516518
Cf32516518Cf32516518
Cf32516518
 
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
A Review on Speech Corpus Development for Automatic Speech Recognition in Ind...
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATANAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
NAMED ENTITY RECOGNITION FROM BENGALI NEWSPAPER DATA
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionary
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
 
J1803015357
J1803015357J1803015357
J1803015357
 
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHODPART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
PART OF SPEECH TAGGING OFMARATHI TEXT USING TRIGRAMMETHOD
 
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
IRJET -Survey on Named Entity Recognition using Syntactic Parsing for Hindi L...
 
Cross language information retrieval in indian
Cross language information retrieval in indianCross language information retrieval in indian
Cross language information retrieval in indian
 
An Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity RemovalAn Improved Approach for Word Ambiguity Removal
An Improved Approach for Word Ambiguity Removal
 
Marathi Text-To-Speech Synthesis using Natural Language Processing
Marathi Text-To-Speech Synthesis using Natural Language ProcessingMarathi Text-To-Speech Synthesis using Natural Language Processing
Marathi Text-To-Speech Synthesis using Natural Language Processing
 
A language independent approach to develop urduir system
A language independent approach to develop urduir systemA language independent approach to develop urduir system
A language independent approach to develop urduir system
 
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEMA LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
A LANGUAGE INDEPENDENT APPROACH TO DEVELOP URDUIR SYSTEM
 
Verb based manipuri sentiment analysis
Verb based manipuri sentiment analysisVerb based manipuri sentiment analysis
Verb based manipuri sentiment analysis
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa Language
 
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...
A SURVEY OF LANGUAGE-DETECTION, FONTDETECTION AND FONT-CONVERSION SYSTEMS FOR...
 
SCTUR: A Sentiment Classification Technique for URDU
SCTUR: A Sentiment Classification Technique for URDUSCTUR: A Sentiment Classification Technique for URDU
SCTUR: A Sentiment Classification Technique for URDU
 
Current state of the art pos tagging for indian languages – a study
Current state of the art pos tagging for indian languages – a studyCurrent state of the art pos tagging for indian languages – a study
Current state of the art pos tagging for indian languages – a study
 

Viewers also liked

Compound Noun Polysemy and Sense Enumeration in WordNet
Compound Noun Polysemy and Sense Enumeration in WordNet Compound Noun Polysemy and Sense Enumeration in WordNet
Compound Noun Polysemy and Sense Enumeration in WordNet Biswanath Dutta
 
Ppt presentation group 6 1 28 11
Ppt presentation group 6 1 28 11Ppt presentation group 6 1 28 11
Ppt presentation group 6 1 28 11TMeg542D
 
Synonyms, antonyms, homophones, homographs powerpoint
Synonyms, antonyms, homophones, homographs powerpointSynonyms, antonyms, homophones, homographs powerpoint
Synonyms, antonyms, homophones, homographs powerpointlilybeth_22
 
Polysemous and Homonymous expressions
Polysemous and Homonymous expressionsPolysemous and Homonymous expressions
Polysemous and Homonymous expressionsJuan Miguel Palero
 
SYNONYMS, ANTONYMS, POLYSEMY, HOMONYM, AND HOMOGRAPH
SYNONYMS, ANTONYMS, POLYSEMY,  HOMONYM, AND HOMOGRAPHSYNONYMS, ANTONYMS, POLYSEMY,  HOMONYM, AND HOMOGRAPH
SYNONYMS, ANTONYMS, POLYSEMY, HOMONYM, AND HOMOGRAPHLili Lulu
 

Viewers also liked (7)

Jose P Training 12.17.08
Jose P Training 12.17.08Jose P Training 12.17.08
Jose P Training 12.17.08
 
Compound Noun Polysemy and Sense Enumeration in WordNet
Compound Noun Polysemy and Sense Enumeration in WordNet Compound Noun Polysemy and Sense Enumeration in WordNet
Compound Noun Polysemy and Sense Enumeration in WordNet
 
Ppt presentation group 6 1 28 11
Ppt presentation group 6 1 28 11Ppt presentation group 6 1 28 11
Ppt presentation group 6 1 28 11
 
Synonyms, antonyms, homophones, homographs powerpoint
Synonyms, antonyms, homophones, homographs powerpointSynonyms, antonyms, homophones, homographs powerpoint
Synonyms, antonyms, homophones, homographs powerpoint
 
Polysemous and Homonymous expressions
Polysemous and Homonymous expressionsPolysemous and Homonymous expressions
Polysemous and Homonymous expressions
 
Homonym & polysemy
Homonym & polysemyHomonym & polysemy
Homonym & polysemy
 
SYNONYMS, ANTONYMS, POLYSEMY, HOMONYM, AND HOMOGRAPH
SYNONYMS, ANTONYMS, POLYSEMY,  HOMONYM, AND HOMOGRAPHSYNONYMS, ANTONYMS, POLYSEMY,  HOMONYM, AND HOMOGRAPH
SYNONYMS, ANTONYMS, POLYSEMY, HOMONYM, AND HOMOGRAPH
 

Similar to A decision tree based word sense disambiguation system in manipuri language

Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDIJERA Editor
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...ijnlc
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...ijctcm
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachWaqas Tariq
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationIJERD Editor
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
 
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents ijsc
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news articleIJDKP
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrievaldannyijwest
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...Scott Bou
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Dhabal Sethi
 
NLP Based Text Summarization Using Semantic Analysis
NLP Based Text Summarization Using Semantic AnalysisNLP Based Text Summarization Using Semantic Analysis
NLP Based Text Summarization Using Semantic AnalysisINFOGAIN PUBLICATION
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationijnlc
 

Similar to A decision tree based word sense disambiguation system in manipuri language (20)

Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSDMarathi-English CLIR using detailed user query and unsupervised corpus-based WSD
Marathi-English CLIR using detailed user query and unsupervised corpus-based WSD
 
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
EXTENDING THE KNOWLEDGE OF THE ARABIC SENTIMENT CLASSIFICATION USING A FOREIG...
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...
 
Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...Design and Development of a Malayalam to English Translator- A Transfer Based...
Design and Development of a Malayalam to English Translator- A Transfer Based...
 
Named Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid ApproachNamed Entity Recognition System for Hindi Language: A Hybrid Approach
Named Entity Recognition System for Hindi Language: A Hybrid Approach
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language TextsRBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Texts
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text Summarization
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents Keyword Extraction Based Summarization of Categorized Kannada Text Documents
Keyword Extraction Based Summarization of Categorized Kannada Text Documents
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news article
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
Hf3413291335
Hf3413291335Hf3413291335
Hf3413291335
 
A Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information RetrievalA Review on the Cross and Multilingual Information Retrieval
A Review on the Cross and Multilingual Information Retrieval
 
A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...A Comprehensive Study On Natural Language Processing And Natural Language Int...
A Comprehensive Study On Natural Language Processing And Natural Language Int...
 
Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11Ijarcet vol-3-issue-1-9-11
Ijarcet vol-3-issue-1-9-11
 
NLP Based Text Summarization Using Semantic Analysis
NLP Based Text Summarization Using Semantic AnalysisNLP Based Text Summarization Using Semantic Analysis
NLP Based Text Summarization Using Semantic Analysis
 
Viva
VivaViva
Viva
 
A survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classificationA survey on phrase structure learning methods for text classification
A survey on phrase structure learning methods for text classification
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 

More from acijjournal

Jan_2024_Top_read_articles_in_ACIJ.pdf
Jan_2024_Top_read_articles_in_ACIJ.pdfJan_2024_Top_read_articles_in_ACIJ.pdf
Jan_2024_Top_read_articles_in_ACIJ.pdfacijjournal
 
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdf
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdfCall for Papers - Advanced Computing An International Journal (ACIJ) (2).pdf
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdfacijjournal
 
cs - ACIJ (4) (1).pdf
cs - ACIJ (4) (1).pdfcs - ACIJ (4) (1).pdf
cs - ACIJ (4) (1).pdfacijjournal
 
cs - ACIJ (2).pdf
cs - ACIJ (2).pdfcs - ACIJ (2).pdf
cs - ACIJ (2).pdfacijjournal
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)acijjournal
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)acijjournal
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)acijjournal
 
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)acijjournal
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)acijjournal
 
3rdInternational Conference on Natural Language Processingand Applications (N...
3rdInternational Conference on Natural Language Processingand Applications (N...3rdInternational Conference on Natural Language Processingand Applications (N...
3rdInternational Conference on Natural Language Processingand Applications (N...acijjournal
 
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)acijjournal
 
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable Development
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable DevelopmentGraduate School Cyber Portfolio: The Innovative Menu For Sustainable Development
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable Developmentacijjournal
 
Genetic Algorithms and Programming - An Evolutionary Methodology
Genetic Algorithms and Programming - An Evolutionary MethodologyGenetic Algorithms and Programming - An Evolutionary Methodology
Genetic Algorithms and Programming - An Evolutionary Methodologyacijjournal
 
Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...acijjournal
 
Advanced Computing: An International Journal (ACIJ)
Advanced Computing: An International Journal (ACIJ) Advanced Computing: An International Journal (ACIJ)
Advanced Computing: An International Journal (ACIJ) acijjournal
 
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principles
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & PrinciplesE-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principles
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principlesacijjournal
 
10th International Conference on Software Engineering and Applications (SEAPP...
10th International Conference on Software Engineering and Applications (SEAPP...10th International Conference on Software Engineering and Applications (SEAPP...
10th International Conference on Software Engineering and Applications (SEAPP...acijjournal
 
10th International conference on Parallel, Distributed Computing and Applicat...
10th International conference on Parallel, Distributed Computing and Applicat...10th International conference on Parallel, Distributed Computing and Applicat...
10th International conference on Parallel, Distributed Computing and Applicat...acijjournal
 
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...acijjournal
 
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...acijjournal
 

More from acijjournal (20)

Jan_2024_Top_read_articles_in_ACIJ.pdf
Jan_2024_Top_read_articles_in_ACIJ.pdfJan_2024_Top_read_articles_in_ACIJ.pdf
Jan_2024_Top_read_articles_in_ACIJ.pdf
 
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdf
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdfCall for Papers - Advanced Computing An International Journal (ACIJ) (2).pdf
Call for Papers - Advanced Computing An International Journal (ACIJ) (2).pdf
 
cs - ACIJ (4) (1).pdf
cs - ACIJ (4) (1).pdfcs - ACIJ (4) (1).pdf
cs - ACIJ (4) (1).pdf
 
cs - ACIJ (2).pdf
cs - ACIJ (2).pdfcs - ACIJ (2).pdf
cs - ACIJ (2).pdf
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
 
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
 
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
7thInternational Conference on Data Mining & Knowledge Management (DaKM 2022)
 
3rdInternational Conference on Natural Language Processingand Applications (N...
3rdInternational Conference on Natural Language Processingand Applications (N...3rdInternational Conference on Natural Language Processingand Applications (N...
3rdInternational Conference on Natural Language Processingand Applications (N...
 
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)4thInternational Conference on Machine Learning & Applications (CMLA 2022)
4thInternational Conference on Machine Learning & Applications (CMLA 2022)
 
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable Development
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable DevelopmentGraduate School Cyber Portfolio: The Innovative Menu For Sustainable Development
Graduate School Cyber Portfolio: The Innovative Menu For Sustainable Development
 
Genetic Algorithms and Programming - An Evolutionary Methodology
Genetic Algorithms and Programming - An Evolutionary MethodologyGenetic Algorithms and Programming - An Evolutionary Methodology
Genetic Algorithms and Programming - An Evolutionary Methodology
 
Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...Data Transformation Technique for Protecting Private Information in Privacy P...
Data Transformation Technique for Protecting Private Information in Privacy P...
 
Advanced Computing: An International Journal (ACIJ)
Advanced Computing: An International Journal (ACIJ) Advanced Computing: An International Journal (ACIJ)
Advanced Computing: An International Journal (ACIJ)
 
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principles
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & PrinciplesE-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principles
E-Maintenance: Impact Over Industrial Processes, Its Dimensions & Principles
 
10th International Conference on Software Engineering and Applications (SEAPP...
10th International Conference on Software Engineering and Applications (SEAPP...10th International Conference on Software Engineering and Applications (SEAPP...
10th International Conference on Software Engineering and Applications (SEAPP...
 
10th International conference on Parallel, Distributed Computing and Applicat...
10th International conference on Parallel, Distributed Computing and Applicat...10th International conference on Parallel, Distributed Computing and Applicat...
10th International conference on Parallel, Distributed Computing and Applicat...
 
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...
DETECTION OF FORGERY AND FABRICATION IN PASSPORTS AND VISAS USING CRYPTOGRAPH...
 
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...
Detection of Forgery and Fabrication in Passports and Visas Using Cryptograph...
 

Recently uploaded

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

A decision tree based word sense disambiguation system in manipuri language

  • 1. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 DOI:10.5121/acij.2014.5403 17 A DECISION TREE BASED WORD SENSE DISAMBIGUATION SYSTEM IN MANIPURI LANGUAGE Richard Laishram Singh1 , Krishnendu Ghosh1 , Kishorjit Nongmeikapam2 and Sivaji Bandyopadhyay3 1 School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, India 2 Department of Computer Science and Engineering, Manipur Institute of Technology Manipur, India 3 Dept of Computer Science and Engineering, Jadavpur University, Kolkata, India ABSTRACT This paper manifests a primary attempt on building a word sense disambiguation system in Manipuri language. The paper discusses related attempts made in the Manipuri language followed by the proposed plan. A database, consisting of 650 sentences, is collected in Manipuri language in the course of the study. Conventional positional and context based features are suggested to capture the sense of the words, which have ambiguous and multiple senses. The proposed work is expected to predict the senses of the polysemous words with high accuracy with the help of the suitable knowledge acquisition techniques. The system produces an accuracy of 71.75 %. KEYWORDS Word sense disambiguation(WSD); Classification and regression tree (CART); polysemous word; Manipuri. 1. INTRODUCTION Word sense disambiguation (WSD) is the technique to disambiguate between the senses of a word [1]. It has been recognized as one of the major problems in the field of natural language processing [2]. Word sense disambiguation is not an isolated system by itself, but can be fitted with other important tasks such as, information retrieval, machine translation, speech processing, parts-of-speech tagging and text processing [3, 4]. It has been noted that, the sense of a word is determined based on its syntactic, positional, contextual and other relevant factors. In different scenario, the sense or the meaning of a single word can be different. Such words are known as polysemous words. The current study suggests development of an automatic WSD system [5] with the help of a knowledge acquisition technique. The goal of the present work is to develop a system which can disambiguate between the different senses of a polysemous word in Manipuri language. In European languages, efficient and automatic WSD systems are present. In Indian languages, such systems are mostly rule-based due to lack of standardized database and presence of the proper knowledge acquisition tools. So far, negligible amount of work has been reported in Manipuri language. That motivates the present attempt to collect a suitable database consisting of every possible contexts and senses, used in day-to-day life.
  • 2. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 18 The current study reports mainly the development issues of a WSD system in Manipuri language. Manipuri is basically a Tibeto-Burman language, spoken mainly in the valley of Manipur, a North-Eastern state of India. Hence, the syntactic and semantic structures of the language are different from other Indian languages. The task of disambiguating the senses in Manipuri language poses a challenge due to its agglutinative, reduplicative, compounding and tonal nature [6]. The language uses the traditional Meitei-Mayek script for general use. But, that script is not used in the current study due of presence of limited words and structures. The present work uses Bengali script for developing the database. The lack of a standard font made the task even more challenging. Besides, limited amount of works are reported in Manipuri language in the field of natural language processing such as, POS tagging [7], Transliteration [8] and negligible amount in WSD. The paper is organized as follows: related literatures are discussed in section 2. The details of the corpus are mentioned in section 3 The proposed architecture of the word sense disambiguation module is discussed in the section 4 followed by the conclusion and direction towards future work in section 5. Finally a few important references are added. 2. LITERATURE SURVEY Several works on WSD are reported in English and other European languages. On the contrary, limited works have been noted for Indian languages. The approaches are classified in five groups: (i) selection restriction based disambiguation, (ii) machine learning based approach, (iii) supervised learning approach, (iv) bootstrapping approach and (v) dictionary based disambiguation. 2.1 Selection restriction based disambiguation Selection restriction and type hierarchies are the primary knowledge sources used to disambiguate the senses of the words in most of the integrated approaches. In an integrated rule based approach for semantic analysis [9], selection restrictions block the formation of component meaning representations containing the selection restriction violations [1]. For Indian languages, where the semantic and syntactic structures are not analyzed properly and the required knowledge sources are missing, rule based processes generally are not suggested. 2.2 Machine learning based approach The template is used to format your paper and style the text. All margins, column widths, line spaces, and text fonts are prescribed; please do not alter them. You may note peculiarities. For example, the head margin in this template measures proportionately more than is customary. This measurement and others are deliberate, using specifications that anticipate your paper as one part of the entire proceedings, and not as an independent document. Please do not revise any of the current designations. 2.3 Supervised learning approach Supervised learning is a specific category of the machine learning approaches to train the classifier from supervised (labelled) training data [1]. The classifier is fed with the training data consisting of the exemplars along with the output labels. The classifier predicts the correct output value for any valid test input object.
  • 3. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 19 2.4 Bootstrapping approach Bootstrapping approach is an incrementive machine learning based approach, widely used for developing word sense disambiguation systems. Bootstrapping approach eliminates need for a large training set by relying on a relatively less number of instances called seeds. Seeds are used for training and testing. The generated pair of test patterns and the predicted output is later added with the seeds to populate the training data. This process is very useful, if the base database is comparatively small in size. 2.5 Bootstrapping approach Dictionary based disambiguation approaches use a dictionary which provides both the means of constructing a sense tagger and the target sense [1, 10]. In order to perform a large scale disambiguation, machine readable dictionaries (MRD) are used to automate the task. This process is useful for the languages having a digitalized machine readable dictionary. Due to the lack of standard database and related knowledge acquisition tools, such approaches can hardly be used for the focus Manipuri or other Indian languages. 3. DATABASE COLLECTION The database used in the current study contains a total of 672 Manipuri sentences. These sentences are collected from a local newspaper "The Sangai Express". The sentences contain a total of 13,167 words and around 2,000 polysemous words. Polysemous words are the focus of the current study as they contain more than one sense based on contexts and other factors. The sentences are chosen from the news domains to accommodate day-to-day words and senses mostly. The entire database, due to lack of digital resources and standard font are typed manually [11]. The database is thoroughly checked for inconsistencies present in spelling, spacing and, punctuation. The polysemous words are then tagged with the sense manually. 4. PROPOSED ARCHITECTURE The current study attempts to find a solution to disambiguate between the possible senses of Manipuri words. A base database is collected with sentences consisting of day-to-day senses. The database is then preprocessed and suggested features are collected out of it. Then, a classifier is developed with the knowledge obtained from the features using a learning algorithm. The classifier finally predicts the sense of the test sentences. The word sense disambiguation system works in two phases: (i) training phase and (ii) testing phase. In the training phase, the training data is first preprocessed and features are generated. The features are used then to train the classifier based on any learning algorithm. On the contrary, in the testing phase, the test data is preprocessed and features are generated. The features of the test data are fed to the classifier and the predicted output is compared for performance evaluation of the system. Hence, the proposed architecture of the word sense disambiguation module for Manipuri language contains five building blocks: (i) preprocessing, (ii) feature selection and generation and (iii) training, (iv) testing and (v) performance evaluation. A. Preprocessing Preprocessing is the module where the raw data is processed so that the features can be generated from the training or test data efficiently. Data is converted to corresponding ASCII format after
  • 4. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 20 the inconsistencies (like spelling, punctuations and spacing mistakes) are removed. When the data is converted to ASCII format, the selected features are generated semi-automatically. B. Feature Selection and Generation In this phase, the ASCII format generated database is used to build the feature. The current study, as it is in the preliminary attempt, uses a set of very common, widely popular and easily extractable features. A total of 6 features are taken to build feature: (i) the focus word for which the sense is to be derived, (ii) the normalized position of the word in the sentence, (iii) the previous word, (iv) the previous-to-previous word, (v) the next word, (vi) the next to next word. A 5-gram window is formed using the pair of the focus word and its context words which forms the context information. A focus word, based on the context may have different senses. Hence, in order to disambiguate the sense of the focused word, the contextual information is very much necessary [13], and helps in predicting the correct one. In the current study positional feature is suggested because of the lack of other relevant morphological features. As the syntactic and semantic structures of a sentence remain mostly similar for a particular language, this feature contains probable morphological information. To generate the final input feature vector, from the database mentioned above mentioned six features are collected automatically. C. Training In the present work, a supervised learning algorithm is proposed based on decision tree algorithm. In supervised approaches, the classifier is developed with the help of the input feature vector and the output labels of the data. Hence, the final feature vector is developed using the six features mentioned in section B and the output sense of the focus word. The sense of the focus word is derived manually and finally the seven entries are fed to the classifier. The classifier is trained with the decision tree algorithm. The decision tree is a simple and linear decision making approach. In the current work, classification and regression tree (CART) based algorithm is suggested to train the classifier. Binary decision tree is developed automatically using the seven entries of the focus word present in training data. CART uses all the instances of the training data and asks binary questions on the features. CART algorithm selects the most predictive feature from the feature set and the best possible question to achieve accurate classification of the training. Based on the feature and its distribution of values, the training data is segmented into two parts. This segmentation is carried out based on each of the features and finally at the leaves, a output class is achieved. CART has been reported as one of the suitable learning algorithms for developing supervised learning models for Indian languages like Manipuri whose characteristics have not studied in detail. An example of a CART is given in Fig 1.
  • 5. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 21 Fig 1: Flow Chart of CART based word sense disambiguation D. Testing During the testing, CART offers prediction by comparing the features for the test case with the trained decision tree. While predicting the sense for a test word, the corresponding features are generated and compared with the trained decision tree. Starting from the root node asking a question on individual features, the features of the test word will lead to a leaf which offers an approximated output. The features are already generated in the feature generation phase. The trained CART is fed with features are test data and corresponding approximations are noted. The predictions are later compared with the correct sense tags to perform evaluation of the current system. E. Performance Evaluation The accuracy of the proposed word sense disambiguation system is evaluated using an objective analysis. The analysis is carried out by determining the correct sense prediction percentage for the test words. Being trained on 1,600 words and tested on 400 words using 7 features, the word sense disambiguation system predicts the senses with 71.75% accuracy. The performance of the proposed model is discussed in TABLE 1. TABLE I: Objective Evaluation of CART based word sense disambiguation model 5. CONCLUSION AND FUTURE WORK This paper proposed a preliminary attempt to predict the senses of the words for Manipuri language. A set of positional and contextual features are suggested for developing the word sense disambiguation system. Due to unavailability of morphological and other relevant linguistic Predicted Sense Correctly Predicted Sense Accuracy 400 287 71.75%
  • 6. Advanced Computing: An International Journal (ACIJ), Vol.5, No.4, July 2014 22 knowledge, the accuracy of the model is lower than the models built for languages like English. Moreover, the size of training data or the features used are not sufficient to achieve the classification quality of human transcribers. A relevant and appropriate feature set can achieve higher accuracy. Further improvements are expected by: 1. Joining a rule-based model with the CART based data-driven model to improve accuracy of the word sense disambiguation system. 2. Exploring other well-known classifiers such as ANN, SVM or other probabilistic methods like HMM and N-gram models. ACKNOWLEDGEMENTS The work presented in this paper is performed at KIIT University as a project for the fullfilment of the degree of Masters of Technology. Special thanks to Mr. Kishorjit N. Singh for his help in data collection. REFERENCES [1] D.Jurafsky, J.H.Martin, “Speech and Language Processing,”, Pearson Publishers, December, pp. 658- 672 [2] R.Mihalcea, D. Moldovan, “Word Sense Disambiguation based on Semantic Density”, in proceedings of Seventh International conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2010, pp. 1492-1496. [3] M.Sinha, M.K.Reddy, P.Bhattacharyya, P.Pandey and L.Kashyap, “Word Sense Disambiguation,” in proceedings of International Journal of Computer Applications (IJCA),2010, vol. 5.9, pp. 25-32. [4] R.Mihalcea and D. Moldovan, “A Method for Disambiguating Word Senses in a Large Corpus,” in proceedings of COLING/ACL Workshop on Usage of WordNet in Natural Language Processing, Computers and the Humanities, 1992, volume 26, no. 5-6,pp. 415-439. [5] C.Leacock, G.Towell and E.Voorhees, “Corpus-Based Statistical Sense Resolution,” ARPA Workshop on Human Language Technology, 1993. [6] T.D.Singh and S.Bandyopadhyay, “Word Word Class and Sentence Type Identification in Manipuri Morphological Analyzer,” in proceedings of MSPIL, 2006. [7 ]K.Nongmeikapam, L.Nonglenjaoba, Y.Nirmal and S.Bandhyopadhyay, “Improvement of CRF based Manipuri POS tagger by using Reduplicated MWE (RMWE),” arXiv preprint arXiv: 1111.2399, 2011. [8] K.Nongmeikapam and S. Bandhyopadhyay, “A Transliteration of CRF Based Manipuri POS Tagging,” in proceedings of Procedia Technology, Elsevier, 2012, pp. 582-589. [9] M S. Olsen, “WordNet Word sense Disambiguation using an Automatically Generated Ontology,” Class of 2003 Senior Conference on Natural Language Processing, 2003. [10] J.Veronis and M. Nancy, “Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionarie,” in Proceedings of 13th conference on Computational linguistics, 1990, volume 2, pp. 389-394. [11] M.M. Khapra, S. Shah, P. Kedia and P. Bhattacharyya, “Projecting Parameters for Multilingual Word Sense Disambiguationin Proceedings of 2009 Conference on Empirical Methods in Natural Language Processing, 2009,volume 1, pp. 459-467. [12] K.Ghosh, R. V. Reddy, N. P. Narendra, S. Maity, S. Koolagudi and K. S. Rao, “Grapheme-to- phoneme Conversion in Bengali for Festival based TTS Framework”, in Proceedings of International Conference of Natural Language Processing, 2010, pp. 294. [13] A.Eneko and G.Rigau, “Word Sense Disambiguation using Conceptual Density”, in Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark, 1996.