International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print),
ISSN 0976 – 6375 (Online), Volume 1, Number 1, May - June (2010), pp. 250-260
© IAEME, http://www.iaeme.com/ijcet.html


    CURRENT STATE OF THE ART POS TAGGING FOR
                   INDIAN LANGUAGES – A STUDY
                                  Shambhavi. B. R
                    Department of CSE, R V College of Engineering
                     Bangalore, E-Mail: shambhavibr@rvce.edu.in

                            Dr. Ramakanth Kumar P
                    Department of ISE, R V College of Engineering
                    Bangalore, E-Mail: ramakanthkp@rvce.edu.in

ABSTRACT
       Parts-of-speech (POS) tagging is the basic building block of any Natural
Language Processing (NLP) tool, and a POS tagger has many applications. For Indian
languages in particular, POS tagging adds many more dimensions, as most of them are
agglutinative, morphologically very rich, highly inflected and sometimes diglossic.
Taggers have been developed using linguistic rules, stochastic models or both. This paper
is a survey of the POS taggers developed in the recent past for eight Indian languages,
namely Hindi, Bengali, Tamil, Telugu, Gujarati, Malayalam, Manipuri and Assamese.
Keywords- Parts-of-speech tagger, Indian languages, agglutinative
I. INTRODUCTION
       India is a large multi-lingual country of diverse culture. It has many languages
with written forms and over a thousand spoken languages. The Constitution of India
recognizes 22 languages, spoken in different parts of the country. The languages can be
categorized into two major linguistic families, namely Indo-Aryan and Dravidian. These
families have some important differences: their ways of forming words and their
grammars are different. Both, however, include many Sanskrit words. In addition, both
have a similar construction and phraseology that links them closely together.




        There is a need to develop information processing tools to facilitate human-machine
interaction in Indian languages, as well as multi-lingual knowledge resources. A POS
tagger forms an integral part of any such processing tool. POS tagging involves
selecting the most likely sequence of syntactic categories for the words in a
sentence. The tagger facilitates the process of creating an annotated corpus. Annotated
corpora find their major use in various NLP applications such as Text to Speech
Conversion, Speech Recognition, Word Sense Disambiguation, Machine Translation and
Information Retrieval.
II. TECHNIQUES FOR POS TAGGING
        There exist different approaches to POS tagging. The tagging models can be
classified into unsupervised and supervised techniques, which differ in the degree of
automation of the training and the tagging process. Unsupervised POS tagging models do
not require a previously annotated corpus. Instead, they use advanced computational
techniques to automatically induce tagsets, transformation rules, etc. Based on this
information, they either calculate the probabilistic information needed by stochastic
taggers or induce the contextual rules needed by rule based or transformation based
systems. Supervised POS tagging models require a pre-annotated corpus, which is used
for training to learn information about the tagset, word-tag frequencies, tag sequence
probabilities and/or rule sets, etc. Various taggers exist based on these models. Both
the supervised and unsupervised taggers can be further classified into the following types.
        [Figure 1: tree diagram. POS tagging divides into unsupervised and supervised
        techniques; each of these divides into rule based, stochastic and neural taggers.
        Further nodes shown: Baum Welch, Brill, HMM, Viterbi Algorithm, Maximum Likelihood,
        Decision Trees, N-grams, CRF, SVM.]

                          Figure 1 Various techniques for POS tagging


A. Rule based Tagger
        Rule-based taggers use rules, which can be hand-coded or derived from data such as a
tagged corpus. The rules are based on experience and help to resolve tag ambiguity.
For example, the Brill tagger is a rule based tagging system. It includes lexical rules, used
for initialisation, and contextual rules, used to correct the tags.
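
        The two steps described above (initial tags from lexical information, then
correction by contextual rules) can be illustrated with a minimal sketch. It is only in
the spirit of transformation based tagging and is not the Brill tagger itself: the
lexicon, the tag names, the default tag and the single contextual rule are all
hypothetical and chosen for illustration.

```python
# Minimal two-step rule based tagger (illustrative sketch only).
# LEXICON, DEFAULT_TAG and CONTEXT_RULES are hypothetical, not taken from any real tagger.

# Step 1 data: the most likely tag for each known word.
LEXICON = {"the": "DET", "can": "MD", "rusts": "VB", "old": "ADJ"}
DEFAULT_TAG = "NN"  # assumed fallback for unknown words

# Step 2 data: contextual rules of the form
# (tag to change, new tag, tag that must appear immediately before).
CONTEXT_RULES = [
    ("MD", "NN", "DET"),  # a modal directly after a determiner is retagged as a noun
]


def initial_tags(words):
    """Assign each word its lexicon tag, or the default tag if unknown."""
    return [LEXICON.get(word.lower(), DEFAULT_TAG) for word in words]


def apply_rules(tags, rules):
    """Apply each contextual rule left to right over the tag sequence."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


def rule_based_tag(words):
    """Return (word, tag) pairs after initialisation and rule correction."""
    return list(zip(words, apply_rules(initial_tags(words), CONTEXT_RULES)))


result = rule_based_tag(["The", "can", "rusts"])
```

Here the initial tag of "can" is the modal tag, and the contextual rule changes it to a
noun because the preceding tag is a determiner.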
B. Stochastic Tagger
        Stochastic taggers use statistics, i.e., frequency or probability, to tag the input text.
The simplest stochastic taggers resolve the ambiguity of a word based on the probability
that the word occurs with a particular tag. The tag encountered most frequently in the
training set is the one assigned to an ambiguous instance of that word in the test data.
The disadvantage of this approach is that, although it may yield a correct tag for a given
word, it can also yield invalid sequences of tags. An alternative to the word frequency
approach is to calculate the probability of a given sequence of tags occurring. This is
referred to as the n-gram approach, because the best tag for a given word is determined
by the probability that it occurs with the n-1 previous tags. Stochastic taggers are
built on models such as the Hidden Markov Model (HMM), Maximum Likelihood
Estimation, Decision Trees, n-grams, Maximum Entropy, Support Vector Machines or
Conditional Random Fields.
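
        The contrast between the word frequency approach and the tag sequence approach
can be made concrete with a small sketch. It assumes a toy hand-made training corpus, a
bigram (n = 2) tag model decoded with the Viterbi algorithm, and add-alpha smoothing with
an arbitrary constant; none of these choices come from the systems surveyed here, and
the handling of unseen words is deliberately crude.

```python
import math
from collections import Counter, defaultdict

# Toy training corpus (hypothetical): sentences of (word, tag) pairs.
TRAIN = [
    [("the", "DET"), ("can", "NN"), ("rusts", "VB")],
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("they", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("the", "DET"), ("fish", "NN"), ("swim", "VB")],
]
ALPHA = 0.1  # add-alpha smoothing constant (an assumption)


def count(corpus):
    """Collect word/tag, tag/word and tag-bigram counts from the corpus."""
    word_tag = defaultdict(Counter)  # word_tag[word][tag]
    emit = defaultdict(Counter)      # emit[tag][word]
    trans = defaultdict(Counter)     # trans[previous tag][tag]
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            word_tag[word][tag] += 1
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return word_tag, emit, trans


def most_frequent_tagger(words, word_tag, default="NN"):
    """Word frequency approach: each word gets its most frequent training tag."""
    return [word_tag[w].most_common(1)[0][0] if w in word_tag else default
            for w in words]


def log_prob(counter, key, n_outcomes):
    """Smoothed log probability of `key` under the counts in `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + ALPHA) / (total + ALPHA * n_outcomes))


def viterbi(words, emit, trans):
    """Tag sequence approach: most probable tag path under a bigram model."""
    if not words:
        return []
    tags = sorted(emit)
    vocab_size = len({w for counts in emit.values() for w in counts})
    # Each column maps tag -> (best log score ending in that tag, previous tag).
    columns = [{t: (log_prob(trans["<s>"], t, len(tags))
                    + log_prob(emit[t], words[0], vocab_size), None)
                for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log_prob(trans[p], t, len(tags)), p)
                for p in tags)
            column[t] = (score + log_prob(emit[t], word, vocab_size), prev)
        columns.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


word_tag, emit, trans = count(TRAIN)
sentence = ["the", "can", "rusts"]
baseline = most_frequent_tagger(sentence, word_tag)
sequence = viterbi(sentence, emit, trans)
```

On this toy data the frequency baseline tags "can" as a modal because that is its most
frequent tag, which produces a determiner followed by a modal, a tag sequence that never
occurs in the training sentences. The bigram model prefers the noun reading in this
context.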
C. Neural Tagger
        Neural taggers are based on neural networks, which learn the parameters of the POS
tagger from a representative training data set [1]. Their performance has been reported
to be better than that of stochastic taggers.
III. CURRENT WORK IN INDIAN LANGUAGES
        There has been extensive work towards building POS taggers for languages
across the world. Western languages have annotated corpora in abundance, and hence all
machine learning techniques have been tried. The accuracy of these taggers ranges
approximately from 93% to 98%. Tagging of Indian languages, however, is a very
challenging task. The primary reasons for this are the limited availability of annotated
corpora and the morphological richness of Indian languages. This section details the work
carried out in various universities and research centres in India in this regard.


A. Hindi
        In recent years, there has been a lot of work towards building a POS tagger for
Hindi, an official language of India. Early work started with the development of a partial
POS tagger by Ray et al. [2]. This was followed by the work of Shrivastava et al., who
proposed harnessing the morphological characteristics of Hindi for POS tagging [3]. This
was further enhanced in [4], which suggests a methodology that makes use of detailed
morphological analysis and lexicon lookup for tagging. It used an annotated corpus of
around 15,000 words collected from the BBC news site and a decision tree based learning
algorithm, CN2. The accuracy was 93.45% with a tagset of 23 POS tags.
        The International Institute of Information Technology (IIIT), Hyderabad, initiated a
POS tagging and chunking contest, NLPAI ML, for Indian languages in 2006. Several
teams came up with various approaches for tagging in three Indian languages, namely
Hindi, Bengali and Telugu. In this contest, CRFs were first applied to Hindi by Ravindran
et al. [5] and Himanshu et al. [6] for POS tagging and chunking, where they reported a
performance of 89.69% and 90.89% respectively. In the work of Sankaran Baskaran [7],
an HMM based statistical technique was attempted. Here probability models of certain
contextual features were also used. A POS tagger for Hindi based on the Maximum
Entropy Markov Model was developed by Aniket Dalal et al. [8]. In this system, the main
POS tagging features used were context based features, dictionary features, word
features and corpus-based features.
        In 2007, as part of the SPSAL workshop at IJCAI-07, IIIT, Hyderabad conducted
a competition on POS tagging and chunking for the South Asian languages Hindi, Bengali
and Telugu. None of the teams tried the rule based approach. All eight participants tried
a wide range of learning techniques such as HMM, Decision Trees, CRF, Naïve Bayes and
the Maximum Entropy Model. The average POS tagging accuracies of all the systems for
Hindi, Bengali and Telugu are 73.93%, 72.35% and 71.83% respectively.




    Table I. Summary of the approaches followed and accuracies (%) obtained by various
                       participating teams of SPSAL workshop

        Team                   Approach Used     Hindi    Bengali   Telugu
        Pattabhi et al         HMM               76.34    72.12     53.17
        Satish and Kishore     HMM               69.35    60.08     77.20
        Rao and Yarowsky       HMM               73.90    69.07     72.38
        Himanshu               CRF               62.35    76        77.16
        Asif et al             Hybrid HMM        76.87    73.17     67.69
        Sandipan               CRF               75.69    77.61     74.47
        Ravi et al             Max. Entropy      78.35    74.58     75.27
        Avinesh and Karthik    Decision Trees    78.66    76.08     77.37
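
        The per-language averages quoted above can be checked against Table I with a
few lines of code. The sketch below simply copies the accuracy figures from the table;
the function and variable names are illustrative and not part of any surveyed system.

```python
# Accuracies (%) copied from Table I: team -> (Hindi, Bengali, Telugu).
TABLE_I = {
    "Pattabhi et al": (76.34, 72.12, 53.17),
    "Satish and Kishore": (69.35, 60.08, 77.20),
    "Rao and Yarowsky": (73.90, 69.07, 72.38),
    "Himanshu": (62.35, 76.0, 77.16),
    "Asif et al": (76.87, 73.17, 67.69),
    "Sandipan": (75.69, 77.61, 74.47),
    "Ravi et al": (78.35, 74.58, 75.27),
    "Avinesh and Karthik": (78.66, 76.08, 77.37),
}
LANGUAGES = ("Hindi", "Bengali", "Telugu")


def column_means(table):
    """Mean accuracy per language over all teams."""
    n = len(table)
    return {lang: sum(row[i] for row in table.values()) / n
            for i, lang in enumerate(LANGUAGES)}


def best_team(table, language):
    """Team with the highest accuracy for the given language."""
    i = LANGUAGES.index(language)
    return max(table, key=lambda team: table[team][i])


means = column_means(TABLE_I)
for lang in LANGUAGES:
    print(f"{lang}: mean {means[lang]:.2f}%, best team: {best_team(TABLE_I, lang)}")
```

The computed means are roughly 73.94, 72.34 and 71.84, which agree with the reported
averages to within about 0.01 percentage points; the small differences are presumably
due to rounding.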

        Manish Shrivastava and Pushpak Bhattacharyya [9] designed a simple POS tagger
for Hindi based on HMM. It utilized the morphological richness of the language without
resorting to complex and expensive analysis, and achieved a good accuracy of 93.12%.
More recent work in this area is that of Ankur Parikh [10], in which neural networks are
tried for tagging. This multi-neuro tagger deals with sparse data, manages multiple
contexts, takes less training time and has an accuracy comparable to other traditional
tagging approaches for Indian languages.
B. Bengali
        Bengali is an eastern Indo-Aryan language. It is ranked the sixth most spoken
language of the world. Almost all approaches to tagging have been tried on
Bengali text. Participants at the NLPAI Contest 2006 and SPSAL 2007 tried tagging for
Bengali along with Hindi and Telugu. The highest accuracies obtained for Bengali in
these contests were 84.34% and 77.61% respectively. An HMM based tagger is reported
in [11]. A Maximum Entropy based tagger was built in [12]; it demonstrated an accuracy
of 88.2% for a test set of 20,000 word forms. CRF and SVM based taggers are reported in
[13] and [14] respectively. The SVM tagger used 26 tags and had a performance of 86.84%.
Recently Ekbal et al. applied a voted approach [15] in order to obtain the best results in
Bengali tagging.
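
        The idea of combining several taggers by voting can be sketched as follows. This
is plain majority voting per token with a simple tie-breaking rule, written only as an
illustration; the scheme actually used in [15] may differ (for example, it may weight
the individual taggers), and the function name and example tags are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine tag sequences from several taggers by per-token majority vote.

    predictions: list of tag sequences, one per tagger, all for the same sentence.
    Ties are broken in favour of the tagger listed first (an assumed convention).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must tag the same number of tokens")
    voted = []
    for position_tags in zip(*predictions):
        counts = Counter(position_tags)
        top = max(counts.values())
        # The first tag, in tagger order, that reaches the top count wins.
        voted.append(next(t for t in position_tags if counts[t] == top))
    return voted


# Hypothetical outputs of three taggers for one three-word sentence.
combined = majority_vote([
    ["NN", "VB", "JJ"],
    ["NN", "NN", "JJ"],
    ["VB", "VB", "RB"],
])
```

In this example each position has one tag proposed by two of the three taggers, so the
combined output follows the majority at every token.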
C. Tamil
        Tamil is a Dravidian language for which a comparatively large body of work
has been done in the field of POS tagging. A tagger by Vasu Ranganathan named tag tamil
is based on a lexical phonological approach. It handles the morphotactics of
morphological processing of verbs by using an index method. Ganesan’s POS tagger [16]
works on the CIIL corpus. Its tagset includes 82 tags at morph level and 22 at word level.
Kathambam is a heuristic rule based tagger designed at RCILTS-Tamil. It is based on the
bigram model, and its performance is around 80%. In [17] a hybrid tagger
using rule based and HMM techniques is developed. SVMTool was used to tag the corpus
in [18], and an accuracy of 94.12% was obtained. Lakshmana Pandian and Geetha [19]
experimented with a morpheme based tagger. A naive Bayes probabilistic model using
morphemes is the first stage, for preliminary POS tagging, and a CRF model is the next
stage, to resolve the conflicts that arise in the first stage. The overall accuracy of the
tagger was 95.92%. Dhanalakshmi et al. [20] used an SVM methodology based on linear
programming. This gave an accuracy of 95.63% on the test data.
D. Telugu
        Telugu is the third most-spoken language in India (with about 74 million native
speakers). It is the official language of Andhra Pradesh. In 2006, Sreeganesh [21]
implemented a rule based POS tagger. In the initial stage, a Telugu Morphological
Analyzer analyses the input text. A tagset is then added, and finally around 524
formulated morpho-syntactic rules do the disambiguation. During the NLPAI Contest 2006, a
POS tagger with an accuracy of 81.59% was built. In the SPSAL 2007 workshop of IJCAI-07, the
best Telugu tagger was proposed by Avinesh et al. [22], with a performance of 77.37%.
In [23], three Telugu taggers, namely (i) a rule-based tagger, (ii) a Brill tagger and (iii) a
Maximum Entropy tagger, were developed with accuracies of 98.016%, 92.146% and
87.81% respectively. More recent work is by Sindhiya Binulal et al. [24], who applied
SVMTool to tagging. The tagset included 10 tags, and an accuracy of around 95% was
obtained.



E. Gujarati
        Gujarati is a less privileged language with respect to available resources and
manually tagged data. As a very first step towards tagging, Chirag Patel and Karthik Gali
[25] designed a hybrid model. The linguistic rules specific to Gujarati are converted
into features and provided to a CRF, in order to take advantage of both the statistical and
the rule based approaches. An accuracy of 92% has been achieved by this approach.
F. Malayalam
        Malayalam is primarily spoken in Southern Coastal India by over 36 million
speakers. It is one of the Dravidian languages where much work is still to be done. Manju
K et al. [26] experimented with the stochastic approach for tagging Malayalam words.
In the first step, a morphological analyzer is used to generate tagged corpora, which are
later used by the HMM based tagger. The results obtained were promising. Later
work was by Antony P.J et al. [27], who applied the SVM approach to tag words. They
identified the ambiguities in Malayalam lexical items and developed a tagset of 29 tags.
The result was more accurate than that of the earlier work. With an increase in the number
of words in the training set, the performance increased to around 94%.
G. Manipuri
        Manipuri is the official language of Manipur. There are at least 29
different dialects spoken in Manipur. Manipuri tagging is dependent on the
morphological analysis and lexical rules of each category. Hence Thoudam Doren Singh
and Sivaji Bandyopadhyay initially tried to build a morphology driven tagger [28]. This
showed an accuracy of only 69%. Later they built a tagger [29] using Conditional
Random Fields (CRF) and a Support Vector Machine (SVM). The tagset consisted of 26
tags. Evaluation results demonstrated an improvement in accuracy: they obtained
72.04% and 74.38% with CRF and SVM respectively.
H. Assamese
        Assamese is a morphologically rich, agglutinative language with relatively free
word order, like other Indian languages. Navanath Saharia et al. [30] built an Assamese
tagger using the HMM model with the Viterbi algorithm. An accuracy of 87% was achieved
by the tagger for the test inputs. Pallav Kumar Dutta has attempted to develop an online
semi-automated tagger. This was designed to deal with the sparse data problem of the
language. NLTK is used to tag the test data, and for ambiguous tags an online tagger
helps the user to change the tags.
IV. CONCLUSION
        Development of a high accuracy POS tagger is an active research area in NLP.
The bottleneck in POS tagging of Indian languages is the non-availability of lexical
resources. In addition, adoption of a common tagset by researchers would facilitate
reusability and interoperability of annotated corpora. We have presented in this paper a
detailed study of the POS taggers developed for eight Indian languages. There remain,
however, other languages of the country for which hardly any attempts towards building
a POS tagger have started.
REFERENCES
[1] Ahmed (2002), “Application of multilayer perceptron network for tagging parts-of-
speech”, Proceedings of the Language Engineering Conference, IEEE.
[2] A. Basu, P. R. Ray, V. Harish and S. Sarkar (2003), “Part of speech tagging and local
word grouping techniques for natural language parsing in Hindi”, Proceedings of the
International Conference on Natural Language Processing (ICON 2003).
[3] S. Singh, M. Shrivastava, N. Agrawal and P. Bhattacharya (2005), “Harnessing
morphological analysis in POS tagging task”, Proceedings of the International Conference
on Natural Language Processing (ICON 2005).
[4] Smriti Singh, Kuhoo Gupta, Manish Shrivastava, and Pushpak Bhattacharyya
(2006), “Morphological richness offsets resource demand – experiences in constructing a
POS tagger for Hindi”, Proceedings of the COLING/ACL 2006 Main Conference Poster
Sessions, Sydney, Australia, pp. 779–786.
[5] Pranjal Awasthi, Delip Rao, Balaraman Ravindran (2006), “Part Of Speech Tagging
and Chunking with HMM and CRF”, Proceedings of the NLPAI MLcontest workshop,
National Workshop on Artificial Intelligence.
[6] Himanshu Agrawal, Anirudh Mani (2006), “Part Of Speech Tagging and Chunking
Using Conditional Random Fields” Proceedings of the NLPAI MLcontest workshop,
National Workshop on Artificial Intelligence.



[7] Sankaran Baskaran (2006), “Hindi POS tagging and Chunking”, Proceedings of the
NLPAI MLcontest workshop, National Workshop on Artificial Intelligence.
[8] Aniket Dalal, Kumar Nagaraj, Uma Sawant, Sandeep Shelke (2006), “Hindi Part-of-
Speech Tagging and Chunking: A Maximum Entropy Approach” Proceedings of the
NLPAI MLcontest workshop, National Workshop on Artificial Intelligence.
[9] Manish Shrivastava, Pushpak Bhattacharyya (2008), “Hindi POS Tagger Using Naive
Stemming: Harnessing Morphological Information Without Extensive Linguistic
Knowledge”, Proceedings of ICON-2008: 6th International Conference on Natural
Language Processing.
[10] Ankur Parikh (2009), “Part-Of-Speech Tagging using Neural network”, Proceedings
of ICON-2009: 7th International Conference on Natural Language Processing.
[11] Ekbal, Asif, Mondal, S., and S. Bandyopadhyay (2007) “POS Tagging using HMM
and Rule-based Chunking”, In Proceedings of SPSAL-2007, IJCAI-07, pp. 25-28.
[12] A. Ekbal, R. Haque and S. Bandyopadhyay (2008), “Maximum Entropy Based
Bengali Part of Speech Tagging”, Advances in Natural Language Processing and
Applications, Research in Computing Science (RCS) Journal, Vol. (33), pp. 67-78.
[13] A. Ekbal, R. Haque and S. Bandyopadhyay (2007), “Bengali Part of Speech Tagging
using Conditional Random Field”, Proceedings of the 7th International Symposium on
Natural Language Processing (SNLP-07), Thailand, pp.131-136.
[14] A. Ekbal and S. Bandyopadhyay (2008), “Part of Speech Tagging in Bengali using
Support Vector Machine”, Proceedings of the International Conference on Information
Technology (ICIT 2008), pp.106-111, IEEE.
[15] A. Ekbal , M. Hasanuzzaman and S. Bandyopadhyay (2009), “Voted Approach for
Part of Speech Tagging in Bengali”, Proceedings of the 23rd Pacific Asia Conference on
Language, Information and Computation (PACLIC-09), December 3-5, Hong Kong, pp.
120-129.
[16] Ganesan M (2007), “Morph and POS Tagger for Tamil” (Software) Annamalai
University, Annamalai Nagar.
[17] Arulmozhi P, Sobha L (2006) “A Hybrid POS Tagger for a Relatively Free Word
Order Language”, Proceedings of MSPIL-2006, Indian Institute of Technology, Bombay.



[18] Dhanalakshmi V, Anandkumar M, Vijaya M.S, Loganathan R, Soman K.P,
Rajendran S (2008), “Tamil Part-of-Speech tagger based on SVMTool”, Proceedings of
the COLIPS International Conference on Asian Language Processing 2008 (IALP),
Chiang Mai, Thailand.
[19] S. Lakshmana Pandian and T. V. Geetha (2008), “Morpheme based Language Model
for Tamil Part-of-Speech Tagging”, Research journal on Computer science and computer
engineering with applications, July-Dec 2008, pp. 19-25.
[20] Dhanalakshmi V, Anandkumar M, Shivapratap G, Soman K.P, Rajendran S (2009),
“Tamil POS Tagging using Linear Programming”, International Journal of Recent Trends
in Engineering, 1(2), pp. 166-169.
[21] T. Sreeganesh (2006), “Telugu Parts of Speech Tagging in WSD”, Language in
India, Vol. 6: 8, August 2006.
[22] Avinesh PVS and Karthik Gali (2007), “Part-of-speech tagging and chunking using
conditional random fields and transformation based learning”, Proceedings of the IJCAI
and the Workshop On Shallow Parsing for South Asian Languages (SPSAL), pp. 21–24.
[23] Rama Sree, R.J, Kusuma Kumari P (2007), “Combining POS Taggers for improved
Accuracy to create Telugu annotated texts for Information Retrieval”, Tirupati.
[24] G. Sindhiya Binulal, P. Anand Goud, K. P. Soman (2009), “A SVM based approach to
Telugu Parts Of Speech Tagging using SVMTool”, International Journal of Recent
Trends in Engineering, Vol. 1, No. 2, May 2009.
[25] Chirag Patel and Karthik Gali (2008), “Part-Of-Speech Tagging for Gujarati Using
Conditional Random Fields”, Proceedings of the IJCNLP-08 Workshop on NLP for Less
Privileged Languages, Hyderabad, India, pp. 117–122.
[26] Manju K, Soumya S, Sumam Mary Idicula (2009), “Development of A POS Tagger
for Malayalam - An Experience”, Proceedings of 2009 International Conference on
Advances in Recent Technologies in Communication and Computing, IEEE.
[27] Antony P.J, Santhanu P Mohan, Soman K.P (2010), “SVM Based Part of Speech
Tagger for Malayalam”, Proceedings of 2010 International Conference on Recent Trends
in Information, Telecommunication and Computing, IEEE.




[28] Thoudam Doren Singh, Sivaji Bandyopadhyay (2008), “Morphology Driven
Manipuri POS Tagger”, Proceedings of the IJCNLP-08 Workshop on NLP for Less
Privileged Languages, Hyderabad, India, pp. 91–98.
[29] Thoudam Doren Singh, Sivaji Bandyopadhyay (2008), “Manipuri POS Tagging
using CRF and SVM: A Language Independent Approach”, Proceedings of ICON-2008:
6th International Conference on Natural Language Processing.
[30] Navanath Saharia, Dhrubajyoti Das, Utpal Sharma, Jugal Kalita (2009), “Part of
Speech Tagger for Assamese Text”, Proceedings of the ACL-IJCNLP 2009 Conference
Short Papers, Suntec, Singapore, pp. 33–36.




                                                260

More Related Content

What's hot

Driving cycle development for Kuala Terengganu city using k-means method
Driving cycle development for Kuala Terengganu city using k-means methodDriving cycle development for Kuala Terengganu city using k-means method
Driving cycle development for Kuala Terengganu city using k-means methodIJECEIAES
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...ijnlc
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficultiesijtsrd
 
Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...IJECEIAES
 
Quality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingQuality estimation of machine translation outputs through stemming
Quality estimation of machine translation outputs through stemmingijcsa
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...IJECEIAES
 
A decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageA decision tree based word sense disambiguation system in manipuri language
A decision tree based word sense disambiguation system in manipuri languageacijjournal
 
IRJET- Spoken Language Identification System using MFCC Features and Gaus...
IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...IRJET-  	  Spoken Language Identification System using MFCC Features and Gaus...
IRJET- Spoken Language Identification System using MFCC Features and Gaus...IRJET Journal
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerIOSR Journals
 
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELNERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODELijnlc
 
An Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptAn Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptIJERA Editor
 
Emotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifierEmotional telugu speech signals classification based on k nn classifier
Emotional telugu speech signals classification based on k nn classifiereSAT Publishing House
 

What's hot (15)

Cf32516518
Cf32516518Cf32516518
Cf32516518
 
Driving cycle development for Kuala Terengganu city using k-means method
Driving cycle development for Kuala Terengganu city using k-means methodDriving cycle development for Kuala Terengganu city using k-means method
Driving cycle development for Kuala Terengganu city using k-means method
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
 
Applsci 09-02758
Applsci 09-02758Applsci 09-02758
Applsci 09-02758
 
Natural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and DifficultiesNatural Language Processing Theory, Applications and Difficulties
Natural Language Processing Theory, Applications and Difficulties
 
Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...Creation of speech corpus for emotion analysis in Gujarati language and its e...
Creation of speech corpus for emotion analysis in Gujarati language and its e...
 
Quality estimation of machine translation outputs through stemming
Current state of the art pos tagging for indian languages – a study

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 1, Number 1, May - June (2010), pp. 250-260, © IAEME, http://www.iaeme.com/ijcet.html

CURRENT STATE OF THE ART POS TAGGING FOR INDIAN LANGUAGES – A STUDY

Shambhavi. B. R, Department of CSE, R V College of Engineering, Bangalore, E-Mail: shambhavibr@rvce.edu.in
Dr. Ramakanth Kumar P, Department of ISE, R V College of Engineering, Bangalore, E-Mail: ramakanthkp@rvce.edu.in

ABSTRACT
Parts-of-speech (POS) tagging is the basic building block of any Natural Language Processing (NLP) tool, and a POS tagger has many applications. For Indian languages in particular, POS tagging adds many more dimensions, as most of them are agglutinative, morphologically very rich, highly inflected and sometimes diglossic. Taggers have been developed using linguistic rules, stochastic models or both. This paper is a survey of the POS taggers developed in the recent past for eight Indian languages, namely Hindi, Bengali, Tamil, Telugu, Gujarati, Malayalam, Manipuri and Assamese.

Keywords: Parts-of-speech tagger, Indian languages, agglutinative

I. INTRODUCTION
India is a large multi-lingual country of diverse culture. It has many languages with written forms and over a thousand spoken languages. The Constitution of India recognizes 22 languages, spoken in different parts of the country. The languages can be categorized into two major linguistic families, namely Indo-Aryan and Dravidian. These classes of languages have some important differences: their ways of developing words and grammar are different. Both, however, include a lot of Sanskrit words. In addition, both have a similar construction and phraseology that links them closely together.
  • 2. There is a need to develop information processing tools to facilitate human-machine interaction in Indian languages and multi-lingual knowledge resources. A POS tagger forms an integral part of any such processing tool. POS tagging involves selecting the most likely sequence of syntactic categories for the words in a sentence. The tagger facilitates the process of creating an annotated corpus. Annotated corpora find their major application in NLP tasks such as text-to-speech conversion, speech recognition, word sense disambiguation, machine translation and information retrieval.

II. TECHNIQUES FOR POS TAGGING
There exist different approaches to POS tagging. The tagging models can be classified into unsupervised and supervised techniques, which differ in the degree of automation of the training and tagging process. Unsupervised POS tagging models do not require a previously annotated corpus. Instead, they use advanced computational techniques to automatically induce tagsets, transformation rules, etc. Based on this information, they either calculate the probabilistic information needed by stochastic taggers or induce the contextual rules needed by rule based or transformation based systems. Supervised POS tagging models require a pre-annotated corpus, which is used in training to learn information about the tagset, word-tag frequencies, tag sequence probabilities and/or rule sets. Various taggers exist based on these models. Both the supervised and unsupervised taggers can be further classified into the following types.
Figure 1: Various techniques for POS tagging. [Tree diagram: POS tagging divides into Unsupervised and Supervised; each branch divides into Rule Based, Stochastic and Neural. Leaf labels: Baum-Welch, Brill, CRF, Maximum Likelihood, Decision Trees, HMM, N-grams, SVM, Viterbi Algorithm.]
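The statistics that a supervised tagger learns from a pre-annotated corpus (the tagset, word-tag frequencies and tag sequence probabilities) can be illustrated with a minimal sketch. The toy English corpus, the tag names, the `<s>` sentence-start marker and the function names below are assumptions made purely for illustration; they are not taken from any of the systems surveyed here.

```python
from collections import Counter, defaultdict

# Toy pre-annotated corpus (hypothetical sentences and tag names).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("peels", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("puppies", "NOUN"), ("bark", "VERB")],
]


def learn_statistics(corpus):
    """Collect the tagset, word-tag counts and tag-bigram counts."""
    tagset = set()
    word_tag_freq = defaultdict(Counter)    # word -> Counter of tags
    tag_bigram_freq = defaultdict(Counter)  # previous tag -> Counter of next tags
    for sentence in corpus:
        prev_tag = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            tagset.add(tag)
            word_tag_freq[word][tag] += 1
            tag_bigram_freq[prev_tag][tag] += 1
            prev_tag = tag
    return tagset, word_tag_freq, tag_bigram_freq


def transition_probability(tag_bigram_freq, prev_tag, tag):
    """Relative-frequency estimate of P(tag | prev_tag); 0.0 if prev_tag is unseen."""
    total = sum(tag_bigram_freq[prev_tag].values())
    return tag_bigram_freq[prev_tag][tag] / total if total else 0.0


tagset, word_tag_freq, tag_bigram_freq = learn_statistics(CORPUS)
print(sorted(tagset))
print(dict(word_tag_freq["bark"]))
print(transition_probability(tag_bigram_freq, "DET", "NOUN"))
```

In this toy corpus the word "bark" is ambiguous (once a noun, twice a verb), which is the kind of case that the tag sequence counts are meant to help resolve.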
A. Rule based Tagger

Rule-based taggers use rules that are either hand-coded or derived from data, that is, from a tagged corpus. The rules encode linguistic experience and help resolve tag ambiguity. The Brill tagger is an example of a rule based system. It includes lexical rules, used for initialisation, and contextual rules, used to correct the tags.

B. Stochastic Tagger

Stochastic taggers use statistics, i.e., frequency or probability, to tag the input text. The simplest stochastic taggers resolve the ambiguity of a word based on the probability that the word occurs with a particular tag. The tag encountered most frequently with a word in the training set is the one assigned to an ambiguous instance of that word in the testing data. The disadvantage of this approach is that, although it may yield a valid tag for each individual word, it can also yield invalid sequences of tags. The alternative to the word frequency approach is to calculate the probability of a given sequence of tags occurring. This is referred to as the n-gram approach, since the best tag for a given word is determined by the probability that it occurs with the n-1 previous tags. Stochastic taggers are built on models such as the Hidden Markov Model (HMM), Maximum Likelihood Estimation, Decision Trees, n-grams, Maximum Entropy, Support Vector Machines or Conditional Random Fields.

C. Neural Tagger

Neural taggers are based on neural networks, which learn the parameters of the POS tagger from a representative training data set [1]. Their performance has been reported to be better than that of stochastic taggers.

III. CURRENT WORK IN INDIAN LANGUAGES

There has been extensive work towards building POS taggers for languages across the world. Western languages have annotated corpora in abundance and hence all machine learning techniques have been tried. The accuracy of these taggers ranges from approximately 93% to 98%. Tagging of Indian languages, however, is a very challenging task. The primary reasons are the limited availability of annotated corpora and the morphological richness of Indian languages. This section details the work carried out in various universities and research centres in India in this regard.
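The contrast drawn in Section II.B between the word frequency approach and the n-gram approach can be illustrated with a small Python sketch. It trains a most-frequent-tag baseline and a bigram HMM (n = 2) decoded with the Viterbi algorithm on the same toy data. The corpus, the tag names, the add-k smoothing constant and every function name below are illustrative assumptions; none of them is taken from the systems surveyed in this paper.

```python
import math
from collections import Counter, defaultdict

START = "<s>"   # hypothetical sentence-start marker
K = 0.1         # add-k smoothing constant (an assumption)

# Toy tagged corpus, invented for illustration only.
TRAIN = [
    [("they", "PRON"), ("can", "AUX"), ("fish", "VERB")],
    [("we", "PRON"), ("can", "AUX"), ("swim", "VERB")],
    [("you", "PRON"), ("can", "AUX"), ("run", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
]


def train_unigram(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def unigram_tag(model, words, default="NOUN"):
    """Tag each word independently; unseen words get a default tag."""
    return [model.get(w, default) for w in words]


def train_hmm(corpus):
    """Collect tag-transition and word-emission counts."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-k smoothed log-probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + K) / (total + K * n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a non-empty word list."""
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # Each column maps a tag to (best log-probability so far, previous tag).
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            e = log_prob(emit[t], word, n_words)
            column[t] = max((best[-1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


sentence = ["the", "can", "runs"]
print(unigram_tag(train_unigram(TRAIN), sentence))
print(viterbi(sentence, *train_hmm(TRAIN)))
```

On this toy corpus the baseline tags "can" as AUX, its most frequent tag, and so produces the sequence DET AUX VERB. The bigram model prefers DET NOUN VERB, because DET is never followed by AUX in the training data. This is the kind of invalid tag sequence that the sequence-based approach is meant to avoid.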
A. Hindi

In recent years, there has been a lot of work towards building a POS tagger for Hindi, the official language of India. Early work started with the development of a partial POS tagger by Ray et al. [2]. This was followed by the work of Shrivastava et al., who proposed harnessing the morphological characteristics of Hindi for POS tagging [3]. This was further enhanced in [4], which suggests a methodology that makes use of detailed morphological analysis and lexicon lookup for tagging. It used an annotated corpus of around 15,000 words collected from the BBC news site and the decision tree based learning algorithm CN2. The accuracy was 93.45% with a tagset of 23 POS tags.

The International Institute of Information Technology (IIIT), Hyderabad, initiated a POS tagging and chunking contest, NLPAI ML, for the Indian languages in 2006. Several teams came up with various approaches for tagging in three Indian languages, namely Hindi, Bengali and Telugu. In this contest, CRFs were first applied to Hindi by Ravindran et al. [5] and Himanshu et al. [6] for POS tagging and chunking, where they reported a performance of 89.69% and 90.89% respectively. In the work of Sankaran Baskaran [7], an HMM based statistical technique was attempted. Here probability models of certain contextual features were also used. POS tagging of Hindi based on the Maximum Entropy Markov Model was developed by Aniket Dalal et al. [8]. In this system, the main POS tagging features used were context based features, dictionary features, word features and corpus-based features.

In 2007, as part of the SPSAL workshop at IJCAI-07, IIIT Hyderabad conducted a competition on POS tagging and chunking for the South Asian languages Hindi, Bengali and Telugu. None of the teams tried the rule based approach. All eight participants tried a wide range of learning techniques such as HMM, Decision Trees, CRF, Naïve Bayes and the Maximum Entropy Model. The average POS tagging accuracies of all the systems for Hindi, Bengali and Telugu were 73.93%, 72.35% and 71.83% respectively.
Table I. Summary of the approaches followed and accuracies obtained by various participating teams of SPSAL workshop

Team                  Approach Used    Hindi   Bengali   Telugu
Pattabhi et al        HMM              76.34   72.12     53.17
Satish and Kishore    HMM              69.35   60.08     77.20
Rao and Yarowsky      HMM              73.90   69.07     72.38
Himanshu              CRF              62.35   76        77.16
Asif et al            Hybrid HMM       76.87   73.17     67.69
Sandipan              CRF              75.69   77.61     74.47
Ravi et al            Max. Entropy     78.35   74.58     75.27
Avinesh and Karthik   Decision Trees   78.66   76.08     77.37

Manish Shrivastava and Pushpak Bhattacharyya [9] designed a simple POS tagger for Hindi based on HMM. It utilized the morphological richness of the language without resorting to complex and expensive analysis. It achieved a good accuracy of 93.12%. Recent work in this area is that of Ankur Parikh [10], in which neural networks are tried for tagging. This multi-neuro tagger deals with sparse data, manages multiple contexts, takes less training time and has an accuracy comparable to that of other traditional tagging approaches for Indian languages.

B. Bengali

Bengali is an eastern Indo-Aryan language. It is ranked the sixth most spoken language of the world. Almost all approaches to tagging have been tried on Bengali text. Participants at the NLPAI Contest 2006 and SPSAL 2007 tried tagging for Bengali along with Hindi and Telugu. The highest accuracies obtained for Bengali in the two contests were 84.34% and 77.61% respectively. An HMM based tagger is reported in [11]. A Maximum Entropy based tagger was built in [12]. This tagger demonstrated an accuracy of 88.2% for a test set of 20,000 word forms. CRF and SVM based taggers are reported in [13] and [14] respectively. The SVM tagger used 26 tags and had a performance of 86.84%.
Recently Ekbal et al. applied a voted approach [15] in order to obtain the best results in Bengali tagging.

C. Tamil

Tamil is a Dravidian language for which a comparatively large body of good work has been done in the field of POS tagging. A tagger by Vasu Ranganathan, named tag tamil, is based on a lexical phonological approach. It handles the morphotactics of morphological processing of verbs by using an index method. Ganesan's POS tagger [16] works on the CIIL corpus. Its tagset includes 82 tags at the morph level and 22 at the word level. Kathambam is a heuristic rule based tagger designed at RCILTS-Tamil. It is based on the bigram model and its performance is around 80%. In [17] a hybrid tagger using rule based and HMM techniques is developed. SVMTool was used to tag the corpus in [18] and an accuracy of 94.12% was obtained. Lakshmana Pandian and Geetha [19] experimented with a morpheme based tagger. A naive Bayes probabilistic model using morphemes is the first stage, for preliminary POS tagging, and a CRF model is the next stage, to disambiguate the conflicts that arise in the first stage. The overall accuracy of the tagger was 95.92%. Dhanalakshmi et al. [20] used an SVM methodology based on linear programming. This gave an accuracy of 95.63% on the test data.

D. Telugu

Telugu is the third most-spoken language in India (with about 74 million native speakers). It is the official language of Andhra Pradesh. In 2006, Sreeganesh [21] implemented a rule based POS tagger. In the initial stage, a Telugu morphological analyzer analyses the input text. A tagset is then applied to its output, and finally around 524 morpho-syntactic rules perform the disambiguation. During the NLPAI Contest 2006, a POS tagger with an accuracy of 81.59% was built. In the SPSAL 2007 workshop of IJCAI-07, the best Telugu tagger was proposed by Avinesh et al. [22], with a performance of 77.37%. In [23], three Telugu taggers, namely (i) a rule-based tagger, (ii) a Brill tagger and (iii) a Maximum Entropy tagger, were developed with accuracies of 98.016%, 92.146% and 87.81% respectively. Recent work is that of Sindhiya Binulal et al. [24], who applied SVMTool to tagging. The tagset included 10 tags and an accuracy of around 95% was obtained.
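Two of the works surveyed above combine the output of several taggers: the voted approach for Bengali [15] and the three Telugu taggers of [23], whose title refers to combining taggers. The survey does not describe how either system weighs its component taggers, so the following Python sketch shows only a generic per-token majority vote with a priority-based tie-break. The tagger names, the tags and the function name are hypothetical and are not taken from [15] or [23].

```python
from collections import Counter


def majority_vote(predictions, priority):
    """Combine per-token tags from several taggers by majority vote.

    predictions: dict mapping a tagger name to its list of tags
    priority:    tagger names, most trusted first (used to break ties)
    """
    lengths = {len(predictions[name]) for name in priority}
    if len(lengths) != 1:
        raise ValueError("all taggers must tag the same number of tokens")
    voted = []
    for i in range(lengths.pop()):
        votes = Counter(predictions[name][i] for name in priority)
        top = max(votes.values())
        # Among the tags with the most votes, keep the one proposed
        # by the most trusted tagger.
        for name in priority:
            tag = predictions[name][i]
            if votes[tag] == top:
                voted.append(tag)
                break
    return voted


# Hypothetical outputs of three taggers for a three-token sentence.
outputs = {
    "rule_based": ["NN", "VB", "JJ"],
    "brill":      ["NN", "NN", "RB"],
    "maxent":     ["NN", "VB", "PRP"],
}
print(majority_vote(outputs, ["rule_based", "brill", "maxent"]))
```

For the third token all three taggers disagree, so the tie-break falls back on the first tagger in the priority list. A real system might instead weight each vote by the tagger's accuracy on held-out data.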
E. Gujarati

Gujarati is a less privileged language with respect to available resources and manually tagged data. As a very first step towards tagging, Chirag Patel and Karthik Gali [25] designed a hybrid model. The linguistic rules specific to Gujarati are converted into features and provided to a CRF, in order to take advantage of both the statistical and the rule based approaches. An accuracy of 92% was achieved by this approach.

F. Malayalam

Malayalam is primarily spoken in southern coastal India by over 36 million speakers. It is one of the Dravidian languages where much work is still to be done. Manju K et al. [26] experimented with the stochastic approach for tagging Malayalam words. In the first step, a morphological analyzer is used to generate tagged corpora, which are later used by the HMM based tagger. The results obtained were promising. Later work was by Antony P.J et al. [27], who applied the SVM approach to tag words. They identified the ambiguities in Malayalam lexical items and developed a tagset of 29 tags. The result was more accurate than that of the earlier work. With an increase in the number of words in the training set, the performance increased to around 94%.

G. Manipuri

Manipuri is the official language of Manipur. There are at least 29 different dialects spoken in Manipur. Manipuri tagging is dependent on the morphological analysis and lexical rules of each category. Hence Thoudam Doren Singh and Sivaji Bandyopadhyay initially tried to build a morphology driven tagger [28]. This showed an accuracy of only 69%. Later they built a tagger [29] using Conditional Random Field (CRF) and Support Vector Machine (SVM). The tagset consisted of 26 tags. Evaluation results demonstrated improvement in the accuracies.
They obtained accuracies of 72.04% and 74.38% with CRF and SVM respectively.

H. Assamese

Like other Indian languages, Assamese is a morphologically rich, agglutinative language with relatively free word order. Navanath Saharia et al. [30] built an Assamese tagger using the HMM model with the Viterbi algorithm. An accuracy of 87% was achieved by the tagger for the test inputs. Pallav Kumar Dutta has attempted to develop an online semi-automated tagger. This was designed to deal with the sparse data problem of the language. NLTK is used to tag the test data, and for the ambiguous tags an online tagger would help the user to change the tags.

IV. CONCLUSION

Development of a high accuracy POS tagger is an active research area in NLP. The bottleneck to POS tagging of Indian languages is the non-availability of lexical resources. In addition, adoption of a common tagset by researchers would facilitate reusability and interoperability of annotated corpora. We have presented in this paper a detailed study of the POS taggers developed for eight Indian languages. There exist other languages of the country, however, for which hardly any attempts towards building a POS tagger have started.

REFERENCES

[1] Ahmed (2002), “Application of multilayer perceptron network for tagging parts-of-speech”, Proceedings of the Language Engineering Conference, IEEE.
[2] A. Basu, P. R. Ray, V. Harish and S. Sarkar (2003), “Part of speech tagging and local word grouping techniques for natural language parsing in Hindi”, Proceedings of the International Conference on Natural Language Processing (ICON 2003).
[3] S. Singh, M. Shrivastava, N. Agrawal and P. Bhattacharyya (2005), “Harnessing morphological analysis in pos tagging task”, Proceedings of the International Conference on Natural Language Processing (ICON 2005).
[4] Smriti Singh, Kuhoo Gupta, Manish Shrivastava and Pushpak Bhattacharyya (2006), “Morphological richness offsets resource demand – experiences in constructing a pos tagger for Hindi”, Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, pp. 779–786.
[5] Pranjal Awasthi, Delip Rao, Balaraman Ravindran (2006), “Part Of Speech Tagging and Chunking with HMM and CRF”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.
[6] Himanshu Agrawal, Anirudh Mani (2006), “Part Of Speech Tagging and Chunking Using Conditional Random Fields”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.
[7] Sankaran Baskaran (2006), “Hindi POS tagging and Chunking”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.
[8] Aniket Dalal, Kumar Nagaraj, Uma Sawant, Sandeep Shelke (2006), “Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.
[9] Manish Shrivastava, Pushpak Bhattacharyya (2008), “Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge”, Proceedings of ICON-2008: 6th International Conference on Natural Language Processing.
[10] Ankur Parikh (2009), “Part-Of-Speech Tagging using Neural network”, Proceedings of ICON-2009: 7th International Conference on Natural Language Processing.
[11] A. Ekbal, S. Mondal and S. Bandyopadhyay (2007), “POS Tagging using HMM and Rule-based Chunking”, Proceedings of SPSAL-2007, IJCAI-07, pp. 25-28.
[12] A. Ekbal, R. Haque and S. Bandyopadhyay (2008), “Maximum Entropy Based Bengali Part of Speech Tagging”, Advances in Natural Language Processing and Applications, Research in Computing Science (RCS) Journal, Vol. (33), pp. 67-78.
[13] A. Ekbal, R. Haque and S. Bandyopadhyay (2007), “Bengali Part of Speech Tagging using Conditional Random Field”, Proceedings of the 7th International Symposium on Natural Language Processing (SNLP-07), Thailand, pp. 131-136.
[14] A. Ekbal and S. Bandyopadhyay (2008), “Part of Speech Tagging in Bengali using Support Vector Machine”, Proceedings of the International Conference on Information Technology (ICIT 2008), pp. 106-111, IEEE.
[15] A. Ekbal, M. Hasanuzzaman and S. Bandyopadhyay (2009), “Voted Approach for Part of Speech Tagging in Bengali”, Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC-09), December 3-5, Hong Kong, pp. 120-129.
[16] Ganesan M (2007), “Morph and POS Tagger for Tamil” (Software), Annamalai University, Annamalai Nagar.
[17] Arulmozhi P, Sobha L (2006), “A Hybrid POS Tagger for a Relatively Free Word Order Language”, Proceedings of MSPIL-2006, Indian Institute of Technology, Bombay.
[18] Dhanalakshmi V, Anandkumar M, Vijaya M.S, Loganathan R, Soman K.P, Rajendran S (2008), “Tamil Part-of-Speech tagger based on SVMTool”, Proceedings of the COLIPS International Conference on Asian Language Processing 2008 (IALP), Chiang Mai, Thailand.
[19] S. Lakshmana Pandian and T. V. Geetha (2008), “Morpheme based Language Model for Tamil Part-of-Speech Tagging”, Research journal on Computer science and computer engineering with applications, July-Dec 2008, pp. 19-25.
[20] Dhanalakshmi V, Anandkumar M, Shivapratap G, Soman K.P, Rajendran S (2009), “Tamil POS Tagging using Linear Programming”, International Journal of Recent Trends in Engineering, 1(2), pp. 166-169.
[21] T. Sreeganesh (2006), “Telugu Parts of Speech Tagging in WSD”, Language in India, Vol. 6:8, August 2006.
[22] Avinesh PVS and Karthik Gali (2007), “Part-of-speech tagging and chunking using conditional random fields and transformation based learning”, Proceedings of the IJCAI and the Workshop On Shallow Parsing for South Asian Languages (SPSAL), pp. 21–24.
[23] Rama Sree R.J, Kusuma Kumari P (2007), “Combining POS Taggers for improved Accuracy to create Telugu annotated texts for Information Retrieval”, Tirupati.
[24] G. Sindhiya Binulal, P. Anand Goud, K.P. Soman (2009), “A SVM based approach to Telugu Parts Of Speech Tagging using SVMTool”, International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009.
[25] Chirag Patel and Karthik Gali (2008), “Part-Of-Speech Tagging for Gujarati Using Conditional Random Fields”, Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, Hyderabad, India, pp. 117–122.
[26] Manju K, Soumya S, Sumam Mary Idicula (2009), “Development of A Pos Tagger for Malayalam-An Experience”, Proceedings of 2009 International Conference on Advances in Recent Technologies in Communication and Computing, IEEE.
[27] Antony P.J, Santhanu P Mohan, Soman K.P (2010), “SVM Based Part of Speech Tagger for Malayalam”, Proceedings of 2010 International Conference on Recent Trends in Information, Telecommunication and Computing, IEEE.
[28] Thoudam Doren Singh, Sivaji Bandyopadhyay (2008), “Morphology Driven Manipuri POS Tagger”, Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, Hyderabad, India, pp. 91–98.
[29] Thoudam Doren Singh, Sivaji Bandyopadhyay (2008), “Manipuri POS Tagging using CRF and SVM: A Language Independent Approach”, Proceedings of ICON-2008: 6th International Conference on Natural Language Processing.
[30] Navanath Saharia, Dhrubajyoti Das, Utpal Sharma, Jugal Kalita (2009), “Part of Speech Tagger for Assamese Text”, Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Suntec, Singapore, pp. 33–36.