International Journal on Natural Language Computing (IJNLC) Vol. 3, No.3, June 2014
DOI: 10.5121/ijnlc.2014.3312
CHUNKING IN MANIPURI USING CRF
Kishorjit Nongmeikapam^1, Chiranjiv Chingangbam^1, Nepoleon Keisham^1, Biakchungnunga Varte^1, Sivaji Bandyopadhyay^2
^1 Department of Computer Science & Engineering, Manipur Institute of Technology, Manipur University, Imphal, India
^2 Department of Computer Science & Engineering, Jadavpur University, West Bengal, India
ABSTRACT
This paper deals with chunking of the Manipuri language, which is highly agglutinative in
nature. The Manipuri text is first cleaned up to the gold standard. The text is then
processed for Part of Speech (POS) tagging using a Conditional Random Field (CRF). The output file of
the POS tagger is treated as the input file for the CRF based chunking system. The final output is a
completely chunk-tagged Manipuri text. The system shows a recall of 71.30%, a precision of 77.36% and an
F-measure of 74.21%.
KEYWORDS
CRF; POS; Chunk; Manipuri
1. INTRODUCTION
The Manipuri language has its origin in the north-eastern part of India. It is widely spoken in the
state of Manipur, and by some speakers in Myanmar and Bangladesh. Manipuri belongs to the
highly agglutinative class of languages. Conditional Random Fields (CRFs) serve
as a powerful model for predicting structured labels.
Chunking is the process of identifying and labeling the simple phrases (for example a Noun Phrase
or a Verb Phrase) in POS tagged output, where the sequence of words that makes up a given phrase
forms a chunk. A POS tagged sequence can therefore serve as the base
input for CRF-based chunking.
We produce a fully chunked Manipuri file as the output. The procedure is as follows:
the input file is passed to a CRF based POS tagger, and the output of the tagger
serves as the input to the CRF based chunker, which generates the chunked output file.
The paper is arranged as follows. Related works are listed in Section 2. Section 3
describes the concept of the Conditional Random Field (CRF), followed by the system
design in Section 4. The experiment and evaluation are discussed in Section 5 and the conclusion is
drawn in Section 6.
2. RELATED WORKS
Until now, no work on CRF based chunking has been reported for the Manipuri
language. Most of the previous work on other languages in this area uses one of two
machine-learning approaches to sequence labeling. The first is the HMM, as in [1]; the second treats the
sequence labeling problem as a sequence of classification problems, one for each of the labels in
the sequence.
In contrast to these two approaches, CRF based chunking combines the best of the
generative and classification models. Like classification models, CRFs can
accommodate many statistically correlated features of the inputs. Like
generative models, they can trade off decisions at different sequence positions
to obtain a globally optimal labeling. It is shown in [2] that CRFs perform better
than related classification models. Parsing by chunks is discussed in [3]. Dynamic programming
for parsing and estimation of stochastic unification-based grammars is described in [4], and other
related works are found in [5]-[7].
In the field of text chunking, [1] proposed a Conditional Random Field based approach.
Work on chunking has applied both rule based and probabilistic or statistical
methods.
3. CONCEPT OF CONDITIONAL RANDOM FIELD
The Conditional Random Field [8] was developed to calculate the conditional
probabilities of values on designated output nodes given values on other designated input nodes of an
undirected graphical model. A CRF encodes a conditional probability distribution with a given set of
features. It is a supervised approach: the system learns from labeled training data and can then be
used to label other texts.
The conditional probability of a state sequence Y = (y_1, y_2, ..., y_T) given an observation sequence
X = (x_1, x_2, ..., x_T) is calculated as:
$$P(Y \mid X) = \frac{1}{Z_X} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right) \tag{1}$$
where f_k(y_{t-1}, y_t, X, t) is a feature function and λ_k is its associated weight,
which is learned via training. The values of the feature functions may range from -∞ to +∞,
but typically they are binary. Z_X is the normalization factor:
$$Z_X = \sum_{y} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t) \right) \tag{2}$$
which is calculated in order to make the probabilities of all state sequences sum to 1. As in a
Hidden Markov Model (HMM), it can be obtained efficiently by dynamic
programming. Since the CRF defines the conditional probability P(Y|X), the appropriate objective
for parameter learning is to maximize the conditional likelihood of the state sequences in the
training data:
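To make equations (1) and (2) concrete, the following is a minimal sketch, not the CRF++ implementation used in this work. It assumes a toy label set, a hypothetical start symbol for y_0, and two hand-made binary feature families (transition and observation) stored in a weight dictionary. It computes Z_X both by summing over every label sequence and by the forward recursion, which is the dynamic programming referred to above.

```python
import itertools
import math

LABELS = ["B", "I", "O"]  # toy label set (assumption, not the paper's tag set)
START = "<s>"             # hypothetical start symbol standing in for y_0


def score(prev, cur, words, t, weights):
    """Sum over k of lambda_k * f_k(y_{t-1}, y_t, X, t) for two toy binary features."""
    s = weights.get(("trans", prev, cur), 0.0)      # transition feature
    s += weights.get(("emit", words[t], cur), 0.0)  # observation feature
    return s


def sequence_score(seq, words, weights):
    """Exponent of equation (1) for one complete label sequence."""
    total, prev = 0.0, START
    for t, cur in enumerate(seq):
        total += score(prev, cur, words, t, weights)
        prev = cur
    return total


def z_brute_force(words, weights):
    """Equation (2) computed literally: sum over every possible label sequence."""
    return sum(math.exp(sequence_score(seq, words, weights))
               for seq in itertools.product(LABELS, repeat=len(words)))


def z_forward(words, weights):
    """The same Z_X by the forward recursion, linear in the sentence length."""
    alpha = {y: math.exp(score(START, y, words, 0, weights)) for y in LABELS}
    for t in range(1, len(words)):
        alpha = {
            cur: sum(alpha[prev] * math.exp(score(prev, cur, words, t, weights))
                     for prev in LABELS)
            for cur in LABELS
        }
    return sum(alpha.values())


def prob(seq, words, weights):
    """Equation (1): P(Y|X) for one label sequence."""
    return math.exp(sequence_score(seq, words, weights)) / z_forward(words, weights)
```

The brute-force sum grows exponentially with sentence length, while the forward recursion needs only |labels|^2 operations per position, which is why the normalization factor is tractable in practice.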
$$\sum_{i=1}^{N} \log P(y_i \mid x_i) \tag{3}$$
where {(x_i, y_i)}, i = 1, ..., N, is the labeled training data.
A Gaussian prior on the λ's is used to regularize the training (i.e., smoothing). If λ ~ N(0, ρ^2),
the objective function becomes:
$$\sum_{i=1}^{N} \log P(y_i \mid x_i) - \sum_{k} \frac{\lambda_k^2}{2\rho^2} \tag{4}$$
The objective function is concave, so the λ’s have a unique set of optimal values.
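As a supplementary note that is not stated in the source, the gradient of the regularized objective (4) takes a standard form that makes the role of the prior visible. The shorthand F_k, the total count of feature k along a sequence, is introduced here only for illustration.

```latex
% Assumed shorthand: F_k(y, x) = \sum_{t=1}^{T} f_k(y_{t-1}, y_t, x, t)
\frac{\partial}{\partial \lambda_k}
\left[ \sum_{i=1}^{N} \log P(y_i \mid x_i) - \sum_{k'} \frac{\lambda_{k'}^2}{2\rho^2} \right]
= \sum_{i=1}^{N} \left( F_k(y_i, x_i) - \sum_{y'} P(y' \mid x_i)\, F_k(y', x_i) \right)
  - \frac{\lambda_k}{\rho^2}
```

At the optimum the observed feature counts equal the counts expected under the model, shifted by λ_k/ρ^2; a smaller ρ pulls the weights more strongly toward zero.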
4. SYSTEM DESIGN
The system applies CRF in two layers. The first layer performs POS
tagging of the Manipuri text file using the features mentioned in [9]. In the second layer, the
output file of the CRF based POS tagging is used as the input file of the CRF based chunking.
Fig. 1 shows the system block diagram.
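The two-layer design can be sketched as a simple composition of two taggers. This is only an illustration under stated assumptions: `pos_tagger` and `chunker` are hypothetical callables standing in for the two trained CRF models, and the function name `chunk_text` is not from the source.

```python
def chunk_text(tokens, pos_tagger, chunker):
    """Run the two layers in order: POS tagging first, then chunking on its output.

    pos_tagger: hypothetical callable, list of words -> list of POS tags.
    chunker:    hypothetical callable, list of (word, POS) rows -> list of chunk tags.
    """
    pos_tags = pos_tagger(tokens)
    rows = list(zip(tokens, pos_tags))   # output of layer 1 is the input of layer 2
    chunk_tags = chunker(rows)
    return [(word, pos, chunk) for (word, pos), chunk in zip(rows, chunk_tags)]
```

Keeping the layers as separate callables mirrors the block diagram: the chunker never sees raw text, only the POS tagged rows.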
The chunk tags follow the I-O-B scheme, as follows:
TABLE I. IOB TAGGING
Tag    Meaning
B-X    Beginning word of chunk X
I-X    Intermediate or non-beginning word of chunk X
O      Word outside of any chunk
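The tagging scheme of Table I can be illustrated with a short sketch that groups a tag sequence back into chunk spans. The function name and the lenient handling of an I-X tag that has no preceding B-X (it is treated as starting a new chunk) are assumptions made for this example, not rules given in the source.

```python
def iob_to_chunks(tags):
    """Group IOB tags into (chunk_type, start, end) spans, with end exclusive."""
    chunks, start, kind = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and kind == label
        if not continues and kind is not None:
            chunks.append((kind, start, i))   # close the chunk that was open
            start, kind = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, kind = i, label            # open a new chunk at this word
    if kind is not None:
        chunks.append((kind, start, len(tags)))
    return chunks
```

For a tag sequence such as B-X, B-X, I-X, I-X, B-X, O, this yields three chunks covering word positions 0, 1 to 3, and 4, with the final word left outside any chunk.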
The processing and running of the CRF is shown in Fig. 2.
In the first run the input file is a training file, which gives a model file as output; in the
second run the input file is a testing file. The output file of the CRF is a labeled file.
[Figure 1 blocks: TEXT FILE -> CRF BASED POS TAGGER -> CRF BASED MANIPURI CHUNKER -> CHUNKED MANIPURI FILE]
Figure 1. System Block diagram
Figure 2. CRF based POS tagging
The working of CRF depends mainly on feature selection. The features listed for POS
tagging are as follows:
F = { W_{i-m}, ..., W_{i-1}, W_i, W_{i+1}, ..., W_{i+n}, SW_{i-m}, ..., SW_{i-1}, SW_i, SW_{i+1}, ..., SW_{i+n}, number of
acceptable standard suffixes, number of acceptable standard prefixes, acceptable suffixes
present in the word, acceptable prefixes present in the word, word length, word frequency,
digit feature, symbol feature, RMWE }
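The surrounding-word part of the feature set, W_{i-m} through W_{i+n}, can be sketched as a context window. The padding token for positions beyond the sentence boundary is an assumption of this example; the source does not say how boundaries are handled.

```python
def window(tokens, i, m, n, pad="<PAD>"):
    """Return [W_{i-m}, ..., W_i, ..., W_{i+n}], padding past the sentence ends."""
    return [tokens[j] if 0 <= j < len(tokens) else pad
            for j in range(i - m, i + n + 1)]
```

The same function applies to the stemmed-word window SW if the list of stems is passed instead of the surface words.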
The details of the set of features that have been applied for POS tagging in Manipuri are as
follows:
1. Surrounding words as feature: The preceding and the following word(s) are important in
POS tagging because they play an important role in determining the POS of the present
word.
2. Surrounding stem words as feature: The stemming algorithm mentioned in [10] is used.
The preceding and the following stemmed words of a particular word can be used as features,
because they influence the POS tag of the present word.
3. Number of acceptable standard suffixes as feature: As mentioned in [10], Manipuri being an
agglutinative language, suffixes play an important role in determining the POS of a word. For
every word the suffixes are identified during stemming and the number of suffixes is
used as a feature.
4. Number of acceptable standard prefixes as feature: Prefixes play an important role in the
Manipuri language. Prefixes are identified during stemming and the number of prefixes is used as a feature.
5. Acceptable suffixes present as feature: The 61 standard suffixes of Manipuri that have been
identified are used as one feature. The maximum number of appended suffixes is reported as ten.
To account for such cases, ten columns separated by spaces are created for every word, one
for each suffix present in the word. A "0" notation is used in those columns when the word
contains no acceptable suffix.
6. Acceptable prefixes present as feature: 11 prefixes have been manually identified in
Manipuri and the list of prefixes is used as one feature. For every word, if a prefix is present
then a column is created mentioning the prefix; otherwise the "0" notation is used.
7. Length of the word: The feature is set to 1 if the word length is greater than 3; otherwise it is
set to 0. Very short words are generally pronouns and rarely proper nouns.
8. Word frequency: A frequency threshold is set over the training corpus: words
with fewer than 100 occurrences are given the value 0, and words with 100 or more occurrences are
given the value 1. It is used as a feature because determiners, conjunctions and pronouns occur
abundantly.
[Figure 2 block labels, as extracted: Evaluation Results; Pre-processing; Documents Collection; Data Test; Labeling; Features Extraction; Data Training; CRF Model; Features Extraction]
9. Digit feature: Quantity measurements, dates and monetary values are generally digits. Thus the
digit feature is an important feature. A binary notation of '1' is used if the word contains a digit,
else '0'.
10. Symbol feature: Symbols like $, % etc. are meaningful in textual use, so the feature is set to 1
if a symbol is found in the token, otherwise 0. This helps to recognize Symbol and Quantifier number
tags.
11. Reduplicated Multiword Expression (RMWE): RMWEs are also considered as a feature
since Manipuri is rich in RMWEs. The RMWE work of [11] is used.
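The binary features 7 to 10 and the suffix columns of feature 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the symbol set is limited to the two symbols the text names, the suffix list is assumed to come from the stemmer of [10], and padding the unused suffix columns with "0" is an interpretation of the description above.

```python
from collections import Counter

SYMBOLS = set("$%")   # assumption: the text names "$, % etc." without a full list
MAX_SUFFIXES = 10     # the text reports at most ten appended suffixes per word


def surface_features(word, corpus_counts):
    """Binary length, frequency, digit and symbol features for one word."""
    return {
        "length": 1 if len(word) > 3 else 0,
        "frequency": 1 if corpus_counts.get(word, 0) >= 100 else 0,
        "digit": 1 if any(ch.isdigit() for ch in word) else 0,
        "symbol": 1 if any(ch in SYMBOLS for ch in word) else 0,
    }


def suffix_columns(suffixes):
    """Ten suffix columns for one word; unused columns hold "0" (assumed padding)."""
    if len(suffixes) > MAX_SUFFIXES:
        raise ValueError("more suffixes than available columns")
    return list(suffixes) + ["0"] * (MAX_SUFFIXES - len(suffixes))


# Usage with placeholder words; real counts would come from the training corpus.
counts = Counter({"w_common": 150, "w_rare": 3})
features = surface_features("w_common", counts)
```

Because every value is either 0, 1 or a fixed-width column, each word maps to a row with a constant number of columns, which is the kind of tabular input a CRF toolkit expects.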
5. EXPERIMENT AND EVALUATION
The text document file is cleaned for processing: errors and grammatical mistakes are
carefully checked by an expert. For POS tagging, the expert also marks each word with its
POS using a tag set. The POS marked texts are used for both training and testing.
Once the text documents are tagged with POS, the same text with the POS tags and the previous
features is used to run the CRF based chunking. In other words, the POS tags are used as an additional
feature for chunking. The C++ based CRF++ 0.53 package (see footnote 1) is used in this work; it is
readily available as open source for segmenting or labeling sequential data.
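CRF++ reads its training and test data as plain text with one token per line, whitespace-separated columns, the answer tag in the last column, and an empty line between sentences. The sketch below renders rows in that layout. The helper name and the placeholder words are assumptions for illustration, and the actual feature columns and template file used in this work are not given in the source.

```python
def to_crfpp_rows(sentences):
    """Render sentences as CRF++ style text.

    sentences: list of sentences, each a list of column tuples such as
    (word, POS, chunk_tag). The label to be learned goes in the last column.
    """
    lines = []
    for sentence in sentences:
        for columns in sentence:
            lines.append("\t".join(columns))
        lines.append("")  # an empty line marks the sentence boundary
    return "\n".join(lines)


# Placeholder words; the POS and chunk tags follow the style of the expert-marked sample.
sample = [[("w1", "NN", "B-X"), ("w2", "JJ", "B-X"), ("w3", "NC", "I-X")]]
text = to_crfpp_rows(sample)
```

For the chunking layer the POS tag sits in a middle column as a feature and the chunk tag is the last column; for the POS tagging layer the POS tag itself would be the last column.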
In total, a corpus of 30000 words is used to train and test the system. This corpus is considered the
gold standard since an expert manually identifies the POS and the chunks. Fig. 3 shows a
sample of the POS and chunk tags marked by the expert.
...
Word           POS    Chunk
oooo           NN     B-X
aaaa           JJ     B-X
[illegible]    NC     I-X
aaaa           QT     I-X
[illegible]    VFC    B-X
|              SYM    O
...
Figure 3. Sample of the words with POS and IOB chunking
Of the 30000 words, 20000 are used for training and the remaining 10000 are
used for testing.
Evaluation is done with the parameters Recall, Precision and F-score, as follows:
$$\text{Recall, } R = \frac{\text{No. of correct answers given by the system}}{\text{No. of correct answers in the text}}$$
$$\text{Precision, } P = \frac{\text{No. of correct answers given by the system}}{\text{No. of answers given by the system}}$$
$$\text{F-score, } F = \frac{(\beta^2 + 1) P R}{\beta^2 P + R}$$
where β is one, so that precision and recall are given equal weight.
Footnote 1: http://crfpp.sourceforge.net/
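The three evaluation measures can be written out directly. The function names are chosen for this sketch; the check at the end uses only the recall and precision figures reported in the paper.

```python
def recall(correct_by_system, correct_in_text):
    """Correct answers given by the system over correct answers in the text."""
    return correct_by_system / correct_in_text


def precision(correct_by_system, given_by_system):
    """Correct answers given by the system over all answers given by the system."""
    return correct_by_system / given_by_system


def f_score(p, r, beta=1.0):
    """F = (beta^2 + 1) * P * R / (beta^2 * P + R); beta = 1 weights P and R equally."""
    b2 = beta * beta
    return (b2 + 1) * p * r / (b2 * p + r)


# Consistency check with the reported figures: P = 77.36, R = 71.30.
reported_f = round(f_score(77.36, 71.30), 2)
```

With β = 1 the formula reduces to the harmonic mean 2PR / (P + R), and the reported precision and recall give an F-score that rounds to 74.21, matching the reported value.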
Different combinations of the features were tried for chunking the Manipuri text.
Among these combinations, the best feature set was found to be:
F = { W_{i-2}, W_{i-1}, W_i, W_{i+1}, SW_{i-1}, SW_i, SW_{i+1}, number of acceptable standard suffixes,
number of acceptable standard prefixes, acceptable suffixes present in the word, acceptable
prefixes present in the word, word length, word frequency, digit feature, symbol feature,
reduplicated MWE, POS }
Table II shows the recall, precision and F-measure of the system.
TABLE II. BEST RESULT
Model    Recall    Precision    F-Score
CRF      71.30     77.36        74.21
6. CONCLUSIONS
So far, no chunking work on Manipuri has been reported, and this work can be a starting point for
future work. Other algorithms that may improve the score can also be explored. The main
difficulty with this language is its highly agglutinative nature. The system shows a recall of
71.30%, a precision of 77.36% and an F-measure of 74.21%, which leaves a lot of room for
improvement.
REFERENCES
[1] Fei Sha and Fernando Pereira, "Shallow Parsing with Conditional Random Fields", In the Proceedings
of HLT-NAACL 2003.
[2] John Lafferty, Andrew McCallum and Fernando Pereira, "Conditional Random Fields: Probabilistic
Models for Segmenting and Labeling Sequence Data".
[3] S. Abney, "Parsing by chunks", In R. Berwick, S. Abney, and C. Tenny, editors, Principle-based
Parsing. Kluwer Academic Publishers, 1991.
[4] S. Geman and M. Johnson, "Dynamic programming for parsing and estimation of stochastic
unification-based grammars", In Proc. 40th ACL, 2002.
[5] A. Ratnaparkhi, "A linear observed time statistical parser based on maximum entropy models", In C.
Cardie and R. Weischedel, editors, EMNLP-2. ACL, 1997.
[6] E. F. T. K. Sang, "Memory-based shallow parsing", Journal of Machine Learning Research, 2:559-594,
2002.
[7] T. Zhang, F. Damerau, and D. Johnson, "Text chunking based on a generalization of winnow", Journal
of Machine Learning Research, 2:615-637, 2002.
[8] Lafferty, J., McCallum, A., Pereira, F., "Conditional Random Fields: Probabilistic Models for
Segmenting and Labeling Sequence Data", In the Proceedings of the 18th ICML01, Williamstown,
MA, USA, 2001, p. 282-289.
[9] Kishorjit, N. and Sivaji, B., "A Transliteration of CRF Based Manipuri POS Tagging", In the
Proceedings of 2nd International Conference on Communication, Computing & Security (ICCCS-
2012), Elsevier Ltd, 2012.
[10] Kishorjit, N., Bishworjit, S., Romina, M., Mayekleima Chanu, Ng. & Sivaji, B., "A Light
Weight Manipuri Stemmer", In the Proceedings of National Conference on Indian Language
Computing (NCILC), Cochin, India, 2011.
[11] Kishorjit Nongmeikapam, Nonglenjaoba L., Nirmal Y. & Sivaji Bandyopadhyay, "Reduplicated
MWE (RMWE) Helps in Improving the CRF Based Manipuri POS Tagger", International Journal of
Information Technology Convergence and Services (IJITCS), Vol.2, No.1, DOI:
10.5121/ijitcs.2012.2106, 2012, p. 45-59.
Authors
Kishorjit Nongmeikapam is working as Asst. Professor at the Department of Computer
Science and Engineering, MIT, Manipur University, India. He completed his BE at
PSG College of Tech., Coimbatore and his ME at Jadavpur University,
Kolkata, India. He is presently doing research in the area of Multiword Expressions and their
applications. He has so far published 30 papers and is presently handling a Transliteration
project funded by DST, Govt. of Manipur, India. He is the author of the book "See the C
Programming Language".
Chiranjiv Chingangbam is presently a student of Manipur Institute of Technology. He
is pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is
NLP.
Nepoleon Keisham is presently a student of Manipur Institute of Technology. He is
pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is
NLP.
Biakchungnunga Varte is presently a student of Manipur Institute of Technology. He is
pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is
NLP.
Sivaji Bandyopadhyay has been working as a Professor since 2001 in the Computer Science
and Engineering Department at Jadavpur University, Kolkata, India. His research interests
include machine translation, sentiment analysis, textual entailment, question answering
systems and information retrieval, among others. He is currently supervising six national
and international level projects in various areas of language technology. He has published
a large number of journal and conference papers.
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
ijtsrd
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327
IJMER
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
ijcsit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
ijcseit
 
A survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesA survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languages
ijnlc
 
50120140503001
5012014050300150120140503001
50120140503001
IAEME Publication
 
50120140503001
5012014050300150120140503001
50120140503001
IAEME Publication
 
50120140503001
5012014050300150120140503001
50120140503001
IAEME Publication
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
ijcsit
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
AIRCC Publishing Corporation
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
ijcsit
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
ijfcstjournal
 
ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text
ijnlc
 

Similar to Chunking in manipuri using crf (20)

Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM)WITH CONDITIONAL RANDOM FIELDS (...
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantization
 
Myanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov ModelMyanmar Named Entity Recognition with Hidden Markov Model
Myanmar Named Entity Recognition with Hidden Markov Model
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
A survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languagesA survey of named entity recognition in assamese and other indian languages
A survey of named entity recognition in assamese and other indian languages
 
50120140503001
5012014050300150120140503001
50120140503001
 
50120140503001
5012014050300150120140503001
50120140503001
 
50120140503001
5012014050300150120140503001
50120140503001
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
 
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOLHIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
HIDDEN MARKOV MODEL BASED NAMED ENTITY RECOGNITION TOOL
 
ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text
 

Recently uploaded

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 

Recently uploaded (20)

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 

Chunking in manipuri using crf

The paper is arranged as follows. Related works are listed in Section 2. Section 3 describes the concept of the Conditional Random Field (CRF), followed by the system design in Section 4. The experiment and evaluation are discussed in Section 5, and the conclusion is drawn in Section 6.

2. RELATED WORKS

Until now, no work on CRF based chunking has been reported for the Manipuri language. Most of the previous works on this area for other languages make use of two machine-learning approaches for sequence labeling: the first is the HMM, as in [1], and the second treats the sequence labeling problem as a sequence of classification problems, one for each of the labels in the sequence.

Apart from the above two approaches, CRF based chunking combines the best of the generative and classification models. Like classification models, CRFs can accommodate many statistically correlated features of the inputs. Like generative models, they can trade off decisions at different sequence positions and consequently obtain a globally optimal labeling. It is shown in [2] that CRFs are better than related classification models.

Parsing by chunks is discussed in [3]. Dynamic programming for parsing and estimation of stochastic unification-based grammars is described in [4], and other related works are found in [5]-[7]. In the field of text chunking, [1] proposed a Conditional Random Field based approach. Works on chunking apply both rule based and probabilistic or statistical methods.

3. CONCEPT OF CONDITIONAL RANDOM FIELD

The Conditional Random Field [8] is an undirected graphical model used to calculate the conditional probabilities of values on designated output nodes, given values on other designated input nodes. A CRF encodes a conditional probability distribution with a given set of features. It is a supervised approach: the system learns from labeled training data and can then be used for labeling other texts. The conditional probability of a state sequence Y = (y_1, y_2, ..., y_T) given an observation sequence X = (x_1, x_2, ..., x_T) is calculated as:

$$P(Y \mid X) = \frac{1}{Z_X} \exp\left(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t)\right) \tag{1}$$

where f_k(y_{t-1}, y_t, X, t) is a feature function and λ_k is the weight associated with f_k, learned via training. The values of the feature functions may range between -∞ and +∞, but typically they are binary.
Z_X is the normalization factor:

$$Z_X = \sum_{y} \exp\left(\sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, X, t)\right) \tag{2}$$

which is calculated in order to make the probabilities of all state sequences sum to 1. It is calculated as in the Hidden Markov Model (HMM) and can be obtained efficiently by dynamic programming.

Since the CRF defines the conditional probability P(Y|X), the appropriate objective for parameter learning is to maximize the conditional likelihood of the training data:

$$\sum_{i=1}^{N} \log P(y_i \mid x_i) \tag{3}$$

where {(x_i, y_i)} is the labeled training data. A Gaussian prior on the λ's is used to regularize the training (i.e., smoothing). If λ ~ N(0, ρ²), the objective function becomes:

$$\sum_{i=1}^{N} \log P(y_i \mid x_i) - \sum_{k} \frac{\lambda_k^2}{2\rho^2} \tag{4}$$
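As an illustration of equations (1) and (2), the following minimal Python sketch computes log Z_X with the forward (dynamic programming) recursion and checks it against a brute-force sum over all label sequences. The label set, the two indicator feature families ("emit" and "trans"), the treatment of the first position (no transition feature) and all weights are hypothetical choices made for the example; this is not the CRF++ implementation used in this work.

```python
import itertools
import math

LABELS = ["B", "I", "O"]  # hypothetical label set for the toy example


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def score(prev, cur, x, t, w):
    """sum_k lambda_k * f_k(y_{t-1}, y_t, X, t) for binary indicator features.

    Assumption: at t = 0 there is no previous label, so only the
    observation ("emit") feature fires.
    """
    s = w.get(("emit", x[t], cur), 0.0)
    if prev is not None:
        s += w.get(("trans", prev, cur), 0.0)
    return s


def seq_score(y, x, w):
    """Unnormalized log-score of a whole label sequence (exponent in eq. 1)."""
    return sum(score(y[t - 1] if t > 0 else None, y[t], x, t, w)
               for t in range(len(x)))


def log_z_forward(x, w):
    """log Z_X (eq. 2) by the forward recursion, O(T * |labels|^2)."""
    alpha = {c: score(None, c, x, 0, w) for c in LABELS}
    for t in range(1, len(x)):
        alpha = {c: logsumexp([alpha[p] + score(p, c, x, t, w) for p in LABELS])
                 for c in LABELS}
    return logsumexp(list(alpha.values()))


def log_z_brute(x, w):
    """log Z_X by enumerating every label sequence (only feasible for toy inputs)."""
    return logsumexp([seq_score(y, x, w)
                      for y in itertools.product(LABELS, repeat=len(x))])


def log_prob(y, x, w):
    """log P(Y | X) as in eq. (1)."""
    return seq_score(y, x, w) - log_z_forward(x, w)


# Hypothetical observations (POS tags) and weights, for illustration only.
X_EXAMPLE = ["NN", "JJ", "VFC"]
W_EXAMPLE = {
    ("emit", "NN", "B"): 1.5,
    ("emit", "VFC", "B"): 1.0,
    ("trans", "B", "I"): 0.8,
    ("trans", "O", "I"): -2.0,
}
```

Because the forward recursion and the exhaustive sum compute the same quantity, the probabilities obtained from `log_prob` sum to 1 over all label sequences, which is the role of Z_X stated above.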
The objective function (4) is concave, so the λ's have a unique set of optimal values.

4. SYSTEM DESIGN

The system applies CRF in two layers. The first layer performs the POS tagging of the Manipuri text file using the features mentioned in [9]. In the second layer, the output file of the CRF based POS tagging is used as the input file of the CRF based chunking. Fig. 1 shows the system block diagram.

The chunking tags follow the I-O-B tagging scheme, as follows:

TABLE I. IOB TAGGING

Tag    Meaning
B-X    Beginning of the chunk word X
I-X    Intermediate or non-beginning chunk word X
O      Word outside of the chunk text

The processing and running of the CRF is shown in Fig. 2.

Figure 1. System block diagram: TEXT FILE -> CRF BASED POS TAGGER -> CRF BASED MANIPURI CHUNKER -> CHUNKED MANIPURI FILE

In the first run the input file is a training file, which gives a model file as output; in the second run the input file is a testing file. The output file of the CRF is a labeled file.
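The IOB scheme of Table I can be turned back into chunk spans with a short routine. The sketch below is a minimal Python illustration, not part of the described system; the function name `iob_to_chunks` is hypothetical, and treating an I-X tag that follows O (or a different chunk type) as the start of a new chunk is an assumption made here for robustness.

```python
def iob_to_chunks(tags):
    """Group IOB tags into (chunk_type, start, end) spans, with end exclusive."""
    chunks = []
    start, ctype = None, None  # currently open chunk, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            boundary, new_type = True, None
        else:
            prefix, _, new_type = tag.partition("-")
            # A chunk starts at every B-X, and (assumption) at an I-X that
            # does not continue a chunk of the same type.
            boundary = prefix == "B" or new_type != ctype
        if boundary:
            if ctype is not None:
                chunks.append((ctype, start, i))  # close the open chunk
            start, ctype = (i, new_type) if new_type is not None else (None, None)
    if ctype is not None:
        chunks.append((ctype, start, len(tags)))  # close a chunk ending the text
    return chunks
```

For a tag sequence such as B-X, B-X, I-X, I-X, B-X, O, the routine yields three chunks covering token positions 0, 1 to 3, and 4, with the final O token left outside any chunk.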
Figure 2. CRF based POS tagging (blocks: Documents Collection, Pre-processing, Features Extraction, Data Training, CRF Model, Data Test, Features Extraction, Labeling, Evaluation, Results)

The working of the CRF is mainly based on feature selection. The features listed for the POS tagging are as follows:

F = { W_{i-m}, ..., W_{i-1}, W_i, W_{i+1}, ..., W_{i+n}, SW_{i-m}, ..., SW_{i-1}, SW_i, SW_{i+1}, ..., SW_{i+n}, number of acceptable standard suffixes, number of acceptable standard prefixes, acceptable suffixes present in the word, acceptable prefixes present in the word, word length, word frequency, digit feature, symbol feature, RMWE }

The details of the set of features that have been applied for POS tagging in Manipuri are as follows:

1. Surrounding words as feature: The preceding and succeeding word(s) are important in POS tagging because they play an important role in determining the POS of the present word.

2. Surrounding stem words as feature: The stemming algorithm mentioned in [10] is used. The preceding and following stemmed words of a particular word can be used as features, because they influence the POS tagging of the present word.

3. Number of acceptable standard suffixes as feature: As mentioned in [10], Manipuri being an agglutinative language, suffixes play an important role in determining the POS of a word. For every word the suffixes are identified during stemming, and the number of suffixes is used as a feature.

4. Number of acceptable standard prefixes as feature: Prefixes play an important role in the Manipuri language. Prefixes are identified during stemming, and the number of prefixes is used as a feature.

5. Acceptable suffixes present as feature: The 61 standard suffixes identified for Manipuri are used as one feature. The maximum number of appended suffixes is reported as ten. To take such cases into account, ten space-separated columns are created for every word, one for each suffix present in the word. A "0" notation is used in those columns when the word has no acceptable suffix.

6. Acceptable prefixes present as feature: 11 prefixes have been manually identified in Manipuri, and the list of prefixes is used as one feature. For every word, if a prefix is present then a column is created mentioning the prefix; otherwise the "0" notation is used.

7. Length of the word: The length feature is set to 1 if the word length is greater than 3; otherwise it is set to 0. Very short words are generally pronouns and rarely proper nouns.

8. Word frequency: A frequency range is set for words in the training corpus: words with fewer than 100 occurrences are given the value 0, and words with 100 or more occurrences are given the value 1. It is considered as one feature since occurrences of determiners, conjunctions and pronouns are abundant.

9. Digit feature: Quantity measurements, dates and monetary values are generally digits, so the digit feature is an important feature. A binary notation of "1" is used if the word contains a digit, else "0".

10. Symbol feature: Symbols like $, % etc. are meaningful in textual use, so the feature is set to 1 if a symbol is found in the token, otherwise 0. This helps to recognize the Symbol and Quantifier number tags.

11. Reduplicated Multiword Expression (RMWE): RMWEs are also considered as a feature since Manipuri is rich in RMWEs. The work on RMWE in [11] is used.

5. EXPERIMENT AND EVALUATION

The text document file is cleaned for processing: errors and grammatical mistakes are minutely checked by an expert. For the POS tagging, the expert also marks each word with its POS using a tag set. The POS marked texts are used for both training and testing. Once the text documents are tagged with the POS, the same text with the POS and the previous features is used to run the CRF based chunking. In other words, the POS tags are used as an additional feature for the chunking.

The C++ based CRF++ 0.53 package¹ is used in this work. It is readily available as open source for segmenting or labeling sequential data.

In total, a corpus of 30000 words is used to train and test the system. This corpus is considered as the gold standard since an expert manually identifies the POS and the chunk words. Fig. 3 shows a sample of the POS and chunk tags marked by the expert.

oooo NN B-X
aaaa JJ B-X
NC I-X
aaaa QT I-X
VFC B-X
| SYM O

Figure 3. Sample of the words with POS and IOB chunking

Of the 30000 words, 20000 words are used for training and the remaining 10000 are used for testing.
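The binary features of Section 4 (items 7 to 10) and the column layout of Fig. 3 can be sketched in Python as below. CRF++ reads one token per line with whitespace-separated columns, the last column being the label. Everything else here is an assumption made for illustration: the function names, the column order, the symbol set (the paper lists only "$, % etc."), and measuring word length in Unicode code points.

```python
from collections import Counter

# Assumption: the paper names "$, % etc."; only these two are used here.
SYMBOLS = set("$%")


def binary_features(word, freq):
    """Return the four binary features of items 7-10 as "0"/"1" strings.

    freq is a Counter of word occurrences in the training corpus.
    """
    return [
        "1" if len(word) > 3 else "0",                      # 7. word length > 3
        "1" if freq[word] >= 100 else "0",                  # 8. frequency >= 100
        "1" if any(ch.isdigit() for ch in word) else "0",   # 9. contains a digit
        "1" if any(ch in SYMBOLS for ch in word) else "0",  # 10. contains a symbol
    ]


def to_rows(sentence, freq):
    """Format (word, pos, chunk) triples as whitespace-separated rows.

    The chunk tag is placed last so that it acts as the label column.
    """
    return [" ".join([word, *binary_features(word, freq), pos, chunk])
            for word, pos, chunk in sentence]
```

A frequency table would be built once from the training portion, for example `freq = Counter(word for sentence in train for word, _, _ in sentence)` for a hypothetical `train` list of sentences, and the rows of each sentence written out followed by a blank line.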
Evaluation is done with the parameter of Recall, Precision and F-score as follows: Recall, R = texttheinanscorrectofNo systemthebygivenanscorrectofNo Precision, P = systemthebygivenansofNo systemthebygivenanscorrectofNo 1 http://crfpp.sourceforge.net/
$$\text{F-score, } F = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}$$

where β is one, so that precision and recall are given equal weight.

Different combinations of the features are tried for the chunking of the Manipuri text document. Among these combinations, the best feature set is found to be the following:

F = {Wi-2, Wi-1, Wi, Wi+1, SWi-1, SWi, SWi+1, number of acceptable standard suffixes, number of acceptable standard prefixes, acceptable suffixes present in the word, acceptable prefixes present in the word, word length, word frequency, digit feature, symbol feature, reduplicated MWE, POS}

Table I shows the recall, precision and F-measure of the system.

TABLE I. BEST RESULT

Model   Recall (%)   Precision (%)   F-Score (%)
CRF     71.30        77.36           74.21

3. CONCLUSIONS

So far, no chunking work on Manipuri has been reported, so this work can serve as a starting point for future research. Other algorithms can also be explored to improve the score. The main difficulty with this language is its highly agglutinative nature. The system achieves a recall of 71.30%, a precision of 77.36% and an F-measure of 74.21%, which leaves considerable room for improvement.

REFERENCES

[1] Fei Sha and Fernando Pereira, “Shallow Parsing with Conditional Random Fields”, In the Proceedings of HLT-NAACL 2003.
[2] John Lafferty, Andrew McCallum and Fernando Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
[3] S. Abney. Parsing by chunks. In R. Berwick, S. Abney, and C. Tenny, editors, Principle-based Parsing. Kluwer Academic Publishers, 1991.
[4] S. Geman and M. Johnson. Dynamic programming for parsing and estimation of stochastic unification-based grammars. In Proc. 40th ACL, 2002.
[5] A. Ratnaparkhi. A linear observed time statistical parser based on maximum entropy models. In C. Cardie and R. Weischedel, editors, EMNLP-2. ACL, 1997.
[6] E. F. T. K. Sang. Memory-based shallow parsing. Journal of Machine Learning Research, 2:559-594, 2002.
[7] T. Zhang, F. Damerau, and D. Johnson. Text chunking based on a generalization of winnow. Journal of Machine Learning Research, 2:615-637, 2002.
[8] Lafferty, J., McCallum, A., Pereira, F. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, In the Proceedings of the 18th ICML, Williamstown, MA, USA, 2001, p. 282-289.
[9] Kishorjit, N. and Sivaji, B., “A Transliteration of CRF Based Manipuri POS Tagging”, In the Proceedings of 2nd International Conference on Communication, Computing & Security (ICCCS-2012), Elsevier Ltd, 2012.
[10] Kishorjit, N., Bishworjit, S., Romina, M., Mayekleima Chanu, Ng. and Sivaji, B., (2011) A Light Weight Manipuri Stemmer, In the Proceedings of National Conference on Indian Language Computing (NCILC), Cochin, India.
[11] Kishorjit Nongmeikapam, Nonglenjaoba L., Nirmal Y. and Sivaji Bandyopadhyay, Reduplicated MWE (RMWE) Helps in Improving the CRF Based Manipuri POS Tagger, International Journal of Information Technology Convergence and Services (IJITCS) Vol.2, No.1, DOI: 10.5121/ijitcs.2012.2106, 2012, p. 45-59.

Authors

Kishorjit Nongmeikapam is working as Asst. Professor at the Department of Computer Science and Engineering, MIT, Manipur University, India. He completed his BE at PSG College of Tech., Coimbatore and his ME at Jadavpur University, Kolkata, India. He is presently doing research in the area of Multiword Expressions and their applications. He has so far published 30 papers and is presently handling a Transliteration project funded by DST, Govt. of Manipur, India. He is the author of the book “See the C Programming Language”.

Chiranjiv Chingangbam is presently a student of Manipur Institute of Technology. He is pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is NLP.

Nepoleon Keisham is presently a student of Manipur Institute of Technology. He is pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is NLP.

Biakchungnunga Varte is presently a student of Manipur Institute of Technology. He is pursuing his B.E. in the Dept. of Computer Science and Engineering. His area of interest is NLP.

Sivaji Bandyopadhyay has been working as a Professor since 2001 in the Computer Science and Engineering Department at Jadavpur University, Kolkata, India. His research interests include machine translation, sentiment analysis, textual entailment, question answering systems and information retrieval, among others. He is currently supervising six national and international level projects in various areas of language technology. He has published a large number of journal and conference papers.