International Journal on Natural Language Computing (IJNLC) Vol. 4, No.1, February 2015
DOI: 10.5121/ijnlc.2015.4102
AN IMPLEMENTATION OF APERTIUM BASED ASSAMESE
MORPHOLOGICAL ANALYZER
Mirzanur Rahman and Shikhar Kumar Sarma
Department of Information Technology, Gauhati University, Guwahati, Assam, India
ABSTRACT
Morphological analysis is an important part of linguistics for any Natural Language Processing
technology. Morphology studies the structure and formation of the words of a language. In current
NLP research, morphological analysis techniques are becoming steadily more popular. To process
any language, the morphology of its words should be analyzed first. The Assamese language has a
very complex morphological structure. In our work we have used Apertium-based finite-state
transducers to develop a morphological analyzer for the Assamese language over a limited domain,
and we obtain 72.7% accuracy.
KEYWORDS
Assamese Language, Morphology, Natural Language Processing, FST
1. INTRODUCTION
Assamese is the major language spoken in Assam, a state in the north-eastern part of India. It
has served as a bridge language among the different speech communities across the state.
Assamese is an Indo-Aryan language that originated from Vedic dialects [1]. The language as it
stands today has passed through tremendous modifications in all its components, viz. phonology,
morphology, conjunction etc. There are two varieties of Assamese according to dialectal region
[2], namely Eastern Assamese and Western Assamese. The two differ in phonology and morphology,
but the written text is the same for all regions.
Morphology is an important component of any language, so before any further processing we must
first analyze the morphology of the words of that language. Language processing technologies such
as Machine Translation [15], parsing, POS tagging and text summarization require morphological
analyzers to find the lexical components of a word, and lexical components are a very important
part of the grammar of a language.
In our work we have considered only the standard written Assamese text corpus and have used the
Apertium engine with Lttoolbox to build a morphological analyzer for the Assamese language. In
this paper we discuss how we proceeded towards developing the morphological analyzer.
2. MORPHOLOGY OF A LANGUAGE
In Language Technology research, morphological analysis studies the structure of words and
word formation of a language.
Words in a language can be divided into smaller units known as morphemes [10]. Recognizing the
different morphemes in a word, together with their lexical properties, is known as morphological
analysis. For example, in English:
• Girls = Girl + s
  o Root: Girl (category: noun)
  o Affix: 's' (indefinite plural marker)
In the above example the word “Girls” is a combination of the morpheme “Girl” and the morpheme
“s”. When we analyze a word with a morphological analyzer, it should provide all the combinations
of morphemes with their lexical properties.
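The idea of returning every root and affix combination can be illustrated with a minimal Python sketch. The two lexicons below are toy data invented for this illustration (they are not the dictionary used in this work), and the function simply tries every split point of the word.

```python
# Toy lexicons: every entry is illustrative, not taken from a real dictionary.
ROOTS = {"girl": "noun", "walk": "verb"}
SUFFIXES = {"": "no affix", "s": "plural marker", "ed": "past tense marker"}


def analyze(word):
    """Return every (root, category, suffix, gloss) split of `word`."""
    word = word.lower()
    analyses = []
    for cut in range(1, len(word) + 1):
        root, suffix = word[:cut], word[cut:]
        # A split is valid only if both halves are known morphemes.
        if root in ROOTS and suffix in SUFFIXES:
            analyses.append((root, ROOTS[root], suffix, SUFFIXES[suffix]))
    return analyses


print(analyze("Girls"))
```

A word that cannot be split into a known root and a known affix yields an empty list, which corresponds to an unrecognized word.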
According to Golockchandra Goswami [1], all the morphemes of the Assamese language can be divided
into three categories:
o Root morpheme
o Sub-root morpheme
o Affix morpheme
Root morphemes are the main morphemes on which all morphological construction is done and to
which affixes are attached. Root morphemes may be further categorized by part of speech, such as
noun, pronoun, verb, indeclinable etc.
For example, in Assamese:
• ল’ৰাজন = ল’ৰা + জন
  o Root: ল’ৰা (category: noun)
  o Affix: জন (definite article, singular case marker)
Sub-root morphemes are morphemes which can occur as an independent root as well as a suffix in an
Assamese sentence.
For example:
• খন, জন, ডাল etc. can be used as a root word or as a singular definitive suffix
  o মানুহজন বৰ ভাল (“জন” is used as a singular suffix marker)
  o মোৰ জন আহিলেনে? (“জন” is used as an independent root)
• বোৰ/বোৰৰ, বিলাক etc. can be used as a root word or as a plural definitive suffix
  o বস্তুবোৰৰ দাম নাই৷ (“বোৰৰ” is used as a plural suffix marker)
  o তোমালোকৰ বোৰৰ কি খবৰ? (“বোৰৰ” is used as an independent root)
Affixes are always added to a root and carry some meaning of their own, but they have no separate
existence and can never form a free form, either alone or in combination with one another.
For example: এ (মানুহে), ক (মানুহক), লৈ (তোমালৈ), কৈ (ভালকৈ) etc.
3. PRIOR ART
Some work on morphological analysis has already been reported for the Assamese language. In this
section we summarize the reported work related to Assamese morphological analysis.
• In [3], the authors present morphological analyzers built with the suffix stripping method for
  four languages: Assamese, Bengali, Bodo and Oriya. The proposed mechanism deals only with
  inflectional suffixes. The method involves identifying individual suffixes from a series of
  suffixes attached to a stem/root, using morpheme sequencing rules.
  In this approach the analyzer splits the inflected form of a word into suffixes and stem by
  using a root/stem dictionary (for identifying legitimate roots/stems), a list of suffixes
  comprising all possible suffixes that the various categories can take (in order to identify a
  valid suffix), and the morpheme sequencing rules. The authors report 50% coverage for 7000 to
  8000 root entries.
• In [4], the authors present “A Suffix-based Noun and Verb Classifier for an Inflectional
  Language”. The proposed mechanism considers only the morpho-syntactic properties of Assamese
  words. Assamese words can be categorized into inflected classes (noun, pronoun, adjective and
  verb) and uninflected classes (adverb and particle).
• In [5], the authors describe an approach to unsupervised learning of morphology from an
  unannotated corpus for the Assamese language in their paper “Acquisition of Morphology of an
  Indic Language from Text Corpus”. They present and discuss in detail an unsupervised method for
  acquiring Assamese morphology from a text corpus. This is the initial work towards unsupervised
  morphological analysis and it is very suitable for the Assamese language. The approach acquires
  the suffixation morphology of the language from a text corpus of about 300,000 words and builds
  a morphological lexicon. The F-measure of the suffix acquisition is about 69%.
• In [6], the authors present a suffix stripping approach to which they add a rule engine that
  generates all the possible suffix sequences for analyzing the morphology of a word. They report
  82% accuracy with a root-word list of approximately 20,000 entries.
• In [7], the authors combine a rule-based algorithm with an HMM-based algorithm: the rule-based
  algorithm predicts multiple-letter suffixes and the HMM-based algorithm predicts single-letter
  suffixes. The combined method can predict morphologically inflected words with 92% accuracy.
• In [8], Utpal Sharma proposes an unsupervised method for learning the morphology of a language
  in his Ph.D. thesis “Unsupervised Learning of Morphology of a Highly Inflectional Language”.
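The suffix stripping method described for [3] combines three resources: a root dictionary, a suffix list and morpheme sequencing rules. The following is a minimal Python sketch of how they might interact. It is not the implementation from [3]; the root set, the suffix classes (`DEF`, `CASE`) and the ordering rule are all assumptions made for illustration.

```python
# Illustrative resources only; the class names and the ordering rule are assumed.
ROOTS = {"মানুহ"}                                   # root/stem dictionary
SUFFIXES = {"জন": "DEF", "খন": "DEF", "ক": "CASE"}   # suffix -> suffix class
# Morpheme sequencing rules: which suffix class may follow which.
ALLOWED_NEXT = {"ROOT": {"DEF", "CASE"}, "DEF": {"CASE"}, "CASE": set()}


def strip_suffixes(word):
    """Return (root, [suffixes]) for the first valid segmentation, else None."""

    def walk(rest, prev_class):
        # Consume `rest` suffix by suffix, respecting the sequencing rules.
        if rest == "":
            return []
        for suffix, cls in SUFFIXES.items():
            if rest.startswith(suffix) and cls in ALLOWED_NEXT[prev_class]:
                tail = walk(rest[len(suffix):], cls)
                if tail is not None:
                    return [suffix] + tail
        return None

    for cut in range(len(word), 0, -1):  # try the longest candidate root first
        root = word[:cut]
        if root in ROOTS:
            suffixes = walk(word[cut:], "ROOT")
            if suffixes is not None:
                return root, suffixes
    return None


print(strip_suffixes("মানুহ" + "জন" + "ক"))
```

With the assumed rule that a case marker may follow a definitive but not the reverse, the same suffixes in the opposite order are rejected.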
4. IMPLEMENTATION USING APERTIUM AND LTTOOLBOX
Apertium is a rule-based, open-source, shallow-transfer machine translation platform [11]. It is
free software released under the terms of the GNU General Public License. It includes the engine,
maintenance tools, and open linguistic data for several language pairs. Lttoolbox is a toolbox for
lexical processing, morphological analysis and generation of words. Lttoolbox uses finite-state
transducers (FSTs). FSTs are a type of finite-state automaton which may be used as one-pass
morphological analyzers.
In Apertium, the analyzer data is stored in Apertium's dictionary (dix) format, which has XML
syntax. The same morphological database (monodix) can serve as either an analyzer or a generator,
depending on the direction in which the system reads the dictionary: read from left to right, it
gives the analyzer; read from right to left, it gives the generator. Because the monodix is
compiled into a finite-state transducer before use, lookup is generally faster than with a
plain-text or database-based dictionary.
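The two reading directions can be pictured with a small Python sketch. It replaces the transducer with a plain lookup table, which is a deliberate simplification and not how Lttoolbox stores its data. The surface/lexical pairs and the tag names are illustrative.

```python
from collections import defaultdict

# One shared source of (surface form, lexical form) pairs; illustrative data.
PAIRS = [
    ("girl", "girl<n><sg>"),
    ("girls", "girl<n><pl>"),
]


def compile_pairs(pairs, direction):
    """Build a lookup table reading the pairs left-to-right or right-to-left."""
    if direction not in ("lr", "rl"):
        raise ValueError("direction must be 'lr' or 'rl'")
    table = defaultdict(list)
    for surface, lexical in pairs:
        if direction == "lr":      # analysis: surface form -> lexical form
            table[surface].append(lexical)
        else:                      # generation: lexical form -> surface form
            table[lexical].append(surface)
    return dict(table)


analyser = compile_pairs(PAIRS, "lr")
generator = compile_pairs(PAIRS, "rl")
print(analyser["girls"])
print(generator["girl<n><pl>"])
```

The point of the sketch is only that a single source describes both directions; a real transducer shares prefixes and suffixes between entries, which is what makes it compact.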
Creating a morphological analyzer requires several components of the Apertium engine.
4.1. Dictionary
An Apertium-based system can use two types of dictionary, monolingual and bilingual. The
monolingual dictionary is used for the morphological analyzer and generator, and the bilingual
dictionary is used for machine translation. In our work we use the monolingual dictionary.
4.2. Paradigm definitions <pardef>
A paradigm is the complete set of related inflectional and productive derivational word forms of
a given category. A paradigm can be understood as a small dictionary of alternative
transformations that can be concatenated to parts of words (or to entries of another paradigm) to
specify regularities in the lexical processing of the dictionary entries, such as regularities in
inflection. Along with the root word, the definition contains other information such as category,
gender, number, person, case marker, tense etc.
Figure 1: Dictionary entry of Assamese Word চকু (Eye) in Paradigm
4.3. Element for Reference to a Paradigm
Apertium provides a lexico-semantic layer for working with the inflection of a word. The layer
introduces lexemes into derivation and concurrently follows the inflection of the derived lexeme.
It is used inside a <pardefs> entry. The main advantage of a reference paradigm is that there is
no need to write out all the inflected forms of a lemma in a morphological dictionary entry,
because they can be referred to from other paradigms.
Figure 2: XML format for Reference Paradigm
4.4. Morpheme
All the root words (morphemes) are included in the dictionary generated for the morphological
analyzer. This dictionary differs from a conventional dictionary because, along with each
morpheme, it contains other information such as its lexical category and the corresponding
paradigm.
Figure 3: Dictionary Entry of Morpheme with their lexical category
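How a paradigm definition, a paradigm reference and a root entry fit together can be sketched in Python as follows. The embedded XML follows the general monodix layout (`<pardef>`, `<e>`, `<p>`, `<l>`, `<r>`, `<s>`, `<i>`, `<par>`), but the paradigm name `jon__n`, the tag `def` and the entry itself are assumptions made for this illustration and are not copied from apertium-asm.morph.dix. The `expand` function is a simplified stand-in for what lt-expand does.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-dictionary: paradigm name, tags and entry are illustrative.
DIX = """<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="def"/>
  </sdefs>
  <pardefs>
    <pardef n="jon__n">
      <e><p><l></l><r><s n="n"/></r></p></e>
      <e><p><l>জন</l><r><s n="n"/><s n="sg"/><s n="def"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="মানুহ"><i>মানুহ</i><par n="jon__n"/></e>
  </section>
</dictionary>"""


def render(element):
    """Flatten an <l>, <r> or <i> element, writing each <s n="x"/> as <x>."""
    if element is None:
        return ""
    text = element.text or ""
    for child in element:
        if child.tag == "s":
            text += "<" + child.get("n") + ">"
        text += child.tail or ""
    return text


def expand(dix_source):
    """Return (surface form, lexical form) pairs for every main-section entry."""
    root = ET.fromstring(dix_source)
    paradigms = {}
    for pardef in root.iter("pardef"):
        paradigms[pardef.get("n")] = [
            (render(e.find("p/l")), render(e.find("p/r")))
            for e in pardef.findall("e")
        ]
    pairs = []
    for entry in root.findall("section/e"):
        stem = render(entry.find("i"))      # <i> is identical on both sides
        par = entry.find("par")             # reference to a paradigm
        for left, right in paradigms[par.get("n")]:
            pairs.append((stem + left, stem + right))
    return pairs


for surface, lexical in expand(DIX):
    print(surface, "->", lexical)
```

Because the entry only references the paradigm, adding another noun that takes the same suffixes needs one more `<e>` line rather than a list of all its inflected forms.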
4.5. Lttoolbox Modules
Lttoolbox contains three modules: dictionary compilation (lt-comp), morphological analysis and
generation (lt-proc), and expansion (lt-expand).
Morphological analysis requires the lt-comp and lt-proc modules: lt-comp for compilation and
lt-proc for processing [12]. The lt-comp module is responsible for compiling our morphological
dictionaries into its own finite-state representation, and the lt-proc module is responsible for
processing input data with the compiled dictionary to produce the required output.
4.5.1. Compilation:
The lt-comp module compiles the given .dix file into binary format from left to right
(LR) or from right to left (RL). Compiling with LR creates an analyzer and compiling
with RL creates a generator.
Syntax of lt-comp:
$ lt-comp lr apertium-asm.morph.dix analyser.bin
Compiles the apertium-asm.morph.dix dictionary in a left-to-right manner into the
binary analyser.bin
$ lt-comp rl apertium-asm.morph.dix generator.bin
Compiles the apertium-asm.morph.dix dictionary in a right-to-left manner into the
binary generator.bin
4.5.2. Processing:
The lt-proc module provides two functions: analysis (the default mode) and generation.
Analysis converts surface forms into the set of possible lexical forms, while
generation converts a lexical form into the corresponding surface form.
Syntax of lt-proc:
$ echo "চকুযুৰি" | lt-proc analyser.bin
Output: ^চকুযুৰি/চকু<n><pl>$
Here we analyze the Assamese word চকুযুৰি (Eyes) with the left-to-right binary
dictionary analyser.bin
$ echo "^চকু<n><pl>$" | lt-proc -g generator.bin
Output: চকুযুৰি
Here we generate the plural form of the Assamese word চকু (Eye) with the right-to-left
binary dictionary generator.bin
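When the analyzer is driven from a program rather than from the shell, the same commands can be assembled as argument lists. The Python sketch below is one possible wrapper, not part of Apertium. The helper names are hypothetical, and `run` only executes a command when the tool is found on the PATH.

```python
import shutil
import subprocess


def compile_cmd(direction, dix_path, bin_path):
    """Argument list for lt-comp; direction is 'lr' (analyzer) or 'rl' (generator)."""
    if direction not in ("lr", "rl"):
        raise ValueError("direction must be 'lr' or 'rl'")
    return ["lt-comp", direction, dix_path, bin_path]


def process_cmd(bin_path, generate=False):
    """Argument list for lt-proc; analysis is the default, -g selects generation."""
    return ["lt-proc", "-g", bin_path] if generate else ["lt-proc", bin_path]


def run(cmd, text=None):
    """Run the command if the tool is installed; return its stdout, else None."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout


print(compile_cmd("lr", "apertium-asm.morph.dix", "analyser.bin"))
print(process_cmd("generator.bin", generate=True))
```

Passing the arguments as a list avoids shell quoting problems with the `^`, `<`, `>` and `$` characters of the lexical form.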
4.6. Meaning of the Analyzer's Output Format
Figure 4: Meaning of output format (Morphological Analyzer)
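The output format can also be read programmatically. In the analyzer output, each unit starts with `^` and ends with `$`, the surface form comes first, and each analysis after a `/` consists of a lemma followed by tags in angle brackets. An unknown word is marked with `*`. The Python sketch below parses one such unit. The example strings and tags are illustrative, and the sketch assumes that no escaped `/` occurs inside the word.

```python
import re


def parse_unit(unit):
    """Parse '^surface/analysis1/analysis2$' into (surface, [(lemma, tags)])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: " + unit)
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        if analysis.startswith("*"):
            # Unknown word: no lemma or tags could be assigned.
            parsed.append((analysis[1:], None))
            continue
        lemma = analysis.split("<", 1)[0]
        tags = re.findall(r"<([^>]+)>", analysis)
        parsed.append((lemma, tags))
    return surface, parsed


print(parse_unit("^girls/girl<n><pl>$"))
print(parse_unit("^xyz/*xyz$"))
```

A unit with several analyses, separated by `/`, is returned as a list with one entry per analysis, so ambiguity is visible to the caller.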
4.7. Current Dictionary
In our current work we have considered only a limited number of words from selected
Part-of-Speech (POS) categories. The following table summarizes the entries in our database.
Table 1: Number of Entries in XML Dictionary (apertium-asm.morph.dix)
Main Category    Entries in dictionary
Noun             22368
Pronoun          121
Verb             1844
Adverb           232
Our dictionary was built from the following sources:
• Assamese text corpus obtained from the Language Technology Development Project,
  Gauhati University
• Asamiya Abhidhan [9]
All the dictionary entries were made manually using the Notepad++ editor. The lexical categories
of the root words were verified by linguists, so that the output of the analyzer is correct.
5. EVALUATION AND TEST RESULT
Table 1 shows that the dictionary has only 24,565 entries, which is not a large number. Since the
work is ongoing, we expect to have an XML dictionary with a much larger number of entries in the
future. Here we have considered only the most frequently used words of the Assamese language.
So far we have not added any rules for lexical selection to the Apertium engine, which is why it
sometimes cannot analyze a word properly. For example, the Assamese word জন can be used as a
suffix or as a person's name. Most of the time জন (man in general; definite article) is used as a
suffix in Assamese sentences, but if someone uses this word as a person's name (though জন (John),
a proper noun, is not commonly used as a name in Assam), our analyzer cannot give the proper
analysis. Another problem we have faced is that if a word has more than one meaning depending on
the situation and its position within the sentence, the analyzer cannot analyze it properly.
Examples are the Assamese words মালা (Garland) and অনল (Fire): both can be used as a material
noun or as a proper noun, depending on how the word is used in the sentence.
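The effect of missing lexical selection rules can be shown with a small Python sketch. The lookup table and the tags `n` (noun) and `np` (proper noun) are assumptions for illustration; the point is only that a context-free lookup returns every reading and has no basis for preferring one.

```python
# Illustrative analyses; the tags <n> and <np> are assumed for this sketch.
ANALYSES = {
    "মালা": ["মালা<n>", "মালা<np>"],
    "অনল": ["অনল<n>", "অনল<np>"],
}


def analyze(word):
    """Return all readings of a word, or a '*'-marked form if it is unknown."""
    return ANALYSES.get(word, ["*" + word])


def is_ambiguous(word):
    """True when the lookup alone cannot decide between several readings."""
    return len(analyze(word)) > 1


for word in ANALYSES:
    print(word, analyze(word), "ambiguous:", is_ambiguous(word))
```

Choosing between the two readings needs information from the surrounding sentence, which is what lexical selection rules would supply.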
In the testing phase we used a data set collected from different Assamese blogs and web pages,
containing 1120 words after cleaning. The words are first tokenized and passed through a cleaning
process (removing stop words, delimiters and extra white space) implemented in the Java
programming language. We then pass the words one by one to the Apertium engine for analysis and
store the results in a text file, which is checked manually for correctness. The results are
shown in Table 2.
Table 2: Test Result
Total words             1120
Correctly recognized    815
Wrongly recognized      305
Table 2 shows that the analyzer gives correct results for only 72.7% of the words (815 of 1120).
The other 27% are wrongly recognized because of the limited number of database entries, the lack
of lexical rules for selecting the proper category, and the limited set of POS categories.
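The percentages follow directly from the counts in Table 2. The short Python check below recomputes them; to two decimal places the accuracy is 72.77%, so the reported 72.7% appears to be that value cut to one decimal place rather than rounded.

```python
# Counts taken from Table 2.
total, correct = 1120, 815
wrong = total - correct

accuracy = 100 * correct / total   # share of correctly recognized words
error = 100 * wrong / total        # share of wrongly recognized words

print(f"{accuracy:.2f}% correct, {error:.2f}% wrong")
```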
6. CONCLUSION
In this paper we have discussed the implementation of a morphological analyzer using Apertium and
Lttoolbox. At present the analyzer handles only inflectional morphology, since we exclude
derivational morphology, and we are working on nouns, pronouns, verbs and adverbs. Our current
dictionary can provide information only about suffixes.
From the previous work (Section 3) we can see that most of it uses the supervised suffix
stripping method. Only a limited number of reported works [5, 8] have implemented unsupervised
techniques for analyzing morphology. Here we have used the supervised finite-state transducer
(FST) method with the help of the Apertium engine, since finite-state transducers have many
advantages [13]. From a single source, an FST can work as a bidirectional engine for both
analysis and generation, and FSTs are fast (thousands of words per second) and compact.
Currently our morphological analyzer is at an initial stage. In the future we will extend our
work to the remaining grammatical categories, include derivational morphology, and populate the
dictionary with prefix information to get better performance.
ACKNOWLEDGEMENTS
The authors are thankful to the Department of Information Technology, Gauhati University, for
providing the corpus which helped us in building the MA system, and to the people of the Language
Technology Development Project, Gauhati University, for their immense support.
REFERENCES
[1] Goswami Golockchandra (1982), Structure of Assamese, Gauhati University publication.
[2] Bani Kanta Kakati (1962), Assamese, Its Formation and Development.
[3] Mona Parakh and Rajesha N, “Developing Morphological Analyzer for Four Indian Languages Using
A Rule Based Affix Stripping Approach”, Linguistic Data Consortium for Indian Languages, CIIL,
Mysore, 2011.
[4] Navanath Saharia, Utpal Sharma and Jugal Kalita, “A Suffix based Noun and Verb Classifier for an
Inflectional Language”, International Conference on Asian Language Processing (IALP-10), China,
2010.
[5] Sharma, Utpal and Kalita, Jugal K and Das, Rajib K., “Acquisition of Morphology of an Indic
Language from Text Corpus”, ACM Transactions of Asian Language Information Processing
(TALIP), vol 7, no. 3, article 9, p 1-33, August 2008.
[6] Navanath Saharia, Utpal Sharma and Jugal Kalita, “Analysis and Evaluation of Stemming
Algorithms: A case study with Assamese”, Proceedings of the International Conference on Advances
in Computing, Communications and Informatics, Pages 842-846, Chennai, 2012.
[7] Navanath Saharia, Kishori M. Konwar, Utpal Sharma and Jugal Kalita, “An Improved Stemming
Approach Using HMM for a Highly Inflectional Language”, Proceedings of 14th International
Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Pages 164-173,
Samos, Greece, 2013.
[8] Utpal Sharma, Unsupervised Learning of Morphology of a Highly Inflectional Language, Ph.D.
Thesis, 2006.
[9] Giridhar Sarma (1952), Asamiya Abhidhan.
[10] James Allen, Natural Language Understanding, Second Edition, Pearson Education India, ISBN:
8131708950.
[11] (2014) Apertium Wikipedia Page, Available: http://en.wikipedia.org/wiki/Apertium
[12] (2014) Apertium Wiki, Monodix Basics Page, Available: http://wiki.apertium.org/wiki/Monodix_basics
[13] (2014) Finite state transducer Wikipedia Page, Available:
http://en.wikipedia.org/wiki/Finite_state_transducer
[14] Mikel L. Forcada, Boyan Ivanov Bonev, Sergio Ortiz Rojas, Juan Antonio Pérez Ortiz, Gema
Ramírez Sánchez, Felipe Sánchez Martínez, Carme Armentano-Oller, Marco A. Montava, Francis M.
Tyers (March 10, 2010), “Documentation of the Open-Source Shallow-Transfer Machine Translation
Platform Apertium”, [Online] Departament de Llenguatges i Sistemes Informàtics, Universitat
d'Alacant, Available: http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf, [Accessed 27th
April 2014].
[15] Kalyanee Kanchan Baruah, Pranjal Das, Abdul Hannan, Shikhar Kr Sarma, “Assamese-English
Bilingual Machine Translation”, International Journal on Natural Language Computing (IJNLC),
Vol. 3, No. 3, June 2014.
Authors
Mirzanur Rahman: PhD Scholar, Department of Information Technology, Gauhati
University.
Shikhar Kr. Sarma: Head of the Department, Department of Information
Technology, Gauhati University.

Novel cochlear filter based cepstral coefficients for classification of unvoi...
 
Evaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymyEvaluation of subjective answers using glsa enhanced with contextual synonymy
Evaluation of subjective answers using glsa enhanced with contextual synonymy
 
An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...
 
G2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian languageG2 pil a grapheme to-phoneme conversion tool for the italian language
G2 pil a grapheme to-phoneme conversion tool for the italian language
 
KANNADA NAMED ENTITY RECOGNITION AND CLASSIFICATION
KANNADA NAMED ENTITY RECOGNITION AND CLASSIFICATIONKANNADA NAMED ENTITY RECOGNITION AND CLASSIFICATION
KANNADA NAMED ENTITY RECOGNITION AND CLASSIFICATION
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION
 
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...A N H YBRID  A PPROACH TO  W ORD  S ENSE  D ISAMBIGUATION  W ITH  A ND  W ITH...
A N H YBRID A PPROACH TO W ORD S ENSE D ISAMBIGUATION W ITH A ND W ITH...
 
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
S ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAlS ENTIMENT A NALYSIS  F OR M ODERN S TANDARD  A RABIC  A ND  C OLLOQUIAl
S ENTIMENT A NALYSIS F OR M ODERN S TANDARD A RABIC A ND C OLLOQUIAl
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
CBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMERCBAS: CONTEXT BASED ARABIC STEMMER
CBAS: CONTEXT BASED ARABIC STEMMER
 
M ACHINE T RANSLATION D EVELOPMENT F OR I NDIAN L ANGUAGE S A ND I TS A PPROA...
M ACHINE T RANSLATION D EVELOPMENT F OR I NDIAN L ANGUAGE S A ND I TS A PPROA...M ACHINE T RANSLATION D EVELOPMENT F OR I NDIAN L ANGUAGE S A ND I TS A PPROA...
M ACHINE T RANSLATION D EVELOPMENT F OR I NDIAN L ANGUAGE S A ND I TS A PPROA...
 
T URN S EGMENTATION I NTO U TTERANCES F OR A RABIC S PONTANEOUS D IALOGUES ...
T URN S EGMENTATION I NTO U TTERANCES F OR  A RABIC  S PONTANEOUS D IALOGUES ...T URN S EGMENTATION I NTO U TTERANCES F OR  A RABIC  S PONTANEOUS D IALOGUES ...
T URN S EGMENTATION I NTO U TTERANCES F OR A RABIC S PONTANEOUS D IALOGUES ...
 
A Novel Approach for Recognizing Text in Arabic Ancient Manuscripts
A Novel Approach for Recognizing Text in Arabic Ancient Manuscripts A Novel Approach for Recognizing Text in Arabic Ancient Manuscripts
A Novel Approach for Recognizing Text in Arabic Ancient Manuscripts
 

Similar to An implementation of apertium based assamese morphological analyzer

Designing a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextDesigning a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextWaqas Tariq
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityijaia
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorWaqas Tariq
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYijaia
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...IOSR Journals
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 
Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Daniel Adenew
 
Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Trijan Faam
 
An Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop StemmerAn Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop Stemmerkevig
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesIJERA Editor
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodijnlc
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageWaqas Tariq
 
APMorph: finite-state transducer for Amazigh pronominal morphology
APMorph: finite-state transducer for Amazigh pronominal morphology  APMorph: finite-state transducer for Amazigh pronominal morphology
APMorph: finite-state transducer for Amazigh pronominal morphology IJECEIAES
 
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...kevig
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGEScsandit
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESLinda Garcia
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...ijaia
 

Similar to An implementation of apertium based assamese morphological analyzer (20)

Designing a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo TextDesigning a Rule Based Stemmer for Afaan Oromo Text
Designing a Rule Based Stemmer for Afaan Oromo Text
 
Using automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivityUsing automated lexical resources in arabic sentence subjectivity
Using automated lexical resources in arabic sentence subjectivity
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITYUSING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
USING AUTOMATED LEXICAL RESOURCES IN ARABIC SENTENCE SUBJECTIVITY
 
B0340710
B0340710B0340710
B0340710
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Bp4201446450
Bp4201446450Bp4201446450
Bp4201446450
 
Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...
 
Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)Derivational process in matbat language (jurnal ijhan)
Derivational process in matbat language (jurnal ijhan)
 
An Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop StemmerAn Unsupervised Approach to Develop Stemmer
An Unsupervised Approach to Develop Stemmer
 
Implementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar RulesImplementation Of Syntax Parser For English Language Using Grammar Rules
Implementation Of Syntax Parser For English Language Using Grammar Rules
 
Hps a hierarchical persian stemming method
Hps a hierarchical persian stemming methodHps a hierarchical persian stemming method
Hps a hierarchical persian stemming method
 
F017163443
F017163443F017163443
F017163443
 
Design of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa LanguageDesign of A Spell Corrector For Hausa Language
Design of A Spell Corrector For Hausa Language
 
APMorph: finite-state transducer for Amazigh pronominal morphology
APMorph: finite-state transducer for Amazigh pronominal morphology  APMorph: finite-state transducer for Amazigh pronominal morphology
APMorph: finite-state transducer for Amazigh pronominal morphology
 
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

An implementation of apertium based assamese morphological analyzer

International Journal on Natural Language Computing (IJNLC) Vol. 4, No.1, February 2015
DOI : 10.5121/ijnlc.2015.4102

AN IMPLEMENTATION OF APERTIUM BASED ASSAMESE MORPHOLOGICAL ANALYZER

Mirzanur Rahman and Shikhar Kumar Sarma
Department of Information Technology, Gauhati University, Guwahati, Assam, India

ABSTRACT

Morphological analysis is an important branch of linguistics for any Natural Language Processing technology. Morphology studies the structure and formation of the words of a language. In current NLP research, morphological analysis techniques are becoming increasingly popular. To process any language, the morphology of its words should be analyzed first. The Assamese language has a very complex morphological structure. In our work we have used Apertium based Finite-State Transducers to develop a morphological analyzer for the Assamese language within a limited domain, and we obtain 72.7% accuracy.

KEYWORDS

Assamese Language, Morphology, Natural Language Processing, FST

1. INTRODUCTION

Assamese is the major language spoken in Assam, a state in the north-eastern part of India. The Assamese language has served as a bridge language among the different speech communities across the whole state. Assamese is an Indo-Aryan language that originated from Vedic dialects [1]. The language as it stands today has passed through tremendous modifications in all its components, viz. phonology, morphology, conjunction etc. According to dialectal regions there are two varieties of Assamese [2], Eastern Assamese and Western Assamese. The two differ in phonology and morphology, but the written text is the same for all regions.

Morphology is an important component of any language. So, before any processing, we must first analyze the morphology of the words of that language.
Language processing technologies such as Machine Translation [15], parsing, POS tagging and text summarization require morphological analyzers to find the lexical components of a word, and lexical components are a very important part of the grammar of a language. In our work we have considered only the standard written Assamese text corpus and used the Apertium engine with Lttoolbox to build a morphological analyzer for the Assamese language. In this paper we discuss how we proceeded towards developing the morphological analyzer.

2. MORPHOLOGY OF A LANGUAGE

In Language Technology research, morphological analysis studies the structure of words and the word formation of a language.
Words in a language can be divided into many small units, which are known as morphemes [10]. Recognizing the different morphemes in a word together with their lexical properties is known as morphological analysis. For example, in English:

• Girls = Girl + s
  o Root: Girl (category: noun)
  o Affix: “s” (indefinite plural marker)

In the above example, the word “Girls” is a combination of the morpheme “Girl” and the morpheme “s”. When a word is analyzed with a morphological analyzer, the analyzer should provide all the combinations of morphemes with their lexical properties.

According to Golockchandra Goswami [1], all the morphemes of the Assamese language can be divided into three categories:

  o Root morphemes
  o Sub-root morphemes
  o Affix morphemes

Root morphemes are the main morphemes on which all morphological construction is based and to which affixes are attached. Root morphemes may be further categorized by part of speech, such as noun, pronoun, verb, indeclinable etc. For example, in Assamese:

• ল’ৰাজন = ল’ৰা + জন
  o Root: ল’ৰা (category: noun)
  o Affix: জন (definite article, singular case marker)

Sub-root morphemes are morphemes that can occur either as an independent root or as a suffix in an Assamese sentence. For example:

• খন, জন, ডাল etc. can be used as a root word or as a suffix for the singular definitive
  o মানুহজন বৰ ভাল (“জন” is used as a singular suffix marker)
  o মমাৰ জন আহহলললন? (“জন” is used as an independent root)
• মবাৰ/মবাৰৰ, হবলাক etc. can be used as a root word or as a suffix for the plural definitive
  o বস্তুমবাৰৰ দাম নাই৷ (“মবাৰৰ” is used as a plural suffix marker)
  o ম ামাললাকৰ মবাৰৰ হক খবৰ? (“মবাৰৰ” is used as an independent root)

Affixes are always added to a root and carry some meaning of their own, but they have no separate existence and can never form a free form, either alone or in conjunction with one another. For example: এ (মানুলহ), ক (মানুহক), লল (ম ামালল), লক (ভাললক) etc.

3. PRIOR ARTS
For the Assamese language we have also found some reported work on morphological analysis. In this section we summarize the reported work related to Assamese morphological analysis.

• In [3], the authors present the building of morphological analyzers using the suffix stripping method for four languages: Assamese, Bengali, Bodo and Oriya. The proposed mechanism deals only with inflectional suffixes. The method involves identifying individual suffixes from a series of suffixes attached to a stem/root, using morpheme sequencing rules. The analyzer splits the inflected form of a word into suffixes and stem by using a root/stem dictionary (for identifying legitimate roots/stems), a list of suffixes comprising all possible suffixes that the various categories can take (in order to identify a valid suffix), and the morpheme sequencing rules. The authors get 50% coverage for 7000 to 8000 root entries.
• In [4], the authors present “A Suffix-based Noun and Verb Classifier for an Inflectional Language”. The proposed mechanism considers only the morpho-syntactic properties of Assamese words. Assamese words can be categorized into inflected classes (noun, pronoun, adjective and verb) and un-inflected classes (adverb and particle).
• In [5], the authors describe an approach to unsupervised learning of morphology from an unannotated corpus for the Assamese language in their paper “Acquisition of Morphology of an Indic Language from Text Corpus”. The paper presents and discusses in detail an unsupervised method for acquiring Assamese morphology from a text corpus. This is the initial work towards unsupervised morphological analysis and it is very suitable for the Assamese language. The approach acquires the suffixation morphology of the language from a text corpus of about 300,000 words and builds a morphological lexicon. The F-measure of the suffix acquisition is about 69%.
• In [6], the authors present a suffix stripping approach to which they add a rule engine that generates all the possible suffix sequences for analyzing the morphology of a word. With this method they obtained 82% accuracy with a root-word list of approximately 20,000 entries.
• In [7], the authors combine a rule based algorithm and an HMM based algorithm. The rule based algorithm is used for predicting multiple-letter suffixes and the HMM based algorithm for predicting single-letter suffixes. The combined method can predict morphologically inflected words with 92% accuracy.
• In [8], Utpal Sarma proposed an unsupervised method for learning the morphology of a language in his Ph.D. thesis “Unsupervised Learning of Morphology of a Highly Inflectional Language”.

4. IMPLEMENTATION USING APERTIUM AND LTTOOLBOX
Apertium is a rule-based, open-source, shallow-transfer machine translation platform [11]. It is free software released under the terms of the GNU General Public License. It includes the engine, maintenance tools, and open linguistic data for several language pairs. Lttoolbox is a toolbox for lexical processing, morphological analysis and generation of words. Lttoolbox uses finite-state transducers (FSTs). FSTs are a type of finite-state automaton that may be used as one-pass morphological analyzers. In Apertium, the analyzer data is stored in Apertium’s dictionary (dix) format with XML syntax. The analyzer can easily be converted into a morphological generator from the single morphological database (monodix), depending on the direction in which the system reads the dictionary. If the system reads the dictionary from left to right we obtain the analyzer, and if it reads from right to left we obtain the generator. It is proven that an XML based dictionary (monodix) is generally faster than a normal text or database based dictionary. For creating a morphological analyzer, different modules of the Apertium engine are required.

4.1. Dictionary

An Apertium based system can use two types of dictionaries: monolingual and bilingual. The monolingual dictionary is used for the morphological analyzer and generator, and the bilingual dictionary is used for machine translation. In our work, we use a monolingual dictionary.

4.2. Paradigm definitions <pardef>

A paradigm is the complete set of related inflectional and productive derivational word forms of a given category. A paradigm can be understood as a small dictionary of alternative transformations that can be concatenated to the parts of words (or to entries of another paradigm) to specify regularities in the lexical processing of the dictionary entries, such as inflection regularities.
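The idea of a paradigm as a small dictionary of transformations concatenated to word parts can be illustrated with a toy model. The Python sketch below is not Apertium's dix format or API; the names PARADIGMS, LEXICON, expand and analyse, the paradigm name girl__n and the tag labels are all assumptions made for illustration, and the entry reuses the English example Girls = Girl + s from Section 2.

```python
# Toy model of a paradigm (an illustrative assumption, not Apertium's dix format).
# Each paradigm lists (surface suffix, tags) pairs shared by all words of a class.
PARADIGMS = {
    "girl__n": [("", "<n><sg>"), ("s", "<n><pl>")],
}

# Lexicon entries: (lemma, stem, paradigm name). Only the stem is stored;
# the inflected forms come from the paradigm.
LEXICON = [("girl", "girl", "girl__n")]


def expand(lexicon, paradigms):
    """Concatenate each stem with the suffixes of its paradigm.

    Returns a list of (surface form, analysis) pairs.
    """
    pairs = []
    for lemma, stem, paradigm_name in lexicon:
        for suffix, tags in paradigms[paradigm_name]:
            pairs.append((stem + suffix, lemma + tags))
    return pairs


def analyse(word, pairs):
    """Return every analysis whose surface form equals the given word."""
    return [analysis for surface, analysis in pairs if surface == word]


PAIRS = expand(LEXICON, PARADIGMS)
print(analyse("girls", PAIRS))  # ['girl<n><pl>']
```

Adding another noun that inflects the same way would take only one more LEXICON entry pointing at the same paradigm, which is the regularity a paradigm is meant to capture.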
In the definition, the root word is accompanied by other information such as category, gender, number, person, case marker and tense.
Figure 1: Dictionary entry of the Assamese word চকু (Eye) in a paradigm

4.3. Element for Reference to a Paradigm
Apertium provides a lexico-semantic layer for working with the inflection of a word. The layer introduces the lexemes into derivation and concurrently follows the inflection of the derived lexeme. It is used inside a <pardefs> entry. The main advantage of a reference paradigm is that there is no need to write all the inflected forms of a lemma in a morphological dictionary entry, because they can be referred to from other paradigms.
Figure 2: XML format for a reference paradigm

4.4. Morpheme
All the root words (morphemes) are included in the dictionary generated for the morphological analyzer. This dictionary differs from a conventional dictionary because, along with each morpheme, it stores other information such as its lexical category and the corresponding paradigm.
Figure 3: Dictionary entry of morphemes with their lexical categories

4.5. Lttoolbox Modules
Lttoolbox contains three modules: compilation (lt-comp), processing, i.e. morphological analysis and generation (lt-proc), and expansion (lt-expand). Morphological analysis requires the lt-comp and lt-proc modules, lt-comp for compilation and lt-proc for processing [12]. The lt-comp module compiles the morphological dictionaries into its own finite-state representation, and the lt-proc module processes input data with the compiled dictionary to produce the required output.
4.5.1. Compilation: The lt-comp module compiles the given .dix file into binary format, either from left to right (lr) or from right to left (rl). Compiling with lr creates an analyzer, and compiling with rl creates a generator.
Syntax of lt-comp:
$ lt-comp lr apertium-asm.morph.dix analyser.bin
This compiles the apertium-asm.morph.dix dictionary in a left-to-right manner into the binary analyser.bin.
$ lt-comp rl apertium-asm.morph.dix generator.bin
This compiles the apertium-asm.morph.dix dictionary in a right-to-left manner into the binary generator.bin.
4.5.2. Processing: The lt-proc module has two functions, analysis (the default mode) and generation. Analysis converts a surface form into the set of possible lexical forms, while generation converts a lexical form into the corresponding surface form.
Syntax of lt-proc:
$ echo "চকুযুহৰ" | lt-proc analyser.bin
Output: ^চকুযুহৰ/চকু<n><pl>$
Here we analyze the Assamese word চকুযুহৰ (Eyes) with the left-to-right binary dictionary analyser.bin.
$ echo "^চকু<n><pl>$" | lt-proc -g generator.bin
Output: চকুযুহৰ
Here we generate the plural form of the Assamese word চকু (Eye) with the right-to-left binary dictionary generator.bin.

4.6. Meaning of the Analyzer's Output Format
Figure 4: Meaning of output format (Morphological Analyzer)

4.7. Current Dictionary
In our current work we have considered only a limited number of words from selected part-of-speech (POS) categories. The following table summarizes the database entries used.
Table 1: Number of Entries in XML Dictionary (apertium-asm.morph.dix)
Main Category    Entry in dictionary
Noun             22368
Pronoun          121
Verb             1844
Adverb           232
Our dictionary was built from the following sources:
• Assamese text corpus obtained from the Language Technology Development Project, Gauhati University
• Asamiya Abhidhan [9]
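Per-category counts like those in Table 1 could be derived from a monodix file along the following lines. This is only a sketch under stated assumptions: the paper does not reproduce apertium-asm.morph.dix or say how the counts were obtained, so the sample entries, the paradigm names, and the `stem__category` naming convention used to read off the category are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hypothetical monodix fragment; the entries and paradigm names are
# invented for illustration and are not taken from the authors' dictionary.
SAMPLE_DIX = """<dictionary>
  <pardefs>
    <pardef n="চকু__n"/>
    <pardef n="মই__prn"/>
  </pardefs>
  <section id="main" type="standard">
    <e lm="চকু"><i>চকু</i><par n="চকু__n"/></e>
    <e lm="মালা"><i>মালা</i><par n="চকু__n"/></e>
    <e lm="মই"><i>মই</i><par n="মই__prn"/></e>
  </section>
</dictionary>"""


def count_by_category(dix_text):
    """Count main-section entries per category suffix of their paradigm name."""
    root = ET.fromstring(dix_text)
    counts = Counter()
    for entry in root.findall("./section/e"):
        par = entry.find("par")
        if par is None:
            continue  # entry without a paradigm reference
        # Assumed convention: the category follows the last "__" in the name.
        category = par.get("n", "").rsplit("__", 1)[-1]
        counts[category] += 1
    return counts


counts = count_by_category(SAMPLE_DIX)
print(dict(counts), sum(counts.values()))
```

For reference, the four counts in Table 1 add up to 22368 + 121 + 1844 + 232 = 24565 entries.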
All dictionary entries were made manually using Notepad++. The lexical categories of the root words were verified by linguists so that the output of the analyzer is correct.

5. EVALUATION AND TEST RESULT
As Table 1 shows, the dictionary contains only 24565 entries, which is not a large number. Since the work is ongoing, we expect to have an XML dictionary with a much larger number of entries in the future. Here we have considered only the most frequently used words in the Assamese language.
So far we have not added any rules for lexical selection to the Apertium engine, which is why the analyzer sometimes cannot analyze a word properly. For example, the Assamese word জন can be used as a suffix or as a person's name. Most of the time জন (man in general; definite article) is used as a suffix in Assamese sentences, but if someone uses it as a person's name (although জন (John; proper noun) is not commonly used as a name in Assam), our analyzer cannot give the proper analysis.
Another problem with this analyzer is that it cannot properly analyze a word that has more than one meaning depending on the situation and its position within the sentence. Examples are the Assamese words মালা (Garland) and অনল (Fire): both can be used as a material noun or as a proper noun, depending on how the word is used in the sentence.
In the testing phase we used a set of data collected from different Assamese blogs and web pages, containing 1120 words after cleaning. The text is first tokenized and then passed through a cleaning process (removing stop words, delimiters and extra white space) implemented in Java. We then pass the words one by one to the Apertium engine for analysis and store the results in a text file. The text file is checked manually for the correctness of the results.
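The test procedure described above can be sketched as follows. The authors implemented their cleaning step in Java; this sketch uses Python instead, and the delimiter set, the function names and the stop-word handling are assumptions. The call to lt-proc follows the command line shown in Section 4.5.2 and needs Lttoolbox and a compiled analyser.bin to run; the manual correctness check is not automated here.

```python
import re
import subprocess

# Assumed delimiter set, including the danda used as a sentence terminator.
DELIMITERS = "।,.;:!?\"'()[]"

# Matches lt-proc analysis output of the form ^surface/lexical1/lexical2$
ANALYSIS_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def clean(text, stop_words=frozenset()):
    """Split on white space, strip delimiters, drop empties and stop words."""
    tokens = (tok.strip(DELIMITERS) for tok in text.split())
    return [tok for tok in tokens if tok and tok not in stop_words]


def parse_analysis(line):
    """Return (surface, [lexical forms]) or None if the line does not match."""
    match = ANALYSIS_RE.search(line)
    if match is None:
        return None
    surface, rest = match.groups()
    return surface, rest.split("/")


def is_unknown(lexical_forms):
    """lt-proc marks words it cannot analyse with a leading '*'."""
    return all(form.startswith("*") for form in lexical_forms)


def run_lt_proc(word, binary="analyser.bin"):
    """Analyse one word with lt-proc (requires Lttoolbox to be installed)."""
    result = subprocess.run(["lt-proc", binary], input=word + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def analyse_all(words, analyse=run_lt_proc):
    """Pass words one by one to the analyser and collect parsed results."""
    return {word: parse_analysis(analyse(word)) for word in words}
```

Passing a different `analyse` callable makes it possible to try the pipeline without Lttoolbox installed; the collected results would then be written to a text file for manual checking.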
The results are shown below.
Table 2: Test Result
Total words             1120
Correctly recognized    815
Wrongly recognized      305
As the table shows, the analyzer gives correct results for only 72.7% of the words. The remaining 305 words (about 27%) are wrongly recognized because of the limited number of database entries, the unavailability of lexical rules for selecting the proper category, and the limited number of POS categories.

6. CONCLUSION
In this paper we have discussed the implementation of a morphological analyzer using Apertium and Lttoolbox. At present the analyzer handles only inflectional morphology, since we exclude derivational morphology, and it covers nouns, pronouns, verbs and adverbs. Our current dictionary provides information only about suffixes. From the previous work (Section 3) we can see that most work has used supervised suffix-stripping methods; only a limited number of reported works [5, 8] have implemented unsupervised techniques for analyzing morphology. Here we have used a supervised finite-state transducer (FST) method with the help of the Apertium engine, since finite-state transducers have many advantages [13]. From a single source, an FST can work as a bidirectional engine for both analysis and generation, and FSTs are fast (thousands of words per second) and compact.
Currently our morphological analyzer is at an initial stage. In the future we will extend our work to the remaining grammatical categories, include derivational morphology, and populate the dictionary with prefix information to get better performance.

ACKNOWLEDGEMENTS
The authors are thankful to the Department of Information Technology, Gauhati University, for providing the corpus, which helped us in building the MA system, and to the people of the Language Technology Development Project, Gauhati University, for their immense support.

REFERENCES
[1] Goswami Golockchandra (1982), Structure of Assamese, Gauhati University publication.
[2] Bani Kanta Kakati (1962), Assamese, Its Formation and Development.
[3] Mona Parakh and Rajesha N, “Developing Morphological Analyzer for Four Indian Languages Using A Rule Based Affix Stripping Approach”, Linguistic Data Consortium for Indian Languages, CIIL, Mysore, 2011.
[4] Navanath Saharia, Utpal Sharma and Jugal Kalita, “A Suffix based Noun and Verb Classifier for an Inflectional Language”, International Conference on Asian Language Processing (IALP-10), China, 2010.
[5] Utpal Sharma, Jugal K. Kalita and Rajib K. Das, “Acquisition of Morphology of an Indic Language from Text Corpus”, ACM Transactions of Asian Language Information Processing (TALIP), vol. 7, no. 3, article 9, pp. 1-33, August 2008.
[6] Navanath Saharia, Utpal Sharma and Jugal Kalita, “Analysis and Evaluation of Stemming Algorithms: A case study with Assamese”, Proceedings of the International Conference on Advances in Computing, Communications and Informatics, pp. 842-846, Chennai, 2012.
[7] Navanath Saharia, Kishori M. Konwar, Utpal Sharma and Jugal Kalita, “An Improved Stemming Approach Using HMM for a Highly Inflectional Language”, Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), pp. 164-173, Samos, Greece, 2013.
[8] Utpal Sharma, Unsupervised Learning of Morphology of a Highly Inflectional Language, Ph.D. Thesis, 2006.
[9] Giridhar Sarma (1952), Asamiya Abhidhan.
[10] James Allen, Natural Language Understanding, Second Edition, Pearson Education India, ISBN: 8131708950.
[11] (2014) Apertium Wikipedia Page, Available: http://en.wikipedia.org/wiki/Apertium
[12] (2014) Apertium Wiki, Monodix basics page, Available: http://wiki.apertium.org/wiki/Monodix_basics
[13] (2014) Finite state transducer Wikipedia Page, Available: http://en.wikipedia.org/wiki/Finite_state_transducer
[14] Mikel L. Forcada, Boyan Ivanov Bonev, Sergio Ortiz Rojas, Juan Antonio Pérez Ortiz, Gema Ramírez Sánchez, Felipe Sánchez Martínez, Carme Armentano-Oller, Marco A. Montava, Francis M. Tyers (March 10, 2010), “Documentation of the Open-Source Shallow-Transfer Machine Translation Platform Apertium”, [Online] Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, Available: http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf [Accessed 27th April 2014]
[15] Kalyanee Kanchan Baruah, Pranjal Das, Abdul Hannan, Shikhar Kr Sarma, “Assamese-English Bilingual Machine Translation”, International Journal on Natural Language Computing (IJNLC), Vol. 3, No. 3, June 2014.

Authors
Mirzanur Rahman: PhD Scholar, Department of Information Technology, Gauhati University.
Shikhar Kr. Sarma: Head of the Department, Department of Information Technology, Gauhati University.