SlideShare a Scribd company logo
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
1
Development of Morphological Analyzer for Hindi
Mayuri Rastogi
Department of Computer Science
Amity University
Lucknow, India
Pooja Khanna
Faculty in Department of Computer Science
Amity University
Lucknow, India
ABSTRACT
One of the important phase of Natural language Processing is
Morphological Analysis that helps in work of machine
translation. Effective implementation of morphological
analyzer can be seen in language which is rich in morphemes.
Hindi being an inflected language has capability of generating
hundreds of words from the root word. It is morphologically
rich language, due to wide variety of words available in Hindi.
This paper is primarily concerned with the design of a
morphological analyzer for Hindi language. The input to this
analyzer will be a Hindi word or sentence and after doing the
proper analysis it will return the root word along with its
feature as output. The features will have categories like part of
speech, gender, number, and person. Two approaches will be
followed by analyzer-rule based and corpus based. Till now
none of the developed morphological analyzers have worked
for both inflectional and derivational morphemes.
Keywords
Morpheme inflected, root word, corpus.
1. INTRODUCTION
Basically there are five stages of NLP namely-morphological
analysis, syntactic analysis, semantic analysis and pragmatic
analysis. Morphological analysis is concerned with analyzing
individual words into their components. There are basically
two classes of morphology-
Class 1- inflectional morphology
Class 2-derivational morphology
In inflectional morphology the new word formed is of same
class as that of previous word i.e. if previous word is of class
noun then after adding affixes to it will remain a noun. This
can be explained by a example as- घड़ी (Noun) becomes घडिय ॉँ
(Noun) on adding ि य as suffix
In derivational morphology, the new word formed will be of
different class than the previous word or let’s say root word.
For example- मूर्ख (fool) (Adj) becomes मूर्खता (foolishness)
(Noun) on adding ता as suffix.
The objective of this paper is to present a idea about a
morphological analyzer for Hindi language that works on both
classes of morphemes. Also this paper aim to present a
effective morphological analyzer as till now most of the
morphological analyzers have been developed either for
inflectional or for derivational morpheme. In this paper we
will follow two approaches for development of my
morphological analyzer.
Approach 1- corpus based
Approach 2- rule based used for derivational morphology.
The first approach follows a database that stores the root word
along with some of the features of root word like-gender
(masculine or feminine), category (noun, pronoun, verb,
adverb, adjective, particles, connectives and interjections) and
number (singular or plural).
The second approach takes into consideration careful analysis
of Hindi word and then formation of rules which put the word
in certain category specified earlier. Before applying any of
the above approaches careful study of inflected and
derivational word of Hindi language was done so that the
developed analyzer works for both kinds of words.
In the below sections, other related works, structure of Hindi
sentences/words and approach followed is explained.
2. RELATED WORK
There is no morphological analyzer that works for both
inflectional and derivational morphology to the best of our
knowledge.
However, morphological analyzer and generator for
inflectional morphology is discussed in paper by Vishal
Goyal,Gurpreet S. Lehal[1] in which they focused on the time
taken to search for a word in database to know the
grammatical information of word. But this method was
limited for languages that have finite or less number of
inflections.
Niraj Aswani and Robert Gaizauskas[2] presented a rule-
based morphological analyzer. The approach uses suffix
replacement rule for finding out the root word from its
inflected form. The major drawback in their work was that not
all rules produced correct results.
Deepak kumar, Manjeet Singh and Seema Shukla[3]
presented paper with idea to use FST(Finite State Transducer)
for developing morphological analyzer for Hindi language.
They discussed two methodologies for development of
morphological analyzer- lexicon generator and generation of
morphological processor.
Nikhil Kamparthi, Abhilash Inumella and Dipti
MisraSharma[4] presented an algorithm to upgrade Hindi
inflectional analyzer to derivational. The algorithm used the
principle of Porter’s stemmer and Krovetz stemmer. The
approach followed included study of Hindi derivatives,
defining derivational rules, using Wikipedia for confirming
genuine data n devising algorithm for derivational analysis.
The paper on “The effect of dictionary and lexicon to
morphological analysis” by Mohd Yunus Sharum[5]
highlights that use of lexicon in morphological analyzer
improves the correctness and performance of the analyzer (in
terms of quality of output).
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
2
The lightweight stemmer for Hindi language developed by
Ramanathan and Rao[6] was based on hand-crafted rules.
After careful observation of Hindi words for noun, verbs,
adjectives and adverbs they produced list of suffixes for each
category.
3. STRUCTURE OF HINDI WORDS
To develop morphological analyzer for Hindi language it is
important to know about the actual structure of Hindi words
and sentences-how they are formed, their special
characteristics etc. Hindi is quite similar to other Indo-Aryan
languages in terms of linguistic characteristics. There are ten
vowels in Hindi language. The Hindi syllable contains a
vowel as it nucleus, followed or preceded by consonants.
Words usually have two or three syllable.
Hindi morphological structure of Hindi consists of various
word classes that include description about their derivational
and inflection forms. In the forthcoming sections, details
about the word classes are given which are referred from book
“Modern Hindi Grammar” by Omkar K.Koul [11 ]
3.1 Nouns
Nouns in Hindi are inflected for gender, number and case.
3.1.1 Gender
There are three types of nouns:
Type I have masculine nouns ending with आ/a: /
Types II have all other masculine nouns.
Types III have all other feminine nouns.
Generally आ/a/ ending masculine nouns have feminine
forms ending in ई/i: /
Masculine Feminine
लडका Ladka boy लडकी Ladki girl
चरर्ा Charkha wheel चरऱ्ी Charkha Reel
ई/i: / ending animate masculine nouns have their feminine
forms ending in -अन/-an/
Masculine Feminine
तेली Teeli Oilrman तेलन Teelan oilwoman
माली Maali Gardener मालन Maalan Gardener
Nouns ending in आ/a:/ form their feminine by replacing आ/a:/
with –इया/-iya:/
Masculine Feminine
गुड्िा Gudda Boy
toy
गुडिया Guddiya Girl
toy
चूहा Chuha Rat चुहहया Chuhiya Rat
Generally –आ/a: / ending nouns are masculine are replaced by
- ई/-i: / to form feminine
Masculine Feminine
राजा Raja King राऩी Rani Queen
बिल्ला Billa Cat बिल्ली Billi Cat
Add suffix ऩी/ni: / to the masculine nouns to form the
feminine
Masculine Feminine
िाक्टर Docter Docter िाक्टरऩी Doctorni docter
Add suffix ई/-i: / to the masculine noun to form feminine
Masculine Feminine
देव dev God देव़ी devi goddess
नट nat Acrobat नटी nati Showgirl
3.1.2 Number
Singular and plural are the two types of number.
The आ/a: ending masculine noun (including pronoun and
adjective) with some exceptions change into plural ending
with ए/e/
Singular Plural
च़ीता Leopard च़ीते Leopards
छाता Umbrella छाते Umbrellas
*पिता Father पिता Fathers
*नेता Leader नेता Leaders
* remain same in their plural form
All other consonants and other vowel-ending nouns do not
change their plural forms.
मच्छर Mosquito
गगलहरी Squirrel
To form feminine plurals add the suffix एं/ẽ/ to the consonant-
ending singular forms
Singular Plural
िुस्तक Book िुस्तकें Books
Feminine nouns ending with ई on adding इयााँ becomes plural
Singular Plural
नारी Woman नाररय Women
सऱ्ी Friend सखर्य Friends
In this last vowel of the stem is removed.
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
3
3.1.3 Noun Derivation
Nouns in Hindi are derived from nouns, adjectives and verbs
by using suffixes.
3.1.3.1 Nouns from Nouns:
Suffixes are दार–da:r ,गर-gar and दान –da:n
थाना दार थानेदार Inspector
सौदा गर सौदागर Merchant
र्जाना च़ी र्जानच़ी Cashier
रोशन दान रोशनदान Window
गुसल र्ाना गुसलर्ाना Bathroom
3.1.3.2 Nouns from Adjectives:
Suffixes for this purpose are ई –i, ता –t:a ,pan, आई-a:I, इयत-
iyat, आस-a:s
र्ुश Happy र्ुश़ी Happiness
िुरा Bad िुराई Badness
एक One एकता Unity
िालक Boy िालकिन Childness
आदम़ी Man आदममयत Humanly
र्ट्टा Sour र्टास Sourness
3.1.3.3 Nouns from Verbs
To derive nouns from verbs suffixes used are अस-as, अन-an,
ई-ee, वत-vat, ना-na
िढ़ Read िढ़ना Reading
धडक Throb धिकन Throbbing
लड Quarrel लडाई Dispute
िना Make िनावट Shape
3.2. Adjectives-
In Hindi these are classified as inflected and uninflected.
3.2.1 Inflected Adjectives:
Masculine Feminine
Singular Plural Singular/plural
भूर् भूर्े भूऱ्ी
गोरा गोरे गोरी
मोटा मोटे मोटी
Uninflected Adjectives:
सुन्दर लडका/लडकी Sundar
ladka/ladki
beautiful boy/girl
सफे द आदम़ी/औरत Safed aadmi/aurat white man/woman
3.2.2 Derivation of Adjectives:
Adjectives are derived from nouns by adding suffixes आ-aa,
ई-I, उ-u, ईला-ila , लू-lu , इक-ik, जनक-janak, दाई-daai, मय-
may, वन-van, आना-ana, नाक-nau, ईन-inn, मंद-mand, दार-dar.
Noun Adjective
सच truth सच्चा truthful
िाजार market िाजारू common
रस juice रस़ीला juicy
दया mercy दयालु Kind
हदन day दैनक daily
िल strength िलवान strong
पवदेश foreign पवदेश़ी foreigner
3.3 Verbs
There are two types of verbs: main verb and auxiliary verb.
Main verb is classified as simple, conjunct and compound
verb.
Verbal construction is classified as:
3.3.1 Intransitive verb
3.3.2 Transitive verb
3.3.3 Ditransitve verb
3.3.4 Causative verb
3.3.5 Dative verb
3.3.6 Conjuct verb
3.3.7 Compound verb
Here we have considered intransitive, transitive and
ditransitive verb in our database.
3.3.1 Intransitive verbs
They are like आ-aa, जा-ja, उठ-uth, िैठ-baith, they do not take
direct object and are not marked by any preposition in present
or future tense.
Eg.-
अममत घर जाएगा
Amit ghar jae:ga
Amit will go home
3.3.2 Transitive verbs
They are derived from intransitive verbs by certain vocalic
changes to the verb roots.
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
4
Eg.-
Intransitive Transitive
मर मार
पिस be ground ि़ीस Grind
घूम go round घुमा turn around
In some cases beside vocal changes consonantal changes also
take place.
Eg-
Intransitive Transitive
फाड Be torn फाडना Tear
िेच Be sold िेचना Sell
3.3.2 Ditransitive verb
They are like देना dena(to give), सुना (to tell), िेचना becna(to
sell). They take three arguments-subject, object and indirect
object. Indirect objects are in dative while rest follow
transitive pattern.
Eg-
राम ने रीता को िुस्तक दी
Ram gave book to Rita.
4. APPROACH FOLLOWED
The methodology used by me for development of
morphological analyzer for Hindi can be explained as-
Tokenize
Fig 1: Block diagram of proposed methodology
Thus, from above the methodology followed is quite clear.
The proposed morphological analyzer works for both- Hindi
words and sentences.
If the input to analyzer is a sentence, the sentence is tokenized
into words. The word is then searched in database or the
maintained corpa.
Database is maintained as-
INPUT {Word_id(Primary key), name, W_Word Category,
Gender, Number}
Database stores the root word along with some features like
category, gender etc. also it is desirable to store some
common or frequently used common noun in database for
better accuracy however it is a tedious job. With space
nowadays of no major issue, searching time is primarily of
concern. Efficient searching technique need to be implied for
effective use of maintained database or dictionary. The prime
focus will be on nouns except proper noun.
If the tokenized word is not found in database then rules are
applied and the root word along with its features is displayed
Rules here are not learnt automatically.
5. CONCLUSION AND RESULT
ANALYSIS
The methodology illustrated above can be analyzed with the
examples of Hindi word-
ILLUSTRATION I:
INPUT- ननिरता
CHECK if the input is a word or sentence. Here it is a word,
so go and search for match in database. If match found then
find the features of word given as input (rules) and display
morphemes of the input word along with the features of the
word.
OUTPUT-
ननिरता = नन(prefix) + िर(Category) + Feminine(Gender) +
Any(Number)+ ता (Suffix)
ILLUSTRATION II:
INPUT- “मनु ध़ीरे चलता है”
CHECK if the input is word or sentence. Here input is a word,
so sentence is tokenized into individual tokens or words and
then matching of each word with the databade takes place.
The first token मनु was serached in database, match found and
display features of मनु. Move to next token which is ध़ीरे and
repeat above procedure. Similarly do for third and fourth
token which are चलता and है respectively.
OUTPUT-
मनु = मनु (rootword) + Noun(Category) +
Mascuilne(Gender) + Singular(Number)
ध़ीरे = ध़ीरे (rootword) + Adj(Category) + Any(Gender) +
Singular(Number)
चलता = चलता (rootword)+ Verb(Category) +
Mascuilne(Gender) + Singular(Number)
है= indeclinable
Input
Sentence
Word
Rule based
approach
Corpus based
approach
Rules for-
 Noun
 Pronoun
 Verb
 Adjective
Database
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
5
ILLUSTRATION III:
INPUT- सरलो
CHECK if the input is word. Here it is a word indeed but no
match is found in the database as there is no word like सरलो
in Hindi langugae however सरल is a word so invalid word is
given out as output.
OUTPUT-
Invalid word.
5.1 Conclusion
The paper presents a method for developing Hindi
morphological analyzer that works on inflectional and
derivational morphemes by using rule based and dictionary or
corpus based approach. The commonly used words along with
their features are stored in corpus. Time and accuracy of result
is preferred over memory space. But memory space is not a
problem these days, neither in terms of cost nor in terms of
storage requirements- so this method can be used to get good
results. Moreover, this method has an advantage over other
suffix trimming or using Porter’s stemmer and Krovetz
stemmer as it gives root word along with the features rather
than just giving root word as the result. The approach
discussed in this paper has capability of working on either
type of morphology.
6. ACKNOWLEDGEMENTS
This research paper is made possible through the help and
support from parents, teachers and friends. I dedicate my
acknowledgement of gratitude towards my guide ‘Ms. Pooja
Khanna’ for her kind support and advice on every step. My
thanks to the experts who have contributed towards
development of this paper and to my parents and family,
without whose financial and moral support, this work would
not have been materialized.
7. REFERENCES
[1] Vishal Goyal, Gurpreet Singh Lehal,”Hindi
morphological Analyzer and generator”, First
International Conference on Emerging Trends in
Engineering and Techonology, USA,pp.1156-1159,
2008.
[2] NirajAswani, Robert Gaizauskas, “Developing
Morphological Analyzers for South Asian Languages:
Experimenting with the Hindi and Gujrati Languages”,
Proceedings of the Seventh International Conference on
Language Resources and Evaluation, Valleta, Malta
pp.811-815, May, 2010.
[3] Deepak Kumar, Manjeer Singh, Seema Shukla, “FST
based Morphological Analyzer for Hindi Language”,
Internatinal Journal of Computer Science Issues(IJCSI),
Vol.9, pp. 349-353, July, 2012.
[4] Nikhil Kanuparthi, Abhilash Inumella, Dipti Misra
Sharma, “Hindi Derivational Morphological Analyzer”,
Proceedings of the twelfth meeting of the Special Interest
Group on Computational Morphology & Phonology,
Canada, pp.10-16, June, 2012.
[5] Mohd Yunus Sharum, ”The Effects of Dictionary and
Lexicon to Morphological Analysis”, International
Conference on Computer and Information
Application(ICCIA),pp 9-12,2010.
[6] A. Ramanathan and D.Rao, ”A Lightweight Stemmer for
Hindi”, ACM Transactions on Asian Language
Information Processing(TALIP),Vol.2, pp. 130-
142,2003.
[7] Gill Mandeep Singh, Lehal gurpreet Singh, Joshi S.S.,“A
full form Lexicon Based Morphological Analysis and
Generation tool”, Punjabi International Journal of
Cybermatics and Informatics, Hyderabad, India. October
2007, pp 38.
[8] Omkar N. Koul, “Modern Hindi Grammar”, Dunwoody
Press, USA, 2008.
[9] LTRC, IIIT Hyderabad http://ltrc.iiit.ac.in
[10] Teena Bajaj, Prateek Bhatia, “Semi supervised learning
Approach of Hindi Morphology”, All India Conferences
on Advances in Communication Computers, Control and
Knowledge Management (AICACCC-KM), Bahadurgarh
,Feb,2008.
IJCATM : www.ijcaonline.org

More Related Content

What's hot

Sense relations (linguistics)
Sense relations (linguistics)Sense relations (linguistics)
Sense relations (linguistics)
Erick Mwacha
 
morphemes
morphemesmorphemes
Transformational Grammar
Transformational GrammarTransformational Grammar
Transformational Grammar
Cristina Tamayo
 
Orgin of language
Orgin of languageOrgin of language
Orgin of language
mahilaaaaaaaaaa
 
Semantics
SemanticsSemantics
Semantics
desmunddantes
 
Morphology & Morpheme
Morphology & MorphemeMorphology & Morpheme
Morphology & Morpheme
Notre Dame University Bangladesh
 
Language families
Language familiesLanguage families
Language families
Dr. Mohsin Khan
 
Lexemes
LexemesLexemes
Lexemes
Antony Agape
 
MORPHOLOGY
 MORPHOLOGY  MORPHOLOGY
MORPHOLOGY
King Saud University
 
1. introduction to semantics
1. introduction to semantics1. introduction to semantics
1. introduction to semantics
Asmaa Alzelibany
 
Syntax
SyntaxSyntax
Ling304 assignment2 12127013
Ling304 assignment2 12127013Ling304 assignment2 12127013
Ling304 assignment2 12127013Amina Magaji
 
Morphological Analysis
Morphological AnalysisMorphological Analysis
Morphological Analysis
Akshat Pandey
 
The allomorphs
The allomorphsThe allomorphs
The allomorphs
alejaseba
 
Introduction to semantics
Introduction to semanticsIntroduction to semantics
Introduction to semantics
Novii CcissNuupypnggaahddhiititanium
 
Morphological productivity
Morphological productivityMorphological productivity
Morphological productivity
Dr. Mohsin Khan
 
Derivational and inflectional morphemes
Derivational and inflectional morphemesDerivational and inflectional morphemes
Derivational and inflectional morphemes
Dewi Maharani
 

What's hot (20)

Sense relations (linguistics)
Sense relations (linguistics)Sense relations (linguistics)
Sense relations (linguistics)
 
morphemes
morphemesmorphemes
morphemes
 
Transformational Grammar
Transformational GrammarTransformational Grammar
Transformational Grammar
 
General linguistics 1
General linguistics 1General linguistics 1
General linguistics 1
 
Orgin of language
Orgin of languageOrgin of language
Orgin of language
 
Semantics
SemanticsSemantics
Semantics
 
Morphology & Morpheme
Morphology & MorphemeMorphology & Morpheme
Morphology & Morpheme
 
Language families
Language familiesLanguage families
Language families
 
Lexemes
LexemesLexemes
Lexemes
 
MORPHOLOGY
 MORPHOLOGY  MORPHOLOGY
MORPHOLOGY
 
1. introduction to semantics
1. introduction to semantics1. introduction to semantics
1. introduction to semantics
 
Morphology Presentation
Morphology PresentationMorphology Presentation
Morphology Presentation
 
Syntax
SyntaxSyntax
Syntax
 
Traditional grammar
Traditional grammarTraditional grammar
Traditional grammar
 
Ling304 assignment2 12127013
Ling304 assignment2 12127013Ling304 assignment2 12127013
Ling304 assignment2 12127013
 
Morphological Analysis
Morphological AnalysisMorphological Analysis
Morphological Analysis
 
The allomorphs
The allomorphsThe allomorphs
The allomorphs
 
Introduction to semantics
Introduction to semanticsIntroduction to semantics
Introduction to semantics
 
Morphological productivity
Morphological productivityMorphological productivity
Morphological productivity
 
Derivational and inflectional morphemes
Derivational and inflectional morphemesDerivational and inflectional morphemes
Derivational and inflectional morphemes
 

Similar to Development of morphological analyzer for hindi

Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466IJRAT
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
ijcsa
 
Comparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systemsComparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systems
ijfcstjournal
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
Dhabal Sethi
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
ijfcstjournal
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzer
ijnlc
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiIJERA Editor
 
Analysis of anaphora resolution system for
Analysis of anaphora resolution system forAnalysis of anaphora resolution system for
Analysis of anaphora resolution system for
ijitjournal
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
kevig
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
ijnlc
 
Design of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizerDesign of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizer
csandit
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
cscpconf
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZERDESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
csandit
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical ApproachKannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical Approach
IJERA Editor
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Waqas Tariq
 
Word sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy wordsWord sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy words
ijnlc
 
Phonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: KannadaPhonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: Kannada
IJERA Editor
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
ijnlc
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
IOSR Journals
 

Similar to Development of morphological analyzer for hindi (20)

Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
 
Anaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer methodAnaphora resolution in hindi language using gazetteer method
Anaphora resolution in hindi language using gazetteer method
 
Comparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systemsComparative performance analysis of two anaphora resolution systems
Comparative performance analysis of two anaphora resolution systems
 
Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)Ijarcet vol-3-issue-3-623-625 (1)
Ijarcet vol-3-issue-3-623-625 (1)
 
Pronominal anaphora resolution in
Pronominal anaphora resolution inPronominal anaphora resolution in
Pronominal anaphora resolution in
 
An implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzerAn implementation of apertium based assamese morphological analyzer
An implementation of apertium based assamese morphological analyzer
 
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to PunjabiA Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
A Tool to Search and Convert Reduplicate Words from Hindi to Punjabi
 
Analysis of anaphora resolution system for
Analysis of anaphora resolution system forAnalysis of anaphora resolution system for
Analysis of anaphora resolution system for
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
 
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGESCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
SCORE-BASED SENTIMENT ANALYSIS OF BOOK REVIEWS IN HINDI LANGUAGE
 
Design of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizerDesign of a rule based hindi lemmatizer
Design of a rule based hindi lemmatizer
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
 
DESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZERDESIGN OF A RULE BASED HINDI LEMMATIZER
DESIGN OF A RULE BASED HINDI LEMMATIZER
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical ApproachKannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical Approach
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
Word sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy wordsWord sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy words
 
Phonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: KannadaPhonetic Dictionary for Natural Language Processing: Kannada
Phonetic Dictionary for Natural Language Processing: Kannada
 
Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...Developing links of compound sentences for parsing through marathi link gramm...
Developing links of compound sentences for parsing through marathi link gramm...
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
 

More from Made Artha

Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...
Made Artha
 
Comparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print imagesComparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print images
Made Artha
 
Comparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print imagesComparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print images
Made Artha
 
An automated health care computing model for continuous monitoring of patient...
An automated health care computing model for continuous monitoring of patient...An automated health care computing model for continuous monitoring of patient...
An automated health care computing model for continuous monitoring of patient...
Made Artha
 
A secure tunnel technique using i pv6 transition over ipv4 channel
A secure tunnel technique using i pv6 transition over ipv4 channelA secure tunnel technique using i pv6 transition over ipv4 channel
A secure tunnel technique using i pv6 transition over ipv4 channel
Made Artha
 
A cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storageA cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storage
Made Artha
 

More from Made Artha (6)

Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...Design of dfe based mimo communication system for mobile moving with high vel...
Design of dfe based mimo communication system for mobile moving with high vel...
 
Comparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print imagesComparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print images
 
Comparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print imagesComparative study of various enhancement techniques for finger print images
Comparative study of various enhancement techniques for finger print images
 
An automated health care computing model for continuous monitoring of patient...
An automated health care computing model for continuous monitoring of patient...An automated health care computing model for continuous monitoring of patient...
An automated health care computing model for continuous monitoring of patient...
 
A secure tunnel technique using i pv6 transition over ipv4 channel
A secure tunnel technique using i pv6 transition over ipv4 channelA secure tunnel technique using i pv6 transition over ipv4 channel
A secure tunnel technique using i pv6 transition over ipv4 channel
 
A cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storageA cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storage
 

Recently uploaded

Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 

Recently uploaded (20)

Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 

Development of morphological analyzer for hindi

  • 1. International Journal of Computer Applications (0975 – 8887) Volume 95– No.17, June 2014 1 Development of Morphological Analyzer for Hindi Mayuri Rastogi Department of Computer Science Amity University Lucknow, India Pooja Khanna Faculty in Department of Computer Science Amity University Lucknow, India ABSTRACT One of the important phase of Natural language Processing is Morphological Analysis that helps in work of machine translation. Effective implementation of morphological analyzer can be seen in language which is rich in morphemes. Hindi being an inflected language has capability of generating hundreds of words from the root word. It is morphologically rich language, due to wide variety of words available in Hindi. This paper is primarily concerned with the design of a morphological analyzer for Hindi language. The input to this analyzer will be a Hindi word or sentence and after doing the proper analysis it will return the root word along with its feature as output. The features will have categories like part of speech, gender, number, and person. Two approaches will be followed by analyzer-rule based and corpus based. Till now none of the developed morphological analyzers have worked for both inflectional and derivational morphemes. Keywords Morpheme inflected, root word, corpus. 1. INTRODUCTION Basically there are five stages of NLP namely-morphological analysis, syntactic analysis, semantic analysis and pragmatic analysis. Morphological analysis is concerned with analyzing individual words into their components. There are basically two classes of morphology- Class 1- inflectional morphology Class 2-derivational morphology In inflectional morphology the new word formed is of same class as that of previous word i.e. if previous word is of class noun then after adding affixes to it will remain a noun. This can be explained by a example as- घड़ी (Noun) becomes घडिय ॉँ (Noun) on adding ि य as suffix In derivational morphology, the new word formed will be of different class than the previous word or let’s say root word. For example- मूर्ख (fool) (Adj) becomes मूर्खता (foolishness) (Noun) on adding ता as suffix. The objective of this paper is to present a idea about a morphological analyzer for Hindi language that works on both classes of morphemes. Also this paper aim to present a effective morphological analyzer as till now most of the morphological analyzers have been developed either for inflectional or for derivational morpheme. In this paper we will follow two approaches for development of my morphological analyzer. Approach 1- corpus based Approach 2- rule based used for derivational morphology. The first approach follows a database that stores the root word along with some of the features of root word like-gender (masculine or feminine), category (noun, pronoun, verb, adverb, adjective, particles, connectives and interjections) and number (singular or plural). The second approach takes into consideration careful analysis of Hindi word and then formation of rules which put the word in certain category specified earlier. Before applying any of the above approaches careful study of inflected and derivational word of Hindi language was done so that the developed analyzer works for both kinds of words. In the below sections, other related works, structure of Hindi sentences/words and approach followed is explained. 2. RELATED WORK There is no morphological analyzer that works for both inflectional and derivational morphology to the best of our knowledge. However, morphological analyzer and generator for inflectional morphology is discussed in paper by Vishal Goyal,Gurpreet S. Lehal[1] in which they focused on the time taken to search for a word in database to know the grammatical information of word. But this method was limited for languages that have finite or less number of inflections. Niraj Aswani and Robert Gaizauskas[2] presented a rule- based morphological analyzer. The approach uses suffix replacement rule for finding out the root word from its inflected form. The major drawback in their work was that not all rules produced correct results. Deepak kumar, Manjeet Singh and Seema Shukla[3] presented paper with idea to use FST(Finite State Transducer) for developing morphological analyzer for Hindi language. They discussed two methodologies for development of morphological analyzer- lexicon generator and generation of morphological processor. Nikhil Kamparthi, Abhilash Inumella and Dipti MisraSharma[4] presented an algorithm to upgrade Hindi inflectional analyzer to derivational. The algorithm used the principle of Porter’s stemmer and Krovetz stemmer. The approach followed included study of Hindi derivatives, defining derivational rules, using Wikipedia for confirming genuine data n devising algorithm for derivational analysis. The paper on “The effect of dictionary and lexicon to morphological analysis” by Mohd Yunus Sharum[5] highlights that use of lexicon in morphological analyzer improves the correctness and performance of the analyzer (in terms of quality of output).
  • 2. International Journal of Computer Applications (0975 – 8887) Volume 95– No.17, June 2014 2 The lightweight stemmer for Hindi language developed by Ramanathan and Rao[6] was based on hand-crafted rules. After careful observation of Hindi words for noun, verbs, adjectives and adverbs they produced list of suffixes for each category. 3. STRUCTURE OF HINDI WORDS To develop morphological analyzer for Hindi language it is important to know about the actual structure of Hindi words and sentences-how they are formed, their special characteristics etc. Hindi is quite similar to other Indo-Aryan languages in terms of linguistic characteristics. There are ten vowels in Hindi language. The Hindi syllable contains a vowel as it nucleus, followed or preceded by consonants. Words usually have two or three syllable. Hindi morphological structure of Hindi consists of various word classes that include description about their derivational and inflection forms. In the forthcoming sections, details about the word classes are given which are referred from book “Modern Hindi Grammar” by Omkar K.Koul [11 ] 3.1 Nouns Nouns in Hindi are inflected for gender, number and case. 3.1.1 Gender There are three types of nouns: Type I have masculine nouns ending with आ/a: / Types II have all other masculine nouns. Types III have all other feminine nouns. Generally आ/a/ ending masculine nouns have feminine forms ending in ई/i: / Masculine Feminine लडका Ladka boy लडकी Ladki girl चरर्ा Charkha wheel चरऱ्ी Charkha Reel ई/i: / ending animate masculine nouns have their feminine forms ending in -अन/-an/ Masculine Feminine तेली Teeli Oilrman तेलन Teelan oilwoman माली Maali Gardener मालन Maalan Gardener Nouns ending in आ/a:/ form their feminine by replacing आ/a:/ with –इया/-iya:/ Masculine Feminine गुड्िा Gudda Boy toy गुडिया Guddiya Girl toy चूहा Chuha Rat चुहहया Chuhiya Rat Generally –आ/a: / ending nouns are masculine are replaced by - ई/-i: / to form feminine Masculine Feminine राजा Raja King राऩी Rani Queen बिल्ला Billa Cat बिल्ली Billi Cat Add suffix ऩी/ni: / to the masculine nouns to form the feminine Masculine Feminine िाक्टर Docter Docter िाक्टरऩी Doctorni docter Add suffix ई/-i: / to the masculine noun to form feminine Masculine Feminine देव dev God देव़ी devi goddess नट nat Acrobat नटी nati Showgirl 3.1.2 Number Singular and plural are the two types of number. The आ/a: ending masculine noun (including pronoun and adjective) with some exceptions change into plural ending with ए/e/ Singular Plural च़ीता Leopard च़ीते Leopards छाता Umbrella छाते Umbrellas *पिता Father पिता Fathers *नेता Leader नेता Leaders * remain same in their plural form All other consonants and other vowel-ending nouns do not change their plural forms. मच्छर Mosquito गगलहरी Squirrel To form feminine plurals add the suffix एं/ẽ/ to the consonant- ending singular forms Singular Plural िुस्तक Book िुस्तकें Books Feminine nouns ending with ई on adding इयााँ becomes plural Singular Plural नारी Woman नाररय Women सऱ्ी Friend सखर्य Friends In this last vowel of the stem is removed.
  • 3. International Journal of Computer Applications (0975 – 8887) Volume 95– No.17, June 2014 3 3.1.3 Noun Derivation Nouns in Hindi are derived from nouns, adjectives and verbs by using suffixes. 3.1.3.1 Nouns from Nouns: Suffixes are दार–da:r ,गर-gar and दान –da:n थाना दार थानेदार Inspector सौदा गर सौदागर Merchant र्जाना च़ी र्जानच़ी Cashier रोशन दान रोशनदान Window गुसल र्ाना गुसलर्ाना Bathroom 3.1.3.2 Nouns from Adjectives: Suffixes for this purpose are ई –i, ता –t:a ,pan, आई-a:I, इयत- iyat, आस-a:s र्ुश Happy र्ुश़ी Happiness िुरा Bad िुराई Badness एक One एकता Unity िालक Boy िालकिन Childness आदम़ी Man आदममयत Humanly र्ट्टा Sour र्टास Sourness 3.1.3.3 Nouns from Verbs To derive nouns from verbs suffixes used are अस-as, अन-an, ई-ee, वत-vat, ना-na िढ़ Read िढ़ना Reading धडक Throb धिकन Throbbing लड Quarrel लडाई Dispute िना Make िनावट Shape 3.2. Adjectives- In Hindi these are classified as inflected and uninflected. 3.2.1 Inflected Adjectives: Masculine Feminine Singular Plural Singular/plural भूर् भूर्े भूऱ्ी गोरा गोरे गोरी मोटा मोटे मोटी Uninflected Adjectives: सुन्दर लडका/लडकी Sundar ladka/ladki beautiful boy/girl सफे द आदम़ी/औरत Safed aadmi/aurat white man/woman 3.2.2 Derivation of Adjectives: Adjectives are derived from nouns by adding suffixes आ-aa, ई-I, उ-u, ईला-ila , लू-lu , इक-ik, जनक-janak, दाई-daai, मय- may, वन-van, आना-ana, नाक-nau, ईन-inn, मंद-mand, दार-dar. Noun Adjective सच truth सच्चा truthful िाजार market िाजारू common रस juice रस़ीला juicy दया mercy दयालु Kind हदन day दैनक daily िल strength िलवान strong पवदेश foreign पवदेश़ी foreigner 3.3 Verbs There are two types of verbs: main verb and auxiliary verb. Main verb is classified as simple, conjunct and compound verb. Verbal construction is classified as: 3.3.1 Intransitive verb 3.3.2 Transitive verb 3.3.3 Ditransitve verb 3.3.4 Causative verb 3.3.5 Dative verb 3.3.6 Conjuct verb 3.3.7 Compound verb Here we have considered intransitive, transitive and ditransitive verb in our database. 3.3.1 Intransitive verbs They are like आ-aa, जा-ja, उठ-uth, िैठ-baith, they do not take direct object and are not marked by any preposition in present or future tense. Eg.- अममत घर जाएगा Amit ghar jae:ga Amit will go home 3.3.2 Transitive verbs They are derived from intransitive verbs by certain vocalic changes to the verb roots.
  • 4. International Journal of Computer Applications (0975 – 8887) Volume 95– No.17, June 2014 4 Eg.- Intransitive Transitive मर मार पिस be ground ि़ीस Grind घूम go round घुमा turn around In some cases beside vocal changes consonantal changes also take place. Eg- Intransitive Transitive फाड Be torn फाडना Tear िेच Be sold िेचना Sell 3.3.2 Ditransitive verb They are like देना dena(to give), सुना (to tell), िेचना becna(to sell). They take three arguments-subject, object and indirect object. Indirect objects are in dative while rest follow transitive pattern. Eg- राम ने रीता को िुस्तक दी Ram gave book to Rita. 4. APPROACH FOLLOWED The methodology used by me for development of morphological analyzer for Hindi can be explained as- Tokenize Fig 1: Block diagram of proposed methodology Thus, from above the methodology followed is quite clear. The proposed morphological analyzer works for both- Hindi words and sentences. If the input to analyzer is a sentence, the sentence is tokenized into words. The word is then searched in database or the maintained corpa. Database is maintained as- INPUT {Word_id(Primary key), name, W_Word Category, Gender, Number} Database stores the root word along with some features like category, gender etc. also it is desirable to store some common or frequently used common noun in database for better accuracy however it is a tedious job. With space nowadays of no major issue, searching time is primarily of concern. Efficient searching technique need to be implied for effective use of maintained database or dictionary. The prime focus will be on nouns except proper noun. If the tokenized word is not found in database then rules are applied and the root word along with its features is displayed Rules here are not learnt automatically. 5. CONCLUSION AND RESULT ANALYSIS The methodology illustrated above can be analyzed with the examples of Hindi word- ILLUSTRATION I: INPUT- ननिरता CHECK if the input is a word or sentence. Here it is a word, so go and search for match in database. If match found then find the features of word given as input (rules) and display morphemes of the input word along with the features of the word. OUTPUT- ननिरता = नन(prefix) + िर(Category) + Feminine(Gender) + Any(Number)+ ता (Suffix) ILLUSTRATION II: INPUT- “मनु ध़ीरे चलता है” CHECK if the input is word or sentence. Here input is a word, so sentence is tokenized into individual tokens or words and then matching of each word with the databade takes place. The first token मनु was serached in database, match found and display features of मनु. Move to next token which is ध़ीरे and repeat above procedure. Similarly do for third and fourth token which are चलता and है respectively. OUTPUT- मनु = मनु (rootword) + Noun(Category) + Mascuilne(Gender) + Singular(Number) ध़ीरे = ध़ीरे (rootword) + Adj(Category) + Any(Gender) + Singular(Number) चलता = चलता (rootword)+ Verb(Category) + Mascuilne(Gender) + Singular(Number) है= indeclinable Input Sentence Word Rule based approach Corpus based approach Rules for-  Noun  Pronoun  Verb  Adjective Database
  • 5. International Journal of Computer Applications (0975 – 8887) Volume 95– No.17, June 2014 5 ILLUSTRATION III: INPUT- सरलो CHECK if the input is word. Here it is a word indeed but no match is found in the database as there is no word like सरलो in Hindi langugae however सरल is a word so invalid word is given out as output. OUTPUT- Invalid word. 5.1 Conclusion The paper presents a method for developing Hindi morphological analyzer that works on inflectional and derivational morphemes by using rule based and dictionary or corpus based approach. The commonly used words along with their features are stored in corpus. Time and accuracy of result is preferred over memory space. But memory space is not a problem these days, neither in terms of cost nor in terms of storage requirements- so this method can be used to get good results. Moreover, this method has an advantage over other suffix trimming or using Porter’s stemmer and Krovetz stemmer as it gives root word along with the features rather than just giving root word as the result. The approach discussed in this paper has capability of working on either type of morphology. 6. ACKNOWLEDGEMENTS This research paper is made possible through the help and support from parents, teachers and friends. I dedicate my acknowledgement of gratitude towards my guide ‘Ms. Pooja Khanna’ for her kind support and advice on every step. My thanks to the experts who have contributed towards development of this paper and to my parents and family, without whose financial and moral support, this work would not have been materialized. 7. REFERENCES [1] Vishal Goyal, Gurpreet Singh Lehal,”Hindi morphological Analyzer and generator”, First International Conference on Emerging Trends in Engineering and Techonology, USA,pp.1156-1159, 2008. [2] NirajAswani, Robert Gaizauskas, “Developing Morphological Analyzers for South Asian Languages: Experimenting with the Hindi and Gujrati Languages”, Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valleta, Malta pp.811-815, May, 2010. [3] Deepak Kumar, Manjeer Singh, Seema Shukla, “FST based Morphological Analyzer for Hindi Language”, Internatinal Journal of Computer Science Issues(IJCSI), Vol.9, pp. 349-353, July, 2012. [4] Nikhil Kanuparthi, Abhilash Inumella, Dipti Misra Sharma, “Hindi Derivational Morphological Analyzer”, Proceedings of the twelfth meeting of the Special Interest Group on Computational Morphology & Phonology, Canada, pp.10-16, June, 2012. [5] Mohd Yunus Sharum, ”The Effects of Dictionary and Lexicon to Morphological Analysis”, International Conference on Computer and Information Application(ICCIA),pp 9-12,2010. [6] A. Ramanathan and D.Rao, ”A Lightweight Stemmer for Hindi”, ACM Transactions on Asian Language Information Processing(TALIP),Vol.2, pp. 130- 142,2003. [7] Gill Mandeep Singh, Lehal gurpreet Singh, Joshi S.S.,“A full form Lexicon Based Morphological Analysis and Generation tool”, Punjabi International Journal of Cybermatics and Informatics, Hyderabad, India. October 2007, pp 38. [8] Omkar N. Koul, “Modern Hindi Grammar”, Dunwoody Press, USA, 2008. [9] LTRC, IIIT Hyderabad http://ltrc.iiit.ac.in [10] Teena Bajaj, Prateek Bhatia, “Semi supervised learning Approach of Hindi Morphology”, All India Conferences on Advances in Communication Computers, Control and Knowledge Management (AICACCC-KM), Bahadurgarh ,Feb,2008. IJCATM : www.ijcaonline.org