This document presents the development of a morphological analyzer for the Hindi language. It discusses the importance of morphological analysis in natural language processing. Hindi is described as a morphologically rich, inflected language capable of generating hundreds of words from root words. The proposed analyzer will take in single Hindi words or sentences and return the root word along with features like part of speech, gender, number, and person. It will use both a rule-based approach and corpus-based approach, as previous analyzers have not worked for both inflectional and derivational morphemes. The document outlines the structure of Hindi words and sentences to provide context for developing the analyzer.
International Journal of Computer Applications (0975 – 8887)
Volume 95, No. 17, June 2014
Development of Morphological Analyzer for Hindi
Mayuri Rastogi
Department of Computer Science
Amity University
Lucknow, India
Pooja Khanna
Faculty in Department of Computer Science
Amity University
Lucknow, India
ABSTRACT
Morphological analysis is one of the important phases of natural
language processing (NLP) and supports tasks such as machine
translation. A morphological analyzer is most effective for
languages that are rich in morphemes. Hindi is an inflected,
morphologically rich language in which hundreds of word forms
can be generated from a single root word.
This paper is primarily concerned with the design of a
morphological analyzer for Hindi. The input to the
analyzer is a Hindi word or sentence; after analysis,
it returns the root word along with its
features as output. The features include part of
speech, gender, number, and person. The analyzer follows two
approaches: rule based and corpus based. So far,
none of the morphological analyzers developed has worked
for both inflectional and derivational morphemes.
Keywords
Morpheme, inflected, root word, corpus.
1. INTRODUCTION
NLP is commonly divided into five stages: morphological
analysis, syntactic analysis, semantic analysis, discourse
integration and pragmatic analysis. Morphological analysis is
concerned with analyzing individual words into their components.
There are two basic classes of morphology:
Class 1 - inflectional morphology
Class 2 - derivational morphology
In inflectional morphology, the new word belongs to the same
class as the original word; that is, if the original word is a
noun, it remains a noun after an affix is added. For
example, घड़ी (watch) (Noun) becomes घड़ियाँ (watches)
(Noun) on adding ियाँ as suffix.
In derivational morphology, the new word belongs to a
different class than the root word.
For example, मूर्ख (fool) (Adj) becomes मूर्खता (foolishness)
(Noun) on adding ता as suffix.
The objective of this paper is to present an idea for a
morphological analyzer for Hindi that works on both
classes of morphemes. The paper also aims to present an
effective morphological analyzer, since most
morphological analyzers so far have been developed either for
inflectional or for derivational morphemes. In this paper we
follow two approaches for the development of our
morphological analyzer.
Approach 1 - corpus based
Approach 2 - rule based, used for derivational morphology.
The first approach relies on a database that stores each root word
along with some of its features: gender
(masculine or feminine), category (noun, pronoun, verb,
adverb, adjective, particle, connective or interjection) and
number (singular or plural).
The second approach takes into consideration careful analysis
of Hindi word and then formation of rules which put the word
in certain category specified earlier. Before applying any of
the above approaches careful study of inflected and
derivational word of Hindi language was done so that the
developed analyzer works for both kinds of words.
The following sections describe related work, the structure of
Hindi words and sentences, and the approach followed.
2. RELATED WORK
To the best of our knowledge, there is no morphological
analyzer that works for both inflectional and derivational
morphology.
However, a morphological analyzer and generator for
inflectional morphology is discussed by Vishal Goyal and
Gurpreet S. Lehal [1], who focused on the time taken to search
for a word in the database in order to obtain its grammatical
information. This method is limited to languages that have a
finite or small number of inflections.
Niraj Aswani and Robert Gaizauskas [2] presented a rule-based
morphological analyzer. Their approach uses suffix replacement
rules to find the root word from its inflected form. The major
drawback of their work was that not all rules produced correct
results.
Deepak Kumar, Manjeet Singh and Seema Shukla [3] proposed
using an FST (Finite State Transducer) to develop a
morphological analyzer for Hindi. They discussed two
methodologies for developing the analyzer: a lexicon generator
and generation of a morphological processor.
Nikhil Kanuparthi, Abhilash Inumella and Dipti Misra
Sharma [4] presented an algorithm to upgrade a Hindi
inflectional analyzer to a derivational one. The algorithm used
the principles of Porter's stemmer and the Krovetz stemmer.
Their approach included studying Hindi derivatives, defining
derivational rules, using Wikipedia to confirm genuine data,
and devising an algorithm for derivational analysis.
The paper “The Effects of Dictionary and Lexicon to
Morphological Analysis” by Mohd Yunus Sharum [5] highlights
that using a lexicon in a morphological analyzer improves its
correctness and performance (in terms of quality of output).
International Journal of Computer Applications (0975 – 8887)
Volume 95– No.17, June 2014
The lightweight stemmer for Hindi developed by Ramanathan and
Rao [6] was based on hand-crafted rules. After careful
observation of Hindi nouns, verbs, adjectives and adverbs, they
produced a list of suffixes for each category.
3. STRUCTURE OF HINDI WORDS
To develop a morphological analyzer for Hindi, it is important
to understand the structure of Hindi words and sentences: how
they are formed, their special characteristics and so on. Hindi
is quite similar to other Indo-Aryan languages in terms of
linguistic characteristics. There are ten vowels in Hindi. A
Hindi syllable contains a vowel as its nucleus, preceded or
followed by consonants. Words usually have two or three
syllables.
The morphological structure of Hindi consists of various word
classes, each with its own derivational and inflectional forms.
The following sections give details of these word classes,
based on the book “Modern Hindi Grammar” by Omkar N. Koul [8].
3.1 Nouns
Nouns in Hindi are inflected for gender, number and case.
3.1.1 Gender
There are three types of nouns:
Type I: masculine nouns ending in आ /a:/
Type II: all other masculine nouns
Type III: feminine nouns
Generally, masculine nouns ending in आ /a:/ have feminine forms
ending in ई /i:/
Masculine                    Feminine
लड़का Ladka boy               लड़की Ladki girl
चरखा Charkha wheel           चरखी Charkhi reel
Animate masculine nouns ending in ई /i:/ have feminine forms
ending in -अन /-an/
Masculine                    Feminine
तेली Teli oilman              तेलन Telan oilwoman
माली Maali gardener           मालन Maalan gardener (female)
Some nouns ending in आ /a:/ form their feminine by replacing
आ /a:/ with -इया /-iya:/
Masculine                    Feminine
गुड्डा Gudda boy toy          गुड़िया Gudiya girl toy
चूहा Chuha rat               चुहिया Chuhiya rat (female)
Generally, the final आ /a:/ of a masculine noun is replaced by
ई /i:/ to form the feminine
Masculine                    Feminine
राजा Raja king               रानी Rani queen
बिल्ला Billa cat              बिल्ली Billi cat (female)
The suffix नी /ni:/ is added to some masculine nouns to form
the feminine
Masculine                    Feminine
डाक्टर Doctor doctor          डाक्टरनी Doctorni doctor (female)
The suffix ई /i:/ is added to some masculine nouns to form the
feminine
Masculine                    Feminine
देव Dev god                  देवी Devi goddess
नट Nat acrobat               नटी Nati showgirl
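The feminine-formation patterns listed above can be sketched as a small table of ending replacements. This is only an illustration under stated assumptions: the rule names and the function `make_feminine` are hypothetical, the caller must say which rule applies (the choice is lexical), and pairs that also change the stem, such as राजा/रानी or चूहा/चुहिया, would have to be stored in the lexicon instead.

```python
# Hypothetical rule names mapped to
# (masculine ending to drop, feminine ending to add).
FEMININE_RULES = {
    "aa_to_ii": ("ा", "ी"),   # बिल्ला -> बिल्ली
    "ii_to_an": ("ी", "न"),   # माली -> मालन
    "add_nii": ("", "नी"),    # डाक्टर -> डाक्टरनी
    "add_ii": ("", "ी"),      # देव -> देवी
}


def make_feminine(masculine: str, rule: str) -> str:
    """Apply one named feminine-formation rule to a masculine noun."""
    drop, add = FEMININE_RULES[rule]
    if drop and not masculine.endswith(drop):
        raise ValueError(f"rule {rule!r} does not apply to {masculine!r}")
    # Remove the masculine ending (if any) and attach the feminine one.
    stem = masculine[: len(masculine) - len(drop)]
    return stem + add


print(make_feminine("बिल्ला", "aa_to_ii"))
print(make_feminine("देव", "add_ii"))
```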
3.1.2 Number
Singular and plural are the two types of number.
Masculine nouns ending in आ /a:/ (as well as pronouns and
adjectives), with some exceptions, form the plural by changing
the ending to ए /e/
Singular                     Plural
चीता Leopard                 चीते Leopards
छाता Umbrella                छाते Umbrellas
*पिता Father                 पिता Fathers
*नेता Leader                 नेता Leaders
* These remain the same in the plural.
All other consonant-ending and vowel-ending nouns do not change
in the plural, for example:
मच्छर Mosquito
गिलहरी Squirrel
To form feminine plurals, the suffix एं /ẽ/ is added to
consonant-ending singular forms
Singular                     Plural
पुस्तक Book                   पुस्तकें Books
Feminine nouns ending in ई form the plural by adding इयाँ
Singular                     Plural
नारी Woman                   नारियाँ Women
सखी Friend                   सखियाँ Friends
In this case the final vowel of the stem is removed.
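The number rules can be sketched as follows. The function `pluralize`, the helper `ends_in_consonant` and the exception set are hypothetical names introduced for illustration. The sketch assumes that the gender of the noun is already known and that the "unchanged" rule is applied to masculine nouns only; it does not handle words whose final consonant carries a nukta.

```python
# Masculine -आ nouns listed in the text as unchanged in the plural.
UNCHANGED_MASCULINE = {"पिता", "नेता"}


def ends_in_consonant(word: str) -> bool:
    """True if the last code point is a Devanagari consonant letter."""
    return "\u0915" <= word[-1] <= "\u0939"  # range from क to ह


def pluralize(noun: str, gender: str) -> str:
    """Return the plural of a singular noun; gender is 'm' or 'f'."""
    if gender == "m":
        if noun.endswith("ा") and noun not in UNCHANGED_MASCULINE:
            return noun[:-1] + "े"   # चीता -> चीते
        return noun                  # मच्छर stays unchanged
    if noun.endswith("ी"):
        # The final vowel is dropped before the plural suffix is added.
        return noun[:-1] + "ियाँ"     # नारी -> नारियाँ
    if ends_in_consonant(noun):
        return noun + "ें"            # पुस्तक -> पुस्तकें
    return noun


for word, gender in [("चीता", "m"), ("पिता", "m"), ("पुस्तक", "f"), ("नारी", "f")]:
    print(word, "->", pluralize(word, gender))
```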
3.1.3 Noun Derivation
Nouns in Hindi are derived from nouns, adjectives and verbs
by using suffixes.
3.1.3.1 Nouns from Nouns:
Suffixes include दार -da:r, गर -gar, ची -ci:, दान -da:n and
खाना -kha:na:
Noun       Suffix     Derived noun
थाना       दार        थानेदार Inspector
सौदा       गर         सौदागर Merchant
खजाना      ची         खजानची Cashier
रोशन       दान        रोशनदान Window
गुसल       खाना       गुसलखाना Bathroom
3.1.3.2 Nouns from Adjectives:
Suffixes used for this purpose are ई -i:, ता -ta:, पन -pan,
आई -a:i:, इयत -iyat and आस -a:s
Root                 Derived noun
खुश Happy            खुशी Happiness
बुरा Bad              बुराई Badness
एक One               एकता Unity
बालक Boy             बालकपन Childhood
आदमी Man             आदमियत Humanity
खट्टा Sour            खटास Sourness
3.1.3.3 Nouns from Verbs
To derive nouns from verbs, the suffixes used are अस -as,
अन -an, ई -ee, वत -vat and ना -na
Verb root            Derived noun
पढ़ Read              पढ़ना Reading
धड़क Throb            धड़कन Throbbing
लड़ Quarrel           लड़ाई Dispute
बना Make             बनावट Shape
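Derivational analysis of nouns can be sketched as suffix stripping followed by a check that the remaining stem is a known root. The mini-lexicon `ROOTS`, the suffix list and the function `analyze_derived_noun` below are hypothetical and cover only a few suffixes from the tables above; forms that also change the stem (for example थाना to थानेदार) would need additional rules.

```python
# Hypothetical mini-lexicon of root words and their categories.
ROOTS = {"एक": "Adj", "खुश": "Adj", "सौदा": "Noun", "रोशन": "Noun"}

# A few noun-forming suffixes; ई appears as the vowel sign ी after a consonant.
NOUN_SUFFIXES = ["दान", "गर", "ता", "ी"]


def analyze_derived_noun(word: str):
    """Split a derived noun into (root, root category, suffix), or None."""
    # Try longer suffixes first so that a short suffix does not shadow them.
    for suffix in sorted(NOUN_SUFFIXES, key=len, reverse=True):
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in ROOTS:
            return root, ROOTS[root], suffix
    return None


for word in ["एकता", "खुशी", "सौदागर", "रोशनदान"]:
    print(word, "->", analyze_derived_noun(word))
```

Validating the stem against the lexicon is a design choice in this sketch: it prevents a word that merely happens to end in a suffix string from being split.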
3.2 Adjectives
In Hindi these are classified as inflected and uninflected.
3.2.1 Inflected Adjectives:
Masculine singular   Masculine plural   Feminine singular/plural
भूखा                 भूखे               भूखी
गोरा                 गोरे               गोरी
मोटा                 मोटे               मोटी
Uninflected Adjectives:
सुन्दर लड़का/लड़की    Sundar ladka/ladki   beautiful boy/girl
सफेद आदमी/औरत       Safed aadmi/aurat    white man/woman
3.2.2 Derivation of Adjectives:
Adjectives are derived from nouns by adding the suffixes आ -aa,
ई -i, उ -u, ईला -ila, लू -lu, इक -ik, जनक -janak, दाई -daai,
मय -may, वन -van, आना -ana, नाक -nak, ईन -in, मंद -mand and
दार -dar.
Noun                     Adjective
सच truth                 सच्चा truthful
बाजार market             बाजारू common
रस juice                 रसीला juicy
दया mercy                दयालु kind
दिन day                  दैनिक daily
बल strength              बलवान strong
विदेश foreign country    विदेशी foreign
3.3 Verbs
There are two types of verbs: main verbs and auxiliary verbs.
Main verbs are classified as simple, conjunct and compound
verbs.
Verbal constructions are classified as:
3.3.1 Intransitive verb
3.3.2 Transitive verb
3.3.3 Ditransitive verb
3.3.4 Causative verb
3.3.5 Dative verb
3.3.6 Conjunct verb
3.3.7 Compound verb
Here we have considered intransitive, transitive and
ditransitive verbs in our database.
3.3.1 Intransitive verbs
Examples are आ -aa, जा -ja, उठ -uth and बैठ -baith. They do not
take a direct object and are not marked by any postposition in
the present or future tense.
Eg.-
अमित घर जाएगा
Amit ghar jae:ga
Amit will go home
3.3.2 Transitive verbs
They are derived from intransitive verbs by certain vocalic
changes to the verb roots.
Eg.-
Intransitive             Transitive
मर die                   मार kill
पिस be ground            पीस grind
घूम go round             घुमा turn around
In some cases, besides vocalic changes, consonantal changes
also take place.
Eg-
Intransitive             Transitive
फट be torn               फाड़ tear
बिक be sold              बेच sell
3.3.3 Ditransitive verbs
Examples are देना dena (to give), सुनाना sunana (to tell) and
बेचना becna (to sell). They take three arguments: subject,
object and indirect object. Indirect objects are in the dative,
while the rest follow the transitive pattern.
Eg-
राम ने रीता को पुस्तक दी
Ram gave a book to Rita.
4. APPROACH FOLLOWED
The methodology used to develop the morphological analyzer for
Hindi is shown in Fig 1.
Fig 1: Block diagram of proposed methodology
As the diagram shows, the proposed morphological analyzer works
for both Hindi words and sentences.
If the input to the analyzer is a sentence, the sentence is
tokenized into words. Each word is then searched for in the
database, i.e. the maintained corpus.
The database is maintained as:
INPUT {Word_id (Primary key), name, W_Word Category,
Gender, Number}
The database stores each root word along with features such as
category and gender. It is also desirable to store frequently
used common nouns in the database for better accuracy, although
compiling them is tedious. Since storage space is no longer a
major issue, search time is the primary concern, and an
efficient search technique needs to be applied to make
effective use of the database or dictionary. The prime focus
is on nouns, excluding proper nouns.
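One possible realization of this database is sketched below using SQLite from the Python standard library. The table name `words`, the column names and the sample rows are assumptions made for illustration; they map loosely onto the fields Word_id, name, W_Word Category, Gender and Number. Declaring `name` as UNIQUE makes SQLite build an index on it, which is one simple way to address the search-time concern.

```python
import sqlite3

# In-memory database for illustration; a file path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE words (
        word_id  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE,   -- root word (indexed via UNIQUE)
        category TEXT NOT NULL,          -- part of speech
        gender   TEXT,
        number   TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO words (name, category, gender, number) VALUES (?, ?, ?, ?)",
    [
        ("मनु", "Noun", "Masculine", "Singular"),
        ("पुस्तक", "Noun", "Feminine", "Singular"),
    ],
)


def lookup(word: str):
    """Return (name, category, gender, number) for a root word, or None."""
    return conn.execute(
        "SELECT name, category, gender, number FROM words WHERE name = ?",
        (word,),
    ).fetchone()


print(lookup("मनु"))
print(lookup("सरलो"))
```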
If the tokenized word is not found in the database, rules are
applied and the root word is displayed along with its features.
The rules are not learnt automatically.
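The overall flow (tokenize, look up each token in the database, fall back to rules, otherwise report an invalid word) can be sketched as below. The dictionary `LEXICON`, the single suffix rule and the function names are hypothetical stand-ins for the database and the rule set, which in the proposed analyzer are considerably larger.

```python
# Hypothetical in-memory stand-in for the database of root words.
LEXICON = {
    "मनु": {"category": "Noun", "gender": "Masculine", "number": "Singular"},
    "एक": {"category": "Adj", "gender": "Any", "number": "Singular"},
}

# One illustrative hand-written rule: the suffix ता derives a feminine noun.
SUFFIX_RULES = [
    ("ता", {"category": "Noun", "gender": "Feminine", "number": "Any"}),
]


def analyze_word(word: str) -> dict:
    """Look the word up first; fall back to suffix rules; else invalid."""
    if word in LEXICON:
        return {"root": word, **LEXICON[word]}
    for suffix, features in SUFFIX_RULES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in LEXICON:
            return {"root": root, "suffix": suffix, **features}
    return {"root": None, "error": "Invalid word"}


def analyze(text: str) -> list:
    """Tokenize on whitespace, then analyze each token in order."""
    return [(token, analyze_word(token)) for token in text.split()]


for token, result in analyze("मनु एकता सरलो"):
    print(token, "=", result)
```

A single word is handled by the same entry point, since splitting a one-word input yields one token.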
5. CONCLUSION AND RESULT
ANALYSIS
The methodology described above can be illustrated with the
following examples of Hindi words:
ILLUSTRATION I:
INPUT- निडरता
CHECK whether the input is a word or a sentence. Here it is a
word, so search for a match in the database. If a match is
found, find the features of the input word (using the rules)
and display the morphemes of the input word along with its
features.
OUTPUT-
निडरता = नि (prefix) + डर (Category) + Feminine (Gender) +
Any (Number) + ता (Suffix)
ILLUSTRATION II:
INPUT- “मनु धीरे चलता है”
CHECK whether the input is a word or a sentence. Here the input
is a sentence, so it is tokenized into individual tokens
(words), and each word is then matched against the database.
The first token, मनु, is searched for in the database; a match
is found and the features of मनु are displayed. The procedure
is then repeated for the next token, धीरे, and likewise for the
third and fourth tokens, चलता and है respectively.
OUTPUT-
मनु = मनु (rootword) + Noun (Category) +
Masculine (Gender) + Singular (Number)
धीरे = धीरे (rootword) + Adj (Category) + Any (Gender) +
Singular (Number)
चलता = चलता (rootword) + Verb (Category) +
Masculine (Gender) + Singular (Number)
है = indeclinable
[Fig 1 block labels: Input (Sentence / Word); Corpus based approach (Database); Rule based approach (Rules for Noun, Pronoun, Verb, Adjective)]
ILLUSTRATION III:
INPUT- सरलो
CHECK whether the input is a word. Here it is indeed a single
word, but no match is found in the database: सरल is a Hindi
word, but सरलो is not. The input is therefore reported as an
invalid word.
OUTPUT-
Invalid word.
5.1 Conclusion
The paper presents a method for developing a Hindi
morphological analyzer that works on both inflectional and
derivational morphemes by combining a rule based approach with
a dictionary or corpus based approach. Commonly used words are
stored in the corpus along with their features. Speed and
accuracy of the result are given priority over memory usage.
Memory is no longer a significant constraint, in terms of
either cost or storage requirements, so this method can be
used to obtain good results. Moreover, this method has an
advantage over suffix trimming and over stemmers such as
Porter's stemmer and the Krovetz stemmer, because it gives the
root word along with its features rather than the root word
alone. The approach discussed in this paper is capable of
working on either type of morphology.
6. ACKNOWLEDGEMENTS
This research paper was made possible through the help and
support of parents, teachers and friends. I express my
gratitude to my guide, Ms. Pooja Khanna, for her kind support
and advice at every step. My thanks go to the experts who
contributed to the development of this paper, and to my parents
and family, without whose financial and moral support this work
would not have materialized.
7. REFERENCES
[1] Vishal Goyal, Gurpreet Singh Lehal, “Hindi
Morphological Analyzer and Generator”, First
International Conference on Emerging Trends in
Engineering and Technology, USA, pp. 1156-1159,
2008.
[2] Niraj Aswani, Robert Gaizauskas, “Developing
Morphological Analyzers for South Asian Languages:
Experimenting with the Hindi and Gujarati Languages”,
Proceedings of the Seventh International Conference on
Language Resources and Evaluation, Valletta, Malta,
pp. 811-815, May 2010.
[3] Deepak Kumar, Manjeet Singh, Seema Shukla, “FST
based Morphological Analyzer for Hindi Language”,
International Journal of Computer Science Issues (IJCSI),
Vol. 9, pp. 349-353, July 2012.
[4] Nikhil Kanuparthi, Abhilash Inumella, Dipti Misra
Sharma, “Hindi Derivational Morphological Analyzer”,
Proceedings of the Twelfth Meeting of the Special Interest
Group on Computational Morphology & Phonology,
Canada, pp. 10-16, June 2012.
[5] Mohd Yunus Sharum, “The Effects of Dictionary and
Lexicon to Morphological Analysis”, International
Conference on Computer and Information
Application (ICCIA), pp. 9-12, 2010.
[6] A. Ramanathan and D. Rao, “A Lightweight Stemmer for
Hindi”, ACM Transactions on Asian Language
Information Processing (TALIP), Vol. 2, pp. 130-142,
2003.
[7] Gill Mandeep Singh, Lehal Gurpreet Singh, Joshi S.S., “A
full form Lexicon Based Morphological Analysis and
Generation tool”, Punjabi International Journal of
Cybermatics and Informatics, Hyderabad, India, October
2007, pp. 38.
[8] Omkar N. Koul, “Modern Hindi Grammar”, Dunwoody
Press, USA, 2008.
[9] LTRC, IIIT Hyderabad http://ltrc.iiit.ac.in
[10] Teena Bajaj, Prateek Bhatia, “Semi supervised learning
Approach of Hindi Morphology”, All India Conferences
on Advances in Communication Computers, Control and
Knowledge Management (AICACCC-KM), Bahadurgarh,
Feb 2008.
IJCATM : www.ijcaonline.org