
Pratiksha Gawade, Deepika Madhavi, Jayshree Gaikwad, Sharvari Jadhav, Rahul Ambekar / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 Vol. 3, Issue 2, March-April 2013, pp. 322-326

Morphological Analyzer for Marathi using NLP

Pratiksha Gawade, Deepika Madhavi, Jayshree Gaikwad, Sharvari Jadhav, Rahul Ambekar
I. T. Department, Padmabhushan Vasantdada Patil Pratishthan's College of Engineering, Sion (East), Mumbai-400 022

Abstract
Morphology is the part of linguistics that deals with the study of words, i.e., their internal structure and, partially, their meanings. A morphological analyzer is a program that analyzes the morphology of an input word; it detects the morphemes of any text. The current technique provides only a dictionary, which defines the meaning of a word but does not give a grammatical explanation of that word. In the proposed system, we evaluate the morphological analyzer for Marathi, an inflectional language, and also produce a parse tree, i.e., a grammatical structure. We plug the morphological analyzer into a statistical POS tagger and chunker to see its impact on their performance, so as to confirm its usability as a foundation for NLP applications.

Keywords: Analyze, Inflection, Lexicon, Marathi morphology, Natural Language Processing.

1. INTRODUCTION

1.1 Marathi morphology
In linguistics, morphology [2] is the identification, analysis and description of the structure of a given language's morphemes and other linguistic units, such as words, affixes, parts of speech, intonation/stress, or implied context (words in a lexicon are the subject matter of lexicology). Morphological typology is a method for classifying languages according to the ways in which morphemes are used in a language: from the analytic languages, which use only isolated morphemes, through the agglutinative ("stuck-together") and fusional languages, which use bound morphemes (affixes), up to the polysynthetic languages, which compress many separate morphemes into single words.

While words are generally accepted as being (with clitics) the smallest units of syntax, it is clear that in most languages, if not all, words can be related to other words by rules (grammars). For example, English speakers recognize that the words dog and dogs are closely related, differentiated only by the plurality morpheme "-s", which is found only bound to nouns and never stands alone. Speakers of English (a fusional language) recognize these relations from their tacit knowledge of the rules of word formation in English. They infer intuitively that dog is to dogs as cat is to cats; similarly, dog is to dog catcher as dish is to dishwasher, in one sense. The rules understood by the speaker reflect specific patterns, or regularities, in the way words are formed from smaller units and how those smaller units interact in speech. In this way, morphology is the branch of linguistics that studies patterns of word formation within and across languages, and attempts to formulate rules that model the knowledge of the speakers of those languages.

1.2 The Alphabets
The Marathi script consists of 16 vowels and 36 consonants, making a total of 52 letters.

1.3 Vowels
The vowels are divided into two groups. The first group consists of 12 vowels, as follows:

a  aa(A)  i  ii(I)  u  uu(U)  e  ai  o  au  aM  aH

The first 10 vowels are very widely used. The last two are less commonly used.

1.4 Linguistic Resources
The linguistic resources required by the morphological analyzer include a lexicon and inflection rules for all paradigms. These resources are described below.

1.4.1 Lexicon
In linguistics, the description of a language is split into two parts: the grammar, consisting of rules describing correct sentence formation, and the lexicon, listing words and phrases that can be used in the sentences. The lexicon (or wordstock) of a language is its vocabulary. Statistically, most lexemes contain a single morpheme. Lexemes composed of multiple morphemes (also known as compound words), as well as idiomatic expressions and collocations, are also considered part of the lexicon. In practical applications, such as
language learning, the lexicon is represented by a dictionary, which lists words alphabetically and provides definitions.

1.4.2 Inflection Rules
Inflection rules [7] specify the inflectional suffixes to be inserted at (or deleted from) different positions in the root to obtain its inflected form. An inflectional rule has the format <inflectional suffixes, morphosyntactic features, label>. The element "morphosyntactic features" specifies the set of morphosyntactic features associated with the inflected form obtained by applying the given inflection rule. The following is the exhaustive list of morphosyntactic features for which different morphemes are inflected:

1) Case: Direct, Oblique
2) Gender: Masculine, Feminine, Neuter, Non-specific
3) Number: Singular, Plural, Non-specific
4) Person: 1st, 2nd, 3rd
5) Tense: Past, Present, Future
6) Aspect: Perfective, Completive, Frequentative, Habitual, Durative, Inceptive, Stative
7) Mood: Imperative, Probabilitive, Subjunctive, Conditional, Deontic, Abilitive, Permissive

1.5 Category Wise Morphological Formulation
The grammatical categories observed in Marathi include nouns, pronouns, verbs, adjectives, adverbs, conjunctions, interjections and postpositions. The morphemes belonging to different categories undergo different treatment.

1.5.1 Postposition Morphology
Paradigms of postpositions are created based on their linguistic behavior. They include case markers (vibhaktipratyay) and a class of postpositions called shabdayogiavyay. The latter are attached to singular and plural forms of nouns and pronouns. Some shabdayogiavyays exhibit specific behavior. For example, some postpositions need to be written separately when they follow the syllable cyaa (chya), which is a case marker. Some shabdayogiavyays can be suffixed with the case markers cao (che), caI (chi), caa (cha), cyaa (chya). Some shabdayogiavyays can be composed of others. The postpositions hI (hI) and ca (cha) can be attached before some shabdayogiavyays, but not before vibhaktipratyays. Some shabdayogiavyays can be attached to different oblique forms of verbs.

1.5.2 Noun Morphology
Changes due to the attachment of postpositions are different for singular and plural forms of nouns. The changed form of a noun to which such an attachment is made is called the saamaanyaroop (oblique form) of that noun. For example, in the morphological transformation of the word rama (ram) to the word ramaalaa (ramala), the saamaanyaroop of rama (ram) is ramaa (rama).

1.5.3 Pronoun Morphology
An exhaustive list of all possible (over 550) inflections of all pronouns was prepared, because pronouns show very irregular behavior. The ratio of inflectional rules to actual forms in the case of pronouns is close to one. A pronoun has a specific single oblique form to which all shabdayogiavyays are attached.

1.5.4 Verb Morphology
Aakhyaata theory is the basis of verb morphology analysis. It systematically segments verb forms into verb roots and terminating suffixes called aakhyaatas. An aakhyaata represents information about tense, aspect and mood (TAM) and gender, number and person (GNP). Aakhyaatas are named according to their phonemic shape, such as taakhyaata, vaakhyaata and laakhyaata. A regular verb root generates over 80 forms. In addition to regular verbs, there are over 35 irregular verbs. The rules are represented in the form of tables.

1.5.5 Adjective Morphology
Adjectives are classified into inflectional and non-inflectional categories. Inflections result from gender, number and the attachment of postpositions to the noun modified by the adjective. Table 1 shows a snapshot of the inflectional rules. In the spellchecker, the masculine form is chosen as the root form, from which the other forms are generated.

Changing part in     Change
masculine form       Feminine    Neuter    Oblique form
Aaa [-               I           ee        yaaya

Table 1. Adjective Morphology

When genitive case markers or some shabdayogiavyays are attached to nouns, adjectives are produced. These forms are automatically covered in noun morphology.

1.5.6 Adverb, Conjunction and Interjections
This is an important class of parts of speech, for which the rule-based approach proved to be appropriate. Attachment of postpositions to nouns, verbs and pronouns is one of the strategies of adverb formation. In addition, there are non-inflectional adverbs. The set of derived adverbs is automatically covered at the level of the morphology of postpositions, nouns, verbs and pronouns. A list of all lexicalized adverbs is constructed. Similarly, all conjunctions and interjections are handled as a list, since they are non-inflectional. When some
postpositions are attached to demonstrative pronouns, conjunctions are derived. These are handled at the level of the rules for pronouns and postpositions.

A morphological analyzer forms the foundation for applications such as information retrieval, POS tagging, chunking and, ultimately, machine translation. Morphological analyzers for various languages have been studied and developed for years. However, this research is dominated by morphological analyzers for agglutinative languages or for languages like English that show a low degree of inflection.

The proposed system describes, in English, the morphemes, lexicon and type of a Marathi word. This helps a person who does not know the English language to understand English grammar in a better way.

2. RELATED WORK

Natural Language Processing [8]
The term "natural" languages refers to the languages that people speak, such as English, Assamese and Hindi. The goal of Natural Language Processing (NLP) is to design and build software that will analyze, understand, and generate languages that humans use naturally. The applications of natural language processing can be divided into two classes:
• Text based applications: These involve the processing of written text, such as books, newspapers, reports, manuals and e-mail messages. These are all reading based tasks.
• Dialogue based applications: These involve human-machine communication such as spoken language, and also include interaction using keyboards.
From an end-user's perspective, an application may require NLP for processing natural language input, for producing natural language output, or for both. Also, for a particular application, only some of the tasks of NLP may be required, and the depth of analysis at the various levels may vary. Achieving human-like language processing capability is a difficult goal for a machine. The difficulties are:
• Ambiguity
• Interpreting partial information
• Many inputs can mean the same thing

3. PROPOSED SYSTEM
Architecture design for analyzer

Fig. 1: Architecture of the Marathi morphological analyzer

Fig. 2: Flow of the query inflector

Step1: Enter the query, i.e., the word whose morphemes are to be found. The word must be entered as written in English letters.
Step2: Decision making.
  2.1 If the query is found, continue with the next steps.
  2.2 If the entered query is not found, return to the first step.
Step3: If a record is found, display the records.
Step4: Perform the morphological analysis.
Step5: Perform the parsing.
Step6: Display the parse tree. Display the morphemes found.
Step7: Stop.

4. EXPERIMENTAL RESULTS
In this paper, we present a morphological analyzer for Marathi, which is the official language of the state of Maharashtra (India). Marathi is the language spoken by the native people of Maharashtra. Marathi belongs to the group of Indo-Aryan languages, which are part of the larger group of Indo-European languages, all of which can be traced back to a common root. Among the Indo-Aryan languages, Marathi is the southern-most language. All of the Indo-Aryan languages originated from Sanskrit.
The morphological analyzer takes the Marathi input and then converts it into the English language. We get its type, lexicon and morphemes.

4.1 Morphological Analyzer for Marathi
The formation of polymorphemic words leads to complexities which need to be handled during the analysis process. FSMs prove to be elegant and computationally efficient tools for modeling the suffix ordering in such words. However, the recursive process of word formation in Marathi involves inflection at the time of attachment of every new suffix, and the FSMs need to be capable of handling this. Koskenniemi (1983) suggests the use of separate FSMs to model the orthographic changes, but Marathi has a well devised system of paradigms to handle them. One of our observations led us to a solution that combines a paradigm-based inflectional system with an FSM for modeling. The observation was that, during the i-th recursion, only the (i-1)-th morpheme changes its form, which can be handled by suitably modifying the FSM.

A. Example for morphological analysis:

1. Consider the word "Darashejari" (beside the door).
Equation (1) illustrates this process:
dara -> daraa + shejari = darashejari
The parse tree is as follows:

Fig. 3: Example 1
IN: Preposition or subordinating conjunction
DT: Determiner
NN: Noun, singular

2. Consider the word "aayushyabharasathi" (for the lifetime).
Equation (2) illustrates this process:
aayushya -> aayushya + bhara + sathi = aayushyabharasathi
The parse tree is as follows:

Fig. 4: Example 2

5. CONCLUSION
We presented a high accuracy morphological analyzer for Marathi that exploits the regularity in the inflectional paradigms while employing finite state systems to model the language in an elegant way. We gave a detailed description of the morphological phenomena present in Marathi. The classification of postpositions and the
development of the morphotactic FSA is one of the important contributions, since Marathi has complex morphotactics. As a next step, the morphological analyzer can be further extended to handle derivational morphology and compound words. We plan to develop a hybrid system using methods to handle unknown words and to improve the overall accuracy of the system. In the meantime, more analysis will be added to the system to cover aspects which might have eluded us so far. Sentence-level analysis will also be taken into account in developing further models using this kind of analysis.

REFERENCES
[1] "An improvised Morphological Analyzer for Tamil: A case of implementing the open source platform Apertium". Parameswari K. Unpublished M.Phil. Thesis. Hyderabad: University of Hyderabad. 2009.
[2] "Design and Implementation of a Morphology-based Spellchecker for Marathi, an Indian Language". Dixit, Veena, Satish Dethe, and Rushikesh K. Joshi. 2006.
[3] "Natural Language Understanding". James Allen. Pearson Education, Singapore, second edition, 2004.
[4] "Natural Language Processing: A Paninian Perspective". Bharati, Akshar, Vineet Chaitanya, and Rajeev Sangal. 1995.
[5] "Natural Language Processing – A Paninian Perspective". R. S. Akshar Bharati and V. Chaitanya. 1995.
[6] "Two-level Morphology: a general computational model for word-form recognition and production". Koskenniemi, Kimmo. 1983.
[7] "A Paradigm-Based Finite State Morphological Analyzer for Marathi". Pushpak Bhattacharyya.
[8] "Natural Language Processing". Utpal Sharma. Department of Computer Science and Information Technology, Tezpur University, Tezpur-784028, Assam, India.
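
Illustrative sketch. The worked examples in Sections 1.5.2 and 4.1 (rama, dara, aayushya) can be mimicked with a very small suffix-stripping routine. The Python code below is only a minimal sketch under stated assumptions, not the system described in this paper: the names LEXICON, POSTPOSITIONS and analyze are hypothetical, the toy lexicon contains only the three transliterated roots quoted in the text, and greedy right-to-left stripping stands in for the paradigm-based FSM.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (assumption): root -> oblique form (saamaanyaroop),
# using only the transliterated examples quoted in the paper.
LEXICON: Dict[str, str] = {
    "dara": "daraa",
    "rama": "ramaa",
    "aayushya": "aayushya",
}

# Toy postposition list (assumption), also taken from the paper's examples.
POSTPOSITIONS: List[str] = ["shejari", "bhara", "sathi", "laa"]


def analyze(word: str) -> Optional[Tuple[str, List[str]]]:
    """Return (root, suffixes in surface order), or None if no analysis is found."""
    rest = word.lower()
    suffixes: List[str] = []
    while True:
        # Accept when the remainder is a known root or its oblique form.
        for root, oblique in LEXICON.items():
            if rest in (root, oblique):
                return root, suffixes
        # Otherwise strip one known postposition from the right and repeat.
        for post in POSTPOSITIONS:
            if rest.endswith(post) and len(rest) > len(post):
                suffixes.insert(0, post)
                rest = rest[: -len(post)]
                break
        else:
            # No root matched and nothing could be stripped.
            return None


if __name__ == "__main__":
    for w in ["Darashejari", "aayushyabharasathi", "ramaalaa"]:
        print(w, "->", analyze(w))
```

Because the remainder is matched against both the direct and the oblique form of each root, the sketch accepts the surface spellings used in Equations (1) and (2) as well as ramaalaa. A real analyzer would replace the greedy stripping with paradigm tables and an FSM over suffix classes, since taking the first matching suffix can mis-segment words outside this toy vocabulary.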