Sanskrit Parser Report



  1. 1. SANSKRIT LANGUAGE PARSER Akash Bhargava - 10UCS002 Ashok Kumar - 10UCS010 Laxmi Kant Yadav - 10UCS027 Vijay Kumar Gupta - 10UCS057 COMPUTER SCIENCE & ENGINEERING DEPARTMENT NATIONAL INSTITUTE OF TECHNOLOGY, AGARTALA INDIA-799055 MAY, 2014
  2. 2. SANSKRIT LANGUAGE PARSER Dissertation submitted to National Institute of Technology, Agartala for the award of the degree of Bachelor of Technology by Akash Bhargava - 10UCS002 Ashok Kumar - 10UCS010 Laxmi Kant Yadav - 10UCS027 Vijay Kumar Gupta - 10UCS057 Under the Guidance of Mr. Nikhil Debbarma Assistant Professor, CSE Department, NIT Agartala, India COMPUTER SCIENCE & ENGINEERING DEPARTMENT NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA MAY, 2014
  3. 3. DISSERTATION APPROVAL SHEET This dissertation entitled “Language Parser”, by Akash Bhargava, Enrollment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of Technology in Computer Science & Engineering. Nikhil Debbarma Dissertation Supervisor Assistant Professor Computer Science & Engineering Department NIT, Agartala Paritosh Bhattacharya Head Of Department Professor Computer Science & Engineering Department NIT, Agartala Date: 19.05.2014 Place: NIT, Agartala iii
  4. 4. DECLARATION We declare that the work presented in this dissertation titled “Language Parser”, submitted to the Computer Science & Engineering Department, National Institute of Technology, Agartala, for the award of the Bachelor of Technology degree in Computer Science & Engineering, represents our ideas in our own words, and where others’ ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed. MAY, 2014 Agartala Akash Bhargava 10UCS002 Ashok Kumar 10UCS010 Laxmi Kant Yadav 10UCS027 Vijay Kumar Gupta 10UCS057 iv
  5. 5. CERTIFICATE This dissertation entitled “Language Parser”, by Akash Bhargava, Enrollment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; Vijay Kumar Gupta, Enrollment Number 10UCS057 is approved for the award of Bachelor of Technology in Computer Science & Engineering. Nikhil Debbarma Dissertation Supervisor Assistant Professor Computer Science & Engineering Department NIT, Agartala Suman Deb Coordinator Assistant Professor Computer Science & Engineering Department NIT, Agartala v
  6. 6. Acknowledgement We would like to take this opportunity to express our deep sense of gratitude to all who helped us directly or indirectly during this project work. Firstly, we would like to thank our supervisor Asst. Prof. Nikhil Debbarma and Coordinator Asst. Prof. Suman Deb for being great mentors and the best advisors we could ever have. Their advice, encouragement and critiques were sources of innovative ideas and inspiration, and causes behind the successful completion of this project. The confidence they showed in us was the biggest source of inspiration for us. It has been a privilege working with them for the last two semesters on two different projects. We are highly obliged to all the faculty members of the Computer Science and Engineering Department for their support and encouragement. We also thank our Director Dr. Gopal Mugeraya and HOD, CSE Dept., Asst. Prof. Paritosh Bhattacharya for providing excellent computing and other facilities without which this work could not achieve its quality goal. We would like to express our sincere appreciation and gratitude towards Asst. Prof. Anupam Jamatia for his support in preparing this project report in LATEX. Finally, we are grateful to our parents for their support. It would have been impossible for us to complete this project without their love, blessings and encouragement. -Akash Bhargava, Ashok Kumar, Laxmi Kant Yadav, Vijay Kumar Gupta vi
  7. 7. Dedicated to our loving families for their kind love and support, and to our Project Supervisor Asst. Prof. Nikhil Debbarma and our Project Coordinator Asst. Prof. Suman Deb for sharing valuable knowledge, encouraging us, and showing confidence in us all the time. vii
  8. 8. Abstract Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part of speech. Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate. According to many researchers, Sanskrit is a very scientific language. Sanskrit behaves very much like a programming language. So if we are able to make a translator that translates Sanskrit into another language, it would prove to be a significant development in the field of NLP (Natural Language Processing). In this project we will try to parse a Sanskrit sentence so that later on it will be easier to translate it into some other language. We take as input a Sanskrit sentence or paragraph. We tokenize the whole sentence (lexical analysis), recognize the parts of speech of the individual tokens, and then parse the sentence to make sense out of it (parsing). viii
  9. 9. Contents
Acknowledgement vi
Dedicated to vii
Abstract viii
1 Introduction 3
1.1 Purpose 3
1.2 Scope 3
1.3 Basis 3
1.4 Overview 4
1.5 Objective 4
1.6 About The Project 4
ix
  10. 10. 1.7 Drawbacks 5
1.8 Study of the Project 6
2 System Requirement Specification 7
2.1 Compiler Phases 7
2.1.1 Lexical Analysis Phase 8
2.1.2 Semantic Analysis Phase 9
2.1.3 Intermediate Code Generation 9
2.1.4 Code Optimization 10
2.1.5 Code Generation 10
2.2 Parsing Methods 10
2.3 Grammar 12
2.4 Makefile 18
3 System Design 19
3.1 Spiral Model 19
3.2 Input Stages 20
3.3 Input Types 21
3.4 Input Media 21
3.5 Data Flow Diagram 21
3.6 Output Design 23
4 Implementation & Screen shots 24
x
  11. 11. 4.1 Parser 25
4.1.1 Parsing Methods 25
4.1.2 Ambiguity 26
4.2 Implementation Steps 27
4.2.1 The Lexer 27
4.2.2 The Parser 28
4.2.3 Grammar Used 31
4.2.4 Uses Of A Grammar 31
4.3 Input & Output 33
5 Testing 35
5.1 Syntax Error Handling 35
5.2 Error-Recovery Strategies 36
5.2.1 Panic mode 36
5.2.2 Phrase-level recovery 37
5.2.3 Error productions 37
5.2.4 Global correction 37
6 Conclusion 38
7 Appendix 40
8 Reference 42
xi
  12. 12. List of Figures
2.1 Phase of Compiler 8
2.2 Lexical Analyzer 9
2.3 Parsing Step 11
2.4 Vibhakti 13
2.5 Conjugational 14
2.6 Noun and Adjective 15
2.7 Noun Word 16
2.8 Noun 17
2.9 Noun 17
2.10 Noun 18
3.1 Spiral Model 20
3.2 Data Flow Diagram 22
x
  13. 13. CSED, NIT Agartala
3.3 Data Flow Diagram 22
4.1 Lexical Analysis Steps 28
4.2 Parsing 30
4.3 Output Snapshot 33
4.4 Output Snapshot 34
1
  15. 15. Chapter 1 Introduction 1.1 Purpose In this project we will try to parse a Sanskrit sentence so that later on it will be easier to translate it into some other language. 1.2 Scope Ability to parse a Sanskrit sentence into an English sentence. 1.3 Basis We will first present some concepts and then employ them: 3
  16. 16. CSED, NIT Agartala • Lexical Analysis • Parsing • Advantages of using Sanskrit • Approach 1.4 Overview This design document is divided into five major sections. Section 1 is an introduction that provides information about the document itself. Section 2 is an overview of the application and its primary functionality. Section 3 identifies the assumptions and constraints followed during the design of the software. Section 4 documents the overall system architecture. Section 5 provides the detailed design information for every subsystem and component in the current delivery. 1.5 Objective In this project we will try to parse a Sanskrit sentence so that later on it will be easier to translate it into some other language. Here we describe a machine translation technique for translating a Sanskrit sentence into an English sentence. 1.6 About The Project • Machine Translation has been defined as the process that utilizes computer software to translate text from one natural language to another. It is one of the most important applications of Natural Language Processing. • It helps people from different places to understand an unknown language without the aid of a human translator. 4
  17. 17. CSED, NIT Agartala • The language to be translated is the Source Language (SL). The language into which the source language is translated is the Target Language (TL). • The major machine translation techniques are the Rule-Based Machine Translation technique, the Statistical Machine Translation technique (SMT) and Example-Based Machine Translation (EBMT). • One of the effective techniques for machine translation is Rule-Based Machine Translation. • In India, different machine translation systems have been implemented: the AnglaUrdu (AnglaHindi-based) Machine Translation System for English to Urdu, the HindiAngla Machine Translation System from Hindi to English, the English-Assamese Machine Translation System (a machine translation system from English to Assamese), MaTra: a Human-Aided Machine Translation System, AnglaHindi: an English to Hindi Machine-Aided Translation System, and AnglaBharti: a technology for machine-aided translation from English to Indian languages. These are some of the machine translation works implemented in India. • Machine translation from Sanskrit is never an easy task because of the structural vastness of its grammar, but the grammar is well organized and least ambiguous compared to other natural languages. • The Sanskrit sentence is the input for our first module, the lexical parser, which generates a parse tree using semantic relationships. • This parse tree acts as an input to the second module, the semantic mapper, where each Sanskrit semantic word is mapped to an English semantic word. 1.7 Drawbacks Some of the most prominent drawbacks of the project: • This project is all about parsing one language into another; it is not a pure translator. • This project is platform dependent (here the platform is Linux). • It is a database-oriented project, not one using an online approach. 5
  18. 18. CSED, NIT Agartala 1.8 Study of the Project To provide the facility for users to give input in the Sanskrit language and to convert (parse) it into the English language. Here we have some predefined methods for parsing: • We first tokenize the input using strtok(str, " "); • Each token can be of 3 types: noun, verb, or preposition. The task is to identify these tokens, which is done by matching against an indexed database. • Each token is stored in a structure along with its meaning and its morphology. • Then the parser comes into play and forms a tree-like structure using these tokens. A major approach to Machine Translation is rule-based machine translation (RBMT, also known as the rational approach). Rule-based translation consists of: 1. The process of analyzing an input sentence of a source language syntactically and/or semantically. 2. The process of generating an output sentence of a target language based on the internal structure. Each process is controlled by the dictionary and the rules. • The strength of the rule-based method is that the information can be obtained through introspection and analysis. • The weakness of the rule-based method is that the accuracy of the entire process is the product of the accuracies of each sub-stage. 6
  19. 19. Chapter 2 System Requirement Specification 2.1 Compiler Phases A compiler operates in phases, and each phase transforms the source program from one representation to another. A compiler has six phases: • Lexical Analyzer • Syntax Analyzer • Semantic Analyzer • Intermediate code generation • Code optimization • Code Generation 7
  20. 20. CSED, NIT Agartala The symbol table and error handling interact with the six phases. Some of the phases may be grouped together. Figure 2.1: Phase of Compiler 2.1.1 Lexical Analysis Phase : The lexical phase reads the characters in the source program and groups them into a stream of tokens in which each token represents a logically cohesive sequence of characters, such as an identifier, a keyword, or a punctuation character. The character sequence forming a token is called the lexeme for the token. The semantic standard representation was designed to provide a simple description of the grammatical relationships in a sentence that can easily be understood and effectively used by people without linguistic expertise who want to extract textual relations. The sentence relationships are represented uniformly as semantic standard relations between pairs of words. 8
  21. 21. CSED, NIT Agartala Figure 2.2: Lexical Analyzer 2.1.2 Semantic Analysis Phase : This phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking. 2.1.3 Intermediate Code Generation: The syntax and semantic analysis phases generate an explicit intermediate representation of the source program. The intermediate representation should have two important properties: • It should be easy to produce. • It should be easy to translate into the target program. Intermediate representations can have a variety of forms. One of the forms is three-address code, which is like the assembly language for a machine in which every location can act like a 9
  22. 22. CSED, NIT Agartala register. Three-address code consists of a sequence of instructions, each of which has at most three operands. 2.1.4 Code Optimization : The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result. 2.1.5 Code Generation : The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then each intermediate instruction is translated into a sequence of machine instructions that perform the same task. 2.2 Parsing Methods : In the compiler model, the parser obtains a string of tokens from the lexical analyser, and verifies that the string can be generated by the grammar for the source language. The parser reports any syntax error in the source language. There are two types of parsing methods: top-down and bottom-up. "Top-down" is pretty much self-explanatory. From left to right, we drill down through each non-terminal until we get to a terminal. We also build our tree from the root node down to the leaves in a top-down fashion. It is important to note that we drill down from left to right, replacing the leftmost non-terminal first. The definitive meaning of top-down parsing is an attempt to find a leftmost derivation. In bottom-up parsing we are doing a rightmost derivation, where we replace the rightmost non-terminal first. There are three general types of parsers for grammars. Universal parsing methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These methods are too inefficient to use in production compilers. The methods commonly used in compilers are classified as either top-down parsing or bottom-up parsing. Top-down parsers build parse trees from the top (root) to the bottom (leaves); bottom-up parsers build parse trees from the 10
  23. 23. CSED, NIT Agartala Figure 2.3: Parsing Step leaves and work up to the root. In both cases, input to the parser is scanned from left to right, one symbol at a time. The output of the parser is some representation of the parse tree for the stream of tokens. There are a number of tasks that might be conducted during parsing, such as: • Collecting information about various tokens into the symbol table. • Performing type checking and other kinds of semantic analysis. • Generating intermediate code. 11
  24. 24. CSED, NIT Agartala Algorithm for parsing a sentence: 1. Tokenize the sentence into various tokens, i.e., a token list. 2. To find the relationships between tokens, we use dependency grammar and binary relations for our Sanskrit language. The token list acts as an input to the Semantic class, which represents the semantic standard. 3. From this, the Semantic class generates a tree; we have a class TreeTransform which will create the tree. 2.3 Grammar : Grammar provides a precise way to specify the syntax (structure or arrangement of composing units) of a language. In grade school we take grammar lessons that teach us to speak and write proper English. They teach us the correct way to form sentences with subjects, predicates, noun phrases, verb phrases, etc. Subjects, predicates, and phrases are some of the composing units of a sentence in English; similarly, if/else statements, assignment statements, and function definitions are some of the composing units of source code, which itself is a single sentence of a particular programming language. There are a very large number of valid English sentences one could compose; likewise, there are a large (probably infinite) number of valid source code programs one could create. If someone says "on the computer she is," we immediately recognize that the sentence is ill-formed. Its structure is invalid, because the noun phrase should precede the verb phrase. It should be: "She is on the computer." If we take a look at that diagramming article, we'll see that the model is exactly like an AST. So it goes without saying that parsing, or more formally, "syntactical analysis," has its roots in linguistics. Moreover, just as in English, programming languages need to be specified in a way that allows us to verify whether a sentence of the language is valid.
That's where context-free grammars (CFGs) come into play; they allow us to specify the syntax of a programming language's source code. 12
  25. 25. CSED, NIT Agartala Vibhakti as Pointer Figure 2.4: Vibhakti 13
  26. 26. CSED, NIT Agartala Basic conjugational endings: Figure 2.5: Conjugational 14
  27. 27. CSED, NIT Agartala Basic noun and adjective declension Figure 2.6: Noun and Adjective 15
  28. 28. CSED, NIT Agartala A-stems (noun words ending with a) Figure 2.7: Noun Word 16
  29. 29. CSED, NIT Agartala i- and u-stems Figure 2.8: Noun Figure 2.9: Noun 17
  30. 30. CSED, NIT Agartala Sanskrit verbs There are 10 types of verb declension forms. One example, for the root word bhava, is given here (only present, past and future). Figure 2.10: Noun 2.4 Makefile The GNU make utility maintains groups of programs. The purpose of the make utility is to determine automatically which pieces of a large program need to be recompiled, and to issue the commands to recompile them. To prepare to use make, you must write a file called the makefile that describes the relationships among the files in your program and states the commands for updating each file. In a program, typically the executable file is updated from object files, which are in turn made by compiling source files. Once a suitable makefile exists, running the make command each time you change some source files rebuilds what is needed. By default, make looks for a file named makefile or Makefile; the -f option tells make to use a file with a different name. make clean: "make clean" deletes any files generated by previous attempts, leaving you with clean source code. 18
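A minimal makefile along the lines described might look as follows; the source file names (lexer.cpp, parser.cpp) are hypothetical stand-ins for the project's actual files:

```makefile
# Hypothetical sources: adjust names to the actual project files.
parser: lexer.o parser.o
	g++ -o parser lexer.o parser.o

lexer.o: lexer.cpp
	g++ -c lexer.cpp

parser.o: parser.cpp
	g++ -c parser.cpp

# "make clean" removes generated files, leaving clean source code.
clean:
	rm -f parser *.o
```

After editing only lexer.cpp, running make recompiles lexer.o and relinks parser, leaving parser.o untouched, which is exactly the selective-recompilation behaviour described above.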
  31. 31. Chapter 3 System Design 3.1 Spiral Model The diagrammatic representation of the spiral model of software development appears like a spiral with many loops. The exact number of loops in the spiral is not fixed; each loop of the spiral represents a phase of the software process. This model is much more flexible than other models, since the exact number of phases through which the product is developed is not fixed. Each phase in this model is split into four sectors (quadrants) as shown in the figure. The first quadrant identifies the objectives of the phase and the alternative solutions possible for the phase under consideration. During the second quadrant, the alternative solutions are evaluated to select the best possible solution. The spiral model provides direct support for coping with project risks. Activities during the fourth quadrant concern reviewing the results of the stages traversed so far with the customer and planning the next iteration around the spiral. This is viewed as a meta-model, since it subsumes all the previously discussed models. The spiral model uses a prototyping approach, first building a prototype before embarking on the actual product development effort. Also, the spiral model can be considered as supporting the evolutionary model: the iterations 19
  32. 32. CSED, NIT Agartala Figure 3.1: Spiral Model along the spiral can be considered as evolutionary model levels through which the complete system is built. This enables the developer to understand and resolve the risks at each evolutionary level. The spiral model uses prototyping as a risk reduction mechanism and also retains the systematic step-wise approach of the waterfall model. 3.2 Input Stages The main input stages can be listed as below: • Data supply • Data transaction • Data synchronization • Data verification • Data validation • Data correction 20
  33. 33. CSED, NIT Agartala 3.3 Input Types It is necessary to determine the various types of inputs. Inputs can be categorized as follows: • External inputs, which are prime inputs for the system. • Internal inputs, which are user communications with the system. • Interactive inputs, which are entered during a dialogue. 3.4 Input Media At this stage a choice has to be made about the input media. To decide on the input media, consideration has to be given to: • Type of input • Flexibility of format • Speed • Accuracy • Ease of correction • Ease of use • Portability 3.5 Data Flow Diagram 21
  34. 34. CSED, NIT Agartala Figure 3.2: Data Flow Diagram Figure 3.3: Data Flow Diagram 22
  35. 35. CSED, NIT Agartala 3.6 Output Design Outputs from computer systems are required primarily to communicate the results of processing to users. They are also used to provide a permanent copy of the results for later consultation. The various types of outputs are: • External outputs, whose destination is the file named Temp. • Internal outputs, whose destination is within the organization; they are the users' main interface with the Linux system. • Operational outputs, whose use is purely within the department. • Interface outputs, which involve the user in communicating directly with the system. 23
  36. 36. Chapter 4 Implementation & Screen shots We can see a trend in programming languages: they are moving steadily from machine level to high level to human level languages. See how the progression runs from assembly > C > C++ > Java > Ruby, and this will not stop until something entirely human-like is created. The scope for Sanskrit to become a computer language lies in the library system. When you compile code in C, it patches your code with some predefined libraries. E.g., calling strcmp(string1, string2) is the best way to compare strings because it links library code into your executable. Libraries are written in assembly language and highly optimized. So if you have all the libraries with you, why do you need C? Why can't we just say GO AND OPEN THE DOOR and expect the computer to understand it and do it in a highly optimized way? The onus lies with an intelligent interpreter. Sanskrit is a language where letters have meanings. They do not need to be words to transmit emotions/information. Composition of letters into words again changes their meaning. Yes, something like OOP. E.g., ANU is particle and PARMANU is nanoparticle. To be a programming language, consistency is needed, which is there in Sanskrit. We will explore further in future how Sanskrit can be adapted to be a human computer language. Sanskrit is not a descriptive language. You don't need to write paragraphs to explain. When you translate something to Sanskrit, its size will reduce. It is precise, crisp and clear. 24
  37. 37. CSED, NIT Agartala 4.1 Parser :- Parsing is the de-linearization of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Getting an efficient and unambiguous parse of natural languages has been a subject of wide interest in the field of artificial intelligence over the past 50 years. A parser breaks data into smaller elements, according to a set of rules that describe its structure. Parsing is the process of analysing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given grammar. The following are the steps to generate a parse tree: 1. The input is a sentence. 2. The lexical analyzer creates tokens. 3. The tokens generated act as an input to the semantic analyzer. 4. The output is a parse tree. 4.1.1 Parsing Methods : There are two types of parsing methods: top-down and bottom-up. "Top-down" is pretty much self-explanatory. From left to right, we drill down through each non-terminal until we get to a terminal. We also build our tree from the root node down to the leaves in a top-down fashion. It is important to note that we drill down from left to right, replacing the leftmost non-terminal first. The definitive meaning of top-down parsing is an attempt to find a leftmost derivation. In bottom-up parsing we are doing a rightmost derivation, where we replace the rightmost non-terminal first. • Bottom-Up Parsing In bottom-up parsing the derivation starts from the string of terminals (our sentence). We try to derive the start symbol of our CFG. It is essentially a top-down derivation backwards. Initially, instead of replacing a non-terminal with another non-terminal or terminal 25
  38. 38. CSED, NIT Agartala (drilling down), we replace a terminal with a non-terminal (drilling up). At certain points we may even replace several non-terminals with one non-terminal. Since the derivation is the exact reverse of a leftmost derivation, we are then replacing non-terminals from right to left (a rightmost derivation). When we make a replacement we create a node that becomes the parent of some other node instead of its child. • Top-Down Parsing There are several problems with top-down parsing. (1) Left recursion can lead to infinite parsing loops, so it must be eliminated. Left recursion in a CFG production occurs when the non-terminal on the left side appears first on the right side of the arrow. There are simple algorithms to remove it, but the CFG becomes twice as long in many cases. (2) Top-down parsing may involve backtracking. Backtracking is the act of climbing back up the derivation (the parse), reversing everything to try another derivation path. We end up re-scanning the input as well. If we are inserting information into a symbol table as the parse proceeds, everything has to be removed. The need for backtracking can be eliminated by parsing with lookahead. Backtracking isn't restricted to top-down parsers; there are backtracking LR parsers as well. Finally, (3) the order in which we choose non-terminal expansions can cause valid inputs to be rejected without information as to why. 4.1.2 Ambiguity : Ambiguous grammars are those in which a string of the language has more than one parse tree. This is problematic because it may be hard to interpret the intended meaning of the string. Consider x*y; — that C statement can be interpreted as the multiplication of two variables, x and y, or as the declaration of a variable y whose type is a pointer to x. To resolve the conflict the compiler must locate x's type information in the symbol table. If it is a numerical type, the statement is interpreted as an expression.
Generally speaking, ambiguity is an unwanted feature of any grammar and may pose a threat to the correctness of both top-down and bottom-up parsers; different parsers handle it with varying efficacy. In spite of all this, ambiguity is not always a problem. It is possible to generate a non-ambiguous language from an ambiguous grammar: even if two parse trees generate the same string, as long as the string has one intended meaning there is no problem. Some parser generators allow specifying precedence and associativity rules to remove any ambiguity.
4.2 Implementation Steps :-

The following steps were used for developing this application:

4.2.1 The Lexer :

The first step towards creating a successful Sanskrit English Parser (SEP) is to create a lexer that analyses every word of the input Sanskrit sentence.

Tokenizer : The tokenizer divides the complete sentence into a stream of individual words separated by blank spaces.

Avyaya Analyser : Every single output of the tokenizer goes through the smallest database, of avyaya words (indeclinables), and only if it produces a complete match is the word accepted as an avyaya.

Verb Analyser : The second, relatively bigger database of verb roots (dhaturoops) is placed after the avyaya database. Tokens not recognized as avyaya are then processed by the verb analyser. The program verb.cpp analyses the suffix of every input token and generates information regarding the tense, person and number of the corresponding token. The suffix is then removed and the verb is mapped to its respective root using the verb database. If a match is found the token is accepted as a verb; otherwise it is passed on for noun analysis.
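The tokenizer stage described above can be sketched as follows. This is a minimal illustration of splitting a sentence on blank spaces, not the project's actual tokenizer source:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a (transliterated) sentence on blank spaces into a stream of
// word tokens, as the tokenizer does before avyaya/verb/noun analysis.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word)          // operator>> skips whitespace between words
        tokens.push_back(word);
    return tokens;
}
```

Each token produced here would then be checked against the avyaya database first, then the verb database, and finally the noun analyser.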
Noun Analyser : Tokens not yet recognized are fed to the noun analyser (noun.cpp). Noun declensions belonging to different genders follow different patterns that cannot all be matched by the program. Hence, of the 21 possible noun declensions for a single noun, 10 declensions are stored as exceptions while the remaining 11 are processed by the program and the root word is obtained. Lastly, if the word is still not recognized then it is not present in the database and must be entered manually for analysis.

Figure 4.1: Lexical Analysis Steps

4.2.2 The Parser :

Equipped with the knowledge of what individual words represent, we can now move towards re-arranging them in such a way that their mere translation results in a meaningful English sentence. When parsing from Sanskrit to English we move from a word-order-free language to a language in which only a particular order of words conveys the intended meaning.
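Both the verb analyser and the noun analyser above rely on stripping a known ending from a token to recover its root and grammatical information. A rough sketch of that idea follows; the suffix table and the function name are hypothetical stand-ins for the project's databases, and real Sanskrit morphology is considerably richer:

```cpp
#include <map>
#include <optional>
#include <string>

struct Analysis {
    std::string root;   // root word after the ending is removed
    std::string info;   // e.g. tense/person/number or case/number
};

// Look up the longest matching ending of `token` in a suffix table
// (hypothetical stand-in for the verb/noun database).
std::optional<Analysis> analyse(const std::string& token,
                                const std::map<std::string, std::string>& suffixes) {
    // Try longer endings first, so "ti" wins over a shorter match like "i".
    for (std::size_t len = token.size(); len > 0; --len) {
        auto it = suffixes.find(token.substr(token.size() - len));
        if (it != suffixes.end())
            return Analysis{token.substr(0, token.size() - len), it->second};
    }
    return std::nullopt;  // not in the table: must be entered manually
}
```

With a table mapping the ending "ti" to "third person singular, present", `analyse("gacchati", ...)` would yield the root "gaccha" plus that information.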
How do we represent CONTEXT ?

By CONTEXT we mean the parts of a statement that precede or follow a specific word or passage, usually influencing its meaning or effect. Sanskrit uses the concept of vibhakti to generate context. Due to the lack of vibhakti in English, the user will have to understand the context of every word with help from the LEXER. Using the lexer the user can add words like for, from, to, etc., which are not used in Sanskrit. Thus the PARSER gives us the spatial arrangement of the input words in converted form (in English) and the LEXER is referred to for context. This results in an English translation of a Sanskrit sentence.

Structure of an English sentence :

Every English sentence is a combination of nouns and verbs related to each other through context. In a SIMPLE sentence (a sentence without connectors, having only 1 verb), the verb is the central entity. Nouns then relate to this central entity via context, as defined below:

Nominative (S)    the SUBJECT/doer of the verb
Accusative (O)    the OBJECT of the verb
Instrumental (I)  the cause/means of the verb
Dative (D)        the indirect object of the verb
Ablative (A)      represents comparison/separation
Locative (L)      represents position in space/time

The LEXER already generates this contextual information for every noun, and the PARSER can now arrange a simple input sentence spatially, following the rules of English as shown below. Thus, we have the following order:

S V O L/A/D/I

The PARSER interprets the LEXER's outputs and rearranges the various nouns at their respective positions as shown. The user can now apply the context of every noun used to obtain a corresponding English translation.

Parsing rules for a simple sentence :

The PARSER can handle all forms of noun declensions, verb declensions and avyayas (including connectors). The following points summarise the working of the parser -
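The spatial rearrangement into S V O L/A/D/I order can be sketched as a stable sort on case priority. The role names mirror the case table above, but the code itself is an illustrative sketch, not the project's parser:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Priority order follows the English arrangement S V O L/A/D/I;
// all of L, A, D and I are grouped here as Other for simplicity.
enum class Role { Nominative, Verb, Accusative, Other };

struct Word {
    std::string english;  // translated word supplied by the lexer
    Role role;            // case context supplied by the lexer
};

// Rearrange a word-order-free input into English order. stable_sort
// preserves the original relative order of words sharing a role.
void arrange(std::vector<Word>& words) {
    std::stable_sort(words.begin(), words.end(),
                     [](const Word& a, const Word& b) {
                         return static_cast<int>(a.role) < static_cast<int>(b.role);
                     });
}
```

For example, an input tagged as accusative-verb-nominative ("forest goes Rama") would be rearranged to "Rama goes forest", after which the user applies context words such as "to the".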
Figure 4.2: Parsing

• The parser stores nouns, verbs and avyayas in 3 separate structures, along with the information the parser needs about them, such as case context, number and person.
• The parser can handle words representing adjectives.
• The parser can handle words representing adverbs.
• The parser can resolve ambiguity generated by Sanskrit noun declensions. For example, if an input Sanskrit sentence contains no nominative noun but there is a noun which can be both nominative and accusative, then it is treated as nominative.
• The parser requires that the subject and verb agree in number.
• The parser also handles the GENITIVE case, which represents a noun-noun relationship rather than the noun-verb relationship that the other declensions represent.
• The parser handles avyayas which correspond to a given noun declension type.
• The parser handles avyayas representing questions.
• The parser handles avyayas that act as conjunctions of different types.
• The parser can thus handle multiple sentences joined together using avyayas.
• The parser displays the interpreted spatial arrangement of the input sentence in a text file named temp.
• The parser can process an input even if some part of it is not defined in the lexer database. Such unrecognized input tokens are output as-is, at the start of the resultant sentence, in the temp file.

4.2.3 Grammar Used :

Sanskrit uses a context-free grammar; a BNF grammar for Sanskrit also exists. The general form of a BNF grammar is given as:

<BNF rule>    ::= <nonterminal> "::=" <definitions>
<nonterminal> ::= "<" <words> ">"
<terminal>    ::= <word> | <punctuation mark> | '"' <any chars> '"'
<words>       ::= <word> | <words> <word>
<word>        ::= <letter> | <word> <letter> | <word> <digit>
<definitions> ::= <definition> | <definitions> "|" <definition>
<definition>  ::= <empty> | <term> | <definition> <term>
<empty>       ::=
<term>        ::= <terminal> | <nonterminal>

4.2.4 Uses Of A Grammar :

A BNF grammar can be used in two ways :-

• To generate strings belonging to the grammar. To do this, start with a string containing a non-terminal; while there are still non-terminals in the string, replace a non-terminal with one of its definitions.
• To recognize strings belonging to the grammar. This is the way programs are compiled - a program is a string belonging to the grammar that defines the language.
Recognition is much harder than generation.
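The generation procedure described above (repeatedly replacing a non-terminal with one of its definitions) can be sketched for a toy grammar. This example is illustrative only and is unrelated to the Sanskrit grammar used in the project:

```cpp
#include <cstdlib>
#include <string>

// Generate a string from the toy grammar  S -> 'a' S 'b' | 'c'
// by repeatedly replacing the non-terminal S with one of its
// definitions, chosen at random.
std::string generate(unsigned depth = 0) {
    // Bound the recursion depth so generation always terminates.
    if (depth >= 4 || std::rand() % 2 == 0)
        return "c";                         // choose S -> 'c'
    return "a" + generate(depth + 1) + "b"; // choose S -> 'a' S 'b'
}
```

Every string this produces has the shape a...c...b with equal numbers of a's and b's, i.e. it belongs to the language of the grammar; recognizing such strings is the reverse, and harder, problem.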
4.3 Input & Output :

Figure 4.3: Output Snapshot
Figure 4.4: Output Snapshot
Chapter 5
Testing

While developing this project we faced some discrepancies between the grammar definition and the implementation of the query classes. In order to have a coherent implementation, we had to correct them. For testing, there are different strategies :-

5.1 Syntax Error Handling :

Planning the error handling right from the start can both simplify the structure of a compiler and improve its response to errors. A program can contain errors at many different levels, e.g.

• Lexical, such as misspelling an identifier, keyword, or operator.
• Syntactic, such as an arithmetic expression with unbalanced parentheses.
• Semantic, such as an operator applied to an incompatible operand.
• Logical, such as an infinitely recursive call.

Much of the error detection and recovery in a compiler is centred on the syntax analysis phase. One reason for this is that many errors are syntactic in nature, or are exposed when the stream of tokens coming from the lexical analyser disobeys the grammatical rules defining the programming language. Another is the precision of modern parsing methods; they can detect the presence of syntactic errors in programs very efficiently. The error handler in a parser has simple goals :-

• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
• It should not significantly slow down the processing of correct programs.

5.2 Error-Recovery Strategies :

There are many different general strategies that a parser can employ to recover from a syntactic error:

• Panic mode
• Phrase level
• Error production
• Global correction

5.2.1 Panic mode :

• This is used by most parsing methods.
• On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens (delimiters, such as a semicolon or end) is found.
• Panic-mode correction often skips a considerable amount of input without checking it for additional errors.
• It is simple.

5.2.2 Phrase-level recovery :

• On discovering an error, the parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.
• For example, a local correction would be replacing a comma with a semicolon, deleting an extraneous semicolon, or inserting a missing semicolon.
• Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.

5.2.3 Error productions :

• If error productions are used, the parser can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

5.2.4 Global correction :

• Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a related string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.
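The panic-mode strategy above can be sketched as follows, using the semicolon as the synchronizing token. This is an illustrative sketch, not code from this project:

```cpp
#include <string>
#include <vector>

// Panic-mode recovery: on an error at position errorPos, discard input
// tokens one at a time until a synchronizing token (here ";") is found,
// then return the position at which parsing should resume.
std::size_t recover(const std::vector<std::string>& tokens,
                    std::size_t errorPos) {
    std::size_t pos = errorPos;
    while (pos < tokens.size() && tokens[pos] != ";")
        ++pos;  // skipped input is not checked for further errors
    return pos < tokens.size() ? pos + 1 : pos;  // resume after the ";"
}
```

Everything between the error and the semicolon is simply discarded, which is why panic mode is simple but may miss additional errors in the skipped region.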
Chapter 6
Conclusion

The project is mainly based on two languages, C and C++. In this project we have used Sanskrit as the input language and English as the output language. First, the Sanskrit input is taken from the keyboard; the sentence is tokenized using the tokenizer; the tokens are identified using the token analyser; the tokens are then matched against the database to fetch the output words; and finally all the resulting words are combined to produce the output.

The main goal of the current study was to parse a Sanskrit sentence so that later on it could easily be translated into some other language. The findings from this study make several contributions to the current literature, first among them the case for using Sanskrit as a primary language for programming purposes.

Finally, a number of important limitations need to be considered. First, this project is about parsing one language into another; it is not a pure translator. Second, the project is platform dependent (here the platform is Linux). Third, it is a database-oriented project, not one using an online approach.

It is recommended that further research be undertaken in the following areas:

• We can make this project more user friendly by using a graphical user interface.
• We can apply this scheme to many different languages.
The findings of this study have a number of important implications for future practice. This translator is mainly based on fetching data from a database.
Chapter 7
Appendix A

A
Avyaya Analyser 37
Ambiguous 15
C
Compiler 6
Code Optimization 9
Code Generation 9
D
Drawbacks 4
Data Flow Diagram 20
E
Error-Recovery Strategies 35
Error productions 36
G
Grammar 11
Grammar Used 30
Global correction 36
I
Intermediate Code Generation 8
Input Stages 19
Input Types 20
L
Lexical Analysis Phase 7
M
Makefile 17
O
Objective 3
Output Design 22
P
Parsing Methods 9
S
Scope 2
T
Testing 34
U
Uses Of A Grammar 30
Chapter 8
References

Thanks to our Project Supervisor, Assistant Professor Nikhil Debbarma, and our Project Coordinator, Assistant Professor Suman Deb, for sharing valuable knowledge, for their encouragement, for showing confidence in us all the time, and for pointing us to some of the links below.

• Sanskrit & Artificial Intelligence: "Knowledge Representation in Sanskrit and Artificial Intelligence" by Rick Briggs, RIACS, NASA Ames Research Center, Moffett Field, California
• http://www.vedicsciences.net/articles/sanskrit-nasa.html
• AI Magazine article on the importance of Sanskrit: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/466
• http://sanskrit.jnu.ac.in/morph/analyze.jsp
• http://uttishthabharata.wordpress.com/2011/05/30/sanskrit-programming/
