Sanskrit parser Project Report
In this project, we try to parse a Sanskrit sentence so that it can later be translated more easily into another language.

Published in: Technology

Transcript of "Sanskrit parser Project Report"

1. Project Mentor: Mr. Nikhil Debbarma, Assistant Prof., CSE Dept., NIT Agartala
    Team Members: Akash Bhargava (10UCS002), Ashok Kumar (10UCS010), Laxmi Kant Yadav (10UCS027), Vijay Kumar Gupta (10UCS057)
2. A translator must know the grammatical structure of both the input and the output language.
3. • According to many researchers, Sanskrit is a very scientific language.
    • Sanskrit behaves much like a programming language.
    • So if we can build a translator that translates Sanskrit into machine code, it would be a significant development in the field of NLP (Natural Language Processing).
4. “NASA scientist Rick Briggs had invited 1,000 Sanskrit scholars from India for working at NASA. But scholars refused to allow the language to be put to foreign use” (Dainik). Being understandable by both computers and humans, Sanskrit was considered useful in space research and in many other natural language processing applications.
5. We will first present some concepts and then apply them:
    1. Advantages of using Sanskrit
    2. Lexical Analysis
    3. Parsing
    4. Approach
    5. Where we are now
    6. Problems
    7. References
6. Advantages of using Sanskrit (Why Sanskrit?)
7. Fixed Morphology
8. Vibhakti as Pointer
9. Vibhakti as Pointer
10. • Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
    • A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.
    • A lexer often exists as a single function that is called by a parser or another function, or it can be combined with the parser in scannerless parsing.
    • The lexical analyzer is the first phase of a translator. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
11. [Figure: interaction of the Lexical Analyzer and the Parser. Labels: Source program, token, getNextToken, Indexed Database, Output]
12. • The output of lexical analysis is a stream of tokens.
    • A token is a syntactic category:
      ◦ In English: noun, verb, adjective, ...
      ◦ In Sanskrit: vibhakti, kriya, visheshana, ...
    • The parser relies on these token distinctions.
13. An implementation must do the following:
    1. Recognize substrings corresponding to tokens.
    2. Search for the identified token in the database to recognize its context.
    3. Depending on the context, the token may be a different part of speech of Sanskrit, e.g. verb (kriya) or vibhakti (dhatu roop).
    4. Tag every token accordingly.
14. Two important points:
    1. The goal is to partition the string. This is implemented by reading left to right, recognizing one token at a time.
    2. "Lookahead" may be required to decide where one token ends and the next token begins.
       ◦ Even our simple example has lookahead issues: i vs. if, and = vs. ==.
15. LEXICAL ANALYSIS
16. LEXICAL ANALYSIS
    Consider the dhatu (verb root) meaning 'to heat'. The following inflections are analyzed lexically:
    [Table: Sanskrit inflected forms grouped under HEATS, WILL HEAT, HEATED, and HEAT IT (order)]
17. LEXICAL ANALYSIS
    Consider the noun representing God. The following case inflections are possible:
    1. Nominative (subject)
    2. Accusative (object)
    3. Instrumental (by)
    4. Dative (to)
    5. Ablative (from)
    6. Genitive (of)
    7. Locative (in)
18. LEXICAL ANALYSIS
    [Flowchart: Input Sentence, Tokenize, Avyaya Analysis, Verb Analysis, Noun Analysis, Unknown word (add to database)]
19. • The scanner recognizes words.
    • The parser recognizes syntactic units.
    • Parser operations:
      ◦ Check and verify syntax based on specified syntax rules
      ◦ Report errors
    • Automation:
      ◦ The process can be automated
20. 1. Simplicity of design
    2. Improving efficiency
    3. Enhancing portability
21. Parsing Sanskrit Text
    Now we move toward translating a Sanskrit sentence into its parsed equivalent.
    PARSING: analyze (a sentence) into its component parts and describe their syntactic roles; analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.
22. Parsing Sanskrit Text
    • Sanskrit sentence structure: SOV
    • English sentence structure: SVO
    • Example: "Boy reads chapter" is S V O in English; the Sanskrit sentence orders the same words S O V.
23. • We first tokenize the input using strtok(str, " ").
    • Each token can be of 3 types: noun, verb, or preposition. The task is to identify these tokens, which is done by matching them against the indexed database.
    • Each token is stored in a structure along with its meaning and morphological information.
    • Then the parser comes into play and forms a tree structure from these tokens.
24. • Bottom-up LR
      ◦ Construct the parse tree in a bottom-up manner
      ◦ Find the rightmost derivation in reverse order
      ◦ For every potential right-hand side and token, decide when a production is found
    • More powerful: bottom-up parsers can handle the largest class of grammars that can be parsed deterministically.
25. • Programming languages used: C and C++
    • Database used: Linux file system, indexed
    • Data structures: array, linked list, structure, tree, indexing, and hashing
    • INPUT: a Sanskrit sentence or paragraph
    • OUTPUT: recognize all the parts of speech and form a tree structure to be able to understand the sentence.
26. • ::: this is an avyaya, and the meaning is: where_there
    • ::: Nominative, Singular, Gender-Masculine, noun; the root is: … and the meaning is: Ram
    • ::: the root is: … the meaning is: go; present tense, first person, singular
    • ::: this is an avyaya, and the meaning is: there
    • ::: Nominative, Plural, Gender-Masculine, noun; the root is: … and the meaning is: god
    • ::: Instrumental, Singular, Gender-Masculine, noun; the root is: … and the meaning is: boy
    • ::: Accusative, Singular, Gender-Feminine, noun; the root is: … and the meaning is: river
27. Avyaya words (indeclinables) are used to connect two or more simple sentences. Examples (English glosses): if-then, where-there, but, hence, provided/if. Avyayas not only connect sentences but also affect the structure of a simple sentence.
28. • Every word encountered in the input sentence could be any part of speech of Sanskrit, as there is no fixed word order.
    • Because of this property of Sanskrit, searching becomes important.
    • The database and word collection are in Unicode format, so each word takes up even more storage.
29. • Grammar of the Sanskrit language
    • How to represent it in a BNF grammar
    • Parser techniques
    • Structure of the code
30. • A big chunk of our time was invested in researching the Sanskrit language and its grammar, which was quite difficult.
    • So far we have implemented the lexer and the parser.
31. References
    • Sanskrit & Artificial Intelligence, NASA: Knowledge Representation in Sanskrit and Artificial Intelligence, by Rick Briggs. http://www.vedicsciences.net/articles/sanskrit-nasa.html
    • AI Magazine publishes the importance of Sanskrit. http://www.parankusa.org/SanskritAsProgramming.pdf
    • http://sanskrit.jnu.ac.in/morph/analyze.jsp
    • http://en.wikipedia.org/wiki/Sanskrit_verbs
    • http://en.wikipedia.org/wiki/Sanskrit_grammar