Sanskrit in Natural Language Processing
Hitesh Joshi
Sanskrit is the most unambiguous language compared with other natural languages. As stated by Rick Briggs of NASA, it is the most suitable language for computers in natural language processing.
The Classification of Modern Arabic Poetry Using Machine Learning
TELKOMNIKA JOURNAL
In recent years, text classification and analysis of Arabic texts using machine learning has seen some progress, but most of this research has not focused on Arabic poetry. Because Arabic poetry is difficult to analyze, the work requires standard Arabic, on which “Al Arud”, the science of studying poetry, is based. This paper presents an approach that uses machine learning to classify modern Arabic poetry into four types: love poems, Islamic poems, social poems, and political poems. Each of these types usually has features that indicate the class of the poem. Despite the challenges posed by the difficulty of the rules of the Arabic language, on which this classification depends, we propose a new automatic method for classifying modern Arabic poems. The method is suited to the four classes above. The study used Naïve Bayes, Support Vector Machines, and Linear Support Vector classifiers. Data preprocessing was an important step of the approach, as it increased the accuracy of the classification.
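As a rough illustration of the Naïve Bayes part of such a pipeline, the sketch below trains a multinomial Naïve Bayes classifier with add-one smoothing on a toy corpus. Everything in it is an assumption made for illustration: the function names, the English placeholder tokens standing in for preprocessed Arabic words, and the six-document corpus are not taken from the paper, and the SVM classifiers are not shown.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Train on a list of (tokens, label) pairs; returns (log_prior, log_like, vocab)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}
    log_like = {}
    for c in label_counts:
        # Add-one (Laplace) smoothing so unseen class/word pairs keep nonzero probability.
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like, vocab

def classify(tokens, model):
    """Return the class with the highest posterior log-score; unknown words are skipped."""
    log_prior, log_like, vocab = model
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy corpus: English placeholders for preprocessed poem tokens.
TOY_CORPUS = [
    ("heart love rose beloved".split(), "love"),
    ("love longing heart night".split(), "love"),
    ("prayer faith mercy prophet".split(), "islamic"),
    ("faith prayer pilgrimage".split(), "islamic"),
    ("poverty street neighbor work".split(), "social"),
    ("homeland freedom ruler revolution".split(), "political"),
]
MODEL = train_naive_bayes(TOY_CORPUS)
print(classify("heart rose".split(), MODEL))
```

A real experiment would replace the toy corpus with tokenized, normalized poems and evaluate on held-out data.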
An OT Account of Phonological Alignment and Epenthesis in Aligarh Urdu
ijtsrd
This paper describes the phonological properties of alignment and the economy of epenthesis in the syllable structure of words in Aligarh Urdu. It examines the behavior of certain segments that attach to neighboring words and discusses the economy of syllable structure in the language. Aligarh Urdu has various segmental processes, involving the addition or deletion of phonemes, that affect the root and alter the overall structure of words. The objective is to determine the conditions on syllable structure after the addition, elision, and alignment of segments in Aligarh Urdu. Conflicts between the addition and deletion of segments are handled within the framework of constraint rankings in Optimality Theory (Prince and Smolensky, 1993). More generally, the paper aims to show how the principles of Optimality Theory apply and to explore the structure of syllables with their marginal and obligatory components. The phonological behavior of consonant clusters is analyzed with the help of faithfulness constraints and markedness constraints. The structure of a root word differs completely from the formation of other words, but applying the constraints reveals the actual shape of linguistic items in the language. The paper gives a systematic account of epenthesis and of the elimination of vowels or consonants under the tenets of OT. It also deals with the gradient property of segments that alters the underlying form and is affected by other features at the surface form. The generalization at each step of syllable structure should be related to the positional variation of input and output candidates. The competition among output candidates to become the optimal candidate can be resolved only through the constraint rankings of Optimality Theory.
Mohd Hamid Raza, "An OT Account of Phonological Alignment and Epenthesis in Aligarh Urdu", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23517.pdf
Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/other/23517/an-ot-account-of-phonological-alignment-and-epenthesis-in-aligarh-urdu/mohd-hamid-raza
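The way a constraint ranking selects an optimal candidate can be sketched in a few lines. The example below assumes a schematic input /skul/, four hand-written candidates, and four commonly used constraint names (*COMPLEX, MAX, DEP, CONTIGUITY); the violation counts and both rankings are illustrative assumptions, not data or rankings from the paper.

```python
def evaluate(tableau, ranking):
    """Pick the optimal candidate under strict domination.

    Each candidate's violations are read off in ranking order, and the
    resulting tuples are compared lexicographically: one violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    def profile(candidate):
        return tuple(tableau[candidate].get(c, 0) for c in ranking)
    return min(tableau, key=profile)

# Hypothetical tableau for a schematic input /skul/ (violations assigned by hand).
TABLEAU = {
    "skul":  {"*COMPLEX": 1},               # faithful, keeps the initial cluster
    "kul":   {"MAX": 1},                    # deletes a consonant
    "sikul": {"DEP": 1, "CONTIGUITY": 1},   # inserts a vowel inside the cluster
    "iskul": {"DEP": 1},                    # inserts a vowel at the word edge
}

# Markedness over faithfulness: the cluster must be repaired, here by epenthesis.
EPENTHESIS_RANKING = ["*COMPLEX", "MAX", "CONTIGUITY", "DEP"]
# Faithfulness over markedness: the cluster surfaces unchanged.
FAITHFUL_RANKING = ["DEP", "CONTIGUITY", "MAX", "*COMPLEX"]

print(evaluate(TABLEAU, EPENTHESIS_RANKING))
print(evaluate(TABLEAU, FAITHFUL_RANKING))
```

Re-ranking the same constraints changes the winner, which is the sense in which conflicts between addition and deletion are resolved by the ranking rather than by the constraints alone.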
Marathi Text-To-Speech Synthesis using Natural Language Processing
iosrjce
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles contributing new results in all areas of VLSI design and signal processing. The goal of the journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI design and signal processing concepts and to establish new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.
An Implementation of Apertium Based Assamese Morphological Analyzer
ijnlc
Morphological analysis is an important component of any natural language processing technology. Morphology studies the structure and formation of the words of a language. In current NLP research, morphological analysis techniques are becoming increasingly popular. To process any language, the morphology of its words should be analyzed first. The Assamese language has a very complex morphological structure. In our work we used Apertium-based finite-state transducers to develop a morphological analyzer for Assamese over a limited domain, and we obtained 72.7% accuracy.
Volume: 1, Issue: 3, Ahwaz Journal of Linguistics Studies
Ahwaz Journal of Linguistics Studies
(An international peer-reviewed quarterly journal)
(ISSN: 2717-2716)
For more information, please visit our website: WWW.AJLS.IR
The journal welcomes all researchers in its field of scientific and research interest, writing in one of the following languages: Arabic, English, and Persian, on one of the topics listed below:
a) Languages and dialects (current issues in the linguistics of language)
b) Linguistics (current issues in linguistics)
c) Literature (current issues in Arabic, English, and other literatures)
d) Translation (current issues in language translation)
e) Current issues in the linguistics of the Holy Quran
f) Current issues in teaching languages to non-native speakers
g) Teaching, programming, and evaluating language teaching and learning programs
h) Strategies, opportunities, and challenges of marketing and entrepreneurship in various languages
i) Current issues in text linguistics and religious, economic, social, legal, and other discourse
Ahwaz / P.O. Box: 61335-4619
Telephone: (+98) 61-32931199
Fax: (+98) 61-32931198
Mobile and WhatsApp contact number: (+98) 9165088772
Email: info@pahi.ir
RULE BASED TRANSLITERATION SCHEME FOR ENGLISH TO PUNJABI
ijnlc
Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration aims to preserve the phonological structure of words. Proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of separating a word into its syllables. We calculate probabilities for named entities (proper names and locations). For words that are not named entities, separate probabilities are calculated from relative frequencies using the statistical machine translation toolkit MOSES. Using these probabilities, we transliterate input text from English to Punjabi.
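The relative-frequency step can be sketched as follows: given aligned (English syllable, Punjabi unit) pairs, the probability of a target given a source is the pair count divided by the source count. The aligned pairs, their counts, and the function names below are invented for illustration; the paper's syllabification rules are not reproduced, so the input is assumed to be syllabified already, and nothing here calls MOSES itself.

```python
from collections import Counter, defaultdict

def relative_frequencies(aligned_pairs):
    """Estimate P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table

def transliterate(syllables, table):
    """Map each syllable to its most probable target; unknown syllables pass through."""
    out = []
    for syl in syllables:
        options = table.get(syl)
        out.append(max(options, key=options.get) if options else syl)
    return "".join(out)

# Hypothetical aligned training pairs (toy counts, not from the paper).
PAIRS = [
    ("ra", "ਰਾ"), ("ra", "ਰਾ"), ("ra", "ਰ"),
    ("ma", "ਮ"), ("ma", "ਮ"), ("ma", "ਮਾ"), ("ma", "ਮ"),
]
TABLE = relative_frequencies(PAIRS)
print(TABLE["ra"])
print(transliterate(["ra", "ma"], TABLE))
```

Choosing the single most probable target per syllable is the simplest possible decoder; a fuller system would score whole candidate sequences.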
FURTHER INVESTIGATIONS ON DEVELOPING AN ARABIC SENTIMENT LEXICON
ijnlc
Abundant lexical resources accelerate and simplify sentiment analysis in English. In Arabic, resources are few and not comprehensive. Most current efforts to construct an Arabic Sentiment Lexicon (ASL) depend on a large number of lexical entities. However, coverage of all Arabic sentiment expressions can be achieved using refined regular expressions rather than a large number of lexical entities. This paper presents an ASL that is more comprehensive than existing lexicons, covering many expressions in different dialects including Franco-Arabic, and at the same time more compact. The paper also shows how to integrate different lexicons and refine them. To enrich lexical entries with robust morphological and syntactic information, regular expressions, the weight of sentiment polarity, and n-gram terms have been added to each
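To make the idea of regex-based lexicon entries concrete, here is a minimal sketch in which each entry is a compiled pattern with a polarity weight, so that one entry covers spelling variants such as letter elongation. The four patterns, their weights, and the sample sentences are assumptions chosen for illustration and are not entries from the paper's lexicon.

```python
import re

# Hypothetical lexicon: (pattern, polarity weight). Repeated letters are
# tolerated so that elongated spellings match the same entry.
LEXICON = [
    (re.compile(r"ج+م+ي+ل+"), 1.0),                  # "jamil" (beautiful)
    (re.compile(r"ر+ا+ئ+ع+"), 1.0),                  # "ra'i'" (wonderful)
    (re.compile(r"س+ي+ء+"), -1.0),                   # "sayyi'" (bad)
    (re.compile(r"[gj]ami+l", re.IGNORECASE), 1.0),  # Latin-script spelling of "jamil"
]

def score(text):
    """Sum the weights of all lexicon matches found in the text."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in LEXICON)

SAMPLE_POSITIVE = "الفيلم جمييييل"   # "the film is beautiful", with elongation
SAMPLE_NEGATIVE = "الطقس سيء"        # "the weather is bad"
SAMPLE_FRANCO = "el film gamiiil"    # Latin-script variant

for sample in (SAMPLE_POSITIVE, SAMPLE_NEGATIVE, SAMPLE_FRANCO):
    print(sample, score(sample))
```

A handful of patterns like these can stand in for many surface forms, which is the compactness argument the abstract makes.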
Abstract
Part-of-speech tagging plays an important role in developing natural language processing software. It means assigning a part-of-speech tag to each word of a sentence: the tagger takes a sentence as input and assigns the appropriate tag to each word. This article surveys the work done on Odia POS tagging.
Open domain Question Answering System - Research project in NLP
GVS Chaitanya
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards such an ambitious goal is to deal with natural language so that the computer can understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. Within this discipline, question answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in Q&A systems. The wealth of information on the web makes it an interactive resource for seeking quick answers to factual questions such as “Who is the first American to land in space?” or “What is the second tallest mountain in the world?”, yet today's most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. A Q&A system aims to develop techniques that go beyond retrieval of relevant documents in order to return exact answers to natural language factoid questions.
Phonetic Dictionary for Natural Language Processing: Kannada
IJERA Editor
India has 22 officially recognized languages, among them Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. Clearly, India faces a language diversity problem. In the age of the Internet, the multiplicity of languages makes it even more necessary to have sophisticated systems for natural language processing. In this paper we develop a phonetic dictionary for natural language processing, particularly for Kannada. Phonetics is the scientific study of speech sounds. Acoustic phonetics studies the physical properties of sounds and provides a language to distinguish one sound from another in quality and quantity. Kannada is one of the major Dravidian languages of India. The language uses forty-nine phonemic letters, divided into three groups: Swaragalu (thirteen letters), Yogavaahakagalu (two letters), and Vyanjanagalu (thirty-four letters). The first and last groups are similar to the vowels and consonants of English, respectively.
The Natural Language Toolkit (NLTK) is a generic platform for processing data in various natural (human) languages, and it also provides resources for Indian languages such as Hindi, Bangla, and Marathi. In the proposed work, the repositories provided by NLTK are used to process Hindi text and then to analyze multiword expressions (MWEs). MWEs are lexical items that can be decomposed into multiple lexemes and display lexical, syntactic, semantic, pragmatic, and statistical idiomaticity. The main focus of this paper is the processing and analysis of MWEs in Hindi text. The corpus used for Hindi text processing is taken from the famous Hindi novel “Karmabhumi” by Munshi Premchand. The result analysis is done using the Hindi corpus provided by the Resource Centre for Indian Language Technology Solutions (CFILT). The results are analyzed to assess the accuracy of the proposed work.
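One common way to operationalize statistical idiomaticity is to rank adjacent word pairs by pointwise mutual information (PMI), which is high when two words co-occur more often than their individual frequencies predict. The sketch below is plain Python rather than the NLTK API, and it is not the paper's method: the function name, the frequency threshold, and the romanized toy token list are all assumptions for illustration.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by PMI = log2(P(a, b) / (P(a) * P(b)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = n_tokens - 1
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for pairs seen only once
        p_ab = count / n_bigrams
        p_a = unigrams[a] / n_tokens
        p_b = unigrams[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical romanized toy text; a real run would use tokenized Hindi sentences.
TOKENS = ("ram ne dhyan diya aur sita ne dhyan diya "
          "par ram ne kaam kiya").split()

for pair, value in pmi_bigrams(TOKENS):
    print(pair, round(value, 3))
```

On this toy input the pair ("dhyan", "diya") ranks first because both words occur only together, while pairs involving the frequent word "ne" score lower.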
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
call for paper 2012, hard copy of journal, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing of journal, publishing of research paper, research and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathematics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer review journal, indexed journal, www.ijerd.com, research journals
Using Automated Lexical Resources in Arabic Sentence Subjectivity
ijaia
A common point in almost any work on sentiment analysis is the need to identify which elements of language (words) contribute to expressing subjectivity in text. A collection of these elements (sentiment words) with their polarities (positive/negative), regardless of context, is called a sentiment lexical resource or subjective lexicon. In this paper, we investigate a method for generating a Sentiment Arabic Lexical Semantic Database using a lexicon-based approach. We also study the effect of each word's prior polarity, taken from our Sentiment Arabic Lexical Semantic Database, on sentence-level subjectivity with multiple machine learning algorithms. The experiments were conducted on the MPQA corpus containing subjective and objective sentences in Arabic, and we achieved 76.1% classification accuracy.
Learn Sanskrit courses in Pune & PCMC from top training institutes and get Sanskrit certification. Get detailed information on the best institutes, fees, coaching quality, duration, syllabus, placement services, photos, maps, user ratings & reviews in Pune.
The objective of the research is to classify serial-verb constructions in Thai automatically, using the word classes from Thai WordNet to classify the verbs in a sentence. The Thai language has an extend-to-the-right structure and places the adjective after the noun. Its overall grammar is of the "Subject-Verb-Object" (SVO) type. Thai can also use one verb after another within the same sentence, which is called a "serial verb". There is already much research on serial-verb constructions, but none on their automatic classification.
Author Credits - Maaz Nomani
A Proposition Bank is a collection of sentences that are hand-annotated with semantic labels. Currently, around 10,000 sentences containing 0.2 million words have been hand-annotated with semantic label information.
This is a natural language resource with very rich linguistic information, which can be used in a variety of NLP applications such as semantic parsing, syntactic parsing, sentiment analysis, and dialogue systems.
In this paper, we present one such resource for Urdu, a resource-poor Indian language. The Proposition Bank of Urdu is built on the existing Urdu Treebank (a treebank is a corpus of sentences annotated with POS, morphological, head, TAM, and dependency label information). A Propbank adds a layer of semantic information over this Treebank and hence can facilitate semantic parsing and other semantic-level operations on natural language sentences.
Arabic is the most widely spoken language in the Semitic group and one of the most common languages in the world, spoken by more than 422 million people. It is also of paramount importance to Muslims: it is the sacred language of the Islamic holy book (the Quran), and prayer (and other acts of worship) in Islam can be performed only with a command of some Arabic words. Arabic is also a major ritual language of a number of Christian churches in the Arab world, and it was used in writing several intellectual and religious Jewish books in the Middle Ages. Despite this, there is no semantic Arabic lexicon on which researchers can depend. In this paper we introduce Azhary, a lexical ontology for the Arabic language. It groups Arabic words into sets of synonyms called synsets and records a number of relationships between words, such as synonym, antonym, hypernym, hyponym, meronym, holonym, and association relations. The ontology contains 26,195 words organized in 13,328 synsets. It has been developed and compared against AWN, the most commonly available Arabic lexical ontology.
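The synset organization described here maps naturally onto a small data structure: each synset holds a set of words plus named links to other synsets, and a word index supports lookup. The sketch below is an assumed design for illustration only; the class and method names are hypothetical, the English placeholder words stand in for Arabic entries, and it does not reflect how Azhary itself is stored.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str
    words: set
    relations: dict = field(default_factory=dict)  # relation name -> set of synset ids

class LexicalOntology:
    """Hypothetical in-memory store of synsets and typed relations between them."""

    def __init__(self):
        self.synsets = {}
        self.index = {}  # word -> set of synset ids containing it

    def add_synset(self, synset_id, words):
        self.synsets[synset_id] = Synset(synset_id, set(words))
        for word in words:
            self.index.setdefault(word, set()).add(synset_id)

    def relate(self, source_id, relation, target_id, inverse=None):
        """Link two synsets; optionally record the inverse relation as well."""
        self.synsets[source_id].relations.setdefault(relation, set()).add(target_id)
        if inverse:
            self.synsets[target_id].relations.setdefault(inverse, set()).add(source_id)

    def synonyms(self, word):
        """All other words sharing a synset with the given word."""
        found = set()
        for sid in self.index.get(word, ()):
            found |= self.synsets[sid].words
        return found - {word}

    def related(self, word, relation):
        """Words in synsets reachable from the word's synsets via the relation."""
        found = set()
        for sid in self.index.get(word, ()):
            for target in self.synsets[sid].relations.get(relation, ()):
                found |= self.synsets[target].words
        return found

# English placeholders stand in for Arabic entries.
ONTOLOGY = LexicalOntology()
ONTOLOGY.add_synset("s1", ["car", "automobile"])
ONTOLOGY.add_synset("s2", ["vehicle"])
ONTOLOGY.add_synset("s3", ["wheel"])
ONTOLOGY.relate("s1", "hypernym", "s2", inverse="hyponym")
ONTOLOGY.relate("s3", "holonym", "s1", inverse="meronym")
print(ONTOLOGY.synonyms("car"), ONTOLOGY.related("car", "hypernym"))
```

Storing relations between synsets rather than between individual words keeps the structure compact, since every word in a synset inherits the synset's links.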
THE ECOLOGY OF THE MIND
The subject of my reflection in this first chapter, that came quite late in my three months, emerged when I opened the Russian doll of the word « eco-cultural » in which the cultural always seemed to be contained by the ecological. I wondered for some time what culture was at stake. I observed and read a lot and came to the conclusion that the cultural meant by this term of eco-cultural was the culture accompanying the eco-systems that were being defended, and if they were purely natural (fauna and flora), the culture of this approach was biodiversity and all it entails, or if they were social and economic (traditional indigenous life style), the culture was the biodiversity of traditional crops, traditional agricultural methods, traditional community life, traditional social relations and traditional entertainments, among which local rituals, religious or not.
The term thus used did not cover, in its common acceptation, the cultural heritage that is literature, music, poetry, architecture, history, religions and spiritualities, and so many other human creations, without forgetting the most important of them all, languages (in the plural because no linguistic area speaks one homogeneous language but always at least several dialects of this language, and most of the time some minority languages too, or « sacred » languages, for example Pāli for religious reasons in Sri Lanka along with Sinhala, not to speak of the « touristic » languages or the tourists' languages, and of course Tamil, or Arabic among the Muslims, for religious reasons too). Which language is more important for an individual: his real native dialect (the dialect he learned from his parents), his religious language (the language he uses to practice the religion of his belief) or the official national language of his country (and what happens when there are two or more)? I do not have the answer to that question but I doubt very much that it can be simple.
Implementation Of Syntax Parser For English Language Using Grammar Rules
IJERA Editor
For many years we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing works mainly with the syntactic structure of a sentence. 'Syntax' refers to the grammatical arrangement of words in a sentence and their relationship with other words. The main focus of syntactic analysis is to find the syntactic structure of a sentence, which is usually represented as a tree. Identifying the syntactic structure is useful in determining the meaning of a sentence. Natural language processing handles data through lexical analysis, syntax analysis, semantic analysis, discourse processing, and pragmatic analysis. This paper describes various parsing methods. The algorithm in this paper splits English sentences into parts using a POS (parts of speech) tagger, identifies the type of sentence (simple, complex, interrogative, facts, active, passive, etc.), and then parses these sentences using the grammar rules of natural language. As natural language processing becomes increasingly relevant, there is a need for treebanks catered to the specific needs of more individualized systems. Here, we present an open source technique to check and correct grammar. The methodology gives appropriate grammatical suggestions.
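The tag-then-parse flow can be illustrated with a dictionary-based tagger followed by CYK chart parsing over a tiny context-free grammar in Chomsky normal form. The choice of CYK, the six-word lexicon, the three grammar rules, and the function names are assumptions made for this sketch; the paper's own tagger and rule set are not reproduced here.

```python
# Hypothetical toy lexicon (word -> POS tag) and binary grammar rules.
LEXICON = {"the": "DT", "a": "DT", "dog": "NN", "cat": "NN",
           "sees": "VB", "chases": "VB"}
GRAMMAR = {
    ("NP", "VP"): "S",    # S  -> NP VP
    ("DT", "NN"): "NP",   # NP -> DT NN
    ("VB", "NP"): "VP",   # VP -> VB NP
}

def tag(sentence):
    """Look up each word's tag; unknown words get the placeholder tag 'UNK'."""
    return [LEXICON.get(word, "UNK") for word in sentence.lower().split()]

def cyk_parse(tags, grammar=GRAMMAR):
    """Return a parse tree rooted in S as nested tuples, or None if none exists."""
    n = len(tags)
    # chart[i][j] maps each category spanning tags[i:j] to one tree for it.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, t in enumerate(tags):
        chart[i][i + 1][t] = t
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point inside the span
                for left, left_tree in chart[i][k].items():
                    for right, right_tree in chart[k][j].items():
                        parent = grammar.get((left, right))
                        if parent and parent not in chart[i][j]:
                            chart[i][j][parent] = (parent, left_tree, right_tree)
    return chart[0][n].get("S")

print(cyk_parse(tag("the dog chases a cat")))
print(cyk_parse(tag("dog the chases")))
```

A sentence that yields no tree is one the grammar rejects, which is the hook a grammar checker would use to trigger a suggestion.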
Welcome to International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development
eISSN : 2278-067X, pISSN : 2278-800X, www.ijerd.com
Volume 4, Issue 8 (November 2012), PP. 33-36
Design & Analysis of an Exhaustive Algorithm for Sandhi
Processing In Sanskrit
Ravi Pal1, Dr. U. C. Jaiswal2
CSED, MMM Engineering College, Gorakhpur(UP), INDIA
Abstract:––It is almost impossible to learn a new language without studying its grammar. Automated language
processing of Sanskrit is chiefly aimed at making the increasingly available Sanskrit E-texts easier to reference.
For learning the Sanskrit language, the study of its grammar plays a very important role. This paper presents a
new approach to processing sandhi-s, in the form of an exhaustive algorithm that uses morphological analysis of
Sanskrit words. The algorithm is based on Paṇini's complex codification of the rules of grammar. It has a simple
starting point, yet it is powerful, comprehensive and computationally lean.
Keywords:––NLP, Automated language processing, Panini’s rules of Sanskrit, MT System, UTF-8.
I. INTRODUCTION
Sanskrit is widely recognized as a highly phonetic language with an extensively codified grammar. The very name
Samskṛt (Sanskrit) basically means "language brought to formal perfection" [1]. That the Backus-Naur Form [2],
used in the specification of formal languages, is now also referred to as the Paṇini-Backus Form [2] bears ample
testimony to this fact. Sanskrit E-texts are increasingly being made available for reference in repositories such
as the Göttingen Register of Electronic Texts in Indian Languages (GRETIL) [3]. The essential first step towards
language processing of such E-texts is to develop exhaustive algorithms and efficient tools for segmenting the
compound words that are an integral part of Sanskrit texts. This in turn first requires the processing of
sandhi-s.
Sanskrit has two main categories of complex words:
(i) Sandhi (ii) Samaas
1.1 Sandhi:
When two words are combined into a new word in which the ending of the former word and the beginning of the
latter are fused at the point of combination, the result is known as a sandhi and the process as sandhi-formation.
In short, the new sound created at the point of combination is exactly the sound produced when the two words are
uttered without a pause. The inverse of sandhi-formation is known as Sandhi-Wicched. e.g., go + yam = gavyam,
sakhe + iha = sakhayiha etc.
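The two examples above can be turned into a small illustration of sandhi-formation and its inverse. The sketch below is not the paper's algorithm: the rule table, the function names `sandhi` and `wicched`, and the plain concatenation used when no rule matches are all assumptions made for illustration, and the table holds only the two rules that the examples themselves imply.

```python
# Toy rule table: (ending of first word, beginning of second word, replacement).
# Only the two rules implied by the examples in the text are listed.
RULES = [
    ("o", "y", "av"),  # go + yam -> gavyam
    ("e", "i", "ay"),  # sakhe + iha -> sakhayiha
]


def sandhi(first, second):
    """Join two words, applying the first matching rule at the junction."""
    for final, initial, replacement in RULES:
        if first.endswith(final) and second.startswith(initial):
            return first[: -len(final)] + replacement + second
    # Assumption: with no matching rule, fall back to plain concatenation.
    return first + second


def wicched(word):
    """Return every split of `word` that some rule in the table could have produced."""
    candidates = []
    for i in range(1, len(word)):
        for final, initial, replacement in RULES:
            rest = word[i + len(replacement):]
            if word.startswith(replacement, i) and rest.startswith(initial):
                candidates.append((word[:i] + final, rest))
    return candidates


print(sandhi("go", "yam"))
print(wicched("sakhayiha"))
```

The inverse direction returns a list because, in general, more than one rule may explain the same surface form. A real system would need a lexicon to decide which candidate splits are valid words.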
1.2 Samaas:
When two or more words are combined on the basis of their semantics, the resulting word is known as a Samaas or
compound, e.g., (pANI ca, pADam) → (pANIpAdam), (dukham, atItah) → (dukhatItah) etc. Unlike in Sandhi, the point
of combination in a Samaas may or may not be deformed in the resulting word. The inverse procedure, the break-up
of a Samaas, is known as Samaas-Vigraha. Considering the complexity of this problem, we restricted our focus to
Sandhi-s only.
The rest of the paper is organized as follows: Section 2 describes the Unicode representation, Section 3 the
basis of the work, and Section 4 the need for the sandhi processing algorithm. Section 5 lists the Mahesvara
sutra-s, Section 6 states the problem, and Section 7 elaborates the approach used to design the algorithm.
Section 8 gives the proposed algorithm, Section 9 shows snapshots of the results, and Section 10 discusses the
conclusion and future scope, followed by the references cited.
II. UNICODE REPRESENTATION
The Unicode standard [4], typically in its UTF-8 encoding, has been adopted universally for encoding Indian
language texts in digital form. The Unicode Consortium has assigned the hexadecimal range 0900-097F (the
Devanagari block) to the characters of the script in which Sanskrit is customarily written. The characters used
to represent Sanskrit letters in romanized E-texts, including those with diacritical marks, are dispersed across
the Basic Latin (0000-007F), Latin-1 Supplement (0080-00FF), Latin Extended-A (0100-017F) and Latin Extended
Additional (1E00-1EFF) ranges. In this work the Latin character set is used to represent Sanskrit letters as
E-text.
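As a minimal sketch of how these ranges might be used in practice, the snippet below reports which of the ranges named above a character falls into. The helper names `unicode_range` and `ranges_used` are hypothetical. Normalizing to NFC first is an added assumption, made so that a base letter followed by a combining mark is counted as its single precomposed character.

```python
import unicodedata

# Unicode ranges named in the text, as (first, last, label).
RANGES = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0900, 0x097F, "Devanagari"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def unicode_range(ch):
    """Return the label of the range containing the single character `ch`."""
    code = ord(ch)
    for first, last, label in RANGES:
        if first <= code <= last:
            return label
    return "other"


def ranges_used(text):
    """Return the set of range labels needed to represent `text`."""
    # NFC turns e.g. 'n' + combining dot below into the precomposed U+1E47.
    normalized = unicodedata.normalize("NFC", text)
    return {unicode_range(ch) for ch in normalized}


# 'Pa\u1e47ini' is 'Paṇini' written with the precomposed letter U+1E47.
print(sorted(ranges_used("Pa\u1e47ini")))
```

A check of this kind can flag E-texts that mix Devanagari and romanized input before any sandhi processing is attempted.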
III. THE BASIS OF THE WORK
The great “Paṇini” [5], the sage and scholar dated by historians to the fourth century BC or earlier, codified
the rules of the Sanskrit language based on both the extant vast literature and the language in prevalent use at
the time. His magnum opus, the “Aṣṭadhyayi” [5], which literally means ‘work in eight chapters’, is regarded by
all scholars of Sanskrit as the ultimate authority on Sanskrit grammar. The eight chapters, of four parts each,
comprise nearly four thousand sutra-s [5] or aphorisms, which are terse statements in Sanskrit. This grammar
codification of Paṇini is perhaps unparalleled, for it is terse yet comprehensive, complex yet precise. Intensive
study of the matter, taking recourse to authoritative commentaries by adroit grammarians, is required to get a
grasp of the work. Many commentaries on the ‘Aṣṭadhyayi’, such as Sage Patanjali’s “Mahabhaṣya” [6], are
available and held to be authentic and comprehensive. One such authoritative commentary with a neat, topic-wise
classification of ‘Paṇini’s aphorisms’ [5] is the “Siddhanta-kaumudi” [6], written in the seventeenth century by
the Sanskrit grammarian “Bhaṭṭoji Dikṣita” [6]. The most important of these aphorisms were later extracted and
compiled into the “Laghu-siddhanta-kaumudi” by the scholar “Varadaraja” [6]. It is accepted among Sanskrit
scholars that any exploratory work on Sanskrit grammar must have the aphorisms of ‘Paṇini’ as its basis,
optionally taking recourse to any of the authoritative commentaries. Our work on Sandhi-s is likewise based
directly on ‘Paṇini’s aphorisms’, and not on secondary or tertiary sources of information. The
‘Siddhanta-kaumudi’ of ‘Bhaṭṭoji Dikṣita’, famed and accepted amongst scholars as an unabridged, comprehensive
compendium of the entire ‘Aṣṭadhyayi’, has been studied in the original Sanskrit, and the Sandhi-s dealt with in
it form the basis of our work.
IV. NEED FOR THE SANDHI PROCESSING ALGORITHM
A sandhi processing algorithm will be a very important component of any NLP system that attempts to analyze and
understand Sanskrit for computational purposes. The architecture of a computational Sanskrit platform needs
various linguistic resources such as a ‘lexicon’, ‘POS Tagger’, ‘karaka analyzer’, ‘subanta analyzer’, ‘tinanta
analyzer’, ‘linga analyzer’, ‘sandhi analyzer’ and ‘samaasa analyzer’. All these resources are interlinked, yet
the sandhi analyzer [7] is a prerequisite for analyzing a Sanskrit text, because words in Sanskrit are generally
written with no explicit boundaries. The algorithm will be useful in many ways, as Sanskrit holds a vast reserve
of knowledge in diverse disciplines. To make this knowledge available to users of other languages, an automatic
machine translation system (MTS) [7] from Sanskrit to other languages will have to be developed, and a ‘Sandhi
Analyzer’ will be an essential initial step in that work. The segmented form of Sanskrit text may also be applied
in building a search algorithm and a spell checker for Sanskrit corpora. A ‘sandhi-aware’ system will thus not
only be essential for any larger Sanskrit NLP system, but will also help readers who do not know, or do not want
to go through, the rigors of ‘sandhi-formation’ to read and understand Sanskrit texts on their own. It will also
help in the interpretation and simplification of Sanskrit text. Any NLP system, or any compiler for a natural
language such as Sanskrit, will have a ‘Sandhi Analyzer’ as a necessary initial component.
V. THE MAHESVARA SUTRAS OR APHORISMS
The ‘Mahesvara aphorisms’ [5] are the backbone of ‘Paṇini’s code’. Said to have come from the beats of a special
drum-type instrument called ‘ḍamaru’ (hourglass drum) held in the hand of ‘Lord Mahesvara’ (a form of God in the Hindu
pantheon), they are a set of aphorisms containing the letters of the Sanskrit alphabet in a certain sequence. These
aphorisms form the basis of Paṇini’s composition of his grammar aphorisms. The ‘Mahesvara aphorisms’ are fourteen in
number and are listed below:
1. a-i-u-ṇ
2. ṛ-ḷ-k
3. e-o-ṅ
4. ai-au-c
5. ha-ya-va-ra-ṭ
6. la-ṇ
7. ña-ma-ṅa-ṇa-na-m
8. jha-bha-ñ
9. gha-ḍha-dha-ṣ
10. ja-ba-ga-ḍa-da-ś
11. kha-pha-cha-ṭha-tha-ca-ṭa-ta-v
12. ka-pa-y
13. śa-ṣa-sa-r
14. ha-l
Note that the last letter in each of the above aphorisms is only a place-holder and is not counted as
an actual letter of the aphorism. The first four aphorisms list the short forms of all the vowels, while the rest list the
consonants. Note also that the letter ‘a’ added to each of the consonants is only there to facilitate pronunciation and is
not part of the consonant proper.
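As an illustration only (it is not the representation used in this work), the fourteen aphorisms can be held as plain data, and the place-holder and the helper vowel ‘a’ can be stripped off mechanically. The sketch below is in Python and writes the letters in standard IAST transliteration with full diacritics. The names SUTRAS, VOWEL_SUTRAS and letters are hypothetical and chosen for this example.

```python
# The fourteen Mahesvara aphorisms in IAST transliteration.
# The last item of each aphorism is its place-holder (marker) letter.
SUTRAS = [
    ["a", "i", "u", "ṇ"],
    ["ṛ", "ḷ", "k"],
    ["e", "o", "ṅ"],
    ["ai", "au", "c"],
    ["ha", "ya", "va", "ra", "ṭ"],
    ["la", "ṇ"],
    ["ña", "ma", "ṅa", "ṇa", "na", "m"],
    ["jha", "bha", "ñ"],
    ["gha", "ḍha", "dha", "ṣ"],
    ["ja", "ba", "ga", "ḍa", "da", "ś"],
    ["kha", "pha", "cha", "ṭha", "tha", "ca", "ṭa", "ta", "v"],
    ["ka", "pa", "y"],
    ["śa", "ṣa", "sa", "r"],
    ["ha", "l"],
]
VOWEL_SUTRAS = 4  # the first four aphorisms list the vowels


def letters(sutras=SUTRAS):
    """Return (vowels, consonants) with markers and the helper 'a' removed."""
    vowels, consonants = [], []
    for index, sutra in enumerate(sutras):
        body = sutra[:-1]  # drop the final place-holder letter
        if index < VOWEL_SUTRAS:
            vowels.extend(body)
        else:
            # the trailing 'a' only aids pronunciation; drop it
            consonants.extend(item[:-1] for item in body)
    return vowels, consonants


if __name__ == "__main__":
    vowels, consonants = letters()
    print(len(vowels), "vowels:", " ".join(vowels))
    print(len(consonants), "consonant entries:", " ".join(consonants))
```

Under these assumptions the function yields nine vowels and thirty-four consonant entries. The letter ‘h’ appears twice because it is listed in both the fifth and the fourteenth aphorism.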
VI. THE PROBLEM
Sandhi-s in Sanskrit are points in words or between words, at which adjacent letters coalesce and transform. This is
a common and interesting feature of Indian languages, and it is dealt with and used particularly elaborately in Sanskrit.
The transformations that apply are commonly grouped into four categories, as follows [5]:
1. agama – addition of an extra letter or set of letters
2. adesa – substitution of one or more of the letters
3. lopa – dropping of a letter
4. prakṛtibhava – no change
There are about seventy ‘aphorisms of Paṇini’ that deal with sandhi-s. These aphorisms lay out the rules for
the above transformations, giving the conditions under which certain letters combine with certain others to give particular
results.
The challenge is to develop an exhaustive algorithm to handle the entire range of sandhi-s. Such an
algorithm would be useful for generating the various forms of a given Sanskrit word through the application of
sandhi rules. Though this task is not difficult for a Sanskrit scholar with a thorough knowledge of the
‘Paṇinian system’ [5], it is surely a computationally non-trivial task, given the complexity and number of rules. Existing
methods of sandhi processing, whether they form compound words or try to split them, seem to be based on a
derived understanding of the functioning of sandhi-s, and usually take the finite automata route [8]. The present work,
however, directly codifies Paṇini’s rules as they are, recognizing that Paṇini’s codification of the grammar is based on the
‘Mahesvara aphorisms’, which in turn lay out the letters of the alphabet in a non-trivial order. This work presents a novel
method of directly representing ‘Paṇini’s sandhi rules’.
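To make the idea of codifying a rule in terms of the ‘Mahesvara aphorisms’ concrete, the following Python sketch expands Paṇini’s abbreviated sound classes from the aphorisms and applies one well-known rule, ‘iko yaṇ aci’ (Aṣṭadhyayi 6.1.77): a final i, u, ṛ or ḷ is replaced by the corresponding semivowel y, v, r or l before a vowel. It is only an illustrative sketch and not the representation used in this work. The function names sound_class and iko_yan_aci are hypothetical, the text is assumed to be in IAST transliteration, and only the short vowels listed in the aphorisms are handled.

```python
import unicodedata

# The fourteen aphorisms in IAST; the last item of each is its place-holder.
SUTRAS = [unicodedata.normalize("NFC", s).split() for s in (
    "a i u ṇ", "ṛ ḷ k", "e o ṅ", "ai au c", "ha ya va ra ṭ", "la ṇ",
    "ña ma ṅa ṇa na m", "jha bha ñ", "gha ḍha dha ṣ", "ja ba ga ḍa da ś",
    "kha pha cha ṭha tha ca ṭa ta v", "ka pa y", "śa ṣa sa r", "ha l",
)]


def sound_class(start, marker):
    """Letters from `start` up to the first aphorism that ends in `marker`."""
    found, collecting = [], False
    for index, sutra in enumerate(SUTRAS):
        for item in sutra[:-1]:  # skip the place-holder
            if item == start:
                collecting = True
            if collecting:
                # consonants (aphorisms 5-14) carry a helper 'a'; drop it
                found.append(item if index < 4 else item[:-1])
        if collecting and sutra[-1] == marker:
            return found
    raise ValueError(f"no class from {start!r} to {marker!r}")


IK = sound_class("i", "k")    # i, u, ṛ, ḷ
YAN = sound_class("ya", "ṇ")  # y, v, r, l
AC = sound_class("a", "c")    # the vowels of aphorisms 1-4


def iko_yan_aci(left, right):
    """Join two words by this rule, or return None when it does not apply."""
    left = unicodedata.normalize("NFC", left)
    right = unicodedata.normalize("NFC", right)
    last, first = left[-1:], right[:1]
    # Simplification: short vowels only. A following similar vowel is left
    # to the lengthening rule, which this sketch does not implement.
    if last in IK and first in AC and first != last:
        return left[:-1] + YAN[IK.index(last)] + right
    return None


if __name__ == "__main__":
    print(iko_yan_aci("dadhi", "atra"))
    print(iko_yan_aci("madhu", "ari"))
```

The substitution table is never written out by hand here: the two classes are read off the aphorisms, and each letter is paired with the letter at the same position in the other class, which is the kind of economy the non-trivial ordering of the alphabet allows.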
VII. THE APPROACH FOR DESIGNING ALGORITHM
VIII. PROPOSED ALGORITHM
1. Take a Sanskrit word as input.
{ str(len) = sans_wrd_1 }
2. Take another Sanskrit word as input.
{ str(len) = sans_wrd_2 }
3. We now have two words: left and right.
{ left = sans_wrd_1 & right = sans_wrd_2 }
4. Set a variable flag to false.
{ flag = 0 }
5. Read from the rule base a rule identified by the suffix of left and the prefix of right.
{ lft = suff_left & rgt = prfix_right }
6. Try applying each of the sandhi rules to left and right.
7. If one or more rules listed in the corresponding rule base match, set flag to true.
{ if (match( str ) = 1) do flag = 1 }
8. Display the resulting compound word, with comments.
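The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the implementation behind the reported results: the rule base is a hypothetical toy table with a handful of vowel sandhi-s, words are assumed to be in IAST transliteration, and the helper names suffix_of, prefix_of and sandhi are invented for this example.

```python
import unicodedata

# Hypothetical toy rule base, keyed by
# (final sound of left, initial sound of right).
# Each value is (replacement for the two sounds, comment).
RULE_BASE = {
    ("a", "a"): ("ā", "savarna-dirgha: a + a -> ā"),
    ("a", "ā"): ("ā", "savarna-dirgha: a + ā -> ā"),
    ("a", "i"): ("e", "guna: a + i -> e"),
    ("a", "u"): ("o", "guna: a + u -> o"),
    ("i", "a"): ("ya", "yan: i + a -> ya"),
    ("u", "a"): ("va", "yan: u + a -> va"),
}
DIPHTHONGS = ("ai", "au")  # written with two letters, treated as one sound


def suffix_of(word):
    """Final sound of a word."""
    return word[-2:] if word.endswith(DIPHTHONGS) else word[-1:]


def prefix_of(word):
    """Initial sound of a word."""
    return word[:2] if word.startswith(DIPHTHONGS) else word[:1]


def sandhi(sans_wrd_1, sans_wrd_2):
    """Return (flag, [(compound word, comment), ...]) for two input words."""
    # Steps 1-3: the two inputs become left and right.
    left = unicodedata.normalize("NFC", sans_wrd_1)
    right = unicodedata.normalize("NFC", sans_wrd_2)
    flag = False                                   # step 4
    lft, rgt = suffix_of(left), prefix_of(right)   # step 5
    results = []
    # Step 6: try every rule in the rule base.
    for (suffix, prefix), (replacement, comment) in RULE_BASE.items():
        if (lft, rgt) == (suffix, prefix):         # step 7
            flag = True
            word = left[:len(left) - len(lft)] + replacement + right[len(rgt):]
            results.append((word, comment))
    return flag, results


if __name__ == "__main__":
    for pair in [("deva", "indra"), ("dadhi", "atra"), ("hima", "ālaya")]:
        flag, results = sandhi(*pair)
        for word, comment in results:              # step 8
            print(" + ".join(pair), "=", word, "#", comment)
```

Returning a list rather than a single word leaves room for inputs on which more than one rule matches; with this toy table at most one rule can match, and a pair with no matching rule simply returns a false flag and an empty list.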
IX. RESULTS SNAPSHOTS
Figure 1: Snapshot for vowel sandhi
Figure 2: Snapshot for consonant sandhi
X. CONCLUSION AND FUTURE SCOPE
In this paper we have focused only on sandhi formation and presented a workable solution in the form of an
exhaustive algorithm. In the machine translation of Sanskrit words, we need to perform sandhi formation on the
input words given to us. Thus, at the very first stage of morphological parsing, we need to consider the
sandhi formation of words. In this way the present paper helps to fulfil one of the most basic needs of
Sanskrit translators. The algorithm has been generalized to handle any number of possible subwords for particular
inputs. It may be enhanced to take care of all possible types of sandhi-s by elaborating the rules given
in the sandhi rule base.
REFERENCES
[1]. Higher Sanskrit Grammar, By M. R. Kale, Motilal Banarsidass Publishers.
[2]. The Panini-Backus Form in Syntax of Formal Languages, By Rao T. R. N., Kak Subhash, Center for Advanced
Computer Studies, University of Southwestern Louisiana, 1998.
[3]. Gottingen Register of Electronic Texts in Indian Languages (GRETIL),
www.sub.uni-goettingen.de/ebene_1/fiindolo/gretil.htm
[4]. UTF-8 encoding table and Unicode characters, http://www.utf8-chartable.de/
[5]. Laghu-siddhanta-kaumudi, By Varadarāja, Translated with commentary by Panḍit Visvanatha Sastri Prabhakara,
Motilal Banarsidass Publishers, Delhi, 1989.
[6]. Siddhanta-kaumudi, By Diksita Bhattoji, Translated by Srisa Candra Vasu, Volume 1, Motilal Banarsidass
Publishers, Delhi, 1962.
[7]. Developing a Sanskrit Analysis System for Machine Translation, By Girish N. Jha, Sudhir K. Mishra, R.
Chandrashekar, Priti Bhowmik, Special Center for Sanskrit Studies, Jawaharlal Nehru University, New Delhi,
http://sanskrit.jnu.ac.in/subanta/Paper/Kerala.PDF
[8]. From Pāninian Sandhi to Finite State Calculus, By Hyman Malcolm D., Sanskrit Computational Linguistics:
First and Second International Symposia, Revised Selected and Invited Papers, ISBN: 978-3-642-00154-3,
Springer-Verlag, 2009.
Ravi Pal obtained a B.Tech in Information Technology (Gold Medalist) from Dr Ram Manohar Lohia
Avadh University, Faizabad (UP) in 2010. He is presently pursuing an M.Tech (CSE) at MMM Engineering College, Gorakhpur. He
has an abiding passion for teaching and is interested in teaching a number of courses. Mail at: raviratnakerpal@gmail.com
Dr Umesh Chandra Jaiswal is working as Associate Professor in the Department of Computer Science
and Engineering, MMM Engineering College, Gorakhpur, UP (INDIA). His areas of interest are Natural Language Processing,
Design and Analysis of Algorithms, Operating Systems, and Advanced Computing. He has published a good number of papers
in various journals and in international and national conferences. Mail at: ucj_jaiswal@yahoo.com