Phonetic matching plays an important role in multilingual information retrieval, where data is maintained in multiple languages and users need information in their native language, which may differ from the language in which the data is stored. Such an environment requires a system that matches strings phonetically in either case. Once the strings match, the information can be retrieved irrespective of language. In this paper, we propose an approach that matches strings within Hindi, within Marathi, or across the two languages in order to retrieve information. We compared the proposed method with the Soundex and Q-gram methods and obtained better results than both.
This document summarizes a student's thesis project on developing a spell checker for the Bengali language. It presents the outline, describes previous related work, and explains the algorithms used: string matching, edit distance, Soundex, and Metaphone. An experiment was conducted using these algorithms with PHP, HTML, CSS, and JavaScript. String matching achieved 99% accuracy, edit distance 50%, Soundex 80%, and Metaphone 85% after combining the algorithms. The conclusion is that cross-checking words with these algorithms can accurately identify misspelled words and suggest replacements, helping people use the Bangla language more easily.
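Of the four algorithms listed, edit distance is the easiest to state precisely; a minimal Levenshtein implementation in Python (an illustrative sketch, not the thesis code) looks like this:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]
```

A spell checker typically suggests the dictionary words with the smallest distance to the misspelled input; `edit_distance("kitten", "sitting")` is 3.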
This document presents two new approaches for aligning sentences in parallel English-Arabic corpora: mathematical regression (MR) and genetic algorithm (GA) classifiers. Feature vectors containing text features like length, punctuation score, and cognate score are extracted from sentence pairs and used to train the MR and GA models on manually aligned training data. The trained models are then tested on additional sentence pairs, achieving better results than a baseline length-based approach. The methods can be applied to any language pair by modifying the feature vector.
This document is a preprint that summarizes a paper on developing a spell checker model using automata. It discusses using fuzzy automata rather than finite automata for improved string comparison. The proposed spell checker would incorporate autosuggestion features into a Windows application. It would use techniques like edit distance and tries to store dictionaries and suggest corrections. The paper outlines the design of the spell checker, discussing functions like comparing words to a dictionary and considering morphology. Advantages like improved accuracy and speed are discussed along with potential disadvantages like inability to detect all errors.
Phonetic Recognition In Words For Persian Text To Speech Systems
Abstract: Interest in text-to-speech synthesis has increased around the world. Text-to-speech systems have been developed for many popular languages such as English, Spanish, and French, and much research and development has been devoted to those languages. Persian, on the other hand, has received little attention compared to other languages of similar importance, and research on Persian is still in its infancy. Persian poses many difficulties and exceptions that increase the complexity of text-to-speech systems: for example, short vowels are absent from written text, and homograph words exist. In this paper we propose a new method for Persian text-to-phonetic conversion based on pronunciation by analogy (PbA) in words, semantic relations, and grammatical rules for finding the proper phonetic form. Keywords: PbA, text to speech, Persian language, phonetic recognition.
Title:Phonetic Recognition In Words For Persian Text To Speech Systems
Author:Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
AN INVESTIGATION OF THE SAMPLING-BASED ALIGNMENT METHOD AND ITS CONTRIBUTIONS
This document summarizes an investigation into improving the performance of a sampling-based alignment method for statistical machine translation. It proposes two contributions: 1) A method to enforce alignment of n-grams in distinct translation subtables to increase the number of longer n-grams, and 2) Examining combining phrase translation tables from the sampling method and MGIZA++, finding it slightly outperforms MGIZA++ alone and helps reduce out-of-vocabulary words. The method divides the parallel corpus into "unigramized" source-target n-gram subtables, runs the sampling aligner on each, and merges the subtables' phrase tables.
Improvement of Soundex Algorithm for Indian Language Based on Phonetic Matching
In real-world applications, service robots need to locate and identify objects in a scene. A range sensor provides a robust estimate of depth information, which is useful for accurately locating objects in a scene. Color information, on the other hand, is an important property for the object recognition task. The objective of this paper is to detect and localize multiple objects within an image using both range and color features. The proposed method uses 3D shape features to generate promising hypotheses within range images and verifies these hypotheses using features obtained from both range and color images.
A Survey of String Matching Algorithms
String matching algorithms play an important role in finding the places where one or several strings (patterns) occur within a large body of text (e.g., a data stream, a sentence, a paragraph, or a book). Their applications cover a wide range, including intrusion detection systems (IDS) in computer networks, bioinformatics, plagiarism detection, information security, pattern recognition, document matching, and text mining. In this paper we present a short survey of well-known, recently updated, and hybrid string matching algorithms. These algorithms can be divided into two major categories: exact string matching and approximate string matching. The classification criteria were selected to highlight important features of the matching strategies, in order to identify challenges and vulnerabilities.
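The two categories can be illustrated with a deliberately naive Python sketch (illustrative only; the surveyed algorithms are far more efficient):

```python
def find_exact(text: str, pattern: str) -> list[int]:
    """Exact matching: every index where pattern occurs verbatim."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

def find_approx(text: str, pattern: str, k: int = 1) -> list[int]:
    """Approximate matching: windows within k character mismatches
    (Hamming distance) of the pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if sum(x != y for x, y in zip(text[i:i + m], pattern)) <= k]
```

Exact matching finds only verbatim occurrences, while the approximate variant also accepts near misses, which is what phonetic and spelling applications need.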
This document summarizes various stemming algorithms used for information retrieval. It discusses rule-based stemming algorithms like the Porter stemmer and Lovins stemmer which use language rules to remove suffixes and extract word stems. It also describes statistical stemming methods which use machine learning on text corpora to identify morphological variants. Finally, it analyzes different techniques for conflating words during stemming, such as affix removal, successor variety, table lookup, and n-gram methods.
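As a concrete example of the affix-removal family mentioned above, here is a toy suffix-stripping stemmer in Python; the suffix list and minimum stem length are invented for illustration and are far cruder than the Porter or Lovins rules:

```python
def stem(word: str,
         suffixes=("ation", "ness", "ing", "es", "ed", "s")) -> str:
    """Toy affix-removal stemmer: strip the longest matching suffix,
    keeping at least a 3-character stem."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]
    return word
```

For example, `stem("connected")`, `stem("connecting")`, and `stem("connects")` all conflate to `connect`, which is the point of stemming for retrieval.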
A survey of Stemming Algorithms for Information Retrieval
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Improvement of Soundex Algorithm for Indian Language Based on Phonetic Matching
In a system with a large database, there has always been the problem that names may be misspelled or spelled in an unexpected way, so the data in the database degrades. In this case it is necessary to search for duplicates and merge them into a single entity. One problem in doing so is deciding how the strings should be compared. In such cases, approximate string matching is preferable to looking for an exact match. One approximate string matching technique is phonetic matching, which compares names based on the pronunciation of the words. Similar-sounding words can be retrieved from a large database using different phonetic matching algorithms, the best known of which is the Soundex algorithm. Phonetic matching is needed when many people from different cultures come together: they either speak with different pronunciations or have different writing habits. This scenario is very common in India, which has many different languages such as Hindi, Gujarati, Marathi, and Tamil. In this research work the Soundex algorithm is adapted for the Hindi and Gujarati languages and applied to names along with their variations in order to retrieve the output with minimum false hits.
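In the same spirit, a Soundex-style code for Devanagari script can be sketched by grouping similar-sounding consonant classes; the grouping below is an invented illustration, not the code tables from the paper:

```python
# Illustrative grouping: stops of the same articulation class share a
# code, mirroring how English Soundex groups letters. These classes
# are an assumption for the sketch, not the paper's actual tables.
GROUPS = {
    "कखगघ": "1",      # velar stops
    "चछजझ": "2",      # palatal affricates
    "टठडढतथदध": "3",  # retroflex and dental stops
    "पफबभ": "4",      # labial stops
    "नणमं": "5",      # nasals
    "यरलव": "6",      # semivowels
    "शषसह": "7",      # sibilants and h
}
CODE = {ch: c for chars, c in GROUPS.items() for ch in chars}

def soundex_devanagari(word: str, length: int = 4) -> str:
    """Soundex-style code for a Devanagari word: first letter kept,
    later consonants replaced by class codes, vowels and repeats dropped."""
    out, prev = word[0], ""
    for ch in word[1:]:
        code = CODE.get(ch, "")  # vowel signs map to "" and are skipped
        if code and code != prev:
            out += code
        prev = code
    return (out + "0" * length)[:length]
```

Spelling variants that differ only in vowel length then receive the same code, e.g. सुनिल and सुनील both map to स560 under this toy grouping.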
An Application of Pattern Matching for Motif Identification
Pattern matching is one of the central and most widely studied problems in theoretical computer science. Solutions to the problem play an important role in many areas of science and information processing, and its performance has a great impact on many applications, including database querying, text processing, and DNA sequence analysis. In general, pattern matching algorithms are characterized by the shift value, the direction of the sliding window, and the order in which comparisons are made; their performance can be enhanced to a great extent by a larger shift value and fewer comparisons to obtain that shift. In this paper we propose an algorithm for finding a motif in a DNA sequence. The algorithm is based on preprocessing the pattern string (motif) and, on a mismatch between the pattern and the DNA sequence, considering the four consecutive nucleotides that immediately follow the aligned pattern window. Theoretically, we found that the proposed algorithm works efficiently for motif identification.
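A simplified single-character version of this idea is Sunday's Quick Search, sketched below in Python; the paper's variant inspects four following nucleotides, whereas this sketch uses only the one character just past the window:

```python
def quick_search(text: str, pattern: str) -> list[int]:
    """Sunday's Quick Search: after each window, shift so that the
    character just past the window aligns with its rightmost
    occurrence in the pattern (or skip past it entirely)."""
    m, n = len(pattern), len(text)
    shift = {ch: m - i for i, ch in enumerate(pattern)}  # rightmost wins
    hits, i = [], 0
    while i + m <= n:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        i += shift.get(text[i + m], m + 1)
    return hits
```

`quick_search("GATTACAGATT", "GATT")` finds the occurrences at indices 0 and 7 while skipping most alignments in between, which is the performance benefit the abstract describes.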
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
Document similarity is an important part of Natural Language Processing and is most commonly used for plagiarism detection and text summarization. Thus, finding the overall most effective document similarity algorithm could have a major positive impact on the field. This report sets out to examine numerous document similarity algorithms and determine which are the most useful. It addresses the question by categorizing them into three types: statistical algorithms, neural networks, and corpus/knowledge-based algorithms. The most effective algorithms in each category are then compared using a series of benchmark datasets and evaluations that test every area in which each algorithm could be used.
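The simplest member of the statistical category is cosine similarity over term-frequency vectors; a minimal Python sketch (illustrative, not from the report):

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between raw term-frequency vectors."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Identical documents score 1.0 and documents with no shared vocabulary score 0.0; real systems usually weight terms by TF-IDF rather than raw counts.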
Implementation Of Syntax Parser For English Language Using Grammar Rules
For many years we have used Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing works with the syntactic structure of a sentence: 'syntax' refers to the grammatical arrangement of words in a sentence and their relationships with other words. The main goal of syntactic analysis is to find the syntactic structure of a sentence, usually represented as a tree; identifying this structure is useful in determining the sentence's meaning. Natural language processing proceeds through lexical analysis, syntax analysis, semantic analysis, discourse processing, and pragmatic analysis. This paper surveys various parsing methods. The algorithm presented here splits English sentences into parts using a POS (part-of-speech) tagger, identifies the type of sentence (simple, complex, interrogative, factual, active, passive, etc.), and then parses the sentences using the grammar rules of natural language. As natural language processing becomes increasingly relevant, there is a need for treebanks catering to the specific needs of more individualized systems. Here, we present an open-source technique to check and correct grammar; the methodology gives appropriate grammatical suggestions.
Cross lingual similarity discrimination with translation characteristics
This document summarizes a research paper on cross-lingual similarity discrimination using translation characteristics. The paper proposes a discriminative model trained on bilingual corpora to classify sentences in a target language as similar or dissimilar to a given sentence in a source language. Features used in the model include translation characteristics like sentence length ratios, word alignments, and polarity. The model is trained on various sampling methods to address the imbalanced data of having many more negative samples than positive translations. Experiments on 1500 English-Chinese sentence pairs show the model achieves satisfactory performance according to three evaluation metrics, outperforming a baseline system.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Abstract: The use of regular expressions to search text is a well-known and well-understood technique. Regular expressions are generic representations for a string or a collection of strings, and regexps are among the most useful tools in computer science. NLP, as an area of computer science, has greatly benefited from regexps: they are used in phonology, morphology, text analysis, information extraction, and speech recognition. This paper gives the reader a general review of the use of regular expressions, illustrated with examples from natural language processing, and discusses different approaches to regular expressions in NLP. Keywords: Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment, POS tagging
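Tokenization, the first keyword above, is a one-line regex in Python; the pattern below is a common illustrative choice, not one prescribed by the paper:

```python
import re

# Words (allowing one internal apostrophe), or any single
# non-space punctuation character.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)
```

`tokenize("Don't panic, it's fine!")` yields `["Don't", 'panic', ',', "it's", 'fine', '!']`, keeping contractions whole while splitting off punctuation.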
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
Source and target word segmentation and alignment is a primary step in the statistical learning of a Transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful in the case of dealing with Out-of-vocabulary words in English-Chinese in the presence of multiple target dialects, we asked if this would be true for Indic languages which are simpler in their phonetic representation and pronunciation. We expected this syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model
somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English to Telugu transliteration (our method will apply equally well to most written Indic languages); our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum entropy based Transliteration learning system due to Kumaran and Kellner; our testing consisted of using the same segmentation of a test English word, followed by applying the model, and reranking the resulting top 10 Telugu words. We also report the dataset creation and selection since standard datasets are not available.
Part-of-speech tagging in Indian languages is still an open problem, and we still lack a clear approach to implementing a POS tagger for Indian languages. In this paper we describe our efforts to build a Hidden Markov Model based part-of-speech tagger. We used the IL POS tag set for the development of this tagger and achieved an accuracy of 92%.
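Decoding in an HMM tagger is done with the Viterbi algorithm; the sketch below uses a made-up two-tag model purely for illustration (a real tagger estimates these probabilities from a tagged corpus):

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence under a first-order HMM."""
    # Each layer maps tag -> (best probability so far, best path).
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-6), [t]) for t in tags}]
    for w in words[1:]:
        layer = {}
        for t in tags:
            prob, path = max(
                (V[-1][s][0] * trans_p[s][t] * emit_p[t].get(w, 1e-6),
                 V[-1][s][1])
                for s in tags)
            layer[t] = (prob, path + [t])
        V.append(layer)
    return max(V[-1].values())[1]

# Invented toy model: two tags, noun (N) and verb (V).
tags = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dogs": 0.4, "walk": 0.1}, "V": {"walk": 0.5}}
```

With this toy model, `viterbi(["dogs", "walk"], tags, start_p, trans_p, emit_p)` returns `['N', 'V']`: the ambiguous word "walk" is tagged as a verb because the noun-to-verb transition dominates.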
Two Level Disambiguation Model for Query Translation
Selecting the most suitable translation among all candidates returned by a bilingual dictionary has always been quite a challenging task for cross-language query translation. Researchers have frequently tried to use word co-occurrence statistics to determine the most probable translation of a user query, but algorithms using such statistics have certain shortcomings, which are the focus of this paper. We propose a novel method for ambiguity resolution, named the 'two level disambiguation model'. At the first level of disambiguation, the model properly weighs the importance of the translation alternatives of query terms obtained from the dictionary. The importance factor measures the probability of a translation candidate being selected as the final translation of a query term, which removes the problem of making a binary decision among translation candidates. At the second level of disambiguation, the model treats the user query as a single concept and deduces the translations of all query terms simultaneously, taking the weights of the translation alternatives into account. This is contrary to previous research, which selects the translation of each word in the source-language query independently. Experimental results with English-Hindi cross-language information retrieval show that the proposed two level disambiguation model achieved 79.53% and 83.50% of monolingual translation performance, and 21.11% and 17.36% improvement over greedy disambiguation strategies, in terms of MAP for short and long queries respectively.
Paraphrasing refers to writing that differs in its textual content or in the arrangement of words but conveys the same meaning. Identifying a paraphrase is exceptionally important in various real-life applications
such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount
of work in Paraphrase Detection has been done in English and many Indian Languages. However, there is no
existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi Language.
A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this
system is designed for checking both statistical and semantic similarities of Marathi sentences. Statistical similarity
measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The factual data is calculated from the degree of closeness between the word-set, word-order, word-vector, and word-distance. Universal Networking Language (UNL) expresses the semantic content of a sentence without any syntactic detail. Hence, the semantic similarity calculated from the UNL graphs generated for two Marathi sentences captures their semantic equality. The total paraphrase score is calculated by combining the statistical and semantic similarity scores, which yields a judgment on whether the Marathi sentences in question are paraphrases or not.
This document summarizes an approach for identifying word translations from non-parallel (unrelated) English and German corpora. It uses a co-occurrence clue, assuming words that frequently co-occur in one language will have translations that co-occur frequently in the other language. The approach computes association vectors for words based on log-likelihood ratios of co-occurrence counts within a 3-word window. It determines the translation of an unknown German word by finding the most similar association vector in an English co-occurrence matrix. Evaluation on 100 test words showed an accuracy of around 72%, a significant improvement over previous methods for non-parallel texts.
The document presents a knowledge-based method for measuring semantic similarity between texts. It combines word-to-word semantic similarity metrics with information about word specificity to calculate a text-to-text similarity score. An example application shows how word similarity scores from WordNet are combined using the Wu & Palmer metric to determine the semantic similarity between two text segments. The method is evaluated on paraphrase identification tasks and shown to outperform approaches based only on lexical matching.
This document describes a rule-based machine translation system for translating English text to Telugu. It discusses the challenges of developing such a system, including differences in grammar between the two languages. An algorithm is proposed that uses rules, probabilities, and rough sets to classify sentences and select the best word translations. The system works by tokenizing English sentences, tagging the words with parts of speech, looking up word translations in a bilingual dictionary, and concatenating the Telugu words to form the output sentence.
The document describes a system for semantic textual similarity (STS) that uses various techniques to estimate the semantic similarity between texts. The system combines lexical, syntactic, and semantic information sources using state-of-the-art algorithms. In SemEval 2016 tasks, the system achieved a mean Pearson correlation of 75.7% on the monolingual English task and 86.3% on the cross-lingual Spanish-English task, ranking first in the cross-lingual task. The system utilizes techniques such as word embeddings, paragraph vectors, tree-structured LSTMs, and word alignment to capture semantic similarity.
This document presents an overview of spell checking techniques in natural language processing. It discusses how spell checkers work by scanning text, comparing words to a dictionary, and using language-dependent algorithms. Two categories of spelling errors are described: real-word errors involving correctly spelled words and non-word errors containing no dictionary words. Techniques for error detection include dictionary lookup and n-gram comparisons using the Jaccard coefficient. The Levenshtein distance and Jaccard coefficient algorithms are then explained and shown to provide suggestions by calculating the edit distance between source and target words. The presentation concludes that these algorithms filter dictionary words and provide accurate suggestions to correct spelling mistakes in text.
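The Jaccard-coefficient technique mentioned above can be sketched in a few lines; the bigram size and the tiny dictionary below are illustrative assumptions:

```python
def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient over character n-gram sets."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

def suggest(word: str, dictionary: list[str], k: int = 3) -> list[str]:
    """Rank dictionary words by n-gram similarity to a misspelled word."""
    return sorted(dictionary, key=lambda w: jaccard(word, w), reverse=True)[:k]
```

For instance, `suggest("speling", ["spelling", "spending", "peeling", "sample"], k=1)` returns `['spelling']`; a full spell checker would combine this ranking with the Levenshtein distance described above.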
A SURVEY ON SIMILARITY MEASURES IN TEXT MINING
The volume of text resources in digital libraries and on the internet has been increasing, and organizing these text documents has become a practical need. Clustering is the technique used to automatically organize a great number of objects into a small number of coherent groups, and such documents are widely used for information retrieval and natural language processing tasks. Different clustering algorithms require a metric quantifying how dissimilar two given documents are; this difference is often measured by a similarity measure such as Euclidean distance or cosine similarity. The similarity-measure process in text mining can be used to identify the suitable clustering algorithm for a specific problem. This survey discusses existing work on text similarity by partitioning it into three significant approaches: string-based, knowledge-based, and corpus-based similarity.
HMM Classifier for Human Activity Recognition
Rapid improvements in technology have drawn more attention to recognizing human activities from video, and this technological growth has made vision-based research much more interesting and efficient than ever before. This paper presents a novel HMM (Hidden Markov Model) based approach to human activity recognition from video. There are different HMM-based approaches to recognizing human actions from video, such as thresholding and voting to automatically and effectively segment and recognize complex activities; for simple activities we use an Elman Network (EN) and two hybrids of Neural Network (NN) and HMM, i.e., HMM-NN and NN-HMM.
IOT SOLUTIONS FOR SMART PARKING - SIGFOX TECHNOLOGY
Sigfox technology has been a competitive product in the communication-service-provider market for approximately a decade. Widely implemented for smart parking solutions across various European countries, it has now gained traction in Germany as well. The technology's successful track record and reputation in the market demonstrate its effectiveness and reliability in addressing the communication needs of IoT applications, particularly vehicle parking systems; the study is conducted on a city such as Berlin, Germany. The major challenge is how to make parking techniques more user-friendly, cost-effective, and energy-efficient. The questions posed at the beginning of the paper are answered at the end via Sigfox and its comparison with related technologies such as LoRaWAN and Weightless, and future areas of research that are not clearly identified in this particular research area are also pointed out.
The paper also discusses the pros and cons of adaptive, emerging, and existing technologies (cloud, big data, and data analytics) in tandem with Sigfox.
More Related Content
Similar to Rule-Based Phonetic Matching Approach for Hindi and Marathi
A survey of Stemming Algorithms for Information Retrievaliosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Improvement of soundex algorithm for indian language based on phonetic matchingIJCSEA Journal
In a system with a large database, there has always been the problem that names may be misspelled or
not spelled the way one expects, so data in the database degrades. In this case it is required to find the
duplicates and merge them into a single entity. One problem in doing so is how the strings should be
compared: rather than looking for an exact match, approximate string matching is preferable. One string
matching technique is phonetic matching, which compares names based on the pronunciation of the words.
Similar-sounding words can be retrieved from a large database using different phonetic matching
algorithms, the best known of which is the Soundex algorithm. Phonetic matching is needed when many
people from different cultures come together: they either speak with different pronunciations or have
different writing habits. This scenario is very common in India, as we have many different languages such
as Hindi, Gujarati, Marathi and Tamil. In this research work the Soundex algorithm is used for the Hindi
and Gujarati languages and applied to names along with their variations in order to retrieve the output
with minimum false hits.
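For reference, the classic English-alphabet Soundex that such work adapts can be sketched as follows (an Indic-language variant would substitute its own letter-to-code tables):

```python
def soundex(name: str) -> str:
    # Classic Soundex letter-to-digit code table.
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    first = name[0]
    digits = []
    prev = codes.get(first, "")
    for ch in name[1:]:
        if ch in "HW":              # H and W do not reset the previous code
            continue
        code = codes.get(ch, "")
        if code and code != prev:   # collapse adjacent identical codes
            digits.append(code)
        prev = code
    # Keep the first letter, pad with zeros, truncate to 4 characters.
    return (first + "".join(digits) + "000")[:4]
```

Similar-sounding names map to the same four-character key, which is what makes bucketed phonetic retrieval possible.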
An Application of Pattern matching for Motif IdentificationCSCJournals
Pattern matching is one of the central and most widely studied problems in theoretical computer science. Solutions to the problem play an important role in many areas of science and information processing. Its performance has a great impact on many applications, including database querying, text processing and DNA sequence analysis. In general, pattern matching algorithms are based on the shift value, the direction of the sliding window and the order in which comparisons are made. The performance of such algorithms can be enhanced to a great extent by a larger shift value and fewer comparisons to compute the shift value. In this paper we propose an algorithm for finding motifs in DNA sequences. The algorithm is based on preprocessing of the pattern string (motif) by considering the four consecutive nucleotides of the DNA that immediately follow the aligned pattern window in the event of a mismatch between the pattern (motif) and the DNA sequence. Theoretically, we found that the proposed algorithm works efficiently for motif identification.
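The shift-value idea described above can be illustrated with a Horspool-style bad-character table; this is a generic single-pattern search sketch, not the motif algorithm the paper proposes:

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # Bad-character table: how far to shift based on the character aligned
    # with the last pattern position (default shift is the full pattern length).
    shift = {c: m for c in set(text)}
    for i, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - i
    i = 0
    while i <= len(text) - m:
        if text[i:i + m] == pattern:
            return i
        # Skip ahead by the precomputed shift for the rightmost aligned char.
        i += shift.get(text[i + m - 1], m)
    return -1
```

The larger the typical shift, the fewer window positions are examined, which is exactly the performance lever the abstract describes.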
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
Document similarity is an important part of Natural Language Processing and is most commonly used for
plagiarism-detection and text summarization. Thus, finding the overall most effective document similarity
algorithm could have a major positive impact on the field of Natural Language Processing. This report sets
out to examine the numerous document similarity algorithms and determine which ones are the most
useful. It identifies the most effective document similarity algorithms by categorizing them into three
types: statistical algorithms, neural networks, and corpus/knowledge-based
algorithms. The most effective algorithms in each category are also compared in our work using a series of
benchmark datasets and evaluations that test every possible area that each algorithm could be used in.
Implementation Of Syntax Parser For English Language Using Grammar RulesIJERA Editor
For many years we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. Syntactic parsing mainly works with the syntactic structure of a sentence. 'Syntax' refers to the grammatical and syntactical arrangement of words in a sentence and their relationship with other words. The main focus of syntactic analysis is to find the syntactic structure of a sentence, which is usually represented as a tree structure; identifying the syntactic structure is useful in determining the meaning of a sentence. Natural language processing processes data through lexical analysis, syntax analysis, semantic analysis, discourse processing and pragmatic analysis. This paper presents various parsing methods. The algorithm in this paper splits English sentences into parts using a POS (Part Of Speech) tagger, identifies the type of sentence (simple, complex, interrogative, factual, active, passive, etc.) and then parses these sentences using the grammar rules of natural language. As natural language processing becomes increasingly relevant, there is a need for treebanks catered to the specific needs of more individualized systems. Here we present an open-source technique to check and correct grammar. The methodology gives appropriate grammatical suggestions.
Cross lingual similarity discrimination with translation characteristicsijaia
This document summarizes a research paper on cross-lingual similarity discrimination using translation characteristics. The paper proposes a discriminative model trained on bilingual corpora to classify sentences in a target language as similar or dissimilar to a given sentence in a source language. Features used in the model include translation characteristics like sentence length ratios, word alignments, and polarity. The model is trained on various sampling methods to address the imbalanced data of having many more negative samples than positive translations. Experiments on 1500 English-Chinese sentence pairs show the model achieves satisfactory performance according to three evaluation metrics, outperforming a baseline system.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Abstract: The use of regular expressions to search text is a well known and well understood technique. Regular expressions are generic representations for a string or a collection of strings. Regular expressions (regexps) are one of the most useful tools in computer science. NLP, as an area of computer science, has greatly benefitted from regexps: they are used in phonology, morphology, text analysis, information extraction and speech recognition. This paper gives the reader a general review of the usage of regular expressions, illustrated with examples from natural language processing. In addition, there is a discussion of different approaches to regular expressions in NLP. Keywords: Regular Expression, Natural Language Processing, Tokenization, Longest common subsequence alignment, POS tagging
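A typical NLP use of regular expressions is tokenization; a minimal sketch (the pattern, which separates word runs from punctuation, is illustrative only):

```python
import re

def tokenize(text: str):
    # Match either a run of word characters, or a single
    # non-word, non-space character (punctuation).
    return re.findall(r"\w+|[^\w\s]", text)
```

The same `re.findall` idiom extends naturally to more elaborate token classes (numbers, URLs, hyphenated words) by adding alternatives to the pattern.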
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
Source and target word segmentation and alignment is a primary step in the statistical learning of a Transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful in the case of dealing with Out-of-vocabulary words in English-Chinese in the presence of multiple target dialects, we asked if this would be true for Indic languages which are simpler in their phonetic representation and pronunciation. We expected this syllable-like method to perform marginally better, but we found instead that even though our proposed approach improved the Top-1 accuracy, the individual-character-unit alignment model
somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English to Telugu transliteration (our method will apply equally well to most written Indic languages); our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level maximum entropy based Transliteration learning system due to Kumaran and Kellner; our testing consisted of using the same segmentation of a test English word, followed by applying the model, and reranking the resulting top 10 Telugu words. We also report the dataset creation and selection since standard datasets are not available.
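A very rough illustration of syllable-like segmentation for Latin-script input (a crude consonant-vowel heuristic, far simpler than the statistical segmentation and alignment the paper uses):

```python
import re

def syllabify(word: str):
    # Greedy onset + vowel nucleus chunks, attaching a final
    # consonant coda only at the end of the word.
    return re.findall(r"[^aeiou]*[aeiou]+(?:[^aeiou]+$)?", word.lower())
```

Units like these, rather than single characters, are what get aligned between the source and target words during training.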
Part-of-speech tagging in Indian languages is still an open problem; we still lack a clear approach to implementing a POS tagger for Indian languages. In this paper we describe our efforts to build a Hidden Markov Model based part-of-speech tagger. We have used the IL POS tag set for the development of this tagger and achieved an accuracy of 92%.
Two Level Disambiguation Model for Query TranslationIJECEIAES
Selection of the most suitable translation among all translation candidates returned by a bilingual dictionary has always been quite a challenging task for any cross-language query translation. Researchers have frequently tried to use word co-occurrence statistics to determine the most probable translation for a user query. Algorithms using such statistics have certain shortcomings, which are the focus of this paper. We propose a novel method for ambiguity resolution, named the 'two-level disambiguation model'. At the first level of disambiguation, the model properly weighs the importance of the translation alternatives of query terms obtained from the dictionary. The importance factor measures the probability of a translation candidate being selected as the final translation of a query term. This removes the problem of taking a binary decision for translation candidates. At the second level of disambiguation, the model treats the user query as a single concept and deduces the translation of all query terms simultaneously, taking into account the weights of the translation alternatives as well. This is contrary to previous research, which selects the translation for each word in the source language query independently. The experimental results with English-Hindi cross-language information retrieval show that the proposed two-level disambiguation model achieved 79.53% and 83.50% of monolingual translation and 21.11% and 17.36% improvement compared to greedy disambiguation strategies in terms of MAP for short and long queries respectively.
Paraphrasing refers to writing that either differs in its textual content or is dissimilar in rearrangement of words
but conveys the same meaning. Identifying a paraphrase is exceptionally important in various real life applications
such as Information Retrieval, Plagiarism Detection, Text Summarization and Question Answering. A large amount
of work in Paraphrase Detection has been done in English and many Indian Languages. However, there is no
existing system to identify paraphrases in Marathi. This is the first such endeavor in the Marathi Language.
A paraphrase has differently structured sentences, and since Marathi is a semantically strong language, this
system is designed for checking both statistical and semantic similarities of Marathi sentences. Statistical similarity
measure does not need any prior knowledge, as it is based only on the factual data of the sentences. The
factual data is calculated on the basis of the degree of closeness between word-set, word-order, word-vector
and word-distance. Universal Networking Language (UNL) captures the semantic content of a sentence
without any syntactic detail. Hence, the semantic similarity calculated from the UNL graphs generated for
two Marathi sentences renders their semantic equality. The total paraphrase score is calculated by
combining the statistical and semantic similarity scores, which gives a judgment on whether the Marathi
sentences in question are paraphrases or not.
This document summarizes an approach for identifying word translations from non-parallel (unrelated) English and German corpora. It uses a co-occurrence clue, assuming words that frequently co-occur in one language will have translations that co-occur frequently in the other language. The approach computes association vectors for words based on log-likelihood ratios of co-occurrence counts within a 3-word window. It determines the translation of an unknown German word by finding the most similar association vector in an English co-occurrence matrix. Evaluation on 100 test words showed an accuracy of around 72%, a significant improvement over previous methods for non-parallel texts.
The document presents a knowledge-based method for measuring semantic similarity between texts. It combines word-to-word semantic similarity metrics with information about word specificity to calculate a text-to-text similarity score. An example application shows how word similarity scores from WordNet are combined using the Wu & Palmer metric to determine the semantic similarity between two text segments. The method is evaluated on paraphrase identification tasks and shown to outperform approaches based only on lexical matching.
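The Wu & Palmer metric scores two concepts by the depth of their least common subsumer relative to their own depths; a sketch on a toy taxonomy (real use would take depths from WordNet rather than this hypothetical parent map):

```python
def wu_palmer(a, b, parent):
    """Wu & Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    def ancestors(node):
        # Path from the node up to the taxonomy root, inclusive.
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    anc_a, anc_b = ancestors(a), ancestors(b)
    depth = {n: len(ancestors(n)) for n in set(anc_a) | set(anc_b)}
    # Least common subsumer = deepest shared ancestor.
    lcs = max((n for n in anc_a if n in anc_b), key=depth.get)
    return 2 * depth[lcs] / (depth[a] + depth[b])
```

Word-level scores like this are then aggregated, weighted by word specificity, into the text-to-text score the summary describes.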
This document describes a rule-based machine translation system for translating English text to Telugu. It discusses the challenges of developing such a system, including differences in grammar between the two languages. An algorithm is proposed that uses rules, probabilities, and rough sets to classify sentences and select the best word translations. The system works by tokenizing English sentences, tagging the words with parts of speech, looking up word translations in a bilingual dictionary, and concatenating the Telugu words to form the output sentence.
The document describes a system for semantic textual similarity (STS) that uses various techniques to estimate the semantic similarity between texts. The system combines lexical, syntactic, and semantic information sources using state-of-the-art algorithms. In SemEval 2016 tasks, the system achieved a mean Pearson correlation of 75.7% on the monolingual English task and 86.3% on the cross-lingual Spanish-English task, ranking first in the cross-lingual task. The system utilizes techniques such as word embeddings, paragraph vectors, tree-structured LSTMs, and word alignment to capture semantic similarity.
This document presents an overview of spell checking techniques in natural language processing. It discusses how spell checkers work by scanning text, comparing words to a dictionary, and using language-dependent algorithms. Two categories of spelling errors are described: real-word errors involving correctly spelled words and non-word errors containing no dictionary words. Techniques for error detection include dictionary lookup and n-gram comparisons using the Jaccard coefficient. The Levenshtein distance and Jaccard coefficient algorithms are then explained and shown to provide suggestions by calculating the edit distance between source and target words. The presentation concludes that these algorithms filter dictionary words and provide accurate suggestions to correct spelling mistakes in text.
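The Levenshtein distance mentioned above can be computed with the standard dynamic-programming recurrence; a minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    # prev[j] = distance from the current prefix of a to b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]
```

A spell checker ranks dictionary words by this distance from the misspelled word and offers the closest ones as suggestions.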
Reliability Improvement with PSP of Web-Based Software ApplicationsCSEIJJournal
In diverse industrial and academic environments, the quality of the software has been evaluated using
different analytic studies. The contribution of the present work is focused on the development of a
methodology in order to improve the evaluation and analysis of the reliability of web-based software
applications. The Personal Software Process (PSP) was introduced in our methodology for improving the
quality of the process and the product. The Evaluation + Improvement (Ei) process is performed in our
methodology to evaluate and improve the quality of the software system. We tested our methodology in a
web-based software system and used statistical modeling theory for the analysis and evaluation of the
reliability. The behavior of the system under ideal conditions was evaluated and compared against the
operation of the system executing under real conditions. The results obtained demonstrated the
effectiveness and applicability of our methodology
DATA MINING FOR STUDENTS’ EMPLOYABILITY PREDICTIONCSEIJJournal
This study has been undertaken to predict student employability. Assessing student employability
provides a method of integrating student abilities and employer business requirements, which is becoming
an increasingly important concern for academic institutions. Improving student evaluation techniques for
employability can help students to have a better understanding of business organizations and find the right
one for them. The data for training the classification models is gathered through a survey in which students
fill out a questionnaire indicating their abilities and academic achievement. This information may be used
to determine their competency in a variety of skill categories, including soft skills, problem-solving skills
and technical abilities. The goal of this research is to use data mining to predict student employability by
considering different factors, such as the skills the students have gained at diploma level and the time
elapsed, with respect to the knowledge they have acquired, when they expect placement at the end of
graduation. During this research the most relevant skills for each job category were also identified. For the
prediction of student employability, different data mining models such as KNN, Naive Bayes and Decision
Tree were evaluated, and the best model for this institute's student employability prediction was identified.
Thus, classification and association techniques were used and evaluated in this research.
Call for Articles - Computer Science & Engineering: An International Journal ...CSEIJJournal
Computer Science & Engineering: An International Journal (CSEIJ) is a bi-monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Computer Science & Computer Engineering. The journal is devoted to the publication of high quality papers on theoretical and practical aspects of computer science and computer Engineering.
A Complexity Based Regression Test Selection StrategyCSEIJJournal
Software is unequivocally the foremost and indispensable entity in this technologically driven world.
Therefore quality assurance, and in particular software testing, is a crucial step in the software
development cycle. This paper presents an effective test selection strategy that uses a Spectrum of
Complexity Metrics (SCM). Our aim in this paper is to increase the efficiency of the testing process by
significantly reducing the number of test cases without a significant drop in test effectiveness. The
strategy makes use of a comprehensive taxonomy of complexity metrics based on the product level (class,
method, statement) and its characteristics. We use a series of experiments based on three applications with
a significant number of mutants to demonstrate the effectiveness of our selection strategy. For further
evaluation, we compare our approach to boundary value analysis. The results show the capability of our
approach to detect mutants as well as the seeded errors.
XML Encryption and Signature for Securing Web ServicesCSEIJJournal
In this research, we have focused on the most challenging issue that Web Services face, i.e. how to secure
their information. Web Services security can be guaranteed by employing security standards, which is the
main focus of this research. Every suggested security design model should take into account the security
objectives: integrity, confidentiality, non-repudiation, authentication, and authorization. The proposed
model describes SOAP messages and the way to secure their contents. Because the SOAP message is the
core of information exchange in Web Services, this research has developed a security model needed to
ensure e-business security. The essence of our model depends on XML encryption and XML signatures to
encrypt and sign SOAP messages. The proposed model aims to achieve high transaction speed and a
strong level of security without jeopardizing the performance of the transmitted information.
Performance Comparison of PCA,DWT-PCA And LWT-PCA for Face Image RetrievalCSEIJJournal
This paper compares the performance of a face image retrieval system based on discrete wavelet
transforms and lifting wavelet transforms with principal component analysis (PCA). These techniques are
implemented and their performance is investigated using frontal facial images from the ORL database.
Although the discrete wavelet transform is effective in representing image features and is suitable for face
image retrieval, it still encounters implementation problems, e.g. floating point operations and
decomposition speed. We use the advantages of the lifting scheme, a spatial approach for constructing
wavelet filters, which provides a feasible alternative to the problems facing its classical counterpart. The
lifting scheme has such intriguing properties as convenient construction, simple structure, integer-to-integer
transform, low computational complexity and flexible adaptivity, revealing its potential for face image
retrieval. Compared with PCA and DWT with PCA, the lifting wavelet transform with PCA requires less
computation, while DWT-PCA gives a higher retrieval rate.
Data security and privacy are important to prevent the revelation, modification and unauthorized usage of
sensitive information. The introduction of power-constrained devices for the internet of things (IoT),
e-commerce, e-payment, and wireless sensor networks (WSNs) has brought a new security challenge due
to the low computation capability of sensors. Therefore, lightweight authenticated key agreement protocols
are important to protect their security and privacy. Several studies on authenticated key agreement have
been published. However, there is a need for lightweight schemes that fit such constrained devices. In
addition, a malicious key generation center (KGC) can become a threat by watching other users, i.e.
impersonating a user, causing the key escrow problem.
Recommendation System for Information Services Adapted, Over Terrestrial Digi...CSEIJJournal
The development of digital television in Colombia has grown in recent years, especially digital terrestrial
television (DTT), which is an essential part of the projects of the National Ministry of ICT, thanks to the
wide distribution and use of the television network and the Internet in the country. This article explains
how to combine different technologies such as social networks, information adaptation and DTT to obtain
an application that offers information services to users, based on their data, preferences, inclinations, use
and interaction with other users and groups inside the network.
Reconfiguration Strategies for Online Hardware Multitasking in Embedded SystemsCSEIJJournal
An intensive use of reconfigurable hardware is expected in future embedded systems. This means that the
system has to decide which tasks are more suitable for hardware execution. In order to make an efficient
use of the FPGA it is convenient to choose one that allows hardware multitasking, which is implemented by
using partial dynamic reconfiguration. One of the challenges for hardware multitasking in embedded
systems is the online management of the only reconfiguration port of present FPGA devices. This paper
presents different online reconfiguration scheduling strategies which assign the reconfiguration interface
resource using different criteria: workload distribution or task’ deadline. The online scheduling strategies
presented take efficient and fast decisions based on the information available at each moment. Experiments
have been made in order to analyze the performance and convenience of these reconfiguration strategies.
Performance Comparison and Analysis of Mobile Ad Hoc Routing ProtocolsCSEIJJournal
A mobile ad hoc network (MANET) is a wireless network that uses multi-hop peer-to-peer routing instead
of static network infrastructure to provide network connectivity. MANETs have applications in rapidly
deployed and dynamic military and civilian systems. The network topology in a MANET usually changes
with time. Therefore, there are new challenges for routing protocols in MANETs since traditional routing
protocols may not be suitable for MANETs. Researchers are designing new MANET routing protocols
and comparing and improving existing MANET routing protocols before any routing protocols are
standardized using simulations. However, the simulation results from different research groups are not
consistent with each other. This is because of a lack of consistency in MANET routing protocol models
and application environments, including networking and user traffic profiles. Therefore, the simulation
scenarios are not equitable for all protocols and conclusions cannot be generalized. Furthermore, it is
difficult for one to choose a proper routing protocol for a given MANET application. According to the
aforementioned issues, this paper focuses on MANET routing protocols. Specifically, my contribution
includes the characterization of different routing protocols and compare and analyze the performance of
different routing protocols.
Adaptive Stabilization and Synchronization of Hyperchaotic QI SystemCSEIJJournal
The hyperchaotic Qi system (Chen, Yang, Qi and Yuan, 2007) is one of the important models of four-
dimensional hyperchaotic systems. This paper investigates the adaptive stabilization and synchronization
of hyperchaotic Qi system with unknown parameters. First, adaptive control laws are designed to
stabilize the hyperchaotic Qi system to its equilibrium point at the origin based on the adaptive control
theory and Lyapunov stability theory. Then adaptive control laws are derived to achieve global chaos
synchronization of identical hyperchaotic Qi systems with unknown parameters. Numerical simulations
are shown to demonstrate the effectiveness of the proposed adaptive stabilization and synchronization
schemes.
An Energy Efficient Data Secrecy Scheme For Wireless Body Sensor NetworksCSEIJJournal
Data secrecy is one of the key concerns for wireless body sensor networks (WBSNs). Usually, a data
secrecy scheme should accomplish two tasks: key establishment and encryption. WBSNs generally face
more serious limitations than general wireless networks in terms of energy supply. To address this, in this
paper, we propose an energy efficient data secrecy scheme for WBSNs. On one hand, the proposed key
establishment protocol integrates node IDs, seed value and nonce seamlessly for security, then
establishes a session key between two nodes based on one-way hash algorithm SHA-1. On the other hand,
a low-complexity threshold selective encryption technology is proposed. Also, we design a security
selection patter exchange method with low-complexity for the threshold selection encryption. Then, we
evaluate the energy consumption of the proposed scheme. Our scheme shows the great advantage over
the other existing schemes in terms of low energy consumption.
To improve the QoS in MANETs through analysis between reactive and proactive ...CSEIJJournal
A Mobile Ad hoc NETwork (MANET), is a self-configuring infra structure less network of mobile devices
connected by wireless links. ad hoc is Latin and means "for this purpose". Each device in a MANET is free
to move independently in any direction, and will therefore change its links to other devices frequently. Each
must forward traffic unrelated to its own use, and therefore be a router. The primary challenge in building
a MANET is equipping each device to continuously maintain the information required to properly route
traffic. QOS is defined as a set of service requirements to be met by the network while transporting a
packet stream from source to destination. Intrinsic to the notion of QOS is an agreement or a guarantee by
the network to provide a set of measurable pre-specified service attributes to the user in terms of delay,
jitter, available bandwidth, packet loss, and so on. The analysis is mainly between proactive or table-driven
protocols like OLSR (Optimized Link State Routing) viz DSDV (Destination Sequenced Distance Vector) &
CGSR (Cluster Head Gateway Switch Routing) and reactive or source initiated routing protocols viz
AODV (Ad hoc on Demand distance Vector) & DSR (Dynamic Source Routing). The QoS analysis of the
above said protocols is simulated on NS2 and results are shown thereby.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Embedded machine learning-based road conditions and driving behavior monitoring
Rule-Based Phonetic Matching Approach for Hindi and Marathi
Computer Science & Engineering: An International Journal (CSEIJ), Vol.1, No.3, August 2011
DOI : 10.5121/cseij.2011.1302
RULE-BASED PHONETIC MATCHING
APPROACH FOR HINDI AND MARATHI
Sandeep Chaware (1), Srikantha Rao (2)
(1) Research Scholar, MPSTME, NMIMS University, Mumbai, INDIA
{smchaware@gmail.com}
(2) PhD Guide, MPSTME, NMIMS University, Mumbai, INDIA
{dr_s_rao@hotmail.com}
ABSTRACT
Phonetic matching plays an important role in multilingual information retrieval, where data is
maintained in multiple languages and users need information in their native language, which may
differ from the language in which the data is stored. In such an environment, we need a system
that matches strings phonetically in any case. Once the strings match, we can retrieve the
information irrespective of language. In this paper, we propose an approach that matches strings
in Hindi, in Marathi, or across the two languages in order to retrieve information. We compared
the proposed method with the Soundex and Q-gram methods and obtained better results.
KEYWORDS
Soundex, Q-gram, Edit distance, Indic-phonetic, phonetic matching.
1. INTRODUCTION
Phonetic matching is needed when many diverse people come together. They may speak with
different pronunciation styles or write many languages in various writing styles, yet mean the
same thing. Phonetic matching deals with identifying the similarity of two or more strings by
pronunciation, regardless of their actual spelling. A typical example is the Just-Dial service in
Mumbai, where a telephone operator is given a name and asked to find the corresponding
information. The operator guesses the possible spelling of the name and then searches the
database (or a spelling may be provided, which may be incorrect).
Phonetic matching plays an important role in information retrieval in a multilingual
environment. Information retrieval means searching the data in a database; indexing and ranking
are used for this, and an exact match is normally needed for a given string. Phonetic matching
can be defined as the process of identifying the set of strings most likely to be similar in
sound to a given keyword [1]. The strings may be spelled in different writing styles, but they
can still be matched phonetically: all the strings represent the same keyword, only the way of
writing differs. Since in rural areas a word may be spelled or pronounced wrongly or
differently, phonetic matching lets us retrieve the data without an exact string match.
Many techniques have been proposed to find phonetic matches between strings, such as Soundex,
Q-gram, Caverphone and Phonix. Each technique generates a code for the strings and matches the
codes through edit distances; if the edit distance is small, the strings are close to each
other [2]. However, these methods have not been found suitable for all strings. They have some
limitations: first, code generation follows a long procedure; second, they may generate the same
code for mismatched strings, or different codes for matching strings. Another technique uses a
text-to-phonetic (TTP) approach, which may not work for Indian languages, as the required codes
are not available or the TTP conversion is complex [2].
In this paper we propose an approach that provides a simple and efficient way of matching
strings. Our scheme works on a text-conversion technique for Indian languages, especially Hindi
and Marathi. It has been found that most rural people do not spell or pronounce keywords
correctly.
The main objectives of the proposed system are, first, to convert the entered strings into their
equivalent phonetic forms by applying phonetic rules for each language instead of using TTP, and
to compare our methodology with other methods; and second, to retrieve the information needed
for the entered string from the database if it matches phonetically.
2. PHONETIC MATCHING APPROACHES: A SURVEY
In this section, we describe various techniques for phonetic matching. Existing techniques were
developed to match strings approximately, whereas our approach matches strings exactly. The
following approaches are used for phonetic matching.
A. Soundex for English
In this method, each string is converted into a code consisting of the first letter of the
string followed by three numbers. The numbers are assigned to each letter as per the guidelines
described by the algorithm [2][3]. Zeros are added at the end if necessary to produce a
four-character code, and additional letters are discarded. The Soundex codes and algorithm are
given in Table 1 and Algorithm 1 respectively. The pitfall of this algorithm is that it can
produce the same code for two phonetically different strings, or different codes for strings
that are phonetically the same.
Table 1: Soundex phonetic codes

Code  Letters
0     a, e, i, o, u, y, h, w
1     b, p
2     c, g, j, k, q, s, x, z
3     d, t
4     l
5     m, n
6     r
Algorithm 1: Soundex for Phonetic Matching
Input: Two strings to match
Output: Soundex code for each
1. Retain the first letter of the string.
2. Remove all occurrences of the following letters, unless one is the first letter: a, e, i, o,
u, y, h, w.
3. Assign numbers to the remaining letters as given in the code table.
4. If two or more letters with the same number are adjacent, omit all except the first.
5. Return the first four characters, padded with 0 if necessary.
EXAMPLE
Consider the two English strings ‘sandy’ and ‘sandhya’ for phonetic matching. After applying the
rules from Table 1 and the steps of Algorithm 1, we get the same code, ‘s530’, for both. Going
by the code alone, we would have to say that the strings match phonetically, yet they are in
fact phonetically different.
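The procedure above can be sketched in code. The following is a minimal Python illustration of Algorithm 1 using the digit assignments of Table 1 (it assumes lowercase ASCII input; it is not the authors' implementation, and letters that Table 1 gives no code for are simply skipped):

```python
# Digit assignments from Table 1 (code-0 letters are dropped in step 2)
CODES = {}
for digit, letters in {"1": "bp", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    word = word.lower()
    first = word[0]                                      # step 1: retain first letter
    tail = [c for c in word[1:] if c not in "aeiouyhw"]  # step 2: drop code-0 letters
    digits = [CODES[c] for c in tail if c in CODES]      # step 3: map letters to digits
    collapsed = []                                       # step 4: collapse adjacent repeats
    for d in digits:
        if not collapsed or d != collapsed[-1]:
            collapsed.append(d)
    return (first + "".join(collapsed) + "000")[:4]      # step 5: pad/truncate to 4 chars
```

Running this on the example reproduces the ambiguity: both ‘sandy’ and ‘sandhya’ yield ‘s530’.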
B. Soundex for Hindi
Suppose we adapt the Soundex algorithm to Hindi and assign codes as per Table 2. If we match
strings with this algorithm, we get different codes for the same string, or cannot generate a
code at all. In this methodology, the code is generated by taking the halant for each consonant
concatenated together with the vowel. This changes the code, and it takes a long time to parse
the string and assign the code. We also found that many characters of the Hindi alphabet, such
as ण, झ, छ and घ, have no code at all.
Table 2: Soundex codes for Hindi as per the algorithm

Code  Letters
0     अ, इ, ओ, उ, व, ह
1     ब, प
2     च, ग, ज, क, स
3     ड, ट, त
4     ल, ळ
5     म, न
6     र
EXAMPLE
If we apply the same rules and algorithm to Hindi as shown in Table 2, the codes for the strings
‘सँडी’ and ‘संया’ are the same: ‘स530’. Using the Soundex method, we get the same code for both
strings entered in Hindi. Since the codes are the same, we would have to say that the strings
match phonetically, but they do not. The match is ambiguous.
C. Q-Gram for English
This method measures string distance based on q-gram counts, where a q-gram of a string s is any
substring of s of some fixed length q. A simple measure is to count the number of q-grams the
two strings have in common, with a higher count yielding a stronger match [2]. However, this
algorithm is not strictly phonetic, since it does not operate on a comparison of the phonetic
characteristics of words. The steps are given in Algorithm 2. Since phonetically similar words
often have similar spellings, the technique can provide favorable results, yet it also matches
misspelled or otherwise mutated words, even when they are phonetically disparate.
For example, for the strings ‘sandy’ and ‘sandhya’ with q = 2, the substrings are as shown in
Table 3. With the q-gram algorithm, 3 of the 4 grams of the first string match, while the
remaining grams of the second string have no match. Based on the number of matching q-grams, we
judge the phonetic similarity of the strings.
Table 3: Q-gram example (q = 2)

String 1: ‘Sandy’    sa  an  nd  dy  --  --
String 2: ‘Sandhya’  sa  an  nd  dh  hy  ya
Algorithm 2: Q-Gram
Input: Two strings to match
Output: Number of q-grams in common
1. Select the value of q.
2. Form the substrings of each string, each of length q:
i. The first substring is the first q letters.
ii. Each next substring drops the first letter of the previous substring and appends the
next letter of the string.
iii. Repeat until the end of the string.
3. Count the number of substrings the two strings have in common.
4. The maximum count indicates the best match.
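Algorithm 2 can be sketched in Python as follows (an illustrative sketch, counting each common q-gram at most as often as it occurs in both strings; not the authors' implementation):

```python
from collections import Counter

def qgrams(s, q=2):
    """Algorithm 2, steps 1-2: all substrings of s of length q."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def qgram_matches(s, t, q=2):
    """Steps 3-4: number of q-grams the two strings have in common."""
    common = Counter(qgrams(s, q)) & Counter(qgrams(t, q))  # multiset intersection
    return sum(common.values())
```

For ‘sandy’ and ‘sandhya’ with q = 2, this counts the 3 shared grams (sa, an, nd) of Table 3.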
D. Q-gram for Hindi
If we apply the same rules as in Algorithm 2 to form q-grams for Hindi strings, the entire
string may change, or we may not get any matching q-gram. One problem with this method is that
if the parsing of a string goes wrong, it affects the generated grams.
EXAMPLE
Consider the same strings, ‘सँडी’ and ‘संया’. After applying the q-gram method for Hindi, we get
only two matching 2-grams and two that do not match. Again, the phonetic match is ambiguous.
E. Edit Distance for English
A simple edit distance, which counts the minimal number of single-character insertions,
deletions and replacements needed to transform one string into another, can be used for phonetic
matching, since similar-sounding words are often spelled similarly. Each string is converted
into a phonetic string, and the edit distance between the phonetic strings is then calculated as
shown in Algorithm 3. For the conversion, a text-to-phonetic (TTP) algorithm can be used; this
requires a TTP algorithm for each language, and TTP may not give appropriate results for Indian
languages.
Algorithm 3: Edit Distance
Input: Two strings s and t, of length m and n respectively
Output: Edit distance between them
1. edit(0, 0) = 0
2. edit(i, 0) = i
3. edit(0, j) = j
4. edit(i, j) = min[ edit(i-1, j) + 1,
edit(i, j-1) + 1,
edit(i-1, j-1) + r(si, tj) ]
where r(a, b) = 0 if a = b, and 1 otherwise.
5. If edit(m, n) = 0, the strings are similar; else they are non-similar.
3. PROPOSED INDIC-PHONETIC APPROACH
In this approach, we provide a user interface through which the user enters keywords or strings
in local languages, Hindi or Marathi, for phonetic matching. Each entered string is parsed to
obtain the exact combination of vowels, consonants and/or modifiers as a phonetic string,
according to the phonetic rules for each language given below. We assign a code to each phonetic
string in order to match. The difference between the codes is compared with a threshold value;
if it is less than or equal to the threshold, the two strings are phonetically matched, else
not.
A. DESIGN
The overall system architecture of the proposed Hindi-Marathi phonetic approach is shown in
Figure 1. The system includes a user interface module, a parsing module, a phonetic equivalent
string module, a comparison module and a display/result module. Each module is described briefly
below.
Figure 1: Hindi-Marathi Phonetic Approach System Architecture
1. User Interface Module: This module gives the user an interface to interact with the
system in order to find the phonetic equivalence of two strings. The user can enter
either two Hindi/Marathi strings or two strings in the two different languages. We
assume that the entered strings are written exactly as they should be in those
languages.
2. Parsing Module: The entered strings are parsed by this module to extract each
character as a vowel, consonant or modifier. This helps us obtain the equivalent
phonetic strings.
3. Phonetic Equivalent String Module: This module translates the entered string into
its equivalent phonetic string by applying the phonetic rules for each language. We
applied some rules directly from the grammar and designed other rules for the Hindi
and Marathi languages.
4. Comparison Module: In this module, we assign a phonetic code, as Unicode, as per the
phonetic group for each language, and obtain a code for the entire string by summing
the Unicode values of its characters. With a threshold value, we can then compare the
two strings for phonetic matching. If they fall within the threshold, the two strings
are close phonetic matches.
5. Display/Result Module: This module displays, as a message to the user, whether the
two strings are phonetically equivalent or not.
B. IMPLEMENTATION STRATEGY
In this approach, we form rules according to the pronunciation of characters in the Hindi and
Marathi languages. Each string is interpreted and converted to its phonetic form using these
rules. For phonetic matching, we form the code for each string as per the Unicode table [6],
which is common to both languages. The codes are then compared against a fixed 5% threshold; if
they are within this range, we say that the strings are phonetically matched.
I) Phonetic Rules for Hindi
We can formulate rules according to the occurrences and pronunciation of each character in
Hindi [7]. Some of the rules are as follows:
Rule 1: If ‘स’ or ‘श’ comes, then dot (.) will be pronounced as ‘न’.
Rule 2: If ‘ह’ is preceded by anuswar (.) then it will be pronounced as ‘घ’.
Rule 3: If ‘!’ occurs in a string, then it will be pronounced as ‘य ‘.
Rule 4: If ‘ड़’ or ‘ढ़’ occur as last character from a string, then it is pronounced lightly and will
have different code.
Rule 5: If ‘ड’ or ‘ढ’ occurs as a starting or after anuswar (.) in a string, then it is pronounced fully
and will have different code.
Rule 6: For ‘ऋ’, the pronounced character is ‘'र’.
Rule 7: If ‘(’ occurs, then it is pronounced as ‘त ्+ र’.
We apply these pronunciation rules to Hindi strings in order to obtain the correct phonetic
form. For this phonetic form, we assign codes from the mapping table in order to match
phonetically. The difference between the codes is compared against a threshold value; if the
result is within the threshold, the two entered strings are phonetically matched, else not.
II) Phonetic Rules for Marathi
We can formulate rules according to the occurrences and pronunciation of each character in
Marathi [8]. Some of the rules are as follows:
Rule 1: Every consonant is a combination of consonant + halant + ‘अ’.
Rule 2: If ‘*’ occurs, then its pronounced character is ‘क् ष’.
Rule 3: If ‘!’ occurs, then its pronounced character is ‘, + न ्+ य’.
Rule 4: For last consonant of a string, it is pronounced with halant. For example, नवस = नवस ्.
Rule 5: For last but-one consonant is pronounced with halant. For example, हसरा = ह-ा.
Rule 6: A string of 4 characters, the second consonant is pronounced with halant, for example,
घरदार = घदा/र.
Rule 7: if ‘य’ after anuswar then it is pronounced as ‘यँ’, for example, संयु1त = सं2यु1त.
Rule 8: if ‘ल’ after anuswar then it is pronounced as ‘लँ’, for example, संलाप = सं3लाप.
Rule 9: if ‘र, व, श, ष, स, ह, !’ after ‘सं’ then it is pronounced as ‘वँ’, for example, संसद = सं4सद.
We apply these pronunciation rules to Marathi strings in order to obtain the correct phonetic
form. For this phonetic form, we assign codes from the mapping table in order to match
phonetically. The difference between the codes is compared against a threshold value; if the
result is within the threshold, the two entered strings are phonetically matched, else not.
III) Proposed Rule-based Phonetic Matching Algorithm for Hindi and Marathi
Input: Two strings, in Hindi or Marathi, to match
Output: Phonetic match: yes or no
1. Enter two Hindi or Marathi strings to match.
2. Translate each string into its phonetic form using the phonetic rules for its
language.
3. Obtain a code for each translated string by summing the Unicode values of its
characters.
4. Compare the resulting codes using a threshold value of 5%.
5. If the values are within 5% of each other, the strings are phonetically matched;
else they are not.
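Steps 3 to 5 can be sketched as follows (a minimal Python illustration; the language-specific rule translation of step 2 is omitted, so the functions take already-translated phonetic strings, and the 5% threshold is taken relative to the larger code, which is one plausible reading of step 4):

```python
def phonetic_code(phonetic_string):
    """Step 3: sum of Unicode code points of the rule-translated string."""
    return sum(ord(ch) for ch in phonetic_string)

def indic_phonetic_match(s, t, threshold=0.05):
    """Steps 4-5: match if the codes differ by at most 5% of the larger code."""
    a, b = phonetic_code(s), phonetic_code(t)
    return abs(a - b) <= threshold * max(a, b)
```

For instance, संतोष and संथोष differ in a single consonant with adjacent code points, so their code sums differ by 1 and the pair matches, consistent with the YES entry in Table 5.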
EXAMPLE
Consider the strings in Table 4, in either Hindi or Marathi. After applying our proposed
Indic-phonetic approach, we can say that the two strings are NOT phonetically matching: as
Table 4 shows, we get neither the same phonetic strings nor the same phonetic codes.
Table 4: Indic-phonetic approach for Hindi and Marathi strings

String no.  Hindi string  Phonetic code (Indic-phonetic)  Marathi string  Phonetic code (Indic-phonetic)
1           ‘स5डी’         स ्ऐं7ई                          ‘स5डी’           स ्ऐं7ई
2           ‘संया’         स ्अन ्अय ्आ                     ‘संया’           संय ्आ
4. RESULTS
It has been found that our approach gives an exact match for any writing style of a string that
is phonetically equivalent. There is no need to generate a code or form substrings; only a
conversion is needed. For this we used a mapping method; Wordnet could also be used for the same
purpose.
Figures 2 to 8 show the interface for entering two strings to compare phonetically in Hindi or
Marathi, their phonetic equivalents according to the rules, and the matching results for the
Soundex, Q-gram and proposed Indic-phonetic methods. The results also show information retrieval
from the database via phonetic matching. The phonetic representations are assigned codes as per
the rules mentioned, and these codes are compared against the given threshold for phonetic
matching.
Figure 2: Phonetic Matching Approach for Hindi and Marathi
Figure 3: Three phonetic matching algorithms compared
Figure 4: Soundex Approach for Hindi
Figure 5: Q-gram approach for Hindi
Figure 6: Q-gram result for Hindi
Figure 7: Indic-phonetic for Hindi
Figure 8: Indic-phonetic for Marathi
Table 5: Strings with codes and matching status for Hindi and Marathi

                          HINDI                             MARATHI
Strings                   SOUNDEX  Q-GRAM  INDIC-PHONETIC   SOUNDEX  Q-GRAM  INDIC-PHONETIC
संतोष / संथोष              YES      YES     YES              YES      YES     YES
मुंबई / मुंबाई                YES      YES     NO               YES      YES     NO
रघुलला / राघुलला             YES      YES     YES              YES      YES     YES
संया / स5डी                YES      YES     NO               YES      YES     NO
Figure 9: Phonetic Matching for soundex, q-gram and Indic-phonetic for Hindi and Marathi
5. CONCLUSIONS
Indian languages have many phonetic features, and we have tried to cover most of them by forming
rules to match strings in the Hindi and Marathi languages. In this paper, we explored and
compared several phonetic matching approaches for Indian languages such as Hindi and Marathi. We
found that these approaches lag in accommodating all the characters of the alphabet for both
Hindi and Marathi. We proposed a rule-based approach for phonetic matching, and also proposed
this approach for cross-language string matching. The method works without using IPA codes, thus
reducing ambiguity in matching. The advantage of our approach is that it gives both the user and
the developer a simple, easy and efficient way of phonetic matching.
REFERENCES
[1] Justin Zobel and Philip Dart, “Phonetic String Matching: Lessons from Information Retrieval”.
[2] A. Kumaran, “Multilingual Information Processing on Relational Database Architectures”, PhD
Thesis, IISc Bangalore, 2005.
[3] Alexander Beider and Stephen P. Morse, “Phonetic Matching: A Better Soundex”.
[4] http://en.wikipedia.org/wiki/Soundex
[5] www.bhashaindia.com
[6] http://tdil.mit.gov.in/unicodeapril03.pdf
[7] Hindi Grammar Book.
[8] Marathi Grammar Book.
Author 1
Mr. Sandeep Chaware, Research Scholar at MPSTME, NMIMS University, Mumbai, INDIA.
Author 2
Dr. Srikantha Rao, PhD Guide at MPSTME, NMIMS University, Mumbai, INDIA.