This document summarizes a survey on using deep learning to develop grammar checkers for Indian languages. It discusses different types of existing grammar checkers such as rule-based, statistical, and hybrid approaches. Grammar checkers have been developed for several languages including English, Afan Oromo, Portuguese, Punjabi, Amharic, Swedish, Nepali, Icelandic, and Hindi. Most use rule-based approaches where input sentences are checked against predefined grammar rules. Statistical approaches check input against annotated corpora. Hybrid approaches combine rule-based and statistical methods. The survey concludes that while many grammar checkers exist for other languages, only a limited number have been developed for Indian languages, leaving room for future research on developing deep
Techniques for automatically correcting words in textunyil96
The problem of automatically correcting words in text has been an ongoing research challenge since the 1960s. Existing spelling checkers and text recognition techniques are limited in their accuracy. Three main areas of research have focused on detecting and correcting (1) nonwords, (2) isolated misspelled words, and (3) context-dependent real-word errors. While progress has been made, fully automatic correction of all word errors requires techniques that can analyze contextual information to detect errors resulting in other valid words.
Survey on Indian CLIR and MT systems in Marathi LanguageEditor IJCATR
Cross Language Information Retrieval (CLIR) deals with retrieving relevant information stored in a language different from
the language of the user’s query. This lets users express their information need in their native language. The machine-translation-based (MT-based)
approach to CLIR uses existing machine translation techniques to translate queries automatically. This paper covers the
research work done on CLIR and MT systems for the Marathi language in India.
A New Approach to Parts of Speech Tagging in Malayalamijcsit
Part-of-speech tagging is the process of labeling each word in a sentence with a tag that indicates the word’s
usage in that sentence. These tags usually indicate a syntactic class such as noun or verb, and sometimes
carry additional information such as case markers (number, gender, etc.) and tense markers. Many
current language processing systems use a part-of-speech tagger for pre-processing.
There are two main approaches to part-of-speech tagging: the rule-based
approach and the stochastic approach. The rule-based approach uses predefined handwritten rules. It is the
oldest approach and uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and
statistical information to assign tags to words. It uses a large corpus, so its time and space
complexity are high, whereas the rule-based approach has lower complexity for both. The stochastic
approach is the more widely used today because of its accuracy.
Malayalam is a Dravidian language, inflectional, with suffixes attached to root word forms. The
algorithms currently in use are efficient machine learning algorithms, but they are not built for
Malayalam, which affects the accuracy of Malayalam POS tagging.
My proposed approach uses dictionary entries along with adjacent tag information. The algorithm uses
multithreading. Tagging is done using the probability of occurrence of the sentence
structure along with the dictionary entry.
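The idea of combining dictionary entries with adjacent tag information can be sketched roughly as follows. This is only an illustrative greedy tagger, not the algorithm from the paper: the toy English lexicon, the tag set, the transition probabilities, and the names `LEXICON`, `TRANSITIONS`, and `tag_sentence` are all assumptions made for the example, and the multithreading mentioned in the abstract is left out.

```python
# Hypothetical toy lexicon: each word maps to the tags a dictionary would allow.
LEXICON = {
    "the": ["DET"],
    "dog": ["NOUN"],
    "can": ["NOUN", "VERB", "AUX"],
    "run": ["VERB", "NOUN"],
}

# Hypothetical tag-bigram probabilities P(tag | previous tag).
TRANSITIONS = {
    ("<s>", "DET"): 0.6,
    ("DET", "NOUN"): 0.9,
    ("NOUN", "AUX"): 0.5,
    ("NOUN", "VERB"): 0.4,
    ("NOUN", "NOUN"): 0.1,
    ("AUX", "VERB"): 0.9,
    ("AUX", "NOUN"): 0.1,
}


def tag_sentence(words, default_tag="NOUN"):
    """Greedily pick, for each word, the dictionary tag that best follows the previous tag."""
    tags = []
    prev = "<s>"  # sentence-start marker
    for word in words:
        # Words missing from the dictionary fall back to a default tag.
        candidates = LEXICON.get(word.lower(), [default_tag])
        best = max(candidates, key=lambda t: TRANSITIONS.get((prev, t), 0.0))
        tags.append(best)
        prev = best
    return tags


print(tag_sentence(["The", "dog", "can", "run"]))
```

The dictionary narrows the candidate tags for each word, and the adjacent tag decides among them. A full stochastic tagger would typically search over whole tag sequences (for example with Viterbi decoding) instead of committing greedily.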
IRJET - Analysis on Code-Mixed Data for Movie ReviewsIRJET Journal
This document presents a proposed approach for analyzing and classifying sentiments in code-mixed data, which is text containing a combination of languages like Hindi and English. The approach uses wordnet techniques to first separate words in each sentence into English and non-English words. It then processes the English and non-English words separately to calculate polarity scores, which indicate whether the sentiment is positive or negative. These polarity scores are combined to determine the overall sentiment of the sentence. The proposed system is aimed at improving over dictionary-based approaches by leveraging wordnet resources like WordNet and Hindi SentiWordNet to analyze code-mixed movie review data and classify the overall polarity.
Named Entity Recognition System for Hindi Language: A Hybrid ApproachWaqas Tariq
Named Entity Recognition (NER) is a major early step in Natural Language Processing (NLP) tasks such as machine translation, text-to-speech synthesis, and natural language understanding. It seeks to classify words that represent names in text into predefined categories such as location, person name, organization, date, and time. In this paper we use a combination of machine learning and rule-based approaches to classify named entities, introducing a hybrid approach for NER. We have experimented with statistical approaches, namely Conditional Random Fields (CRF) and Maximum Entropy (MaxEnt), and with a rule-based approach built on a set of linguistic rules. The linguistic approach plays a vital role in overcoming the limitations of statistical models for a morphologically rich language like Hindi. The system also uses a voting method to improve NER performance. Keywords: NER, MaxEnt, CRF, Rule base, Voting, Hybrid approach
Quality estimation of machine translation outputs through stemmingijcsa
Machine translation is a challenging problem for Indian languages. New machine
translators are developed all the time, but high-quality automatic translation is still a distant dream.
A correctly translated sentence in Hindi is rarely found. In this paper we focus on the
English-Hindi language pair; to preserve the correct MT output, we present a ranking system
that employs machine learning techniques and morphological features. The ranking requires no human
intervention. We have also validated our results by comparing them with human rankings.
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...kevig
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, segment words with high performance, but they are unable to segment broken expressions or utterance endings that are not listed in the dictionary, which often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters using subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual anime character, and showed that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment showed that the model with subword units outperformed the one using the conventional method.
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...kevig
This study proposes a method to develop neural models of the morphological analyzer for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text data into words and assigns information such as parts of speech. In Japanese natural language processing systems, this technique plays an essential role in downstream applications because Japanese does not put delimiters between words. Hiragana is a type of Japanese phonogram, used in texts for children or for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information to guide segmentation. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning using a model based on ordinary Japanese text and examined the influence of training data on texts of various genres.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat services. As a result, a text containing various English and Bangla words and phrases has become increasingly common. Existing transliteration tools display poor performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetic typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm superiority of THT as it significantly outperforms the benchmark transliteration tool.
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
This paper presents a design for a rule-based machine translation system for the English to Marathi language pair. The system will take an English sentence as input and parse it with the Stanford parser, which will handle the main source-side processing. An English to Marathi bilingual dictionary is going to be created. The system will take the parsed output, separate the source text word by word, and search for the corresponding target words in the bilingual dictionary. Hand-coded rules are written for Marathi inflections, along with reordering rules. After the reordering rules are applied, the English sentence will be syntactically reordered to suit the Marathi language
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
The document describes a machine transliteration system that transliterates Hindi and Marathi names and words to English using support vector machines (SVM). It segments source language names into phonetic units, and trains an SVM classifier using phonetic units and n-grams as features to label each unit with its English transliteration. The system achieves good accuracy for Hindi-English and Marathi-English transliteration.
Design of A Spell Corrector For Hausa LanguageWaqas Tariq
In this article, a spell corrector has been designed for the Hausa language, which is the second most spoken language in Africa and does not yet have processing tools. This study is a contribution to the automatic processing of the Hausa language. We took existing techniques used for other languages and adapted them to the special case of Hausa. The corrector operates essentially on Mijinguini’s dictionary and on characteristics of the Hausa alphabet. After a brief review of spell checking and spell correcting techniques and of the state of the art in Hausa language processing, we opted for the trie and hash table data structures to represent the dictionary. Edit distance and the specificities of the Hausa alphabet are used to detect and correct spelling errors. The spell corrector has been implemented in a special editor developed for that purpose (LyTexEditor) and also as an extension (add-on) for OpenOffice.org. The performance of the two data structures was compared.
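As a rough illustration of the ingredients named in this abstract (a trie for dictionary lookup and edit distance for suggestions), a minimal sketch could look like the following. It is not the authors' implementation: the English toy word list, the class `Trie`, and the functions `edit_distance` and `suggest` are assumptions for the example, and none of the Hausa-specific alphabet handling is modeled.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


class Trie:
    """Prefix tree holding the dictionary; lookup cost depends on word length only."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


def suggest(word, words, max_distance=1):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in words)
    return [w for d, w in scored if d <= max_distance]


WORDS = ["book", "back", "bark", "cook"]  # hypothetical toy dictionary
trie = Trie()
for w in WORDS:
    trie.insert(w)

if not trie.contains("bock"):       # detection: the word is not in the dictionary
    print(suggest("bock", WORDS))   # correction: nearby dictionary words
```

A Python `set` would play the role of the hash table alternative mentioned in the abstract. The trie additionally supports prefix queries, which a hash table does not.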
Comparison Analysis of Post- Processing Method for Punjabi FontIRJET Journal
The document describes a study that compares different post-processing methods for recognizing handwritten Punjabi (Gurmukhi) text from optical character recognition (OCR) systems. The study proposes using four phases: 1) an AVL tree implementation method to find suggestions, 2) a symmetric delete spelling correction algorithm, 3) taking the union of the results from the first two methods, and 4) ranking the suggestions based on a soundex approach. The goal is to correct errors and improve recognition rates for OCR of handwritten Gurmukhi text by using these post-processing techniques. The methods are tested on a database of over 115,000 Gurmukhi words and the results aim to reduce recognition errors and
IRJET- Querying Database using Natural Language InterfaceIRJET Journal
This document presents a proposed natural language interface system to allow users to query a database using English queries instead of SQL. The system aims to make database access easier for non-technical users. It discusses the architecture of the system, which includes modules for natural language processing, query translation to SQL, and speech conversion. It also reviews related work and discusses advantages and disadvantages of natural language interfaces for databases. The proposed system uses techniques like tokenization, parsing, and semantic analysis to understand queries and map them to equivalent SQL queries to retrieve results from the database.
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for the development of a good quality
Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework is chosen for developing the
Marathi TTS system. Since Festival does not provide complete language processing support specific to various
languages, it needs to be augmented to facilitate the development of TTS systems in certain new languages.
Because of this, a generic G2P converter has been developed. In the customized Marathi G2P converter, we
have handled schwa deletion and compound word extraction. In the experiments carried out to test the Marathi
G2P on a text segment of 2485 words, 91.47% word phonetisation accuracy is obtained. This Marathi G2P has
been used for phonetising large text corpora which in turn is used in designing an inventory of phonetically rich
sentences. The sentences ensured a good coverage of the phonetically valid di-phones using only 1.3% of the
complete text corpora.
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
Language is an effective medium of communication that conveys the ideas and expressions of the human
mind. There are more than 5000 languages in the world. Learning all of these languages is
not a practical solution to the language barrier in communication. In this multilingual world, with a
huge amount of information exchanged between regions and in different languages in digitized form,
it has become necessary to find an automated process for converting from one language to another. Natural
Language Processing (NLP) is an active area of research that explores how computers can be used to
understand and manipulate natural language text or speech. The proposed system takes a hybrid approach to
transliterating proper nouns from Punjabi to Hindi. The hybrid approach is a
combination of direct mapping, a rule-based approach, and a Statistical Machine Translation (SMT) approach.
The proposed system is tested on various proper nouns from different domains, and its accuracy
is very good.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With the one-pass recognition system, the spoken sounds are converted into
phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple
languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity
among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LM) are integrated into speech decoding to eliminate possible phone
mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic
information of mixed-language speech based on recognized phone sequences. As the back-end decision is
taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to
classify language identity. The speech corpus was tested on Sepedi and English languages that are often
mixed. Our system is evaluated by measuring both the ASR performance and the LID performance
separately. The systems have obtained a promising ASR accuracy with data-driven phone merging
approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual
speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy.
This document describes research on detecting and correcting word ordering errors in Chinese sentences written by non-native Chinese language learners. The researchers:
1) Created a dataset of Chinese sentences with annotated word ordering errors from a corpus of learner compositions.
2) Proposed models using conditional random fields and support vector machines to detect segments containing errors and rank candidate corrections.
3) Reported their best model achieved an accuracy of 0.834 for detecting error segments and on average ranked the correct ordering 4.8th among 184.48 candidates.
NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATIONijnlc
Transliteration may be defined as the process of mapping sounds in a text written in one language to
another language. This paper discusses transliteration and its use in Named Entity Recognition.
We have designed code that performs transliteration and assists in the process of Named Entity
Recognition. We present some of the results of Named Entity Recognition (NER) using
transliteration.
This document is a B Tech project report submitted by Abhishek Agarwal in April 2005 at the Dhirubhai Ambani Institute of Information & Communication Technology. The project aims to develop machine understanding of Indian spoken languages. It discusses work done in language identification based on phonetic characteristics. The report covers background on language identification systems, objectives of the project, a discussion on developing language models for the identification process, and proposes tools that could utilize language identification algorithms.
1) This document discusses stemming algorithms that have been used for the Odia language. Stemming is the process of reducing inflected words to their root or stem for purposes like information retrieval.
2) It reviews different stemming algorithms that have been applied to Odia text, including suffix stripping, affix removal, and stochastic algorithms. It also discusses common errors in stemming like over-stemming and under-stemming.
3) Applications of stemming discussed include information retrieval, text summarization, machine translation, indexing, and question answering systems. The document concludes by surveying prior work on stemming algorithms for Odia.
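To make suffix stripping and the two error types concrete, here is a minimal sketch. It uses a handful of English suffixes purely for illustration; the suffix list, the `min_stem` threshold, and the function name `strip_suffix` are assumptions, and a stemmer for Odia would need its own language-specific suffix inventory.

```python
# Hypothetical suffix list (English, for illustration only).
SUFFIXES = ["ing", "ed", "es", "s"]


def strip_suffix(word, min_stem=3):
    """Remove the longest matching suffix, provided the remaining stem is long enough."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["walking", "walked", "boxes", "news", "wing", "ran"]:
    print(w, "->", strip_suffix(w))
```

In this toy setup, "walking" and "walked" both reduce to "walk", which is the intended behavior. "news" is reduced to "new", an instance of over-stemming because two unrelated words are conflated. "ran" is left unchanged and never meets "run", an instance of under-stemming. The `min_stem` guard is what keeps "wing" from being cut down to "w".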
Transliteration by orthography or phonology for hindi and marathi to english ...ijnlc
e-Governance and web-based online commercial multilingual applications have given utmost importance to
the task of translation and transliteration. The Named Entities and Technical Terms that occur in the source
language of translation are called out-of-vocabulary words, as they are not available in the multilingual
corpus or dictionary used to support the translation process. These Named Entities and Technical Terms need
to be transliterated from the source language to the target language without losing their phonetic properties. The
fundamental problem in India is that there is no set of rules, grounded in linguistics, for spelling
Indian-language words in English. People write different spellings for the same name in
different places. This certainly affects the Top-1 accuracy of transliteration and, in turn, the
translation process. A major issue we noticed is the transliteration of named entities consisting of three
syllables or three phonetic units in Hindi and Marathi, where people use a mixed approach,
spelling either orthographically or phonologically. In this paper the authors
give their opinion, based on experimentation, on the appropriateness of each approach.
A Review on a web based Punjabi to English Machine Transliteration SystemEditor IJCATR
This document summarizes a research paper on developing a Punjabi to English machine transliteration system using statistical machine translation. It discusses how existing transliteration systems between other languages use rule-based or hybrid approaches and have accuracies ranging from 73% to 95%. The proposed system aims to increase accuracy by using statistical machine translation techniques to learn from existing transliterated data and select the most probable transliteration when multiple options exist. It will help translate documents in the Punjabi language, which is official in Punjab, into English for international understanding.
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...ijma
In this work a new Bangla speech corpus along with proper transcriptions has been developed; also
various acoustic feature extraction methods have been investigated using Long Short-Term Memory
(LSTM) neural network to find their effective integration into a state-of-the-art Bangla speech recognition
system. The acoustic features are usually a sequence of representative vectors that are extracted from
speech signals and the classes are either words or sub word units such as phonemes. The most commonly
used feature extraction method, known as linear predictive coding (LPC), has been used first in this work.
Then the other two popular methods, namely, the Mel frequency cepstral coefficients (MFCC) and
perceptual linear prediction (PLP) have also been applied. These methods are based on the models of the
human auditory system. A detailed review of the implementation of these methods has been described
first. Then the steps of the implementation have been elaborated for the development of an automatic
speech recognition system (ASR) for Bangla speech.
Named Entity Recognition using Hidden Markov Model (HMM)kevig
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP), which is a branch of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval, and question answering. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date, and time. In this paper we describe in detail a Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build the NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed: they are dynamic, and users can define them according to their interest. The corpus used by our NER system is also not domain specific
IRJET- Vernacular Language Spell Checker & AutocorrectionIRJET Journal
This document describes the development of a spell checker for the Hindi language. It discusses the importance of spell checkers for digitizing languages and some common techniques used in spell checking like n-gram analysis, edit distance algorithms, and probabilistic methods. The proposed system will use a corpus of Hindi text to build a language model and detect spelling errors. It will generate candidate corrections based on edit distance and rank them using n-gram frequency analysis. The goal is to develop a tool that can check for both non-word errors and real word errors in Hindi text.
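The pipeline outlined above (build a language model from a corpus, generate candidates by edit distance, rank them by n-gram frequency) might be sketched like this. The tiny English corpus and the function names `train`, `edits1`, and `correct` are assumptions for the example; it is not the system from the paper, it handles only non-word errors, and Hindi text would also need script-aware handling of characters that this sketch does not attempt.

```python
from collections import Counter


def train(tokens):
    """Count unigrams and bigrams from a tokenized corpus."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word, alphabet):
    """All strings one edit away: deletion, transposition, replacement, insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(prev_word, word, unigrams, bigrams):
    """Keep known words; otherwise rank candidates by bigram, then unigram, frequency."""
    if word in unigrams:
        return word
    alphabet = sorted({ch for w in unigrams for ch in w})
    candidates = [c for c in edits1(word, alphabet) if c in unigrams]
    if not candidates:
        return word  # nothing close enough in the vocabulary
    # The final tuple element makes ties deterministic.
    return max(candidates, key=lambda c: (bigrams[(prev_word, c)], unigrams[c], c))


corpus = "the cat sat on the mat the cat ate".split()  # hypothetical toy corpus
unigrams, bigrams = train(corpus)
print(correct("the", "xat", unigrams, bigrams))
print(correct("cat", "xat", unigrams, bigrams))
```

The same misspelling resolves differently depending on the preceding word, which is the point of ranking by n-gram context and not by word frequency alone. Detecting real-word errors, which the abstract also targets, would require scoring known words against their context as well, something this sketch skips.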
RBIPA: An Algorithm for Iterative Stemming of Tamil Language Textskevig
Cyberbullying is currently one of the most important research fields. Most researchers have worked on bullying-text identification in English texts or comments; because data are scarce, analyzing and stemming Tamil text is frequently a tedious job. Tamil is a morphologically diverse, agglutinative language, and creating a Tamil stemmer is not an easy undertaking. After examining the major difficulties encountered, we propose the rule-based iterative preprocessing algorithm (RBIPA). In this work, Tamil morphemes and lemmas were extracted using a suffix-stripping technique, with a supervised machine learning algorithm used to classify words as pronouns and proper nouns. The novelty of the proposed system is a preprocessing algorithm for iterative stemming and lemmatization that recovers exact words from Tamil-language comments. RBIPA achieves 84.96% accuracy on the given test dataset, which has a total of 13,000 words.
IRJET- Spelling and Grammar Checker and Template SuggestionIRJET Journal
This document summarizes a research paper that proposes a new technique for spelling and grammar checking using decision trees and Levenshtein distance analysis. The technique uses a dictionary lookup to check spelling by comparing words to entries in a dictionary using Levenshtein distance. Words with the smallest distance are considered correctly spelled replacements. It then applies parts-of-speech rules to suggest grammar corrections, such as correcting improper uses of "have". The researchers believe this technique can provide more accurate and fast spelling and grammar suggestions compared to other methods.
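The dictionary lookup by Levenshtein distance mentioned above might look like the following sketch. The word list and the `max_distance` threshold are assumptions made for illustration; the decision-tree and parts-of-speech stages of the paper are not covered.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def suggest(word, dictionary, max_distance=1):
    """Dictionary words within `max_distance`, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for distance, w in scored if distance <= max_distance]

# Hypothetical dictionary.
suggestions = suggest("speling", ["spelling", "spewing", "sailing"])
```

Two candidates tie at distance 1 in this example, which shows why a second signal (such as the parts-of-speech rules the abstract mentions) is useful for choosing among equally close words.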
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONkevig
Phonetic typing using the English alphabet has become widely popular nowadays for social media and chat services. As a result, a text containing various English and Bangla words and phrases has become increasingly common. Existing transliteration tools display poor performance for such texts. This paper proposes a robust Three-stage Hybrid Transliteration (THT) framework that can transliterate both English words and phonetic typed Bangla words satisfactorily. This is achieved by adopting a hybrid approach of dictionary-based and rule-based techniques. Experimental results confirm superiority of THT as it significantly outperforms the benchmark transliteration tool.
A Novel Approach for Rule Based Translation of English to Marathiaciijournal
This paper presents a design for a rule-based machine translation system for the English to Marathi language pair. The system takes an English sentence as input and parses it with the Stanford parser, which is used mainly for source-side processing. An English to Marathi bilingual dictionary is to be created. The system takes the parsed output, separates the source text word by word, and searches for the corresponding target words in the bilingual dictionary. Hand-coded rules are written for Marathi inflections, along with reordering rules. After the reordering rules are applied, the English sentence is syntactically reordered to suit Marathi language
HINDI AND MARATHI TO ENGLISH MACHINE TRANSLITERATION USING SVMijnlc
The document describes a machine transliteration system that transliterates Hindi and Marathi names and words to English using support vector machines (SVM). It segments source language names into phonetic units, and trains an SVM classifier using phonetic units and n-grams as features to label each unit with its English transliteration. The system achieves good accuracy for Hindi-English and Marathi-English transliteration.
Design of A Spell Corrector For Hausa LanguageWaqas Tariq
In this article, a spell corrector has been designed for the Hausa language, which is the second most spoken language in Africa and does not yet have processing tools. This study is a contribution to the automatic processing of the Hausa language. We used existing techniques for other languages and adapted them to the special case of the Hausa language. The corrector operates essentially on Mijinguini’s dictionary and the characteristics of the Hausa alphabet. After a brief review of spell checking and spell correcting techniques and the state of the art in Hausa language processing, we opted for the trie and hash table data structures to represent the dictionary. The edit distance and the specificities of the Hausa alphabet have been used to detect and correct spelling errors. The spell corrector has been implemented in a special editor developed for that purpose (LyTexEditor) and also as an extension (add-on) for OpenOffice.org. The performance of the two data structures was compared.
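The two dictionary representations named above can be contrasted with a short sketch. The `Trie` class and the handful of sample words are illustrative assumptions, not the paper's implementation; the hash table is represented here by Python's built-in `set`.

```python
class Trie:
    """Character trie supporting exact lookup and prefix queries."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _walk(self, text):
        """Follow `text` from the root; return the final node or None."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

# Hypothetical sample entries, including hooked letters of the Hausa alphabet.
words = ["gida", "gidan", "ruwa", "ɗaki", "ƙasa"]
trie = Trie()
for w in words:
    trie.insert(w)
hash_table = set(words)  # hash-based alternative: exact lookup only
```

Both structures answer exact membership queries, but only the trie can answer prefix queries such as `trie.starts_with("gid")` without scanning every entry. That trade-off is one plausible reason to compare the two.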
Comparison Analysis of Post- Processing Method for Punjabi FontIRJET Journal
The document describes a study that compares different post-processing methods for recognizing handwritten Punjabi (Gurmukhi) text from optical character recognition (OCR) systems. The study proposes using four phases: 1) an AVL tree implementation method to find suggestions, 2) a symmetric delete spelling correction algorithm, 3) taking the union of the results from the first two methods, and 4) ranking the suggestions based on a soundex approach. The goal is to correct errors and improve recognition rates for OCR of handwritten Gurmukhi text by using these post-processing techniques. The methods are tested on a database of over 115,000 Gurmukhi words and the results aim to reduce recognition errors and
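The symmetric delete idea from the second phase can be sketched as follows. This is a simplified illustration with an English toy vocabulary and assumed function names; it omits the AVL tree and soundex ranking phases and is not the study's code.

```python
from collections import defaultdict

def deletes(word, max_deletes=1):
    """The word itself plus every string made by removing up to `max_deletes` characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_deletes):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results

def build_index(dictionary, max_deletes=1):
    """Map each delete-variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_deletes):
            index[variant].add(word)
    return index

def lookup(term, index, max_deletes=1):
    """Candidate corrections: words sharing a delete-variant with `term`."""
    found = set()
    for variant in deletes(term, max_deletes):
        found |= index.get(variant, set())
    return found

# Hypothetical vocabulary.
index = build_index(["house", "horse", "mouse"])
candidates = lookup("hose", index)
```

Only deletions are generated, on both the dictionary side and the query side, so no alphabet is needed. This is convenient for scripts with large character sets. Matches can lie slightly beyond the nominal edit distance, so candidates are usually verified with a true edit-distance check before ranking.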
IRJET- Querying Database using Natural Language InterfaceIRJET Journal
This document presents a proposed natural language interface system to allow users to query a database using English queries instead of SQL. The system aims to make database access easier for non-technical users. It discusses the architecture of the system, which includes modules for natural language processing, query translation to SQL, and speech conversion. It also reviews related work and discusses advantages and disadvantages of natural language interfaces for databases. The proposed system uses techniques like tokenization, parsing, and semantic analysis to understand queries and map them to equivalent SQL queries to retrieve results from the database.
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
We describe in detail a Grapheme-to-Phoneme (G2P) converter required for the development of a good quality
Marathi Text-to-Speech (TTS) system. The Festival and Festvox framework is chosen for developing the
Marathi TTS system. Since Festival does not provide complete language processing support specific to various
languages, it needs to be augmented to facilitate the development of TTS systems in certain new languages.
Because of this, a generic G2P converter has been developed. In the customized Marathi G2P converter, we
have handled schwa deletion and compound word extraction. In the experiments carried out to test the Marathi
G2P on a text segment of 2485 words, 91.47% word phonetisation accuracy is obtained. This Marathi G2P has
been used for phonetising large text corpora which in turn is used in designing an inventory of phonetically rich
sentences. The sentences ensured a good coverage of the phonetically valid di-phones using only 1.3% of the
complete text corpora.
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
Language is an effective medium of communication that conveys the ideas and expressions of the human
mind. There are more than 5000 languages in the world. Knowing all of these languages is
not a practical solution to the language barrier in communication. In this multilingual world, with a
huge amount of information exchanged between regions and in different languages in digitized format,
it has become necessary to find an automated process to convert from one language to another. Natural
Language Processing (NLP) is an active area of research that explores how computers can be used to
understand and manipulate natural language text or speech. In the proposed system, a hybrid approach to
transliterating proper nouns from Punjabi to Hindi is developed. The hybrid approach is a
combination of direct mapping, a rule-based approach, and a Statistical Machine Translation (SMT) approach.
The proposed system is tested on various proper nouns from different domains, and the authors report that
its accuracy is very good.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...kevig
In this paper, phoneme sequences are used as language information to perform code-switched language
identification (LID). With the one-pass recognition system, the spoken sounds are converted into
phonetically arranged sequences of sounds. The acoustic models are robust enough to handle multiple
languages when emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity
among our target languages, we reported two methods of phoneme mapping. Statistical phoneme-based
bigram language models (LM) are integrated into speech decoding to eliminate possible phone
mismatches. The supervised support vector machine (SVM) is used to learn to recognize the phonetic
information of mixed-language speech based on recognized phone sequences. As the back-end decision is
taken by an SVM, the likelihood scores of segments with monolingual phone occurrence are used to
classify language identity. The speech corpus was tested on Sepedi and English languages that are often
mixed. Our system is evaluated by measuring both the ASR performance and the LID performance
separately. The systems have obtained a promising ASR accuracy with data-driven phone merging
approach modelled using 16 Gaussian mixtures per state. In code-switched speech and monolingual
speech segments respectively, the proposed systems achieved an acceptable ASR and LID accuracy.
This document describes research on detecting and correcting word ordering errors in Chinese sentences written by non-native Chinese language learners. The researchers:
1) Created a dataset of Chinese sentences with annotated word ordering errors from a corpus of learner compositions.
2) Proposed models using conditional random fields and support vector machines to detect segments containing errors and rank candidate corrections.
3) Reported their best model achieved an accuracy of 0.834 for detecting error segments and on average ranked the correct ordering 4.8th among 184.48 candidates.
NAMED ENTITY RECOGNITION IN NATURAL LANGUAGES USING TRANSLITERATIONijnlc
Transliteration may be defined as the process of mapping sounds in a text written in one language to
another language. This paper discusses transliteration and its use in Named Entity Recognition.
We have designed code that performs transliteration and assists in the process of Named Entity
Recognition. We have presented some of the results of Named Entity Recognition (NER) using
Transliteration
This document is a B Tech project report submitted by Abhishek Agarwal in April 2005 at the Dhirubhai Ambani Institute of Information & Communication Technology. The project aims to develop machine understanding of Indian spoken languages. It discusses work done in language identification based on phonetic characteristics. The report covers background on language identification systems, objectives of the project, a discussion on developing language models for the identification process, and proposes tools that could utilize language identification algorithms.
1) This document discusses stemming algorithms that have been used for the Odia language. Stemming is the process of reducing inflected words to their root or stem for purposes like information retrieval.
2) It reviews different stemming algorithms that have been applied to Odia text, including suffix stripping, affix removal, and stochastic algorithms. It also discusses common errors in stemming like over-stemming and under-stemming.
3) Applications of stemming discussed include information retrieval, text summarization, machine translation, indexing, and question answering systems. The document concludes by surveying prior work on stemming algorithms for Odia.
Transliteration by orthography or phonology for hindi and marathi to english ...ijnlc
e-Governance and web-based online commercial multilingual applications have given utmost importance to
the task of translation and transliteration. The Named Entities and Technical Terms that occur in the source
language of translation are called out-of-vocabulary words, as they are not available in the multilingual
corpus or dictionary used to support the translation process. These Named Entities and Technical Terms need
to be transliterated from the source language to the target language without losing their phonetic properties. The
fundamental problem in India is that there is no set of rules for writing the English spellings of
Indian-language words according to linguistic principles. People write different spellings for the same name in
different places. This certainly affects the Top-1 accuracy of transliteration and, in turn, the
translation process. A major issue noticed by the authors is the transliteration of named entities consisting of three
syllables or three phonetic units in Hindi and Marathi, where people use a mixed approach,
writing the spelling by either the orthographical or the phonological approach. In this paper the authors
provide their opinion, based on experimentation, on the appropriateness of each approach.
A Review on a web based Punjabi to English Machine Transliteration SystemEditor IJCATR
This document summarizes a research paper on developing a Punjabi to English machine transliteration system using statistical machine translation. It discusses how existing transliteration systems between other languages use rule-based or hybrid approaches and have accuracies ranging from 73% to 95%. The proposed system aims to increase accuracy by using statistical machine translation techniques to learn from existing transliterated data and select the most probable transliteration when multiple options exist. It will help translate documents in the Punjabi language, which is official in Punjab, into English for international understanding.
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...ijma
In this work a new Bangla speech corpus along with proper transcriptions has been developed; also
various acoustic feature extraction methods have been investigated using Long Short-Term Memory
(LSTM) neural network to find their effective integration into a state-of-the-art Bangla speech recognition
system. The acoustic features are usually a sequence of representative vectors that are extracted from
speech signals and the classes are either words or sub word units such as phonemes. The most commonly
used feature extraction method, known as linear predictive coding (LPC), has been used first in this work.
Then the other two popular methods, namely, the Mel frequency cepstral coefficients (MFCC) and
perceptual linear prediction (PLP) have also been applied. These methods are based on the models of the
human auditory system. A detailed review of the implementation of these methods has been described
first. Then the steps of the implementation have been elaborated for the development of an automatic
speech recognition system (ASR) for Bangla speech.
Allin Qillqay A Free On-Line Web Spell Checking Service For QuechuaAndrea Porter
This document describes a web-based spell checking service called "Allin Qillqay!" for the Quechua language. It utilizes existing open-source spell checking technologies for Quechua and integrates them into a web application using the CKEditor HTML editor and its Spell Check As You Type (SCAYT) plugin. The service allows for spell checking of Quechua texts within a web browser. It connects the CKEditor client-side interface with server-side spell checking programs for different Quechua varieties through an application programming interface. This provides an online spell checking resource for the Quechua language that improves through community use.
IRJET - Text Optimization/Summarizer using Natural Language Processing IRJET Journal
1. The document discusses the development of an intelligent system to optimize the English language using natural language processing techniques. The system will perform functions like summarization, spell check, grammar check, and sentence auto-completion.
2. It describes the various algorithms used for each function, including extracting important sentences for summarization, comparing words to dictionaries for spell check, analyzing syntax for grammar check, and completing sentences based on previous user data for auto-completion.
3. The system aims to build a smart tool that can correct errors and summarize text in English to improve communication through optimized language.
Designing A Rule Based Stemming Algorithm for Kambaata Language TextCSCJournals
Stemming is the process of reducing inflectional and derivational variants of a word to its stem. It has substantial importance in several natural language processing applications. In this research, a rule based stemming algorithm that conflates Kambaata word variants has been designed for the first time. The algorithm is a single pass, context-sensitive, and longest-matching designed by adapting rule-based stemming approach. Several studies agree that Kambaata is a strictly suffixing language with a rich morphology and word formations mostly relying on suffixation; even though its word formation involves infixation, compounding and reduplication as well.
The output of this study is a context-sensitive, longest-match stemming algorithm for Kambaata words. To evaluate the stemmer's effectiveness, error counting method was applied. A test set of 2425 distinct words was used to evaluate the stemmer. The output from the stemmer indicates that out of 2425 words, 2349 words (96.87%) were stemmed correctly, 63 words (2.60%) were over stemmed and 13 words (0.54%) were under stemmed. What is more, a dictionary reduction of 65.86% has also been achieved during evaluation.
The main factor for errors in stemming Kambaata words is the language's rich and complex morphology. Hence a number of errors can be corrected by exploring more rules. However, it is difficult to avoid the errors completely due to complex morphology that makes use of concatenated suffixes, irregularities through infixation, compounding, blending, and reduplication of affixes.
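The percentages reported in the evaluation above follow directly from the error counts. The short check below recomputes them from the figures given in the abstract; only the variable names are assumptions.

```python
# Counts reported in the abstract for the 2425-word test set.
total = 2425
counts = {"correct": 2349, "over-stemmed": 63, "under-stemmed": 13}

# The three categories partition the test set.
accounted_for = sum(counts.values())

# Percentage of the test set in each category, rounded to two decimals.
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}
```

The rounded values are 96.87, 2.6, and 0.54, matching the reported figures. They sum to 100.01 rather than 100 only because each value is rounded separately.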
Designing A Rule Based Stemming Algorithm for Kambaata Language TextCSCJournals
This document describes the design and evaluation of a rule-based stemming algorithm for the Kambaata language. The algorithm uses a single-pass, longest-matching approach and is context-sensitive. It was evaluated on a test set of 2,425 words, correctly stemming 2,349 words (96.87%), overstemming 63 words (2.60%), and understemming 13 words (0.54%). The stemmer achieved a 65.86% reduction in dictionary size. Errors are due to the language's complex morphology involving suffixation, infixation, compounding, blending and reduplication. This represents the first stemming algorithm developed for Kambaata words.
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES ijnlc
Stemming is the process of term conflation. It conflates all the word variants to a common form called as stem. It plays significant role in numerous Natural Language Processing (NLP) applications like morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question-answering system, machine translation, word sense disambiguation, information retrieval (IR), etc. Each of these tasks requires some pre-processing to be done. Stemming is one of the important building blocks for all these applications. This paper, presents an overview of various stemming techniques, evaluation criteria for stemmers and various existing stemmers for Indic languages.
This document describes a rule-based machine translation system for translating English text to Telugu. It discusses the challenges of developing such a system, including differences in grammar between the two languages. An algorithm is proposed that uses rules, probabilities, and rough sets to classify sentences and select the best word translations. The system works by tokenizing English sentences, tagging the words with parts of speech, looking up word translations in a bilingual dictionary, and concatenating the Telugu words to form the output sentence.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Bantu Spell Checker and Corrector using Modified Edit Distance Algorithm (MEDA)jennifer steffan
Automatic spelling correction for a language is critical since the current world is almost entirely dependent on digital devices that employ electronic keyboards. Correct spelling adds to the accessibility and readability of textual documents. Many NLP applications, such as web search engines, text summarization, and sentiment analysis, rely on automatic spelling correction. A few efforts on automatic spelling correction in Bantu languages have been completed, but they remain insufficient. We proposed a spell checker for typed words based on the Modified minimum Edit Distance Algorithm (MEDA) and the Syllable Error Detection Algorithm (SEDA). In this study, we adjusted the minimum edit distance algorithm by including a frequency score for letters and ordered operations. The SEDA identifies the component of the word and the position of the letter that has an error. For this research, the Setswana language was used for testing, and other languages related to Setswana will be able to use this spell checker. Setswana is a Bantu language spoken mostly in Botswana, South Africa, and Namibia, and its automatic spelling correction is still in its early stages. Setswana is Botswana’s national language and is mostly used in schools and government offices. Accuracy was measured on 2500 Setswana words. The SEDA detected incorrect Setswana words with 99% accuracy. When evaluating MEDA, the edit distance algorithm was used as the baseline, and it produced an accuracy of 52%. In comparison, the edit distance algorithm with ordered operations provided 64% accuracy, and MEDA produced 92% accuracy. The model failed on closely related terms.
This paper presents DrawPlus, an automated system based on natural language processing that generates UML diagrams, user scenarios, and test cases by analyzing a business requirement specification written in natural language. DrawPlus analyzes the natural language and extracts the relevant and required information from the business requirement specification supplied by the user. The user writes the requirement specification in simple English, and the system analyzes it using core natural language processing techniques together with the authors' own algorithms. After compound analysis and extraction of the associated information, DrawPlus draws the use case diagram, user scenarios, and system-level, high-level test case descriptions. DrawPlus provides a more convenient and reliable way of generating use cases, user scenarios, and test cases, reducing the time and cost of the software development process while accelerating the 70 of works in Software design and Testing phase. Janani Tharmaseelan "Cohesive Software Design" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22900.pdf
Paper URL: https://www.ijtsrd.com/computer-science/other/22900/cohesive-software-design/janani-tharmaseelan
Adhyann – a hybrid part of-speech taggerijitjournal
Part of Speech Tagging automatically tags the words of a text with labels that can be used to determine the
structure of a sentence. In this paper we propose an approach to the problem that is inspired by human
behavior. We used a combination of rule-based and dictionary-based approaches to tackle this problem. Our
goal in this paper is to design a simple yet effective system for POS tagging that also helps us gain a more
effective understanding of human behavior.
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
This paper describes a transfer-based scheme for translating Malayalam, a Dravidian language, to English. The system takes Malayalam sentences as input and outputs equivalent English sentences. It comprises a preprocessor for splitting compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module, and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size. The system does not rely on a stochastic approach; it is based on a rule-based architecture along with various linguistic knowledge components of both Malayalam and English. The system uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English. The system is designed using artificial intelligence techniques.
Improving a Lightweight Stemmer for Gujarati Languageijistjournal
Stemming is a foundational step in text mining. It is used in several types of applications, such as Natural Language Processing (NLP), Information Retrieval (IR), and Text Mining (TM), including Text Categorization (TC) and Text Summarization (TS). Building an effective stemmer for Gujarati has long been an active research area, because the language's rich morphology gives it a structure that is very different from, and more difficult than, that of other languages.
The MSR-NLP Chinese word segmentation system is part of a full sentence analyzer. It uses a dictionary and rules for basic segmentation, morphology, and named entity recognition to build a word lattice. The system proposes new words, prunes the lattice, and uses a parser to produce the final segmentation. It participated in four segmentation bakeoff tracks, ranking highly in each. An analysis found that parameter tuning, morphology/NER, and lattice pruning contributed most to performance, while the parser helped less. Problems included inconsistent annotations and differences in defining new words.
An efficient approach to query reformulation in web searcheSAT Journals
Abstract A wide range of problems in natural language processing, data mining, bioinformatics, and information retrieval can be cast as string transformation: given an input string, the system generates the top k output strings most likely to correspond to it. This paper proposes a novel probabilistic method for string transformation that is both accurate and efficient. The approach consists of a log-linear model, a method for training the model, and an algorithm that generates the top k outputs. The log-linear model is defined as a conditional probability distribution of an output string and a set of transformation rules, conditioned on the input string. The string generation algorithm, which is based on pruning, is guaranteed to produce the top k list. The proposed technique is applied to spelling error correction in queries and to query reformulation in web search. Previous work did not consider spelling error correction and query reformulation for related queries, and earlier methods neither treated efficiency as an important issue nor focused on improving both accuracy and efficiency in string transformation. Experimental results on large-scale data show that the proposed method is highly accurate and efficient. Keywords: Log-linear method, Query reformulation, Spelling error correction.
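As a rough illustration of the log-linear scoring and top k generation mentioned in this abstract, here is a minimal Python sketch. It is not the paper's implementation: the rule set, the weights, and the names `RULES`, `expand`, and `top_k` are hypothetical, and the simple beam pruning used here does not carry the paper's guarantee of finding the exact top k list. Each candidate is scored by the sum of the weights of the rules applied to reach it, and probabilities are obtained by exponentiating and normalizing over the returned candidates only.

```python
import heapq
import math

# Hypothetical rewrite rules: (source substring, target substring, weight).
RULES = [
    ("ie", "ei", 1.2),
    ("nn", "n", 0.4),
    ("a", "e", -0.5),
]


def expand(s, score):
    """Yield every string reachable from s by applying one rule at one position."""
    for src, dst, weight in RULES:
        start = s.find(src)
        while start != -1:
            yield s[:start] + dst + s[start + len(src):], score + weight
            start = s.find(src, start + 1)


def top_k(inp, k=3, max_steps=2):
    """Return up to k (string, probability) pairs, best first.

    Only the k highest-scoring candidates survive each step (beam pruning).
    Probabilities are normalized over the returned candidates, which is an
    approximation of the full log-linear distribution.
    """
    best = {inp: 0.0}          # the unchanged input has score 0
    beam = [(inp, 0.0)]
    for _ in range(max_steps):
        candidates = {}
        for s, score in beam:
            for t, new_score in expand(s, score):
                if new_score > candidates.get(t, -math.inf):
                    candidates[t] = new_score
        beam = heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
        for t, new_score in beam:
            if new_score > best.get(t, -math.inf):
                best[t] = new_score
    top = heapq.nlargest(k, best.items(), key=lambda kv: kv[1])
    z = sum(math.exp(score) for _, score in top)
    return [(t, math.exp(score) / z) for t, score in top]
```

Calling `top_k("recieve", k=2)` with these illustrative rules ranks the rewritten string "receive" ahead of the unchanged input, since the only applicable rule has a positive weight.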
IRJET- Kinyarwanda Speech Recognition in an Automatic Dictation System for Tr...IRJET Journal
This paper describes a speech recognition system called TKTALK that allows translators to dictate translations between English and Kinyarwanda. The system uses statistical translation models to help guide speech recognition by using information from the source text. It discusses challenges in building speech recognition for Kinyarwanda, including dictionary explosion from elision and homophones. The paper explores how to integrate translation models during acoustic testing to improve speech recognition performance when homophones are present by utilizing context from the source language.
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
1) The document discusses the Sungal Tunnel project in Jammu and Kashmir, India, which is being constructed using the New Austrian Tunneling Method (NATM).
2) NATM involves continuous monitoring during construction to adapt to changing ground conditions, and makes extensive use of shotcrete for temporary tunnel support.
3) The methodology section outlines the systematic geotechnical design process for tunnels according to Austrian guidelines, and describes the various steps of NATM tunnel construction including initial and secondary tunnel support.
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
This study examines the effect of response reduction factors (R factors) on reinforced concrete (RC) framed structures through nonlinear dynamic analysis. Three RC frame models with varying heights (4, 8, and 12 stories) were analyzed in ETABS software under different R factors ranging from 1 to 5. The results showed that displacement increased as the R factor decreased, indicating less linear behavior for lower R factors. Drift also decreased proportionally with increasing R factors from 1 to 5. Shear forces in the frames decreased with higher R factors. In general, R factors of 3 to 5 produced more satisfactory performance with less displacement and drift. The displacement variations between different building heights were consistent at different R factors. This study evaluated how R factors influence
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
This study compares the use of Stark Steel and TMT Steel as reinforcement materials in a two-way reinforced concrete slab. Mechanical testing is conducted to determine the tensile strength, yield strength, and other properties of each material. A two-way slab design adhering to codes and standards is executed with both materials. The performance is analyzed in terms of deflection, stability under loads, and displacement. Cost analyses accounting for material, durability, maintenance, and life cycle costs are also conducted. The findings provide insights into the economic and structural implications of each material for reinforcement selection and recommendations on the most suitable material based on the analysis.
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
This document discusses a study analyzing the effect of camber, position of camber, and angle of attack on the aerodynamic characteristics of airfoils. Sixteen modified asymmetric NACA airfoils were analyzed using computational fluid dynamics (CFD) by varying the camber, camber position, and angle of attack. The results showed the relationship between these parameters and the lift coefficient, drag coefficient, and lift to drag ratio. This provides insight into how changes in airfoil geometry impact aerodynamic performance.
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
This document reviews the progress and challenges of aluminum-based metal matrix composites (MMCs), focusing on their fabrication processes and applications. It discusses how various aluminum MMCs have been developed using reinforcements like borides, carbides, oxides, and nitrides to improve mechanical and wear properties. These composites have gained prominence for their lightweight, high-strength and corrosion resistance properties. The document also examines recent advancements in fabrication techniques for aluminum MMCs and their growing applications in industries such as aerospace and automotive. However, it notes that challenges remain around issues like improper mixing of reinforcements and reducing reinforcement agglomeration.
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
This document discusses research on using graph neural networks (GNNs) for dynamic optimization of public transportation networks in real-time. GNNs represent transit networks as graphs with nodes as stops and edges as connections. The GNN model aims to optimize networks using real-time data on vehicle locations, arrival times, and passenger loads. This helps increase mobility, decrease traffic, and improve efficiency. The system continuously trains and infers to adapt to changing transit conditions, providing decision support tools. While research has focused on performance, more work is needed on security, socio-economic impacts, contextual generalization of models, continuous learning approaches, and effective real-time visualization.
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
This document summarizes a research project that aims to compare the structural performance of conventional slab and grid slab systems in multi-story buildings using ETABS software. The study will analyze both symmetric and asymmetric building models under various loading conditions. Parameters like deflections, moments, shears, and stresses will be examined to evaluate the structural effectiveness of each slab type. The results will provide insights into the comparative behavior of conventional and grid slabs to help engineers and architects select appropriate slab systems based on building layouts and design requirements.
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
This document summarizes and reviews a research paper on the seismic response of reinforced concrete (RC) structures with plan and vertical irregularities, with and without infill walls. It discusses how infill walls can improve or reduce the seismic performance of RC buildings, depending on factors like wall layout, height distribution, connection to the frame, and relative stiffness of walls and frames. The reviewed research paper analyzes the behavior of infill walls, effects of vertical irregularities, and seismic performance of high-rise structures under linear static and dynamic analysis. It studies response characteristics like story drift, deflection and shear. The document also provides literature on similar research investigating the effects of infill walls, soft stories, plan irregularities, and different
This document provides a review of machine learning techniques used in Advanced Driver Assistance Systems (ADAS). It begins with an abstract that summarizes key applications of machine learning in ADAS, including object detection, recognition, and decision-making. The introduction discusses the integration of machine learning in ADAS and how it is transforming vehicle safety. The literature review then examines several research papers on topics like lightweight deep learning models for object detection and lane detection models using image processing. It concludes by discussing challenges and opportunities in the field, such as improving algorithm robustness and adaptability.
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
The document analyzes temperature and precipitation trends in Asosa District, Benishangul Gumuz Region, Ethiopia from 1993 to 2022 based on data from the local meteorological station. The results show:
1) The average maximum and minimum annual temperatures have generally decreased over time, with maximum temperatures decreasing by a factor of -0.0341 and minimum by -0.0152.
2) Mann-Kendall tests found the decreasing temperature trends to be statistically significant for annual maximum temperatures but not for annual minimum temperatures.
3) Annual precipitation in Asosa District showed a statistically significant increasing trend.
The conclusions recommend development planners account for rising summer precipitation and declining temperatures in
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
This document discusses the design and analysis of pre-engineered building (PEB) framed structures using STAAD Pro software. It provides an overview of PEBs, including that they are designed off-site with building trusses and beams produced in a factory. STAAD Pro is identified as a key tool for modeling, analyzing, and designing PEBs to ensure their performance and safety under various load scenarios. The document outlines modeling structural parts in STAAD Pro, evaluating structural reactions, assigning loads, and following international design codes and standards. In summary, STAAD Pro is used to design and analyze PEB framed structures to ensure safety and code compliance.
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
This document provides a review of research on innovative fiber integration methods for reinforcing concrete structures. It discusses studies that have explored using carbon fiber reinforced polymer (CFRP) composites with recycled plastic aggregates to develop more sustainable strengthening techniques. It also examines using ultra-high performance fiber reinforced concrete to improve shear strength in beams. Additional topics covered include the dynamic responses of FRP-strengthened beams under static and impact loads, and the performance of preloaded CFRP-strengthened fiber reinforced concrete beams. The review highlights the potential of fiber composites to enable more sustainable and resilient construction practices.
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
This document summarizes a survey on securing patient healthcare data in cloud-based systems. It discusses using technologies like facial recognition, smart cards, and cloud computing combined with strong encryption to securely store patient data. The survey found that healthcare professionals believe digitizing patient records and storing them in a centralized cloud system would improve access during emergencies and enable more efficient care compared to paper-based systems. However, ensuring privacy and security of patient data is paramount as healthcare incorporates these digital technologies.
Review on studies and research on widening of existing concrete bridgesIRJET Journal
This document summarizes several studies that have been conducted on widening existing concrete bridges. It describes a study from China that examined load distribution factors for a bridge widened with composite steel-concrete girders. It also outlines challenges and solutions for widening a bridge in the UAE, including replacing bearings and stitching the new and existing structures. Additionally, it discusses two bridge widening projects in New Zealand that involved adding precast beams and stitching to connect structures. Finally, safety measures and challenges for strengthening a historic bridge in Switzerland under live traffic are presented.
React based fullstack edtech web applicationIRJET Journal
The document describes the architecture of an educational technology web application built using the MERN stack. It discusses the frontend developed with ReactJS, backend with NodeJS and ExpressJS, and MongoDB database. The frontend provides dynamic user interfaces, while the backend offers APIs for authentication, course management, and other functions. MongoDB enables flexible data storage. The architecture aims to provide a scalable, responsive platform for online learning.
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
This paper proposes integrating Internet of Things (IoT) and blockchain technologies to help implement objectives of India's National Education Policy (NEP) in the education sector. The paper discusses how blockchain could be used for secure student data management, credential verification, and decentralized learning platforms. IoT devices could create smart classrooms, automate attendance tracking, and enable real-time monitoring. Blockchain would ensure integrity of exam processes and resource allocation, while smart contracts automate agreements. The paper argues this integration has potential to revolutionize education by making it more secure, transparent and efficient, in alignment with NEP goals. However, challenges like infrastructure needs, data privacy, and collaborative efforts are also discussed.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
This document provides a review of research on the performance of coconut fibre reinforced concrete. It summarizes several studies that tested different volume fractions and lengths of coconut fibres in concrete mixtures with varying compressive strengths. The studies found that coconut fibre improved properties like tensile strength, toughness, crack resistance, and spalling resistance compared to plain concrete. Volume fractions of 2-5% and fibre lengths of 20-50mm produced the best results. The document concludes that using a 4-5% volume fraction of coconut fibres 30-40mm in length with M30-M60 grade concrete would provide benefits based on previous research.
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
The document discusses optimizing business management processes through automation using Microsoft Power Automate and artificial intelligence. It provides an overview of Power Automate's key components and features for automating workflows across various apps and services. The document then presents several scenarios applying automation solutions to common business processes like data entry, monitoring, HR, finance, customer support, and more. It estimates the potential time and cost savings from implementing automation for each scenario. Finally, the conclusion emphasizes the transformative impact of AI and automation tools on business processes and the need for ongoing optimization.
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
The document describes the seismic design of a G+5 steel building frame located in Roorkee, India, according to Indian codes IS 1893-2002 and IS 800. The frame was analyzed using the equivalent static load method and the response spectrum method, and the resulting displacements and shear forces were compared. Based on the analysis, the frame was designed as a seismic-resistant steel structure according to IS 800:2007. The software STAAD Pro was used for the analysis and design.
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
This research paper explores using plastic waste as a sustainable and cost-effective construction material. The study focuses on manufacturing pavers and bricks using recycled plastic and partially replacing concrete with plastic alternatives. Initial results found that pavers and bricks made from recycled plastic demonstrate comparable strength and durability to traditional materials while providing environmental and cost benefits. Additionally, preliminary research indicates incorporating plastic waste as a partial concrete replacement significantly reduces construction costs without compromising structural integrity. The outcomes suggest adopting plastic waste in construction can address plastic pollution while optimizing costs, promoting more sustainable building practices.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman's Rimland, and Hegemonic Stability theories, it examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care to maintain
objectivity. It critically analyzes primary and secondary research documents to elaborate the role of
China's geo-economic outreach in Central Asian countries and its future prospects. According to this study,
China is seeing significant success in trade, pipeline politics, and gaining influence over other
governments. This success may be attributed to the effective use of key instruments such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energy and electric transportation. Deployments of photovoltaic (PV) and electric vehicle (EV) systems have gained momentum because of their numerous advantages over fossil-fuel alternatives. The advantages go beyond sustainability to include financial support and stability. This paper introduces a hybrid PV and EV system to support industrial and commercial plants. It covers the theoretical framework of the proposed hybrid system, including the equations required to complete the cost analysis when PV and EV are present. In addition, it presents the proposed design diagram, which sets the priorities and requirements of the system. The proposed approach allows such setups to improve their power stability, especially during power outages. The information presented helps researchers and plant owners complete the necessary analysis while promoting the deployment of clean energy. The results of a case study of a dairy farmer support the theoretical work and highlight the benefits to existing plants. The short return on investment supports the novelty of the proposed approach for a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network, based on the doubly fed induction generator (DFIG) used in wind power conversion systems. First, a DFIG model was constructed. A control law is then formulated to govern the flow of energy between the stator of the DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC), and second order sliding mode controller (SOSMC). Their results are compared in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations. The simulations were conducted in MATLAB/Simulink. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
Batteries: Introduction – types of batteries – discharging and charging of battery – characteristics of battery – battery rating – various tests on battery – primary battery: silver button cell – secondary battery: Ni-Cd battery – modern battery: lithium ion battery – maintenance of batteries – choices of batteries for electric vehicle applications.
Fuel Cells: Introduction – importance and classification of fuel cells – description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Introduction – e-waste – definition – sources of e-waste – hazardous substances in e-waste – effects of e-waste on environment and human health – need for e-waste management – e-waste handling rules – waste minimization techniques for managing e-waste – recycling of e-waste – disposal treatment methods of e-waste – mechanism of extraction of precious metal from leaching solution – global scenario of e-waste – e-waste in India – case studies.
The CBC machine is a common diagnostic tool used by doctors to measure a patient's red blood cell count, white blood cell count and platelet count. The machine uses a small sample of the patient's blood, which is then placed into special tubes and analyzed. The results of the analysis are then displayed on a screen for the doctor to review. The CBC machine is an important tool for diagnosing various conditions, such as anemia, infection and leukemia. It can also help to monitor a patient's response to treatment.