The document summarizes research on part-of-speech (POS) tagging for several Indian languages. It discusses rule-based, stochastic, and neural network approaches to POS tagging and surveys work on Hindi, Bengali, Tamil, Telugu, Gujarati, Malayalam, Manipuri, and Assamese. The accuracy of POS taggers for Indian languages ranges from 71% to 94% depending on the language and approach, compared with 93-98% for Western languages; the gap is attributed to limited annotated data and the morphological complexity of Indian languages.
IRJET - Survey on Named Entity Recognition using Syntactic Parsing for Hindi L... - IRJET Journal
This document summarizes research on named entity recognition for the Hindi language. It discusses various techniques that have been used for NER in Hindi, including rule-based approaches, machine learning approaches like hidden Markov models and conditional random fields, and hybrid approaches. The document also reviews several papers on Hindi NER systems that have used techniques like rule-based methods, list lookup, hidden Markov models, joint parsing with POS tagging, and distant supervision. Most studies found that machine learning approaches like hidden Markov models and conditional random fields produced relatively high accuracy, though performance could likely be improved further with larger annotated corpora.
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor - Waqas Tariq
In recent decades speech-interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language takes a top-down approach, which requires a large speech corpus with sentence- or phoneme-level transcription of the utterances. The transcriptions must also cover varied speech orderings so that the recognizer can build models for all the sounds present. For Telugu, however, the language's complex nature makes a very large, well-annotated speech database difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands, even millions, of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu: phrases consisting of several words (tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature as well as morphologically rich, which is why speech technology developed for English cannot be applied directly to it. This paper highlights work carried out to build a voice-enabled text editor with automatic term suggestion. The paper's main contribution is a recognition-enhancement process designed for highly inflecting, morphologically rich languages. The method increases speech recognition accuracy while greatly reducing corpus size, and it adds Telugu words to the database dynamically, so the corpus grows over time.
PART OF SPEECH TAGGING OF MARATHI TEXT USING TRIGRAM METHOD - ijait
This document describes the development of a part-of-speech tagger for Marathi text using a trigram statistical approach. The trigram method assigns POS tags to words based on the probabilities of tag transitions given the previous two tags. The tagger was evaluated on a test corpus of 2000 sentences and achieved an accuracy of 91.63%. Future work will aim to improve accuracy by expanding the training corpus with more tagged sentences. The document also provides background on previous work developing POS taggers for other Indian languages and challenges in tagging morphologically rich languages like Marathi.
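As a rough illustration of the trigram idea (not the paper's actual implementation), tag-transition probabilities P(t3 | t1, t2) can be estimated from a tagged corpus by counting:

```python
from collections import defaultdict

def trigram_tag_probs(tagged_sents):
    """Estimate P(t3 | t1, t2) from a POS-tagged corpus.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    Sentence starts are padded with the pseudo-tag '<S>'.
    """
    counts = defaultdict(int)    # count of (t1, t2, t3)
    context = defaultdict(int)   # count of (t1, t2)
    for sent in tagged_sents:
        tags = ['<S>', '<S>'] + [t for _, t in sent]
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            counts[(t1, t2, t3)] += 1
            context[(t1, t2)] += 1
    return {k: v / context[k[:2]] for k, v in counts.items()}

# tiny made-up corpus for illustration
corpus = [[('dogs', 'NN'), ('bark', 'VB')],
          [('cats', 'NN'), ('sleep', 'VB')]]
probs = trigram_tag_probs(corpus)
print(probs[('<S>', '<S>', 'NN')])  # 1.0: both sentences open with NN
```

A real tagger would additionally smooth these estimates, since most tag trigrams never occur in a 2000-sentence corpus.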
An Improved Approach for Word Ambiguity Removal - Waqas Tariq
Word ambiguity removal is the task of identifying the correct sense of a word in an ambiguous sentence. This paper describes a model that uses a part-of-speech tagger and three categories for word sense disambiguation (WSD). Better disambiguation is needed to improve interactions between users and computers, so supervised and unsupervised methods are combined. The WSD algorithm finds an efficient and accurate sense of a word based on domain information, and the work is evaluated by how well it identifies the most suitable domain for a word. Keywords: Human Computer Interaction, Supervised Training, Unsupervised Learning, Word Ambiguity, Word Sense Disambiguation
A Context-based Numeral Reading Technique for Text to Speech Systems - IJECEIAES
This paper presents a novel technique for context-based numeral reading in Indian-language text-to-speech systems. The model uses a set of rules to determine the context of a numeral's pronunciation and is integrated with a waveform-concatenation technique to produce speech from the input text. Three Indian languages are considered: Odia, Hindi, and Bengali. To analyze performance, experiments are performed over different numeral-pronunciation contexts and the results are compared with an existing syllable-based technique. The results show that the proposed technique produces more intelligible speech than the existing technique, with far lower storage and execution time.
A survey of named entity recognition in Assamese and other Indian languages - ijnlc
Named Entity Recognition (NER) is important in major Natural Language Processing tasks such as information extraction, question answering, machine translation, and document summarization, so this paper puts forward a survey of named entity recognition in Indian languages, with particular reference to Assamese. Various rule-based and machine learning approaches are available for NER. The paper first outlines these approaches and then discusses related research in the field. Assamese, like other Indian languages, is agglutinative and suffers from a lack of appropriate resources: NER requires large data sets, gazetteer lists, dictionaries, and so on, and useful features such as capitalization, present in English, are absent in Assamese. The paper also describes some of the issues faced when doing NER in Assamese.
Part-of-speech tagging in Indian languages is still an open problem, and there is no settled approach to implementing a POS tagger for them. This paper describes efforts to build a Hidden Markov Model-based part-of-speech tagger, developed with the IL POS tag set, which achieves an accuracy of 92%.
Parameters Optimization for Improving ASR Performance in Adverse Real World N... - Waqas Tariq
Existing research shows that many techniques and methodologies are available for each step of an Automatic Speech Recognition (ASR) system, but performance (minimizing Word Error Rate, WER, and maximizing Word Accuracy Rate, WAR) does not depend on the chosen technique alone. It depends mainly on the category and level of the noise and on variable parameter sizes such as window size, frame size, and frame overlap. The work presented in this paper varies these parameters (window size, frame size, and frame-overlap percentage) to observe algorithm performance under various noise categories and levels, and trains the system across all parameter sizes and categories of real-world noisy environments to improve recognition performance. The paper presents Signal-to-Noise Ratio (SNR) and accuracy results for the varied parameter sizes. Because it is hard to evaluate these test results and choose parameter sizes for ASR performance improvement directly, the study further suggests feasible, optimal parameter sizes using a Fuzzy Inference System (FIS) to improve accuracy in adverse real-world noisy conditions. This work supports discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, Ubiquitous ASR System, Human Computer Interaction (HCI)
This paper presents a machine translation system that translates simple assertive English sentences to Marathi sentences. The system performs morphological analysis, part-of-speech tagging, and local word grouping to convert the meaning of the English sentence to the corresponding Marathi sentence. An English to Marathi bilingual dictionary is used for translation. The system aims to help people with primary education understand English words by providing translations to their native Marathi language.
Driving cycle development for Kuala Terengganu city using k-means method - IJECEIAES
A driving cycle, a representative speed-time profile of driving behavior in a specific region or city, plays a vital role in producing and evaluating vehicle performance. Many countries have developed their own driving cycles, including the United States of America, the United Kingdom, India, China, Ireland, Slovenia, and Singapore. The objectives of this paper are to characterize and develop a driving cycle for Kuala Terengganu city at 8.00 a.m. along five different routes using the k-means method, to analyze fuel rate and emissions using the developed driving cycle, and to compare fuel rate and emissions across conventional engine vehicles, a parallel plug-in hybrid electric vehicle (PHEV), a series PHEV, and a single split-mode PHEV. The methodology involves three major steps: route selection, data collection by on-road measurement, and driving-cycle development using k-means. MATLAB was used as the computing platform to produce the driving cycle, and the AUTONOMIE vehicle-system simulation tool was used to analyze fuel rate and gas emissions. Based on the findings, Route C and the single split-mode PHEV powertrain used and emitted the least fuel and emissions.
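As a much-simplified sketch of the clustering step (the paper itself works on real on-road speed traces in MATLAB), a tiny one-dimensional k-means can group second-by-second speed samples into driving modes:

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means: cluster speed samples (km/h) into k driving modes.
    Centers start evenly spread between min and max (deterministic)."""
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # move each center to its cluster mean; keep it if the cluster is empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# made-up second-by-second speed trace: idling, urban cruise, arterial cruise
speeds = [0, 0, 1, 2, 28, 30, 31, 29, 58, 60, 62, 59]
modes = kmeans_1d(speeds, k=3)
print(modes)  # [0.75, 29.5, 59.75]
```

Real driving-cycle work clusters multi-dimensional micro-trip features (mean speed, idle ratio, acceleration), but the assignment/update loop is the same.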
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA... - ijnlc
Understanding whole handwritten documents is a challenging problem that involves several difficult tasks. Given a handwritten document, its layout must first be analyzed to separate the different content types, which can then be routed to specific systems such as writer-style, image, or table recognizers. Research in automatic writer identification has mainly centered on the statistical approach, which has led to the selection and extraction of statistical features such as run-length distributions, slant distribution, entropy, and edge-hinge distribution. The edge-hinge distribution outperforms all the other statistical features: it characterizes the changes in direction of a writing stroke in handwritten text. The edge-hinge distribution is extracted by means of a window that is slid over an edge-detected, offline scanned image. Whenever the central pixel of the window is on, the two edge fragments (i.e., connected sequences of pixels) emerging from this central pixel are considered; their directions are measured and stored as pairs. A joint probability distribution is then obtained from a large number of such observations.
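A much-simplified sketch of the edge-hinge direction-pair feature (real implementations trace edge fragments several pixels long; this toy version only looks at the immediate 8-neighbourhood of each on-pixel):

```python
from collections import Counter
from itertools import combinations

# 8-neighbour offsets, indexed by direction code 0..7
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
        (1, 1), (1, 0), (1, -1), (0, -1)]

def edge_hinge_histogram(edge):
    """Joint distribution of direction pairs at on-pixels of a binary
    edge map (list of lists of 0/1)."""
    h, w = len(edge), len(edge[0])
    pairs = Counter()
    for y in range(h):
        for x in range(w):
            if not edge[y][x]:
                continue
            # directions of on-neighbours around this central pixel
            dirs = [d for d, (dy, dx) in enumerate(DIRS)
                    if 0 <= y + dy < h and 0 <= x + dx < w
                    and edge[y + dy][x + dx]]
            for a, b in combinations(sorted(dirs), 2):
                pairs[(a, b)] += 1
    total = sum(pairs.values()) or 1
    return {k: v / total for k, v in pairs.items()}

# a short diagonal stroke
img = [[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]]
hist = edge_hinge_histogram(img)
print(hist)  # {(0, 4): 1.0}
```

The resulting normalized histogram is the writer feature: two writers with different stroke curvature produce visibly different direction-pair distributions.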
This document summarizes research on recognizing Urdu handwritten characters using a convolutional neural network (CNN). The researchers created a novel dataset of Urdu handwritten characters since no publicly available dataset existed. A series of experiments were conducted on the proposed dataset using CNN, achieving accuracy among the best reported for this task. The paper provides background on the Urdu script, reviews previous work on Urdu handwritten character recognition, and describes the proposed CNN model and experimental results.
Natural Language Processing Theory, Applications and Difficulties - ijtsrd
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction. Research in this field, known as Natural Language Processing, has produced remarkable results, leading to many exciting expectations and new challenges. This paper discusses natural language generation and natural language understanding, along with difficulties in NLU, applications, and a comparison with structured programming languages. Mrs. Anjali Gharat, Mrs. Helina Tandel, and Mr. Ketan Bagade, "Natural Language Processing Theory, Applications and Difficulties", International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 3, Issue 6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
Creation of speech corpus for emotion analysis in Gujarati language and its e... - IJECEIAES
In the last couple of years emotion recognition has proven its significance in artificial intelligence and man-machine communication. Emotion recognition can be done from speech or from images (facial expressions); this paper deals with speech emotion recognition (SER) only. Emotion recognition requires an emotional speech database, and this paper proposes one developed in Gujarati, one of the official languages of India. The proposed speech corpus distinguishes six emotional states: sadness, surprise, anger, disgust, fear, and happiness. To observe the effect of the different emotions, the Gujarati speech database is analyzed using speech parameters such as pitch, energy, and MFCCs in MATLAB.
Quality estimation of machine translation outputs through stemming - ijcsa
Machine translation is a challenging problem for Indian languages. New machine translators are being developed every day, but high-quality automatic translation is still a very distant goal, and a correctly translated Hindi sentence is rarely produced. This paper focuses on the English-Hindi language pair; to preserve the correct MT output, it presents a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention, and the results are validated by comparison with human rankings.
Myanmar named entity corpus and its use in syllable-based neural named entity... - IJECEIAES
This document describes the development of the first manually annotated named entity corpus for the Myanmar language. It contains approximately 170,000 named entities tagged with types like person, location, organization, race, time and number. The document also discusses experiments using various deep neural network architectures for named entity recognition on Myanmar text, without additional feature engineering. Results showed that syllable-based neural models outperformed the baseline conditional random field model. This research aims to apply neural networks to Myanmar natural language processing and promote future work on this under-resourced language.
A decision tree based word sense disambiguation system in Manipuri language - acijjournal
This paper presents an initial attempt at building a word sense disambiguation system for the Manipuri language. It discusses related attempts in Manipuri followed by the proposed plan. A database of 650 Manipuri sentences was collected in the course of the study. Conventional positional and context-based features are used to capture the sense of words that are ambiguous or have multiple senses. With suitable knowledge-acquisition techniques, the proposed approach is expected to predict the senses of polysemous words with high accuracy; the system produces an accuracy of 71.75%.
IRJET - Spoken Language Identification System using MFCC Features and Gaus... - IRJET Journal
This document presents a spoken language identification system that distinguishes between Tamil and Telugu languages using Mel-Frequency Cepstral Coefficient (MFCC) features extracted from speech signals and Gaussian Mixture Models (GMM) for language modeling. The system is trained on a dataset of speech samples in Tamil and Telugu from the Microsoft Speech Corpus for Indian Languages. MFCC features represent the acoustic properties of speech sounds and have been shown to provide good performance for language identification. GMM is used to model the probability distributions of MFCC features for each language. The proposed system aims to identify the language of unknown speech samples as either Tamil or Telugu.
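The maximum-likelihood decision rule behind such a system can be illustrated in heavily simplified form. Real systems fit multi-component GMMs to genuine MFCC vectors; in this sketch a single diagonal Gaussian per language and made-up 2-D features stand in for both:

```python
import math

def fit_diag_gaussian(feats):
    """Per-dimension mean and variance of a list of feature vectors."""
    n, d = len(feats), len(feats[0])
    mean = [sum(f[i] for f in feats) / n for i in range(d)]
    var = [sum((f[i] - mean[i]) ** 2 for f in feats) / n + 1e-6
           for i in range(d)]
    return mean, var

def log_likelihood(feats, model):
    """Total log-likelihood of all frames under a diagonal Gaussian."""
    mean, var = model
    ll = 0.0
    for f in feats:
        for x, m, v in zip(f, mean, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll

# toy 2-D stand-ins for per-frame MFCC vectors of training speech
tamil_train = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3]]
telugu_train = [[-1.0, 0.8], [-0.9, 0.9], [-1.2, 0.7]]
models = {'tamil': fit_diag_gaussian(tamil_train),
          'telugu': fit_diag_gaussian(telugu_train)}

# classify an unknown utterance by whichever model scores it higher
utterance = [[1.05, 0.15], [0.95, 0.25]]
lang = max(models, key=lambda name: log_likelihood(utterance, models[name]))
print(lang)  # tamil
```

Replacing the single Gaussian with a mixture (and the toy features with MFCCs extracted from audio frames) gives the GMM-based system the abstract describes.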
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer - IOSR Journals
This document summarizes an implementation of an English-text to Marathi-speech synthesizer. The synthesizer uses a unit selection approach based on concatenative synthesis to produce natural sounding Marathi speech from English text input. Over 28,000 Marathi syllables, words and sentences were recorded from a female speaker and used to create the speech corpus. Formant frequencies (F1, F2, F3) were analyzed from the synthesized speech using MATLAB and PRAAT tools to evaluate the quality and naturalness of the output.
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL - ijnlc
This document describes a tool called NERHMM for performing named entity recognition using hidden Markov models. The tool allows users to annotate raw text to create tagged corpora, train hidden Markov models on annotated data to calculate parameters, and test new text data to produce named entity tags. The tool works for multiple languages and can handle diverse tag sets. It provides a simple interface for tasks involved in named entity recognition like corpus development and parameter estimation for hidden Markov models. Evaluation on data shows the tool's performance increases with more training data.
An Optical Character Recognition for Handwritten Devanagari Script - IJERA Editor
Optical Character Recognition (OCR) is the process of recognizing characters from scanned documents, and many OCR systems are now available on the market. Most of these systems, however, work for Roman, Chinese, Japanese, and Arabic characters; there is not yet a sufficient body of work on Indic scripts such as Devanagari. This paper therefore presents a review of optical character recognition for handwritten Devanagari script.
This document describes the development of a text-to-speech synthesizer for the Pali language. It discusses previous work on speech synthesis systems for Indian languages. It then outlines the methodology used, including developing a phone set and speech database for Pali, and using a unit selection approach for speech synthesis. The system was evaluated based on the naturalness of the synthesized speech output. Results showed smooth spectral changes at concatenation points and uniform spectral changes across syllable boundaries, indicating the system produces intelligible synthetic Pali speech.
Emotional Telugu speech signals classification based on k-NN classifier - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Natural Language Processing: Parts of speech tagging, its classes, and how to ... - Rajnish Raj
Part of speech (POS) tagging is the process of assigning a part of speech tag like noun, verb, adjective to each word in a sentence. It involves determining the most likely tag sequence given the probabilities of tags occurring before or after other tags, and words occurring with certain tags. POS tagging is the first step in many NLP applications and helps determine the grammatical role of words. It involves calculating bigram and lexical probabilities from annotated corpora to find the tag sequence with the highest joint probability.
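The search for the tag sequence with the highest joint probability can be sketched with a toy Viterbi decoder over bigram transition and lexical probabilities (illustrative numbers, no smoothing):

```python
def viterbi(words, tags, trans, emit):
    """Most likely tag sequence under bigram transition probabilities
    trans[(prev_tag, tag)] and lexical probabilities emit[(tag, word)].
    '<S>' marks the sentence start."""
    # best[t] = (probability, path) of the best sequence ending in tag t
    best = {t: (trans.get(('<S>', t), 0) * emit.get((t, words[0]), 0), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            p, prev = max(
                ((best[q][0] * trans.get((q, t), 0) * emit.get((t, w), 0), q)
                 for q in tags),
                key=lambda x: x[0])
            new[t] = (p, best[prev][1] + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

trans = {('<S>', 'NN'): 0.8, ('<S>', 'VB'): 0.2,
         ('NN', 'VB'): 0.7, ('NN', 'NN'): 0.3,
         ('VB', 'NN'): 0.5, ('VB', 'VB'): 0.5}
emit = {('NN', 'dogs'): 0.6, ('VB', 'dogs'): 0.1,
        ('NN', 'bark'): 0.2, ('VB', 'bark'): 0.5}
path = viterbi(['dogs', 'bark'], ['NN', 'VB'], trans, emit)
print(path)  # ['NN', 'VB']
```

The probabilities here are invented for the example; in practice both tables are estimated from an annotated corpus, as the paragraph above describes.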
POS tagging using Resource Rich Language - suman101112
This document describes a project on part-of-speech (POS) tagging of the Marathi language using POS tagged data from the Hindi language. The approach uses parallel corpora of 50,000 Hindi-Marathi sentences to project POS tags from Hindi to Marathi via word alignments. Trigram similarities between languages are calculated using pointwise mutual information scores. Tags are assigned to Marathi words based on aligned Hindi words and propagation from neighboring words. An accuracy of 70.6% is achieved on a test set of 100 Marathi sentences.
This document discusses natural language processing (NLP) from a developer's perspective. It provides an overview of common NLP tasks like spam detection, machine translation, question answering, and summarization. It then discusses challenges in NLP like ambiguity and new forms of written language. The document goes on to explain how probabilistic models are used in NLP to infer language properties and complete tasks like sentence completion and phrase rearrangement using concepts like language models. It also covers text processing techniques like tokenization and regular expressions. Finally, it discusses spelling correction in detail using techniques like noisy channel modeling and confusion matrices.
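The noisy-channel spelling correction mentioned above can be sketched in the style of the well-known candidate-generation approach: generate all strings one edit away, keep those in the vocabulary, and rank them by a unigram language model (a uniform error model, so the channel term drops out; all words here are toy data):

```python
from collections import Counter

LETTERS = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def correct(word, counts):
    """Among known edit-distance-1 candidates, pick the one with the
    highest unigram count (the language-model term)."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts] or [word]
    return max(candidates, key=counts.get)

vocab = Counter({'speech': 50, 'speck': 3, 'sketch': 7})
print(correct('speeech', vocab))  # speech
```

A full noisy-channel corrector would weight candidates by a confusion matrix of observed typing errors rather than treating all single edits as equally likely.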
This document contains the slides for a lecture on part-of-speech tagging, keyword and phrase extraction, and text similarity for natural language processing. It introduces part-of-speech tagging and different taggers, such as rule-based and n-gram-based approaches. It also discusses methods for keyword and phrase extraction, including supervised classifiers and unsupervised techniques like TF-IDF. Finally, it covers measuring text similarity using vector space models and cosine similarity.
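The unsupervised TF-IDF route to keyword extraction can be sketched as follows (toy documents, whitespace tokenization, raw term frequency with a log inverse-document-frequency weight):

```python
import math
from collections import Counter

def tfidf_keywords(docs, top=2):
    """Rank each document's words by TF-IDF; return top keywords per doc."""
    tokenized = [d.lower().split() for d in docs]
    # document frequency: in how many documents does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    out = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {w: (c / len(toks)) * math.log(n / df[w])
                  for w, c in tf.items()}
        out.append([w for w, _ in sorted(scores.items(),
                                         key=lambda x: -x[1])[:top]])
    return out

docs = ["the tagger tags each word",
        "the parser builds a tree for each sentence",
        "the tagger and the parser share a lexicon"]
keywords = tfidf_keywords(docs)
print(keywords[0])  # words unique to doc 1: 'tags' and 'word'
```

Words occurring in every document (like "the") get an IDF of zero and never surface as keywords, which is exactly the behavior that makes TF-IDF useful without supervision.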
N-gram models are statistical language models that can be used for word prediction. They calculate the probability of a word based on the previous N-1 words. Higher N-gram models require more data but capture context better. N-gram probabilities are estimated by counting word sequences in a large training corpus and calculating conditional probabilities. These probabilities provide information about syntactic patterns and semantic relationships in language.
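The estimation step described above can be sketched by counting bigrams in a toy corpus and normalizing to conditional probabilities:

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Conditional next-word probabilities P(w2 | w1) from raw sentences.
    '<s>' marks the sentence start."""
    follows = defaultdict(Counter)
    for sent in corpus:
        toks = ['<s>'] + sent.lower().split()
        for w1, w2 in zip(toks, toks[1:]):
            follows[w1][w2] += 1
    return {w1: {w2: c / sum(cs.values()) for w2, c in cs.items()}
            for w1, cs in follows.items()}

model = bigram_model(["the cat sat", "the cat ran", "the dog sat"])
print(model['cat'])                              # {'sat': 0.5, 'ran': 0.5}
print(max(model['the'], key=model['the'].get))   # cat (follows 'the' 2 of 3 times)
```

A higher-order (trigram, 4-gram) model conditions on more history and predicts better, but needs far more training data to see each context often enough, as the paragraph notes.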
This document provides an overview of natural language processing (NLP) including the linguistic basis of NLP, common NLP problems and approaches, sources of NLP data, and steps to develop an NLP system. It discusses tokenization, part-of-speech tagging, parsing, machine learning approaches like naive Bayes classification and dependency parsing, measuring word similarity, and distributional semantics. The document also provides advice on going from research to production systems and notes areas not covered like machine translation and deep learning methods.
This paper presents a machine translation system that translates simple assertive English sentences to Marathi sentences. The system performs morphological analysis, part-of-speech tagging, and local word grouping to convert the meaning of the English sentence to the corresponding Marathi sentence. An English to Marathi bilingual dictionary is used for translation. The system aims to help people with primary education understand English words by providing translations to their native Marathi language.
Driving cycle development for Kuala Terengganu city using k-means method - IJECEIAES
A driving cycle plays a vital role in producing and evaluating the performance of a vehicle: it is a representative speed-time profile of the driving behavior of a specific region or city. Many countries have developed their own driving cycles, including the United States of America, the United Kingdom, India, China, Ireland, Slovenia, and Singapore. The objectives of this paper are to characterize and develop a driving cycle for Kuala Terengganu city at 8.00 a.m. along five different routes using the k-means method, to analyze fuel rate and emissions using the developed driving cycle, and to compare the fuel rate and emissions of conventional engine vehicles, a parallel plug-in hybrid electric vehicle (PHEV), a series PHEV, and a single split-mode PHEV. The methodology involves three major steps: route selection, data collection using an on-road measurement method, and driving cycle development using the k-means method. MATLAB has been used as the computing platform to produce the best driving cycle, and the AUTONOMIE vehicle system simulation tool has been used to analyze fuel rate and gas emissions. Based on the findings, it can be concluded that Route C and the single split-mode PHEV powertrain use the least fuel and emit the least emissions.
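The clustering step behind such driving-cycle construction is plain k-means: alternate between assigning samples to the nearest centre and moving each centre to the mean of its members. A minimal sketch, with hypothetical speed/acceleration samples standing in for on-road measurements:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: alternate nearest-centre assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centre, then nearest-centre labels.
        d = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Toy [speed, acceleration] samples (hypothetical, two driving modes).
data = np.array([[10.0, 0.1], [12.0, 0.2], [55.0, 0.0], [60.0, 0.1]])
centers, labels = kmeans(data, k=2)
```

In the driving-cycle setting the clusters would correspond to driving modes (idle, cruise, acceleration, etc.) from which representative micro-trips are stitched together.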
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA... - ijnlc
The comprehension of entire handwritten documents is a challenging problem that involves several difficult tasks. Given a handwritten document, its layout must first be analyzed to isolate the different content types. These content types can then be routed to specialized subsystems, such as writer-style, image, or table recognizers. Research in automatic writer identification has mainly centred on the statistical approach. This has led to the selection and extraction of statistical features such as run-length distributions, slant distribution, entropy, and edge-hinge distribution. The edge-hinge distribution outperforms all other statistical features. It characterizes the changes in direction of a writing stroke in handwritten text, and is extracted by means of a window slid over an edge-detected version of offline scanned images. Whenever the central pixel of the window is on, the two edge fragments (i.e., connected sequences of pixels) emerging from this central pixel are considered. Their directions are measured and stored as pairs, from which a joint probability distribution is obtained.
This document summarizes research on recognizing Urdu handwritten characters using a convolutional neural network (CNN). The researchers created a novel dataset of Urdu handwritten characters since no publicly available dataset existed. A series of experiments were conducted on the proposed dataset using CNN, achieving accuracy among the best reported for this task. The paper provides background on the Urdu script, reviews previous work on Urdu handwritten character recognition, and describes the proposed CNN model and experimental results.
Natural Language Processing Theory, Applications and Difficulties - ijtsrd
The promise of a powerful computing device to help people in productivity as well as in recreation can only be realized with proper human-machine communication. Automatic recognition and understanding of spoken language is the first step toward natural human-machine interaction. Research in this field has produced remarkable results, leading to many exciting expectations and new challenges. This field is known as Natural Language Processing. In this paper, natural language generation and natural language understanding are discussed, along with difficulties in NLU, applications, and a comparison with structured programming languages. Mrs. Anjali Gharat | Mrs. Helina Tandel | Mr. Ketan Bagade, "Natural Language Processing Theory, Applications and Difficulties", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-6, October 2019. URL: https://www.ijtsrd.com/papers/ijtsrd28092.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/28092/natural-language-processing-theory-applications-and-difficulties/mrs-anjali-gharat
Creation of speech corpus for emotion analysis in Gujarati language and its e... - IJECEIAES
In the last couple of years emotion recognition has proven its significance in the areas of artificial intelligence and man-machine communication. Emotion recognition can be done using speech or images (facial expressions); this paper deals with speech emotion recognition (SER) only. An emotional speech database is essential for emotion recognition. In this paper we propose an emotional database developed in Gujarati, one of the official languages of India. The proposed speech corpus distinguishes six emotional states: sadness, surprise, anger, disgust, fear, and happiness. To observe the effect of the different emotions, the proposed Gujarati speech database is analyzed using speech parameters such as pitch, energy, and MFCC in MATLAB.
Quality estimation of machine translation outputs through stemming - ijcsa
Machine translation is a challenging problem for Indian languages. New machine translators are being developed every day, but high-quality automatic translation is still a very distant dream, and a correctly translated Hindi sentence is rarely produced. In this paper we focus on the English-Hindi language pair, and in order to preserve correct MT output we present a ranking system that employs machine learning techniques and morphological features. The ranking requires no human intervention. We have also validated our results by comparing them with human rankings.
Myanmar named entity corpus and its use in syllable-based neural named entity... - IJECEIAES
This document describes the development of the first manually annotated named entity corpus for the Myanmar language. It contains approximately 170,000 named entities tagged with types like person, location, organization, race, time and number. The document also discusses experiments using various deep neural network architectures for named entity recognition on Myanmar text, without additional feature engineering. Results showed that syllable-based neural models outperformed the baseline conditional random field model. This research aims to apply neural networks to Myanmar natural language processing and promote future work on this under-resourced language.
A decision tree based word sense disambiguation system in Manipuri language - acijjournal
This paper presents a first attempt at building a word sense disambiguation system for the Manipuri language. It discusses related work on Manipuri, followed by the proposed plan. A database of 650 Manipuri sentences was collected in the course of the study. Conventional positional and context-based features are used to capture the sense of words that are ambiguous or have multiple senses. The proposed approach is expected to predict the senses of polysemous words with high accuracy with the help of suitable knowledge-acquisition techniques. The system achieves an accuracy of 71.75%.
IRJET - Spoken Language Identification System using MFCC Features and Gaus... - IRJET Journal
This document presents a spoken language identification system that distinguishes between Tamil and Telugu languages using Mel-Frequency Cepstral Coefficient (MFCC) features extracted from speech signals and Gaussian Mixture Models (GMM) for language modeling. The system is trained on a dataset of speech samples in Tamil and Telugu from the Microsoft Speech Corpus for Indian Languages. MFCC features represent the acoustic properties of speech sounds and have been shown to provide good performance for language identification. GMM is used to model the probability distributions of MFCC features for each language. The proposed system aims to identify the language of unknown speech samples as either Tamil or Telugu.
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer - IOSR Journals
This document summarizes an implementation of an English-text to Marathi-speech synthesizer. The synthesizer uses a unit selection approach based on concatenative synthesis to produce natural sounding Marathi speech from English text input. Over 28,000 Marathi syllables, words and sentences were recorded from a female speaker and used to create the speech corpus. Formant frequencies (F1, F2, F3) were analyzed from the synthesized speech using MATLAB and PRAAT tools to evaluate the quality and naturalness of the output.
NERHMM: A TOOL FOR NAMED ENTITY RECOGNITION BASED ON HIDDEN MARKOV MODEL - ijnlc
This document describes a tool called NERHMM for performing named entity recognition using hidden Markov models. The tool allows users to annotate raw text to create tagged corpora, train hidden Markov models on annotated data to calculate parameters, and test new text data to produce named entity tags. The tool works for multiple languages and can handle diverse tag sets. It provides a simple interface for tasks involved in named entity recognition like corpus development and parameter estimation for hidden Markov models. Evaluation on data shows the tool's performance increases with more training data.
An Optical Character Recognition for Handwritten Devanagari Script - IJERA Editor
Optical Character Recognition (OCR) is the process of recognizing characters from scanned documents, and many OCR systems are now available in the market. Most of these systems, however, work for Roman, Chinese, Japanese, and Arabic characters. There is not a sufficient body of work on Indian scripts like Devanagari, so this paper presents a review of optical character recognition for handwritten Devanagari script.
This document describes the development of a text-to-speech synthesizer for the Pali language. It discusses previous work on speech synthesis systems for Indian languages. It then outlines the methodology used, including developing a phone set and speech database for Pali, and using a unit selection approach for speech synthesis. The system was evaluated based on the naturalness of the synthesized speech output. Results showed smooth spectral changes at concatenation points and uniform spectral changes across syllable boundaries, indicating the system produces intelligible synthetic Pali speech.
Emotional Telugu speech signals classification based on k-NN classifier - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Natural Language Processing: Parts of speech tagging, its classes, and how to ... - Rajnish Raj
Part of speech (POS) tagging is the process of assigning a part of speech tag like noun, verb, adjective to each word in a sentence. It involves determining the most likely tag sequence given the probabilities of tags occurring before or after other tags, and words occurring with certain tags. POS tagging is the first step in many NLP applications and helps determine the grammatical role of words. It involves calculating bigram and lexical probabilities from annotated corpora to find the tag sequence with the highest joint probability.
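Searching for the tag sequence with the highest joint probability of tag-transition (bigram) and lexical (emission) probabilities is typically done with the Viterbi algorithm. A toy sketch, where the tagset, words, and all probabilities are hypothetical illustrations rather than values from an annotated corpus:

```python
def viterbi(words, tags, trans, emit, start):
    """Return the tag sequence maximizing the product of transition * emission probabilities."""
    # best[t] = (score, path) for the best sequence ending in tag t
    best = {t: (start.get(t, 0.0) * emit.get((t, words[0]), 0.0), [t]) for t in tags}
    for w in words[1:]:
        nxt = {}
        for t in tags:
            # Best previous tag to transition from, scored by prior path * transition.
            score, path = max(
                (best[p][0] * trans.get((p, t), 0.0), best[p][1]) for p in tags
            )
            nxt[t] = (score * emit.get((t, w), 0.0), path + [t])
        best = nxt
    return max(best.values(), key=lambda sp: sp[0])[1]

# Hypothetical toy model: "flies" is ambiguous between noun and verb.
tags = ["N", "V"]
start = {"N": 0.7, "V": 0.3}
trans = {("N", "V"): 0.6, ("N", "N"): 0.4, ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "time"): 0.8, ("N", "flies"): 0.2, ("V", "flies"): 0.7}
print(viterbi(["time", "flies"], tags, trans, emit, start))  # → ['N', 'V']
```

In a real tagger the transition and emission tables are the bigram and lexical probabilities estimated from an annotated corpus, exactly as the summary above describes.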
POS tagging using Resource Rich Language - suman101112
This document describes a project on part-of-speech (POS) tagging of the Marathi language using POS tagged data from the Hindi language. The approach uses parallel corpora of 50,000 Hindi-Marathi sentences to project POS tags from Hindi to Marathi via word alignments. Trigram similarities between languages are calculated using pointwise mutual information scores. Tags are assigned to Marathi words based on aligned Hindi words and propagation from neighboring words. An accuracy of 70.6% is achieved on a test set of 100 Marathi sentences.
Words and sentences are the basic units of text. In this lecture we discuss basics of operations on words and sentences such as tokenization, text normalization, tf-idf, cosine similarity measures, vector space models and word representation
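Cosine similarity over bag-of-words vectors, as covered in the lecture summary above, reduces to a dot product over normalized term counts. A minimal sketch with toy sentences:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two documents as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)          # shared-term contribution
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(round(cosine("the cat sat", "the cat ran"), 3))  # → 0.667
```

In practice the raw counts are usually replaced by tf-idf weights, which down-weight terms that appear in many documents before the same cosine computation is applied.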
Discusses the concept of Language Models in Natural Language Processing. The n-gram models, markov chains are discussed. Smoothing techniques such as add-1 smoothing, interpolation and discounting methods are addressed.
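Add-1 (Laplace) smoothing, mentioned above, reserves probability mass for unseen n-grams by adding one to every count and the vocabulary size to every denominator. A sketch with a toy corpus:

```python
from collections import Counter

def add_one_prob(bigrams, unigrams, vocab_size, prev, word):
    """Laplace-smoothed bigram probability: (c(prev,word) + 1) / (c(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size = 5

p_seen = add_one_prob(bigrams, unigrams, V, "the", "cat")    # (1+1)/(2+5) = 2/7
p_unseen = add_one_prob(bigrams, unigrams, V, "the", "dog")  # (0+1)/(2+5) = 1/7
```

The unseen bigram now gets a small nonzero probability instead of zeroing out the whole sentence score; interpolation and discounting, also covered in the lecture, are refinements of the same idea.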
Introduction to Natural Language Processing - Pranav Gupta
The presentation gives a gist of the major tasks and challenges involved in natural language processing. The second part presents one technique each for part-of-speech tagging and automatic text summarization.
The document provides an overview of natural language processing (NLP). It defines NLP as the automatic processing of human language and discusses how NLP relates to fields like linguistics, cognitive science, and computer science. The document also describes common NLP tasks like information extraction, machine translation, and summarization. It discusses challenges in NLP like ambiguity and examines techniques used in NLP like rule-based systems, probabilistic models, and the use of linguistic knowledge.
Abstract
Part-of-speech tagging plays an important role in developing natural language processing software. It means assigning a part-of-speech tag to each word of a sentence: the tagger takes a sentence as input and assigns the appropriate part-of-speech tag to each of its words. This article surveys the different works done on Odia POS tagging.
________________________________________________
A COMPREHENSIVE ANALYSIS OF STEMMERS AVAILABLE FOR INDIC LANGUAGES - ijnlc
Stemming is the process of term conflation: it conflates all the variants of a word to a common form called a stem. It plays a significant role in numerous Natural Language Processing (NLP) applications such as morphological analysis, parsing, document summarization, text classification, part-of-speech tagging, question-answering systems, machine translation, word sense disambiguation, and information retrieval (IR). Each of these tasks requires some pre-processing, and stemming is one of the important building blocks for all of them. This paper presents an overview of various stemming techniques, evaluation criteria for stemmers, and existing stemmers for Indic languages.
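The core of many rule-based stemmers for Indic languages is longest-match-first suffix stripping. A sketch of that idea; the romanized suffix list and example words below are purely illustrative, not taken from any stemmer surveyed in the paper:

```python
# Longest-match-first suffix stripping, the basic mechanism of rule-based stemmers.
# Hypothetical romanized suffix list for illustration only.
SUFFIXES = ["iyon", "iyan", "on", "en", "e", "i"]

def stem(word, suffixes=SUFFIXES, min_stem=2):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word

print(stem("ladkiyan"))  # → ladk (conflates with stem("ladki"))
```

The min_stem guard is the usual safeguard against over-stemming short words; real stemmers add ordered rewrite rules and exception lists on top of this skeleton.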
Live Sign Language Translation: A Survey - IRJET Journal
The document discusses various approaches that have been used for live sign language translation. It reviews 20 research papers that used techniques like convolutional neural networks, support vector machines, k-nearest neighbors, and LSTM networks to classify hand gestures and translate sign language into text with varying levels of accuracy between 62.3% to 99.9%. Deep learning models using CNNs and LSTMs achieved the highest accuracy compared to traditional classifiers. The paper aims to help other researchers in the field understand past approaches and how to potentially improve sign language translation systems.
Fuzzy rule based classification and recognition of handwritten Hindi - IAEME Publication
This document summarizes a research paper that proposes a fuzzy rule-based system for classifying and recognizing handwritten Hindi words. The system works in six stages: preprocessing, segmentation, normalization, classification, feature extraction, and recognition. In the classification stage, characters are classified into seven classes based on the presence, position, length, connectivity, and number of junction points of their vertical bars. Experimental results on 450 words written by 30 people showed the system achieved a classification and recognition rate of 92.02%.
Fuzzy rule based classification and recognition of handwritten Hindi - IAEME Publication
This document describes a fuzzy rule-based system for classifying and recognizing handwritten Hindi words. The system works in six stages: preprocessing, segmentation, normalization, classification, feature extraction, and recognition. Preprocessing includes binarization, thinning, slant correction, dilation, erosion, and filtering to prepare images for further processing. Classification uses fuzzy if-then rules based on the presence and position of vertical bars to classify characters into seven classes. Feature extraction identifies curves, lines, junction points and endpoints. The system was tested on 450 words written by 30 people, achieving a recognition rate of 92.02%.
Toward accurate Amazigh part-of-speech tagging - IAESIJAI
Part-of-speech (POS) tagging is the process of assigning to each word in a text its corresponding grammatical information. It is an important preprocessing step for other natural language processing (NLP) tasks, hence the objective of finding the most accurate tagger. Previous approaches were based on traditional machine learning algorithms; later, with the development of deep learning, more neural POS taggers were adopted. While POS tagging accuracy reaches 97%, even with traditional machine learning, for high-resourced languages like English and French, this is far from the case for a low-resource language like Amazigh, where the most used approaches remain traditional machine learning and the results fall far short of those for rich languages. In this paper, we present a new POS tagger for Amazigh based on bidirectional long short-term memory; experiments on a real dataset show that it outperforms existing machine learning methods.
IRJET - Text Optimization/Summarizer using Natural Language Processing - IRJET Journal
1. The document discusses the development of an intelligent system to optimize the English language using natural language processing techniques. The system will perform functions like summarization, spell check, grammar check, and sentence auto-completion.
2. It describes the various algorithms used for each function, including extracting important sentences for summarization, comparing words to dictionaries for spell check, analyzing syntax for grammar check, and completing sentences based on previous user data for auto-completion.
3. The system aims to build a smart tool that can correct errors and summarize text in English to improve communication through optimized language.
A New Approach to Parts of Speech Tagging in Malayalam - ijcsit
Parts-of-speech tagging is the process of labeling each word in a sentence. A tag indicates the word's usage in the sentence. Usually these tags denote syntactic classes like noun or verb, and sometimes include additional information such as case markers (number, gender, etc.) and tense markers. A large number of current language processing systems use a parts-of-speech tagger for pre-processing.
There are two main approaches to parts-of-speech tagging: the rule-based approach and the stochastic approach. The rule-based approach uses predefined handwritten rules; it is the oldest approach and relies on a lexicon or dictionary for reference. The stochastic approach uses probabilistic and statistical information to assign tags to words. It requires a large corpus, so its time and space complexity are high, whereas the rule-based approach has lower complexity in both. The stochastic approach is the more widely used nowadays because of its accuracy.
Malayalam belongs to the Dravidian family of languages and is inflectional, with suffixes attached to root word forms. The algorithms currently used are efficient machine learning algorithms, but they were not built for Malayalam, which affects the accuracy of Malayalam POS tagging.
The proposed approach uses dictionary entries along with adjacent-tag information and employs multithreading. Tagging is done using the probability of occurrence of the sentence structure together with the dictionary entry.
The document presents a study on language recognition and offensive word detection. It discusses developing models using machine learning algorithms like logistic regression, TF-IDF, and SVM to identify the language of text inputs consisting of 44 languages, and detect offensive English and Hindi words. The models are trained on large datasets containing language samples and offensive terms. The system architecture involves data collection, training classifiers, and using techniques like n-grams and ensemble modeling for language identification and offensive word detection. Evaluation shows logistic regression achieving 99.2% accuracy for language recognition. The study aims to build automated tools to analyze multilingual texts and detect inappropriate content.
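One simple baseline behind such language-recognition systems is character n-gram profile comparison: build an n-gram frequency profile per language and pick the language whose profile overlaps the input most. A sketch with tiny hypothetical training samples standing in for the large datasets described above:

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Character n-gram counts, with padding spaces to capture word boundaries."""
    text = f" {text.lower()} "
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))

def identify(text, profiles):
    """Pick the language whose n-gram profile overlaps the text's profile most."""
    grams = char_ngrams(text)
    def overlap(profile):
        return sum(min(c, profile[g]) for g, c in grams.items())
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

# Tiny hypothetical training samples; real systems train on large corpora.
profiles = {
    "english": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "german": char_ngrams("der schnelle braune fuchs springt ueber den hund"),
}
print(identify("the dog runs over", profiles))  # → english
```

The TF-IDF-plus-logistic-regression pipeline the study uses is a learned, weighted refinement of this same character n-gram representation.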
The Evaluation of a Code-Switched Sepedi-English Automatic Speech Recognition... - IJCI JOURNAL
Speech technology is a field that encompasses various techniques and tools used to enable machines to interact with speech, such as automatic speech recognition (ASR), spoken dialog systems, and others, allowing a device to capture spoken words through a microphone from a human speaker. End-to-end approaches such as Connectionist Temporal Classification (CTC) and attention-based methods are the most used for the development of ASR systems. However, these techniques have mostly been used in research and development for high-resourced languages with large amounts of speech data for training and evaluation, leaving low-resource languages relatively underdeveloped. While the CTC method has been used successfully for other languages, its effectiveness for the Sepedi language remains uncertain. In this study, we present the evaluation of a Sepedi-English code-switched automatic speech recognition system. This end-to-end system was developed using the Sepedi Prompted Code Switching corpus and the CTC approach. The performance of the system was evaluated using both the NCHLT Sepedi test corpus and the Sepedi Prompted Code Switching corpus. The model produced the lowest WER of 41.9%; however, it faced challenges in recognizing Sepedi-only text.
The document discusses a new approach for identifying the script of words in low-resolution images of display boards using texture features. It aims to identify 3 Indian scripts: Hindi, Kannada, and English. The proposed method extracts discrete cosine transform-based texture features from word images and uses a threshold-based function to classify the script. When evaluated on 800 word images, it achieved an overall accuracy of 85.44% and individual accuracies of 100% for Hindi, 70.33% for Kannada, and 86% for English. The method is robust to variations in fonts, character spacing, noise and other degradations.
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo... - iosrjce
Segmentation plays a major role in analyzing script documents for the extraction of various features, and many researchers are working to make the segmentation process simple as well as efficient. In this paper a simple technique for both line and word segmentation of a script document is proposed. The main objective of the technique is to recognize the spaces that separate two text lines; a similar procedure is followed for word segmentation. In this work, three different scanned documents were taken as input images for both the line and word segmentation techniques. The results were outstanding, with the method achieving 100% average accuracy for both line and word segmentation. Evaluation results show that the method outperforms several competing methods.
IRJET - Vernacular Language Spell Checker & Autocorrection - IRJET Journal
This document describes the development of a spell checker for the Hindi language. It discusses the importance of spell checkers for digitizing languages and some common techniques used in spell checking like n-gram analysis, edit distance algorithms, and probabilistic methods. The proposed system will use a corpus of Hindi text to build a language model and detect spelling errors. It will generate candidate corrections based on edit distance and rank them using n-gram frequency analysis. The goal is to develop a tool that can check for both non-word errors and real word errors in Hindi text.
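The edit-distance step used for candidate generation is typically the Levenshtein distance, computable by dynamic programming. A generic sketch (not the paper's implementation):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion of ca
                cur[j - 1] + 1,            # insertion of cb
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # → 3
```

A spell checker generates dictionary words within a small distance (usually 1-2) of the misspelling and then ranks them, here by n-gram frequency as the summary describes.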
Design and Development of a Malayalam to English Translator - A Transfer Based... - Waqas Tariq
This paper describes a transfer based scheme for translating Malayalam, a Dravidian language, to English. This system inputs Malayalam sentences and outputs equivalent English sentences. The system comprises of a preprocessor for splitting the compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size. The system does not rely on a stochastic approach and it is based on a rule-based architecture along with various linguistic knowledge components of both Malayalam and English. The system uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English. The system is designed using artificial intelligence techniques.
Named Entity Recognition using Hidden Markov Model (HMM) - kevig
Named entity recognition (NER) is a subtask of natural language processing (NLP), a branch of artificial intelligence. It has many applications, mainly in machine translation, text-to-speech synthesis, natural language understanding, information extraction, information retrieval, question answering, etc. The aim of NER is to classify words into predefined categories such as location name, person name, organization name, date, and time. In this paper we describe in detail a Hidden Markov Model (HMM) based machine learning approach to identifying named entities. The main idea behind using an HMM to build an NER system is that it is language independent, so the system can be applied to any language domain. In our NER system the states are not fixed but dynamic in nature, so one can define them according to their interest; the corpus used by our NER system is also not domain specific.
Genetic Approach For Arabic Part Of Speech Taggingkevig
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
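A genetic approach to tagging can be sketched as follows: a chromosome assigns one candidate tag to each word, and fitness combines lexical and tag-bigram scores. Everything below (tag set, scores, GA settings) is an invented toy example, not the paper's actual model:

```python
import random

# Per-word candidate tags for a hypothetical 3-word sentence.
candidates = [["N", "V"], ["V", "N"], ["N", "ADJ"]]
# Invented lexical scores: (word position, tag) -> score.
lexical = {(0, "N"): 0.9, (0, "V"): 0.1,
           (1, "V"): 0.8, (1, "N"): 0.2,
           (2, "N"): 0.7, (2, "ADJ"): 0.3}
bigram = {("N", "V"): 0.6, ("V", "N"): 0.7}   # unlisted tag pairs score 0.1

def fitness(chrom):
    """Score a tag assignment by lexical fit plus tag-transition plausibility."""
    score = sum(lexical[(i, t)] for i, t in enumerate(chrom))
    score += sum(bigram.get((a, b), 0.1) for a, b in zip(chrom, chrom[1:]))
    return score

def evolve(generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(c) for c in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(candidates))   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(candidates))        # point mutation
            child[i] = rng.choice(candidates[i])
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

print(evolve())  # → ['N', 'V', 'N'], the highest-scoring tagging
```

Because the top half of each generation survives unchanged, the best tagging found so far is never lost, which is the usual elitism trick in GA-based taggers.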
English to Punjabi machine translation system using hybrid approach of word sIAEME Publication
This document describes a hybrid machine translation and word sense disambiguation system for translating English sentences to Punjabi. The system uses conditional random fields to disambiguate words with multiple meanings by determining the category with the highest word frequency in the context. The system achieves 81.2% accuracy on test sentences by first analyzing sentences, then synthesizing the translation while addressing word ambiguities, and finally outputting the translated sentence.
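The disambiguation step described above — choosing the sense whose category words are most frequent in the context — can be sketched as below. The sense inventory and clue words are invented for illustration and are not the system's actual resources:

```python
# Hypothetical sense inventory: each sense of an ambiguous word is associated
# with a set of clue words expected to appear nearby.
sense_clues = {
    "bank": {
        "river_bank": {"water", "river", "shore"},
        "money_bank": {"money", "loan", "account"},
    }
}

def disambiguate(word, context_words):
    """Pick the sense whose clue words occur most often in the context."""
    scores = {
        sense: sum(context_words.count(clue) for clue in clues)
        for sense, clues in sense_clues[word].items()
    }
    return max(scores, key=scores.get)

print(disambiguate("bank", ["he", "deposited", "money", "in", "his", "account"]))
# → money_bank
```

The described system uses conditional random fields rather than raw counts, but the intuition is the same: the context votes for the most plausible sense before translation proceeds.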
Similar to Current state of the art pos tagging for indian languages – a study (20)
Tech transfer making it as a risk free approach in pharmaceutical and biotech iniaemedu
Tech transfer is a common methodology for transferring new products or an existing
commercial product to R&D or to another manufacturing site. Transferring product knowledge to the
manufacturing floor is crucial and is an ongoing practice in the pharmaceutical and biotech
industry. Without adopting this process, no company can manufacture its niche products, let alone
market them. Technology transfer is a complicated process because it is highly cross-functional. Due
to this cross-functional dependence, these projects face numerous risks and failures. If an idea cannot be
successfully brought out in the form of a product, there is no customer benefit or satisfaction.
Moreover, high emphasis is placed on sustaining manufacturing with the highest quality each and every time. It
is vital that tech transfer projects be executed flawlessly. To accomplish this goal, risk
management is crucial, and the project team needs to apply the risk management approach seamlessly.
Integration of feature sets with machine learning techniquesiaemedu
This document summarizes a research paper that proposes a novel approach for spam filtering using selective feature sets combined with machine learning techniques. The paper presents an algorithm and system architecture that extracts feature sets from emails and uses machine learning to classify emails and generate rules to identify spam. Several metrics are identified to evaluate the efficiency of the feature sets, including false positive rate. An experiment is described that uses keyword lists as feature sets to train filters and compares the proposed approach to other spam filtering methods.
Effective broadcasting in mobile ad hoc networks using gridiaemedu
This document summarizes a research paper that proposes a new grid-based broadcasting mechanism for mobile ad hoc networks. The paper argues that flooding approaches to broadcasting are inefficient and cause network congestion. The proposed approach divides the network into a hierarchical grid structure. When a node needs to broadcast a message, it sends the message to the first node in the appropriate grid, which is then responsible for updating and forwarding the message within that grid. Simulation results showed the grid-based approach outperformed other broadcasting protocols and was more reliable, efficient and scalable.
Effect of scenario environment on the performance of mane ts routingiaemedu
The document analyzes the effect of scenario environment on the performance of the AODV routing protocol in mobile ad hoc networks (MANETs). It studies AODV performance under different scenarios varying network size, maximum node speed, and pause time. The performance is evaluated based on packet delivery ratio, throughput, and end-to-end delay. The results show that AODV performs best in some scenarios and worse in others, indicating that scenario parameters significantly impact routing protocol performance in MANETs.
Adaptive job scheduling with load balancing for workflow applicationiaemedu
This document discusses adaptive job scheduling with load balancing for workflow applications in a grid platform. It begins with an abstract that describes grid computing and how scheduling plays a key role in performance for grid workflow applications. Both static and dynamic scheduling strategies are discussed, but they require high scheduling costs and may not produce good schedules. The paper then proposes a novel semi-dynamic algorithm that allows the schedule to adapt to changes in the dynamic grid environment through both static and dynamic scheduling. Load balancing is incorporated to handle situations where jobs are delayed due to resource fluctuations or overloading of processors. The rest of the paper outlines the related works, proposed scheduling algorithm, system model, and evaluation of the approach.
This document summarizes research on transaction reordering techniques. It discusses transaction reordering approaches based on reducing resource conflicts and increasing resource sharing. Specifically, it covers:
1) A "steal-on-abort" technique that reorders an aborted transaction behind the transaction that caused the abort to avoid repeated conflicts.
2) A replication protocol that attempts to reorder transactions during certification to avoid aborts rather than restarting immediately.
3) Transaction reordering and grouping during continuous data loading to prevent deadlocks when loading data for materialized join views.
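The steal-on-abort idea in point 1) can be sketched as a small scheduler: when transaction A aborts transaction B, B is "stolen" into a queue behind A instead of being restarted immediately, so the same conflict does not recur. This is a minimal illustrative sketch, not the paper's implementation:

```python
from collections import deque

class Scheduler:
    """Toy steal-on-abort scheduler: aborted transactions wait behind winners."""

    def __init__(self, transactions):
        self.ready = deque(transactions)   # transactions awaiting execution
        self.stolen = {}                   # winner -> transactions parked behind it

    def on_abort(self, winner, loser):
        """`winner` aborted `loser`: park `loser` behind `winner`."""
        self.stolen.setdefault(winner, deque()).append(loser)

    def on_commit(self, txn):
        """When `txn` commits, release the transactions it had stolen."""
        self.ready.extend(self.stolen.pop(txn, ()))

    def next_txn(self):
        return self.ready.popleft() if self.ready else None

sched = Scheduler(["T1", "T2"])
t1, t2 = sched.next_txn(), sched.next_txn()
sched.on_abort(t1, t2)          # T1 wins the conflict; T2 is stolen
sched.on_commit(t1)             # T2 becomes runnable only after T1 commits
print(sched.next_txn())         # → T2
```

Serializing the two conflicting transactions this way trades a little parallelism for the avoidance of repeated abort-restart cycles.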
The document discusses semantic web services and their challenges. It provides an overview of semantic web technologies like WSDL, SOAP, UDDI, and OIL which are used to build semantic web services. The semantic web architecture adds semantics to web services through ontologies written in OWL and DAML+OIL. Key approaches to semantic web services include annotation, composition, and addressing privacy and security. However, semantic web services still face challenges in achieving their full potential due to issues in representation, reasoning, and a lack of real-world applications and data.
Website based patent information searching mechanismiaemedu
This document summarizes a research paper on developing a website-based patent information searching mechanism. It discusses how patent information can be used for technology development, rights acquisition and utilization, and management information. It describes different types of patent searches including novelty, validity, infringement, and state-of-the-art searches. It also evaluates and compares two major patent websites, Delphion and USPTO, in terms of their search capabilities and features.
Revisiting the experiment on detecting of replay and message modificationiaemedu
This document summarizes a research paper that proposes methods for detecting message modification and replay attacks in ad-hoc wireless networks. It begins with background on security issues in wireless networks and types of attacks. It then reviews existing intrusion detection systems and security techniques. Related work that detects attacks using features from the media access control layer or radio frequency fingerprinting is also discussed. The paper aims to present a simple, economical, and platform-independent system for detecting message modification, replay attacks, and unauthorized users in ad-hoc networks.
1) The document discusses the Cyclic Model Analysis (CMA) technique for sequential pattern mining which aims to predict customer purchasing behavior.
2) CMA calculates the Trend Distribution Function from sequential patterns to model purchasing trends over time. It then uses Generalized Periodicity Detection and Trend Modeling to identify periodic patterns and construct an approximating model.
3) The Cyclic Model Analysis algorithm is applied to further analyze the patterns, dividing the domain into segments where the distribution function is increasing or decreasing and applying the other techniques recursively to fully model the cyclic behavior.
Performance analysis of manet routing protocol in presenceiaemedu
This document analyzes the performance of different routing protocols in a mobile ad hoc network (MANET) under hybrid traffic conditions. It simulates a MANET with 50 nodes moving at speeds up to 20 m/s using the AODV, DSDV, and DSR routing protocols. Traffic included both constant bit rate and variable bit rate sources. Results found that AODV had lower average end-to-end delay and higher packet delivery ratios than DSDV and DSR as the percentage of variable bit rate traffic increased. AODV also performed comparably under both low and high node mobility scenarios with hybrid traffic.
Performance measurement of different requirements engineeringiaemedu
This document summarizes a research paper that compares the performance of different requirements engineering (RE) process models. It describes three RE process models - two existing linear models and the authors' iterative model. It also reviews literature on common RE activities and issues with descriptive models not reflecting real-world practices. The authors conducted interviews at two Indian companies to model their RE processes and compare them to the three models. They found the existing linear models did not fully capture the iterative nature of observed RE processes.
This document proposes a mobile safety system for automobiles that uses the Android operating system. The system has two main components: a safety device and an automobile base unit. The safety device allows users to monitor the vehicle's location on a map, check its status, and control functions remotely. It communicates with the base unit in the vehicle using GPRS. The base unit collects data from sensors, determines the vehicle's GPS location, and can execute control commands like activating the brakes or switching off the engine. The document provides details on the design and algorithms of both components and includes examples of Java code implementation. The goal is to create an intelligent, secure and easy-to-use mobile safety system for vehicles using embedded systems and Android.
Efficient text compression using special character replacementiaemedu
The document describes a proposed algorithm for efficient text compression using special character replacement and space removal. The algorithm replaces words with non-printable ASCII characters or combinations of characters to compress text files. It uses a dynamic dictionary to map words to their symbols. Spaces are removed from the compressed file in some cases to further reduce file size. Experimental results show the algorithm achieves better compression ratios than LZW, WinZip 10.0 and WinRAR 3.93 for various text file types while allowing lossless decompression.
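The word-replacement idea can be sketched as below: frequent words are mapped to single unused low-ASCII bytes via a dynamically built dictionary. This is a simplified toy; the actual algorithm also removes spaces and uses multi-character codes:

```python
def build_dictionary(text, code_start=0x01, max_codes=20):
    """Map the most frequent words to single non-printable ASCII characters."""
    tokens = text.split()
    words = sorted(set(tokens), key=tokens.count, reverse=True)
    return {w: chr(code_start + i) for i, w in enumerate(words[:max_codes])}

def compress(text, table):
    return " ".join(table.get(w, w) for w in text.split())

def decompress(data, table):
    rev = {code: word for word, code in table.items()}
    return " ".join(rev.get(t, t) for t in data.split())

text = "the cat sat on the mat"
table = build_dictionary(text)
packed = compress(text, table)
print(len(packed), "<", len(text))                 # each word shrinks to 1 byte
print(decompress(packed, table) == text)           # lossless round trip
```

Decompression is lossless because the dictionary is a bijection between words and codes; shipping the dictionary with the file is the usual cost of such schemes.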
The document discusses agile programming and proposes a new methodology. It provides an overview of existing agile methodologies like Scrum and Extreme Programming. Scrum uses short sprints to define tasks and deadlines. Extreme Programming focuses on practices like test-first development, pair programming, and continuous integration. The document notes drawbacks like an inability to support large or multi-site projects. It proposes designing a new methodology that combines the advantages of existing methods while overcoming their deficiencies.
Adaptive load balancing techniques in global scale grid environmentiaemedu
The document discusses various adaptive load balancing techniques for distributed applications in grid environments. It first describes adaptive mesh refinement algorithms that partition computational domains using space-filling curves or by distributing grids independently or at different levels. It also discusses dynamic load balancing using tiling and multi-criteria geometric partitioning. The document then covers repartitioning algorithms based on multilevel diffusion and the adaptive characteristics of structured adaptive mesh refinement applications. Finally, it discusses adaptive workload balancing on heterogeneous resources by benchmarking resource characteristics and estimating application parameters to find optimal load distribution.
A survey on the performance of job scheduling in workflow applicationiaemedu
This document summarizes a survey on job scheduling performance in workflow applications on grid platforms. It discusses an adaptive dual objective scheduling (ADOS) algorithm that takes both completion time and resource usage into account for measuring schedule performance. The study shows ADOS delivers good performance in completion time, resource usage, and robustness to changes in resource performance. It also describes the system architecture used, which includes a planner and executor component. The planner focuses on scheduling to minimize completion time while considering resource usage, and can reschedule if needed. The executor enacts the schedule on the grid resources.
A survey of mitigating routing misbehavior in mobile ad hoc networksiaemedu
This document summarizes existing methods to detect misbehavior in mobile ad hoc networks (MANETs). It discusses how routing protocols assume nodes will cooperate fully, but misbehavior like packet dropping can occur. It describes several techniques to detect misbehavior, including watchdog, ACK/SACK, TWOACK, S-TWOACK, and credit-based/reputation-based schemes. Credit-based schemes use virtual currencies to provide incentives for nodes to forward packets, while reputation-based schemes track nodes' past behaviors. The document aims to survey approaches for mitigating the impact of misbehaving nodes in MANET routing.
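A reputation-based scheme of the kind surveyed above can be sketched as a per-node table of observed forwarding behavior; routes through nodes whose reputation falls below a threshold are avoided. Class name, threshold, and scoring are illustrative assumptions, not a specific protocol:

```python
class ReputationTable:
    """Toy reputation tracker for neighbour nodes in a MANET."""

    def __init__(self, threshold=0.5):
        self.records = {}          # node -> [packets forwarded, packets dropped]
        self.threshold = threshold

    def observe(self, node, forwarded):
        """Record one watchdog observation of `node` (True = it forwarded)."""
        rec = self.records.setdefault(node, [0, 0])
        rec[0 if forwarded else 1] += 1

    def reputation(self, node):
        f, d = self.records.get(node, (0, 0))
        return f / (f + d) if f + d else 1.0   # unobserved nodes start trusted

    def is_misbehaving(self, node):
        return self.reputation(node) < self.threshold

table = ReputationTable()
table.observe("n1", True)
table.observe("n1", False)
table.observe("n1", False)       # n1 dropped 2 of 3 packets
print(table.is_misbehaving("n1"))  # → True
```

Credit-based schemes replace the ratio with a virtual-currency balance, but the routing decision — exclude nodes whose score is too low — has the same shape.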
A novel approach for satellite imagery storage by classifyiaemedu
This document presents a novel approach for classifying and storing satellite imagery by detecting and storing only non-duplicate regions. It uses kernel principal component analysis to reduce the dimensionality and extract features of satellite images. Fuzzy N-means clustering is then used to segment the images into blocks. A duplication detection algorithm compares blocks to identify duplicate and non-duplicate regions. Only the non-duplicate regions are stored in the database, improving storage efficiency and updating speed compared to completely replacing existing images. Support vector machines are used to categorize the non-duplicate blocks into the appropriate classes in the existing images.
A self recovery approach using halftone images for medical imageryiaemedu
This document summarizes a proposed approach for securely transferring medical images over the internet using visual cryptography and halftone images. The approach uses error diffusion techniques to generate a halftone host image from the grayscale medical image. Shadow images are then created from the halftone host image using visual cryptography algorithms. When stacked together, the shadow images reveal the secret medical image. The halftone host image also contains an embedded logo that can be extracted to verify the integrity of the reconstructed image without a trusted third party.