J. Anurag, P. Nupur and S. S. Agrawal
School of Information Technology, Guru Gobind Singh Indraprastha University, Delhi, India
Centre for Development of Advanced Computing, Noida, India
Taking into account communities of practice’s specific vocabularies in inform...inscit2006
L. Damas and C. Million-Rousseau
Condillac Group, LISTIC, Université de Savoie. 73370 Le Bourget du Lac, France
Ontologos Corp. 6, route de Nanfray, 74000 Cran-Gevrier, France
Words can have more than one distinct meaning, and many words can be interpreted in multiple ways
depending on the context in which they occur. Polysemy poses challenges to Natural Language Processing
(NLP) systems, and automatically identifying the meaning of a polysemous word in a sentence is a
fundamental NLP task. There have been many efforts on word sense disambiguation (WSD) for English;
however, comparatively little work has been done for Amharic. Many NLP applications, such as Machine
Translation, Information Retrieval, Question Answering, and Information Extraction, require this task,
which operates at the semantic level.
In this thesis, a knowledge-based word sense disambiguation method that employs Amharic WordNet
is developed. Knowledge-based Amharic WSD extracts knowledge from word definitions and from relations
among words and senses. The proposed system consists of preprocessing, morphological analysis and
disambiguation components, in addition to the Amharic WordNet database. Preprocessing prepares the
input sentence for morphological analysis, and morphological analysis reduces the various forms
of a word to a single root or stem. Amharic WordNet contains words along with their different
meanings, synsets and the semantic relations between concepts. Finally, the disambiguation component
identifies the ambiguous words in a sentence and assigns each one its appropriate sense using Amharic
WordNet, based on sense overlap and related words.
We evaluated the knowledge-based Amharic WSD system that uses Amharic WordNet in two experiments.
The first evaluates the effect of using Amharic WordNet with and without the morphological analyzer,
and the second determines an optimal window size for Amharic WSD. With and without the morphological
analyzer, we achieved accuracies of 57.5% and 80%, respectively. In the second experiment, we found
that a two-word window on each side of the ambiguous word is enough for Amharic WSD. The test results
show that the proposed WSD methods perform better than previous Amharic WSD methods.
Keywords: Natural Language Processing, Amharic WordNet, Word Sense Disambiguation,
Knowledge Based Approach, Lesk Algorithm
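The disambiguation step described above (sense overlap within a window of two words on each side of the ambiguous word) has the general shape of the simplified Lesk algorithm. Below is a minimal Python sketch under stated assumptions: a tiny hypothetical English sense inventory (`SENSES`, with made-up sense IDs such as `bank.river`) stands in for Amharic WordNet, and there is no morphological analysis, only lowercasing and a small hypothetical stopword list.

```python
import re

# Hypothetical toy sense inventory standing in for Amharic WordNet:
# each sense of an ambiguous word maps to a short gloss.
SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits of money and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    },
}

STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "by", "on", "at", "in"}


def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def context_window(words, index, size):
    """Up to `size` words on each side of position `index`."""
    return words[max(0, index - size):index] + words[index + 1:index + 1 + size]


def simplified_lesk(sentence, target, window=2):
    """Return the sense whose gloss shares the most words with the context window."""
    words = tokens(sentence)
    context = set(context_window(words, words.index(target), window))
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[target].items():
        overlap = len(context & set(tokens(gloss)))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense
```

For "He sat on the bank of the river fishing", the window around "bank" contains "river", which overlaps only the `bank.river` gloss, so that sense is chosen. The `window` parameter is the quantity the second experiment tunes.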
In this presentation we discuss several concepts, including word representation using SVD as well as neural-network-based techniques. We also cover core concepts such as cosine similarity and atomic and distributed representations.
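The contrast between atomic and distributed representations shows up directly in cosine similarity. In the minimal Python sketch below, which uses a hypothetical four-sentence corpus, one-hot (atomic) vectors make any two distinct words orthogonal. Co-occurrence count vectors, the matrix that SVD would then factorize into dense vectors, let words that appear in similar contexts score as similar. The SVD step itself is omitted to keep the example dependency-free.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def one_hot(word, vocab):
    """Atomic representation: a single 1 at the word's own index."""
    return [1.0 if w == word else 0.0 for w in vocab]


def cooccurrence_vector(word, sentences, vocab, window=1):
    """Count-based representation: how often each vocab word appears near `word`."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == word:
                counts.update(sent[max(0, i - window):i] + sent[i + 1:i + 1 + window])
    return [float(counts[w]) for w in vocab]
```

With the corpus `[["i","like","tea"], ["i","like","coffee"], ["i","drink","tea"], ["i","drink","coffee"]]`, the one-hot vectors for "tea" and "coffee" have cosine 0.0, while their co-occurrence vectors are identical and score approximately 1.0.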
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...cscpconf
Source and target word segmentation and alignment is a primary step in the statistical learning of a transliteration. Here, we analyze the benefit of a syllable-like segmentation approach for learning a transliteration from English to an Indic language, which aligns the training-set word pairs in terms of sub-syllable-like units instead of individual character units. While this has been found useful for handling out-of-vocabulary words in English-Chinese transliteration in the presence of multiple target dialects, we asked whether this would also hold for Indic languages, which are simpler in their phonetic representation and pronunciation. We expected this syllable-like method to perform marginally better. Instead, we found that although our proposed approach improved Top-1 accuracy, the individual-character-unit alignment model somewhat outperformed our approach when the Top-10 results of the system were re-ranked using language modeling approaches. Our experiments were conducted for English-to-Telugu transliteration (our method will apply equally well to most written Indic languages). Our training consisted of a syllable-like segmentation and alignment of a large training set, on which we built a statistical model by modifying a previous character-level, maximum-entropy-based transliteration learning system due to Kumaran and Kellner. Our testing consisted of applying the same segmentation to a test English word, applying the model, and re-ranking the resulting top 10 Telugu words. We also describe how the dataset was created and selected, since standard datasets are not available.
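To make the difference between the two alignment units concrete, here is a minimal Python sketch of one possible syllable-like segmentation of an English word, next to the character-level baseline. The rule is a hypothetical stand-in, not the segmentation used in the paper: a unit is a run of consonants followed by a run of vowels, with trailing consonants forming a final unit. Only `a`, `e`, `i`, `o`, `u` count as vowels.

```python
import re

# Hypothetical rule: consonant run + vowel run, or a trailing consonant run.
UNIT = re.compile(r"[^aeiou]*[aeiou]+|[^aeiou]+")


def syllable_like_units(word):
    """Split an English word into syllable-like units (lowercased)."""
    return UNIT.findall(word.lower())


def character_units(word):
    """Baseline alignment units: one unit per character."""
    return list(word.lower())
```

For example, "ramesh" segments as `["ra", "me", "sh"]` and "Krishna" as `["kri", "shna"]`. Each unit would then be aligned with a chunk of the Telugu target word, whereas the character model aligns single letters.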
Pedagogical applications of corpus data for English for General and Specific ...Pascual Pérez-Paredes
FIAL (lecture open to researchers and students): "Pedagogical applications of corpus data for English for General and Specific Purposes", Wednesday 4 December, 12:45 (room ERAS 56). UCL, Louvain-la-Neuve
This presentation is about Word Sense Disambiguation. I have tried to explain word senses using the Python language, but this can also be done using NLTK.
Words and sentences are the basic units of text. In this lecture we discuss basic operations on words and sentences, such as tokenization, text normalization, tf-idf, cosine similarity measures, vector space models and word representation.
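As a small illustration of tf-idf weighting over tokenized text, the Python sketch below uses raw term counts for tf and an unsmoothed `log(N / df)` for idf. Many variants exist (smoothed idf, log-scaled tf, length normalization), and this choice is only one common textbook form.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights
```

A term that occurs in every document, such as "the", gets weight 0, while a term unique to one document gets the highest idf, `log(N)`.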
Abstract
Part-of-speech (POS) tagging plays an important role in developing natural language processing software. POS tagging means assigning a part-of-speech tag to each word of a sentence: the tagger takes a sentence as input and assigns the appropriate tag to each word in it. In this article I survey the work that has been done on Odia POS tagging.
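A common baseline against which POS taggers are compared is the most-frequent-tag (unigram) tagger: each word gets the tag it carried most often in training data. The Python sketch below assumes a tiny hypothetical English training set with a made-up tagset (`DET`, `NOUN`, `VERB`) in place of Odia data, and tags unseen words as `NOUN` by default.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it carried most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(sentence, model, default="NOUN"):
    """Assign each token its most frequent training tag; unseen words get `default`."""
    return [(word, model.get(word.lower(), default)) for word in sentence]
```

If "runs" appears twice as `VERB` and once as `NOUN` in training, the baseline always tags it `VERB`. Context-sensitive taggers (HMM, CRF and similar) are needed to recover the noun reading.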
Improving Document Clustering by Eliminating Unnatural Language (Jinho Choi)
Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important source of confusion for existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model that classifies unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various formats (PPT, PDF, and HTML), in which unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering of up to 15%. Our corpus and tool are publicly available.
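To show what "features available from plain text" can look like, here is a hypothetical Python sketch. It computes character-ratio features per line and applies a crude threshold rule that flags lines dominated by non-letter characters. This is not the paper's model. The paper trains a statistical four-class classifier, while this rule is binary and its thresholds (`alpha_threshold`, `0.15`) are arbitrary assumptions.

```python
# Characters typical of formulas, code and table borders (hypothetical choice).
SYMBOLS = set("=+-*/<>{}()[];_^|&")


def line_features(line):
    """Plain-text features that tend to separate prose from tables, formulas or code."""
    chars = [c for c in line if not c.isspace()]
    n = len(chars) or 1  # avoid division by zero on blank lines
    return {
        "alpha_ratio": sum(c.isalpha() for c in chars) / n,
        "digit_ratio": sum(c.isdigit() for c in chars) / n,
        "symbol_ratio": sum(c in SYMBOLS for c in chars) / n,
    }


def is_unnatural(line, alpha_threshold=0.7):
    """Crude stand-in rule: flag lines with few letters or many symbols."""
    f = line_features(line)
    return f["alpha_ratio"] < alpha_threshold or f["symbol_ratio"] > 0.15
```

In a trained setup, the feature dictionaries would instead be fed to a multiclass classifier, and the lines it flags would be dropped before clustering.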
In this talk I intend to review some basic and high-level concepts such as formal languages, grammars and ontologies: languages to transmit knowledge from a sender to a receiver; grammars to formally specify languages; ontologies as formal specifications of specific knowledge domains. After this introductory review, highlighting the role of each of those elements in the context of computer-based problem solving (programming), I will talk about a project aimed at automatically inferring and generating a grammar for a Domain Specific Language (DSL) from a given ontology that describes that specific domain. The transformation rules will be presented, and the system Onto2Gra, which fully implements this "Ontological approach for DSL development", will be introduced.
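The idea of deriving grammar productions from an ontology can be sketched in a few lines of Python. The ontology format (`ONTOLOGY`, with `properties` and `relations`) and the one-production-per-concept rule below are hypothetical illustrations, not Onto2Gra's actual transformation rules. Each data property becomes a `'name' '=' VALUE` item, and each related concept becomes a repeatable nonterminal.

```python
# Hypothetical ontology: each concept lists its data properties and the
# concepts it is related to.
ONTOLOGY = {
    "Library": {"properties": ["name"], "relations": ["Book"]},
    "Book": {"properties": ["title", "year"], "relations": []},
}


def ontology_to_grammar(ontology):
    """Derive one production per concept (illustrative rules only)."""
    rules = []
    for concept, spec in ontology.items():
        parts = [f"'{concept.lower()}'", "'{'"]
        parts += [f"'{p}' '=' VALUE" for p in spec["properties"]]
        parts += [f"{r}*" for r in spec["relations"]]  # zero or more related items
        parts.append("'}'")
        rules.append(f"{concept} -> {' '.join(parts)}")
    return rules
```

For the toy ontology this yields `Library -> 'library' '{' 'name' '=' VALUE Book* '}'`, so a DSL sentence such as `library { name = ... book { ... } }` becomes derivable.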
AN HYBRID APPROACH TO WORD SENSE DISAMBIGUATION WITH AND WITH...ijnlc
Word Sense Disambiguation (WSD) is the classification of a word's meaning in a precise context. It is a tricky task in Natural Language Processing and, because of its semantic nature, it is used in applications such as machine translation, information extraction and retrieval, and automatic or closed-domain question answering systems. Researchers have tried unsupervised and knowledge-based learning approaches, but these have not proved very helpful. Various supervised learning algorithms have been developed, but with little success, because creating the training corpus, a sense-tagged corpus, is difficult. This paper presents a hybrid approach for resolving ambiguity in a sentence, based on integrating lexical knowledge and world knowledge. The English WordNet developed at Princeton University, the SemCor corpus and the JAWS library (Java API for WordNet Searching) have been used for this purpose.
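One simple way to combine lexical knowledge (gloss overlap) with corpus evidence (sense frequencies of the kind SemCor provides) is a weighted score. The Python sketch below is a hypothetical illustration, not the paper's method, and it does not call WordNet or JAWS. The glosses (`GLOSSES`), sense counts (`SENSE_COUNTS`) and the `weight` of 0.5 are made-up values. When the context overlaps no gloss, the frequency prior decides.

```python
# Hypothetical resources: gloss word sets (lexical knowledge) and sense counts
# from a sense-tagged corpus (standing in for SemCor frequencies).
GLOSSES = {
    "bass.fish": {"freshwater", "fish", "caught", "lake", "river"},
    "bass.music": {"low", "voice", "instrument", "guitar", "music"},
}
SENSE_COUNTS = {"bass.fish": 3, "bass.music": 7}


def hybrid_sense(context_words, weight=0.5):
    """Score = gloss overlap + weight * relative sense frequency; highest score wins."""
    total = sum(SENSE_COUNTS.values())

    def score(sense):
        overlap = len(GLOSSES[sense] & set(context_words))
        return overlap + weight * SENSE_COUNTS[sense] / total

    return max(GLOSSES, key=score)
```

With a weight below 1, the prior only breaks ties or fills in for missing overlap, so a single matching gloss word still outweighs it.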
Natural language processing with python and amharic syntax parse tree by dani...Daniel Adenew
Natural Language Processing is an interdisciplinary field that gives computers the capability of communicating as human beings do. Amharic language processing has improved considerably over time, thanks to researchers at the PhD and MSc levels at AAU. Here, I have tried to study the problem and come up with a limited-scope solution that performs syntax parsing for the Amharic language and draws syntax parse trees using Python.
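A tiny recursive-descent parser shows how a parse tree can be produced for Amharic's subject-object-verb (SOV) order in plain Python. The grammar (`S -> NP VP`, `VP -> NP V | V`, `NP -> N`) and the three-word romanized lexicon are hypothetical and purely illustrative. They are not the grammar from this presentation. The example sentence "Abebe bet sera" is meant roughly as "Abebe built a house".

```python
# Hypothetical toy lexicon (romanized Amharic, illustrative only) and SOV grammar.
LEXICON = {"abebe": "N", "bet": "N", "sera": "V"}
GRAMMAR = {"S": [["NP", "VP"]], "VP": [["NP", "V"], ["V"]], "NP": [["N"]]}


def parse(symbol, words, i):
    """Yield (tree, next_index) for each way `symbol` can derive words starting at i."""
    if symbol not in GRAMMAR:  # preterminal: match one word by its lexical category
        if i < len(words) and LEXICON.get(words[i].lower()) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in expand(rhs, words, i):
            yield (symbol, *children), j


def expand(rhs, words, i):
    """Yield (children, next_index) for a sequence of symbols starting at i."""
    if not rhs:
        yield [], i
        return
    for first, j in parse(rhs[0], words, i):
        for rest, k in expand(rhs[1:], words, j):
            yield [first] + rest, k


def parse_sentence(words):
    """Return the first S tree covering all words, or None."""
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


def bracketed(tree):
    """Render a nested-tuple tree in bracket notation."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"
    return f"({label} " + " ".join(bracketed(c) for c in children) + ")"
```

`bracketed(parse_sentence(["Abebe", "bet", "sera"]))` gives `(S (NP (N Abebe)) (VP (NP (N bet)) (V sera)))`. A verb-first order such as `["sera", "Abebe"]` finds no parse and returns `None`.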
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor (Waqas Tariq)
In recent decades, speech interactive systems have gained increasing importance. The performance of an ASR system depends mainly on the availability of a large speech corpus. The conventional method of building a large-vocabulary speech recognizer for any language uses a top-down approach to speech. This approach requires a large speech corpus with sentence- or phoneme-level transcriptions of the speech utterances. The transcriptions must also include different speech orders so that the recognizer can build models for all the sounds present. For Telugu, however, because of the language's complex nature, a very large, well-annotated speech database is very difficult to build. It is very difficult, if not impossible, to cover all the words of any Indian language, where each word may have thousands or even millions of word forms. A significant part of the grammar that is handled by syntax in English (and similar languages) is handled within morphology in Telugu: phrases consisting of several words (that is, tokens) in English map onto a single word in Telugu. Telugu is phonetic in nature in addition to being morphologically rich. That is why speech technology developed for English cannot be applied directly to Telugu. This paper describes work carried out to build a voice-enabled text editor capable of automatic term suggestion. The main claim of the paper is a recognition enhancement process, developed by us, that suits highly inflecting, morphologically rich languages. This method increases speech recognition accuracy while greatly reducing the corpus size. It also adds Telugu words to the database dynamically, so that the corpus grows over time.
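The combination of automatic term suggestion and a vocabulary that grows dynamically can be sketched as a frequency-ranked prefix lookup in Python. This is a hypothetical illustration of the editor-side idea only. The class name `TermSuggester` and the romanized example words are assumptions, and no speech recognition is involved.

```python
from collections import Counter


class TermSuggester:
    """Prefix-based term suggestion over a vocabulary that grows as words are accepted."""

    def __init__(self, words=()):
        self.freq = Counter(words)

    def add(self, word):
        """Record an accepted word; previously unseen words join the vocabulary."""
        self.freq[word] += 1

    def suggest(self, prefix, k=3):
        """Return up to k known words starting with `prefix`, most frequent first."""
        matches = [w for w in self.freq if w.startswith(prefix)]
        return sorted(matches, key=lambda w: (-self.freq[w], w))[:k]
```

After `TermSuggester(["pustakam", "puvvu", "puvvu"])`, the prefix "pu" suggests `["puvvu", "pustakam"]`. Once `add("pustakalu")` is called, the new word also appears for the prefix "pus".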
Phonetic Recognition In Words For Persian Text To Speech Systems (paperpublications3)
Abstract: Interest in text-to-speech synthesis has increased worldwide. Text-to-speech systems have been developed for many popular languages, such as English, Spanish and French, and much research and development has been applied to those languages. Persian, on the other hand, has received little attention compared to other languages of similar importance, and research on Persian is still in its infancy. Persian has many difficulties and exceptions that increase the complexity of text-to-speech systems, for example, the absence of short vowels in written text and the existence of homographs. In this paper we propose a new method for Persian text-to-phonetic conversion based on pronunciation by analogy in words, semantic relations and grammatical rules for finding the proper phonetic transcription. Keywords: PbA, text to speech, Persian language, Phonetic recognition.
Title: Phonetic Recognition In Words For Persian Text To Speech Systems
Authors: Ahmad Musavi Nasab, Ali Joharpour
International Journal of Recent Research in Mathematics Computer Science and Information Technology (IJRRMCSIT)
Paper Publications
Automatic classification of bengali sentences based on sense definitions pres...ijctcm
Based on the sense definitions of words available in the Bengali WordNet, an attempt is made to classify
Bengali sentences automatically into different groups in accordance with their underlying senses. The input
sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL
project of the Govt. of India, while information about the different senses of a particular ambiguous lexical
item is collected from the Bengali WordNet. On an experimental basis, we used the Naive Bayes probabilistic
model as a classifier of sentences. We applied the algorithm to 1747 sentences that contain a
particular Bengali lexical item which, because of its ambiguous nature, can trigger different senses
that give the sentences different meanings. In our experiment we achieved around 84% accuracy in
sense classification over all input sentences. Our analysis of the residual sentences that were not
classified correctly, and thus affected the results, shows that in many cases wrong syntactic structures
and insufficient semantic information are the main hurdles in the semantic classification of sentences.
This study is relevant to automatic text classification, machine learning, information extraction, and
word sense disambiguation.
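A multinomial Naive Bayes sense classifier of the kind described here fits in a few lines of Python. The sketch below uses add-one (Laplace) smoothing and ignores words never seen in training. The training data in the usage note is a hypothetical English example (two senses of "plant") standing in for the Bengali corpus and its WordNet-derived senses.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_sentences):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in labelled_sentences:
        class_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify_nb(words, model):
    """Pick the sense maximising log P(sense) + sum of log P(word | sense), add-one smoothed."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for sense, count in class_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for w in words:
            if w in vocab:  # unseen words carry no evidence either way
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best
```

Trained on sentences such as `(["plant", "factory", "workers"], "industrial")` and `(["plant", "leaves", "water"], "botanical")`, the classifier assigns `["plant", "leaves", "sun"]` to the botanical sense, because "leaves" is far more likely under that class.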
Segmentation Words for Speech Synthesis in Persian Language Based On Silence (paperpublications3)
Abstract: In the speech synthesis stage of text-to-speech systems, words are usually broken into parts, and the recorded sound of each part is used to play back the words. This paper uses the silences in a word's pronunciation to improve speech quality. Most algorithms divide words into syllables, and some divide them into phonemes, but this paper takes advantage of silences in intonation: it divides words at silent regions and then assigns the equivalent sound to each part, so that joining the parts is reliable and the speech is smoother. This paper concerns Persian, but the approach is extendable to other languages. The method has been tested with an MOS test, and intelligibility, naturalness and fluidity are better.
Keywords: TTS, SBS, Syllable, Diphone.
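Dividing a recorded word at its silent regions requires first locating them. A common approach is short-time energy with a threshold. The Python sketch below shows that generic technique, not the paper's specific procedure. The frame length and threshold defaults are arbitrary assumptions, and samples are plain floats in the range [-1, 1].

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / len(samples[i:i + frame_len])
        for i in range(0, len(samples), frame_len)
    ]


def split_at_silence(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample ranges of voiced segments separated by silent frames."""
    segments, start = [], None
    for f, energy in enumerate(frame_energies(samples, frame_len)):
        pos = f * frame_len
        if energy >= threshold and start is None:
            start = pos  # a voiced region begins
        elif energy < threshold and start is not None:
            segments.append((start, pos))  # a silent frame closes it
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

For a signal of 8 voiced samples, 4 silent ones and 8 more voiced ones, `split_at_silence` returns `[(0, 8), (12, 20)]`. Each range would then be matched to its recorded unit before concatenation.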
Artificially Generated of Concatenative Syllable based Text to Speech Synthesi...iosrjce
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and to establish new collaborations in these areas. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Performance Calculation of Speech Synthesis Methods for Hindi language (iosrjce)
IOSR Journal of VLSI and Signal Processing (IOSRJVSP) is a double-blind peer-reviewed international journal that publishes articles which contribute new results in all areas of VLSI Design & Signal Processing. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on advanced VLSI Design & Signal Processing concepts and to establish new collaborations in these areas.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
Direct Punjabi to English Speech Translation using Discrete Units (IJCI JOURNAL)
Speech-to-speech translation has yet to reach the same level of coverage as text-to-text translation systems. Current speech technology is highly limited in its coverage of the more than 7000 languages spoken worldwide, leaving more than half of the world's population without access to such technology and shared experiences. With voice-assisted technology (such as social robots and speech-to-text apps) and auditory content (such as podcasts and lectures) on the rise, ensuring that the technology is available to all is more important than ever. Speech translation can play a vital role in mitigating technological disparity and creating a more inclusive society. To contribute to speech translation research for low-resource languages, our work presents a direct speech-to-speech translation model from Punjabi, an Indic language, to English. Additionally, we explore the performance of using a discrete representation of speech, called discrete acoustic units, as input to a Transformer-based translation model. The model, abbreviated as Unit-to-Unit Translation (U2UT), takes a sequence of discrete units of the source language (the language being translated from) and outputs a sequence of discrete units of the target language (the language being translated to). Our results show that the U2UT model outperforms the Speech-to-Unit Translation (S2UT) model by 3.69 BLEU points.
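In unit-based speech translation, discrete acoustic units are typically obtained by clustering frame-level speech features (for example, k-means over self-supervised representations) and replacing each frame with the ID of its nearest cluster centroid. Consecutive repeats are often collapsed. The Python sketch below shows only that quantization step under stated assumptions: the 2-D "features" and the centroids are hypothetical toy values, not learned ones, and no translation model is included.

```python
def nearest_unit(frame, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])),
    )


def to_discrete_units(frames, centroids, dedup=True):
    """Map feature frames to unit IDs, optionally collapsing consecutive repeats."""
    units = [nearest_unit(f, centroids) for f in frames]
    if dedup:
        units = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return units
```

With centroids `[[0, 0], [1, 1], [5, 5]]`, six frames that fall near centroids 0, 0, 1, 2, 2, 1 become the unit sequence `[0, 1, 2, 1]` after deduplication. A U2UT-style model would consume and emit sequences of this kind.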
Design and Development of a Malayalam to English Translator- A Transfer Based...Waqas Tariq
This paper describes a transfer based scheme for translating Malayalam, a Dravidian language, to English. This system inputs Malayalam sentences and outputs equivalent English sentences. The system comprises of a preprocessor for splitting the compound words, a morphological parser for context disambiguation and chunking, a syntactic structure transfer module and a bilingual dictionary. All the modules are morpheme based to reduce dictionary size. The system does not rely on a stochastic approach and it is based on a rule-based architecture along with various linguistic knowledge components of both Malayalam and English. The system uses two sets of rules: rules for Malayalam morphology and rules for syntactic structure transfer from Malayalam to English. The system is designed using artificial intelligence techniques.
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
The language is an effective medium for the communication that conveys the ideas and expression of the human
mind. There are more than 5000 languages in the world for the communication. To know all these languages is
not a solution for problems due to the language barrier in communication. In this multilingual world with the
huge amount of information exchanged between various regions and in different languages in digitized format,
it has become necessary to find an automated process to convert from one language to another. Natural
Language Processing (NLP) is one of the hot areas of research that explores how computers can be utilizing to
understand and manipulate natural language text or speech. In the Proposed system a Hybrid approach to
transliterate the proper nouns from Punjabi to Hindi is developed. Hybrid approach in the proposed system is a
combination of Direct Mapping, Rule based approach and Statistical Machine Translation approach (SMT).
Proposed system is tested on various proper nouns from different domains and accuracy of the proposed system
is very good.
A Context-based Numeral Reading Technique for Text to Speech Systems IJECEIAES
This paper presents a novel technique for context based numeral reading in Indian language text to speech systems. The model uses a set of rules to determine the context of the numeral pronunciation and is being integrated with the waveform concatenation technique to produce speech out of the input text in Indian languages. For this purpose, the three Indian languages Odia, Hindi and Bengali are considered. To analyze the performance of the proposed technique, a set of experiments are performed considering different context of numeral pronunciations and the results are compared with existing syllable-based technique. The results obtained from different experiments shows the effectiveness of the proposed technique in producing intelligible speech out of the entered text utterances compared to the existing technique even with very less storage and execution time.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.
The Evaluation of a Code-Switched Sepedi-English Automatic Speech Recognition...IJCI JOURNAL
Speech technology is a field that encompasses various techniques and tools used to enable machines to interact with speech, such as automatic speech recognition (ASR), spoken dialog systems, and others, allowing a device to capture spoken words through a microphone from a human speaker. End-to-end approaches such as Connectionist Temporal Classification (CTC) and attention-based methods are the most used for the development of ASR systems. However, these techniques were commonly used for research and development for many high-resourced languages with large amounts of speech data for training and evaluation, leaving low-resource languages relatively underdeveloped. While the CTC method has been successfully used for other languages, its effectiveness for the Sepedi language remains uncertain. In this study, we present the evaluation of the Sepedi-English code-switched automatic speech recognition system. This end-to-end system was developed using the Sepedi Prompted Code Switching corpus and the CTC approach. The performance of the system was evaluated using both the NCHLT Sepedi test corpus and the Sepedi Prompted Code Switching corpus. The model produced the lowest WER of 41.9%, however, the model faced challenges in recognizing the Sepedi only text.
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...IJERA Editor
Marathi is one of the oldest languages in India. This research paper describes the development of Marathi Textto-
Speech System (TTS). In Marathi TTS the input is Marathi text in Unicode. The voices are sampled from real
recorded speech. The objective of a text to speech system is to convert an arbitrary text into its corresponding
spoken waveform. Speech synthesis is a process of building machinery that can generate human-like speech
from any text input to imitate human speakers. Text processing and speech generation are two main components
of a text to speech system. To build a natural sounding speech synthesis system, it is essential that text
processing component produce an appropriate sequence of phonemic units. Generation of sequence of phonetic
units for a given standard word is referred to as letter to phoneme rule or text to phoneme rule. The
complexity of these rules and their derivation depends upon the nature of the language. The quality of a speech
synthesizer is judged by its closeness to the natural human voice and understandability. In this research paper we
described an approach to build a Marathi TTS system using concatenative synthesis method with syllable as a
basic unit of concatenation.
Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation Systemijtsrd
Sadhu and Cholit bhasha are two significant Bangladeshi languages. Sadhu was functional in ancient era and had Sanskrit components but in present era cholit took its place. There are many formal and legal paper works present in Sadhu language which direly need to be translated in Cholit because its more favorable and speaker friendly. Therefore, this paper dealt with this issue by familiarizing the current era with Sadhu by creating a software. Different sentences were chosen and final data set was obtained by Principal Component Analysis PCA . MATLAB and Python are used for different machine learning algorithms. Most work is being done using Scikit Learn and MATLAB machine learning toolbox. It was found that Linear Discriminant Analysis LDA functions best. Speed prediction was also done and values were determined through graphs. It was inferred that this categorizer efficiently translated all Sadhu words to Cholit precisely and in well structured way. Therefore, Sadhu will not remain a complex language in this decade. Nakib Aman Turzo | Pritom Sarker | Biplob Kumar "Interpretation of Sadhu into Cholit Bhasha by Cataloguing and Translation System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-3 , April 2020, URL: https://www.ijtsrd.com/papers/ijtsrd30792.pdf Paper Url :https://www.ijtsrd.com/engineering/computer-engineering/30792/interpretation-of-sadhu-into-cholit-bhasha-by-cataloguing-and-translation-system/nakib-aman-turzo
Improvement in Quality of Speech associated with Braille codes - A Review
Anurag Jain, Nupur Prakash
School of Information Technology, Guru Gobind Singh Indraprastha University, Delhi, India
S.S. Agrawal
Centre for Development of Advanced Computing, Noida, India