This paper presents the overall design and implementation of a DSP-based speech recognition and text conversion system. Speech is usually the preferred mode of interaction for human beings, and this paper presents a system for converting voice commands into text. We intended to perform the entire speech processing chain in real time, which involves simultaneously accepting input from the user and using software filters to analyse the data. The comparison was then established using correlation and µ-law companding techniques. In this paper, voice recognition is carried out using MATLAB. The voice commands are speaker-independent and are stored in the database with the help of the function keys. The real-time input speech is then processed in the speech recognition system, where the required features of the spoken words are extracted, filtered, and matched against the existing samples stored in the database. The required MATLAB processing is then applied to convert the received data into text form.
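The abstract above mentions µ-law companding and correlation-based matching. As a hedged sketch of those two steps (function names and the template-matching scheme are illustrative, not taken from the paper's MATLAB code):

```python
import numpy as np

MU = 255.0  # standard mu value in North American / Japanese telephony (ITU-T G.711)

def mu_law_compress(x, mu=MU):
    """Compand a signal in [-1, 1] with the mu-law curve."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Invert mu-law companding."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def best_match(sample, templates):
    """Pick the template with the highest normalized cross-correlation peak."""
    scores = []
    for t in templates:
        c = np.correlate(sample, t, mode="full")
        denom = np.linalg.norm(sample) * np.linalg.norm(t)
        scores.append(np.max(c) / denom if denom else 0.0)
    return int(np.argmax(scores))
```

Companding compresses the dynamic range before comparison, so quiet and loud renditions of the same word correlate more consistently; expansion is its exact inverse.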
Visual speech to text conversion applicable to telephone communication, by Swathi Venugopal
This document discusses a new system to convert visual-speech to text for deaf individuals using telephone communication. The system aims to automatically recognize Cued Speech gestures and convert them to text. Researchers extracted lip shapes and hand coordinates from video recordings of individuals performing Cued Speech. They used Hidden Markov Models and feature fusion to integrate lip shapes and hand gestures for isolated word and continuous phoneme recognition. The system achieved 86-94% accuracy for isolated words and 82-89% for continuous phonemes, indicating it can effectively convert visual-speech to text.
Our speech to text conversion project aims to help the nearly 20% of people worldwide with disabilities by allowing them to control their computer and share information using only their voice. The system uses acoustic and language models with a speech engine to recognize speech and convert it to text. It can perform operations like opening calculator and wordpad. Speech recognition has applications in areas like cars, healthcare, education and daily life. Accuracy depends on factors like vocabulary size, speaker dependence, and speech type (isolated, continuous). The system aims to improve accessibility while reducing costs.
This document describes a student project implementing speech recognition for desktop applications. It was completed by three students - Sarang Afle, Sneh Joshi, and Surbhi Sharma - for their computer science degree under the supervision of Professor Nitesh Rastogi. The project involved developing a speech recognition software that allows users to operate a computer through voice commands.
YouTube Link: https://youtu.be/sHeJgKBaiAI
This Edureka video on 'Speech Recognition in Python' covers the concepts of the speech recognition module in Python, with a program that uses speech recognition to translate speech into text. The following topics are discussed:
How Speech Recognition Works?
How To Install SpeechRecognition In Python?
Working With Microphones
How To Install Pyaudio In Python?
Use case
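The topics listed above follow the usual SpeechRecognition workflow. A minimal sketch, assuming `pip install SpeechRecognition PyAudio` (the import is guarded so the sketch degrades gracefully where the packages are absent, and the web-backed recognizer needs an internet connection):

```python
# Guarded import: the SpeechRecognition package may not be installed.
try:
    import speech_recognition as sr
    HAVE_SR = True
except ImportError:
    HAVE_SR = False

def transcribe_from_mic(timeout=5):
    """Listen on the default microphone and return the recognized text."""
    if not HAVE_SR:
        raise RuntimeError("SpeechRecognition / PyAudio not installed")
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source, timeout=timeout)
    try:
        return recognizer.recognize_google(audio)  # free web API, needs internet
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
```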
Speech Recognition Service (SPRS) is a mobile application capable of recognizing certain sounds. Thanks to the system's compatibility, users can run the application on any Android-supported mobile device, and SPRS is designed especially for deaf people.
The voice recognition service (VRS) recognizes sounds from daily life, especially around the home, such as the doorbell, the telephone, and the door. When a previously identified sound is heard, VRS offers the user a choice of notification options, such as sending a message to a mobile number or vibrating the device. VRS requires minimal knowledge of mobile phone use and has a simple, user-friendly Arabic interface.
This document includes a detailed description of system requirements both functional and non-functional, design models and description, functionalities of all system objects, and system testing so it could be used as a user manual for system users.
This report provides an overview of speech recognition technology, including how speech recognition systems work, common applications, and future uses. It discusses key concepts such as utterances, pronunciation, grammar, accuracy, and training. The report also examines speech recognition software and provides examples of free and commercial speech recognition programs. Overall, the report finds that speech recognition has various applications in fields like education, healthcare, gaming, and more, and the technology is expected to continue advancing to support additional future applications.
Intro to Auto Speech Recognition -- How ML Learns Speech-to-Text, by Yoshiyuki Igarashi
Automatic speech recognition (ASR) is the technology that converts speech to written text. There are two main approaches: static systems that use acoustic, pronunciation, and language models sequentially; and end-to-end neural networks that use deep neural networks for feature extraction, acoustic modeling, and language modeling. Challenges for ASR systems include noise, variations in accents and ages, transferring learning across dialects, and operating locally on devices without internet.
This document provides an overview of speech recognition technology. It defines key terms like utterances, pronunciation, and grammar. It describes how speech recognition works by explaining the acoustic model, grammar, and recognized text. It also discusses standards, performance measurement, and provides an example of Google Search by Voice.
The document discusses voice recognition systems and their key components. It describes:
1) Sphinx, an open source tool used for speech recognition that uses Hidden Markov Models and applies feature extraction, language modeling, and acoustic modeling.
2) The CMU lexical access system which hypothesizes words from a phonetic dictionary using syllable anchors.
3) Key parts of speech recognition systems including feature extraction, acoustic modeling, language modeling, and the use of HMMs to match features to models.
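Several of the entries above lean on Hidden Markov Models for matching features to models. As a hedged illustration of that decoding step, here is a minimal Viterbi decoder for a discrete-observation HMM; the probability matrices in the example are invented for demonstration, not taken from Sphinx:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for a discrete-observation HMM.
    obs: observation indices; pi: (S,) initial probs;
    A: (S, S) transition probs; B: (S, O) emission probs."""
    S, T = len(pi), len(obs)
    delta = np.zeros((T, S))           # best path probability ending in each state
    back = np.zeros((T, S), dtype=int) # backpointers for path recovery
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A        # (prev, cur) candidate scores
        back[t] = np.argmax(trans, axis=0)
        delta[t] = trans[back[t], np.arange(S)] * B[:, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                # trace backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In a real recognizer the observations are acoustic feature vectors and the states are sub-phone units, but the dynamic-programming recursion is the same.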
Skype Translator created buzz everywhere. Now you can embed the same speech-to-speech translation service in your own applications. How does it work, and what opportunities does it create for turning our visions of the future into the reality of today? A month ago Microsoft released a service that allows anyone to extend their solutions and apps with this capability. In this session you will learn how speech-to-speech translation works, and about companies and solutions that already use this capability.
The document outlines a proposed voice to text mobile application project with the following goals:
1) Develop a voice to text converter for mobile devices within 6 months to assist people with limited mobility or visual impairments.
2) The voice to text system aims to have an accuracy level of at least 80%.
3) The application will be sold to major mobile companies for 99p per installation.
The project was started with the sole aim that the design should be able to recognize a person's voice by analyzing the speech signal. The simulation is done in MATLAB. The design is based on applying linear prediction coefficients (LPC) and principal component analysis (PCA, via princomp) to the speech signal. Samples are collected by using a microphone to record male and female speech. After the program executes, the speech is analyzed by the analysis part of the MATLAB code, and the design should be able to judge whether the recorded speech signal matches the desired output.
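The entry above relies on MATLAB's princomp for PCA. As a rough NumPy equivalent of that step (the function name and return convention are only modeled on princomp's (coeff, score, latent) triple, not a faithful port):

```python
import numpy as np

def princomp_like(X):
    """PCA of row-observations X, in the spirit of MATLAB's princomp.
    Returns (coeff, score, latent):
    coeff  - principal component directions (columns),
    score  - data projected onto the components,
    latent - variance explained by each component, descending."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coeff = Vt.T
    score = Xc @ coeff
    latent = (s ** 2) / (len(X) - 1)            # SVD singular values -> variances
    return coeff, score, latent
```

In the project described above, the LPC feature vectors of recorded words would play the role of the rows of X, and the leading components give a compact representation to compare against the stored speaker's samples.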
Hugo Moreno discusses speech recognition and its applications in control. Speech recognition is the process of converting speech signals to sequences of words through computer algorithms. It involves feature extraction from speech and matching patterns to vocabularies. Speech recognition can be used for applications like elevator control, robot control, translation, stress monitoring, and hands-free computing. It provides an acceptable level of accuracy but improving accuracy reduces speed. Speech recognition involves matching voice patterns to acquire or provide vocabularies.
This document discusses automatic speech recognition, including defining the task, the main challenges, and common approaches. The key difficulties identified are digitizing speech, separating speech from noise, dealing with variability between individuals, identifying phonemes, disambiguating homophones, handling continuous speech, and interpreting prosodic features. Common approaches are template matching, rule-based systems, and statistical/machine learning methods like hidden Markov models. Remaining challenges include robustness, adaptability, language modeling, and handling spontaneous speech.
- Text-to-speech software was first created in Japan in the late 1960s and came to America in 1973 when Bell Atlantic developed their own version. It allowed printed text to be read aloud.
- Since the 1980s, development has focused on expanding what can be read and making the voices sound more natural. In recent years, the goal has been to make the voices as human-like as possible.
- Text-to-speech software can be used to read any text on a computer for students with disabilities or language barriers as an inclusion tool. It allows for adjustment of reading speed and voice used.
This document provides an outline on natural language processing and machine vision. It begins with an introduction to different levels of natural language analysis, including phonetic, syntactic, semantic, and pragmatic analysis. Phonetic analysis constructs words from phonemes using frequency spectrograms. Syntactic analysis builds a structural description of sentences through parsing. Semantic analysis generates a partial meaning representation from syntax, while pragmatic analysis uses context. The document also introduces machine vision as a technology using optical sensors and cameras for industrial quality control through detection of faults. It operates through sensing images, processing/analyzing images, and various applications.
Introduction to Natural Language Processing, by rohitnayak
Natural Language Processing has matured a lot recently. With the availability of great open source tools complementing the needs of the Semantic Web we believe this field should be on the radar of all software engineering professionals.
This document discusses language translation and provides an overview of a language translation tool. It begins with an introduction that defines translation and its objectives. It then discusses why translation is necessary in different contexts like education, business, and media. The document outlines the hardware, software, and development tools required for the language translation tool, including using Python and Visual Studio Code. It describes the methodology used in the tool, which utilizes the Googletrans library to implement Google Translate API. The modes of the translation tool include writing text, processing, output, and listening. The document concludes with discussing the future of translation and the benefits of language translators.
Voice morphing is a technique that modifies a source speaker's speech to sound like a target speaker. It does this by changing the pitch from the source speaker, like a male voice, to the target speaker, like a female voice. This is done by interpolating the linear predictive coding coefficients of the source and target signals. The pitch of the morphed signal can be positioned between the source and target by varying a constant value between 0 and 1. Applications include changing voices for security or entertainment purposes, but limitations include difficulties with voice detection and requiring extensive sound libraries.
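The interpolation described above, with a constant varying between 0 and 1, can be sketched in a few lines (a simplification of the technique, with an illustrative function name; real systems usually interpolate LSF/LSP parameters rather than raw LPC coefficients, since raw interpolation can yield an unstable filter):

```python
import numpy as np

def morph_lpc(a_src, a_tgt, alpha):
    """Linearly interpolate two LPC coefficient vectors.
    alpha=0 reproduces the source speaker, alpha=1 the target;
    values in between position the morph between the two voices."""
    a_src = np.asarray(a_src, dtype=float)
    a_tgt = np.asarray(a_tgt, dtype=float)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * a_src + alpha * a_tgt
```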
Speech recognition systems convert spoken words to text in real-time. They are used in dictation software and intelligent assistants. Design challenges include background noise, accent variations, and speed of speech. Speaker dependent systems recognize one voice, while speaker independent systems recognize any voice without training. Speech is broken into phonemes and a hidden Markov model identifies phonemes and language models recognize words. Components include signal analysis, acoustic and language models. Applications include healthcare, military, phones, and personal computers. Siri and Google Now are examples of intelligent assistants using these techniques.
This document discusses speech recognition technology. It begins by defining speech recognition as the process of converting spoken words to text. It then discusses some key companies in the space, including Nuance Communications which was founded in 1994 as a spinoff from SRI to commercialize speech recognition technology. The document also outlines some features and applications of Dragon speech recognition software, as well as limitations, opportunities, and the future of speech recognition technology.
Voice morphing is a technique that modifies a source speaker's speech to sound like it was spoken by a target speaker. It works by analyzing the source speech into an excitation signal and filter components, then resynthesizing it with the pitch and vocal characteristics of the target speaker. The key steps are detecting the pitches of the source and target speakers, scaling the source pitch to match the target, then resynthesizing the source speech using the target's vocal filter characteristics and the pitch-scaled excitation signal. Voice morphing was developed in 1999 and has applications in text-to-speech, dubbing, voice disguising, and public announcement systems.
This document summarizes a student's summer internship report on developing a language translation program. The report includes an introduction describing language translation, the objective of developing a simple and easy to use translation system, and a background section describing the importance of translation. It outlines the methodology, tools, and technologies used including hardware, software, and development tools. The report then describes the implementation and provides a conclusion on future improvements.
This document summarizes a seminar presentation on text-to-speech synthesis and voice stick devices. The presentation covers the introduction of speech synthesis and its challenges. It discusses the disadvantages of braille systems and introduces the voice stick device, how it works using optical character recognition to convert text to audio. The presentation discusses the working principles of text-to-speech systems and their architecture. It outlines the advantages of these systems and their applications in devices for the blind, smartphones, vehicles and more. The presentation concludes with a section on further research and development opportunities in this area.
This document describes how to build a simple automatic speaker recognition system. It discusses the principles of speaker recognition, which can be identification (determining which registered speaker is speaking) or verification (accepting or rejecting a speaker's claimed identity). The key components are feature extraction and feature matching. Feature extraction converts the speech waveform into features using techniques like MFCC. Feature matching then compares the extracted features to stored reference models to identify the speaker. The document focuses on the speech feature extraction process, which involves framing the speech signal, windowing frames, taking the FFT, and calculating MFCCs to characterize the signal in a way that mimics human hearing.
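The framing, windowing, and FFT steps described above can be sketched as follows (frame and hop lengths assume the common 25 ms / 10 ms convention at 16 kHz; the mel filterbank and cepstral steps that complete MFCC extraction are omitted for brevity):

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def power_spectrum(frames, nfft=512):
    """Hamming-window each frame and take the magnitude-squared FFT."""
    windowed = frames * np.hamming(frames.shape[1])  # taper frame edges
    return np.abs(np.fft.rfft(windowed, nfft)) ** 2
```

For a pure 1 kHz tone sampled at 16 kHz, the spectral peak of every frame lands near FFT bin 32 (1000 / 16000 x 512), which is a quick sanity check on the pipeline.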
This document presents a voice-based billing system that takes voice input from customers on purchased items and quantities and generates an itemized bill. It consists of three main modules: 1) A speech-to-text module that converts voice input into text using Google APIs. 2) A word tokenization module that splits the text into individual words using NLTK. 3) A bill generation module that takes the tokenized words as input to calculate the total bill amount. The system was tested on purchasing various fruits and achieved 90% accuracy compared to existing systems. It aims to reduce time complexity for billing compared to manual entry.
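The tokenization and bill-generation modules described above can be sketched as below (the price table, number words, and parsing scheme are invented for illustration; the original uses Google APIs for speech-to-text and NLTK for tokenization, where plain `str.split` stands in here):

```python
PRICES = {"apple": 30, "banana": 10, "mango": 50}        # illustrative unit prices
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def bill_from_utterance(text):
    """Parse '<qty> <item>' pairs from transcribed speech and total the bill.
    Returns (items, total) where items are (name, qty, cost) tuples."""
    tokens = text.lower().split()     # stand-in for NLTK's word_tokenize
    items, total, qty = [], 0, None
    for tok in tokens:
        if tok in WORD_NUMBERS:
            qty = WORD_NUMBERS[tok]   # remember the quantity for the next item
        elif tok.rstrip("s") in PRICES and qty is not None:
            name = tok.rstrip("s")    # crude plural handling
            cost = qty * PRICES[name]
            items.append((name, qty, cost))
            total += cost
            qty = None
    return items, total
```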
Performance Calculation of Speech Synthesis Methods for Hindi language, by iosrjce
The document compares the performance of two speech synthesis methods - unit selection and hidden Markov model (HMM) - for Hindi language. It finds that unit selection results in higher quality synthesized speech than HMM based on both subjective and objective quality measurements. Subjective measurements using mean opinion scores show unit selection receives higher average ratings. Objective measurements of mean square error and peak signal-to-noise ratio also indicate unit selection introduces less distortion compared to the original speech samples.
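The objective measures named above, mean square error and peak signal-to-noise ratio, are straightforward to compute; a minimal sketch (assuming signals normalized to a peak amplitude of 1, with higher PSNR meaning less distortion relative to the original samples):

```python
import numpy as np

def mse(ref, test):
    """Mean square error between a reference signal and a test signal."""
    return float(np.mean((np.asarray(ref) - np.asarray(test)) ** 2))

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    e = mse(ref, test)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)
```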
The document discusses voice recognition systems and their key components. It describes:
1) Sphinx, an open source tool used for speech recognition that uses Hidden Markov Models and applies feature extraction, language modeling, and acoustic modeling.
2) The CMU lexical access system which hypothesizes words from a phonetic dictionary using syllable anchors.
3) Key parts of speech recognition systems including feature extraction, acoustic modeling, language modeling, and the use of HMMs to match features to models.
Skype Translator created buzz all over the place. Now you can embed same speech to speech translation service in Your applications. How does it work and what opportunities it creates for us to turn our visions of the future to reality of Today. Month ago Microsoft released a service, that allows anyone to extend their solutions and apps with such capability. In this session you will learn how Speech to Speech translations work. And will learn about companies and solutions that already use this capability.
The document outlines a proposed voice to text mobile application project with the following goals:
1) Develop a voice to text converter for mobile devices within 6 months to assist people with limited mobility or visual impairments.
2) The voice to text system aims to have an accuracy level of at least 80%.
3) The application will be sold to major mobile companies for 99p per installation.
The project was started with a sole aim in mind that the design should be able to recognize the voice of a person by analyzing the speech signal. The simulation is done in MATLAB. The design of the project is based on using the Linear prediction filter coefficient (LPC) and Principal component analysis (PCA) on data (princomp) for the speech signal analysis. The Sample Collection process is accomplished by using the microphone to record the speech of male/female. After executing the program the speech is analyzed by the analysis part of our MATLAB program code and our design should be able to identify and give the judgment that the recorded speech signal is same as that of our desired output.
Hugo Moreno discusses speech recognition and its applications in control. Speech recognition is the process of converting speech signals to sequences of words through computer algorithms. It involves feature extraction from speech and matching patterns to vocabularies. Speech recognition can be used for applications like elevator control, robot control, translation, stress monitoring, and hands-free computing. It provides an acceptable level of accuracy but improving accuracy reduces speed. Speech recognition involves matching voice patterns to acquire or provide vocabularies.
This document discusses automatic speech recognition, including defining the task, the main challenges, and common approaches. The key difficulties identified are digitizing speech, separating speech from noise, dealing with variability between individuals, identifying phonemes, disambiguating homophones, handling continuous speech, and interpreting prosodic features. Common approaches are template matching, rule-based systems, and statistical/machine learning methods like hidden Markov models. Remaining challenges include robustness, adaptability, language modeling, and handling spontaneous speech.
- Text-to-speech software was first created in Japan in the late 1960s and came to America in 1973 when Bell Atlantic developed their own version. It allowed printed text to be read aloud.
- Since the 1980s, development has focused on expanding what can be read and making the voices sound more natural. In recent years, the goal has been to make the voices as human-like as possible.
- Text-to-speech software can be used to read any text on a computer for students with disabilities or language barriers as an inclusion tool. It allows for adjustment of reading speed and voice used.
This document provides an outline on natural language processing and machine vision. It begins with an introduction to different levels of natural language analysis, including phonetic, syntactic, semantic, and pragmatic analysis. Phonetic analysis constructs words from phonemes using frequency spectrograms. Syntactic analysis builds a structural description of sentences through parsing. Semantic analysis generates a partial meaning representation from syntax, while pragmatic analysis uses context. The document also introduces machine vision as a technology using optical sensors and cameras for industrial quality control through detection of faults. It operates through sensing images, processing/analyzing images, and various applications.
Introduction to Natural Language Processingrohitnayak
Natural Language Processing has matured a lot recently. With the availability of great open source tools complementing the needs of the Semantic Web we believe this field should be on the radar of all software engineering professionals.
This document discusses language translation and provides an overview of a language translation tool. It begins with an introduction that defines translation and its objectives. It then discusses why translation is necessary in different contexts like education, business, and media. The document outlines the hardware, software, and development tools required for the language translation tool, including using Python and Visual Studio Code. It describes the methodology used in the tool, which utilizes the Googletrans library to implement Google Translate API. The modes of the translation tool include writing text, processing, output, and listening. The document concludes with discussing the future of translation and the benefits of language translators.
Voice morphing is a technique that modifies a source speaker's speech to sound like a target speaker. It does this by changing the pitch from the source speaker, like a male voice, to the target speaker, like a female voice. This is done by interpolating the linear predictive coding coefficients of the source and target signals. The pitch of the morphed signal can be positioned between the source and target by varying a constant value between 0 and 1. Applications include changing voices for security or entertainment purposes, but limitations include difficulties with voice detection and requiring extensive sound libraries.
Speech recognition systems convert spoken words to text in real-time. They are used in dictation software and intelligent assistants. Design challenges include background noise, accent variations, and speed of speech. Speaker dependent systems recognize one voice, while speaker independent systems recognize any voice without training. Speech is broken into phonemes and a hidden Markov model identifies phonemes and language models recognize words. Components include signal analysis, acoustic and language models. Applications include healthcare, military, phones, and personal computers. Siri and Google Now are examples of intelligent assistants using these techniques.
This document discusses speech recognition technology. It begins by defining speech recognition as the process of converting spoken words to text. It then discusses some key companies in the space, including Nuance Communications which was founded in 1994 as a spinoff from SRI to commercialize speech recognition technology. The document also outlines some features and applications of Dragon speech recognition software, as well as limitations, opportunities, and the future of speech recognition technology.
Voice morphing is a technique that modifies a source speaker's speech to sound like it was spoken by a target speaker. It works by analyzing the source speech into an excitation signal and filter components, then resynthesizing it with the pitch and vocal characteristics of the target speaker. The key steps are detecting the pitches of the source and target speakers, scaling the source pitch to match the target, then resynthesizing the source speech using the target's vocal filter characteristics and the pitch-scaled excitation signal. Voice morphing was developed in 1999 and has applications in text-to-speech, dubbing, voice disguising, and public announcement systems.
This document summarizes a student's summer internship report on developing a language translation program. The report includes an introduction describing language translation, the objective of developing a simple and easy to use translation system, and a background section describing the importance of translation. It outlines the methodology, tools, and technologies used including hardware, software, and development tools. The report then describes the implementation and provides a conclusion on future improvements.
This document summarizes a seminar presentation on text-to-speech synthesis and voice stick devices. The presentation covers the introduction of speech synthesis and its challenges. It discusses the disadvantages of braille systems and introduces the voice stick device, how it works using optical character recognition to convert text to audio. The presentation discusses the working principles of text-to-speech systems and their architecture. It outlines the advantages of these systems and their applications in devices for the blind, smartphones, vehicles and more. The presentation concludes with a section on further research and development opportunities in this area.
This document describes how to build a simple automatic speaker recognition system. It discusses the principles of speaker recognition, which can be identification (determining which registered speaker is speaking) or verification (accepting or rejecting a speaker's claimed identity). The key components are feature extraction and feature matching. Feature extraction converts the speech waveform into features using techniques like MFCC. Feature matching then compares the extracted features to stored reference models to identify the speaker. The document focuses on the speech feature extraction process, which involves framing the speech signal, windowing frames, taking the FFT, and calculating MFCCs to characterize the signal in a way that mimics human hearing.
This document presents a voice-based billing system that takes voice input from customers on purchased items and quantities and generates an itemized bill. It consists of three main modules: 1) A speech-to-text module that converts voice input into text using Google APIs. 2) A word tokenization module that splits the text into individual words using NLTK. 3) A bill generation module that takes the tokenized words as input to calculate the total bill amount. The system was tested on purchasing various fruits and achieved 90% accuracy compared to existing systems. It aims to reduce time complexity for billing compared to manual entry.
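The tokenization and bill-generation modules might look like the following sketch. The price list and number words are hypothetical, and plain whitespace splitting stands in for NLTK's tokenizer and for the Google speech-to-text front end described in the summary:

```python
# Hypothetical catalogue; the paper's actual item prices are not given.
PRICES = {"apple": 30, "banana": 10, "mango": 50}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def generate_bill(transcript):
    """Tokenize recognized text and total up (item, quantity) pairs."""
    tokens = transcript.lower().split()  # stand-in for nltk.word_tokenize
    items, total, qty = [], 0, 1
    for tok in tokens:
        if tok in NUMBER_WORDS:
            # A quantity word applies to the next recognized item.
            qty = NUMBER_WORDS[tok]
        elif tok in PRICES:
            line_total = qty * PRICES[tok]
            items.append((tok, qty, line_total))
            total += line_total
            qty = 1  # reset after each item
    return items, total
```

For example, a transcript like "two apple and three banana" would be itemized line by line and summed into a single bill total.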
Performance Calculation of Speech Synthesis Methods for Hindi language (iosrjce)
The document compares the performance of two speech synthesis methods - unit selection and hidden Markov model (HMM) - for Hindi language. It finds that unit selection results in higher quality synthesized speech than HMM based on both subjective and objective quality measurements. Subjective measurements using mean opinion scores show unit selection receives higher average ratings. Objective measurements of mean square error and peak signal-to-noise ratio also indicate unit selection introduces less distortion compared to the original speech samples.
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes... (IJERA Editor)
Marathi is one of the oldest languages in India. This research paper describes the development of a Marathi Text-to-Speech (TTS) system. The input to the Marathi TTS is Marathi text in Unicode, and the voices are sampled from real recorded speech. The objective of a text-to-speech system is to convert arbitrary text into its corresponding spoken waveform; speech synthesis is the process of building machinery that can generate human-like speech from any text input, imitating a human speaker. Text processing and speech generation are the two main components of a text-to-speech system. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units. Generating the sequence of phonetic units for a given standard word is referred to as a letter-to-phoneme (or text-to-phoneme) rule, and the complexity of these rules and their derivation depends on the nature of the language. The quality of a speech synthesizer is judged by its closeness to the natural human voice and its intelligibility. This paper describes an approach to building a Marathi TTS system using the concatenative synthesis method with the syllable as the basic unit of concatenation.
A Review On Speech Feature Techniques And Classification Techniques (Nicole Heredia)
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but provides weak power spectrum, while LPC is easy to calculate but does not capture information at a linear scale. ANN can learn from data but is complex for large datasets, while SVM is accurate and suitable for pattern recognition but requires fixed-length coefficients. The document evaluates these techniques and concludes that MFCC performance is more efficient than LPC for feature extraction in speech recognition.
SMATalk: Standard Malay Text to Speech Talk System (CSCJournals)
This document summarizes a research paper on SMaTTS, a rule-based text-to-speech synthesis system for Standard Malay. The system uses a sinusoidal synthesis method and some pre-recorded wave files to generate speech. It has two main phases: natural language processing (NLP) and digital signal processing (DSP). The NLP phase analyzes text and converts it to phonetic representations with prosody information. The DSP phase generates the speech waveform. The system was evaluated using diagnostic rhyme tests and mean opinion scores, and areas for improvement are discussed.
This document provides an outline and details of a student internship project on text-to-speech conversion using the Python programming language. The project was conducted at iPEC Solutions, which provides AI training and services. The student designed a text-to-speech system using tools including Praat, Audacity, and WaveSurfer. The system converts text to speech by extracting phonetic components, matching them to inventory items, and generating acoustic signals for output. The project aimed to help those with communication difficulties through improved accessibility of text-to-speech technology.
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE (IRJET Journal)
The document describes a system for multilingual speech-to-text conversion using Hugging Face that aims to assist deaf individuals. The system uses a Transformer-based encoder-decoder model trained on audio datasets. Feature extraction and tokenization are performed on the audio inputs. The model is fine-tuned using Hugging Face and evaluated based on word error rate. A web application allows users to record speech via microphone and see the transcription output. The implemented model achieved a 32.42% word error rate on a Hindi language dataset. The goal is to enable seamless communication for those with hearing impairments across multiple languages.
This document discusses feature extraction techniques for isolated word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC) and Relative Spectral (RASTA) filtering. MFCC allows signals to extract feature vectors and provides high performance but lacks robustness. RASTA filtering reduces the impact of noise in signals and provides high robustness by band-passing feature coefficients in both log spectral and spectral domains. The document provides details on the process of MFCC feature extraction, which involves steps like framing, windowing, fast Fourier transform, mel filtering, discrete cosine transform, and calculating the cepstral coefficients.
Modeling of Speech Synthesis of Standard Arabic Using an Expert System (csandit)
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
Rendering Of Voice By Using Convolutional Neural Network And With The Help Of... (IRJET Journal)
This document summarizes a research paper that proposes a novel text-to-speech technique using deep convolutional neural networks without recurrent units. The technique aims to alleviate the long training times required for recurrent neural network-based text-to-speech models. The proposed Deep Convolutional Text-to-Speech model consists of a Text2Mel network that synthesizes a mel spectrogram from input text, and a Spectrogram Super-resolution Network that converts the mel spectrogram into a full spectrogram. Experimental results found that the proposed model could be sufficiently trained in 15 hours using an ordinary gaming PC with two GPUs, producing speech quality that was almost acceptable, while RNN-based models typically require days or weeks to train.
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S... (IDES Editor)
In this paper, improvement of an ASR system for the Hindi language, based on vector-quantized MFCC feature vectors and an HMM classifier, is discussed. MFCC features are usually pre-processed before being used for recognition; one such pre-processing step is to compute delta and delta-delta coefficients and append them to the MFCCs to form the feature vector. The paper focuses on the ten Hindi digits (zero to nine) in an isolated-word setting. Performance of the system is evaluated by recognition rate (RR). Combining the delta MFCC (DMFCC) and delta-delta MFCC (DDMFCC) features shows approximately 2.5% further improvement in RR, with no additional computational cost involved. RR for speakers involved in the training phase is found to be better than for speakers who were not, and word-wise RR is observed to be good for digits with distinct phones.
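Delta and delta-delta coefficients of the kind appended to the MFCCs above are conventionally computed as a regression over neighbouring frames. A sketch, assuming the common regression formula and a window half-width of N=2 (the paper does not state its window size):

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta coefficients over a window of +/- N frames.

    features: (n_frames, n_coeffs) array of MFCCs (or of deltas, to get
    delta-deltas). Edge frames are handled by repeating the boundary frame."""
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for t in range(features.shape[0]):
        acc = np.zeros(features.shape[1])
        for n in range(1, N + 1):
            # Weighted difference of frames n steps ahead and behind.
            acc += n * (padded[t + N + n] - padded[t + N - n])
        out[t] = acc / denom
    return out
```

Applying `delta` once to the MFCC matrix gives DMFCC; applying it again to that result gives DDMFCC, and the three are stacked column-wise to form the final feature vector.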
This document reviews techniques used in spoken-word recognition systems. It discusses popular feature extraction techniques like MFCC, LPC, DWT, WPD that are used to represent speech signals in a compact form before classification. Classification techniques discussed are ANN, HMM, DTW, and VQ. The document provides a brief overview of each technique and their advantages. It also presents the generalized workflow of a spoken-word recognition system including stages of speech acquisition, pre-emphasis, feature extraction, modeling, classification, and output of recognized text.
Speech Recognized Automation System Using Speaker Identification through Wire... (IOSR Journals)
This document describes a speech recognized automation system using speaker identification through wireless communication. The system uses a speech processor and MATLAB coding with MFCC algorithms to perform speech recognition and speaker identification. It then wirelessly controls electrical devices based on speech commands. Testing showed 80-85% accuracy for the actual speaker and lower (10-20%) accuracy for other speakers. Future work could involve improving speaker recognition accuracy as the number of speakers increases.
Speech Recognized Automation System Using Speaker Identification through Wire... (IOSR Journals)
This paper discusses the methodology of a project named "Speech Recognized Automation System using Speaker Identification through Wireless Communication". It presents the design of an automation system using wireless communication and speaker recognition implemented in Matlab; Matlab's straightforward programming interface makes it an ideal tool for the speech analysis in this project. The automation system is useful for home appliances as well as in industry. Speech recognition centers on matching commands stored in a Matlab database against the incoming voice command of the speaker. The Mel Frequency Cepstral Coefficient (MFCC) algorithm is used to extract speech features and recognize the speaker. Relatively cheap, low-power RF ZigBee transceiver modules provide the wireless communication. The system is intended to control lights, fans and other electrical appliances in a home or office using speech commands such as "Light" or "Fan". Further, if security is not a major concern, the speech processor can control the appliances without speaker identification.
PUNJABI SPEECH SYNTHESIS SYSTEM USING HTK (ijistjournal)
This paper describes a Hidden Markov Model-based Punjabi text-to-speech synthesis system (HTS), in which the speech waveform is generated from the Hidden Markov Models themselves, and applies it to Punjabi speech synthesis using the general speech synthesis architecture of HTK (HMM Tool Kit). This HMM-based TTS can be used in mobile phones to read out a stored phone directory or messages: text messages and the caller's identity in English are mapped to tokens in the Punjabi language, which are then concatenated to form speech according to certain rules and procedures.
To build the synthesizer, the speech database was recorded and phonetically segmented, first extracting context-independent monophones and then context-dependent triphones. For example, for the word bharat the monophones are a, bh, t, etc., and a triphone is bh-a+r. These speech utterances and their phone-level transcriptions (monophones and triphones) are the inputs to the speech synthesis system. The system outputs the sequence of phonemes after resolving ambiguities in phoneme selection using word network files; for example, for the word Tapas the output phoneme sequence is ਤ,ਪ,ਸ rather than ਟ,ਪ,ਸ.
Efficient Intralingual Text To Speech Web Podcasting And Recording (IOSR Journals)
This document describes a web browser application that converts text to speech. The key features are:
1. The browser can open different file formats (e.g. doc, pdf) and read the text aloud, reducing reading effort.
2. It includes a text-to-speech converter, recorder to save audio, and image-based history with timestamps.
3. The project aims to combine online content browsing with text-to-speech in a single application, addressing limitations of separate browser and text converter tools.
Effect of MFCC Based Features for Speech Signal Alignments (kevig)
The fundamental techniques used for man-machine communication include speech synthesis, speech recognition, and speech transformation. Feature extraction techniques provide a compressed representation of the speech signals, and HNM analysis and synthesis provides high-quality speech with a small number of parameters. Dynamic time warping (DTW) is a well-known technique for aligning two multidimensional sequences: it locates an optimal match between them, and the improvement in alignment is estimated from the corresponding distances. The objective of this research is to investigate the effect of dynamic time warping on phrase-, word-, and phoneme-based alignments. Speech signals in the form of twenty-five phrases were recorded; the recorded material was segmented manually and aligned at sentence, word, and phoneme level, and the Mahalanobis distance (MD) was computed between the aligned frames. The investigation has shown better alignment in the HNM parametric domain, and it has been seen that effective speech alignment can be carried out even at the phrase level.
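The dynamic time warping alignment described above is a standard dynamic program; a minimal sketch follows. Euclidean frame distance is used here for simplicity, where the paper computes the Mahalanobis distance between aligned frames (Euclidean is the identity-covariance special case):

```python
import numpy as np

def dtw_distance(a, b):
    """Optimal alignment cost between sequences a (m, d) and b (n, d)."""
    m, n = len(a), len(b)
    cost = np.full((m + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
            # Extend the cheapest of the three predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[m, n]
```

Because the warping path may repeat frames, sequences of different lengths that trace the same trajectory (such as a phrase spoken at two speeds) can still align with low cost.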
This document provides an overview of speech recognition systems and recent progress in the field. It discusses different types of speech recognition including isolated word, connected word, continuous speech, and spontaneous speech. Various techniques used in speech recognition are also summarized, such as simulated evolutionary computation, artificial neural networks, fuzzy logic, Kalman filters, and Hidden Markov Models. The document reviews several papers published between 2004-2012 that studied speech recognition methods including using dynamic spectral subband centroids, Kalman filters, biomimetic computing techniques, noise estimation, and modulation filtering. It concludes that Hidden Markov Models combined with MFCC features provide good recognition results and are suitable for large vocabulary, speaker-independent, continuous speech recognition.
Similar to "Speech to text conversion for visually impaired person using µ law companding"
An Examination of Effectuation Dimension as Financing Practice of Small and M... (iosrjce)
IOSR Journal of Business and Management (IOSR-JBM) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of business and management and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in business and management. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Does Goods and Services Tax (GST) Leads to Indian Economic Development? (iosrjce)
Childhood Factors that influence success in later life (iosrjce)
Emotional Intelligence and Work Performance Relationship: A Study on Sales Pe... (iosrjce)
Customer's Acceptance of Internet Banking in Dubai (iosrjce)
A Study of Employee Satisfaction relating to Job Security & Working Hours amo... (iosrjce)
Consumer Perspectives on Brand Preference: A Choice Based Model Approach (iosrjce)
Students' Approach towards Social Network Sites (iosrjce)
Broadcast Management in Nigeria: The systems approach as an imperative (iosrjce)
A Study on Retailer's Perception on Soya Products with Special Reference to T... (iosrjce)
A Study Factors Influence on Organisation Citizenship Behaviour in Corporate ... (iosrjce)
Consumers' Behaviour on Sony Xperia: A Case Study on Bangladesh (iosrjce)
Design of a Balanced Scorecard on Nonprofit Organizations (Study on Yayasan P... (iosrjce)
1. The document describes a study that designed a balanced scorecard for a nonprofit organization called Yayasan Pembinaan dan Kesembuhan Batin (YPKB) in Malang, Indonesia.
2. The balanced scorecard translated YPKB's vision and mission into strategic objectives across four perspectives: financial, customer, internal processes, and learning and growth.
3. Key strategic objectives included donation growth, budget effectiveness, customer satisfaction, reputation, service quality, innovation, and employee development. The customer perspective had the highest weighting, suggesting a focus on public service over financial growth.
Public Sector Reforms and Outsourcing Services in Nigeria: An Empirical Evalu... (iosrjce)
Media Innovations and its Impact on Brand awareness & Consideration (iosrjce)
Customer experience in supermarkets and hypermarkets – A comparative study (iosrjce)
- The document examines customer experience in supermarkets and hypermarkets in India through a survey of 418 customers.
- It finds that in supermarkets, previous experience, atmosphere, price, social environment and experience in other channels most influence customer experience, while in hypermarkets, previous experience, product assortment, social environment and experience in other channels are most influential.
- The study provides insights for retailers on key determinants of customer experience in each format to help them improve strategies and competitive positioning.
Social Media and Small Businesses: A Combinational Strategic Approach under t... (iosrjce)
Secretarial Performance and the Gender Question (A Study of Selected Tertiary... (iosrjce)
Implementation of Quality Management principles at Zimbabwe Open University (... (iosrjce)
This document discusses the implementation of quality management principles at Zimbabwe Open University's Matabeleland North Regional Centre. It begins with background information on ZOU and the importance of quality management in open and distance learning institutions. The study aimed to determine if quality management and its principles were being implemented at the regional centre. Key findings included that the centre prioritized customer focus and staff involvement. Decisions were made based on data analysis. The regional centre implemented a quality system informed by its policy documents. The document recommends ensuring staffing levels match needs and providing sufficient resources to the regional centre.
Organizational Conflicts Management In Selected Organizations In Lagos State, ... (iosrjce)
CHINA'S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT (jpsjournal1)
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon reserves and the ancient Silk Road trade route, along with China's diplomatic endeavours in the area, has been referred to as the "New Great Game". This research centres on that power struggle, considering geopolitical, geostrategic, and geoeconomic variables; topics including trade, political hegemony, oil politics, and traditional and non-traditional security are explored and explained. Using Mackinder's Heartland theory, Spykman's Rimland theory, and Hegemonic Stability theory, the study examines China's role in Central Asia. It adheres to an empirical epistemological method with attention to objectivity, critically analyzing primary and secondary research documents to elaborate the role of China's geoeconomic outreach in Central Asian countries and its future prospects. The study finds that China is seeing significant success in trade, pipeline politics, and gaining influence over other governments, a success attributable to the effective use of key instruments such as the Shanghai Cooperation Organisation and the Belt and Road Economic Initiative.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL (gerogepatton)
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach combines a Convolutional Neural Network
(CNN) with Long Short-Term Memory (LSTM) networks. We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
A review on techniques and modelling methodologies used for checking electrom... - nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has been a serious concern throughout the decades of revolution in the world of electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, confronts design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI, and sensors give misleading values, which can prove fatal in the case of automobiles. In this paper, the authors have, non-exhaustively, reviewed research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines - Christina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Embedded machine learning-based road conditions and driving behavior monitoring - IJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
6th International Conference on Machine Learning & Applications (CMLA 2024) - ClaraZara1
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology and applications of Machine Learning.
Speech to text conversion for visually impaired person using µ law companding
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 10, Issue 6, Ver. II (Nov.-Dec. 2015), PP 58-62
www.iosrjournals.org
DOI: 10.9790/2834-10625862
Speech to text conversion for visually impaired person
using µ law companding
Suraj Mallik¹, Rajesh Mehra²
¹˒²Department of Electronics & Communication Engineering, National Institute of Technical Teachers' Training & Research, Chandigarh-160019, India
Abstract: This paper presents the overall design and implementation of a DSP-based speech recognition and
text conversion system. Speech is usually taken as the preferred mode of interaction for human beings; this paper
presents voice-oriented commands converted into text. We intended to compute the entire speech processing
in real time. This involves simultaneously accepting input from the user and using software filters to analyse
the data. The comparison is then established using correlation and µ-law companding techniques. In
this paper, voice recognition is carried out using MATLAB. The voice command is person-independent.
Voice commands are stored in the database with the help of function keys. The real-time input speech received
is then processed in the speech recognition system, where the required features of the spoken words are extracted,
filtered and matched with the existing samples stored in the database. The required MATLAB processes
then convert the received data into text form.
Keywords: ASR, DSP, GUI, HMM, MATLAB, STT
I. Introduction
Since the beginning of humanity's interaction with the virtual technological world, technology
has dramatically changed our lives and living style. Research in Human Language Technology has made great
progress in the past few years. The challenge in designing much better automation processes is to accommodate the
variation between different users. A unique and better user interface design can also be the solution to some
existing automation process design problems. Automatic speech-to-text (STT) processing systems are capable of
producing English word transcripts of conversational telephone speech at a 15.2% word error rate, a decrease of
53% over the past 5 years. An ideal user-independent interface still does not exist at present, and building one
requires knowledge of both sociological-linguistic and technological fields. According to many major
companies involved in building speech recognition systems, and according to researchers, speech will be the primary
interface between humans and machines in the near future.[1] Research and development groups have
investigated the possibility of using speech activation in cars to enable hands-free control. Recently, a
Hidden Markov Model (HMM) based speech recognition and processing system was implemented to enable
voice-activated wheelchair control. Speech recognition technology allows computers to take speech in
pure audio or spoken form and convert it to text format. By providing a specific grammar and limiting the
vocabulary, the system can recognize speech with good recognition results. The performance of
speech recognition in home environments depends on the implementation of the speech recognition system.
Language is the ability to express one's thoughts by means of a set of signs, whether graphical, gestural,
acoustic, or even musical. It is a distinctive trait of human beings, who are the only creatures to use such a
structured system. Speech is one of its main components; it is by far the oldest means of communication
between human beings and also the most widely used. No wonder, then, that people have studied it extensively
and often tried to build machines to handle it acoustically. Most of the information in the digital world is
accessible only to the few who can read or understand a particular language. Language technologies can provide
solutions in the form of natural interfaces so that digital content can reach the masses and facilitate the
exchange of information across people speaking different languages. These technologies play a crucial
role in multilingual societies such as India, which has about 1,652 dialects/native languages.[2] A speech-to-text
converter converts normal spoken language into text. Synthesized speech can be created by concatenating pieces of
recorded speech stored in a database as .wav files. Systems differ in the size of the stored
speech units: a system that stores small units can cover a wide vocabulary but may lack clarity, while for specific usage
domains the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can
incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic"
voice output. Here the question arises whether a machine, or simply a computer, can perform the same
conversion task. The answer is that it cannot do so as easily as a human can.[3] The machine has to follow a procedure
divided into two basic steps: (i) speech sample recognition, and (ii) speech-to-text (STT)
conversion, in which the recognized speech is converted into text format.
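The two-step procedure can be sketched in outline. The paper's implementation is in MATLAB; this Python skeleton is illustrative only, and all names in it are assumptions, not the authors' code:

```python
# Hypothetical two-step pipeline: (1) match an input sample against stored
# reference samples, (2) emit the text label of the best match.

def recognize(sample, references):
    """Step 1: return the label of the stored reference most similar to sample."""
    def similarity(a, b):
        n = min(len(a), len(b))
        return sum(x * y for x, y in zip(a[:n], b[:n]))
    return max(references, key=lambda label: similarity(sample, references[label]))

def speech_to_text(sample, references):
    """Step 2: convert the recognized speech sample into its text form."""
    return recognize(sample, references)

refs = {"A": [1.0, 0.0, 1.0], "B": [0.0, 1.0, 0.0]}
print(speech_to_text([0.9, 0.1, 0.8], refs))  # best match is "A"
```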
II. Speech Sample Recognition
Recording of the voice samples ('A', 'B', 'C', …, 'Z') is done to improve the accuracy and precision of the
samples to be filtered; each sample is then converted into a .wav file, which is a MATLAB-executable format for reading an audio
signal. Once the speech samples are stored in the system, reading from the specified location is carried out using the
wavread command, assigning a different function variable to each speech signal.[4]
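A rough Python equivalent of this storage-and-read step, using the standard-library wave module in place of MATLAB's wavread (the synthetic tone written here is only a stand-in for a recorded voice sample):

```python
import math, struct, wave

def write_tone(path, freq=440, rate=8000, n=8000):
    # Write a stand-in "voice sample" as 16-bit mono PCM.
    with wave.open(path, "wb") as w:
        w.setnchannels(1); w.setsampwidth(2); w.setframerate(rate)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
            for i in range(n))
        w.writeframes(frames)

def read_samples(path):
    # Equivalent of the wavread step: return normalized samples in [-1, 1].
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return [s / 32768.0 for s in struct.unpack("<%dh" % (len(raw) // 2), raw)]

write_tone("rec1.wav")
samples = read_samples("rec1.wav")
print(len(samples))  # 8000
```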
1. Extracting the required speech from a given speech signal.
A spoken word sample is extracted from the set of samples and kept as a separate word. The microphone
is used to feed the input signal into the system, which is then filtered and processed in MATLAB using DSP
techniques.
Fig.1 Input voice sample
2. Speech Analysis & Speech Detection
Speech analysis is mainly concerned with analysing the text extracted from the given voice samples, which are
stored as .wav files.[5] The extracted words are organized and maintained as a list; this list contains abbreviations, numbers and
acronyms, and expands them into full words when needed. Speech detection is the process of identifying precisely
where the speech is located in a given voice sample.
After recording the desired voice signal sample, the individual samples are filtered, which can be done
using various DSP filtering techniques or carried out by extracting the required speech signal individually
through a graphical method. The next step is to correlate the different speech signals with each other while keeping the
number of sample bits the same. For this, the sample bits need to be extracted in such a fashion that the matrices so formed
are of the same order.[6]
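The same-order requirement can be illustrated in Python (a sketch, not the paper's MATLAB code): zero-pad the shorter vector so both have the same length, then compute the Pearson correlation coefficient:

```python
import math

def equalize(a, b):
    # Zero-pad the shorter vector so both have the same length (same "order").
    n = max(len(a), len(b))
    pad = lambda v: list(v) + [0.0] * (n - len(v))
    return pad(a), pad(b)

def corrcoef(a, b):
    # Pearson correlation coefficient of two equal-length vectors.
    a, b = equalize(a, b)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

print(round(corrcoef([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0 (perfect correlation)
```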
Fig.2 Individual filtered voice sample plot
3. Speech Transformation
This is the normalization of speech to a pronounceable form. Words are pronounced line by line, with a pause taken when a
space is detected between words. The speech is read according to punctuation rules, accent marks and stop
words, much as most users would read it. The individual samples of the speech signal are then separated out from the
original voice data.[7] Then, using correlation and the µ-law companding technique, the required signals can be
synthesized in MATLAB for correlating the stored voice sample with the target one that the user will command.
R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are
observations and whose columns are variables. The matrix R = corrcoef(X) is related to the covariance matrix
C = cov(X) by

R(i,j) = C(i,j) / sqrt( C(i,i) * C(j,j) )    …(1)
corrcoef(X) is the zeroth lag of the normalized covariance function, that is, the zeroth lag of xcov(x,'coeff')
packed into a square array.
R = corrcoef(x,y) where x and y are column vectors is the same as corrcoef([x y]).
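A small Python check of the relationship in Eq. (1) (illustrative only; MATLAB's corrcoef and cov compute the same quantities):

```python
# Numerical check of Eq. (1): correlation entries equal covariance entries
# normalized by the row/column standard deviations.
def cov_matrix(cols):
    # Sample covariance matrix of a list of equal-length columns.
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((cols[i][k] - means[i]) * (cols[j][k] - means[j])
                 for k in range(n)) / (n - 1)
             for j in range(len(cols))] for i in range(len(cols))]

def corr_from_cov(C):
    # R(i,j) = C(i,j) / sqrt(C(i,i) * C(j,j))
    return [[C[i][j] / (C[i][i] * C[j][j]) ** 0.5 for j in range(len(C))]
            for i in range(len(C))]

C = cov_matrix([[1, 2, 3, 4], [1, 3, 2, 4]])
R = corr_from_cov(C)
print(round(R[0][1], 4))  # 0.8
```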
4. Pre-processing:
Pre-processing consists of a number of preliminary steps that make the raw data usable by the recognizer.
First, the raw input signal from the user is converted into the MATLAB-executable .wav file format, and then
it is further transformed into matrix form so that the real-time data can be compared with the stored data.[8] The noise-free
speech signal is passed to the segmentation step, where the individual speech signal is segmented into
characters. This is the most important aspect of the pre-processing stage.
Fig.3 Comparative plot of voice samples
out = compand(in,Mu,v) implements a µ-law compressor for the input vector in. Mu specifies µ, and v is the
input signal's maximum magnitude; out has the same dimensions and maximum magnitude as in. The same compressor can be selected explicitly with
out = compand(in,Mu,v,'mu/compressor')
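The µ-law compressor itself is simple to sketch in Python; this follows the standard formula out = v·sign(x)·ln(1 + µ|x|/v)/ln(1 + µ) that compand's 'mu/compressor' option implements:

```python
import math

def mu_compress(x, mu=255, v=1.0):
    # µ-law compressor: boosts small amplitudes, preserves the maximum magnitude v.
    sign = 1 if x >= 0 else -1
    return v * sign * math.log(1 + mu * abs(x) / v) / math.log(1 + mu)

samples = [0.0, 0.01, 0.1, 1.0, -1.0]
print([round(mu_compress(s), 4) for s in samples])
```

Note that the endpoints are fixed (0 maps to 0, ±v maps to ±v) while quiet samples are expanded toward the top of the range, which is what makes the subsequent correlation less sensitive to amplitude differences.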
The real-time speech signal is then input to the system to be correlated with the stored samples, whose file names are built using:
file = sprintf('%s%d.wav','rec',i);
The system thus asks the user to input a voice command, and that input voice sample is again converted into
the required data matrix, which is correlated with the stored data samples.[9]
A threshold value, set as the average of the entire correlation output data, is used so that the system can
report which particular speech or voice was uttered by the user.
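The matching step described above can be sketched in Python (a simplified stand-in for the paper's MATLAB loop; the stored lists represent samples loaded from the rec<i>.wav files):

```python
def correlate(a, b):
    # Simple inner-product correlation over the common length.
    n = min(len(a), len(b))
    return sum(x * y for x, y in zip(a[:n], b[:n]))

def match(live, stored):
    # File names built like sprintf('%s%d.wav','rec',i) in the MATLAB code.
    names = ["rec%d.wav" % i for i in range(1, len(stored) + 1)]
    scores = [correlate(live, s) for s in stored]
    threshold = sum(scores) / len(scores)  # average of all correlation outputs
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return "Yes voice matched: " + names[best]
    return "No match"

stored = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(match([0.9, 0.1, 0.9], stored))  # matches rec1.wav
```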
III. Speech Synthesis
A speech-to-text (STT) system is a computer-based system that should be able to convert any
speech into text when the speech is introduced into the computer directly by an operator.
IV. What Is Speech Synthesis?
Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic
speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). All speech-to-text
systems rely on at least two models: an acoustic model and a language model. In addition, large-vocabulary
systems use a pronunciation model. It is more suitable to define speech-to-text as the
automatic production of text from a given speech signal, delivered as transcribed data output by the system in real time. SR
systems use "training" (also called "enrolment"), where an individual speaker reads text or isolated vocabulary
into the system.[10] Alphanumeric characters are the smallest distinguishing units in a written language; they do
not carry meaning by themselves. Alphanumeric characters include alphabetic letters, numerical digits, punctuation
marks, and the individual symbols of a language's writing system. A phoneme is "the smallest
segmental unit of sound employed to form meaningful utterances". The first task faced by an STT system is the
conversion of input speech into a text representation.
The basic types of synthesis systems are the following:
• Formant
• Concatenated
• Prerecorded
1. Concatenative Synthesis:
Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded
speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. There are
three main sub-types of concatenative synthesis.[11]
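A toy Python illustration of concatenation (not from the paper): recorded segments are strung together, here with a short linear crossfade added so that unit boundaries do not click:

```python
def concatenate(segments, fade=2):
    # Join sample segments end to end, blending `fade` samples at each boundary.
    out = list(segments[0])
    for seg in segments[1:]:
        for i in range(fade):  # linear crossfade over the overlap region
            w = (i + 1) / (fade + 1)
            out[-fade + i] = out[-fade + i] * (1 - w) + seg[i] * w
        out.extend(seg[fade:])
    return out

a = [0.0, 0.2, 0.4, 0.4]
b = [0.4, 0.4, 0.2, 0.0]
joined = concatenate([a, b])
print(len(joined))  # 4 + 4 - 2 = 6
```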
1.1 Unit Selection Synthesis:
Unit selection synthesis uses large databases of recorded speech. During database creation, each
recorded utterance is segmented into some or all of the following: individual phones, diphones, half-phones,
syllables, morphemes, words, phrases, and sentences.
1.2. Diphone Synthesis:
Diphone synthesis uses a minimal speech database containing all the diphones (sound-to-sound
transitions) occurring in a language.
2. Formant Synthesis:
Formant synthesis does not use human speech samples at runtime. Instead, the synthesized speech
output is created using additive synthesis and an acoustic model (physical modelling synthesis). Parameters such
as fundamental frequency, voicing, and noise levels are varied over time to create a waveform of artificial
speech.[12] This method is sometimes called rules-based synthesis.
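A minimal additive-synthesis sketch in Python; the formant frequencies (700/1200 Hz) and the rising amplitude envelope are illustrative assumptions, not values from the paper:

```python
import math

def formant_wave(formants, rate=8000, n=800):
    # Sum of sinusoids at the formant frequencies, with a time-varying envelope.
    out = []
    for i in range(n):
        t = i / rate
        env = i / n  # simple amplitude parameter varied over time
        out.append(env * sum(a * math.sin(2 * math.pi * f * t) for f, a in formants))
    return out

wave_out = formant_wave([(700, 0.6), (1200, 0.4)])
print(len(wave_out))  # 800
```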
3. Prerecorded Synthesis:
In prerecorded synthesis, large paragraphs of English words (commonly used English
vocabulary) are recorded in a continuous rhythm, with a small gap of silence between two successive words, and saved
as sound files in the database.
V. STT Processes
1. Create a database and callback functions.
1.1 Create an STT database of the user's distinguishing language alphanumeric characters, which recognises the
user's speech signal when called back in a real-time application.
1.2 Get notified of state changes, language changes, recognition results, and errors by registered callback
functions.[13]
2. Start, stop, and cancel recognition.
2.1 Start recording the user's voice with the microphone and analyse the recorded data as text.
2.2 If recording is stopped manually through the API, the STT stops recording and recognizes the sound data. The
recognized text is then sent by the registered callback function.
2.3 Sounds can also be set to play before STT recording starts or after it stops.
3. Get the recognition result.
3.1 The recognition result is sent by the required callback function.
3.2 With a specific STT engine, the time-stamp information for the recognition result can be obtained.
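The callback flow listed above can be sketched as a Python skeleton; the Stt class and its method names are hypothetical, not an actual engine API:

```python
class Stt:
    def __init__(self):
        self.callbacks = {}
        self.recording = False

    def register(self, event, fn):          # step 1.2: register callbacks
        self.callbacks[event] = fn

    def start(self):                        # step 2.1: start recording
        self.recording = True

    def stop(self, recorded_data):          # step 2.2: stop and recognize
        self.recording = False
        text = " ".join(recorded_data)      # stand-in for real recognition
        self.callbacks["result"](text)      # step 3.1: deliver the result

results = []
stt = Stt()
stt.register("result", results.append)
stt.start()
stt.stop(["hello", "world"])
print(results)  # ['hello world']
```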
VI. Results
A GUI was made and implemented for speech-to-text conversion, as shown in the following figure. The adjacent
graph plot shows the given input speech signal being fed into the system in real time from the user.
Fig. 4 Graphic User Interface
The command window shows the execution of the program at the back end, where the given real-time signal
is correlated and, using µ-law companding, the input sample is compared with the stored data samples; the
output "Yes voice matched" indicates which particular word was said by the user.[14]
Fig. 5 MATLAB Command Window
VII. Conclusion
A system based on voice recognition was built and implemented. The system is targeted at elderly and
disabled people and also at robotics applications. The proposed system therefore provides solutions for the
problems faced by old or disabled persons in daily life and makes their lives easier and more comfortable by
offering a cost-effective and reliable solution. The system developed can be used to control AC and DC
appliances through speech; the prototype can control electrical devices in a home or office.
A confirmative voice with a specific pitch and frequency is desired by the speech recognizer used in this
system to produce better recognition results. The system controls extended and multiple appliances by using
speech recognition technology. It can be applied in various applications such as voice-activated wheelchairs,
robotic control of appliances, etc.
Acknowledgements
The authors would like to thank the Director and the Head of the Electronics and Communication Engineering
Department, National Institute of Technical Teachers' Training & Research, Chandigarh, India for their
constant inspiration, support and helpful suggestions throughout this research work.
References
[1] Poonam S. Shetake, "Review of text to speech conversion methods", International Journal of Industrial Electronics and Electrical Engineering, ISSN: 2347-6982, Vol. 2, Issue 8, pp. 28-32, Aug. 2014.
[2] R. Gadalla, "Voice Recognition System for Massey University Smarthouse", M.Eng. thesis, Massey University, Auckland, New Zealand, 2006.
[3] Kyung-Saeng Kim and Kwyro Lee, "Low-power and area efficient FIR filter implementation suitable for multiple taps", IEEE Transactions on Very Large Scale Integration, Vol. 11, No. 1, pp. 150-153, 2003.
[4] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, New Jersey, US: Prentice Hall Inc., 1978.
[5] Devendra Kumar Somwanshi, "Image Acquisition, Recognition & Speech Conversion", M.E. thesis, Thapar University, Patiala, July 2009.
[6] Xiaohua Zeng, A. Fapojuwo and R. J. Davies, "Design and performance evaluation of voice activated wireless home devices", Research in Motion, Ottawa, Ont.
[7] www.mathworks.in/help/comm/ref/compand.html
[8] www.mathworks.in/help/matlab/ref/corrcoef.html
[9] V. Shanmughaneethi, Ra. Yagna Praveen and S. Swamynathan, "Detection of command injection vulnerabilities in web services through aspect-oriented programming", International Journal of Computer Applications in Technology (IJCAT), Vol. 44, No. 4, 2012.
[10] R. Puviarasi, Mritha Ramalingam and Elanchezhian Chinnavan, "Low Cost Self-assistive Voice Controlled Technology for Disabled People", International Journal of Modern Engineering Research (IJMER), ISSN: 2249-6645, Vol. 3, Issue 4, pp. 2133-2138, Jul.-Aug. 2013.
[11] Y. Bala Krishna and S. Nagendram, "Zigbee Based Voice Control System for Smart Home", International Journal of Computer Technology & Applications (IJCTA), ISSN: 2229-6093, Vol. 3, No. 1, pp. 163-168, Jan.-Feb. 2012.
[12] Monica Singhal and Rajesh Mehra, "Analyzing Aliasing Effect in Down Sampler with Increase in Factor M", International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN: 2278-0882.
[13] Rajesh Mehra and Shaily Verma, "FPGA Based Design of Direct Form FIR Polyphase Interpolator for Wireless Communication", International Journal of Electrical Electronics & Telecommunication Engineering, ISSN: 2051-3240, Vol. 44, Issue 1.
[14] M. AL-Rousan and K. Assaleh, "A wavelet- and neural network-based voice system for a smart wheelchair control", Journal of the Franklin Institute, Vol. 348, Issue 1, pp. 90-100, Feb. 2011.