This document summarizes the technical progress in speech recognition over the past few decades. It discusses the key components of a speech recognition system including speech analysis, feature extraction, language translation, and message understanding. The document also outlines the main challenges in speech recognition like variability in context, environmental conditions, speakers, and audio quality. Furthermore, it covers different types of speech recognition systems and their applications in various sectors such as education, medicine, transportation and more. The document concludes with an overview of the major approaches to speech recognition including acoustic-phonetic, pattern recognition, and artificial intelligence and discusses popular feature extraction methods used.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
This document describes a speaker independent Malayalam word recognition system using support vector machines. It extracts mel frequency cepstral coefficients (MFCCs) as features and compares the performance of linear and radial basis function kernels in the SVM classifier. An overall accuracy of 91.8% was achieved on a database containing isolated word utterances from different speakers, demonstrating the effectiveness of the MFCC features and RBF kernel for speaker independent speech recognition.
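The kernel comparison at the heart of that study can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' code: the feature vectors below stand in for MFCC frames, and the gamma value is an arbitrary assumption.

```python
import numpy as np

def linear_kernel(x, y):
    # Inner product of two feature vectors (e.g. MFCC frames)
    return float(np.dot(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian radial basis function: exp(-gamma * ||x - y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(linear_kernel(x, y))  # 0.0 for orthogonal vectors
print(rbf_kernel(x, x))     # 1.0 for identical vectors
```

The RBF kernel's nonlinearity is what typically lets an SVM separate speech classes that are not linearly separable in MFCC space, which is consistent with the RBF kernel outperforming the linear one here.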
This document provides a review of speech recognition by machines over the past 60 years. It discusses the major approaches to speech recognition, including acoustic phonetic, pattern recognition, and artificial intelligence approaches. The pattern recognition approach using hidden Markov models has become predominant. The document outlines the basic model of speech recognition systems and various issues that affect recognition accuracy such as environment, speakers, speech styles, and vocabulary. It also discusses applications of speech recognition in different domains.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This report provides an overview of speech recognition technology, including how speech recognition systems work, common applications, and future uses. It discusses key concepts such as utterances, pronunciation, grammar, accuracy, and training. The report also examines speech recognition software and provides examples of free and commercial speech recognition programs. Overall, the report finds that speech recognition has various applications in fields like education, healthcare, gaming, and more, and the technology is expected to continue advancing to support additional future applications.
This document provides an overview of speech recognition technology. It defines speech recognition as the ability to translate spoken words to text. The key components of a speech recognition system include an audio input, grammar, speech recognition engine, acoustic model, and recognized text output. The speech recognition engine uses the acoustic model and grammar to analyze the audio input and return recognized text. Applications of speech recognition include dictation, data entry, and assisting individuals with disabilities. While speech recognition technology has advanced, challenges remain around digitization of speech, signal processing, and accurately recognizing different speakers. The future of assistive technology using speech recognition in education looks promising.
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
This document describes a study that developed a speech recognition system for recognizing spoken Malayalam digits. It used two wavelet-based feature extraction techniques - Discrete Wavelet Transforms (DWT) and Wavelet Packet Decomposition (WPD) - and evaluated their performance using a Naive Bayes classifier. DWT achieved 83.5% accuracy and WPD achieved 80.7% accuracy. To improve recognition accuracy, the study introduced a new technique called Discrete Wavelet Packet Decomposition (DWPD) that utilizes features from both DWT and WPD. DWPD achieved the highest accuracy of 86.2% along with the Naive Bayes classifier.
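The DWT/WPD distinction the study builds on can be illustrated with the simplest wavelet, the Haar. This is a generic one-level sketch (the paper does not specify its wavelet family here), using numpy only:

```python
import numpy as np

def haar_dwt(signal):
    # One level of the Haar DWT: low-pass (approximation) and
    # high-pass (detail) coefficients computed over sample pairs.
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

# DWT recursively decomposes only the approximation band;
# WPD also decomposes the detail band, giving a full tree.
# A hybrid like DWPD draws features from both decompositions.
a, d = haar_dwt([4.0, 4.0, 2.0, 0.0])
```

Note that the transform preserves signal energy (the squared coefficients of both bands sum to the squared input samples), which is why band energies make usable classification features.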
International Journal of Signal and Image Processing Issues, Vol. 2015, No. 1...
This document presents a study on automatic speech recognition (ASR). It discusses the different types of ASR systems including speaker-dependent, speaker-independent, and speaker-adaptive systems. It also covers the different types of utterances that can be recognized, such as isolated words, connected words, continuous speech, and spontaneous speech. The document then describes the basic phases involved in ASR, including front-end analysis using techniques like pre-emphasis, framing, windowing and feature extraction. It also discusses back-end processing using acoustic and language models to map features to words. Hidden Markov models are presented as a commonly used acoustic modeling technique in ASR systems.
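The front-end steps listed above (pre-emphasis, framing, windowing) are standard and can be sketched directly in numpy. The frame length and hop size below are common defaults for 16 kHz audio (25 ms / 10 ms), assumed here rather than taken from the document:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; boosts high frequencies
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    # Split the signal into overlapping frames, then apply a
    # Hamming window to each frame to reduce edge discontinuities.
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)

emphasized = pre_emphasis(np.ones(1600))
frames = frame_and_window(emphasized)
```

Feature extraction (e.g. MFCCs) then operates on each windowed frame, and the back end maps the resulting feature sequence to words.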
The document proposes a voice-based security system (VBSS) that uses voice recognition technology to uniquely identify individuals for security purposes. The VBSS would analyze characteristics of a user's voice and compare it to stored voiceprints to grant access. It aims to provide more secure authentication than passwords alone. The proposed system uses a voice-to-phoneme algorithm to convert speech to an abstract representation in a speaker-independent manner. It includes modules for voice tracking, analysis, storage, authentication and administration to enroll users and handle access requests.
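The voiceprint comparison step could be sketched as a similarity test between feature vectors. This is a hypothetical illustration only: the paper's voice-to-phoneme algorithm is not described in enough detail to reproduce, and the threshold here is arbitrary.

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between a claimed user's feature vector and the
    # enrolled voiceprint; 1.0 means identical direction.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def grant_access(probe, voiceprint, threshold=0.9):
    # Accept only if the probe is close enough to the stored print.
    return cosine_similarity(probe, voiceprint) >= threshold
```

A real system would compare against many enrolled prints and tune the threshold to trade off false accepts against false rejects.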
This document discusses speech recognition technology. It begins by defining speech recognition as the process of converting spoken words to text. It then discusses some key companies in the space, including Nuance Communications which was founded in 1994 as a spinoff from SRI to commercialize speech recognition technology. The document also outlines some features and applications of Dragon speech recognition software, as well as limitations, opportunities, and the future of speech recognition technology.
Assistive Examination System for Visually Impaired
This paper presents the design of a voice-enabled examination system for visually challenged students. The system uses Text-to-Speech (TTS) and Speech-to-Text (STT) technology: web-based academic testing software that gives blind students an interactive way to take exams, enhancing their educational experience. It helps differently-abled students appear for online tests on par with other students, and can also be used by students with learning disabilities or by anyone who wishes to take an examination in a combined auditory and visual way.
Financial Transactions in ATM Machines using Speech Signals
Speech is the most natural and simplest form of communication, and speech recognition is a fascinating application of digital signal processing with many real-world uses. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and random in nature. The ATM communicates with the customer using stored speech samples, and the user communicates with the machine using spoken digits. Daubechies wavelets are employed for feature extraction, and a multilayer neural network trained with the backpropagation algorithm is used for classification. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show a good recognition accuracy of 87.38%, demonstrating the efficiency of combining these two techniques.
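The classifier described, a multilayer network trained with backpropagation, has a forward pass like the following numpy sketch. The layer sizes and random weights are illustrative assumptions (the paper's architecture is not given here), and the backpropagation update itself is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w1, b1, w2, b2):
    # One hidden layer: wavelet features in, per-class scores out.
    hidden = sigmoid(x @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))                      # e.g. 16 WPD energy features
w1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 10)), np.zeros(10)   # 10 spoken digits
scores = mlp_forward(x, w1, b1, w2, b2)
```

Training would adjust the weights by propagating the classification error backward through both layers; at recognition time the digit with the highest output score is chosen.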
This report looks at the patenting activity around voice control of mobile devices and also captures key litigations and NPE’s operating in this area. Patents were categorized as per key algorithms and application areas and analyzed for generating different trends within PatSeer Project.
A voice recognition system converts the human voice into a signal that machines can understand. Once this is achieved, the machine can be made to work as desired; the machine could be a computer, a typewriter, or even a robot. Systems also exist in which the machine 'speaks' a recorded word, but that is outside the scope of this paper; here, only the human is expected to talk. Further, the voice recognition systems described here are intended for project use only.
IRJET - A Robust Sign Language and Hand Gesture Recognition System using Conv...
This document presents a robust sign language and hand gesture recognition system using convolutional neural networks. The system captures image frames and processes them through various neural network layers to classify hand gestures into letters, numbers, or other symbols. It segments the hand from images using color thresholds and edge detection. The images then undergo preprocessing like resizing before being classified by the CNN. The CNN is trained on a large dataset to accurately recognize gestures in sign language and provide text output to bridge communication between deaf and non-signing individuals. The system achieved good results classifying several alphabet letters but could be expanded to recognize word combinations.
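The convolutional layers the summary mentions reduce to repeated application of a 2-D convolution. A minimal valid-mode sketch in numpy (not the paper's network; the edge-detecting kernel is an illustrative example):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image; "valid" mode keeps only
    # positions where the kernel fits entirely inside the image.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge_kernel = np.array([[1.0, -1.0]])     # horizontal gradient filter
img = np.array([[0.0, 0.0, 1.0, 1.0]])    # a step edge
result = conv2d_valid(img, edge_kernel)
```

A CNN learns many such kernels per layer during training, so that early layers respond to edges like the one above and deeper layers to whole hand shapes.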
Our speech-to-text conversion project aims to help the nearly 20% of people worldwide who have disabilities by allowing them to control their computer and share information using only their voice. The system uses acoustic and language models with a speech engine to recognize speech and convert it to text, and can perform operations such as opening Calculator and WordPad. Speech recognition has applications in areas such as cars, healthcare, education, and daily life. Accuracy depends on factors such as vocabulary size, speaker dependence, and speech type (isolated or continuous). The system aims to improve accessibility while reducing costs.
This is a presentation on speech recognition (automated speech recognition) systems. I hope it is helpful for anyone searching for a presentation on this technology.
The document summarizes a presentation on automatic speech recognition systems. It includes an introduction defining ASR as the transcription of spoken language into text in real time. It shows the basic block diagram of an ASR system and explains how it works similarly to the human process of hearing, transmitting signals to the brain, and understanding. Some key uses of ASR are in smartphones, AI robots, home automation, and computers. The benefits mentioned are hands-free use, aiding reading and spelling, and easy operation.
The project was started with a single aim in mind: the design should recognize a person's voice by analyzing the speech signal. The simulation is done in MATLAB. The design is based on Linear Prediction Coefficients (LPC) and Principal Component Analysis (PCA, via princomp) for speech signal analysis. Sample collection is accomplished by using a microphone to record male/female speech. After executing the program, the speech is analyzed by the analysis part of the MATLAB code, and the design should identify whether the recorded speech signal matches the desired output.
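The PCA step (MATLAB's princomp) can be approximated in a few lines of numpy via the singular value decomposition. This is a generic sketch under the usual PCA definition, not the project's code; the synthetic feature matrix stands in for per-frame LPC coefficients:

```python
import numpy as np

def pca_project(data, k):
    # Center the data, find principal directions via SVD,
    # and project onto the top-k components.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(1)
features = rng.normal(size=(50, 12))   # e.g. 12 LPC coefficients per frame
reduced = pca_project(features, k=3)
```

The projected columns are ordered by decreasing variance, so keeping the top few components compresses the speech features while preserving most of their variation.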
Speech recognition: history of speech recognition, what speech recognition is, voice recognition software, advantages and disadvantages of speech recognition, voice recognition in operating systems, types of speech recognition.
A complete PowerPoint presentation on speech recognition technology. Very helpful for final-year students: speech recognition is an interesting seminar topic, and this presentation can be used as a final-year seminar.
Voice input and speech recognition system in tourism/social media
Voice input devices allow users to input data or commands using speech instead of other input methods like keyboards. Some voice input devices recognize words from a predefined vocabulary while others need to be trained for a specific speaker. When a word is spoken, the matching input is displayed on screen for verification.
Speech recognition is the process of converting spoken language to text using computer programs. It draws from linguistics, computer science, and electrical engineering. Applications include voice assistants, dictation software, call routing, and more. Accuracy depends on factors like vocabulary size, presence of similar sounding words, whether the system is designed for one speaker or many, and whether speech is isolated, connected or continuous.
Speech Recognition: Transcription and transformation of human speech
Speech recognition is a subfield linked to computational linguistics and computer science, an interdisciplinary area that continues to generate new technologies and methodologies. Its aim is to recognize and translate spoken words, providing the capability to understand and transcribe speech. In recent times the field has received strong positive results from deep learning applied to voice recognition, and market demand for voice recognition applications has grown accordingly. Deployed speech recognition systems, and the methods used to analyze them, are helping to shape individual users' futures. The computer plays a central role in this process, since every translated word can also be acknowledged as text. Vishal Dineshkumar Soni, 2019. Speech Recognition: Transcription and Transformation of Human Speech. International Journal on Integrated Education 2, 6 (Dec. 2019), 257-262. DOI: https://doi.org/10.31149/ijie.v2i6.497. PDF: https://journals.researchparks.org/index.php/IJIE/article/view/497/478. Paper: https://journals.researchparks.org/index.php/IJIE/article/view/497
This document summarizes a research paper on developing a syllable-based speech recognition system for the Myanmar language. The proposed system has three main components: feature extraction, phone recognition, and decoding. Feature extraction transforms speech into acoustic feature vectors. Phone recognition computes likelihoods of acoustic observations given linguistic units like phones. Decoding uses acoustic and language models to find the most likely sequence of words. The paper discusses building acoustic and language models for Myanmar. The acoustic model is trained using Hidden Markov Models and Gaussian mixture models. The language model is an n-gram model built using syllable segmentation of text. Developing the first speech recognition system for Myanmar poses technical challenges due to its tonal syllabic structure.
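The syllable n-gram language model described can be sketched for the bigram case with the standard library. This is a minimal maximum-likelihood sketch (the paper likely uses a higher-order, smoothed model); the syllable strings are invented placeholders, not Myanmar data:

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    # Count successor syllables for each history syllable.
    model = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            model[prev][cur] += 1
    return model

def bigram_prob(model, prev, cur):
    # Maximum-likelihood estimate of P(cur | prev); a real system
    # would add smoothing so unseen pairs get nonzero probability.
    total = sum(model[prev].values())
    return model[prev][cur] / total if total else 0.0

model = train_bigram([["ka", "ba", "ka", "ba"]])
```

During decoding, these probabilities are combined with the acoustic model's likelihoods to score candidate syllable sequences.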
A Comparison between Natural and Synthetic Food Flavoring Extracts Using Infr...
Food is a basic necessity of life. We work hard to satisfy our hunger, yet at the end of the day many of us are not sure of what we eat; we may be consuming dangerous flavors and dyes, inviting disease rather than good health. The purpose of this article is to detect the presence of food adulterants in some common foods and to create awareness about artificial flavors and dyes. The IR spectra and optical activity of two of the most commonly used natural and artificial flavors and colors (vanilla and strawberry) were studied. IR spectra of synthetic vanilla were dominated by specific peaks attributed to the corresponding synthetic pigments: a band for C=O ester stretching of aldehydic and ketonic groups in the synthetic flavor at 1744.87 cm-1 with a weak shoulder at 1700 cm-1, and C-O stretching of sucrose at 990.49 and 923.70 cm-1. Synthetic strawberry was characterized by specific spectral bands (C=O ester stretching at 1634.96 cm-1 and C-O stretching of sucrose at 925 cm-1), while these functional groups were absent in the natural vanilla and strawberry extracts. The natural flavoring extracts possess levorotatory properties and are optically active, while the synthetic extracts do not rotate the plane of polarization of light passing through them and are therefore optically inactive. The results indicate that infrared spectroscopy and optical activity can be used to detect adulterated products and to differentiate between natural and artificial food flavoring extracts.
This document summarizes a research paper that analyzes the performance of the Dynamic Source Routing (DSR) protocol for mobile ad hoc networks (MANETs) in terms of energy consumption. It proposes an Energy Secure DSR (ESDSR) protocol that modifies DSR to optimize energy consumption by not including nodes with low energy in route selection. Simulations using the NS-3 network simulator show that ESDSR has better performance than DSR in terms of energy consumption, delay, throughput, and packet delivery ratio.
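The core idea, excluding nodes with low residual energy from route selection, can be sketched abstractly. This is not the ESDSR implementation: routes are represented as plain node lists, and the energy table and threshold are hypothetical values:

```python
def select_route(routes, energy, min_energy):
    # Keep only routes in which every node has enough residual
    # energy, then prefer the shortest surviving route.
    viable = [r for r in routes
              if all(energy[node] >= min_energy for node in r)]
    return min(viable, key=len) if viable else None

energy = {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.8}
routes = [["A", "B"], ["A", "C", "D"]]
```

With a threshold of 0.5 the short route through the depleted node B is rejected in favor of the longer but sustainable route, which is the trade-off ESDSR makes to extend network lifetime.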
This document presents a method for image upscaling using a fuzzy ARTMAP neural network. It begins with an introduction to image upscaling and interpolation techniques. It then provides background on ARTMAP neural networks and fuzzy logic. The proposed method uses a linear interpolation algorithm trained with an ARTMAP network. Results show the method performs better than nearest neighbor interpolation in terms of peak signal-to-noise ratio, mean squared error, and structural similarity, though not as high as bicubic interpolation. Overall, the fuzzy ARTMAP network provides an effective way to perform image upscaling with fewer artifacts than traditional methods.
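The evaluation metrics named there, mean squared error and peak signal-to-noise ratio, have standard definitions that can be sketched directly (a generic implementation for 8-bit images, not the paper's code):

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images of equal shape.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer images,
    # and identical images give infinite PSNR.
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)
```

In an upscaling study these metrics compare each method's output against a ground-truth high-resolution image, which is how the fuzzy ARTMAP result is ranked between nearest-neighbor and bicubic interpolation.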
This document describes a virtual mouse system that uses computer vision and color tracking to replace a conventional mouse. The system tracks colored objects like a red or blue object held in the user's hand to map hand movements to mouse movements and clicks. It analyzes image frames from a webcam to detect pixel colors and scale the detected positions to match screen coordinates. This allows for freer motion than a physical mouse and reduces costs compared to alternatives like touchscreens. The system is implemented using OpenCV for image processing and runs entirely in software on the user's computer.
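The two core steps, thresholding a color channel and scaling the detected position to screen coordinates, can be sketched without any vision library. The summary names OpenCV; this library-free numpy version is an assumption-laden illustration (BGR channel order, an arbitrary brightness threshold, and synthetic frame/screen sizes):

```python
import numpy as np

def detect_centroid(frame, channel=2, thresh=200):
    # Mask pixels whose chosen color channel is bright enough,
    # then return the centroid of the masked region as (x, y).
    mask = frame[:, :, channel] > thresh
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())

def to_screen(point, cam_size, screen_size):
    # Scale camera coordinates up to screen coordinates.
    return (point[0] / cam_size[0] * screen_size[0],
            point[1] / cam_size[1] * screen_size[1])

frame = np.zeros((120, 160, 3))
frame[50:60, 70:80, 2] = 255.0   # a solid bright blob in channel 2
```

Run per webcam frame, the scaled centroid drives the cursor; a second tracked color can trigger clicks.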
The document discusses vulnerabilities in the Border Gateway Protocol (BGP) and proposes solutions to secure BGP. It begins by providing an overview of BGP, describing how it exchanges routing information but lacks security mechanisms. This makes it vulnerable to attacks like prefix hijacking. The document then proposes using X.509 certificates to authenticate BGP speakers and their address spaces. It also suggests applying IPsec to secure communications. The proposed solution authenticates BGP speakers by verifying certificates before establishing sessions, aiming to overcome BGP vulnerabilities with low computational cost.
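At its core, the certificate-based defense validates that a BGP speaker is authorized to originate the prefix it announces. A toy sketch with a hypothetical in-memory authorization table (real deployments derive this from X.509 certificate chains, which this sketch does not model):

```python
def authorized_origin(speaker, prefix, cert_db):
    # Accept an announcement only if the speaker's certificate
    # covers the announced address space.
    return prefix in cert_db.get(speaker, set())

# Hypothetical table: speaker identity -> certified prefixes
cert_db = {"AS64500": {"203.0.113.0/24"}}
```

An announcement of a prefix absent from the speaker's certified set would be rejected before a session forwards it, which is how prefix hijacking is blocked.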
The document proposes a voice-based security system (VBSS) that uses voice recognition technology to uniquely identify individuals for security purposes. The VBSS would analyze characteristics of a user's voice and compare it to stored voiceprints to grant access. It aims to provide more secure authentication than passwords alone. The proposed system uses a voice-to-phoneme algorithm to convert speech to an abstract representation in a speaker-independent manner. It includes modules for voice tracking, analysis, storage, authentication and administration to enroll users and handle access requests.
This document discusses speech recognition technology. It begins by defining speech recognition as the process of converting spoken words to text. It then discusses some key companies in the space, including Nuance Communications which was founded in 1994 as a spinoff from SRI to commercialize speech recognition technology. The document also outlines some features and applications of Dragon speech recognition software, as well as limitations, opportunities, and the future of speech recognition technology.
Assistive Examination System for Visually ImpairedEditor IJCATR
This paper presents a design of voice enabled examination system which can be used by the visually challenged students.
The system uses Text-to-Speech (TTS) and Speech-to-Text (STT) technology. The text-to-speech and speech-to-text web based
academic testing software would provide an interaction for blind students to enhance their educational experiences by providing them
with a tool to give the exams. This system will aid the differently-abled to appear for online tests and enable them to come at par with
the other students. This system can also be used by students with learning disabilities or by people who wish to take the examination in
a combined auditory and visual way.
Financial Transactions in ATM Machines using Speech SignalsIJERA Editor
Speech is the natural and simplest way of communication and Speech Recognition is a fascinating application of Digital Signal Processing which has many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and are random in nature. ATM machines communicate with the customers using the stored speech samples and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here. A multilayer neural network trained with back propagation training algorithm is used for classification purpose. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show good recognition accuracy of 87.38% and the efficiency of combining these two techniques
This report looks at the patenting activity around voice control of mobile devices and also captures key litigations and NPE’s operating in this area. Patents were categorized as per key algorithms and application areas and analyzed for generating different trends within PatSeer Project.
Voice recognition system is a system which is used to convert human voice into signal, which can be understood by the machines. When this is achieved, the machine can be made to work, as desired. The machine could be a computer, a typewriter, or even a robot. There are systems available, in which the machine ‘speaks’ the recorded word. But that is out of the scope of this paper. Here, only the human is expected to talk. Further, the voice recognition systems described here, can be used for projects only.
IRJET - A Robust Sign Language and Hand Gesture Recognition System using Conv...IRJET Journal
This document presents a robust sign language and hand gesture recognition system using convolutional neural networks. The system captures image frames and processes them through various neural network layers to classify hand gestures into letters, numbers, or other symbols. It segments the hand from images using color thresholds and edge detection. The images then undergo preprocessing like resizing before being classified by the CNN. The CNN is trained on a large dataset to accurately recognize gestures in sign language and provide text output to bridge communication between deaf and non-signing individuals. The system achieved good results classifying several alphabet letters but could be expanded to recognize word combinations.
Our speech to text conversion project aims to help the nearly 20% of people worldwide with disabilities by allowing them to control their computer and share information using only their voice. The system uses acoustic and language models with a speech engine to recognize speech and convert it to text. It can perform operations like opening calculator and wordpad. Speech recognition has applications in areas like cars, healthcare, education and daily life. Accuracy depends on factors like vocabulary size, speaker dependence, and speech type (isolated, continuous). The system aims to improve accessibility while reducing costs.
This is a ppt on speech recognition system or automated speech recognition system. I hope that it would be helpful for all the people searching for a presentation on this technology
The document summarizes a presentation on automatic speech recognition systems. It includes an introduction defining ASR as the transcription of spoken language into text in real time. It shows the basic block diagram of an ASR system and explains how it works similarly to the human process of hearing, transmitting signals to the brain, and understanding. Some key uses of ASR are in smartphones, AI robots, home automation, and computers. The benefits mentioned are hands-free use, aiding reading and spelling, and easy operation.
The project was started with a sole aim in mind that the design should be able to recognize the voice of a person by analyzing the speech signal. The simulation is done in MATLAB. The design of the project is based on using the Linear prediction filter coefficient (LPC) and Principal component analysis (PCA) on data (princomp) for the speech signal analysis. The Sample Collection process is accomplished by using the microphone to record the speech of male/female. After executing the program the speech is analyzed by the analysis part of our MATLAB program code and our design should be able to identify and give the judgment that the recorded speech signal is same as that of our desired output.
Topics covered: what speech recognition is, the history of speech recognition, voice recognition software, voice recognition in operating systems, types of speech recognition, and the advantages and disadvantages of speech and voice recognition.
A complete PowerPoint presentation on speech recognition technology. Speech recognition is an interesting seminar topic, and this presentation is well suited for final-year students to use for their seminar.
Voice input and speech recognition system in tourism/social media - cidroypaes
Voice input devices allow users to input data or commands using speech instead of other input methods like keyboards. Some voice input devices recognize words from a predefined vocabulary while others need to be trained for a specific speaker. When a word is spoken, the matching input is displayed on screen for verification.
Speech recognition is the process of converting spoken language to text using computer programs. It draws from linguistics, computer science, and electrical engineering. Applications include voice assistants, dictation software, call routing, and more. Accuracy depends on factors like vocabulary size, presence of similar sounding words, whether the system is designed for one speaker or many, and whether speech is isolated, connected or continuous.
Speech Recognition: Transcription and transformation of human speech - SubmissionResearchpa
Speech recognition is linked to the subfields of computational linguistics and computer science. As an interdisciplinary field, it has produced new technologies and methodologies for recognizing and translating spoken words. In recent times the field has advanced considerably through deep learning approaches to voice recognition, and there is growing market demand for applications of voice recognition. Deployed speech recognition systems and their analysis methods are shaping how individuals will interact with computers, which play the central role in this process by acknowledging the translated words as text. Vishal Dineshkumar Soni 2019. Speech Recognition: Transcription and transformation of human speech. International Journal on Integrated Education. 2, 6 (Dec. 2019), 257-262. DOI: https://doi.org/10.31149/ijie.v2i6.497. PDF URL: https://journals.researchparks.org/index.php/IJIE/article/view/497/478. Paper URL: https://journals.researchparks.org/index.php/IJIE/article/view/497
This document summarizes a research paper on developing a syllable-based speech recognition system for the Myanmar language. The proposed system has three main components: feature extraction, phone recognition, and decoding. Feature extraction transforms speech into acoustic feature vectors. Phone recognition computes likelihoods of acoustic observations given linguistic units like phones. Decoding uses acoustic and language models to find the most likely sequence of words. The paper discusses building acoustic and language models for Myanmar. The acoustic model is trained using Hidden Markov Models and Gaussian mixture models. The language model is an n-gram model built using syllable segmentation of text. Developing the first speech recognition system for Myanmar poses technical challenges due to its tonal syllabic structure.
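The syllable-level n-gram language model mentioned above can be illustrated with a minimal add-one-smoothed bigram model. The corpus and romanized syllables below are invented for the sketch and are not from the paper:

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Syllable-level bigram model with add-one smoothing.
    `corpus` is a list of already-segmented syllable sequences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for a, b in zip(toks, toks[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    v = len(vocab)
    def prob(a, b):
        # P(b | a) with add-one smoothing over the vocabulary
        return (bigrams[(a, b)] + 1) / (unigrams[a] + v)
    return prob

# Toy romanized syllable corpus, purely illustrative
corpus = [["min", "ga", "la", "ba"],
          ["min", "ga", "la", "ba"],
          ["nay", "kaung", "la"]]
p = train_bigram_lm(corpus)
```

In a decoder this probability would be combined with the acoustic model's likelihoods to score candidate syllable sequences.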
A Comparison between Natural and Synthetic Food Flavoring Extracts Using Infr... - IOSR Journals
Food is a basic necessity of life. One works hard and earns to satisfy one's hunger, but at the end of the day many of us are not sure of what we eat; we may be consuming dangerous flavorings and dyes, inviting disease rather than good health. The purpose of this article is to detect the presence of food adulterants in some common foods and to create awareness about artificial flavorings and dyes. The IR spectra and optical activity of the two most commonly used flavors and colors (vanilla and strawberry), in both natural and synthetic form, were studied. The IR spectra of synthetic vanilla were dominated by peaks attributed to synthetic pigments: a stretching C=O ester band of aldehydic and ketonic groups at 1744.87 cm-1 with a weak shoulder at 1700 cm-1, and stretching C-O bands of sucrose at 990.49 and 923.70 cm-1. Synthetic strawberry was characterized by C=O ester stretching at 1634.96 cm-1 and C-O sucrose stretching at 925 cm-1, while these functional groups were absent in the natural vanilla and strawberry extracts. The natural flavoring extracts are levorotatory and thus optically active, whereas the synthetic extracts do not rotate the plane of polarized light passing through them and are therefore optically inactive. The results indicate that infrared spectroscopy and optical activity can be used to detect adulterated products and to differentiate between natural and artificial food flavoring extracts.
This document summarizes a research paper that analyzes the performance of the Dynamic Source Routing (DSR) protocol for mobile ad hoc networks (MANETs) in terms of energy consumption. It proposes an Energy Secure DSR (ESDSR) protocol that modifies DSR to optimize energy consumption by not including nodes with low energy in route selection. Simulations using the NS-3 network simulator show that ESDSR has better performance than DSR in terms of energy consumption, delay, throughput, and packet delivery ratio.
This document presents a method for image upscaling using a fuzzy ARTMAP neural network. It begins with an introduction to image upscaling and interpolation techniques. It then provides background on ARTMAP neural networks and fuzzy logic. The proposed method uses a linear interpolation algorithm trained with an ARTMAP network. Results show the method performs better than nearest neighbor interpolation in terms of peak signal-to-noise ratio, mean squared error, and structural similarity, though not as high as bicubic interpolation. Overall, the fuzzy ARTMAP network provides an effective way to perform image upscaling with fewer artifacts than traditional methods.
This document describes a virtual mouse system that uses computer vision and color tracking to replace a conventional mouse. The system tracks colored objects like a red or blue object held in the user's hand to map hand movements to mouse movements and clicks. It analyzes image frames from a webcam to detect pixel colors and scale the detected positions to match screen coordinates. This allows for freer motion than a physical mouse and reduces costs compared to alternatives like touchscreens. The system is implemented using OpenCV for image processing and runs entirely in software on the user's computer.
The document discusses vulnerabilities in the Border Gateway Protocol (BGP) and proposes solutions to secure BGP. It begins by providing an overview of BGP, describing how it exchanges routing information but lacks security mechanisms. This makes it vulnerable to attacks like prefix hijacking. The document then proposes using X.509 certificates to authenticate BGP speakers and their address spaces. It also suggests applying IPsec to secure communications. The proposed solution authenticates BGP speakers by verifying certificates before establishing sessions, aiming to overcome BGP vulnerabilities with low computational cost.
This document discusses power transformer differential relay inrush restraint setting applications. It describes how differential relays use harmonic restraint to block tripping during transformer energization due to magnetizing inrush currents, which contain high levels of second harmonic current. The document outlines criteria for setting transformer differential relays, including using 15% of the second harmonic and 35% of the fifth harmonic for the inrush restraint current setting. It also describes simulating a power transformer system using software to examine harmonic levels and achieve suitable inrush restraint settings. The settings provide fast fault clearing while properly restraining for inrush events.
This document summarizes an article that optimizes the Okumura-Hata propagation model for 800MHz radio frequency in Yaounde, Cameroon using a Newton second order iterative algorithm. Radio measurements were collected through drive tests on an existing CDMA2000 network in Yaounde. The root mean squared error between predicted and measured values was calculated to validate the optimized model. A new model developed through this process was found to better represent the local environment compared to the standard Okumura-Hata model. The optimized model can be used for future radio network planning in Yaounde, such as for upcoming LTE deployment in the 800MHz band.
Lean and Agile Manufacturing as productivity enhancement techniques - a compa... - IOSR Journals
Lean and agile manufacturing are productivity enhancement techniques that aim to improve responsiveness to customers and reduce costs. Lean focuses on eliminating waste through continuous improvement processes, while agile emphasizes flexibility and nimbleness to respond quickly to changes. Both approaches seek to enhance value for customers. Key differences are that lean focuses more on efficiency and waste reduction within operations, while agile takes a more holistic view across the entire enterprise to thrive in uncertain environments through rapid adaptation.
The document discusses the concept and working of a six-stroke internal combustion engine. A six-stroke engine generates power twice per cycle by adding two additional strokes to the traditional four-stroke cycle. This results in higher efficiency and lower fuel consumption compared to four-stroke engines. The six-stroke cycle includes intake, compression, power, exhaust, and two additional strokes where heated air is used to generate a second power stroke. Major inventors who developed six-stroke engines include Malcolm Beare, Bruce Crower, and Velozeta. The advantages are increased efficiency and reduced emissions, but disadvantages include increased complexity and cost.
This document describes a simulation model developed to calculate Overall Equipment Effectiveness (OEE) for a generic production line. The model takes input data from XML files generated by an optimization model that minimizes costs based on factors like work in progress inventory and machine idle time. Both crisp and fuzzy models are implemented to calculate availability, performance, quality, and overall OEE. The fuzzy model uses Mamdani inference with triangular membership functions. Simulation results in VB and Excel are presented and compared to world class standards. Goal seek and scenario manager tools are used to determine input parameter changes needed to meet standards. The model provides a way to evaluate a production line's efficiency and identify areas for improvement.
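The crisp OEE computation is simply the product of the three factors; a minimal sketch follows (the fuzzy Mamdani variant is not reproduced here, and the benchmark figures are illustrative):

```python
def oee(availability, performance, quality):
    """Crisp OEE: the product of the three factors, each in [0, 1]."""
    return availability * performance * quality

# Illustrative figures: availability 0.90, performance 0.95,
# quality 0.998 give an OEE near the oft-quoted 0.85 world-class mark
score = oee(0.90, 0.95, 0.998)
```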
Dietary Supplementation with Calcium in Healthy Rats Administered with Artemi... - IOSR Journals
Reports on the role of calcium in predisposition to cardiovascular disease have been rather inconsistent, while studies on its interaction with other medications are ongoing. We therefore investigated the effect of separate and combined administration of a calcium supplement and an artemisinin-based combination drug on hepatic and serum lipid profiles. Thirty-two male Wistar rats were randomly assigned to four groups of eight rats each. The control (group A) received normal saline. Groups B and D were placed on 10 mg/kg calcium twice daily for four weeks. On the thirtieth day, a therapeutic dose of the artemisinin-based combination was administered to groups C and D twice daily for three days. All the rats were then sacrificed after a 12-hour fast; blood was withdrawn and the liver removed and homogenized in an appropriate buffer. Biochemical analysis showed no significant (p>0.05) variation in hepatic triacylglycerol in any of the treated groups, whereas calcium supplementation induced a significant (p<0.05) reduction in hepatic cholesterol. Calcium supplementation also produced significant elevations in serum total cholesterol, LDL cholesterol and atherogenic risk index, with a concomitant reduction in serum HDL cholesterol. No significant change was observed in serum total cholesterol, triacylglycerol or serum lipoproteins in the other treatment groups. Our study suggests that calcium supplementation may predispose to cardiovascular disease and that its co-administration with ACT may neither aggravate nor reduce that risk.
Productive disciplinary engagement according to students' school levels: a co... - IOSR Journals
This document summarizes a research study that compares the productive disciplinary engagement of students at different school levels in gymnastics lessons in Tunisia. The study uses two theoretical frameworks: the theory of didactic joint action and the concept of productive disciplinary engagement. Ethnographic observations and video recordings of lessons were analyzed to describe student task transformations. Results showed how breaks in the didactic contract allowed knowledge progression and student contributions. Patterns of engagement were identified for strong and weak students. Productive disciplinary engagement was characterized by principles of problematizing, authority, accountability, and access to resources. The frameworks were found to be complementary for analyzing student engagement and the conditions created by teachers.
Providing The Security Against The DDOS Attack In Mobile Ad Hoc Networks - IOSR Journals
This document discusses providing security against distributed denial of service (DDOS) attacks in mobile ad hoc networks. It begins by introducing mobile ad hoc networks and some of their security vulnerabilities. It then discusses different types of attacks against MANETs, including black hole attacks, wormhole attacks, denial of service attacks, and distributed denial of service attacks. It proposes using an intrusion detection system to detect attacks and block attacking nodes. Simulation results are discussed to analyze the effectiveness of detection and mitigation techniques against DDOS attacks in terms of network performance metrics. The conclusion is that implementing queue management algorithms in network routers can help protect users during DDOS attacks by guaranteeing a certain level of bandwidth.
This paper proposes using fuzzy cognitive maps (FCM) to automatically detect a student's learning style in an adaptive e-learning system based on the Felder-Silverman learning style model. FCM is a soft computing technique that combines fuzzy logic and neural networks. The paper reviews related work on automatic detection of learning styles. It then explains how FCM would be used to model student behaviors and interactions to identify their learning style dimensions. The results of testing this approach are discussed. The overall goal is to personalize the e-learning experience based on a student's detected learning style.
This document discusses using bloodstain pattern evidence from crime scenes to predict the positions of victims, perpetrators, and bystanders through Bayesian networks. It begins by providing context on violent crime statistics and definitions. It then outlines the typical process of investigating and reconstructing a crime scene. This involves defining, processing, and collecting information from the scene. Specifically, it discusses using bloodstain patterns to sequence events, determine directions of movement, and infer positions. The research aims to add objectivity to crime scene reconstruction by documenting how stain patterns vary with impact angles, heights, and apertures using physics and fluid mechanics principles. The goal is probabilistic positional prediction through Bayesian reasoning.
Intelligent Fault Identification System for Transmission Lines Using Artifici... - IOSR Journals
Transmission and distribution lines are vital links between generating units and consumers. Because they are exposed to the atmosphere, the chance of a fault occurring on a transmission line is very high, and faults must be dealt with immediately to minimize the damage they cause. This paper focuses on detecting faults on electric power transmission lines using artificial neural networks. A feed-forward neural network trained with the back-propagation algorithm is employed. An analysis of neural networks with varying numbers of hidden layers and neurons per hidden layer is provided to validate the choice of network at each step. The developed neural network can detect single-line-to-ground and double-line-to-ground faults on all three phases. Simulation in MATLAB Simulink demonstrates that the artificial-neural-network-based method is efficient at detecting faults on transmission lines and achieves satisfactory performance. A 300 km, 25 kV transmission line is used to validate the proposed fault detection system, and the neural network is implemented in hardware on a TMS320C6713.
The document discusses securing biometric templates when transmitted over non-secure channels by selecting partial fingerprint and iris data, encrypting it using AES with an iris hash as the key, and transmitting the encrypted data. It outlines the need to protect biometric data due to risks of identity theft if templates are compromised. Various attacks on biometric systems and methods of template protection including cryptography and cancelable biometrics are also reviewed.
A detailed geological history of quartz and industrial minerals present in different localities of Eritrea is given. Well-grown transparent quartz crystals showing hexagonal crystallographic features, and isolated, irregularly shaped small milky quartz stones, are found in the western suburb of Asmara and the area between Molebso and Zara in central northern Eritrea. The mechanism of formation of growth features observed on the habit faces of the transparent quartz crystals is briefly explained. Micro-topographical studies of these crystals indicate that they initially grow and develop under high-supersaturation conditions. Most of the milky quartz stones are randomly scattered and devoid of gold; however, a few specimens with yellow dots on their surfaces contain gold particles. Energy-dispersive X-ray analysis (EDAX) indicates a gold content as high as 48% in such samples. Commercial implications of the gold-bearing quartz are discussed, and it is proposed that gold exists in large quantities in quartz veins deep beneath the surface of the earth in this region.
Physiological, Biochemical and Modern Biotechnological Approach to Improvemen... - IOSR Journals
Rauwolfia serpentina, also known as Sarpagandha (Apocynaceae), has been an integral part of the Ayurvedic medical system in India for centuries, used in the treatment of various ailments. The leaves and roots of Rauwolfia serpentina contain alkaloids, which are secondary metabolites; the major alkaloids identified are reserpine, rauwolfine, serpentine, sarpagine, ajmaline, yohimbine and ajmalicine. The present paper is an overview of studies concerning physiological, biochemical and modern biotechnological approaches to the improvement of Rauwolfia serpentina.
Balking and Reneging in the Queuing System - IOSR Journals
In this paper, we discuss a steady-state solution of the ordered queuing problem with balking and reneging. The waiting line is modeled as a chi-square queue with Poisson balking probability, which depends not only on the number of customers in the system but also on the rate of service in the system.
A Review On Speech Feature Techniques And Classification Techniques - Nicole Heredia
This document discusses speech feature extraction and classification techniques for speech recognition systems. It provides an overview of common feature extraction methods like MFCC and LPC, and classification algorithms like ANN and SVM. MFCC mimics human auditory perception but yields a weak representation of the power spectrum, while LPC is easy to calculate but does not capture information on a linear frequency scale. ANN can learn from data but becomes complex for large datasets, while SVM is accurate and well suited to pattern recognition but requires fixed-length feature vectors. The document evaluates these techniques and concludes that MFCC is more efficient than LPC for feature extraction in speech recognition.
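The MFCC pipeline these comparisons refer to (windowing, FFT, mel filterbank, log, DCT) can be sketched in numpy. Parameter choices such as 26 filters and 13 coefficients are common defaults, not values from the reviewed paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, mid):
            fb[i - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[i - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    """One frame: window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    log_mel = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    basis = np.cos(np.pi / n_filters
                   * (np.arange(n_filters) + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])    # DCT-II basis
    return basis @ log_mel

sr = 16000
t = np.arange(512) / sr
ceps = mfcc(np.sin(2 * np.pi * 440 * t), sr)   # one 32 ms frame of a 440 Hz tone
```

A real front end would apply this per overlapping frame and usually append delta coefficients.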
Speech processing is a crucial and intensive field of research in the development of robust and efficient speech recognition systems, but recognition accuracy still suffers under variation of context, speaker variability, and environmental conditions. In this paper we present a curvelet-based feature extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, which successfully reduces computational complexity and feature-vector size; the varying window size gives better accuracy and makes the method suitable for non-stationary signals. For word classification and recognition, discrete hidden Markov models are used, as they account for the time distribution of speech signals. The HMM classifier attained maximum identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system; the available approaches are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using discrete curvelet transforms over conventional methods.
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... - ijcsit
A survey on Enhancements in Speech Recognition - IRJET Journal
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
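The PCA reduction step in the pipeline above can be sketched as an SVD of the mean-centered feature matrix; the dimensions here are illustrative, not the paper's:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k
    principal components via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 39))     # e.g. 39-dim MFCC feature vectors
reduced = pca_reduce(feats, 10)        # 100 samples, 10 components
```

The reduced vectors would then be the inputs to the ANN classifier.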
Classification of Language Speech Recognition System - ijtsrd
This paper aims to implement a language speech recognition classification system using feature extraction and classification. It is an automatic language speech recognition system: a software architecture that outputs digits from input speech signals, with emphasis on speaker-dependent isolated word recognition. To implement this system, a good-quality microphone is required to record the speech signals. The system contains two main modules, feature extraction and feature matching. Feature extraction extracts a small amount of data from the voice signal that can later be used to represent each speech signal. Feature matching identifies the unknown speech signal by comparing features extracted from the voice input against a set of known speech signals and making a decision. In this system, mel-frequency cepstral coefficients (MFCC) are used for feature extraction, and vector quantization (VQ) with the LBG algorithm is used for feature matching. Khin May Yee | Moh Moh Khaing | Thu Zar Aung, "Classification of Language Speech Recognition System", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26546.pdf. Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/26546/classification-of-language-speech-recognition-system/khin-may-yee
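The LBG codebook training used for feature matching can be sketched as repeated centroid splitting followed by Lloyd (k-means) refinement. This is a generic illustration on toy 2-D data, not the authors' implementation:

```python
import numpy as np

def lbg_codebook(X, size, eps=0.01, iters=20):
    """Train a VQ codebook with the Linde-Buzo-Gray split algorithm.
    X: (n_vectors, dim) training features; size: power-of-two codebook size."""
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # split every centroid into a +/- perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):   # Lloyd refinement
            dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = dists.argmin(axis=1)
            for c in range(len(codebook)):
                members = X[nearest == c]
                if len(members):
                    codebook[c] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
cb = lbg_codebook(X, 2)   # two clusters -> centroids near (0,0) and (3,3)
```

In the described system X would hold MFCC vectors, and each speaker or word class would get its own codebook; matching picks the codebook with the lowest quantization distortion.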
Utterance Based Speaker Identification Using ANN - IJCSEA Journal
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANN - IJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
This document discusses feature extraction techniques for isolated word speech recognition. It begins with an introduction to digital speech processing and speech recognition models. The main part of the document compares two common feature extraction techniques: mel-frequency cepstral coefficients (MFCC) and relative spectral (RASTA) filtering. MFCC extracts feature vectors from signals and provides high performance but lacks robustness; RASTA filtering reduces the impact of noise and provides high robustness by band-pass filtering feature coefficients in both the log-spectral and spectral domains. The document details the MFCC feature extraction process, which involves steps like framing, windowing, the fast Fourier transform, mel filtering, the discrete cosine transform, and computing the final cepstral coefficients.
Performance of different classifiers in speech recognition - eSAT Journals
Abstract: Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone, and the proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, they are tuned by removing the noise using wavelet denoising based on soft thresholding. The features are extracted using discrete wavelet transforms (DWT), which are well suited to processing non-stationary signals like speech thanks to their multi-resolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the feature vectors obtained are classified using three classifiers capable of handling multiple classes: artificial neural networks (ANN), support vector machines (SVM) and naive Bayes. During the classification stage, the input feature vectors are trained using information relating to known patterns and then tested on the test data set. The performance of each classifier is evaluated by recognition accuracy. All three methods produced good recognition accuracy: DWT with ANN achieved 89%, DWT with SVM 86.6%, and DWT with naive Bayes 83.5%. ANN was found to be the best of the three. Index Terms: speech recognition, soft thresholding, discrete wavelet transforms, artificial neural networks, support vector machines, naive Bayes classifier.
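The DWT feature extraction idea can be illustrated with a Haar wavelet (the simplest wavelet; the abstract does not state which mother wavelet was used), taking subband energies as the feature vector:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    x = x[:len(x) // 2 * 2]              # drop an odd trailing sample
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def wavelet_features(x, levels=3):
    """Multi-level decomposition; subband energies form the feature vector."""
    feats = []
    for _ in range(levels):
        x, d = haar_dwt(x)
        feats.append(np.sum(d ** 2))     # detail energy per level
    feats.append(np.sum(x ** 2))         # final approximation energy
    return np.array(feats)

t = np.arange(1024) / 8000.0
f = wavelet_features(np.sin(2 * np.pi * 200 * t))   # 200 Hz tone at 8 kHz
```

Because the Haar transform is orthonormal, the subband energies sum to the signal energy, which makes them stable inputs for the downstream classifiers.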
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of engineering and technology.
An optimized approach to voice translation on mobile phones (eSAT Journals)
Abstract: Current voice translation tools and services use natural language understanding and natural language processing to convert words. However, these parsing methods concentrate more on capturing keywords and translating them, completely neglecting the considerable amount of processing time involved. In this paper, we suggest techniques that can optimize the processing time, thereby increasing the throughput of voice translation services. Techniques like template matching, indexing frequently used words using probability search, and session-based cache can considerably enhance processing times. Moreover, these factors become all the more important when we need to achieve real-time translation on mobile phones. Keywords: Optimized voice translation, mobile client, speech recognition, language interpretation, template matching, probability search, session-based cache.
Robust Speech Recognition Technique using MATLAB (IRJET Journal)
The document proposes a new robust speech recognition technique using MATLAB. It takes an audio signal as input, eliminates noise, and matches patterns to recognize the input. It uses Fourier transforms to extract data from the audio, performs noise elimination, converts the noise-eliminated data to patterns, and matches the patterns to those stored in a database. Patterns are represented as states in a weighted automaton. The technique achieves 90% accuracy in recognizing words and is well-suited for voice command systems.
This document provides an overview of a seminar on AI for speech recognition. It includes an introduction to AI and speech recognition, different models for speech recognition including HMM and DTW, applications of speech recognition in various domains, and challenges. The content list covers topics like performance of speech recognition systems, applications, and failures of speech recognition. Statistical models are important for decoding speech accurately. AI is recognized as an efficient method for speech recognition.
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR (ijcseit)
The proposed system is a syllable-based Myanmar speech recognition system. There are three stages: Feature Extraction, Phone Recognition and Decoding. In feature extraction, the system transforms the input speech waveform into a sequence of acoustic feature vectors, each vector representing the information in a small time window of the signal. Then the likelihood of the observed feature vectors given linguistic units (words, phones, subparts of phones) is computed in the phone recognition stage. Finally, the decoding stage takes the Acoustic Model (AM), which consists of this sequence of acoustic likelihoods, plus a phonetic dictionary of word pronunciations, combined with the Language Model (LM). The system produces the most likely sequence of words as the output. The system creates the language model for Myanmar by using syllable segmentation and a syllable-based n-gram method.
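The syllable-based n-gram language model mentioned above can be sketched as a bigram model over pre-segmented syllables. The smoothing scheme and the example syllables below are assumptions for illustration; the paper's actual corpus and n-gram settings are not given here.

```python
from collections import Counter

def train_bigram(corpus):
    """Count syllable unigrams and bigrams over a pre-segmented corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        syllables = ["<s>"] + sentence + ["</s>"]
        unigrams.update(syllables)
        bigrams.update(zip(syllables, syllables[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, cur):
    """Add-one smoothed P(cur | prev); unseen pairs get a small probability."""
    vocab = len(unigrams)
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)

# Hypothetical romanized syllable sequences (placeholders, not real data).
corpus = [["min", "ga", "la", "ba"], ["min", "ga", "la"]]
uni, bi = train_bigram(corpus)
```

During decoding, scores of this kind would be combined with the acoustic likelihoods from the phone recognition stage to pick the most likely word sequence.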
Modeling of Speech Synthesis of Standard Arabic Using an Expert System (csandit)
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
This document provides a technical review of secure banking using RSA and AES encryption methodologies. It discusses how RSA and AES are commonly used encryption standards for secure data transmission between ATMs and bank servers. The document first provides background on ATM security measures and risks of attacks. It then reviews related work analyzing encryption techniques. The document proposes using a one-time password in addition to a PIN for ATM authentication. It concludes that implementing encryption standards like RSA and AES can make transactions more secure and build trust in online banking.
This document analyzes the performance of various modulation schemes for achieving energy efficient communication over fading channels in wireless sensor networks. It finds that for long transmission distances, low-order modulations like BPSK are optimal due to their lower SNR requirements. However, as transmission distance decreases, higher-order modulations like 16-QAM and 64-QAM become more optimal since they can transmit more bits per symbol, outweighing their higher SNR needs. Simulations show lifetime extensions up to 550% are possible in short-range networks by using higher-order modulations instead of just BPSK. The optimal modulation depends on transmission distance and balancing the energy used by electronic components versus power amplifiers.
This document provides a review of mobility management techniques in vehicular ad hoc networks (VANETs). It discusses three modes of communication in VANETs: vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and hybrid vehicle (HV) communication. For each communication mode, different mobility management schemes are required due to their unique characteristics. The document also discusses mobility management challenges in VANETs and outlines some open research issues in improving mobility management for seamless communication in these dynamic networks.
This document provides a review of different techniques for segmenting brain MRI images to detect tumors. It compares the K-means and Fuzzy C-means clustering algorithms. K-means is an exclusive clustering algorithm that groups data points into distinct clusters, while Fuzzy C-means is an overlapping clustering algorithm that allows data points to belong to multiple clusters. The document finds that Fuzzy C-means requires more time for brain tumor detection compared to other methods like hierarchical clustering or K-means. It also reviews related work applying these clustering algorithms to segment brain MRI images.
1) The document simulates and compares the performance of AODV and DSDV routing protocols in a mobile ad hoc network under three conditions: when users are fixed, when users move towards the base station, and when users move away from the base station.
2) The results show that both protocols have higher packet delivery and lower packet loss when users are either fixed or moving towards the base station, since signal strength is better in those scenarios. Performance degrades when users move away from the base station due to weaker signals.
3) AODV generally has better performance than DSDV, with higher throughput and packet delivery rates observed across the different user mobility conditions.
This document describes the design and implementation of 4-bit QPSK and 256-bit QAM modulation techniques using MATLAB. It compares the two techniques based on SNR, BER, and efficiency. The key steps of implementing each technique in MATLAB are outlined, including generating random bits, modulation, adding noise, and measuring BER. Simulation results show scatter plots and eye diagrams of the modulated signals. A table compares the results, showing that 256-bit QAM provides better performance than 4-bit QPSK. The document concludes that QAM modulation is more effective for digital transmission systems.
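The measurement loop described above (generate random bits, modulate, add noise, measure BER) can be mirrored outside MATLAB. The sketch below simulates only the QPSK leg over an AWGN channel, with an assumed Gray mapping; it illustrates the procedure rather than reproducing the paper's MATLAB code.

```python
import math
import random

def qpsk_ber(num_bits, snr_db, seed=1):
    """Simulate Gray-coded QPSK over AWGN and return the measured bit
    error rate. num_bits must be even (two bits per symbol)."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(num_bits)]
    errors = 0
    # Per-dimension noise standard deviation for the given Es/N0 (Es = 1).
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)
    for k in range(0, num_bits, 2):
        # Map each bit pair to a unit-energy symbol (I, Q) in {+-1/sqrt(2)}.
        i_tx = (1 - 2 * bits[k]) / math.sqrt(2)
        q_tx = (1 - 2 * bits[k + 1]) / math.sqrt(2)
        i_rx = i_tx + rng.gauss(0, sigma)
        q_rx = q_tx + rng.gauss(0, sigma)
        # With Gray mapping the bit decision is independent on each axis.
        errors += (0 if i_rx > 0 else 1) != bits[k]
        errors += (0 if q_rx > 0 else 1) != bits[k + 1]
    return errors / num_bits
```

Sweeping `snr_db` and plotting the returned BER reproduces the kind of waterfall comparison the paper makes; a 256-QAM leg would use a 16x16 constellation carrying 8 bits per symbol.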
The document proposes a hybrid technique using Anisotropic Scale Invariant Feature Transform (A-SIFT) and Robust Ensemble Support Vector Machine (RESVM) to accurately identify faces in images. A-SIFT improves upon traditional SIFT by applying anisotropic scaling to extract richer directional keypoints. Keypoints are processed with RESVM and hypothesis testing to increase accuracy above 95% by repeatedly reprocessing images until the threshold is met. The technique was tested on similar and different facial images and achieved better results than SIFT in retrieval time and reduced keypoints.
This document studies the effects of dielectric superstrate thickness on microstrip patch antenna parameters. Three types of probes-fed patch antennas (rectangular, circular, and square) were designed to operate at 2.4 GHz using Arlondiclad 880 substrate. The antennas were tested with and without an Arlondiclad 880 superstrate of varying thicknesses. It was found that adding a superstrate slightly degraded performance by lowering the resonant frequency and increasing return loss and VSWR, while decreasing bandwidth and gain. Specifically, increasing the superstrate thickness or dielectric constant resulted in greater changes to the antenna parameters.
This document describes a wireless environment monitoring system that utilizes soil energy as a sustainable power source for wireless sensors. The system uses a microbial fuel cell to generate electricity from the microbial activity in soil. Two microbial fuel cells were created using different soil types and various additives to produce different current and voltage outputs. An electronic circuit was designed on a printed circuit board with components like a microcontroller and ZigBee transceiver. Sensors for temperature and humidity were connected to the circuit to monitor the environment wirelessly. The system provides a low-cost way to power remote sensors without needing battery replacement and avoids the high costs of wiring a power source.
1) The document proposes a model for a frequency tunable inverted-F antenna that uses ferrite material.
2) The resonant frequency of the antenna can be significantly shifted from 2.41GHz to 3.15GHz, a 31% shift, by increasing the static magnetic field placed on the ferrite material.
3) Altering the permeability of the ferrite allows tuning of the antenna's resonant frequency without changing the physical dimensions, providing flexibility to operate over a wide frequency range.
This document summarizes a research paper that presents a speech enhancement method using stationary wavelet transform. The method first classifies speech into voiced, unvoiced, and silence regions based on short-time energy. It then applies different thresholding techniques to the wavelet coefficients of each region - modified hard thresholding for voiced speech, semi-soft thresholding for unvoiced speech, and setting coefficients to zero for silence. Experimental results using speech from the TIMIT database corrupted with white Gaussian noise at various SNR levels show improved performance over other popular denoising methods.
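The first stage of the method, classifying frames into voiced, unvoiced and silence regions by short-time energy, can be sketched as below. The two thresholds are illustrative placeholders; the paper's actual decision rule and threshold values are not reproduced here.

```python
def short_time_energy(frame):
    """Average squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def classify_regions(signal, frame_len, low, high):
    """Label each non-overlapping frame: 'silence' below the low threshold,
    'voiced' above the high threshold, 'unvoiced' in between."""
    labels = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        e = short_time_energy(signal[i:i + frame_len])
        labels.append("silence" if e < low else
                      "voiced" if e > high else "unvoiced")
    return labels
```

Each label would then select a different wavelet thresholding rule, per the summary: modified hard thresholding for voiced frames, semi-soft thresholding for unvoiced frames, and zeroing the coefficients for silence.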
This document reviews the design of an energy-optimized wireless sensor node that encrypts data for transmission. It discusses how sensing schemes that group nodes into clusters and transmit aggregated data can reduce energy consumption compared to individual node transmissions. The proposed node design calculates the minimum transmission power needed based on received signal strength and uses a periodic sleep/wake cycle to optimize energy when not sensing or transmitting. It aims to encrypt data at both the node and network level to further optimize energy usage for wireless communication.
This document discusses group consumption modes. It analyzes factors that impact group consumption, including external environmental factors like technological developments enabling new forms of online and offline interactions, as well as internal motivational factors at both the group and individual level. The document then proposes that group consumption modes can be divided into four types based on two dimensions: vertical (group relationship intensity) and horizontal (consumption action period). These four types are instrument-oriented, information-oriented, enjoyment-oriented, and relationship-oriented consumption modes. Finally, the document notes that consumption modes are dynamic and can evolve over time.
The document summarizes a study of different microstrip patch antenna configurations with slotted ground planes. Three antenna designs were proposed and their performance evaluated through simulation: a conventional square patch, an elliptical patch, and a star-shaped patch. All antennas were mounted on an FR4 substrate. The effects of adding different slot patterns to the ground plane on resonance frequency, bandwidth, gain and efficiency were analyzed parametrically. Key findings were that reshaping the patch and adding slots increased bandwidth and shifted resonance frequency. The elliptical and star patches in particular performed better than the conventional design. Three antenna configurations were selected for fabrication and measurement based on the simulations: a conventional patch with a slot under the patch, an elliptical patch with slots
1) The document describes a study conducted to improve call drop rates in a GSM network through RF optimization.
2) Drive testing was performed before and after optimization using TEMS software to record network parameters like RxLevel, RxQuality, and events.
3) Analysis found call drops were occurring due to issues like handover failures between sectors, interference from adjacent channels, and overshooting due to antenna tilt.
4) Corrective actions taken included defining neighbors between sectors, adjusting frequencies to reduce interference, and lowering the mechanical tilt of an antenna.
5) Post-optimization drive testing showed improvements in RxLevel, RxQuality, and a reduction in dropped calls.
This document describes the design of an intelligent autonomous wheeled robot that uses RF transmission for communication. The robot has two modes - automatic mode where it can make its own decisions, and user control mode where a user can control it remotely. It is designed using a microcontroller and can perform tasks like object recognition using computer vision and color detection in MATLAB, as well as wall painting using pneumatic systems. The robot's movement is controlled by DC motors and it uses sensors like ultrasonic sensors and gas sensors to navigate autonomously. RF transmission allows communication between the robot and a remote control unit. The overall aim is to develop a low-cost robotic system for industrial applications like material handling.
This document reviews cryptography techniques to secure the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad-hoc networks. It discusses various types of attacks on AODV like impersonation, denial of service, eavesdropping, black hole attacks, wormhole attacks, and Sybil attacks. It then proposes using the RC6 cryptography algorithm to secure AODV by encrypting data packets and detecting and removing malicious nodes launching black hole attacks. Simulation results show that after applying RC6, the packet delivery ratio and throughput of AODV increase while delay decreases, improving the security and performance of the network under attack.
The document describes a proposed modification to the conventional Booth multiplier that aims to increase its speed by applying concepts from Vedic mathematics. Specifically, it utilizes the Urdhva Tiryakbhyam formula to generate all partial products concurrently rather than sequentially. The proposed 8x8 bit multiplier was coded in VHDL, simulated, and found to have a path delay 44.35% lower than a conventional Booth multiplier, demonstrating its potential for higher speed.
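The Urdhva Tiryakbhyam ("vertically and crosswise") formula can be illustrated in software: every output column is a sum of cross products that depends only on the input digits, which is why hardware can generate all partial products concurrently. This decimal sketch illustrates the formula; it is not the paper's VHDL design, which operates on binary operands.

```python
def urdhva_multiply(a_digits, b_digits):
    """Multiply two equal-length digit lists (least-significant digit first)
    using Urdhva Tiryakbhyam: column k independently collects all cross
    products a[i] * b[j] with i + j = k, then carries are propagated."""
    n = len(a_digits)
    assert len(b_digits) == n
    columns = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            columns[i + j] += a_digits[i] * b_digits[j]
    # Sequential carry propagation; a hardware adder tree merges these instead.
    result, carry = [], 0
    for c in columns:
        c += carry
        result.append(c % 10)
        carry = c // 10
    while carry:
        result.append(carry % 10)
        carry //= 10
    return result  # least-significant digit first

def to_int(digits):
    """Convert a least-significant-first digit list back to an integer."""
    return sum(d * 10 ** i for i, d in enumerate(digits))
```

For the 8x8-bit multiplier, the same column structure applies with base-2 digits; because the columns are mutually independent, they can all be formed in one step, which is the source of the reported speedup over sequential Booth recoding.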
This document discusses image deblurring techniques. It begins by introducing image restoration and focusing on image deblurring. It then discusses challenges with image deblurring being an ill-posed problem. It reviews existing approaches to screen image deconvolution including estimating point spread functions and iteratively estimating blur kernels and sharp images. The document also discusses handling spatially variant blur and summarizes the relationship between the proposed method and previous work for different blur types. It proposes using color filters in the aperture to exploit parallax cues for segmentation and blur estimation. Finally, it proposes moving the image sensor circularly during exposure to prevent high frequency attenuation from motion blur.
This document describes modeling an adaptive controller for an aircraft roll control system using PID, fuzzy-PID, and genetic algorithm. It begins by introducing the aircraft roll control system and motivation for developing an adaptive controller to minimize errors from noisy analog sensor signals. It then provides the mathematical model of aircraft roll dynamics and describes modeling the real-time flight control system in MATLAB/Simulink. The document evaluates PID, fuzzy-PID, and PID-GA (genetic algorithm) controllers for aircraft roll control and finds that the PID-GA controller delivers the best performance.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, which can then be measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed (Malak Abu Hammad)
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might seem to be that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations and training activities related to LibreOffice. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834, p-ISSN: 2278-8735. Volume 10, Issue 4, Ver. I (Jul.-Aug. 2015), PP 61-67
www.iosrjournals.org
DOI: 10.9790/2834-10416167
Survey of Technical Progress in Speech Recognition by Machine over Few Years of Research
Nnamdi Okomba S., Adegboye Mutiu Adesina, and Candidus O. Okwor
Computer Engineering Department, Federal University Oye-Ekiti, Ekiti State, Nigeria
Abstract: This paper reviews some of the research carried out over the last decade in the area of Automatic Speech Recognition (ASR) and discusses the major themes and advances made, in order to show the outlook of the technology and give an appreciation of the fundamental progress achieved in this weighty area of speech communication. Over the period of research and development, the accuracy of automatic speech recognition has remained one of the important research challenges, affected by factors such as variation of context, environmental conditions, speaker variation and poor-quality audio. The design of a speech recognition system requires careful attention to the following issues: definition of the various types of speech classes, speech representation, techniques, databases and performance evaluation. The history and challenges of speech recognition systems, and the various techniques constructed by research works to address these challenges, are presented in chronological order. The objective of this paper is to compare and summarize well-known approaches used in the various steps of a speech recognition system.
Keywords: Automatic Speech Recognition, Database, Feature extraction, Performance evaluation, Techniques
I. Introduction
Speech is the primary means of communication between humans. Speech recognition (also known as Computer Speech Recognition or Automatic Speech Recognition (ASR)) is the process of converting a speech signal to a sequence of words or other linguistic units by means of an algorithm implemented as a computer program [1]. It is the most natural form of human communication and has been one of the most exciting areas of signal processing. Generation of speech waveforms and speech recognition have been under development for several decades [2].
The main goal of the speech recognition field is to develop techniques and systems for speech input to machines. This paper reviews key high points of the last few decades in the research and development of speech recognition, so as to provide a technological perspective. Although much technical progress has been made, many research problems remain to be addressed.
A. Speech Recognition System
Speech recognition systems are used in intelligent home, personal communication, banking and security systems [3], [4]. Speech recognition technology has been increasingly used within telephone networks to automate as well as to enhance operator services. Fig. 1 shows the schematic diagram of a speech recognition system modeled on the human being.
It has four main building blocks: speech analysis, feature extraction and coding, language translation and message understanding. Speech analysis consists of noise removal, silence removal and end-point detection, which are necessary to improve the performance of a speech recognition system. Speech analysis also deals with choosing a suitable frame size for further analysis, using sub-segmental, segmental and supra-segmental analysis techniques [5].
The feature extraction and coding unit reduces the dimensionality of the input vector while maintaining the discriminating power of the signal. In the human analogy, the spectral output of speech analysis is converted to an activity signal on the auditory nerve by a neural transducer; the activity signal is then converted into a language code within the brain, and message understanding is finally accomplished [6].
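The silence removal and end-point detection step described above can be sketched with a simple short-time-energy detector. This is a generic textbook sketch, not code from the systems surveyed here; the frame length, hop size and energy threshold are illustrative assumptions:

```python
def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal (list of floats) into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def energy(frame):
    """Mean short-time energy of one frame."""
    return sum(x * x for x in frame) / len(frame)

def trim_silence(signal, frame_len=160, hop=80, threshold=1e-4):
    """Keep only the span between the first and last high-energy frames."""
    frames = frame_signal(signal, frame_len, hop)
    voiced = [i for i, f in enumerate(frames) if energy(f) > threshold]
    if not voiced:
        return []
    start = voiced[0] * hop
    end = voiced[-1] * hop + frame_len
    return signal[start:end]
```

A real front end would add noise estimation and adaptive thresholds; this fixed-threshold version only illustrates the idea of discarding leading and trailing silence before feature extraction.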
B. Types of Speech Recognition System
Automatic speech recognition systems can be classified according to the types of utterances they are able to recognize, as follows:
Isolated Words
Isolated word recognizers require each utterance to have quiet (lack of an audio signal) on both sides of the sample window. They accept only a single utterance/word at a time, operate in a Listen/Not-Listen state, and require the speaker to pause between utterances.
Connected Words
Connected word recognizers (more correctly, "connected utterances") share almost the same characteristics as isolated-word recognizers, but differ slightly in that they allow separate utterances to be run together with a minimal pause between them.
Spontaneous Speech
Recognizers with continuous speech capabilities are among the most difficult to create because they must utilize special techniques to determine utterance boundaries. They allow users to speak almost naturally while the system determines the content.
C. Automatic Speech Recognition (ASR) Design Challenges
The task of automatic speech recognition is to extract the underlying linguistic message from a complex acoustic pattern containing many sources of variability. The challenges on which recognition accuracy depends are tabulated in Table 1.
Table 1: ASR System Design Challenges

[Fig. 1: Schematic diagram of the speech recognition system: Input Speech -> Speech Analysis (noise and silence removal, end-point detection) -> Feature Extraction and Coding -> Language Translation (discrete output) -> Message Understanding -> Recognized Speech, with a training path.]
D. Application of Speech Recognition System
Speech recognition has been used in various areas of application. Table 2 presents the different sectors, input patterns, pattern classes and application areas in which speech recognition systems have been used.
Table 2: Applications of Speech Recognition

Sector | Input Pattern | Pattern Classes | Application Area
Education | Speech waveform | Spoken words | Enables students who are physically handicapped and unable to use a keyboard to enter text verbally
Non-education | Speech waveform | Spoken words | Computer games, precision surgery
Translation | Speech waveform | Spoken words | Advanced applications that translate from one language to another
Medical | Speech waveform | Spoken words | Medical transcription (digital speech to text), health care
Military | Speech waveform | Spoken words | Training air traffic controllers, fighter aircraft
Artificial Intelligence (AI) | Speech waveform | Spoken words | Robotics
General | Speech waveform | Spoken words | News reporting, court reporting
II. Approach To Speech Recognition
Speech recognition research has been ongoing for more than 80 years. Over this period there have been three major approaches, each with various techniques, as presented in Table 3.
Table 3: Speech Recognition Techniques

Approach | Technique | Representation | Recognition Function
Acoustic-phonetic | Segmentation and labeling | Phonemes | Probabilistic lexical access procedure
Pattern recognition | Template | Speech samples, pixels and curves | Correlation, distance measure
Pattern recognition | Dynamic Time Warping (DTW) | Sequences of spectral vectors | Dynamic warping optimal alignment algorithm
Pattern recognition | Vector Quantization (VQ) | Set of spectral vectors | Clustering functions (codebook)
Pattern recognition | Statistical | Features | Discrimination
Pattern recognition | Support Vector Machine (SVM) | Kernel-based features | Maximal-margin hyperplane
Artificial intelligence (AI) | Knowledge-based / neural network | Rules, units, procedures | Rule-based / network function
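The dynamic time warping entry in the table above can be illustrated with the classic dynamic-programming recurrence that aligns two sequences of unequal length. This is a generic textbook sketch over one-dimensional "feature" values, not code from any system surveyed here:

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between sequences a and b.

    D[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j];
    each cell extends the cheapest of the three allowed predecessor moves
    (insertion, deletion, diagonal match).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

In a real recognizer each element would be a spectral feature vector and `dist` a vector distance, but the recurrence is identical; this non-uniform time alignment is exactly the problem Sakoe and Chiba's algorithm addressed.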
III. Feature Extraction
Feature extraction is the process of extracting, from the speech (voice) signal, data that are unique to each speaker. The Mel Frequency Cepstral Coefficient (MFCC) technique is frequently used to create a fingerprint of sound files [7]. MFCC is based on the known variation of the human ear's critical bandwidth with frequency, with filters spaced linearly at low frequencies and logarithmically at high frequencies, used to capture the important characteristics of speech [8], [9].
The main goal of the feature extraction step in speech recognition is to compute a parsimonious sequence of feature vectors providing a compact representation of the given input signal.
Feature extraction is usually performed in three stages:
1st stage: the acoustic front end, or speech analysis. It carries out temporal analysis of the signal and produces raw features describing the envelope of the power spectrum of short speech intervals.
2nd stage: composes an extended feature vector of static and dynamic features.
3rd stage: transforms these extended feature vectors into more compact and robust vectors that are then delivered to the recognizer.
Table 4 presents various methods for feature extraction in speech recognition.
Table 4: Feature Extraction Methods

S/N | Method | Property
1 | Mel Frequency Cepstral Coefficients (MFCCs) | Power spectrum computed by performing Fourier analysis
2 | Kernel-based feature extraction | Non-linear transformation
3 | Cepstral analysis | Static feature extraction method; power spectrum
4 | Filter bank analysis | Filters tuned to required frequencies
5 | Principal Component Analysis (PCA) | Linear map; fast; eigenvector-based
6 | Mel-frequency scale analysis | Static feature extraction method; spectral analysis
7 | Wavelet | Better time resolution than the Fourier transform
8 | Dynamic feature extraction (LPC and MFCCs) | Acceleration and delta coefficients
9 | Cepstral mean subtraction | Robust feature extraction
10 | Linear Discriminant Analysis (LDA) | Linear feature extraction method; linear map
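The mel-frequency warping that underlies MFCCs (filters spaced linearly at low frequencies and logarithmically at high frequencies) can be sketched as follows. The 2595/700 constants are the commonly used mel-scale formula; the filter count and frequency range in the example are illustrative assumptions, not values from the paper:

```python
import math

def hz_to_mel(hz):
    """Map frequency in Hz to the perceptual mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_centers(num_filters, low_hz, high_hz):
    """Edge/center frequencies of a triangular mel filterbank.

    Points are equally spaced in mel, so they come out roughly linearly
    spaced at low Hz and logarithmically spaced at high Hz.
    """
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_filters + 1)
    return [mel_to_hz(low + i * step) for i in range(num_filters + 2)]
```

A full MFCC front end would apply these triangular filters to the frame power spectrum, take logarithms, and apply a discrete cosine transform; this sketch shows only the filter placement that gives the representation its perceptual motivation.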
IV. Overview Of Speech Recognition
A. Between Years 1920 To 1960s
Machine recognition of speech came into existence in the early 1920s. The first commercially significant machine to recognize speech, a toy named Radio Rex, was manufactured in 1920 [10]. Research into the concepts of speech technology started in 1936 at Bell Laboratories. In 1939, Bell Labs showed a speech synthesis machine at the World Fair in New York, before later abandoning its efforts to develop machine listening and recognition, based on the inappropriate conclusion that artificial intelligence would ultimately be necessary for success.
In the 1950s, the earliest attempts to devise systems for automatic speech recognition by machine were made, when several researchers tried to exploit the fundamental concepts of acoustic phonetics. During this period, most speech recognition systems investigated spectral resonances during the vowel region of each utterance, extracted from the output signals of an analogue filter bank and logic circuits [11].
In 1952, at Bell Laboratories, Davis, Biddulph and Balashek developed a system for isolated digit recognition for a single speaker [12]. The system depended heavily on measuring spectral resonances during the vowel region of each digit. In 1956, at RCA Laboratories, Olson and Belar tried to recognize 10 distinct syllables of a single talker, as embodied in 10 monosyllabic words [13]. In another effort, by Fry and Denes at University College London in 1959, a phoneme recognizer that recognized four vowels and nine consonants was developed [14]. They used a spectrum analyzer and a pattern matcher to make the recognition decision. Also in 1959, the vowel recognizer of Forgie and Forgie, built at MIT Lincoln Laboratory, recognized 10 vowels embedded in a /b/-vowel-/t/ format in a speaker-independent mode [15].
B. Between Years 1960 To 1980s
In the 1960s, Suzuki and Nakata at the Radio Research Laboratory in Japan started their research in the speech recognition field and developed special-purpose hardware as part of their system, because computers were not fast enough at the time [16]. In 1962, another hardware effort in Japan was the work of Sakai and Doshita of Kyoto University, who developed a hardware phoneme recognizer [17]. A third Japanese effort was the digit-recognizer hardware of Nagata and coworkers at NEC Laboratories in 1963 [18].
In a separate Japanese effort, Sakoe and Chiba at NEC Laboratories used dynamic programming techniques to solve the non-uniformity (time-alignment) problem [19]. The final noteworthy achievement of the late 1960s was the pioneering research of Reddy in the area of continuous speech recognition by dynamic tracking of phonemes [20].
The field of isolated word or discrete utterance recognition became a viable and functional technology in the 1970s, through fundamental studies by Velichko and Zagoruyko in Russia [21], Itakura in the United States, and Sakoe and Chiba in Japan [22]. During this decade, an ambitious speech understanding project was funded by the Defense Advanced Research Projects Agency (DARPA), which led to various seminal systems and technologies [23]. One demonstration of speech understanding was achieved by CMU in 1973: the Hearsay I system was able to use semantic data to significantly reduce the number of alternatives considered by the recognizer. CMU's Harpy system was shown to be able to recognize speech using a vocabulary of 1,011 words with reasonable accuracy [24].
Another success of research in the 1970s was the beginning of a long-standing, extremely successful effort in large-vocabulary speech recognition at IBM, in which researchers studied three different tasks over a period of almost two decades: the New Raleigh Language [25] for simple database queries, the Laser Patent Text Language [26] for transcribing laser patents, and the office correspondence task called Tangora [27] for dictation of simple memos.
C. Between Years 1980 To 1990
Research in the field of speech recognition in the 1980s was characterized by a shift in technology from template-based approaches to statistical modeling methods, most especially the hidden Markov model (HMM) approach [28], [29].
The hidden Markov model approach was well known and understood in a few laboratories, primarily IBM, the Institute for Defense Analysis (IDA) and Dragon Systems, but it became extensively used only in the mid-1980s. Another innovative technology that came into existence in the late 1980s was the application of neural networks to the problem of speech recognition [30]. Neural networks had first been introduced in the 1950s, but did not initially prove useful because of many practical problems [31], [32].
The 1980s was also the decade in which major momentum was given to large-vocabulary continuous speech recognition systems by the Defense Advanced Research Projects Agency (DARPA) community, which sponsored a large research program aimed at achieving high word accuracy on a 1000-word continuous speech recognition database-management task. Major research contributions resulted from efforts at CMU [33], AT&T Bell Labs [34], Lincoln Labs [35] and SRI [36]. CMU's system (known as the SPHINX system) successfully integrated the statistical HMM method with the network search strength of the earlier Harpy system.
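The HMM formulation that came to dominate this era rests on finding the most likely hidden state sequence for an observation sequence, conventionally done with the Viterbi recurrence. The two-state discrete model below is a toy illustration, not one from the systems cited:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for a discrete-observation HMM.

    V[t][s] is the probability of the best path ending in state s at time t;
    each step extends the best predecessor by a transition and an emission.
    """
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]
```

Real recognizers use log probabilities, continuous emission densities and large state graphs, but the dynamic-programming structure is the same as in this sketch.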
D. Between Years 1990 To 2000
The 1990s was a decade in which a number of innovations took place in the area of pattern recognition. The pattern recognition problem, which traditionally followed the framework of Bayes and required estimation of distributions for the data, was recast as an optimization problem aimed at minimizing the empirical recognition error [37].
During this decade, a key issue [38] in the design and implementation of speech recognition systems was how to appropriately select the speech material used to train the recognition algorithm. A number of human language technology projects funded by DARPA in the 1980s and 1990s further enhanced progress, as shown by the many papers published in the proceedings of the DARPA Speech and Natural Language/Human Language workshops. Reference [39] describes speech recognition developments conducted in the 1990s at Fujitsu Laboratories Limited.
E. Between Years 2000 Till Date
In 2000, Variational Bayesian (VB) estimation and clustering techniques were developed [40]. The VB method is based on a posterior distribution of the parameters. Giuseppe Riccardi [41] developed techniques to solve the problem of adaptive learning in automatic speech recognition, and in 2005 also proposed an active learning algorithm for automatic speech recognition. Enhancements have also been worked out to improve the performance of large-vocabulary continuous speech recognition systems [42].
In 2005, Sadaoki Furui investigated speech recognition techniques that can adapt to speech variation using a large number of models trained with clustering techniques. De Wachter et al. [43] attempted to overcome the time-dependency problems in speech recognition (SR) by using a straightforward template-matching technique. In 2008, the authors of [44] explained the application of corresponding frame-level phone and phonological attribute classes. In recent work by Vrinda and Chander, a speech recognition system was developed for the Hindi language, for people who are physically challenged and unable to operate a computer through keyboard and mouse; it uses a hidden Markov model (HMM) to recognize speech samples and gives admirable results for isolated words [1].
V. Speech Databases
Speech databases have extensive uses in Automatic Speech Recognition (ASR). They are also used in other key applications such as automatic speech synthesis, coding and analysis, and speaker/language identification and verification. These applications require large amounts of recorded data. Speech databases are most generally classified into multi-session and single-session databases.
Multi-session databases allow estimation of temporal intra-speaker variability. Based on the acoustic environment, databases are recorded either in a noise-free environment, such as an office or home, or in noisy conditions. Also, according to their purpose, some corpora are designed for developing and evaluating speech recognition systems.
VI. Summary Of The Technology Progress
In the last 60 years, and most especially in the last three decades, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing algorithms, architectures and hardware. The technological progress over these 60 years is summarized in Table 5 [45].

Table 5: Summary of the Technological Progress over 60 Years

Past | New (Present)
Template matching | Corpus-based statistical modeling
Filter bank / spectral resonance | Cepstral features, kernel-based functions, group delay functions
Heuristic time normalization | DP/DTW matching
Distance-based methods | Likelihood-based methods
Read speech recognition | Spontaneous speech recognition
Maximum likelihood | Discriminative training
Hardware recognizer | Software recognizer
Monologue recognition | Dialogue/conversation recognition
Clean speech recognition | Noisy/telephone speech recognition
Small vocabulary | Large vocabulary
Isolated word recognition | Continuous speech recognition
Single-speaker recognition | Speaker-independent/adaptive recognition
Single modality (audio signal only) | Multimodal (audio/visual) speech recognition
VII. Performance Measures Of Speech Recognition
Speech recognition accuracy and speech recognition rate are two important measures to consider when evaluating the performance of a speech recognition system. Speech recognition accuracy is measured in terms of Word Error Rate (WER), whereas speech recognition rate is measured in terms of computation rate. WER is the most common metric of speech recognition performance.
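WER is conventionally computed as the word-level edit distance between the reference and the hypothesis (substitutions + insertions + deletions), divided by the number of reference words. A minimal sketch of this standard definition:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # deletions only
    for j in range(m + 1):
        d[0][j] = j          # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m] / n
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is an error rate rather than an accuracy.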
VIII. Conclusion
Speech is the essential, most effective, reliable and common medium for communication in real-time systems. Many applications of speech recognition systems are still far from reality, owing to the lack of efficient and reliable noise-removal machinery and of techniques for ensuring that every recorded word is comprehensible. This paper has attempted to provide a comprehensive survey of research on speech recognition and of the progress made from the early years until today. Although significant improvement has been made in the last 50 years, we believe that much work remains to be done to handle the full variation in speaking style, environmental conditions and speaker variation.
References
[1] Vrinda and S. Chander, “Speech recognition system for English language,” International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 1, pp. 919-922, 2013.
[2] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
[3] A. Burstein, A. Stolzle, and R. W. Brodersen, “Using speech recognition in a personal communication system,” IEEE International Conference on Communications, Vol. 3, 1992.
[4] T. Isobe, M. Morishima, F. Yoshitani, N. Koizumi, and K. Murakani, “Voice-activated home banking system and its field trial,” International Conference on Spoken Language Processing, Vol. 3, pp. 1688-1691, 1996.
[5] H. S. Jayanna and S. R. Mahadeva, “Analysis, feature extraction, modeling and …,” Tech. Rev., 26: 181-190, 2009.
[6] L. Rabiner, B. H. Juang, and B. Yegnanarayana, Fundamentals of Speech Recognition, Pearson Education, First edition, ISBN 978-81-7758-560-5, 2009.
[7] L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using Mel Frequency Cepstral Coefficient (MFCC) and DTW techniques,” Journal of Computing, Vol. 2, Issue 3, 2010.
[8] K. Ahsamul and M. A. Mohammad, “Vector quantization in text-dependent automatic speech recognition using Mel-frequency cepstrum coefficients,” 6th WSEAS International Conference on Circuits, Systems, Electronics, Control & Signal Processing, Cairo, Egypt, pp. 352-355, Dec. 29-31, 2007.
[9] S. Mahd and T. Azizollah, “Voice command recognition system based on MFCC and VQ algorithms,” World Academy of Science, Engineering and Technology Journal, 2009.
[10] W. Stefan and H. Reinhold, “Approaches to iterative speech feature enhancement and recognition,” IEEE Transactions on Audio, Speech and Language Processing, No. 5, July 2009.
[11] F. Sadaoki, “50 years of progress in speech and speaker recognition research,” ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[12] K. H. Davis, R. Biddulph, and S. Balashek, “Automatic recognition of spoken digits,” J. Acoust. Soc. Am., 24(6): 637-642, 1952.
[13] H. F. Olson and H. Belar, “Phonetic typewriter,” J. Acoust. Soc. Am., 28(6): 1072-1081, 1956.
[14] P. Denes, “The design and operation of the mechanical speech recognizer at University College London,” J. British Inst. Radio Engr., 19(4), pp. 211-299, 1959.
[15] J. W. Forgie and C. D. Forgie, “Results obtained from a vowel recognition computer program,” J.A.S.A., 31(11), pp. 1480-1489, 1959.
[16] J. Suzuki and K. Nakata, “Recognition of Japanese vowels: preliminary to the recognition of speech,” J. Radio Res. Lab., 37(8), pp. 193-212, 1961.
[17] T. Sakai and S. Doshita, “The phonetic typewriter,” Information Processing 1962, Proc. IFIP Congress, 1962.
[18] K. Nagata, Y. Kato, and C. Chiba, “Spoken digit recognizer for the Japanese language,” NEC Res. Develop., No. 6, 1963.
[19] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-26(1), pp. 43-49, 1978.
[20] D. R. Reddy, “An approach to computer speech recognition by direct analysis of the speech wave,” Tech. Report No. C549, Computer Science Dept., Stanford Univ., September 1966.
[21] V. M. Velichko and N. G. Zagoruyko, “Automatic recognition of 200 words,” Int. J. Man-Machine Studies, 2:223, June 1970.
[22] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26(1), pp. 43-49, February 1978.
[23] D. Klatt, “Review of the ARPA speech understanding project,” J.A.S.A., 62(6), pp. 1324-1366, 1977.
[24] B. Lowerre, “The HARPY speech understanding system,” Trends in Speech Recognition, W. Lea, Ed., Speech Science Pub., pp. 576-586, 1990.
[25] C. Tappert, N. R. Dixon, A. S. Rabinowitz, and W. D. Chapman, “Automatic recognition of continuous speech utilizing dynamic segmentation, dual classification, sequential decoding and error recovery,” Rome Air Dev. Cen., Rome, NY, Tech. Report TR-71-146, 1971.
[26] F. Jelinek, L. R. Bahl, and R. L. Mercer, “Design of a linguistic statistical decoder for the recognition of continuous speech,” IEEE Trans. Information Theory, IT-21, pp. 250-256, 1975.
[27] F. Jelinek, “The development of an experimental discrete dictation recognizer,” Proc. IEEE, 73(11), pp. 1616-1624, 1985.
[28] J. Ferguson, Ed., Hidden Markov Models for Speech, IDA, Princeton, NJ, 1980.
[29] L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, 77(2), pp. 257-286, February 1989.
[30] W. Chou and B. H. Juang, Eds., Pattern Recognition in Speech and Language Processing, CRC Press, pp. 115-147, 2003.
[31] R. P. Lippmann, “An introduction to computing with neural nets,” IEEE Trans. ASSP Mag., 4(2), pp. 4-22, April 1987.
[32] A. Waibel, et al., “Phoneme recognition using time-delay neural networks,” IEEE Trans. Acoustics, Speech, Signal Proc., 37, pp. 393-404, 1989.
[33] K. F. Lee, “An overview of the SPHINX speech recognition system,” Proc. ICASSP, 38, pp. 600-610, 1990.
[34] C. H. Lee, et al., “Acoustic modeling for large vocabulary speech recognition,” Computer Speech and Language, 4, pp. 127-165, 1990.
[35] M. Weintraub, et al., “Linguistic constraints in hidden Markov model based speech recognition,” Proc. ICASSP, pp. 699-702, 1989.
[37] Y. Chow, et al., “BYBLOS, the BBN continuous speech recognition system,” Proc. ICASSP, pp. 89-92, 1987.
[38] S. Furui and T. Ichiba, “Cluster-based modeling for ubiquitous speech recognition,” Department of Computer Science, Tokyo Institute of Technology, Interspeech 2005.
[39] M. K. Brown, et al., “Advances in speech recognition technology,” IEEE Transactions on Signal Processing, Vol. 39, No. 6, June 1991.
[40] M. Afify and O. Siohan, “Sequential estimation with optimal forgetting for robust speech recognition,” IEEE Transactions on Speech and Audio Processing, Vol. 12, No. 4, July 2004.
[41] M. Afify, F. Liu, and H. Jiang, “A new verification-based fast-match for large vocabulary continuous speech recognition,” IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 4, July 2005.
[42] G. Riccardi, “Active learning: theory and applications to automatic speech recognition,” IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 4, July 2005.
[43] A. Erell, et al., “Filter bank energy estimation using mixture and Markov models for recognition of noisy speech,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 1, No. 1, 1993.
[44] J. Weih, “Construction of modulation frequency domain based features for robust speech recognition,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, No. 1, Jan. 2008.
[45] A. Sloin, et al., “Support vector machine training for improved hidden Markov models,” IEEE Transactions on Signal Processing, Vol. 56, No. 1, Jan. 2008.