Speech is the most natural and efficient mode of communication among human beings, and speech processing is one of the most important areas of signal processing. In this paper, singular value decomposition (SVD), a technique from linear algebra, is applied to the speech signal. SVD derives the important parameters of a signal; the number of parameters can be further reduced by perceptual evaluation of speech synthesized from only the perceptually important parameters, so that the signal can be compressed without losing quality. This technique finds wide application in speech compression, speech recognition, and speech synthesis. The objective of this paper is to investigate the effect of SVD-based feature selection on the perception of the processed speech signal. The vowels \a\, \e\, and \u\ were recorded from each of six speakers (3 males and 3 females). The vowels were analyzed using SVD-based processing, and the effect of reducing the number of singular values on the perception of the resynthesized vowels was investigated. The investigations show that the number of singular values can be drastically reduced without significantly affecting the perception of the vowels.
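The core operation the abstract describes, keeping only the leading singular values of a framed speech signal, can be sketched as follows. This is a hedged illustration on a synthetic vowel-like signal, not the paper's exact setup; the frame layout and the rank k are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2048, endpoint=False)
# Synthetic "vowel-like" signal: a few harmonics plus light noise.
x = (np.sin(2 * np.pi * 120 * t)
     + 0.5 * np.sin(2 * np.pi * 240 * t)
     + 0.05 * rng.standard_normal(t.size))

X = x.reshape(64, 32)                 # stack 64 frames of 32 samples each
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 4                                 # keep only the k largest singular values
X_hat = (U[:, :k] * s[:k]) @ Vt[:k]   # rank-k reconstruction of the frame matrix

retained = (s[:k] ** 2).sum() / (s ** 2).sum()   # fraction of energy kept
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"energy retained: {retained:.3f}, relative error: {err:.3f}")
```

Because each sinusoid contributes at most rank 2 to the frame matrix, a small k already captures almost all of the signal energy, which is the effect the paper exploits.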
Audio Steganography Coding Using the Discrete Wavelet Transforms (CSCJournals)
The performance of an audio steganography compression system using the discrete wavelet transform (DWT) is investigated. Audio steganography coding transforms stego-speech into an efficiently encoded version that can be decoded at the receiver side to produce a close representation of the initial (uncompressed) signal. Experimental results demonstrate the efficiency of the compression technique: the compressed stego-speech is perceptually intelligible and indistinguishable from the initial signal, while the initial stego-speech can be recovered with only slight degradation in quality.
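A one-level discrete wavelet transform of the kind such coders build on can be sketched with the Haar wavelet. This is an illustrative sketch only; real stego-speech coders use deeper decompositions and proper quantizers, and the coefficient-dropping step here is a crude stand-in for coding.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT: scaled sums (approximation) and differences (detail)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse one-level Haar DWT (perfect reconstruction)."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

rng = np.random.default_rng(1)
x = rng.standard_normal(256)
a, d = haar_dwt(x)
# Crude "coding": drop the smallest-magnitude half of the detail coefficients.
d_coded = np.where(np.abs(d) >= np.median(np.abs(d)), d, 0.0)
x_hat = haar_idwt(a, d_coded)
snr = 10 * np.log10(np.sum(x**2) / np.sum((x - x_hat)**2))
print(f"reconstruction SNR after dropping half the details: {snr:.1f} dB")
```

The design point this shows is the one the abstract relies on: most of the perceptually relevant energy sits in a small set of coefficients, so dropping the rest costs little quality.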
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R... (IJERA Editor)
This paper presents a compressive sensing technique for speech reconstruction using linear predictive coding (LPC), since speech is sparser in the LPC domain. The DCT of the speech is taken, and DCT points of the sparse speech are discarded arbitrarily by setting some points in the DCT domain to zero through multiplication with mask functions. From the incomplete points in the DCT domain, the original speech is reconstructed using compressive sensing, with Gradient Projection for Sparse Reconstruction as the recovery tool. The result is compared subjectively with direct IDCT reconstruction. The experiments show that compressive sensing performs better than the direct IDCT approach.
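The masking step described above, and the direct-IDCT baseline the recovery is compared against, can be sketched in a few lines. A full Gradient Projection for Sparse Reconstruction solver is too long to include, so this hedged sketch only shows the coefficient dropping and the baseline; the signal, sampling rate, and keep probability are all arbitrary.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(2)
n = 512
t = np.arange(n) / 8000.0
x = np.sin(2 * np.pi * 300 * t) + 0.4 * np.sin(2 * np.pi * 600 * t)

X = dct(x, norm="ortho")               # speech frame in the DCT domain
mask = rng.random(n) > 0.5             # keep each DCT point with probability 0.5
x_base = idct(X * mask, norm="ortho")  # direct-IDCT reconstruction from kept points

err = np.linalg.norm(x - x_base) / np.linalg.norm(x)
print(f"relative error of the direct-IDCT baseline: {err:.3f}")
```

A sparse-recovery solver fed the same kept points would exploit the sparsity of X to push this error below the baseline, which is the comparison the paper reports.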
Comparison and Analysis of LDM and LMS for an Application of Speech (CSCJournals)
Most automatic speech recognition (ASR) systems are based on Gaussian mixture models, whose output depends on subphone states. We often measure and transform the speech signal into another form to enhance our ability to communicate. Speech recognition is the conversion of an acoustic waveform into its written message equivalent, and the nature of the problem depends heavily on the constraints placed on the speaker, the speaking situation, and the message context. Various speech recognition systems are available; the best model is the one that detects the hidden states of speech. LMS is a simple algorithm used to reconstruct speech, and the linear dynamic model (LDM) is used to recognize speech in noisy environments. This paper presents an analysis and comparison of the LDM and a simple LMS algorithm for speech recognition.
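The LMS algorithm mentioned in the abstract is compact enough to sketch in full. Here it identifies a known 3-tap system, a toy system-identification setup rather than the paper's speech experiment; the tap values and step size are arbitrary.

```python
import numpy as np

def lms(x, d, n_taps, mu):
    """Least-mean-squares adaptive filter: adapt w so that w @ u tracks d."""
    w = np.zeros(n_taps)
    e = np.zeros(d.size)
    for k in range(n_taps - 1, d.size):
        u = x[k - n_taps + 1:k + 1][::-1]  # newest input sample first
        e[k] = d[k] - w @ u                # instantaneous error
        w += 2 * mu * e[k] * u             # stochastic-gradient weight update
    return w, e

rng = np.random.default_rng(3)
x = rng.standard_normal(4000)
h = np.array([0.6, -0.3, 0.1])             # "unknown" system to identify
d = np.convolve(x, h)[: x.size]            # desired signal = system output
w, e = lms(x, d, n_taps=3, mu=0.02)
print("estimated taps:", np.round(w, 3))   # should approach h
```

With white input and a small step size the weights converge to the unknown taps, and the error signal decays toward zero, the behavior LMS-based speech reconstruction depends on.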
In present-day communications, speech signals are contaminated by various kinds of noise that degrade speech quality and adversely impact speech recognition performance. To overcome these issues, a novel speech enhancement approach using modified Wiener filtering is developed: power spectrum computation is applied to the degraded signal to obtain the noise characteristics from the noisy spectrum. In the next phase, an MMSE technique is applied in which the Gaussian distribution of each signal, i.e. the original and the noisy signal, is analyzed. The Gaussian distribution provides spectrum estimates and spectral coefficient parameters that can be used to formulate a probabilistic model. Moreover, a-priori-SNR computation is incorporated for coefficient updating and noise-presence estimation, which operates similarly to a conventional voice activity detector (VAD). However, the conventional VAD scheme relies on a hard threshold that cannot deliver satisfactory performance, so a soft-decision threshold is developed to improve speech enhancement. An extensive simulation study is carried out in MATLAB on the NOIZEUS speech database, and a comparative study shows the proposed approach outperforming the existing technique.
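The chain the abstract describes, noise-PSD estimation, a Wiener gain, and a decision-directed a-priori-SNR update, can be sketched on synthetic spectra. This is a hedged toy version: the paper's modified filter and soft-decision VAD are not reproduced, and the smoothing factor, bin counts, and tone placement are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_bins, n_frames = 129, 60
clean = np.zeros((n_frames, n_bins))
clean[5:, 20] = 10.0                      # a tone appears in bin 20 from frame 5 on
noisy_psd = (clean + rng.standard_normal((n_frames, n_bins))) ** 2

noise_psd = noisy_psd[:5].mean(axis=0)    # noise PSD from the speech-absent frames
alpha = 0.98                              # decision-directed smoothing factor
xi = np.full(n_bins, 1e-3)                # a-priori SNR estimate per bin
enhanced = np.zeros_like(noisy_psd)
for m in range(n_frames):
    gamma = noisy_psd[m] / noise_psd              # a-posteriori SNR
    gain = xi / (1.0 + xi)                        # Wiener gain per bin
    enhanced[m] = gain ** 2 * noisy_psd[m]        # suppressed power spectrum
    xi = np.maximum(                              # decision-directed update
        alpha * gain ** 2 * gamma + (1 - alpha) * np.maximum(gamma - 1.0, 0.0),
        1e-3,
    )
print("final gain at tone bin / a noise-only bin:",
      gain[20].round(3), gain[100].round(3))
```

The gain stays near 1 where the a-priori SNR is high (the tone bin) and is driven down elsewhere, which is how the Wiener/MMSE family trades noise suppression against speech distortion.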
In recent years, large-scale information transfer by remote computing and the development of massive storage and retrieval systems have grown tremendously. To cope with the growth in database sizes, additional storage devices must be installed, and modems and multiplexers must be continuously upgraded to permit large amounts of data transfer between computers and remote terminals. This increases both cost and equipment. One solution to these problems is compression, whereby the database and the transmission sequence can be encoded efficiently. In this work we investigated the optimum wavelet, optimum decomposition level, and optimum scaling factor.
A Study of Digital Media Based Voice Activity Detection Protocols (ijtsrd)
Speaker identification is critical for a variety of voice-based applications in safety and surveillance systems, and such methods are now employed in household appliances for user-controlled device toggling. A comprehensive and language-agnostic Voice Activity Detection (VAD) system is critical for Digital Media Content (DMC). VAD systems are utilised for DMC generation in a variety of ways, including supplementing subtitle creation, detecting and correcting subtitle drift, and handling sound distortion. The goal of this article is to provide a comprehensive overview of the strategies used for voice recognition in the entertainment industry. Speaker recognition strategies from earlier and current studies were analysed, and the superior methodology was identified through a survey of the literature spanning more than two decades. We give a comprehensive survey of DNN-based VADs on DMC data with respect to accuracy, noise sensitivity, and language-agnostic performance. Nikhil Kumar | Sumit Dalal, "A Study of Digital Media-Based Voice Activity Detection Protocols", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7, Issue-3, June 2023. URL: https://www.ijtsrd.com/papers/ijtsrd56376.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/56376/a-study-of-digital-mediabased-voice-activity-detection-protocols/nikhil-kumar
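As a baseline contrast to the DNN-based VADs the survey covers, even a frame-energy detector with a soft (sigmoid) decision rather than a hard threshold can be sketched in a few lines. This is illustrative only; the frame size, noise level, and data-driven threshold are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
sr, frame = 8000, 160                  # 20 ms frames at 8 kHz
noise = 0.05 * rng.standard_normal(sr)
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
x = np.concatenate([noise, speech + 0.05 * rng.standard_normal(sr)])

frames = x[: x.size // frame * frame].reshape(-1, frame)
log_e = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)   # frame energy in dB

thresh = np.median(log_e)                     # data-driven operating point
soft = 1 / (1 + np.exp(-(log_e - thresh)))    # probability-like speech score
decision = soft > 0.5

print(f"frames flagged as speech: {decision.sum()} of {decision.size}")
```

The sigmoid score is what a soft-decision scheme would feed into smoothing or hangover logic, instead of committing to a hard 0/1 per frame.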
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method, Linear Predictive Coding (LPC) or Mel Frequency Cepstral Coefficients (MFCC), is better for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for one language does not necessarily produce the same accuracy for another, considering that every language has different characteristics. This research therefore hopefully helps accelerate the adoption of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results show that MFCC is better than LPC for Indonesian-language speech recognition.
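The LPC front end compared in the paper rests on the autocorrelation method; a minimal Levinson-Durbin recursion can be sketched and checked on a synthetic AR(2) signal. This is a generic textbook implementation under those assumptions, not the paper's code, and the AR coefficients are arbitrary.

```python
import numpy as np

def lpc(x, order):
    """LPC via the autocorrelation method and the Levinson-Durbin recursion."""
    r = np.array([np.dot(x[: x.size - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]         # order-update of predictor
        err *= 1.0 - k * k                                 # prediction-error update
    return a, err

rng = np.random.default_rng(6)
n = 20000
x = np.zeros(n)
w = rng.standard_normal(n)
for i in range(2, n):                  # AR(2): x[n] = 1.3 x[n-1] - 0.4 x[n-2] + w[n]
    x[i] = 1.3 * x[i - 1] - 0.4 * x[i - 2] + w[i]
a, err = lpc(x, 2)
print("estimated predictor:", np.round(-a[1:], 3))   # expect roughly [1.3, -0.4]
```

For speech, the same recursion run per frame at order 10-16 yields the LPC coefficients that serve as features (or as inputs to cepstral conversion).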
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY (csandit)
The rapid increase in data transmission over the internet places emphasis on information security. Audio steganography is used for secure transmission of secret data with an audio signal as the carrier. In the proposed method, the cover audio file is transformed from the time domain to the wavelet domain using a lifting scheme, leading to secure data hiding. The text message is encrypted using a dynamic encryption algorithm, and the cipher text is then hidden in the wavelet coefficients of the cover audio signal. Signal to Noise Ratio (SNR) and Squared Pearson Correlation Coefficient (SPCC) values are computed to judge the quality of the stego audio signal. Results show that the stego audio signal is perceptually indistinguishable from the cover audio signal and remains robust even in the presence of external noise. The proposed method provides secure, low-error data extraction.
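The hiding step, writing message bits into wavelet coefficients, can be sketched with a simple LSB scheme on quantized coefficients. This is a hypothetical stand-in: the paper's lifting transform and dynamic encryption are omitted, and `scale` is an arbitrary quantization step.

```python
import numpy as np

rng = np.random.default_rng(7)
detail = rng.standard_normal(64)             # stand-in wavelet detail coefficients
bits = rng.integers(0, 2, size=detail.size)  # (assumed already-encrypted) message bits

scale = 1024                                 # quantization step: larger keeps LSB quieter
q = np.round(detail * scale).astype(np.int64)
stego_q = (q & ~1) | bits                    # overwrite each LSB with a message bit
stego = stego_q / scale                      # back to the coefficient domain

recovered = np.round(stego * scale).astype(np.int64) & 1   # extraction at the receiver
snr = 10 * np.log10(np.sum(detail**2) / np.sum((detail - stego)**2))
print(f"all bits recovered: {(recovered == bits).all()}, coefficient SNR: {snr:.0f} dB")
```

Because the perturbation per coefficient is at most one quantization step, the stego coefficients stay close to the cover, which is what the SNR and SPCC metrics in the paper quantify.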
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES (kevig)
Speech synthesis and recognition are the basic techniques used for man-machine communication. This type of communication is valuable when our hands and eyes are busy with some other task, such as driving a vehicle, performing surgery, or firing weapons. Dynamic time warping (DTW) is widely used for aligning two multidimensional sequences: it finds an optimal match between the given sequences, so the distance between the aligned sequences should be smaller than between the unaligned ones, and the improvement in alignment can be estimated from the corresponding distances. This technique has applications in speech recognition, speech synthesis, and speaker transformation. The objective of this research is to investigate the amount of improvement in alignment for sentence-based and phoneme-based manually aligned phrases. Speech signals in the form of twenty-five phrases were recorded from each of six speakers (3 males and 3 females). The recorded material was segmented manually and aligned at the sentence and phoneme level. The aligned sentences of different speaker pairs were analyzed using HNM, and the HNM parameters were further aligned at the frame level using DTW. Mahalanobis distances were computed for each pair of sentences. The investigations show more than a 20% reduction in the average Mahalanobis distances.
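The DTW alignment applied to the HNM parameters can be illustrated with the standard dynamic-programming recursion on one-dimensional sequences. This is a generic textbook sketch; the paper aligns multidimensional HNM frames and scores them with Mahalanobis distances instead of the absolute difference used here.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW: cost of the best monotonic alignment of sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.sin(np.linspace(0, 2 * np.pi, 60))
b = np.sin(np.linspace(0, 2 * np.pi, 90))   # same shape, different "speaking rate"
aligned = dtw_distance(a, b)
unaligned = np.abs(a - b[:60]).sum()        # naive frame-by-frame comparison
print(f"DTW cost {aligned:.2f} vs unaligned distance {unaligned:.2f}")
```

The gap between the two numbers is the kind of distance reduction the paper reports after DTW alignment of speaker pairs.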
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research for the development of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation in context, speaker variability, and environmental conditions. In this paper we present a curvelet-based feature extraction (CFE) method for speech recognition in noisy environments: the input speech signal is decomposed into different frequency channels using the curvelet transform, which reduces the computational complexity and the feature vector size while offering better accuracy and a varying window size suited to non-stationary signals. For word classification and recognition, a discrete hidden Markov model is used, since it accounts for the time distribution of speech signals. The HMM classifier attained maximum identification rates of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction and classification phases of a speech recognition system; the approaches available for developing such systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using discrete curvelet transforms over conventional methods.
Attention gated encoder-decoder for ultrasonic signal denoising (IAESIJAI)
Ultrasound imaging is one of the most widely used non-destructive testing methods. The transducer emits pulses that travel through the imaged samples and are reflected by echo-forming impedance changes. The resulting ultrasonic signals usually contain noise. Most traditional noise reduction algorithms require high skill and prior knowledge of the noise distribution, which has a crucial impact on their performance. As a result, these methods generally incur a loss of information, significantly influencing the final data and deeply limiting both the sensitivity and resolution of imaging devices in medical and industrial applications. In the present study, a denoising method based on an attention-gated convolutional autoencoder is proposed to fill this gap. To evaluate its performance, the suggested protocol is compared to widely used methods such as Butterworth filtering (BF), discrete wavelet transforms (DWT), principal component analysis (PCA), and convolutional autoencoder (CAE) methods. Results show that better denoising can be achieved, especially when the original signal-to-noise ratio (SNR) is very low and the sound waves' traces are distorted by noise. Moreover, the initial SNR was improved by up to 30 dB, and the resulting Pearson correlation coefficient was maintained above 99% even for ultrasonic signals with poor initial SNR.
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Speech is the vocalized form of human communication, based upon the syntactic combination of lexical items and vocabularies. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. There are many methods for speech compression, such as Linear Predictive Coding (LPC), Code Excited Linear Predictive coding (CELP), sub-band coding, transform coding including the Fast Fourier Transform (FFT) and Discrete Cosine Transform (DCT), the Continuous Wavelet Transform (CWT), the Discrete Wavelet Transform (DWT), Variance Fractal Compression (VFC), and psychoacoustic methods. A few of them are discussed in this paper.
Financial Transactions in ATM Machines using Speech Signals (IJERA Editor)
Speech is the most natural and simplest way of communicating, and speech recognition is a fascinating application of digital signal processing with many real-world uses. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and random in nature. ATM machines communicate with customers using stored speech samples, and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here, and a multilayer neural network trained with the back-propagation algorithm is used for classification. The proposed method is implemented for 500 speakers uttering the 10 spoken digits in English. The experimental results show a good recognition accuracy of 87.38% and demonstrate the efficiency of combining these two techniques.
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco... (TELKOMNIKA JOURNAL)
Sundanese is one of the popular languages in Indonesia, which makes research on it essential and motivated this study. The vital components for achieving high recognition accuracy are feature extraction and the classifier, and the main goal of this study was to analyze the former. Three types of feature extraction were tested: Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Human Factor Cepstral Coefficients (HFCC). The results of the three feature extraction methods became the input to the classifier, for which the study applied Hidden Markov Models. Before classification, quantization based on clustering was performed. Each result was compared against the number of clusters and hidden states used. The dataset came from four people who spoke the digits zero to nine 60 times each. Finally, all feature extraction methods produced the same performance on the corpus used.
Speech compression analysis using matlab (eSAT Journals)
Abstract: The growth of cellular technology and wireless networks all over the world has increased the demand for digital information manifold. This massive demand poses difficulties in handling the huge amounts of data that need to be stored and transferred. To overcome this problem we can compress the information by removing the redundancies present in it; redundancies are a major source of errors and noisy signals. Coding in MATLAB helps analyze the compression of speech signals at varying bit rates and remove errors and noise from the speech signals. The bit rate of a speech signal can also be reduced to remove errors and noise, which suits remote broadcast lines, studio links, satellite transmission of high-quality audio, and voice over internet. This paper focuses on the speech compression process and its analysis through MATLAB, by which the processed speech signal can be heard clearly and noiselessly at the receiver end. Keywords: speech compression, bit rate, filter, MPEG, DCT.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound (TELKOMNIKA JOURNAL)
This paper proposes combining the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected feature vector of Indonesian syllables. The research aims to find the most effective and efficient properties for performing feature extraction of each syllable sound, to be applied in speech recognition systems. The proposed approach, which builds on a previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The result is a list of features for each syllable that can be used in further research, along with recommendations on the most effective and efficient WT for syllable sound recognition.
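The matching step in the third phase, picking the syllable template whose feature vector is nearest in Euclidean distance to the observed one, can be sketched as follows. The syllable labels and feature values here are invented for illustration, not taken from the paper's data.

```python
import numpy as np

# Assumed per-syllable reference feature vectors (hypothetical values).
templates = {
    "ba": np.array([0.9, 0.1, 0.3]),
    "da": np.array([0.2, 0.8, 0.5]),
    "ga": np.array([0.4, 0.4, 0.9]),
}
observed = np.array([0.85, 0.15, 0.35])    # feature vector of an unknown sound

def nearest(obs, refs):
    """Return the label whose template minimizes the Euclidean distance to obs."""
    return min(refs, key=lambda label: np.linalg.norm(obs - refs[label]))

print("recognized syllable:", nearest(observed, templates))
```

In the paper's pipeline the observed vector would come from the wavelet-transformed, normalized syllable segment rather than being given directly.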
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri (kevig)
Speech generation is one of the most important areas of research in speech signal processing and is now gaining serious attention. Speech is a natural form of communication among all living things. Computers with the ability to understand speech and speak with a human-like voice are expected to contribute to the development of more natural man-machine interfaces. However, in order to provide functions even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. This field of study, also called speech synthesis and more prominently known as text-to-speech synthesis, originated in the mid-eighties with the emergence of DSP and the rapid advancement of VLSI techniques. To understand this field, it is necessary to understand the basic theory of speech production: every language has a different phonetic alphabet and a different set of possible phonemes and phoneme combinations.
For the analysis of the speech signal, we recorded five speakers in Dogri (3 male and 5 female) and eight speakers in Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of each vowel of each speaker in both languages was calculated. Investigations show that the two durational distributions differ significantly in mean and standard deviation, and that phoneme duration is speaker dependent. The investigation concludes that almost all Dogri phonemes have a shorter duration than their Hindi counterparts: the duration in milliseconds of the same phonemes was longer when uttered in Hindi than when spoken by a person with Dogri as their mother tongue. Many applications are directly or indirectly related to this research. The main application may be transforming Dogri speech into Hindi and vice versa, on which basis a speech aid could be developed to teach Dogri to children. The results may also be useful for synthesizing Dogri phonemes from the parameters of Hindi phonemes and for building large-vocabulary speech recognition systems.
May 2024 - Top10 Cited Articles in Natural Language Computing (kevig)
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate the languages that humans use naturally to address computers.
In recent years, large-scale information transfer by remote computing and the development of massive storage and retrieval systems have witnessed tremendous growth. To cope with the growth in the size of databases, additional storage devices need to be installed, and modems and multiplexers have to be continuously upgraded to permit large amounts of data transfer between computers and remote terminals. This leads to an increase in cost as well as equipment. One solution to these problems is "COMPRESSION", whereby the database and the transmission sequence can be encoded efficiently. In this work we investigated the optimum wavelet, optimum decomposition level, and optimum scaling factor.
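The wavelet-compression idea sketched in this abstract can be illustrated with a single-level Haar DWT: transform, zero out the small detail coefficients, reconstruct. The Haar choice, the 20% retention ratio, and the synthetic test signal below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def haar_dwt(x):
    # single-level orthonormal Haar DWT
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    # exact inverse of haar_dwt
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def compress(x, keep=0.2):
    # discard the smallest 80% of detail coefficients by magnitude
    a, d = haar_dwt(x)
    thresh = np.quantile(np.abs(d), 1 - keep)
    d = np.where(np.abs(d) >= thresh, d, 0.0)
    return a, d

t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)              # smooth stand-in for a speech frame
a, d = compress(x)
xr = haar_idwt(a, d)
snr = 10 * np.log10(np.sum(x**2) / np.sum((x - xr)**2))
print(f"SNR after discarding small detail coefficients: {snr:.1f} dB")
```

For a smooth signal most of the energy sits in the approximation band, so discarding most detail coefficients costs little quality; sweeping the wavelet type, level, and threshold is exactly the "optimum wavelet, level, scaling factor" search the abstract mentions.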
A Study of Digital Media Based Voice Activity Detection Protocols (ijtsrd)
Speaker identification is critical for a variety of voice-based applications in safety and surveillance systems, and such methods are now employed in household appliances for user-controlled device toggling. A comprehensive and language-agnostic Voice Activity Detection (VAD) system is critical for Digital Media Content (DMC). VAD systems are utilised for DMC generation in a variety of ways, including supplementing subtitle formation, detecting and correcting subtitle drift, and sound distortion. The goal of this article is to provide a comprehensive overview of numerous strategies utilised for voice recognition in the entertainment industry. An analysis of several speaker recognition strategies, both earlier ones and those utilised in current studies, was explored, and a clear understanding of the superior methodology was reached through a survey of literature spanning more than two decades. We give a comprehensive survey of DNN-based VADs using DMC data concerning accuracy, noise sensitivity, and language-agnostic performance. Nikhil Kumar | Sumit Dalal, "A Study of Digital Media-Based Voice Activity Detection Protocols", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7, Issue-3, June 2023. URL: https://www.ijtsrd.com/papers/ijtsrd56376.pdf Paper URL: https://www.ijtsrd.com/engineering/electronics-and-communication-engineering/56376/a-study-of-digital-mediabased-voice-activity-detection-protocols/nikhil-kumar
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition... (TELKOMNIKA JOURNAL)
Speech recognition can be defined as the process of converting voice signals into a sequence of words by applying a specific algorithm implemented in a computer program. Research on speech recognition in Indonesia is relatively limited. This paper studies which feature extraction method is best among Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCC) for speech recognition in the Indonesian language. This is important because a method that produces high accuracy for a particular language does not necessarily produce the same accuracy for other languages, considering that every language has different characteristics. This research thus hopefully helps accelerate the use of automatic speech recognition for the Indonesian language. There are two main processes in speech recognition: feature extraction and recognition. The feature extraction methods compared in this study are LPC and MFCC, while recognition uses a Hidden Markov Model (HMM). The test results showed that the MFCC method is better than LPC for Indonesian-language speech recognition.
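As a rough, self-contained sketch of the MFCC pipeline this abstract compares (framing, windowing, power spectrum, mel filterbank, log, DCT); the frame size, filter count, and the synthetic tone standing in for speech are all illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=8000, frame=256, hop=128, n_filters=20, n_ceps=12):
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame)[::hop]
    frames = frames * np.hamming(frame)                    # window each frame
    power = np.abs(np.fft.rfft(frames, frame)) ** 2 / frame
    fb = mel_filterbank(n_filters, frame, sr)
    energy = np.log(power @ fb.T + 1e-10)                  # log mel energies
    return dct(energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

t = np.arange(8000) / 8000.0
sig = np.sin(2 * np.pi * 440 * t)   # a synthetic tone stands in for speech
feats = mfcc(sig)
print(feats.shape)  # (num_frames, 12)
```

The per-frame coefficient vectors are what an HMM recognizer would consume; an LPC front end would instead fit an all-pole filter to each frame and emit its coefficients.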
DATA HIDING IN AUDIO SIGNALS USING WAVELET TRANSFORM WITH ENHANCED SECURITY (csandit)
Rapid increase in data transmission over the internet places emphasis on information security. Audio steganography is used for secure transmission of secret data with an audio signal as the carrier. In the proposed method, the cover audio file is transformed from the time domain to the wavelet domain using a lifting scheme, leading to secure data hiding. The text message is encrypted using a dynamic encryption algorithm. The cipher text is then hidden in the wavelet coefficients of the cover audio signal. Signal-to-Noise Ratio (SNR) and Squared Pearson Correlation Coefficient (SPCC) values are computed to judge the quality of the stego audio signal. Results show that the stego audio signal is perceptually indistinguishable from the cover audio signal and is robust even in the presence of external noise. The proposed method provides secure, low-error data extraction.
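The lifting-scheme embedding described above can be sketched with an integer Haar lifting step plus LSB substitution in the detail coefficients. The encryption stage is omitted, and every name and parameter here is illustrative rather than taken from the paper:

```python
import numpy as np

def haar_lift(x):
    # integer-to-integer Haar lifting: exactly invertible on integers
    s = x[0::2].copy()
    d = x[1::2] - s          # predict step
    s = s + (d >> 1)         # update step (arithmetic shift = floor halving)
    return s, d

def haar_unlift(s, d):
    s = s - (d >> 1)
    x = np.empty(2 * len(s), dtype=s.dtype)
    x[0::2] = s
    x[1::2] = s + d
    return x

def embed(cover, bits):
    s, d = haar_lift(cover.astype(np.int64))
    # overwrite the LSB of the first detail coefficients with message bits
    d[:len(bits)] = (d[:len(bits)] & ~1) | bits
    return haar_unlift(s, d)

def extract(stego, nbits):
    _, d = haar_lift(stego.astype(np.int64))
    return d[:nbits] & 1

cover = np.random.default_rng(0).integers(-2000, 2000, size=64)  # fake PCM samples
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])
stego = embed(cover, bits)
print(extract(stego, len(bits)))  # recovers the embedded bits
```

Because the integer lifting transform is exactly invertible, re-lifting the stego signal reproduces the modified detail coefficients and the bits come back without error, which is what makes lifting attractive for this kind of hiding.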
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES (kevig)
Speech synthesis and recognition are the basic techniques used for man-machine communication. This type of communication is valuable when our hands and eyes are busy with some other task, such as driving a vehicle, performing surgery, or firing weapons at the enemy. Dynamic time warping (DTW) is mostly used for aligning two given multidimensional sequences: it finds an optimal match between them. The distance between the aligned sequences should be smaller than between the unaligned sequences, so the improvement in alignment may be estimated from the corresponding distances. This technique has applications in speech recognition, speech synthesis, and speaker transformation. The objective of this research is to investigate the amount of improvement in alignment corresponding to sentence-based and phoneme-based manually aligned phrases. Speech signals in the form of twenty-five phrases were recorded from each of six speakers (3 males and 3 females). The recorded material was segmented manually and aligned at the sentence and phoneme level. The aligned sentences of different speaker pairs were analyzed using HNM, and the HNM parameters were further aligned at the frame level using DTW. Mahalanobis distances were computed for each pair of sentences. The investigations have shown more than a 20% reduction in the average Mahalanobis distances.
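DTW itself is a standard dynamic-programming algorithm; a minimal one-dimensional version (not the multidimensional HNM-parameter alignment with Mahalanobis distances used in the paper) looks like this:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, stretched in time
print(dtw_distance(x, x))  # identical sequences align with zero cost
print(dtw_distance(x, y))
```

The paper's "reduction in distance after alignment" is exactly the gap between a frame-by-frame distance and this warped minimum; for real speech frames the scalar `abs` cost would be replaced by a vector distance such as Mahalanobis.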
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ... (ijcsit)
Speech processing is a crucial and intensive field of research for the growth of robust and efficient speech recognition systems, but recognition accuracy still suffers from variation of context, speaker variability, and environmental conditions. In this paper, we present a curvelet-based Feature Extraction (CFE) method for speech recognition in noisy environments. The input speech signal is decomposed into different frequency channels using the characteristics of the curvelet transform, successfully reducing the computational complexity and the feature vector size; curvelets also offer better accuracy and a varying window size, which makes them suitable for non-stationary signals. For better word classification and recognition, a discrete hidden Markov model is used, as it considers the time distribution of speech signals. The HMM classification method attained maximum accuracy in terms of identification rate of 80.1% for informal phrases, 86% for scientific phrases, and 63.8% for control phrases. The objective of this study is to characterize the feature extraction methods and the classification phase in a speech recognition system. The various approaches available for developing speech recognition systems are compared along with their merits and demerits. The statistical results show that recognition accuracy is increased by using the discrete curvelet transform over conventional methods.
Attention gated encoder-decoder for ultrasonic signal denoising (IAESIJAI)
Ultrasound imaging is one of the most widely used non-destructive testing methods. The transducer emits pulses that travel through the imaged samples and are reflected by echo-forming impedance. The resulting ultrasonic signals usually contain noise. Most of the traditional noise reduction algorithms require high skill and prior knowledge of the noise distribution, which has a crucial impact on their performance. As a result, these methods generally yield a loss of information, significantly influencing the final data and deeply limiting both the sensitivity and resolution of imaging devices in medical and industrial applications. In the present study, a denoising method based on an attention-gated convolutional autoencoder is proposed to fill this gap. To evaluate its performance, the suggested protocol is compared to widely used methods such as Butterworth filtering (BF), discrete wavelet transforms (DWT), principal component analysis (PCA), and convolutional autoencoder (CAE) methods. Results proved that better denoising can be achieved, especially when the original signal-to-noise ratio (SNR) is very low and the sound waves’ traces are distorted by noise. Moreover, the initial SNR was improved by up to 30 dB and the resulting Pearson correlation coefficient was maintained over 99% even for ultrasonic signals with poor initial SNR.
Development of Algorithm for Voice Operated Switch for Digital Audio Control ... (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed, online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Speech is the vocalized form of human communication, based upon the syntactic combination of lexicon and vocabulary. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. There are many methods for speech compression, such as Linear Predictive Coding (LPC), Code Excited Linear Predictive coding (CELP), sub-band coding, transform coding (Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT)), Variance Fractal Compression (VFC), and psychoacoustic methods. A few of them are discussed in this paper.
Financial Transactions in ATM Machines using Speech Signals (IJERA Editor)
Speech is the natural and simplest way of communication, and speech recognition is a fascinating application of Digital Signal Processing with many real-world applications. In this paper, a speech recognition system is developed for Automated Teller Machines (ATMs) using Wavelet Packet Decomposition (WPD) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and random in nature. ATM machines communicate with customers using stored speech samples, and the user communicates with the machine using spoken digits. Daubechies wavelets are employed here. A multilayer neural network trained with the back-propagation algorithm is used for classification. The proposed method is implemented for 500 speakers uttering 10 spoken digits in English. The experimental results show a good recognition accuracy of 87.38% and demonstrate the efficiency of combining these two techniques.
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco... (TELKOMNIKA JOURNAL)
Sundanese is one of the popular languages in Indonesia, so research on the Sundanese language is essential, which is the motivation for this study. The vital parts for achieving high recognition accuracy are feature extraction and the classifier; the main goal of this study was to analyze the former. Three types of feature extraction were tested: Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Human Factor Cepstral Coefficients (HFCC). The results of the three feature extractions became the input of the classifier. The study applied Hidden Markov Models as its classifier. Before classification, quantization is needed; in this study, it was based on clustering, and each result was compared against the number of clusters and hidden states used. The dataset came from four people who each spoke the digits from zero to nine 60 times. Finally, all feature extraction methods produced the same performance for the corpus used.
Speech compression analysis using matlab (eSAT Journals)
Abstract: The growth of cellular technology and wireless networks all over the world has increased the demand for digital information manifold. This massive demand poses difficulties for handling the huge amounts of data that need to be stored and transferred. To overcome this problem we can compress the information by removing the redundancies present in it; redundancies are a major source of errors and noisy signals. Coding in MATLAB helps in analyzing the compression of speech signals with varying bit rates and in removing errors and noise from the speech signals. A speech signal's bit rate can also be reduced, which is suitable for remote broadcast lines, studio links, satellite transmission of high-quality audio, and voice over internet. This paper focuses on the speech compression process and its analysis through MATLAB, by which the processed speech signal can be heard with clarity and in noiseless mode at the receiver end. Keywords: speech compression, bit rate, filter, MPEG, DCT.
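The DCT-based compression the abstract's keywords point to can be sketched in a few lines: transform, keep only the largest-magnitude coefficients, inverse-transform. The signal below is deliberately built from two DCT basis cosines so the retained coefficients reconstruct it almost exactly; this is an illustration in Python, not the paper's MATLAB code:

```python
import numpy as np
from scipy.fft import dct, idct

def dct_compress(x, keep_ratio=0.1):
    """Keep only the largest-magnitude DCT coefficients, zero the rest."""
    c = dct(x, norm='ortho')
    k = max(1, int(len(c) * keep_ratio))
    idx = np.argsort(np.abs(c))[:-k]   # indices of the smallest coefficients
    c[idx] = 0.0
    return idct(c, norm='ortho')

# two exact DCT-II basis cosines: energy concentrates in two coefficients
n = np.arange(512)
x = (np.cos(np.pi * 16 * (2 * n + 1) / 1024)
     + 0.3 * np.cos(np.pi * 42 * (2 * n + 1) / 1024))
xr = dct_compress(x)
err = np.sqrt(np.mean((x - xr) ** 2))
print(f"RMS error keeping 10% of coefficients: {err:.2e}")
```

Real speech is not this sparse in the DCT domain, so practical coders quantize coefficients per frame and trade bit rate against audible distortion rather than achieving near-exact reconstruction.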
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound (TELKOMNIKA JOURNAL)
This paper proposes combined methods of the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected feature vector of Indonesian syllables. This research aims to find the most effective and efficient properties for performing feature extraction of each syllable sound, to be applied in speech recognition systems. The proposed approach, which extends the previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The result lists the features of each syllable that can be used in further research, along with some recommendations on the most effective and efficient WT for performing syllable sound recognition.
Investigations of the Distributions of Phonemic Durations in Hindi and Dogri (kevig)
Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models (kevig)
Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and those generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. We have observed two main findings from our study. First, we discovered that prompts using the term ‘answer’ lead to more effective relevance evaluations than those using ‘relevant.’ This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, we noted the importance of appropriately balancing the scope of ‘relevance.’ While the term ‘relevant’ can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps in more precisely defining this balance. By providing clearer contexts for the term ‘relevance,’ few-shot examples contribute to refining the relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
In recent years, great advances have been made in the speed, accuracy, and coverage of automatic word sense disambiguation systems that, given a word appearing in a certain context, can identify the sense of that word. In this paper we consider the problem of deciding whether the same words contained in different documents are related to the same meaning or are homonyms. Our goal is to improve the estimate of the similarity of documents in which some words may be used with different meanings. We present three new strategies for solving this problem, which are used to filter out homonyms from the similarity computation. Two of them are intrinsically non-semantic, whereas the third has a semantic flavor and can also be applied to word sense disambiguation. The three strategies have been embedded in an article recommendation system that one of the most important Italian ad-serving companies offers to its customers.
Genetic Approach For Arabic Part Of Speech Tagging (kevig)
With the growing number of textual resources available, the ability to understand them becomes critical.
An essential first step in understanding these sources is the ability to identify the parts-of-speech in each
sentence. Arabic is a morphologically rich language, which presents a challenge for part of speech
tagging. In this paper, our goal is to propose, improve, and implement a part-of-speech tagger based on a
genetic algorithm. The accuracy obtained with this method is comparable to that of other probabilistic
approaches.
Rule Based Transliteration Scheme for English to Punjabi (kevig)
Machine transliteration has emerged as an important research area in the field of machine translation. Transliteration basically aims to preserve the phonological structure of words; proper transliteration of named entities plays a significant role in improving the quality of machine translation. In this paper we perform machine transliteration for the English-Punjabi language pair using a rule-based approach. We have constructed rules for syllabification, the process of extracting or separating the syllables from words. We calculate probabilities for named entities (proper names and locations). For those words which do not fall under the category of named entities, separate probabilities are calculated using relative frequency through a statistical machine translation toolkit known as MOSES. Using these probabilities we transliterate the input text from English to Punjabi.
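The relative-frequency estimate mentioned above (the quantity a toolkit like MOSES derives from aligned data) is simply a conditional count ratio, P(target | source) = count(source, target) / count(source). The alignment pairs below are invented purely for illustration:

```python
from collections import Counter

def relative_frequency(pairs):
    """Estimate P(target | source) by relative frequency over aligned pairs."""
    joint = Counter(pairs)                       # count(source, target)
    source_counts = Counter(s for s, _ in pairs) # count(source)
    return {(s, t): c / source_counts[s] for (s, t), c in joint.items()}

# hypothetical English-syllable to Punjabi-transliteration alignments
pairs = [("ram", "ਰਾਮ"), ("ram", "ਰਾਮ"), ("ram", "ਰਮ"), ("sing", "ਸਿੰਘ")]
probs = relative_frequency(pairs)
print(probs[("ram", "ਰਾਮ")])  # 2 of 3 "ram" occurrences
```

A transliterator would then pick, for each syllable, the target with the highest conditional probability.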
Improving Dialogue Management Through Data Optimization (kevig)
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and computers through natural language stands as a critical advancement. Central to these systems is the dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue management has embraced a variety of methodologies, including rule-based systems, reinforcement learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset imperfections, especially mislabeling, on the challenges inherent in refining dialogue management processes.
Document Author Classification using Parsed Language Structure (kevig)
Over the years there has been ongoing interest in detecting the authorship of a text based on statistical properties of the text, such as occurrence rates of noncontextual words. In previous work, these techniques have been used, for example, to determine the authorship of all of The Federalist Papers. Such methods may be useful in more modern times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship using grammatical structural information extracted by a statistical natural language parser. This paper provides a proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist Papers and Sanditon, which have been used as test cases in previous authorship detection studies. Several features extracted from the statistical natural language parser were explored: all subtrees of some depth from any level; rooted subtrees of some depth; part of speech; and part of speech by level in the parse tree. It was found to be helpful to project the features into a lower-dimensional space. Statistical experiments on these documents demonstrate that information from a statistical parser can, in fact, assist in distinguishing authors.
Rag-Fusion: A New Take on Retrieval Augmented Generation (kevig)
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots, but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. Through manually evaluating answers on accuracy, relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives. However, some answers strayed off topic when the generated queries' relevance to the original query was insufficient. This research marks significant progress in artificial intelligence (AI) and natural language processing (NLP) applications and demonstrates transformations in a global and multi-industry context.
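Reciprocal rank fusion (RRF), the reranking step RAG-Fusion builds on, has a simple closed form: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in. A minimal sketch with invented document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# three hypothetical rankings, one per generated query variant
r1 = ["d1", "d2", "d3"]
r2 = ["d2", "d1", "d4"]
r3 = ["d2", "d3", "d1"]
print(reciprocal_rank_fusion([r1, r2, r3]))  # d2 first: two top-1 placements
```

The constant k (60 in the original RRF formulation) damps the influence of top ranks so that consistent mid-list appearances can outweigh a single lucky first place.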
Performance, Energy Consumption and Costs: A Comparative Analysis of Automati... (kevig)
The common practice in Machine Learning research is to evaluate the top-performing models based on their performance. However, this often leads to overlooking other crucial aspects that should be given careful consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches (SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance parameters (standard indices), but also alternative measures such as timing, energy consumption and costs, which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of complex models (LLM and generative models), consuming much less energy and requiring fewer resources. These findings suggest that companies should consider additional considerations when choosing machine learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
Evaluation of Medium-Sized Language Models in German and English Languagekevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative question answering, which requires models to provide elaborate answers without external document retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results show that combining the best answers from different MLMs yielded an overall correct answer rate of 82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B parameters, which highlights the importance of using appropriate training data for fine-tuning rather than solely relying on the number of parameters. More fine-grained feedback should be used to further improve the quality of answers. The open source community is quickly closing the gap to the best commercial models.
IMPROVING DIALOGUE MANAGEMENT THROUGH DATA OPTIMIZATIONkevig
In task-oriented dialogue systems, the ability for users to effortlessly communicate with machines and
computers through natural language stands as a critical advancement. Central to these systems is the
dialogue manager, a pivotal component tasked with navigating the conversation to effectively meet user
goals by selecting the most appropriate response. Traditionally, the development of sophisticated dialogue
management has embraced a variety of methodologies, including rule-based systems, reinforcement
learning, and supervised learning, all aimed at optimizing response selection in light of user inputs. This
research casts a spotlight on the pivotal role of data quality in enhancing the performance of dialogue
managers. Through a detailed examination of prevalent errors within acclaimed datasets, such as
Multiwoz 2.1 and SGD, we introduce an innovative synthetic dialogue generator designed to control the
introduction of errors precisely. Our comprehensive analysis underscores the critical impact of dataset
imperfections, especially mislabeling, on the challenges inherent in refining dialogue management
processes.
Document Author Classification Using Parsed Language Structurekevig
Over the years there has been ongoing interest in detecting authorship of a text based on statistical properties of the
text, such as by using occurrence rates of noncontextual words. In previous work, these techniques have been used,
for example, to determine authorship of all of The Federalist Papers. Such methods may be useful in more modern
times to detect fake or AI authorship. Progress in statistical natural language parsers introduces the possibility of
using grammatical structure to detect authorship. In this paper we explore a new possibility for detecting authorship
using grammatical structural information extracted using a statistical natural language parser. This paper provides a
proof of concept, testing author classification based on grammatical structure on a set of “proof texts,” The Federalist
Papers and Sanditon which have been as test cases in previous authorship detection studies. Several features extracted
of some depth, part of speech, and part of speech by level in the parse tree. It was found to be helpful to project the
features into a lower dimensional space. Statistical experiments on these documents demonstrate that information
from a statistical parser can, in fact, assist in distinguishing authors.
RAG-FUSION: A NEW TAKE ON RETRIEVALAUGMENTED GENERATIONkevig
Infineon has identified a need for engineers, account managers, and customers to rapidly obtain product
information. This problem is traditionally addressed with retrieval-augmented generation (RAG) chatbots,
but in this study, I evaluated the use of the newly popularized RAG-Fusion method. RAG-Fusion combines
RAG and reciprocal rank fusion (RRF) by generating multiple queries, reranking them with reciprocal
scores and fusing the documents and scores. Through manually evaluating answers on accuracy,
relevance, and comprehensiveness, I found that RAG-Fusion was able to provide accurate and
comprehensive answers due to the generated queries contextualizing the original query from various
perspectives. However, some answers strayed off topic when the generated queries' relevance to the
original query is insufficient. This research marks significant progress in artificial intelligence (AI) and
natural language processing (NLP) applications and demonstrates transformations in a global and multiindustry context
Performance, energy consumption and costs: a comparative analysis of automati...kevig
The common practice in Machine Learning research is to evaluate the top-performing models based on their
performance. However, this often leads to overlooking other crucial aspects that should be given careful
consideration. In some cases, the performance differences between various approaches may be insignificant, whereas factors like production costs, energy consumption, and carbon footprint should be taken into
account. Large Language Models (LLMs) are widely used in academia and industry to address NLP problems. In this study, we present a comprehensive quantitative comparison between traditional approaches
(SVM-based) and more recent approaches such as LLM (BERT family models) and generative models (GPT2 and LLAMA2), using the LexGLUE benchmark. Our evaluation takes into account not only performance
parameters (standard indices), but also alternative measures such as timing, energy consumption and costs,
which collectively contribute to the carbon footprint. To ensure a complete analysis, we separately considered the prototyping phase (which involves model selection through training-validation-test iterations) and
the in-production phases. These phases follow distinct implementation procedures and require different resources. The results indicate that simpler algorithms often achieve performance levels similar to those of
complex models (LLM and generative models), consuming much less energy and requiring fewer resources.
These findings suggest that companies should consider additional considerations when choosing machine
learning (ML) solutions. The analysis also demonstrates that it is increasingly necessary for the scientific
world to also begin to consider aspects of energy consumption in model evaluations, in order to be able to
give real meaning to the results obtained using standard metrics (Precision, Recall, F1 and so on).
EVALUATION OF MEDIUM-SIZED LANGUAGE MODELS IN GERMAN AND ENGLISH LANGUAGEkevig
Large language models (LLMs) have garnered significant attention, but the definition of “large” lacks
clarity. This paper focuses on medium-sized language models (MLMs), defined as having at least six
billion parameters but less than 100 billion. The study evaluates MLMs regarding zero-shot generative
question answering, which requires models to provide elaborate answers without external document
retrieval. The paper introduces an own test dataset and presents results from human evaluation. Results
show that combining the best answers from different MLMs yielded an overall correct answer rate of
82.7% which is better than the 60.9% of ChatGPT. The best MLM achieved 71.8% and has 33B
parameters, which highlights the importance of using appropriate training data for fine-tuning rather than
solely relying on the number of parameters. More fine-grained feedback should be used to further improve
the quality of answers. The open source community is quickly closing the gap to the best commercial
models.
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
Enhanced Retrieval of Web Pages using Improved Page Rank Algorithmkevig
Information Retrieval (IR) is a very important and vast area. While searching for context web returns all
the results related to the query. Identifying the relevant result is most tedious task for a user. Word Sense
Disambiguation (WSD) is the process of identifying the senses of word in textual context, when word has
multiple meanings. We have used the approaches of WSD. This paper presents a Proposed Dynamic Page
Rank algorithm that is improved version of Page Rank Algorithm. The Proposed Dynamic Page Rank
algorithm gives much better results than existing Google’s Page Rank algorithm. To prove this we have
calculated Reciprocal Rank for both the algorithms and presented comparative results.
Effect of MFCC Based Features for Speech Signal Alignmentskevig
The fundamental techniques used for man-machine communication include Speech synthesis, speech
recognition, and speech transformation. Feature extraction techniques provide a compressed
representation of the speech signals. The HNM analyses and synthesis provides high quality speech with
less number of parameters. Dynamic time warping is well known technique used for aligning two given
multidimensional sequences. It locates an optimal match between the given sequences. The improvement in
the alignment is estimated from the corresponding distances. The objective of this research is to investigate
the effect of dynamic time warping on phrases, words, and phonemes based alignments. The speech signals
in the form of twenty five phrases were recorded. The recorded material was segmented manually and
aligned at sentence, word, and phoneme level. The Mahalanobis distance (MD) was computed between the
aligned frames. The investigation has shown better alignment in case of HNM parametric domain. It has
been seen that effective speech alignment can be carried out even at phrase level
Power-sharing Class 10 is a vital aspect of democratic governance. It refers to the distribution of power among different organs of government, levels of government, and social groups. This ensures that no single entity can control all aspects of governance, promoting stability and unity in a diverse society.
For more information, visit-www.vavaclasses.com
Extraction Of Natural Dye From Beetroot (Beta Vulgaris) And Preparation Of He...SachinKumar945617
If you want to make , ppt, dissertation/research, project or any document edit service
DM me on what's app 8434381558
E-mail sachingone220@gmail.com
I will take charge depend upon how much pages u want
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity Green house effect & Hydrological cycle
Types of Ecosystem
(1) Natural Ecosystem
(2) Artificial Ecosystem
component of ecosystem
Biotic Components
Abiotic Components
Producers
Consumers
Decomposers
Functions of Ecosystem
Types of Biodiversity
Genetic Biodiversity
Species Biodiversity
Ecological Biodiversity
Importance of Biodiversity
Hydrological Cycle
Green House Effect
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
How to Create Map Views in the Odoo 17 ERPCeline George
The map views are useful for providing a geographical representation of data. They allow users to visualize and analyze the data in a more intuitive manner.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills have a main role in point of sale procedure. It will help to track sales, handling payments and giving receipts to customers. Bill splitting also has an important role in POS. For example, If some friends come together for dinner and if they want to divide the bill then it is possible by POS bill splitting. This slide will show how to split bills in odoo 17 POS.
plant breeding methods in asexually or clonally propagated crops
Effect of Singular Value Decomposition Based Processing on Speech Perception
International Journal on Natural Language Computing (IJNLC) Vol. 2, No. 1, February 2013
DOI : 10.5121/ijnlc.2013.2102
Balvinder Kour¹, Randhir Singh¹, Parveen Lehana²* and Padmini Rajput²
¹ECE Department, Shri Sai College of Engineering and Technology, Pathankot, Punjab
²Dept. of Physics and Electronics, University of Jammu, Jammu, India
*pklehanajournals@gmail.com
ABSTRACT
Speech is an important biological signal and the primary mode of communication among human beings; it is
also the most natural and efficient means of exchanging information. Speech processing is one of the most
important areas of signal processing. In this paper, a technique from linear algebra called singular value
decomposition (SVD) is applied to the speech signal. SVD is a technique for deriving the important
parameters of a signal. The parameters derived using SVD may be further reduced through perceptual
evaluation of the speech synthesized using only the perceptually important parameters, so that the speech
signal can be compressed without losing its quality. This technique finds wide application in speech
compression, speech recognition, and speech synthesis. The objective of this paper is to investigate the
effect of SVD based feature selection of the input speech on the perception of the processed speech signal.
Speech signals in the form of the vowels a, e, and u were recorded from each of six speakers (3 males and
3 females). The vowels for the six speakers were analyzed using SVD based processing, and the effect of
reducing the number of singular values was investigated by evaluating the perception of the vowels
resynthesized from the reduced singular values. The investigations show that the number of singular values
can be drastically reduced without significantly affecting the perception of the vowels.
KEYWORDS
Speech signal, Speech generation, Speech processing, Speech compression, Singular value decomposition
1. INTRODUCTION
Information technology has an important impact on many aspects of our lives.
Communication between human beings and information processing devices is equally
important. Until now, such communication has been almost entirely through keyboards and screens,
but speech is the most attractive alternative because it is the fastest and most natural means of
communication for human beings [1]. Speech is basically a significant biomedical signal for transmission
over electrical channels. Artificial speech that can be generated in an automated way has been a
dream of humankind for many centuries [2]. Modern communication techniques use digital
transmission of signals. Speech recognition is the process of converting a speech signal into a
sequence of words by means of algorithms implemented as a computer program [3]. During speech
production, air flows from the lungs and passes through the glottis, throat, and vocal tract.
Depending on which speech sound is articulated, the vocal tract can be excited in three
possible ways: voiced, unvoiced, and transient excitation. In voiced excitation, the air
pressure causes the glottis to open and close periodically, which generates a triangle-shaped
periodic pulse train. The fundamental frequency of this excitation lies in the range of 80 Hz to
350 Hz. Unvoiced excitation is generated when the glottis is open and the air pressure creates a
narrow passage in the throat or mouth. This results in turbulence, which generates a noise signal.
The spectral shape of the noise is determined by the location of the constriction. In transient
excitation, a closure in the throat or mouth raises the air pressure; suddenly opening the
closure makes the pressure drop immediately, producing an explosive burst [4]. Various signal
models have been developed that provide the basis for a theoretical description of the signal
processing system used to produce the desired output, as well as information about the signal source
without the source itself being available. Acoustic models are used to capture speech feature statistics
in a parametric way. The dominant technique for acoustic modeling is the hidden Markov model
(HMM). The HMM is designed to capture the statistics of a time-varying signal and can be considered a
generalization of the Gaussian mixture model (GMM). As it is difficult to formulate a continuously
time-varying model, the HMM models it by state-to-state transitions; hence this approach can be
considered a discretization of the continuously varying case [5]. Compression of signals is based
on removing the redundancy between neighbouring samples and between adjacent cycles.
Lossy compression schemes are mostly used to compress information such as speech signals.
Discrete wavelet transform (DWT) based lossy compression techniques are used to address the limited-
bandwidth problem. The performance of the DWT for speech compression is very good compared
with other techniques such as the μ-law speech coder. In data compression, it is essential to represent
the signal with as few coefficients as possible within an acceptable loss of quality.
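As a rough illustration of this idea (a sketch, not code from the paper), the following uses the PyWavelets library to decompose a short synthetic signal with a 4-level DWT, zero out the small coefficients, and reconstruct. The wavelet ('db4'), the decomposition level, and the 5% threshold are arbitrary illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

# Illustrative sketch only: lossy DWT compression of a short synthetic signal.
fs = 16000
t = np.arange(fs // 10) / fs                         # 100 ms of signal
x = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

coeffs = pywt.wavedec(x, 'db4', level=4)             # 4-level DWT
thr = 0.05 * max(np.abs(c).max() for c in coeffs)    # keep only large coeffs
coeffs_c = [pywt.threshold(c, thr, mode='hard') for c in coeffs]
x_hat = pywt.waverec(coeffs_c, 'db4')[:len(x)]       # lossy reconstruction

kept = sum(int(np.count_nonzero(c)) for c in coeffs_c)
total = sum(c.size for c in coeffs)
print(f"kept {kept} of {total} coefficients")
```

Hard thresholding zeroes the small (mostly detail-band) coefficients; only the surviving non-zero coefficients would need to be stored or transmitted, which is the source of the compression.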
Compression techniques can be divided into two main categories: lossless and lossy. Compression
methods can also be divided into three functional categories: direct methods, transformation
methods, and parameter extraction methods. In direct methods, the samples of the signal are provided
directly for compression. Transformation methods include the Fourier transform (FT),
wavelet transform (WT), and discrete cosine transform (DCT). In parameter extraction methods,
a pre-processor is employed to extract features from which the signal can be reconstructed.
The compression of speech signals has many practical applications. It can be used in digital
cellular technology, where the same frequency bandwidth is shared among many users; compression
allows more users into the system. In data compression technology, information is represented
with the lowest possible number of bits, i.e. the minimum size. This technology is
required in the field of speech to satisfy the transfer requirements of large volumes of speech
data over communication networks and the Internet. With advancing technology, the demand for
digital information has been increasing over the past decades [6]. In speech compression it is very difficult
to achieve a low bit rate in the digital representation of the input signal with negligible loss of signal
quality, and the quality is ultimately assessed by a human listener. Perceptual coding
methods have also been developed, based on the concept of masking the distortion [7].
SVD can also be used as a data compression method. It has already been used for
RADAR signals, where SVD is used to extract the prototype pulse from a matrix of
aligned pulses [8]. The objective of this paper is to investigate the effect of SVD based feature
selection of the input speech on the perception of the processed speech signal. The scope of this
paper is limited to the processing and evaluation of cardinal vowels. The details of SVD are
given in the following section. The methodology of the investigations is presented in Section 3,
and the results and conclusions are presented in Section 4.
2. SINGULAR VALUE DECOMPOSITION
Singular value decomposition (SVD) is a technique for deriving the important parameters of a
given signal. In linear algebra, the singular value decomposition is a factorization of a real
or complex matrix, with many useful applications in signal processing and statistics [9]. Singular
value decomposition has proved to be an efficient tool for signal processing techniques such
as image coding, signal enhancement, and image filtering. An SVD based speech enhancement
technique has recently been proposed in the literature. This technique assumes that the noise in the
speech is additive and uncorrelated with the clean speech signal. The SVD technique enhances the
noisy signal by retaining a few of the singular values from the decomposition of an over-
determined, over-extended data matrix. The singular values that are discarded are associated with the
noisy part of the signal. The signal reconstructed from the reduced-rank matrix is the enhanced
speech signal [10]. Applications employing the SVD include computing the pseudo-inverse,
least-squares fitting of data, matrix approximation, and determining the rank, range, and null space
of a matrix [9]. The SVD has also been applied to square matrices that were subsequently
converted to rectangular matrices [11]. A few years ago, singular value decomposition was also
explored for watermarking. For data processing and dimension reduction of the signal, principal
component analysis (PCA), a variable reduction procedure, is used [12]. PCA has found numerous
applications, such as handwritten zip code classification and human face recognition [13]. In another
SVD based method, the eigenvectors corresponding to the largest singular values contained the
signal information, while the eigenvectors corresponding to the smallest singular values contained
the noise information; the enhanced signal was reconstructed using only the information associated
with the largest singular values [14]. The spoken language interface relies on speech synthesis or
text-to-speech systems to provide a range of capabilities for having a machine communicate with the
user. Various speech coding techniques have been developed for both efficient transmission and
storage of speech, conserving bandwidth or bit rate while maintaining voice quality [15]. The
combination of SVD and QR decomposition with column pivoting (QRcp) has been used to select a
potential set of features. The idea is to select those features that explain different dimensions,
showing minimal similarity among them in an orthogonal sense. This improves the quality of
selection and enhances the identification rate. SVD-QRcp has been found useful for selecting a
subset of data in a heart sound classification problem using an artificial neural network [16]. The
optimized singular vector denoising approach for speech enhancement describes a new algorithm in
which the effect of noise is reduced from both the singular values and the singular vectors [17].
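The low-rank approximation property that all of these applications rely on can be sketched in a few lines of NumPy. The matrix and the retained rank below are arbitrary illustrative choices, not data from the paper.

```python
import numpy as np

# Illustrative sketch: rank-k approximation of a matrix from its SVD,
# keeping only the k largest singular values and their vectors.
def rank_k_approx(A, k):
    """Reconstruct A from its k largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))  # rank <= 8
A_hat = rank_k_approx(A, 8)
print(np.allclose(A, A_hat))  # prints True: rank-8 SVD recovers a rank-8 matrix
```

Because the matrix above genuinely has rank at most 8, keeping the 8 largest singular values reconstructs it essentially exactly; keeping fewer gives the best approximation of that lower rank in the least-squares sense, which is what SVD based compression and enhancement exploit.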
3. METHODOLOGY
For the analysis of the speech signal, let S(ω, t) be the matrix representing the
spectrogram of the given speech signal s(t). The short-time Fourier transform of the input speech
signal may be written as

S(ω, t) = U(ω, t) E(ω, t) V(ω, t)^T

where U(ω, t) is an n × k unitary matrix, E(ω, t) is a k × k rectangular diagonal matrix containing
the singular values, and V(ω, t)^T, the conjugate transpose of V(ω, t), is a k × n unitary matrix [18].
Three male and three female speakers, aged between 20 and 25 years, were recorded at a
sampling frequency of 16 kHz. The recorded speech was segmented into the three
different vowels in order to investigate the effect of SVD based processing of the input speech on the
perception of the speech signal. Spectrograms of the segmented vowels were obtained, the SVD of the
spectrograms was taken, and only the singular values above a threshold were retained. The synthesis
was carried out with the reduced set of singular values, and the perceptual evaluation of the synthesized
speech was carried out.
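A minimal sketch of this pipeline follows, under stated assumptions: a synthetic harmonic signal stands in for a recorded vowel, a fixed retained rank k replaces the paper's perceptual threshold, and the original STFT phase is reused for resynthesis (the paper does not specify the phase handling).

```python
import numpy as np
from scipy.signal import stft, istft

# Sketch only: SVD truncation of a spectrogram followed by resynthesis.
fs = 16000
t = np.arange(fs // 4) / fs                          # 250 ms "vowel"
f0 = 120.0                                           # fundamental frequency
x = sum(np.sin(2 * np.pi * f0 * h * t) / h for h in range(1, 6))

f, frames, S = stft(x, fs=fs, nperseg=512)           # complex spectrogram
U, sv, Vt = np.linalg.svd(np.abs(S), full_matrices=False)

k = 5                                                # retained singular values
S_mag = U[:, :k] @ np.diag(sv[:k]) @ Vt[:k, :]       # reduced-rank magnitude
S_hat = S_mag * np.exp(1j * np.angle(S))             # reuse original phase
_, x_hat = istft(S_hat, fs=fs, nperseg=512)

err = np.linalg.norm(x_hat[:len(x)] - x) / np.linalg.norm(x)
print(f"rank {k} of {len(sv)}: relative error {err:.3f}")
```

For a quasi-stationary sound such as a sustained vowel, the magnitude spectrogram is close to low rank, so a handful of singular values reconstructs it with small error, which is consistent with the paper's finding that four or five values suffice.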
4. RESULTS AND CONCLUSIONS
The investigations were carried out using perceptual evaluation and spectrograms of the
synthesized speech. Table 1 shows the minimum singular values used in the synthesis for
obtaining perceptually satisfactory quality. The means of the singular values for the three vowels are
also plotted as histograms in Fig. 1. It is clear from the histograms that the minimum singular
values required for satisfactory synthesis quality are a small fraction of the maximum singular
values in the recorded speech.
The spectrograms of the three cardinal vowels for the six speakers are shown in Fig. 2 to Fig. 7. The first
column shows the spectrograms of the recorded vowels and the second column shows the
spectrograms of the vowels synthesized with different numbers of singular values. The spectrograms of the
synthesized vowels are smooth and visually similar to those of the original recorded vowels. It was
observed that the first four or five values of the singular matrix are enough to synthesize the vowels
with satisfactory quality.
Table 1 Maximum and minimum SVD values in the three vowels for the recorded and synthesized vowels,
respectively (each cell: singular value, index).

Speakers |  a: Max SVD | a: Min SVD |  e: Max SVD | e: Min SVD |  u: Max SVD | u: Min SVD
         |  (recorded) |  (synth.)  |  (recorded) |  (synth.)  |  (recorded) |  (synth.)
Sp1      |  0.6224, 1  |  0.0302, 7 |  0.7208, 1  | 0.0090, 12 |  0.7211, 1  |  0.0317, 4
Sp2      |  0.9648, 1  |  0.0626, 7 |  0.7921, 1  |  0.0180, 7 |  0.6968, 1  |  0.0400, 4
Sp3      |  0.7887, 1  |  0.1638, 4 |  1.0600, 1  |  0.0576, 4 |  1.8170, 1  |  0.0940, 5
Sp4      |  1.2191, 1  |  0.457, 10 |  1.1056, 1  |  0.0427, 5 |  1.5483, 1  |  0.1464, 4
Sp5      |  1.1856, 1  |  0.1686, 4 |  1.0470, 1  |  0.0912, 4 |  1.2232, 1  |  0.0959, 5
Sp6      |  0.6249, 1  |  0.0488, 6 |  1.0031, 1  |  0.0379, 5 |  1.0135, 1  |  0.0782, 5
Mean     |  0.9009166  |  0.1551    |  0.95476    |  0.04274   |  1.16998    |  0.08013
Fig. 1 Averaged maximum and minimum SVD values in the three vowels for the recorded and synthesized
vowels, respectively.
Original speech signal Synthesized speech signal
Fig. 2 Spectrogram of original and synthesized speech for vowels a, e, u for Sp1.
Fig. 3 Spectrogram of original and synthesized speech for vowels a, e, u for Sp2.
Fig. 4 Spectrogram of original and synthesized speech for vowels a, e, u for Sp3.
Fig. 5 Spectrogram of original and synthesized speech for vowels a, e, u for Sp4.
Fig. 6 Spectrogram of original and synthesized speech for vowels a, e, u for Sp5.
Fig. 7 Spectrogram of original and synthesized speech for vowels a, e, u for Sp6.