International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. The journal publishes original research that contributes significantly to scientific knowledge in engineering and technology.
Teager Energy Operation on Wavelet Packet Coefficients for Enhancing Noisy Sp...CSCJournals
In this paper, a new thresholding-based speech enhancement approach is presented, in which the threshold is determined statistically by applying the Teager energy operation to the Wavelet Packet (WP) coefficients of noisy speech. The threshold thus obtained is applied to the WP coefficients of the noisy speech using a hard thresholding function to obtain the enhanced speech. Detailed simulations are carried out in the presence of white, car, pink, and babble noise to evaluate the performance of the proposed method. Standard objective measures, spectrogram representations, and subjective listening tests show that the proposed method outperforms existing state-of-the-art thresholding-based speech enhancement approaches for noisy speech from high to low SNR levels.
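The core mechanism of such a scheme, a Teager-energy-derived threshold applied to wavelet coefficients by hard thresholding, can be sketched in a few lines. This is a minimal illustration, not the paper's exact method: the single-level Haar transform, the mean-TEO threshold, and the synthetic test signal are all assumptions.

```python
import numpy as np

def teager(x):
    # Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

def haar_level1(x):
    # One level of the orthonormal Haar wavelet transform
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail band
    return a, d

rng = np.random.default_rng(0)
n = np.arange(1024)
clean = np.sin(2 * np.pi * 0.03 * n)
noisy = clean + 0.3 * rng.standard_normal(n.size)

a, d = haar_level1(noisy)
# Illustrative Teager-energy-derived threshold for the detail band
thr = np.sqrt(np.abs(np.mean(teager(d))))
d_hat = np.where(np.abs(d) > thr, d, 0.0)  # hard thresholding

# Inverse Haar reconstruction of the enhanced signal
enhanced = np.empty_like(noisy)
enhanced[0::2] = (a + d_hat) / np.sqrt(2)
enhanced[1::2] = (a - d_hat) / np.sqrt(2)
```

With a slowly varying tone buried in white noise, most of the noise energy lands in the detail band, so zeroing sub-threshold detail coefficients reduces the overall error.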
Speech Enhancement Using Spectral Flatness Measure Based Spectral SubtractionIOSRJVSP
This paper aims to reduce the background noise introduced into a speech signal during capture, storage, transmission, and processing, using a spectral subtraction algorithm. To account for the fact that colored noise corrupts the speech signal non-uniformly across frequency bands, a Multi-Band Spectral Subtraction (MBSS) approach is used, in which the amount of noise subtracted from the noisy speech signal is controlled by a weighting factor. The choice of optimal weight values determines the performance of the speech enhancement system. In this paper, the weights are chosen using the Spectral Flatness Measure (SFM) rather than the conventional Signal-to-Noise Ratio (SNR) based rule, since the SFM provides a true distinction between the speech and noise signals. Spectrograms and Mean Opinion Scores show that speech enhanced by the proposed SFM-based MBSS has better perceptual quality and improved intelligibility than that of the existing SNR-based MBSS.
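For illustration, the spectral flatness measure itself, the ratio of the geometric to the arithmetic mean of the power spectrum, is simple to compute. The sketch below uses made-up test signals to show why it separates tonal, speech-like content from broadband noise; the frame length and frequencies are arbitrary choices.

```python
import numpy as np

def spectral_flatness(power_spectrum, eps=1e-12):
    # SFM = geometric mean / arithmetic mean of the power spectrum.
    # Close to 1 for noise-like (flat) spectra, close to 0 for tonal ones.
    p = power_spectrum + eps
    geo = np.exp(np.mean(np.log(p)))
    return geo / np.mean(p)

rng = np.random.default_rng(1)
n = np.arange(512)
tone = np.sin(2 * np.pi * 0.1 * n)       # strongly tonal, speech-like
noise = rng.standard_normal(512)         # broadband, noise-like

sfm_tone = spectral_flatness(np.abs(np.fft.rfft(tone)) ** 2)
sfm_noise = spectral_flatness(np.abs(np.fft.rfft(noise)) ** 2)
```

By the AM-GM inequality the SFM always lies in (0, 1], which makes it a convenient per-band weighting factor.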
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...IOSR Journals
This paper deals with the residual musical noise that results from perceptual speech enhancement algorithms, and in particular from the Wiener filtering approach. Although perceptual speech enhancement techniques perform better than non-perceptual techniques, most of them still leave a troublesome residual musical noise. This is because only noise above the noise masking threshold (NMT) is filtered out; noise below the NMT can become audible once its maskers are removed. This limits the performance of perceptual speech enhancement methods that process only the audible noise, as residual noise remains. To overcome this drawback, a new speech enhancement technique is proposed here. The main aim is to improve the quality of the speech produced by perceptual Wiener filtering by controlling the latter with a second filter that acts as a psychoacoustically motivated weighting factor. Simulation results show improved performance compared to other perceptual speech enhancement methods.
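As a rough sketch of the underlying mechanism (not the paper's full method, which adds a psychoacoustic weighting filter), a basic Wiener gain with a spectral floor looks like this; the floor value and the test spectra are illustrative assumptions.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.05):
    # Wiener gain G = SNR / (1 + SNR), with the a priori SNR estimated
    # by spectral subtraction. The spectral floor crudely limits
    # over-suppression, and hence musical noise.
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr / (1.0 + snr)
    return np.maximum(gain, floor)

noise_psd = np.ones(8)  # flat noise estimate, one value per frequency bin
noisy_psd = np.array([100.0, 1.0, 100.0, 1.0, 50.0, 1.0, 100.0, 1.0])
g = wiener_gain(noisy_psd, noise_psd)
```

High-SNR bins pass nearly unattenuated while noise-only bins are held at the floor rather than driven to zero, which is the usual crude defense against isolated musical-noise peaks.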
Broad phoneme classification using signal based featuresijsc
Speech is the most efficient and popular means of human communication, and it is produced as a sequence of phonemes. Phoneme recognition is the first step performed by an automatic speech recognition system. State-of-the-art recognizers use mel-frequency cepstral coefficient (MFCC) features derived through short-time analysis, for which the recognition accuracy is limited. Instead, broad phoneme classification is achieved here using features derived directly from the speech at the signal level. The broad phoneme classes are vowels, nasals, fricatives, stops, approximants, and silence. The features identified as useful for broad phoneme classification are the voiced/unvoiced decision, zero-crossing rate (ZCR), short-time energy, most dominant frequency, energy in the most dominant frequency, spectral flatness measure, and the first three formants. Features derived from short-time frames of training speech are used to train a multilayer feedforward neural network classifier with manually marked class labels as output, and the classification accuracy is then tested. This broad phoneme classifier is later used for broad syllable structure prediction, which is useful for applications such as automatic speech recognition and automatic language identification.
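Two of the signal-level features named above, zero-crossing rate and short-time energy, can be computed directly from a frame of samples. The sketch below uses synthetic vowel-like and fricative-like frames as stand-in examples; the frame length and signal parameters are assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs with a sign change
    return np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))

def short_time_energy(frame):
    # Mean squared amplitude over the frame
    return np.mean(frame ** 2)

rng = np.random.default_rng(2)
n = np.arange(400)
vowel_like = np.sin(2 * np.pi * 0.02 * n)        # periodic: low ZCR, high energy
fricative_like = 0.2 * rng.standard_normal(400)  # noisy: high ZCR, low energy

zcr_v, zcr_f = zero_crossing_rate(vowel_like), zero_crossing_rate(fricative_like)
e_v, e_f = short_time_energy(vowel_like), short_time_energy(fricative_like)
```

The two features already separate these two broad classes: vowels are high-energy and low-ZCR, fricatives the opposite, which is why they appear in the feature set above.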
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESkevig
Speech synthesis and recognition are the basic techniques used for man-machine communication. This type of communication is valuable when our hands and eyes are busy with some other task, such as driving a vehicle, performing surgery, or firing weapons. Dynamic time warping (DTW) is widely used for aligning two multidimensional sequences; it finds an optimal match between the given sequences. The distance between aligned sequences should be smaller than that between unaligned sequences, so the improvement in alignment can be estimated from the corresponding distances. The technique has applications in speech recognition, speech synthesis, and speaker transformation. The objective of this research is to investigate the improvement in alignment for sentence-based and phoneme-based manually aligned phrases. Twenty-five phrases were recorded from each of six speakers (3 male, 3 female). The recorded material was segmented manually and aligned at the sentence and phoneme levels. The aligned sentences of different speaker pairs were analyzed using HNM, and the HNM parameters were further aligned at frame level using DTW. Mahalanobis distances were computed for each pair of sentences, and the investigations showed more than a 20% reduction in the average Mahalanobis distance.
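A textbook dynamic-programming DTW, of the kind used for the frame-level alignment described above, can be sketched as follows. The toy sequences are illustrative, not the HNM parameters or Mahalanobis distance used in the paper (plain Euclidean frame distance is assumed here).

```python
import numpy as np

def dtw_distance(a, b):
    # Classic dynamic-programming DTW with the unit step pattern.
    # a, b: (n, d) and (m, d) feature sequences; returns the optimal path cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 50)
seq = np.sin(2 * np.pi * t)[:, None]
warped = np.sin(2 * np.pi * t ** 1.5)[:, None]    # time-warped copy
shifted = (np.sin(2 * np.pi * t) + 2.0)[:, None]  # genuinely different sequence

d_same = dtw_distance(seq, warped)
d_diff = dtw_distance(seq, shifted)
```

DTW absorbs the nonlinear time warp, so the warped copy ends up much closer to the original than the genuinely different sequence, which is exactly the distance reduction the paper measures.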
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we propose a new method for minimum tracking in the Minimum Statistics (MS) noise estimation method, aimed at highly non-stationary noise environments. Formal listening tests confirmed that, when integrated into speech enhancement, the proposed noise estimation algorithm was preferred over other noise estimation algorithms.
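A bare-bones version of minimum-statistics noise tracking (smooth the noisy power spectrum over time, then take the minimum over a sliding window as the noise floor) can be sketched as below. The window length, smoothing factor, and synthetic frames are assumptions, and the paper's actual contribution, improved minimum tracking, is not reproduced here; neither is the bias compensation a real MS estimator needs.

```python
import numpy as np

def minimum_statistics(psd_frames, win=20, alpha=0.85):
    # First-order recursive smoothing of the per-bin power spectrum,
    # then a per-bin minimum over a sliding window of `win` frames.
    smoothed = np.empty_like(psd_frames)
    smoothed[0] = psd_frames[0]
    for t in range(1, len(psd_frames)):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * psd_frames[t]
    noise = np.empty_like(psd_frames)
    for t in range(len(psd_frames)):
        lo = max(0, t - win + 1)
        noise[t] = smoothed[lo:t + 1].min(axis=0)
    return noise

rng = np.random.default_rng(3)
# Noise-only periodogram frames with a strong "speech" burst in the middle
psd = rng.exponential(size=(100, 16))
psd[40:50] += 20.0   # speech activity should not inflate the estimate

est = minimum_statistics(psd)
```

Because speech bursts are short relative to the window, the windowed minimum rides along the noise floor and ignores them, the key property that makes MS estimation work without an explicit VAD.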
The primary goal of this paper is to provide an overview of existing Text-To-Speech (TTS) techniques, highlighting their usage and advantages. First-generation techniques include formant synthesis and articulatory synthesis. Formant synthesis works by using individually controllable formant filters, which can be set to produce accurate approximations of the vocal-tract transfer function. Articulatory synthesis produces speech by directly modeling the behavior of the human articulators. Second-generation techniques comprise concatenative synthesis and sinusoidal synthesis. Concatenative synthesis generates speech output by concatenating segments of recorded speech, and generally produces natural-sounding synthesized speech. Sinusoidal synthesis uses a harmonic model and decomposes each frame into a set of harmonics of an estimated fundamental frequency; the model parameters are the amplitudes and frequencies of the harmonics, so the value of the fundamental can be changed while keeping the same basic spectral envelope. Third-generation techniques include Hidden Markov Model (HMM) synthesis and unit selection synthesis. HMM synthesis trains a parametric model and produces high-quality speech. Finally, unit selection operates by selecting, from a large speech database, the best sequence of units that matches the specification.
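The harmonic model mentioned above can be illustrated with a toy synthesis routine: each frame is a sum of harmonics of the fundamental, so the pitch can be changed while the amplitude envelope is kept. The sample rate, frame length, and amplitude values are arbitrary choices, not taken from any particular system.

```python
import numpy as np

def harmonic_frame(f0, amps, sr=16000, n=320):
    # One synthesis frame: sum of harmonics k*f0 with the given
    # per-harmonic amplitudes (zero phases, for simplicity).
    t = np.arange(n) / sr
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amps))

amps = [1.0, 0.5, 0.25, 0.125]       # crude fixed spectral envelope
low = harmonic_frame(100.0, amps)    # 100 Hz fundamental
high = harmonic_frame(150.0, amps)   # pitch raised, same envelope
```

Both frames carry the same per-harmonic amplitudes, so raising the fundamental changes the pitch without changing the frame's energy or its coarse spectral shape.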
IJRET: International Journal of Research in Engineering and Technology is an international, peer-reviewed online journal published by eSAT Publishing House for the advancement of research in various disciplines of engineering and technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of engineering and technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of engineering and technology.
Optimization of Mel Cepstral Coefficient by Flower Pollination Algorithm for ...IJERA Editor
This paper describes speech intelligibility enhancement by extracting speech features. A method is introduced for modifying the speech features, i.e., the mel-frequency cepstral coefficients, extracted from a mixture of noise and speech; these features are then optimized by the Flower Pollination Algorithm (FPA), a nature-inspired algorithm based on the process of pollination. Results are shown for each step, extracting the speech features and optimizing them with the FPA. For the same speech signal with different noises, the FPA gives a better signal-to-noise ratio (SNR), and when different speech signals and different noise signals are compared, the FPA SNR still outperforms the existing technique. Thus, the proposed method gives reliable and effective results.
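As a rough sketch of how a flower-pollination-style optimizer works (global pollination via heavy-tailed steps toward the best solution, local pollination via mixing two random solutions), the toy below minimizes a sphere function. The Cauchy step stands in for the Lévy flight of the standard FPA, and all constants, as well as the objective, are assumptions unrelated to the paper's MFCC objective.

```python
import numpy as np

def fpa_minimize(f, dim=4, n_pop=15, iters=200, p=0.8, seed=0):
    # Minimal Flower Pollination Algorithm sketch with greedy acceptance.
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, (n_pop, dim))
    fit = np.array([f(x) for x in pop])
    best = pop[fit.argmin()].copy()
    for _ in range(iters):
        for i in range(n_pop):
            if rng.random() < p:
                # Global pollination: heavy-tailed step toward the best flower
                step = rng.standard_cauchy(dim) * 0.1
                cand = pop[i] + step * (best - pop[i])
            else:
                # Local pollination: mix two randomly chosen flowers
                j, k = rng.integers(n_pop, size=2)
                cand = pop[i] + rng.random() * (pop[j] - pop[k])
            fc = f(cand)
            if fc < fit[i]:
                pop[i], fit[i] = cand, fc
                if fc < f(best):
                    best = cand.copy()
    return best, f(best)

best, val = fpa_minimize(lambda x: np.sum(x ** 2))
```

The switch probability `p` trades off global exploration against local refinement; in the speech application, the objective would score a candidate MFCC modification by the resulting SNR instead of the sphere function used here.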
This paper describes a voice morphing concept in which the voice of any person is converted into the pre-analyzed or pre-recorded voice of an animal. As the user speaks, the pitch, timbre, vibrato, and articulation of the voice can be modified to resemble those of a pre-recorded and pre-analyzed animal voice. The technique is based on SMS. Using this concept, many entertaining applications can be developed for mobile devices, personal computers, and similar platforms.
Speech signal analysis for linear filter banks of different orderseSAT Journals
Abstract: In speech signal processing, the use of filter banks is very important. The critical requirement is that the sum of the frequency responses of the band-pass filters of the filter bank, i.e., the composite frequency response, be flat with linear phase. This paper deals with the design and implementation of a filter bank based on linear-phase FIR digital filters with a flat composite frequency response. The design exploits special properties of FIR filters by which an excellent frequency response can be achieved. Keywords: speech signal, FIR filter, composite frequency response.
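One simple way to obtain a flat composite response with linear phase is a complementary pair: a windowed-sinc lowpass plus a highpass defined as a centered impulse minus that lowpass, so the two branches sum to a pure delay. This is an illustrative construction, not necessarily the design used in the paper; the tap count, cutoff, and window are assumptions.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    # Linear-phase windowed-sinc lowpass (cutoff as a fraction of Nyquist)
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n)
    return h * np.hamming(num_taps)

taps = 101
h_lo = lowpass_fir(taps, 0.3)
# Complementary highpass: centered unit impulse minus the lowpass, so the
# two branches sum to a pure delay (flat magnitude, linear phase).
h_hi = -h_lo.copy()
h_hi[(taps - 1) // 2] += 1.0

composite = h_lo + h_hi          # equals a delayed unit impulse
H = np.abs(np.fft.rfft(composite, 1024))
```

Because both filters share the same symmetric center, the composite impulse response is exactly a delayed delta, so its magnitude response is exactly 1 at every frequency.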
Noise reduction in speech processing using improved active noise control (anc...eSAT Journals
Abstract: An improved feed-forward adaptive Active Noise Control (ANC) scheme is proposed using a Voice Activity Detector (VAD) and Wiener filtering. The 'speech-plus-noise' periods and 'noise-only' periods are separated using the VAD, and the unwanted noise is removed by adaptive filtering. Using the Speech Distortion Weighted Multichannel Wiener Filtering (SDW-MWF) algorithm, the noise present along with the speech signal is processed and filtered out, and the background noise in the speech samples is removed by an iterative filtering procedure. Feed-forward ANC is used to achieve better noise reduction in speech processing; after the adaptive filtering process, the speech signal without background noise is obtained. Keywords: Active Noise Control (ANC), noise reduction, adaptive filtering, feed-forward ANC
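The VAD stage described above can be approximated by a simple frame-energy detector. The sketch below marks 'speech-plus-noise' frames versus 'noise-only' frames; the frame length, the relative threshold, and the synthetic signal are all assumptions, and a real VAD would use more robust features.

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_db=-30):
    # Frame-wise energy VAD: a frame is active when its energy exceeds
    # a threshold relative to the loudest frame.
    usable = len(signal) // frame_len * frame_len
    frames = signal[:usable].reshape(-1, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

rng = np.random.default_rng(4)
sig = 0.01 * rng.standard_normal(16000)          # background noise
sig[4000:8000] += np.sin(2 * np.pi * 440 / 16000 * np.arange(4000))  # "speech"

active = energy_vad(sig)   # True for speech-plus-noise frames
```

The noise-only frames flagged False are exactly the periods in which a scheme like the one above would update its noise estimate before applying the Wiener filter.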
This is my presentation for a journal club, based on the article "Auditory inspired machine learning techniques can improve speech intelligibility and quality for hearing-impaired listeners". All references can be found in the slides at the end. I review basic techniques in noise reduction and how these techniques are implemented with deep neural networks.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
Lifting Scheme Cores for Wavelet TransformDavid Bařina
The thesis focuses on efficient computation of the two-dimensional discrete wavelet transform. The state-of-the-art methods are extended in several ways to perform the transform in a single loop, possibly in multi-scale fashion, using a compact streaming core. This core can further be appropriately reorganized to target the minimization of certain platform resources. The approach presented here nicely fits into common SIMD extensions, exploits the cache hierarchy of modern general-purpose processors, and is suitable for parallel evaluation. Finally, the approach presented is incorporated into the JPEG 2000 compression chain, in which it has proved to be fundamentally faster than widely used implementations.
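The lifting scheme at the heart of such cores can be illustrated with the CDF 5/3 filter used by JPEG 2000 for lossless coding: a predict step on the odd samples followed by an update step on the even samples, each step trivially invertible by reversing it. Periodic boundary handling via `np.roll` is a simplification here; the standard uses symmetric extension.

```python
import numpy as np

def cdf53_forward(x):
    # One level of the CDF 5/3 wavelet via lifting.
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    # Predict: d[n] = odd[n] - (even[n] + even[n+1]) / 2
    odd -= 0.5 * (even + np.roll(even, -1))
    # Update: s[n] = even[n] + (d[n-1] + d[n]) / 4
    even += 0.25 * (np.roll(odd, 1) + odd)
    return even, odd   # lowpass (s) and highpass (d) subbands

def cdf53_inverse(s, d):
    # Undo the lifting steps in reverse order
    even = s - 0.25 * (np.roll(d, 1) + d)
    odd = d + 0.5 * (even + np.roll(even, -1))
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16, dtype=float)
s, d = cdf53_forward(x)
y = cdf53_inverse(s, d)
```

Because each lifting step is inverted exactly (no filtering matrix to invert), reconstruction is perfect regardless of the data, and on a linear ramp the interior detail coefficients vanish, reflecting the predictor's linear model.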
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
Speech enhancement algorithms are used to overcome multiple limiting factors in applications such as mobile phones and communication channels. The central challenge for corrupted speech is the trade-off between noise reduction and signal distortion. We use a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement in speech quality. The new method adapts the noise estimation and the Wiener filter gain function so as to increase the weighted amplitude spectrum and better preserve the signals of interest. CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique to empirically investigate the effects of the corrupting noise and to obtain better perceptual quality, with reduced listener fatigue. In objective assessment tests, the proposed algorithm outperforms other conventional algorithms for various noise types at SNRs of 0, 5, 10, and 15 dB. The proposed algorithm therefore achieves a significant improvement in speech quality and better noise reduction than other conventional algorithms.
Speech Enhancement for Nonstationary Noise Environmentssipij
In this paper, we present a simultaneous detection and estimation approach for speech enhancement in nonstationary noise environments. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator that jointly minimizes a cost function accounting for both detection and estimation errors. Under speech presence, the cost is proportional to a quadratic spectral amplitude error, while under speech absence, the distortion depends on a certain attenuation factor. Experimental results demonstrate the advantage of the proposed simultaneous detection and estimation approach, which facilitates suppression of nonstationary noise with a controlled level of speech distortion.
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...sipij
Hearing-impaired people usually use hearing aids that implement speech enhancement algorithms. Speech estimation and noise estimation are the two components of a single-channel speech enhancement system, and the main objective of any speech enhancement algorithm is estimation of the noise power spectrum in non-stationary environments. A VAD (Voice Activity Detector) is used to identify speech pauses, and noise is estimated only during these pauses. The MMSE (Minimum Mean Square Error) speech enhancement algorithm does not enhance intelligibility, quality, or listener fatigue, which are the perceptual aspects of speech. A novel evaluation approach, SR (Signal to Residual spectrum ratio), based on an uncertainty parameter, is introduced to control distortions for the benefit of hearing-impaired people in non-stationary environments. Noise is estimated and updated by dividing the original signal into pure speech, quasi-speech and non-speech frames using multiple threshold conditions. Different values of SR and LLR (log-likelihood ratio) indicate the amounts of attenuation and amplification distortion. The proposed method is compared with the WAT (Weighted Average Technique) and MMSE (Minimum Mean Square Error) methods in terms of SR, segmental SNR and LLR.
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...sipij
Previous research has found the autocorrelation domain to be an appropriate domain for signal and noise separation. This paper discusses a simple and effective method for decreasing the effect of noise on the autocorrelation of the clean signal, which can later be used in extracting mel-cepstral parameters for speech recognition. Two different methods are proposed to deal with the error introduced by assuming speech and noise to be completely uncorrelated. The basic approach reduces the effect of noise by estimating its contribution and subtracting it from the autocorrelation of the noisy speech signal. To improve this method, we insert a speech/noise cross-correlation term into the equations used to estimate the clean-speech autocorrelation, using an estimate of this term found through a kernel method. Alternatively, we estimate the cross-correlation term using an averaging approach. A further improvement is obtained by introducing an overestimation parameter into the basic method. We tested the proposed methods on the Aurora 2 task. The basic method shows considerable improvement over the standard features and some other robust autocorrelation-based features, and the proposed techniques further increase the robustness of the basic autocorrelation-based method.
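The basic approach in the abstract above — estimate the noise autocorrelation and subtract it from the noisy-speech autocorrelation, optionally with an overestimation factor — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the biased autocorrelation estimator, and the parameter `alpha` are assumptions made for the sketch.

```python
import numpy as np

def frame_autocorr(frame, max_lag):
    """One-sided biased autocorrelation of a frame, for lags 0..max_lag-1."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) / n for k in range(max_lag)])

def denoise_autocorr(noisy_frame, noise_frames, max_lag=32, alpha=1.0):
    """Subtract an averaged noise-autocorrelation estimate (scaled by an
    overestimation factor alpha) from the noisy-frame autocorrelation."""
    r_noisy = frame_autocorr(noisy_frame, max_lag)
    r_noise = np.mean([frame_autocorr(f, max_lag) for f in noise_frames], axis=0)
    return r_noisy - alpha * r_noise
```

The cleaned autocorrelation sequence could then feed a mel-cepstral front end; the cross-correlation refinements described in the abstract would add a further term to the subtraction.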
International Journal of Engineering Research and Applications (IJERA) aims to cover the latest outstanding developments in all fields of engineering, technology and science.
International Journal of Engineering Research and Applications (IJERA) is a team of researchers, not a publication service or private publisher running journals for monetary benefit; we are an association of scientists and academics focused on supporting authors who want to publish their work. The articles published in our journal can be accessed online, and all articles are archived for real-time access.
Our journal primarily aims to bring out the research talent and the work of scientists, academics, engineers, practitioners, scholars, and postgraduate students of engineering and science. The journal covers scientific research in a broad sense rather than a niche area, enabling researchers from various verticals to publish their papers. It also aims to provide a platform for researchers to publish in a shorter time, enabling them to continue their work. All published articles are freely available to scientific researchers in government agencies, to educators, and to the general public. We are making serious efforts to promote our journal across the globe, and we are confident that it will act as a scientific platform for all researchers to publish their work online.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Audio Noise Removal – The State of the Artijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Speech Analysis and synthesis using VocoderIJTET Journal
Abstract— This paper proposes speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals but transform existing ones. The proposed speech vocoding differs from speech coding: the speech signal is analysed and represented with a smaller number of bits so that bandwidth efficiency can be increased, and the speech signal is then synthesised from the received bits of information. Three aspects of analysis are discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A quasi-harmonic analysis model is used to implement a pitch refinement algorithm that improves the accuracy of the spectral estimation, and a harmonic-plus-noise model is used to reconstruct the speech signal from the parameters. The goal is to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating phase information into the analysis and modelling process, and at synthesising these three aspects over different pitch periods.
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemeskevig
Speech synthesis and recognition are the basic techniques used for man-machine communication. This type of communication is valuable when our hands and eyes are busy with some other task such as driving a vehicle, performing surgery, or firing weapons at the enemy. Dynamic time warping (DTW) is mostly used for aligning two given multidimensional sequences: it finds an optimal match between them, and the distance between the aligned sequences should be smaller than that between the unaligned sequences. The improvement in the alignment may be estimated from the corresponding distances. This technique has applications in speech recognition, speech synthesis, and speaker transformation. The objective of this research is to investigate the amount of improvement in the alignment for sentence-based and phoneme-based manually aligned phrases. Speech signals in the form of twenty-five phrases were recorded from each of six speakers (3 males and 3 females). The recorded material was segmented manually and aligned at the sentence and phoneme levels. The aligned sentences of different speaker pairs were analyzed using HNM, and the HNM parameters were further aligned at the frame level using DTW. Mahalanobis distances were computed for each pair of sentences. The investigations have shown more than 20% reduction in the average Mahalanobis distance.
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...IJMER
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
Broad Phoneme Classification Using Signal Based Features ijsc
Speech is the most efficient and popular means of human communication. Speech is produced as a sequence of phonemes, and phoneme recognition is the first step performed by an automatic speech recognition system. State-of-the-art recognizers use mel-frequency cepstral coefficient (MFCC) features derived through short-time analysis, for which the recognition accuracy is limited. Instead, broad phoneme classification is achieved here using features derived directly from the speech at the signal level itself. Broad phoneme classes include vowels, nasals, fricatives, stops, approximants and silence. The features identified as useful for broad phoneme classification are the voiced/unvoiced decision, zero crossing rate (ZCR), short-time energy, most dominant frequency, energy in the most dominant frequency, spectral flatness measure, and the first three formants. Features derived from short-time frames of training speech are used to train a multilayer feedforward neural network classifier with manually marked class labels as output, and the classification accuracy is then tested. This broad phoneme classifier is later used for broad syllable structure prediction, which is useful for applications such as automatic speech recognition and automatic language identification.
A Gaussian Clustering Based Voice Activity Detector for Noisy Environments Us...CSCJournals
In this paper, a voice activity detector is proposed based on Gaussian modeling of noise in the spectro-temporal space obtained from auditory cortical processing. The auditory model, which offers a multi-dimensional picture of the sound, includes two stages: the first models the inner ear, and the second models the auditory cortex in the brain. In this picture, the noise is modeled by a single 3-D Gaussian cluster. At the start of the suggested VAD process, the noise is modeled by this Gaussian-shaped cluster, and the average noise behavior is obtained at various points of the spectro-temporal space for each frame. In the stage of separating speech from noise, the criterion is the difference between the average noise behavior and the speech signal amplitude in the spectro-temporal domain; this is measured for each frame and used as the classification criterion. Using Noisex92, the method is tested under different noise types such as white, exhibition, street, office and train noise. The results are compared to both the auditory-model and multi-feature methods, and the performance of this method at low signal-to-noise ratios (SNRs) is better than that of other current methods.
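The classification rule in the abstract above — model the noise as a single Gaussian cluster and flag frames that lie far from it — can be sketched as below. This is a generic illustration under assumed names (`fit_noise_model`, `vad`) and an assumed Mahalanobis-distance criterion with an arbitrary threshold, not the paper's spectro-temporal implementation.

```python
import numpy as np

def fit_noise_model(noise_feats):
    """Fit a single Gaussian (mean, inverse covariance) to feature vectors
    taken from noise-only frames; rows are frames, columns are features."""
    mu = noise_feats.mean(axis=0)
    cov = np.cov(noise_feats, rowvar=False) + 1e-6 * np.eye(noise_feats.shape[1])
    return mu, np.linalg.inv(cov)

def vad(frame_feats, mu, cov_inv, threshold=3.0):
    """Flag a frame as speech when its Mahalanobis distance from the
    noise cluster exceeds the threshold."""
    d = frame_feats - mu
    dist = np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))
    return dist > threshold
```

In the paper's setting the feature vectors would be points in the spectro-temporal space produced by the auditory model; here any per-frame feature vector serves to illustrate the decision rule.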
International Journal of Computational Engineering Research(IJCER)
1. International Journal of Computational Engineering Research||Vol, 03||Issue, 10||
Development of a Powerful Signal Processing Approach Using
Multi-Resolution Analysis & Speech Denoising

1. N.V. Narendra Babu, Asst. Professor, Department of Mathematics, GITAM University, Hyderabad, A.P., India. (First Author)
2. Prof. Dr. G. Manoj Someswar, B.Tech., M.S.(USA), M.C.A., Ph.D., Principal & Professor, Department of Computer Science & Engineering, Anwar-ul-uloom College of Engineering & Technology, Yennepally, RR District, Vikarabad – 501101, A.P., India. (Second Author)
ABSTRACT
Speech denoising has been a long-standing problem in the audio processing community. Many algorithms exist for denoising when the noise is stationary; for example, the Wiener filter is suitable for additive Gaussian noise. However, if the noise is non-stationary, the classical denoising algorithms usually perform poorly because the statistical properties of non-stationary noise are difficult to estimate.
KEYWORDS: Short-Time Fourier Transform, modulation domain, image compression, lossy compression, chroma subsampling, Discrete Wavelet Transform
I. INTRODUCTION
Schmidt [14] used NMF to perform speech denoising under non-stationary noise, an approach completely different from the classical statistical methods. The key idea is that a clean speech signal can be sparsely represented by a speech dictionary but non-stationary noise cannot; similarly, non-stationary noise can be sparsely represented by a noise dictionary but speech cannot. The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, are trained offline. Once a noisy speech signal is given, we first calculate the magnitude of its Short-Time Fourier Transform. Second, we separate it via NMF into two parts: one that can be sparsely represented by the speech dictionary, and one that can be sparsely represented by the noise dictionary. Third, the part represented by the speech dictionary is taken as the estimated clean speech.
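The separation step above can be sketched with standard multiplicative NMF updates. This is a minimal sketch, assuming the two dictionaries are already trained and kept fixed while only the activations are fitted to the noisy magnitude spectrogram; the function names and update count are illustrative, not Schmidt's implementation.

```python
import numpy as np

def nmf_activations(V, W, n_iter=500, eps=1e-9):
    """Fit non-negative activations H for a FIXED dictionary W, using the
    standard multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def nmf_denoise(V_noisy, W_speech, W_noise):
    """Split a noisy magnitude spectrogram into speech and noise parts
    by decomposing it over the concatenated dictionaries."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V_noisy, W)
    k = W_speech.shape[1]
    V_speech = W_speech @ H[:k]   # part explained by the speech dictionary
    V_noise = W_noise @ H[k:]     # part explained by the noise dictionary
    return V_speech, V_noise
```

`V_speech` would then be combined with the noisy phase and inverted to obtain the estimated clean speech; a sparsity penalty on H, which the dictionary argument relies on, is omitted here for brevity.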
We also consider the enhancement of speech by applying MMSE short-time spectral magnitude estimation in the modulation domain. For this purpose, the traditional analysis-modification-synthesis framework is extended to include modulation domain processing, and the noisy modulation spectrum is compensated for additive noise distortion by applying the MMSE short-time spectral magnitude estimation algorithm in the modulation domain. A number of subjective experiments were conducted. Initially, we determine the parameter values that maximise the subjective quality of stimuli enhanced using the MMSE modulation magnitude estimator. Next, we compare the quality of stimuli processed by the MMSE modulation magnitude estimator to those processed using the MMSE acoustic magnitude estimator and the modulation spectral subtraction method, and show that a good improvement in speech quality is achieved through use of the proposed approach. Then we evaluate the effect of including speech presence uncertainty and log-domain processing on the quality of enhanced speech, and find that the method works better with speech presence uncertainty. Finally, we compare the quality of speech enhanced using the MMSE modulation magnitude estimator (when used with speech presence uncertainty) with that enhanced using different acoustic domain MMSE magnitude estimator formulations, and with that enhanced using different modulation domain based enhancement algorithms. Results of these tests show that the MMSE modulation magnitude estimator improves the quality of processed stimuli without introducing musical noise or spectral smearing distortion. The proposed method shows better noise suppression than MMSE acoustic magnitude estimation, and improved speech quality compared to the other modulation domain based enhancement methods considered. Speech enhancement methods aim to improve the quality of noisy speech by reducing noise while minimising any speech distortion introduced by the enhancement process. Many enhancement methods are based on the short-time Fourier analysis-modification-synthesis framework.
||Issn 2250-3005|| ||October||2013|| Page 1
Some examples of these are the spectral subtraction method (Boll, 1979), the Wiener filter method (Wiener, 1949), and the MMSE short-time spectral amplitude estimation method (Ephraim and Malah, 1984). Spectral subtraction is perhaps one of the earliest and most extensively studied methods for speech enhancement. This simple method enhances speech by subtracting a spectral estimate of the noise from the noisy speech spectrum in either the magnitude or energy domain. Though this method is effective at reducing noise, it suffers from the problem of musical noise distortion, which is very annoying to listeners. To overcome this problem, Ephraim and Malah (1984) proposed the MMSE short-time spectral amplitude estimator, referred to throughout this work as the acoustic magnitude estimator (AME). In the literature (e.g., Cappe, 1994; Scalart and Filho, 1996), it has been suggested that the good performance of the AME can be largely attributed to the use of the decision-directed approach for estimating the a priori signal-to-noise ratio (a priori SNR). The AME method, even today, remains one of the most effective and popular methods for speech enhancement. Recently, the modulation domain has become popular for speech processing, in part due to the strong psychoacoustic and physiological evidence supporting the significance of the modulation domain for the analysis of speech signals. Zadeh (1950) was perhaps the first to propose a two-dimensional bi-frequency system, where the second dimension for frequency analysis was the transform of the time variation of the magnitudes at each standard (acoustic) frequency. Atlas et al. (2004) more recently defined the acoustic frequency as the axis of the first short-time Fourier transform (STFT) of the input signal and the modulation frequency as the independent variable of the second STFT.
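The spectral subtraction rule described above can be written in a few lines. This is a textbook-style sketch, not the implementation of any of the cited papers; the over-subtraction factor `alpha` and spectral floor `beta` are assumed parameters, included because flooring is the usual remedy for the musical noise the text mentions.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.02):
    """Power-domain spectral subtraction for one frame of magnitudes:
    subtract alpha times the noise power estimate, then clamp the result
    to a small fraction (beta) of the noisy power to limit musical noise."""
    power = noisy_mag**2 - alpha * noise_mag**2
    floor = beta * noisy_mag**2
    return np.sqrt(np.maximum(power, floor))
```

The enhanced magnitude would be recombined with the noisy phase and inverted via the overlap-add synthesis stage of the AMS framework.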
Early efforts to utilise the modulation domain for speech enhancement assumed speech and noise to be
stationary, and applied fixed filtering on the trajectories of the acoustic magnitude spectrum. For example,
Hermansky et al. (1995) proposed band-pass filtering the time trajectories of the cubic-root compressed shorttime power spectrum to enhance speech. Falk et al. (2007) and Lyons and Paliwal (2008) applied similar bandpass filtering to the time trajectories of the short-time magnitude (power) spectrum for speech
enhancement.However, speech and possibly noise are known to be nonstationary. To capture this
nonstationarity, one option is to assume speech to be quasi-stationary, and process the trajectories of the
acoustic magnitude spectrum on a short time basis. At this point it is useful to differentiate the acoustic spectrum
from the modulation spectrum as follows. The acoustic spectrum is the STFT of the speech signal, while the
modulation spectrum at a given acoustic frequency is the STFT of the time series of the acoustic spectral
magnitudes at that frequency. The short-time modulation spectrum is thus a function of time, acoustic frequency
and modulation frequency. This type of short-time processing in the modulation domain has been used in the
past for automatic speech recognition (ASR), Kingsbury et al. (1998) for example, applied a modulation
spectrogram representation that emphasized low-frequency amplitude modulations to ASR for improved
robustness in noisy and reverberant conditions. Tyagi et al. (2003) applied mel-cepstrum modulation features to
ASR to give improved performance in the presence of non-stationary noise. Short-time modulation domain
processing has also been applied to objective quality assessment. For example, Kim (2004, 2005) as well as Falk
and Chan (2008) used the short-time modulation magnitude spectrum to derive objective measures that
characterise the quality of processed speech. For speech enhancement, short-time modulation domain processing
was recently applied in the modulation spectral subtraction method (ModSSub) of Paliwal et al. (2010). Here,
the spectral subtraction method was extended to the modulation domain, enhancing speech by subtracting the
noise modulation energy spectrum from the noisy modulation energy spectrum in an analysis-modification
synthesis (AMS) framework. In the ModSSub method, the frame duration used for computing the short-time
modulation spectrum was found to be an important parameter, providing a trade-off between quality and level of
musical noise. Increasing the frame duration reduced musical noise, but introduced a slurring distortion. A
somewhat long frame duration of 256 ms was recommended as a good compromise.
The disadvantages of using a longer modulation-domain analysis window are as follows. Firstly, we are
assuming stationarity over the window, which we know is not the case. Secondly, a relatively long initial
segment is needed for the estimation of noise, and thirdly, as shown by Paliwal et al. (2011), speech quality and intelligibility are higher
when the modulation magnitude spectrum is processed using short frame durations and lower when processed
using longer frame durations. For these reasons, we aim to find a method better suited to the use of shorter
modulation analysis window durations. Since the MMSE method has been found to be more effective than
spectral subtraction in the acoustic domain, in this paper we explore the effectiveness of this method in the
short-time modulation domain. For this purpose, the traditional analysis-modification-synthesis framework is
extended to include modulation domain processing; the noisy modulation spectrum is then compensated for
additive noise distortion by applying the MMSE short-time spectral magnitude estimation algorithm. The
advantage of applying a MMSE-based method is that it does not introduce musical noise and hence can be used
with shorter frame durations in the modulation domain.
||Issn 2250-3005 ||
||October||2013||
Page 2
Development Of A Powerful Signal Processing...
II. IMAGE COMPRESSION
The objective of image compression is to reduce irrelevance and redundancy of the image data in
order to be able to store or transmit data in an efficient form.
Figure 1: A chart showing the relative quality of various JPG settings, comparing a file saved as a JPG normally with one saved using the "save for web" technique.
III. LOSSY AND LOSSLESS COMPRESSION
Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes
and often for medical imaging, technical drawings, clip art, or comics. This is because lossy compression
methods, especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially
suitable for natural images such as photographs in applications where minor (sometimes imperceptible) loss of
fidelity is acceptable to achieve a substantial reduction in bit rate. The lossy compression that produces
imperceptible differences may be called visually lossless.
Methods for lossless image compression are:
Run-length encoding – used as the default method in PCX and as one of the possible methods in BMP, TGA, TIFF
DPCM and Predictive Coding
Entropy encoding
Adaptive dictionary algorithms such as LZW – used in GIF and TIFF
Deflation – used in PNG, MNG, and TIFF
Chain codes
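Run-length encoding, the simplest of the methods listed above, can be sketched in a few lines. This is a toy Python illustration; the function names and the sample scanline are ours.

```python
def rle_encode(data):
    """Run-length encode a sequence into (value, count) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Invert rle_encode back into the flat sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A scanline with long constant runs, which is the case RLE exploits.
row = [255, 255, 255, 255, 0, 0, 17]
encoded = rle_encode(row)   # -> [(255, 4), (0, 2), (17, 1)]
assert rle_decode(encoded) == row
```

The scheme is lossless because decoding exactly inverts encoding; it only saves space when runs are long, which is why formats such as PCX apply it to palettized images with large flat regions.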
Methods for lossy compression:
Reducing the color space to the most common colors in the image. The selected colors are specified in the
color palette in the header of the compressed image. Each pixel then references the index of a color in the
palette. This method can be combined with dithering to avoid posterization.
Chroma subsampling. This takes advantage of the fact that the human eye perceives spatial changes of
brightness more sharply than those of color, by averaging or dropping some of the chrominance information
in the image.
Transform coding. This is the most commonly used method. In particular, a Fourier-related transform such
as the Discrete Cosine Transform (DCT), introduced by N. Ahmed, T. Natarajan and K. R. Rao ("Discrete
Cosine Transform," IEEE Trans. Computers, pp. 90-93, Jan. 1974), is widely used. The DCT is sometimes referred to as "DCT-II" in the context of the family of discrete cosine transforms. The more
recently developed wavelet transform is also used extensively, followed by quantization and entropy
coding.
Fractal compression.
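The transform-coding idea, a DCT followed by quantization, can be sketched on a single 8x8 block. This is a hedged Python illustration using scipy's DCT; the quantization step size and the synthetic test block are illustrative assumptions, not values from any codec standard.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, q_step=10.0):
    """Transform coding of one 8x8 block: 2-D DCT-II, then uniform
    quantization (divide by a step size and round to the nearest integer)."""
    coeffs = dctn(block.astype(float), norm="ortho")
    return np.round(coeffs / q_step).astype(int)

def decompress_block(q_coeffs, q_step=10.0):
    """Dequantize and invert the DCT."""
    return idctn(q_coeffs * q_step, norm="ortho")

rng = np.random.default_rng(0)
# A smooth synthetic block (a gradient plus mild noise), loosely mimicking
# natural image data, for which the DCT compacts energy into few coefficients.
block = np.add.outer(np.arange(8) * 8.0, np.arange(8) * 4.0) + rng.normal(0, 2, (8, 8))
q = compress_block(block)
rec = decompress_block(q)
sparsity = np.mean(q == 0)          # fraction of coefficients quantized to zero
err = np.max(np.abs(rec - block))   # worst-case reconstruction error
```

Most high-frequency coefficients round to zero, which is where the bit-rate saving comes from; the rounding is also exactly the step that makes the method lossy.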
IV. THE WAVELET TRANSFORM BASED CODING
The main difference between the wavelet transform (WT) and the discrete cosine transform (DCT)
coding system is the omission of the transform coder's sub-image processing stages in WT. Because WTs are both
computationally efficient and inherently local (i.e. their basis functions are limited in duration), subdivision of
the original image is not required. The removal of the subdivision step eliminates the blocking artifact. Wavelet
coding techniques are based on the idea that the coefficients of a transform which de-correlates the pixels of an
image can be coded more efficiently than the original pixels themselves [9]. The computed transform converts a
large portion of the original image to horizontal, vertical and diagonal decomposition coefficients with zero
mean and Laplacian-like distribution. The 9/7-tap biorthogonal filters [10], which produce floating-point wavelet
coefficients, are widely used in MIC techniques to generate a wavelet transform [11,12,13]. The wavelet
coefficients are uniformly quantized by dividing by a user specified parameter and rounding off to the nearest
integer. Typically, a large majority of coefficients with small values are quantized to zero by this step. The
zeroes in the resulting sequence are run-length encoded, and Huffman and arithmetic coding are performed on
the resulting sequence. The various subband blocks of coefficients are coded separately, which improves the
overall compression [9]. If the quantization parameter is increased, more coefficients are quantized to zero, the
remaining ones are quantized more coarsely, the representation accuracy decreases, and the CR increases
consequently. Since the input image needs to be divided into blocs in DCT based compression, correlation
across the block boundaries is not eliminated. This results in ‗blocking artifacts‘ particularly at low bpp.
Whereas in WT coding, there is no need to block the input image and its basis functions have variable length
hence wavelet schemes at higher CRs avoid blocking artifacts. The basic structure of WT based compression
process is shown in Figure 2 below. The other details of wavelet transform may be referred in [5, 14].
Figure 2: Basic Structure of WT based Compression
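The quantization step described above can be sketched with a one-level Haar transform standing in for the 9/7 biorthogonal filters of the actual codecs. This is a toy numpy illustration; the function names and the synthetic test image are our own.

```python
import numpy as np

def haar2d_level(img):
    """One level of the 2-D Haar transform: averages/differences along rows,
    then along columns, giving approximation (LL) and detail (LH, HL, HH) bands."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0   # row averages
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0   # row differences
    ll = (a[0::2] + a[1::2]) / 2.0
    lh = (a[0::2] - a[1::2]) / 2.0
    hl = (d[0::2] + d[1::2]) / 2.0
    hh = (d[0::2] - d[1::2]) / 2.0
    return ll, lh, hl, hh

def zero_fraction(img, q_step):
    """Quantize the detail coefficients by dividing by q_step and rounding;
    return the fraction that is quantized to zero."""
    _, lh, hl, hh = haar2d_level(img)
    details = np.concatenate([b.ravel() for b in (lh, hl, hh)])
    return np.mean(np.round(details / q_step) == 0)

rng = np.random.default_rng(1)
img = np.cumsum(rng.normal(0, 1, (64, 64)), axis=0)  # a smooth-ish test image
zf_fine = zero_fraction(img, 1.0)    # small step: few zeros, high accuracy
zf_coarse = zero_fraction(img, 8.0)  # large step: many zeros, higher CR
```

Increasing the quantization parameter can only grow the set of coefficients that round to zero, which is exactly the CR-versus-accuracy trade-off described in the text.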
4.1. The Discrete Wavelet Transform
Wavelet transforms are based on 'basis functions'. Unlike the Fourier transform, whose basis
functions are sinusoids, wavelet transforms are based on small waves, called 'wavelets', of varying frequency
and limited duration. Wavelets are the foundation of a powerful signal processing approach called
Multi-Resolution Analysis (MRA). As its name implies, multi-resolution theory is concerned with the
representation and analysis of signals (or images) at more than one resolution; hence features that might go
undetected at one resolution may be easy to spot at another. Wavelet analysis is based on two important
functions, viz. the scaling function and the wavelet function. Calculating wavelet coefficients at every possible
scale is a fair amount of work, and it generates a lot of data. If we choose only a subset of scales and positions at
which to make the calculations, it turns out, rather remarkably, that choosing scales and positions based on
powers of two (so-called dyadic scales and positions) makes the analysis much more efficient and just
as accurate. If the function being expanded is a sequence of numbers, like samples of a continuous function f(x),
the resulting coefficients are called the discrete wavelet transform (DWT) of f(x). The decomposition process
into high- and low-frequency components by using the DWT is depicted in Figure 3 in the block diagram [14].
Figure 3: Block diagram of the DWT decomposition process
Figure 4: 2D discrete wavelet transform used in JPEG2000
In the example of the 2D discrete wavelet transform used in JPEG2000, the original image is
high-pass filtered, yielding the three large images, each describing local changes in brightness (details) in the
original image. It is then low-pass filtered and downscaled, yielding an approximation image; this image is high-pass filtered to produce the three smaller detail images, and low-pass filtered to produce the final approximation
image in the upper-left.
In numerical analysis and functional analysis, a discrete wavelet transform (DWT) is any wavelet
transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage it
has over Fourier transforms is temporal resolution: it captures both frequency and location information (location
in time).
Examples
Haar wavelets: The first DWT was invented by the Hungarian mathematician Alfréd Haar. For an
input represented by a list of 2^n numbers, the Haar wavelet transform may be considered to simply pair up input
values, storing the difference and passing the sum. This process is repeated recursively, pairing up the sums to
provide the next scale, finally resulting in 2^(n-1) differences and one final sum.
Daubechies wavelets: The most commonly used set of discrete wavelet transforms was formulated by the Belgian mathematician Ingrid
Daubechies in 1988. This formulation is based on the use of recurrence relations to generate progressively finer
discrete samplings of an implicit mother wavelet function; each resolution is twice that of the previous scale. In
her seminal paper, Daubechies derives a family of wavelets, the first of which is the Haar wavelet. Interest in
this field has exploded since then, and many variations of Daubechies' original wavelets have been developed. [1]
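The pairing of sums and differences described for the Haar transform can be sketched directly. This toy Python illustration uses averaged sums and halved differences, one common normalization; other variants differ only by constant factors.

```python
def haar_step(values):
    """One Haar scale: pair up inputs, pass the averaged sums, store the
    (halved) differences."""
    sums = [(a + b) / 2.0 for a, b in zip(values[0::2], values[1::2])]
    diffs = [(a - b) / 2.0 for a, b in zip(values[0::2], values[1::2])]
    return sums, diffs

def haar_dwt(values):
    """Haar DWT of a list of 2**n numbers: recurse on the sums until one
    remains, collecting 2**n - 1 differences along the way."""
    coeffs = []
    while len(values) > 1:
        values, diffs = haar_step(values)
        coeffs = diffs + coeffs   # newly computed (coarser) scale goes in front
    return values + coeffs        # [overall average, then the differences]

out = haar_dwt([4, 6, 10, 12, 8, 6, 5, 5])
# out[0] is the overall average; the remaining 7 entries are differences.
```

Each pass halves the number of values it recurses on, so for an input of length 2^n the transform touches 2^n + 2^(n-1) + ... + 2 values in total, which is the source of its O(n) cost noted below.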
The Dual-Tree Complex Wavelet Transform (ℂWT): The Dual-Tree Complex Wavelet Transform
(ℂWT) is a relatively recent enhancement to the discrete wavelet transform (DWT), with important additional
properties: it is nearly shift invariant and directionally selective in two and higher dimensions. It achieves this
with a redundancy factor of only 2^d for d-dimensional signals, which is substantially lower than the undecimated
DWT. The multidimensional (M-D) dual-tree ℂWT is nonseparable but is based on a computationally efficient,
separable filter bank (FB). [2]
Others: Other forms of discrete wavelet transform include the non-decimated or undecimated
wavelet transform (where downsampling is omitted) and the Newland transform (where an orthonormal basis of
wavelets is formed from appropriately constructed top-hat filters in frequency space). Wavelet packet
transforms are also related to the discrete wavelet transform. The complex wavelet transform is another form.
Properties: The Haar DWT illustrates the desirable properties of wavelets in general. First, it can be performed in
O(n) operations; second, it captures not only a notion of the frequency content of the input, by examining it at
different scales, but also temporal content, i.e. the times at which these frequencies occur. Combined, these two
properties make the fast wavelet transform (FWT) an alternative to the conventional fast Fourier transform
(FFT).
Time issues: Due to the rate-change operators in the filter bank, the discrete WT is not time-invariant but
actually very sensitive to the alignment of the signal in time. To address the time-varying problem of wavelet
transforms, Mallat and Zhong proposed a new algorithm for wavelet representation of a signal, which is
invariant to time shifts.[3] According to this algorithm, which is called a TI-DWT, only the scale parameter is
sampled along the dyadic sequence 2^j (j∈Z) and the wavelet transform is calculated for each point in
time.[4][5]
Applications
The discrete wavelet transform has a huge number of applications in science, engineering, mathematics
and computer science. Most notably, it is used for signal coding, to represent a discrete signal in a more
redundant form, often as a preconditioning for data compression. Practical applications can also be found in
signal processing of accelerations for gait analysis [6], in digital communications, and many others [7][8][9]. It has been
shown that the discrete wavelet transform (discrete in scale and shift, and continuous in time) can be successfully
implemented as an analog filter bank in biomedical signal processing for the design of low-power pacemakers and also
in ultra-wideband (UWB) wireless communications [10].
V. DATA INTERPRETATION AND ANALYSIS
Comparison with Fourier transform
To illustrate the differences and similarities between the discrete wavelet transform and the discrete
Fourier transform, consider the DWT and DFT of the following sequence: (1,0,0,0), a unit impulse.
The DFT has orthogonal basis (DFT matrix):
1 1 1 1
1 0 –1 0
0 1 0 –1
1 –1 1 –1
while the DWT with Haar wavelets for length 4 data has orthogonal basis in the rows of:
1 1 1 1
1 1 –1 –1
1 –1 0 0
0 0 1 –1
(To simplify notation, whole numbers are used, so the bases are orthogonal but not orthonormal.)
Preliminary observations include:
Wavelets have location: the (1,1,-1,-1) wavelet corresponds to "left side" versus "right side", while the
last two wavelets have support on the left side or the right side, and one is a translation of the other.
Sinusoidal waves do not have location (they spread across the whole space) but do have phase: the
second and third waves are translations of each other, corresponding to being 90° out of phase, like cosine
and sine, of which these are discrete versions.
Decomposing the sequence with respect to these bases yields:
DWT: (1,0,0,0) = 1/4·(1,1,1,1) + 1/4·(1,1,-1,-1) + 1/2·(1,-1,0,0)
DFT: (1,0,0,0) = 1/4·(1,1,1,1) + 1/2·(1,0,-1,0) + 0·(0,1,0,-1) + 1/4·(1,-1,1,-1)
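The projection onto these bases can be checked numerically. This is a small numpy sketch; since the rows are orthogonal but not orthonormal, each projection is divided by the row's squared norm.

```python
import numpy as np

# Rows of the Haar basis for length-4 data, and the real DFT-style basis,
# both orthogonal but not orthonormal (whole numbers, as in the text).
haar = np.array([[1, 1, 1, 1],
                 [1, 1, -1, -1],
                 [1, -1, 0, 0],
                 [0, 0, 1, -1]], dtype=float)
dft = np.array([[1, 1, 1, 1],
                [1, 0, -1, 0],
                [0, 1, 0, -1],
                [1, -1, 1, -1]], dtype=float)

def decompose(x, basis):
    """Coefficients of x in an orthogonal (not orthonormal) basis: project
    onto each row and divide by that row's squared norm."""
    return basis @ x / np.sum(basis**2, axis=1)

x = np.array([1.0, 0.0, 0.0, 0.0])   # the unit impulse
c_haar = decompose(x, haar)          # -> [0.25, 0.25, 0.5, 0.0]
c_dft = decompose(x, dft)            # -> [0.25, 0.5, 0.0, 0.25]
# Summing coefficient times basis row reconstructs the impulse exactly.
assert np.allclose(c_haar @ haar, x) and np.allclose(c_dft @ dft, x)
```

Truncating either coefficient list and re-summing gives exactly the partial approximations discussed below.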
The DWT demonstrates the localization: the (1,1,1,1) term gives the average signal value, the (1,1,-1,-1)
term places the signal in the left side of the domain, and the (1,-1,0,0) term places it at the left side of the left side;
truncating at any stage yields a downsampled version of the signal:
Figure 5: The sinc function, showing the time-domain artifacts (undershoot and ringing) of truncating a Fourier series.
The DFT, by contrast, expresses the sequence by the interference of waves of various frequencies;
thus truncating the series yields a low-pass filtered version of the series.
Notably, the middle (2-term) approximation differs. From the frequency-domain perspective this is a
better approximation, but from the time-domain perspective it has drawbacks: it exhibits undershoot (one of
the values is negative, though the original series is non-negative everywhere) and ringing, where the right side
is non-zero, unlike in the wavelet transform. On the other hand, the Fourier approximation correctly shows a
peak, and all points are within 1/4 of their correct value, though all points have error.
The wavelet approximation, by contrast, places a peak on the left half but has no peak at the first point,
and while it is exactly correct for half the values (reflecting location), it has an error of 1/2 for the other
values. This illustrates the kinds of trade-offs between these transforms, and how in some respects the DWT
provides preferable behavior, particularly for the modeling of transients.
VI. RESULTS & DISCUSSION
4.1. One level of the transform
The DWT of a signal x is calculated by passing it through a series of filters. First the samples are
passed through a low-pass filter with impulse response g, resulting in a convolution of the two:

y[n] = (x * g)[n] = Σ_k x[k] g[n - k]

The signal is also decomposed simultaneously using a high-pass filter h. The outputs give the detail
coefficients (from the high-pass filter) and the approximation coefficients (from the low-pass filter). It is
important that the two filters are related to each other; together they are known as a quadrature mirror filter pair.
However, since half the frequencies of the signal have now been removed, half the samples can be
discarded according to Nyquist's rule. The filter outputs are then subsampled by 2 (in Mallat's notation, and
commonly, the labels are the opposite: g denotes the high-pass and h the low-pass filter):

y_low[n] = Σ_k x[k] g[2n - k]
y_high[n] = Σ_k x[k] h[2n - k]

This decomposition has halved the time resolution, since only half of each filter output characterises the
signal. However, each output has half the frequency band of the input, so the frequency resolution has been
doubled.
Figure 6: Block diagram of filter analysis
With the subsampling operator (y ↓ k)[n] = y[kn], the above summations can be written more concisely:

y_low = (x * g) ↓ 2
y_high = (x * h) ↓ 2

However, computing a complete convolution with subsequent downsampling would waste
computation time. The lifting scheme is an optimization in which these two computations are interleaved.
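The convolve-then-downsample step can be sketched with the Haar pair. This is a minimal numpy illustration; the normalized Haar filters are a standard textbook example, and the variable names are ours.

```python
import numpy as np

def analysis_step(x, h, g):
    """One level of DWT analysis: convolve with the low-pass (h) and
    high-pass (g) filters, then keep every second output sample."""
    approx = np.convolve(x, h)[1::2]   # (x * h) downsampled by 2
    detail = np.convolve(x, g)[1::2]   # (x * g) downsampled by 2
    return approx, detail

# Normalized Haar quadrature mirror pair.
h = np.array([1.0, 1.0]) / np.sqrt(2)    # low pass
g = np.array([1.0, -1.0]) / np.sqrt(2)   # high pass

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = analysis_step(x, h, g)
# Each branch keeps half the samples; for this orthonormal pair the total
# energy of the two outputs equals the energy of the input.
```

Discarding half the convolution outputs is exactly the inefficiency that the lifting scheme removes, by computing only the samples that survive the downsampling.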
Cascading and filter banks
This decomposition is repeated to further increase the frequency resolution, with the approximation
coefficients decomposed with high- and low-pass filters and then down-sampled. This is represented as a binary
tree with nodes representing a sub-space with a different time-frequency localisation. The tree is known as a
filter bank.
Figure 7: A 3 level filter bank
In the case of a 3-level filter bank, at each level in the above diagram the signal is decomposed into low
and high frequencies. Due to the decomposition process the input signal must be a multiple of 2^n, where n is the
number of levels.
For example, for a signal with 32 samples, frequency range 0 to f_n and 3 levels of decomposition, 4 output
scales are produced:

Level  Frequencies      Samples
3      0 to f_n/8       4
3      f_n/8 to f_n/4   4
2      f_n/4 to f_n/2   8
1      f_n/2 to f_n     16
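The band sizes in this example follow from halving the approximation band at each level, which can be sketched as follows (a small Python illustration; the function name is ours):

```python
def cascade_outputs(n_samples, levels):
    """Output band sizes of an n-level dyadic filter-bank cascade: only the
    approximation band is split again at each level."""
    bands = []
    size = n_samples
    for _ in range(levels):
        size //= 2
        bands.append(size)   # detail band produced at this level
    bands.append(size)       # final approximation band
    return bands

# 32 samples, 3 levels: detail bands of 16, 8 and 4 samples plus a 4-sample
# approximation band, i.e. the 4 output scales of the example above.
sizes = cascade_outputs(32, 3)
```

The band sizes always sum back to the input length, since each split conserves the total number of samples.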
Figure 8: Frequency domain representation of the DWT
Other transforms
The Adam7 algorithm, used for interlacing in the Portable Network Graphics (PNG) format, is a
multiscale model of the data which is similar to a DWT with Haar wavelets. Unlike the DWT, it has a specific
scale: it starts from an 8×8 block, and it downsamples the image rather than decimating it (low-pass filtering,
then downsampling). It thus offers worse frequency behavior, showing artifacts (pixelation) at the early stages,
in return for a simpler implementation.
REFERENCES
[1] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[2] G. Strang and T. Q. Nguyen, Wavelets and Filter Banks. Boston, MA: Wellesley-Cambridge, 1996.
[3] M. N. Do and M. Vetterli, "The contourlet transform: An efficient directional multiresolution image representation," IEEE Trans. Image Process., vol. 14, no. 12, pp. 2091-2106, Dec. 2005 [Online]. Available: http://www.ifp.uiuc.edu/~minhdo/software/
[4] P. Burt and E. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532-540, Apr. 1983.
[5] E. Candès and D. L. Donoho, "Curvelets - A surprisingly effective nonadaptive representation for objects with edges," in Curves and Surfaces Fitting, A. Cohen, C. Rabut, and L. L. Schumaker, Eds. Nashville, TN: Vanderbilt Univ. Press, 1999.
[6] J. L. Starck, E. J. Candès, and D. L. Donoho, "The curvelet transform for image denoising," IEEE Trans. Image Process., vol. 11, no. 6, pp. 670-684, Jun. 2002.
[7] E. Candès and D. L. Donoho, "New tight frames of curvelets and optimal representations of objects with piecewise C² singularities," Commun. Pure Appl. Math., pp. 219-266, 2004.
[8] Y. Lu and M. N. Do, "CRISP-contourlet: A critically sampled directional multiresolution image representation," in Proc. SPIE Conf. Wavelet Applications in Signal and Image Processing, 2003, pp. 655-665.
[9] Y. Lu and M. N. Do, "A new contourlet transform with sharp frequency localization," in Proc. Int. Conf. Image Processing, 2006, pp. 1629-1632.
[10]
[11] T. T. Nguyen and S. Oraintara, "A class of multiresolution directional filter bank," IEEE Trans. Signal Process., vol. 55, no. 3, pp. 949-961, Mar. 2007.
[12] R. H. Bamberger and M. J. T. Smith, "A filter bank for the directional decomposition of images: Theory and design," IEEE Trans. Signal Process., vol. 40, no. 4, pp. 882-893, Apr. 1992.
[13] R. H. Bamberger, "New results on two and three dimensional directional filter banks," in Proc. 27th Asilomar Conf. Signals, Systems and Computers, 1993, pp. 1286-1290.
[14] S.-I. Park, M. J. T. Smith, and R. M. Mersereau, "Improved structures of maximally decimated directional filter banks for spatial image analysis," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1424-1431, Nov. 2004.
[15] R. Eslami and H. Radha, "A new family of nonredundant transforms using hybrid wavelets and directional filter banks," IEEE Trans. Image Process., vol. 16, no. 4, pp. 1152-1167, Apr. 2007.
[16] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243-250, Mar. 1996.