The aim of audio morphing algorithms is to combine two or more sounds to create a new sound with intermediate timbre and duration. During the last two decades several efforts have been made to improve morphing algorithms in order to obtain more realistic and perceptually relevant sounds. In this paper we present an automatic audio morphing technique applied to percussive musical instruments. Based on preprocessing of the sound references in frequency domain and linear interpolation in time domain, the presented approach allows one to generate high quality hybrid sounds at a low computational cost. Several results are reported in order to show the effectiveness of the proposed approach in terms of audio quality and acoustic perception of the generated hybrid sounds, taking into consideration different percussive samples. Mean opinion score and multidimensional scaling were used to compare the presented approach with existing state of the art techniques.
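The core idea (frequency-domain preprocessing of the references plus linear interpolation) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's actual algorithm: it linearly interpolates the magnitude spectra of two equal-length samples and reuses the first sound's phase for resynthesis.

```python
import numpy as np

def morph_spectra(sound_a, sound_b, alpha=0.5):
    """Naive spectral morph: interpolate the magnitude spectra of two
    equal-length sounds and resynthesize with the phase of sound_a."""
    n = min(len(sound_a), len(sound_b))
    spec_a = np.fft.rfft(sound_a[:n])
    spec_b = np.fft.rfft(sound_b[:n])
    # Interpolate magnitudes linearly; keep sound_a's phase.
    mag = (1.0 - alpha) * np.abs(spec_a) + alpha * np.abs(spec_b)
    phase = np.angle(spec_a)
    return np.fft.irfft(mag * np.exp(1j * phase), n)

# Two toy "percussive" bursts: decaying sinusoids at different pitches.
t = np.arange(2048) / 44100.0
a = np.exp(-40 * t) * np.sin(2 * np.pi * 200 * t)
b = np.exp(-40 * t) * np.sin(2 * np.pi * 400 * t)
hybrid = morph_spectra(a, b, alpha=0.5)
```

With `alpha=0` the function reproduces the first sound exactly; intermediate values blend the two spectral envelopes.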
Hybrid Reverberator Using Multiple Impulse Responses for Audio Rendering Impr... (a3labdsp)
The document proposes an algorithm for audio rendering using multiple impulse responses that allows for reproduction of a moving listener position. It analyzes impulse response tails to generate a prototype tail and uses a hybrid reverberation structure including FIR and IIR filters to synthesize the reverberation effect in real-time. Experimental results on a church impulse response database show the approach can accurately reproduce reverberation time and clarity measurements compared to real impulse responses. Informal listening tests found no perceptible differences between the proposed approach and an existing technique.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Digital signal processing (DSP) involves converting analog signals to digital signals and manipulating the digital signals using software algorithms. DSP systems use analog-to-digital conversion to convert analog signals to digital signals represented as sequences of numbers. They then process the digital signals using a digital signal processor and convert them back to analog signals using digital-to-analog conversion. Key techniques in DSP include decomposing signals into simple components, processing the components individually, and then combining the results.
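The decompose/process/combine pattern mentioned above can be illustrated with a short NumPy sketch. The split into "components" here is a simple two-band spectral split, chosen purely for illustration:

```python
import numpy as np

def process_by_components(signal, gain_low=1.0, gain_high=0.5, cutoff_bin=32):
    """Decompose a signal into spectral components, process each group
    independently, then recombine: the decompose/process/combine pattern."""
    spectrum = np.fft.rfft(signal)
    low = spectrum.copy()
    low[cutoff_bin:] = 0    # keep only low-frequency bins
    high = spectrum.copy()
    high[:cutoff_bin] = 0   # keep only high-frequency bins
    processed = gain_low * low + gain_high * high  # process separately, combine
    return np.fft.irfft(processed, len(signal))

x = np.sin(2 * np.pi * 5 * np.arange(256) / 256)   # bin 5: a "low" component
y = process_by_components(x, gain_low=2.0, gain_high=0.0)
```

Because the test tone lives entirely below the cutoff, doubling the low-band gain simply doubles the signal.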
Digital signal processing through speech, hearing, and Python (Mel Chua)
Slides from PyCon 2013 tutorial reformatted for self-study. Code at https://github.com/mchua/pycon-sigproc, original description follows: Why do pianos sound different from guitars? How can we visualize how deafness affects a child's speech? These are signal processing questions, traditionally tackled only by upper-level engineering students with MATLAB and differential equations; we're going to do it with algebra and basic Python skills. Based on a signal processing class for audiology graduate students, taught by a deaf musician.
The document summarizes linear predictive coding (LPC), a speech compression technique. LPC works by modeling the human vocal tract and representing each speech segment as a linear combination of past speech samples. It analyzes speech signals by determining if segments are voiced or unvoiced, estimating the pitch period, and computing filter coefficients. The coefficients and other parameters are transmitted to allow reconstruction of the speech. LPC can achieve a bit rate of 2400 bps, making it suitable for secure communications. Simulation results show LPC can compress male and female speech but introduces noise, performing better on male voices which have less high frequencies.
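The analysis stage described above, fitting filter coefficients that predict each sample from past samples, can be sketched with the textbook autocorrelation method and Levinson-Durbin recursion. This is a minimal illustration of the coefficient computation, not the full 2400 bps codec:

```python
import numpy as np

def lpc_autocorr(frame, order=8):
    """Estimate LPC coefficients for one frame via the autocorrelation
    method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]   # reflection numerator
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i + 1):
            a[j] = a_prev[j] + k * a_prev[i - j]
        err *= (1.0 - k * k)                  # prediction error update
    return a, err

# Sanity check on a first-order autoregressive signal: the order-1
# predictor coefficient should come out near -0.9.
rng = np.random.default_rng(0)
e = rng.standard_normal(4096)
x = np.zeros(4096)
for i in range(1, 4096):
    x[i] = 0.9 * x[i - 1] + e[i]
a, err = lpc_autocorr(x, order=1)
```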
An Advanced Implementation of a Digital Artificial Reverberator (a3labdsp)
This paper proposes an enhanced hybrid reverberator that uses both measured impulse responses and synthesized impulse responses to model reverberation effects. It develops an automatic procedure to set the parameters of the hybrid reverberator by analyzing the mixing time of measured impulse responses and minimizing a loss function in the cepstral domain. Experimental results show the proposed method produces higher quality reverberation effects than a previous method, with only a small increase in computational cost, as confirmed by listening tests.
Hybrid Reverberation Algorithm: a Practical Approach (a3labdsp)
Reverberation is a well-known effect that plays an important role in our listening experience. Reverberation positively changes the perception of sound, adding fullness and a sense of space. Generally, two approaches are employed for artificial reverberation: the desired signal can be obtained by convolving the input signal with a measured impulse response (IR), or by synthetic techniques based on recursive filter structures. Taking into account the advantages of both approaches, a hybrid artificial reverberation algorithm is presented, aiming to reproduce the acoustic behaviour of a real environment at a low computational load. More in detail, the early reflections are derived from a real impulse response, truncated at the calculated mixing time, and the reverberation tail is obtained using an IIR filter network. The parameters defining this structure are automatically derived from the analyzed impulse response, using a minimization criterion based on Simultaneous Perturbation Stochastic Approximation (SPSA). The effectiveness of the proposed approach has been proved on a real Italian theatre impulse response, providing comparison with existing state-of-the-art techniques in terms of objective and subjective measures.
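As an illustration of the FIR-plus-IIR idea (not the paper's SPSA-tuned network), a minimal hybrid structure convolves the input with the truncated early part of a measured IR, then adds a single recursive comb filter as a stand-in for the reverberation tail:

```python
import numpy as np

def hybrid_reverb(x, ir_early, tail_delay=1000, tail_gain=0.7, n_out=None):
    """Hybrid reverb sketch: FIR convolution with the early part of a
    measured IR, followed by a recursive comb-filter tail."""
    if n_out is None:
        n_out = len(x) + len(ir_early) + 8 * tail_delay
    early = np.convolve(x, ir_early)          # early reflections (FIR part)
    y = np.zeros(n_out)
    y[:len(early)] = early[:n_out]
    # Recursive comb tail: y[n] += tail_gain * y[n - tail_delay]
    for n in range(tail_delay, n_out):
        y[n] += tail_gain * y[n - tail_delay]
    return y

# Impulse through the structure: the early IR copy plus a decaying comb tail.
imp = hybrid_reverb(np.array([1.0]), np.array([1.0]),
                    tail_delay=4, tail_gain=0.5, n_out=13)
```

A real implementation would use a network of such filters with parameters fitted to the measured tail; this single comb only shows the recursive principle.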
Audio Noise Removal – The State of the Art (ijceronline)
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Spatial Fourier transform-based localized sound zone generation with loudspea... (Takuma_OKAMOTO)
The document describes spatial Fourier transform-based methods for generating localized sound zones with loudspeaker arrays. It discusses previous methods and their problems, and proposes using spatial Fourier transforms to analytically derive spatial filters in the angular spectrum domain. This allows generating bright and dark zones to create localized listening and quiet areas with both linear and circular loudspeaker arrays. Results show the proposed method can generate localized zones more accurately than previous techniques.
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas... (IRJET Journal)
This document presents a speech enhancement method based on spectral subtraction involving the magnitude and phase components. The proposed method aims to improve noisy speech quality by estimating and subtracting noise from the noisy speech signal in the frequency domain. It involves segmenting the noisy speech into overlapping frames, estimating the noise spectrum during non-speech periods, subtracting the noise magnitude from the noisy speech magnitude, and reconstructing the enhanced speech using the inverse FFT and overlap-add processing. The method was tested on different noise types using MATLAB simulations. Results showed the proposed method achieved better noise reduction compared to conventional spectral subtraction while introducing less speech distortion.
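The frame/subtract/overlap-add pipeline described above can be sketched in a few lines of NumPy. This minimal version assumes the noise magnitude spectrum has already been estimated from non-speech frames:

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame=256, hop=128, floor=0.01):
    """Magnitude spectral subtraction with Hann windowing and overlap-add.
    noise_mag is a magnitude spectrum estimated during non-speech periods."""
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.abs(spec) - noise_mag                # subtract noise magnitude
        mag = np.maximum(mag, floor * np.abs(spec))   # spectral floor
        # Reuse the noisy phase, inverse FFT, and overlap-add.
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
        out[start:start + frame] += clean
    return out

# Sanity check: with a zero noise estimate the frames reconstruct the input.
rng = np.random.default_rng(1)
sig = rng.standard_normal(2048)
res = spectral_subtract(sig, np.zeros(129))
```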
Paper presented at the Workshop “Music, Mind, Invention”, 30-31 March 2012, Ewing, NJ.
When a 2D Fourier Transform is applied to piano roll plots which are often used in sequencer software, the resulting 2D graphic is a novel music visualization which reveals internal musical structure. This visualization converts the set of musical notes from the notation display in the piano roll plot to a display which shows structure over time and spectrum within a set musical time period. The transformation is reversible, which means that it also can be used as a novel interface for editing music. The concept of this visualization is demonstrated by software which was written for using MIDI files and creating the visualization with the Fast Fourier Transform (FFT) algorithm. This software demonstrates the live real-time display of this visualization in replay of MIDI files or by music input through a connected MIDI keyboard. The resulting display is independent of pitch transformation or tempo. This visualization approach can be used for musicology studies, for music fingerprinting, comparing composition styles, and for a new creative composition method.
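The reversibility claim is easy to verify on a toy piano roll (a small binary matrix of pitch versus time step, standing in for the MIDI data the software uses):

```python
import numpy as np

# Toy piano roll: 32 pitches x 16 time steps, a repeating two-note pattern.
roll = np.zeros((32, 16))
roll[10, 0::4] = 1.0   # note at pitch 10 every 4 steps
roll[14, 2::4] = 1.0   # note at pitch 14, offset by 2 steps

spec2d = np.fft.fft2(roll)             # structure over pitch and time
recovered = np.fft.ifft2(spec2d).real  # the transform is reversible
```

Editing `spec2d` and inverting would correspond to the novel editing interface described in the paper.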
Introductory Lecture to Audio Signal Processing (Angelo Salatino)
The document provides an introduction to audio signal processing and related topics. It discusses analog and digital audio signals, the waveform audio file format (WAV) specification including its header structure, and tools for audio processing like FFmpeg and MATLAB. Example code is given to read header metadata and audio samples from a WAV file in C++. While useful for understanding audio formats and processing, the solution contains an error and FFmpeg is noted as a better library for audio tasks.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ... (Takuma_OKAMOTO)
This document proposes a real-time neural text-to-speech system for pitch accent languages using a sequence-to-sequence acoustic model with full-context label input and either a WaveGlow or single Gaussian WaveRNN vocoder. The system realizes high-fidelity synthesis comparable to human speech with a real-time factor of 0.16 using WaveGlow on a GPU. Subjective evaluations show the proposed single Gaussian WaveRNN outperforms other vocoder options. Future work will explore real-time inference on CPUs and compare the sequence-to-sequence acoustic model to conventional pipeline models.
LPC is a speech compression technique that models speech production as a linear function of past speech samples. It has an analysis stage that determines filter coefficients to reproduce a block of speech from the input signal. In the decoding stage, the filter is rebuilt using the coefficients. The document analyzes cepstral coefficients of male and female voices for different Bodo vowels, finding distinctions between genders for certain frames. It concludes LPCC measures may be better for identifying a speaker's sex.
This document summarizes digital modeling techniques for speech signals. It describes the vocal source and vocal tract that produce speech. It then discusses using sampling and techniques like PCM to digitally represent speech signals. Linear predictive coding is presented as a simple method to analyze speech that approximates samples as combinations of past signals. The summary concludes that linear prediction can be used for spectrum estimation by representing the vocal tract transfer function, pitch detection, and speech synthesis.
This paper proposes a voice morphing system for people who have undergone a laryngectomy, the surgical removal of all or part of the larynx (voice box), typically performed in cases of laryngeal cancer. A basic method of achieving voice morphing is to extract the source speaker's vocal coefficients and convert them into the target speaker's vocal parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping the coefficients from source to target. However, the conventional GMM-based mapping approach suffers from over-smoothing of the converted voice. We therefore propose a method for efficient GMM-based voice morphing and conversion that overcomes this over-smoothing. It uses glottal waveform separation and prediction of excitations; the results show that not only is over-smoothing eliminated, but the transformed vocal tract parameters also match the target, and the synthesized speech is of sufficiently high quality. The proposed GMM-based approach is critically evaluated with various subjective and objective parameters, and an application of voice morphing for laryngectomees deploying this approach is recommended.
1) Equalizer matching involves finding the power spectrum of an example audio, then multiplying the input audio's magnitude spectrogram by a filter matching the example's power spectrum.
2) Noise matching involves denoising the input and example separately, then recombining their clean and noise components using the original signal-to-noise ratio.
3) Reverberation matching uses convolutive non-negative matrix factorization to decompose the input into a dry sound and reverb kernel, and convolve the estimated dry input with the example's reverb kernel.
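Step 1, equalizer matching, can be sketched as follows. This simplified version processes non-overlapping frames and uses an averaged magnitude spectrum as the power-spectrum estimate:

```python
import numpy as np

def match_equalizer(inp, example, frame=512):
    """Equalizer matching sketch: estimate long-term magnitude spectra of
    input and example, and filter the input toward the example's spectrum."""
    def avg_mag(x):
        frames = [np.abs(np.fft.rfft(x[i:i + frame]))
                  for i in range(0, len(x) - frame + 1, frame)]
        return np.mean(frames, axis=0)

    # Matching filter: ratio of the example's spectrum to the input's.
    eq = avg_mag(example) / np.maximum(avg_mag(inp), 1e-12)
    out = np.zeros(len(inp))
    for i in range(0, len(inp) - frame + 1, frame):
        spec = np.fft.rfft(inp[i:i + frame]) * eq   # apply filter per frame
        out[i:i + frame] = np.fft.irfft(spec, frame)
    return out

rng = np.random.default_rng(2)
sig = rng.standard_normal(2048)
same = match_equalizer(sig, sig)   # matching a signal to itself is a no-op
```

A production version would use overlapping windows and smoothing of the filter, but the spectrum-ratio idea is the same.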
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation.
Spatial Enhancement for Immersive Stereo Audio Applications (Andreas Floros)
In this work, we propose a stereo recording spatial enhancement technique which retains the original panning and source locations, proportionally mapped into the perceived expanded sound stage. The technique uses a time-frequency domain metric to retrieve the panning coefficients applied during the initial stereo mixing. Panning information is also used to separate the original single-channel audio streams and, finally, to synthesize the expanded sound field using binaural processing. The technique is mainly intended for playback over closely spaced loudspeaker setups or headphones and, in the context of this work, it is subjectively evaluated in terms of the perceived sound field expansion and the sound-source spatial distribution accuracy.
This document provides an overview of recent developments in sound recognition techniques. It discusses several methods for sound recognition, including matching pursuit algorithms with MFCC features, probabilistic distance support vector machines using generalized gamma modeling of STE features, and frequency vector principal component analysis. The document also reviews related literature on environmental sound recognition using time-frequency audio features and sound event recognition. It aims to present an updated survey on sound recognition methods and discuss future research trends in the field.
This document discusses linear predictive coding (LPC) methods and horn noise detection. It begins with an introduction to speech coders and speech production modeling. It then covers the basic principles of LPC analysis, including the autocorrelation and covariance methods. It discusses solving the LPC equations and using LPC residue to detect horn noise by comparing the residue of speech, silence and known horn noise samples. The document provides results of adding speech and horn noise signals and detecting the horn noise. It concludes by listing references on speech coding algorithms, LPC, and speech processing.
This document discusses discrete-time signal processing and audio signal processing. It covers topics like discrete-time signals, the z-transform, the discrete Fourier transform (DFT), and the fast Fourier transform (FFT). The key points are:
- Audio signals are typically sampled at 44.1 kHz and quantized to 16 bits per sample.
- The z-transform and discrete-time Fourier transform (DTFT) are used to analyze discrete-time signals in the transform domain, similar to the Laplace transform and continuous-time Fourier transform for analog signals.
- The discrete Fourier transform (DFT) provides a computational tool for calculating Fourier transforms by sampling the frequency domain at discrete points, which implies periodicity in both the time and frequency domains.
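The relationship between the sampling rate, 16-bit quantization, and DFT bin spacing can be checked with a short NumPy example:

```python
import numpy as np

fs = 44100               # standard audio sampling rate (Hz)
n = 1024                 # DFT length
f = 440.0                # A4 test tone
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f * t)

# Quantize to 16 bits per sample, as in CD audio, then back to float.
q = np.round(x * 32767).astype(np.int16) / 32767.0

spectrum = np.abs(np.fft.rfft(q))
peak_bin = int(np.argmax(spectrum))
peak_hz = peak_bin * fs / n    # the DFT samples frequency every fs/N Hz
```

The peak lands in the DFT bin nearest 440 Hz; the bin spacing fs/N (about 43 Hz here) is the frequency resolution of a 1024-point DFT at 44.1 kHz.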
Speech is the vocalized form of human communication, based upon the syntactic combination of lexical items and vocabularies. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. There are many methods for speech compression, such as Linear Predictive Coding (LPC), Code Excited Linear Prediction (CELP), sub-band coding, transform coding (Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT)), Variance Fractal Compression (VFC), and psychoacoustic methods. A few of these are discussed in this paper.
The document discusses the Fast Fourier Transform (FFT) algorithm. It begins by explaining how the Discrete Fourier Transform (DFT) and its inverse can be computed on a digital computer, but require O(N²) operations for an N-point sequence. The FFT was discovered to reduce this complexity to O(N log N) operations by exploiting redundancy in the DFT calculation. It achieves this through a recursive decomposition of the DFT into smaller DFT problems. The FFT provides a significant speedup and enables practical spectral analysis of long signals.
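The recursive decomposition can be written directly: an N-point DFT (N a power of two) splits into DFTs of the even- and odd-indexed samples, which are combined with twiddle factors:

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT: split an N-point DFT (N a power of two)
    into two N/2-point DFTs, giving O(N log N) instead of O(N^2)."""
    n = len(x)
    if n == 1:
        return np.array(x, dtype=complex)
    even = fft_recursive(x[0::2])   # DFT of even-indexed samples
    odd = fft_recursive(x[1::2])    # DFT of odd-indexed samples
    tw = np.exp(-2j * np.pi * np.arange(n // 2) / n)  # twiddle factors
    return np.concatenate([even + tw * odd, even - tw * odd])

rng = np.random.default_rng(3)
sig = rng.standard_normal(64)
spec = fft_recursive(sig)
```

The result matches NumPy's library FFT; production FFTs use an iterative, in-place form of the same decomposition.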
The document discusses speech compression using GSM RPE-LTP encoding. It begins with an introduction to GSM standards and architecture. It then describes how speech is generated and modeled in the GSM 6.10 vocoder using the RPE-LTP algorithm. The algorithm compresses speech by analyzing the signal to determine if it is voiced or unvoiced, encoding the periodicity of voiced sounds, and transmitting the filter parameters. At the receiver, the decoder uses these parameters to reconstruct the speech signal through linear predictive coding, long term prediction synthesis filtering, and residual pulse decoding.
This document summarizes a research paper on spectral analysis techniques for speech processing. It discusses how spectral subtraction is commonly used to reduce additive background noise in speech signals. The proposed method is a frequency-dependent spectral subtraction approach. Unlike conventional spectral subtraction that subtracts noise estimates uniformly across the entire spectrum, the proposed method allows flexible control over subtraction levels to reduce artifacts. Results show the method produces enhanced speech with lower residual noise and better quality than conventional spectral subtraction. In conclusion, a multi-band speech enhancement strategy is developed based on spectral subtraction to improve noise reduction while minimizing speech distortion.
This document summarizes a research paper on pitch detection of speech synthesis using MATLAB. It discusses using an adaptable filter and peak-valley decision method to determine pitch marks for speech synthesis. Low-pass filtering and autocorrelation are used to detect pitch periods. An adaptive filter is designed to flatten spectral peaks. Peak and valley costs are calculated over each pitch period to determine pitch marks. Dynamic programming is then used to obtain the optimal pitch mark locations for high quality speech synthesis.
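A much simpler autocorrelation-based pitch detector (a stand-in for the paper's adaptive-filter and peak-valley method) illustrates the underlying idea of finding the dominant pitch period:

```python
import numpy as np

def detect_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Pitch estimation sketch: pick the autocorrelation peak within the
    plausible range of pitch periods."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]  # one-sided autocorr
    lo, hi = int(fs / fmax), int(fs / fmin)            # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))               # best pitch period
    return fs / lag

fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 200.0 * t)   # 200 Hz test tone
pitch = detect_pitch(tone, fs)
```

Real speech needs the extra machinery the paper describes (filtering, pitch-mark placement, dynamic programming) to handle octave errors and irregular periods.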
Voice morphing is a technique that modifies a source speaker's speech to sound like it was spoken by a target speaker. It works by analyzing the source speech into an excitation signal and filter components, then resynthesizing it with the pitch and vocal characteristics of the target speaker. The key steps are detecting the pitches of the source and target speakers, scaling the source pitch to match the target, then resynthesizing the source speech using the target's vocal filter characteristics and the pitch-scaled excitation signal. Voice morphing was developed in 1999 and has applications in text-to-speech, dubbing, voice disguising, and public announcement systems.
Spatial Fourier transform-based localized sound zone generation with loudspea...Takuma_OKAMOTO
The document describes spatial Fourier transform-based methods for generating localized sound zones with loudspeaker arrays. It discusses previous methods and their problems, and proposes using spatial Fourier transforms to analytically derive spatial filters in the angular spectrum domain. This allows generating bright and dark zones to create localized listening and quiet areas with both linear and circular loudspeaker arrays. Results show the proposed method can generate localized zones more accurately than previous techniques.
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...IRJET Journal
This document presents a speech enhancement method based on spectral subtraction involving the magnitude and phase components. The proposed method aims to improve noisy speech quality by estimating and subtracting noise from the noisy speech signal in the frequency domain. It involves segmenting the noisy speech into overlapping frames, estimating the noise spectrum during non-speech periods, subtracting the noise magnitude from the noisy speech magnitude, and reconstructing the enhanced speech using the inverse FFT and overlap-add processing. The method was tested on different noise types using MATLAB simulations. Results showed the proposed method achieved better noise reduction compared to conventional spectral subtraction while introducing less speech distortion.
Paper, presented at the Workshop “Music, Mind, Invention”, 30.-31.March 2012, Ewing, NJ.
When a 2D Fourier Transform is applied to piano roll plots which are often used in sequencer software, the resulting 2D graphic is a novel music visualization which reveals internal musical structure. This visualization converts the set of musical notes from the notation display in the piano roll plot to a display which shows structure over time and spectrum within a set musical time period. The transformation is reversible, which means that it also can be used as a novel interface for editing music. The concept of this visualization is demonstrated by software which was written for using MIDI files and creating the visualization with the Fast Fourier Transform (FFT) algorithm. This software demonstrates the live real-time display of this visualization in replay of MIDI files or by music input through a connected MIDI keyboard. The resulting display is independent of pitch transformation or tempo. This visualization approach can be used for musicology studies, for music fingerprinting, comparing composition styles, and for a new creative composition method.
Introductory Lecture to Audio Signal ProcessingAngelo Salatino
The document provides an introduction to audio signal processing and related topics. It discusses analog and digital audio signals, the waveform audio file format (WAV) specification including its header structure, and tools for audio processing like FFmpeg and MATLAB. Example code is given to read header metadata and audio samples from a WAV file in C++. While useful for understanding audio formats and processing, the solution contains an error and FFmpeg is noted as a better library for audio tasks.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Real-time neural text-to-speech with sequence-to-sequence acoustic model and ...Takuma_OKAMOTO
This document proposes a real-time neural text-to-speech system for pitch accent languages using a sequence-to-sequence acoustic model with full-context label input and either a WaveGlow or single Gaussian WaveRNN vocoder. The system realizes high-fidelity synthesis comparable to human speech with a real-time factor of 0.16 using WaveGlow on a GPU. Subjective evaluations show the proposed single Gaussian WaveRNN outperforms other vocoder options. Future work will explore real-time inference on CPUs and compare the sequence-to-sequence acoustic model to conventional pipeline models.
LPC is a speech compression technique that models speech production as a linear function of past speech samples. It has an analysis stage that determines filter coefficients to reproduce a block of speech from the input signal. In the decoding stage, the filter is rebuilt using the coefficients. The document analyzes cepstral coefficients of male and female voices for different Bodo vowels, finding distinctions between genders for certain frames. It concludes LPCC measures may be better for identifying a speaker's sex.
This document summarizes digital modeling techniques for speech signals. It describes the vocal source and vocal tract that produce speech. It then discusses using sampling and techniques like PCM to digitally represent speech signals. Linear predictive coding is presented as a simple method to analyze speech that approximates samples as combinations of past signals. The summary concludes that linear prediction can be used for spectrum estimation by representing the vocal tract transfer function, pitch detection, and speech synthesis.
This paper proposes a voice morphing system for people suffering from Laryngectomy, which is the surgical removal of all or part of the larynx or the voice box, particularly performed in cases of laryngeal cancer. A primitive method of achieving voice morphing is by extracting the source's vocal coefficients and then converting them into the target speaker's vocal parameters. In this paper, we deploy Gaussian Mixture Models (GMM) for mapping the coefficients from source to destination. However, the use of the traditional/conventional GMM-based mapping approach results in the problem of over-smoothening of the converted voice. Thus, we hereby propose a unique method to perform efficient voice morphing and conversion based on GMM, which overcomes the traditional-method effects of over-smoothening. It uses a technique of glottal waveform separation and prediction of excitations and hence the result shows that not only over-smoothening is eliminated but also the transformed vocal tract parameters match with the target. Moreover, the synthesized speech thus obtained is found to be of a sufficiently high quality. Thus, voice morphing based on a unique GMM approach has been proposed and also critically evaluated based on various subjective and objective evaluation parameters. Further, an application of voice morphing for Laryngectomees which deploys this unique approach has been recommended by this paper
1) Equalizer matching involves finding the power spectrum of an example audio, then multiplying the input audio's magnitude spectrogram by a filter matching the example's power spectrum.
2) Noise matching involves denoising the input and example separately, then recombining their clean and noise components using the original signal-to-noise ratio.
3) Reverberation matching uses convolutive non-negative matrix factorization to decompose the input into a dry sound and reverb kernel, and convolve the estimated dry input with the example's reverb kernel.
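The equalizer-matching step above can be sketched in a few lines. This is a minimal numpy version under simplifying assumptions (non-overlapping frames, no smoothing of the matching filter across bands); production systems would work on a windowed, overlapping STFT.

```python
import numpy as np

def equalizer_match(input_sig, example_sig, n_fft=1024, eps=1e-12):
    """Shape the input's spectrum toward the example's average power spectrum."""
    def avg_power(x):
        # Average magnitude-squared spectrum over non-overlapping frames.
        frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, n_fft)]
        return np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)

    p_in, p_ex = avg_power(input_sig), avg_power(example_sig)
    # Matching filter: per-bin gain mapping the input's power to the example's.
    h = np.sqrt(p_ex / (p_in + eps))

    out = np.zeros(len(input_sig))
    for i in range(0, len(input_sig) - n_fft + 1, n_fft):
        spec = np.fft.rfft(input_sig[i:i + n_fft])
        out[i:i + n_fft] = np.fft.irfft(spec * h, n=n_fft)
    return out
```

The same per-bin gain idea underlies the noise and reverberation matching steps, with the decomposition changing per step.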
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation.
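As a concrete illustration of the analysis stage, here is a sketch of LPC coefficient estimation using the autocorrelation method and the Levinson-Durbin recursion. This is the textbook algorithm, not any particular coder's implementation.

```python
import numpy as np

def lpc(x, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns the prediction polynomial a (with a[0] = 1) and the final
    prediction error energy.
    """
    n = len(x)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current residual correlation.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err
        # Extend and update the prediction polynomial.
        a_new = np.concatenate([a, [0.0]])
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a, err
```

Fitting an AR(2) signal recovers its generating coefficients, which is exactly the "filter coefficients that reproduce a block of speech" idea described above.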
Spatial Enhancement for Immersive Stereo Audio Applications (Andreas Floros)
In this work, we propose a stereo recording spatial enhancement technique which retains the original panning / source location, proportionally mapped into the perceived expanded sound stage. The technique uses a time-frequency domain metric for retrieving the panning coefficients applied during the initial stereo mixing. Panning information is also used for separating the original single-channel audio streams and finally for synthesizing the expanded sound field using binaural processing. The technique is mainly intended for playback applications over closely spaced loudspeaker setups or headphones and, in the context of this work, it is subjectively evaluated in terms of the perceived sound field expansion and the sound-source spatial distribution accuracy.
This document provides an overview of recent developments in sound recognition techniques. It discusses several methods for sound recognition, including matching pursuit algorithms with MFCC features, probabilistic distance support vector machines using generalized gamma modeling of STE features, and frequency vector principal component analysis. The document also reviews related literature on environmental sound recognition using time-frequency audio features and sound event recognition. It aims to present an updated survey on sound recognition methods and discuss future research trends in the field.
This document discusses linear predictive coding (LPC) methods and horn noise detection. It begins with an introduction to speech coders and speech production modeling. It then covers the basic principles of LPC analysis, including the autocorrelation and covariance methods. It discusses solving the LPC equations and using LPC residue to detect horn noise by comparing the residue of speech, silence and known horn noise samples. The document provides results of adding speech and horn noise signals and detecting the horn noise. It concludes by listing references on speech coding algorithms, LPC, and speech processing.
This document discusses discrete-time signal processing and audio signal processing. It covers topics like discrete-time signals, the z-transform, discrete Fourier transform (DFT) and fast Fourier transform (FFT). The key points are:
- Audio signals are typically sampled at 44.1 kHz and quantized to 16 bits per sample.
- The z-transform and discrete-time Fourier transform (DTFT) are used to analyze discrete-time signals in the transform domain, similar to the Laplace transform and continuous-time Fourier transform for analog signals.
- The discrete Fourier transform (DFT) provides a computational tool to calculate Fourier transforms by sampling the frequency domain at discrete points, resulting in periodicity in both the time and frequency domains.
Speech is the vocalized form of human communication, based on the syntactic combination of lexical items from vocabularies. The aim of speech coding is to compress the speech signal to the highest possible compression ratio while maintaining user acceptability. There are many methods for speech compression, such as Linear Predictive Coding (LPC), Code Excited Linear Prediction (CELP), sub-band coding, and transform coding, including the Fast Fourier Transform (FFT), Discrete Cosine Transform (DCT), Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT), Variance Fractal Compression (VFC), and psychoacoustic methods. A few of them are discussed in this paper.
The document discusses the Fast Fourier Transform (FFT) algorithm. It begins by explaining how the Discrete Fourier Transform (DFT) and its inverse can be computed on a digital computer, but require O(N²) operations for an N-point sequence. The FFT was discovered to reduce this complexity to O(N log N) operations by exploiting redundancy in the DFT calculation. It achieves this through a recursive decomposition of the DFT into smaller DFT problems. The FFT provides a significant speedup and enables practical spectral analysis of long signals.
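The recursive decomposition can be shown in a few lines. This is a didactic radix-2 decimation-in-time sketch assuming a power-of-two length, not an optimized FFT kernel.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size DFTs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each level of recursion halves the problem size, giving the O(N log N) cost noted above.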
The document discusses speech compression using GSM RPE-LTP encoding. It begins with an introduction to GSM standards and architecture. It then describes how speech is generated and modeled in the GSM 6.10 vocoder using the RPE-LTP algorithm. The algorithm compresses speech by analyzing the signal to determine if it is voiced or unvoiced, encoding the periodicity of voiced sounds, and transmitting the filter parameters. At the receiver, the decoder uses these parameters to reconstruct the speech signal through linear predictive coding, long term prediction synthesis filtering, and residual pulse decoding.
This document summarizes a research paper on spectral analysis techniques for speech processing. It discusses how spectral subtraction is commonly used to reduce additive background noise in speech signals. The proposed method is a frequency-dependent spectral subtraction approach. Unlike conventional spectral subtraction that subtracts noise estimates uniformly across the entire spectrum, the proposed method allows flexible control over subtraction levels to reduce artifacts. Results show the method produces enhanced speech with lower residual noise and better quality than conventional spectral subtraction. In conclusion, a multi-band speech enhancement strategy is developed based on spectral subtraction to improve noise reduction while minimizing speech distortion.
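The core subtraction rule, with the over-subtraction factor and spectral floor that frequency-dependent variants adjust per band, can be sketched as follows. This is a single-band version with non-overlapping frames for brevity; the parameter values are illustrative assumptions, not those of the paper.

```python
import numpy as np

def spectral_subtract(noisy, noise_frame, over_sub=2.0, floor=0.02, n_fft=512):
    """Magnitude spectral subtraction with over-subtraction and a spectral
    floor; multi-band variants make over_sub/floor frequency dependent."""
    noise_mag = np.abs(np.fft.rfft(noise_frame[:n_fft]))  # noise estimate
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - n_fft + 1, n_fft):
        spec = np.fft.rfft(noisy[i:i + n_fft])
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the scaled noise estimate, clamped to a fraction of the
        # noisy magnitude to limit musical-noise artifacts.
        clean = np.maximum(mag - over_sub * noise_mag, floor * mag)
        out[i:i + n_fft] = np.fft.irfft(clean * np.exp(1j * phase), n=n_fft)
    return out
```

Making `over_sub` and `floor` vary per frequency band is what distinguishes the multi-band strategy from conventional uniform subtraction.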
Audio Noise Removal – The State of the Art (ijceronline)
This document summarizes a research paper on pitch detection of speech synthesis using MATLAB. It discusses using an adaptable filter and peak-valley decision method to determine pitch marks for speech synthesis. Low-pass filtering and autocorrelation are used to detect pitch periods. An adaptive filter is designed to flatten spectral peaks. Peak and valley costs are calculated over each pitch period to determine pitch marks. Dynamic programming is then used to obtain the optimal pitch mark locations for high quality speech synthesis.
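The autocorrelation stage of such a pitch detector can be sketched as follows. This is only the peak-picking core; the adaptive filtering and dynamic-programming refinement described above are omitted, and the search range limits are illustrative assumptions.

```python
import numpy as np

def detect_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch by picking the autocorrelation peak inside the
    plausible lag range for speech."""
    x = x - np.mean(x)
    # Autocorrelation for non-negative lags 0..N-1.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))  # strongest periodicity in range
    return fs / lag
```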
Voice morphing is a technique that modifies a source speaker's speech to sound like it was spoken by a target speaker. It works by analyzing the source speech into an excitation signal and filter components, then resynthesizing it with the pitch and vocal characteristics of the target speaker. The key steps are detecting the pitches of the source and target speakers, scaling the source pitch to match the target, then resynthesizing the source speech using the target's vocal filter characteristics and the pitch-scaled excitation signal. Voice morphing was developed in 1999 and has applications in text-to-speech, dubbing, voice disguising, and public announcement systems.
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking (dhia_naruto)
This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging (TELKOMNIKA JOURNAL)
A feature extraction method for musical instrument tones based on a transform domain approach is proposed in this paper. The aim of the proposed feature extraction is to obtain a small number of feature coefficients. In general, it is carried out as follows. First, the input signal is transformed using the FFT (Fast Fourier Transform). Second, the left half of the transformed signal is divided into a number of segments. Finally, the averages of those segments form the feature extraction of the input signal. Based on the test results, the proposed feature extraction is highly efficient for tones that have many significant local peaks in the Fourier transform domain, because it requires only four feature coefficients to represent every tone.
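The three steps described above reduce to a few lines of numpy. A minimal sketch, with the segment count as a free parameter:

```python
import numpy as np

def extract_features(tone, num_segments=4):
    """FFT-and-segment-averaging features: transform, keep the left half of
    the spectrum, split into segments, average each segment."""
    spectrum = np.abs(np.fft.fft(tone))
    half = spectrum[: len(spectrum) // 2]          # left half only
    segments = np.array_split(half, num_segments)  # near-equal segments
    return np.array([seg.mean() for seg in segments])
```

With four segments, every tone is summarized by just four coefficients, which is the efficiency claim made above.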
This document proposes a noise reduction method for audio signals based on an LMS adaptive filter. It segments the noisy audio signal into frames and uses an NLMS adaptive algorithm to estimate the filter coefficients and minimize the mean square error between the clean signal and filter output. Simulation results show the proposed method significantly reduces noise and improves the signal to noise ratio by adaptively filtering the noisy audio signal in the time domain. Analysis of the output signal variance indicates the noise level is substantially decreased compared to the original noisy signal.
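The NLMS adaptation at the heart of such a scheme can be sketched generically. This is the standard normalized-LMS update in a system-identification arrangement; the frame segmentation and the specific noise-reference wiring of the paper are omitted, and the step size is an illustrative assumption.

```python
import numpy as np

def nlms(x, d, order=16, mu=0.5, eps=1e-6):
    """Normalized LMS: adapt FIR weights so the filter output from x
    tracks the desired signal d, minimizing the mean square error."""
    w = np.zeros(order)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]       # x[n], x[n-1], ... newest first
        y[n] = w @ u
        e[n] = d[n] - y[n]                     # instantaneous error
        w = w + mu * e[n] * u / (u @ u + eps)  # normalized gradient step
    return y, e, w
```

Driven with a known FIR system's output as the desired signal, the weights converge to that system's impulse response, which is how the filter "learns" the noise path.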
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea... (IRJET Journal)
This document presents a modified least mean square (LMS) algorithm to reduce noise in real-time speech signals. The proposed approach modifies the standard LMS algorithm by incorporating a Wiener filter. Experiments are conducted on speech samples from the NOIZEUS database with various types of noise at different signal-to-noise ratios. Objective measures like segmental SNR, log likelihood ratio, Itakura-Saito spectral distance, and cepstrum are used to evaluate the performance of the proposed algorithm compared to the standard LMS algorithm. The results show that the modified LMS algorithm with Wiener filter outperforms the standard LMS algorithm in enhancing the quality of noisy speech signals based on the objective measure values.
The influence of sampling frequency on tone recognition of musical instruments (TELKOMNIKA JOURNAL)
Sampling in musical instrument tone recognition generally follows the Shannon sampling theorem. This paper explores the influence of sampling frequencies that do not follow the Shannon sampling theorem on a tone recognition system using segment averaging for feature extraction and template matching for classification. The musical instruments we used were bellyra, flute, and pianica, each representing a musical instrument with one, a few, and many significant local peaks in the Discrete Fourier Transform (DFT) domain, respectively. Based on our experiments, down to a sampling frequency as low as 312 Hz, the recognition rates for bellyra and flute tones were only slightly affected, decreasing by about 5%. However, the recognition rate for pianica tones was not affected by the sampling frequency. Therefore, if such a reduction in recognition rate is acceptable, a sampling frequency as low as 312 Hz can be used for tone recognition of musical instruments.
The document discusses spatial hearing and head-related transfer functions (HRTFs) for virtual audio. It covers measuring HRTFs using a KEMAR manikin, constructing filters based on measured HRTFs to localize sound, issues with non-individualized HRTFs, synthetic HRTF approaches, and techniques for externalization like reverberation and decorrelation. Applications mentioned include immersive environments, hearing aids, and representational sounds.
This document summarizes a research paper on speech enhancement using the signal subspace algorithm. It begins with an abstract describing how noise degrades speech quality and intelligibility in communication systems. It then provides background on speech enhancement objectives and commonly used methods like spectral subtraction and signal subspace. The paper describes the signal subspace algorithm and shows its ability to enhance speech signals by suppressing noise. Experimental results on sine waves with added Gaussian noise demonstrate improved peak signal-to-noise ratios when using the signal subspace method compared to the noisy signals. The conclusion is that the algorithm removes noise to a great extent from noisy speech.
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN (IJNSA Journal)
The document proposes a noise reduction technique for speech signals in wireless LAN using a linear prediction error filter (LPEF) and adaptive digital filter (ADF). It aims to improve the signal-to-noise ratio. The LPEF is used to predict the speech signal and generate a prediction error signal. The ADF then reconstructs and subtracts the background noise from the error signal to extract the speech. Additionally, the document demonstrates that wideband MRI can obtain images with quality identical to conventional MRI in terms of SNR. It involves simultaneously exciting and acquiring multiple slices using a wideband signal.
(2013) Rigaud et al. - A parametric model and estimation techniques for the i... (François Rigaud)
This document proposes a parametric model to jointly model the inharmonicity and tuning of pianos across their entire pitch range. It uses a small number of parameters to represent both the specific design characteristics of different piano types and tuner practices. An estimation algorithm is presented that can estimate the parameters from recordings of isolated notes or chords, assuming the played notes are known. The model aims to provide a synthetic description of a particular piano's tuning/inharmonicity pattern that can highlight tuner choices and be useful for applications like piano synthesis or transcription of piano music.
Experimental Evaluation of Distortion in Amplitude Modulation Techniques for ... (Huynh MVT)
Experimental Evaluation of Distortion in Amplitude Modulation Techniques for Parametric Loudspeakers
Audio Measurements in the Presence of a High-Level Ultrasonic Carrier
This document discusses voice modification techniques. It compares four models for voice modification: LPC, H/S, TD-PSOLA, and MBR-PSOLA. TD-PSOLA relies on pitch-synchronous overlap-add and modifies the speech signal in the time domain based on analysis and synthesis markings. The document implements a voice modification system using TD-PSOLA that can modify vocal tract, pitch, and time scale parameters to change voice quality, such as making a female voice sound more husky or nasal. Results show the algorithm can effectively modify input voices as desired.
This document summarizes and compares different speech enhancement methods for noisy Punjabi speech at the phoneme level. It describes spectral subtraction, Wiener filtering, Kalman filtering, and RASTA methods. It also proposes a hybrid method using features from Wiener and Kalman filtering. Phonemes of Punjabi language and the methodology are explained. Results applying various noise types at different signal-to-noise ratios on phonemes show the proposed method produces the best enhancement compared to other methods.
Vidyalankar final-essentials of communication systems (anilkurhekar)
This document provides an overview of analog and digital communication systems. It discusses the basics of analog signals, frequency spectrum, and modulation. It then covers digital signals, terms, and performance metrics like data rate and bit error probability. Key concepts covered include Shannon capacity, signal energy and power, communication system blocks, filtering, and modulation. It also introduces concepts from probability theory and random processes used in analysis of communication systems like mean, autocorrelation, power spectral density, Gaussian processes, and noise. Examples of modulation techniques and noise sources in communication systems are briefly discussed.
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications I (Amr E. Mohamed)
This document provides an overview of digital signal processing applications including digital spectrum analysis, speech processing, and radar. It discusses different types of digital spectrum analyzers including filter bank, swept, and FFT analyzers. It also covers topics related to speech processing like the anatomy of speech production, speech perception, voiced and unvoiced sounds, and phonemes. Common speech coding techniques are introduced such as vocoding, ADPCM, LPC, and CELP coding. Radar applications of DSP are also briefly mentioned.
Defense - Sound space rendering based on the virtual sphere model (JunjieShi3)
This document summarizes a thesis on developing and implementing an auditory display technique called ADVISE (Auditory Display based on the Virtual Sphere Model).
The document first reviews ADVISE and how it can render spatial audio for both virtual and real scenes using techniques like higher-order ambisonics and binaural rendering. It then reviews the adaptive rectangular decomposition (ARD) method for computational room acoustics simulation.
The document outlines the objective of improving ADVISE by developing a complete room acoustics program using ARD and integrating it into ADVISE. It describes implementing ADVISE to demonstrate spatial audio rendering for a sample music hall scene in Unity. Frequency limitations and computational costs of the approach are also discussed.
Similar to Audio Morphing for Percussive Sound Generation (20)
System Identification Based on Hammerstein Models Using Cubic Splines (a3labdsp)
The document presents a new approach for identifying Hammerstein models, which are nonlinear systems composed of a static nonlinearity followed by a linear filter. The approach uses an adaptive Catmull-Rom cubic spline to model the static nonlinearity, instead of high-order polynomials. Experimental results on simulated and real-world systems show the spline-based approach more accurately identifies the nonlinear characteristics and outperforms an existing polynomial-based technique, especially for highly nonlinear systems. The linear filter is modeled using an adaptive IIR filter.
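The Hammerstein structure itself is simple to state in code. In this sketch a polynomial stands in for the paper's adaptive Catmull-Rom spline nonlinearity; both the nonlinearity and the filter taps below are illustrative assumptions.

```python
import numpy as np

def hammerstein(x, nonlinearity, h):
    """Hammerstein model: a memoryless static nonlinearity followed by a
    linear filter (here a simple FIR)."""
    return np.convolve(nonlinearity(x), h)[:len(x)]
```

Identification then amounts to fitting the nonlinearity (spline control points in the paper) and the linear filter jointly from input/output data.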
A Distributed System for Recognizing Home Automation Commands and Distress Ca... (a3labdsp)
The document describes a distributed system that recognizes home automation commands and distress calls in Italian. It consists of two units: a Local Multimedia Control Unit that recognizes commands/calls and manages communication, and a Central Management Unit that integrates home services and handles emergencies. The system uses acoustic echo cancellation and speech recognition to understand commands even in noisy environments. An evaluation of the system showed it achieved over 90% accuracy on headset microphone data and over 50% on distant microphone data.
Evaluation of a Multipoint Equalization System based on Impulse Responses Pro... (a3labdsp)
This document evaluates a multipoint equalization algorithm that combines fractional octave smoothing of impulse responses measured in multiple locations in a room or car. It investigates how the equalization performance is affected by varying parameters like the number of measurement positions and equalization zone size. The algorithm extracts a representative prototype response and inverse filter from the smoothed impulse responses. Tests show the proposed approach achieves better spectral deviation and equalization than a single-point method, and performance decreases but remains effective as the equalization zone is expanded to more distant positions.
A NOVEL APPROACH TO CHANNEL DECORRELATION FOR STEREO ACOUSTIC ECHO CANCELLATI... (a3labdsp)
This document proposes a novel approach to decorrelating stereo acoustic signals for acoustic echo cancellation based on the psychoacoustic phenomenon of the "missing fundamental". The approach tracks and removes the pitch from one channel of the stereo signal using an adaptive notch filter, which greatly reduces inter-channel coherence in the lower spectrum without affecting signal quality. Experimental results show the proposed approach provides significant coherence reduction and faster convergence speed of adaptive filters compared to a masked noise injection method, while better preserving the stereo quality.
Mixed Time Frequency Approach for Multipoint Room Response Equalization (a3labdsp)
A still open problem in the field of room response equalization is the development of perceptually useful mixed-phase equalizers. In a recent paper, a multipoint mixed-phase room response equalization system, integrating a minimum-phase multiple position room magnitude equalizer and a FIR group delay equalizer, was developed in the frequency domain. Starting from this approach, a mixed time-frequency algorithm is here proposed. The minimum-phase multiple position equalizer developed in the frequency domain, is combined with an all-pass FIR phase equalizer, designed in the time domain considering a suitable time-reversed version of a prototype function and taking advantage of the mixing time evaluation. Several tests have been performed considering real environments and comparing the proposed approach with the previous one, based on a group delay compensation. Subjective listening tests have also been done in a real environment, confirming the improvement in the perceived audio quality.
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp... (a3labdsp)
In the recent years, several techniques have been proposed in the literature in order to attempt the emulation of nonlinear electro-acoustic devices, such as compressors, limiters, and pre-amplifiers. Among them, the dynamic convolution technique is one of the most common approaches used to perform a nonlinear convolution. In this paper, an efficient DSP implementation of a nonlinear system emulation based on the dynamic convolution technique and principal component analysis is proposed with the aim of lowering the required workload and the global memory usage. Several results are reported in order to show the effectiveness of the proposed approach, taking into consideration a guitar pre-amplifier as a particular case study and also introducing comparisons with the existing techniques of the state of the art.
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:... (a3labdsp)
In recent years, several techniques have been proposed in the literature in order to attempt the emulation of nonlinear electro-acoustic devices, such as compressors, distortions, and preamplifiers. Among them, the dynamic convolution technique is one of the most common approaches used to perform this task. In this paper an exhaustive objective and subjective analysis of a dynamic convolution operation based on principal components analysis has been performed. Taking into consideration real nonlinear systems, such as bass preamplifier, distortion, and compressor, comparisons with the existing techniques of the state of the art have been carried out in order to prove the effectiveness of the proposed approach.
An Efficient DSP Based Implementation of a Fast Convolution Approach with non... (a3labdsp)
Finite impulse response convolution is one of the most widely used operations in the digital signal processing field for filtering. Low computationally demanding techniques are therefore essential for calculating convolutions with low input/output latency in real scenarios, considering that the real-time requirements are strictly related to the impulse response length. An efficient DSP implementation of a fast convolution approach is presented with the aim of lowering the workload required in applications like reverberation. It is based on a non-uniform partitioning of the impulse response and a psychoacoustic technique derived from the human ear's sensitivity. Several results are reported in order to prove the effectiveness of the proposed approach, also introducing comparisons with existing state-of-the-art techniques.
Approximation of Real Impulse Response Using IIR Structures (a3labdsp)
In this paper, we propose a new approach to the approximation and simulation of a real impulse response. Starting from a preliminary analysis of the mixing time, the impulse response is decomposed in the time domain considering the early and late reflections. Therefore, an IIR structure composed of a cascade of second-order sections and four all-pass filters is employed to synthesize the first part of the impulse response, using a parametric optimization process in the frequency domain. Then, a recursive structure composed of comb and all-pass filters is used to synthesize the late reflections, exploiting a minimization criterion in the cepstral domain. Several results are reported taking into consideration a real impulse response, confirming the validity of the proposed approach.
A Low Latency Implementation of a Non Uniform Partitioned Overlap and Save Al... (a3labdsp)
FIR convolution is a widely used operation in the digital signal processing field, especially for filtering in real-time scenarios. In this context, low computationally demanding techniques for calculating convolutions with low input/output latency become essential, considering that the real-time requirements are strictly related to the impulse response length. In this paper, a multithreaded real-time implementation of a Non Uniform Partitioned Overlap and Save algorithm is proposed with the aim of lowering the workload required in applications like reverberation, also exploiting the human ear's sensitivity. Several results are reported in order to show the effectiveness of the proposed approach in terms of computational cost, taking into consideration different impulse responses and also introducing comparisons with existing state-of-the-art techniques.
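The uniform overlap-save building block underlying such partitioned schemes can be sketched as follows. This is a single, uniform partition size for clarity; the non-uniform partitioning, multithreaded scheduling, and psychoacoustic refinements of the paper are beyond a few lines.

```python
import numpy as np

def overlap_save(x, h, block=256):
    """FFT convolution of x with impulse response h via overlap-save."""
    m = len(h)
    n_fft = block + m - 1
    H = np.fft.rfft(h, n_fft)  # precomputed spectrum of the IR
    # Prepend m-1 zeros (history for the first block); pad the tail.
    x_pad = np.concatenate([np.zeros(m - 1), x, np.zeros(block)])
    out = []
    for i in range(0, len(x), block):
        seg = x_pad[i:i + n_fft]
        if len(seg) < n_fft:
            seg = np.pad(seg, (0, n_fft - len(seg)))
        y = np.fft.irfft(np.fft.rfft(seg) * H, n_fft)
        out.append(y[m - 1:m - 1 + block])  # discard the aliased prefix
    return np.concatenate(out)[:len(x)]
```

The input/output latency is one block; non-uniform schemes use small partitions at the head of the IR (low latency) and large ones in the tail (low workload).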
A Hybrid Approach for Real-time Room Acoustic Response Simulation (a3labdsp)
Reverberation is a well-known effect that is particularly important for music listening, especially for recorded and live music. Generally, there are two approaches to artificial reverberation: the desired signal can be obtained by convolving the input signal with a measured impulse response (IR) or with a synthetic one. Taking into account the advantages of both approaches, a hybrid artificial reverberation algorithm is presented. The early reflections are derived from a real IR, truncated at the calculated mixing time, and the reverberation tail is obtained using Moorer's structure. The parameters defining this structure are derived from the analyzed IR, using a minimization criterion based on Simultaneous Perturbation Stochastic Approximation (SPSA). The obtained results showed a high-quality reverberator with a low computational load.
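A reverberation tail in the spirit of Moorer's structure (parallel feedback combs followed by an all-pass) can be sketched as below. The delay lengths and gains here are illustrative placeholders, not the values the paper derives via SPSA, and the low-pass filters inside Moorer's combs are omitted.

```python
import numpy as np

def moorer_tail(x, comb_delays=(1557, 1617, 1491, 1422),
                comb_gain=0.8, ap_delay=225, ap_gain=0.7):
    """Late-reverberation tail: parallel feedback combs into an all-pass."""
    out = np.zeros(len(x))
    for d in comb_delays:  # parallel feedback comb filters
        buf = np.zeros(len(x))
        for n in range(len(x)):
            fb = buf[n - d] if n >= d else 0.0
            buf[n] = x[n] + comb_gain * fb
        out += buf
    out /= len(comb_delays)
    y = np.zeros(len(x))   # series all-pass stage diffuses the echoes
    for n in range(len(x)):
        in_del = out[n - ap_delay] if n >= ap_delay else 0.0
        y_del = y[n - ap_delay] if n >= ap_delay else 0.0
        y[n] = -ap_gain * out[n] + in_del + ap_gain * y_del
    return y
```

With all feedback gains below one, the structure is stable and an impulse produces an exponentially decaying tail at a small fraction of the cost of full convolution.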
Optimized implementation of an innovative digital audio equalizer (a3labdsp)
Digital audio equalization is one of the most common operations in the acoustic field, but its performance depends on computational complexity and filter design techniques. Starting from a previous FIR implementation based on multirate systems and filterbank theory, an optimized digital audio equalizer is derived. The proposed approach employs IIR filters to improve the filterbank structure developed to avoid ripple between adjacent bands. The effectiveness of the optimized implementation is shown by comparing it with the previous approach. The solution presented here has several advantages, increasing the equalization performance in terms of low computational complexity, low delay, and uniform frequency response.
Low Power High-Performance Computing on the BeagleBoard Platform (a3labdsp)
The ever increasing energy requirements of supercomputers and server farms are driving the scientific and industrial communities to take the energy efficiency of computing equipment into deeper consideration. This contribution addresses the issue by proposing a cluster of ARM processors for high-performance computing. The cluster is composed of five BeagleBoard-xM boards, with one board managing the cluster and the other boards executing the actual processing. The software platform is based on the Angstrom GNU/Linux distribution and is equipped with a distributed file system to ease sharing data and code among the nodes of the cluster, and with tools for managing tasks and monitoring the status of each node. The computational capabilities of the cluster have been assessed through High-Performance Linpack and a cluster-wide speaker diarization algorithm, while power consumption has been measured using a clamp meter. Experimental results obtained in the speaker diarization task showed that the energy efficiency of the BeagleBoard-xM cluster is comparable to that of a laptop computer equipped with an Intel Core2 Duo T8300 running at 2.4 GHz. Furthermore, removing the bottleneck due to the Ethernet interface, the BeagleBoard-xM cluster is able to achieve superior energy efficiency.
1. Audio Morphing
Proposed Algorithm
Results
Conclusion

Audio Morphing for Percussive Hybrid Sound Generation

Andrea Primavera¹, Francesco Piazza¹ and Joshua D. Reiss²

¹ A3Lab - DIBET - Università Politecnica delle Marche - Ancona, Italy
² C4DM - Queen Mary University of London - London, UK

Andrea Primavera Audio Morphing for Percussive Hybrid Sound Generation 1/21
2. Introduction

Audio Morphing is typically employed to accomplish two different tasks:
• Obtaining a smooth transition between two sounds;
• Generating a new intermediate sound which has the characteristics of the two given samples (e.g., timbre and duration).

AIM

Topic
An automatic audio morphing technique applied to percussive musical instruments has been developed in this work.

Proposed Innovation
Creating new, perceptually interesting hybrid percussive sounds (e.g., morphing among different brands of the same instrument).
5. State of the Art

During the last decades several algorithms have been presented to morph audio signals: linear interpolation, sinusoidal plus residual [1] [2], and spectral envelope using various features [3].

Linear Interpolation (LI)
One of the first approaches used to implement audio morphing:

y(t) = α s1(t) + (1 − α) s2(t)

PRO: low computational cost;
CONS: even when the timbre of the input signals is only slightly different, both references can be individually perceivable.

Sinusoidal Plus Noise (SMS)
Based on harmonic modeling [4] [5]:

s(t) = Σ_{n=1}^{N} A_n(t) cos[θ_n(t)] + e(t)

The morphed sound is obtained by interpolating the fundamental frequency, the harmonic timbre, and the residual envelope.
PRO: capability of morphing sounds with different timbre;
CONS: higher computational cost than LI, with some problems on percussive sounds.

Andrea Primavera Audio Morphing for Percussive Hybrid Sound Generation 3/21
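The LI equation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the signals are assumed to be mono and sample-rate matched:

```python
import numpy as np

def li_morph(s1, s2, alpha):
    """Linear-interpolation morph: y(t) = alpha*s1(t) + (1 - alpha)*s2(t).

    Assumes s1 and s2 are mono signals at the same sample rate; the
    shorter reference is zero-padded to the common length.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    n = max(len(s1), len(s2))
    s1 = np.pad(s1, (0, n - len(s1)))
    s2 = np.pad(s2, (0, n - len(s2)))
    return alpha * s1 + (1.0 - alpha) * s2
```

With alpha = 1 the morph returns the first reference, with alpha = 0 the second; intermediate values blend the two, which is where the "both references individually perceivable" artifact can appear.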
8. Proposed Algorithm

An automatic audio morphing algorithm employed to generate new hybrid percussive sounds is presented here. It is based on:
• PreProcessing (frequency domain);
• Processing exploiting linear interpolation (time domain).

Fig. 1: Block diagram of the proposed morphing algorithm: yellow indicates the frequency domain, while blue represents the time domain.
9. Proposed Algorithm - PreProcessing

The PreProcessing operation is performed on both signal references in order to obtain hybrid sounds that change in a linear fashion when the interpolation factor varies linearly.

1. Time Align
The samples are aligned before executing the morphing, exploiting cross-correlation.
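A minimal sketch of the cross-correlation alignment step, assuming mono signals (the paper does not detail its exact procedure, so this is an illustrative helper):

```python
import numpy as np

def time_align(s1, s2):
    """Align two references at the lag maximizing their cross-correlation.

    The later-starting signal is left untouched and the earlier-starting
    one is delayed with leading zeros, so the onsets coincide before the
    morphing stage.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    xcorr = np.correlate(s1, s2, mode="full")
    lag = int(np.argmax(xcorr)) - (len(s2) - 1)
    if lag > 0:                      # s2's onset comes first: delay s2
        s2 = np.concatenate([np.zeros(lag), s2])
    elif lag < 0:                    # s1's onset comes first: delay s1
        s1 = np.concatenate([np.zeros(-lag), s1])
    return s1, s2
```

For percussive samples the correlation peak is dominated by the attack transients, which is what makes this simple estimator effective here.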
11. Proposed Algorithm - PreProcessing

2. AR Model Analysis
• The AR (attack-release) model is employed instead of the ADSR;
• Attack and release are determined using the Amplitude Centroid Trajectory [6] [7] approach and evaluating the T60 [8] for each audio sample.

SpectralCentroid[t] = ( Σ_{b=1}^{M} f_b[t] a_b[t] ) / ( Σ_{b=1}^{M} a_b[t] )

where f_b is the frequency and a_b the amplitude of band b, for the M bands evaluated using the FFT.

Fig. 2: Time evolution of the spectral centroid for a percussive sample.
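The spectral centroid trajectory defined above can be computed frame by frame over the FFT magnitudes. A sketch follows; the frame and hop sizes are illustrative choices, not values from the paper:

```python
import numpy as np

def spectral_centroid_trajectory(x, fs, frame=1024, hop=512):
    """Per-frame spectral centroid: sum(f_b * a_b) / sum(a_b) over FFT bands.

    f_b are the FFT bin frequencies and a_b the corresponding magnitudes,
    matching the formula on this slide; a Hann window reduces leakage.
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)   # f_b, in Hz
    win = np.hanning(frame)
    out = []
    for start in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + frame] * win))  # a_b
        denom = mag.sum()
        out.append(float((freqs * mag).sum() / denom) if denom > 0 else 0.0)
    return np.array(out)
```

For a pure tone the trajectory sits at the tone's frequency; for a percussive hit it typically starts high during the noisy attack and decays, which is the behavior exploited by the Amplitude Centroid Trajectory segmentation.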
12. Proposed Algorithm - PreProcessing

3. Time Stretching
The release portions are time stretched or compressed as a function of the interpolation factor value:

t_r = α t_r1 + (1 − α) t_r2
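A sketch of the interpolated release time, together with a simple linear-resampling stretch. The resampling is a stand-in for whatever stretching method the authors actually use, which the slides do not detail:

```python
import numpy as np

def interp_release_time(tr1, tr2, alpha):
    """Target release time t_r = alpha*t_r1 + (1 - alpha)*t_r2."""
    return alpha * tr1 + (1.0 - alpha) * tr2

def stretch_release(release, ratio):
    """Stretch (ratio > 1) or compress (ratio < 1) a release segment by
    linear resampling; a real system might use a phase vocoder instead."""
    release = np.asarray(release, dtype=float)
    n_out = max(2, int(round(len(release) * ratio)))
    x_old = np.linspace(0.0, 1.0, num=len(release))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, release)
```

Each reference's release is stretched to the common target t_r, so that the subsequent linear interpolation mixes two segments of equal duration.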
13. Proposed Algorithm - Processing

Linear Interpolation
The Processing operation is executed on the preprocessed signal references using the linear interpolation approach.
Since percussive sounds are typically characterized by a noisy spectrum and a “similar” timbre, this approach is preferable to additive synthesis.
15. Results

Several tests have been carried out to evaluate the effectiveness of the proposed approach through objective and subjective comparisons.
Real recordings of drum kits have been employed, using the BFD2 sound library [9].
16. Results - Analysis of Time and Spectral Features (Objective Measure)

Spectral centroid evolution and release time have been evaluated as a function of the interpolation factor.

Analyzed algorithms:
• Linear Interpolation (LI);
• Linear Interpolation with Preprocessing (LIP);
• Sinusoidal plus Residual (SMS).

Morphing examples:
• Different kinds of snare drum (i.e., electronic and classic snare);
• Toms of different size and brand;
• Hi-hats of different brands.
17. Results - Listening Tests (Subjective Measure)

In order to evaluate the audio perception of the produced hybrid sounds, two different listening tests [10] have been performed:
• Mean Opinion Score (MOS) −→ audio quality, estimated interpolation factor;
• MultiDimensional Scaling (MDS) −→ perceptual space.

Analyzed algorithms:
• Linear Interpolation (LI);
• Linear Interpolation with Preprocessing (LIP);
• Sinusoidal plus Residual (SMS).

Morphing example:
• Different kinds of snare drum (i.e., electronic and classic snare).
18. Results - Analysis of Time and Spectral Features

[Fig. 3: Release time (sec) as a function of the interpolation factor for LI, LIP and SMS; panels: snare, tom and hi-hat.]

[Fig. 4: Spectral centroid (Hz) evolution as a function of the interpolation factor for LI, LIP and SMS; panels: snare, tom and hi-hat.]

Considerations
• Release time changes linearly with the change of the interpolation factor;
• The spectral centroid does not seem affected by the preprocessing.
19. Results - Listening Test (MOS)

Based on MOS, the estimated interpolation factor and the perceived audio quality for LI, LIP and SMS are evaluated in the morphing between a classic and an electronic snare.

[Fig. 5: Estimated interpolation factor as a function of the interpolation factor for LI, LIP and SMS.]

Tab. 1: Perceived audio quality on a scale from 0 to 1 (0 bad - 1 excellent).
          LI    LIP   SMS
Quality   0.90  0.88  0.10
STD       0.06  0.07  0.08

Considerations
• For the proposed approach (LIP), the estimated interpolation factor changes roughly linearly with the change of the interpolation factor.
• For linear interpolation (LI), the factor 0.5 was perceived as near 0.8.
• Samples generated using SMS sound unnatural, confirming the harmonic model's limitations in the reproduction of percussive sounds.
20. Results - Listening Test (MDS)

Based on MDS [11] [12], the test aims to recreate the corresponding perceptual spaces of the hybrid sounds generated in the morphing between a classic and an electronic snare with LI and LIP.

[Fig. 6: MDS analysis obtained with LI.]
[Fig. 7: MDS analysis obtained with LIP.]

Tab. 2: Correlation between the first dimension and release time.
         LI        LIP
CORR     0.986     0.980
P-VAL    7.2×10⁻⁶  1.9×10⁻⁵

Tab. 3: Correlation between the second dimension and audio quality.
         LI    LIP
CORR     0.66  0.77
P-VAL    0.07  0.02
21. Results - Listening Test (MDS)

Fig. 8: Correlation between the first dimension and release time for LI and LIP. The curves have been shifted on the y-axis in order to enhance readability.

Considerations
• High correlation between the release time and the movement along the first dimension.
• Release time is an important factor in the evaluation of percussive audio morphing algorithms.
• Although the correlation between the second dimension and the audio quality is not zero, the values obtained are not sufficient to establish a strong relationship between them.
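The correlations reported in Tabs. 2–3 are plain Pearson coefficients between an MDS dimension and a feature. They can be reproduced with a few lines of numpy (illustrative helper; the reported p-values would additionally need e.g. scipy.stats.pearsonr):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between an MDS dimension and a feature
    (e.g. release time or perceived audio quality)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))
```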
22. Conclusion

In conclusion:
• An automatic audio morphing technique applied to percussive musical instruments has been presented in this work.
• The presented technique is based on preprocessing of the sound references in the frequency domain and linear interpolation in the time domain.
• Different tests have been carried out in order to evaluate the proposed approach, in terms of subjective evaluations and objective measures, comparing it with the previous state of the art.
• Objective analysis confirms that release time changes linearly with the change of the interpolation factor, while the spectral centroid does not seem affected by the preprocessing.
• Listening tests confirm a linear evolution in the perception of the hybrid sounds using the presented approach.
• Multidimensional scaling (MDS) suggests that release time is the most important feature in the perception of hybrid percussive sounds.
23. Future Works

Future work will be oriented toward:
• Further investigation of the proposed approach, analyzing the performance obtained for more dissimilar sounds.
• Real-time implementation of the approach, considering an embedded platform.
• Refinement of the preprocessing phase, extending it with the ADSR model.
25. Bibliography

[1] M. H. Serra, D. Rubine, and R. Dannenberg, “Analysis and synthesis of tones by spectral interpolation,” J. Audio Eng. Soc., vol. 38, no. 3, pp. 111–128, Mar. 1990.
[2] N. Osaka, “Timbre interpolation of sounds using a sinusoidal model,” in Proc. Int. Computer Music Conference, Banff, Canada, Sep. 1995, vol. 34.
[3] M. Slaney, M. Covell, and B. Lassiter, “Automatic audio morphing,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, USA, May 1996.
[4] X. Serra and J. Smith, “Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.
26. Bibliography

[5] F. Boccardi and C. Drioli, “Sound morphing with Gaussian mixture models,” in Proc. International Conference on Digital Audio Effects, Limerick, Ireland, Dec. 2001.
[6] J. Hajda, “A new model for segmenting the envelope of musical signals: The relative salience of steady state versus attack, revisited,” in Proc. 101st Audio Engineering Society Convention (AES’96), Los Angeles, USA, Nov. 1996.
[7] J. J. Burred, X. Rodet, and M. Caetano, “Automatic segmentation of the temporal evolution of isolated acoustic musical instrument sounds using spectro-temporal cues,” in Proc. International Conference on Digital Audio Effects (DAFx’10), Graz, Austria, Sep. 2010.
[8] W. C. Sabine, “Collected Papers on Acoustics,” 1964, reprinted by Dover, New York.
27. Bibliography

[9] “BFD2 version 2.0.1 Acoustic Drum Production Environment,” FXpansion.
[10] ITU-R BS.1534, “Method for the subjective assessment of intermediate quality level of coding systems,” 2001.
[11] D. Williams and T. Brookes, “Perceptually motivated audio morphing: warmth,” in Proc. 128th Audio Engineering Society Convention (AES’10), London, UK, May 2010.
[12] A. Zacharakis and J. Reiss, “An additive synthesis technique for independent modification of the auditory perceptions of brightness and warmth,” in Proc. 130th Audio Engineering Society Convention (AES’11), London, UK, May 2011.