1) The document proposes a DNN-based method to solve the permutation problem in frequency-domain independent component analysis (FDICA) for audio source separation.
2) Conventional permutation solvers sometimes fail to correctly align the separated signal components across frequencies. The proposed method trains a DNN on simulated permutation data to learn how to align components.
3) In experiments separating reverberant speech mixtures, the proposed DNN-based method improved the signal-to-distortion ratio by about 8 dB, outperforming other techniques and approaching the upper limit of performance.
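For context, the permutation problem arises because FDICA separates each frequency bin independently, so output 1 at one bin and output 1 at the next bin may belong to different sources. The sketch below illustrates one conventional style of solver, not the DNN-based method summarized above: it compares the amplitude envelopes of adjacent bins and swaps the outputs when the swapped order correlates better. It assumes exactly two sources; the names `correlation` and `align_permutations` and the toy envelopes are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def align_permutations(envelopes):
    """Align two separated outputs across frequency bins.

    envelopes[f][n] is the amplitude envelope (one value per time frame)
    of output n at frequency bin f. Bin 0 is taken as the reference order.
    Returns the aligned envelopes and a per-bin flag telling whether the
    two outputs of that bin were swapped.
    """
    aligned = [list(envelopes[0])]
    swapped = [False]
    for f in range(1, len(envelopes)):
        ref0, ref1 = aligned[f - 1]
        cur0, cur1 = envelopes[f]
        keep_score = correlation(ref0, cur0) + correlation(ref1, cur1)
        swap_score = correlation(ref0, cur1) + correlation(ref1, cur0)
        if swap_score > keep_score:
            aligned.append([cur1, cur0])
            swapped.append(True)
        else:
            aligned.append([cur0, cur1])
            swapped.append(False)
    return aligned, swapped


# Toy data: two sources with different activity patterns over six frames.
src_a = [1.0, 0.2, 0.9, 0.1, 0.8, 0.3]
src_b = [0.1, 0.9, 0.2, 0.2, 0.1, 1.0]
half_a = [0.5 * x for x in src_a]
half_b = [0.5 * x for x in src_b]

# Bin 1 comes out of the separator in the wrong order (and at a different scale).
envelopes = [[src_a, src_b], [half_b, half_a], [src_a, src_b]]
aligned, swapped = align_permutations(envelopes)
print(swapped)  # prints [False, True, False]
```

Chaining each bin to its already-aligned neighbour keeps the sketch short, but a single wrong decision then propagates to all higher bins, which is one way such solvers can fail.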
DNN-based frequency component prediction for frequency-domain audio source se...Kitamura Laboratory
DNN-based frequency component prediction for frequency-domain audio source separation. The paper proposes a new framework that combines frequency-domain audio source separation with a DNN to achieve high-quality separation at lower computational cost. The framework applies multichannel NMF to separate sources in the low-frequency band. A DNN then predicts the separated source components in the high-frequency band from the low-frequency separated sources and the mixture. Experiments show that the mixture components help the DNN extend the bandwidth of the separated sources, and that the proposed framework achieves separation quality similar to fullband NMF at half the computational cost.
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution," Télécom ParisTech, Invited Lecture, September 4th, 2017.
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Daichi Kitamura
Presented at The 2015 European Signal Processing Conference (EUSIPCO 2015, international conference)
Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, "Relaxation of rank-1 spatial constraint in overdetermined blind source separation," Proceedings of The 2015 European Signal Processing Conference (EUSIPCO 2015), pp.1271-1275, Nice, France, September 2015 (Invited Special Session).
Experimental analysis of optimal window length for independent low-rank matri...Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, and Hiroshi Saruwatari, "Experimental analysis of optimal window length for independent low-rank matrix analysis," Proceedings of The 2017 European Signal Processing Conference (EUSIPCO 2017), pp. 1210–1214, Kos, Greece, August 2017 (Invited Special Session).
Presented at 25th European Signal Processing Conference (EUSIPCO) 2017, "SS14: Multivariate Analysis for Audio Signal Source Enhancement," 14:30-16:10, August 30, 2017.
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extensions," Ohio State University, Invited Lecture, December 15th, 2017.
This document summarizes a student project on blind source separation of audio signals. The student was able to recover three independent audio source signals from three mixtures with over 99.97% accuracy. Blind source separation is the separation of source signals from mixed signals without information about the sources or mixing process. It has applications in areas like speech processing. The student acknowledges help from advisors and friends. They provide background on blind source separation and describe their methodology, results, and references.
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.
Audio Source Separation Based on Low-Rank Structure and Statistical Independence (Daichi Kitamura)
Daichi Kitamura presented his research on audio source separation. He discussed using low-rank modeling of spectrograms and non-negative matrix factorization to separate sources based on their structural properties in supervised settings. He also discussed using statistical independence between sources and the central limit theorem as the basis for blind source separation via independent component analysis. The talk covered applications of source separation, demonstrations of techniques, and challenges like basis mismatch for supervised methods and permutation problems for blind separation.
Online divergence switching for superresolution-based nonnegative matrix fact...Daichi Kitamura
This document describes a proposed method for online divergence switching in a hybrid music source separation system. The hybrid system uses directional clustering for spatial separation followed by supervised nonnegative matrix factorization (SNMF) for spectral separation. The optimal divergence for SNMF depends on the amount of spectral gaps ("chasms") caused by directional clustering: Euclidean distance is preferred when there are many chasms, because it extrapolates the missing components better, and KL divergence is preferred when chasms are few. The proposed method divides the online spectrogram into blocks and selects the divergence for each block according to its chasm rate, which allows real-time adaptation and high separation accuracy under any spatial arrangement of the sources. Experiments show the proposed method outperforms using a single divergence.
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi'an, China, September 2016.
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
The document proposes a new supervised nonnegative matrix factorization (SNMF) method and hybrid method for multichannel signal separation. It analyzes the optimal divergence criterion for the SNMF with spectrogram restoration ability. The key points are:
1. A generalized cost function is introduced to extend SNMF to optimize the divergence criterion.
2. Theoretical analysis based on a data generation model finds the optimal divergence for basis extrapolation in spectrogram restoration is around Euclidean distance.
3. Experiments show the proposed hybrid method using Euclidean distance outperforms other methods for both instantaneous mixtures and real recordings, achieving the best separation quality measured by signal-to-distortion ratio.
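To make the idea of a generalized cost function concrete, the sketch below uses the beta-divergence family, in which Euclidean distance and KL divergence are special cases. That the summarized method uses exactly this family is an assumption here; the function names `beta_divergence` and `cost` are hypothetical, and all inputs are assumed to be strictly positive.

```python
import math


def beta_divergence(x, y, beta):
    """Beta-divergence d_beta(x | y) between two positive scalars.

    beta = 2 gives half the squared Euclidean distance,
    beta = 1 gives the generalized KL divergence,
    beta = 0 gives the Itakura-Saito divergence.
    """
    if beta == 1:
        return x * math.log(x / y) - x + y
    if beta == 0:
        return x / y - math.log(x / y) - 1
    return (x ** beta + (beta - 1) * y ** beta - beta * x * y ** (beta - 1)) / (
        beta * (beta - 1)
    )


def cost(observed, model, beta):
    """Sum of element-wise beta-divergences between two flat spectrograms."""
    return sum(beta_divergence(x, y, beta) for x, y in zip(observed, model))


# Hypothetical observed and modeled spectrogram entries.
observed = [3.0, 1.0, 0.5]
model = [1.0, 1.0, 1.0]

euclidean_cost = cost(observed, model, beta=2)
kl_cost = cost(observed, model, beta=1)
print(euclidean_cost, kl_cost)
```

Treating `beta` as a continuous parameter is what lets a study scan between the two criteria instead of comparing only the two endpoints.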
Depth estimation of sound images using directional clustering and activation-...Daichi Kitamura
Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, "Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.437-440, Hawaii, USA, March 2014 (Student Paper Award).
Blind Source Separation using Dictionary Learning (Davide Nardone)
The sparse decomposition of images and signals has found wide use in compression, noise removal, and source separation. It expresses a signal as a linear combination of a few elements of a redundant dictionary. The dictionary may be fixed (Fourier, wavelet, etc.) or learned from a set of samples. Algorithms that learn the dictionary can be applied to a broad class of signals and achieve better compression performance than methods based on a fixed dictionary. Here we present a Compressed Sensing (CS) approach with an adaptive dictionary for solving a Determined Blind Source Separation (DBSS) problem. The proposed method was developed by reformulating DBSS as a Sparse Coding (SC) problem. The algorithm consists of a few steps: mixing matrix estimation, sparse source separation, and source reconstruction. A sparse mixture of the original source signals is used to estimate the mixing matrix, which is then used to reconstruct the source signals. A "block signal representation" of the mixture greatly improves the computational efficiency of the mixing matrix estimation and signal recovery steps without a notable loss of separation accuracy. Experimental results compare the computation and separation performance of the method for different dictionaries, fixed or adaptive. Finally, a real case study from the field of Wireless Sensor Networks (WSN) is presented, in which a set of sensor nodes relay data to a multi-receiver node. Because several nodes transmit messages simultaneously, the mixture of information must be separated at the receiver, which amounts to solving a BSS problem.
Superresolution-based stereo signal separation via supervised nonnegative mat...Daichi Kitamura
This document presents a new method called regularized superresolution-based nonnegative matrix factorization (NMF) for multichannel music signal separation. The proposed method addresses limitations of existing approaches like directional clustering and penalized supervised NMF. It utilizes directional clustering to separate sources by direction, then applies superresolution-based NMF to extrapolate missing components from the target source using supervised bases, regularized by an index matrix. An evaluation compares this hybrid approach to other methods, finding it achieves higher source separation quality in terms of SDR, SIR and SAR scores.
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...奈良先端大 情報科学研究科
This document summarizes a research presentation on an online divergence switching method for a hybrid source separation technique. The hybrid method combines directional clustering for spatial separation with supervised nonnegative matrix factorization (SNMF) for spectral separation. The proposed method switches between KL divergence and Euclidean distance for the SNMF, depending on the amount of spectral gaps from the directional clustering. When there are many gaps, Euclidean distance is better for basis extrapolation. When gaps are fewer, KL divergence gives better separation. In experiments, the proposed online switching method outperformed using only one divergence, achieving higher signal-to-distortion ratios for music source separation.
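The block-wise switching rule described above can be sketched as follows. This is only an illustration under stated assumptions: the binary mask layout, the definition of the gap rate as the fraction of removed bins, the 0.5 threshold, and the names `gap_rate` and `choose_divergences` are all hypothetical and not taken from the presentation.

```python
def gap_rate(mask_block):
    """Fraction of time-frequency bins removed by directional clustering.

    mask_block is a list of rows; 1 marks a bin that was kept and
    0 marks a bin that was removed (a spectral gap).
    """
    total = sum(len(row) for row in mask_block)
    removed = sum(row.count(0) for row in mask_block)
    return removed / total


def choose_divergences(mask, block_len, threshold=0.5):
    """Pick a divergence for each block of `block_len` time frames.

    mask[f][t] is the binary mask at frequency bin f and time frame t.
    Blocks with many gaps use Euclidean distance (better extrapolation);
    blocks with few gaps use KL divergence (better separation).
    """
    n_frames = len(mask[0])
    choices = []
    for start in range(0, n_frames, block_len):
        block = [row[start:start + block_len] for row in mask]
        if gap_rate(block) > threshold:
            choices.append("euclidean")
        else:
            choices.append("kl")
    return choices


# Toy mask with 2 frequency bins and 4 frames: few gaps early, many gaps late.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
choices = choose_divergences(mask, block_len=2)
print(choices)  # prints ['kl', 'euclidean']
```

Because each decision depends only on the current block, the rule can run online as new frames arrive.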
Depth Estimation of Sound Images Using Directional Clustering and Activation...奈良先端大 情報科学研究科
The document proposes two methods for estimating the depth of sound images using direction of arrival (DOA) information extracted from stereo signals.
Method 1 estimates depth based on the shape of the DOA distribution modeled using a generalized Gaussian distribution (GGD), where a smoother distribution indicates a source is farther away.
Method 2 applies activation-shared nonnegative matrix factorization (NMF) to extract features while maintaining directional information and reducing the noise that interferes with DOA estimation. Conventional NMF generates artificial fluctuations, while shared activation preserves source direction.
An experiment calculates the correlation between estimated and reference depths on 6 datasets containing varying source combinations. Results show the proposed method achieves high correlation, confirming its efficacy over conventional methods.
Deep Learning Based Voice Activity Detection and Speech Enhancement (NAVER Engineering)
The document summarizes speech recognition front-end technologies including voice activity detection (VAD) and speech enhancement. It discusses conventional signal processing based approaches and more recent deep learning based methods. For VAD, it describes adaptive context attention models that can dynamically adjust the context used based on noise type and SNR. For speech enhancement, it proposes a two-step neural network approach consisting of a prior network that makes multiple predictions from noisy features and a post network that combines these using a boosting method to produce enhanced features, allowing end-to-end training without an explicit masking step. The approach aims to better exploit neural network modeling power while reducing computation cost compared to conventional methods or single-step deep learning frameworks.
The document discusses adaptive channel equalization using neural networks. It provides an overview of neural networks and their application to channel equalization. Specifically, it summarizes various neural network architectures that have been used for equalization, including multilayer perceptrons, functional link artificial neural networks, Chebyshev neural networks, and radial basis function networks. It compares the bit error rate performance of these different neural network equalizers with traditional linear equalizers such as LMS and RLS. Overall, the document finds that neural network equalizers can better handle nonlinear channel distortions compared to linear equalizers and that radial basis function networks provide particularly good performance for channel equalization applications.
Isolated words recognition using mfcc, lpc and neural network (eSAT Journals)
Abstract: Automatic speech recognition is an important topic in speech processing. This paper presents the use of an Artificial Neural Network (ANN) for isolated word recognition. Pre-processing is performed, and voiced speech is detected based on energy and zero crossing rate (ZCR). The proposed approach uses Mel Frequency Cepstral Coefficients (MFCC) alone and combined features of both MFCC and Linear Predictive Coding (LPC). A back-propagation network is used as the classifier. Recognition accuracy increases when the combined LPC and MFCC features are used, compared with the MFCC-only approach with the same neural network classifier. Keywords: Pre-processing, Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), Artificial Neural Network (ANN).
This document proposes a speaker-dependent WaveNet vocoder to generate high-quality speech from acoustic features. It uses a WaveNet model conditioned on mel-cepstral coefficients and fundamental frequency to directly model the relationship between acoustic features and speech waveforms. Evaluation shows the proposed method improves sound quality over traditional vocoders, as measured by objective metrics and subjective listening tests. Future work will apply this approach to other tasks and make the model independent of individual speakers.
Voice Activity Detection using Single Frequency Filtering (Tejus Adiga M)
1. Voice Activity Detection (VAD) aims to locate speech segments in an input signal corrupted by noise by classifying frames as speech or noise.
2. Several time domain and frequency domain algorithms are discussed for VAD, including short-term energy, zero crossing rate, frequency subband distance measure, and long-term spectral flatness measure.
3. Single frequency filtering is also described, which analyzes envelopes at discrete frequencies to classify frames based on a learned noise floor.
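Two of the time-domain features listed above, short-term energy and zero crossing rate, can be computed per frame as in the sketch below. It is a minimal illustration of an energy-threshold detector, not the single frequency filtering method; the frame length, the threshold, the synthetic signal, and all function names are hypothetical.

```python
import math


def frame_signal(signal, frame_len):
    """Split a signal into consecutive non-overlapping frames (tail is dropped)."""
    return [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_vad(signal, frame_len, threshold):
    """Mark a frame as active when its short-term energy exceeds the threshold."""
    return [short_term_energy(f) > threshold for f in frame_signal(signal, frame_len)]


# Synthetic input: low-level alternating noise, then a tone, then noise again.
noise = [0.001 * (-1) ** n for n in range(80)]
tone = [0.5 * math.sin(2 * math.pi * n / 20) for n in range(80)]
signal = noise + tone + noise

decisions = energy_vad(signal, frame_len=40, threshold=0.01)
frames = frame_signal(signal, 40)
print(decisions)
print(zero_crossing_rate(frames[0]), zero_crossing_rate(frames[2]))
```

In this toy signal the noise frames have low energy but a high zero crossing rate, while the tone frames show the opposite pattern, which is why the two features are often used together.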
Over the past few months, I spent some time on the subject of "In-Band Full-Duplex" wireless communications, because it is related to the echo cancellation work I did many years ago. I put together a presentation on the subject to share with people who have the same interest.
Note that what is shown in the presentation is based on my experience and reflects my understanding and opinions. As usual, my opinions are generally conservative and may or may not be totally correct. Please let me know if you have any comments. Thanks.
The document discusses squarer-based timing recovery techniques for digital communications. It involves squaring the received signal to extract the periodic component that is related to the symbol timing. For PAM signals, squaring produces a signal with a frequency of 1/2T that can be used to generate a sampling clock of 1/T. For QAM signals, the envelope is squared. This technique extracts the timing information from the signal transitions but does not necessarily lock to the optimal timing phase. Performance depends on the signal bandwidth around the transitional regions.
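Presenter: 배재성 (master's student, KAIST)
Presentation date: October 2018
Deep learning methods have recently achieved remarkable results on a variety of speech recognition tasks. In particular, approaches based on convolutional neural networks (CNNs) capture local features effectively, so they are widely used for tasks with relatively short temporal dependencies, such as spoken keyword recognition and phoneme-level recognition. However, CNNs have the limitation that they do not consider the spatial relationships among low-level features. To overcome this, we introduced a capsule network architecture that takes into account the spatial relationships among features extracted from speech spectrograms. We compared its performance with a CNN on Google's spoken-word dataset and obtained notable improvements in both clean and noisy environments.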
This document discusses the process of sampling in signal processing. It defines key terms like analog and digital signals, sampling frequency, and samples. It explains how sampling works by taking regular measurements of a continuous signal's amplitude over time. This converts it into a discrete-time signal. It discusses applications of sampling like audio sampling, where signals are typically sampled above 20 kHz. It also discusses video sampling rates and speech sampling rates. The document contains examples and diagrams to illustrate these concepts.
This document summarizes adaptive signal processing techniques for acoustic echo cancellation. It defines acoustic echo as sound from a loudspeaker picked up by a microphone in the same room. Acoustic echo cancellation uses an adaptive filter to model the echo path and subtract the predicted echo from the microphone signal. The document reviews common adaptive algorithms for echo cancellation, including LMS, NLMS, RLS, APA, FAP, and VSS-APA, comparing their convergence speed, complexity, and performance in different noise conditions. FAP provides faster convergence than NLMS for speech signals while having lower complexity than APA. VSS-APA uses variable step sizes to improve performance during double-talk and under-modeling scenarios.
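As an illustration of the adaptive-filter idea, the sketch below implements a plain NLMS echo canceller: it models the echo path with an FIR filter, subtracts the predicted echo from the microphone signal, and updates the filter from the residual. The step size, the three-tap toy echo path, the noise-free microphone signal, and the function name `nlms_echo_canceller` are assumptions made for this example.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps, step=0.5, eps=1e-8):
    """Estimate the echo path with NLMS and return (weights, residual errors).

    far_end: loudspeaker samples; mic: microphone samples containing the echo.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # residual after echo removal
        norm = sum(h * h for h in history) + eps  # input power normalization
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        errors.append(error)
    return weights, errors


# Toy setup: white far-end signal and a short, hypothetical echo path.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

weights, errors = nlms_echo_canceller(far_end, mic, num_taps=3)
print([round(w, 3) for w in weights])
```

With near-end speech or background noise added to `mic`, the residual would no longer decay to zero, which is the double-talk situation that variable step-size variants are meant to handle.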
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
Fumiya Hasuike, Daichi Kitamura, and Rui Watanabe,"DNN-based frequency-domain permutation solver for multichannel audio source separation," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 872–877, Chiang Mai, Thailand, November 2022.
Online divergence switching for superresolution-based nonnegative matrix fact...Daichi Kitamura
This document describes a proposed method for online divergence switching in a hybrid music source separation system. The hybrid system uses directional clustering for spatial separation followed by supervised nonnegative matrix factorization (SNMF) for spectral separation. The optimal divergence for SNMF depends on the amount of spectral gaps ("chasms") caused by directional clustering, with KL-divergence preferred for many chasms and Euclidean distance preferred when chasms are few. The proposed method divides the online spectrogram into blocks and selects the optimal divergence for each block based on its chasm rate, allowing real-time adaptation to achieve high separation accuracy for any source spatial conditions. Experiments show the proposed method outperforms using a single divergence.
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi'an, China, September 2016.
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
The document proposes a new supervised nonnegative matrix factorization (SNMF) method and hybrid method for multichannel signal separation. It analyzes the optimal divergence criterion for the SNMF with spectrogram restoration ability. The key points are:
1. A generalized cost function is introduced to extend SNMF to optimize the divergence criterion.
2. Theoretical analysis based on a data generation model finds the optimal divergence for basis extrapolation in spectrogram restoration is around Euclidean distance.
3. Experiments show the proposed hybrid method using Euclidean distance outperforms other methods for both instantaneous mixtures and real recordings, achieving the best separation quality measured by signal-to-distortion ratio.
Depth estimation of sound images using directional clustering and activation-...Daichi Kitamura
Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, "Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.437-440, Hawaii, USA, March 2014 (Student Paper Award).
Blind Source Separation using Dictionary LearningDavide Nardone
The sparse decomposition of images and signals found great use in the field of: Compression, Noise removal and also in the Sources separation. This implies the decomposition of signals in the form of linear combinations with some elements of a redundant dictionary. The dictionary may be either a fixed dictionary (Fourier, Wavelet, etc) or may be learned from a set of samples. The algorithms based on learning the dictionary can be applied to a broad class of signals and have a better compression performance than methods based on fixed dictionary. Here we present a Compressed Sensing (CS) approach with an adaptive dictionary for solving a Determined Blind Source Separation (DBSS). The proposed method has been developed by reformulating a DBSS as Sparse Coding (SC) problem. The algorithm consist of few steps: Mixing matrix estimation, Sparse source separation and Source reconstruction. A sparse mixture of the original source signals has been used for the estimating the mixing matrix which have been used for the reconstruction of the of the source signals. A 'block signal representation' is used for representing the mixture in order to greatly improve the computation efficiency of the 'mixing matrix estimation' and the 'signal recovery' processes without particularly lose separation accuracy. Some experimental results are provided to compare the computation and separation performance of the method by varying the type of the dictionary used, be it fixed or an adaptive one. Finally a real case of study in the field of the Wireless Sensor Network (WSN) is illustrated in which a set of sensor nodes relay data to a multi-receiver node. Since more nodes transmits messages simultaneously it's necessary to separate the mixture of information at the receiver, thus solving a BSS problem.
Superresolution-based stereo signal separation via supervised nonnegative mat...Daichi Kitamura
This document presents a new method called regularized superresolution-based nonnegative matrix factorization (NMF) for multichannel music signal separation. The proposed method addresses limitations of existing approaches like directional clustering and penalized supervised NMF. It utilizes directional clustering to separate sources by direction, then applies superresolution-based NMF to extrapolate missing components from the target source using supervised bases, regularized by an index matrix. An evaluation compares this hybrid approach to other methods, finding it achieves higher source separation quality in terms of SDR, SIR and SAR scores.
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...奈良先端大 情報科学研究科
This document summarizes a research presentation on an online divergence switching method for a hybrid source separation technique. The hybrid method combines directional clustering for spatial separation with supervised nonnegative matrix factorization (SNMF) for spectral separation. The proposed method switches between KL divergence and Euclidean distance for the SNMF, depending on the amount of spectral gaps from the directional clustering. When there are many gaps, Euclidean distance is better for basis extrapolation. When gaps are fewer, KL divergence gives better separation. In experiments, the proposed online switching method outperformed using only one divergence, achieving higher signal-to-distortion ratios for music source separation.
Depth Estimation of Sound Images Using Directional Clustering and Activation...奈良先端大 情報科学研究科
The document proposes two methods for estimating the depth of sound images using direction of arrival (DOA) information extracted from stereo signals.
Method 1 estimates depth based on the shape of the DOA distribution modeled using a generalized Gaussian distribution (GGD), where a smoother distribution indicates a source is farther away.
Method 2 applies activation-shared nonnegative matrix factorization (NMF) to extract features while maintaining directional information and reduce noise interfering with DOA estimation. Conventional NMF generates artificial fluctuations, while shared activation preserves source direction.
An experiment calculates the correlation between estimated and reference depths from 6 datasets containing varying source combinations. Results show the proposed method achieves high correlation, confirming its efficacy over conventional methods
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
The document summarizes speech recognition front-end technologies including voice activity detection (VAD) and speech enhancement. It discusses conventional signal processing based approaches and more recent deep learning based methods. For VAD, it describes adaptive context attention models that can dynamically adjust the context used based on noise type and SNR. For speech enhancement, it proposes a two-step neural network approach consisting of a prior network that makes multiple predictions from noisy features and a post network that combines these using a boosting method to produce enhanced features, allowing end-to-end training without an explicit masking step. The approach aims to better exploit neural network modeling power while reducing computation cost compared to conventional methods or single-step deep learning frameworks.
The document discusses adaptive channel equalization using neural networks. It provides an overview of neural networks and their application to channel equalization. Specifically, it summarizes various neural network architectures that have been used for equalization, including multilayer perceptrons, functional link artificial neural networks, Chebyshev neural networks, and radial basis function networks. It compares the bit error rate performance of these different neural network equalizers with traditional linear equalizers such as LMS and RLS. Overall, the document finds that neural network equalizers can better handle nonlinear channel distortions compared to linear equalizers and that radial basis function networks provide particularly good performance for channel equalization applications.
Isolated words recognition using mfcc, lpc and neural networkeSAT Journals
Abstract: Automatic speech recognition is an important topic in speech processing. This paper presents the use of an Artificial Neural Network (ANN) for isolated word recognition. After pre-processing, voiced speech is detected based on energy and zero crossing rates (ZCR). The proposed approach uses Mel Frequency Cepstral Coefficients (MFCC) alone and combined features of MFCC and Linear Predictive Coding (LPC). A back-propagation network is used as the classifier. The recognition accuracy increases when the combined LPC and MFCC features are used, compared with the MFCC-only approach with the same neural network classifier. Keywords: Pre-processing, Mel Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding (LPC), Artificial Neural Network (ANN).
This document proposes a speaker-dependent WaveNet vocoder to generate high-quality speech from acoustic features. It uses a WaveNet model conditioned on mel-cepstral coefficients and fundamental frequency to directly model the relationship between acoustic features and speech waveforms. Evaluation shows the proposed method improves sound quality over traditional vocoders, as measured by objective metrics and subjective listening tests. Future work will apply this approach to other tasks and make the model independent of individual speakers.
Voice Activity Detection using Single Frequency FilteringTejus Adiga M
1. Voice Activity Detection (VAD) aims to locate speech segments in an input signal corrupted by noise by classifying frames as speech or noise.
2. Several time domain and frequency domain algorithms are discussed for VAD, including short-term energy, zero crossing rate, frequency subband distance measure, and long-term spectral flatness measure.
3. Single frequency filtering is also described, which analyzes envelopes at discrete frequencies to classify frames based on a learned noise floor.
During the past few months, I spent some time on the subject of "In-Band Full-Duplex" wireless communications, because it is related to echo cancellation, which I worked on many years ago. I put together a presentation on this subject to share with people who have the same interest.
It should be noted that what is shown in the presentation is based on my experience and reflects my understanding and opinions. As usual, my opinions are in general conservative and may or may not be totally correct. Please let me know if you have any comments. Thanks.
The document discusses squarer-based timing recovery techniques for digital communications. It involves squaring the received signal to extract the periodic component that is related to the symbol timing. For PAM signals, squaring produces a signal with a frequency of 1/2T that can be used to generate a sampling clock of 1/T. For QAM signals, the envelope is squared. This technique extracts the timing information from the signal transitions but does not necessarily lock to the optimal timing phase. Performance depends on the signal bandwidth around the transitional regions.
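Presenter: 배재성 (master's student, KAIST)
Presentation date: October 2018
Recently, deep-learning-based methods have achieved remarkable results in a variety of speech recognition tasks. In particular, approaches based on convolutional neural networks (CNNs) can effectively capture local features, so they are widely used in tasks with relatively short temporal dependencies, such as spoken keyword recognition and phoneme-level recognition. However, CNNs have the limitation that they do not consider the spatial relationships among low-level features. To overcome this, we introduced a capsule network architecture that takes into account the spatial relationships among features extracted from speech spectrograms. We compared its performance with that of a CNN on the Google speech commands dataset and obtained notable improvements in both clean and noisy environments.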
This document discusses the process of sampling in signal processing. It defines key terms like analog and digital signals, sampling frequency, and samples. It explains how sampling works by taking regular measurements of a continuous signal's amplitude over time. This converts it into a discrete-time signal. It discusses applications of sampling like audio sampling, where signals are typically sampled above 20 kHz. It also discusses video sampling rates and speech sampling rates. The document contains examples and diagrams to illustrate these concepts.
This document summarizes adaptive signal processing techniques for acoustic echo cancellation. It defines acoustic echo as sound from a loudspeaker picked up by a microphone in the same room. Acoustic echo cancellation uses an adaptive filter to model the echo path and subtract the predicted echo from the microphone signal. The document reviews common adaptive algorithms for echo cancellation, including LMS, NLMS, RLS, APA, FAP, and VSS-APA, comparing their convergence speed, complexity, and performance in different noise conditions. FAP provides faster convergence than NLMS for speech signals while having lower complexity than APA. VSS-APA uses variable step sizes to improve performance during double-talk and under-modeling scenarios.
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
Fumiya Hasuike, Daichi Kitamura, and Rui Watanabe,"DNN-based frequency-domain permutation solver for multichannel audio source separation," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 872–877, Chiang Mai, Thailand, November 2022.
Applications of ann_in_microwave_engineeringprasadhegdegn
The document summarizes applications of artificial neural networks (ANNs) in microwave engineering. ANNs can be applied when problems are poorly understood, observations are difficult to obtain, or systems are complex and nonlinear. Some applications discussed include modeling of smart antennas, rectangular patch antennas, and demand node concepts for mobile network planning and optimization. Future trends may include using ANNs to improve antenna design and electromagnetic analysis algorithms.
Suppression of noise in a noisy speech signal is required in many speech enhancement applications, such as signal recording and transmission from one place to another. In this paper, a novel single-line noise cancellation system is proposed using a derivative of the normalized least mean square algorithm. The proposed system has two phases. The first phase generates a secondary reference signal from the incoming primary signal itself, during the initial silence period and the pauses between words, which is essential when an adaptive filter is used as a noise canceller. The second phase is noise cancellation using the proposed modified error data normalized step size (EDNSS) algorithm. The performance of the proposed algorithm is compared with the normalized least mean square (NLMS) algorithm and the original EDNSS algorithm using a standard IEEE sentence (SP23) of the Noizeus database with different types of real-world noise at different signal-to-noise ratio (SNR) levels. The outputs of the proposed, NLMS, and EDNSS algorithms are evaluated in terms of output SNR, excess mean square error (EMSE), and misadjustment (M). The results clearly illustrate that the proposed algorithm gives improved results over the conventional NLMS and EDNSS algorithms. The speed of convergence is also maintained at the same level as the conventional NLMS algorithm.
This document discusses key concepts in CDMA mobile communication systems including spread spectrum principles, CDMA, power control, rake receivers, and handoff. It explains that CDMA allows multiple users to occupy the same frequency band by using different spreading codes. The capacity of a CDMA system depends on the processing gain which provides an interference margin. Power control is important to maintain sufficient signal to interference ratios and maximize capacity. Rake receivers combine signal replicas from multipath propagation. Soft handoff allows communication between a mobile and multiple base stations for diversity.
This document discusses key concepts in CDMA mobile communication systems including spread spectrum principles, CDMA, power control, rake receivers, and handoff. It explains that CDMA allows multiple users to occupy the same frequency band by using different spreading codes. The capacity of a CDMA system depends on the processing gain which provides an interference margin. Power control is important to maintain adequate signal to interference ratios and maximize capacity. Rake receivers combine signal replicas from multipath propagation. Soft handoff allows communication between a mobile and multiple base stations for diversity.
Design of dfe based mimo communication system for mobile moving with high vel...Made Artha
The document discusses the design of a decision feedback equalizer (DFE) based multiple-input multiple-output (MIMO) communication system for mobile receivers moving at high velocities of up to 250km/hr. It analyzes the time dispersive and frequency dispersive effects of fading channels on signals. A DFE is proposed whose weights are periodically updated using the least mean squares (LMS) algorithm based on statistical channel parameters to combat the effects of fading. Simulation results show that the proposed MIMO system with a DFE achieves bit error rates below 10-3 at signal-to-noise ratios of 4dB and 10-4 at 6dB, even when the receiver is moving at 250km/hr.
Introduction to adaptive filtering and its applications.pptdebeshidutta2
This document discusses linear filters and adaptive filters. It provides an overview of key concepts such as:
- Linear filters have outputs that are linear functions of their inputs, while adaptive filters can adjust their parameters over time based on the input signals.
- The Wiener filter and LMS algorithm are introduced as approaches for optimal and adaptive filter design, with the LMS algorithm minimizing the mean square error using gradient descent.
- Applications of adaptive filters include system identification, inverse modeling, prediction, and interference cancellation. An example of acoustic echo cancellation is described.
- The document outlines the LMS adaptive algorithm steps and discusses its stability and convergence properties. It also summarizes different equalization techniques for mitigating inter
This document discusses sampling and quantization in digital communication. It introduces sampling as the process of converting a continuous-time analog signal into a discrete-time signal by taking samples at regular intervals. The sampling theorem states that if a signal is sampled at least twice the maximum frequency of the signal, it can be reconstructed without distortion. Quantization is the process of converting the discrete-time continuous amplitude samples into discrete amplitude values. The document covers topics such as the Nyquist rate, aliasing, ideal sampling, and methods of sampling like impulse sampling.
1) Diversity techniques like spatial diversity using multiple antennas can help reduce small-scale fading of signals by exploiting the random nature of mobile radio channels. If one signal path undergoes a deep fade, another independent path may still have a strong signal.
2) Selection diversity selects the branch with the highest instantaneous SNR. Maximal ratio diversity weights and combines signals from multiple branches optimally.
3) Time diversity transmits information repeatedly at time spacings greater than the coherence time so signals experience independent fading. This improves performance but reduces bandwidth efficiency. The RAKE receiver in CDMA systems provides a form of time diversity using multiple delayed signal paths.
This document describes an audio compression system using discrete wavelet transform and a psychoacoustic model. The system analyzes audio signals using wavelet decomposition, applies a psychoacoustic model to estimate masking thresholds, quantizes coefficients below the thresholds, and encodes the results. Evaluation showed the system achieved over 50% bit rate reduction with transparent quality on music signals like violin, drums, piano and vocals based on subjective listening tests.
The document discusses various topics related to digital communication systems including:
- Advantages of digital over analog communication systems such as noise immunity and easier implementation of error control coding.
- The process of analog to digital conversion including sampling, quantization, encoding, and pulse code modulation (PCM).
- Digital modulation techniques like differential PCM (DPCM) and delta modulation (DM) that reduce redundancy before encoding.
- Considerations for line coding binary data onto an analog channel such as bandwidth, noise immunity, power efficiency and self-clocking capability.
Adaptive Channel Equalization using Multilayer Perceptron Neural Networks wit...IOSRJVSP
This document presents a neural network approach to channel equalization using a multilayer perceptron with a variable learning rate parameter. Specifically, it proposes modifying the backpropagation algorithm to allow the learning rate to adapt at each iteration in order to achieve faster convergence. The equalizer structure is a decision feedback equalizer modeled as a neural network with an input, hidden and output layer. Simulation results show the proposed variable learning rate approach improves bit error rate and convergence speed compared to a standard backpropagation algorithm.
This document summarizes an OFDM channel estimation project. It discusses the objective to maximize OFDM system capacity through channel estimation and adaptive transmission. It outlines the system architecture, including the transmitter, channel, receiver, and channel estimation. It also lists the work completed, such as programs for channel impulse response, Rayleigh fading, and adding noise.
Intersymbol interference caused by multipath in band limited frequency selective time dispersive channels distorts the transmitted signal, causing bit error at receiver. ISI is the major obstacle to high speed data transmission over wireless channels. Channel estimation is a technique used to combat the intersymbol interference. The objective of this paper is to improve channel estimation accuracy in MIMO-OFDM system by using modified variable step size leaky Least Mean Square (MVSSLLMS) algorithm proposed for MIMO OFDM System. So we are going to analyze Bit Error Rate for different signal to noise ratio, also compare the proposed scheme with standard LMS channel estimation method.
This document discusses OFDM and OFDMA technologies. It begins with an outline of topics including the need for multi-carrier transmission, how OFDM addresses this need using FFT and IFFT, guard time insertion using cyclic prefixes, drawbacks of OFDM including high PAPR, channel estimation techniques, and an OFDM block diagram. It then discusses OFDMA which allows simultaneous transmissions to multiple users using OFDM signaling. Diversity techniques including time, frequency, and spatial diversity are also summarized.
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), pp. 651–658, Tokyo, Japan, December 2021.
This document summarizes a PhD candidate's research on high precision passive localization in multipath environments. It outlines the challenges of localizing radio emitters using time-of-arrival measurements in multipath conditions where line-of-sight paths may be blocked. The proposed approach models the received signals as a sparse combination of line-of-sight and non-line-of-sight propagation paths, and aims to jointly estimate the emitter locations and multipath parameters by formulating the localization problem within a compressive sensing framework. Numerical results demonstrate improvements over existing methods that do not account for multipath.
Similar to DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case (20)
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Kitamura Laboratory
Shoya Kawaguchi and Daichi Kitamura,
"Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and loudness using deep neural networks,"
Proceedings of RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2023), pp. 225–228, Honolulu, USA, March 2023.
Heart rate estimation of car driver using radar sensors and blind source sepa...Kitamura Laboratory
Keito Murata, Daichi Kitamura, Ryo Saito, and Daichi Ueki,
"Heart rate estimation of car driver using radar sensors and blind source separation,"
Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 1157–1164, Chiang Mai, Thailand, November 2022.
Linear multichannel blind source separation based on time-frequency mask obta...Kitamura Laboratory
This document proposes a new method for linear multichannel blind source separation (BSS) based on time-frequency masks obtained from harmonic/percussive sound separation (HPSS). The proposed method applies HPSS independently to temporarily estimated sources to generate harmonic and percussive masks, then smooths the masks and uses them in time-frequency masking-based BSS. Experiments show the proposed method achieves higher source separation quality than single-channel HPSS and outperforms other multichannel BSS methods, demonstrating the effectiveness of integrating HPSS with multichannel BSS.
Blind audio source separation based on time-frequency structure modelsKitamura Laboratory
Daichi Kitamura, "Blind audio source separation based on time-frequency structure models," Invited Overview Session in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), Tokyo, Japan, December 2021.
1) The document proposes a new metric to predict the accuracy of source separation by independent component analysis (ICA) using finite sample data, as directly calculating independence from theoretical distributions is not possible with limited samples.
2) An experiment shows high correlation (0.97) between the proposed metric, which calculates the squared error between sample expectations of signal sources, and actual ICA separation accuracy, allowing prediction of ICA performance before application.
3) The metric improves on existing metrics like symmetric uncertainty coefficient that rely on approximating distributions from finite bins, and enables advance assessment of ICA feasibility for problems involving mixing of multiple signal sources observed through limited sensor data.
DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case
1. DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case
Shuhei Yamaji and Daichi Kitamura
National Institute of Technology, Kagawa College, Japan
12th Asia-Pacific Signal and Information Processing Association (APSIPA)
2. Introduction
About audio source separation
Applications of audio source separation
– Speech recognition
– Noise canceling
– Voice command devices, etc.
[Figure: audio source separation splits a mixture of two overlapping utterances ("Hello…" and "Nice to meet you...") into the individual utterances]
3. Blind Source Separation
Independent component analysis (ICA) [Comon, 1994]
⁃ Assumes independence between source signals
⁃ Estimates the demixing matrix without knowing the mixing matrix
Actual audio mixing in a reverberant environment
⁃ Convolution with room impulse responses between sources and microphones
⁃ Extend ICA to the frequency domain
[Figure: source signals, mixture signals, and estimated signals]
4. Frequency-Domain ICA
Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Apply ICA in each frequency bin
[Figure: spectrogram (axes: time frame, frequency bin) with an independent ICA (ICA1, ICA2, ICA3, …) applied to each frequency bin; the frequency-wise demixing matrix is the inverse of the frequency-wise mixing matrix]
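As a rough illustration of "ICA in each frequency bin", the sketch below applies one demixing matrix per bin to a mixture spectrogram. It is not an ICA implementation: the array sizes, the random data, and the use of the true inverse of the mixing matrix are assumptions made only to show the frequency-wise structure that FDICA tries to estimate blindly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_src, n_frames = 5, 2, 100  # hypothetical sizes

# Hypothetical complex source spectrograms, shape (frequency, source, frame)
S = rng.standard_normal((n_freq, n_src, n_frames)) \
    + 1j * rng.standard_normal((n_freq, n_src, n_frames))

# Hypothetical frequency-wise mixing matrices, shape (frequency, mic, source)
A = rng.standard_normal((n_freq, n_src, n_src)) \
    + 1j * rng.standard_normal((n_freq, n_src, n_src))

# Mixture in each bin: x[f, :, t] = A[f] @ s[f, :, t]
X = np.einsum("fms,fst->fmt", A, S)

# Assumption for illustration: the ideal demixing matrix of each bin is the
# inverse of that bin's mixing matrix (ICA would estimate it without knowing A)
W = np.linalg.inv(A)

# Frequency-wise demixing: y[f, :, t] = W[f] @ x[f, :, t]
Y = np.einsum("fsm,fmt->fst", W, X)
```

Because each bin is processed independently, nothing in this structure ties the row order of one bin's demixing matrix to that of another bin.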
5. Permutation Problem in FDICA
Permutation problem in frequency-domain ICA
– The order of the separated signals is inconsistent across frequency bins
– Separated components must be aligned along the frequency axis
[Figure: sources 1 and 2 are observed as mixtures 1 and 2; FDICA over all frequency components yields non-aligned signals, and a permutation solver turns them into estimated signals 1 and 2; horizontal axis: time]
6. Conventional Permutation Solvers
Popular permutation solvers
– Based on temporal structures
• FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001]
– Based on direction of arrival (DOA)
• FDICA + DOA alignment [Saruwatari+, 2006]
– Based on a relative correlation among frequencies
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006]
– Based on a low-rank modeling of each source
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
[Figure: the frequency components of the non-aligned signals are sorted along the frequency axis; horizontal axis: time]
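To make the idea of correlation-based alignment between adjacent frequencies concrete, here is a minimal sketch for two sources. It is a simplified illustration, not the exact procedure of [Murata+, 2001]: the function name `align_adjacent_bins`, the use of amplitude envelopes as input, and the greedy bin-by-bin sweep are assumptions.

```python
import numpy as np

def align_adjacent_bins(env):
    """Greedy alignment of two separated components across frequency.

    env: array of shape (n_freq, 2, n_frames) holding amplitude envelopes.
    Returns the aligned envelopes and a 0/1 vector (1 = bin was swapped).
    """
    env = np.array(env, dtype=float)  # work on a copy
    n_freq = env.shape[0]
    swaps = np.zeros(n_freq, dtype=int)
    for f in range(1, n_freq):
        # 4x4 correlation matrix of [prev0, prev1, cur0, cur1]
        c = np.corrcoef(np.vstack([env[f - 1], env[f]]))
        keep = c[0, 2] + c[1, 3]  # current order matches the previous bin
        swap = c[0, 3] + c[1, 2]  # swapped order matches the previous bin
        if swap > keep:
            env[f] = env[f, ::-1].copy()
            swaps[f] = 1
    return env, swaps
```

Since each decision depends only on the previously aligned bin, a single wrong decision propagates to all higher bins in this sketch, which is one way such a solver can fail.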
7. Motivation of Proposed Method
Problems of conventional permutation solvers
– The correlation-based method sometimes fails to align components
– Even in IVA and ILRMA, the block permutation problem arises
Proposed method: DNN-based permutation solver
– The permutation problem can be simulated by shuffling the frequency components of source signals
– Training data for the DNN are easy to produce
[Figure: a DNN converts non-aligned signals into separated signals; horizontal axis: time]
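The claim that training data are easy to produce can be sketched as follows: two spectrogram-like arrays are swapped at randomly chosen frequency bins, and the swap pattern is kept as the ground truth. The function name `shuffle_frequency_components` and the array shapes are assumptions for illustration; the slides do not specify how the shuffling is implemented.

```python
import numpy as np

def shuffle_frequency_components(src1, src2, rng):
    """Randomly swap two signals in each frequency bin.

    src1, src2: arrays of shape (n_freq, n_frames).
    Returns the two shuffled signals and a 0/1 vector (1 = bin was swapped),
    which can serve as ground truth when building training labels.
    """
    n_freq = src1.shape[0]
    swapped = rng.integers(0, 2, size=n_freq)
    mask = swapped.astype(bool)[:, None]  # broadcast over time frames
    out1 = np.where(mask, src2, src1)
    out2 = np.where(mask, src1, src2)
    return out1, out2, swapped

# Usage with hypothetical magnitude spectrograms
rng = np.random.default_rng(0)
a = rng.random((6, 10))
b = rng.random((6, 10))
mix1, mix2, swapped = shuffle_frequency_components(a, b, rng)
```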
8. Proposed method: DNN input and label
Input and label
– Extract two short-time activations from the separated signal, one at the reference frequency and one at another frequency
– The DNN predicts whether the permutation of the two input frequencies is correct (correct = 0, incorrect = 1)
[Figure: DNN inputs (reference and another frequency) in the correct-permutation case and in the incorrect-permutation case]
10. Proposed method: DNN predictions in subband frequency bins
Apply the DNN in a subband (local time-frequency area)
– Subband: the reference (center) frequency and several neighboring frequencies
Take a majority decision along time frames
– to determine the subband permutation vector
[Figure: for each input vector in the subband, the DNN outputs 1 (different sound source) or 0 (same sound source); the outputs, e.g., 1, 1, 0, 1, 0, form the subband permutation vector]
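The majority decision along time frames can be sketched as below. The function name `subband_permutation_vector`, the 0.5 threshold on the DNN outputs, and the tie-breaking toward 0 are assumptions; the slides only state that a majority decision is taken.

```python
import numpy as np

def subband_permutation_vector(dnn_outputs, threshold=0.5):
    """Majority vote over time frames for each frequency bin of a subband.

    dnn_outputs: array of shape (n_frames, n_subband_bins) with values in
    [0, 1], where 1 means "different source than the reference frequency".
    Ties are resolved to 0 (an arbitrary choice made for this sketch).
    """
    votes = (np.asarray(dnn_outputs) >= threshold).astype(int)
    return (votes.mean(axis=0) > 0.5).astype(int)

# Hypothetical DNN outputs for 3 time frames and 5 subband bins
outputs = np.array([
    [0.9, 0.8, 0.2, 0.7, 0.1],
    [0.6, 0.9, 0.4, 0.3, 0.2],
    [0.8, 0.4, 0.1, 0.9, 0.6],
])
vector = subband_permutation_vector(outputs)  # array([1, 1, 0, 1, 0])
```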
11. Proposed method: construct a fullband permutation vector
Alignment among subbands
– When the subband slides along frequency axis, the reference
(center) frequency component changes
• The meanings of “0 (same)” and “1 (different)” labels are not
shared among subbands
– The orders of source components in all subbands must be aligned
after the DNN prediction in all subbands
12. Proposed method: construct a fullband permutation vector
Objective
– Estimate a “fullband permutation vector” that assigns the two sources to “0” and “1”
Step1
– The subband permutation vector of the lowest frequency subband is simply copied to the corresponding frequency bins of the fullband permutation vector
[Figure: the subband permutation vector of the lowest subband (e.g., 1, 1, 0, 1, 0) is set in the memory and in the fullband permutation vector; axes: time, frequency]
13. Proposed method: construct a fullband permutation vector
Step2
– Slide the subband along the frequency axis
– Obtain the subband permutation vector of the current subband and its binary complement vector
– The similarity between each of these two vectors and the fullband permutation vector is measured by mean squared error (MSE)
– Store the vector that minimizes the MSE in the memory
– Update the fullband permutation vector by taking a majority decision
[Figure: 1. similarity comparison between the subband vectors and the fullband permutation vector; 2. the selected vector is set in the memory; 3. a majority decision updates the fullband permutation vector; axes: time, frequency]
14. Proposed method: construct a fullband permutation vector
Step3
– Iterate step2 up to the highest frequency subband
– Replace the components based on the fullband permutation vector
– Obtain permutation-aligned estimated signals
[Figure: majority decision over the stored subband vectors yields the fullband permutation vector, which is used to replace the components (axes: time, frequency)]
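Steps 1 to 3 can be sketched as follows. This is an illustrative reading of the slides, not the authors' implementation: the data layout (a list of start bins and binary vectors), the requirement that consecutive subbands overlap, the tie-breaking rule, and the name `build_fullband_vector` are assumptions.

```python
import numpy as np

def build_fullband_vector(subband_vectors, n_freq):
    """Merge subband permutation vectors into one fullband vector.

    subband_vectors: list of (start_bin, binary vector), ordered from the
                     lowest to the highest subband (assumed to overlap).
    """
    votes_one = np.zeros(n_freq)   # votes for label 1 in each bin
    votes_all = np.zeros(n_freq)   # number of stored vectors covering each bin
    fullband = np.zeros(n_freq, dtype=int)

    for i, (start, vec) in enumerate(subband_vectors):
        vec = np.asarray(vec)
        sl = slice(start, start + len(vec))
        if i > 0:
            # Step 2: compare the vector and its binary complement with the
            # already-labelled part of the fullband vector using MSE.
            covered = votes_all[sl] > 0
            if covered.any():
                target = fullband[sl][covered]
                comp = 1 - vec
                mse_vec = np.mean((vec[covered] - target) ** 2)
                mse_comp = np.mean((comp[covered] - target) ** 2)
                if mse_comp < mse_vec:
                    vec = comp
        # Store the selected vector and update by majority decision
        # (for i == 0 this is Step 1: the lowest subband is simply copied).
        votes_one[sl] += vec
        votes_all[sl] += 1
        seen = votes_all > 0
        fullband[seen] = (votes_one[seen] / votes_all[seen] > 0.5).astype(int)
    return fullband

# Three overlapping subbands of width 5 whose labels are relative to their
# own center bin, so the middle one arrives inverted.
subbands = [(0, [1, 1, 0, 1, 0]),
            (1, [0, 1, 0, 1, 1]),
            (2, [0, 1, 0, 0, 1])]
fullband = build_fullband_vector(subbands, n_freq=7)
print(fullband)  # [1 1 0 1 0 0 1]
```

In this toy example the second subband is replaced by its complement because that version agrees with the bins already labelled by the first subband.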
15. Experimental conditions
Training speech signals
– Dry sources: JVS corpus [Takamichi+, 2019] (Japanese speech)
– Mixture: dry sources convolved with RWCP impulse responses [Nakamura+, 2000]
– Permutation: apply FDICA and randomly shuffle the components
Test speech signals: speech signals obtained from SiSEC2011 UND task [Araki+, 2012]
FFT length: 8192 (512 ms, Hamming window)
Shift length: 2048
Objective evaluation: average improvement of signal-to-distortion ratio (SDR)
[Figure: impulse responses used in the experiment, annotated with the reverberation time]
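The analysis parameters can be cross-checked with a little arithmetic. The sampling rate is not stated on the slide; the value below is inferred from the FFT length of 8192 samples corresponding to 512 ms, so treat it as an assumption.

```python
fft_len = 8192      # samples, from the slide
shift_len = 2048    # samples, from the slide
window_ms = 512     # window duration, from the slide

fs = fft_len * 1000 / window_ms        # inferred sampling rate in Hz
shift_ms = shift_len * 1000 / fs       # hop size in milliseconds
overlap = 1 - shift_len / fft_len      # fraction of overlap between frames
n_bins = fft_len // 2 + 1              # non-negative frequency bins of the FFT

print(fs, shift_ms, overlap, n_bins)   # 16000.0 128.0 0.75 4097
```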
16. Results
Findings
– The proposed method achieves an SDR improvement of about 8 dB
– ILRMA achieves an SDR improvement of about 4 dB
– The proposed method is close to the upper-limit performance
[Figure: bar chart of SDR improvement [dB] (higher is better) for FDICA with ideal permutation solver (reference score), ILRMA (2 bases), ILRMA (3 bases), ILRMA (4 bases), and FDICA with DNN-based permutation solver (proposed)]
17. Conclusion
In this paper
– We proposed a new DNN-based permutation solver for determined
audio source separation using FDICA
– An SDR improvement of about 8 dB was achieved in experiments
with a highly reverberant speech mixture signal
Future work
– The proposed method suffers from a combinatorial explosion for three or
more separated signals
Thank you for your attention!
Hello everyone, I’m Shuei Yamaji at National Institute of Technology, Kagawa College, Japan.
In this presentation, we talk about a DNN-based permutation solver for frequency-domain independent component analysis in the two-source mixture case.
This presentation deals with audio source separation, / which is a technique to separate sounds from a mixture signal / into individual audio sources.
This technology can be used in many audio applications, / such as / speech recognition, / noise canceling, / voice command devices, / and so on.
The popular approach for audio source separation is / independent component analysis, / ICA in short.
ICA assumes independence between sources / and estimates the demixing matrix W / without knowing the mixing matrix A.
This is represented in this figure.
The source signals, / s1 and s2, / are mixed by A, / then / observed as x1 and x2.
W / can separate the sources in x / as y1 and y2 / if W is the inverse matrix of A.
Of course, we don’t know the mixing matrix A, / so / ICA estimates W using statistical independence between sources.
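As a small numerical illustration of this model, the sketch below mixes two signals with a known matrix and demixes them with its exact inverse. This is only the ideal case: ICA itself has to estimate W without access to A, and the matrix values and signal statistics here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))      # two source signals s1, s2
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])           # mixing matrix (unknown in practice)
x = A @ s                            # observed signals x1, x2

W = np.linalg.inv(A)                 # ideal demixing matrix
y = W @ x                            # separated signals y1, y2
print(np.allclose(y, s))             # True

# Swapping the rows of W gives outputs that are just as independent,
# only in the opposite order. Independence alone cannot fix the order.
y_swapped = W[::-1] @ x
print(np.allclose(y_swapped, s[::-1]))  # True
```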
In an actual situation, audio signals are mixed with room reverberation as a convolutive mixture, / and simple ICA cannot separate them in that situation.
To solve this problem, frequency-domain ICA, / FDICA in short, / was proposed.
This figure represents the mixture signals in the time-frequency domain, / which are obtained by the short-time Fourier transform.
In FDICA, / simple ICA / is applied to each frequency bin / like this figure.
Therefore, / the demixing matrix W must be estimated in each frequency bin / to achieve the source separation.
However, / since ICA cannot determine the order of the separated signals, / the output components of FDICA are not aligned, as shown here, / and we have to re-order these separated red and blue components along the frequency axis.
This is the so-called permutation problem.
Thus, a permutation solver must be applied after FDICA as post-processing.
In this presentation, / we aim to solve the permutation problem over all frequency bins / using a new, / data-driven approach.
A major approach to solving the permutation problem / is based on temporal structures of the separated components.
We can re-order the components based on the correlation values between adjacent frequencies.
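A minimal sketch of such a correlation-based solver for two sources, assuming the separated components are given as power envelopes in an array of shape (sources, frequencies, frames). The lowest bin is taken as the reference order, and the function name is hypothetical; practical solvers are more elaborate than this.

```python
import numpy as np

def align_by_adjacent_correlation(P):
    """Re-order two components bin by bin using envelope correlation.

    P: array of shape (2, n_freq, n_frames) with power envelopes.
    Returns an aligned copy; bin 0 defines the reference order.
    """
    P = P.copy()
    for f in range(1, P.shape[1]):
        prev, cur = P[:, f - 1, :], P[:, f, :]
        c = np.corrcoef(np.vstack([prev, cur]))   # 4 x 4 correlation matrix
        keep = c[0, 2] + c[1, 3]                  # current order
        swap = c[0, 3] + c[1, 2]                  # swapped order
        if swap > keep:
            P[:, f, :] = cur[::-1].copy()
    return P

rng = np.random.default_rng(0)
env = rng.random((2, 200))                        # two distinct temporal envelopes
gains = rng.uniform(0.5, 2.0, size=(1, 6, 1))     # per-frequency gains
clean = env[:, None, :] * gains                   # shape (2, 6, 200)
mixed = clean.copy()
for f in (2, 3, 5):                               # simulate permuted bins
    mixed[:, f, :] = clean[::-1, f, :]
aligned = align_by_adjacent_correlation(mixed)
print(np.allclose(aligned, clean))                # True
```

Because each bin is compared only with its neighbor, a single wrong decision propagates to all higher bins, which is one way this kind of solver can fail.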
(leave a clear pause)
When the positions of microphones are known, / the directions of arrival of the sources / can also be utilized / for solving the permutation problem.
In recent years, / algorithms that do not encounter the permutation problem / have been proposed.
For example, both independent vector analysis, / IVA, / and independent low-rank matrix analysis, / ILRMA, / estimate the frequency-wise demixing matrices / while avoiding the permutation problem.
ILRMA is a state-of-the-art algorithm for blind audio source separation.
OK, let’s talk about our proposed method.
This slide explains our motivation.
The conventional correlation-based permutation solver / sometimes fails to align components correctly.
Even in IVA or ILRMA, / the components are sometimes misaligned in blocks, / which is called the block permutation problem, / as in this figure.
To achieve a stable and accurate permutation solver, / in this presentation, / we propose a DNN-based permutation solver, / where the training data for DNN permutation solver / can easily be obtained.
This is because the permutation problem can be simulated by randomly shuffling the frequency components of source signals.
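A minimal sketch of this simulation for two sources, assuming the source spectrograms are stored in an array of shape (sources, frequencies, frames). The function names and the way a training label is derived for a pair of bins are assumptions for illustration.

```python
import numpy as np

def shuffle_components(S, rng):
    """Randomly swap the two sources in each frequency bin.

    S: array of shape (2, n_freq, n_frames).
    Returns the shuffled array and per-bin swap flags (1 = swapped).
    """
    swapped = rng.integers(0, 2, size=S.shape[1])
    mask = swapped.astype(bool)
    X = S.copy()
    X[:, mask, :] = S[::-1, mask, :]
    return X, swapped

def pair_label(swapped, f_ref, f_other):
    """0 if both bins have the same source order, 1 if they are inverted."""
    return int(swapped[f_ref] != swapped[f_other])

rng = np.random.default_rng(0)
S = rng.random((2, 6, 4))
X, swapped = shuffle_components(S, rng)
labels = [pair_label(swapped, 3, f) for f in range(6)]
print(swapped, labels)
```

Since the swap flags are known by construction, every pair of bins comes with a ground-truth label at no annotation cost.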
In this slide, / we explain the input vector for the proposed DNN model.
In our DNN model, / first, / we extract / short-time activations at a reference frequency / and at another frequency / from the separated signal.
These activations are concatenated as a single vector like this, / and input to the DNN.
Then, / the DNN predicts whether the permutation of the two input frequencies is correct, / where “zero” means that the current permutation is correct, / and “one” means they are inverted.
In the left-side figure, / the reference frequency is red and blue, / and another frequency is also red and blue.
So, / the current permutation is correct, / and its label should be zero.
In the right-side figure, / the reference frequency is red and blue, / but another frequency is blue and red.
Therefore, / the current permutation is wrong, / and its label should be one.
This figure depicts the architecture of the DNN used in the proposed permutation solver.
This DNN model has 6 fully connected hidden layers, / and its structure is very simple.
Hereafter, / we consider the process in a short-time subband, / where the subband consists of the reference frequency plus or minus several frequencies.
In the proposed method, / we perform the DNN-based permutation prediction for all the combinations of the reference frequency and another frequency, / where the reference frequency is fixed to the center of the subband.
In this figure, / the reference frequency is f3, / and fixed.
The other frequency is chosen from f1 to f5, / and all the combinations are input to the DNN like this.
Thus, / we obtain these DNN outputs.
Since the correct permutation / does not depend on time, / we slide this short-time subband along the time axis, / and collect DNN outputs as in this figure.
Finally, we take a majority decision with the collected DNN outputs, / and obtain a subband permutation vector.
After the estimation of subband permutation vector, / we slide the subband along the frequency axis / like this figure.
However, / since the center frequency of the subband is always set to the reference frequency, / the meanings of the labels “zero” and “one” are not shared / among subbands.
This is because the DNN outputs only indicate whether / the components at the reference frequency and at another frequency are the same or different.
For this reason, / even if the subband components are aligned by the subband permutation vector, / the order of sources / could be different among the subbands / like this figure.
To solve this problem, it is necessary to unify the results for all the subband vectors, / for example, / 0 indicates a red source and 1 indicates a blue source in all the subbands.
This label unification / can be achieved by the following 3 steps.
The objective of the following steps is / to estimate a fullband permutation vector, / which maps the red and blue sources / to “zero” and “one,” respectively.
In the first step, / as shown in this figure, / the subband permutation vector in the lowest subband is simply set to the corresponding frequency bins / in the fullband permutation vector.
In step 2, / we slide the subband from the previous one / and obtain the subband permutation vector in that subband.
We also calculate the binary complement vector of the subband permutation vector / like this.
These two vectors are compared with the corresponding part of the fullband vector using the mean squared error, / then the vector that minimizes the error is selected and stored in the memory.
The fullband permutation vector is updated by taking a majority decision / using the vectors stored in the memory.
By repeating step 2 up to the highest subband, the complete fullband vector can be obtained.
Finally, / the permutation problem can be solved by replacing the frequency-wise source components based on the estimated fullband vector.
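This final replacement step might look like the sketch below for two sources. It assumes the separated components are stored in an array of shape (sources, frequencies, frames) and that a label of 1 means the two components in that bin are swapped relative to the bins labelled 0; both conventions and the function name are assumptions.

```python
import numpy as np

def apply_fullband_vector(Y, fullband):
    """Swap the two components in every bin whose label is 1.

    Y: array of shape (2, n_freq, n_frames) with separated components.
    fullband: binary vector of length n_freq.
    """
    swap = np.asarray(fullband, dtype=bool)
    Y_aligned = Y.copy()
    Y_aligned[:, swap, :] = Y[::-1, swap, :]
    return Y_aligned

rng = np.random.default_rng(0)
Y = rng.random((2, 5, 3))
fullband = np.array([0, 1, 1, 0, 1])
Y_aligned = apply_fullband_vector(Y, fullband)

# Bins labelled 1 are exchanged, bins labelled 0 are left untouched.
print(np.array_equal(Y_aligned[0, 1], Y[1, 1]))   # True
print(np.array_equal(Y_aligned[:, 0], Y[:, 0]))   # True
```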
Let’s move on / to the experiments.
This table shows the conditions.
In this experiment, / as a training dataset, / we used the JVS corpus, / which is a Japanese speech dataset, / as dry sources, / and we mixed them using impulse responses.
The permutation problem is simulated by randomly shuffling the frequency-wise components of the sources.
The test speech dataset is obtained from the SiSEC UND task.
The bottom figure shows the impulse responses / used in this experiment, / where the reverberation time is 470 ms.
Here is the result of the experiment.
The vertical axis shows an average SDR improvement, / which shows the accuracy of the source separation.
The leftmost one is FDICA with an ideal permutation solver, / namely, / the permutation is perfectly solved by using the completely separated source signals.
So, this is an upper-bound score of the FDICA-based methods.
ILRMA is the state-of-the-art blind source separation method.
Since the reverberation time is long in this experiment, / the performance of ILRMA is not so high.
The rightmost one is our proposed method, / where the DNN-based permutation solver is applied after FDICA.
The proposed method achieves about 8 dB improvement in SDR, / which is close to the upper limit.
This is the conclusion.
Thank you for your attention.