Shuhei Yamaji and Daichi Kitamura, "DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2020), pp. 781–787, Auckland, New Zealand, December 2020.

- 1. DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case. Shuhei Yamaji and Daichi Kitamura, National Institute of Technology, Kagawa College, Japan. 12th Asia-Pacific Signal and Information Processing Association (APSIPA)
- 2. Introduction. About audio source separation. Applications of audio source separation – Speech recognition – Noise canceling – Voice command devices, etc. [Figure: a mixture of two voices ("Hello…" and "Nice to meet you…") separated into the individual voices by audio source separation]
- 3. Blind Source Separation. Independent component analysis (ICA) [Comon, 1994] – Assumes independence between source signals – Estimates the demixing matrix without knowing the mixing matrix. Actual audio mixing in a reverberant environment – Convolution with the room impulse responses between sources and mics – Extend ICA to the frequency domain. [Figure: source signals → mixture signals → estimated signals]
- 4. Frequency-Domain ICA. Frequency-domain ICA (FDICA) [Smaragdis, 1998] – Apply ICA in each frequency bin. [Figure: spectrogram (frequency bin × time frame) with ICA1, ICA2, ICA3, … applied to each frequency bin; the frequency-wise demixing matrix is the inverse of the frequency-wise mixing matrix]
- 5. Permutation Problem in FDICA. Permutation problem in frequency-domain ICA – The order of the separated signals differs from one frequency bin to another – Separated components must be aligned along the frequency axis. [Figure: sources 1 and 2 → observed signals 1 and 2 → FDICA on all frequency components → non-aligned estimated signals 1 and 2 → permutation solver]
- 6. Conventional Permutation Solvers. Popular permutation solvers – Based on temporal structures • FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001] – Based on direction of arrival (DOA) • Frequency-domain ICA + DOA alignment [Saruwatari+, 2006] – Based on relative correlation among frequencies • Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006] – Based on low-rank modeling of each source • Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]. [Figure: non-aligned signals sorted into aligned ones]
- 7. Motivation of Proposed Method. Problems of conventional permutation solvers – The correlation-based method sometimes fails to align components – Even in IVA and ILRMA, the block permutation problem arises. Proposed method: DNN-based permutation solver – The permutation problem can be simulated by shuffling the frequency components of source signals – Training data for the DNN are therefore easy to produce. [Figure: non-aligned separated signals aligned by the DNN]
- 8. Proposed method: DNN input and label. Input and label – Extract two short-time activations, of a reference frequency and another frequency, from the separated signal – The DNN predicts whether the permutation between the two input frequencies is correct (correct = 0, incorrect = 1). [Figure: correct and incorrect permutation cases, each feeding a reference and another frequency to the DNN]
- 9. Proposed method: DNN Architecture. Simple model – 6 fully connected hidden layers with ReLU or sigmoid activations – Input layer (160 units) → hidden layers 1–3 (128 units each, ReLU) → hidden layers 4–5 (64 units each, ReLU) → hidden layer 6 (1 unit, sigmoid) → output layer (1 unit) – Trained to minimize the MSE against the target label (0 or 1)
- 10. Proposed method: DNN predictions in subband frequency bins. Apply the DNN in a subband (a local time-frequency area) – Subband: the reference (center) frequency ± several frequencies. Take a majority decision along time frames – to determine the subband permutation vector. [Figure: input vectors and DNN outputs (0: same sound source, 1: different sound source) combined into the subband permutation vector]
- 11. Proposed method: constructing a fullband permutation vector. Alignment among subbands – When the subband slides along the frequency axis, the reference (center) frequency changes • Hence the meanings of the labels “0 (same)” and “1 (different)” are not shared among subbands – The order of the source components must be aligned across all subbands after the DNN prediction in each subband
- 12. Proposed method: constructing a fullband permutation vector. Objective – Estimate a “fullband permutation vector” that maps the two sources to “0” and “1”. Step 1 – The subband permutation vector of the lowest-frequency subband is simply copied into the corresponding frequency bins of the fullband permutation vector. [Figure: time-frequency plane; the lowest subband vector (e.g., 1 1 0 1 0) is set into the fullband permutation vector]
- 13. Proposed method: constructing a fullband permutation vector. Step 2 – Slide the subband along the frequency axis – Obtain the subband permutation vector of the current subband and its binary complement – Measure the similarity between each of them and the fullband permutation vector by the mean squared error (MSE) – Store the vector that minimizes the MSE in memory – Update the fullband permutation vector by a majority decision. [Figure: 1. similarity comparison, 2. set, 3. majority decision on the fullband permutation vector]
- 14. Proposed method: constructing a fullband permutation vector. Step 3 – Iterate Step 2 up to the highest-frequency subband – Replace the components based on the fullband permutation vector – Obtain permutation-aligned estimated signals. [Figure: a majority decision over the stored subband vectors yields the fullband vector, which is used to replace components in the time-frequency plane]
- 15. Experimental conditions. Training speech signals – Dry sources: JVS corpus [Takamichi+, 2019] (Japanese speech) – Mixture: dry sources convolved with RWCP impulse responses [Nakamura+, 2000] – Permutation: simulated by applying FDICA and randomly shuffling the components. Test speech signals – Obtained from the SiSEC2011 UND task [Araki+, 2012]. FFT length: 8192 samples (512 ms, Hamming window). Shift length: 2048 samples. Objective evaluation: average improvement of the signal-to-distortion ratio (SDR). [Figure: impulse responses; reverberation time 470 ms]
- 16. Results. Findings – The proposed method achieves an SDR improvement of about 8 dB – ILRMA's SDR improvement is about 4 dB – The proposed method is close to the upper-limit performance. [Figure: bar chart of SDR improvement [dB] (0–12 dB, higher is better) for FDICA with ideal permutation solver (IPS, reference score), ILRMA (2, 3, and 4 bases), and FDICA with the DNN-based permutation solver (proposed)]
- 17. Conclusion. In this paper – We proposed a new DNN-based permutation solver for determined audio source separation using FDICA – An SDR improvement of about 8 dB was achieved in experiments with a highly reverberant speech mixture. Future work – For three or more separated signals, the proposed method faces a combinatorial explosion in the number of possible permutations. Thank you for your attention!
- 18. Demonstration. Audio examples: original, mixture, FDICA with IPS, FDICA with the proposed method

- Hello everyone, I’m Shuhei Yamaji from the National Institute of Technology, Kagawa College, Japan. In this presentation, we talk about a DNN-based permutation solver for frequency-domain independent component analysis in the two-source mixture case.
- This presentation deals with audio source separation, which is a technique to separate a mixture signal into individual audio sources. This technology can be used in many audio applications, such as speech recognition, noise canceling, voice command devices, and so on.
- A popular approach for audio source separation is independent component analysis, ICA in short. ICA assumes independence between sources and estimates a demixing matrix W without knowing the mixing matrix A. This is illustrated in this figure. The source signals, s1 and s2, are mixed by A, then observed as x1 and x2. W can separate the sources in x into y1 and y2 if W is the inverse matrix of A. Of course, we don’t know the mixing matrix A, so ICA estimates W using the statistical independence between sources. In actual situations, audio signals are mixed with room reverberation as a convolutive mixture, and simple ICA cannot separate them in that situation. To solve this problem, frequency-domain ICA, FDICA in short, was proposed.
- This figure represents the mixture signals in the time-frequency domain, which are obtained by the short-time Fourier transform. In FDICA, simple ICA is applied to each frequency bin, as shown in this figure. Therefore, the demixing matrix W must be estimated in each frequency bin to achieve source separation.
- However, since ICA cannot determine the order of the separated signals, the output components of FDICA are not aligned, like this, and we have to re-order these separated red and blue components along the frequency axis. This is the so-called permutation problem. Thus, a permutation solver must be applied after FDICA as post-processing. In this presentation, we aim to solve the permutation problem over all frequency bins using a new, data-driven approach.
- A major approach to solving the permutation problem is based on the temporal structures of the separated components. We can re-order the components based on the correlation values between adjacent frequencies. When the positions of the microphones are known, the directions of arrival of the sources can also be utilized for solving the permutation problem. In recent years, algorithms that avoid the permutation problem have been proposed. For example, both independent vector analysis, IVA, and independent low-rank matrix analysis, ILRMA, estimate the frequency-wise demixing matrices while avoiding the permutation problem. ILRMA is a state-of-the-art algorithm for blind audio source separation.
- OK, let’s talk about our proposed method. This slide explains our motivation. The conventional correlation-based permutation solver sometimes fails to align components correctly. Even in IVA or ILRMA, the components are sometimes misaligned in blocks, which is called the block permutation problem, as shown in this figure. To achieve a stable and accurate permutation solver, in this presentation we propose a DNN-based permutation solver whose training data can easily be obtained. This is because the permutation problem can be simulated by randomly shuffling the frequency components of source signals.
- In this slide, we explain the input vector for the proposed DNN model. In our DNN model, first, we extract two short-time activations, of a reference frequency and another frequency, from the separated signal. These activations are concatenated into a single vector like this and input to the DNN. Then, the DNN predicts whether the permutation between the two input frequencies is correct, where “zero” means that the current permutation is correct, and “one” means that it is inverted. In the left-side figure, the reference frequency is red and blue, and the other frequency is also red and blue. So, the current permutation is correct, and its label should be zero. In the right-side figure, the reference frequency is red and blue, but the other frequency is blue and red. Therefore, the current permutation is wrong, and its label should be one.
- This figure depicts the architecture of the DNN used in the proposed permutation solver. This DNN model has six fully connected hidden layers, and its structure is very simple.
- Hereafter, we consider the process in a short-time subband, where the subband consists of the reference frequency plus or minus several frequencies. In the proposed method, we perform the DNN-based permutation prediction for all the combinations of the reference frequency and another frequency, where the reference frequency is fixed to the center of the subband. In this figure, the reference frequency is f3 and is fixed. The other frequency is chosen from f1 to f5, and all the combinations are input to the DNN like this. Thus, we obtain these DNN outputs. Since the correct permutation does not depend on time, we slide this short-time subband along the time axis and collect the DNN outputs as shown in this figure. Finally, we take a majority decision over the collected DNN outputs and obtain a subband permutation vector.
- After estimating the subband permutation vector, we slide the subband along the frequency axis, as shown in this figure. However, since the center frequency of the subband is always used as the reference frequency, the meanings of the labels “zero” and “one” are not shared among subbands. This is because the DNN outputs only indicate whether the components of the reference frequency and the other frequency are the same or different. For this reason, even if the subband components are aligned by the subband permutation vector, the order of sources could differ among the subbands, as in this figure. To solve this problem, it is necessary to unify the results for all the subband vectors so that, for example, 0 indicates the red source and 1 indicates the blue source in all the subbands.
- This label unification can be achieved by the following three steps. The objective of these steps is to estimate a fullband permutation vector, which maps the red and blue sources to “zero” and “one,” respectively. In the first step, as shown in this figure, the subband permutation vector in the lowest subband is simply set to the corresponding frequency bins in the fullband permutation vector.
- In step 2, we slide the subband from the previous one and obtain the subband permutation vector in that subband. We also calculate the binary complement of the subband permutation vector, like this. These two vectors are compared with the corresponding part of the fullband vector using the mean squared error, and the vector that minimizes the error is selected and stored in memory. The fullband permutation vector is then updated by taking a majority decision over the vectors stored in memory.
- By repeating step 2, the complete fullband vector can be obtained. Finally, the permutation problem is solved by replacing the frequency-wise source components based on the estimated fullband vector.
- Let’s move on to the experiments. This table shows the conditions. In this experiment, as the training dataset, we used the JVS corpus, which is a Japanese speech dataset, as dry sources, and we mixed them using impulse responses. The permutation problem is simulated by randomly shuffling the frequency-wise components of the sources. The test speech dataset is obtained from the SiSEC UND task. The bottom figure shows the impulse responses used in this experiment, where the reverberation time is 470 ms.
- Here is the result of the experiment. The vertical axis shows the average SDR improvement, which indicates the accuracy of source separation. The leftmost one is FDICA with an ideal permutation solver; namely, the permutation is perfectly solved by using the completely separated source signals. So, this is an upper-bound score for the FDICA-based methods. ILRMA is the state-of-the-art blind source separation method. Since the reverberation time is long in this experiment, the performance of ILRMA is not so high. The rightmost one is our proposed method, where the DNN-based permutation solver is applied after FDICA. The proposed method achieves about 8 dB improvement in SDR, which is close to the upper limit.
- This is the conclusion. Thank you for your attention.
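The frequency-wise mixing and demixing model described in the ICA and FDICA parts of the talk can be written compactly as below. The indices are our own notation, not taken from the slides: i is the frequency bin, j is the time frame, and s_ij, x_ij, y_ij stack the two source, observed, and estimated components. The second line shows why the permutation problem arises: if W_i separates the sources, so does P_i W_i for either 2 × 2 permutation matrix, because swapping the two outputs does not change their independence.

```latex
\begin{aligned}
\mathbf{x}_{ij} &= \mathbf{A}_i\,\mathbf{s}_{ij}, \qquad
\mathbf{y}_{ij} = \mathbf{W}_i\,\mathbf{x}_{ij}, \qquad
\mathbf{W}_i \approx \mathbf{A}_i^{-1}, \\
\mathbf{y}'_{ij} &= \mathbf{P}_i \mathbf{W}_i\,\mathbf{x}_{ij}, \qquad
\mathbf{P}_i \in \left\{
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\right\}.
\end{aligned}
```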
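A minimal NumPy sketch of the FDICA front end: a multichannel STFT using the analysis settings from the experimental conditions (8192-point Hamming window, shift of 2048 samples), followed by applying one 2 × 2 demixing matrix per frequency bin. Estimating the demixing matrices themselves (the ICA step) is not shown. The helper names `stft` and `demix` and the array layout (channels, frequency bins, frames) are our assumptions. At the 16 kHz sampling rate implied by "8192 samples = 512 ms", the 2048-sample shift corresponds to 128 ms.

```python
import numpy as np

def stft(x, n_fft=8192, hop=2048):
    """Multichannel STFT with a Hamming window (hypothetical helper).
    x: (n_ch, n_samples) real signal -> (n_ch, n_fft // 2 + 1, n_frames) complex."""
    win = np.hamming(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop   # frames that fit entirely (no padding)
    frames = np.stack([x[:, t * hop:t * hop + n_fft] * win for t in range(n_frames)], axis=-1)
    return np.fft.rfft(frames, axis=1)           # FFT along the sample axis

def demix(X, W):
    """Apply one demixing matrix per frequency bin: y = W_f x for every bin f.
    X: (n_ch, F, T) observed STFT, W: (F, n_src, n_ch) -> Y: (n_src, F, T)."""
    return np.einsum("fsc,cft->sft", W, X)
```

If W holds the exact per-bin inverse of the mixing matrices, `demix` recovers the source spectrograms; FDICA only estimates W up to the per-bin output order, which is the permutation problem.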
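For comparison, the conventional idea of re-ordering components by the correlation between adjacent frequencies can be sketched as a greedy pass from low to high frequency. This is a simplified illustration in the spirit of [Murata+, 2001], not a reproduction of that method: it compares amplitude envelopes only with the previous bin, so in this sketch a single wrong decision propagates to all higher bins. The names `envelope_corr` and `align_adjacent` are hypothetical.

```python
import numpy as np

def envelope_corr(a, b):
    """Pearson correlation between two amplitude envelopes (0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def align_adjacent(Y):
    """Greedy two-source alignment: bin f is swapped if its envelopes correlate
    better with the (already aligned) bin f-1 in swapped order.
    Y: (2, F, T) complex separated STFT. Returns an aligned copy."""
    Y = Y.copy()
    env = np.abs(Y)
    for f in range(1, Y.shape[1]):
        keep = envelope_corr(env[0, f - 1], env[0, f]) + envelope_corr(env[1, f - 1], env[1, f])
        swap = envelope_corr(env[0, f - 1], env[1, f]) + envelope_corr(env[1, f - 1], env[0, f])
        if swap > keep:
            Y[:, f] = Y[::-1, f].copy()
            env[:, f] = env[::-1, f].copy()
    return Y
```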
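The training-data idea from the motivation and input/label slides, simulating the permutation problem by randomly swapping the two sources in some frequency bins and then labeling pairs of bins, can be sketched as follows. The slides give 160 input units but not how they are split. This sketch assumes each input holds the magnitudes of both separated outputs at the reference bin and the other bin over L = 40 frames (2 bins × 2 outputs × 40 frames = 160), with a simple max-normalization. That layout, the normalization, and the names `shuffle_permutation` and `make_pair` are all hypothetical.

```python
import numpy as np

def shuffle_permutation(S, rng):
    """Simulate the permutation problem on clean source spectrograms S: (2, F, T).
    Returns (shuffled copy, perm) where perm[f] = 1 means bin f is swapped."""
    perm = rng.integers(0, 2, size=S.shape[1])
    Y = S.copy()
    Y[:, perm == 1] = S[::-1][:, perm == 1]
    return Y, perm

def make_pair(Y, perm, f_ref, f_other, t0, L=40):
    """Build one (input, label) pair (assumed layout, see the lead-in).
    Label: 0 if both bins share the same permutation state (correct), 1 if inverted."""
    seg = np.abs(Y[:, [f_ref, f_other], t0:t0 + L])  # (2 outputs, 2 bins, L frames)
    x = seg.transpose(1, 0, 2).reshape(-1)            # ref out0, ref out1, other out0, other out1
    x = x / (np.max(x) + 1e-12)                       # hypothetical max-normalization
    y = int(perm[f_ref] != perm[f_other])
    return x, y
```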
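The architecture on the DNN slide (160 inputs; hidden layers of 128, 128, 128, 64, and 64 ReLU units; a final 1-unit sigmoid layer; MSE against a 0/1 label) can be written as a plain NumPy forward pass. Here the sigmoid output of hidden layer 6 is treated as the network output. The He-style initialization and the helper names are our assumptions, and training (which any deep learning framework would handle) is omitted.

```python
import numpy as np

# Layer widths read from the architecture slide: 160 inputs, six fully connected layers.
SIZES = [160, 128, 128, 128, 64, 64, 1]

def init_params(rng):
    """He-style random initialization (assumed; the slides do not specify it)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(SIZES[:-1], SIZES[1:])]

def forward(params, x):
    """x: (batch, 160) -> (batch,) output in [0, 1]; near 1 means 'inverted permutation'."""
    h = x
    for k, (W, b) in enumerate(params):
        z = h @ W + b
        # ReLU on the first five layers, sigmoid on the last one
        h = 1.0 / (1.0 + np.exp(-z)) if k == len(params) - 1 else np.maximum(z, 0.0)
    return h[:, 0]

def mse(pred, label):
    """Mean squared error between predictions and 0/1 target labels."""
    return float(np.mean((pred - label) ** 2))
```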
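Within one subband, the DNN output for each (reference, other) bin pair is collected at several time positions and reduced by a majority decision, since the correct permutation does not depend on time. A sketch of that reduction follows; thresholding at 0.5, sending ties to 0, and the name `subband_vector` are assumptions.

```python
import numpy as np

def subband_vector(outputs, threshold=0.5):
    """outputs: (n_positions, 2K+1) DNN outputs for one subband, one row per
    short-time position along the time axis, one column per bin of the subband
    (the reference bin sits at column K). Returns the majority-vote 0/1 vector."""
    votes = (np.asarray(outputs) >= threshold).astype(int)      # binarize each prediction
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)  # strict majority -> 1
```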
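Steps 1 to 3 of the fullband construction can be sketched as below. Assumptions, not taken from the slides: consecutive subbands shift by `step` bins (one bin by default) and overlap the previous one; the MSE comparison uses only the bins already covered; the "memory" is kept as per-bin vote counts; and a tied majority keeps the current value. The names `fullband_vector` and `apply_permutation` are hypothetical.

```python
import numpy as np

def fullband_vector(sub_vecs, step=1):
    """Unify subband permutation vectors (Steps 1-3) into one fullband 0/1 vector.
    sub_vecs[k]: 0/1 vector of length W for the k-th subband, assumed to cover
    bins k*step ... k*step + W - 1."""
    W = len(sub_vecs[0])
    assert 0 < step < W, "consecutive subbands must overlap"
    F = (len(sub_vecs) - 1) * step + W
    votes = np.zeros(F)            # memory: how many stored vectors say "1" per bin
    counts = np.zeros(F)           # memory: how many stored vectors cover each bin
    full = np.zeros(F, dtype=int)

    def store_and_vote(v, lo):
        """Store v in memory and refresh the fullband vector by majority decision."""
        votes[lo:lo + W] += v
        counts[lo:lo + W] += 1
        # 1 where most stored vectors say 1, 0 where most say 0, unchanged on ties
        full[:] = np.where(2 * votes > counts, 1, np.where(2 * votes < counts, 0, full))

    # Step 1: the lowest subband vector is copied into the fullband vector.
    store_and_vote(np.asarray(sub_vecs[0]), 0)

    # Steps 2-3: slide upward; keep the vector or its binary complement,
    # whichever has the smaller MSE on the already-estimated overlapping bins.
    n_ov = W - step
    for k in range(1, len(sub_vecs)):
        lo = k * step
        v = np.asarray(sub_vecs[k])
        ref = full[lo:lo + n_ov]
        cand = min((v, 1 - v), key=lambda c: np.mean((c[:n_ov] - ref) ** 2))
        store_and_vote(cand, lo)
    return full

def apply_permutation(Y, full):
    """Swap the two separated outputs in every bin where full == 1. Y: (2, F, T)."""
    Z = Y.copy()
    swap = np.asarray(full) == 1
    Z[:, swap] = Y[::-1][:, swap]
    return Z
```

Because each subband vector only says "same as / different from my center bin", the result is defined up to one global swap of the two outputs, which is harmless for separation.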
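The results are reported as SDR improvement, i.e., the SDR of the separated signal minus the SDR of the unprocessed mixture, both measured against the reference source. The sketch below uses a plain energy ratio. The standard BSS Eval SDR additionally allows a short time-invariant distortion filter, so values from this simplified version are not comparable with the reported scores. The names `sdr` and `sdr_improvement` are hypothetical.

```python
import numpy as np

def sdr(ref, est):
    """Simplified SDR in dB: 10 log10(||ref||^2 / ||ref - est||^2)."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

def sdr_improvement(ref, est, mix):
    """SDR of the estimate minus SDR of the unprocessed mixture (same reference)."""
    return sdr(ref, est) - sdr(ref, mix)
```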