DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case

Kitamura Laboratory
DNN-based permutation solver for
frequency-domain independent component
analysis in two-source mixture case
Shuhei Yamaji and Daichi Kitamura
National Institute of Technology, Kagawa College
Japan
12th Asia-Pacific Signal and Information Processing
Association (APSIPA)
1
Introduction
 About audio source separation
 Applications of audio source separation
– Speech recognition
– Noise canceling
– Voice command device etc.
[Figure: a mixture of two overlapping utterances ("Hello…" and "Nice to meet you...") is split into the individual utterances by audio source separation]
2
Blind Source Separation
 Independent component analysis (ICA) [Comon, 1994]
⁃ Assumes independence between source signals
⁃ Estimates demixing matrix without knowing mixing matrix
 Actual audio mixing in reverberant environments
⁃ Convolution with room impulse responses between sources and microphones
⁃ Extend ICA to the frequency domain
[Figure: source signal, mixture signal, and estimated signal]
3
Frequency-Domain ICA
 Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Apply ICA in each frequency bin
[Figure: spectrogram (frequency bin × time frame) with a separate ICA (ICA1, ICA2, ICA3, …) applied in each frequency bin; the frequency-wise demixing matrix is the inverse of the frequency-wise mixing matrix]
4
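The bin-by-bin demixing on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mixing matrices and source values are made-up numbers, and each demixing matrix is computed as the exact inverse of a known mixing matrix only to show the mechanics. A real FDICA estimates each demixing matrix blindly from the mixture.

```python
def inv2x2(a):
    """Invert a 2x2 complex matrix given as [[p, q], [r, s]]."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def demix(mixture, demixing):
    """Apply the frequency-wise demixing matrix to every time frame.

    mixture[f][t] : length-2 observation at frequency bin f, time frame t
    demixing[f]   : 2x2 demixing matrix for frequency bin f
    """
    return [[matvec(w_f, x_ft) for x_ft in frames]
            for w_f, frames in zip(demixing, mixture)]


# Hypothetical mixing matrices for two frequency bins (illustrative values).
mixing = [
    [[1.0 + 0.0j, 0.5 - 0.2j], [0.3 + 0.1j, 1.0 + 0.0j]],
    [[0.8 + 0.3j, -0.4 + 0.0j], [0.2 - 0.5j, 0.9 + 0.1j]],
]
# Toy source spectrogram: sources[f][t] = [s1, s2].
sources = [
    [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j]],
    [[-2 + 0j, 1 + 3j], [1 - 1j, 0.25 + 0j]],
]
mixture = [[matvec(a_f, s_ft) for s_ft in frames]
           for a_f, frames in zip(mixing, sources)]
demixing = [inv2x2(a_f) for a_f in mixing]
estimated = demix(mixture, demixing)  # recovers the sources up to rounding
```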
Permutation Problem in FDICA
 Permutation problem in frequency-domain ICA
– The order of the separated signals is inconsistent across frequency bins
– Separated components must be aligned along the frequency axis
[Figure: Source 1 and Source 2 are observed as mixtures (Observed 1, Observed 2); FDICA over all frequency components yields non-aligned signals, which a permutation solver aligns into Estimated signal 1 and Estimated signal 2 (horizontal axis: time)]
5
Conventional Permutation Solvers
 Popular permutation solvers
– Based on temporal structures
• FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001]
– Based on direction of arrival (DOA)
• FDICA + DOA alignment [Saruwatari+, 2006]
– Based on a relative correlation among frequencies
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006]
– Based on a low-rank modeling of each source
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
[Figure: the components of two non-aligned signals are sorted along the frequency axis (horizontal axis: time)]
6
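To make the correlation-based idea concrete, here is a simplified sketch of aligning two components between adjacent frequencies. It is a greedy illustration only, not the exact algorithm of [Murata+, 2001]. The function names and the toy envelopes are hypothetical.

```python
from statistics import mean


def correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den > 0 else 0.0


def align_adjacent(envelopes):
    """Greedily align two-source amplitude envelopes along frequency.

    envelopes[f] = (env_a, env_b): envelopes over time of the two separated
    components at frequency bin f. Returns the aligned envelopes and a list
    of booleans that is True where the pair at bin f was swapped.
    """
    aligned = [envelopes[0]]
    swapped = [False]
    for env_a, env_b in envelopes[1:]:
        prev_a, prev_b = aligned[-1]
        keep = correlation(prev_a, env_a) + correlation(prev_b, env_b)
        swap = correlation(prev_a, env_b) + correlation(prev_b, env_a)
        if swap > keep:
            env_a, env_b = env_b, env_a
        swapped.append(swap > keep)
        aligned.append((env_a, env_b))
    return aligned, swapped


# Toy envelopes: the two sources are active at alternating time frames.
src1 = [1.0, 0.1, 0.9, 0.2, 1.0, 0.1]
src2 = [0.1, 1.0, 0.2, 0.8, 0.1, 0.9]
envelopes = [(src1, src2), (src2, src1), (src1, src2), (src2, src1)]
aligned, swapped = align_adjacent(envelopes)  # swapped: [False, True, False, True]
```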
Motivation of Proposed Method
 Problems of conventional permutation solvers
– Correlation-based method sometimes fails to align components
– Even in IVA and ILRMA, the block permutation problem arises
 Proposed method: DNN-based permutation solver
– The permutation problem can be simulated by shuffling the frequency components of source signals
– Training data for the DNN are easy to produce
[Figure: a DNN converts non-aligned signals into separated signals (horizontal axis: time)]
7
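The claim that training data are easy to produce can be illustrated with a short sketch. It assumes each source is stored as a list of frequency bins, each holding a time sequence. The function name, the 50 % swap probability, and the toy data are assumptions for illustration, not details given in the slides.

```python
import random


def shuffle_permutation(source1, source2, rng):
    """Randomly swap the two sources in each frequency bin.

    source1[f], source2[f]: time sequences of each source at frequency bin f.
    Returns the two shuffled signals and the ground-truth permutation
    vector (0 = order kept, 1 = swapped at that bin).
    """
    out1, out2, labels = [], [], []
    for bin1, bin2 in zip(source1, source2):
        swap = rng.random() < 0.5  # assumed swap probability
        out1.append(bin2 if swap else bin1)
        out2.append(bin1 if swap else bin2)
        labels.append(int(swap))
    return out1, out2, labels


rng = random.Random(0)  # fixed seed so the example is reproducible
# Toy "spectrograms": 6 frequency bins x 4 time frames per source.
s1 = [[float(f + 1)] * 4 for f in range(6)]
s2 = [[-float(f + 1)] * 4 for f in range(6)]
y1, y2, labels = shuffle_permutation(s1, s2, rng)
```

Because the swap at every bin is known, each shuffled pair comes with its correct label for free.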
Proposed method: DNN input and label
 Input and label
– Extract short-time activations of the reference frequency and another frequency from the separated signals
– DNN predicts whether the permutation of the two input frequencies is correct (correct = 0, incorrect = 1)
[Figure: DNN inputs (reference and another frequency) in the correct permutation case and in the incorrect permutation case]
8
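A possible way to assemble the input vector and its label is sketched below. The slides give only the input size (160 units); the split into 4 activations of 40 frames each, the concatenation order, and the function names are assumptions made for this illustration.

```python
FRAMES = 40  # assumed window length: 4 activations x 40 frames = 160 inputs


def make_input(ref_pair, other_pair, start):
    """Concatenate short-time activations into one input vector.

    ref_pair   = (activation of signal 1, activation of signal 2) at the
                 reference frequency, each a sequence over time frames
    other_pair = the same two signals at another frequency
    start      = first time frame of the short-time window
    """
    vector = []
    for activation in (*ref_pair, *other_pair):
        vector.extend(activation[start:start + FRAMES])
    return vector


def make_label(ref_swapped, other_swapped):
    """0 if both frequencies have the same source order, otherwise 1."""
    return int(ref_swapped != other_swapped)


n_frames = 100
ref_pair = ([0.01 * t for t in range(n_frames)], [1.0] * n_frames)
other_pair = ([0.5] * n_frames, [0.02 * t for t in range(n_frames)])
x = make_input(ref_pair, other_pair, start=10)             # 160 values
label = make_label(ref_swapped=False, other_swapped=True)  # 1 (incorrect)
```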
Proposed method: DNN Architecture
 Simple Model
– 6 hidden layers with ReLU or Sigmoid functions
Input Layer (160 units)
Hidden Layer 1 (128 units), ReLU
Hidden Layer 2 (128 units), ReLU
Hidden Layer 3 (128 units), ReLU
Hidden Layer 4 (64 units), ReLU
Hidden Layer 5 (64 units), ReLU
Hidden Layer 6 (1 unit), Sigmoid
Output Layer (1 unit)
Target label (1 unit): 0 or 1
Training criterion: minimum MSE between the output and the target label
9
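The layer sizes above can be turned into a small forward pass. This is an untrained sketch in plain Python that reads the diagram as six dense layers ending in a single sigmoid unit. The weight initialization and the 0.5 decision threshold are assumptions, and no training loop is shown.

```python
import math
import random

LAYER_SIZES = [160, 128, 128, 128, 64, 64, 1]  # input, then six dense layers


def init_params(rng):
    """Random weights and zero biases for each dense layer (untrained)."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, an assumption
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """ReLU on the first five layers, sigmoid on the final 1-unit layer."""
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in z]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x[0]


def mse(prediction, target):
    """Squared error between the DNN output and the 0/1 target label."""
    return (prediction - target) ** 2


rng = random.Random(0)
params = init_params(rng)
x = [rng.random() for _ in range(160)]
p = forward(params, x)           # a value between 0 and 1
predicted_label = int(p >= 0.5)  # threshold at 0.5, an assumption
loss = mse(p, 1)                 # loss if the true label were 1
```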
Proposed method: DNN predictions in subband frequency bins
 Apply DNN in subband frequency (local time-frequency area)
– Subband: reference (center) frequency ± several frequencies
 Take majority decision along time frames
– to determine the subband permutation vector
[Figure: input vectors and DNN outputs for each frequency in the subband (1: different sound source, 0: same sound source), collected into the subband permutation vector]
10
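The majority decision along time frames can be sketched as below. The input is assumed to be a table of binary DNN outputs, one row per short-time window and one column per frequency in the subband. The tie-breaking rule and the toy values are assumptions.

```python
def majority(bits):
    """Majority vote over 0/1 values; ties resolve to 0 (an assumption)."""
    return int(sum(bits) * 2 > len(bits))


def subband_permutation_vector(dnn_outputs):
    """Vote along time for every frequency in the subband.

    dnn_outputs[t][k]: binary DNN output at time window t for the k-th
    frequency in the subband. Returns one 0/1 decision per frequency.
    """
    n_freqs = len(dnn_outputs[0])
    return [majority([frame[k] for frame in dnn_outputs])
            for k in range(n_freqs)]


# Five frequencies (f1..f5, reference f3 in the centre), four time windows.
outputs = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
vector = subband_permutation_vector(outputs)  # [1, 1, 0, 1, 0]
```

Occasional wrong predictions, such as the single 1 in the last column, are outvoted by the other time windows.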
Proposed method: construct a fullband permutation vector
 Alignment among subbands
– When the subband slides along the frequency axis, the reference (center) frequency component changes
• The meanings of “0 (same)” and “1 (different)” labels are not
shared among subbands
– The orders of source components in all subbands must be aligned
after the DNN prediction in all subbands
11
Proposed method: construct a fullband permutation vector
 Objective
– Estimate a “fullband permutation vector” that maps the two sources to “0” and “1”
 Step 1
– The subband permutation vector of the lowest-frequency subband is simply copied to the corresponding frequency bins in the fullband permutation vector
[Figure: the subband permutation vector (1, 1, 0, 1, 0) of the lowest subband is set into the fullband permutation vector (axes: time and frequency)]
12
Proposed method: construct a fullband permutation vector
 Step 2
– Slide the subband along the frequency axis
– Obtain the subband permutation vector of the current subband and its binary complement vector
– The similarity between each of these two vectors and the fullband permutation vector is measured by mean squared error (MSE)
– Store in memory the vector that minimizes the MSE
– Update the fullband permutation vector by taking a majority decision
[Figure: 1. similarity comparison of the subband vector and its complement with the fullband permutation vector; 2. set the selected vector; 3. majority decision (axes: time and frequency)]
13
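Steps 1 and 2 can be sketched as one loop over the subbands. This is an illustration under stated assumptions rather than the authors' code: the subband is assumed to slide by one bin per step, the MSE is taken only over bins that already have a fullband value, and ties in the majority vote resolve to 0.

```python
def mse(a, b):
    """Mean squared error between two equal-length 0/1 sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def majority(bits):
    """Majority vote over 0/1 values; ties resolve to 0 (an assumption)."""
    return int(sum(bits) * 2 > len(bits))


def build_fullband(subband_vectors, n_freqs):
    """Merge subband permutation vectors into one fullband vector.

    subband_vectors[i] covers bins i .. i + width - 1, i.e. the subband is
    assumed to slide by one bin per step.
    """
    width = len(subband_vectors[0])
    memory = [[] for _ in range(n_freqs)]  # stored votes for each bin
    fullband = [None] * n_freqs

    for start, vec in enumerate(subband_vectors):
        bins = list(range(start, start + width))
        # Bins of this subband that already have a fullband value.
        overlap = [(v, fullband[f]) for v, f in zip(vec, bins)
                   if fullband[f] is not None]
        chosen = vec  # Step 1: the lowest subband is copied as is
        if overlap:
            # Step 2: keep the vector or its complement, whichever is
            # closer (in MSE) to the current fullband vector.
            mine = [v for v, _ in overlap]
            ref = [r for _, r in overlap]
            complement = [1 - v for v in mine]
            if mse(complement, ref) < mse(mine, ref):
                chosen = [1 - v for v in vec]
        for v, f in zip(chosen, bins):
            memory[f].append(v)
            fullband[f] = majority(memory[f])  # update by majority decision
    return fullband


# Toy example: 7 bins, subband width 5, reference at the subband centre.
truth = [0, 1, 1, 0, 1, 0, 0]
subbands = [[int(truth[s + k] != truth[s + 2]) for k in range(5)]
            for s in range(3)]
fullband = build_fullband(subbands, len(truth))
```

In this toy example the result equals `truth` up to a global flip of 0 and 1, which is all that can be expected because the labels themselves are arbitrary.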
Proposed method: construct a fullband permutation vector
 Step 3
– Iterate Step 2 up to the highest-frequency subband
– Swap the components at each frequency according to the fullband permutation vector
– Obtain permutation-aligned estimated signals
[Figure: a majority decision over the stored subband vectors gives the fullband permutation vector, which is used to replace the components (axes: time and frequency)]
14
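The final replacement step amounts to swapping the two components wherever the fullband vector says so. The sketch below assumes that a label of 1 at a bin means the two separated components are exchanged there. The function name and the toy strings are hypothetical.

```python
def apply_permutation(y1, y2, fullband):
    """Swap the two separated components in every bin labelled 1.

    y1[f], y2[f]: time sequences of the two separated signals at bin f.
    fullband[f] : 0 or 1, the estimated label of y1 at bin f.
    """
    out1, out2 = [], []
    for bin1, bin2, label in zip(y1, y2, fullband):
        if label == 1:
            bin1, bin2 = bin2, bin1
        out1.append(bin1)
        out2.append(bin2)
    return out1, out2


# Toy data: "a*" belongs to one source, "b*" to the other.
y1 = [["a0"], ["b1"], ["a2"]]
y2 = [["b0"], ["a1"], ["b2"]]
aligned1, aligned2 = apply_permutation(y1, y2, [0, 1, 0])
# aligned1 holds only "a" components, aligned2 only "b" components
```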
Experimental conditions
Training speech signals:
  Dry sources: JVS corpus [Takamichi+, 2019] (Japanese speech)
  Mixture: dry sources convolved with RWCP impulse responses [Nakamura+, 2000]
  Permutation: apply FDICA and randomly shuffle the components
Test speech signals: speech signals obtained from SiSEC2011 UND task [Araki+, 2012]
FFT length: 8192 (512 ms, Hamming window)
Shift length: 2048
Objective evaluation: average improvement of signal-to-distortion ratio (SDR)
[Figure: impulse responses used in the experiment, with the reverberation time indicated]
15
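The analysis settings in the table imply a few derived quantities, worked out in the short calculation below. The sampling rate is not stated on the slide; it is inferred here from 8192 samples spanning 512 ms. The symmetric Hamming window formula is the textbook definition, shown as an assumption about which variant was used.

```python
import math

FFT_LENGTH = 8192        # samples, from the slide
SHIFT_LENGTH = 2048      # samples, from the slide
WINDOW_DURATION = 0.512  # seconds, from the slide

sample_rate = FFT_LENGTH / WINDOW_DURATION   # about 16 kHz (inferred)
shift_duration = SHIFT_LENGTH / sample_rate  # about 0.128 s per frame shift
overlap = 1 - SHIFT_LENGTH / FFT_LENGTH      # 0.75, i.e. 75 % overlap
n_bins = FFT_LENGTH // 2 + 1                 # 4097 non-negative-frequency bins


def hamming(n):
    """Symmetric Hamming window of length n (textbook definition)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


window = hamming(FFT_LENGTH)
```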
Results
 Findings
– Proposed method achieves an improvement of about 8 dB
– ILRMA's separation performance is about 4 dB
– The proposed method is close to the upper-limit performance
[Figure: bar chart of average SDR improvement [dB] (axis from 0 to 12; higher is better) comparing FDICA with ideal permutation solver (IPS, reference score), ILRMA (2 bases), ILRMA (3 bases), ILRMA (4 bases), and FDICA with DNN-based permutation solver (proposed)]
16
Conclusion
 In this paper
– We proposed a new DNN-based permutation solver for determined
audio source separation using FDICA
– An SDR improvement of about 8 dB was achieved in experiments
with a highly reverberant speech mixture signal
 Future work
– The proposed method suffers from a combinatorial explosion for three or more separated signals
17
Thank you for your attention!
Demonstration
Original
Mixture
FDICA with IPS
FDICA with
proposed method
18


Editor's Notes

  1. Hello everyone, I’m Shuhei Yamaji from the National Institute of Technology, Kagawa College, Japan. In this presentation, we talk about a DNN-based permutation solver for frequency-domain independent component analysis in the two-source mixture case.
  2. This presentation deals with audio source separation, which is a technique for separating a mixture signal into individual audio sources. This technology can be used in many audio applications, such as speech recognition, noise canceling, voice command devices, and so on.
  3. A popular approach to audio source separation is independent component analysis, ICA for short. ICA assumes independence between sources and estimates the demixing matrix W without knowing the mixing matrix A. This is represented in this figure. The source signals, s1 and s2, are mixed by A and then observed as x1 and x2. W can separate the sources in x into y1 and y2 if W is the inverse matrix of A. Of course, we don’t know the mixing matrix A, so ICA estimates W using the statistical independence between sources. In an actual situation, audio signals are mixed with room reverberation as a convolutive mixture, and simple ICA cannot separate the sources in that situation. To solve this problem, frequency-domain ICA, FDICA for short, was proposed.
  4. This figure represents the mixture signals in the time-frequency domain, which are obtained by the short-time Fourier transform. In FDICA, simple ICA is applied to each frequency bin, as in this figure. Therefore, the demixing matrix W must be estimated in each frequency bin to achieve source separation.
  5. However, since ICA cannot determine the order of the separated signals, the output components of FDICA are not aligned, like this, and we have to reorder these separated red and blue components along the frequency axis. This is the so-called permutation problem. Thus, a permutation solver must be applied after FDICA as post-processing. In this presentation, we aim to solve the permutation problem over all frequency bins using a new, data-driven approach.
  6. A major approach to solving the permutation problem is based on the temporal structures of the separated components. We can reorder the components based on the correlation values between adjacent frequencies. When the positions of the microphones are known, the directions of arrival of the sources can also be utilized for solving the permutation problem. In recent years, algorithms that do not encounter the permutation problem have been proposed. For example, both independent vector analysis, IVA, and independent low-rank matrix analysis, ILRMA, estimate the frequency-wise demixing matrices while avoiding the permutation problem. ILRMA is a state-of-the-art algorithm for blind audio source separation.
  7. OK, let’s talk about our proposed method. This slide explains our motivation. The conventional correlation-based permutation solver sometimes fails to align components correctly. Even in IVA or ILRMA, the components are sometimes misaligned in blocks, which is called the block permutation problem, as in this figure. To achieve a stable and accurate permutation solver, in this presentation we propose a DNN-based permutation solver, for which the training data can easily be obtained. This is because the permutation problem can be simulated by randomly shuffling the frequency components of source signals.
  8. In this slide, we explain the input vector for the proposed DNN model. In our DNN model, we first extract two short-time activations, of the reference frequency and of another frequency, from the separated signal. These activations are concatenated into a single vector, like this, and input to the DNN. Then, the DNN predicts whether the permutation of the two input frequencies is correct, where “zero” means that the current permutation is correct and “one” means they are inverted. In the left-side figure, the reference frequency is red and blue, and the other frequency is also red and blue. So, the current permutation is correct, and its label should be zero. In the right-side figure, the reference frequency is red and blue, but the other frequency is blue and red. Therefore, the current permutation is wrong, and its label should be one.
  9. This figure depicts the architecture of the DNN used in the proposed permutation solver. This DNN model has 6 fully connected hidden layers, and its structure is very simple.
  10. Hereafter, we consider the process in a short-time subband, where the subband consists of the reference frequency plus or minus several frequencies. In the proposed method, we perform the DNN-based permutation prediction for all the combinations of the reference frequency and another frequency, where the reference frequency is fixed to the center of the subband. In this figure, the reference frequency is f3 and is fixed. The other frequency is chosen from f1 to f5, and all the combinations are input to the DNN, like this. Thus, we obtain these DNN outputs. Since the correct permutation does not depend on time, we stride this short-time subband along the time axis and collect the DNN outputs, as in this figure. Finally, we take a majority decision over the collected DNN outputs and obtain a subband permutation vector.
  11. After the estimation of the subband permutation vector, we slide the subband along the frequency axis, as in this figure. However, since the center frequency of the subband is always set to the reference frequency, the meanings of the labels “zero” and “one” are not shared among subbands. This is because the DNN outputs indicate only whether the components of the reference frequency and the other frequency are the same or different. For this reason, even if the subband components are aligned by the subband permutation vector, the order of the sources could differ among the subbands, as in this figure. To solve this problem, it is necessary to unify the results of all the subband vectors so that, for example, 0 indicates the red source and 1 indicates the blue source in all the subbands.
  12. This label unification can be achieved by the following 3 steps. The objective of these steps is to estimate a fullband permutation vector, which maps the red and blue sources to “zero” and “one,” respectively. In the first step, as shown in this figure, the subband permutation vector of the lowest subband is simply set to the corresponding frequency bins in the fullband permutation vector.
  13. In step 2, we slide the subband from the previous one and obtain the subband permutation vector in that subband. We also calculate the binary complement vector of the subband permutation vector, like this. These two vectors are compared with the corresponding part of the fullband vector using the mean squared error, and then the vector that minimizes the error is selected and stored in the memory. The fullband permutation vector is updated by taking a majority decision using the vectors stored in the memory.
  14. By repeating the process of step 2, the complete fullband vector can be obtained. Finally, the permutation problem can be solved by swapping the frequency-wise source components based on the estimated fullband vector.
  15. Let’s move on to the experiments. This table shows the conditions. In this experiment, as a training dataset, we used the JVS corpus, which is a Japanese speech dataset, as dry sources, and we mixed them using impulse responses. The permutation problem is simulated by randomly shuffling the frequency-wise components of the sources. The test speech dataset is obtained from the SiSEC UND task. The bottom figure shows the impulse responses used in this experiment, where the reverberation time is 470 ms.
  16. Here is the result of the experiment. The vertical axis shows the average SDR improvement, which indicates the accuracy of the source separation. The leftmost one is FDICA with an ideal permutation solver, namely, the permutation is perfectly solved by using the completely separated source signals. So, this is an upper-bound score for the FDICA-based methods. ILRMA is the state-of-the-art blind source separation method. Since the reverberation time is long in this experiment, the performance of ILRMA is not so high. The rightmost one is our proposed method, where the DNN-based permutation solver is applied after FDICA. The proposed method achieves an SDR improvement of about 8 dB, which is close to the upper limit.
  17. This is the conclusion. Thank you for your attention.