DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case

Shuhei Yamaji and Daichi Kitamura
National Institute of Technology, Kagawa College
Japan
12th Asia-Pacific Signal and Information Processing Association (APSIPA)

Introduction
 About audio source separation
 Applications of audio source separation
– Speech recognition
– Noise canceling
– Voice-command devices, etc.
[Figure: audio source separation splits a mixture of two talkers (“Hello…” and “Nice to meet you...”) into the individual speech signals]

Blind Source Separation
 Independent component analysis (ICA) [Comon, 1994]
⁃ Assumes independence between source signals
⁃ Estimates demixing matrix without knowing mixing matrix
 Actual audio mixing in a reverberant environment
⁃ Convolution with room impulse responses between sources and microphones
⁃ ICA is therefore extended to the frequency domain
[Figure: source signals → mixture signals → estimated signals]
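The ICA model above can be made concrete with a small NumPy sketch. It mixes two synthetic sources with an assumed 2×2 matrix `A` and demixes them with the oracle inverse `W = A^{-1}`; a real ICA estimates `W` blindly from `x` alone. The second half shows that any row permutation and scaling of `W` yields equally independent outputs, which is the ambiguity behind the permutation problem discussed later.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))          # two independent (super-Gaussian) sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # assumed mixing matrix, unknown to ICA
x = A @ s                                # observed mixtures

W = np.linalg.inv(A)                     # oracle demixing matrix (ICA would estimate it)
y = W @ x                                # recovers s

# A permuted and rescaled W separates the sources just as well:
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # swaps the two outputs
D = np.diag([2.0, 0.5])                  # arbitrary output scaling
y_amb = (P @ D @ W) @ x                  # equals [0.5 * s[1], 2.0 * s[0]]
```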

Frequency-Domain ICA
 Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Apply ICA in each frequency bin
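[Figure: the spectrogram (frequency bin × time frame) is split into frequency bins and ICA is applied to each bin (ICA1, ICA2, ICA3, …); each ICA estimates a frequency-wise demixing matrix, the inverse of the frequency-wise mixing matrix]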
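The frequency-wise model can be sketched with batched NumPy operations, assuming STFT arrays indexed as `[frequency, time, channel]`. Random complex values stand in for real STFT coefficients, and the oracle inverse `A(f)^{-1}` stands in for the demixing matrix that FDICA would estimate independently in each bin; the helper `crandn` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_frames, n_src = 5, 100, 2

def crandn(*shape):
    """Complex Gaussian samples standing in for STFT coefficients."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

S = crandn(n_freq, n_frames, n_src)      # source STFTs, S[f, t, n]
A = crandn(n_freq, n_src, n_src)         # one mixing matrix per bin, A[f]
X = np.einsum("fmn,ftn->ftm", A, S)      # x(f, t) = A(f) s(f, t)

# FDICA estimates W(f) separately in every bin; the oracle inverse
# is used here in place of the ICA estimate.
W = np.linalg.inv(A)                     # batched inverse over f
Y = np.einsum("fnm,ftm->ftn", W, X)      # y(f, t) = W(f) x(f, t)
```

Because each bin is solved on its own, nothing ties the output order of bin `f` to that of bin `f + 1`.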

Permutation Problem in FDICA
 Permutation problem in frequency-domain ICA
– The order of the separated signals differs arbitrarily from one frequency bin to another
– Separated components must be aligned along the frequency axis
[Figure: sources 1 and 2 are observed as observed signals 1 and 2; FDICA separates all frequency components, but estimated signals 1 and 2 are non-aligned over time and frequency and must be sorted by a permutation solver]
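The permutation problem can be simulated, and undone with oracle knowledge, in a few lines. The sketch below randomly swaps the two estimates in a subset of bins, then restores the order by correlating each bin's magnitudes with the true sources. This oracle is only an assumed stand-in for an ideal permutation solver (IPS); the slides do not say how their IPS reference score was computed. Function names are hypothetical.

```python
import numpy as np

def shuffle_bins(Y, rng):
    """Swap the two estimated sources in a random subset of frequency bins.
    Y has shape (n_freq, n_frames, 2); returns the shuffled copy and the
    boolean swap flag of every bin."""
    swap = rng.integers(0, 2, size=Y.shape[0]).astype(bool)
    Yp = Y.copy()
    Yp[swap] = Yp[swap][:, :, ::-1]
    return Yp, swap

def _corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def ideal_permutation_solver(Y, S):
    """Oracle alignment: in each bin keep the source order whose magnitudes
    correlate best with the true source magnitudes S (same shape as Y)."""
    out = Y.copy()
    for f in range(Y.shape[0]):
        y, s = np.abs(Y[f]), np.abs(S[f])
        keep = _corr(y[:, 0], s[:, 0]) + _corr(y[:, 1], s[:, 1])
        swap = _corr(y[:, 0], s[:, 1]) + _corr(y[:, 1], s[:, 0])
        if swap > keep:
            out[f] = Y[f][:, ::-1]
    return out

rng = np.random.default_rng(2)
S = rng.standard_normal((8, 200, 2)) + 1j * rng.standard_normal((8, 200, 2))
Yp, swap = shuffle_bins(S, rng)           # non-aligned signal
aligned = ideal_permutation_solver(Yp, S)
```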

Conventional Permutation Solvers
 Popular permutation solvers
– Based on temporal structure
• FDICA + correlation-based alignment between adjacent frequencies [Murata+, 2001]
– Based on direction of arrival (DOA)
• FDICA + DOA alignment [Saruwatari+, 2006]
– Based on relative correlation among frequencies
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim+, 2006]
– Based on low-rank modeling of each source
• Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
[Figure: the components of a non-aligned signal are sorted along the frequency axis]
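The idea of correlation-based alignment between adjacent frequencies can be illustrated with a greedy, simplified sketch; it is not the exact procedure of [Murata+, 2001]. Each bin is compared with the already-aligned bin below it by correlating amplitude envelopes, and swapped if the crossed pairing correlates better. Because each decision depends on the previous one, a single mistake propagates upward, which hints at why such methods sometimes fail. Names and the toy data are hypothetical.

```python
import numpy as np

def _corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def align_adjacent(Y):
    """Greedy alignment of a 2-source FDICA output Y (n_freq, n_frames, 2):
    bin f is matched to the aligned bin f - 1 by envelope correlation."""
    out = Y.copy()
    for f in range(1, Y.shape[0]):
        prev, cur = np.abs(out[f - 1]), np.abs(Y[f])
        keep = _corr(prev[:, 0], cur[:, 0]) + _corr(prev[:, 1], cur[:, 1])
        swap = _corr(prev[:, 0], cur[:, 1]) + _corr(prev[:, 1], cur[:, 0])
        if swap > keep:
            out[f] = Y[f][:, ::-1]
    return out

# Toy data: each source has one envelope shared by all bins (plus small
# noise), so adjacent bins of the same source are strongly correlated.
rng = np.random.default_rng(3)
env = rng.random((200, 2))                                # (n_frames, 2)
S = env[None] * (1 + 0.1 * rng.standard_normal((6, 200, 2)))
swap = np.array([False, True, False, True, True, False])  # bin 0 is the anchor
Yp = S.copy()
Yp[swap] = Yp[swap][:, :, ::-1]
aligned = align_adjacent(Yp)
```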

Motivation of Proposed Method
 Problems of conventional permutation solvers
– The correlation-based method sometimes fails to align components
– Even IVA and ILRMA suffer from the block permutation problem
 Proposed method: DNN-based permutation solver
– The permutation problem can be simulated by shuffling the frequency components of source signals
– Training data for the DNN are therefore easy to produce
[Figure: a DNN converts a non-aligned signal into a separated signal]
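Because the shuffling is simulated, the correct answer for every pair of bins is known for free. The sketch below assumes a correctly aligned two-source signal as the starting point (how the authors obtain it is not stated), swaps a random subset of bins, and derives a pairwise label matrix: bins `i` and `j` are labelled 1 ("incorrect") exactly when one was swapped and the other was not. The function name is hypothetical.

```python
import numpy as np

def make_training_example(Y, rng):
    """Shuffle an aligned 2-source signal Y (n_freq, n_frames, 2) and return
    (shuffled, swap, labels) where labels[i, j] = 0 if bins i and j keep the
    same relative order (correct) and 1 otherwise (incorrect)."""
    swap = rng.integers(0, 2, size=Y.shape[0]).astype(bool)
    shuffled = Y.copy()
    shuffled[swap] = shuffled[swap][:, :, ::-1]
    labels = np.logical_xor.outer(swap, swap).astype(int)
    return shuffled, swap, labels

rng = np.random.default_rng(4)
Y = rng.standard_normal((10, 50, 2))
shuffled, swap, labels = make_training_example(Y, rng)
```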

Proposed method: DNN input and label
 Input and label
– Extract short-time activations at two frequencies, a reference frequency and another frequency, from the separated signal
– The DNN predicts whether the permutation between the two input frequencies is correct (correct = 0, incorrect = 1)
[Figure: correct and incorrect permutation cases; the activations at the reference frequency and at another frequency are fed to the DNN]
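One plausible way to assemble a DNN input is sketched below. The segment length of 80 frames per activation is an assumption chosen only so that two activations give the 160-unit input layer; the unit-peak normalisation, the choice of estimated source `src`, and the helper `make_input` are likewise hypothetical. The label follows the slide: 0 if both activations still belong to the same source, 1 otherwise.

```python
import numpy as np

SEG_LEN = 80  # assumed: 2 x 80 frames = 160 inputs, matching the input layer

def make_input(Y, f_ref, f_other, t0, src=0, seg_len=SEG_LEN):
    """Concatenate the short-time amplitude activations of estimated source
    `src` at the reference bin and at another bin of Y (n_freq, n_frames, 2).
    Each activation is scaled to unit peak (assumed normalisation)."""
    def activation(f):
        a = np.abs(Y[f, t0:t0 + seg_len, src])
        return a / (a.max() + 1e-12)
    return np.concatenate([activation(f_ref), activation(f_other)])

rng = np.random.default_rng(5)
Y = rng.standard_normal((16, 300, 2)) + 1j * rng.standard_normal((16, 300, 2))
swap = rng.integers(0, 2, size=16).astype(bool)   # hypothetical shuffle flags
x = make_input(Y, f_ref=8, f_other=10, t0=100)
label = int(swap[8] != swap[10])                  # 0 = correct, 1 = incorrect
```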

Proposed method: DNN Architecture
 Simple model
– 6 hidden layers with ReLU or sigmoid activation functions
– Input layer: 160 units
– Hidden layers 1, 2, 3: 128 units each, ReLU
– Hidden layers 4, 5: 64 units each, ReLU
– Hidden layer 6: 1 unit, sigmoid
– Output layer: 1 unit
– Target label: 0 or 1 (1 unit); training minimizes the MSE between the output and the label
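The layer sizes above translate into the forward pass below, written in plain NumPy as a sketch. It reads hidden layer 6 (1 unit, sigmoid) as the layer whose value is the network output, which is one interpretation of the slide. The He initialisation is an assumption, and training (gradient computation) is omitted; in practice a deep-learning framework would handle it.

```python
import numpy as np

# 160 inputs; ReLU layers of 128, 128, 128, 64, 64 units; final 1-unit sigmoid.
SIZES = [160, 128, 128, 128, 64, 64, 1]

def init_params(rng, sizes=SIZES):
    """He-initialised weights and zero biases (assumed initialisation)."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """x: (batch, 160) -> (batch,) predicted probability of 'incorrect'."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)        # ReLU hidden layers
    W, b = params[-1]
    z = h @ W + b
    return (1.0 / (1.0 + np.exp(-z)))[:, 0]   # sigmoid output in (0, 1)

def mse_loss(pred, label):
    """Training objective from the slide: MSE against 0/1 targets."""
    return np.mean((pred - label) ** 2)

rng = np.random.default_rng(6)
params = init_params(rng)
x = rng.random((4, 160))
pred = forward(params, x)
loss = mse_loss(pred, np.array([0, 1, 0, 1]))
```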

Proposed method: DNN predictions in subband frequency bins
 Apply the DNN in a subband (a local time-frequency area)
– Subband: the reference (center) frequency and several frequencies around it
 Take a majority decision along time frames
– to determine the subband permutation vector
[Figure: for each input vector the DNN outputs 1 (different sound source) or 0 (same sound source); these outputs are stored as the subband permutation vector]
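The majority decision along time frames can be sketched as follows, assuming the DNN outputs for one subband are collected in an array with one row per bin and one column per time segment, and that the reference bin sits at the center row. Thresholding at 0.5 and breaking ties toward 0 are assumptions; the function name is hypothetical.

```python
import numpy as np

def subband_permutation_vector(probs):
    """probs: (n_bins_in_subband, n_segments) DNN outputs; row k is the
    prediction for the pair (reference bin, k-th bin) on each time segment.
    Returns a 0/1 vector, 1 meaning 'different source from the reference'."""
    votes = (probs >= 0.5).astype(int)            # binarise each segment
    vec = (votes.mean(axis=1) > 0.5).astype(int)  # majority over time; ties -> 0
    vec[probs.shape[0] // 2] = 0                  # center row is the reference itself
    return vec

probs = np.array([[0.9, 0.8, 0.2],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.5, 0.5],   # reference bin (ignored)
                  [0.7, 0.4, 0.9],
                  [0.2, 0.1, 0.4]])
vec = subband_permutation_vector(probs)   # -> [1, 0, 0, 1, 0]
```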

Proposed method: construct a fullband permutation vector
 Alignment among subbands
– When the subband slides along the frequency axis, the reference (center) frequency changes
• The “0 (same)” and “1 (different)” labels are defined relative to each subband's reference frequency, so their meanings are not shared among subbands
– Therefore, after the DNN prediction, the order of the source components must be aligned across all subbands

 Objective
– Estimate a “fullband permutation vector” that assigns the two sources to “0” and “1” consistently over all frequency bins
 Step 1
– The subband permutation vector of the lowest-frequency subband is copied directly into the corresponding frequency bins of the fullband permutation vector
[Figure: Step 1; the subband permutation vector of the lowest subband is set into the fullband permutation vector]

 Step 2
– Slide the subband along the frequency axis
– Obtain the subband permutation vector of the current subband and its binary complement
– Measure the similarity of each of them to the fullband permutation vector by the mean squared error (MSE)
– Store the one with the smaller MSE in the memory
– Update the fullband permutation vector by taking a majority decision
[Figure: Step 2; 1. similarity comparison, 2. the better-matching vector is set in the memory, 3. majority decision updates the fullband permutation vector]
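Steps 1 to 3 can be sketched as one loop. The memory is kept as per-bin vote sums and counts, which is equivalent to storing the aligned vectors for a majority decision. Assumptions not given on the slides: the MSE is computed only on bins the fullband vector already covers, consecutive subbands overlap, and majority ties go to 0. The function name is hypothetical. In the toy usage, noiseless subband vectors are generated from a known alignment `g` relative to each subband's center bin.

```python
import numpy as np

def build_fullband_vector(subband_vecs, starts, n_freq):
    """subband_vecs[k] covers bins starts[k] .. starts[k] + len - 1,
    ordered from the lowest to the highest subband."""
    votes = np.zeros(n_freq)    # memory: sum of stored labels per bin
    counts = np.zeros(n_freq)   # memory: number of stored labels per bin

    def store(vec, start):
        votes[start:start + len(vec)] += vec
        counts[start:start + len(vec)] += 1

    def fullband():
        fb = np.where(votes > counts / 2, 1, 0)   # majority decision, ties -> 0
        fb[counts == 0] = -1                      # bins not covered yet
        return fb

    store(np.asarray(subband_vecs[0]), starts[0])          # Step 1
    for vec, start in zip(subband_vecs[1:], starts[1:]):   # Steps 2 and 3
        vec = np.asarray(vec)
        fb = fullband()[start:start + len(vec)]
        known = fb >= 0
        if not known.any():
            store(vec, start)
            continue
        mse = np.mean((vec[known] - fb[known]) ** 2)
        mse_comp = np.mean(((1 - vec)[known] - fb[known]) ** 2)
        store(vec if mse <= mse_comp else 1 - vec, start)
    return fullband()

g = np.array([0, 1, 1, 0, 1, 0, 0, 1])          # true swap flags (toy)
starts = list(range(6))                         # 3-bin subbands, hop of 1 bin
subs = [g[s:s + 3] ^ g[s + 1] for s in starts]  # labels relative to center bin
fb = build_fullband_vector(subs, starts, 8)     # -> 1 - g (global flip is harmless)
```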

 Step 3
– Iterate Step 2 up to the highest-frequency subband
– Replace (swap) the components in each frequency bin based on the fullband permutation vector
– Obtain permutation-aligned estimated signals
[Figure: Step 3; a majority decision over the memory gives the final fullband permutation vector, which is used to replace the components]
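The final replacement step is a per-bin swap. The sketch below, with a hypothetical function name, swaps the two FDICA estimates in every bin whose fullband label is 1. The usage shows that the labels only need to be consistent: applying the complement vector also aligns all bins, just with the two output channels exchanged globally.

```python
import numpy as np

def apply_permutation(Y, fullband_vec):
    """Y: FDICA output (n_freq, n_frames, 2). Bins labelled 1 are swapped so
    that all bins share one source order."""
    flip = np.asarray(fullband_vec).astype(bool)
    out = Y.copy()
    out[flip] = Y[flip][:, :, ::-1]
    return out

rng = np.random.default_rng(7)
S = rng.standard_normal((8, 50, 2))
g = np.array([0, 1, 1, 0, 1, 0, 0, 1])
Yp = S.copy()
Yp[g == 1] = S[g == 1][:, :, ::-1]          # non-aligned signal
aligned = apply_permutation(Yp, g)          # equals S
aligned_flip = apply_permutation(Yp, 1 - g) # equals S with channels exchanged
```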

Experimental conditions
 Training speech signals
– Dry sources: JVS corpus [Takamichi+, 2019] (Japanese speech)
– Mixture: dry sources convolved with RWCP impulse responses [Nakamura+, 2000]
– Permutation: FDICA applied, then the frequency components randomly shuffled
 Test speech signals
– Speech signals obtained from the SiSEC2011 UND task [Araki+, 2012]
 FFT length: 8192 samples (512 ms, Hamming window)
 Shift length: 2048 samples (128 ms)
 Objective evaluation: average improvement of signal-to-distortion ratio (SDR)
 Reverberation time
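The STFT settings imply a 16 kHz sampling rate (8192 samples = 512 ms). A minimal NumPy STFT with those settings is sketched below; it applies no padding, so it assumes the signal is at least one FFT length long and drops trailing samples that do not fill a frame. The function name and the constant names are hypothetical.

```python
import numpy as np

FS = 16000      # implied by 8192 samples = 512 ms
N_FFT = 8192    # FFT length
HOP = 2048      # shift length

def stft(x, n_fft=N_FFT, hop=HOP):
    """Hamming-windowed STFT returning an (n_freq, n_frames) complex array."""
    win = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

print(N_FFT / FS, HOP / FS)   # 0.512 0.128 (window and shift in seconds)

x = np.random.default_rng(8).standard_normal(3 * FS)   # 3 s of noise
X = stft(x)                                            # shape (4097, 20)
```

The long 512 ms window is what lets each frequency bin approximate the reverberant convolution as an instantaneous mixture.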

Results
 Findings
– The proposed method achieves an SDR improvement of about 8 dB
– ILRMA achieves an SDR improvement of about 4 dB
– The proposed method is close to the upper-limit performance given by FDICA with the ideal permutation solver (IPS)
[Figure: SDR improvement [dB] (higher is better) for FDICA with IPS (reference score), ILRMA with 2, 3, and 4 bases, and FDICA with the DNN-based permutation solver (proposed)]
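The reported metric is an SDR improvement, i.e. the SDR of the separated signal minus the SDR of the unprocessed mixture, both measured against the same reference. The sketch below uses a simplified SDR (reference energy over error energy) as a stand-in; the standard BSS Eval SDR additionally allows a time-invariant distortion filter on the reference, so its values differ. Function names are hypothetical.

```python
import numpy as np

def simple_sdr(ref, est):
    """Simplified SDR in dB: energy of the reference over energy of the error."""
    err = ref - est
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))

def sdr_improvement(ref, est, mix):
    """SDR of the estimate minus SDR of the mixture, both against ref."""
    return simple_sdr(ref, est) - simple_sdr(ref, mix)

ref = np.array([1.0, -1.0, 1.0, -1.0])
print(round(sdr_improvement(ref, 1.1 * ref, 2.0 * ref), 6))  # 20.0
```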

Conclusion
 In this paper
– We proposed a new DNN-based permutation solver for determined
audio source separation using FDICA
– An SDR improvement of about 8 dB was achieved in experiments
with a highly reverberant speech mixture signal
 Future work
– Extension to three or more separated signals, for which the proposed method faces a combinatorial explosion of permutations
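The combinatorial growth is easy to quantify. With two sources, the relation between the source order at a reference bin and at another bin is binary (same or different), which is exactly the DNN's 0/1 label. With N sources it becomes one of N! relative orderings. The sketch below only counts that label space; how the authors would extend the method is not stated, and the function name is hypothetical.

```python
from itertools import permutations
from math import factorial

def relative_orderings(n_src):
    """All ways the source order at another bin can relate to the order at
    the reference bin; for n_src = 2 this is the binary same/different label."""
    return list(permutations(range(n_src)))

for n in range(2, 6):
    print(n, len(relative_orderings(n)))   # 2, 6, 24, 120 orderings
```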
Thank you for your attention!

Demonstration
Audio examples: original, mixture, FDICA with IPS, and FDICA with the proposed method
