Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization


Tomo Miyauchi, Daichi Kitamura,
Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)

Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared nonnegative matrix factorization
Experiments
Conclusions

Background
With the advent of 3D TV, the reproduction of 3D pictures has been realized.
However, a 3D sound reproduction system has not been established yet.
Problem

[Figure: 3D TV showing the picture image and the sound image]

The viewer feels uncomfortable because of the mismatch between the picture image and the sound image.

To solve this problem, sound field reproduction techniques have been actively studied.
These techniques can present both the “direction” and the “depth” of sound images to the listener.

Related study: wave field synthesis
Wave Field Synthesis (WFS) [A. J. Berkhout, et al., 1993]
Sound field reproduction that can represent the “depth” of sound images
WFS allows us to create sound images in front of the loudspeakers.
[Figure: loudspeaker array creating a sound image in front of the loudspeakers for a listener]

×Drawback of WFS
WFS requires the primary source information of sound images:
1. Individual sound sources
2. Localization information
This information has been lost in existing contents by down-mixing.
↓
Up-mixing methods are required:
1. Source separation (mixed signal → individual sources)
2. Localization estimation of sound images

Flow of proposed up-mixer
Spatial sound system using existing contents
Stereo contents (mixed multichannel signal)
→ 1. Sound source separation (conventional method)
→ 2. Localization estimation
   - Directional estimation (conventional method)
   - Depth estimation (new, this study)
→ Wave field synthesis
→ Spatial sound reproduction

No depth estimation method for sound images has been proposed so far.

Related study: directional clustering [Araki, et al., 2007]
Mixed stereo signal (L-ch and R-ch input signals)
→ Fourier transform
→ Normalization of each source component
→ Clustering of the source components around spatial representative vectors
→ Inverse Fourier transform
→ Individual sources of each cluster
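The normalization and clustering steps can be sketched as follows. This is a minimal illustration under our own assumptions, not Araki et al.'s implementation: each time-frequency source component is represented only by its amplitude vector (|L|, |R|), normalized to unit length, and clustering is k-means whose unit-length centroids act as the spatial representative vectors, initialized evenly between full left and full right. The name `directional_clustering` is hypothetical.

```python
import math


def normalize(vec):
    """Scale a 2-D amplitude vector (|L|, |R|) to unit length."""
    norm = math.hypot(vec[0], vec[1])
    return (vec[0] / norm, vec[1] / norm) if norm > 0 else (0.0, 0.0)


def directional_clustering(components, n_clusters, n_iter=20):
    """Hypothetical sketch: k-means on unit-normalized stereo amplitude vectors.

    components: list of (|L|, |R|) magnitudes, one per time-frequency bin.
    Returns (labels, centroids); the unit-length centroids play the role of
    spatial representative vectors.
    """
    points = [normalize(c) for c in components]
    # Initial representative vectors spread evenly from full left (angle 0,
    # only L-ch energy) to full right (angle pi/2, only R-ch energy).
    centroids = []
    for k in range(n_clusters):
        theta = (k + 0.5) * (math.pi / 2) / n_clusters
        centroids.append((math.cos(theta), math.sin(theta)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: for unit vectors, the largest inner product means the
        # smallest angle to a representative vector.
        labels = [max(range(n_clusters),
                      key=lambda k: p[0] * centroids[k][0] + p[1] * centroids[k][1])
                  for p in points]
        # Update: mean direction of the assigned vectors, renormalized.
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(normalize((sum(p[0] for p in members),
                                                sum(p[1] for p in members))))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's vector
        centroids = new_centroids
    return labels, centroids
```

Because only the direction of each vector is kept after normalization, a loud and a quiet component with the same L/R ratio fall into the same cluster.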


Problem and purpose
Problem: WFS requires specific localization information of individual sound sources to reproduce a sound field.
Up-mixer: directional estimation methods have been developed, e.g., directional estimation based on VBAP [Hirata, et al., 2011].

Purpose: establishing a new depth estimation method.
How can we obtain depth information?

Proposed method: depth estimation using the direction-of-arrival (DOA) distribution


Proposed method 1: depth estimation based on DOA
DOA → “Direction of arrival” of sound waves
We estimate the depth using the DOA distribution.

Mixed signal → Directional clustering → Individual sources → Weighted DOA histogram
- Directional information: amplitude ratio of the L-ch and R-ch components
- Weighting term: magnitude of each vector
[Figure: weighted DOA histogram of the individual sources; horizontal axis: direction of arrival (Left, Center, Right); vertical axis: frequency of source components]
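A weighted DOA histogram can be sketched as below. The mapping from amplitude ratio to angle is our assumption, not taken from the slide: the DOA of each component is atan2(|R|, |L|) in degrees, shifted so that -45 is full left, 0 is center and +45 is full right, and each component is weighted by the magnitude of its (|L|, |R|) vector. `weighted_doa_histogram` is a hypothetical name.

```python
import math


def weighted_doa_histogram(components, n_bins=18):
    """Hypothetical sketch of a magnitude-weighted DOA histogram.

    components: list of (|L|, |R|) magnitudes of the source components.
    Returns (bin_centers_in_degrees, histogram).
    """
    lo, hi = -45.0, 45.0
    width = (hi - lo) / n_bins
    hist = [0.0] * n_bins
    for left, right in components:
        magnitude = math.hypot(left, right)  # weighting term
        if magnitude == 0.0:
            continue  # a silent bin carries no directional information
        doa = math.degrees(math.atan2(right, left)) - 45.0  # assumed DOA mapping
        idx = min(int((doa - lo) / width), n_bins - 1)  # +45 goes to the last bin
        hist[idx] += magnitude
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, hist
```

Components from a single, dry source pile up in a few neighboring bins, while diffuse components spread over many bins, which is the shape difference exploited next.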

Proposed method 1: depth estimation based on DOA
In sound fields, when a sound source is far from the listener, sound waves
arrive from various directions owing to sound diffusion.

Difference of DOA histogram shape corresponding to source distance
[Figure: observed DOA histograms (frequency of source components vs. DOA) for a close source and a far source]
Close source: the observed DOA histogram has a spiky shape.
Far source: the observed DOA histogram has a smooth shape.

The observed DOA distribution of the target source can therefore be used as a cue for depth estimation.

Proposed method 1: modeling of DOA distribution
To model the DOA distribution, we propose a new modeling method using the GGD.
Generalized Gaussian distribution: GGD [Box, et al., 1973]
Flexible family of probability density functions (PDFs)

The shape of the GGD changes depending on βshape.
βshape = 2: Gaussian distribution PDF
βshape = 1: Laplacian distribution PDF
Definition of GGD
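A commonly used form of the GGD (the notation here is our assumption) is p(x) = βshape / (2 α Γ(1/βshape)) · exp(-(|x - μ| / α)^βshape), with location μ and scale α > 0. The sketch below evaluates it so that the two special cases listed above can be checked numerically; `ggd_pdf` is a hypothetical name.

```python
import math


def ggd_pdf(x, mu, alpha, beta_shape):
    """Generalized Gaussian PDF in a common location-scale form (assumed notation).

    mu: location, alpha: scale (> 0), beta_shape: shape (> 0).
    beta_shape = 2 with alpha = sqrt(2) * sigma gives a Gaussian with std sigma;
    beta_shape = 1 gives a Laplacian with scale alpha.
    """
    coeff = beta_shape / (2.0 * alpha * math.gamma(1.0 / beta_shape))
    return coeff * math.exp(-(abs(x - mu) / alpha) ** beta_shape)


# Smaller beta_shape gives a sharper peak and heavier tails (spiky shape);
# larger beta_shape gives a flatter top (smooth shape).
for b in (1.0, 2.0, 8.0):
    print(b, round(ggd_pdf(0.0, 0.0, 1.0, b), 4))
```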


Proposed method 1: modeling of DOA distribution
Modeling of the DOA distribution based on the GGD parameter
[Figure: DOA histograms (frequency of source components) of a close source and a far source, modeled by GGDs]

We propose a new depth estimation method based on the GGD.
The shape parameter βshape is used as the metric.

Source is close ⇔ βshape is small
Source is far ⇔ βshape is large

Proposed method 2: problem in proposed method 1
× Normalization problem: small noise components are enhanced by the normalization.
× Problem of signal processing: background noise and artificial distortion generated by signal processing interfere with the DOA histogram.
[Figure: weighted DOA histogram (frequency of source components vs. DOA: Left, Center, Right) of a binaurally recorded L-ch/R-ch input signal with noise components]
→ Feature extraction: activation-shared multichannel NMF


Proposed method 2: activation-shared multichannel NMF
Nonnegative matrix factorization: NMF [Lee, et al., 2001]
Observed matrix (spectrogram) ≈ basis matrix (spectral patterns) × activation matrix (time-varying gains)
[Figure: spectrogram (frequency × time) decomposed into the basis matrix (frequency × bases) and the activation matrix (bases × time)]
Matrix sizes are given by Ω (number of frequency bins), the number of time frames, and the number of bases.

NMF is a sparse representation and can extract significant features from the observed matrix.
This sparse representation provides high performance in noise reduction, compression, and feature extraction.
We use it to eliminate background noise and artificial distortion.
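A minimal sketch of single-channel NMF with the multiplicative updates of Lee and Seung for the squared Euclidean distance. Pure Python lists keep it self-contained; the function names, the random initialization and the iteration count are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf_euclidean(y, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Factorize nonnegative Y (frequency x time) into F (frequency x bases)
    and G (bases x time) by minimizing ||Y - FG||^2 with multiplicative updates."""
    rng = random.Random(seed)
    n_freq, n_time = len(y), len(y[0])
    f = [[rng.random() + eps for _ in range(n_bases)] for _ in range(n_freq)]
    g = [[rng.random() + eps for _ in range(n_time)] for _ in range(n_bases)]
    for _ in range(n_iter):
        # G <- G * (F^T Y) / (F^T F G); eps avoids division by zero.
        ft = transpose(f)
        num, den = matmul(ft, y), matmul(matmul(ft, f), g)
        g = [[g[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(n_bases)]
        # F <- F * (Y G^T) / (F G G^T)
        gt = transpose(g)
        num, den = matmul(y, gt), matmul(f, matmul(g, gt))
        f = [[f[w][k] * num[w][k] / (den[w][k] + eps) for k in range(n_bases)]
             for w in range(n_freq)]
    return f, g


def sq_error(y, f, g):
    """Squared Euclidean distance between Y and the model FG."""
    x = matmul(f, g)
    return sum((y[i][j] - x[i][j]) ** 2 for i in range(len(y)) for j in range(len(y[0])))
```

The updates only multiply by nonnegative ratios, so F and G stay nonnegative without any projection step.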

Proposed method 2: problem of conventional NMF
Conventional NMF: NMFs are applied to the L-ch and R-ch in parallel.
- The bases of the two channels are trained independently (uncorrelated).
- The conventional NMFs therefore generate an artificial fluctuation in the amplitude ratio between the channels, which carries the directional information.
- As a result, the DOA information is disturbed.

Proposed method: activation-shared multichannel NMF
The activation matrix is shared across all channels (L-ch NMF and R-ch NMF).
This reduces the dimensionality of the input signal while maintaining the directional information.

Cost function: the β-divergence between the observed matrices and their NMF models, evaluated over all entries of the matrices.
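One way to write this cost in symbols, using our own notation as an assumption: M channels; for channel m, the observed nonnegative matrix has entries y_{m,ω,t} and the basis matrix has entries f_{m,ω,k}; the activation entries g_{k,t} are shared by all channels, with T time frames and K bases.

```latex
\mathcal{J}(\mathbf{F}_1,\dots,\mathbf{F}_M,\mathbf{G})
  = \sum_{m=1}^{M}\sum_{\omega=1}^{\Omega}\sum_{t=1}^{T}
    \mathcal{D}_{\beta}\Bigl(y_{m,\omega,t}\,\Big|\,\sum_{k=1}^{K} f_{m,\omega,k}\,g_{k,t}\Bigr)
```

Because g_{k,t} has no channel index, the channel ratio of each modeled component, f_{m,ω,k} g_{k,t} / (f_{m',ω,k} g_{k,t}) = f_{m,ω,k} / f_{m',ω,k}, does not vary over time; this is one way to see why sharing the activation keeps the amplitude-ratio (directional) information stable.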

β-divergence [Eguchi, et al., 2001]
Generalized divergence whose form changes with the value of β:
β = 2: Euclidean distance
β = 1: Generalized Kullback–Leibler divergence
β = 0: Itakura–Saito divergence
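For y, x > 0 and β not equal to 0 or 1, a standard form of the β-divergence is d_β(y|x) = y^β/(β(β-1)) + x^β/β - y x^(β-1)/(β-1); the KL and IS cases are its limits as β → 1 and β → 0. The sketch below (the function name is ours) evaluates it with those two limits handled explicitly. Note that in this form the β = 2 case is half the squared Euclidean distance.

```python
import math


def beta_divergence(y, x, beta):
    """Scalar beta-divergence d_beta(y | x) for y, x > 0."""
    if beta == 1:  # generalized Kullback-Leibler divergence (limit beta -> 1)
        return y * math.log(y / x) - y + x
    if beta == 0:  # Itakura-Saito divergence (limit beta -> 0)
        return y / x - math.log(y / x) - 1.0
    return (y ** beta / (beta * (beta - 1))
            + x ** beta / beta
            - y * x ** (beta - 1) / (beta - 1))


print(beta_divergence(3.0, 1.0, 2))  # (3 - 1)^2 / 2 = 2.0
```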

Derivation of optimal variables
The auxiliary function method is an optimization scheme that uses an upper-bound function.
1. Design an auxiliary function that upper-bounds the cost function.
2. Minimize the original cost function indirectly by minimizing the auxiliary function.
This scheme is applied to the cost function based on the β-divergence.

Cost function
Depending on the value of β, the first and second terms become convex or concave functions with respect to the model value.
[Table: convex/concave classification of the first and second terms for each range of β]
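Assuming the standard form of the β-divergence for β not equal to 0 or 1, d_β(y|x) = y^β/(β(β-1)) + x^β/β - y x^(β-1)/(β-1), the term y^β/(β(β-1)) does not depend on the model value x, and the second derivatives of the other two terms with respect to x > 0 are:

```latex
\frac{\partial^2}{\partial x^2}\Bigl(\frac{x^{\beta}}{\beta}\Bigr) = (\beta-1)\,x^{\beta-2},
\qquad
\frac{\partial^2}{\partial x^2}\Bigl(-\frac{y\,x^{\beta-1}}{\beta-1}\Bigr) = -(\beta-2)\,y\,x^{\beta-3}
```

So, for y > 0, x^β/β is concave for β < 1 and convex for β > 1, while -y x^(β-1)/(β-1) is convex for β < 2 and concave for β > 2; for 1 < β < 2 both terms are convex. The limiting cases follow the same pattern: for KL, x is linear and -y log x is convex; for IS, y/x is convex and log x is concave.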

Cost function
The upper-bound function of each term is defined by applying
- Convex terms: Jensen’s inequality
- Concave terms: the tangent-line inequality
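Written for a model value x = Σ_k f_k g_k (index names are our assumption), the two bounds are: for a convex term φ, Jensen's inequality with auxiliary weights λ_k ≥ 0 and Σ_k λ_k = 1; for a concave term ψ, the tangent line at an auxiliary point z.

```latex
\varphi\Bigl(\sum_{k} f_{k} g_{k}\Bigr) \le \sum_{k} \lambda_{k}\,\varphi\Bigl(\frac{f_{k} g_{k}}{\lambda_{k}}\Bigr),
\qquad
\psi(x) \le \psi(z) + \psi'(z)\,(x - z)
```

Equality holds for λ_k = f_k g_k / Σ_{k'} f_{k'} g_{k'} and for z = x. Resetting the auxiliary variables to these values after each update makes the auxiliary function touch the cost, so minimizing it never increases the cost.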

The update rules are obtained by setting the derivative of the auxiliary function with respect to each variable to zero.

Update rules for the entries of the basis and activation matrices
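A sketch of activation-shared multichannel NMF with multiplicative updates for the β-divergence. Assumptions that are ours, not the slides': channel m has its own basis matrix F_m, one activation matrix G is shared by all channels, and the updates are the standard multiplicative rules, which are known to be monotone for 1 ≤ β ≤ 2 (for β outside that range, an auxiliary-function derivation adds an exponent to the update factor, omitted here). The function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def elementwise(a, b, op):
    return [[op(u, v) for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def shared_activation_nmf(ys, n_bases, beta=1.0, n_iter=100, seed=0, eps=1e-12):
    """ys: list of M positive matrices (frequency x time), one per channel.
    Returns (fs, g): per-channel bases F_m and the shared activation G."""
    rng = random.Random(seed)
    n_freq, n_time = len(ys[0]), len(ys[0][0])
    fs = [[[rng.random() + 0.1 for _ in range(n_bases)] for _ in range(n_freq)]
          for _ in ys]
    g = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(n_bases)]
    weight_y = lambda u, v: u * (v + eps) ** (beta - 2)  # Y * X^(beta-2)
    for _ in range(n_iter):
        # Each channel's bases: F_m <- F_m * [(Y_m X_m^(b-2)) G^T] / [X_m^(b-1) G^T]
        gt = transpose(g)
        for m, y in enumerate(ys):
            x = matmul(fs[m], g)
            num = matmul(elementwise(y, x, weight_y), gt)
            den = matmul([[(v + eps) ** (beta - 1) for v in row] for row in x], gt)
            ratio = elementwise(num, den, lambda u, v: u / (v + eps))
            fs[m] = elementwise(fs[m], ratio, lambda u, v: u * v)
        # Shared activation: numerators and denominators are summed over channels.
        num_sum = [[0.0] * n_time for _ in range(n_bases)]
        den_sum = [[0.0] * n_time for _ in range(n_bases)]
        for f, y in zip(fs, ys):
            x, ft = matmul(f, g), transpose(f)
            num = matmul(ft, elementwise(y, x, weight_y))
            den = matmul(ft, [[(v + eps) ** (beta - 1) for v in row] for row in x])
            num_sum = elementwise(num_sum, num, lambda u, v: u + v)
            den_sum = elementwise(den_sum, den, lambda u, v: u + v)
        ratio = elementwise(num_sum, den_sum, lambda u, v: u / (v + eps))
        g = elementwise(g, ratio, lambda u, v: u * v)
    return fs, g


def shared_cost(ys, fs, g, beta):
    """Sum over channels and entries of d_beta(y | x), for y, x > 0."""
    total = 0.0
    for y, f in zip(ys, fs):
        for row_y, row_x in zip(y, matmul(f, g)):
            for a, b in zip(row_y, row_x):
                if beta == 1.0:
                    total += a * math.log(a / b) - a + b
                elif beta == 0.0:
                    total += a / b - math.log(a / b) - 1.0
                else:
                    total += (a ** beta / (beta * (beta - 1)) + b ** beta / beta
                              - a * b ** (beta - 1) / (beta - 1))
    return total
```

Since G is updated from the sum over channels, every basis keeps one common time-varying gain, and only the per-channel bases can differ between L-ch and R-ch.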

Flow of proposed depth estimation method
Input stereo signal (L-ch, R-ch)
→ STFT
→ Directional clustering into Cluster L, Cluster C, and Cluster R
→ Activation-shared NMF for each cluster
→ Depth estimation for each cluster
[Figure: DOA histograms (frequency of source components) of each cluster]

We can estimate depth information by calculating the shape parameter of the DOA histogram.


Experimental conditions
[Table: conditions (mixing source parameters, test sources 1 to 3, reverberation time, intervals, NMF β, number of NMF bases)]
[Figure: arrangement of the target source and the interference sources]
- Mixed stereo signals consist of 3 instruments.
- The target source is located at the center at 7 distances.
- There are 6 patterns of assigning the instruments to the directions.

Compared methods
- Conventional method 1: not processed by NMF
- Conventional method 2: processed by conventional NMF
- Proposed method: processed by proposed NMF

Experimental conditions
Image method [Allen, et al., 1979]
Technique of simulating room impulse responses
The following can be set arbitrarily:
- Volume of room
- Source location
- Microphone location
- Absorption coefficient
[Figure: geometry of the image method (real source and image sources) and an example of a room impulse response (amplitude vs. time index)]

Reference sound sources were generated using the image method.
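A minimal sketch of the image method for a rectangular (shoebox) room, showing how an impulse response is assembled from image sources. The simplifications are ours, not the talk's or Allen and Berkley's full algorithm: one absorption coefficient for all walls (reflection coefficient sqrt(1 - absorption)), frequency-independent reflections, a 1/(4πd) spreading loss, and each arrival rounded to the nearest sample. `image_method_rir` is a hypothetical name.

```python
import itertools
import math


def image_method_rir(room, src, mic, absorption, fs=16000, c=343.0,
                     max_order=10, length=4096):
    """Simplified shoebox-room impulse response from image sources.

    room, src, mic: (x, y, z) in metres; absorption: coefficient in [0, 1].
    """
    refl = math.sqrt(1.0 - absorption)  # pressure reflection coefficient per wall
    rir = [0.0] * length
    orders = range(-max_order, max_order + 1)
    for n in itertools.product(orders, repeat=3):
        for p in itertools.product((0, 1), repeat=3):
            # Image coordinate along each axis: (1 - 2p) * src + 2 * n * L.
            img = [(1 - 2 * p[a]) * src[a] + 2 * n[a] * room[a] for a in range(3)]
            # Wall reflections along each axis: |n - p| + |n|.
            order = sum(abs(n[a] - p[a]) + abs(n[a]) for a in range(3))
            if order > max_order:
                continue
            dist = math.dist(img, mic)
            idx = round(dist / c * fs)  # arrival time in samples
            if idx < length:
                rir[idx] += refl ** order / (4.0 * math.pi * dist)
    return rir
```

The first nonzero tap is the direct path; later taps come from image sources, and a larger absorption coefficient shrinks them.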

Experimental results
Results 1
[Figure: estimated values for each target source distance, with the positions of the target source and the interference sources]

Data set 1
Target source: Vocal
Interference source (left): Piano
Interference source (right): Guitar

・ The results of the conventional methods do not agree with the oracle (image method).
・ The proposed method correctly estimates the distance of the target source.

Experimental results: correlation coefficient
Results 2
Correlation coefficient between the reference value and the estimated value
Table: Correlation coefficient of each method

Data set  Target  Interference (left)  Interference (right)  Conv. method 1  Conv. method 2  Proposed method
1         Vocal   Piano                Guitar                0.350           0.189           0.986
2         Vocal   Guitar               Piano                 0.532           0.165           0.925
3         Guitar  Piano                Vocal                 0.154           0.044           0.777
4         Guitar  Vocal                Piano                 0.277           -0.037          0.651
5         Piano   Vocal                Guitar                0.602           0.426           0.791
6         Piano   Guitar               Vocal                 0.496           0.157           0.856

• A strong relation between the estimated values of the proposed method and the distance of the target source is indicated.
• The efficacy of the proposed method is confirmed.
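The slides do not name the correlation measure; assuming Pearson's product-moment correlation between the reference values and the estimated values (one pair per target distance), it can be computed as below. The helper name `pearson` is ours.

```python
import math


def pearson(reference, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimated) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(reference, estimated))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_r * var_e)
```

A value near 1 means the estimates rise and fall with the reference over the 7 distances, even if their scales differ.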

Conclusions
We proposed a new method for estimating the depth of a sound source in a mixed signal using the shape of the DOA distribution.
The shape of the DOA distribution is modeled by the GGD.
We also proposed a new feature extraction method for multichannel signals, activation-shared multichannel NMF.
The experimental results indicated the efficacy of the proposed method.

Derivation of parameter βshape

× Maximum-likelihood estimation of the GGD shape parameter has no closed-form solution.
We propose a closed-form parameter estimation algorithm based on the kurtosis and an approximation of the gamma function.
Kurtosis of the DOA histogram (computed from the observed DOA histogram)
Moments of the GGD
Relation equation between the kurtosis and the shape parameter (expressed with the gamma function)
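For the usual GGD form, the kurtosis depends only on the shape parameter: κ(βshape) = Γ(5/βshape) Γ(1/βshape) / Γ(3/βshape)², which equals 3 for the Gaussian (βshape = 2) and 6 for the Laplacian (βshape = 1) and decreases as βshape grows. The sketch below computes the kurtosis of a weighted DOA histogram and inverts κ(βshape) by bisection. The bisection is a numerical stand-in chosen for illustration; it is not the closed-form estimate based on the modified Stirling's formula, and the function names are hypothetical.

```python
import math


def ggd_kurtosis(beta_shape):
    """Kurtosis of a GGD as a function of its shape parameter (via log-gamma)."""
    b = beta_shape
    return math.exp(math.lgamma(5 / b) + math.lgamma(1 / b) - 2 * math.lgamma(3 / b))


def histogram_kurtosis(centers, weights):
    """Kurtosis (4th standardized moment) of a weighted histogram."""
    total = sum(weights)
    mean = sum(c * w for c, w in zip(centers, weights)) / total
    m2 = sum(w * (c - mean) ** 2 for c, w in zip(centers, weights)) / total
    m4 = sum(w * (c - mean) ** 4 for c, w in zip(centers, weights)) / total
    return m4 / m2 ** 2


def estimate_beta_shape(kurtosis, lo=0.1, hi=20.0, n_iter=100):
    """Invert ggd_kurtosis by bisection; the result is clipped to [lo, hi]."""
    if kurtosis >= ggd_kurtosis(lo):
        return lo
    if kurtosis <= ggd_kurtosis(hi):
        return hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurtosis:
            lo = mid  # mid is still too spiky: the answer has a larger beta_shape
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A spiky histogram (high kurtosis) maps to a small βshape and a smooth one to a large βshape, matching "close ⇔ small βshape".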


× There is no exact closed-form solution for the inverse function.
→ Introduce the modified Stirling's formula as an approximation of the gamma function.
Modified Stirling's formula
Take the logarithm.

This results in a quadratic equation to be solved.
From it, we can derive the closed-form estimate of the shape parameter βshape.

This completes the preparation of the depth estimation method.

Preliminary experiment
Example of DOA histogram: center-cluster DOA of a mixed source (3 instruments)
[Figure: weighted DOA histograms (direction of arrival [degree]) obtained with L-ch/R-ch NMF]
- (Individually applied) conventional NMF: fluctuations are generated in the DOA.
- (Activation-shared) proposed NMF: feature extraction while maintaining directional information.
