Depth Estimation of Sound Images Using
Directional Clustering and Activation-Shared
Nonnegative Matrix Factorization

  1. Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization. Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
  2. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared nonnegative matrix factorization); Experiments; Conclusions.
  3. Background. With the advent of 3D TV, the reproduction of 3D images has been realized, but a 3D sound reproduction system has not been established yet. Problem: on a 3D TV, the picture image and the sound image do not coincide, and the viewer feels uncomfortable due to this mismatch of images. To solve this problem, sound field reproduction techniques, which can present the “direction” and “depth” of the sound images to the listener, have been studied actively.
  4. Related study: wave field synthesis. Sound field reproduction can represent the “depth” of sound images. Wave field synthesis (WFS) [A. J. Berkhout, et al., 1993] allows us to create sound images in front of the loudspeakers. Drawback of WFS: it requires the primary source information of the sound images, namely (1) the individual sound sources and (2) their localization information. This information has been lost in existing contents through down-mixing, so up-mixing methods are required: (1) source separation (mixed signal → individual sources) and (2) localization estimation of the sound images.
  5. Flow of proposed up-mixer: a spatial sound system using existing contents. Stereo contents (mixed multichannel signal) → (1) sound source separation → (2) directional estimation and depth estimation → wave field synthesis → spatial sound reproduction. Sound source separation and directional estimation are conventional methods; depth estimation is new in this study, because depth estimation of sound images has not been proposed.
  6. Related study: directional clustering [Araki, et al., 2007]. The mixed stereo signal (L-ch and R-ch input signals) is Fourier transformed, each source component is normalized, and the components are clustered around spatial representative vectors. The inverse Fourier transform of each cluster yields the individual sources.
  7. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared multichannel NMF); Experiments; Conclusions.
  8. Problem and purpose. Problem: WFS requires specific localization information of individual sound sources to reproduce a sound field. For the up-mixer, directional estimation methods have already been developed, e.g., directional estimation based on VBAP [Hirata, et al., 2011]. Purpose: establishing a new depth estimation method. How can we get depth information? Proposed method: a depth estimation method using the direction-of-arrival (DOA) distribution.
  9. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared multichannel NMF); Experiments; Conclusions.
  10. Proposed method 1: depth estimation based on DOA. DOA is the “direction of arrival” of sound waves. We estimate the depth using the DOA distribution. Directional clustering separates the mixed signal into individual sources, and a weighted DOA histogram is built for each source: the directional information is the amplitude ratio between the channels, and the weighting term is the magnitude of each vector. The histogram plots the frequency of source components against the direction of arrival (left, center, right).
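The weighted DOA histogram described on this slide can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function name `weighted_doa_histogram`, the mapping of the amplitude ratio to an angle between 0 and 90 degrees via `atan2`, and the bin count are all assumptions made for the example.

```python
import math

def weighted_doa_histogram(left_mags, right_mags, n_bins=9):
    """Histogram of DOA angles, each weighted by the magnitude of its (L, R) vector.

    left_mags / right_mags: nonnegative magnitudes of the time-frequency
    components of the L-ch and R-ch signals (hypothetical inputs).
    """
    hist = [0.0] * n_bins
    for l, r in zip(left_mags, right_mags):
        weight = math.hypot(l, r)  # weighting term: magnitude of the vector
        if weight == 0.0:
            continue               # silent component carries no direction
        # Directional information from the amplitude ratio:
        # 0 degrees = fully left, 45 = center, 90 = fully right (assumed mapping).
        angle = math.degrees(math.atan2(r, l))
        idx = min(int(angle / 90.0 * n_bins), n_bins - 1)
        hist[idx] += weight
    return hist

# Three components: one centered, one fully right, one fully left.
hist = weighted_doa_histogram([1.0, 0.0, 2.0], [1.0, 3.0, 0.0])
```

Weighting by magnitude keeps low-energy components from dominating the histogram shape, which is the role the slide assigns to the weighting term.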
  11. Proposed method 1: depth estimation based on DOA. In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion. The shape of the DOA histogram (frequency of source components versus direction of arrival) therefore differs with source distance: for a close source, the observed DOA histogram becomes spiky; for a far source, it becomes smooth. The observed DOA distribution of the target source can thus be used as a cue for depth estimation.
  12. Proposed method 1: modeling of DOA distribution. To model the DOA distribution, we propose a new modeling method using the generalized Gaussian distribution (GGD) [Box, et al., 1973], a flexible family of probability density functions (PDFs). The shape of the GGD changes depending on the shape parameter βshape: βshape = 2 gives the Gaussian PDF and βshape = 1 gives the Laplacian PDF.
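The definition of the GGD did not survive in the slide text. A minimal sketch using the standard textbook parameterization is given below; the parameter names `mu` (location) and `alpha` (scale) are assumptions and may differ from the notation on the original slide.

```python
import math

def ggd_pdf(x, mu=0.0, alpha=1.0, beta_shape=2.0):
    """Generalized Gaussian PDF in the common parameterization
    p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta).
    """
    coeff = beta_shape / (2.0 * alpha * math.gamma(1.0 / beta_shape))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta_shape))

# beta_shape = 2 with alpha = sqrt(2) * sigma reproduces a Gaussian with std sigma;
# beta_shape = 1 reproduces a Laplacian with scale alpha.
gauss_value = ggd_pdf(0.5, alpha=math.sqrt(2.0), beta_shape=2.0)
laplace_value = ggd_pdf(0.5, alpha=1.0, beta_shape=1.0)
```

In this parameterization a small `beta_shape` produces a sharp peak with heavy tails and a large `beta_shape` produces a flatter density.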
  13. Proposed method 1: modeling of DOA distribution. The DOA distribution (frequency of source components versus direction of arrival) of close and far sources is modeled by the GGD parameter. We propose a new depth estimation method based on the GGD, in which the shape parameter βshape is used as the metric: the source is close ⇔ βshape is small; the source is far ⇔ βshape is large.
  14. Proposed method 2: problem in proposed method 1. Normalization problem: small noise components are enhanced. Problem of signal processing: for binaural-recorded L-ch and R-ch input signals, background noise and artificial distortion generated by signal processing interfere with the DOA histogram. As a remedy, feature extraction is performed with activation-shared multichannel NMF.
  15. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared multichannel NMF); Experiments; Conclusions.
  16. Proposed method 2: activation-shared multichannel NMF. Nonnegative matrix factorization (NMF) [Lee, et al., 2001] is a sparse representation that can extract significant features from the observed matrix. It decomposes the observed matrix (a spectrogram, frequency × time) into a basis matrix (spectral patterns) and an activation matrix (time-varying gains). The factorization is characterized by the number of frequency bins Ω, the number of time frames, and the number of bases. The sparse representation provides high performance for noise reduction, compression, and feature extraction. We use it to eliminate background noise and artificial distortion.
  17. Proposed method 2: problem of conventional NMF. In the conventional approach, NMFs are applied to the L-ch and the R-ch in parallel. Because the bases of the two channels are trained without correlation, conventional NMFs generate an artificial fluctuation in the amplitude ratio, and the directional (DOA) information is disturbed.
  18. Proposed method 2: activation-shared multichannel NMF. In the proposed method, the L-ch and R-ch NMFs share a single activation matrix across all channels. This reduces the dimensionality of the input signal while maintaining the directional information. The cost function is defined in terms of the β-divergence over the entries of the matrices.
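To make the idea of a shared activation matrix concrete, here is a rough sketch restricted to the squared Euclidean cost (one special case of the β-divergence). It is not the authors' algorithm or their update rules: it uses the well-known multiplicative updates of Lee and Seung with the activation update summed over both channels, and all function names are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sq_error(v, w, h):
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def activation_shared_nmf(v_left, v_right, n_bases=2, n_iter=50, seed=0, eps=1e-12):
    """Factorize V_L ~ W_L H and V_R ~ W_R H with one shared activation H."""
    rng = random.Random(seed)
    channels = [v_left, v_right]
    n_freq, n_time = len(v_left), len(v_left[0])
    ws = [[[rng.random() + 0.1 for _ in range(n_bases)] for _ in range(n_freq)]
          for _ in channels]
    h = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(n_bases)]
    costs = []
    for _ in range(n_iter):
        # Update each channel's basis matrix with the shared activation fixed.
        ht = transpose(h)
        hht = matmul(h, ht)
        for c, v in enumerate(channels):
            num, den = matmul(v, ht), matmul(ws[c], hht)
            ws[c] = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
                     for rw, rn, rd in zip(ws[c], num, den)]
        # Update the shared activation: numerator and denominator sum over channels.
        num = [[0.0] * n_time for _ in range(n_bases)]
        den = [[0.0] * n_time for _ in range(n_bases)]
        for c, v in enumerate(channels):
            wt = transpose(ws[c])
            n_c = matmul(wt, v)
            d_c = matmul(matmul(wt, ws[c]), h)
            for k in range(n_bases):
                for t in range(n_time):
                    num[k][t] += n_c[k][t]
                    den[k][t] += d_c[k][t]
        h = [[a * n / (d + eps) for a, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(h, num, den)]
        costs.append(sum(sq_error(v, ws[c], h) for c, v in enumerate(channels)))
    return ws[0], ws[1], h, costs

# Toy stereo spectrogram: the right channel is the left channel at half amplitude.
v_l = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
v_r = [[0.5, 1.0, 1.5], [1.0, 2.0, 3.0]]
w_l, w_r, h, costs = activation_shared_nmf(v_l, v_r, n_bases=1)
```

Because both channels are reconstructed from the same activations, the inter-channel amplitude ratio is carried entirely by the two basis matrices; in the toy example the ratio of `w_r` to `w_l` stays at the ratio of the input channels.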
  19. Proposed method 2: activation-shared multichannel NMF. The β-divergence [Eguchi, et al., 2001] is a generalized divergence whose form depends on the value of β. β = 2: Euclidean distance; β = 1: generalized Kullback-Leibler divergence; β = 0: Itakura–Saito divergence.
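For reference, the scalar β-divergence in its commonly used form can be written as below. This follows the standard definition from the NMF literature and is not copied from the slide; note that under this convention β = 2 gives half the squared Euclidean distance.

```python
import math

def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for positive x and y."""
    if beta == 0:
        # Itakura-Saito divergence
        return x / y - math.log(x / y) - 1.0
    if beta == 1:
        # Generalized Kullback-Leibler divergence
        return x * math.log(x / y) - x + y
    return (x ** beta + (beta - 1.0) * y ** beta
            - beta * x * y ** (beta - 1.0)) / (beta * (beta - 1.0))

# beta = 2 reduces to 0.5 * (x - y) ** 2.
d_euclid = beta_divergence(3.0, 1.0, 2)
```

The two special cases are the limits of the general expression as β approaches 0 and 1, so the divergence varies continuously with β.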
  20. Proposed method 2: activation-shared multichannel NMF. Derivation of optimal variables using the β-divergence. The auxiliary function method is an optimization scheme that uses an upper bound function: (1) design an auxiliary function that upper-bounds the cost function; (2) minimize the original cost function indirectly by minimizing the auxiliary function.
  21. Proposed method 2: activation-shared multichannel NMF. Cost function: the first and second terms become convex or concave functions depending on the value of β.
  22. Proposed method 2: activation-shared multichannel NMF. Cost function: the upper bound function of each term is defined by applying Jensen’s inequality to the convex terms and the tangent line inequality to the concave terms.
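The two inequalities named on this slide can be stated generically as follows. The symbols (a convex function f, a concave function g, auxiliary weights λ_k, and a tangent point x_0) are chosen here for illustration and are not taken from the slide.

```latex
% Jensen's inequality for a convex function f,
% with auxiliary weights \lambda_k > 0, \sum_k \lambda_k = 1:
f\Bigl(\sum_k x_k\Bigr)
  = f\Bigl(\sum_k \lambda_k \frac{x_k}{\lambda_k}\Bigr)
  \le \sum_k \lambda_k \, f\Bigl(\frac{x_k}{\lambda_k}\Bigr)

% Tangent line inequality for a differentiable concave function g,
% with an arbitrary tangent point x_0:
g(x) \le g(x_0) + g'(x_0)\,(x - x_0)
```

Both bounds are tight for a particular choice of the auxiliary variables (λ_k proportional to x_k, and x_0 = x), which is what allows the auxiliary function to touch the original cost function at the current estimate.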
  23. Proposed method 2: activation-shared multichannel NMF. The update rules for optimization are obtained from the derivative of the auxiliary function with respect to each objective variable. The update rules are written in terms of the entries of the matrices.
  24. Flow of proposed depth estimation method. Input stereo signal (L-ch, R-ch) → STFT → clusters L, C, and R → activation-shared NMF for each cluster → weighted DOA histogram (frequency of source components versus direction of arrival) for each cluster → depth estimation for each cluster. We can estimate depth information by calculating the shape parameter of the DOA histogram.
  25. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared multichannel NMF); Experiments; Conclusions.
  26. Experimental conditions. Condition items: mixing source parameters (test sources 1 to 3, each consisting of a target source and interference sources), reverberation time intervals, NMF β, and NMF basis. Mixed stereo signals consist of 3 instruments. The target source is located at the center at 7 distances. There are 6 combination patterns with respect to direction. Compared methods: conventional method 1, weighted DOA histogram (not processed by NMF); conventional method 2, processed by conventional NMF; proposed method, processed by the proposed NMF.
  27. Experimental conditions. The image method [Allen, et al., 1979] is a technique for simulating a room impulse response by replacing wall reflections of the real source with image sources. The volume of the room, the source location, the microphone location, and the absorption coefficient can be set arbitrarily. The result is a room impulse response (amplitude versus time index). Reference sound sources were generated using the image method.
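A strongly simplified sketch of the image method for a rectangular room is shown below. It assumes a single frequency-independent reflection coefficient for all six walls, 1/distance attenuation, and rounding of each arrival to the nearest sample; the function names, default values, and room geometry are hypothetical and are not the conditions used in the experiment.

```python
import math

def axis_images(src, length, order):
    """Image coordinates along one axis with the number of wall reflections of each."""
    images = []
    for n in range(-order, order + 1):
        images.append((2 * n * length + src, abs(2 * n)))
        images.append((2 * n * length - src, abs(2 * n - 1)))
    return images

def room_impulse_response(room, source, mic, reflection=0.8, order=2,
                          fs=8000, c=343.0, n_taps=2048):
    """Sum the contributions of all image sources up to the given order."""
    rir = [0.0] * n_taps
    axes = [axis_images(s, length, order) for s, length in zip(source, room)]
    for x, rx in axes[0]:
        for y, ry in axes[1]:
            for z, rz in axes[2]:
                dist = math.dist((x, y, z), mic)
                tap = round(dist / c * fs)  # arrival time as a sample index
                if tap < n_taps:
                    gain = reflection ** (rx + ry + rz)  # loss per wall reflection
                    rir[tap] += gain / max(dist, 1e-6)   # spherical spreading
    return rir

# 4 m x 3 m x 2.5 m room, source 1 m away from the microphone along x.
rir = room_impulse_response((4.0, 3.0, 2.5), (1.0, 1.0, 1.0), (2.0, 1.0, 1.0))
```

Moving the source farther from the microphone lowers the direct-path tap relative to the reflected taps, which is the mechanism that makes the DOA distribution of a far source more diffuse.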
  28. Experimental results. Results 1, data set 1. Target source: Vocal; interference source (left): Piano; interference source (right): Guitar. The results of the conventional methods do not agree with the oracle (image method). The results of the proposed method correctly estimate the distance of the target source.
  29. Experimental results: correlation coefficient. Results 2: correlation coefficient between the reference value and the estimated value.
      Table: Correlation coefficient of each method
      Data set | Target source | Interference source (left) | Interference source (right) | Conventional method 1 | Conventional method 2 | Proposed method
      1 | Vocal | Piano | Guitar | 0.350 | 0.189 | 0.986
      2 | Vocal | Guitar | Piano | 0.532 | 0.165 | 0.925
      3 | Guitar | Piano | Vocal | 0.154 | 0.044 | 0.777
      4 | Guitar | Vocal | Piano | 0.277 | -0.037 | 0.651
      5 | Piano | Vocal | Guitar | 0.602 | 0.426 | 0.791
      6 | Piano | Guitar | Vocal | 0.496 | 0.157 | 0.856
      A strong relation between the estimated value of the proposed method and the distance of the target source is indicated. The efficacy of the proposed method is confirmed.
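The slide does not say which correlation coefficient was used. Assuming the usual Pearson coefficient, it can be computed as sketched below; the distances and estimates in the example are made-up illustrative numbers, not data from the experiment.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical reference distances and estimated shape parameters.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
estimated = [0.8, 1.1, 1.3, 1.8, 2.0, 2.1, 2.6]
r = pearson_correlation(reference, estimated)
```

A value close to 1 means the estimate grows almost linearly with the reference distance, which is how the table on this slide is read.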
  30. Conclusions. We proposed a new method for estimating the depth of a sound source in a mixed signal using the shape of the DOA distribution. The shape of the DOA distribution is modeled by the GGD. We also proposed a new feature extraction method for multichannel signals, activation-shared multichannel NMF. The experimental results indicated the efficacy of the proposed method.
  31. [Slide text not recoverable]
  32. Derivation of parameter βshape. Maximum-likelihood-based estimation of the shape parameter has no closed-form solution for the GGD. We propose a closed-form parameter estimation algorithm based on an approximation and the kurtosis. The kurtosis of the observed DOA histogram is related to the shape parameter through the moments of the GGD, which are expressed with the gamma function.
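The relation between kurtosis and shape parameter is not preserved in the slide text. For a GGD the standard result is kurtosis = Γ(5/β)Γ(1/β)/Γ(3/β)², and the sketch below inverts it numerically by bisection. This is a swapped-in technique for illustration only; it is not the authors' closed-form estimator, and the function names and search bounds are assumptions.

```python
import math

def ggd_kurtosis(beta_shape):
    """Kurtosis of a generalized Gaussian distribution with the given shape."""
    g = math.gamma
    return g(5.0 / beta_shape) * g(1.0 / beta_shape) / g(3.0 / beta_shape) ** 2

def histogram_kurtosis(centers, counts):
    """Kurtosis (fourth moment over squared variance) of a weighted histogram."""
    total = sum(counts)
    mean = sum(c * w for c, w in zip(centers, counts)) / total
    m2 = sum(w * (c - mean) ** 2 for c, w in zip(centers, counts)) / total
    m4 = sum(w * (c - mean) ** 4 for c, w in zip(centers, counts)) / total
    return m4 / m2 ** 2

def shape_from_kurtosis(kurt, lo=0.2, hi=20.0, tol=1e-9):
    """Find beta_shape whose GGD kurtosis equals kurt (kurtosis decreases with beta)."""
    if not ggd_kurtosis(hi) < kurt < ggd_kurtosis(lo):
        raise ValueError("kurtosis outside the range covered by the search bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurt:
            lo = mid  # kurtosis still too high: shape must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Kurtosis 3 corresponds to the Gaussian case, kurtosis 6 to the Laplacian case.
beta_gauss = shape_from_kurtosis(3.0)
beta_laplace = shape_from_kurtosis(6.0)
```

A spiky histogram has high kurtosis and therefore a small shape parameter, while a smooth histogram has low kurtosis and a large shape parameter, consistent with the close/far interpretation given earlier in the deck.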
  33. Derivation of parameter βshape. There is no exact closed-form solution for the inverse function of this relation. We therefore introduce the modified Stirling's formula as an approximation of the gamma function and take the logarithm.
  34. Derivation of parameter βshape. This results in a quadratic equation to be solved, from which we can derive the closed-form estimate of the shape parameter. This completes the preparation of the depth estimation method.
  35. Proposed method 2: activation-shared multichannel NMF. Preliminary experiment: example of the DOA histogram of the center cluster for a mixed source (3 instruments). Three weighted DOA histograms (direction of arrival in degrees) are compared: the unprocessed histogram; the histogram after conventional NMF applied individually to the L-ch and R-ch, in which fluctuations are generated in the DOA; and the histogram after the proposed activation-shared NMF, which extracts features while maintaining the directional information.
