Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization

Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, "Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.437-440, Hawaii, USA, March 2014 (Student Paper Award).


  1. Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization. Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
  2. Outline: Background and related study; Problem and purpose; Proposed method 1 (depth estimation based on DOA distribution); Proposed method 2 (activation-shared nonnegative matrix factorization); Experiments; Conclusions.
  3. Background. With the advent of 3D TV, the reproduction of 3D images has been realized. Problem: the viewer feels uncomfortable because of the mismatch between the picture image and the sound image. To solve this problem, sound field reproduction techniques, which can present the "direction" and "depth" of sound images to the listener, have been studied actively. However, a 3D sound reproduction system has not been established yet.
  4. Related study: wave field synthesis (WFS) [A. J. Berkhout, et al., 1993]. WFS is a sound field reproduction technique that can represent the "depth" of sound images: it allows us to create sound images in front of the loudspeakers. Drawback of WFS: it requires the primary source information of the sound images, namely (1) the individual sound sources and (2) their localization information. This information has been lost in existing contents through down-mixing, so an up-mixing method is required, consisting of (1) source separation (mixed signal → individual sources) and (2) localization estimation of the sound images.
  5. Flow of the proposed up-mixer (spatial sound system using existing contents): stereo contents (mixed multichannel signal) → sound source separation → directional estimation and depth estimation → wave field synthesis → spatial sound reproduction. Sound source separation and directional estimation are available as conventional methods. Depth estimation of sound images has not been proposed, so a new depth estimation method is the subject of this study.
  6. Related study: directional clustering [Araki, et al., 2007]. The mixed stereo signal (L-ch and R-ch input signals) is Fourier transformed, the source components are normalized, and the normalized components are clustered around spatial representative vectors. The components of each cluster are then inverse Fourier transformed to obtain the individual sources of each cluster.
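The clustering step can be illustrated with a minimal sketch. It is not the implementation of [Araki, et al., 2007]: it assumes magnitude-only features, a plain k-means with cosine similarity, and binary masks, and the function name `directional_clustering` and its parameters are hypothetical.

```python
import numpy as np

def directional_clustering(L, R, n_clusters=3, n_iter=20):
    """Cluster time-frequency bins of a stereo spectrogram by direction.

    L, R: STFT matrices (frequency x time) of the left and right channels.
    Returns the masked (L, R) pair of each cluster, the label map, and the
    spatial representative vectors (centroids).
    """
    # Normalization: unit-norm magnitude vector for every time-frequency bin.
    X = np.stack([np.abs(L).ravel(), np.abs(R).ravel()], axis=1)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)

    # Assumption: initial centroids are spread between "all left" (1, 0)
    # and "all right" (0, 1), so cluster 0 is leftmost.
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    centroids = np.stack([np.cos(angles), np.sin(angles)], axis=1)

    for _ in range(n_iter):
        # Assign each bin to the centroid with the highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):
                c = members.sum(axis=0)
                centroids[k] = c / (np.linalg.norm(c) + 1e-12)

    labels = np.argmax(X @ centroids.T, axis=1).reshape(L.shape)
    # Binary masking: each cluster keeps only its own bins.
    sources = [(L * (labels == k), R * (labels == k)) for k in range(n_clusters)]
    return sources, labels, centroids
```

Applying an inverse STFT to each masked pair would give the individual sources of each cluster.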
  8. Problem and purpose. Problem: WFS requires specific localization information of the individual sound sources to reproduce a sound field. For the up-mixer, directional estimation methods have already been developed (directional estimation based on VBAP [Hirata, et al., 2011]), but how can we get depth information? Purpose: establishing a new depth estimation method. Proposed method: a depth estimation method using the direction of arrival (DOA) distribution.
  10. Proposed method 1: depth estimation based on DOA. DOA is the "direction of arrival" of sound waves, and we estimate the depth using the DOA distribution. The mixed signal is separated into individual sources by directional clustering, and a weighted DOA histogram is computed for each source (horizontal axis: direction of arrival, from left through center to right; vertical axis: frequency of source components). The directional information (DOA) is obtained from the amplitude ratio between the channels, and the weighting term is the magnitude of each vector.
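A weighted DOA histogram of this kind could be computed as in the following sketch. The mapping from amplitude ratio to an angle in degrees (arctangent of the ratio, shifted so that the center is 0) is an assumption made for illustration, as are the function name `weighted_doa_histogram` and the bin count. The slides do not give the exact mapping.

```python
import numpy as np

def weighted_doa_histogram(L, R, n_bins=90):
    """Weighted DOA histogram of one clustered source.

    L, R: STFT matrices of the left and right channels.
    Returns the histogram values and the bin centers in degrees.
    """
    aL, aR = np.abs(L).ravel(), np.abs(R).ravel()
    # Directional information: angle of the (|L|, |R|) vector.
    # Assumed convention: -45 deg = left only, 0 = center, +45 deg = right only.
    doa = np.degrees(np.arctan2(aR, aL)) - 45.0
    # Weighting term: magnitude of each (|L|, |R|) vector.
    weight = np.hypot(aL, aR)
    hist, edges = np.histogram(doa, bins=n_bins, range=(-45.0, 45.0),
                               weights=weight)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return hist, centers
```

Weighting by magnitude keeps low-energy bins, whose amplitude ratio is unreliable, from dominating the histogram.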
  11. Proposed method 1: depth estimation based on DOA. The shape of the DOA histogram differs according to the source distance. In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion. Close source: the observed DOA histogram has a spiky shape. Far source: the observed DOA histogram has a smooth shape. The observed DOA distribution of the target source can therefore be used as a cue for depth estimation.
  12. Proposed method 1: modeling of DOA distribution. To model the DOA distribution, we propose a new modeling method using the generalized Gaussian distribution (GGD) [Box, et al., 1973], a flexible family of probability density functions (PDFs). The shape of the GGD changes depending on the shape parameter βshape: βshape = 1 gives the Laplacian PDF and βshape = 2 gives the Gaussian PDF.
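The definition on the slide did not survive extraction. For reference, a commonly used parameterization of the GGD is given below; the location symbol $\mu$ and scale symbol $\alpha$ are assumptions and may differ from the slide's notation.

```latex
p(x;\, \mu, \alpha, \beta_{\mathrm{shape}})
  = \frac{\beta_{\mathrm{shape}}}{2\alpha\,\Gamma(1/\beta_{\mathrm{shape}})}
    \exp\!\left[-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta_{\mathrm{shape}}}\right],
\qquad
\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt .
```

With $\beta_{\mathrm{shape}} = 1$ this reduces to the Laplacian PDF, and with $\beta_{\mathrm{shape}} = 2$ to the Gaussian PDF with variance $\alpha^2/2$.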
  13. Proposed method 1: modeling of the DOA distribution based on the GGD parameter. Source is close ⇔ βshape is small. Source is far ⇔ βshape is large. We propose a new depth estimation method based on the GGD, in which the shape parameter βshape is used as the metric.
  14. Proposed method 2: problem in proposed method 1. Background noise and artificial distortion generated by signal processing interfere with the DOA histogram. Problem of signal processing (normalization problem): when the binaurally recorded L-ch and R-ch input signals are normalized, small noise components are enhanced. To suppress this noise, activation-shared multichannel NMF is introduced as a feature extraction step.
  16. Proposed method 2: activation-shared multichannel NMF. Nonnegative matrix factorization (NMF) [Lee, et al., 2001] approximates the observed matrix (spectrogram, Ω × T) by the product of a basis matrix (spectral patterns, Ω × K) and an activation matrix (time-varying gains, K × T), where Ω is the number of frequency bins, T is the number of time frames, and K is the number of bases. NMF is a sparse representation and can extract significant features from the observed matrix. The sparse representation provides high performance for noise reduction, compression, and feature extraction. We use it to eliminate background noise and artificial distortion.
  17. Proposed method 2: problem of conventional NMF. When conventional NMFs are applied to the L-ch and R-ch in parallel, the bases of each channel are trained without any correlation between channels. This generates an artificial fluctuation in the amplitude ratio between the channels, so the directional (DOA) information is disturbed.
  18. Proposed method 2: activation-shared multichannel NMF. In the proposed method, the activation matrix is shared through all channels (L-ch NMF and R-ch NMF). The cost function is defined with the β-divergence between the entries of the observed matrix and of the model matrix of each channel. This reduces the dimensionality of the input signal while maintaining directional information.
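The idea of sharing one activation matrix across channels can be sketched as follows. This is a minimal illustration, not the authors' update rules: it assumes the cost is the sum over channels of the β-divergence, uses the standard multiplicative updates, and is restricted to 1 ≤ β ≤ 2 (generalized KL to Euclidean), where no extra exponent is needed. The function name `activation_shared_nmf` and all parameter names are hypothetical.

```python
import numpy as np

def activation_shared_nmf(Ys, n_bases, beta=1.0, n_iter=200, seed=0):
    """Factorize each channel Y_c ~= W_c @ H with a single shared H.

    Ys: list of nonnegative (n_freq x n_frames) matrices, one per channel.
    Returns the list of per-channel basis matrices and the shared activation.
    """
    if not 1.0 <= beta <= 2.0:
        raise ValueError("this sketch only covers 1 <= beta <= 2")
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = Ys[0].shape
    Ws = [rng.random((n_freq, n_bases)) + eps for _ in Ys]
    H = rng.random((n_bases, n_frames)) + eps

    for _ in range(n_iter):
        # Basis update: each channel has its own spectral patterns.
        for c, Y in enumerate(Ys):
            V = Ws[c] @ H + eps
            Ws[c] *= ((Y * V ** (beta - 2)) @ H.T) / ((V ** (beta - 1)) @ H.T + eps)
        # Activation update: numerator and denominator are accumulated
        # over all channels, which is what ties the channels together.
        num = np.zeros_like(H)
        den = np.zeros_like(H)
        for c, Y in enumerate(Ys):
            V = Ws[c] @ H + eps
            num += Ws[c].T @ (Y * V ** (beta - 2))
            den += Ws[c].T @ (V ** (beta - 1))
        H *= num / (den + eps)
    return Ws, H
```

Under these assumptions the procedure is equivalent to a single-channel NMF of the channel spectrograms stacked along the frequency axis. Because every channel is reconstructed with the same time-varying gains, the amplitude ratio between channels is determined by the bases alone and does not fluctuate from frame to frame.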
  19. Proposed method 2: activation-shared multichannel NMF. β-divergence [Eguchi, et al., 2001]: a generalized divergence whose form depends on the variable β. It includes the Euclidean distance, the generalized Kullback-Leibler divergence, and the Itakura-Saito divergence as special cases.
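The formulas on this slide were lost in extraction. The form of the β-divergence commonly used in the NMF literature is reproduced below for reference; the symbols $y$ (observed entry) and $x$ (model entry) are assumptions, and the slide may use a different but equivalent convention.

```latex
D_{\beta}(y \,|\, x) =
\begin{cases}
  \dfrac{y^{\beta}}{\beta(\beta-1)} + \dfrac{x^{\beta}}{\beta}
    - \dfrac{y\,x^{\beta-1}}{\beta-1}, & \beta \in \mathbb{R} \setminus \{0, 1\}, \\[2ex]
  y \log\dfrac{y}{x} - y + x, & \beta = 1 \quad \text{(generalized Kullback-Leibler divergence)}, \\[2ex]
  \dfrac{y}{x} - \log\dfrac{y}{x} - 1, & \beta = 0 \quad \text{(Itakura-Saito divergence)}.
\end{cases}
```

In this convention $\beta = 2$ gives $\tfrac{1}{2}(y - x)^2$, the squared Euclidean distance up to a constant factor.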
  20. Proposed method 2: activation-shared multichannel NMF. Derivation of optimal variables using the β-divergence. The auxiliary function method is an optimization scheme that uses an upper bound function: (1) design an auxiliary function that is an upper bound of the cost function; (2) minimize the original cost function indirectly by minimizing the auxiliary function.
  21. Proposed method 2: activation-shared multichannel NMF. Cost function: the first and second terms of the cost function become convex or concave functions depending on the value of β. [Table: convexity or concavity of each term for each range of β]
  22. Proposed method 2: activation-shared multichannel NMF. Cost function: the upper bound function of each term is defined by applying Jensen's inequality to the convex terms and the tangent line inequality to the concave terms.
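For reference, the two inequalities named above can be written as follows. The symbols $f$, $g$, $\lambda_k$, and $z$ are generic and are not taken from the slide.

```latex
% Jensen's inequality for a convex function f,
% with weights \lambda_k \ge 0 and \sum_k \lambda_k = 1:
f\!\left(\sum_k \lambda_k x_k\right) \le \sum_k \lambda_k f(x_k)

% Tangent line inequality for a differentiable concave function g,
% with equality at x = z:
g(x) \le g(z) + g'(z)\,(x - z)
```

In both cases the right-hand side is an upper bound that touches the original function for a suitable choice of the auxiliary variables ($\lambda_k$ or $z$), which is the property the auxiliary function method relies on.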
  23. Proposed method 2: activation-shared multichannel NMF. Update rules: the update rules for optimization are obtained from the derivative of the auxiliary function with respect to each objective variable, that is, each entry of the matrices.
  24. Flow of proposed depth estimation method: input stereo signal (L-ch, R-ch) → STFT → clustering into cluster L, cluster C, and cluster R → activation-shared NMF for each cluster → weighted DOA histogram estimation → depth estimation for each cluster. We can estimate depth information by calculating the shape parameter of the DOA histogram.
  26. Experimental conditions. The mixed stereo signals consist of 3 instruments. The target source is located at the center at 7 distances. There are 6 combination patterns of instruments and directions. [Table: mixing source parameters (test sources 1 to 3, reverberation time, NMF β, number of NMF bases)] Compared methods: conventional method 1 uses the weighted DOA histogram without NMF processing; conventional method 2 is processed by conventional NMF; the proposed method is processed by the proposed NMF.
  27. Experimental conditions: image method [Allen, et al., 1979]. Reference sound sources were generated using the image method, a technique for simulating room impulse responses. The volume of the room, the source location, the microphone location, and the absorption coefficient can be set arbitrarily. [Figures: geometry of the image method (real source and image source); example of a room impulse response (amplitude versus time index)]
  28. Experimental results (results 1). Data set 1: target source, vocal; interference source (left), piano; interference source (right), guitar. The results of the conventional methods do not agree with the oracle (image method). The proposed method correctly estimates the distance of the target source.
  29. Experimental results (results 2): correlation coefficient between the reference value and the estimated value.
      Table: correlation coefficient of each method
      Data set                      1       2       3       4       5       6
      Target source                 Vocal   Vocal   Guitar  Guitar  Piano   Piano
      Interference source (left)    Piano   Guitar  Piano   Vocal   Vocal   Guitar
      Interference source (right)   Guitar  Piano   Vocal   Piano   Guitar  Vocal
      Conventional method 1         0.350   0.532   0.154   0.277   0.602   0.496
      Conventional method 2         0.189   0.165   0.044   -0.037  0.426   0.157
      Proposed method               0.986   0.925   0.777   0.651   0.791   0.856
      A strong relation between the estimated value of the proposed method and the distance of the target source is indicated. The efficacy of the proposed method is confirmed.
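A correlation coefficient of this kind can be computed as in the sketch below. It assumes the Pearson correlation coefficient, which the slides do not state explicitly, and the function name `correlation_coefficient` is hypothetical.

```python
import numpy as np

def correlation_coefficient(reference, estimated):
    """Pearson correlation between reference distances and estimated values."""
    reference = np.asarray(reference, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    # Remove the means, then normalize the inner product of the deviations.
    r = reference - reference.mean()
    e = estimated - estimated.mean()
    return float((r @ e) / np.sqrt((r @ r) * (e @ e)))
```

In this setting `reference` would hold the 7 target source distances and `estimated` the corresponding shape parameter estimates for one data set. A value near 1 means the estimate grows consistently with distance.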
  30. Conclusions. We proposed a new depth estimation method for sound sources in a mixed signal using the shape of the DOA distribution. The shape of the DOA distribution is modeled by the GGD. We also proposed a new feature extraction method for multichannel signals, activation-shared multichannel NMF. The experimental results indicated the efficacy of the proposed method.
  32. Derivation of parameter βshape. The maximum-likelihood-based shape parameter estimation has no closed-form solution for the GGD. We therefore propose a closed-form parameter estimation algorithm based on an approximation and the kurtosis of the observed DOA histogram. The relation between the kurtosis and the shape parameter follows from the moments of the GGD, which are expressed with the gamma function.
  33. Derivation of parameter βshape. There is no exact closed-form solution for the inverse function of this relation. We therefore introduce the modified Stirling's formula as an approximation of the gamma function and take the logarithm.
  34. Derivation of parameter βshape. This results in a quadratic equation to be solved, from which we can derive the closed-form estimate of the shape parameter. The preparation of the depth estimation method is thus completed.
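The relation between kurtosis and shape parameter can be illustrated numerically. The sketch below does not reproduce the authors' closed-form estimator, whose equations were lost in extraction. Instead it inverts the standard GGD kurtosis relation, Γ(5/β)Γ(1/β)/Γ(3/β)², by bisection. All function names and the search interval are hypothetical.

```python
import math
import numpy as np

def ggd_kurtosis(beta):
    """Kurtosis of a GGD with shape parameter beta (3 for beta = 2)."""
    return math.exp(math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta)
                    - 2.0 * math.lgamma(3.0 / beta))

def histogram_kurtosis(centers, hist):
    """Kurtosis of a (weighted) histogram given bin centers and bin values."""
    x = np.asarray(centers, dtype=float)
    w = np.asarray(hist, dtype=float)
    w = w / w.sum()
    mu = np.sum(w * x)
    m2 = np.sum(w * (x - mu) ** 2)
    m4 = np.sum(w * (x - mu) ** 4)
    return float(m4 / m2 ** 2)

def estimate_shape(kurtosis, lo=0.2, hi=20.0, n_iter=60):
    """Find beta such that ggd_kurtosis(beta) equals the given kurtosis.

    The GGD kurtosis decreases monotonically with beta, so bisection applies.
    """
    if not ggd_kurtosis(hi) < kurtosis < ggd_kurtosis(lo):
        raise ValueError("kurtosis outside the range covered by [lo, hi]")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurtosis:
            lo = mid  # kurtosis still too large: beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A spiky histogram has a large kurtosis and therefore a small estimated shape parameter, and a smooth histogram the opposite, which matches the close and far cases described for proposed method 1.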
  35. Proposed method 2: activation-shared multichannel NMF. Preliminary experiment: example of DOA histograms. [Figure: weighted DOA histograms of the center cluster for a mixed source (3 instruments), plotted against direction of arrival in degrees, for the unprocessed weighted DOA histogram, the conventional NMF (applied individually to L-ch and R-ch), and the proposed activation-shared NMF] With the conventional NMF, fluctuations are generated in the DOA. The proposed NMF performs feature extraction while maintaining directional information.
