
Ica2016 312 saruwatari



Invited talk at the International Congress on Acoustics (ICA2016), held on Sept. 7th, 2016.

Published in: Engineering

  1. 1. Flexible Microphone Array Based on Multichannel Nonnegative Matrix Factorization and Statistical Signal Estimation. Hiroshi Saruwatari, Kazuma Takata (The University of Tokyo, Japan), Nobutaka Ono (NII, Japan), Shoji Makino (University of Tsukuba, Japan). Acoustic Array Systems: Paper ICA2016-312
  2. 2. Outline
     • Introduction of rescue robot audition
     • Conventional approaches (ICA, IVA, Rank-1 MNMF)
     • Informed source separation and its problem
     • Ego-noise basis mismatch problem solution
     • Speech ambiguity problem solution
     • Experimental evaluation
     • Conclusion
  3. 3. Introduction: Rescue Robot Audition
     What is the hose-shaped rescue robot?
     • It aims to detect victims' speech in a disaster area.
     • Its flexible body twists and moves, driven by vibration motors.
     • It wears multiple microphones around its body.
       • Thus, the microphones' positions are always unknown.
       • Self-vibration generates harmful noise (so-called ego-noise).
     This is one instance of the distributed microphone array problem.
     [Figure: hose-shaped robot with the labels "Microphone" and "Vibrator"]
  4. 4. How to solve? Use Blind Source Separation
     • Mixing: x = As. Only the observation x is known; the mixing matrix A and the sources s are unknown. We assume that A is linear and time-invariant.
     • Separation: y = Wx, where W is the demixing matrix.
     • Conventional: ICA or independent vector analysis (IVA), which separate the sources based on their statistical independence.
     • This is a simultaneous estimation problem for W and the source statistical models (p.d.f.s).
     [Figure: sources S1 (speech) and S2 (ego-noise), mixing, observation, separation, separated speech and ego-noise]
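The mixing and demixing model on this slide can be illustrated with a minimal sketch for a single time-frequency bin. The mixing matrix `A` and the source values below are hypothetical numbers chosen for illustration, and `W` is computed as the exact inverse of `A`, which a real blind method cannot do because `A` is unknown. ICA, IVA, or Rank-1 MNMF would have to estimate `W` from the observations alone.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def inverse_2x2(M):
    """Closed-form inverse of a 2x2 (possibly complex) matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical mixing matrix for one frequency bin (two microphones, two sources).
A = [[1.0 + 0.0j, 0.6 - 0.2j],
     [0.4 + 0.3j, 1.0 + 0.0j]]

# Hypothetical source values at one time-frequency bin: [speech, ego-noise].
s = [0.8 + 0.1j, -0.3 + 0.5j]

x = matvec(A, s)      # observation: x = A s
W = inverse_2x2(A)    # oracle demixing matrix, for illustration only
y = matvec(W, x)      # separated signals: y = W x
```

With the oracle `W`, `y` reproduces `s` up to floating-point error. A blind estimate of `W` generally recovers the sources only up to scaling and ordering.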
  5. 5. Rank-1 MNMF [Kitamura, Saruwatari et al., IEEE Trans. ASLP 2016]
     • Rank-1 MNMF (Independent Low-Rank Matrix Analysis) separates the sources by estimating the demixing matrix W and a low-rank source spectrogram model via Nonnegative Matrix Factorization (NMF) [Lee, 2001].
     • W and TV are estimated simultaneously.
     • In this study, we focus our attention on…
     [Figure: demixing matrix W and low-rank source spectrograms]
  6. 6. Rank-1 MNMF (Independent Low-Rank Matrix Analysis)
     Rank-1 MNMF's cost function to be minimized consists of:
     • an independence measure between sources (for W);
     • a low-rank approximation of the sources (for T and V).
     (Note: both are based on the Itakura-Saito (IS) divergence.)
     Pros & Cons:
     • All parameters can be updated via the auxiliary-function method (an EM-like algorithm), keeping T and V nonnegative.
     • The cost function decreases monotonically over the iterations. Thus, this is a convergence-guaranteed algorithm, unlike ICA!
     • It is still affected by the initial state of the parameters: go to "Informed".
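The equation itself is missing from this transcript. For orientation, the cost function usually given for Rank-1 MNMF (ILRMA) in the cited paper has the form below. The index names are assumptions of this sketch: i is the frequency bin, j the time frame, n the source, and k the basis index; t and v are the elements of T and V.

```latex
\mathcal{J} = \sum_{i,j} \left[ \sum_{n} \left(
    \frac{|y_{ij,n}|^{2}}{\sum_{k} t_{ik,n} v_{kj,n}}
    + \log \sum_{k} t_{ik,n} v_{kj,n} \right)
    - 2 \log \left| \det W_{i} \right| \right],
\qquad y_{ij,n} = w_{i,n}^{\mathsf{H}} x_{ij}
```

Read as a function of T and V, the term in parentheses equals, up to constants, the IS divergence between the separated power spectrogram and its low-rank model. Read as a function of W, the same term together with the log-determinant term acts as the independence measure.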
  7. 7. Toward Informed Source Separation
     • Source model in Rank-1 MNMF: basis and activation.
     • Typical ego-noise basis, trained by NMF in advance.
     [Figure: basis and activation matrices of the source model]
  8. 8. Toward Informed Source Separation
     • Source model in Rank-1 MNMF: basis and activation.
     • Typical ego-noise basis, trained by NMF in advance.
     • Fixing a part of the bases, we estimate the remaining parameters and W.
     [Figure: the basis is split into a speech basis and an ego-noise basis, next to the activation; two parts are labeled "(unknown)"]
  9. 9. Toward Informed Source Separation
     • Source model in Rank-1 MNMF: basis and activation.
     • Typical ego-noise basis, trained by NMF in advance.
     • Fixing a part of the bases, we estimate the remaining parameters and W.
     • [Problem 1] Ego-noise time-variance (ego-noise mismatch problem)
     • [Problem 2] Unknown speech (speech model ambiguity problem)
     [Figure: the basis is split into a speech basis and an ego-noise basis, next to the activation; two parts are labeled "(unknown)"]
  10. 10. Statistical Signal Estimation
      • Supervised Rank-1 MNMF gives a rough separation, which is followed by a statistical postfilter [Breithaupt, 2010].
      • A chi distribution (sparse p.d.f.) is used as the target signal prior.
      • Its sparseness can be estimated from the data empirically via higher-order statistics [Murota, Saruwatari, ICASSP2014].
      • Thanks to the sparse prior, we can obtain a more accurate separation and its certainty.
      [Figure: the observed signal and the estimated ego-noise enter the postfilter with the sparse p.d.f.; the outputs are the estimated target signal and the certainty]
  11. 11. Statistical Signal Estimation
      • Certainty I = {1 if G(f,t) > 0.8; 0 otherwise}: a binary mask that extracts, from the estimated interference signal, the components that seldom overlap with the target signal.
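The thresholding rule on this slide is simple enough to sketch directly. The 0.8 threshold comes from the slide; the values of `G` and of the estimated interference spectrogram `N_hat` are hypothetical, and the sketch does not model how the postfilter actually computes G(f,t).

```python
THRESHOLD = 0.8  # value stated on the slide


def certainty_mask(G, threshold=THRESHOLD):
    """Binary certainty I(f,t): 1 where G(f,t) exceeds the threshold, else 0."""
    return [[1 if g > threshold else 0 for g in row] for row in G]


def sample_spectrogram(N, mask):
    """Keep only the time-frequency bins selected by the mask."""
    return [[n * m for n, m in zip(n_row, m_row)] for n_row, m_row in zip(N, mask)]


# Hypothetical G(f,t) values (rows: frequency bins, columns: time frames).
G = [[0.95, 0.40, 0.81],
     [0.80, 0.99, 0.10]]

# Hypothetical estimated interference (ego-noise) power spectrogram.
N_hat = [[2.0, 1.5, 0.7],
         [0.3, 4.0, 1.1]]

I = certainty_mask(G)
sampled = sample_spectrogram(N_hat, I)
```

Note that the comparison is strict, so a bin with G(f,t) exactly equal to 0.8 is not selected.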
  12. 12. Problem 1: Ego-Noise Mismatch Solution
      • We sample a convincing ego-noise spectrogram using the certainty I.
      • Next, we obtain a smoothed "time-frequency deformation function" between the sampled spectrogram and the original supervised ego-noise basis.
      • A time-invariant all-pole model is used as the deformation function.
      • This can be solved as an extended NMF optimization.
      [Equation not preserved in the transcript; its labels are: KL divergence, diagonal matrix of deformation entries, supervised ego-noise basis, ego-noise activation, order of the all-pole model]
      [Figure: power spectrum versus frequency]
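  13. 13. Problem 1: Ego-Noise Mismatch Solution
      • Denoting the KL cost function by J, its auxiliary function is given by [equation not preserved in the transcript].
      • Update of the activation: [equation not preserved in the transcript]
      • Update of the all-pole-model weight: [equation not preserved in the transcript]
      [Symbol definitions of the form "each element of ..." are not preserved in the transcript]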
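Since the slide's update rules are lost, the following is only a sketch of the general idea, not the authors' algorithm. It applies the standard multiplicative KL-NMF update for the activation (Lee and Seung), restricted to the bins selected by a binary mask, with the basis held fixed. The per-frequency gains `d` are a fixed, hypothetical stand-in for the all-pole deformation; the update of the all-pole weights is not shown. All matrices are toy values.

```python
import math

EPS = 1e-12  # guards against division by zero


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def masked_kl(X, Y, M):
    """Generalized KL divergence summed over the bins where the mask is 1."""
    total = 0.0
    for x_row, y_row, m_row in zip(X, Y, M):
        for x, y, m in zip(x_row, y_row, m_row):
            if m:
                total += x * math.log((x + EPS) / (y + EPS)) - x + y
    return total


def update_activation(X, B, H, M):
    """One multiplicative KL update of H with the basis B fixed."""
    Y = matmul(B, H)
    n_freq, n_basis, n_frames = len(B), len(H), len(H[0])
    new_H = []
    for k in range(n_basis):
        row = []
        for t in range(n_frames):
            num = sum(B[f][k] * M[f][t] * X[f][t] / (Y[f][t] + EPS) for f in range(n_freq))
            den = sum(B[f][k] * M[f][t] for f in range(n_freq))
            row.append(H[k][t] * num / (den + EPS))
        new_H.append(row)
    return new_H


F = [[1.0, 0.2], [0.5, 0.5], [0.1, 1.0]]   # hypothetical supervised ego-noise basis
d = [1.5, 1.0, 0.5]                        # hypothetical fixed deformation gains
B = [[d[f] * F[f][k] for k in range(2)] for f in range(3)]  # deformed basis diag(d) F

H_true = [[1.0, 0.2, 0.5, 2.0], [0.3, 1.5, 0.5, 0.1]]
X = matmul(B, H_true)                      # toy ego-noise spectrogram
M = [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]  # toy certainty mask

H = [[1.0] * 4 for _ in range(2)]
costs = [masked_kl(X, matmul(B, H), M)]
for _ in range(30):
    H = update_activation(X, B, H, M)
    costs.append(masked_kl(X, matmul(B, H), M))
```

The recorded `costs` should never increase, which is the property the auxiliary-function approach is designed to guarantee.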
  14. 14. Problem 2: Speech Model Ambiguity Solution
      • The statistical postfilter's output is a sparse estimate of S.
      • We can re-estimate a sparse-aware speech basis using [symbol not preserved in the transcript].
      • We use it as the initial value of the speech basis in Rank-1 MNMF.
      [Figure: the output of Rank-1 MNMF and the sparse speech spectrogram (time versus frequency) lead, via a sparse low-rank approximation under the IS divergence, to a sparse low-rank speech spectrogram with speech basis and speech activation]
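As a rough illustration of re-estimating a speech basis from a sparse spectrogram, the sketch below runs a plain IS-divergence NMF with the square-root multiplicative updates that follow from the auxiliary-function (majorization-minimization) approach. This is a generic IS-NMF, not the authors' exact procedure. The toy spectrogram, the `FLOOR` value used to avoid zeros, and the function names are all assumptions of this sketch.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_divergence(X, Y):
    """Itakura-Saito divergence between two strictly positive matrices."""
    return sum(x / y - math.log(x / y) - 1.0
               for x_row, y_row in zip(X, Y) for x, y in zip(x_row, y_row))


def is_nmf(X, n_basis, n_iter=50, seed=0):
    """IS-NMF X ~ T V with square-root multiplicative (MM) updates."""
    rng = random.Random(seed)
    n_freq, n_frames = len(X), len(X[0])
    T = [[rng.uniform(0.5, 1.5) for _ in range(n_basis)] for _ in range(n_freq)]
    V = [[rng.uniform(0.5, 1.5) for _ in range(n_frames)] for _ in range(n_basis)]
    costs = [is_divergence(X, matmul(T, V))]
    for _ in range(n_iter):
        Y = matmul(T, V)  # model with the current T and V
        for f in range(n_freq):
            for k in range(n_basis):
                num = sum(X[f][t] * V[k][t] / Y[f][t] ** 2 for t in range(n_frames))
                den = sum(V[k][t] / Y[f][t] for t in range(n_frames))
                T[f][k] *= math.sqrt(num / den)
        Y = matmul(T, V)  # refresh the model after the basis update
        for k in range(n_basis):
            for t in range(n_frames):
                num = sum(X[f][t] * T[f][k] / Y[f][t] ** 2 for f in range(n_freq))
                den = sum(T[f][k] / Y[f][t] for f in range(n_freq))
                V[k][t] *= math.sqrt(num / den)
        costs.append(is_divergence(X, matmul(T, V)))
    return T, V, costs


FLOOR = 1e-3  # hypothetical floor; the IS divergence is undefined at zero
S_sparse = [[4.0, 0.0, 0.0, 3.0],
            [2.0, 0.0, 0.0, 1.5],
            [0.0, 1.0, 2.0, 0.0]]  # toy sparse postfilter output (power)
X = [[max(v, FLOOR) for v in row] for row in S_sparse]

T_init, V_init, costs = is_nmf(X, n_basis=2)
```

In the scheme described on the slide, a basis obtained this way would serve as the initial value of the speech basis in Rank-1 MNMF, which addresses the sensitivity to initialization noted earlier.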
  15. 15. Simulation Experiment: Experimental Conditions
      • # of mic.: 8-channel microphones on a 3-m-long hose-shaped robot
      • Speech: male & female speech with real-recorded impulse responses
      • Ego-noise: real-recorded in a moving hose-shaped robot (2 patterns)
      • Training: matched with the mixed ego-noise (2 patterns) & mismatched (3 patterns)
      • Evaluation: SDR improvement (both SNR and distortion are considered); a higher SDR indicates better separation
      • Input SDR: 0 dB, -5 dB, -10 dB
      • Comparison: IVA, PSNMF (single-channel supervised NMF), Rank-1 MNMF (no supervision)
      [Equation not preserved in the transcript; its labels are: estimated signal, true target, interference, artificial distortion]
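The SDR equation is missing from the transcript, but its labels suggest the usual decomposition of the estimated signal into a true target, an interference term, and an artificial distortion term. The sketch below assumes that decomposition is already available and computes the ratio of target energy to the energy of the two error terms, in dB. The signals and the -5 dB input SDR are toy values, and the function names are hypothetical.

```python
import math


def energy(signal):
    return sum(v * v for v in signal)


def sdr_db(target, interference, artifact):
    """SDR in dB: target energy over the energy of interference plus artifacts."""
    error = [i + a for i, a in zip(interference, artifact)]
    return 10.0 * math.log10(energy(target) / energy(error))


def sdr_improvement_db(sdr_out, sdr_in):
    """SDR improvement: output SDR minus input SDR, both in dB."""
    return sdr_out - sdr_in


# Toy decomposition of an estimated signal (hypothetical values).
target = [1.0, -1.0, 1.0, -1.0]
interference = [0.1, 0.1, -0.1, -0.1]
artifact = [0.0, 0.0, 0.0, 0.0]

output_sdr = sdr_db(target, interference, artifact)    # approximately 20 dB
improvement = sdr_improvement_db(output_sdr, -5.0)     # relative to a -5 dB input
```

Because both interference and artifacts sit in the denominator, a method that suppresses noise by heavily distorting the speech does not score well, which is why the slide notes that SDR accounts for both SNR and distortion.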
  16. 16. Example of Typical SDR Improvement
      • SDR increases through each processing step.
      • The combination of the processing stages is effective.
      [Figure: bar chart of SDR improvement [dB] for Step (1) to Step (4). Labels: supervised Rank-1 MNMF, statistical postfilter; before basis deformation and initialization, basis deformation and initialization, after basis deformation and initialization]
  17. 17. Comparison with Competitors
      • The proposed method outperforms the conventional methods in both the matched and the mismatched cases, although the mismatched case is inferior to the matched case.
      [Figure: comparison chart with groups labeled "Conventional" and "Proposed"]
  18. 18. Conclusion
      • We proposed a new informed source separation method for the flexible microphone array system, based on supervised Rank-1 MNMF and statistical speech enhancement.
      • To reduce the mismatch problem, we proposed an algorithm in which an all-pole model is estimated to deform the bases, using the reliable spectral components sampled by the statistical signal enhancement method.
      • Experiments with actual sounds recorded in the rescue robot showed that the proposed method outperforms the conventional methods.
      Thank you for your attention!