Acoustic modeling in audio source separation (original title: 音源分離における音響モデリング), Daichi Kitamura
Daichi Kitamura, "Acoustic modeling in audio source separation," The Acoustical Society of Japan, Summer Seminar Invited Talk, September 11th, 2017.
Presentation slide for AI seminar at Artificial Intelligence Research Center, The National Institute of Advanced Industrial Science and Technology, Japan.
URL (in Japanese): https://www.airc.aist.go.jp/seminar_detail/seminar_046.html
History of independence-based blind source sep... Daichi Kitamura
The University of Tokyo, Department of Information Physics and Computing, Seminar
Monday, February 27, 2017, 15:00-16:30
Daichi Kitamura, "History of independence-based blind source separation and independent low-rank matrix analysis," The University of Tokyo, Department of Information Physics and Computing, Seminar, 27th Feb., 2017.
Effective Optimization Algorithms for Blind and Supervised Music Source Separation with Nonnegative Matrix Factorization
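Third-round screening for the Nagakura Research Incentive Award: a 20-minute research overview presentation.
The content corresponds to part of the author's own dissertation.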
This document summarizes a research talk on statistical-model-based speech enhancement techniques that aim to reduce noise without generating musical-noise artifacts. The talk outlines conventional enhancement methods, such as spectral subtraction and Wiener filtering, that often cause musical noise. It then proposes a biased minimum mean-square error estimator that reaches a musical-noise-free state by introducing a bias parameter. Analysis and experiments show that this method can reduce noise while keeping the kurtosis ratio fixed at 1.0, which prevents musical noise, and that it outperforms other techniques in speech quality. A strong speech prior model is found to make the musical-noise-free state harder to reach, so the prior must be selected carefully.
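The kurtosis ratio mentioned in this summary can be illustrated with a small sketch. The code below is not the talk's estimator: it applies plain power spectral subtraction to white Gaussian noise and compares the kurtosis of the power spectrogram before and after processing. The kurtosis definition (fourth raw moment over squared second raw moment), the framing parameters, and the names `power_spectrogram`, `spectral_subtraction`, and `beta` are all assumptions made for illustration.

```python
import numpy as np

def power_spectrogram(x, frame_len=512, hop=256):
    """Power spectrogram |STFT|^2 with a Hann window, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def kurtosis(p):
    """Assumed definition: 4th raw moment divided by the squared 2nd raw moment."""
    p = np.ravel(p)
    return np.mean(p ** 4) / np.mean(p ** 2) ** 2

def spectral_subtraction(p, noise_psd, beta=2.0):
    """Power spectral subtraction with oversubtraction factor beta, floored at zero."""
    return np.maximum(p - beta * noise_psd, 0.0)

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000 * 4)          # 4 s of white noise at 16 kHz
P = power_spectrogram(noise)
noise_psd = P.mean(axis=0, keepdims=True)       # per-bin average noise power
P_ss = spectral_subtraction(P, noise_psd)

# A ratio well above 1.0 means the processed noise has become more "spiky",
# which is the statistical signature associated with musical noise.
kurtosis_ratio = kurtosis(P_ss) / kurtosis(P)
print(f"kurtosis ratio after spectral subtraction: {kurtosis_ratio:.2f}")
```

Under these assumptions, an enhancement method that is musical-noise-free in the sense of the summary would keep this ratio at 1.0, whereas the spectral subtraction above pushes it well above 1.0.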
GAN-based statistical speech synthesis (in Japanese), Yuki Saito
Guest presentation at "Applied Gaussian Process and Machine Learning," Graduate School of Information Science and Technology, The University of Tokyo, Japan, 2021.
The document proposes an improved method for audio signal separation using supervised nonnegative matrix factorization (NMF) with time-variant basis deformation. The key contributions are:
1. Classifying supervised bases into time-variant attack and sustain parts and applying different all-pole model-based deformations to each.
2. Introducing discriminative training to avoid overfitting the interference signal and better separate the target.
3. Presenting an iterative approximate algorithm that searches for deformation matrices representing the target signal while remaining constrained to fit the mixture signal.
4. Showing, in experiments on instrument mixtures, a better signal-to-distortion ratio than previous supervised NMF techniques.
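For readers unfamiliar with the baseline that these contributions extend, the following is a minimal sketch of basic supervised NMF, without the basis deformation or discriminative training described above. It assumes the generalized Kullback-Leibler divergence with standard multiplicative updates; the pretrained target bases `F` stay fixed, while the activations `G` and the interference factors `H` and `U` are fitted to the mixture `Y`. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence between nonnegative matrices Y and X."""
    return np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X)

def supervised_nmf(Y, F, n_other=4, n_iter=200, eps=1e-12, seed=0):
    """Fit Y ~ F @ G + H @ U with the supervised bases F held fixed."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps   # target activations
    H = rng.random((n_bins, n_other)) + eps        # interference bases
    U = rng.random((n_other, n_frames)) + eps      # interference activations
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        R = Y / (F @ G + H @ U + eps)
        G *= (F.T @ R) / (F.T @ ones + eps)
        R = Y / (F @ G + H @ U + eps)
        H *= (R @ U.T) / (ones @ U.T + eps)
        R = Y / (F @ G + H @ U + eps)
        U *= (H.T @ R) / (H.T @ ones + eps)
    return G, H, U

# Synthetic mixture: a target built from known bases plus an unknown interference.
rng = np.random.default_rng(1)
F = rng.random((20, 3))
Y = F @ rng.random((3, 50)) + rng.random((20, 2)) @ rng.random((2, 50))

G, H, U = supervised_nmf(Y, F)
model = F @ G + H @ U
target_est = Y * (F @ G) / (model + 1e-12)         # soft-mask estimate of the target
print("KL divergence:", kl_divergence(Y, model))
```

In this plain form the bases in `F` cannot adapt to a mismatch between training and test sounds, which is the limitation that the deformation-based method summarized above is meant to address.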
The document describes a proposed hybrid method for multichannel signal separation using supervised nonnegative matrix factorization (SNMF). The method combines directional clustering for spatial separation with SNMF incorporating spectrogram restoration for spectral separation. Experiments show the hybrid method achieves better separation performance than conventional single-channel SNMF or multichannel NMF methods, as measured by signal-to-distortion ratio. The optimal divergence for the SNMF component involves a tradeoff between separation ability and ability to restore missing spectral components.
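The signal-to-distortion ratio used as the evaluation measure here can be sketched in its simplest form, as the energy of the reference signal relative to the energy of the estimation error, in decibels. This is a simplified assumption: published evaluations typically use a toolkit definition that first projects the estimate onto allowed distortions of the reference, which this sketch omits. The name `sdr_db` is hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over error energy."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = estimate - reference
    return 10.0 * np.log10((np.sum(reference ** 2) + eps)
                           / (np.sum(distortion ** 2) + eps))

t = np.arange(16000) / 16000.0
s = np.sin(2 * np.pi * 440.0 * t)   # reference: 1 s of a 440 Hz tone
est = 1.1 * s                       # estimate with a 10% amplitude error
print(f"SDR: {sdr_db(s, est):.1f} dB")
```

A 10% amplitude error leaves an error energy of 1% of the reference energy, so this example evaluates to about 20 dB; higher values mean less distortion.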
This document proposes a flexible microphone array system using informed source separation methods for a rescue robot. It aims to detect victim speech in disaster areas using multiple microphones on the robot's flexible body. The proposed method uses supervised rank-1 nonnegative matrix factorization (NMF) and statistical signal estimation to address two key problems: ego-noise basis mismatch due to the robot's self-vibrations, and speech model ambiguity. Experiments show the proposed approach outperforms conventional independent vector analysis and single-channel NMF, improving speech detection even with mismatched ego-noise recordings.
Shoichi Koyama, Naoki Murata, and Hiroshi Saruwatari, "Super-resolution in sound field recording and reproduction based on sparse representation,"
presented at the 5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan (28 Nov. - 2 Dec. 2016, Honolulu, USA)
Shoichi Koyama, "Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry"
Presented at the 2016 AES International Conference on Sound Field Control (July 18-20, 2016, Guildford, UK)
The document describes a real-time DNN voice conversion system with feedback to acquire character traits. It proposes a method to provide real-time feedback of the converted voice to the speaker to encourage speech modification (prosody and emphasis) towards the target speaker's character. Subjective evaluations from the first-person (user) perspective and third-person perspective found that the system improved the reproduction of the target speaker's character, especially for inexperienced users. Providing only pitch feedback was already quite effective.
9. /13
Examples of recorded speech and HMM training
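Examples of recorded speech
HMM training (correction)
– Using speech that contains reading errors degrades the quality of speech synthesis
– → Of the recorded speech, use only utterances with a relatively high HMM likelihood
Utterance text | Speaker 1 | Speaker 2 | Speaker 3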
There is no mine and there are no miners.
Do you often take them for a walk?
That’s interesting.
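The selection rule on this slide (keep only utterances whose HMM likelihood is relatively high) can be sketched as a simple threshold on per-utterance scores. The slide does not say how "relatively high" is decided, so the rule below, which keeps utterances scoring no more than `k` standard deviations below the mean, is an assumption, as are the function name, the utterance IDs, and the score values.

```python
from statistics import mean, pstdev

def select_utterances(loglik_per_frame, k=1.0):
    """Return IDs whose score is at least (mean - k * std) over all utterances."""
    scores = list(loglik_per_frame.values())
    threshold = mean(scores) - k * pstdev(scores)
    return [utt for utt, score in loglik_per_frame.items() if score >= threshold]

# Hypothetical average per-frame HMM log-likelihoods for five recordings.
scores = {
    "utt001": -61.2,
    "utt002": -60.8,
    "utt003": -75.4,   # much lower score, e.g. a recording with a reading error
    "utt004": -62.0,
    "utt005": -61.5,
}
selected = select_utterances(scores)
print(selected)
```

With these made-up scores, only `utt003` falls below the threshold and is excluded from training; a relative threshold of this kind adapts to each speaker's overall score level instead of relying on a fixed cutoff.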