Generative model in nonnegative matrix factorization and its application to multichannel sound source separation, Daichi Kitamura
Daichi Kitamura, "Generative model in nonnegative matrix factorization and its application to multichannel sound source separation," Invited Talk, Yukawa Laboratory, Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, Kanagawa, November 2015.
This document discusses the connections between generative adversarial networks (GANs) and energy-based models (EBMs). It shows that GAN training can be interpreted as approximate maximum-likelihood training of an EBM, in which samples from the intractable model distribution are replaced with samples from a generator. Specifically:
1. GANs train a discriminator to estimate the energy function of an EBM, while the generator minimizes the energy of its samples.
2. EBM training can be seen as alternately updating the generator and sampling from it, in a manner similar to contrastive divergence for EBMs.
3. This perspective unifies GANs and EBMs and suggests ways to combine their training procedures to leverage their respective advantages.
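The maximum-likelihood view above rests on the standard gradient of an EBM's log-likelihood. A worked form, assuming the usual Gibbs density with energy $E_\theta$ and a generator $G_\phi$ driven by noise $z \sim p(z)$ (this notation is chosen here for illustration, not taken from the summarized document):

```latex
\begin{aligned}
p_\theta(x) &= \frac{\exp(-E_\theta(x))}{Z_\theta},
  \qquad Z_\theta = \int \exp(-E_\theta(x))\,dx \\
\nabla_\theta\, \mathbb{E}_{x\sim p_{\mathrm{data}}}[\log p_\theta(x)]
  &= -\mathbb{E}_{x\sim p_{\mathrm{data}}}[\nabla_\theta E_\theta(x)]
     + \mathbb{E}_{x\sim p_\theta}[\nabla_\theta E_\theta(x)] \\
  &\approx -\mathbb{E}_{x\sim p_{\mathrm{data}}}[\nabla_\theta E_\theta(x)]
     + \mathbb{E}_{z\sim p(z)}[\nabla_\theta E_\theta(G_\phi(z))] \\
\phi^{\ast} &= \arg\min_\phi\, \mathbb{E}_{z\sim p(z)}[E_\theta(G_\phi(z))]
\end{aligned}
```

The model expectation in the exact gradient is the intractable term, because it requires samples from $p_\theta$; the approximation substitutes generator samples $G_\phi(z)$. The last line is the generator step of point 1 in its simplest form (some formulations add an entropy term for the generator).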
The document describes a real-time DNN-based voice conversion system that gives speakers feedback to help them acquire a target character's traits. The converted voice is fed back to the speaker in real time, encouraging them to modify their speech (prosody and emphasis) toward the target speaker's character. Subjective evaluations from both the first-person (user) and third-person perspectives found that the system improved reproduction of the target speaker's character, especially for inexperienced users. Pitch feedback alone was already quite effective.
Presentation slides for the AI Seminar at the Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Japan.
URL (in Japanese): https://www.airc.aist.go.jp/seminar_detail/seminar_046.html
Daichi Kitamura, Nobutaka Ono, "Statistical-independence-based effective initialization for nonnegative matrix factorization," Proceedings of the 2016 Spring Meeting of the Acoustical Society of Japan, 3-3-5, pp. 619-622, Kanagawa, March 2016 (in Japanese).
This document provides an overview of POMDP (Partially Observable Markov Decision Process) and its applications. It first defines the key concepts of POMDP such as states, actions, observations, and belief states. It then uses the classic Tiger problem as an example to illustrate these concepts. The document discusses different approaches to solve POMDP problems, including model-based methods that learn the environment model from data and model-free reinforcement learning methods. Finally, it provides examples of applying POMDP to games like ViZDoom and robot navigation problems.
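The belief-state idea from the Tiger problem can be made concrete with a Bayes filter. A minimal sketch, assuming the commonly used setup in which the "listen" action leaves the state unchanged and reports the tiger's side correctly with probability 0.85; the state and observation names and that accuracy are assumptions here, not details taken from the summarized slides:

```python
# Belief update (Bayes filter) for the Tiger POMDP under the "listen" action.
# Assumption: listening reports the correct side with probability 0.85 and
# never moves the tiger, so the transition model is the identity.

STATES = ("tiger-left", "tiger-right")
LISTEN_ACCURACY = 0.85  # assumed observation accuracy


def observation_prob(obs: str, state: str) -> float:
    """P(obs | state, action=listen) for obs in {"hear-left", "hear-right"}."""
    correct = (obs == "hear-left") == (state == "tiger-left")
    return LISTEN_ACCURACY if correct else 1.0 - LISTEN_ACCURACY


def update_belief(belief: dict[str, float], obs: str) -> dict[str, float]:
    """Compute b'(s) proportional to O(obs | s) * sum_s0 T(s | s0) b(s0).

    Because T is the identity for "listen", the prediction step is just b(s).
    """
    unnormalized = {s: observation_prob(obs, s) * belief[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


if __name__ == "__main__":
    b = {"tiger-left": 0.5, "tiger-right": 0.5}  # uniform initial belief
    for o in ("hear-left", "hear-left", "hear-right"):
        b = update_belief(b, o)
        print(o, round(b["tiger-left"], 4))
```

Starting from a uniform belief, two "hear-left" observations raise P(tiger-left) to about 0.97, and a subsequent "hear-right" lowers it back to 0.85.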
Slides from my presentation at the ICASSP 2019 Speech & Audio Paper Reading Session (https://connpass.com/event/128527/).
They introduce the AASP (Audio and Acoustic Signal Processing) field and the trends at ICASSP 2019. #icassp2019jp
June 24, 2017, ICASSP2017 Reading Session (Kanto), The University of Tokyo
AASP-L3: Deep Learning for Source Separation and Enhancement I
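Slides for the part presented by Daichi Kitamura, Project Assistant Professor, The University of Tokyo
These slides introduce papers I did not author, so please refrain from redistributing them. For details not covered in these slides, please refer to the original papers.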
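These are the slides I used for a presentation on deep learning at the AI subcommittee of the JUAS Business Data Study Group, Japan Users Association of Information Systems (JUAS). They are intended for a general audience rather than specialists.
This material is a revised version of my December 2015 presentation at JUAS.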
This is the company presentation material of RIZAP Technologies, Inc.
[DL Paper Reading Group] VOICEFILTER: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
1. DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
VOICEFILTER: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Hiroshi Sekiguchi, Morikawa Lab
2. Bibliographic Information
• “VOICEFILTER: Targeted Voice Separation by Speaker-Conditioned
Spectrogram Masking” arXiv:1810.04826v3 [eess.AS] 27 Oct 2018
• Authors: Quan Wang¹, Hannah Muckenhirn², Kevin Wilson¹, Prashant Sridhar¹, Zelin Wu¹, John Hershey¹, Rif A. Saurous¹, Ron J. Weiss¹, Ye Jia¹, Ignacio Lopez Moreno¹
¹Google Inc., USA; ²Idiap Research Institute, Switzerland
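• Why I chose this paper
• My research topic is the separation of overlapped speech.
• To review how overlapped speech is separated in Google's smart speaker, ”Google Home”.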
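The paper's title names its mechanism: a network, conditioned on an embedding (d-vector) of the target speaker, predicts a soft mask that is multiplied onto the mixture's magnitude spectrogram. The sketch below covers only that masking step plus resynthesis-ready recombination with the mixture phase. The mask is taken as a given input (a hypothetical stand-in for the network's output), and the nested-list `[frame][frequency_bin]` layout and the reuse of the mixture phase are assumptions made for illustration:

```python
import cmath
import math

Spectrogram = list[list[float]]  # [frame][frequency_bin], assumed layout


def apply_soft_mask(mixture_mag: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise product of a mixture magnitude spectrogram and a soft mask in [0, 1]."""
    if len(mixture_mag) != len(mask):
        raise ValueError("frame counts differ")
    masked = []
    for mag_row, mask_row in zip(mixture_mag, mask):
        if len(mag_row) != len(mask_row):
            raise ValueError("frequency-bin counts differ")
        if any(not 0.0 <= m <= 1.0 for m in mask_row):
            raise ValueError("mask values must lie in [0, 1]")
        masked.append([x * m for x, m in zip(mag_row, mask_row)])
    return masked


def attach_phase(masked_mag: Spectrogram, mixture_phase: Spectrogram) -> list[list[complex]]:
    """Recombine the masked magnitude with the mixture phase (radians) into complex STFT bins."""
    return [[cmath.rect(r, phi) for r, phi in zip(mag_row, ph_row)]
            for mag_row, ph_row in zip(masked_mag, mixture_phase)]


if __name__ == "__main__":
    mixture = [[2.0, 4.0], [1.0, 0.5]]
    hypothetical_mask = [[0.5, 1.0], [0.0, 0.25]]  # would come from the network
    target_mag = apply_soft_mask(mixture, hypothetical_mask)
    print(target_mag)  # [[1.0, 4.0], [0.0, 0.125]]
    print(attach_phase(target_mag, [[0.0, math.pi], [0.0, 0.0]]))
```

An inverse STFT (not shown) would turn the complex bins back into a waveform. Keeping the mixture phase is the simplest choice; the network then only has to estimate a real-valued mask.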
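11. Speaker Recognition Network
• There are two kinds of speaker recognition:
– Text-Dependent Speaker Verification (TD-SV):
the enrollment phrase (“OK Google”) is the same as the test-time phrase (“OK Google”).
– Text-Independent Speaker Verification (TI-SV):
the enrollment utterances (“various words”, varying in both phonemes and length) need not match the test-time phrase (“Hey Google”).
→ This paper uses the latter framework.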
• Related papers:
– Generalized End-to-End Loss for Speaker Verification,
Li Wan et al., Google
– Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling,
Hasim Sak et al., Google, USA
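In the text-independent setting above, a common way to score a trial is to average the enrollment embeddings into a centroid and compare the test embedding against it by cosine similarity; GE2E training (the first related paper) uses a scaled version, w·cos(e, c) + b, with learnable w > 0 and b. A minimal scoring sketch, assuming the embeddings have already been computed by an encoder (not shown); the toy vectors and the 0.7 threshold are hypothetical:

```python
import math

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def centroid(embeddings: list[Vector]) -> Vector:
    """Mean of the enrollment embeddings, used as the speaker model."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def verify(enrollment: list[Vector], test: Vector,
           threshold: float = 0.7) -> tuple[float, bool]:
    """Return (score, accepted); the default threshold is a hypothetical value."""
    score = cosine_similarity(test, centroid(enrollment))
    return score, score >= threshold


if __name__ == "__main__":
    enrolled = [[1.0, 0.0, 0.2], [0.8, 0.6, 0.0]]  # toy embeddings, not real d-vectors
    print(verify(enrolled, [0.9, 0.3, 0.1]))   # similar direction: accepted
    print(verify(enrolled, [-0.2, 0.1, 1.0]))  # different direction: rejected
```

Because the score depends only on embedding directions, the enrollment and test utterances can contain different words, which is what the text-independent framework requires.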