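June 24, 2017, ICASSP 2017 Reading Group (Kanto edition) @ The University of Tokyo
AASP-L3: Deep Learning for Source Separation and Enhancement I
Slides for the part presented by Daichi Kitamura, Project Research Associate, The University of Tokyo
These slides introduce papers that I did not author, so please refrain from redistributing them. For details not covered in these slides, please refer to the papers themselves.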
5. Session outline
• AASP-L3.1: Deep clustering and conventional networks for music separation: stronger together
– Y. Luo, Z. Chen, J. R. Hershey, J. Le Roux, and N. Mesgarani
• AASP-L3.2: DNN-based speech mask estimation for eigenvector beamforming
– L. Pfeifenberger, M. Zöhrer, and F. Pernkopf
• AASP-L3.3: Recurrent deep stacking networks for supervised speech separation
– Z.-Q. Wang and D. L. Wang
• AASP-L3.4: Collaborative deep learning for speech enhancement: a run-time model selection method using autoencoders
– M. Kim
• AASP-L3.5: DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements
– Y. Koizumi, K. Niwa, Y. Hioka, K. Kobayashi, and Y. Haneda
• AASP-L3.6: A neural network alternative to non-negative audio models
– P. Smaragdis and S. Venkataramani
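[Slide annotations, one per paper in listed order: TF mask estimation, TF mask estimation, TF mask estimation, autoencoder, TF mask selection, autoencoder]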
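32. AASP-L3.6: From NMF to nonnegative AE
• Nonnegative matrix factorization (NMF) [Lee+, 1999]
– Low-rank approximation under nonnegativity constraints with an arbitrary number of bases
• Extracts a limited number of nonnegative basis vectors and their nonnegative coefficients
– Applied to the power spectrogram obtained by the STFT
• Yields frequently occurring spectral patterns and their temporal intensity changes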
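[Figure: NMF of a spectrogram. The mixed observation matrix (power spectrogram; axes: frequency vs. time) is decomposed into a basis matrix (spectral patterns; amplitude over frequency) and an activation matrix (temporal intensity changes; amplitude over time). The legend defines the number of frequency bins, the number of time frames, and the number of bases.]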
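The decomposition on this slide can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming the squared Euclidean cost and the standard multiplicative update rules; the slide does not say which cost function is used. The names `nmf`, `demo`, `n_bases`, and the toy matrix are hypothetical and do not come from the paper.

```python
import numpy as np


def nmf(X, n_bases, n_iter=500, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix X (freq x time) by W @ H.

    W (freq x n_bases) holds the spectral patterns (bases) and
    H (n_bases x time) holds their time-varying activations.
    Assumption: multiplicative updates for the squared Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    # Random positive initialization keeps every entry nonnegative,
    # because the updates below only multiply by nonnegative ratios.
    W = rng.random((n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def demo():
    # Toy "power spectrogram" (hypothetical data): two spectral patterns
    # that switch on and off over six time frames.
    patterns = np.array([[1.0, 0.0],
                         [0.5, 0.0],
                         [0.0, 1.0],
                         [0.0, 0.8]])
    activations = np.array([[1, 1, 0, 0, 1, 0],
                            [0, 1, 1, 0, 0, 1]], dtype=float)
    X = patterns @ activations
    W, H = nmf(X, n_bases=2)
    rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    return W, H, rel_err


if __name__ == "__main__":
    W, H, rel_err = demo()
    print("relative reconstruction error:", rel_err)
```

In practice `X` would be a power spectrogram computed by an STFT, with rows as frequency bins and columns as time frames. The columns of `W` then play the role of the spectral patterns and the rows of `H` the temporal intensity changes shown in the figure.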