©Yuki Saito, 07/03/2017
TRAINING ALGORITHM TO DECEIVE
ANTI-SPOOFING VERIFICATION
FOR DNN-BASED SPEECH SYNTHESIS
Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
(The University of Tokyo)
ICASSP 2017 SP-L4.2
Outline of This Talk
▪ Issue: quality degradation in statistical parametric speech synthesis due to over-smoothing of the speech params.
▪ Countermeasures: reproducing natural statistics
– 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007]
– Histogram [Ohtani et al., 2012]
▪ Proposed: training algorithm to deceive an Anti-Spoofing Verification (ASV) for DNN-based speech synthesis
– Tries to deceive the ASV, which distinguishes natural from synthetic speech.
– Compensates the distribution difference between natural and synthetic speech.
▪ Results:
– Improves the synthetic speech quality.
– Is relatively robust to its hyperparameter setting.
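Conventional Training Algorithm:
Minimum Generation Error (MGE) Training [Wu et al., 2016]
[Figure: linguistic feats. → acoustic models (frames $t = 1, \dots, T$) → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\mathbf{c}}$; the generation error $L_G(\mathbf{c}, \hat{\mathbf{c}})$ is measured against the natural speech params. $\mathbf{c}$.]
$$L_G(\mathbf{c}, \hat{\mathbf{c}}) = \frac{1}{T}\left(\hat{\mathbf{c}} - \mathbf{c}\right)^\top \left(\hat{\mathbf{c}} - \mathbf{c}\right) \rightarrow \text{Minimize}$$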
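To make the generation error concrete, here is a minimal Python sketch of $L_G(\mathbf{c}, \hat{\mathbf{c}})$. It assumes the natural and generated parameter sequences are given as $T$ frames of equal dimensionality, so that concatenating the frames gives the long vectors $\mathbf{c}$ and $\hat{\mathbf{c}}$. The function name `generation_error` and the toy values are hypothetical; this is not the authors' implementation.

```python
from typing import List

Frames = List[List[float]]  # T frames, each a D-dimensional parameter vector


def generation_error(c: Frames, c_hat: Frames) -> float:
    """L_G(c, c_hat) = (1/T) (c_hat - c)^T (c_hat - c).

    Summing the squared error over all frames and dimensions equals the inner
    product of the concatenated difference vector with itself; the result is
    then normalized by the number of frames T.
    """
    if len(c) != len(c_hat) or not c:
        raise ValueError("natural and generated sequences must have the same nonzero length T")
    total = 0.0
    for frame, frame_hat in zip(c, c_hat):
        if len(frame) != len(frame_hat):
            raise ValueError("frame dimensionalities differ")
        total += sum((y - x) ** 2 for x, y in zip(frame, frame_hat))
    return total / len(c)


# Two frames of 2-dimensional parameters (toy values, not from the experiments).
natural = [[1.0, 2.0], [3.0, 4.0]]
generated = [[1.5, 2.0], [2.0, 4.0]]
print(generation_error(natural, generated))  # prints 0.625, i.e. (0.25 + 1.0) / 2
```

Note that $L_G$ only penalizes frame-wise distance to the natural trajectory, so it is minimized by predictions near the conditional mean, which is one way to see why MGE output tends to be over-smoothed.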
Issue of MGE Training:
Over-smoothing of Generated Speech Parameters
[Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural speech and MGE-generated speech; the MGE distribution is narrow.]
These distributions are significantly different.
(GV [Toda et al., 2007] explicitly compensates the 2nd moment.)
Proposed algorithm:
Training Algorithm to Deceive
Anti-Spoofing Verification (ASV)
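Anti-Spoofing Verification (ASV):
Discriminator to Prevent Spoofing Attacks w/ Speech [Wu et al., 2016] [Chen et al., 2015]
[Figure: natural speech params. $\mathbf{c}$ or generated speech params. $\hat{\mathbf{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV $D(\cdot)$, trained with the cross entropy $L_D(\mathbf{c}, \hat{\mathbf{c}})$; labels 1: natural, 0: generated.]
$$L_D(\mathbf{c}, \hat{\mathbf{c}}) = L_{D,1}(\mathbf{c}) + L_{D,0}(\hat{\mathbf{c}}) = -\frac{1}{T}\sum_{t=1}^{T} \log D(\mathbf{c}_t) - \frac{1}{T}\sum_{t=1}^{T} \log\left(1 - D(\hat{\mathbf{c}}_t)\right) \rightarrow \text{Minimize}$$
$L_{D,1}(\mathbf{c})$: loss to recognize natural speech as natural
$L_{D,0}(\hat{\mathbf{c}})$: loss to recognize generated speech as generated
Here, $\boldsymbol{\phi}(\mathbf{c}_t) = \mathbf{c}_t$.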
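A minimal sketch of the ASV cross entropy, assuming the ASV has already mapped each frame (with $\boldsymbol{\phi}(\mathbf{c}_t) = \mathbf{c}_t$) to a sigmoid output in (0, 1). The function name `asv_loss`, the clipping constant `eps`, and the toy probabilities are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def asv_loss(d_natural: Sequence[float], d_generated: Sequence[float],
             eps: float = 1e-12) -> Tuple[float, float, float]:
    """Cross entropy of the ASV: L_D = L_D1(c) + L_D0(c_hat).

    d_natural[t]   = D(c_t), the ASV's probability that natural frame t is natural.
    d_generated[t] = D(c_hat_t), the same probability for generated frame t.
    eps clips the probabilities to avoid log(0).
    Returns (L_D, L_D1, L_D0).
    """
    T = len(d_natural)
    if T == 0 or len(d_generated) != T:
        raise ValueError("both sequences must contain the same nonzero number of frames T")
    l_d1 = -sum(math.log(max(p, eps)) for p in d_natural) / T          # natural -> label 1
    l_d0 = -sum(math.log(max(1.0 - p, eps)) for p in d_generated) / T  # generated -> label 0
    return l_d1 + l_d0, l_d1, l_d0


# A fairly confident ASV: high D on natural frames, low D on generated frames.
total, l_d1, l_d0 = asv_loss([0.9, 0.8], [0.2, 0.1])
```

Both terms approach zero as the ASV becomes certain and correct, which is what minimizing $L_D$ in update ② asks of it.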
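Training Algorithm to Deceive ASV
[Figure: linguistic feats. → acoustic models → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\mathbf{c}}$; the generation error $L_G(\mathbf{c}, \hat{\mathbf{c}})$ is computed against the natural speech params. $\mathbf{c}$, and $\hat{\mathbf{c}}$ is also passed through the feature function $\boldsymbol{\phi}(\cdot)$ to the ASV $D(\cdot)$ with target label 1 (natural).]
$$L(\mathbf{c}, \hat{\mathbf{c}}) = L_G(\mathbf{c}, \hat{\mathbf{c}}) + \omega_D \frac{E_{L_G}}{E_{L_D}} L_{D,1}(\hat{\mathbf{c}}) \rightarrow \text{Minimize}$$
$$L_{D,1}(\hat{\mathbf{c}}) = -\frac{1}{T}\sum_{t=1}^{T} \log D(\hat{\mathbf{c}}_t)$$
$L_{D,1}(\hat{\mathbf{c}})$: loss to recognize generated speech as natural
$\omega_D$: weight; $E_{L_G}$, $E_{L_D}$: expectation values of $L_G(\mathbf{c}, \hat{\mathbf{c}})$ and $L_{D,1}(\hat{\mathbf{c}})$, respectively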
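A minimal sketch of the combined loss for one utterance. It takes $L_G$ as already computed and the ASV outputs for the generated frames as given (the ASV is held fixed during this step). The expectation values $E_{L_G}$ and $E_{L_D}$ are passed in as constants; how they are estimated is not stated on the slide, so treat that, and the function name `proposed_loss`, as assumptions.

```python
import math
from typing import Sequence


def proposed_loss(l_g: float, d_generated: Sequence[float], omega_d: float,
                  e_lg: float, e_ld: float, eps: float = 1e-12) -> float:
    """L(c, c_hat) = L_G(c, c_hat) + omega_D * (E_LG / E_LD) * L_D1(c_hat).

    l_g:         generation error L_G(c, c_hat), computed elsewhere.
    d_generated: ASV outputs D(c_hat_t) for the generated frames.
    e_lg, e_ld:  expectation values of L_G and L_D1(c_hat), supplied as constants.
    """
    T = len(d_generated)
    if T == 0:
        raise ValueError("need at least one generated frame")
    # Loss to recognize generated speech as natural (target label 1).
    l_d1_hat = -sum(math.log(max(p, eps)) for p in d_generated) / T
    return l_g + omega_d * (e_lg / e_ld) * l_d1_hat


# omega_D = 0 recovers plain MGE training: the result is just L_G (0.6 here).
mge_only = proposed_loss(0.6, [0.3, 0.4], omega_d=0.0, e_lg=0.6, e_ld=1.2)
```

The ratio $E_{L_G}/E_{L_D}$ appears to rescale the adversarial term to the magnitude of $L_G$, so that $\omega_D$ acts as a relative weight; this is presumably why $\omega_D$ can be swept over 0.0 to 1.0 in the experiments.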
Iterative Optimization of Acoustic Models and ASV
▪ ① Update the acoustic models (ASV fixed): minimize $L(\mathbf{c}, \hat{\mathbf{c}})$, i.e., $L_G(\mathbf{c}, \hat{\mathbf{c}})$ plus the weighted $L_{D,1}(\hat{\mathbf{c}})$ with target label 1: natural.
▪ ② Update the ASV (acoustic models fixed): minimize $L_D(\mathbf{c}, \hat{\mathbf{c}})$ with labels 1: natural, 0: generated.
By iterating ① and ②, we construct the final acoustic models!
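Each of the updates ① and ② is a gradient step, and the experimental conditions list AdaGrad [Duchi et al., 2011] with a learning rate of 0.01. Below is a minimal sketch of one AdaGrad step on a flat list of parameters; the flat representation, the `eps` term and its placement, and keeping one accumulator per model are assumptions rather than details from the talk.

```python
import math
from typing import List


def adagrad_step(params: List[float], grads: List[float], accum: List[float],
                 lr: float = 0.01, eps: float = 1e-8) -> None:
    """Apply one AdaGrad update in place.

    params: parameter values of one model (acoustic models or ASV).
    grads:  gradients of that model's loss w.r.t. params.
    accum:  running sum of squared gradients, one entry per parameter.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        # Per-parameter step size lr / sqrt(sum of past squared gradients).
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)


# One accumulator per model, since the acoustic models (step 1) and the ASV
# (step 2) have separate parameters and are updated in alternation.
acoustic_params, acoustic_accum = [1.0], [0.0]
adagrad_step(acoustic_params, [2.0], acoustic_accum)  # hypothetical gradient of L(c, c_hat)
```

Because the accumulator only grows, repeated gradients of the same size produce shrinking steps, which makes the effective learning rate decay per parameter.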
Discussions of Proposed Algorithm
▪ Compensation of speech feats. through the feature function:
– Automatically derived feats., such as auto-encoded feats.
– Conventional analytically derived feats., such as GV
▪ Loss function for training the acoustic models:
– Combination of MGE and adversarial training [Goodfellow et al., 2014]
▪ Effect of the adversarial training:
– Minimizes the Jensen-Shannon divergence between the distributions of the natural data and the generated data (in theory, when the ASV is optimal).
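To make the last point concrete, here is a minimal sketch for discrete distributions, such as normalized histograms of one speech parameter. It computes the Jensen-Shannon divergence and checks the result of [Goodfellow et al., 2014] that, with the optimal discriminator $D^*(x) = p(x)/(p(x)+q(x))$, the adversarial objective equals $2\,\mathrm{JSD}(p \| q) - \log 4$. The histograms and function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; bins with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(p || q) = KL(p || m) / 2 + KL(q || m) / 2 with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def value_at_optimal_asv(p: Sequence[float], q: Sequence[float]) -> float:
    """E_p[log D*] + E_q[log(1 - D*)] with D*(x) = p(x) / (p(x) + q(x))."""
    v = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            v += pi * math.log(pi / (pi + qi))
        if qi > 0:
            v += qi * math.log(qi / (pi + qi))
    return v


# Hypothetical 4-bin histograms: natural (wide) vs. over-smoothed generated (narrow).
natural = [0.2, 0.3, 0.3, 0.2]
generated = [0.05, 0.45, 0.45, 0.05]
jsd = js_divergence(natural, generated)
print(jsd, value_at_optimal_asv(natural, generated), 2 * jsd - math.log(4))
```

The last two printed values coincide, so against an optimal ASV, lowering the adversarial objective is the same as lowering the JSD between natural and generated distributions.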
Distributions of Speech Parameters
[Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural, MGE, and proposed; the MGE distribution is narrow, while the proposed one is as wide as natural speech.]
Our algorithm alleviates the over-smoothing effect!
Compensation of Global Variance
▪ Global Variance (GV) [Toda et al., 2007]:
– 2nd moment of the parameter distribution
[Figure: GV (log scale, $10^{-4}$ to $10^{1}$) vs. feature index (0 to 24) for natural, MGE, and proposed.]
GV is NOT used for training, but it is compensated by the ASV!
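A minimal sketch of the per-utterance GV, i.e., the variance of each parameter dimension over the $T$ frames of one utterance, $v(d) = \frac{1}{T}\sum_{t}(c_t(d) - \bar{c}(d))^2$. The frame-list input format, the function name, and the toy sequences are assumptions for illustration.

```python
from typing import List

Frames = List[List[float]]  # T frames, each a D-dimensional parameter vector


def global_variance(frames: Frames) -> List[float]:
    """Per-dimension variance over time: v(d) = (1/T) sum_t (c_t(d) - mean(d))^2."""
    if not frames:
        raise ValueError("need at least one frame")
    T, D = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / T for d in range(D)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / T for d in range(D)]


# A toy 1-dimensional trajectory and an over-smoothed version of it:
# halving the deviations from the mean divides the GV by 4.
natural_traj = [[0.0], [2.0], [4.0]]
smoothed_traj = [[1.0], [2.0], [3.0]]
print(global_variance(natural_traj), global_variance(smoothed_traj))
```

Plotting this vector against the feature index, per system, gives the kind of curve shown on the slide; an over-smoothed system sits below the natural curve.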
Additional Effect:
Alleviation of Unnaturally Strong Correlation
▪ Maximal Information Coefficient (MIC) [Reshef et al., 2011]:
– A measure that quantifies a (possibly nonlinear) correlation between two variables
– Natural speech params. tend to have weak correlations [Ijima et al., 2016].
[Figure: MIC matrices among the speech parameter dimensions (indices 0 to 24) for natural, MGE, and proposed; color scale from 0.0 (weak) to 1.0 (strong).]
The proposed algorithm not only compensates the GV but also makes the correlations among speech params. natural!
Experimental Evaluations
Experimental Conditions
Dataset: ATR Japanese speech database (503 phonetically balanced sentences)
Train / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
Linguistic feats.: 274-dimensional vector (phoneme, accent type, frame position, etc.)
Speech params.: mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
Predicted params.: mel-cepstral coefficients (the others were NOT predicted)
Optimization algorithm: AdaGrad [Duchi et al., 2011] (learning rate: 0.01)
Acoustic models: feed-forward, 274 – 3×400 (ReLU) – 75 (linear)
ASV: feed-forward, 25 – 2×200 (ReLU) – 1 (sigmoid)
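As a quick size check of the two networks, here is a sketch that counts weights and biases of a fully connected feed-forward net. It assumes that "3×400" means three hidden layers of 400 units and "2×200" two hidden layers of 200 units; the 75 outputs are presumably 25 mel-cepstral coefficients with their delta and delta-delta (static-dynamic) features, matching the 25-dimensional ASV input. The helper name is hypothetical.

```python
from typing import Sequence


def count_params(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


acoustic_model = [274, 400, 400, 400, 75]  # input, 3 hidden (ReLU), linear output
asv = [25, 200, 200, 1]                    # input, 2 hidden (ReLU), sigmoid output

print(count_params(acoustic_model))  # 110000 + 2 * 160400 + 30075 = 460875
print(count_params(asv))             # 5200 + 40200 + 201 = 45601
```

Under these assumptions the ASV is roughly a tenth the size of the acoustic models.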
Initialization, Training, and Objective Evaluation
▪ Initialization:
– Acoustic models: conventional MGE training
– ASV: trained to distinguish natural / generated speech after the MGE training
▪ Training:
– Acoustic models: updated with the proposed algorithm
– ASV: trained to distinguish natural / generated speech after updating the acoustic models
▪ Objective evaluation:
– Generation loss $L_G(\mathbf{c}, \hat{\mathbf{c}})$ and spoofing rate
$$\text{Spoofing rate} = \frac{\#\text{ of spoofing synthetic speech params.}}{\text{Total }\#\text{ of synthetic speech params.}}$$
We calculated these values with various $\omega_D$.
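The slide defines the spoofing rate as a ratio of counts but does not state when a synthetic parameter counts as "spoofing". A minimal sketch, assuming a generated frame spoofs the ASV when its sigmoid output $D(\hat{\mathbf{c}}_t)$ exceeds 0.5, i.e., when the ASV classifies it as natural (the threshold and function name are hypothetical):

```python
from typing import Sequence


def spoofing_rate(d_generated: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of synthetic speech params. that the ASV classifies as natural."""
    if not d_generated:
        raise ValueError("need at least one synthetic frame")
    n_spoofing = sum(1 for p in d_generated if p > threshold)
    return n_spoofing / len(d_generated)


# Four generated frames, two of which fool the ASV.
print(spoofing_rate([0.9, 0.4, 0.6, 0.1]))  # prints 0.5
```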
Results of Objective Evaluations
[Figure: generation loss (left) and spoofing rate (right) vs. weight $\omega_D$ from 0.0 to 1.0.]
Generation loss: got worse.
Spoofing rate: got better; when $\omega_D > 0.3$, the spoofing rate was > 99%.
Our algorithm makes the generation loss worse, but it can train the acoustic models to deceive the ASV!
/17
Results of Subjective Evaluations
in Terms of Speech Quality
16
Proposed
𝜔D = 1.0
Proposed
𝜔D = 0.3
MGE
𝜔D = 0.0
Preference score (w/ 8 listeners)
0.0 0.2 0.4 0.6 0.8 1.0
Got
better
NO
significant
difference
Our algorithm improves the synthetic speech quality
and
works comparably robustly against its hyper-parameter setting!
Error bars denote 95% confidence intervals.
Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
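The slide does not say how the 95% confidence intervals were computed. As one common choice, here is a sketch using the normal-approximation (Wald) interval for a preference proportion, treating a difference as significant when the interval excludes 0.5. The method, function names, and counts are all assumptions for illustration, not the evaluation's actual data.

```python
import math
from typing import Tuple


def preference_with_ci(n_prefer: int, n_total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Preference score and a normal-approximation 95% confidence interval."""
    if n_total <= 0 or not 0 <= n_prefer <= n_total:
        raise ValueError("invalid counts")
    p = n_prefer / n_total
    half = z * math.sqrt(p * (1.0 - p) / n_total)
    return p, max(0.0, p - half), min(1.0, p + half)


def is_significant(lo: float, hi: float) -> bool:
    """A preference differs from chance (0.5) if the interval excludes 0.5."""
    return not (lo <= 0.5 <= hi)


# Hypothetical counts: system A preferred in 60 of 100 paired judgments.
score, lo, hi = preference_with_ci(60, 100)
print(score, lo, hi, is_significant(lo, hi))
```

With few listeners the Wald interval can be inaccurate near 0 or 1; a Wilson or bootstrap interval is a reasonable alternative.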
Conclusion
▪ Purpose:
– Improving the speech quality of statistical parametric speech synthesis
▪ Proposed:
– Training algorithm to deceive an ASV
• Compensates the difference between the distributions of natural and generated speech params. using adversarial training
▪ Results:
– Improved the speech quality compared to conventional training
– Was relatively robust to its hyperparameter setting
▪ Future work:
– Devising temporal- and linguistic-dependent ASV
– Extending our algorithm to generate $F_0$ and duration


Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
OECD Directorate for Financial and Enterprise Affairs
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
kekzed
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
OECD Directorate for Financial and Enterprise Affairs
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
Claudio Gallicchio
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
Robin Haunschild
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
RIDHIMAGARG21
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
OECD Directorate for Financial and Enterprise Affairs
 

Recently uploaded (20)

ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
ServiceNow CIS-ITSM Exam Dumps & Questions [2024]
 
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
Artificial Intelligence, Data and Competition – SCHREPEL – June 2024 OECD dis...
 
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
The Intersection between Competition and Data Privacy – KEMP – June 2024 OECD...
 
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdfWhy Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
Why Psychological Safety Matters for Software Teams - ACE 2024 - Ben Linders.pdf
 
nationalismineurope-230420140400-1c53f60e.pptx
nationalismineurope-230420140400-1c53f60e.pptxnationalismineurope-230420140400-1c53f60e.pptx
nationalismineurope-230420140400-1c53f60e.pptx
 
Using-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptxUsing-Presentation-Software-to-the-Fullf.pptx
Using-Presentation-Software-to-the-Fullf.pptx
 
Proposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP IncProposal: The Ark Project and The BEEP Inc
Proposal: The Ark Project and The BEEP Inc
 
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
The Intersection between Competition and Data Privacy – COLANGELO – June 2024...
 
XP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to LeadershipXP 2024 presentation: A New Look to Leadership
XP 2024 presentation: A New Look to Leadership
 
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
The Intersection between Competition and Data Privacy – CAPEL – June 2024 OEC...
 
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussionPro-competitive Industrial Policy – LANE – June 2024 OECD discussion
Pro-competitive Industrial Policy – LANE – June 2024 OECD discussion
 
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...Competition and Regulation in Professions and Occupations – ROBSON – June 202...
Competition and Regulation in Professions and Occupations – ROBSON – June 202...
 
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussionArtificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – LIM – June 2024 OECD discussion
 
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
The Intersection between Competition and Data Privacy – OECD – June 2024 OECD...
 
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
怎么办理(lincoln学位证书)英国林肯大学毕业证文凭学位证书原版一模一样
 
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussionArtificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
Artificial Intelligence, Data and Competition – OECD – June 2024 OECD discussion
 
IEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdfIEEE CIS Webinar Sustainable futures.pdf
IEEE CIS Webinar Sustainable futures.pdf
 
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdfBRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
BRIC_2024_2024-06-06-11:30-haunschild_archival_version.pdf
 
Disaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other usesDisaster Management project for holidays homework and other uses
Disaster Management project for holidays homework and other uses
 
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
Artificial Intelligence, Data and Competition – ČORBA – June 2024 OECD discus...
 

Saito2017icassp

  • 1. Training Algorithm to Deceive Anti-Spoofing Verification for DNN-Based Speech Synthesis
    Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari (The University of Tokyo)
    ICASSP 2017, SP-L4.2
    ©Yuki Saito, 07/03/2017
  • 2. Outline of This Talk
    - Issue: quality degradation in statistical parametric speech synthesis caused by over-smoothing of the speech parameters.
    - Countermeasures: reproducing natural statistics
      – 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007.]
      – Histogram [Ohtani et al., 2012.]
    - Proposed: a training algorithm for DNN-based speech synthesis that deceives an Anti-Spoofing Verification (ASV)
      – Tries to deceive an ASV that distinguishes natural from synthetic speech.
      – Compensates the distribution difference between natural and synthetic speech.
    - Results:
      – Improves the synthetic speech quality.
      – Is comparatively robust to its hyper-parameter setting.
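  • 3. Conventional Training Algorithm: Minimum Generation Error (MGE) Training [Wu et al., 2016.]
    [Figure: linguistic features → acoustic models (frames $t = 1, \dots, T$) → static-dynamic mean vectors → ML-based parameter generation → generated speech parameters $\hat{\boldsymbol{c}}$, compared with the natural speech parameters $\boldsymbol{c}$ via the generation error.]
    Generation error (→ Minimize):
    $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^{\top} (\hat{\boldsymbol{c}} - \boldsymbol{c})$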
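As a minimal sketch, the generation error $L_{\mathrm{G}}$ can be computed in plain Python. It assumes $\boldsymbol{c}$ and $\hat{\boldsymbol{c}}$ are given as T frames of static feature values (the frame-list layout and the name `generation_error` are illustrative, not from the slides); the squared error is summed over all frames and dimensions and divided by the number of frames T, as in the formula.

```python
def generation_error(c, c_hat):
    """L_G(c, c_hat) = (1/T) (c_hat - c)^T (c_hat - c).

    c, c_hat: T frames each, every frame a list of static feature values;
    stacking the frames gives the long vectors c and c_hat of the slide.
    """
    if len(c) != len(c_hat):
        raise ValueError("c and c_hat must have the same number of frames")
    T = len(c)
    squared_error = 0.0
    for frame, frame_hat in zip(c, c_hat):
        for x, x_hat in zip(frame, frame_hat):
            squared_error += (x_hat - x) ** 2
    return squared_error / T


natural = [[1.0, 0.5], [0.8, 0.2], [1.2, 0.4]]
generated = [[0.9, 0.5], [0.9, 0.3], [1.0, 0.4]]
print(generation_error(natural, generated))  # (0.01 + 0.02 + 0.04) / 3, up to rounding
```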
  • 4. Issue of MGE Training: Over-smoothing of Generated Speech Parameters
    [Figure: joint distributions of the 21st and 23rd mel-cepstral coefficients for natural and MGE-generated speech; the MGE distribution is narrow.]
    These distributions are significantly different.
    (GV [Toda et al., 2007.] explicitly compensates the 2nd moment.)
  • 5. Proposed Algorithm: Training Algorithm to Deceive Anti-Spoofing Verification (ASV)
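  • 6. Anti-Spoofing Verification (ASV): Discriminator to Prevent Spoofing Attacks with Speech [Wu et al., 2016.] [Chen et al., 2015.]
    The ASV $D(\cdot)$ receives natural speech parameters $\boldsymbol{c}$ or generated speech parameters $\hat{\boldsymbol{c}}$ through a feature function $\boldsymbol{\phi}(\cdot)$ (here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$) and is trained with targets 1: natural, 0: generated by minimizing the cross entropy
    $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \underbrace{-\frac{1}{T}\sum_{t=1}^{T} \log D(\boldsymbol{\phi}(\boldsymbol{c}_t))}_{L_{\mathrm{D},1}(\boldsymbol{c})} \underbrace{- \frac{1}{T}\sum_{t=1}^{T} \log\bigl(1 - D(\boldsymbol{\phi}(\hat{\boldsymbol{c}}_t))\bigr)}_{L_{\mathrm{D},0}(\hat{\boldsymbol{c}})} \rightarrow \text{Minimize}$
    – $L_{\mathrm{D},1}(\boldsymbol{c})$: loss to recognize natural speech as natural
    – $L_{\mathrm{D},0}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as generated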
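A minimal sketch of the ASV cross entropy, assuming the ASV outputs $D(\boldsymbol{\phi}(\cdot))$ have already been computed as probabilities in (0, 1), one per frame. The helper names and the `EPS` clipping constant are hypothetical additions (the constant only guards against log(0)); each term is averaged over its own frames, which matches the slide when the natural and generated parameter sequences both have T frames.

```python
import math

EPS = 1e-12  # hypothetical guard against log(0); not part of the slides


def loss_d1(d_outputs):
    """L_D,1: mean of -log D(phi(x_t)), i.e. the loss for target 1 (natural)."""
    return -sum(math.log(max(d, EPS)) for d in d_outputs) / len(d_outputs)


def loss_d0(d_outputs):
    """L_D,0: mean of -log(1 - D(phi(x_t))), i.e. the loss for target 0 (generated)."""
    return -sum(math.log(max(1.0 - d, EPS)) for d in d_outputs) / len(d_outputs)


def asv_loss(d_natural, d_generated):
    """L_D(c, c_hat) = L_D,1(c) + L_D,0(c_hat)."""
    return loss_d1(d_natural) + loss_d0(d_generated)


# ASV outputs for natural frames (ideally near 1) and generated frames (ideally near 0).
print(asv_loss([0.9, 0.8], [0.2, 0.1]))
```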
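  • 7. Training Algorithm to Deceive ASV
    The acoustic models are trained so that the ASV recognizes the generated speech parameters $\hat{\boldsymbol{c}}$ as natural while the generation error stays small:
    $L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_{\mathrm{D}} \frac{E_{L_{\mathrm{G}}}}{E_{L_{\mathrm{D}}}} L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) \rightarrow \text{Minimize}$
    – $L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) = -\frac{1}{T}\sum_{t=1}^{T} \log D(\boldsymbol{\phi}(\hat{\boldsymbol{c}}_t))$: loss to recognize generated speech as natural (target 1: natural)
    – $\omega_{\mathrm{D}}$: weight; $E_{L_{\mathrm{G}}}$, $E_{L_{\mathrm{D}}}$: expectation values of $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$
    [Figure: linguistic features → acoustic models → static-dynamic mean vectors → ML-based parameter generation → $\hat{\boldsymbol{c}}$; $\hat{\boldsymbol{c}}$ is compared with the natural $\boldsymbol{c}$ through $L_{\mathrm{G}}$ and fed through the feature function $\boldsymbol{\phi}(\cdot)$ to the ASV $D(\cdot)$.]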
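The combined objective can be sketched as follows. Here $E_{L_{\mathrm{G}}}$ and $E_{L_{\mathrm{D}}}$ are passed in as constants (how they are estimated, e.g. averaged over the training data, is an assumption); the ratio presumably rescales the adversarial term to the magnitude of $L_{\mathrm{G}}$ so that $\omega_{\mathrm{D}}$ acts as a relative weight. The function names are hypothetical, and the ASV outputs are assumed to lie strictly in (0, 1).

```python
import math


def generation_error(c, c_hat):
    """L_G: squared error summed over all frames and dimensions, divided by T."""
    T = len(c)
    return sum((yh - y) ** 2 for f, fh in zip(c, c_hat) for y, yh in zip(f, fh)) / T


def deception_loss(d_generated):
    """L_D,1(c_hat): mean of -log D(phi(c_hat_t)), the loss for making the ASV say 'natural'."""
    return -sum(math.log(d) for d in d_generated) / len(d_generated)


def proposed_loss(c, c_hat, d_generated, omega_d, e_lg, e_ld):
    """L = L_G + omega_D * (E_LG / E_LD) * L_D,1(c_hat)."""
    return generation_error(c, c_hat) + omega_d * (e_lg / e_ld) * deception_loss(d_generated)


c = [[1.0], [1.0]]
c_hat = [[0.5], [1.5]]
d_generated = [0.5, 0.5]  # the ASV is undecided on both generated frames
print(proposed_loss(c, c_hat, d_generated, omega_d=0.0, e_lg=0.25, e_ld=math.log(2.0)))  # 0.25 (MGE term only)
print(proposed_loss(c, c_hat, d_generated, omega_d=1.0, e_lg=0.25, e_ld=math.log(2.0)))
```

With $\omega_{\mathrm{D}} = 0$ the objective reduces to the MGE loss, which is how the MGE baseline appears as $\omega_{\mathrm{D}} = 0.0$ in the evaluations.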
  • 8. Iterative Optimization of Acoustic Models and ASV
    ① Update the acoustic models (ASV fixed): minimize $L(\boldsymbol{c}, \hat{\boldsymbol{c}})$, i.e. $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ plus the weighted $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (target 1: natural).
    ② Update the ASV (acoustic models fixed): minimize $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (targets 1: natural, 0: generated).
    By iterating ① and ②, we construct the final acoustic models!
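The alternation of ① and ② can be illustrated with a deliberately tiny, hypothetical setup: a scalar linear "acoustic model" $\hat{c}_t = a x_t + b$, a logistic-regression ASV $D(y) = \sigma(w y + v)$ with identity feature function, and hand-derived gradients. It swaps in plain gradient descent for AdaGrad, skips ML-based parameter generation, and treats $\omega_{\mathrm{D}} E_{L_{\mathrm{G}}} / E_{L_{\mathrm{D}}}$ as one constant `adv_weight`; it shows only the update order, not the paper's networks.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_alternating(x, c, adv_weight, n_iters=2000, lr_model=0.1, lr_asv=0.1):
    """Toy alternating optimization (hypothetical setup, not the paper's DNNs).

    Acoustic model: c_hat_t = a * x_t + b         (x_t stands in for linguistic feats.)
    ASV:            D(y)    = sigmoid(w * y + v)  (identity feature function)
    adv_weight stands for omega_D * E_LG / E_LD, held constant here.
    """
    T = len(x)
    a, b = 0.0, 0.0  # acoustic-model parameters
    w, v = 0.0, 0.0  # ASV parameters
    for _ in range(n_iters):
        # (1) Update the acoustic model, ASV fixed: minimize L_G + adv_weight * L_D1(c_hat).
        c_hat = [a * xt + b for xt in x]
        grad_a = grad_b = 0.0
        for xt, ct, yt in zip(x, c, c_hat):
            g = 2.0 * (yt - ct) / T                                    # dL_G / dc_hat_t
            g += -adv_weight * (1.0 - sigmoid(w * yt + v)) * w / T     # adv_weight * dL_D1 / dc_hat_t
            grad_a += g * xt
            grad_b += g
        a -= lr_model * grad_a
        b -= lr_model * grad_b

        # (2) Update the ASV, acoustic model fixed: minimize L_D(c, c_hat).
        c_hat = [a * xt + b for xt in x]
        grad_w = grad_v = 0.0
        for ct, yt in zip(c, c_hat):
            g_nat = sigmoid(w * ct + v) - 1.0  # d(-log D)/dz for natural frames (target 1)
            g_gen = sigmoid(w * yt + v)        # d(-log(1 - D))/dz for generated frames (target 0)
            grad_w += (g_nat * ct + g_gen * yt) / T
            grad_v += (g_nat + g_gen) / T
        w -= lr_asv * grad_w
        v -= lr_asv * grad_v
    return a, b, w, v


x = [-1.0, -0.5, 0.0, 0.5, 1.0]
c = [2.0 * xt + 0.5 for xt in x]  # noise-free "natural" parameters
print(train_alternating(x, c, adv_weight=0.0))  # MGE only: a -> 2.0, b -> 0.5
print(train_alternating(x, c, adv_weight=0.5))
```

With `adv_weight=0.0` the acoustic-model update ignores the ASV, so on this noise-free data it reduces to least-squares fitting and recovers a = 2, b = 0.5.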
  • 9. Discussions of Proposed Algorithm
    - Speech features compensated through the feature function:
      – Automatically derived features, such as auto-encoded features
      – Conventional analytically derived features, such as GV
    - Loss function for training the acoustic models:
      – Combination of MGE and adversarial training [Goodfellow et al., 2014.]
    - Effect of the adversarial training:
      – Minimizes the Jensen-Shannon divergence between the distributions of the natural and generated data.
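The Jensen-Shannon claim rests on the analysis in [Goodfellow et al., 2014.]: for a fixed generator, the optimal discriminator is $D^*(x) = p(x) / (p(x) + q(x))$, and plugging it into the cross-entropy objective gives $\mathbb{E}_p[\log D^*] + \mathbb{E}_q[\log(1 - D^*)] = 2\,\mathrm{JSD}(p \,\|\, q) - \log 4$. The sketch below checks this identity numerically for two made-up discrete distributions, $p$ (natural) and $q$ (generated); note that the identity is stated for the minimax formulation of that paper.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def value_at_optimal_asv(p, q):
    """E_p[log D*] + E_q[log(1 - D*)] with D*(x) = p(x) / (p(x) + q(x))."""
    v = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            v += pi * math.log(pi / (pi + qi))
        if qi > 0:
            v += qi * math.log(qi / (pi + qi))
    return v


p_natural = [0.5, 0.5, 0.0]    # made-up "natural" distribution
q_generated = [0.1, 0.3, 0.6]  # made-up "generated" distribution
lhs = value_at_optimal_asv(p_natural, q_generated)
rhs = 2.0 * js_divergence(p_natural, q_generated) - math.log(4.0)
print(lhs, rhs)  # the two values agree up to rounding
```

The value is minimized (at $-\log 4$) exactly when $p = q$, i.e. when the generated distribution matches the natural one.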
  • 10. Distributions of Speech Parameters
    [Figure: joint distributions of the 21st and 23rd mel-cepstral coefficients for natural, MGE, and proposed; the MGE distribution is narrow, while the proposed one is as wide as that of natural speech.]
    Our algorithm alleviates the over-smoothing effect!
  • 11. Compensation of Global Variance
    - Global Variance (GV) [Toda et al., 2007.]: 2nd moment of the parameter distribution
    [Figure: GV (log scale, $10^{-4}$ to $10^{1}$) versus feature index for natural, MGE, and proposed.]
    GV is NOT used for training, but is compensated by the ASV!
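A minimal sketch of the per-utterance GV [Toda et al., 2007.], i.e. the variance of each feature dimension over the T frames of one utterance. The `shrink_toward_mean` helper is a hypothetical stand-in for over-smoothing, not part of the method: pulling every frame halfway toward the mean divides each GV by four, mimicking the reduced GV of over-smoothed parameters.

```python
def global_variance(frames):
    """Per-dimension variance over the T frames of one utterance (GV)."""
    T = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / T for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / T for d in range(dims)]


def shrink_toward_mean(frames, factor):
    """Hypothetical over-smoothing: move every frame toward the per-dimension mean."""
    T = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / T for d in range(dims)]
    return [[means[d] + factor * (f[d] - means[d]) for d in range(dims)] for f in frames]


natural = [[0.0, 1.0], [2.0, -1.0], [4.0, 0.0]]
smoothed = shrink_toward_mean(natural, 0.5)
print(global_variance(natural))   # GV of each dimension
print(global_variance(smoothed))  # each GV is 1/4 of the natural one
```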
  • 12. Additional Effect: Alleviation of Unnaturally Strong Correlation
    - Maximal Information Coefficient (MIC) [Reshef et al., 2011.]
      – Quantifies the nonlinear correlation between two variables
      – Natural speech parameters tend to have weak correlations [Ijima et al., 2016.]
    [Figure: MIC matrices over feature indices 0–24 for natural, MGE, and proposed; color scale from 0.0 (weak) to 1.0 (strong).]
    The proposed algorithm not only compensates the GV but also makes the correlations among speech parameters natural!
  • 14. Experimental Conditions
    - Dataset: ATR Japanese speech database (503 phonetically balanced sentences)
    - Training / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
    - Linguistic features: 274-dimensional vector (phoneme, accent type, frame position, etc.)
    - Speech parameters: mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
    - Predicted parameters: mel-cepstral coefficients only (the others were NOT predicted)
    - Optimization algorithm: AdaGrad [Duchi et al., 2011.] (learning rate: 0.01)
    - Acoustic models: feed-forward, 274 – 3×400 (ReLU) – 75 (linear)
    - ASV: feed-forward, 25 – 2×200 (ReLU) – 1 (sigmoid)
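To make the layer sizes concrete, here is a hypothetical forward-pass sketch in pure Python with random, untrained weights (the initialization scheme and helper names are assumptions). The 75 outputs are presumably the static, delta, and delta-delta mean vectors of the 25 mel-cepstral coefficients (3 × 25), and the ASV's 25 inputs are presumably the 25 static mel-cepstral coefficients of one frame; a random vector stands in for them here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Hypothetical uniform init: weights (n_out x n_in) and zero biases."""
    bound = 1.0 / math.sqrt(n_in)
    W = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_mlp(sizes, rng):
    """One (W, b) pair per consecutive pair of layer sizes."""
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers, x, output_act):
    """ReLU on hidden layers, `output_act` applied to the last layer."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        h = [max(0.0, zi) for zi in z] if i < len(layers) - 1 else output_act(z)
    return h


rng = random.Random(0)
acoustic_model = build_mlp([274, 400, 400, 400, 75], rng)  # 274 - 3x400 (ReLU) - 75 (linear)
asv = build_mlp([25, 200, 200, 1], rng)                    # 25 - 2x200 (ReLU) - 1 (sigmoid)

linguistic = [rng.random() for _ in range(274)]
mean_vectors = forward(acoustic_model, linguistic, output_act=lambda z: z)
mel_cep = [rng.gauss(0.0, 1.0) for _ in range(25)]  # stand-in for one frame of static mel-cepstra
p_natural = forward(asv, mel_cep, output_act=lambda z: [sigmoid(zi) for zi in z])[0]
print(len(mean_vectors), p_natural)
```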
  • 15. Initialization, Training, and Objective Evaluation
    - Initialization:
      – Acoustic models: conventional MGE training
      – ASV: trained to distinguish natural / generated speech after the MGE training
    - Training:
      – Acoustic models: updated with the proposed algorithm
      – ASV: trained to distinguish natural / generated speech after updating the acoustic models
    - Objective evaluation: generation loss $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate
      $\text{Spoofing rate} = \dfrac{\#\text{ of synthetic speech parameters that spoof the ASV}}{\text{Total }\#\text{ of synthetic speech parameters}}$
    We calculated these values with various $\omega_{\mathrm{D}}$.
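A sketch of the spoofing rate, assuming the ASV is applied frame by frame and a frame counts as spoofing when the ASV output exceeds 0.5. The decision threshold and the function name are assumptions; the slides do not state the decision rule.

```python
def spoofing_rate(asv_outputs, threshold=0.5):
    """Fraction of synthetic frames that the ASV labels as natural.

    asv_outputs: ASV outputs D(phi(c_hat_t)) in (0, 1), one per synthetic frame.
    threshold: hypothetical decision threshold (not given on the slides).
    """
    if not asv_outputs:
        raise ValueError("need at least one ASV output")
    spoofed = sum(1 for d in asv_outputs if d > threshold)
    return spoofed / len(asv_outputs)


print(spoofing_rate([0.9, 0.2, 0.7, 0.51]))  # 3 of 4 frames spoof the ASV -> 0.75
```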
  • 16. Results of Objective Evaluations
    [Figure: generation loss (axis 0.45 to 0.75) and spoofing rate (axis 0.0 to 1.0) versus weight $\omega_{\mathrm{D}}$ (0.0 to 1.0); annotations: generation loss "got worse"; spoofing rate "got better", with spoofing rate > 99% when $\omega_{\mathrm{D}} > 0.3$.]
    Our algorithm makes the generation loss worse but can train the acoustic models to deceive the ASV!
  • 17. Results of Subjective Evaluations in Terms of Speech Quality
    [Figure: preference scores (8 listeners) for Proposed ($\omega_{\mathrm{D}} = 1.0$), Proposed ($\omega_{\mathrm{D}} = 0.3$), and MGE ($\omega_{\mathrm{D}} = 0.0$), annotated "Got better" and "NO significant difference"; error bars denote 95% confidence intervals.]
    Our algorithm improves the synthetic speech quality and is comparatively robust to its hyper-parameter setting!
    Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
  • 18. /17 Conclusion  Purpose: – Improving the speech quality of statistical parametric speech synthesis  Proposed: – Training algorithm to deceive an ASV • Compensates the difference b/w distributions of natural / generated speech params. using adversarial training  Results: – Improved the speech quality compared to conventional training – Worked comparably robustly against its hyper-parameter setting  Future work: – Devising temporal- and linguistic-dependent ASV – Extending our algorithm to generate 𝐹0 and duration 17