UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED SPEECH SYNTHESIS USING SIMPLE RECURRENT UNIT
Tomoki Koriyama, Hiroshi Saruwatari
The University of Tokyo, Japan
May 7, 2020
TH2.PB.10, SPE-P12, ICASSP 2020
Background: deep learning on speech synthesis
‣ Neural network (NN)-based speech synthesis
•Model the relationship between texts and speech parameters
‣ Differentiable components enable complicated models
•RNN (LSTM, GRU), CNN, Self-attention, Attention
‣ RNN for speech synthesis
•Can capture continuously changing speech parameters
•Was used in the best framework in Blizzard Challenge 2019 [Jiang2019]
•Is included in end-to-end frameworks (e.g. Tacotron [Wang2017])
Background: deep Gaussian process
‣ Deep Gaussian process (DGP) [Damianou2013][Salimbeni2018]
•Multi-layer Gaussian process regressions (GPRs)
•Nonlinear regression by kernel methods
•Bayesian learning considering model complexity
•Differentiable by variational approximation
‣ DGP-based speech synthesis [Koriyama2019]
•Outperformed DNN-based method
•Restricted model to feedforward architecture
[Figure: DGP as stacked GPRs: x → GPR → p(h1|x) → GPR → p(h2|x) → GPR → p(y|x)]
Is it possible to apply recurrent architecture to DGP?
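As a reminder of what a single GPR layer computes, here is a minimal sketch of exact GP regression on scalar inputs. It is an illustration under assumptions, not the method in the slides: it uses an RBF kernel and an exact Cholesky solve, whereas the DGPs cited above rely on variational approximations. The names `rbf`, `cholesky`, `solve_chol`, and `gpr_predict` are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gpr_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of the latent function at each test input."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_chol(L, y_train)
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, a) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_chol(L, k_star)
        variances.append(rbf(xs, xs) - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances

# Toy data: the prediction is confident near the data and reverts to the prior far away.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.sin(x) for x in x_train]
means, variances = gpr_predict(x_train, y_train, [1.0, 10.0])
```

A DGP stacks such regressions, feeding the output distribution of one layer into the next; the sketch covers one layer only.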
Extension of DGPs
‣ Convolutional DGP [Kumar2018], TICK-GP [Dutordoir2019]
•Incorporate CNN architecture into DGP
‣ Probabilistic recurrent state space model (PR-SSM) [Doerr2018]
•Incorporate RNN architecture into DGP
•Perform GPR in each time step
•Require much time for utterance-level training and generation
Purpose of study
To incorporate recurrent architecture into DGP with fast
computation to enable utterance-level sequential modeling
‣ Approach
•Utilize simple recurrent unit (SRU) [Lei2018]
•Separate parallel computation of GPR from recurrent architecture
Simple recurrent unit (SRU) [Lei2018]
SRU does not use the past hidden-layer value $h^{\ell}_{t-1}$ to calculate gates or update memory cell $c^{\ell}_{t}$
[Figure: LSTM and SRU cell structures compared]
Simple recurrent unit (SRU) [Lei2018]
SRU can be decomposed into two blocks: a parallel computation block and a light recurrent block
[Figure: SRU layer. The parallel computation block (Linear) maps the layer input to state and gate values; the light recurrent block updates the state and produces the layer output]
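The two-block decomposition can be sketched in plain Python as follows. This is a minimal illustration that follows the SRU gating of [Lei2018] as I understand it; the exact wiring in the slide figure may differ, and all function and variable names (`parallel_block`, `light_recurrence`, `v_f`, `v_r`, and so on) are my own assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def parallel_block(xs, W, W_f, W_r):
    """Frame-wise linear maps; no time step depends on another,
    so all frames of an utterance can be processed at once."""
    return [(matvec(W, x), matvec(W_f, x), matvec(W_r, x)) for x in xs]

def light_recurrence(xs, projected, v_f, v_r, b_f, b_r):
    """Element-wise recurrence over the memory cell c only;
    the previous layer output h_{t-1} is never used."""
    d = len(xs[0])
    c = [0.0] * d
    outputs = []
    for x, (x_tilde, f_pre, r_pre) in zip(xs, projected):
        # Both gates read the previous cell state before it is updated.
        f = [sigmoid(f_pre[i] + v_f[i] * c[i] + b_f[i]) for i in range(d)]
        r = [sigmoid(r_pre[i] + v_r[i] * c[i] + b_r[i]) for i in range(d)]
        c = [f[i] * c[i] + (1.0 - f[i]) * x_tilde[i] for i in range(d)]
        # Highway connection: mix the cell state with the layer input.
        outputs.append([r[i] * c[i] + (1.0 - r[i]) * x[i] for i in range(d)])
    return outputs

# Toy run: identity state projection and zero gate parameters, so both gates equal 0.5.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros_m = [[0.0, 0.0], [0.0, 0.0]]
zeros_v = [0.0, 0.0]
frames = [[1.0, 2.0], [3.0, 4.0]]
projected = parallel_block(frames, identity, zeros_m, zeros_m)
hidden = light_recurrence(frames, projected, zeros_v, zeros_v, zeros_v, zeros_v)
```

Because `parallel_block` is the only place where a learned per-frame mapping is applied, it is the natural point at which a different regressor could be substituted for the linear maps.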
Simple recurrent unit for DGP
Replace the linear transformation in the parallel computation block with GPR
[Figure: SRU layer for DGP. The parallel computation block (GPR) maps the layer input to state and gate values; the light recurrent block updates the state and produces the layer output]
SRU-DGP-based speech synthesis
[Figure: SRU-DGP architecture unrolled along time t. Per frame: Context → GPR → SRU-layer w/ GPR (× # of SRU layers) → GPR → Speech param]
Utterance-level sampling for training
‣ In the training process of DGP, inference and sampling are repeatedly performed for each layer [Salimbeni19]
‣ Utterance-level predictive distribution is a multivariate Gaussian distribution: $\mathcal{N}(\mathbf{h}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$
•Hidden-layer values of adjacent frames are correlated
‣ Although the sampling can be performed by using Cholesky decomposition of $\boldsymbol{\Sigma}$, this is often unstable
‣ Use random feature expansion [Rahimi2008, Cutajar2017] for stability of training
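To illustrate why random features avoid the Cholesky step, the sketch below approximates a kernel with random Fourier features in the style of [Rahimi2008] and draws a jointly correlated sample over several inputs as a weighted sum of features. It is a hedged illustration only: it uses an RBF kernel rather than the ArcCos kernel in the slides, it samples from a prior rather than a trained posterior, and the names `make_features` and `sample_function_values` are hypothetical.

```python
import math
import random

def rbf_kernel(x, y, length=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / length ** 2)

def make_features(dim, n_features, length=1.0, seed=0):
    """Return phi such that phi(x) . phi(y) approximates rbf_kernel(x, y)."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / length) for _ in range(dim)]
              for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(om, x)) + b)
                for om, b in zip(omegas, phases)]
    return phi

def sample_function_values(phi, xs, rng):
    """One joint sample f(x) = phi(x) . w with w ~ N(0, I).
    Correlation across inputs comes from the shared w,
    so no covariance matrix is ever factorized."""
    n_features = len(phi(xs[0]))
    w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    return [sum(p * wi for p, wi in zip(phi(x), w)) for x in xs]

phi = make_features(dim=2, n_features=4000)
x, y = [0.0, 0.0], [0.5, -0.5]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y)
sample = sample_function_values(phi, [x, y, [1.0, 1.0]], random.Random(1))
```

The approximation error shrinks as the number of features grows, which trades extra computation for numerical stability.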
Methods for experiments
Models    Kernel  Bayes  Architecture
                         FeedForward  LSTM      SRU
NN        -       -      FF-NN        LSTM-RNN  SRU-NN
BayesNN   -       ✓      FF-BayesNN   -         SRU-BayesNN
DGP       ✓       ✓      FF-DGP       -         SRU-DGP (proposed)
Experimental conditions: database
Database: JSUT corpus [Sonobe2017], 1 female, BASIC0001~BASIC2000
Train / valid / test sentences: 1788 (1.95 h) / 60 / 60
Input feature: 575 dim. linguistic feature vector
Output feature: 187 dim. acoustic feature vector (Mel-cepstrum, log F0, code aperiodicity, v/uv & Δ, Δ²)
Experimental conditions: model configurations
Model    Setting                Value
DGP      Hidden layer dim.      256
DGP      # of inducing points   1024
DGP      Kernel function        ArcCos [Cho09]
DGP      Optimizer              Adam (learning rate: 0.01)
BayesNN  Hidden units           1024
BayesNN  Activation             ReLU
BayesNN  Optimizer              Adam (learning rate: $10^{-5}$)
NN: Hyperparameters were tuned by Optuna [Akiba2019] with 100 trials.
Objective evaluation: spectral feature distortion
Bayesian and SRU models yield smaller distortions
[Figure: MCD [dB] (vertical axis, 5.5 to 6.1) versus number of layers (1 to 8) for FF-NN (best), LSTM-RNN (best), SRU-NN (best), FF-BayesNN, SRU-BayesNN, FF-DGP, and SRU-DGP]
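For readers unfamiliar with the metric, mel-cepstral distortion (MCD) can be computed as sketched below. This uses a common definition of MCD; the slides do not state which variant was used, so the exclusion of the 0th coefficient and the assumption of time-aligned frames are mine, and the function name is hypothetical.

```python
import math

def mel_cepstral_distortion(ref, est):
    """Mean MCD in dB over frames. Each frame is a list of mel-cepstral
    coefficients with the 0th (energy) coefficient already removed."""
    if len(ref) != len(est) or not ref:
        raise ValueError("sequences must be non-empty and time-aligned")
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for r, e in zip(ref, est):
        total += const * math.sqrt(sum((a - b) ** 2 for a, b in zip(r, e)))
    return total / len(ref)

# Toy frames (not data from the slides).
same = mel_cepstral_distortion([[0.1, 0.2]], [[0.1, 0.2]])
unit = mel_cepstral_distortion([[0.0, 0.0]], [[1.0, 0.0]])
```

Lower values mean the generated spectral envelope is closer to that of the recorded speech.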
Subjective evaluation
Proposed SRU-DGP gave a higher score than the other synthesis methods
Method        MOS
LSTM-RNN      2.99
SRU-NN        2.98
SRU-BayesNN   3.09
FF-DGP        3.01
SRU-DGP       3.19
ORIG          3.97
[Figure: bar chart of the scores above on a 1 to 5 scale]
Computation time
SRU-DGP can generate speech faster than LSTM-RNN
[Figure: real-time factor (vertical axis, 0.00 to 0.10) versus number of layers (1 to 8) for SRU-DGP, FF-DGP, LSTM-RNN, and SRU-NN]
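The real-time factor is the ratio of processing time to the duration of the generated audio, so values below 1 mean faster-than-real-time generation. A minimal sketch of the measurement follows; the slides do not describe the timing setup, so `real_time_factor` and its `synthesize` callback are hypothetical placeholders.

```python
import time

def real_time_factor(synthesize, duration_sec):
    """Wall-clock time of synthesize() divided by the audio duration in seconds."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return elapsed / duration_sec

# Placeholder workload standing in for generating 1 second of speech.
rtf = real_time_factor(lambda: sum(range(1000)), duration_sec=1.0)
```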
Conclusions
‣ Incorporate simple recurrent unit (SRU) into DGP
‣ Achieve utterance-level sequential modeling
‣ The proposed SRU-DGP
•Outperformed feedforward (FF)-DGP and LSTM-RNN
•Achieved faster generation than LSTM-RNN
‣ Future work
•Investigate other differentiable components in DGP
- attention, self-attention
Additional speech samples
https://hyama5.github.io/demo_SRU_DGP_TTS/
Thank you for listening!

UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT

  • 1. UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT Tomoki Koriyama, Hiroshi Saruwatari The University of Tokyo, Japan May 7, 2020 TH2.PB.10, SPE-P12, ICASSP 2020
  • 2. Background: deep learning on speech synthesis ‣ Neural network (NN)-based speech synthesis •Model the relationship between texts and speech parameters ‣ Differentiable components enable complicated models •RNN (LSTM, GRU), CNN, Self-attention, Attention ‣ RNN for speech synthesis •Can capture continuously changing speech parameters •Was used in the best framework in Blizzard Challenge 2019 [Jiang2019] •Is included in end-to-end frameworks (e.g. Tacotron [Wang2017])
  • 3. Background: deep Gaussian process ‣ Deep Gaussian process (DGP) [Damianou2013][Salimbeni2018] •Multi-layer Gaussian process regressions (GPRs) •Nonlinear regression by kernel methods •Bayesian learning considering model complexity •Differentiable by variational approximation ‣ DGP-based speech synthesis [Koriyama2019] Outperformed DNN-based method Restricted model to feedforward architecture p(y|x) x p(h1 |x) p(h2 |x) GPR GPR GPR Is it possible to apply recurrent architecture to DGP?
  • 4. Extension of DGPs
    ‣ Convolutional DGP [Kumar2018], TICK-GP [Dutordoir2019]
      • Incorporate CNN architecture into DGP
    ‣ Probabilistic recurrent state space model (PR-SSM) [Doerr2018]
      • Incorporate RNN architecture into DGP
      • Perform GPR in each time step
      • Require much time for utterance-level training and generation
  • 5. Purpose of study
    To incorporate recurrent architecture into DGP with fast computation to enable utterance-level sequential modeling
    ‣ Approach
      • Utilize simple recurrent unit (SRU) [Lei2018]
      • Separate parallel computation of GPR from recurrent architecture
  • 6. Simple recurrent unit (SRU) [Lei2018]
    SRU does not use the past hidden-layer value $h^{\ell}_{t-1}$ to calculate gates or update memory cell $c^{\ell}_{t}$
    [Figure: LSTM and SRU cell structures side by side]
  • 7. Simple recurrent unit (SRU) [Lei2018]
    SRU can be decomposed into two blocks: parallel computation and light recurrent
    [Figure: SRU layer split into a parallel computation block (Linear) and a light recurrent block; labels: state gate, state layer, output layer, input gate]
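
The two-block decomposition can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the slides give no equations, so the update rules below follow a simplified reading of the SRU in [Lei2018] (gate terms that depend on the previous cell are omitted), and every function and parameter name is hypothetical. The highway-style output assumes the input and hidden dimensions are equal.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def parallel_block(xs, W, W_f, W_r):
    """Parallel computation block: per-frame linear maps.

    Each frame is transformed independently of every other frame, so this
    step could be batched over the whole utterance.
    """
    return [(matvec(W, x), matvec(W_f, x), matvec(W_r, x)) for x in xs]


def light_recurrent_block(xs, projected, b_f, b_r):
    """Light recurrent block: element-wise updates of the memory cell c.

    Only c is carried across time steps; the previous output h is never read.
    """
    c = [0.0] * len(b_f)
    hs = []
    for x, (u, u_f, u_r) in zip(xs, projected):
        f = [sigmoid(a + b) for a, b in zip(u_f, b_f)]  # state gate
        r = [sigmoid(a + b) for a, b in zip(u_r, b_r)]  # gate on the output
        c = [fi * ci + (1.0 - fi) * ui for fi, ci, ui in zip(f, c, u)]
        # Highway-style output: mix the cell with the layer input.
        hs.append([ri * ci + (1.0 - ri) * xi for ri, ci, xi in zip(r, c, x)])
    return hs


def sru_layer(xs, W, W_f, W_r, b_f, b_r):
    """Run both blocks over one utterance (a list of frames)."""
    projected = parallel_block(xs, W, W_f, W_r)
    return light_recurrent_block(xs, projected, b_f, b_r)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_frames = 3, 5

    def rand_matrix():
        return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

    utterance = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_frames)]
    outputs = sru_layer(utterance, rand_matrix(), rand_matrix(), rand_matrix(),
                        [0.0] * dim, [0.0] * dim)
    print(len(outputs), len(outputs[0]))
```

Because all matrix products live in `parallel_block`, the loop over time contains only element-wise operations, which is what keeps the recurrent part cheap.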
  • 8. Simple recurrent unit for DGP
    Replace linear transformation by GPR in parallel computation block
    [Figure: SRU layer with GPR in the parallel computation block, followed by the light recurrent block; labels: state gate, state layer, output layer, input gate]
  • 9. SRU-DGP-based speech synthesis
    [Figure: SRU-DGP architecture unrolled over time t; at each frame, context features pass through GPR layers and SRU layers w/ GPR to give speech parameters; labels: Context, GPR, SRU-layer w/ GPR, Speech param, # of SRU layers, Time t]
  • 10. Utterance-level sampling for training
    ‣ In training process of DGP, inference and sampling are repeatedly performed for each layer [Salimbeni19]
    ‣ Utterance-level predictive distribution is a multivariate Gaussian distribution: 𝒩(h; μ, Σ)
      • Hidden-layer values of adjacent frames are correlated
    ‣ Although the sampling can be performed by using Cholesky decomposition of Σ, this is often unstable
    ‣ Use random feature expansion [Rahimi2008, Cutajar2017] for stability of training
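
The contrast between the two sampling routes can be sketched as follows. This is a toy illustration under stated assumptions, not the training procedure of the paper: Σ is taken to be an RBF kernel matrix over frame positions (the experiments use an ArcCos kernel), the random features are random Fourier features for the RBF kernel, and all function names are hypothetical. Cholesky sampling draws h = μ + Lε with Σ = LLᵀ; the random-feature route draws h ≈ μ + Φε with ΦΦᵀ ≈ Σ and needs no factorization.

```python
import math
import random


def rbf_kernel(ts, lengthscale=1.0):
    """Covariance between frame positions ts (assumed RBF kernel)."""
    return [[math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in ts] for a in ts]


def cholesky(S, jitter=0.0):
    """Lower-triangular L with L L^T = S + jitter * I.

    Raises ValueError when a pivot is not positive, which is how the
    numerical instability mentioned above shows up in practice.
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] + (jitter if i == j else 0.0)
            s -= sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not numerically positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def sample_cholesky(mu, S, rng, jitter=1e-6):
    """Draw h = mu + L eps, eps ~ N(0, I)."""
    L = cholesky(S, jitter)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * eps[k] for k in range(i + 1)) for i, m in enumerate(mu)]


def random_features(ts, n_features, rng, lengthscale=1.0):
    """Random Fourier features Phi such that Phi Phi^T approximates the RBF kernel."""
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(w * t + b) for w, b in zip(omegas, phases)] for t in ts]


def sample_random_features(mu, Phi, rng):
    """Draw h = mu + Phi eps, eps ~ N(0, I); no matrix factorization involved."""
    eps = [rng.gauss(0.0, 1.0) for _ in Phi[0]]
    return [m + sum(p * e for p, e in zip(row, eps)) for m, row in zip(mu, Phi)]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = [0.1 * t for t in range(20)]
    mu = [0.0] * len(frames)
    h_chol = sample_cholesky(mu, rbf_kernel(frames), rng)
    h_rff = sample_random_features(mu, random_features(frames, 200, rng), rng)
    print(len(h_chol), len(h_rff))
```

Both samplers return one correlated hidden-layer trajectory per call; the random-feature version trades an exact covariance for a sum of cosine features that cannot fail on an ill-conditioned Σ.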
  • 11. Methods for experiments
    Models  | Kernel | Bayes | FeedForward | LSTM     | SRU
    NN      | -      | -     | FF-NN       | LSTM-RNN | SRU-NN
    BayesNN | -      | ✓     | FF-BayesNN  | -        | SRU-BayesNN
    DGP     | ✓      | ✓     | FF-DGP      | -        | SRU-DGP (proposed)
    (FeedForward, LSTM, and SRU are the architecture columns.)
  • 12. Experimental conditions: database
    Database: JSUT corpus [Sonobe2017], 1 female, BASIC0001~BASIC2000
    Train / valid / test sentences: 1788 (1.95 h) / 60 / 60
    Input feature: 575 dim. linguistic feature vector
    Output feature: 187 dim. acoustic feature vector (Mel-cepstrum, log F0, code aperiodicity, v/uv & Δ, Δ²)
  • 13. Experimental conditions: model configurations
    DGP
      Hidden layer dim.: 256
      # of inducing points: 1024
      Kernel function: ArcCos [Cho09]
      Optimizer: Adam (learning rate: 0.01)
    BayesNN
      Hidden units: 1024
      Activation: ReLU
      Optimizer: Adam (learning rate: 10^-5)
    NN: Hyperparameters were tuned by Optuna [Akiba2019] with 100 trials.
  • 14. Objective evaluation: spectral feature distortion
    Bayesian and SRU models yield smaller distortions
    [Figure: MCD [dB] (axis range 5.5 to 6.1) against number of layers (1 to 8) for FF-NN (best), LSTM-RNN (best), SRU-NN (best), FF-BayesNN, SRU-BayesNN, FF-DGP, and SRU-DGP]
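
For reference, mel-cepstral distortion (MCD) is commonly computed per frame as (10 / ln 10) · sqrt(2 · Σ_d (c_d − ĉ_d)²) and averaged over frames. The sketch below uses that common definition; the slides do not state which coefficients were included or how frames were aligned, so dropping the 0th (energy) coefficient and assuming already time-aligned frames are assumptions, and the function name is hypothetical.

```python
import math


def mel_cepstral_distortion(reference, generated, skip_c0=True):
    """Mean MCD in dB over time-aligned frames.

    reference, generated: lists of frames, each a list of mel-cepstral
    coefficients. skip_c0 drops the 0th coefficient (an assumption here).
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("need the same non-zero number of frames in both inputs")
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref_frame, gen_frame in zip(reference, generated):
        squared = sum((a - b) ** 2
                      for a, b in zip(ref_frame[start:], gen_frame[start:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(reference)


if __name__ == "__main__":
    ref = [[1.0, 0.20, -0.10], [0.9, 0.10, 0.00]]
    gen = [[1.1, 0.25, -0.05], [0.8, 0.05, 0.10]]
    print(round(mel_cepstral_distortion(ref, gen), 3))
```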
  • 15. Subjective evaluation
    Proposed SRU-DGP gave higher score than other methods
    Method      | MOS (score scale 1 to 5)
    LSTM-RNN    | 2.99
    SRU-NN      | 2.98
    SRU-BayesNN | 3.09
    FF-DGP      | 3.01
    SRU-DGP     | 3.19
    ORIG        | 3.97
  • 16. Computation time
    SRU-DGP can generate speech faster than LSTM-RNN
    [Figure: real-time factor (axis range 0.00 to 0.10) against number of layers (1 to 8) for SRU-DGP, FF-DGP, LSTM-RNN, and SRU-NN]
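
The real-time factor is usually taken as generation time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal sketch of that ratio follows; the 5 ms frame shift and the `synthesize` callable are placeholders for illustration, not values or APIs taken from the slides.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """Ratio of processing time to audio duration."""
    if audio_seconds <= 0.0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, n_frames, frame_shift_seconds=0.005):
    """Time one call to a synthesis function and convert it to an RTF.

    synthesize: hypothetical callable that generates n_frames of output.
    frame_shift_seconds: assumed frame shift (5 ms is a placeholder).
    """
    start = time.perf_counter()
    synthesize(n_frames)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, n_frames * frame_shift_seconds)


if __name__ == "__main__":
    def dummy_synthesizer(n_frames):
        # Stand-in workload; a real model would generate acoustic features here.
        return [0.0] * n_frames

    print(measure_rtf(dummy_synthesizer, n_frames=1000))
```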
  • 17. Conclusions
    ‣ Incorporate simple recurrent unit (SRU) into DGP
    ‣ Achieve utterance-level sequential modeling
    ‣ The proposed SRU-DGP
      • Outperformed feedforward (FF)-DGP and LSTM-RNN
      • Achieved faster generation than LSTM-RNN
    ‣ Future work
      • Investigate other differentiable components in DGP: attention, self-attention