Framework for TTS using a DNN as the acoustic model
[Figure: acoustic models map the linguistic feature sequence x_1, ..., x_T to the static-dynamic mean vectors Y_1, ..., Y_T (generated speech feats.).]
Input (linguistic feats.): phoneme (e.g. a, i, u), accent, mora position, frame position, etc.
Output (speech feats.): spectrum, continuous F0, voiced/unvoiced, band aperiodicity.
[Zen et al., 2013]
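The mapping in the figure can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the configuration of [Zen et al., 2013]: the three-phoneme inventory, the layer sizes, the random untrained weights, the output layout (static, delta, and delta-delta parts for spectrum, F0, and band aperiodicity, plus one voiced/unvoiced value), and every function name below are hypothetical.

```python
import math
import random

# Toy phoneme inventory taken from the figure labels (assumption: real
# systems use the full phoneme set and many more context features).
PHONEMES = ["a", "i", "u"]

# Hypothetical output layout: static + delta + delta-delta for each stream,
# plus a single voiced/unvoiced value.
N_SPECTRUM, N_F0, N_BAP, N_VUV = 4, 1, 1, 1
OUT_DIM = 3 * (N_SPECTRUM + N_F0 + N_BAP) + N_VUV


def linguistic_features(phoneme, accent, mora_pos, frame_pos):
    """Build one frame-level input vector x_t: one-hot phoneme,
    accent flag, mora position, frame position."""
    one_hot = [1.0 if p == phoneme else 0.0 for p in PHONEMES]
    return one_hot + [float(accent), float(mora_pos), float(frame_pos)]


def init_layer(n_in, n_out, rng):
    """Small random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Affine transform followed by an optional element-wise activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def acoustic_model(x, layers):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    h = x
    for layer in layers[:-1]:
        h = dense(h, layer, math.tanh)
    return dense(h, layers[-1])


def split_output(y):
    """Slice one output vector Y_t into its feature streams."""
    s = 3 * N_SPECTRUM
    f = s + 3 * N_F0
    b = f + 3 * N_BAP
    return {"spectrum": y[:s], "f0": y[s:f], "bap": y[f:b], "vuv": y[b:]}


rng = random.Random(0)
in_dim = len(PHONEMES) + 3
layers = [init_layer(in_dim, 8, rng), init_layer(8, 8, rng), init_layer(8, OUT_DIM, rng)]

# (phoneme, accent, mora position, frame position) for T = 4 frames.
frames = [("a", 0, 1, 0), ("a", 0, 1, 1), ("i", 1, 2, 0), ("u", 1, 3, 0)]
X = [linguistic_features(*f) for f in frames]   # x_1, ..., x_T
Y = [acoustic_model(x, layers) for x in X]      # Y_1, ..., Y_T
parts = split_output(Y[0])
print(len(X), len(Y), len(Y[0]))
```

Each frame is mapped independently here, so the output sequence has the same length T as the input sequence. With untrained weights the values are meaningless; the sketch only shows the shapes and the data flow.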