A Study of Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis

Background
‣ Statistical speech synthesis
•Models the relationship between input context and output acoustic features
- In general, the synthetic speech for a given sentence is always perceptually the same
- This differs from real human communication
‣ Sampling-based speech synthesis [Takamichi et al., 2017]
•Models the relationship between input context and the distribution of output acoustic features
•Samples speech parameters from that distribution
•Uses a generative moment matching network (GMMN) as the model
Generative moment matching network (GMMN)
‣ Generative model based on a DNN
•Predicts samples of the output distribution from a noise vector
•Uses conditional maximum mean discrepancy (CMMD) as the cost function
•Applications
- i-vectors for speaker verification [Shiota et al., 2018]
- Singing voice for double-tracking [Tamaru et al., 2019]
•Advantages
- Sampling is easy because no parametric p.d.f. has to be assumed
- No min-max optimization is required
Purpose
‣ Computational complexity problem
•CMMD is computationally infeasible for a large amount of data: $O(N^3)$, where N is the number of training data points
•Conventional method
- Partitions the data into randomly selected minibatches
- Calculates CMMD for each minibatch
‣ Purpose of this study
•Review approximation methods for CMMD, which is used as the cost function of the GMMN
•Evaluate the naturalness and diversity of the generated synthetic speech
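Maximum Mean Discrepancy (MMD) [Gretton et al., 2012]
The distance between two distributions is defined as the distance between their means in a reproducing kernel Hilbert space (RKHS)
‣ Samples {y_i} ∼ P(Y) and {ỹ_i} ∼ P(Ỹ) are mapped into the RKHS by the feature map ϕ(y)
‣ Taking the expectation 𝔼[·] of the mapped points gives the means μ and μ̃
$$\mathrm{MMD}^2 = \|\mu - \tilde{\mu}\|^2$$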
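Because ‖μ − μ̃‖² expands into inner products of feature maps, it can be estimated from samples with kernel evaluations alone. The sketch below is a minimal illustration, assuming an RBF kernel with unit bandwidth and the simple biased (V-statistic) estimator; the toy Gaussian data and the helper names `rbf_gram` and `mmd2` are hypothetical.

```python
import numpy as np


def rbf_gram(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Gram matrix of k(u, v) = exp(-||u - v||^2 / 2); rows of a and b are samples."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def mmd2(y: np.ndarray, y_tilde: np.ndarray) -> float:
    """Biased (V-statistic) estimate of MMD^2 = ||mu - mu_tilde||^2."""
    return float(
        rbf_gram(y, y).mean()
        + rbf_gram(y_tilde, y_tilde).mean()
        - 2.0 * rbf_gram(y, y_tilde).mean()
    )


rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=(500, 2))
same = mmd2(y, rng.normal(0.0, 1.0, size=(500, 2)))     # same distribution
shifted = mmd2(y, rng.normal(1.0, 1.0, size=(500, 2)))  # mean shifted by 1
print(f"same: {same:.4f}, shifted: {shifted:.4f}")
```

The shifted sample set should give a clearly larger value than a second draw from the same distribution.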
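Conditional MMD (CMMD) [Ren et al., 2012]
The distance between two conditional distributions is defined as the distance between linear operators between RKHSs
‣ The input x is mapped to ψ(x) in an input RKHS, and outputs {y_i} ∼ P(Y|x) and {ỹ_i} ∼ P(Ỹ|x) are mapped by ϕ(y) into an output RKHS
‣ The linear operators C and C̃ give the conditional means μ = Cψ(x) and μ̃ = C̃ψ(x)
$$\mathrm{CMMD}^2 = \|C - \tilde{C}\|^2$$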
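Conditional MMD (CMMD)
$$\mathrm{CMMD}^2 = \|C - \tilde{C}\|^2$$
$$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$$
– The linear operators C and C̃ are estimated by kernel regression
– The kernel trick is used
The distance between two conditional distributions is thus calculated from the kernel functions of the input features and the output features:
$K_{Y,Y}$, $K_{\tilde{Y},\tilde{Y}}$, $K_{Y,\tilde{Y}}$: Gram matrices of the output features; $H$: Gram matrix of the input features; $\lambda$: regularization coefficient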
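The trace formula for CMMD² can be evaluated directly with NumPy. This is a minimal sketch under stated assumptions: an RBF kernel with unit bandwidth for both inputs and outputs, λ = 0.01, and paired samples where `y_tilde[i]` is generated for the same input `x[i]` as `y[i]`. The kernel choice, λ, and the toy sine-curve data are hypothetical, not the settings used in this study.

```python
import numpy as np


def rbf_gram(a, b):
    """Gram matrix of k(u, v) = exp(-||u - v||^2 / 2) (unit bandwidth assumed)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))


def cmmd2(x, y, y_tilde, lam=1e-2):
    """CMMD^2 = Tr[(K_YY + K_~Y~Y - 2 K_Y~Y)(H + lam I)^-1 H (H + lam I)^-1].

    Row i of y and of y_tilde are a target and a generated sample for input x[i].
    """
    n = x.shape[0]
    h = rbf_gram(x, x)  # input Gram matrix H
    k = rbf_gram(y, y) + rbf_gram(y_tilde, y_tilde) - 2.0 * rbf_gram(y, y_tilde)
    r = h + lam * np.eye(n)  # H + lam I
    # (H + lam I)^-1 H (H + lam I)^-1 via two linear solves: O(N^3) time.
    m = np.linalg.solve(r, np.linalg.solve(r, h).T)
    return float(np.trace(k @ m))


rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(x) + 0.05 * rng.normal(size=x.shape)
good = cmmd2(x, y, np.sin(x) + 0.05 * rng.normal(size=x.shape))       # same conditional
bad = cmmd2(x, y, np.sin(x) + 2.0 + 0.05 * rng.normal(size=x.shape))  # shifted conditional
print(f"good: {good:.4f}, bad: {bad:.4f}")
```

The two `np.linalg.solve` calls on the N × N matrix are where the $O(N^3)$ cost comes from.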
Generative Moment Matching Network (GMMN) [Ren et al., 2012]
Predicts samples of a conditional distribution using a DNN trained with the CMMD cost function
[Figure: the input x and noise vectors {n_i; n_i ∼ 𝒩(0, I)} are fed to the DNN (GMMN), which outputs samples {ỹ_i}; the CMMD between {ỹ_i} and the training data points {y_i} is backpropagated to train the DNN]
GMMN-Based Speech Synthesis
Uses two DNNs: a DNN trained with the MSE criterion, and a GMMN trained with the CMMD criterion that predicts the residual of the acoustic features
[Figure: context → DNN with MSE criterion → acoustic features and bottleneck features; bottleneck features + random values → GMMN for sampling; CMMD is computed from two Gram matrices]
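The sampling path in the diagram can be sketched as a forward pass. This is an illustration under loud assumptions, not the authors' implementation. The weights are random and untrained. The layer counts are reduced. The bottleneck is assumed to be an intermediate layer of the MSE-trained DNN. The final output is assumed to be the MSE prediction plus the GMMN residual. Only the bottleneck size (32) and the noise size (3) come from the slides. `CONTEXT_DIM`, `HIDDEN_DIM`, and all function names are hypothetical. `ACOUSTIC_DIM = 139` assumes the feature vector listed under the experimental conditions.

```python
import numpy as np

# Only BOTTLENECK_DIM = 32 and NOISE_DIM = 3 come from the slides. CONTEXT_DIM and
# HIDDEN_DIM are hypothetical; ACOUSTIC_DIM = 139 assumes (40 mel-cep + log F0 +
# 5 aperiodicity) x 3 (static/delta/delta-delta) + 1 V/UV flag.
CONTEXT_DIM, HIDDEN_DIM, BOTTLENECK_DIM, NOISE_DIM, ACOUSTIC_DIM = 300, 64, 32, 3, 139

_init = np.random.default_rng(0)


def dense(n_in, n_out):
    """Untrained random weights standing in for trained parameters."""
    return _init.normal(scale=n_in**-0.5, size=(n_in, n_out)), np.zeros(n_out)


def mlp(layers, h):
    """tanh hidden layers followed by a linear output layer."""
    for w, b in layers[:-1]:
        h = np.tanh(h @ w + b)
    w, b = layers[-1]
    return h @ w + b


# MSE-trained DNN, split at its (assumed) bottleneck layer.
mse_trunk = [dense(CONTEXT_DIM, HIDDEN_DIM), dense(HIDDEN_DIM, BOTTLENECK_DIM)]
mse_head = [dense(BOTTLENECK_DIM, HIDDEN_DIM), dense(HIDDEN_DIM, ACOUSTIC_DIM)]
# GMMN: (bottleneck feature, noise) -> residual of the acoustic features.
gmmn = [dense(BOTTLENECK_DIM + NOISE_DIM, HIDDEN_DIM), dense(HIDDEN_DIM, ACOUSTIC_DIM)]


def sample_acoustic_features(context, rng):
    """Draw one sample of acoustic features per frame of `context` (T x CONTEXT_DIM)."""
    bottleneck = mlp(mse_trunk, context)
    mean = mlp(mse_head, bottleneck)  # deterministic MSE prediction
    noise = rng.standard_normal((context.shape[0], NOISE_DIM))
    residual = mlp(gmmn, np.concatenate([bottleneck, noise], axis=1))
    return mean + residual
```

With the same context, different noise seeds give different samples, and this is the source of diversity. The MSE branch alone would always return the same output.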
Problem of GMMN-based speech synthesis
$$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$$
1. Calculation of the Gram matrices: $O(N^2)$
2. Calculation of the inverse matrix: $O(N^3)$
‣ CMMD cannot be used directly for speech synthesis, because N is large in speech synthesis
‣ Exact CMMD cannot be optimized minibatch by minibatch, because the inverse matrix couples all N data points
Local Approximation (Conventional Method)
‣ CMMD is calculated for each partitioned minibatch
‣ This method can be regarded as a block-diagonal approximation of the Gram matrices
•The blocks are determined by the minibatches
‣ Computational complexity for each minibatch: $O(B^3)$
•B: minibatch size
$$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$$
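The local approximation can be sketched by summing exact per-minibatch CMMD² values over a random partition. When H is block diagonal, (H + λI)⁻¹H(H + λI)⁻¹ is block diagonal too. The trace therefore only touches the diagonal blocks of the output Gram matrices, and the sum over minibatches equals the block-diagonal approximation. The unit-bandwidth RBF kernel, λ, and the helper names are assumptions for illustration.

```python
import numpy as np


def rbf_gram(a, b):
    """Gram matrix of k(u, v) = exp(-||u - v||^2 / 2) (unit bandwidth assumed)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))


def cmmd2(x, y, y_tilde, lam=1e-2):
    """Exact CMMD^2 on one set of paired samples: O(n^3) for n rows."""
    h = rbf_gram(x, x)
    k = rbf_gram(y, y) + rbf_gram(y_tilde, y_tilde) - 2.0 * rbf_gram(y, y_tilde)
    r = h + lam * np.eye(len(x))
    return float(np.trace(k @ np.linalg.solve(r, np.linalg.solve(r, h).T)))


def local_cmmd2(x, y, y_tilde, batch_size, rng, lam=1e-2):
    """Sum of per-minibatch CMMD^2 over a random partition of the data.

    Equivalent to keeping only the diagonal blocks of H (in permuted order),
    so each block costs O(B^3) instead of O(N^3) for the whole set.
    """
    perm = rng.permutation(len(x))
    total = 0.0
    for start in range(0, len(x), batch_size):
        idx = perm[start:start + batch_size]
        total += cmmd2(x[idx], y[idx], y_tilde[idx], lam)
    return total
```

With `batch_size` at least N there is a single block, and the result matches the exact CMMD². Smaller blocks trade accuracy for cost, because all cross-minibatch kernel values are ignored.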
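Random Fourier Features (RFF) [Rahimi & Recht, 2008]
The kernel function is approximated by the inner product of a finite number of random basis functions, which yields a low-rank Gram matrix
RBF kernel:
$$k_{\mathrm{RBF}}(x, x') = \exp\left(-\|x - x'\|^2 / 2\right)$$
RBF kernel approximated with RFF:
$$k_{\mathrm{RBF}}(x, x') \approx \frac{2}{M}\sum_{r=1}^{M} \cos(x^\top \omega_r + b_r)\cos(x'^\top \omega_r + b_r), \quad \omega_r \sim \mathcal{N}(0, I),\ b_r \sim \mathcal{U}[0, 2\pi]$$
[Figure: example Gram matrices on a color scale from −1.0 to 1.0; left: exact RBF kernel, rank N = 1000; right: RFF approximation, rank M = 100]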
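A minimal sketch of random Fourier features for the unit-bandwidth RBF kernel follows. It uses the standard scaling √(2/M) from Rahimi & Recht, so the feature inner product estimates the kernel. The data sizes mirror the figure (N = 1000, M = 100), but the Gaussian toy inputs and the function names are hypothetical.

```python
import numpy as np


def rbf_gram(a, b):
    """Exact Gram matrix of k(u, v) = exp(-||u - v||^2 / 2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))


def rff_features(x, num_features, rng):
    """Random Fourier features z(x) with z(x) . z(x') ~= exp(-||x - x'||^2 / 2).

    omega_r ~ N(0, I) matches the unit-bandwidth RBF kernel; b_r ~ U[0, 2 pi].
    """
    omega = rng.normal(size=(x.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + b)


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
z = rff_features(x, 100, rng)  # 1000 x 100 feature matrix
h_approx = z @ z.T             # rank <= 100 approximation of the 1000 x 1000 Gram matrix
err = np.abs(h_approx - rbf_gram(x, x)).mean()
print(f"rank: {np.linalg.matrix_rank(h_approx)}, mean abs error: {err:.3f}")
```

Increasing `num_features` lowers the approximation error at the cost of a wider feature matrix. The experiments below use 1024 RFF dimensions.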
RFF-based Approximation
‣ Approximate the Gram matrix of the input features, H, by RFF
‣ The matrix inversion lemma then reduces the cost of computing the inverse matrix
‣ Computational complexity for each minibatch: $O(BM^2)$
•B: minibatch size, M: number of RFF dimensions
$$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$$
(H is low rank: its rank is at most M)
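One way to realize the matrix-inversion step is the push-through form of the Woodbury identity. With H = ZZᵀ (Z: B × M RFF features), (ZZᵀ + λI)⁻¹Z = Z(ZᵀZ + λI)⁻¹, so only an M × M system has to be solved. This is a sketch under assumptions. The exact formulation used in the study may differ, and the kernel, λ, and the names `cmmd2_rff`, `rff_features`, and `rbf_gram` are hypothetical. The output Gram matrix stays dense here, so forming `k @ w` still costs O(B²M). Only the inversion itself is reduced.

```python
import numpy as np


def rbf_gram(a, b):
    """Exact Gram matrix of k(u, v) = exp(-||u - v||^2 / 2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))


def rff_features(x, num_features, rng):
    """Random Fourier features for the unit-bandwidth RBF kernel."""
    omega = rng.normal(size=(x.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + b)


def cmmd2_rff(z, k, lam=1e-2):
    """CMMD^2 with the input Gram matrix replaced by H = z z^T (z: B x M).

    Push-through identity: (z z^T + lam I)^-1 z = z (z^T z + lam I)^-1, so
    (H + lam I)^-1 H (H + lam I)^-1 = w w^T with w = z (z^T z + lam I)^-1.
    Only an M x M system is solved: O(B M^2 + M^3) instead of O(B^3).
    """
    g = z.T @ z + lam * np.eye(z.shape[1])  # M x M, costs O(B M^2)
    w = np.linalg.solve(g, z.T).T           # B x M; g is symmetric
    # Tr[k w w^T] = sum of elementwise products of w and k w (O(B^2 M) here).
    return float(np.sum(w * (k @ w)))


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))        # inputs of one minibatch (B = 500)
y = rng.normal(size=(500, 4))        # target outputs
y_tilde = rng.normal(size=(500, 4))  # generated outputs
k = rbf_gram(y, y) + rbf_gram(y_tilde, y_tilde) - 2.0 * rbf_gram(y, y_tilde)
z = rff_features(x, 64, rng)         # M = 64, so H = z z^T has rank <= 64
loss = cmmd2_rff(z, k)
print(f"CMMD^2 with RFF inputs: {loss:.4f}")
```

Because H is replaced by ZZᵀ, this matches the direct trace formula evaluated with that same low-rank H. It does not match the exact-kernel CMMD².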
Clustering for Minibatch Selection
‣ The conventional method chooses minibatches randomly
•The Gram matrices then tend to be sparse
- Because contexts such as /a/ and /s/ are far apart, their kernel function value is almost zero
•A sparse Gram matrix is redundant: much of the computation is spent on near-zero entries
‣ Collect similar contexts and use each cluster as a minibatch
•Perform K-means clustering (K = 2) on the bottleneck features
•Partition top-down until each cluster becomes sufficiently small
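The top-down partition can be sketched as repeated 2-means splits until every cluster fits the maximum minibatch size. To stay dependency-free, this uses a plain NumPy Lloyd iteration rather than a library K-means. The halving fallback for degenerate splits, the iteration count, and all names are hypothetical choices, not details from the slides.

```python
import numpy as np


def two_means(x, rng, n_iter=20):
    """Plain Lloyd iterations of K-means with K = 2; returns a 0/1 label per row."""
    centers = x[rng.choice(len(x), size=2, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def cluster_minibatches(features, max_size, rng):
    """Top-down 2-means partition until every cluster has at most max_size rows."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    batches, stack = [], [np.arange(len(features))]
    while stack:
        idx = stack.pop()
        if len(idx) <= max_size:
            batches.append(idx)
            continue
        labels = two_means(features[idx], rng)
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            # Degenerate split (e.g. identical points): fall back to halving.
            half = len(idx) // 2
            left, right = idx[:half], idx[half:]
        stack.extend([left, right])
    return batches


rng = np.random.default_rng(0)
bottleneck = rng.normal(size=(5000, 32))  # stand-in for real 32-dim bottleneck features
batches = cluster_minibatches(bottleneck, max_size=1000, rng=rng)
print(len(batches), max(len(b) for b in batches))
```

Each returned index array is one minibatch. Frames in the same minibatch have similar bottleneck features, so their input Gram matrix is less sparse than that of a random minibatch.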
Experimental Conditions
Database               1 female speaker, 203 sentences (ATR B-set subsets a and j, and REPEAT,
                       included in the JSUT corpus); each sentence was repeated 5 times
Training data          5 × 150 utterances (ATR-a and REPEAT)
Development set        5 × 26 utterances (ATR-j27 to j53)
Test data              27 utterances (ATR-j01 to j26); 5 samples were generated per utterance
Acoustic features      0th–39th mel-cepstral coefficients, log F0, and 5-band aperiodicity,
                       with their delta and delta-delta features, and a V/UV flag
Network configuration
  Dimensions           bottleneck feature: 32, noise vector: 3, hidden units: 2014
  # of hidden layers   DNN with MSE criterion: 7, GMMN: 3
Max minibatch size     10000
RFF dimensions         1024
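Methods
‣ MSE
•No sampling; only the DNN trained with the MSE criterion is used
‣ VOC
•Vocoded speech of 5 different recordings
‣ Approximation methods
•LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST: local (LOCAL) or RFF-based (RFF) approximation, combined with random (RAND) or clustering-based (CLST) minibatch selection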
Subjective Evaluation: Naturalness
[Figure: naturalness scores (1: very bad, 5: very good) with 95% confidence intervals for MSE, LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST, and VOC; a significant difference at p < 0.01 is marked]
Subjective Evaluation: Diversity
[Figure: diversity scores (1: completely equivalent, 5: very different) with 95% confidence intervals for MSE, LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST, and VOC; significant differences at p < 0.05 and p < 0.001 are marked]
• Participants listened to two samples generated from different random inputs
• They rated how different the two samples were on a 5-point scale
Variance of Sampled Speech Parameters
The diversity score increased with the variance of phone duration

System       0th mel-cep.   1st mel-cep.   log F0 [cent]   Phone duration [ms]   Diversity MOS
LOCAL-RAND   0.023          0.012          15.8            2.46                  1.61
LOCAL-CLST   0.053          0.022          18.2            3.50                  1.71
RFF-RAND     0.021          0.007           1.5            3.77                  1.73
RFF-CLST     0.049          0.027          14.0            5.47                  1.94
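The claim that diversity tracks phone-duration variance can be checked quickly with the four rows of the table. The duration column should correlate almost perfectly with the diversity MOS. The log F0 column should show essentially no correlation, mainly because RFF-RAND combines low F0 variance with relatively high diversity. With only four systems this is a descriptive check, not a statistical test, and the variable names are illustrative.

```python
import numpy as np

# Values copied from the table (LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST).
log_f0 = np.array([15.8, 18.2, 1.5, 14.0])
phone_duration = np.array([2.46, 3.50, 3.77, 5.47])
diversity_mos = np.array([1.61, 1.71, 1.73, 1.94])

# Pearson correlation of each variance column with the diversity MOS.
r_duration = np.corrcoef(phone_duration, diversity_mos)[0, 1]
r_log_f0 = np.corrcoef(log_f0, diversity_mos)[0, 1]
print(f"duration vs. diversity: r = {r_duration:.3f}")
print(f"log F0 vs. diversity:   r = {r_log_f0:.3f}")
```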
Conclusions
‣ Examined approximation methods that reduce the computational complexity of GMMN-based speech synthesis
•Local approximation / low-rank approximation (RFF)
•Minibatch selection using clustering
‣ RFF and clustering-based minibatch selection improved diversity
‣ Future work
•Employ sequence-level modeling
•Use more data
•Investigate evaluation methods for sampling-based TTS

Cancer cell metabolism: special Reference to Lactate Pathway
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 

A Study of Sparse Approximation of Gram Matrices in GMMN-based Speech Synthesis

  • 1. A Study of Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
 

  • 2. Background
‣ Statistical speech synthesis
•Models the relationship between input context and output acoustic features
- In general, the synthetic speech is perceptually identical every time the same sentence is synthesized
- This differs from real human communication
‣ Sampling-based speech synthesis [Takamichi et al., 2017]
•Models the relationship between input context and the distribution of output acoustic features
•Samples speech parameters from the distribution
•Uses a generative moment matching network (GMMN) as the model
  • 3. Generative moment matching network (GMMN)
‣ Generative model based on a DNN
•Predicts samples of the output distribution from a noise vector
•Uses conditional maximum mean discrepancy (CMMD) as the cost function
•Applications
- i-vectors for speaker verification [Shiota et al., 2018]
- Singing voice for double-tracking [Tamaru et al., 2019]
•Advantages
- Sampling is easy, with no need to assume a parametric p.d.f.
- Min-max optimization is not required
  • 4. Purpose
‣ Computational complexity problem
•CMMD costs $O(N^3)$, where $N$ is the number of training data points, so it is computationally infeasible for large amounts of data
•Conventional method
- Partitions the data into randomly selected minibatches
- Calculates CMMD for each minibatch
‣ Purpose of this study
•Review approximation methods for CMMD, which is used as the cost function of GMMN
•Evaluate the naturalness and diversity of the generated synthetic speech
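  • 5. Maximum Mean Discrepancy (MMD) [Gretton et al., 2012]
The distance between two distributions is defined as the distance between their means in an RKHS:
$\mu = \mathbb{E}_{y \sim P(Y)}[\phi(y)]$, $\tilde{\mu} = \mathbb{E}_{\tilde{y} \sim P(\tilde{Y})}[\phi(\tilde{y})]$, $\mathrm{MMD}^2 = \|\mu - \tilde{\mu}\|^2$
where $\phi$ maps the samples $\{y_i\}$ and $\{\tilde{y}_i\}$ into the RKHS.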
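With the kernel trick, $\|\mu - \tilde{\mu}\|^2$ can be estimated from Gram matrices alone, without ever forming $\phi$. The following is a minimal sketch of the standard biased estimate (mean of $K_{Y,Y}$ plus mean of $K_{\tilde{Y},\tilde{Y}}$ minus twice the mean of $K_{Y,\tilde{Y}}$), assuming an RBF kernel with bandwidth 1; the kernel choice, the names `rbf_gram` and `mmd2`, and the toy Gaussian data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf_gram(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed RBF kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(y, y_tilde, sigma=1.0):
    """Biased estimate of ||mu - mu_tilde||^2 computed only from Gram matrices."""
    k_yy = rbf_gram(y, y, sigma)
    k_tt = rbf_gram(y_tilde, y_tilde, sigma)
    k_yt = rbf_gram(y, y_tilde, sigma)
    return k_yy.mean() + k_tt.mean() - 2.0 * k_yt.mean()

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=(500, 2))        # samples of P(Y)
same = rng.normal(0.0, 1.0, size=(500, 2))     # another sample of the same distribution
shifted = rng.normal(1.5, 1.0, size=(500, 2))  # samples of a shifted distribution
print(mmd2(y, same), mmd2(y, shifted))         # the shifted set gives the larger value
```

Because this estimate is the squared norm of a difference of empirical means in the RKHS, it is never negative, and it is zero when both sample sets are identical.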
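  • 6. Conditional MMD (CMMD) [Ren et al., 2012]
The distance between two conditional distributions is defined as the distance between linear operators between RKHSs:
$\mu = C\psi(x)$, $\tilde{\mu} = \tilde{C}\psi(x)$, $\mathrm{CMMD}^2 = \|C - \tilde{C}\|^2$
where $\psi(x)$ maps the input $x$ into an RKHS, and $C$ and $\tilde{C}$ map it to the RKHS means of $P(Y|x)$ and $P(\tilde{Y}|x)$, whose samples $\{y_i\}$ and $\{\tilde{y}_i\}$ are embedded by $\phi$.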
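  • 7. Conditional MMD (CMMD)
$\mathrm{CMMD}^2 = \|C - \tilde{C}\|^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$
- The linear operators $C$ and $\tilde{C}$ are estimated by kernel regression
- The kernel trick is used
The distance between two conditional distributions is calculated from the kernel functions of the input features and the output features: $K_{Y,Y}$, $K_{\tilde{Y},\tilde{Y}}$ and $K_{Y,\tilde{Y}}$ are Gram matrices for the output, $H$ is the Gram matrix for the input, and $\lambda$ is the regularization constant of the kernel regression.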
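The trace formula above translates almost line by line into NumPy. The sketch below is illustrative only: it assumes RBF kernels with bandwidth 1 for both input and output, $\lambda = 0.1$, and row-aligned data (row $i$ of `y` and `y_tilde` share the input $x_i$); the names `rbf_gram` and `cmmd2` and the toy sine data are hypothetical. Since $H$ and $H + \lambda I$ commute, the middle factor is computed with two linear solves instead of an explicit inverse.

```python
import numpy as np

def rbf_gram(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed RBF kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def cmmd2(x, y, y_tilde, lam=0.1):
    """CMMD^2 = Tr[(K_YY + K_TT - 2 K_YT)(H + lam I)^{-1} H (H + lam I)^{-1}], T = Y-tilde."""
    h = rbf_gram(x, x)                                   # Gram matrix for input
    l = rbf_gram(y, y) + rbf_gram(y_tilde, y_tilde) - 2.0 * rbf_gram(y, y_tilde)  # output
    a = h + lam * np.eye(len(x))
    # H and A = H + lam I commute, so A^{-1} H A^{-1} = A^{-1} (A^{-1} H): an O(N^3) step.
    middle = np.linalg.solve(a, np.linalg.solve(a, h))
    return float(np.trace(l @ middle))

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(200, 1))          # "training data" from P(Y|x)
good = np.sin(x) + 0.1 * rng.normal(size=(200, 1))       # samples from the same P(Y|x)
bad = np.sin(x) + 1.0 + 0.1 * rng.normal(size=(200, 1))  # samples from a shifted P(Y|x)
print(cmmd2(x, y, good), cmmd2(x, y, bad))               # the shifted samples score higher
```

The $O(N^3)$ solve on the $N \times N$ matrix $H + \lambda I$ is exactly the cost that makes this direct form impractical for a full speech corpus.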
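  • 8. Generative Moment Matching Network (GMMN) [Ren et al., 2012]
Predicts samples of conditional distributions using a DNN trained with the CMMD cost function
[Figure: the input $x$ and noise vectors $\{n_i\}$, $n_i \sim \mathcal{N}(0, I)$, are fed to the DNN (GMMN), which outputs samples $\{\tilde{y}_i\}$; the CMMD between $\{\tilde{y}_i\}$ and the training data points $\{y_i\}$ is backpropagated]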
  • 9. GMMN-Based Speech Synthesis
Uses two DNNs: one trained with the MSE criterion, and one trained with the CMMD criterion that predicts the residual of the acoustic features
[Figure: context → DNN with MSE criterion → acoustic features and bottleneck features; bottleneck features + random value → GMMN for sampling; CMMD computed from the Gram matrices]
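The data flow in the figure can be sketched with untrained, randomly initialized networks. This is a hypothetical illustration, not the authors' architecture: it assumes the bottleneck feature is an intermediate layer of the MSE DNN, that the GMMN receives the bottleneck feature concatenated with the random vector, and that the sampled acoustic feature is the MSE prediction plus the GMMN residual. Only the 32-dimensional bottleneck and 3-dimensional noise come from the network configuration slide; `CTX_DIM`, `ACOUSTIC_DIM`, `HIDDEN`, the layer counts and the tanh activations are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
CTX_DIM, ACOUSTIC_DIM, HIDDEN = 10, 8, 64  # hypothetical placeholder sizes
BOTTLENECK_DIM, NOISE_DIM = 32, 3          # sizes from the network configuration slide

def init_mlp(sizes):
    """Random (untrained) weights for a fully connected network."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """tanh hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.tanh(h)
    return h

# Assumed split of the MSE DNN into context -> bottleneck and bottleneck -> acoustic parts.
mse_encoder = init_mlp([CTX_DIM, HIDDEN, BOTTLENECK_DIM])
mse_decoder = init_mlp([BOTTLENECK_DIM, HIDDEN, ACOUSTIC_DIM])
gmmn = init_mlp([BOTTLENECK_DIM + NOISE_DIM, HIDDEN, ACOUSTIC_DIM])

def synthesize(context, noise):
    """Sampled acoustic features = MSE prediction + GMMN residual (assumed combination)."""
    bottleneck = np.tanh(mlp(mse_encoder, context))
    mean = mlp(mse_decoder, bottleneck)                   # deterministic MSE output
    residual = mlp(gmmn, np.concatenate([bottleneck, noise], axis=1))
    return mean + residual

context = rng.normal(size=(5, CTX_DIM))  # 5 frames of placeholder context features
z1 = rng.normal(size=(5, NOISE_DIM))
z2 = rng.normal(size=(5, NOISE_DIM))
sample_1, sample_2 = synthesize(context, z1), synthesize(context, z2)
```

The same context with different random vectors yields different acoustic features, which is the source of the diversity that the MSE-only DNN lacks.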
  • 10. Problem of GMMN-based speech synthesis
$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$
1. Calculation of the Gram matrices: $O(N^2)$
2. Calculation of the inverse matrix: $O(N^3)$
‣ CMMD cannot be used directly for speech synthesis, because N is large in speech synthesis
‣ A model cannot be trained with minibatch-based optimization, because the inverse matrix couples all N data points
  • 11. Local Approximation (Conventional Method)
‣ CMMD is calculated for each partitioned minibatch
$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$
‣ This method can be regarded as a block-diagonal approximation of the Gram matrices
•The blocks are determined by the minibatches
‣ Computational complexity for each minibatch: $O(B^3)$
•B: minibatch size
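The block-diagonal view can be checked numerically: summing per-minibatch CMMD values gives the same number as evaluating the full formula with every Gram-matrix entry outside the minibatch blocks set to zero, because block-diagonal matrices keep their block structure under inversion and multiplication. The sketch below assumes RBF kernels with bandwidth 1, $\lambda = 0.1$, a sum (not a mean) over minibatches, and toy sine data; `local_cmmd2` and the other names are hypothetical.

```python
import numpy as np

def rbf_gram(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)) (assumed RBF kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def cmmd2(x, y, y_tilde, lam=0.1):
    """Exact CMMD^2 on one set of points (O(n^3) in the number of rows)."""
    h = rbf_gram(x, x)
    l = rbf_gram(y, y) + rbf_gram(y_tilde, y_tilde) - 2.0 * rbf_gram(y, y_tilde)
    a = h + lam * np.eye(len(x))
    return float(np.trace(l @ np.linalg.solve(a, np.linalg.solve(a, h))))

def local_cmmd2(x, y, y_tilde, batches, lam=0.1):
    """Local approximation: CMMD^2 summed over minibatches, O(B^3) each."""
    return sum(cmmd2(x[idx], y[idx], y_tilde[idx], lam) for idx in batches)

rng = np.random.default_rng(0)
N, B = 600, 100
x = rng.uniform(-2.0, 2.0, size=(N, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(N, 1))
y_tilde = np.sin(x) + 0.1 * rng.normal(size=(N, 1))
batches = np.array_split(rng.permutation(N), N // B)  # randomly selected minibatches
print(local_cmmd2(x, y, y_tilde, batches))
```

With random minibatches, each block mixes unrelated contexts; the clustering-based selection later in the deck changes only how `batches` is built.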
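  • 12. Random Fourier Features (RFF) [Rahimi & Recht, 2008]
The kernel function is approximated by the inner product of a finite number of basis functions, which yields a low-rank Gram matrix
RBF kernel: $k_{\mathrm{RBF}}(x, x') = \exp\left(-\|x - x'\|^2 / 2\right)$
RBF kernel approx. with RFF: $k_{\mathrm{RBF}}(x, x') \approx \frac{2}{M} \sum_{r=1}^{M} \cos(x^\top \omega_r + b_r) \cos(x'^\top \omega_r + b_r)$, with $\omega_r \sim \mathcal{N}(0, I)$ and $b_r \sim \mathcal{U}[0, 2\pi]$
[Figure: example Gram matrices (values from −1.0 to 1.0): exact RBF kernel with rank N = 1000 vs. RFF approximation with rank M = 100]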
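A minimal sketch of the RFF approximation above, assuming the unit-bandwidth RBF kernel from the slide (so $\omega_r \sim \mathcal{N}(0, I)$) and toy 5-dimensional inputs; the sizes and the names `rbf_kernel` and `rff_features` are illustrative. Stacking the scaled cosines into an $N \times M$ matrix $Z$ gives $ZZ^\top \approx K$, whose rank is at most $M$.

```python
import numpy as np

def rbf_kernel(x, x2):
    """Exact RBF kernel exp(-||x - x'||^2 / 2) for all pairs of rows."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(x2**2, axis=1)[None, :] - 2.0 * x @ x2.T
    return np.exp(-np.maximum(sq, 0.0) / 2.0)

def rff_features(x, omega, b):
    """Rows z(x) = sqrt(2/M) cos(x^T omega_r + b_r), so that z(x) . z(x') ~ k_RBF(x, x')."""
    m = omega.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ omega + b)

rng = np.random.default_rng(0)
N, D, M = 1000, 5, 100                     # hypothetical sizes
x = 0.5 * rng.normal(size=(N, D))
omega = rng.normal(size=(D, M))            # omega_r ~ N(0, I) for the unit-bandwidth RBF
b = rng.uniform(0.0, 2.0 * np.pi, size=M)  # b_r ~ U[0, 2 pi]
z = rff_features(x, omega, b)
k_exact = rbf_kernel(x, x)                 # N x N, rank up to N
k_approx = z @ z.T                         # N x N, rank at most M
print(np.linalg.matrix_rank(k_approx), np.abs(k_exact - k_approx).mean())
```

The approximation error shrinks roughly as $1/\sqrt{M}$, so $M$ trades accuracy against cost.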
  • 13. RFF-based Approximation
‣ Approximates the Gram matrix of the input features ($H$) by RFF, which makes it low rank
$\mathrm{CMMD}^2 = \mathrm{Tr}\left[(K_{Y,Y} + K_{\tilde{Y},\tilde{Y}} - 2K_{Y,\tilde{Y}})(H + \lambda I)^{-1} H (H + \lambda I)^{-1}\right]$
‣ Reduces the computational complexity via the matrix inversion lemma
‣ Computational complexity for each minibatch: $O(BM^2)$
•B: minibatch size, M: number of RFF dimensions
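One way to see the saving: if $H \approx ZZ^\top$ with $Z \in \mathbb{R}^{B \times M}$, the push-through identity $(ZZ^\top + \lambda I)^{-1} Z = Z (Z^\top Z + \lambda I)^{-1}$ (a consequence of the matrix inversion lemma) gives $(H + \lambda I)^{-1} H (H + \lambda I)^{-1} = Z (Z^\top Z + \lambda I)^{-2} Z^\top$, so only an $M \times M$ system is solved. The sketch below compares both routes; it is our illustration under that assumption, with random stand-ins for $Z$ and for the output-side matrix `l`, and hypothetical function names. It is not necessarily how the authors organized the computation.

```python
import numpy as np

def middle_direct(z, lam):
    """(H + lam I)^{-1} H (H + lam I)^{-1} with H = Z Z^T, via B x B solves: O(B^3)."""
    h = z @ z.T
    a = h + lam * np.eye(h.shape[0])
    return np.linalg.solve(a, np.linalg.solve(a, h))

def lowrank_factor(z, lam):
    """W = (Z^T Z + lam I)^{-2} Z^T, so the matrix above equals Z W; only M x M solves."""
    g = z.T @ z + lam * np.eye(z.shape[1])           # O(B M^2) to form
    return np.linalg.solve(g, np.linalg.solve(g, z.T))

def cmmd2_lowrank(l, z, lam):
    """Tr[L Z W] computed as sum((L Z) * W^T), without any B x B inverse."""
    w = lowrank_factor(z, lam)
    return float(np.sum((l @ z) * w.T))

rng = np.random.default_rng(0)
B, M, lam = 500, 32, 0.1                 # hypothetical sizes
z = rng.normal(size=(B, M)) / np.sqrt(M)  # stand-in for RFF features of the inputs
l = rng.normal(size=(B, B))
l = l + l.T                               # stand-in for K_YY + K_TT - 2 K_YT
print(cmmd2_lowrank(l, z, lam), np.trace(l @ middle_direct(z, lam)))
```

The output-side matrix `l` is still $B \times B$ here; the low-rank form removes the $O(B^3)$ inversion of $H + \lambda I$.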
  • 14. Clustering for Minibatch Selection
‣ The conventional method chooses minibatches randomly
•The Gram matrices then tend to be sparse
- E.g., since /a/ and /s/ are far apart, their kernel function value is almost zero
•A sparse Gram matrix is redundant
‣ Collect similar contexts and use each cluster as a minibatch
•Perform K-means clustering (K = 2) on the bottleneck features
•Partition top-down until each cluster becomes sufficiently small
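The top-down partition can be sketched as repeated 2-means splits until every cluster fits within the maximum minibatch size. The details here are our assumptions rather than the authors' settings: plain Lloyd iterations with random initial centers, a fixed iteration count, and an even split as a fallback when a split is degenerate (e.g. duplicate points). The random 32-dimensional vectors only stand in for bottleneck features, and `two_means` and `cluster_minibatches` are hypothetical names.

```python
import numpy as np

def two_means(feats, rng, n_iter=20):
    """K-means with K = 2 (Lloyd's algorithm); returns a boolean mask for cluster 0."""
    centers = feats[rng.choice(len(feats), size=2, replace=False)]
    for _ in range(n_iter):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, 2) distances
        mask = d[:, 0] <= d[:, 1]
        if mask.all() or not mask.any():
            break
        centers = np.stack([feats[mask].mean(axis=0), feats[~mask].mean(axis=0)])
    return mask

def cluster_minibatches(feats, max_size, rng):
    """Top-down 2-means partition until every cluster has at most max_size points."""
    pending, batches = [np.arange(len(feats))], []
    while pending:
        idx = pending.pop()
        if len(idx) <= max_size:
            batches.append(idx)
            continue
        mask = two_means(feats[idx], rng)
        if mask.all() or not mask.any():          # degenerate split: fall back to halving
            mask = np.arange(len(idx)) < len(idx) // 2
        pending += [idx[mask], idx[~mask]]
    return batches

rng = np.random.default_rng(0)
bottleneck = rng.normal(size=(5000, 32))  # stand-in for 32-dim bottleneck features
batches = cluster_minibatches(bottleneck, max_size=1000, rng=rng)
print(len(batches), max(len(b) for b in batches))
```

Each resulting minibatch groups nearby bottleneck features, so its Gram matrices are dense rather than mostly near-zero.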
  • 15. Experimental Conditions
Database: 1 female speaker, 203 sentences (ATR B-set subsets a and j, and REPEAT included in the JSUT corpus); each sentence was repeated 5 times
Training data: 5 × 150 utterances (ATR-a and REPEAT)
Development set: 5 × 26 utterances (ATR-j27 to j53)
Test data: 27 utterances (ATR-j01 to j26); 5 samples were generated for each
Acoustic features: 0th–39th mel-cepstrum, log F0, and 5-band aperiodicity, with their delta and delta-delta, and V/UV
  • 16. Network configurations
Dimensions: bottleneck feature 32, noise vector 3, hidden units 2014
Number of hidden layers: DNN with MSE criterion 7, GMMN 3
Max minibatch size: 10000
RFF dimensions: 1024
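  • 17. Methods
‣ MSE
•No sampling; only the DNN with the MSE criterion is used
‣ VOC
•Vocoded speech of 5 different recordings
‣ Approximation methods
•LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST: local or RFF-based approximation, combined with random (RAND) or clustering-based (CLST) minibatch selection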
  • 18. Subjective Evaluation: Naturalness
[Figure: naturalness scores (1: very bad, 5: very good) with 95% confidence intervals for MSE, LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST, and VOC; a significant difference at p < 0.01 is marked]
  • 19. Subjective Evaluation: Diversity
[Figure: diversity scores (1: completely equivalent, 5: very different) with 95% confidence intervals for MSE, LOCAL-RAND, LOCAL-CLST, RFF-RAND, RFF-CLST, and VOC; significant differences at p < 0.05 and p < 0.001 are marked]
•Participants listened to two samples generated from different random inputs
•They rated how different the two samples were on a 5-point scale
  • 20. Variance of Sampled Speech Parameters
The diversity score increased with the variance of phone duration
Method       0th mel-cep.  1st mel-cep.  log F0 [cent]  phone duration [ms]  Diversity MOS
LOCAL-RAND   0.023         0.012         15.8           2.46                 1.61
LOCAL-CLST   0.053         0.022         18.2           3.50                 1.71
RFF-RAND     0.021         0.007         1.5            3.77                 1.73
RFF-CLST     0.049         0.027         14.0           5.47                 1.94
  • 21. Conclusions
‣ Examined approximation methods to reduce the computational complexity of GMMN-based speech synthesis
•Local approximation / low-rank approximation (RFF)
•Minibatch selection using clustering
‣ RFF and clustering-based minibatch selection improved diversity
‣ Future work
•Employ sequence-level modeling
•Use more data
•Investigate evaluation methods for sampling-based TTS