Tomoki Koriyama (1, 2), Takao Kobayashi (1)
(1) Tokyo Institute of Technology, Japan
(2) Currently with The University of Tokyo, Japan
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model
Poster sections: Abstract; Background; GP, GPLVM, Deep Gaussian process; Semi-supervised learning of prosody using DGP-LVM; Experiments; Conclusions & Future Work
Abstract
•Prosody labeling is important for TTS but laborious
•Use deep Gaussian process (DGP), a Bayesian deep model, to
represent prosodic context labels as latent variables
•Propose semi-supervised modeling for partially-annotated data, in
which the latent variables are used in place of annotated prosody
•Perform experiments using around 10% of fully-annotated data
Conclusions & Future Work
•The proposed semi-supervised modeling with DGP
– Achieved scores comparable to the case where all training data was fully annotated
– Outperformed the case using data without accent information
•Future work
– Use diverse speech data, including low-resource languages
– Compare with other generative models, e.g., Bayesian NN, VAE, flow
Background
•Building a TTS system requires manual annotation of prosody labels, which costs much time and patience
End-to-end approach [Wang et al., 2017][Sotelo et al., 2017]
•End-to-end TTS is language-dependent
•Japanese TTS still requires prosodic context labels [Yasuda et al., 2019]
[Figure: model inputs for (a) fully-annotated data and (b) partially-annotated data. Labels: manually annotated accent-dependent context; encode function of accent contexts; latent variable as an accent information representation; accent-independent context; common function for both data; acoustic feature]
[Figure: F0 contours for (a) FULL, (b) LABELED, (c) W/O ACCENT, (d) PROPOSED; x-axis: Time [s], y-axis: F0 [Hz]]
[Figure: GP, GPLVM, and deep Gaussian process models, each with an inference step: (a) GP regression, (b) GPLVM, (c) DGP regression, (d) DGP-LVM]
Purpose
– Incorporate DGP with LVM into prosody modeling
– Apply latent representation to semi-supervised learning
Problems in Japanese pitch accent
•Word meanings depend on accent
•Accent is not lexical. It varies with speakers and contexts
•Infer the posteriors of functions and
latent variables simultaneously
[Damianou&Lawrence, 2013][Titsias&Lawrence, 2009]
Latent variable approach
•Gaussian process latent variable model (GPLVM) can represent
unannotated prosody information as latent variables [Moungsri et al., 2016]
•Single-layer GP lacks expressiveness in modeling
•Deep Gaussian process (DGP) [Damianou&Lawrence, 2013]
– Deep model with stacked Bayesian kernel regressions
– Outperformed 1-layer GP and DNN in TTS [Koriyama&Kobayashi, 2019]
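The idea of stacking kernel-based layers can be illustrated with a small NumPy sketch that draws one sample from a DGP prior: each layer is a GP whose inputs are the outputs of the layer below. This is only an illustration under assumptions, not the poster's implementation: it uses the degree-1 arc-cosine kernel of Cho & Saul (2009), omits inducing points and variational inference entirely, and the function names (`arccos_kernel`, `sample_gp_layer`, `sample_dgp_prior`) and the relative jitter value are hypothetical choices.

```python
import numpy as np


def arccos_kernel(X, Y):
    """Degree-1 arc-cosine kernel (Cho & Saul, 2009) between rows of X and Y."""
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    # Cosine of the angle between inputs, clipped for numerical safety.
    cos = np.clip(X @ Y.T / (nx * ny + 1e-12), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny / np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))


def sample_gp_layer(X, out_dim, rng, jitter=1e-6):
    """Draw out_dim independent zero-mean GP functions evaluated at the rows of X."""
    K = arccos_kernel(X, X)
    # Assumed stabilizer: a small diagonal term scaled by the average variance.
    K = K + jitter * np.mean(np.diag(K)) * np.eye(len(X))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(X), out_dim))


def sample_dgp_prior(X, hidden_dims, out_dim, rng):
    """Compose GP layers: the output of each layer is the input of the next."""
    H = X
    for d in hidden_dims:
        H = sample_gp_layer(H, d, rng)
    return sample_gp_layer(H, out_dim, rng)


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))  # 8 toy input frames with 3 features each
Y = sample_dgp_prior(X, hidden_dims=[4], out_dim=2, rng=rng)
print(Y.shape)
```

A single call to `sample_gp_layer` corresponds to the 1-layer GP case; the loop in `sample_dgp_prior` is what makes the model deep.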
[Figure: low/high pitch-accent patterns for "ha shi ga", which can mean "edge is", "bridge is", or "chopsticks are" depending on accent, and for "a ta ma" ("head") as spoken by Speaker 1 and Speaker 2]
•Use both fully-annotated and partially-annotated data
•The partially-annotated data does not include accent information
•Infer the posteriors of the functions and the latent variables by variational Bayes
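The way the two kinds of data can share one model is sketched below in NumPy. For fully-annotated frames the accent representation comes from an encode function of the annotated accent context; for partially-annotated frames it is a latent variable drawn from a Gaussian variational posterior. Everything specific here is an assumption made for illustration: the linear map standing in for the encode function, the standard-normal prior in the KL term, the reparameterized sampling, and all function names. Only the dimensions (137 accent-dependent, 477 accent-independent, 3 latent) are taken from the poster.

```python
import numpy as np

LATENT_DIM = 3  # latent space size listed in the experimental conditions


def encode_accent(accent_context, W):
    """Hypothetical encode function: a linear map stands in for the real encoder."""
    return accent_context @ W


def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps from q(z) = N(mu, diag(sigma^2))."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    """KL(q(z) || N(0, I)), summed over frames and latent dimensions (assumed prior)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)


def build_inputs(indep_context, accent_context=None, W=None,
                 mu=None, log_var=None, rng=None):
    """Return the input to the common function for either kind of data."""
    if accent_context is not None:
        z = encode_accent(accent_context, W)   # fully-annotated data
    else:
        z = sample_latent(mu, log_var, rng)    # partially-annotated data
    return np.concatenate([z, indep_context], axis=1)


rng = np.random.default_rng(0)
indep = rng.standard_normal((5, 477))
accent = rng.standard_normal((5, 137))
W = rng.standard_normal((137, LATENT_DIM))
mu = np.zeros((5, LATENT_DIM))
log_var = np.zeros((5, LATENT_DIM))

full_inputs = build_inputs(indep, accent_context=accent, W=W)
partial_inputs = build_inputs(indep, mu=mu, log_var=log_var, rng=rng)
print(full_inputs.shape, partial_inputs.shape)
```

Because both branches produce inputs of the same shape, the same downstream function can be trained on both data sets, with `mu` and `log_var` optimized alongside the model parameters for the frames that lack accent labels.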
Experimental conditions
Database: Japanese speech data of a female speaker in the XIMERA corpus [Kawai et al., 2004]
Train / Valid / Test data set: 1533 (119 min) / 60 / 60 utterances
– 99 fully-annotated utterances
– 1434 partially-annotated utterances
Input features: accent-dependent / accent-independent context: 137 / 477 dims
Acoustic features: 40-dim mel-cepstrum, log F0, 5-band aperiodicity, and their Δ + Δ²
Model architecture: latent space: 3 dim, hidden layer: 32 dim; 1024 inducing points, 5 layers; ArcCos kernel [Cho&Saul, 2009]
Model training: optimizer: Adam, learning rate: 0.01

Methods
# of utterances for each method
Method        Fully-annotated data (w/ accent info.)   Partially-annotated data (w/o accent info.)
FULL          1533                                     ‒
LABELED       99                                       ‒
W/O ACCENT    ‒                                        1533
PROPOSED      99                                       1434

Subjective evaluation
[Figure: subjective evaluation results]

Example: generated F0 contours

Acoustic feature distortions
Method        MCD [dB]   RMSE of log F0 [cent]
FULL          4.79       167
LABELED       5.54       228
W/O ACCENT    4.75       207
PROPOSED      4.76       178
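For readers unfamiliar with the two distortion measures, the following NumPy sketch shows commonly used definitions of mel-cepstral distortion (MCD, in dB) and the RMSE of log F0 (in cents). The poster does not state its exact formulas, so these are assumptions: the MCD is averaged over frames with the 0th cepstral coefficient assumed to be excluded by the caller, and the F0 error is computed only over frames that are voiced (F0 > 0) in both contours. The function names are hypothetical.

```python
import numpy as np


def mel_cepstral_distortion(mc_ref, mc_gen):
    """Mean MCD in dB; inputs are (frames, dims) mel-cepstra of equal shape."""
    diff = mc_ref - mc_gen
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff**2, axis=1))
    return float(np.mean(per_frame))


def log_f0_rmse_cent(f0_ref_hz, f0_gen_hz):
    """RMSE of log F0 in cents over frames voiced in both contours."""
    voiced = (f0_ref_hz > 0) & (f0_gen_hz > 0)
    # 1200 cents per octave, so the error is 1200 * log2 of the F0 ratio.
    cents = 1200.0 * np.log2(f0_gen_hz[voiced] / f0_ref_hz[voiced])
    return float(np.sqrt(np.mean(cents**2)))


# Toy check: generated F0 exactly one octave above the reference.
f0_ref = np.array([100.0, 200.0, 0.0])   # 0.0 marks an unvoiced frame
f0_gen = np.array([200.0, 400.0, 0.0])
print(log_f0_rmse_cent(f0_ref, f0_gen))
```

Under these definitions, lower is better for both measures, and 100 cents corresponds to one semitone of F0 error.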

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 

Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model

Tomoki Koriyama (1, 2), Takao Kobayashi (1)
(1) Tokyo Institute of Technology, Japan; (2) Currently with The University of Tokyo, Japan

Abstract
•Prosody labeling is important for TTS but laborious
•Use deep Gaussian process (DGP), a Bayesian deep model, to represent prosodic context labels as latent variables
•Propose semi-supervised modeling for partially-annotated data, in which the latent variables are used in place of annotated prosody
•Perform experiments using around 10% of fully-annotated data

Background
•To construct TTS, we require manual annotation of prosody labels, which costs much time and patience

End-to-end approach [Wang et al., 2017][Sotelo et al., 2017]
•End-to-end TTS is language-dependent
•Japanese TTS still requires prosodic context labels [Yasuda et al., 2019]

Problems in Japanese pitch accent
•Word meanings depend on accent
•Accent is not lexical. It varies with speakers and contexts
[Figure: low/high pitch patterns of "ha shi ga" for the meanings "edge is", "bridge is", and "chopsticks are", and of "a ta ma" ("head") as spoken by Speaker 1 and Speaker 2]

Latent variable approach
•Gaussian process latent variable model (GPLVM) can represent unannotated prosody information as latent variables [Moungsri et al., 2016]
•Single-layer GP lacks expressiveness in modeling
•Deep Gaussian process (DGP) [Damianou&Lawrence, 2013]
  – Deep model with stacked Bayesian kernel regressions
  – Outperformed 1-layer GP and DNN in TTS [Koriyama&Kobayashi, 2019]

Purpose
  – Incorporate DGP with LVM into prosody modeling
  – Apply latent representation to semi-supervised learning

GP, GPLVM, Deep Gaussian process
[Figure: (a) GP regression, (b) GPLVM, (c) DGP regression, (d) DGP-LVM, each with its inference step]
•Infer the posteriors of functions and latent variables simultaneously [Damianou&Lawrence, 2013][Titsias&Lawrence, 2009]

Semi-supervised learning of prosody using DGP-LVM
•Use both fully-annotated and partially-annotated data
•The partially-annotated data does not include accent information
•Infer the posteriors of functions and latent variables by variational Bayes
[Figure: (a) Fully-annotated data: the manually annotated accent-dependent context passes through an encode function of accent contexts and, together with the accent-independent context, feeds a function common to both kinds of data that outputs the acoustic feature. (b) Partially-annotated data: a latent variable serves as the accent information representation and, together with the accent-independent context, feeds the same common function]

Experiments

Experimental conditions
  Database:                 Japanese speech data of a female speaker in XIMERA corpus [Kawai et al., 2004]
  Train / Valid / Test:     1533 (119 min) / 60 / 60 utterances
                            (training set: 99 fully-annotated and 1434 partially-annotated utterances)
  Input features:           accent-dependent / accent-independent context: 137 / 477 dims
  Acoustic features:        40-dim mel-cepstrum, log F0, 5-band aperiodicity, and their Δ+Δ2
  Model architecture:       Latent space: 3 dim, hidden layer: 32 dim, 1024 inducing points, 5 layers, ArcCos kernel [Cho&Saul, 2009]
  Model training:           Optimizer: Adam, learning rate: 0.01

Methods (# of utterances for each method)
  Method        Fully-annotated data (w/ accent info.)   Partially-annotated data (w/o accent info.)
  FULL          1533                                      -
  LABELED       99                                        -
  W/O ACCENT    -                                         1533
  PROPOSED      99                                        1434

Acoustic feature distortions
  Method        MCD [dB]   RMSE of log F0 [cent]
  FULL          4.79       167
  LABELED       5.54       228
  W/O ACCENT    4.75       207
  PROPOSED      4.76       178

Subjective evaluation
[Figure: subjective evaluation results]

Example: generated F0 contours
[Figure: F0 [Hz] over Time [s] for (a) FULL, (b) LABELED, (c) W/O ACCENT, (d) PROPOSED]

Conclusions & Future Work
•The proposed semi-supervised modeling with DGP
  – Gave a score comparable to the case where all training data was fully annotated
  – Outperformed the case using the data w/o accent information
•Future work
  – Use diverse speech data including low-resource languages
  – Compare other generative models, e.g., Bayes NN, VAE, flow
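
The log F0 error in the distortion results is reported in cents. The poster does not say how the figure was computed, so the following is only a minimal sketch of one common convention: convert the frame-wise ratio of generated to reference F0 into cents (1200 times the base-2 logarithm of the ratio) and take the root mean square over frames that are voiced in both tracks. The function name `log_f0_rmse_cents` and the use of 0 Hz to mark unvoiced frames are assumptions made for illustration.

```python
import numpy as np


def log_f0_rmse_cents(f0_ref, f0_gen):
    """RMSE between two F0 tracks in cents, over frames voiced in both.

    Assumption: a value of 0 Hz marks an unvoiced frame.
    """
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_gen = np.asarray(f0_gen, dtype=float)
    if f0_ref.shape != f0_gen.shape:
        raise ValueError("F0 tracks must have the same number of frames")

    # Keep only frames where both tracks carry a pitch value.
    voiced = (f0_ref > 0) & (f0_gen > 0)
    if not voiced.any():
        raise ValueError("no frames are voiced in both tracks")

    # 1200 cents per octave, so one octave error maps to 1200.
    cents = 1200.0 * np.log2(f0_gen[voiced] / f0_ref[voiced])
    return float(np.sqrt(np.mean(cents ** 2)))


if __name__ == "__main__":
    reference = [200.0, 0.0, 300.0]
    generated = [400.0, 150.0, 600.0]  # one octave too high where voiced
    print(log_f0_rmse_cents(reference, generated))
```

For scale, 100 cents is one semitone, so an error of 178 cents is a little under two semitones.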