SlideShare a Scribd company logo
2015©Shinnosuke TAKAMICHI 09/11/2015
shinnosuke-t@is.naist.jp
The NAIST Text-to-Speech System
for Blizzard Challenge 2015
Shinnosuke Takamichi,
Kazuhiro Kobayashi, Kou Tanaka,
Tomoki Toda, Satoshi Nakamura
(NAIST, Japan)
Blizzard Challenge 2015
/20
Blizzard Challenge 2015
 Languages
– Bengali, Hindi, Malayalam, Marathi, Tamil, & Telugu
– + English
 Provided data
– UTF-8-encoded text & 16 kHz-sampled speech waveform
– → We need to develop natural language process (front-end) and
speech waveform generation (back-end).
 2 tasks
– Mono-lingual task (IH1) … 6 Indian languages
– Multi-lingual task (IH2) … Indian languages + English
2
/20
Overview of our TTS system
3
v
v
Provided database
Speech featuresContext labels
HSMM & MS database
Context labels
of input text
Text Speech
Text processing Speech processing
Training
Synthesis
 Our system
– HMM-based TTS with 4 main modules
– No external data for all modules
 New functions
– Parameter trajectory smoothing in the speech processing module
– Modulation Spectrum (MS) in the synthesis module
Synthetic speech
/20
Text processing module
4
Text
(Discrete)
context labels
Text
analysis
Context
generation
 Bengali, Hindi, Tamil, & Telugu
Festvox ver. 2.7 recipes [Black et al., 2001.]
 Marathi
Festvox recipe for Hindi
 Malayalam
Rule [Nair et al., 2013.] … Stress is not extracted.
 Same contexts for all languages
Phoneme, syllable, & stress
Vowel/consonant, articulator & U/V
Position of phoneme, syllable, & word
The number of phonemes, syllables, & words
/20
Speech processing module
5
Speech
61-dim.
mel-cepstrum
Spectrum
extraction
F0
extraction
Aperiodicity
extraction
Trajectory
smoothing
Continuous F0 U/V symbol
5-band
aperiodicity
Continuous
F0 extraction
Trajectory
smoothing
Band
averaging
*STRAIGHT [Kawahara et al.], WORLD [Morise et al.]
/20
Motivation of parameter
trajectory smoothing
 Motivation
– Remove temporal fluctuation difficult to be modeled with HMMs
 Examples
– Fluctuating sequence vs. Smooth sequence
6
Mean ± variance
/20
Modulation spectrum analysis
for parameter trajectory smoothing
 Modulation spectrum [Takamichi et al., 2014 & 2015.]
– Power spectra of the temporal parameter sequence
– An extension of Global Variance (GV) [Toda et al., 2007.]
7
Modulation frequency
Modulationspectrum
Mel-cep sequence
FFT
& pow.
Easy to model
with HMMs
Difficult to model
with HMMs
Dominant in speech
perception
/20
Parameter trajectory smoothing
(= High modulation freq. removal)
8
Extracted parameters
50 Hz-cutoff LPF to remove high modulation freq.
*LPF: Low Pass Filter
/20
Training module
9
Mel-cepstrum Cont. F0 U/V symbol Aperiodicity
HSMM database MS database
 HSMM training
– ML training of context-dependent HSMM [Yoshimura et al., 1999.]
– MDL-based clustering [Shinoda et al., 2000.]
 MS model training
– Mean-normalized MS [Takamichi et al., 2014.]
– ML training of Gaussian distribution
/20
Synthesis module
10
Context labels of input text
HSMM database MS database
Spectrum
Generation
w/ MS
Cont. F0
generation
Aperiodicity
generation
U/V symbol
generation
MS-based
post-filter
MLSA filter
Synthetic speech
Smoothing
In silence*
* For reducing unnatural power in silence
/20
Speech parameter generation
algorithm considering MS
11
w/ ~50 Hz MS
w/o MS
𝒚 = argmax 𝑃𝑟𝑜𝑏HMM 𝑾𝒚 𝑃𝑟𝑜𝑏MS 𝒔 𝒚
𝜔
𝒚: speech parameters, 𝑾: delta window, 𝒔 𝒚 : MS of 𝒚
/20
Speech samples
12
Language w/o MS w/ MS
Bengali
Hindi
Malayalam
Marathi
Tamil
Telugu
EXPERIMENTAL RESULTS
13
/20
Evaluation of synthesizer
 Evaluation
– Naturalness: 5-point MOS score
– Intelligibility: WER of listening tests
– Similarity: 5-point DMOS score
 Result shown in this talk
– Naturalness: mean of MOS score of RD task in Marathi
– Intelligibility: mean of WER in Marathi
– Similarity: mean of DMOS score of RD task in Marathi
– + rank of these scores in all languages
14
/20
5-point MOS score on naturalness
15
Our place for RD task (our place / #-of-systems)
Bengali Marathi Hindi Tamil Malayalam Telugu
6 / 10 2 / 9 4 / 10 6 / 10 2 / 10 2 / 10
Results in Marathi
/20
WER for intelligibility
16
Our place (our place / #-of-systems)
Bengali Marathi Hindi Tamil Malayalam Telugu
5 / 10 1 / 9 7 / 10 8 / 10 4 / 10 4 / 10
Results in Marathi
/20
5-point DMOS score on similarity
17
Our place for RD task (our place / #-of-systems)
Bengali Marathi Hindi Tamil Malayalam Telugu
5 / 10 5 / 9 4 / 10 5 / 10 8 / 10 5 / 10
Results in Marathi
/20
Goodness and Weakness
 Good!
– Naturalness of synthetic speech
– Intelligibility of synthetic speech (in Marathi)
– Small footprint (10 ~ 20 MB)
– Fast training (~ 10 hours for 1 system)
 Weak…
– Similarity of synthetic speech
– Slow synthesis (3 minutes for 1 sentence)
• Because generation considering MS needs iteration.
 Early stopping of the iteration
 Parallelization of generation algorithm
18
/20
Is our system open source?
 Text processing
– Text analyzer … Yes (Festvox) except Malayalam
– Context generator … Yes (my GitHub*1)
 Speech processing
– Speech analyzer … Yes (STRAIGHT & WORLD)
– Spectral smoothing … No, but it uses only Butterworth LPF.
 Training
– HSMM & MS model training … Yes (HTS & SPTK)
 Synthesis
– Generation w/ MS … No, but post-filter is available (HTS).
19
*1: search “shinnsuke takamichi”
/20
Conclusion
 Our challenge
– Mono-lingual task (IH1) for Indian languages
 Our TTS synthesizer
– HMM-based TTS with 4 main modules
– Parameter trajectory smoothing in the speech processing module
– Modulation spectrum in the synthesis module
 Future work
– Combine with statistical sample-based method [Takamichi et al., 2014.]
20

More Related Content

What's hot

Limited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of FeaturesLimited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of Features
IJECEIAES
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
Liangqun Lu
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
bhavesh_physics
 
A Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis SystemA Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis System
iosrjce
 
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
NAIST Machine Translation Study Group
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translation
Chamani Shiranthika
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
Prabha P
 
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for VietnameseThe first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
journalBEEI
 
Baum2
Baum2Baum2
Baum2
dmolina87
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
International Journal of Science and Research (IJSR)
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
himadrigupta
 
Matlab: Speech Signal Analysis
Matlab: Speech Signal AnalysisMatlab: Speech Signal Analysis
Matlab: Speech Signal Analysis
DataminingTools Inc
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
NU_I_TODALAB
 
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
csandit
 
When Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine TranslationWhen Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine Translation
INESC-ID (Spoken Language Systems Laboratory - L2F)
 
Speech user interface
Speech user interfaceSpeech user interface
Speech user interface
Husain master
 
Saito2103slp
Saito2103slpSaito2103slp
Saito2103slp
Yuki Saito
 

What's hot (17)

Limited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of FeaturesLimited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of Features
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
A Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis SystemA Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis System
 
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
 
Integration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translationIntegration of speech recognition with computer assisted translation
Integration of speech recognition with computer assisted translation
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
 
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for VietnameseThe first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
 
Baum2
Baum2Baum2
Baum2
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
 
Matlab: Speech Signal Analysis
Matlab: Speech Signal AnalysisMatlab: Speech Signal Analysis
Matlab: Speech Signal Analysis
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
 
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
 
When Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine TranslationWhen Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine Translation
 
Speech user interface
Speech user interfaceSpeech user interface
Speech user interface
 
Saito2103slp
Saito2103slpSaito2103slp
Saito2103slp
 

Viewers also liked

私がビギナーの頃を振り返って ~20代の代表として~
私がビギナーの頃を振り返って~20代の代表として~私がビギナーの頃を振り返って~20代の代表として~
私がビギナーの頃を振り返って ~20代の代表として~
Shinnosuke Takamichi
 
z変換をやさしく教えて下さい (音響学入門ペディア)
z変換をやさしく教えて下さい (音響学入門ペディア)z変換をやさしく教えて下さい (音響学入門ペディア)
z変換をやさしく教えて下さい (音響学入門ペディア)
Shinnosuke Takamichi
 
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
Shinnosuke Takamichi
 
研究発表のためのプレゼンテーション技術
研究発表のためのプレゼンテーション技術研究発表のためのプレゼンテーション技術
研究発表のためのプレゼンテーション技術
Shinnosuke Takamichi
 
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズムDNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
Shinnosuke Takamichi
 
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
Shinnosuke Takamichi
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
Takuya Yoshioka
 
Saito2017icassp
Saito2017icasspSaito2017icassp
Saito2017icassp
Yuki Saito
 
Icml読み会 deep speech2
Icml読み会 deep speech2Icml読み会 deep speech2
Icml読み会 deep speech2
Jiro Nishitoba
 
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
Hiroki Ouchi
 
miyoshi2017asj
miyoshi2017asjmiyoshi2017asj
miyoshi2017asj
Yuki Saito
 
音声認識と深層学習
音声認識と深層学習音声認識と深層学習
音声認識と深層学習
Preferred Networks
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
Seiya Tokui
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
Shinnosuke Takamichi
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
Shinnosuke Takamichi
 

Viewers also liked (15)

私がビギナーの頃を振り返って ~20代の代表として~
私がビギナーの頃を振り返って~20代の代表として~私がビギナーの頃を振り返って~20代の代表として~
私がビギナーの頃を振り返って ~20代の代表として~
 
z変換をやさしく教えて下さい (音響学入門ペディア)
z変換をやさしく教えて下さい (音響学入門ペディア)z変換をやさしく教えて下さい (音響学入門ペディア)
z変換をやさしく教えて下さい (音響学入門ペディア)
 
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
やさしく音声分析法を学ぶ: ケプストラム分析とLPC分析
 
研究発表のためのプレゼンテーション技術
研究発表のためのプレゼンテーション技術研究発表のためのプレゼンテーション技術
研究発表のためのプレゼンテーション技術
 
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズムDNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
DNNテキスト音声合成のためのAnti-spoofingに敵対する学習アルゴリズム
 
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
HMMに基づく日本人英語音声合成における中学生徒の英語音声を用いた評価
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
 
Saito2017icassp
Saito2017icasspSaito2017icassp
Saito2017icassp
 
Icml読み会 deep speech2
Icml読み会 deep speech2Icml読み会 deep speech2
Icml読み会 deep speech2
 
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析深層リカレントニューラルネットワークを用いた日本語述語項構造解析
深層リカレントニューラルネットワークを用いた日本語述語項構造解析
 
miyoshi2017asj
miyoshi2017asjmiyoshi2017asj
miyoshi2017asj
 
音声認識と深層学習
音声認識と深層学習音声認識と深層学習
音声認識と深層学習
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 

Similar to The NAIST Text-to-Speech System for Blizzard Challenge 2015

D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
Jasline Presilda
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET Journal
 
Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.
Sheeyam Shellvacumar
 
fujii22apsipa_asc
fujii22apsipa_ascfujii22apsipa_asc
fujii22apsipa_asc
Yuki Saito
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Association for Computational Linguistics
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk System
CSCJournals
 
nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdf
Yuki Saito
 
What is machine translation
What is machine translationWhat is machine translation
What is machine translation
Stephen Peacock
 
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
iosrjce
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
csandit
 
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Yusuke Oda
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
IRJET Journal
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS - The Language Data Network
 
Session on machine translation batu 19 march2016
Session on machine translation batu 19 march2016Session on machine translation batu 19 march2016
Session on machine translation batu 19 march2016
Nakul Sharma
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
Jasline Presilda
 
NLP
NLPNLP
Presentation at CEF-EU-Luxembourg
Presentation at CEF-EU-LuxembourgPresentation at CEF-EU-Luxembourg
Presentation at CEF-EU-Luxembourg
Manuel Herranz
 
IRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
IRJET- Text to Speech Synthesis for Hindi Language using Festival FrameworkIRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
IRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
IRJET Journal
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
IRJET Journal
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
IJERA Editor
 

Similar to The NAIST Text-to-Speech System for Blizzard Challenge 2015 (20)

D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
 
Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.Real-time DirectTranslation System for Sinhala and Tamil Languages.
Real-time DirectTranslation System for Sinhala and Tamil Languages.
 
fujii22apsipa_asc
fujii22apsipa_ascfujii22apsipa_asc
fujii22apsipa_asc
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk System
 
nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdf
 
What is machine translation
What is machine translationWhat is machine translation
What is machine translation
 
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
Approach of Syllable Based Unit Selection Text- To-Speech Synthesis System fo...
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
Learning to Generate Pseudo-code from Source Code using Statistical Machine T...
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Kerstin Bier, Sybase, 4...
 
Session on machine translation batu 19 march2016
Session on machine translation batu 19 march2016Session on machine translation batu 19 march2016
Session on machine translation batu 19 march2016
 
D3 dhanalakshmi
D3 dhanalakshmiD3 dhanalakshmi
D3 dhanalakshmi
 
NLP
NLPNLP
NLP
 
Presentation at CEF-EU-Luxembourg
Presentation at CEF-EU-LuxembourgPresentation at CEF-EU-Luxembourg
Presentation at CEF-EU-Luxembourg
 
IRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
IRJET- Text to Speech Synthesis for Hindi Language using Festival FrameworkIRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
IRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
 
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISHA NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
A NEURAL MACHINE LANGUAGE TRANSLATION SYSTEM FROM GERMAN TO ENGLISH
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
 

More from Shinnosuke Takamichi

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
Shinnosuke Takamichi
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
Shinnosuke Takamichi
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
Shinnosuke Takamichi
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
Shinnosuke Takamichi
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
Shinnosuke Takamichi
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
Shinnosuke Takamichi
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
Shinnosuke Takamichi
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Shinnosuke Takamichi
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
Shinnosuke Takamichi
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
Shinnosuke Takamichi
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
Shinnosuke Takamichi
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
Shinnosuke Takamichi
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
Shinnosuke Takamichi
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
Shinnosuke Takamichi
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
Shinnosuke Takamichi
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
Shinnosuke Takamichi
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
Shinnosuke Takamichi
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
Shinnosuke Takamichi
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
Shinnosuke Takamichi
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
Shinnosuke Takamichi
 

More from Shinnosuke Takamichi (20)

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
 

Recently uploaded

mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
RASHMI M G
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
terusbelajar5
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
University of Maribor
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
Texas Alliance of Groundwater Districts
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
by6843629
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 

Recently uploaded (20)

mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
Randomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNERandomised Optimisation Algorithms in DAPHNE
Randomised Optimisation Algorithms in DAPHNE
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 

The NAIST Text-to-Speech System for Blizzard Challenge 2015

  • 1. 2015©Shinnosuke TAKAMICHI 09/11/2015 shinnosuke-t@is.naist.jp The NAIST Text-to-Speech System for Blizzard Challenge 2015 Shinnosuke Takamichi, Kazuhiro Kobayashi, Kou Tanaka, Tomoki Toda, Satoshi Nakamura (NAIST, Japan) Blizzard Challenge 2015
  • 2. /20 Blizzard Challenge 2015  Languages – Bengali, Hindi, Malayalam, Marathi, Tamil, & Telugu – + English  Provided data – UTF-8-encoded text & 16 kHz-sampled speech waveform – → We need to develop natural language process (front-end) and speech waveform generation (back-end).  2 tasks – Mono-lingual task (IH1) … 6 Indian languages – Multi-lingual task (IH2) … Indian languages + English 2
  • 3. /20 Overview of our TTS system 3 v v Provided database Speech featuresContext labels HSMM & MS database Context labels of input text Text Speech Text processing Speech processing Training Synthesis  Our system – HMM-based TTS with 4 main modules – No external data for all modules  New functions – Parameter trajectory smoothing in the speech processing module – Modulation Spectrum (MS) in the synthesis module Synthetic speech
  • 4. /20 Text processing module 4 Text (Discrete) context labels Text analysis Context generation  Bengali, Hindi, Tamil, & Telugu Festvox ver. 2.7 recipes [Black et al., 2001.]  Marathi Festvox recipe for Hindi  Malayalam Rule [Nair et al., 2013.] … Stress is not extracted.  Same contexts for all languages Phoneme, syllable, & stress Vowel/consonant, articulator & U/V Position of phoneme, syllable, & word The number of phonemes, syllables, & words
  • 5. /20 Speech processing module 5 Speech 61-dim. mel-cepstrum Spectrum extraction F0 extraction Aperiodicity extraction Trajectory smoothing Continuous F0 U/V symbol 5-band aperiodicity Continuous F0 extraction Trajectory smoothing Band averaging *STRAIGHT [Kawahara et al.], WORLD [Morise et al.]
  • 6. /20 Motivation of parameter trajectory smoothing  Motivation – Remove temporal fluctuation difficult to be modeled with HMMs  Examples – Fluctuating sequence vs. Smooth sequence 6 Mean ± variance
  • 7. /20 Modulation spectrum analysis for parameter trajectory smoothing  Modulation spectrum [Takamichi et al., 2014 & 2015.] – Power spectra of the temporal parameter sequence – An extension of Global Variance (GV) [Toda et al., 2007.] 7 Modulation frequency Modulationspectrum Mel-cep sequence FFT & pow. Easy to model with HMMs Difficult to model with HMMs Dominant in speech perception
  • 8. /20 Parameter trajectory smoothing (= High modulation freq. removal) 8 Extracted parameters 50 Hz-cutoff LPF to remove high modulation freq. *LPF: Low Pass Filter
  • 9. /20 Training module 9 Mel-cepstrum Cont. F0 U/V symbol Aperiodicity HSMM database MS database  HSMM training – ML training of context-dependent HSMM [Yoshimura et al., 1999.] – MDL-based clustering [Shinoda et al., 2000.]  MS model training – Mean-normalized MS [Takamichi et al., 2014.] – ML training of Gaussian distribution
  • 10. /20 Synthesis module 10 Context labels of input text HSMM database MS database Spectrum Generation w/ MS Cont. F0 generation Aperiodicity generation U/V symbol generation MS-based post-filter MLSA filter Synthetic speech Smoothing In silence* * For reducing unnatural power in silence
  • 11. /20 Speech parameter generation algorithm considering MS 11 w/ ~50 Hz MS w/o MS 𝒚 = argmax 𝑃𝑟𝑜𝑏HMM 𝑾𝒚 𝑃𝑟𝑜𝑏MS 𝒔 𝒚 𝜔 𝒚: speech parameters, 𝑾: delta window, 𝒔 𝒚 : MS of 𝒚
  • 12. /20 Speech samples 12 Language w/o MS w/ MS Bengali Hindi Malayalam Marathi Tamil Telugu
  • 14. /20 Evaluation of synthesizer  Evaluation – Naturalness: 5-point MOS score – Intelligibility: WER of listening tests – Similarity: 5-point DMOS score  Result shown in this talk – Naturalness: mean of MOS score of RD task in Marathi – Intelligibility: mean of WER in Marathi – Similarity: mean of DMOS score of RD task in Marathi – + rank of these scores in all languages 14
  • 15. /20 5-point MOS score on naturalness 15 Our place for RD task (our place / #-of-systems) Bengali Marathi Hindi Tamil Malayalam Telugu 6 / 10 2 / 9 4 / 10 6 / 10 2 / 10 2 / 10 Results in Marathi
  • 16. /20 WER for intelligibility 16 Our place (our place / #-of-systems) Bengali Marathi Hindi Tamil Malayalam Telugu 5 / 10 1 / 9 7 / 10 8 / 10 4 / 10 4 / 10 Results in Marathi
  • 17. /20 5-point DMOS score on similarity 17 Our place for RD task (our place / #-of-systems) Bengali Marathi Hindi Tamil Malayalam Telugu 5 / 10 5 / 9 4 / 10 5 / 10 8 / 10 5 / 10 Results in Marathi
  • 18. /20 Goodness and Weakness  Good! – Naturalness of synthetic speech – Intelligibility of synthetic speech (in Marathi) – Small footprint (10 ~ 20 MB) – Fast training (~ 10 hours for 1 system)  Weak… – Similarity of synthetic speech – Slow synthesis (3 minutes for 1 sentence) • Because generation considering MS needs iteration.  Early stopping of the iteration  Parallelization of generation algorithm 18
  • 19. /20 Is our system open source?  Text processing – Text analyzer … Yes (Festvox) except Malayalam – Context generator … Yes (my GitHub*1)  Speech processing – Speech analyzer … Yes (STRAIGHT & WORLD) – Spectral smoothing … No, but it uses only Butterworth LPF.  Training – HSMM & MS model training … Yes (HTS & SPTK)  Synthesis – Generation w/ MS … No, but post-filter is available (HTS). 19 *1: search “shinnsuke takamichi”
  • 20. /20 Conclusion  Our challenge – Mono-lingual task (IH1) for Indian languages  Our TTS synthesizer – HMM-based TTS with 4 main modules – Parameter trajectory smoothing in the speech processing module – Modulation spectrum in the synthesis module  Future work – Combine with statistical sample-based method [Takamichi et al., 2014.] 20