SlideShare a Scribd company logo
2015©Shinnosuke TAKAMICHI
09/19/2015
Prosody-Controllable HMM-Based
Speech Synthesis Using Speech Input
Yuri Nishigaki, Shinnosuke Takamichi, Tomoki Toda,
Graham Neubig, Sakriani Sakti, Satoshi Nakamura (NAIST)
MLSLP2015 in Aizu Univ.
/17
Speech-based creative activities
and HMM-based speech synthesis
2
Singing voice Speech
Advertisement Live concert Narration Next?
Video avatar
Voice actor
…
Useful method: HMM-based speech synthesis [Tokuda et al., 2013.]
“Synthesize!”
Synthetic speech parameters
text speech
/17
Manual control of synthetic speech
Laugh
Sad
Regression
Multi-Regression HMM [Nose et al., 2007.]
Manually manipulating HMM parameters
User
User
They are very useful, but difficult to control as the user wants.
/17
Motivation of this study
 Functions we want
– Original capability of HMM-based TTS
– Speech-based control
• Intuitive to control
• Make synthetic speech mimic input speech prosody
 Our work
– Speech synthesis having both functions
4
Synthesize
System
Synthesize“Synthesize.”
MR-HMM etc.
Similar to VOCALISTENER
for singing voice control
/17
Overview of the proposed system
(Only text is input.)
5
Input text
Text analysis
Waveform generation
Synthetic speech
Parameter
generation
Synthesis
HMM
Original HMM-based
speech synthesis
/17
Overview of the proposed system
(Text & speech are input.)
6
Input textInput speech
Speech analysis Text analysis
Waveform generation
Synthetic speech
F0
modification
Duration
extraction
Parameter
generation
Alignment
HMM
Synthesis
HMM
/17
Duration extraction module
7
Alignment
HMM
Synthesis
HMM
Feature of
input speech
Context of
Input text
HMM
alignment
Duration
generation
State duration of
synthetic speech
Parm. Gen.
Duration of input speech
/17
Alignment accuracy & duration unit
 How to build alignment HMMs suitable for input speech?
– → The use of pre-recorded speech uttered by users
– Large amounts → user-dependent HMMs
– Small amounts → HMMs adapted from original alignment HMMs
 How to map the input speech duration to synthetic speech?
– Alignment/synthesis HMM-states represent different speech segments.
– Which is better, HMM-state, phone, or mora-level duration unit?
8
/17
Speech parameter generation module
9
Synthesis
HMM
Context of
Input text
Parameter
generation
Spectrum of
synthetic speech
F0 generated
From HMMs
Dur. ext.
State duration
F0 mod. Wav. Gen.
/17
F0 modification module
10
Feature of
input speech
F0 generated
from HMMs
F0
conversion
U/V region
modification
Parm. gen.
F0 of
synthetic speech
Wav. Gen.
/17
F0 conversion &
unvoiced/voiced modification
11
F0
Time
Reference
generated from HMMs
Input speech
F0-converted
U/V-modified
 F0 conversion fixes F0 range of input speech to fit to reference.
 U/V modification fixes the U/V region of input speech to fit to reference.
Linear
conversion
Spline
interpolation
EXPERIMENTAL EVALUATION
12
/17
Experimental Setup
13
Content Value/Setting
User 4 Japanese speakers (2 male & 2 female)
Target speaker 1 Japanese female speaker
Training data of
synthesis HMMs
450 phoneme-balanced sentences,
16 kHz-sampled, 5 ms shift, reading style
Evaluation data 53 phoneme-balanced sentences
Speech features 25-dim. mel-cestrum, log F0, 5-band aperiodicity
Speech analyzer STRAIGHT [Kawahara et al., 1999.]
Text analyzer Open-jtalk
Acoustic model 5-state HSMM [Zen et al., 2007.]
 1. duration unit & alignment HMM adaptation
 2. synthesis HMM adaptation
 3. effect of U/V modification
/17
Evaluation 1: duration unit &
alignment HMM adaptation
 3 duration units
– State / phoneme / mora-level duration
 4 HMMs using different amounts of pre-recorded speech
– 0 … target-speaker-dependent HMMs (= synthesis HMM)
– 1 … HMMs adapted using 1 utterance uttered by the user
– 56 … HMMs adapted using 56 utterances
– 450 … user-dependent HMMs
 Evaluation
– MOS test on naturalness of synthetic speech
– DMOS test on prosody mimicking ability of synthetic speech
• Input speech is presented as reference.
14
/17
Result 1: duration unit &
alignment HMM adaptation
15
1
2
3
4
5
MOS on naturalness DMOS on prosody mimicking ability
0 1 56 450utts.
We can confirm (1) adaptation is effective, and
(2) phoneme-level dur. is relatively robust.
No significant diff. No significant diff.
state phone mora
/17
Experiment 2: Effectiveness of U/V
modification in naturalness
Preferencescoreonnaturalness[%]
0
20
40
60
80
100
Spkr1 Spkr2 Spkr3 Spkr4
U/Vmodificationratio[%]
0
5
10
15
20
Spkr1 Spkr2 Spkr3 Spkr4
w/o or w/ modification U->V or V->U modification
U/V modification can improve the naturalness!
(especially when many U frames of input speech are fixed.)
/17
Conclusion
 2 functions to control synthetic speech
– An original function of HMM-based TTS
• MR-HMM or manual control
– Speech-based control
• Intuitive for users
 2 main modules of our system
– Mimic duration.
• Copy duration of input speech to synthetic speech.
– Mimic F0 patterns.
• Copy dynamic F0 pattern of input speech to synthetic speech.
 Future work
– HMM selection using text & speech 17

More Related Content

What's hot

Limited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of FeaturesLimited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of Features
IJECEIAES
 
A Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis SystemA Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis System
iosrjce
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
Liangqun Lu
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
Prabha P
 
Baum2
Baum2Baum2
Baum2
dmolina87
 
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
NAIST Machine Translation Study Group
 
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for VietnameseThe first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
journalBEEI
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
International Journal of Science and Research (IJSR)
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
himadrigupta
 
When Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine TranslationWhen Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine Translation
INESC-ID (Spoken Language Systems Laboratory - L2F)
 
Speech user interface
Speech user interfaceSpeech user interface
Speech user interface
Husain master
 
Matlab: Speech Signal Analysis
Matlab: Speech Signal AnalysisMatlab: Speech Signal Analysis
Matlab: Speech Signal Analysis
DataminingTools Inc
 

What's hot (12)

Limited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of FeaturesLimited Data Speaker Verification: Fusion of Features
Limited Data Speaker Verification: Fusion of Features
 
A Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis SystemA Marathi Hidden-Markov Model Based Speech Synthesis System
A Marathi Hidden-Markov Model Based Speech Synthesis System
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
 
Baum2
Baum2Baum2
Baum2
 
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
[Paper Introduction] Translating into Morphologically Rich Languages with Syn...
 
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for VietnameseThe first FOSD-tacotron-2-based text-to-speech application for Vietnamese
The first FOSD-tacotron-2-based text-to-speech application for Vietnamese
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
 
When Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine TranslationWhen Multiwords Go Bad in Machine Translation
When Multiwords Go Bad in Machine Translation
 
Speech user interface
Speech user interfaceSpeech user interface
Speech user interface
 
Matlab: Speech Signal Analysis
Matlab: Speech Signal AnalysisMatlab: Speech Signal Analysis
Matlab: Speech Signal Analysis
 

Viewers also liked

日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
Shinnosuke Takamichi
 
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
Shinnosuke Takamichi
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
Shinnosuke Takamichi
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
Takuya Yoshioka
 
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
Shinnosuke Takamichi
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用
NU_I_TODALAB
 
ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)
Shinnosuke Takamichi
 
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
Shinnosuke Takamichi
 
Saito2017icassp
Saito2017icasspSaito2017icassp
Saito2017icassp
Yuki Saito
 
MIRU2016 チュートリアル
MIRU2016 チュートリアルMIRU2016 チュートリアル
MIRU2016 チュートリアル
Shunsuke Ono
 
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
Shinnosuke Takamichi
 
信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化
Shunsuke Ono
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
Shinnosuke Takamichi
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
Daichi Kitamura
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
Yahoo!デベロッパーネットワーク
 

Viewers also liked (16)

日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
 
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
 
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
 
音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用
 
ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)
 
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
 
Saito2017icassp
Saito2017icasspSaito2017icassp
Saito2017icassp
 
MIRU2016 チュートリアル
MIRU2016 チュートリアルMIRU2016 チュートリアル
MIRU2016 チュートリアル
 
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
 
信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
 

Similar to Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input

Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis SystemEvaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
IJERA Editor
 
Voice morphing-
Voice morphing-Voice morphing-
Voice morphing-
Navneet Sharma
 
voice-morphing-101113123852-phpapp011-151211104638.pdf
voice-morphing-101113123852-phpapp011-151211104638.pdfvoice-morphing-101113123852-phpapp011-151211104638.pdf
voice-morphing-101113123852-phpapp011-151211104638.pdf
DeepthiDeepu668278
 
What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication? What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication?
Shinnosuke Takamichi
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
Hardik Kanjariya
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
behzad66
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
iosrjce
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech Synthesis
CSCJournals
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
Jasline Presilda
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processing
sivakumar m
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
csandit
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
kevig
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
ijnlc
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
kevig
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
kevig
 
Voice morphing-101113123852-phpapp01
Voice morphing-101113123852-phpapp01Voice morphing-101113123852-phpapp01
Voice morphing-101113123852-phpapp01
Rehan Ahmed
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET Journal
 
Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
editor1knowledgecuddle
 
Voicemorphingppt 110328163403-phpapp01
Voicemorphingppt 110328163403-phpapp01Voicemorphingppt 110328163403-phpapp01
Voicemorphingppt 110328163403-phpapp01
Madhu Babu
 

Similar to Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input (20)

Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis SystemEvaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
Evaluation of Hidden Markov Model based Marathi Text-ToSpeech Synthesis System
 
Voice morphing-
Voice morphing-Voice morphing-
Voice morphing-
 
voice-morphing-101113123852-phpapp011-151211104638.pdf
voice-morphing-101113123852-phpapp011-151211104638.pdfvoice-morphing-101113123852-phpapp011-151211104638.pdf
voice-morphing-101113123852-phpapp011-151211104638.pdf
 
What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication? What can GAN and GMMN do for augmented speech communication?
What can GAN and GMMN do for augmented speech communication?
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 
Survey On Speech Synthesis
Survey On Speech SynthesisSurvey On Speech Synthesis
Survey On Speech Synthesis
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processing
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
 
Voice morphing-101113123852-phpapp01
Voice morphing-101113123852-phpapp01Voice morphing-101113123852-phpapp01
Voice morphing-101113123852-phpapp01
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
 
Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
 
Voicemorphingppt 110328163403-phpapp01
Voicemorphingppt 110328163403-phpapp01Voicemorphingppt 110328163403-phpapp01
Voicemorphingppt 110328163403-phpapp01
 

More from Shinnosuke Takamichi

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
Shinnosuke Takamichi
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
Shinnosuke Takamichi
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
Shinnosuke Takamichi
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
Shinnosuke Takamichi
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
Shinnosuke Takamichi
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
Shinnosuke Takamichi
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
Shinnosuke Takamichi
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Shinnosuke Takamichi
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
Shinnosuke Takamichi
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
Shinnosuke Takamichi
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
Shinnosuke Takamichi
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
Shinnosuke Takamichi
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
Shinnosuke Takamichi
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
Shinnosuke Takamichi
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
Shinnosuke Takamichi
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
Shinnosuke Takamichi
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
Shinnosuke Takamichi
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
Shinnosuke Takamichi
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
Shinnosuke Takamichi
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
Shinnosuke Takamichi
 

More from Shinnosuke Takamichi (20)

JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパスJTubeSpeech:  音声認識と話者照合のために YouTube から構築される日本語音声コーパス
JTubeSpeech: 音声認識と話者照合のために YouTube から構築される日本語音声コーパス
 
音声合成のコーパスをつくろう
音声合成のコーパスをつくろう音声合成のコーパスをつくろう
音声合成のコーパスをつくろう
 
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパスJ-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
J-KAC:日本語オーディオブック・紙芝居朗読音声コーパス
 
短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討短時間発話を用いた話者照合のための音声加工の効果に関する検討
短時間発話を用いた話者照合のための音声加工の効果に関する検討
 
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
リアルタイムDNN音声変換フィードバックによるキャラクタ性の獲得手法
 
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
ここまで来た&これから来る音声合成 (明治大学 先端メディアコロキウム)
 
国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告国際会議 interspeech 2020 報告
国際会議 interspeech 2020 報告
 
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...Interspeech 2020 読み会 "Incremental Text to Speech for Neural  Sequence-to-Sequ...
Interspeech 2020 読み会 "Incremental Text to Speech for Neural Sequence-to-Sequ...
 
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
サブバンドフィルタリングに基づくリアルタイム広帯域DNN声質変換の実装と評価
 
P J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパスP J S: 音素バランスを考慮した日本語歌声コーパス
P J S: 音素バランスを考慮した日本語歌声コーパス
 
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価音響モデル尤度に基づくsubword分割の韻律推定精度における評価
音響モデル尤度に基づくsubword分割の韻律推定精度における評価
 
音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン音声合成研究を加速させるためのコーパスデザイン
音声合成研究を加速させるためのコーパスデザイン
 
論文紹介 Unsupervised training of neural mask-based beamforming
論文紹介 Unsupervised training of neural  mask-based beamforming論文紹介 Unsupervised training of neural  mask-based beamforming
論文紹介 Unsupervised training of neural mask-based beamforming
 
論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus論文紹介 Building the Singapore English National Speech Corpus
論文紹介 Building the Singapore English National Speech Corpus
 
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
論文紹介 SANTLR: Speech Annotation Toolkit for Low Resource Languages
 
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
話者V2S攻撃: 話者認証から構築される 声質変換とその音声なりすまし可能性の評価
 
JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス JVS:フリーの日本語多数話者音声コーパス
JVS:フリーの日本語多数話者音声コーパス
 
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
差分スペクトル法に基づく DNN 声質変換の計算量削減に向けたフィルタ推定
 
音声合成・変換の国際コンペティションへの 参加を振り返って
音声合成・変換の国際コンペティションへの  参加を振り返って音声合成・変換の国際コンペティションへの  参加を振り返って
音声合成・変換の国際コンペティションへの 参加を振り返って
 
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-trackingユーザ歌唱のための generative moment matching network に基づく neural double-tracking
ユーザ歌唱のための generative moment matching network に基づく neural double-tracking
 

Recently uploaded

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
European Sustainable Phosphorus Platform
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 

Recently uploaded (20)

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 

Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input

  • 1. 2015©Shinnosuke TAKAMICHI 09/19/2015 Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input Yuri Nishigaki, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura (NAIST) MLSLP2015 in Aizu Univ.
  • 2. /17 Speech-based creative activities and HMM-based speech synthesis 2 Singing voice Speech Advertisement Live concert Narration Next? Video avatar Voice actor … Useful method: HMM-based speech synthesis [Tokuda et al., 2013.] “Synthesize!” Synthetic speech parameters text speech
  • 3. /17 Manual control of synthetic speech Laugh Sad Regression Multi-Regression HMM [Nose et al., 2007.] Manually manipulating HMM parameters User User They are very useful, but difficult to control as the user wants.
  • 4. /17 Motivation of this study  Functions we want – Original capability of HMM-based TTS – Speech-based control • Intuitive to control • Make synthetic speech mimic input speech prosody  Our work – Speech synthesis having both functions 4 Synthesize System Synthesize“Synthesize.” MR-HMM etc. Similar to VOCALISTENER for singing voice control
  • 5. /17 Overview of the proposed system (Only text is input.) 5 Input text Text analysis Waveform generation Synthetic speech Parameter generation Synthesis HMM Original HMM-based speech synthesis
  • 6. /17 Overview of the proposed system (Text & speech are input.) 6 Input textInput speech Speech analysis Text analysis Waveform generation Synthetic speech F0 modification Duration extraction Parameter generation Alignment HMM Synthesis HMM
  • 7. /17 Duration extraction module 7 Alignment HMM Synthesis HMM Feature of input speech Context of Input text HMM alignment Duration generation State duration of synthetic speech Parm. Gen. Duration of input speech
  • 8. /17 Alignment accuracy & duration unit  How to build alignment HMMs suitable for input speech? – → The use of pre-recorded speech uttered by users – Large amounts → user-dependent HMMs – Small amounts → HMMs adapted from original alignment HMMs  How to map the input speech duration to synthetic speech? – Alignment/synthesis HMM-states represent different speech segments. – Which is better, HMM-state, phone, or mora-level duration unit? 8
  • 9. /17 Speech parameter generation module 9 Synthesis HMM Context of Input text Parameter generation Spectrum of synthetic speech F0 generated From HMMs Dur. ext. State duration F0 mod. Wav. Gen.
  • 10. /17 F0 modification module 10 Feature of input speech F0 generated from HMMs F0 conversion U/V region modification Parm. gen. F0 of synthetic speech Wav. Gen.
  • 11. /17 F0 conversion & unvoiced/voiced modification 11 F0 Time Reference generated from HMMs Input speech F0-converted U/V-modified  F0 conversion fixes F0 range of input speech to fit to reference.  U/V modification fixes the U/V region of input speech to fit to reference. Linear conversion Spline interpolation
  • 13. /17 Experimental Setup 13 Content Value/Setting User 4 Japanese speakers (2 male & 2 female) Target speaker 1 Japanese female speaker Training data of synthesis HMMs 450 phoneme-balanced sentences, 16 kHz-sampled, 5 ms shift, reading style Evaluation data 53 phoneme-balanced sentences Speech features 25-dim. mel-cestrum, log F0, 5-band aperiodicity Speech analyzer STRAIGHT [Kawahara et al., 1999.] Text analyzer Open-jtalk Acoustic model 5-state HSMM [Zen et al., 2007.]  1. duration unit & alignment HMM adaptation  2. synthesis HMM adaptation  3. effect of U/V modification
  • 14. /17 Evaluation 1: duration unit & alignment HMM adaptation  3 duration units – State / phoneme / mora-level duration  4 HMMs using different amounts of pre-recorded speech – 0 … target-speaker-dependent HMMs (= synthesis HMM) – 1 … HMMs adapted using 1 utterance uttered by the user – 56 … HMMs adapted using 56 utterances – 450 … user-dependent HMMs  Evaluation – MOS test on naturalness of synthetic speech – DMOS test on prosody mimicking ability of synthetic speech • Input speech is presented as reference. 14
  • 15. /17 Result 1: duration unit & alignment HMM adaptation 15 1 2 3 4 5 MOS on naturalness DMOS on prosody mimicking ability 0 1 56 450utts. We can confirm (1) adaptation is effective, and (2) phoneme-level dur. is relatively robust. No significant diff. No significant diff. state phone mora
  • 16. /17 Experiment 2: Effectiveness of U/V modification in naturalness Preferencescoreonnaturalness[%] 0 20 40 60 80 100 Spkr1 Spkr2 Spkr3 Spkr4 U/Vmodificationratio[%] 0 5 10 15 20 Spkr1 Spkr2 Spkr3 Spkr4 w/o or w/ modification U->V or V->U modification U/V modification can improve the naturalness! (especially when many U frames of input speech are fixed.)
  • 17. /17 Conclusion  2 functions to control synthetic speech – An original function of HMM-based TTS • MR-HMM or manual control – Speech-based control • Intuitive for users  2 main modules of our system – Mimic duration. • Copy duration of input speech to synthetic speech. – Mimic F0 patterns. • Copy dynamic F0 pattern of input speech to synthetic speech.  Future work – HMM selection using text & speech 17