Parametrical speech synthesis - an overview Sébastien Le Maguer, Bernd Möbius 5th January 2016 1 / 46
Introduction - Text-To-Speech synthesis Welcome to this tutorial 2 / 46
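Introduction - Corpus-based TTS Figure: pipeline diagram; text: "Welcome to this tutorial"; labels: N.L.P. (natural language processing), Text, acoustic coefficients, descriptive features, offline stage, S.P. (signal processing), online stag...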
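Introduction - Focus on Unit Selection Figure: pipeline diagram; text: "Welcome to this tutorial"; labels: N.L.P. (natural language processing), Text, acoustic coefficients, descriptive features, offline stage, S.P. (signal processing), onli...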
Introduction - Focus on Unit Selection Main hypothesis: nothing will be better than the speech itself. Advantages: signal qua...
Introduction - Parametrical corpus-based TTS Figure: pipeline diagram; text: "Welcome to this tutorial"; labels: N.L.P. (natural language processing), Text, acoustic coefficients, descriptive features, offline stage, S.P...
Pre-requisite : Signal parametrization Objective: how to represent speech as numerical coefficients. Trend: complexity ↗ quality...
Pre-requisite : Signal parametrization Vocoder - Source-filter model Figure: F0, periodic signal generation, white noise generation, P...
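The source-filter model can be sketched as follows. This is a minimal illustration, not the vocoder used in the talk: it assumes an all-pole (LPC-style) filter 1/A(z) as the spectral envelope instead of the MLSA filter, a constant F0 per call (F0 = 0 meaning unvoiced), and hypothetical function names (`excitation`, `all_pole_filter`, `source_filter_synth`).

```python
import numpy as np


def excitation(f0, n_samples, fs, rng=None):
    """Pulse train at f0 Hz if voiced (f0 > 0), white Gaussian noise otherwise."""
    if f0 > 0:
        e = np.zeros(n_samples)
        period = fs / f0                      # samples between pulses
        # Pulse height sqrt(period) gives roughly unit average power.
        e[np.arange(0, n_samples, period).astype(int)] = np.sqrt(period)
        return e
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.standard_normal(n_samples)


def all_pole_filter(x, a, gain=1.0):
    """Filter x through gain / A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    p = len(a)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = gain * x[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k - 1] * y[n - k]
        y[n] = acc
    return y


def source_filter_synth(f0, a, n_samples, fs=16000, gain=1.0):
    """Source (pulses or noise) followed by the spectral-envelope filter."""
    return all_pole_filter(excitation(f0, n_samples, fs), a, gain)


# Hypothetical envelope: a single resonance near 500 Hz (pole radius 0.95).
fs = 16000
r, theta = 0.95, 2 * np.pi * 500 / fs
a = np.array([-2 * r * np.cos(theta), r ** 2])
voiced = source_filter_synth(120.0, a, fs // 10, fs)   # 100 ms, F0 = 120 Hz
unvoiced = source_filter_synth(0.0, a, fs // 10, fs)   # 100 ms of filtered noise
```

The voiced/unvoiced switch is the only difference between the two calls; the envelope filter is shared, which is the point of the source-filter separation.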
Pre-requisite : Signal parametrization Vocoder - Source-filter model (voiced example) (http://www.haskins.yale.edu/featured...
Pre-requisite : Signal parametrization Vocoder - Mixed-mode excitation Figure: F0, periodic signal generation, white noise generatio...
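A crude two-band sketch of mixed-mode excitation, assuming that the periodic pulses are kept below a hypothetical cutoff frequency `cutoff_hz` and white noise above it. Actual mixed-excitation vocoders typically control the periodic/noise balance per frequency band and per frame, so treat this only as an illustration of the idea of combining both sources instead of switching between them.

```python
import numpy as np


def band_limit(x, fs, lo_hz, hi_hz):
    """Keep only the components of x with frequency in [lo_hz, hi_hz) (FFT masking)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < lo_hz) | (freqs >= hi_hz)] = 0.0
    return np.fft.irfft(X, n=len(x))


def mixed_excitation(f0, n_samples, fs, cutoff_hz, seed=0):
    """Two-band mixed excitation: pulse train below cutoff_hz, white noise above it."""
    pulses = np.zeros(n_samples)
    period = fs / f0
    pulses[np.arange(0, n_samples, period).astype(int)] = np.sqrt(period)
    noise = np.random.default_rng(seed).standard_normal(n_samples)
    top = fs / 2 + 1.0  # just above Nyquist, so the highest FFT bin is included
    return band_limit(pulses, fs, 0.0, cutoff_hz) + band_limit(noise, fs, cutoff_hz, top)


fs = 16000
exc = mixed_excitation(120.0, fs // 10, fs, cutoff_hz=4000.0)  # hypothetical 4 kHz split
```

Setting the cutoff at or above Nyquist gives the pure pulse train, and a cutoff of 0 gives pure noise, so the binary voiced/unvoiced switch is the special case of this mix.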
Pre-requisite : Signal parametrization Spectrum - Main information 11 / 46
Pre-requisite : Signal parametrization Spectrum - Voiced/Unvoiced (figure panels: voiced, unvoiced) 12 / 46
Pre-requisite : Signal parametrization Cepstrum - Mel-Frequency Cepstral Coefficients: $s_n$ → pre-emphasis → windowing → |FFT| → Mel → log → FFT...
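The MFCC chain can be sketched with NumPy. Assumptions: the last step on the slide is truncated, so this sketch uses a DCT-II, the usual choice in MFCC implementations; the mel formula 2595·log10(1 + f/700), 25 ms frames with a 10 ms hop at 16 kHz, 26 filters, 13 coefficients and a 0.97 pre-emphasis coefficient are common defaults, not values taken from the talk.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    N = x.shape[-1]
    n = np.arange(N)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N)) * np.sqrt(2.0 / N)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, fs=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    s = np.append(signal[0], signal[1:] - preemph * signal[:-1])     # pre-emphasis
    n_frames = 1 + (len(s) - frame_len) // hop
    frames = np.stack([s[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)                          # windowing
    mag = np.abs(np.fft.rfft(frames, n=n_fft))                       # |FFT|
    mel_energies = mag @ mel_filterbank(n_filters, n_fft, fs).T      # Mel
    log_mel = np.log(np.maximum(mel_energies, 1e-10))                # log
    return dct_ii(log_mel, n_ceps)                                   # DCT-II


tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # hypothetical 1 s test tone
print(mfcc(tone).shape)                                     # (frames, coefficients)
```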
Pre-requisite : Signal parametrization Cepstrum - Voiced/Unvoiced (figure panels: voiced, unvoiced) 14 / 46
Pre-requisite : Signal parametrization Cepstrum - Mel-Log Spectrum Approximation Filter [Fukada et al., 1992] Why? In spe...
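Mel-cepstral analysis [Fukada et al., 1992], whose coefficients drive the MLSA filter, works on a frequency axis warped by the first-order all-pass system $\tilde{z}^{-1} = (z^{-1} - \alpha)/(1 - \alpha z^{-1})$. The sketch below computes that warping, $\tilde{\omega} = \omega + 2\arctan\frac{\alpha \sin\omega}{1 - \alpha\cos\omega}$, and prints it next to a rescaled mel scale. The value $\alpha = 0.42$ is one commonly used at 16 kHz, and the mel formula is one common variant; both are assumptions here.

```python
import numpy as np


def warped_frequency(omega, alpha):
    """Warped frequency (negated phase) of z~^-1 = (z^-1 - alpha) / (1 - alpha z^-1).

    omega is a normalised angular frequency in [0, pi]; requires |alpha| < 1.
    """
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def mel_normalised(omega, fs):
    """Mel scale 2595 log10(1 + f/700), rescaled so that Nyquist maps to pi."""
    f = omega / np.pi * fs / 2
    mel = 2595.0 * np.log10(1.0 + f / 700.0)
    return np.pi * mel / (2595.0 * np.log10(1.0 + (fs / 2) / 700.0))


omega = np.linspace(0.0, np.pi, 9)
alpha = 0.42  # assumed value for 16 kHz speech
for w, wa, wm in zip(omega, warped_frequency(omega, alpha), mel_normalised(omega, 16000)):
    print(f"{w:5.3f} -> all-pass {wa:5.3f}   mel {wm:5.3f}")
```

With $\alpha > 0$ the low-frequency region is stretched, as on the mel scale, and $\alpha = 0$ leaves the axis unchanged.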
Pre-requisite : Signal parametrization Some samples Sample 1 Sample 2 16 / 46
Pre-requisite : Signal parametrization Taking dynamics into account Some samples Sample 1 Sample 2 THE EQUATION TO REMEMBER...
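Pre-requisite : Signal parametrization Taking dynamics into account: $\mathbf{o}_t = [\mathbf{c}_t^\top, \Delta\mathbf{c}_t^\top, \Delta^2\mathbf{c}_t^\top]^\top$, $\mathbf{o} = [\mathbf{o}_1^\top, \mathbf{o}_2^\top, \ldots, \mathbf{o}_T^\top]^\top \in \mathbb{R}^{3MT}$ O...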
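The stacked vector is a linear function of the static coefficients, commonly written $\mathbf{o} = W\mathbf{c}$. A minimal sketch for one coefficient dimension ($M = 1$), assuming the widely used windows $\Delta c_t = 0.5(c_{t+1} - c_{t-1})$ and $\Delta^2 c_t = c_{t+1} - 2c_t + c_{t-1}$, and replicating the first and last frames at the boundaries; `build_w` is a hypothetical helper name.

```python
import numpy as np

# Windows for static, delta and delta-delta features (commonly used values; an assumption):
#   delta c_t  = 0.5 * (c_{t+1} - c_{t-1})
#   delta2 c_t = c_{t+1} - 2 c_t + c_{t-1}
WINDOWS = [np.array([0.0, 1.0, 0.0]),
           np.array([-0.5, 0.0, 0.5]),
           np.array([1.0, -2.0, 1.0])]


def build_w(T):
    """Return W (3T x T) with o = W c, o interleaved as [c_t, delta c_t, delta2 c_t].

    One coefficient dimension (M = 1); frames outside [0, T) are replaced by
    the nearest edge frame.
    """
    W = np.zeros((3 * T, T))
    for t in range(T):
        for d, win in enumerate(WINDOWS):
            for offset, coef in zip((-1, 0, 1), win):
                tau = min(max(t + offset, 0), T - 1)   # replicate edge frames
                W[3 * t + d, tau] += coef
    return W


c = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # hypothetical static trajectory
o = build_w(len(c)) @ c
print(o.reshape(-1, 3))                    # one row per frame: c_t, delta c_t, delta2 c_t
```

For $M > 1$ the same matrix is applied to each coefficient dimension separately (with diagonal covariances), which is why the 1-D case is enough to see the structure.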
Pre-requisite : Get descriptive features Objective How to describe speech What ar...
Pre-requisite : Get descriptive features - Example label example label format des...
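Training stage Where we are! Figure: pipeline diagram; text: "Welcome to this tutorial"; labels: N.L.P. (natural language processing), Text, acoustic coefficients, descriptive features, offline stage, S.P. (signal processing), online stage, N...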
Training stage Problems to be solved: 1. Acoustic modelling: Gaussian Hidden Semi-Markov Models; 2. Heterogeneous data: Mul...
Training stage Statistical distribution - Gaussian vectors Definition - Multivariate Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ (1), $\mu =$ ...
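For reference, the log-density of the multivariate Gaussian (1) is $\log\mathcal{N}(\mathbf{x};\mu,\Sigma) = -\frac{1}{2}\left[d\log 2\pi + \log|\Sigma| + (\mathbf{x}-\mu)^\top\Sigma^{-1}(\mathbf{x}-\mu)\right]$. A small NumPy sketch, including the diagonal-covariance special case that HMM-based synthesis systems commonly use (an assumption here, not a statement from the slide); the function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """log N(x; mu, Sigma) for a d-dimensional x and a full covariance Sigma."""
    x, mu, sigma = np.asarray(x, float), np.asarray(mu, float), np.asarray(sigma, float)
    diff = x - mu
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    maha = diff @ np.linalg.solve(sigma, diff)   # (x - mu)^T Sigma^-1 (x - mu)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)


def diag_gaussian_logpdf(x, mu, var):
    """Same log-density when Sigma = diag(var)."""
    x, mu, var = (np.asarray(v, float) for v in (x, mu, var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)


# Hypothetical 2-D example: both functions agree for a diagonal covariance.
x, mu, var = np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([2.0, 3.0])
print(gaussian_logpdf(x, mu, np.diag(var)), diag_gaussian_logpdf(x, mu, var))
```

Working in the log domain avoids underflow when many such densities are multiplied along an utterance.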
Training stage Statistical distribution - MSD Used only for F0 modelling 24 / 46
Training stage Statistical distribution - MSD MSD = Multi-Space Distribution. Definition: $X = (V, S)$ (3), where $S$ = a space, $V$ = a v...
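For F0, a common MSD setup uses two spaces: a one-dimensional space holding log F0 for voiced frames and a zero-dimensional space for unvoiced frames. A minimal sketch of one state's output probability, assuming a single Gaussian in the voiced space; the function name and the parameter values are hypothetical.

```python
import math


def msd_f0_likelihood(obs, w_voiced, mean, var):
    """Output probability of one F0 observation under a two-space MSD.

    obs is None for an unvoiced frame (0-dimensional space) or a log-F0 value
    for a voiced frame (1-dimensional space with a Gaussian density).
    """
    w_unvoiced = 1.0 - w_voiced
    if obs is None:
        return w_unvoiced                  # 0-dim space: the space weight alone
    density = math.exp(-0.5 * (obs - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return w_voiced * density              # space weight times the Gaussian density


state = dict(w_voiced=0.9, mean=math.log(120.0), var=0.01)  # hypothetical state parameters
print(msd_f0_likelihood(math.log(118.0), **state))  # voiced frame
print(msd_f0_likelihood(None, **state))             # unvoiced frame: unvoiced-space weight
```

A single model can therefore score both voiced and unvoiced frames without inventing an F0 value for unvoiced speech.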
Training stage Markov Models (MM) http://www.americanscientist.org/issues/pub/2013/2/first-links-in-the-markov-chain/9999...
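A Markov chain assigns a probability to an observed state sequence: $P(s_1 \dots s_T) = \pi_{s_1}\prod_{t=2}^{T} a_{s_{t-1}s_t}$. A tiny sketch with hypothetical weather states and probabilities (not the values used in the talk):

```python
import numpy as np

STATES = ["sunny", "cloudy", "rainy"]          # hypothetical weather states
PI = np.array([0.5, 0.3, 0.2])                 # hypothetical initial probabilities
A = np.array([[0.7, 0.2, 0.1],                 # A[i, j] = P(s_t+1 = j | s_t = i)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])


def sequence_probability(seq):
    """P(s_1..s_T) = pi[s_1] * prod_t A[s_t-1, s_t] for an observable Markov chain."""
    idx = [STATES.index(s) for s in seq]
    p = PI[idx[0]]
    for prev, cur in zip(idx[:-1], idx[1:]):
        p *= A[prev, cur]
    return float(p)


print(sequence_probability(["sunny", "sunny", "rainy", "rainy"]))
```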
Training stage Hidden Markov Models (HMM) Now we can’t get the weather’s sequence directly! Main assumption There is an un...
Training stage Hidden Markov Models (HMM) - Continuous Based on the previous HMM. Figure: states 1, 2, 3; transition probabilities 0.6 0.3 0.1 0.2 0.3 0.5 0.4 0.1 0.5 ...
Training stage Hidden Markov Models (HMM) - the 3 problems Tutorial = [Rabiner, 1989] Problem 1 Given an observation seque...
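Problem 1 (evaluation) is usually solved with the forward algorithm [Rabiner, 1989]. A minimal sketch for a discrete HMM with per-frame scaling to avoid underflow, checked against brute-force enumeration of all state sequences; the two-state model and the observation indices are hypothetical.

```python
from itertools import product

import numpy as np


def forward_log_likelihood(obs, pi, A, B):
    """log P(O | lambda) for a discrete HMM, scaled forward algorithm.

    pi: (S,) initial probabilities; A: (S, S) transitions; B: (S, K) emission
    probabilities; obs: sequence of symbol indices in [0, K).
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_lik, alpha = np.log(scale), alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # induction step
        scale = alpha.sum()
        log_lik, alpha = log_lik + np.log(scale), alpha / scale
    return log_lik


def brute_force_likelihood(obs, pi, A, B):
    """Sum over every state sequence; only feasible for tiny examples."""
    total = 0.0
    for states in product(range(len(pi)), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total


# Hypothetical 2-state, 3-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2]
print(np.exp(forward_log_likelihood(obs, pi, A, B)), brute_force_likelihood(obs, pi, A, B))
```

The forward pass costs O(T S²) instead of the O(T S^T) of the enumeration, which is what makes evaluation practical for speech-length sequences.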
Training stage Hidden Markov Models (HMM) - Training input (discrete) Example data: raining - sunny - sunny - sunny - raini...
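Training from such a sequence is typically done with Baum-Welch re-estimation [Baum et al., 1970], an EM algorithm. A minimal single-sequence sketch for a discrete HMM using scaled forward/backward variables; the symbols, the observation sequence and the initial parameters are hypothetical, not the talk's example data.

```python
import numpy as np


def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) re-estimation of (pi, A, B) for a discrete HMM.

    Returns the updated parameters and log P(O | old parameters).
    """
    obs = np.asarray(obs)
    T, S = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, S)), np.ones((T, S)), np.zeros(T)
    # Scaled forward pass: each alpha[t] is normalised to sum to 1, c[t] is the scale.
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    # Backward pass with the same scale factors (beta[T-1] = 1).
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta                      # P(s_t = i | O); each row sums to 1
    xi = np.zeros((S, S))                     # sum over t of P(s_t = i, s_t+1 = j | O)
    for t in range(T - 1):
        xi += alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / c[t + 1]
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, np.log(c).sum()


SYMBOLS = ["raining", "sunny"]                                    # hypothetical alphabet
data = ["sunny", "sunny", "raining", "sunny", "raining", "raining",
        "sunny", "sunny", "sunny", "raining", "sunny", "sunny"]    # hypothetical sequence
obs = [SYMBOLS.index(s) for s in data]
pi = np.array([0.5, 0.5])
A = np.array([[0.6, 0.4], [0.3, 0.7]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
lls = []
for _ in range(10):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    lls.append(ll)
print(np.round(lls, 4))   # EM: the log-likelihood never decreases across iterations
```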
Training stage Hidden Markov Models (HMM) - Training input (continuous) Example data: 0.66693531 - 0.71573471 - 0.44575163 ...
Training stage Hidden Markov Models (HMM) - Speech Tutorial = [Rabiner, 1989] 32 / 46
Training stage Hidden Markov Models (HMM) - Speech Definition: $\lambda = (A, B, \pi)$, $A = \{a_{i,j}\}, \forall i, j \in [1..S]$, $B = \{b_j(o_t)\}, \forall j$ ...
Training stage Hidden Semi-Markov Models (HSMM) In an HMM, state durations implicitly follow a geometric distribution: $\Pr(X = k) = (1 - p)^{k-1} p$ (5). PROBLEM! Optimi...
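To see the problem, compare the implicit duration model of an HMM state with self-loop probability $a_{ii}$ (eq. (5) with $p = 1 - a_{ii}$) against an explicit duration distribution as used by an HSMM. A minimal sketch, assuming a discretised Gaussian for the explicit model and hypothetical parameter values:

```python
import numpy as np


def hmm_duration_pmf(a_ii, max_d):
    """Implicit duration pmf of an HMM state: P(d) = a_ii^(d-1) * (1 - a_ii)."""
    d = np.arange(1, max_d + 1)
    return d, a_ii ** (d - 1) * (1.0 - a_ii)


def gaussian_duration_pmf(mean, var, max_d):
    """Explicit duration model (assumed Gaussian), discretised and renormalised on 1..max_d."""
    d = np.arange(1, max_d + 1)
    p = np.exp(-0.5 * (d - mean) ** 2 / var)
    return d, p / p.sum()


d, p_hmm = hmm_duration_pmf(0.8, 30)            # expected duration 1 / (1 - 0.8) = 5 frames
_, p_hsmm = gaussian_duration_pmf(5.0, 2.0, 30)  # hypothetical mean 5 frames, variance 2
print("HMM  most likely duration:", d[np.argmax(p_hmm)])   # always 1 frame
print("HSMM most likely duration:", d[np.argmax(p_hsmm)])
```

Even when the expected HMM duration is 5 frames, its most likely duration is a single frame, which is a poor model of phone durations; an explicit distribution can peak where the data do.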
Training stage Hidden Semi-Markov Models (HSMM) 35 / 46
Training stage Decision tree + state tying [Young et al., 1994] Objective: dealing with sparseness. English = 53 descriptive...
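Tree-based clustering [Young et al., 1994] grows the tree greedily, choosing at each node the context question whose yes/no split gives the largest increase in log-likelihood. A minimal sketch for one scalar feature. Young et al. compute these likelihoods from state occupation statistics rather than from raw frames, and the contexts, frames and questions below are hypothetical.

```python
import numpy as np


def pooled_loglik(frames):
    """Log-likelihood of 1-D frames under their own ML Gaussian: -N/2 (1 + log(2 pi var))."""
    x = np.asarray(frames, float)
    var = max(x.var(), 1e-6)            # variance floor to avoid log(0)
    return -0.5 * len(x) * (1.0 + np.log(2 * np.pi * var))


def best_question(data, questions):
    """Pick the question whose yes/no split of the contexts most increases likelihood.

    data: {context: list of 1-D observations}; questions: {name: predicate(context) -> bool}.
    """
    parent = pooled_loglik([x for xs in data.values() for x in xs])
    best = (None, 0.0)
    for name, ask in questions.items():
        yes = [x for ctx, xs in data.items() if ask(ctx) for x in xs]
        no = [x for ctx, xs in data.items() if not ask(ctx) for x in xs]
        if not yes or not no:           # a question that does not split is useless
            continue
        gain = pooled_loglik(yes) + pooled_loglik(no) - parent
        if gain > best[1]:
            best = (name, gain)
    return best


VOWELS = {"a", "e", "i", "o", "u"}
data = {  # hypothetical "left-centre+right" contexts with frames of one feature
    "a-t+i": [1.0, 1.2, 0.9], "o-t+e": [1.1, 1.0, 1.3],
    "s-t+a": [3.0, 3.2, 2.9], "k-t+o": [3.1, 2.8, 3.0],
}
questions = {
    "L-Vowel?": lambda c: c.split("-")[0] in VOWELS,
    "R-Vowel?": lambda c: c.split("+")[1] in VOWELS,
}
print(best_question(data, questions))  # the left-vowel question separates the two groups
```

Repeating this split on each child until the gain falls below a threshold yields leaves whose states share (tie) one distribution, so contexts never seen in training can still be assigned a model.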
Training stage Training process Initialisation (MSD-HMM) → Monophone (MSD-HSMM) → Full-context (MSD-HSMM) → Clustering tree + (MSD-...
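Synthesis stage Where we are! Figure: pipeline diagram; text: "Welcome to this tutorial"; labels: N.L.P. (natural language processing), Text, acoustic coefficients, descriptive features, offline stage, S.P. (signal processing), online stage ...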
Synthesis stage HMM to produce speech 0. Input = sequence of descriptive features start-tt-an an-tt-end... 39 / 46
Synthesis stage HMM to produce speech 1. HMM-phrase building start-tt-an an-tt-end... ... number of segments 39 / 46
Synthesis stage HMM to produce speech 2. Associate distributions [Yoshimura et al., 1999] start-tt-an an-tt-end... ... C-P...
Synthesis stage HMM to produce speech 3. Acoustic coefficient generation [Tokuda et al., 2000] start-tt-an an-tt-end... ... ...
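With a fixed state sequence giving per-frame means $\mu$ and diagonal covariance $\Sigma$ for static and dynamic features, and with the constraint $\mathbf{o} = W\mathbf{c}$ (W stacks the static, delta and delta-delta windows), maximising the output probability with respect to the static trajectory $\mathbf{c}$ gives the linear system $W^\top\Sigma^{-1}W\mathbf{c} = W^\top\Sigma^{-1}\mu$, the fixed-state-sequence case treated in [Tokuda et al., 2000]. A minimal sketch assuming one coefficient dimension, commonly used delta windows, edge-frame replication at the boundaries, and hypothetical means and variances.

```python
import numpy as np

WINDOWS = [np.array([0.0, 1.0, 0.0]),      # static
           np.array([-0.5, 0.0, 0.5]),     # delta (assumed window)
           np.array([1.0, -2.0, 1.0])]     # delta-delta (assumed window)


def build_w(T):
    """W (3T x T) mapping a static trajectory c to interleaved [c, delta c, delta2 c]."""
    W = np.zeros((3 * T, T))
    for t in range(T):
        for d, win in enumerate(WINDOWS):
            for offset, coef in zip((-1, 0, 1), win):
                W[3 * t + d, min(max(t + offset, 0), T - 1)] += coef  # replicate edges
    return W


def mlpg(mu, var):
    """Solve (W^T S^-1 W) c = W^T S^-1 mu with S = diag(var).

    mu, var: length-3T vectors of interleaved means/variances read from the
    state sequence (static, delta, delta-delta for each frame).
    """
    T = len(mu) // 3
    W = build_w(T)
    prec = 1.0 / np.asarray(var, float)
    lhs = W.T @ (prec[:, None] * W)
    rhs = W.T @ (prec * np.asarray(mu, float))
    return np.linalg.solve(lhs, rhs)


T = 10
static_means = np.r_[np.zeros(5), np.ones(5)]          # hypothetical: 2 states x 5 frames
mu = np.zeros(3 * T)
mu[0::3] = static_means                                 # delta and delta-delta means are 0
var = np.ones(3 * T)
var[1::3] = var[2::3] = 0.01                            # hypothetical variances
print(np.round(mlpg(mu, var), 3))   # dynamic constraints smooth the jump between states
```

Without the dynamic rows the solution would simply be the piecewise-constant sequence of state means; the delta and delta-delta constraints are what turn it into a smooth trajectory.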
Synthesis stage HMM to produce speech 4. Signal synthesis [Zen and Toda, 2005] start-tt-an an-tt-end... ... ... ... Use ML...
Adaptation What is adaptation? [Yamagishi et al., 2007] Train average model → Adaptation 40 / 46
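Adaptation transforms the parameters of the average-voice model towards the target speaker. A minimal sketch of one such idea: estimating a single affine transform of the state means, $\hat{\mu} = A\mu + b$, in the spirit of MLLR. It assumes identity covariances and a given hard frame-to-state alignment, which reduces the estimate to least squares; it is not the specific method of [Yamagishi et al., 2007], and all data and function names are hypothetical.

```python
import numpy as np


def estimate_mean_transform(avg_means, frames, alignment):
    """Least-squares estimate of W = [b | A] such that mu_hat = A mu + b.

    avg_means: (S, D) average-voice state means; frames: (N, D) adaptation frames;
    alignment: (N,) state index of each frame. Identity covariances assumed.
    """
    xi = np.hstack([np.ones((len(frames), 1)), avg_means[alignment]])  # extended means
    W, *_ = np.linalg.lstsq(xi, frames, rcond=None)                    # (D+1, D)
    return W.T                                                          # (D, D+1)


def adapt_means(avg_means, W):
    """Apply the shared transform to every state mean."""
    xi = np.hstack([np.ones((len(avg_means), 1)), avg_means])
    return xi @ W.T


rng = np.random.default_rng(0)
avg_means = rng.normal(size=(4, 2))                     # hypothetical average-voice means
A_true, b_true = np.array([[1.1, 0.0], [0.2, 0.9]]), np.array([0.5, -0.3])
alignment = rng.integers(0, 4, size=200)                # hypothetical alignment
frames = avg_means[alignment] @ A_true.T + b_true + 0.05 * rng.normal(size=(200, 2))
W = estimate_mean_transform(avg_means, frames, alignment)
print(np.round(W, 2))   # should be close to [b_true | A_true]
```

Because one transform is shared by all states, a small amount of target-speaker data is enough to move every mean of the average voice, including those of states never observed in the adaptation data.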
Adaptation Information Why it is great! Average voice built ⇒ no need for a huge amount of training data. Flexible (voi...
Small introduction to DNN What is a DNN (conceptually!) 42 / 46
Small introduction to DNN What is a DNN (conceptually!) 43 / 46
Small introduction to DNN How it is applied to TTS More details in [Zen et al., 2013] 44 / 46
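Conceptually, the DNN in [Zen et al., 2013] maps frame-level linguistic (descriptive) features to acoustic features. A minimal forward-pass sketch with random, untrained weights; the layer sizes, the sigmoid hidden units and the linear output layer are assumptions for illustration, and `init_layers`/`forward` are hypothetical names.

```python
import numpy as np


def init_layers(sizes, seed=0):
    """Random (untrained) weights and zero biases for a feed-forward network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Sigmoid hidden layers, linear output layer (regression to acoustic features)."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = h @ W + b
        h = z if i == len(layers) - 1 else 1.0 / (1.0 + np.exp(-z))
    return h


# Hypothetical sizes: 300 linguistic inputs per frame, 3 hidden layers, 127 acoustic outputs.
layers = init_layers([300, 512, 512, 512, 127])
linguistic = np.random.default_rng(1).integers(0, 2, size=(10, 300)).astype(float)  # 10 frames
acoustic = forward(linguistic, layers)
print(acoustic.shape)
```

Unlike the decision-tree approach, every output depends on all input features through shared hidden layers, so there is no hard partition of the context space.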
Small introduction to DNN Why is it well suited? Main advantages: multi-layer models, like speech (frames, phones, syllables, wor...
Conclusion Summary of this talk: 2 corpus-based TTS methodologies: unit selection and parametrical speech synthesis. Focus on sta...
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A Maximization Technique Occurring in the Statistical Analysis ...
Number July 2000. Young, S. J., Odell, J. J., and Woodland, P. C. (1994). Tree-based state tying for high accuracy acousti...
