E speak aegis-workshop

eSpeak TTS Engine:
Language Enhancement

Jerry Dimitriou, Singular Logic

28 November 2011 ÆGIS Conference, Brussels, Belgium

What is eSpeak

• Open Source, Free to Use TTS Engine
– Formant based
– Minimal need for resources
– More than 20 Languages already available
• Not all of them are in a good state.
• Advantages
– Intelligible at high speeds
– Easier to enhance languages (Rule based)
– Easier to create new sounds (Phonemes)
• Disadvantages
– Sound not natural (Robotic)


How eSpeak works: Text-To-Phoneme

• Step 1: Text to Phoneme Translation
– Rule based with rules contained in a <lang>_rules file
– Exceptions of rules in a <lang>_list file
– Rules translate normal text to a stream of characters called
phonemes
– Each phoneme represents a standard sound, which is generated:
• either through formants (vowels and voiced consonants)
• or by playing samples (unvoiced and fricative consonants)
– Examples:
• Normal text    eSpeak phonemes    IPA
• Amazing        a#m'eIzIN          ɐmˈeɪzɪŋ
• Brussels       br'Vs@Lz           bɹˈʌsəlz
• Disability     d,Is@b'IlI2ti      dˌɪsəbˈɪlɪtɪ


How eSpeak works: Rules

• Rules Format
– <prefix>) <group of letters> ( <suffix> phonemes
– a (Cable 'eI
– a (tion 'eI
– _r) a (tion a
– Prefix and Suffix
• Non capital letters represent themselves
• Capital letters represent sets of letters
– C → Any Consonant
– A → Any Vowel
– _ → Start of word at prefix, end of word at suffix
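
The rule format above can be illustrated with a small Python sketch. It is a toy matcher, not eSpeak's own implementation: the names `Rule`, `parse_rule`, `rule_matches` and `best_phonemes` are hypothetical, the vowel set is simplified, and the tie-break "the rule with the most context letters wins" is an assumption made only for this illustration.

```python
from typing import NamedTuple, Optional

VOWELS = set("aeiou")  # assumption: simplified vowel set for the sketch


class Rule(NamedTuple):
    prefix: str    # context before the group ("" if none)
    group: str     # letters the rule translates
    suffix: str    # context after the group ("" if none)
    phonemes: str  # phoneme string produced by the rule


def parse_rule(line: str) -> Rule:
    """Parse '<prefix>) <group> (<suffix> <phonemes>' as shown on the slide."""
    prefix, rest = "", line.strip()
    if ")" in rest:
        prefix, rest = rest.split(")", 1)
    tokens = rest.replace("( ", "(").split()
    group, suffix, idx = tokens[0], "", 1
    if idx < len(tokens) and tokens[idx].startswith("("):
        suffix = tokens[idx][1:]
        idx += 1
    return Rule(prefix.strip(), group, suffix, " ".join(tokens[idx:]))


def _symbol_matches(symbol: str, ch: str) -> bool:
    """C = any consonant, A = any vowel, _ = word boundary, else literal."""
    if symbol == "C":
        return ch.isalpha() and ch not in VOWELS
    if symbol == "A":
        return ch in VOWELS
    if symbol == "_":
        return ch == " "
    return symbol == ch


def rule_matches(rule: Rule, word: str, pos: int) -> bool:
    """Check whether the rule applies to the letters starting at word[pos]."""
    padded = f" {word.lower()} "  # spaces stand for the word boundaries
    start = pos + 1
    end = start + len(rule.group)
    if padded[start:end] != rule.group or len(rule.prefix) > start:
        return False
    before = padded[start - len(rule.prefix):start]
    after = padded[end:end + len(rule.suffix)]
    if len(after) != len(rule.suffix):
        return False
    return (all(_symbol_matches(s, c) for s, c in zip(rule.prefix, before))
            and all(_symbol_matches(s, c) for s, c in zip(rule.suffix, after)))


def best_phonemes(rules, word: str, pos: int) -> Optional[str]:
    """Return the phonemes of the matching rule with the most context."""
    matches = [r for r in rules if rule_matches(r, word, pos)]
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.prefix) + len(r.suffix)).phonemes


rules = [parse_rule(s) for s in ("a (Cable 'eI", "a (tion 'eI", "_r) a (tion a")]
print(best_phonemes(rules, "nation", 1))  # 'eI
print(best_phonemes(rules, "ration", 1))  # a
```

In "ration" both the second and the third rule match, and the sketch picks the third because its prefix `_r` adds more context. How eSpeak itself ranks competing rules is not covered on these slides.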


How eSpeak works: Exceptions

• Exception Format
– <group of letters or word> phonemes and/or flags
– _" kwoUts
– _% p3s'Ent
– _0 z'i@roU
– _1 w'0n
– eg fO@Egz'aamp@L
– ibm $abbrev
– Ambidextrous $3
– from fr0m $u
– Flags
• $u, $abbrev, $only, $dot, $pause, etc
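
As a rough illustration of the exception format, the following Python sketch splits one line into its word, phonemes and flags. The names `ListEntry` and `parse_list_line` are hypothetical, and the sketch only handles the line shapes shown above; anything else a real <lang>_list file may contain is out of scope.

```python
from typing import NamedTuple, Tuple


class ListEntry(NamedTuple):
    word: str               # letters or word the exception applies to
    phonemes: str           # phoneme string ("" if the line has only flags)
    flags: Tuple[str, ...]  # tokens starting with "$"


def parse_list_line(line: str) -> ListEntry:
    """Split '<word> <phonemes and/or flags>' into its three parts."""
    tokens = line.split()
    word, rest = tokens[0], tokens[1:]
    flags = tuple(t for t in rest if t.startswith("$"))
    phonemes = " ".join(t for t in rest if not t.startswith("$"))
    return ListEntry(word, phonemes, flags)


for line in ('_" kwoUts', "ibm $abbrev", "Ambidextrous $3", "from fr0m $u"):
    print(parse_list_line(line))
```

An entry may carry phonemes only (`_"`), flags only (`ibm`, `Ambidextrous`), or both (`from`), which is why the sketch treats the two parts independently.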





How eSpeak works: Phoneme-To-Sound

• Step 2: Phoneme to Sound
– Given the list of phonemes, eSpeak generates a sound for each
phoneme
– The previous or next phoneme may alter a phoneme's sound
– A phoneme's sound is generated either from a sample file or from
formant data
– Phoneme data are defined in ph_<language> files
• E.g. ph_english
– Example of an entry in ph_english (Phoneme Definition)
• phoneme I
    vowel starttype #i endtype #i
    length 130
    IfNextVowelAppend(;)
    FMT(vowel/ii_2)
  endphoneme
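
To make the structure of such an entry easier to see, here is a minimal Python sketch that pulls a few fields out of the definition above. The function name `parse_phoneme_block` and the dictionary keys are hypothetical, and the sketch only recognises the statements that appear in this one example; it is not a parser for the full phoneme definition language.

```python
import re

EXAMPLE = """\
phoneme I
  vowel starttype #i endtype #i
  length 130
  IfNextVowelAppend(;)
  FMT(vowel/ii_2)
endphoneme
"""


def parse_phoneme_block(text: str) -> dict:
    """Extract name, vowel attribute, length and FMT file from one entry."""
    info = {"name": None, "is_vowel": False, "length": None, "formant_file": None}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("phoneme "):
            info["name"] = line.split()[1]
        elif line.startswith("vowel"):
            info["is_vowel"] = True
        elif line.startswith("length "):
            info["length"] = int(line.split()[1])
        else:
            match = re.fullmatch(r"FMT\((.+)\)", line)
            if match:
                info["formant_file"] = match.group(1)
    return info


print(parse_phoneme_block(EXAMPLE))
```

The `FMT(...)` statement names the formant data used for the sound, which matches the slide's point that vowels are generated from formants rather than from recorded samples.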


Editing eSpeak files

• eSpeakEdit Program
– Used to edit, visualize and compile eSpeak data
• Formant Phoneme Data
• Workflow for text-to-phoneme
– Find an error in pronunciation, intonation, etc.
– Check which rule (or exception) generates the error
– Edit the <lang>_rules or the <lang>_list file
– Compile the data
– Retry
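
The check, compile and retry steps can also be scripted. The sketch below is an assumption rather than the workflow from the talk, which uses espeakedit: it assumes the `espeak` command-line program is installed and relies on its documented `-q` (no audio), `-X` (phonemes plus translation trace), `-v` (voice) and `--compile=<voice>` options. The helper names and the `dict_source_dir` parameter are hypothetical.

```python
import shutil
import subprocess


def trace_command(text: str, voice: str = "en") -> list:
    """Command that prints the phonemes and the rules used for `text`."""
    return ["espeak", "-q", "-X", "-v", voice, text]


def compile_command(voice: str = "en") -> list:
    """Command that recompiles the rules and list files for `voice`."""
    return ["espeak", f"--compile={voice}"]


def run(cmd: list, cwd: str = None) -> str:
    """Run a command and return its stdout; raises if espeak is missing."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout


def edit_cycle(text: str, voice: str, dict_source_dir: str) -> str:
    """Compile the edited files, then show the new translation trace."""
    # Assumption: the <lang>_rules and <lang>_list files live in dict_source_dir.
    run(compile_command(voice), cwd=dict_source_dir)
    return run(trace_command(text, voice))
```

A call such as `edit_cycle("Brussels", "en", "path/to/dictsource")` (with a placeholder path) would be repeated after each edit until the trace shows the intended phonemes.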


Editing eSpeak files (2)

• Workflow for phoneme-to-sound
– There may be no suitable sound for a specific phoneme
(a common problem is the R sound)
• E.g. it should be shorter or longer when stressed or
unstressed
– Using espeakedit, check all the available sounds that seem
similar to the sound you need.
– If something closer to what you need is found, change or
add its definition in the ph_<language> file
– If not, create a new phoneme using espeakedit, or record a
new sound for unvoiced consonants.
– Retry


Editing Demonstration

• Demo of language edit, using espeakedit


Native speakers: How to contribute

• The biggest problem in language editing in eSpeak is ... native
speakers.
• One must be a native speaker in order to be able to fix
language problems
• How to contribute
– Find errors in eSpeak for a certain language and report
them
– Try to fix pronunciation rules by editing rules and
exceptions
– Try to fix phoneme sounds by editing phoneme data.
– Send back the changes to the eSpeak community


eSpeak Language Enhancement

Thank you!

