Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness. To overcome this, an approach that is more aware of paragraph and discourse structure is needed to make real improvements in speech synthesis.
Mireia Farrús
Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrús
1. Using Paragraph- and Discourse-based
Prosodic Cues to Improve Speech
Synthesis Expressiveness
Mireia Farrús
AI With the Best, 25/09/2016
2. Outline
Over the last decade, automatically
generated speech has significantly
improved in terms of voice quality and
expressiveness. However, multi-sentential
synthesized speech still suffers from a high
degree of unnaturalness.
3. Outline
To overcome this, an approach that is more
aware of paragraph and communicative
structure is needed to make real
improvements in speech synthesis.
7. Context: current TTS systems
• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of
current/preceding/following syllables
• Distance from stressed/accented syllables
8. Context: current TTS systems
• POS of current/preceding/following word
• Length of current/preceding/following
phrase
• End tone of phrase
• Length of utterance measured in
syllables/words/phrases
(King, 2010)
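The context features listed on the two slides above can be pictured as one record per speech segment. The following is a minimal sketch, assuming a simplified subset of those features; the class name `SegmentContext`, its field names, and the `key=value` label layout are hypothetical and do not reproduce the label format of any particular TTS toolkit.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SegmentContext:
    """Hypothetical subset of the context features listed above."""
    prev_phoneme: Optional[str]   # preceding phoneme (None at utterance start)
    phoneme: str                  # current phoneme
    next_phoneme: Optional[str]   # following phoneme (None at utterance end)
    pos_in_syllable: int          # position of segment in syllable (1-based)
    syllable_pos_in_word: int     # position of syllable in word (1-based)
    word_pos_in_phrase: int       # position of word in phrase (1-based)
    stressed: bool                # stress of the current syllable
    pos_tag: str                  # part of speech of the current word
    utterance_len_words: int      # length of the utterance in words


def to_label(ctx: SegmentContext) -> str:
    """Flatten the features into one 'key=value|key=value' string."""
    return "|".join(f"{key}={value}" for key, value in asdict(ctx).items())
```

Every field in this sketch is bounded by the utterance: nothing in the record refers to neighbouring sentences or to the paragraph the utterance belongs to.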
9. BUT human speech also relies on…
• Paragraph structure
• Communicative structure
• Discourse structure
10. Paragraph structure
• “Paragraph-based Prosodic Cues for Speech
Synthesis Applications”.
Mireia Farrús, Catherine Lai, Johanna D. Moore
17. Conclusions: Paragraph structure
• There is clear evidence of prosodic resets across paragraph breaks
• We can also observe a steady declination in prosodic level over the
paragraph
• Difference features are more discriminative of boundaries than
sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure
that can be described in terms of relative changes in F0, intensity,
and timing
• The classification experiments support the idea that there are
utterance-intrinsic features that signal paragraph position
• Pause duration is the most robust predictor of paragraph breaks
We should be able to employ paragraph declination, pause and
prosodic reset features to improve the naturalness of longer
synthesized speech
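The "difference features" in the paragraph-structure conclusions describe relative changes across a sentence boundary rather than per-sentence values. The following is a minimal sketch, assuming each sentence has already been summarized by a mean F0, a mean intensity, and the pause that precedes it; the names `SentenceProsody` and `difference_features` are hypothetical and are not taken from the cited paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SentenceProsody:
    mean_f0_hz: float         # mean F0 over the sentence
    mean_intensity_db: float  # mean intensity over the sentence
    pause_before_s: float     # silence preceding the sentence, in seconds


def difference_features(sentences: List[SentenceProsody]) -> List[Dict[str, float]]:
    """Return one feature dict per boundary between consecutive sentences."""
    features = []
    for prev, cur in zip(sentences, sentences[1:]):
        features.append({
            # Negative values suggest declination, positive values a reset.
            "delta_f0_hz": cur.mean_f0_hz - prev.mean_f0_hz,
            "delta_intensity_db": cur.mean_intensity_db - prev.mean_intensity_db,
            # Pause at the boundary itself.
            "pause_s": cur.pause_before_s,
        })
    return features
```

Under these assumptions, a run of small negative `delta_f0_hz` values followed by a large positive one with a long `pause_s` is the kind of pattern the slide associates with a paragraph break.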
18. Information/Communicative structure
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
19. Theoretical background - Motivation
• Influence of information structure on
intonation
• Steedman’s theory relating
– Theme/rheme
– Intonation patterns
20. Theoretical background - Handicaps
• Based on short sentences with a simple structure
and a default word order (SVO for English)
• What if we have…
21. ToBI labels
Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (starred tones: H* and L*)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
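The label categories above can be told apart by their diacritics alone. The following is a minimal sketch, assuming labels are given as plain strings; the function name `classify_tobi` is hypothetical, and the sketch is not AuToBI's API.

```python
def classify_tobi(label: str) -> str:
    """Classify a ToBI tone label by its diacritics (*, +, -, %)."""
    # Strip diacritics; what remains must be only H and L tones.
    tones = label
    for mark in "*+-%!":
        tones = tones.replace(mark, "")
    if not tones or set(tones) - {"H", "L"}:
        raise ValueError(f"not a ToBI tone label: {label!r}")

    if "-" in label and "%" in label:
        return "phrase accent + boundary tone"   # e.g. L-L%
    if label.endswith("%"):
        return "boundary tone"                   # H%, L%
    if label.endswith("-"):
        return "phrase accent"                   # H-, L-
    if "*" in label:
        # A '+' joins two tones into a bitonal accent such as L+H*.
        return "bitonal pitch accent" if "+" in label else "pitch accent"
    raise ValueError(f"tone label without a diacritic: {label!r}")
```

For example, `classify_tobi("L+H*")` returns `"bitonal pitch accent"` and `classify_tobi("H%")` returns `"boundary tone"`. Break indices are not covered by this sketch.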
22. Theoretical background – Our work
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures
24. Preliminary experiments
• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity
25. Validating the classic interface
• To what extent can the classic approaches be applied
to general discourse with more complex sentences?
• Examples matching the expected THEME patterns…
… but not the expected RHEMES.
26. Validating the classic interface
• We have found that…
– Themes usually match, although ~40% do not.
– Steedman’s approach of including everything
apart from the theme in a flat rheme span lacks
accuracy.
• We need a more accurate IS-prosody
interface.
27. Towards a more accurate IS-Prosody
interface
• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite
thematicity structure, we will be able to:
• Propose a more accurate model of the
intonation-thematicity correlation for the ~40% of
non-coincident patterns in theme spans.
• Find a justification for the discrepancies observed in the
rheme patterns.
28. Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (1)
29. Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (2)
30. Towards a more accurate IS-Prosody
interface
• Hierarchy
Example with the annotation suggested by Mel’čuk
rising pattern ↔ theme
Embedded themes behave as main themes in terms of intonation.
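The observation that embedded themes pattern like main themes can be pictured with a nested span structure. The following is a minimal sketch, assuming thematicity is stored as a tree of labelled spans; the `Span` class, the role strings, and the function names are hypothetical and do not reproduce the annotation format used in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    role: str                  # e.g. "theme", "rheme", "specifier"
    text: str = ""
    children: List["Span"] = field(default_factory=list)


def theme_spans(span: Span) -> List[Span]:
    """Collect every theme, main or embedded, in depth-first order."""
    found = [span] if span.role == "theme" else []
    for child in span.children:
        found.extend(theme_spans(child))
    return found


def assign_contours(root: Span) -> Dict[str, str]:
    """Map each theme's text to a rising pattern, whatever its depth."""
    return {theme.text: "rising" for theme in theme_spans(root)}
```

A flat theme/rheme analysis would stop at the first level of this tree; the hierarchical view also reaches themes nested inside a rheme.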
34. Conclusions
• Information Structure determines the
“communicative” segmentation of the
meaning of an utterance.
• It is central to the semantics-syntax-intonation
interface, and to NLP.
35. Conclusions
• Descriptive study attempting to determine
which intonation patterns better characterize
thematicity in real utterances.
• Flat theme/rheme interpretation prevailing in
classical approaches fails to explain complex
linguistic structures.
• Hierarchical structures and specifiers
yield positive results.
36. Prosody & discourse structure
• Rhetorical Structure Theory (RST)
(Mann & Thompson, 1988)
Describes the organizational structure of texts via
definitions of relations between two text spans,
the nucleus (N) and the satellite (S)
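A nucleus-satellite relation of this kind maps naturally onto a small recursive record. The following is a minimal sketch, assuming only the nucleus-satellite relations described on the slide (multinuclear relations are not modelled); the `Relation` class and `nucleus_text` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    name: str                            # relation label, e.g. "Elaboration"
    nucleus: Union["Relation", str]      # the more central span (N)
    satellite: Union["Relation", str]    # the supporting span (S)


def nucleus_text(node: Union[Relation, str]) -> str:
    """Follow nucleus links down to the most central text span."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node
```

Because either side of a relation may itself be a relation, a whole text can be described as one tree, which is the structure a prosody predictor would read discourse features from.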
37. Conclusions
• Prosody prediction from:
• Type of sentence
• Discourse structure
• Discourse markers
• Information structure
… to improve expressiveness and naturalness
of automatically generated speech
38. Thank you for your attention!