Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrus

Using Paragraph- and Discourse-based
Prosodic Cues to Improve Speech
Synthesis Expressiveness
Mireia Farrús
AI With the Best, 25/09/2016

Outline
Over the last decade, automatically
generated speech has significantly
improved in terms of voice quality and
expressiveness. However, multi-sentential
synthesized speech still suffers from a high
degree of unnaturalness.

Outline
To overcome this, an approach that is more
aware of paragraph and communicative
structure is needed to make real
improvements in speech synthesis.

Text-to-Speech (TTS) Systems

TTS systems

Context: current TTS systems
• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of
current/preceding/following syllables
• Distance from stressed/accented syllables

Context: current TTS systems
• POS of current/preceding/following word
• Length of current/preceding/following
phrase
• End tone of phrase
• Length of utterance measured in
syllables/words/phrases
(King, 2010)
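
The contextual factors listed on the two slides above can be pictured as one record per segment. The following is a minimal sketch only: the class name, the field names and the example values are assumptions made for illustration and are not taken from King (2010) or from any particular TTS toolkit, and only a subset of the listed features is included.

```python
from dataclasses import dataclass, fields

@dataclass
class SegmentContext:
    """Hypothetical per-phoneme context record (illustrative field names)."""
    prev_phoneme: str          # preceding phoneme
    phoneme: str               # current phoneme
    next_phoneme: str          # following phoneme
    pos_in_syllable: int       # position of segment in syllable
    syllable_pos_in_word: int  # position of syllable in word
    word_pos_in_phrase: int    # position of word in phrase
    stressed: bool             # stress of the current syllable
    pos_tag: str               # part of speech of the current word
    phrase_len_words: int      # length of the current phrase in words
    utterance_len_words: int   # length of the utterance in words

# Invented example values for one segment
ctx = SegmentContext("h", "eh", "l", 2, 1, 1, True, "UH", 2, 2)

# Every feature is scoped to the phrase or utterance at most
feature_names = [f.name for f in fields(SegmentContext)]
```

None of these fields looks beyond the utterance, which is the gap the following slide points to.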

BUT human speech also relies on…
• Paragraph structure
• Communicative structure
• Discourse structure

Paragraph structure
• “Paragraph-based Prosodic Cues for Speech
Synthesis Applications”.
Mireia Farrús, Catherine Lai, Johanna D. Moore

Paragraph structure

Prosody & Paragraph Structure
• ~ 1400 TED talks

Paragraph structure - Conclusions
• There is clear evidence of prosodic resets over paragraph breaks
• We can also observe a steady declination in prosodic level over the
paragraph
• Difference features are more discriminative of boundaries than
sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure
that can be described in terms of relative changes in F0, intensity,
and timing
• The classification experiments support the idea that
utterance-intrinsic features indicating paragraph position exist
• Pause duration is the most robust predictor of paragraph breaks
→ We should be able to employ paragraph declination, pause and
prosodic reset features to improve the naturalness of longer
synthesized speech
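
The three cues named in these conclusions (pause duration, prosodic reset and declination) can be sketched as simple computations over per-sentence measurements. This is an illustrative sketch, not the feature extraction used in the study: the `Sentence` record, the function names and all numeric values are assumptions, and mean F0 per sentence stands in for the richer F0, intensity and timing measures mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    start: float    # start time in seconds
    end: float      # end time in seconds
    mean_f0: float  # mean F0 in Hz

def boundary_features(prev, nxt):
    """Difference features across the boundary between two adjacent sentences."""
    return {
        "pause": nxt.start - prev.end,           # silence between the sentences
        "f0_reset": nxt.mean_f0 - prev.mean_f0,  # positive value = upward reset
    }

def declination_slope(sentences):
    """Least-squares slope of mean F0 over sentence index (Hz per sentence).

    Needs at least two sentences.
    """
    n = len(sentences)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(s.mean_f0 for s in sentences) / n
    num = sum((x - mean_x) * (s.mean_f0 - mean_y) for x, s in zip(xs, sentences))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Invented values for illustration only
paragraph = [
    Sentence(0.0, 3.0, 200.0),
    Sentence(3.5, 6.0, 190.0),
    Sentence(6.5, 9.0, 180.0),
]
next_paragraph_first = Sentence(10.5, 13.0, 205.0)

within = boundary_features(paragraph[0], paragraph[1])
across = boundary_features(paragraph[-1], next_paragraph_first)
slope = declination_slope(paragraph)
```

With these made-up numbers the paragraph break shows a longer pause and an upward F0 reset, while F0 falls steadily inside the paragraph.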

Information/Communicative
structure
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner

Theoretical background - Motivation
• Influence of information structure on
intonation
• Steedman’s theory relating
– Theme/rheme
– Intonation patterns

Theoretical background - Handicaps
• Based on short sentences with a simple structure
and a default word order (SVO for English)
• What if we have…

ToBI labels
Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (the starred tones: H*, L*)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
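
The label categories above are distinguished by their diacritics, which a few string checks can illustrate. This is a rough sketch under the assumption that only the simple labels listed on this slide occur; the function name is hypothetical, and combined or downstepped labels used in full ToBI annotation are not handled.

```python
def classify_tobi(label):
    """Classify a simple ToBI tone label by its diacritic."""
    if label.endswith("%"):
        return "boundary tone"          # H%, L%
    if label.endswith("-"):
        return "phrase accent"          # H-, L-
    if "*" in label:
        # A "+" joins the two tones of a bitonal accent, e.g. L+H*
        return "bitonal pitch accent" if "+" in label else "pitch accent"
    return "unknown"

labels = ["H*", "L+H*", "L-", "H%"]
categories = [classify_tobi(label) for label in labels]
```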

Theoretical background – Our work
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures

Theoretical background - Mel’čuk
Steedman
• Linearity
• Intonation ~ theme/rheme
Mel’čuk
• Hierarchy
• Intonation ~ Thematicity
– theme/rheme
– specifiers
– embeddedness
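
The contrast between a linear and a hierarchical analysis can be pictured as a flat list of spans versus spans that may contain further spans. The sketch below is only an illustration: the `Span` class, the example sentence and its annotation are invented here and are not taken from the corpus or from Mel’čuk’s or Steedman’s own examples.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    label: str                                    # "theme", "rheme" or "specifier"
    text: str
    children: list = field(default_factory=list)  # embedded spans, if any

def walk(span, depth=0):
    """Yield (depth, label, text) for a span and everything embedded in it."""
    yield depth, span.label, span.text
    for child in span.children:
        yield from walk(child, depth + 1)

# Invented annotation, for illustration only
sentence = [
    Span("specifier", "According to the report,"),
    Span("theme", "the company that bought the startup", children=[
        Span("theme", "that"),
        Span("rheme", "bought the startup"),
    ]),
    Span("rheme", "posted record profits."),
]

rows = [row for span in sentence for row in walk(span)]
top_level = [label for depth, label, _ in rows if depth == 0]
embedded_themes = [text for depth, label, text in rows if depth > 0 and label == "theme"]
```

A flat analysis would see only the top-level labels; the nested form also keeps the embedded theme and rheme available for intonation modelling.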

Preliminary experiments
• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity

Validating the classic interface
• To what extent can the classic approaches be applied
to general discourse with more complex sentences?
• Examples matching the expected THEME patterns…
… but not the expected RHEMES.

Validating the classic interface
• We have found that…
– Themes usually match, although ~40% do not.
– Steedman’s approach of placing everything apart
from the theme in a flat rheme span lacks
accuracy.
• We need a more accurate IS-prosody
interface.

Towards a more accurate IS-Prosody
interface
• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite
thematicity structure, we will be able to:
• Propose a more accurate model of the
intonation-thematicity correlation for the ~40%
non-coincident patterns in theme spans.
• Find a justification for the discrepancies observed in the
rheme patterns.

Towards a more accurate IS-Prosody interface
• Specifier
Example with the annotation suggested by Mel’čuk (1)

Towards a more accurate IS-Prosody interface
• Specifier
Example with the annotation suggested by Mel’čuk (2)

Towards a more accurate IS-Prosody interface
• Hierarchy
Example with the annotation suggested by Mel’čuk
rising pattern ↔ theme
Embedded themes behave as main themes in terms of intonation.

Classification experiments
• Combining Acoustic and Linguistic Levels in
Phrase-Oriented Prosody Modelling

• Testing acoustic parameters

• Testing linguistic features

Conclusions
• Information Structure determines the
“communicative” segmentation of the
meaning of an utterance.
• Central to the semantics-syntax-intonation
interface, and to NLP.

Conclusions
• Descriptive study attempting to determine
which intonation patterns better characterize
thematicity in real utterances.
• Flat theme/rheme interpretation prevailing in
classical approaches fails to explain complex
linguistic structures.
• Hierarchical structures and specifiers
yield positive results.

Prosody & discourse structure
• Rhetorical Structure Theory (RST)
(Mann & Thompson, 1988)
Describes the organizational structure of texts by
defining relations between two text spans,
the nucleus (N) and the satellite (S)

Conclusions
• Prosody prediction from:
• Type of sentence
• Discourse structure
• Discourse markers
• Information structure
… to improve expressiveness and naturalness
of automatically generated speech

Thank you for your attention!
