Using Paragraph- and Discourse-based
Prosodic Cues to Improve Speech
Synthesis Expressiveness
Mireia Farrús
AI With the Best, 25/09/2016
Outline
Over the last decade, automatically
generated speech has significantly
improved in terms of voice quality and
expressiveness. However, multi-sentential
synthesized speech still suffers from a high
degree of unnaturalness.
Outline
To overcome this, an approach that is more aware of
paragraph and communicative structure
is needed to make real improvements in
speech synthesis.
Text-to-Speech (TTS) Systems
Context: current TTS systems
• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of
current/preceding/following syllables
• Distance from stressed/accented syllables
• POS of current/preceding/following word
• Length of current/preceding/following
phrase
• End tone of phrase
• Length of utterance measured in
syllables/words/phrases
(King, 2010)
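As a rough illustration of the feature lists above, the context of a single segment could be bundled into one record before it is handed to a synthesis model. This is only a sketch under stated assumptions, not the feature set of any particular TTS system: the class name `SegmentContext`, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class SegmentContext:
    """Hypothetical container for the context of one segment."""
    phoneme: str
    prev_phoneme: str
    next_phoneme: str
    pos_in_syllable: int      # position of segment in syllable
    syll_pos_in_word: int     # position of syllable in word
    word_pos_in_phrase: int   # position of word in phrase
    stressed: bool            # stress of the current syllable
    pos_tag: str              # part of speech of the current word
    phrase_len_words: int     # length of the current phrase
    utt_len_words: int        # length of the utterance


# Made-up values, for illustration only.
ctx = SegmentContext("ae", "k", "t", 2, 1, 3, True, "NN", 5, 12)
features = asdict(ctx)  # plain dict, ready to be vectorised
```

Every field in this sketch is scoped to the utterance or to something smaller than it.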
BUT human speech also relies on…
• Paragraph structure
• Communicative structure
• Discourse structure
Paragraph structure
• “Paragraph-based Prosodic Cues for Speech
Synthesis Applications”.
Mireia Farrús, Catherine Lai, Johanna D. Moore
Prosody & Paragraph Structure
• ~ 1400 TED talks
Paragraph structure - Conclusions
• There is clear evidence of prosodic resets over paragraph breaks
• We can also observe a steady declination in prosodic level over the
paragraph
• Difference features are more discriminative of boundaries than
sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure
that can be described in terms of relative changes in F0, intensity,
and timing
• The classification experiments support the idea that utterance-intrinsic
features indicating paragraph position exist
• Pause duration is the most robust predictor of paragraph breaks
→ We should be able to employ paragraph declination, pause and
prosodic reset features to improve the naturalness of longer
synthesized speech
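The three cues named in these conclusions (pause, prosodic reset, and declination) can be sketched as simple computations over per-sentence measurements. The code below is a minimal illustration and not the procedure used in the study: the `Sentence` record, its fields, and the numbers are hypothetical, and mean F0 per sentence is assumed to have been extracted already.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    start: float      # seconds
    end: float        # seconds
    mean_f0: float    # Hz, assumed already extracted
    para_id: int      # hypothetical paragraph index


def difference_features(sents):
    """Pause and F0 change between each sentence and the one before it."""
    rows = []
    for prev, cur in zip(sents, sents[1:]):
        rows.append({
            "pause": cur.start - prev.end,
            "f0_reset": cur.mean_f0 - prev.mean_f0,
            "para_break": cur.para_id != prev.para_id,
        })
    return rows


def declination_slope(f0_values):
    """Least-squares slope of mean F0 over sentence index (Hz per sentence).

    Needs at least two values.
    """
    n = len(f0_values)
    mean_x = (n - 1) / 2
    mean_y = sum(f0_values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(f0_values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Made-up values, for illustration only.
sents = [
    Sentence(0.0, 2.0, 220.0, 0),
    Sentence(2.3, 4.0, 210.0, 0),
    Sentence(4.2, 6.0, 200.0, 0),
    Sentence(7.0, 9.0, 225.0, 1),
]
rows = difference_features(sents)
slope = declination_slope([s.mean_f0 for s in sents if s.para_id == 0])
```

In this toy data the first paragraph declines steadily, and the paragraph break shows both the longest pause and an upward F0 reset. The same difference features could serve as inputs to a boundary classifier.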
Information/Communicative
structure
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
Theoretical background - Motivation
• Influence of information structure on
intonation
• Steedman’s theory relating
– Theme/rheme
– Intonation patterns
Theoretical background - Handicaps
• Based on short sentences with a simple structure
and a default word order (SVO for English)
• What if we have…
ToBI labels
Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (starred tones such as H* and L*)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
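The label categories listed above can be told apart by their shape alone, which a short classifier makes concrete. This is a minimal sketch that covers only the label forms named on this slide; it ignores break indices and downstep, and the function name `classify_tobi` is hypothetical.

```python
import re


def classify_tobi(label):
    """Return the ToBI tone category of a single label, or 'unknown'."""
    if re.fullmatch(r"[HL]%", label):
        return "boundary tone"
    if re.fullmatch(r"[HL]-", label):
        return "phrase accent"
    # Bitonal accents: the star may sit on either tone (L+H*, L*+H).
    if re.fullmatch(r"[HL]\+[HL]\*|[HL]\*\+[HL]", label):
        return "bitonal pitch accent"
    if re.fullmatch(r"[HL]\*", label):
        return "pitch accent"
    return "unknown"


categories = [classify_tobi(t) for t in ["H*", "L+H*", "L-", "H%"]]
```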
Theoretical background – Our work
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures
Theoretical background - Mel’čuk
Steedman
• Linearity
• Intonation ~ theme/rheme
Mel’čuk
• Hierarchy
• Intonation ~ Thematicity
– theme/rheme
– specifiers
– embeddedness
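The contrast between a linear and a hierarchical view can be pictured as a flat list of spans versus a tree of spans. The sketch below is only a data-structure illustration, not an implementation of either theory: the class `ThematicSpan`, the label strings, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThematicSpan:
    """Hypothetical node in a hierarchical thematicity annotation."""
    label: str                                   # e.g. "theme", "rheme", "specifier"
    children: list = field(default_factory=list)


def walk(span, depth=0):
    """Yield (label, depth) for a span and every span embedded in it."""
    yield span.label, depth
    for child in span.children:
        yield from walk(child, depth + 1)


# Made-up annotation: a rheme that itself contains a theme and a rheme.
sentence = ThematicSpan("proposition", [
    ThematicSpan("specifier"),
    ThematicSpan("theme"),
    ThematicSpan("rheme", [ThematicSpan("theme"), ThematicSpan("rheme")]),
])
labels = list(walk(sentence))
theme_depths = [depth for label, depth in labels if label == "theme"]
```

A flat analysis would see only the spans at depth 1, whereas the tree also exposes the theme embedded inside the rheme.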
Preliminary experiments
• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity
Validating the classic interface
• To what extent can the classic approaches be applied
to general discourse with more complex sentences?
• Examples matching the expected THEME patterns…
… but not the expected RHEMES.
Validating the classic interface
• We have found that…
– Themes usually match, although ~40% do not.
– Steedman’s approach of grouping everything apart
from the theme into a flat rheme span lacks
accuracy.
• We need a more accurate IS-prosody
interface.
Towards a more accurate IS-Prosody
interface
• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite
thematicity structure, we will be able to:
• Propose a more accurate model of the
intonation-thematicity correlation for the ~40% of
non-coincident patterns in theme spans.
• Find a justification for the discrepancies observed in the
rheme patterns.
Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (1)
Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (2)
Towards a more accurate IS-Prosody
interface
• Hierarchy
Example with the annotation suggested by Mel’čuk
rising pattern ↔ theme
Embedded themes behave as main themes in terms of intonation.
Classification experiments
• Combining Acoustic and Linguistic Levels in
Phrase-Oriented Prosody Modelling
Classification experiments
• Testing acoustic parameters
Classification experiments
• Testing linguistic features
Conclusions
• Information Structure determines the
“communicative” segmentation of the
meaning of an utterance.
• It is central to the semantics-syntax-intonation
interface, and to NLP.
• Descriptive study attempting to determine
which intonation patterns better characterize
thematicity in real utterances.
• Flat theme/rheme interpretation prevailing in
classical approaches fails to explain complex
linguistic structures.
• Hierarchical structures and specifiers
yield positive results.
Prosody & discourse structure
• Rhetorical Structure Theory (RST)
(Mann & Thompson, 1988)
Describes the organizational structure of texts by
defining relations between two text spans,
the nucleus (N) and the satellite (S)
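A relation between a nucleus and a satellite maps naturally onto a small record, which a prosody predictor could then read span by span. The following is a minimal sketch under stated assumptions: the class `RstRelation`, the helper `labelled_spans`, and the example sentences are hypothetical, and only a single relation between two spans is modelled.

```python
from dataclasses import dataclass


@dataclass
class RstRelation:
    """Hypothetical record for one RST relation between two text spans."""
    name: str        # relation name, e.g. "Elaboration"
    nucleus: str     # text of the nucleus (N)
    satellite: str   # text of the satellite (S)


def labelled_spans(rel):
    """Return each span with its role, nucleus first."""
    return [("N", rel.nucleus), ("S", rel.satellite)]


# Made-up example text, for illustration only.
rel = RstRelation(
    "Elaboration",
    "The talk is about speech synthesis.",
    "It focuses on prosody.",
)
spans = labelled_spans(rel)
```

The role tag and the relation name are the kind of discourse-level information that is absent from purely sentence-level context features.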
Conclusions
• Prosody prediction from:
• Type of sentence
• Discourse structure
• Discourse markers
• Information structure
… to improve expressiveness and naturalness
of automatically generated speech
Thank you for your attention!

More Related Content

Similar to Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrus

2007 CogSci 2020 poster
2007 CogSci 2020 poster2007 CogSci 2020 poster
2007 CogSci 2020 poster
WarNik Chow
 
Qualitative analysis part 2
Qualitative analysis part 2Qualitative analysis part 2
Qualitative analysis part 2
ziarra advincula
 
Keynote new convergences between natural language processing and knowledge ...
Keynote   new convergences between natural language processing and knowledge ...Keynote   new convergences between natural language processing and knowledge ...
Keynote new convergences between natural language processing and knowledge ...
semanticsconference
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
Universidad Nacional de San Martin
 
Digital Language Resources_2024.pdf detailed
Digital Language Resources_2024.pdf detailedDigital Language Resources_2024.pdf detailed
Digital Language Resources_2024.pdf detailed
stu2203598065
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
Seth Grimes
 
Evaluation EssayAssignmentWe have the opportunity to select.docx
Evaluation EssayAssignmentWe have the opportunity to select.docxEvaluation EssayAssignmentWe have the opportunity to select.docx
Evaluation EssayAssignmentWe have the opportunity to select.docx
turveycharlyn
 
5810 day 3 sept 20 2014
5810 day 3 sept 20 2014 5810 day 3 sept 20 2014
5810 day 3 sept 20 2014
SVTaylor123
 
Digging Deeper into the Common Core
Digging Deeper into the Common CoreDigging Deeper into the Common Core
Digging Deeper into the Common Core
National Resource Center for Paraprofessionals
 
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
Jinho Choi
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Toine Bogers
 
Dr. N K Swain’s research prescription for LIS novices
Dr. N K Swain’s research prescription for LIS novices Dr. N K Swain’s research prescription for LIS novices
Dr. N K Swain’s research prescription for LIS novices
Prof. Nirmal Kumar Swain
 
Academic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDFAcademic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDF
Dustin Pytko
 
dialogue act modeling for automatic tagging and recognition
 dialogue act modeling for automatic tagging and recognition dialogue act modeling for automatic tagging and recognition
dialogue act modeling for automatic tagging and recognition
Vipul Munot
 
Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007
Pascual Pérez-Paredes
 

Similar to Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrus (20)

2007 CogSci 2020 poster
2007 CogSci 2020 poster2007 CogSci 2020 poster
2007 CogSci 2020 poster
 
Incrementality
IncrementalityIncrementality
Incrementality
 
Qualitative analysis part 2
Qualitative analysis part 2Qualitative analysis part 2
Qualitative analysis part 2
 
Keynote new convergences between natural language processing and knowledge ...
Keynote   new convergences between natural language processing and knowledge ...Keynote   new convergences between natural language processing and knowledge ...
Keynote new convergences between natural language processing and knowledge ...
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Digital Language Resources_2024.pdf detailed
Digital Language Resources_2024.pdf detailedDigital Language Resources_2024.pdf detailed
Digital Language Resources_2024.pdf detailed
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Evaluation EssayAssignmentWe have the opportunity to select.docx
Evaluation EssayAssignmentWe have the opportunity to select.docxEvaluation EssayAssignmentWe have the opportunity to select.docx
Evaluation EssayAssignmentWe have the opportunity to select.docx
 
5810 day 3 sept 20 2014
5810 day 3 sept 20 2014 5810 day 3 sept 20 2014
5810 day 3 sept 20 2014
 
Digging Deeper into the Common Core
Digging Deeper into the Common CoreDigging Deeper into the Common Core
Digging Deeper into the Common Core
 
Chp 9
Chp 9Chp 9
Chp 9
 
Chp 9
Chp 9 Chp 9
Chp 9
 
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
Text-based Speaker Identification on Multiparty Dialogues Using Multi-documen...
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Dr. N K Swain’s research prescription for LIS novices
Dr. N K Swain’s research prescription for LIS novices Dr. N K Swain’s research prescription for LIS novices
Dr. N K Swain’s research prescription for LIS novices
 
Convert to journal
Convert to journalConvert to journal
Convert to journal
 
Academic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDFAcademic Phrasebank Navigable PDF
Academic Phrasebank Navigable PDF
 
dialogue act modeling for automatic tagging and recognition
 dialogue act modeling for automatic tagging and recognition dialogue act modeling for automatic tagging and recognition
dialogue act modeling for automatic tagging and recognition
 
Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007Sacodeyl Birmingham 2007
Sacodeyl Birmingham 2007
 

More from WithTheBest

Riccardo Vittoria
Riccardo VittoriaRiccardo Vittoria
Riccardo Vittoria
WithTheBest
 
Recreating history in virtual reality
Recreating history in virtual realityRecreating history in virtual reality
Recreating history in virtual reality
WithTheBest
 
Engaging and sharing your VR experience
Engaging and sharing your VR experienceEngaging and sharing your VR experience
Engaging and sharing your VR experience
WithTheBest
 
How to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie StudioHow to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie Studio
WithTheBest
 
Mixed reality 101
Mixed reality 101 Mixed reality 101
Mixed reality 101
WithTheBest
 
Unlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive TechnologyUnlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive Technology
WithTheBest
 
Building your own video devices
Building your own video devicesBuilding your own video devices
Building your own video devices
WithTheBest
 
Maximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unityMaximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unity
WithTheBest
 
Wizdish rovr
Wizdish rovrWizdish rovr
Wizdish rovr
WithTheBest
 
Haptics & amp; null space vr
Haptics & amp; null space vrHaptics & amp; null space vr
Haptics & amp; null space vr
WithTheBest
 
How we use vr to break the laws of physics
How we use vr to break the laws of physicsHow we use vr to break the laws of physics
How we use vr to break the laws of physics
WithTheBest
 
The Virtual Self
The Virtual Self The Virtual Self
The Virtual Self
WithTheBest
 
You dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helpsYou dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helps
WithTheBest
 
Omnivirt overview
Omnivirt overviewOmnivirt overview
Omnivirt overview
WithTheBest
 
VR Interactions - Jason Jerald
VR Interactions - Jason JeraldVR Interactions - Jason Jerald
VR Interactions - Jason Jerald
WithTheBest
 
Japheth Funding your startup - dating the devil
Japheth  Funding your startup - dating the devilJapheth  Funding your startup - dating the devil
Japheth Funding your startup - dating the devil
WithTheBest
 
Transported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estateTransported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estate
WithTheBest
 
Measuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VRMeasuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VR
WithTheBest
 
Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode. Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode.
WithTheBest
 
VR, a new technology over 40,000 years old
VR, a new technology over 40,000 years oldVR, a new technology over 40,000 years old
VR, a new technology over 40,000 years old
WithTheBest
 

More from WithTheBest (20)

Riccardo Vittoria
Riccardo VittoriaRiccardo Vittoria
Riccardo Vittoria
 
Recreating history in virtual reality
Recreating history in virtual realityRecreating history in virtual reality
Recreating history in virtual reality
 
Engaging and sharing your VR experience
Engaging and sharing your VR experienceEngaging and sharing your VR experience
Engaging and sharing your VR experience
 
How to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie StudioHow to survive the early days of VR as an Indie Studio
How to survive the early days of VR as an Indie Studio
 
Mixed reality 101
Mixed reality 101 Mixed reality 101
Mixed reality 101
 
Unlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive TechnologyUnlocking Human Potential with Immersive Technology
Unlocking Human Potential with Immersive Technology
 
Building your own video devices
Building your own video devicesBuilding your own video devices
Building your own video devices
 
Maximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unityMaximizing performance of 3 d user generated assets in unity
Maximizing performance of 3 d user generated assets in unity
 
Wizdish rovr
Wizdish rovrWizdish rovr
Wizdish rovr
 
Haptics & amp; null space vr
Haptics & amp; null space vrHaptics & amp; null space vr
Haptics & amp; null space vr
 
How we use vr to break the laws of physics
How we use vr to break the laws of physicsHow we use vr to break the laws of physics
How we use vr to break the laws of physics
 
The Virtual Self
The Virtual Self The Virtual Self
The Virtual Self
 
You dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helpsYou dont have to be mad to do VR and AR ... but it helps
You dont have to be mad to do VR and AR ... but it helps
 
Omnivirt overview
Omnivirt overviewOmnivirt overview
Omnivirt overview
 
VR Interactions - Jason Jerald
VR Interactions - Jason JeraldVR Interactions - Jason Jerald
VR Interactions - Jason Jerald
 
Japheth Funding your startup - dating the devil
Japheth  Funding your startup - dating the devilJapheth  Funding your startup - dating the devil
Japheth Funding your startup - dating the devil
 
Transported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estateTransported vr the virtual reality platform for real estate
Transported vr the virtual reality platform for real estate
 
Measuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VRMeasuring Behavior in VR - Rob Merki Cognitive VR
Measuring Behavior in VR - Rob Merki Cognitive VR
 
Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode. Global demand for Mixed Realty (VR/AR) content is about to explode.
Global demand for Mixed Realty (VR/AR) content is about to explode.
 
VR, a new technology over 40,000 years old
VR, a new technology over 40,000 years oldVR, a new technology over 40,000 years old
VR, a new technology over 40,000 years old
 

Recently uploaded

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrus

  • 1. Using Paragraph- and Discourse-based Prosodic Cues to Improve Speech Synthesis Expressiveness Mireia Farrús AI With the Best, 25/09/2016
  • 2. Outline: Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness.
  • 3. Outline: To overcome this, an approach that is more aware of paragraph and communicative structure is needed to make real improvements in speech synthesis.
  • 4. Text-to-Speech (TTS) Systems
  • 5. TTS systems
  • 6. TTS systems
  • 7. Context: current TTS systems • Preceding and following phonemes • Position of segment in syllable • Position of syllable in word & phrase • Position of word in phrase • Stress/accent/length features of current/preceding/following syllables • Distance from stressed/accented syllables
  • 8. Context: current TTS systems • POS of current/preceding/following word • Length of current/preceding/following phrase • End tone of phrase • Length of utterance measured in syllables/words/phrases (King, 2010)
  • 9. BUT human speech also relies on… • Paragraph structure • Communicative structure • Discourse structure
  • 10. Paragraph structure • “Paragraph-based Prosodic Cues for Speech Synthesis Applications”. Mireia Farrús, Catherine Lai, Johanna D. Moore
  • 11. Paragraph structure
  • 12. Paragraph structure
  • 14. Prosody & Paragraph Structure • ~1400 TED talks
  • 17. Conclusions: Paragraph structure • There is clear evidence of prosodic resets over paragraph breaks • We can also observe a steady declination in prosodic level over the paragraph • Difference features are more discriminative of boundaries than sentence-based features • Paragraphs have an identifiable suprasentential prosodic structure that can be described in terms of relative changes in F0, intensity, and timing • The classification experiments support the idea that utterance-intrinsic features tied to paragraph position exist • Pause duration is the most robust predictor of paragraph breaks → We should be able to employ paragraph declination, pause and prosodic reset features to improve the naturalness of longer synthesized speech
  • 18. Information/Communicative structure • “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner
  • 19. Theoretical background - Motivation • Influence of information structure on intonation • Steedman’s theory relating – Theme/rheme – Intonation patterns
  • 20. Theoretical background - Handicaps • Based on short sentences with a simple structure and a default word order (SVO for English) • What if we have…
  • 21. ToBI labels (Tones and Break Indices) • high (H) and low (L) tones • pitch accents (the L* tones) • bitonal pitch accents (L+H*, etc.) • phrase accents (H- and L- tones) • boundary tones (H% and L%)
  • 22. Theoretical background – Our work • “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner • Objectives – Validate Steedman’s theory – Proposal for more complex syntactic structures
  • 23. Theoretical background - Mel’čuk • Steedman – Linearity – Intonation ~ theme/rheme • Mel’čuk – Hierarchy – Intonation ~ Thematicity (theme/rheme, specifiers, embeddedness)
  • 24. Preliminary experiments • Wall Street Journal corpus (Penn Treebank) • American English recordings • Native speakers • 109 sentences • AuToBI labelling + reduction model • Manual annotation of Thematicity
  • 25. Validating the classic interface • To what extent can the classic approaches be applied to general discourse with more complex sentences? • Examples matching the expected THEME patterns… but not the expected RHEMES.
  • 26. Validating the classic interface • We have found that… – Themes usually match, although ~40% do not. – Steedman’s approach of including everything apart from the theme in a flat rheme span lacks accuracy. • We need a more accurate IS-prosody interface.
  • 27. Towards a more accurate IS-Prosody interface • Our hypothesis: – Applying Mel’čuk’s hierarchical tripartite thematicity structure, we will be able to: • Propose a more accurate model of the intonation-thematicity correlation for the ~40% non-coincident patterns in theme spans. • Find a justification for the discrepancies observed in the rheme patterns.
  • 28. Towards a more accurate IS-Prosody interface • Specifier: example with the annotation suggested by Mel’čuk (1)
  • 29. Towards a more accurate IS-Prosody interface • Specifier: example with the annotation suggested by Mel’čuk (2)
  • 30. Towards a more accurate IS-Prosody interface • Hierarchy: example with the annotation suggested by Mel’čuk • Rising pattern ↔ theme • Embedded themes behave as main themes in terms of intonation.
  • 31. Classification experiments • Combining Acoustic and Linguistic Levels in Phrase-Oriented Prosody Modelling
  • 32. Classification experiments • Testing acoustic parameters
  • 33. Classification experiments • Testing linguistic features
  • 34. Conclusions • Information Structure determines the “communicative” segmentation of the meaning of an utterance. • It is central to the semantics-syntax-intonation interface, and to NLP.
  • 35. Conclusions • Descriptive study attempting to determine which intonation patterns better characterize thematicity in real utterances. • The flat theme/rheme interpretation prevailing in classical approaches fails to explain complex linguistic structures. • Hierarchical structures and specifiers yield positive results.
  • 36. Prosody & discourse structure • Rhetorical Structure Theory (RST) (Mann & Thompson, 1988) • Describes the organizational structure of texts by defining relations between two text spans, the nucleus (N) and the satellite (S)
  • 37. Conclusions • Prosody prediction from: – Type of sentence – Discourse structure – Discourse markers – Information structure • … to improve expressiveness and naturalness of automatically generated speech
  • 38. Thank you for your attention!
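
The paragraph-structure conclusions (slide 17) name three cues: pause duration, prosodic reset, and difference features computed across a sentence boundary. The Python sketch below illustrates how such difference features might be computed and used to flag candidate paragraph breaks. It is not the method used in the talk: the `Sentence` fields, the function names, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    """Per-sentence prosodic summary (hypothetical input format)."""
    mean_f0_hz: float         # mean fundamental frequency over the sentence
    mean_intensity_db: float  # mean intensity over the sentence
    pause_before_s: float     # silence preceding the sentence, in seconds


def difference_features(prev: Sentence, cur: Sentence) -> Dict[str, float]:
    """Relative changes across the boundary between two adjacent sentences."""
    return {
        "f0_reset_hz": cur.mean_f0_hz - prev.mean_f0_hz,
        "intensity_reset_db": cur.mean_intensity_db - prev.mean_intensity_db,
        "pause_s": cur.pause_before_s,
    }


def candidate_breaks(
    sentences: List[Sentence],
    min_pause_s: float = 0.8,       # assumed threshold, not from the talk
    min_f0_reset_hz: float = 10.0,  # assumed threshold, not from the talk
) -> List[int]:
    """Return indices of sentences that may open a new paragraph.

    A boundary is flagged when a long pause coincides with an upward F0 reset.
    """
    breaks = []
    for i in range(1, len(sentences)):
        feats = difference_features(sentences[i - 1], sentences[i])
        if feats["pause_s"] >= min_pause_s and feats["f0_reset_hz"] >= min_f0_reset_hz:
            breaks.append(i)
    return breaks


# Toy data: F0 declines over three sentences, then resets after a long pause.
talk = [
    Sentence(220.0, 70.0, 0.0),
    Sentence(210.0, 69.0, 0.3),
    Sentence(200.0, 68.0, 0.4),
    Sentence(225.0, 71.0, 1.2),
    Sentence(215.0, 70.0, 0.3),
]
print(candidate_breaks(talk))
```

A fixed-threshold rule is used here only to keep the sketch short; a trained classifier over the same difference features would be closer in spirit to the classification experiments mentioned on slide 17.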