Using Paragraph- and Discourse-based
Prosodic Cues to Improve Speech
Synthesis Expressiveness
Mireia Farrús
AI With the Best, 25/09/2016
Outline
Over the last decade, automatically
generated speech has significantly
improved in terms of voice quality and
expressiveness. However, multi-sentential
synthesized speech still suffers from a high
degree of unnaturalness.
To overcome this, an approach that is more aware of paragraph and
communicative structure is needed to make real improvements in
speech synthesis.
Text-to-Speech (TTS) Systems
Context: current TTS systems
• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of
current/preceding/following syllables
• Distance from stressed/accented syllables
• POS of current/preceding/following word
• Length of current/preceding/following
phrase
• End tone of phrase
• Length of utterance measured in
syllables/words/phrases
(King, 2010)
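The context features listed above can be pictured as one record per phone. The following is a minimal sketch only: the class `PhoneContext`, its field names, and the `key=value` label format are hypothetical choices made for illustration, not the feature set or label format of any particular TTS system.

```python
from dataclasses import dataclass, asdict


@dataclass
class PhoneContext:
    """Hypothetical per-phone context record (illustrative field names)."""
    phone: str
    prev_phone: str
    next_phone: str
    pos_in_syllable: int          # position of the segment in its syllable
    syllable_pos_in_word: int     # position of the syllable in the word
    word_pos_in_phrase: int       # position of the word in the phrase
    stressed: bool                # stress of the current syllable
    syllables_since_stress: int   # distance from the last stressed syllable
    pos_tag: str                  # part of speech of the current word
    phrase_len_words: int         # length of the current phrase
    phrase_end_tone: str          # end tone of the phrase
    utterance_len_words: int      # length of the whole utterance


def to_label(ctx: PhoneContext) -> str:
    """Flatten the record into one 'key=value|key=value' string."""
    return "|".join(f"{key}={value}" for key, value in asdict(ctx).items())


# Toy values for the vowel of "cat" in a three-word utterance.
example = PhoneContext(
    phone="ae", prev_phone="k", next_phone="t",
    pos_in_syllable=2, syllable_pos_in_word=1, word_pos_in_phrase=2,
    stressed=True, syllables_since_stress=0, pos_tag="NN",
    phrase_len_words=3, phrase_end_tone="L%", utterance_len_words=3,
)
print(to_label(example))
```

Note that every field describes the current sentence at most; nothing in such a record refers to the paragraph or the discourse.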
BUT human speech also relies on…
• Paragraph structure
• Communicative structure
• Discourse structure
Paragraph structure
• “Paragraph-based Prosodic Cues for Speech
Synthesis Applications”.
Mireia Farrús, Catherine Lai, Johanna D. Moore
Prosody & Paragraph Structure
• ~ 1400 TED talks
Paragraph structure - Conclusions
• There is clear evidence of prosodic resets over paragraph breaks
• We can also observe a steady declination in prosodic level over the
paragraph
• Difference features are more discriminative of boundaries than
sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure
that can be described in terms of relative changes in F0, intensity,
and timing
• The classification experiments support the existence of
utterance-intrinsic features that indicate paragraph position
• Pause duration is the most robust predictor of paragraph breaks
→ We should be able to employ paragraph declination, pause and
prosodic reset features to improve the naturalness of longer
synthesized speech
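The cues summarized on this slide (pause duration, prosodic reset, difference features across a boundary) can be sketched as a simple rule-based baseline. This is an illustration under stated assumptions, not the classifier used in the study: the `Sentence` fields, the function names, the 1.0 s pause threshold, and the toy values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    mean_f0_hz: float         # mean F0 of the sentence
    mean_intensity_db: float  # mean intensity of the sentence
    pause_before_s: float     # silence preceding the sentence


def boundary_features(prev: Sentence, cur: Sentence) -> Dict[str, float]:
    """Difference features across one sentence boundary."""
    return {
        "f0_reset": cur.mean_f0_hz - prev.mean_f0_hz,
        "intensity_reset": cur.mean_intensity_db - prev.mean_intensity_db,
        "pause": cur.pause_before_s,
    }


def predict_breaks(sentences: List[Sentence],
                   pause_threshold_s: float = 1.0) -> List[int]:
    """Indices of sentences predicted to open a new paragraph.

    Assumed rule: a long pause together with an upward F0 reset.
    """
    breaks = []
    for i in range(1, len(sentences)):
        feats = boundary_features(sentences[i - 1], sentences[i])
        if feats["pause"] >= pause_threshold_s and feats["f0_reset"] > 0:
            breaks.append(i)
    return breaks


# Toy data: F0 declines over three sentences, then resets after a long pause.
talk = [
    Sentence(210.0, 68.0, 0.0),
    Sentence(195.0, 66.0, 0.4),
    Sentence(182.0, 65.0, 0.5),
    Sentence(215.0, 69.0, 1.3),
    Sentence(200.0, 67.0, 0.3),
]
print(predict_breaks(talk))
```

For synthesis the same features would be used in the opposite direction: given known paragraph breaks in the text, insert a longer pause and a pitch reset at each break and let F0 decline within the paragraph.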
Information/Communicative
structure
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
Theoretical background - Motivation
• Influence of information structure on
intonation
• Steedman’s theory relating
– Theme/rheme
– Intonation patterns
Theoretical background - Handicaps
• Based on short sentences with a simple structure
and a default word order (SVO for English)
• What if we have…
ToBI labels
Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (H* and L* tones)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
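The label categories above can be told apart by their diacritics: `*` marks a pitch accent, `+` a bitonal one, `-` a phrase accent, and `%` a boundary tone. A minimal sketch, assuming one tone label per string; the function name `classify_tobi` is hypothetical and combined labels such as `L-H%` are not split here.

```python
def classify_tobi(label: str) -> str:
    """Classify a single ToBI tone label by its diacritic."""
    label = label.strip()
    if label.endswith("%"):
        return "boundary tone"
    if label.endswith("-"):
        return "phrase accent"
    if "*" in label:
        # A '+' joins two tones into one bitonal pitch accent, e.g. L+H*.
        return "bitonal pitch accent" if "+" in label else "pitch accent"
    return "unknown"


for tone in ["H*", "L*", "L+H*", "H-", "L-", "H%", "L%"]:
    print(tone, "->", classify_tobi(tone))
```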
Theoretical background – Our work
• “The Information Structure-Prosody Language
Interface Revisited”.
Mónica Domínguez, Mireia Farrús,
Alicia Burga, Leo Wanner
• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures
Theoretical background - Mel’čuk
Steedman
• Linearity
• Intonation ~ theme/rheme
Mel’čuk
• Hierarchy
• Intonation ~ Thematicity
– theme/rheme
– specifiers
– embeddedness
Preliminary experiments
• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity
Validating the classic interface
• To what extent can the classic approaches be applied
to general discourse with more complex sentences?
• Examples matching the expected THEME patterns…
… but not the expected RHEMES.
Validating the classic interface
• We have found that…
– Themes usually match, although ~40% do not.
– Steedman’s approach of placing everything apart
from the theme into a flat rheme span lacks
accuracy.
• We need a more accurate IS-prosody
interface.
Towards a more accurate IS-Prosody
interface
• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite
thematicity structure, we will be able to:
• Propose a more accurate model of the
intonation-thematicity correlation for the ~40% of
theme spans whose patterns do not coincide.
• Find a justification for the discrepancies observed in the
rheme patterns.
Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (1)
Towards a more accurate IS-Prosody
interface
• Specifier
Example with the annotation suggested by Mel’čuk (2)
Towards a more accurate IS-Prosody
interface
• Hierarchy
Example with the annotation suggested by Mel’čuk
rising pattern ↔ theme
Embedded themes behave as main themes in terms of intonation.
Classification experiments
• Combining Acoustic and Linguistic Levels in
Phrase-Oriented Prosody Modelling
Classification experiments
• Testing acoustic parameters
Classification experiments
• Testing linguistic features
Conclusions
• Information Structure determines the
“communicative” segmentation of the
meaning of an utterance.
• Central to the semantics-syntax-intonation
interface, and to NLP.
• Descriptive study attempting to determine
which intonation patterns better characterize
thematicity in real utterances.
• Flat theme/rheme interpretation prevailing in
classical approaches fails to explain complex
linguistic structures.
• Hierarchical structures and specifiers
yield positive results.
Prosody & discourse structure
• Rhetorical Structure Theory (RST)
(Mann & Thompson, 1988)
Describes the organizational structure of texts via
definitions of relations between two text spans,
nucleus (N) and satellite (S)
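An RST relation as described above can be held in a small record that links a nucleus span to a satellite span. A minimal sketch: the class names `Span` and `Relation`, the helper `describe`, and the example sentences are hypothetical; "Elaboration" is used only as a sample relation name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A contiguous stretch of text."""
    text: str


@dataclass(frozen=True)
class Relation:
    """One RST relation holding between a nucleus and a satellite."""
    name: str
    nucleus: Span    # N: the more central span
    satellite: Span  # S: the span that supports the nucleus


def describe(rel: Relation) -> str:
    """One-line summary of the relation."""
    return f"{rel.name}(N='{rel.nucleus.text}', S='{rel.satellite.text}')"


rel = Relation(
    name="Elaboration",
    nucleus=Span("Synthesized paragraphs still sound unnatural."),
    satellite=Span("Prosody is usually modelled one sentence at a time."),
)
print(describe(rel))
```

A prosody model could then take the relation name and the N/S role of each span as additional input features.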
Conclusions
• Prosody prediction from:
• Type of sentence
• Discourse structure
• Discourse markers
• Information structure
… to improve expressiveness and naturalness
of automatically generated speech
Thank you for your attention!

Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness - Mireia Farrus

  • 1. Using Paragraph- and Discourse-based Prosodic Cues to Improve Speech Synthesis Expressiveness Mireia Farrús AI With the Best, 25/09/2016
  • 2. Outline 2AI With the Best, 25/09/2016 Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness.
  • 3. Outline 3AI With the Best, 25/09/2016 To overcome it, a more paragraph and communicative structure aware approach is needed to make real improvements in speech synthesis
  • 4. Text-to-Speech (TTS) Systems 4AI With the Best, 25/09/2016
  • 5. TTS systems 5AI With the Best, 25/09/2016
  • 6. TTS systems 6AI With the Best, 25/09/2016
  • 7. Context: current TTS systems 7AI With the Best, 25/09/2016 • Preceding and following phonemes • Position of segment in syllable • Position of syllable in word & phrase • Position of word in phrase • Stress/accent/length features of current/preceding/following syllables • Distance from stressed/accented syllables
  • 8. Context: current TTS systems 8AI With the Best, 25/09/2016 • POS of current/preceding/following word • Length of current/preceding/following phrase • End tone of phrase • Lenght of utterance measured in syllables/words/phrases (King, 2010)
  • 9. BUT human speech also relies on… 9AI With the Best, 25/09/2016 • Paragraph structure • Communicative structure • Discourse structure
  • 10. Paragraph structure 10AI With the Best, 25/09/2016 • “Paragraph-based Prosodic Cues for Speech Synthesis Applications”. Mireia Farrús, Catherine Lai, Johanna D. Moore
  • 11. Paragraph structure 11AI With the Best, 25/09/2016
  • 12. Paragraph structure 12AI With the Best, 25/09/2016
  • 13. 13AI With the Best, 25/09/2016
  • 14. Prosody & Pragraph Structure • ~ 1400 TED talks AI With the Best, 25/09/2016 14
  • 15. 15AI With the Best, 25/09/2016
  • 16. AI With the Best, 25/09/2016 16
  • 17. AI With the Best, 25/09/2016 17 • There is clear evidence of prosodic resets over paragraph breaks • We can also observe a steady declination in prosodic level over the paragraph • Difference features are more discriminative of boundaries than sentence-based features • Paragraphs have an identifiable suprasentential prosodic structure that can be described in terms of relative changes in F0, intensity, and timing • The classification experiments support the idea that utterance intrinsic features to paragraph position exist • Pause duration is the most robust predictor of paragraph breaks  We should be able to employ paragraph declination, pause and prosodic reset features to improve the naturalness of longer synthesized speech Conclusions Paragraph structure
  • 18. Information/Communicative structure
      • “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner
  • 19. Theoretical background - Motivation
      • Influence of information structure on intonation
      • Steedman’s theory relating
          – Theme/rheme
          – Intonation patterns
  • 20. Theoretical background - Handicaps
      • Based on short sentences with a simple structure and a default word order (SVO for English)
      • What if we have…
  • 21. ToBI labels: Tones and Break Indices
      • high (H) and low (L) tones
      • pitch accents (the L* tones)
      • bitonal pitch accents (L+H*, etc.)
      • phrase accents (H- and L- tones)
      • boundary tones (H% and L%)
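The label categories on this slide can be told apart by their diacritics: `*` marks a pitch accent, `+` a bitonal accent, `-` a phrase accent and `%` a boundary tone. The sketch below applies that rule to single labels; the function name `classify_tobi_label` is an assumption, and the rule is a simplification that treats a combined label such as `L-L%` by its final element only.

```python
def classify_tobi_label(label: str) -> str:
    """Map a single ToBI tone label to the category its diacritic indicates."""
    if label.endswith("%"):
        return "boundary tone"
    if label.endswith("-"):
        return "phrase accent"
    if "*" in label:
        # A '+' joins the two tones of a bitonal accent such as L+H*.
        return "bitonal pitch accent" if "+" in label else "pitch accent"
    raise ValueError(f"not a recognised tone label: {label!r}")


for tone in ["L*", "L+H*", "H-", "L%"]:
    print(tone, "->", classify_tobi_label(tone))
```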
  • 22. Theoretical background – Our work
      • “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner
      • Objectives
          – Validate Steedman’s theory
          – Proposal for more complex syntactic structures
  • 23. Theoretical background - Mel’čuk
      Steedman
      • Linearity
      • Intonation ~ theme/rheme
      Mel’čuk
      • Hierarchy
      • Intonation ~ Thematicity
          – theme/rheme
          – specifiers
          – embeddedness
  • 24. Preliminary experiments
      • Wall Street Journal corpus (Penn Treebank)
      • American English recordings
      • Native speakers
      • 109 sentences
      • AuToBI labelling + reduction model
      • Manual annotation of Thematicity
  • 25. Validating the classic interface
      • To what extent can the classic approaches be applied to general discourse with more complex sentences?
      • Examples matching the expected THEME patterns… but not the expected RHEMES.
  • 26. Validating the classic interface
      • We have found that…
          – Themes usually match, although ~40% do not.
          – Steedman’s approach of including everything apart from the theme in a flat rheme span lacks accuracy.
      • We need a more accurate IS-prosody interface.
  • 27. Towards a more accurate IS-Prosody interface
      • Our hypothesis:
          – Applying Mel’čuk’s hierarchical tripartite thematicity structure, we will be able to:
              • Propose a more accurate model of the intonation-thematicity correlation for the ~40% non-coincident patterns in theme spans.
              • Find a justification for the discrepancies observed in the rheme patterns.
  • 28. Towards a more accurate IS-Prosody interface
      • Specifier
      Example with the annotation suggested by Mel’čuk (1)
  • 29. Towards a more accurate IS-Prosody interface
      • Specifier
      Example with the annotation suggested by Mel’čuk (2)
  • 30. Towards a more accurate IS-Prosody interface
      • Hierarchy
      Example with the annotation suggested by Mel’čuk
      rising pattern ↔ theme
      Embedded themes behave as main themes in terms of intonation.
  • 31. Classification experiments
      • Combining Acoustic and Linguistic Levels in Phrase-Oriented Prosody Modelling
  • 32. Classification experiments
      • Testing acoustic parameters
  • 33. Classification experiments
      • Testing linguistic features
  • 34. Conclusions
      • Information Structure determines the “communicative” segmentation of the meaning of an utterance.
      • Central to the semantics-syntax-intonation interface, and to NLP.
  • 35. Conclusions
      • Descriptive study attempting to determine which intonation patterns better characterize thematicity in real utterances.
      • The flat theme/rheme interpretation prevailing in classical approaches fails to explain complex linguistic structures.
      • Hierarchical structures and specifiers yield positive results.
  • 36. Prosody & discourse structure
      • Rhetorical Structure Theory (RST) (Mann & Thompson, 1988)
      Describes the organizational structure of texts by defining relations between two text spans, the nucleus (N) and the satellite (S)
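As a rough picture of the nucleus/satellite idea, the sketch below stores one RST relation as a small record. The class names `TextSpan` and `RSTRelation` and the example sentence are assumptions made for illustration; "elaboration" is one of the relation types described by Mann & Thompson, but the analysis of this particular sentence is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    text: str
    start: int  # character offset of the span in the full text
    end: int    # exclusive end offset


@dataclass
class RSTRelation:
    name: str            # relation type, e.g. "elaboration"
    nucleus: TextSpan    # the more central span (N)
    satellite: TextSpan  # the span that supports or expands the nucleus (S)


def make_span(full_text: str, fragment: str) -> TextSpan:
    """Locate a fragment in the full text and wrap it as a span."""
    start = full_text.index(fragment)
    return TextSpan(fragment, start, start + len(fragment))


text = "Synthesized speech still sounds unnatural, especially over several sentences."
relation = RSTRelation(
    name="elaboration",
    nucleus=make_span(text, "Synthesized speech still sounds unnatural"),
    satellite=make_span(text, "especially over several sentences"),
)
print(relation.name, "| N:", relation.nucleus.text, "| S:", relation.satellite.text)
```

A prosody model could then condition on the relation name and on whether a span is a nucleus or a satellite, alongside the sentence-level features.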
  • 37. Conclusions
      • Prosody prediction from:
          • Type of sentence
          • Discourse structure
          • Discourse markers
          • Information structure
      … to improve expressiveness and naturalness of automatically generated speech
  • 38. Thank you for your attention!