Speech Synthesis
Dr. VMS
Speech synthesis
Speech synthesis is the artificial production
of human speech. Current systems aim to sound
close to a human voice, with precise control
over pitch and tone.
An automated or AI-based system designed
for this purpose is called a text-to-speech
synthesizer and can be implemented in
software or hardware.
Architecture of TTS systems
[Figure: block diagram of a TTS pipeline. Text input -> normalization (abbreviation lexicon) -> text in orthographic form -> grapheme-to-phoneme conversion (exceptions lexicon, orthographic rules) -> phoneme string -> prosodic modelling (grammar rules, prosodic model) -> phoneme string + prosodic annotation -> acoustic synthesis (various methods) -> synthetic speech output. The stages up to prosodic modelling form the text-to-phoneme module; acoustic synthesis forms the phoneme-to-speech module.]
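The stages named in the architecture diagram can be sketched as a chain of small functions. This is only a toy illustration under stated assumptions: the two lexicons, the one-symbol-per-letter fallback, and every function name are hypothetical, and the acoustic stage returns a description string instead of audio.

```python
# Toy TTS pipeline mirroring the diagram's stages. All data and names are illustrative.

ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister"}       # toy abbreviation lexicon
EXCEPTIONS = {"doctor": ["d", "aa", "k", "t", "er"],
              "one": ["w", "ah", "n"]}                   # toy exceptions lexicon


def normalize(text):
    """Normalization: lower-case, drop ?/!, and expand abbreviations."""
    tokens = text.lower().replace("?", " ").replace("!", " ").split()
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


def grapheme_to_phoneme(words):
    """G2P: use the exceptions lexicon, else fall back to one symbol per letter
    (a crude stand-in for real orthographic rules)."""
    return [EXCEPTIONS.get(w, [ch for ch in w if ch.isalpha()]) for w in words]


def add_prosody(phoneme_words, is_question):
    """Prosodic modelling: attach a coarse pitch-contour label."""
    contour = "rising" if is_question else "falling"
    phonemes = [p for word in phoneme_words for p in word]
    return {"phonemes": phonemes, "contour": contour}


def acoustic_synthesis(annotated):
    """Placeholder for the phoneme-to-speech module (no waveform is produced)."""
    return f"<{len(annotated['phonemes'])} phonemes, {annotated['contour']} pitch>"


def tts(text):
    """Run the text-to-phoneme stages, then the phoneme-to-speech stage."""
    is_question = text.strip().endswith("?")
    words = normalize(text)
    return acoustic_synthesis(add_prosody(grapheme_to_phoneme(words), is_question))


print(tts("Dr. Smith?"))
```

Each function consumes the previous stage's output, which is the main point of the diagram: text in, phoneme string in the middle, speech out.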
Speech Synthesis
• Speech synthesis is the artificial production of human
speech.
• A synthesizer can incorporate a model of the vocal tract and
other human voice characteristics to create a completely
"synthetic" voice output.
• A computer system used for this purpose is called a speech
computer or speech synthesizer.
• A text-to-speech (TTS) system converts normal language
text into speech; other systems render symbolic linguistic
representations like phonetic transcriptions into speech.
Text-to-speech
• A text-to-speech system (or "engine") is composed of two
parts: a front-end and a back-end.
• The front-end converts raw text containing symbols like
numbers and abbreviations into the equivalent of written-out
words (text normalization, also called pre-processing or
tokenization), then assigns phonetic transcriptions to each
word (grapheme-to-phoneme conversion), and divides and marks
the text into prosodic units, like phrases, clauses, and
sentences.
• The back-end (often referred to as the synthesizer) then
converts the symbolic linguistic representation into sound.
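The first front-end step, turning numbers and abbreviations into written-out words, can be sketched as follows. This is a minimal example, not how any particular engine does it: the abbreviation table is hypothetical and the number speller only covers 0 to 999.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical abbreviation lexicon for the example.
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately", "km": "kilometres"}


def number_to_words(n):
    """Spell out an integer in the range 0-999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Replace known abbreviations and 1-3 digit numbers with written-out words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            words.append(number_to_words(int(token)))
        else:
            words.append(token)
    return " ".join(words)


print(normalize("Dr. Lee ran 42 km"))  # Doctor Lee ran forty-two kilometres
```

A real front-end would also need to handle dates, currency, ordinals, and punctuation attached to tokens, which this sketch ignores.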
Types of voice synthesis systems
• There are two types: text-to-speech and concept-to-speech
synthesis.
• Concept-to-speech synthesis involves a generation
component that generates a textual expression
from semantic, pragmatic and discourse
knowledge. The speech signal can then be
generated from this expression.
• In text-to-speech synthesis, the text to be spoken is
provided; it is not generated by the system. It must,
however, be analyzed and interpreted in order to
convey the proper pronunciation and emphasis.
Speech Synthesis for Translations
• Synthesized speech can be controlled more precisely than human
speech, making it easier to produce an accurate rendition of the
original text.
• It saves time and spares the labor of manual work, which can be
error-prone.
• A translator using speech synthesis does not need to spend time
recording themselves speaking the translated text. This can be a
significant time saving for long or complex texts.
Speech sound variations
• Pitch, length, loudness
• Intonation (pitch)
  ◦ Essential to avoid a monotonous, robot-like voice
  ◦ Linked to basic syntax (e.g. statement vs question), but also to
thematization (stress)
  ◦ Pitch range is a sensitive issue
• Rhythm (length)
  ◦ Has to do with pace (natural tendency to slow down at the end of an
utterance)
  ◦ Also need to pause at appropriate places
  ◦ Linked (with pitch and loudness) to stress
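Two of the points above (pitch depends on statement vs question, and speech slows down at the end of an utterance) can be illustrated with a per-syllable prosody plan. This is a rough sketch only: the function name, the base pitch and duration, and all scaling factors are assumed values chosen for illustration, not measured ones.

```python
def prosody_plan(n_syllables, is_question, base_hz=120.0, base_ms=180.0):
    """Return a (pitch_hz, duration_ms) pair for each syllable.

    Assumed rules: pitch drifts down slightly across the utterance, the last
    syllable rises for a question and falls for a statement, and the last
    syllable is lengthened.
    """
    plan = []
    for i in range(n_syllables):
        position = i / max(n_syllables - 1, 1)      # 0.0 at start, 1.0 at end
        pitch = base_hz * (1.0 - 0.15 * position)   # gentle downward drift
        duration = base_ms
        if i == n_syllables - 1:
            pitch *= 1.4 if is_question else 0.85   # final rise or fall
            duration *= 1.3                         # final lengthening
        plan.append((round(pitch, 1), round(duration)))
    return plan


statement = prosody_plan(4, is_question=False)
question = prosody_plan(4, is_question=True)
print(statement)
print(question)
```

With a flat pitch and equal durations instead, the output would be the monotonous, robot-like voice the slide warns about.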
Synthesis types
Articulatory synthesis
Formant synthesis
Concatenative synthesis
Unit selection synthesis
Articulatory synthesis
• Simulation of physical processes of human articulation
• Wolfgang von Kempelen (1734-1804) and others used
bellows, reeds and tubes to construct mechanical
speaking machines
• Modern versions simulate electronically the effect of
articulator positions, vocal tract shape, etc.
Formant synthesis
• Reproduces the relevant characteristics of the
acoustic signal
• In particular, the amplitude and frequency of
formants
• But also other resonances and noise, e.g. for
nasals, laterals, fricatives, etc.
• Values of acoustic parameters are derived by rule
from the phonetic transcription
• Result is intelligible, but too “pure” and sounds
synthetic
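The idea of building a sound from formant frequencies can be sketched with an impulse-train source passed through a cascade of second-order resonators, one per formant. This is a minimal sketch using only the standard library, not a full rule-based synthesizer: the function names are hypothetical, and the formant and bandwidth values are approximate figures for an open "ah"-like vowel, assumed for illustration.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator of the kind used in Klatt-style
    formant synthesizers: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, f0_hz=110.0, duration_s=0.3, sample_rate=16000):
    """Filter an impulse train (the voicing source) through one resonator per
    (frequency, bandwidth) pair, then normalise the peak amplitude to 1."""
    n = round(duration_s * sample_rate)
    period = int(sample_rate / f0_hz)               # samples per pitch period
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# Assumed, approximate formant values (Hz) and bandwidths (Hz).
samples = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
print(len(samples))
```

Because the source is a perfectly regular impulse train and the filters never vary, the result has the overly "pure" quality the slide describes.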
Concatenative synthesis
• Concatenate segments of pre-recorded natural
human speech
• Requires a database of previously recorded human
speech covering all the possible segments to be
synthesised
• A segment might be a phoneme, syllable, word,
phrase, or any combination
Concatenative synthesis
• Input is phonemic representation + prosodic features
• Diphone segments can be digitally manipulated for
length, pitch and loudness
• Segment boundaries need to be smoothed to avoid
distortion
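One simple way to smooth segment boundaries is a linear crossfade over a short overlap. The sketch below assumes segments are plain lists of samples and that each is at least `overlap` samples long; the function name is hypothetical, and real systems typically use pitch-synchronous methods rather than this plain crossfade.

```python
def crossfade_concat(segments, overlap):
    """Join sample lists, blending `overlap` samples at each boundary so the
    signal does not jump abruptly from one segment to the next."""
    out = list(segments[0])
    for seg in segments[1:]:
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)      # weight of the incoming segment
            out[-overlap + i] = (1.0 - w) * out[-overlap + i] + w * seg[i]
        out.extend(seg[overlap:])
    return out


# Two toy "segments": one at level 1.0, one at level 0.0.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(joined)
```

Without the overlap, the join between the two toy segments would be a hard step from 1.0 to 0.0, which is heard as a click.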
Diphone synthesis
• Most systems use diphones because they:
  ◦ Are manageable in number
  ◦ Can be automatically extracted from
recordings of human speech
  ◦ Capture most inter-allophonic variants
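To make "manageable in number" concrete: a diphone spans the transition from one phoneme into the next, so an inventory of n phoneme-like units needs at most n × n diphones. The sketch below converts a phoneme string into diphone names and computes that upper bound. The phoneme symbols, the "_" silence marker, and the figure of about 40 phonemes are assumptions for illustration.

```python
def to_diphones(phonemes, silence="_"):
    """Pad with silence and pair each unit with its successor."""
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def inventory_upper_bound(n_units):
    """Every ordered pair of units is a potential diphone."""
    return n_units ** 2


print(to_diphones(["k", "ae", "t"]))   # ['_-k', 'k-ae', 'ae-t', 't-_']
print(inventory_upper_bound(40))       # 1600
```

In practice many of those pairs never occur in a given language, so the recorded inventory is smaller than the upper bound.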
Unit selection synthesis
• Same idea as concatenative synthesis, but the database
contains a bigger variety of “units”
• Multiple examples of phonemes (under different prosodic
conditions) are recorded
• Selection of the appropriate unit therefore becomes more
complex, as the database contains competing
candidates for selection
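Choosing among competing candidates is commonly framed as minimising a target cost (how well a unit matches the wanted prosody) plus a join cost (how smoothly it connects to its neighbour), solved with dynamic programming. The sketch below assumes that framing; the unit dictionaries, cost functions, and weights are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return the candidate sequence with the lowest total cost.

    targets[i] is the wanted specification for position i and candidates[i]
    is the list of database units that could fill that position.
    """
    # best[i][j] = (cost of the best path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + target_cost(targets[i], u), k_min))
        best.append(row)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


# Toy example: targets are wanted pitches in Hz; units carry a recorded pitch.
targets = [100, 130]
candidates = [
    [{"id": "a1", "pitch": 100}, {"id": "a2", "pitch": 108}],
    [{"id": "b1", "pitch": 130}],
]
chosen = select_units(
    targets,
    candidates,
    target_cost=lambda t, u: abs(t - u["pitch"]),
    join_cost=lambda p, u: 2 * abs(p["pitch"] - u["pitch"]),
)
print([u["id"] for u in chosen])  # ['a2', 'b1']
```

In the toy example, unit a1 matches its target pitch exactly, yet a2 is chosen because it joins more smoothly to b1. That trade-off is what makes selection more complex than in plain diphone synthesis.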
Navigation and Voice Commands
• Navigation systems and voice-activated assistants like Siri and Google
Assistant are prime examples of software that uses TTS.
• Navigation systems convert text-based directions into speech, making
it easier for drivers to stay focused on the road. This benefits people
who are unfamiliar with an area or who have trouble reading maps.
• Voice assistants also accept voice commands for various tasks, such as
sending a text message or setting a reminder.
Applications
• Reading web pages aloud from a web browser or Google Toolbar,
for example with Text-to-voice, an add-on to Firefox.
• Some specialized software can narrate RSS feeds.
• Some e-book readers, such as the Amazon Kindle,
PocketBook eBook Reader Pro, and the Bebook Neo, use
TTS.
• GPS navigation units use speech synthesis for automobile
navigation.
Use
• Speech synthesizers help in preparing
educational materials, such as audiobooks, audio
blogs and language-learning materials.
• These suit learners who prefer to listen to
material rather than read it. Educational content
creators can also create materials for those with reading
impairments, such as dyslexia.
Use
• The longest-standing application has been in screen readers
for people with visual impairment, but text-to-speech systems
are now commonly used by people with dyslexia and other
reading difficulties, as well as by pre-literate children.
• Speech synthesis techniques are also used in entertainment
productions such as games and animations.
• In addition, speech synthesis is a valuable computational aid
for the analysis and assessment of speech disorders.
• It can also be used as an educational tool to learn different
accents, as in Google Translate.
Limitations
• Speech synthesis can still sound a little unnatural.
• The approaches to speech synthesis that yield the
most natural speech need considerable resources in
terms of data storage and processing power.
• The process of tokenizing text is rarely
straightforward.
• Many spellings in English are pronounced
differently depending on context, which makes the
correct pronunciation difficult to predict.
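A small example shows why tokenization is rarely straightforward. The abbreviation "St." can stand for "Saint" or "Street", and a simple context rule gets some cases right and others wrong. The rule below (expand to "Saint" when the next word is capitalised, otherwise "Street") is an assumed heuristic for illustration, not a rule taken from any particular system.

```python
def expand_st(text):
    """Expand 'St.' using only the capitalisation of the following word."""
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == "St.":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            out.append("Saint" if nxt[:1].isupper() else "Street")
        else:
            out.append(tok)
    return " ".join(out)


print(expand_st("St. John St."))     # Saint John Street
print(expand_st("Main St. Bakery"))  # Main Saint Bakery (wrong: should be Street)
```

The second call fails because the heuristic sees only one word of context. Context-dependent pronunciations pose the same kind of problem.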
  • 2. Speech synthesis Speech synthesis is the artificial production of human speech that sounds almost like a human voice and is more precise with pitch, speech, and tone. Automation and AI-based system designed for this purpose is called a text-to-speech synthesizer and can be implemented in software or hardware.
  • 3. Architecture of TTS systems 3 Text-to-phoneme module Text input Grapheme-to- phoneme conversion Prosodic modelling Acoustic synthesis Abbreviation lexicon Text in orthographic form Exceptions lexicon Orthographic rules Phoneme string Normalization Grammar rules Phoneme string + prosodic annotation Prosodic model Synthetic speech output Phoneme-to-speech module Various methods
  • 4. Speech Synthesis
      • Speech synthesis is the artificial production of human speech.
      • A synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
      • A computer system used for this purpose is called a speech computer or speech synthesizer.
      • A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
  • 5. Text-to-speech
      • A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end.
      • The front-end converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words (text normalization, or tokenization). It then assigns phonetic transcriptions to each word (grapheme-to-phoneme conversion), and divides and marks the text into prosodic units, like phrases, clauses, and sentences.
      • The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
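The normalization step of the front-end described above can be illustrated with a minimal sketch. The abbreviation lexicon, the function names, and the 0 to 999 number range are all assumptions made for this example; they are not taken from the slides or from any particular TTS engine.

```python
# Hypothetical, tiny abbreviation lexicon; a real front-end would use a far
# larger one and would need context to disambiguate entries such as "St."
# (street vs. saint).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and short digit strings into words."""
    out = []
    for token in text.split():
        key = token.lower()
        if key in ABBREVIATIONS:
            out.append(ABBREVIATIONS[key])
        elif token.isdecimal() and len(token) <= 3:
            # Only bare digit tokens are handled; "221," or "3.5" pass through.
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)


print(normalize("Dr. Smith lives at 221 Baker St."))
```

The output of this stage would then be passed to grapheme-to-phoneme conversion, which this sketch does not attempt.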
  • 6. Types of voice synthesis systems
      • There are two types: text-to-speech and concept-to-speech synthesis.
      • Concept-to-speech synthesis involves a generation component that generates a textual expression from semantic, pragmatic and discourse knowledge. The speech signal can then be generated from this expression.
      • In text-to-speech synthesis, the text to be spoken is provided; it is not generated by the system. It must, however, be analyzed and interpreted in order to convey the proper pronunciation and emphasis.
  • 7. Speech Synthesis for Translations
      • Synthesized speech can be controlled more precisely than human speech, making it easier to produce an accurate rendition of the original text.
      • It saves time and avoids manual work that can be error-prone.
      • With speech synthesis, the translator does not need to spend time recording themselves speaking the translated text. This can be a significant time saving for long or complex texts.
  • 8. Speech sound variations
      • Pitch, length, loudness
      • Intonation (pitch)
          • Essential to avoid a monotonous, robot-like voice
          • Linked to basic syntax (e.g. statement vs question), but also to thematization (stress)
          • Pitch range is a sensitive issue
      • Rhythm (length)
          • Has to do with pace (natural tendency to slow down at end of utterance)
          • Also need to pause at appropriate place
          • Linked (with pitch and loudness) to stress
  • 9. Synthesis types
      • Articulatory synthesis
      • Formant synthesis
      • Concatenative synthesis
      • Unit selection synthesis
  • 10. Articulatory synthesis
      • Simulation of physical processes of human articulation
      • Wolfgang von Kempelen (1734-1804) and others used bellows, reeds and tubes to construct mechanical speaking machines
      • Modern versions simulate electronically the effect of articulator positions, vocal tract shape, etc.
  • 11. Formant synthesis
      • Reproduce the relevant characteristics of the acoustic signal
      • In particular, amplitude and frequency of formants
      • But also other resonances and noise, e.g. for nasals, laterals, fricatives etc.
      • Values of acoustic parameters are derived by rule from phonetic transcription
      • Result is intelligible, but too "pure" and sounds synthetic
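As a rough illustration of the formant idea above, the sketch below passes an impulse train (a crude stand-in for the glottal source) through a cascade of second-order resonators, one per formant. The resonator form follows the commonly used digital resonator with coefficients derived from a centre frequency and bandwidth. The formant frequencies and bandwidths for an /a/-like vowel are approximate illustrative values, and all function names are hypothetical.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    c = -math.exp(-2 * math.pi * bandwidth_hz / sample_rate)
    b = (2 * math.exp(-math.pi * bandwidth_hz / sample_rate)
         * math.cos(2 * math.pi * freq_hz / sample_rate))
    a = 1 - b - c  # normalises the gain at 0 Hz to 1
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, pitch_hz=120.0, duration_s=0.25,
                     sample_rate=16000):
    """Return peak-normalised samples for a steady vowel.

    formants: list of (frequency_hz, bandwidth_hz) pairs.
    """
    n = round(duration_s * sample_rate)
    period = max(1, round(sample_rate / pitch_hz))
    # Impulse train at the pitch period: a very simplified voiced source.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max((abs(s) for s in signal), default=0.0) or 1.0
    return [s / peak for s in signal]


# Approximate, illustrative formant values for an /a/-like vowel (assumption).
VOWEL_A = [(730, 60), (1090, 90), (2440, 120)]
samples = synthesize_vowel(VOWEL_A)
```

The resulting samples are floats in the range -1 to 1; they could be scaled to 16-bit integers and written out with Python's standard `wave` module. A rule-based formant synthesizer would additionally vary these parameters over time according to the phonetic transcription, which this sketch leaves out.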
  • 12. Concatenative synthesis
      • Concatenate segments of pre-recorded natural human speech
      • Requires database of previously recorded human speech covering all the possible segments to be synthesised
      • Segment might be phoneme, syllable, word, phrase, or any combination
  • 13. Concatenative synthesis
      • Input is phonemic representation + prosodic features
      • Diphone segments can be digitally manipulated for length, pitch and loudness
      • Segment boundaries need to be smoothed to avoid distortion
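One simple way to picture the boundary smoothing mentioned above is a linear crossfade over a short overlap between two adjacent segments. This is only a sketch of the general idea: practical diphone systems typically use pitch-synchronous techniques such as PSOLA rather than a plain crossfade, and the function name and sample lists here are hypothetical.

```python
def crossfade_concat(a, b, overlap):
    """Join two sample sequences, linearly blending `overlap` samples.

    The last `overlap` samples of `a` are mixed with the first `overlap`
    samples of `b`, so the result has len(a) + len(b) - overlap samples.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(a) + list(b)
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append((1 - w) * a[start + i] + w * b[i])
    return list(a[:start]) + blended + list(b[overlap:])


# Two toy "segments" with a hard level change at the join.
print(crossfade_concat([1.0] * 4, [0.0] * 4, overlap=3))
# -> [1.0, 0.75, 0.5, 0.25, 0.0]
```

Without the crossfade, the jump from 1.0 to 0.0 at the join would be heard as a click; the blend replaces it with a gradual transition.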
  • 14. Diphone synthesis
      • Most systems use diphones because they:
          • Are manageable in number
          • Can be automatically extracted from recordings of human speech
          • Capture most inter-allophonic variants
  • 15. Unit selection synthesis
      • Same idea as concatenative synthesis, but the database contains a greater variety of "units"
      • Multiple examples of phonemes (under different prosodic conditions) are recorded
      • Selection of the appropriate unit therefore becomes more complex, as the database contains competing candidates for selection
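The choice among competing candidates described above is often framed as minimising a target cost (how well a unit matches what is wanted) plus a join cost (how smoothly neighbouring units connect), solved with dynamic programming. The sketch below assumes a single hypothetical feature, mean pitch, for both costs; the `Unit` class, cost functions, and database layout are illustrative assumptions, not the slides' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    phoneme: str
    pitch: float   # mean pitch in Hz (hypothetical feature)
    unit_id: str


def target_cost(unit, wanted_pitch):
    return abs(unit.pitch - wanted_pitch)


def join_cost(prev, nxt):
    return abs(prev.pitch - nxt.pitch)


def select_units(targets, database, join_weight=1.0):
    """Pick one unit per target so total target + join cost is minimal.

    targets:  list of (phoneme, wanted_pitch) pairs
    database: dict mapping phoneme -> list of candidate Units
    """
    if not targets:
        return []
    layers = []
    for phoneme, _ in targets:
        candidates = database.get(phoneme)
        if not candidates:
            raise KeyError(f"no recorded unit for phoneme {phoneme!r}")
        layers.append(candidates)
    # best[i][k] = (cheapest cost of a path ending in layers[i][k], backpointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in layers[0]]]
    for i in range(1, len(layers)):
        row = []
        for u in layers[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_weight * join_cost(p, u), k)
                for k, p in enumerate(layers[i - 1])
            )
            row.append((cost + target_cost(u, targets[i][1]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(layers) - 1, -1, -1):
        path.append(layers[i][k])
        k = best[i][k][1]
    return list(reversed(path))


database = {
    "h": [Unit("h", 100, "h1"), Unit("h", 200, "h2")],
    "i": [Unit("i", 110, "i1"), Unit("i", 190, "i2")],
}
chosen = select_units([("h", 150), ("i", 190)], database)
print([u.unit_id for u in chosen])  # -> ['h2', 'i2']
```

In this toy case both "h" candidates are equally far from the wanted pitch, so the join cost decides: "h2" is chosen because it connects more smoothly to "i2".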
  • 16. Navigation and Voice Commands
      • Navigation systems and voice-activated assistants like Siri and Google Assistant are prime examples of TTS software.
      • They convert text-based directions into speech, making it easier for drivers to stay focused on the road.
      • The voice assistants offer voice commands for various tasks, such as sending a text message or setting a reminder. This technology benefits people unfamiliar with an area or who have trouble reading maps.
  • 17. Applications
      • Web pages can be read aloud from a web browser or the Google Toolbar, for example with Text-to-voice, an add-on to Firefox.
      • Some specialized software can narrate RSS feeds.
      • Some e-book readers, such as the Amazon Kindle, PocketBook eBook Reader Pro, and the Bebook Neo, use TTS.
      • GPS navigation units use speech synthesis for automobile navigation.
  • 18. Use
      • Speech synthesizers help in preparing educational materials, such as audiobooks, audio blogs and language-learning materials.
      • Some learners prefer to listen to material rather than read it. Educational content creators can now also create materials for those with reading impairments, such as dyslexia.
  • 19. Use
      • The longest-standing application has been in screen readers for people with visual impairment, but text-to-speech systems are now commonly used by people with dyslexia and other reading difficulties as well as by pre-literate children.
      • Speech synthesis techniques are also used in entertainment productions such as games and animations.
      • In addition, speech synthesis is a valuable computational aid for the analysis and assessment of speech disorders.
      • It can also be used as an educational tool, for example to learn different accents, as in Google Translate.
  • 20. Limitations
      • Speech synthesis can still sound a little unnatural.
      • The approaches to speech synthesis that yield the most natural speech need considerable resources in terms of data storage and processing power.
      • The process of tokenizing text is rarely straightforward.
      • There are many spellings in English which are pronounced differently based on context making it difficult for users