Comparative study of Text-to-Speech
Synthesis for Indian Languages by using
Syllable Approach
CLASS: M.E. I COMPUTER
GUIDED BY: PROF. ASHISH MANWATKAR
PRESENTED BY: RAVI SHARMA
ROLL NO: 15311
CONTENT
• INTRODUCTION
• MOTIVATION
• LITERATURE SURVEY
• DATA TABLE
• SYSTEM ARCHITECTURE
• MATHEMATICAL MODEL
• ALGORITHM
• ADVANTAGES
• DISADVANTAGES
• APPLICATION
• CONCLUSION
INTRODUCTION
• Text-to-Speech Synthesis
A system that takes a sequence of words as input and converts
it to speech.
• Parts of a Speech Synthesizer
A speech synthesizer usually consists of two parts.
• First part: the front end, which has two major tasks.
• First, it takes the raw text and converts things like numbers and
abbreviations into their written-out word equivalents. This process is
often called text normalization.
• Then it assigns a phonetic transcription to each word, and divides and
marks the text into linguistic units such as phrases, clauses, and
sentences.
• Second part: the back end, which takes the symbolic
linguistic representation and converts it into actual sound output.
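As a rough illustration of the text-normalization task described above, the following Python sketch expands a few abbreviations and small numbers into words. The abbreviation table, the `number_to_words` helper, and its 0 to 999 range are assumptions made for this example; they are not taken from any particular synthesizer.

```python
import re

# Hypothetical abbreviation table; a real front end would need a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 (limit chosen for brevity)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Replace known abbreviations and standalone 1-3 digit numbers with words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # \b...\b keeps longer digit runs (e.g. a roll number) untouched.
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)
```

Calling `normalize("Dr. Rao has 42 books")` returns the written-out sentence that the next front-end stage would transcribe phonetically.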
Text-to-phoneme challenges
• Speech synthesis systems use two basic approaches to determine the
pronunciation of a word based on its spelling, a process which is often
called text-to-phoneme conversion.
Dictionary Based approach
• The simplest approach to text-to-phoneme conversion is the
dictionary-based approach, where a large dictionary containing all
the words of a language and their correct pronunciation is stored by
the program.
• Determining the correct pronunciation of each word is a matter of
looking up each word in the dictionary and replacing the spelling with
the pronunciation specified in the dictionary
Rule based approach
• The other approach used for text-to-phoneme conversion is the
rule-based approach, where pronunciation rules are
applied to words to work out their pronunciations from their
spellings. This is similar to the "sounding out" approach to learning
to read.
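The two approaches can be sketched side by side in Python. The tiny `LEXICON`, the `RULES` list, and the phone labels are all hypothetical and only serve to show the difference between a lookup and rule application; a real system would use a full pronunciation dictionary and a language-specific rule set.

```python
# Hypothetical pronunciation dictionary (word -> list of phone labels).
LEXICON = {
    "speech": ["s", "p", "ii", "ch"],
    "text": ["t", "e", "k", "s", "t"],
}

# Hypothetical letter-to-sound rules. Multi-letter spellings come first so
# that "sh" is tried before the single letter "s" falls through.
RULES = [("sh", "sh"), ("ch", "ch"), ("ee", "ii"), ("aa", "aa"), ("x", "k s")]


def lookup(word):
    """Dictionary-based approach: return the stored pronunciation, or None."""
    return LEXICON.get(word.lower())


def apply_rules(word):
    """Rule-based approach: scan left to right, applying the first matching rule."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for spelling, sound in RULES:
            if word.startswith(spelling, i):
                phones.extend(sound.split())
                i += len(spelling)
                break
        else:
            # Assumption: a letter with no rule is pronounced as itself.
            phones.append(word[i])
            i += 1
    return phones
```

The lookup fails for any word missing from the dictionary, while the rules always produce some output but can be wrong for irregular spellings.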
• SYLLABLE RULES
A syllable is a cluster of consonants around a vowel.
A syllable should contain one vowel and any number of consonants.
1. A single vowel can act as a syllable (i.e. V).
2. Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc.
3. A consonant before the vowel is called the 'onset' (i.e. C*V).
4. A consonant after the vowel is called the 'coda' (i.e. V*C).
Syllable Rules-
1. When nasals such as / ’/, or a half-pronounced / / or / / sound,
immediately follow a vowel, they are treated as part of
the vowel and of the same syllable. For example, / ’/ in sa ’sthaa
is part of the syllable containing /sa/.
2. When there are three or more consonants between two
consecutive vowels, the first consonant is part of the coda
of the previous syllable, while the remaining consonants form the
onset of the next syllable.
Syllable Rules-
3. When there are exactly two consonants between two vowels, the first consonant
is part of the coda of the previous syllable and the second is the onset of the
next syllable.
4. As an exception to rule 3, when the second consonant is a member of the set
{/r/, /s/, /sh/, /shh/}, both consonants are part of the onset of the next syllable.
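The consonant-cluster rules (2 to 4) can be sketched as a small Python syllabifier. This is a minimal sketch under several assumptions: the input is already a list of romanized phone labels, the `VOWELS` set is a hypothetical inventory, rule 4 is applied only to clusters of exactly two consonants, and a single consonant between vowels is assigned to the onset of the next syllable (a case the rules above do not cover). The nasal rule (rule 1) is left out.

```python
# Hypothetical romanized vowel inventory; adjust for the target language.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}
# Second consonants that pull a two-consonant cluster into the next onset (rule 4).
ONSET_CLUSTER_SECOND = {"r", "s", "sh", "shh"}


def syllabify(phones):
    """Split a list of phone labels into syllables (each a list of phones)."""
    vowel_idx = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_idx:
        # No vowel at all: return the input as a single unit (or nothing).
        return [list(phones)] if phones else []

    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = phones[prev + 1:nxt]  # consonants between two vowels
        if len(cluster) <= 1:
            # Assumption: a lone consonant becomes the onset of the next syllable.
            cut = prev + 1
        elif len(cluster) == 2 and cluster[1] in ONSET_CLUSTER_SECOND:
            # Rule 4: both consonants go to the onset of the next syllable.
            cut = prev + 1
        else:
            # Rules 2 and 3: first consonant is the coda, the rest is the onset.
            cut = prev + 2
        starts.append(cut)

    ends = starts[1:] + [len(phones)]
    return [list(phones[s:e]) for s, e in zip(starts, ends)]
```

Word-initial consonants stay with the first syllable and word-final consonants with the last, since the cut points are only computed between consecutive vowels.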
HMM synthesis
• A relatively new technology is speech synthesis based on hidden
Markov models (HMMs), a statistical modelling concept.
• It is a statistical method: the text-to-speech system is based on
a model whose parameters are not known beforehand but are estimated
and refined through training.
• The technique consumes large CPU resources but very little memory.
• This approach seems to give better prosody, without glitches, while
still producing very natural-sounding, human-like speech.
MOTIVATION
• There are 1652 languages in India
• Building a TTS system for each of them separately is time-consuming and
exhausting, so a more generic approach to system building is
required. A common framework is designed first, and
language-specific systems are then built from it.
LITERATURE SURVEY
1. An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
   Aim of the paper: Improved naturalness of the synthesized speech.
   Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
   Disadvantages: A large number of concatenation points, which results in glitches at the output; the resulting prosody mismatch and spectral discontinuity are hard to eliminate.

2. Design and Development of a Text-To-Speech Synthesizer for Indian Languages
   Aim of the paper: The design and implementation of a unit-selection-based text-to-speech synthesizer with syllables and polysyllables as units of concatenation.
   Advantages: Improves synthesis quality and reduces the search space, which improves synthesis time.
   Disadvantages: It is not clear, at the time of writing, how spectral interpolation will be performed at the boundaries.

3. Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
   Aim of the paper: Convert orthographic text into intelligible and natural-sounding speech.
   Advantages: The technique provides very high quality speech output that is reasonably natural and equivalent to the voice of the original speaker.
   Disadvantages: The text must be pre-processed before synthesis.

4. Text-to-Speech Synthesis using syllable-like units
   Aim of the paper: The design of a syllable-based concatenative waveform synthesizer for Indian languages.
   Advantages: The automatic segmentation algorithm has indeed created a useful speech unit that has low target and concatenation costs.
   Disadvantages: The current work uses a single unique syllable-like unit from the repository for synthesis.

5. Statistical parametric speech synthesis
   Aim of the paper: Generating acceptable synthesized speech.
   Advantages: A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.
   Disadvantages: Quality of the synthesized speech; the factors that degrade quality are the vocoder, modeling accuracy, and over-smoothing.

6. Unit selection in a concatenative speech synthesis system using a large speech database
   Aim of the paper: The generation of natural-sounding synthesized speech waveforms.
   Advantages: Produces more natural speech.
   Disadvantages: There is little difference in output quality between the two training methods.

7. An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
   Aim of the paper: Improved naturalness of the synthesized speech, with very high quality speech output compared to other synthesis techniques.
   Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
   Disadvantages: A large number of concatenation points, which results in glitches at the output; the resulting prosody mismatch and spectral discontinuity are hard to eliminate.

8. A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
   Aim of the paper: High-quality synthetic speech.
   Advantages: Concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
   Disadvantages: To obtain high-quality synthetic speech, a large database is required, to ensure that sufficient examples of each unit in every possible context are available.
DATA TABLE
TABLE I: Degradation MOS (DMOS) and word error rate (WER) scores
Target language                 Marathi   Bengali   Tamil    Tamil    Telugu   Malayalam
Source language                 Hindi     Hindi     Tamil    Hindi    Tamil    Tamil
Hours of target-language data   3         2         3        3        3        3
DMOS                            2.79      2.50      2.97     2.53     2.63     2.88
WER                             3.48%     15.06%    6.61%    5.16%    16.14%   3.13%
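The slides do not say how the WER values in Table I were computed. Assuming the conventional definition (word-level edit distance between a reference transcript and what listeners wrote down, divided by the number of reference words), a minimal Python sketch looks like this. The function name `word_error_rate` is chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the result by 100 gives a percentage comparable in form to the WER row of the table.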
SYSTEM ARCHITECTURE
Fig. 2. Training and synthesis phases of HMM-based speech synthesis
MATHEMATICAL MODEL
Let I denote the language set:
I = {T, S}
where
T is the input text and
S is the output sound.
$D(I) = \arg\max_{o} \, p(o \mid w, \lambda)$
where
$\lambda$ represents the model parameters,
$o$ represents the speech parameters, and
$w$ is the transcription of the test sentence.
Syllable Rules-
A syllable is a cluster of consonants around a vowel.
A syllable should contain one vowel and any number of consonants.
A single vowel can act as a syllable (i.e. V).
Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc.
A consonant before the vowel is called the 'onset' (i.e. C*V).
A consonant after the vowel is called the 'coda' (i.e. V*C).
Output = $P_k$
where D(I) is the dictionary function and
$P_k$ is the phonetic output.
ALGORITHM
• PARAMETER GENERATION ALGORITHM
• DELAY BASED SEGMENTATION ALGORITHM
ADVANTAGES
• For people wanting to learn a new language
• For educational institutions looking to enhance student learning,
recall and comprehension
• For people wanting to learn through multiple mediums to solidify
learning
• For people with physical disabilities, such as:
• Difficulty handling a book or paper
• Visual impairment (difficulty seeing text)
DISADVANTAGES
• Despite large improvements, Speech Synthesis can still sound a little
unnatural.
• The approaches to Speech Synthesis that yield the most natural
speech need considerable resources in terms of data storage and
processing power.
• Pronunciation analysis from written text is also a major problem.
APPLICATION
• Systems that provide voice synthesis output for blind users are
generally referred to as screen readers.
• Applications for the Blind
• Applications for the Deafened and Vocally Handicapped
• Educational Applications
CONCLUSION
This paper explores a syllable approach to building language-independent
text-to-speech systems for Indian languages. Using a common
phone set, a common question set, and borrowed context-independent
monophone models, together with the syllable approach across languages,
makes the procedure easier and less time-consuming without
compromising the quality of the synthesized speech. Systems can be built
without even knowing the language, which is especially beneficial
in the Indian scenario.
REFERENCES
• [1] A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a
large speech database," in Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 1,
1996, pp. 373–376.
• [2] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech
Communication, vol. 51, no. 3, pp. 1039–1064, November 2009.
• [3] A. Beyerlein, W. Byrne, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone,
and W. Wang, "Towards language independent acoustic modeling," in Proceeding on Acoustics,
Speech, and Signal Processing (ICASSP), vol. 2, 2000, pp. 1029–1032.
• [4] R. Bayeh, S. Lin, G. Chollet, and C. Mokbel, "Towards multilingual speech recognition using
data driven source/target acoustical units association," in Acoustics, Speech, and Signal
Processing, 2004. Proceedings (ICASSP ’04), pp. I–521–4.
• [5] V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language:
Application to Vietnamese," in Acoustics, Speech, and Signal Processing, Proceedings (ICASSP), pp. –824.
• [6] P. Eswar, "A rule based approach for spotting characters from continuous speech in Indian
languages," PhD Dissertation, Indian Institute of Technology, Department of Computer Science
and Engg., Madras, India, 1991.
THANK YOU…!!!

 

Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach

  • 1. Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach CLASS:M.E I COMPUTER GUIDED BY : PROF. ASHISH MANWATKAR PRESENTED BY : RAVI SHARMA ROLL NO: 15311
  • 2. CONTENT • INTRODUCTION • MOTIVATION • LITERATURE SURVEY • DATA TABLE • SYSTEM ARCHITECTURE • MATHEMATICAL MODEL • ALGORITHM • ADVANTAGES • DISADVANTAGES • APPLICATION • CONCLUSION
  • 3. INTRODUCTION • Text to Speech Synthesis- A system which takes as input a sequence of words and converts them to speech
  • 4. •Parts of Speech Synthesizers Speech Synthesizers usually consist of two parts. First Part- The first part has two major tasks. • First it takes the raw text and converts things like numbers and abbreviations into their written-out word equivalents. This process is often called text normalization. • Then it assigns phonetic transcriptions to each word, and divides and marks the text into various linguistic units like phrases, clauses, and sentences.
  • 5. • Second Part- The other part, the back end, takes the symbolic linguistic representation and converts it into actual sound output
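
The text normalization step described above can be sketched as follows. This is a minimal illustration only, assuming English input, a tiny hypothetical abbreviation table (`ABBREVIATIONS`), and numbers below 1000; the slides do not specify how the actual system normalizes text.

```python
import re

# Hypothetical abbreviation table; a real front end would use a much larger,
# language-specific list.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Replace known abbreviations and short digit strings with words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)  # everything else is passed through unchanged
    return " ".join(out)
```

For example, `normalize("Dr. Rao has 42 books")` yields `"Doctor Rao has forty two books"`. Tokens with attached punctuation or longer numbers are left untouched in this sketch.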
  • 6. Text-to-phoneme challenges • Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme conversion.
  • 7. Dictionary Based approach • The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciation is stored by the program. • Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary
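
A dictionary-based lookup of this kind might look like the sketch below. The lexicon entries and phone symbols are made up for illustration, and the function name `lookup_pronunciations` is hypothetical; the point is only that every word must already be in the dictionary, so unknown words have to be handled separately.

```python
# Hypothetical pronunciation dictionary with illustrative phone symbols.
LEXICON = {
    "text": ["t", "eh", "k", "s", "t"],
    "to": ["t", "uw"],
    "speech": ["s", "p", "iy", "ch"],
}


def lookup_pronunciations(words, lexicon=LEXICON):
    """Return (phones, missing).

    phones  -- one phone list per word found in the lexicon, in input order
    missing -- words that have no dictionary entry
    """
    phones, missing = [], []
    for word in words:
        entry = lexicon.get(word.lower())
        if entry is None:
            missing.append(word)  # out-of-vocabulary word
        else:
            phones.append(entry)
    return phones, missing
```

Calling `lookup_pronunciations(["Text", "to", "speech", "synthesis"])` returns the three known pronunciations and reports `"synthesis"` as missing, which is where a rule-based fallback would take over.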
  • 8. Rule based approach • The other approach used for text-to-phoneme conversion is the rule-based approach, where rules for the pronunciations of words are applied to words to work out their pronunciations based on their spellings. This is similar to the "sounding out" approach to learning reading.
  • 9. • SYLLABLE RULES- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. 1. A single vowel can act as a syllable (i.e. V). 2. Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc. 3. A consonant before the vowel is called the "Onset", i.e. C*V. 4. A consonant after the vowel is called the "Coda", i.e. V*C.
  • 10. Syllable Rules- 1. When nasals such as / ’/, half pronounced / / or / / sound succeed a vowel immediately, they would be treated as a part of the vowel and also the same syllable. For example, / ’/ in sa ’sthaa will be a part of the syllable containing /sa/. 2. When there are three or more consonants between two consecutive vowels, the first consonant would be a part of the coda of the previous syllable while the remaining consonants would be the onset of the next syllable.
  • 11. Syllable Rules- 3. When there are exactly two consonants between two vowels, the first consonant would be part of coda of previous syllable and the second would be onset of the next syllable 4. When the second consonant is a member of the set {/r/ /s/ /sh/ /shh/}, both the consonants would be a part of onset of the next syllable
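
The consonant-cluster rules above can be sketched in code as follows. This is an illustration under several assumptions that are not in the slides: the input is already a list of romanized phones, `VOWELS` is a hypothetical vowel inventory, a single consonant between two vowels is attached to the onset of the next syllable, and the {/r/, /s/, /sh/, /shh/} exception is applied only when exactly two consonants separate the vowels. The nasal rule is omitted.

```python
# Hypothetical romanized vowel inventory; a real system would use the
# phone set of the target language.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

# Second consonants that pull a two-consonant cluster into the next onset.
ONSET_CLUSTER_SECOND = {"r", "s", "sh", "shh"}


def syllabify(phones):
    """Split a phone sequence into syllables, one vowel per syllable."""
    phones = list(phones)
    vowel_idx = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_idx:
        return [phones] if phones else []

    syllables = []
    start = 0
    for cur, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = phones[cur + 1:nxt]  # consonants between two vowels
        if len(cluster) <= 1:
            coda_len = 0  # assumption: a lone consonant starts the next syllable
        elif len(cluster) == 2 and cluster[1] in ONSET_CLUSTER_SECOND:
            coda_len = 0  # both consonants form the next onset
        else:
            coda_len = 1  # first consonant is the coda, the rest is the onset
        end = cur + 1 + coda_len
        syllables.append(phones[start:end])
        start = end
    syllables.append(phones[start:])  # last syllable keeps trailing consonants
    return syllables
```

With these assumptions, `syllabify(["p", "a", "k", "k", "aa"])` gives `[["p", "a", "k"], ["k", "aa"]]`, while `syllabify(["p", "a", "t", "r", "a"])` gives `[["p", "a"], ["t", "r", "a"]]` because the second consonant is /r/.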
  • 12. HMM synthesis • A relatively new technology is speech synthesis based on Hidden Markov Models (HMMs). • It is a statistical method in which the text-to-speech system is based on a model that is not known beforehand but is refined by continuous training. • The technique consumes large CPU resources but very little memory. • This approach seems to give better prosody, without glitches, and still produces very natural-sounding, human-like speech.
  • 13.
  • 14. MOTIVATION • There are 1652 languages in India. • Building a TTS system for each of them is time-consuming and exhausting. Thus a more generic approach towards system building is required. A common framework is first designed, using which language-specific systems are then built.
  • 15. LITERATURE SURVEY
      SR. NO: 1
      PAPER TITLE: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
      Aim of the Paper: quality of this system is the improved naturalness in the synthesized speech
      Advantages: An important advantage of this approach leads to reduced prosody mismatch and spectral discontinuity that occurs during syllable concatenation.
      Disadvantages: Large concatenation points. This large concatenation results in glitch at the output which is hard to eliminate prosody mismatch and spectral discontinuity
      SR. NO: 2
      PAPER TITLE: Design and Development of a Text-To-Speech Synthesizer for Indian Languages
      Aim of the Paper: The design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation
      Advantages: improves synthesis quality and it reduces search space improving the synthesis timing.
      Disadvantages: it is not clear at the time of writing, how spectral interpolation will be performed at the boundaries
  • 16. LITERATURE SURVEY
      SR. NO: 3
      PAPER TITLE: Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
      Aim of the Paper: convert an orthographic text into intelligible and natural sounding speech
      Advantages: This technique provides very high quality speech output which is reasonably natural and equivalent to voice of the original speaker.
      Disadvantages: before synthesizing pre-processing of text is required
      SR. NO: 4
      PAPER TITLE: Text-to-Speech Synthesis using syllable-like units
      Aim of the Paper: the design of a syllable based concatenative waveform synthesizer for Indian languages.
      Advantages: the automatic segmentation algorithm has indeed created a useful speech unit that has low target and concatenation costs.
      Disadvantages: current work uses a single unique syllable-like unit from the repository for synthesis.
  • 17. LITERATURE SURVEY
      SR. NO: 5
      PAPER TITLE: Statistical parametric speech synthesis
      Aim of the Paper: generating acceptable speech synthesis
      Advantages: a variety of speaking styles or emotional speech can be synthesized using the small amount of speech data.
      Disadvantages: quality of synthesized speech; factors which degrade the quality: vocoder, modeling accuracy, and over-smoothing.
      SR. NO: 6
      PAPER TITLE: Unit selection in a concatenative speech synthesis system using a large speech database
      Aim of the Paper: the generation of natural-sounding synthesized speech waveforms
      Advantages: produce more natural speech
      Disadvantages: there is little difference in the quality of output using the two training method
  • 18. LITERATURE SURVEY
      SR. NO: 7
      PAPER TITLE: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
      Aim of the Paper: quality of this system is the improved naturalness in the synthesized speech and gives very high quality speech output when compared to other synthesizing techniques
      Advantages: An important advantage of this approach leads to reduced prosody mismatch and spectral discontinuity that occurs during syllable concatenation.
      Disadvantages: Large concatenation points. This large concatenation results in glitch at the output which is hard to eliminate prosody mismatch and spectral discontinuity.
  • 19. LITERATURE SURVEY
      SR. NO: 8
      PAPER TITLE: A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
      Aim of the Paper: high-quality synthetic speech
      Advantages: concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
      Disadvantages: to obtain high-quality synthetic speech, the size of the database required is large, to ensure that sufficient examples for each unit in every possible context is available
  • 20. DATA TABLE TABLE I: Degradation MOS (DMOS) and Word error rate (WER) scores
      Target Language   Source Language   Number of hours of target language   DMOS   WER
      Marathi           Hindi             3                                    2.79   3.48%
      Bengali           Hindi             2                                    2.50   15.06%
      Tamil             Tamil             3                                    2.97   6.61%
      Tamil             Hindi             3                                    2.53   5.16%
      Telugu            Tamil             3                                    2.63   16.14%
      Malayalam         Tamil             3                                    2.88   3.13%
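
The slides do not say how the WER figures in Table I were computed. As a point of reference, the standard definition is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and what listeners or a recognizer report, divided by the number of reference words. A minimal sketch, with a hypothetical function name:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

For instance, comparing the reference `"a b c d"` with the hypothesis `"a x c"` needs one substitution and one deletion, so the WER is 2 / 4 = 0.5.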
  • 21. SYSTEM ARCHITECTURE Fig. 2. Training and synthesis phases of HMM-based speech synthesis
  • 22. MATHEMATICAL MODEL Let I be the set describing the language, I = {T, S}, where T is the input text and S is the output sound. D(I) = arg max p(o | w, λ), where λ represents the model parameters, o represents the speech parameters, and w is the transcription of the test sentence.
  • 23. Syllable Rules- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. A single vowel can act as a syllable (i.e. V). Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc. A consonant before the vowel is called the "Onset", i.e. C*V. A consonant after the vowel is called the "Coda", i.e. V*C. Output = Pk, where D(I) is the dictionary function and Pk is the phonetics.
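
The arg max expression in the mathematical model matches the synthesis step of the usual statistical parametric formulation, which pairs it with a training step over the same model parameters. The two steps can be written side by side as below. The symbols O and W (training speech parameters and their transcriptions) and the hats on the estimates are assumptions added here for illustration; they do not appear in the slides.

```latex
\begin{align}
\hat{\lambda} &= \arg\max_{\lambda} \; p(O \mid W, \lambda)
  && \text{(training: fit the model parameters)} \\
\hat{o} &= \arg\max_{o} \; p(o \mid w, \hat{\lambda})
  && \text{(synthesis: generate speech parameters for } w\text{)}
\end{align}
```

Here the second line corresponds to the slide's D(I), with the maximization taken over the speech parameters o.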
  • 24. ALGORITHM • PARAMETER GENERATION ALGORITHM • DELAY BASED SEGMENTATION ALGORITHM
  • 25. ADVANTAGES • For people wanting to learn a new language • For educational institutions looking to enhance student learning, recall and comprehension • For people wanting to learn through multiple mediums to solidify learning • For people with physical disabilities, such as difficulty handling a book or paper, or visual issues (difficulty seeing text)
  • 26. DISADVANTAGES • Despite large improvements, speech synthesis can still sound a little unnatural. • The approaches to speech synthesis that yield the most natural speech need considerable resources in terms of data storage and processing power. • Pronunciation analysis from written text is also a major problem.
  • 27. APPLICATION • Systems that provide voice synthesis output for blind users are generally referred to as screen readers. • Applications for the Blind • Applications for the Deafened and Vocally Handicapped • Educational Applications
  • 28. CONCLUSION This paper explores a syllable approach to building language-independent text-to-speech systems for Indian languages. The use of a common phone set, a common question set, and borrowed context-independent monophone models, along with the syllable approach across languages, makes the procedure easier and less time-consuming without compromising the synthesized speech quality. Systems can be built without even knowing the language. This is especially beneficial in the Indian scenario.
  • 29. REFERENCES
      • [1] A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 1, 1996, pp. 373–376.
      • [2] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 3, pp. 1039–1064, November 2009.
      • [3] A. Beyerlein, W. Byrne, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone, and W. Wang, "Towards language independent acoustic modeling," in Proceeding on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, 2000, pp. 1029–1032.
      • [4] R. Bayeh, S. Lin, G. Chollet, and C. Mokbel, "Towards multilingual speech recognition using data driven source/target acoustical units association," in Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP), pp. I–521–4.
      • [5] V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language: Application to Vietnamese," in Acoustics, Speech, and Signal Processing. Proceedings (ICASSP), pp. –824.
      • [6] P. Eswar, "A rule based approach for spotting characters from continuous speech in Indian languages," PhD Dissertation, Indian Institute of Technology, Department of Computer Science and Engg., Madras, India, 1991.