Comparative study of Text-to-Speech
Synthesis for Indian Languages by using
Syllable Approach
CLASS:M.E I COMPUTER
GUIDED BY : PROF. ASHISH MANWATKAR PRESENTED BY : RAVI SHARMA
ROLL NO: 15311
CONTENT
• INTRODUCTION
• MOTIVATION
• LITERATURE SURVEY
• DATA TABLE
• SYSTEM ARCHITECTURE
• MATHEMATICAL MODEL
• ALGORITHM
• ADVANTAGES
• DISADVANTAGES
• APPLICATION
• CONCLUSION
INTRODUCTION
• Text to Speech Synthesis-
A system which takes as input a sequence of words and converts
them to speech
•Parts of Speech Synthesizers
Speech Synthesizers usually consist of two parts.
First Part- The first part, the front end, has two major tasks.
• First it takes the raw text and converts things like numbers and
abbreviations into their written-out word equivalents. This process is
often called text normalization.
• Then it assigns phonetic transcriptions to each word, and divides and
marks the text into various linguistic units like phrases, clauses, and
sentences.
• Second Part- The other part, the back end, takes the symbolic
linguistic representation and converts it into actual sound output
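The text normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the abbreviation table, the function names and the restriction to English integers below 1000 are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; a real front end would use a much larger,
# language-specific list.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text):
    """Expand known abbreviations and spell out integers of up to three digits."""
    tokens = []
    for token in text.split():
        if token in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,3}", token):
            tokens.append(number_to_words(int(token)))
        else:
            tokens.append(token)
    return " ".join(tokens)


print(normalize("Dr. Rao lives at No. 42"))
```

Only after this step does the front end assign phonetic transcriptions, so the later stages never see digits or abbreviations.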
Text-to-phoneme challenges
• Speech synthesis systems use two basic approaches to determine the
pronunciation of a word based on its spelling, a process which is often
called text-to-phoneme conversion.
Dictionary Based approach
• The simplest approach to text-to-phoneme conversion is the
dictionary-based approach, where a large dictionary containing all
the words of a language and their correct pronunciation is stored by
the program.
• Determining the correct pronunciation of each word is a matter of
looking up each word in the dictionary and replacing the spelling with
the pronunciation specified in the dictionary
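As a rough sketch of the dictionary-based approach, the lookup can be modelled as a plain mapping from spelling to phone sequence. The lexicon entries, phone symbols and function names below are illustrative assumptions, not data from any real pronunciation dictionary.

```python
# Toy pronunciation dictionary; entries and phone symbols are illustrative only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "text": ["t", "eh", "k", "s", "t"],
}


def lookup_pronunciation(word, lexicon=LEXICON):
    """Return the stored phone sequence for a word, or None if it is unknown."""
    return lexicon.get(word.lower())


def transcribe(sentence, lexicon=LEXICON):
    """Replace each word of a sentence by its dictionary pronunciation."""
    return [lookup_pronunciation(word, lexicon) for word in sentence.split()]


print(transcribe("Text speech synthesis"))
```

The `None` returned for an out-of-vocabulary word shows the main limitation of this approach: every word must already be in the dictionary.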
Rule based approach
• The other approach used for text-to-phoneme conversion is the rule-
based approach, where rules for the pronunciations of words are
applied to words to work out their pronunciations based on their
spellings. This is similar to the "sounding out" approach to learning
reading.
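The rule-based approach can be illustrated with an ordered list of spelling-to-sound rules applied left to right, trying longer spellings first. The rule table below is a hypothetical fragment for romanized text and covers only a handful of letters; it is a sketch of the idea, not a usable rule set for any language.

```python
# Hypothetical grapheme-to-phone rules. Longer graphemes are listed before
# their prefixes so that "sh" is not read as "s" followed by "h".
RULES = [("shh", "SHH"), ("sh", "SH"), ("aa", "AA"), ("th", "TH"),
         ("a", "A"), ("s", "S"), ("t", "T"), ("n", "N"), ("h", "H")]


def letters_to_phones(word):
    """Convert a spelling to phones by applying the first matching rule at each position."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for grapheme, phone in RULES:
            if word.startswith(grapheme, i):
                phones.append(phone)
                i += len(grapheme)
                break
        else:
            # No rule covers this letter; a real system would need a fallback.
            raise ValueError("no rule for %r" % word[i])
    return phones


print(letters_to_phones("sansthaa"))
```

Unlike the dictionary, these rules produce a pronunciation for any word spelled with the covered letters, at the cost of being wrong wherever spelling is irregular.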
• SYLLABLE RULES-
A syllable is a cluster of consonants and a vowel.
A syllable should contain one vowel and any number of consonants.
1. A single vowel can act as a syllable (i.e. V).
2. V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc.
3. A consonant before the vowel is called the "Onset", i.e. C*V
4. A consonant after the vowel is called the "Coda", i.e. V*C
Syllable Rules-
1. When nasals such as / ’/, or half-pronounced / / or / / sounds,
succeed a vowel immediately, they are treated as a part of
the vowel and also of the same syllable. For example, / ’/ in sa ’sthaa
will be a part of the syllable containing /sa/.
2. When there are three or more consonants between two
consecutive vowels, the first consonant would be a part of the coda
of the previous syllable while the remaining consonants would be
onset of the next syllable .
Syllable Rules-
3. When there are exactly two consonants between two vowels, the first consonant
would be part of coda of previous syllable and the second would be onset of the
next syllable
4. When the second consonant is a member of the set {/r/ /s/ /sh/ /shh/}, both the
consonants would be a part of onset of the next syllable
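The four cluster rules above can be sketched as a small syllabifier over a list of phones. Several things here are assumptions made for illustration: the vowel inventory, the symbol `n'` standing in for the nasal of rule 1, the reading of rule 4 as an exception to rule 3, and the treatment of a single consonant between vowels as the onset of the next syllable (the slides do not state that case).

```python
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}  # illustrative inventory
NASALS = {"n'"}                                        # hypothetical symbol for rule 1
ONSET_JOINERS = {"r", "s", "sh", "shh"}                # the set named in rule 4


def syllabify(phones):
    """Split a phone list into syllables using the consonant-cluster rules."""
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_positions:
        return [list(phones)] if phones else []

    boundaries = []  # index at which each non-initial syllable starts
    for left, right in zip(vowel_positions, vowel_positions[1:]):
        start = left + 1
        # Rule 1: a nasal right after the vowel stays in that vowel's syllable.
        if start < right and phones[start] in NASALS:
            start += 1
        cluster = right - start  # consonants left between the two vowels
        if cluster <= 1:
            # Assumption: a lone consonant is the onset of the next syllable.
            boundary = start
        elif cluster == 2 and phones[start + 1] in ONSET_JOINERS:
            boundary = start      # rule 4: both consonants form the onset
        else:
            boundary = start + 1  # rules 2 and 3: first consonant is the coda
        boundaries.append(boundary)

    edges = [0] + boundaries + [len(phones)]
    return [list(phones[a:b]) for a, b in zip(edges, edges[1:])]


print(syllabify(["k", "a", "m", "l", "a"]))
print(syllabify(["p", "a", "t", "r", "a"]))
```

Consonants before the first vowel and after the last vowel simply stay with the first and last syllable, which matches the onset and coda definitions given earlier.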
HMM synthesis
• A relatively new technology is speech synthesis based on Hidden
Markov Models (HMMs), a statistical modeling technique.
• It is a statistical method where the text-to-speech system is based on
a model that is not known beforehand but it is refined by continuous
training.
• The technique consumes large CPU resources but very little memory.
• This approach seems to give a better prosody, without glitches, and
still produces very natural sounding, human-like speech
MOTIVATION
• There are 1652 languages in India
• Building a TTS system for each of them is time-consuming and
exhausting. Thus a more generic approach towards system building is
required. A common framework is first designed, using which
language-specific systems are then built.
LITERATURE SURVEY
SR. NO 1
PAPER TITLE: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
Aim of the Paper: Improved naturalness in the synthesized speech.
Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
Disadvantages: A large number of concatenation points. These produce glitches at the output, and the resulting prosody mismatch and spectral discontinuity are hard to eliminate.

SR. NO 2
PAPER TITLE: Design and Development of a Text-To-Speech Synthesizer for Indian Languages
Aim of the Paper: The design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation.
Advantages: Improves synthesis quality and reduces the search space, which improves synthesis time.
Disadvantages: It is not clear, at the time of writing, how spectral interpolation will be performed at the boundaries.

SR. NO 3
PAPER TITLE: Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
Aim of the Paper: Convert orthographic text into intelligible and natural-sounding speech.
Advantages: Provides very high quality speech output which is reasonably natural and equivalent to the voice of the original speaker.
Disadvantages: Pre-processing of the text is required before synthesis.

SR. NO 4
PAPER TITLE: Text-to-Speech Synthesis using syllable-like units
Aim of the Paper: The design of a syllable-based concatenative waveform synthesizer for Indian languages.
Advantages: The automatic segmentation algorithm has indeed created a useful speech unit that has low target and concatenation costs.
Disadvantages: The current work uses a single unique syllable-like unit from the repository for synthesis.

SR. NO 5
PAPER TITLE: Statistical parametric speech synthesis
Aim of the Paper: Generating acceptable synthesized speech.
Advantages: A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.
Disadvantages: Quality of the synthesized speech. The factors which degrade the quality are the vocoder, modeling accuracy, and over-smoothing.

SR. NO 6
PAPER TITLE: Unit selection in a concatenative speech synthesis system using a large speech database
Aim of the Paper: The generation of natural-sounding synthesized speech waveforms.
Advantages: Produces more natural speech.
Disadvantages: There is little difference in the quality of the output between the two training methods.

SR. NO 7
PAPER TITLE: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
Aim of the Paper: Improved naturalness in the synthesized speech, with very high quality speech output when compared to other synthesizing techniques.
Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
Disadvantages: A large number of concatenation points. These produce glitches at the output, and the resulting prosody mismatch and spectral discontinuity are hard to eliminate.

SR. NO 8
PAPER TITLE: A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
Aim of the Paper: High-quality synthetic speech.
Advantages: Concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
Disadvantages: To obtain high-quality synthetic speech, the database must be large, to ensure that sufficient examples of each unit in every possible context are available.
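Several entries in the survey refer to choosing units so that the target and concatenation costs are minimized. A minimal sketch of that search, assuming one list of candidate units per position and two caller-supplied cost functions, is a dynamic program over the candidate lattice. The unit representation (a name and a pitch value) and both cost functions are hypothetical and only serve the example.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one candidate per position with minimum total target + join cost.

    candidates: non-empty list of non-empty candidate lists, one per position.
    Returns (total_cost, chosen_units).
    """
    # best[j] = (cost of the cheapest path ending in candidates[t][j], that path)
    best = [(target_cost(0, c), [c]) for c in candidates[0]]
    for t in range(1, len(candidates)):
        new_best = []
        for c in candidates[t]:
            cost, path = min(
                ((prev_cost + join_cost(prev_path[-1], c), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(t, c), path + [c]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Hypothetical units: (syllable, pitch in Hz); targets are desired pitches.
targets = [110, 115]
candidates = [[("sa", 100), ("sa", 120)], [("thaa", 118), ("thaa", 90)]]
cost, units = select_units(
    candidates,
    target_cost=lambda t, u: abs(u[1] - targets[t]),
    join_cost=lambda a, b: abs(a[1] - b[1]),
)
print(cost, units)
```

The larger the database, the more candidates each position has, which is why the survey lists both better quality and a larger search space as consequences of a large unit inventory.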
DATA TABLE
TABLE I: Degradation MOS (DMOS) and word error rate (WER) scores

Target Language                        Marathi  Bengali  Tamil   Tamil   Telugu   Malayalam
Source Language                        Hindi    Hindi    Tamil   Hindi   Tamil    Tamil
Number of hours of target language     3        2        3       3       3        3
DMOS                                   2.79     2.50     2.97    2.53    2.63     2.88
WER                                    3.48%    15.06%   6.61%   5.16%   16.14%   3.13%
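The slides do not say how the WER figures in Table I were obtained. For orientation, the usual definition of word error rate is the word-level edit distance (substitutions, deletions and insertions) between a reference sentence and what was heard or recognized, divided by the number of reference words. The sketch below implements that standard definition; the function name and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print("WER = %.2f%%" % (100 * wer))
```

Under this definition a lower WER means listeners or a recognizer recovered more of the intended words, so it complements the DMOS row, which rates perceived quality.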
SYSTEM ARCHITECTURE
Fig. 2. Training and synthesis phases of HMM-based speech synthesis
MATHEMATICAL MODEL
Let I = Set of Language
I = {T, S}
Where,
T is the text which is input and
S is the sound which is output.
$D(I) = \arg\max_{o} \, p(o \mid w, \lambda)$
Where,
$\lambda$ represents the model parameters,
$o$ represents the speech parameters and
$w$ is the transcription of the test sentence
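To make the arg max concrete, consider a deliberately simplified case that is an assumption of this sketch and not the full model: the state sequence and state durations for w are already fixed, each state emits a scalar Gaussian feature, and no dynamic (delta) features are used. Under those assumptions the frames are independent, so the most likely o is simply each state's mean repeated for its duration. Real HMM-based synthesizers add dynamic-feature constraints so that the generated trajectory is smooth.

```python
import math


def most_likely_observations(state_means, durations):
    """Return arg max over o of p(o | states) for independent Gaussian frames."""
    if len(state_means) != len(durations):
        raise ValueError("one duration is needed per state")
    frames = []
    for mean, duration in zip(state_means, durations):
        frames.extend([mean] * duration)  # a Gaussian density peaks at its mean
    return frames


def log_likelihood(frames, means, variances):
    """Sum of scalar Gaussian log densities, one per frame."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frames, means, variances))


# Hypothetical model parameters: two states with means 1.0 and 3.0.
best = most_likely_observations([1.0, 3.0], [2, 1])
variances = [0.5] * len(best)
print(best, log_likelihood(best, best, variances))
```

Any other trajectory of the same length, such as one with a perturbed frame, scores a lower log-likelihood under the same means and variances.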
Syllable Rules-
A syllable is a cluster of consonants and a vowel.
A syllable should contain one vowel and any number of consonants.
A single vowel can act as a syllable (i.e. V).
V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C, etc.
A consonant before the vowel is called the "Onset", i.e. (C*V)
A consonant after the vowel is called the "Coda", i.e. (V*C)
Output = Pk
Where D(I) = dictionary function
Pk is Phonetics
ALGORITHM
• PARAMETER GENERATION ALGORITHM
• DELAY BASED SEGMENTATION ALGORITHM
ADVANTAGES
• For people wanting to learn a new language
• For educational institutions looking to enhance student learning,
recall and comprehension
• For people wanting to learn through multiple mediums to solidify
learning
• For people with physical disabilities
• Difficulty handling a book or paper
• Visual Issue (Difficulty seeing text)
DISADVANTAGES
• Despite large improvements, Speech Synthesis can still sound a little
unnatural.
• The approaches to Speech Synthesis that yield the most natural
speech need considerable resources in terms of data storage and
processing power.
• Pronunciation analysis from written text is also a major problem.
APPLICATION
• Systems that provide voice synthesis output for blind users are
generally referred to as screen readers.
• Applications for the Blind
• Applications for the Deafened and Vocally Handicapped
• Educational Applications
CONCLUSION
This paper explores the syllable approach to building language-independent
text-to-speech systems for Indian languages. The use of a common
phone set, a common question set, and borrowed context-independent
monophone models, along with the syllable approach across languages,
makes the procedure easier and less time-consuming, without
compromising the synthesized speech quality. Systems can be built
without even knowing the language. This is especially beneficial
in the Indian scenario.
REFERENCES
• [1] A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a
large speech database," in Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 1,
1996, pp. 373–376.
• [2] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech
Communication, vol. 51, no. 3, pp. 1039–1064, November 2009.
• [3] A. Beyerlein, W. Byrne, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone,
and W. Wang, "Towards language independent acoustic modeling," in Proceeding on Acoustics,
Speech, and Signal Processing (ICASSP), vol. 2, 2000, pp. 1029–1032.
• [4] R. Bayeh, S. Lin, G. Chollet, and C. Mokbel, "Towards multilingual speech recognition using
data driven source/target acoustical units association," in Acoustics, Speech, and Signal
Processing, 2004. Proceedings (ICASSP), pp. I–521–4.
• [5] V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language:
Application to Vietnamese," in Acoustics, Speech, and Signal Processing. Proceedings (ICASSP), pp. –824.
• [6] P. Eswar, "A rule based approach for spotting characters from continuous speech in Indian
languages," PhD Dissertation, Indian Institute of Technology, Department of Computer Science
and Engg., Madras, India, 1991.
THANK YOU…!!!

cnn.pptx Convolutional neural network used for image classication
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...Rainfall intensity duration frequency curve statistical analysis and modeling...
Rainfall intensity duration frequency curve statistical analysis and modeling...
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Design and optimization of ion propulsion drone
Design and optimization of ion propulsion droneDesign and optimization of ion propulsion drone
Design and optimization of ion propulsion drone
 
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 

Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach

  • 1. Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach CLASS:M.E I COMPUTER GUIDED BY : PROF. ASHISH MANWATKAR PRESENTED BY : RAVI SHARMA ROLL NO: 15311
  • 2. CONTENT • INTRODUCTION • MOTIVATION • LITERATURE SURVEY • DATA TABLE • SYSTEM ARCHITECTURE • MATHEMATICAL MODEL • ALGORITHM • ADVANTAGES • DISADVANTAGES • APPLICATION • CONCLUSION
  • 3. INTRODUCTION • Text to Speech Synthesis- A system which takes as input a sequence of words and converts them to speech
  • 4. •Parts of Speech Synthesizers Speech Synthesizers usually consist of two parts. First Part- The first part has two major tasks. • First it takes the raw text and converts things like numbers and abbreviations into their written-out word equivalents. This process is often called text normalization. • Then it assigns phonetic transcriptions to each word, and divides and marks the text into various linguistic units like phrases, clauses, and sentences.
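The text normalization task described on this slide can be illustrated with a minimal Python sketch. The abbreviation table, the function names and the 0..999 number range are assumptions made for illustration; they are not taken from the presentation.

```python
import re

# Hypothetical abbreviation table; a real front end would use a far larger lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out small numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)

    def spell(match: re.Match) -> str:
        n = int(match.group())
        # Numbers above 999 are left untouched in this sketch.
        return number_to_words(n) if n < 1000 else match.group()

    return re.sub(r"\d+", spell, text)
```

For example, `normalize("Dr. Rao has 42 books")` expands both the abbreviation and the number before any phonetic transcription is attempted.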
  • 5. • Second Part- The other part, the back end, takes the symbolic linguistic representation and converts it into actual sound output
  • 6. Text-to-phoneme challenges • Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme conversion.
  • 7. Dictionary Based approach • The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciation is stored by the program. • Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary
  • 8. Rule based approach • The other approach used for text-to-phoneme conversion is the rule-based approach, where pronunciation rules are applied to words to work out their pronunciations from their spellings. This is similar to the "sounding out" approach to learning reading.
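The two text-to-phoneme approaches above can be combined in a small Python sketch: a dictionary lookup first, with a rule-based fallback for words that are not in the dictionary. The lexicon entries, the phoneme symbols and the letter rules are toy assumptions chosen for illustration only.

```python
# Hypothetical toy lexicon; the phoneme symbols are illustrative, not a real phone set.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "text": ["t", "eh", "k", "s", "t"],
}

# Hypothetical letter-to-sound rules for two-letter groups.
LETTER_RULES = {"sh": "sh", "ch": "ch", "th": "th", "ee": "iy", "oo": "uw"}


def rule_based(word: str) -> list[str]:
    """Scan left to right, preferring two-letter rules over single letters."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in LETTER_RULES:
            phonemes.append(LETTER_RULES[pair])
            i += 2
        else:
            # Fallback assumption: a single letter maps to itself.
            phonemes.append(word[i])
            i += 1
    return phonemes


def to_phonemes(word: str) -> list[str]:
    """Dictionary-based lookup with a rule-based fallback."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return rule_based(word)
```

The dictionary gives exact pronunciations for known words, while the rules keep the system from failing on unseen words, which is the usual reason for combining the two.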
  • 9. • SYLLABLE RULES- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. 1. A single vowel can act as a syllable (i.e. V). 2. V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc. 3. A consonant before the vowel is called the 'Onset', i.e. C*V. 4. A consonant after the vowel is called the 'Coda', i.e. V*C.
  • 10. Syllable Rules- 1. When nasals such as / ’/, half-pronounced / / or / / sounds succeed a vowel immediately, they are treated as part of the vowel and also of the same syllable. For example, / ’/ in sa ’sthaa will be part of the syllable containing /sa/. 2. When there are three or more consonants between two consecutive vowels, the first consonant is part of the coda of the previous syllable, while the remaining consonants form the onset of the next syllable.
  • 11. Syllable Rules- 3. When there are exactly two consonants between two vowels, the first consonant is part of the coda of the previous syllable and the second is the onset of the next syllable. 4. When the second consonant is a member of the set {/r/ /s/ /sh/ /shh/}, both consonants are part of the onset of the next syllable.
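The consonant-cluster rules on the syllable slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the vowel inventory is hypothetical, a single consonant between two vowels is assumed to join the next syllable as its onset, and the nasal rule is omitted because the slide's nasal symbols are not recoverable.

```python
# Hypothetical vowel inventory in a romanised phone set (assumption).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

# Second consonants that pull the whole two-consonant cluster into the next onset.
ONSET_PULLERS = {"r", "s", "sh", "shh"}


def syllabify(phones: list[str]) -> list[list[str]]:
    """Split a phone sequence into syllables, one vowel per syllable."""
    vowel_idx = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_idx:
        return [list(phones)] if phones else []

    starts = [0]  # start index of each syllable
    for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = nxt - prev - 1  # consonants between the two vowels
        if cluster <= 1:
            # Assumption: a lone consonant becomes the onset of the next syllable.
            start = prev + 1
        elif cluster == 2 and phones[nxt - 1] in ONSET_PULLERS:
            # Both consonants form the onset of the next syllable.
            start = prev + 1
        else:
            # Two or more consonants: the first is the coda of the previous syllable.
            start = prev + 2
        starts.append(start)
    starts.append(len(phones))
    return [list(phones[a:b]) for a, b in zip(starts, starts[1:])]
```

Leading consonants stay with the first syllable and trailing consonants with the last, so every syllable still contains exactly one vowel.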
  • 12. HMM synthesis • A fairly new technology is speech synthesis based on Hidden Markov Models (HMMs), a statistical modelling concept. • It is a statistical method in which the text-to-speech system is based on a model that is not known beforehand but is refined by continuous training. • The technique consumes large CPU resources but very little memory. • This approach seems to give better prosody, without glitches, and still produces very natural-sounding, human-like speech.
  • 14. MOTIVATION • There are 1652 languages in India. • Building a TTS system for each of them is time-consuming and exhausting. Thus a more generic approach towards system building is required. A common framework is first designed, using which language-specific systems are then built.
  • 15. LITERATURE SURVEY
    1. Paper title: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
       Aim of the paper: quality of this system is the improved naturalness in the synthesized speech
       Advantages: An important advantage of this approach leads to reduced prosody mismatch and spectral discontinuity that occurs during syllable concatenation.
       Disadvantages: Large concatenation points. This large concatenation results in glitch at the output which is hard to eliminate prosody mismatch and spectral discontinuity
    2. Paper title: Design and Development of a Text-To-Speech Synthesizer for Indian Languages
       Aim of the paper: The design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation
       Advantages: improves synthesis quality and it reduces search space improving the synthesis timing.
       Disadvantages: it is not clear at the time of writing, how spectral interpolation will be performed at the boundaries
  • 16.
    3. Paper title: Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
       Aim of the paper: convert an orthographic text into intelligible and natural sounding speech
       Advantages: This technique provides very high quality speech output which is reasonably natural and equivalent to voice of the original speaker.
       Disadvantages: before synthesizing pre-processing of text is required
    4. Paper title: Text-to-Speech Synthesis using syllable-like units
       Aim of the paper: the design of a syllable based concatenative waveform synthesizer for Indian languages.
       Advantages: the automatic segmentation algorithm has indeed created a useful speech unit that has low target and concatenation costs.
       Disadvantages: current work uses a single unique syllable-like unit from the repository for synthesis.
  • 17.
    5. Paper title: Statistical parametric speech synthesis
       Aim of the paper: generating acceptable speech synthesis
       Advantages: a variety of speaking styles or emotional speech can be synthesized using the small amount of speech data.
       Disadvantages: quality of synthesized speech; factors which degrade the quality: vocoder, modeling accuracy, and over-smoothing.
    6. Paper title: Unit selection in a concatenative speech synthesis system using a large speech database
       Aim of the paper: the generation of natural-sounding synthesized speech waveforms
       Advantages: produce more natural speech
       Disadvantages: there is little difference in the quality of output using the two training method
  • 18.
    7. Paper title: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
       Aim of the paper: quality of this system is the improved naturalness in the synthesized speech and gives very high quality speech output when compared to other synthesizing techniques
       Advantages: An important advantage of this approach leads to reduced prosody mismatch and spectral discontinuity that occurs during syllable concatenation.
       Disadvantages: Large concatenation points. This large concatenation results in glitch at the output which is hard to eliminate prosody mismatch and spectral discontinuity.
  • 19.
    8. Paper title: A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
       Aim of the paper: high-quality synthetic speech
       Advantages: concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
       Disadvantages: to obtain high-quality synthetic speech, the size of the database required is large, to ensure that sufficient examples for each unit in every possible context is available
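Several of the surveyed papers rely on unit selection, where database units are chosen so that the sum of target and concatenation costs is minimized. A minimal Python sketch of that search, using dynamic programming, is given below. The function name, the cost functions and the toy units in the usage note are hypothetical and are not taken from any of the surveyed systems.

```python
def select_units(targets, candidates, target_cost, concat_cost):
    """Pick one candidate per target so that the total cost is minimal.

    candidates[i] is the non-empty list of database units available for targets[i].
    """
    # best[i][k] = (cumulative cost, back-pointer) for candidate k at position i
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            # Cheapest way to reach unit u from any unit at the previous position.
            cost, back = min(
                (best[i - 1][k][0] + concat_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)

    # Trace back from the cheapest final unit.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1]
```

As a toy usage, units can be `(name, pitch)` pairs with the target cost defined as the pitch distance to the target and the concatenation cost as the pitch jump between neighbouring units; real systems use spectral and prosodic features instead.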
  • 20. DATA TABLE TABLE I: Degradation MOS (DMOS) and word error rate (WER) scores
    Target language                      Marathi  Bengali  Tamil   Tamil   Telugu  Malayalam
    Source language                      Hindi    Hindi    Tamil   Hindi   Tamil   Tamil
    Number of hours of target language   3        2        3       3       3       3
    DMOS                                 2.79     2.50     2.97    2.53    2.63    2.88
    WER                                  3.48%    15.06%   6.61%   5.16%   16.14%  3.13%
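The slide does not say how the WER figures were computed. WER is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between what listeners transcribed and the reference text, divided by the number of reference words. A minimal Python sketch of that conventional definition follows; the function name is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the WER row of the table.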
  • 21. SYSTEM ARCHITECTURE Fig. 2. Training and synthesis phases of HMM-based speech synthesis
  • 22. MATHEMATICAL MODEL Let I = {T, S} be the set for a language, where T is the input text and S is the output sound. $D(I) = \arg\max_{o} \, p(o \mid w, \lambda)$, where $\lambda$ represents the model parameters, $o$ represents the speech parameters, and $w$ is the transcription of the test sentence.
  • 23. Syllable Rules- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. A single vowel can act as a syllable (i.e. V). V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc. A consonant before the vowel is called the 'Onset', i.e. (C*V). A consonant after the vowel is called the 'Coda', i.e. (V*C). Output = Pk, where D(I) is the dictionary function and Pk is the phonetics.
  • 24. ALGORITHM • PARAMETER GENERATION ALGORITHM • DELAY BASED SEGMENTATION ALGORITHM
  • 25. ADVANTAGES • For people wanting to learn a new language • For educational institutions looking to enhance student learning, recall and comprehension • For people wanting to learn through multiple mediums to solidify learning • For people with physical disabilities • Difficulty handling a book or paper • Visual Issue (Difficulty seeing text)
  • 26. DISADVANTAGES • Despite large improvements, speech synthesis can still sound a little unnatural. • The approaches to speech synthesis that yield the most natural speech need considerable resources in terms of data storage and processing power. • Pronunciation analysis from written text is also a major problem.
  • 27. APPLICATION • Systems that provide voice synthesis output for blind users are generally referred to as screen readers. • Applications for the Blind • Applications for the Deafened and Vocally Handicapped • Educational Applications
  • 28. CONCLUSION This paper explores a syllable approach to building language-independent text-to-speech systems for Indian languages. The use of a common phone set, a common question set and borrowed context-independent monophone models, along with the syllable approach across languages, makes the procedure easier and less time-consuming without compromising the synthesized speech quality. Systems can be built without even knowing the language. This is especially beneficial in the Indian scenario.
  • 29. REFERENCES
    [1] A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 1, 1996, pp. 373–376.
    [2] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 3, pp. 1039–1064, November 2009.
    [3] A. Beyerlein, W. Byrne, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone, and W. Wang, "Towards language independent acoustic modeling," in Proceeding on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, 2000, pp. 1029–1032.
    [4] R. Bayeh, S. Lin, G. Chollet, and C. Mokbel, "Towards multilingual speech recognition using data driven source/target acoustical units association," in Acoustics, Speech, and Signal Processing, 2004. Proceedings ICASSP, pp. I–521–4.
    [5] V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language: Application to Vietnamese," in Acoustics, Speech, and Signal Processing, Proceedings ICASSP, pp. …–824.
    [6] P. Eswar, "A rule based approach for spotting characters from continuous speech in Indian languages," PhD Dissertation, Indian Institute of Technology, Department of Computer Science and Engg., Madras, India, 1991.