International Journal of Recent Research in Mathematics Computer Science and Information Technology
Vol. 1, Issue 1, pp: (14-17), Month: April - June 2014, Available at: www.paperpublications.org
Paper Publications
Segmentation Words for Speech Synthesis in
Persian Language Based On Silence
Sohrab Hojjatkhah, Ali Jowharpour
Islamic Azad University, Dehdasht Branch
Abstract: In text-to-speech (TTS) synthesis, words are usually broken into parts, and the recorded sound of each part is
used to play the word. This paper uses the silences in a word's pronunciation to improve speech quality. Most algorithms
divide words into syllables, and some divide them into phonemes. This paper instead exploits the silences in intonation:
it divides each word at its silent regions and assigns the equivalent sound to each part, so that joining the parts is
reliable and the resulting speech is smoother. The paper concerns the Persian language, but the method can be extended
to other languages. The method was evaluated with a MOS test, and intelligibility, naturalness, and fluidity improved.
Keywords: TTS, SBS, Syllable, Diphone.
1- INTRODUCTION
The correct pronunciation of unknown or novel words is one of the biggest challenges for text-to-speech (TTS) systems.
Generally speaking, correct analysis and pronunciation can only be guaranteed if a direct match exists between an input string
and a corresponding entry in a pronunciation dictionary or morphologically annotated lexicon. However, any well-formed
text input to a general-purpose TTS system in any language is extremely likely to contain words that are not explicitly listed
in the lexicon. The set of possible lexicon entries in any language is unbounded; thus, in unlimited-vocabulary scenarios we are not facing a
memory or storage problem but the requirement for the TTS system to correctly analyze unseen orthographic strings. The
Persian language is notorious for its extensive use of compounds. What makes this a challenge for linguistic analysis is the
fact that compounding is extraordinarily productive. Linguistic analysis has to provide a mechanism to appropriately
decompose compounds and, more generally, to handle unknown words. Most of the existing commercial speech synthesis
systems can be classified as either formant synthesizers [1,2,3] or concatenation synthesizers [4,5]. Formant synthesizers,
which are usually controlled by rules, have the advantage of having small footprints at the expense of the quality and
naturalness of the synthesized speech [6]. On the other hand, concatenative speech synthesis, using large speech databases,
has become popular due to its ability to produce high quality natural speech output [7]. The large footprints of these systems
do not present a practical problem for applications where the synthesis engine runs on a server with enough computational
power and sufficient storage [7]. Persian has both short and long vowels, but TTS systems usually treat the two kinds as the
same. This paper distinguishes short vowels from long vowels and uses this distinction to find the components of a word
that are delimited by silences in the word's pronunciation. The method can serve as a concatenation rule.
2- RELATED WORK
2.1 SYLLABLE BASED TTS SYSTEM
The syllable is one of the units used for developing text-to-speech systems. Different languages have different
syllable patterns. Most languages allow many syllable patterns, and therefore the number of syllables is
large, so the syllable is usually not used in all-purpose TTS systems. For example, there are more than 15000 syllables in English
[6]. Creating a database for this number of units is a very difficult and time-consuming task.
In some languages, the number of syllable patterns is limited, so the number of syllables is small, and creating a database for
them is reasonable; therefore this unit can be used in all-purpose TTS systems. For example, Indian language has CV, CCV,
VC, and CVC syllable patterns, and the total number of syllables in this language is 10000 [1]. The syllable is also used in
some Persian TTS systems. Persian has only CV, CVC, and CVCC syllable patterns, so its number of syllables
is limited to 4000 [10].
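As a rough check on why this inventory stays manageable, the number of syllables that the three Persian patterns could generate can be counted directly. The sketch below assumes the phoneme counts given in Section 3 (23 consonants and 6 vowels) and assumes that every consonant and vowel combines freely, which overestimates the real inventory. The function name is illustrative and does not come from the paper.

```python
def count_possible_syllables(consonants: int, vowels: int,
                             patterns: list[str]) -> dict[str, int]:
    """Count every string that each C/V pattern could generate."""
    counts = {}
    for pattern in patterns:
        total = 1
        for slot in pattern:
            # Each slot multiplies the count by the size of its phoneme class.
            total *= consonants if slot == "C" else vowels
        counts[pattern] = total
    return counts


# 23 consonants and 6 vowels, as stated in Section 3 of the paper.
counts = count_possible_syllables(23, 6, ["CV", "CVC", "CVCC"])
print(counts)                # per-pattern upper bounds
print(sum(counts.values()))  # total upper bound
```

Under these assumptions the three patterns admit 76314 combinations (138 CV, 3174 CVC and 73002 CVCC), far more than the roughly 4000 syllables reported as actually used [10].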
2.2 DIPHONE BASED TTS SYSTEM
Nowadays the diphone is the most popular unit in synthesis systems. A diphone includes the transition from one unit to the
next and therefore gives better quality than other units. Some modern systems also combine this unit with other
methods such as unit selection. Persian has 23 consonants and 6 vowels, 29 phonemes in all, so as an upper-bound estimate
it has 29 × 29 = 841, or roughly 900, diphones.
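The upper bound is simply the number of ordered phoneme pairs. A minimal sketch, assuming the inventory of 23 consonants and 6 vowels stated in Section 3 and ignoring both pairs that never occur in practice and any diphones that involve silence:

```python
def diphone_upper_bound(phoneme_count: int) -> int:
    """Every ordered pair of phonemes is a candidate diphone."""
    return phoneme_count * phoneme_count


consonants, vowels = 23, 6  # counts given in Section 3 of the paper
upper_bound = diphone_upper_bound(consonants + vowels)
print(upper_bound)
```

For 29 phonemes this gives 841, in line with the estimate of about 900 diphones.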
2.3 ALLOPHONE BASED TTS SYSTEM
Allophone units preserve naturalness within the allophone, especially for vowels, and far fewer units need to be
prerecorded than with other choices. However, good articulation between most allophones is difficult to achieve, which
can make individual words unintelligible, and stop-consonant allophones are difficult to isolate for prerecording.
Hypothetically, allophones appear to be the perfect unit for concatenation. A few hundred of them will serve as the basic
building blocks for all utterances. The allophonic database is a set of allophone wav files, each file named according to
the allophone itself and its phonetic context. According to the input text, the proper allophones are chosen from the
database and concatenated to obtain the primary output [9].
3- SILENCE BASED SEGMENTATION (SBS)
This paper uses silence to segment words and substitutes a suitable recorded sound for each segment, because speech sounds
more natural when the silences in the sound played by the computer match the silences in natural speech. For example,
consider the word "مناظره" (Monâzere). Segmented by syllable, it becomes "Mo_nâ_ze_re", which requires four pieces of
recorded sound. Concatenating these parts decreases the naturalness of the speech, so the number of parts should be kept
to a minimum. When "Monâzere" is spoken, the intonation is "Monâ", silence, "zere", so in a TTS system it is better to
segment the word the way natural intonation does.
Segmentation proceeds as follows. The syllable structure of Farsi is limited to only three types: CV, CVC and
CVCC. Among all the syllables that can be constructed from the 23 consonants and 6 vowels of Farsi in these three
syllable types, only about 4000 are used in Farsi words; the others are not used [10]. In this paper
vowels are divided into two kinds, short and long. Short vowels are not shown in written text, but long vowels are
written. A short vowel is denoted by "V" and a long vowel by "W". The boundaries between parts are of four types, "WCV",
"WVW", "CCV" and "CCW", and the word is cut after the first letter of the matched type. Finding these types reveals the
silences in the word. If the word "Monâzere" is given to the function iso below, it returns the string "CVCWCVCV", in
which "WCV" marks the position of the word's division. "CVCWCVCV" is therefore divided into the two substrings "CVCW" and
"CVCV". Similarly, "Montazer" becomes "CVCCVCVC", which is divided into "CVC" and "CVCVC". The units are separated from
one another by silence in speech. The iso function is:
function iso(s: string): string;
{ c, v and Ww are global strings, defined elsewhere, that hold the consonants, short vowels and long vowels }
var
  ss: string;
  kk: integer;
begin
  ss := '';
  for kk := 1 to length(s) do
  begin
    if pos(s[kk], c) <> 0 then ss := ss + 'C'        { letter kk of s is a consonant: append "C" }
    else if pos(s[kk], v) <> 0 then ss := ss + 'V'   { letter kk of s is a short vowel: append "V" }
    else if pos(s[kk], Ww) <> 0 then ss := ss + 'W'; { letter kk of s is a long vowel: append "W" }
  end;
  iso := ss;
end;
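The classification and cutting rules described in this section can be sketched together in Python. This is an illustration under stated assumptions rather than the authors' implementation: it assumes a one-character-per-phoneme Latin transliteration in which a, e, o are short vowels and â, i, u are long vowels, and the letter sets and the segment function are hypothetical. Unlike the Pascal function, the sketch rejects unclassified letters so that the pattern stays aligned with the word.

```python
import unicodedata

# Hypothetical transliteration alphabet (an assumption, not the paper's alphabet).
CONSONANTS = set("bcdfghjklmnpqrstvxyz")
SHORT_VOWELS = set("aeo")
LONG_VOWELS = set("âiu")
BOUNDARIES = {"WCV", "WVW", "CCV", "CCW"}  # boundary types listed in the text


def normalize(word: str) -> str:
    """Lower-case the word and compose accents so 'â' is a single character."""
    return unicodedata.normalize("NFC", word.lower())


def iso(word: str) -> str:
    """Map each letter to C (consonant), V (short vowel) or W (long vowel)."""
    pattern = []
    for ch in normalize(word):
        if ch in CONSONANTS:
            pattern.append("C")
        elif ch in SHORT_VOWELS:
            pattern.append("V")
        elif ch in LONG_VOWELS:
            pattern.append("W")
        else:
            raise ValueError(f"unclassified letter: {ch!r}")
    return "".join(pattern)


def segment(word: str) -> list[str]:
    """Cut the word after the first letter of every boundary pattern."""
    letters = normalize(word)
    pattern = iso(letters)
    parts, start = [], 0
    for i in range(len(pattern) - 2):
        if pattern[i:i + 3] in BOUNDARIES:
            parts.append(letters[start:i + 1])
            start = i + 1
    parts.append(letters[start:])
    return parts


print(iso("Monâzere"), segment("Monâzere"))
print(iso("Montazer"), segment("Montazer"))
```

With these assumptions, "Monâzere" maps to "CVCWCVCV" and is cut into "monâ" and "zere", while "Montazer" maps to "CVCCVCVC" and is cut into "mon" and "tazer".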
After segmentation, the units are looked up in a database that contains all the parts of Persian words. The number of units
in this database is larger than under the syllable rule, but using these units as the basic unit of concatenation reduces
the smoothing that is needed.
Figure 1 shows segmentation by silence.
Fig 1: Position of signal cutting
In the SBS method the number of parts that must be stored in the signal dictionary increases, but less work is needed in the concatenation phase.
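The concatenation phase can be pictured as a dictionary lookup followed by joining the stored signals with a short pause between them. The sketch below is only an illustration: the signal dictionary, its toy sample values and the pause length are hypothetical, and the paper does not state whether the pause is inserted at synthesis time or is already part of the recordings. The sketch assumes the former.

```python
def synthesize(units: list[str], signal_dictionary: dict[str, list[int]],
               pause_samples: int) -> list[int]:
    """Join the stored samples of each unit, inserting silence between units."""
    output: list[int] = []
    for index, unit in enumerate(units):
        if unit not in signal_dictionary:
            raise KeyError(f"no recording stored for unit {unit!r}")
        if index > 0:
            # Silence is modelled as zero-valued samples between two units.
            output.extend([0] * pause_samples)
        output.extend(signal_dictionary[unit])
    return output


# Toy data standing in for real recordings (hypothetical values).
signal_dictionary = {"monâ": [3, 5, 2], "zere": [4, 1]}
samples = synthesize(["monâ", "zere"], signal_dictionary, pause_samples=2)
print(samples)
```

Because each word is cut into few, long units, the loop runs only a few times per word, which is the sense in which the concatenation phase does less work.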
4- EVALUATION OF THE SBS METHOD
This section compares SBS with syllable-based and diphone-based segmentation. The evaluation used the MOS method with five
listeners (3 men, 2 women). The listeners heard words produced by the different methods and gave each a score from 1 to 5,
where 1 is bad and 5 is excellent. The results are shown in Table 1.
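A mean opinion score is the arithmetic mean of the listeners' ratings for one criterion. The sketch below shows the calculation for two of the criteria, using the ratings of the five listeners (M1, M2, M3, W1, W2) as read from the Fluidity and Intelligibility columns of Table 1. The function name and the range check are illustrative additions.

```python
def mean_opinion_score(scores: list[int]) -> float:
    """Average the 1-5 ratings given by all listeners for one criterion."""
    if not scores:
        raise ValueError("at least one rating is required")
    if any(score < 1 or score > 5 for score in scores):
        raise ValueError("ratings must lie between 1 and 5")
    return round(sum(scores) / len(scores), 1)


# Ratings of listeners M1, M2, M3, W1, W2 as read from Table 1.
fluidity = [4, 3, 4, 4, 4]
intelligibility = [5, 4, 5, 5, 5]
print(mean_opinion_score(fluidity), mean_opinion_score(intelligibility))
```

These ratings give 3.8 for fluidity and 4.8 for intelligibility, matching the Mean row of Table 1.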
Table 1. Results of MOS test
Table 2. MOS test on triphone model and decision tree [11]
The result is acceptable. In Table 2, the syllable-based method was concatenated using the PSOLA algorithm; the
intelligibility of SBS compares favorably.
REFERENCES
[1] W. Barkhoda, B. ZahirAzami, O.-K. Shahryari, A comparison between allophone, syllable, and diphone based TTS
systems for Kurdish language.
[2] T. Styger and E. Keller, "Formant synthesis," in E. Keller (ed.), Fundamentals of Speech Synthesis and Speech
Recognition: Basic Concepts, State of the Art, and Future Challenges, 109-128, Chichester: John Wiley, 1994.
[3] D. H. Klatt, "Software for a Cascade/Parallel Formant Synthesizer," Journal of the Acoustical Society of America,
Vol. 67, 971-995, 1980.
[4] W. Hamza, Arabic Speech Synthesis Using Large Speech Database, PhD thesis, Cairo University, Electronics and
Communications Engineering Department, 2000.
[5] R. E. Donovan, Trainable Speech Synthesis, PhD thesis, Cambridge University, Engineering Department, 1996.
[6] S. Lemmetty, Review of Speech Synthesis Technology, M.Sc. thesis, Helsinki University of Technology, Department of
Electrical and Communications Engineering, 1999.
[7] A. Youssef, et al., "An Arabic TTS System Based on the IBM Trainable Speech Synthesizer," Le traitement automatique
de l'arabe, JEP-TALN 2004, Fès, 2004.
[8] H. R. Abutalebi and M. Bijankhan, "Implementation of a Text-to-Speech System for Farsi Language," Sixth
International Conference on Spoken Language Processing (ISCA), 2000.
[9] A. Sharifova, A comparison between allophone, syllable, and diphone based TTS systems for Azerbaijan language,
Cybernetic Institute of Azerbaijan National Academy of Sciences, 29 F. Agayev str., AZ1141, Baku, Azerbaijan.
[10] Y. Samareh, Phonetics of Farsi Language, Iran, Tehran, Academic Press Center, 1995. (in Farsi)
[11] M. M. Homayoonpour and S. M. Musavi, Produce speech syntheses parameter in Persian by HMM and decision tree, 2004.
Table 1
Audience   Intelligibility   Naturalness   Fluidity
M1         5                 3             4
M2         4                 4             3
M3         5                 3             4
W1         5                 3             4
W2         5                 4             4
Mean       4.8               3.6           3.8

Table 2
Applied method    Intelligibility   Naturalness   Fluidity
Decision tree     4.2               4.4           4.1
Triphone model    3.8               3.9           3.5

More Related Content

What's hot

ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ijnlc
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word Identification
Andi Wu
 
Word sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy wordsWord sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy words
ijnlc
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
kevig
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
kevig
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Shashank Shisodia
 
Ey4301913917
Ey4301913917Ey4301913917
Ey4301913917
IJERA Editor
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
cscpconf
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
IOSR Journals
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
IJERA Editor
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
IJECEIAES
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of Persian
IDES Editor
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
Seid Hassen
 
An Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding SystemAn Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding System
inscit2006
 
Cross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsCross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristics
ijaia
 
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment MethodA Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
dannyijwest
 
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHODA SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
IJwest
 

What's hot (17)

ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
Statistically-Enhanced New Word Identification
Statistically-Enhanced New Word IdentificationStatistically-Enhanced New Word Identification
Statistically-Enhanced New Word Identification
 
Word sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy wordsWord sense disambiguation using wsd specific wordnet of polysemy words
Word sense disambiguation using wsd specific wordnet of polysemy words
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGEADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
ADVANCEMENTS ON NLP APPLICATIONS FOR MANIPURI LANGUAGE
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
Ey4301913917
Ey4301913917Ey4301913917
Ey4301913917
 
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
ON THE UTILITY OF A SYLLABLE-LIKE SEGMENTATION FOR LEARNING A TRANSLITERATION...
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
 
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
 
Corpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of PersianCorpus-based part-of-speech disambiguation of Persian
Corpus-based part-of-speech disambiguation of Persian
 
Amharic WSD using WordNet
Amharic WSD using WordNetAmharic WSD using WordNet
Amharic WSD using WordNet
 
An Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding SystemAn Intuitive Natural Language Understanding System
An Intuitive Natural Language Understanding System
 
Cross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristicsCross lingual similarity discrimination with translation characteristics
Cross lingual similarity discrimination with translation characteristics
 
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment MethodA Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
A Self-Supervised Tibetan-Chinese Vocabulary Alignment Method
 
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHODA SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
A SELF-SUPERVISED TIBETAN-CHINESE VOCABULARY ALIGNMENT METHOD
 

Viewers also liked

Face Recognition & Detection Using Image Processing
Face Recognition & Detection Using Image ProcessingFace Recognition & Detection Using Image Processing
Face Recognition & Detection Using Image Processing
paperpublications3
 
Dentist near me
Dentist near meDentist near me
Dentist near me
mydentalplanet
 
спинальная и эпидуральная анестезия
спинальная и эпидуральная анестезияспинальная и эпидуральная анестезия
спинальная и эпидуральная анестезия
Slava Kolomak
 
Recreacion
RecreacionRecreacion
Recreacion
Rebeca Salcedo
 
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming ProblemPenalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
paperpublications3
 
Anthony Norris Fanatics Resume
Anthony Norris Fanatics ResumeAnthony Norris Fanatics Resume
Anthony Norris Fanatics Resume
Anthony Norris
 
Sarmistha Chakraborty_2017 CV
Sarmistha Chakraborty_2017 CVSarmistha Chakraborty_2017 CV
Sarmistha Chakraborty_2017 CV
Sarmistha Chakraborty
 
Analisis de tendencias pedagogicas g3-4
Analisis de tendencias pedagogicas g3-4Analisis de tendencias pedagogicas g3-4
Analisis de tendencias pedagogicas g3-4
MIRIAMYANETH VALENCIA GAMBOA
 
Shane Bruce Constable - Resume
Shane Bruce Constable - ResumeShane Bruce Constable - Resume
Shane Bruce Constable - Resume
shane constable
 
Ms system
Ms systemMs system
Ms system
sahayoga_diploma
 
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
nawaporn khamseanwong
 
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
nawaporn khamseanwong
 
Artikel sistem operasi
Artikel sistem operasiArtikel sistem operasi
Artikel sistem operasi
jesicha alexya harbeach
 

Viewers also liked (13)

Face Recognition & Detection Using Image Processing
Face Recognition & Detection Using Image ProcessingFace Recognition & Detection Using Image Processing
Face Recognition & Detection Using Image Processing
 
Dentist near me
Dentist near meDentist near me
Dentist near me
 
спинальная и эпидуральная анестезия
спинальная и эпидуральная анестезияспинальная и эпидуральная анестезия
спинальная и эпидуральная анестезия
 
Recreacion
RecreacionRecreacion
Recreacion
 
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming ProblemPenalty Function Method For Solving Fuzzy Nonlinear Programming Problem
Penalty Function Method For Solving Fuzzy Nonlinear Programming Problem
 
Anthony Norris Fanatics Resume
Anthony Norris Fanatics ResumeAnthony Norris Fanatics Resume
Anthony Norris Fanatics Resume
 
Sarmistha Chakraborty_2017 CV
Sarmistha Chakraborty_2017 CVSarmistha Chakraborty_2017 CV
Sarmistha Chakraborty_2017 CV
 
Analisis de tendencias pedagogicas g3-4
Analisis de tendencias pedagogicas g3-4Analisis de tendencias pedagogicas g3-4
Analisis de tendencias pedagogicas g3-4
 
Shane Bruce Constable - Resume
Shane Bruce Constable - ResumeShane Bruce Constable - Resume
Shane Bruce Constable - Resume
 
Ms system
Ms systemMs system
Ms system
 
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
Loadแนวข้อสอบ พยาบาลวิชาชีพ กรมการแพทย์
 
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
Loadแนวข้อสอบ นักวิชาการสาธารณสุข กรมการแพทย์
 
Artikel sistem operasi
Artikel sistem operasiArtikel sistem operasi
Artikel sistem operasi
 

Similar to Segmentation Words for Speech Synthesis in Persian Language Based On Silence

Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...
Daniel Adenew
 
F017163443
F017163443F017163443
F017163443
IOSR Journals
 
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORMSTANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
ijnlc
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
ijma
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
ijma
 
Identification of prosodic features of punjabi for enhancing the pronunciatio...
Identification of prosodic features of punjabi for enhancing the pronunciatio...Identification of prosodic features of punjabi for enhancing the pronunciatio...
Identification of prosodic features of punjabi for enhancing the pronunciatio...
ijnlc
 
B047006011
B047006011B047006011
B047006011
inventy
 
B047006011
B047006011B047006011
B047006011
inventy
 
Automatic Speech Recognition of Malayalam Language Nasal Class Phonemes
Automatic Speech Recognition of Malayalam Language Nasal Class PhonemesAutomatic Speech Recognition of Malayalam Language Nasal Class Phonemes
Automatic Speech Recognition of Malayalam Language Nasal Class Phonemes
Editor IJCATR
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
ijnlc
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Waqas Tariq
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionary
iosrjce
 
Corpus study design
Corpus study designCorpus study design
Corpus study design
bikashtaly
 
SETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGINGSETSWANA PART OF SPEECH TAGGING
SETSWANA PART OF SPEECH TAGGING
kevig
 
Paper id 25201466
Paper id 25201466Paper id 25201466
Paper id 25201466
IJRAT
 
Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
shrey bhate
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
gerogepatton
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
gerogepatton
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
ijaia
 
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard ArabicAutomatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
CSCJournals
 

Similar to Segmentation Words for Speech Synthesis in Persian Language Based On Silence (20)

Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...Natural language processing with python and amharic syntax parse tree by dani...
Natural language processing with python and amharic syntax parse tree by dani...
 
F017163443
F017163443F017163443
F017163443
 
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORMSTANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
STANDARD ARABIC VERBS INFLECTIONS USING NOOJ PLATFORM
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
 
Identification of prosodic features of punjabi for enhancing the pronunciatio...
Identification of prosodic features of punjabi for enhancing the pronunciatio...Identification of prosodic features of punjabi for enhancing the pronunciatio...
Identification of prosodic features of punjabi for enhancing the pronunciatio...
 
B047006011
B047006011B047006011
B047006011
 
B047006011
B047006011B047006011
B047006011
 
Automatic Speech Recognition of Malayalam Language Nasal Class Phonemes
Automatic Speech Recognition of Malayalam Language Nasal Class PhonemesAutomatic Speech Recognition of Malayalam Language Nasal Class Phonemes
Automatic Speech Recognition of Malayalam Language Nasal Class Phonemes
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...

Segmentation Words for Speech Synthesis in Persian Language Based On Silence

International Journal of Recent Research in Mathematics Computer Science and Information Technology, Vol. 1, Issue 1, pp: (14-17), Month: April - June 2014, Available at: www.paperpublications.org

Sohrab Hojjatkhah, Ali Jowharpour
Islamic Azad University, Dehdasht branch

Abstract: In text-to-speech (TTS) systems, words are usually broken into parts, and the recorded sound of each part is played back to produce the word. This paper uses the silences that occur in a word's pronunciation to improve speech quality. Most algorithms divide words into syllables, and some divide them into phonemes; this paper instead exploits the silences in intonation, divides words at the silent regions, and assigns the corresponding recorded sound to each part, so that joining the parts is reliable and the resulting speech is smoother. The paper deals with Persian, but the method can be extended to other languages. The method was tested with an MOS test, and the results indicate better intelligibility, naturalness and fluidity.

Keywords: TTS, SBS, Syllable, Diphone.

1- INTRODUCTION
The correct pronunciation of unknown or novel words is one of the biggest challenges for text-to-speech (TTS) systems. Generally speaking, correct analysis and pronunciation can only be guaranteed if a direct match exists between an input string and a corresponding entry in a pronunciation dictionary or morphologically annotated lexicon. However, any well-formed text input to a general-purpose TTS system in any language is extremely likely to contain words that are not explicitly listed in the lexicon. The set of possible lexicon entries in any language is unbounded; thus, in unlimited-vocabulary scenarios we are not facing a memory or storage problem but the requirement for the TTS system to correctly analyze unseen orthographic strings. The Persian language is notorious for its extensive use of compounds.
What makes this a challenge for linguistic analysis is the fact that compounding is extraordinarily productive. Linguistic analysis has to provide a mechanism to appropriately decompose compounds and, more generally, to handle unknown words.

Most of the existing commercial speech synthesis systems can be classified as either formant synthesizers [1,2,3] or concatenation synthesizers [4,5]. Formant synthesizers, which are usually controlled by rules, have the advantage of small footprints at the expense of the quality and naturalness of the synthesized speech [6]. On the other hand, concatenative speech synthesis, using large speech databases, has become popular due to its ability to produce high-quality natural speech output [7]. The large footprints of these systems do not present a practical problem for applications where the synthesis engine runs on a server with enough computational power and sufficient storage [7].

Persian has both short and long vowels, but TTS systems usually treat the two kinds as the same. This paper distinguishes short from long vowels and uses this property to obtain the components of a word that are delimited by silence in the word's pronunciation. The method can serve as a concatenation rule.

2- RELATED WORK

2.1 SYLLABLE BASED TTS SYSTEM
The syllable is one of the units used for developing a text-to-speech system. Different languages have different syllable patterns. In most languages there are many syllable patterns, and therefore the number of syllables is large, so the syllable is usually not used in all-purpose TTS systems. For example, there are more than 15000 syllables in English [6]. Creating a database for this number of units is a very difficult and time-consuming task. In some languages the number of syllable patterns is limited, so the number of syllables is small and creating a database for them is reasonable; in that case this unit can be used in all-purpose TTS systems. For example, Indian language has CV, CCV, VC, and CVC syllable patterns, and its total number of syllables is 10000 [1]. The syllable is used in some Persian TTS systems, too. Persian has only CV, CVC, and CVCC syllable patterns, so its number of syllables is limited to 4000 [10].

2.2 DIPHONE BASED TTS SYSTEM
Nowadays the diphone is the most popular unit in synthesis systems. Diphones include the transition from one unit to the next, and so yield better quality than other units. In some modern systems, this unit is also combined with other methods such as unit selection. Persian has 29 phonemes (23 consonants and 6 vowels), so as an upper-bound estimate it has about 900 diphones, roughly the square of the phoneme count.

2.3 ALLOPHONE BASED TTS SYSTEM
Allophone units offer naturalness within the allophone, especially for vowels, and far fewer units need to be prerecorded than with other choices. However, good articulation between most allophones is difficult to achieve, which can make individual words unintelligible, and stop-consonant allophones are difficult to isolate for prerecording. Hypothetically, allophones appear to be the perfect unit for concatenation: a few hundred of them will serve as the basic building blocks for all utterances.
The allophonic database is a set of allophone WAV files, each named according to the allophone itself and its phonetic context. Based on the input text, the proper allophones are chosen from the database and concatenated to obtain the primary output [9].

3- SILENCE BASED SEGMENTATION (SBS)
This paper uses silence to segment words and substitutes the appropriate recorded sound for each segment. The rationale is that speech sounds more natural when the silences in the sound played by the computer match the silences in human speech. For example, consider the word "مناظره" (Monâzere). Segmented by syllable it becomes "Mo_nâ_ze_re", which requires four recorded pieces, and concatenating these parts reduces the naturalness of the speech. The number of parts should therefore be kept to a minimum. When "Monâzere" is spoken, the intonation is "Monâ", silence, "zere", so a TTS system does better to segment the word the way natural intonation does. The segmentation procedure is as follows.

The syllable structure of Farsi is limited to three types: CV, CVC and CVCC. Of all the syllables that can be built from the 23 consonants and 6 vowels of Farsi in these three structures, only about 4000 are used in Farsi words [10]. In this paper vowels are divided into two kinds, short and long. Short vowels are not shown in written text, whereas long vowels are written. A short vowel is denoted by "V" and a long vowel by "W". The boundaries between parts are of four types, "WCV", "WVW", "CCV" and "CCW", and the word is cut after the first letter of the matching pattern. Finding these patterns reveals where the silences fall in the word. If the word "Monâzere" is passed to the iso function, it returns the string "CVCWCVCV", in which "WCV" marks the division point; "CVCWCVCV" is therefore divided into the two substrings "CVCW" and "CVCV". Similarly, "Montazer" becomes "CVCCVCVC", which is divided into "CVC" and "CVCVC". The units are separated from one another by silence in speech.
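The two steps described above (mapping letters to C/V/W, then cutting after the first letter of each boundary pattern) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the Latin transliteration, the vowel sets (short a, e, o; long â, i, u) and the helper names `split_points` and `segment` are hypothetical, and every other alphabetic character is treated as a consonant.

```python
import unicodedata

# Assumed transliteration alphabet (not given in the paper).
SHORT_VOWELS = set("aeo")        # short vowels -> "V"
LONG_VOWELS = set("\u00e2iu")    # long vowels (â, i, u) -> "W"
BOUNDARIES = ("WCV", "WVW", "CCV", "CCW")  # the four boundary types from the text


def iso(word):
    """Map each letter of a transliterated word to C, V or W."""
    text = unicodedata.normalize("NFC", word).lower()
    out = []
    for ch in text:
        if ch in LONG_VOWELS:
            out.append("W")
        elif ch in SHORT_VOWELS:
            out.append("V")
        elif ch.isalpha():       # assumption: any other letter is a consonant
            out.append("C")
    return "".join(out)


def split_points(pattern):
    """Return cut positions: one after the first letter of each boundary match."""
    return [i + 1 for i in range(len(pattern) - 2)
            if pattern[i:i + 3] in BOUNDARIES]


def segment(word):
    """Split a word into (letters, pattern) parts at the boundary positions."""
    letters = [ch for ch in unicodedata.normalize("NFC", word) if ch.isalpha()]
    pattern = iso("".join(letters))
    cuts = [0] + split_points(pattern) + [len(pattern)]
    return [("".join(letters[a:b]), pattern[a:b])
            for a, b in zip(cuts, cuts[1:])]


print(segment("Mon\u00e2zere"))  # the paper's first example word
print(segment("Montazer"))       # the paper's second example word
```

Under these assumptions, "Monâzere" is cut into "Monâ" and "zere", matching the intonation described in the text, and "Montazer" into "Mon" and "tazer".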
The iso function is:

    function iso(s: string): string;
    { c, v and Ww are strings defined elsewhere that list the consonants,
      short vowels and long vowels; their contents are not given here }
    var
      ss: string;
      kk: byte;
    begin
      ss := '';
      for kk := 1 to length(s) do
      begin
        { if the kk-th letter of s is a consonant, add "C" to the result string }
        if pos(s[kk], c) <> 0 then ss := ss + 'C';
        { if the kk-th letter of s is a short vowel, add "V" to the result string }
        if pos(s[kk], v) <> 0 then ss := ss + 'V';
        { if the kk-th letter of s is a long vowel, add "W" to the result string }
        if pos(s[kk], Ww) <> 0 then ss := ss + 'W';
      end;
      iso := ss;
    end;

After segmentation, a database containing all the parts of Persian words is used. The number of units in this database is larger than under the syllable rule, but using these parts as the basic unit of concatenation reduces the smoothing that is needed. Figure 1 shows segmentation by silence.

Fig 1: position of signal cutting

With the SBS method, the number of parts that must be stored in the signal dictionary increases, but less work is needed in the concatenation phase.

4- EVALUATION OF THE SBS METHOD
This section compares SBS with syllable-based and diphone-based segmentation. The evaluation was carried out with the MOS method on five listeners (3 men, 2 women). The listeners heard words produced by the different methods and gave each a score from 1 to 5, where 1 is bad and 5 is excellent. The results are shown in Table 1.
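The mean opinion score for each criterion is simply the average of the listeners' scores. A short Python sketch of that arithmetic is given here, using the per-listener scores listed in Table 1; the column-to-criterion mapping is an assumption read from the table's header order, and the function name `mean_opinion_scores` is hypothetical.

```python
# Per-listener scores as read from Table 1 (column assignment is an assumption).
SCORES = {
    "M1": {"intelligibility": 5, "naturalness": 3, "fluidity": 4},
    "M2": {"intelligibility": 4, "naturalness": 4, "fluidity": 3},
    "M3": {"intelligibility": 5, "naturalness": 3, "fluidity": 4},
    "W1": {"intelligibility": 5, "naturalness": 3, "fluidity": 4},
    "W2": {"intelligibility": 5, "naturalness": 4, "fluidity": 4},
}


def mean_opinion_scores(scores):
    """Average each criterion over all listeners, rounded to one decimal."""
    criteria = ("intelligibility", "naturalness", "fluidity")
    count = len(scores)
    return {c: round(sum(row[c] for row in scores.values()) / count, 1)
            for c in criteria}


print(mean_opinion_scores(SCORES))
```

With the scores as read here, the intelligibility and fluidity means agree with the values reported in Table 1 (4.8 and 3.8), while the naturalness scores average 3.4 against the reported 3.6; the text does not allow one to tell whether a single score or the mean was mis-transcribed.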
Table 1. Results of the MOS test

    Audience   Intelligibility   Naturalness   Fluidity
    M1         5                 3             4
    M2         4                 4             3
    M3         5                 3             4
    W1         5                 3             4
    W2         5                 4             4
    Mean       4.8               3.6           3.8

Table 2. MOS test on triphone model and decision tree [11]

    Applied method    Intelligibility   Naturalness   Fluidity
    Decision tree     4.2               4.4           4.1
    Triphone model    3.8               3.9           3.5

The results are acceptable. Table 2 shows results for a syllable-based method whose units were concatenated with the PSOLA algorithm; compared with it, the intelligibility of SBS is satisfactory.

REFERENCES
[1] W. Barkhoda, B. ZahirAzami, O-K. Shahryari, A comparison between allophone, syllable, and diphone based TTS systems for Kurdish language.
[2] T. Styger and E. Keller, "Formant synthesis," in E. Keller (ed.), Fundamentals of Speech Synthesis and Speech Recognition: Basic Concepts, State of the Art, and Future Challenges, 109-128, Chichester: John Wiley, 1994.
[3] D. H. Klatt, "Software for a Cascade/Parallel Formant Synthesizer," Journal of the Acoustical Society of America, Vol 67, 971-995, 1980.
[4] W. Hamza, Arabic Speech Synthesis Using Large Speech Database, PhD thesis, Cairo University, Electronics and Communications Engineering Department, 2000.
[5] R. E. Donovan, Trainable Speech Synthesis, PhD thesis, Cambridge University, Engineering Department, 1996.
[6] S. Lemmetty, Review of Speech Synthesis Technology, M.Sc thesis, Helsinki University of Technology, Department of Electrical and Communications Engineering, 1999.
[7] A. Youssef, et al., "An Arabic TTS System Based on the IBM Trainable Speech Synthesizer," Le traitement automatique de l'arabe, JEP-TALN 2004, Fès, 2004.
[8] H. R. Abutalebi and M. Bijankhan, "Implementation of a Text-to-Speech System for Farsi Language," Sixth International Conference on Spoken Language Processing (ISCA), 2000.
[9] A. Sharifova, A Comparison Between Allophone, Syllable, and Diphone Based TTS Systems for Azerbaijan Language, Cybernetic Institute of Azerbaijan National Academy of Sciences, 29 F. Agayev str., AZ1141, Baku, Azerbaijan.
[10] Y. Samareh, Phonetics of Farsi Language, Iran, Tehran, Academic Press Center, 1995. (in Farsi)
[11] Mm. Homayoonpour, S. M. Musavi, Produce speech syntheses parameter in Persian by HMM and decision tree, 2004.