Signal & Image Processing : An International Journal (SIPIJ) Vol.2, No.4, December 2011
DOI : 10.5121/sipij.2011.2403 27
High Quality Arabic Concatenative Speech
Synthesis
Abdelkader Chabchoub and Adnan Cherif
Signal Processing Laboratory, Science Faculty of Tunis, 1060 Tunisia
achabchoub@yahoo.fr, adnen2fr@yahoo.fr
ABSTRACT
This paper describes the implementation of TD-PSOLA tools to improve the quality of an Arabic text-to-speech (TTS) system. The system is based on diphone concatenation with a TD-PSOLA synthesizer used as prosodic modifier. We present techniques to improve the precision of prosodic modifications in Arabic speech synthesis using the TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add) method. The approach is based on the decomposition of the signal into overlapping frames synchronized with the pitch period. The main objective is to preserve the consistency and accuracy of the pitch marks after prosodic modification of the speech signal, together with adjustment and optimisation of the diphone database with integrated vowels.
KEYWORDS
Speech processing and synthesis, Arabic speech, prosody, diphone, spectrum analysis, pitch mark, timbre,
TD-PSOLA.
1. INTRODUCTION
Generating a synthetic voice that imitates human speech from plain text is not a trivial task, since it generally requires extensive knowledge about the real world, the language, and the context the text comes from, as well as a deep understanding of the semantics of the text and the relations that underlie all this information. Nevertheless, many research and commercial speech synthesis systems have contributed to our understanding of these phenomena and have been successful in applications such as human-machine interaction, hands-free and eyes-free access to information, and interactive voice response systems.
There have been three major approaches to speech synthesis: articulatory, formant and concatenative [1][2][3][4]. Articulatory synthesis tries to model the human articulatory system, i.e. vocal cords, vocal tract, etc. Formant synthesis employs a set of rules to synthesize speech using the formants, which are the resonance frequencies of the vocal tract [19]. Since the formants constitute the main frequencies that make sounds distinct, speech is synthesized from these estimated frequencies. Several speech synthesis systems were developed, such as vocoders and LPC synthesizers [5][6], but most of them did not produce synthetic speech of as high a quality as PSOLA-based systems [7] such as the MBROLA synthesizer [8]. In particular, the TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add) method is an efficient way to produce satisfactory speech [9] and is one of the most popular concatenative synthesis techniques today. LP-PSOLA (Linear Predictive PSOLA) and FD-PSOLA (Frequency Domain PSOLA), though able to produce equivalent results, require much more computational power. The first step of TD-PSOLA is to run a pitch detection algorithm and to generate pitch marks through overlapping windowed speech. To synthesize speech, the short-time signals (ST-signals) are simply overlapped and added with the desired spacing.
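The analysis-synthesis loop just described can be sketched in a few lines (an illustrative sketch, not the paper's implementation; function and variable names are ours): extract a two-period Hanning-windowed ST-signal around each analysis pitch mark, then overlap-add the frames at the desired spacing.

```python
import numpy as np

def psola_resynthesize(x, analysis_marks, synthesis_spacing):
    """Minimal TD-PSOLA sketch: extract a two-period Hanning-windowed
    short-time signal around each analysis pitch mark, then overlap-add
    the frames at the new (uniform) spacing."""
    y = np.zeros(int(analysis_marks[-1] + 2 * synthesis_spacing))
    out_pos = synthesis_spacing
    for i in range(1, len(analysis_marks) - 1):
        m = analysis_marks[i]
        left = m - analysis_marks[i - 1]          # samples to previous mark
        right = analysis_marks[i + 1] - m         # samples to next mark
        frame = x[m - left:m + right] * np.hanning(left + right)
        start = out_pos - left                    # centre the frame at out_pos
        y[start:start + len(frame)] += frame
        out_pos += synthesis_spacing
    return y[:out_pos]
```

With the synthesis spacing equal to the original pitch period, the Hanning windows satisfy the overlap-add condition and the interior of the signal is reconstructed almost exactly.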
2. THE ARABIC DATABASE
2.1. Introduction for Arabic language
The Arabic language is spoken throughout the Arab world and is the liturgical language of Islam, which means that Arabic is widely known by Muslims all over the world. "Arabic" refers either to Standard Arabic or to the many dialectal variations of Arabic. Standard Arabic is the language used by the media and the language of the Qur'an. Modern Standard Arabic is generally adopted as the common medium of communication across the Arab world today. Dialectal Arabic refers to the dialects derived from Classical Arabic [10]. These dialects sometimes differ enough that it is a challenge for a Lebanese speaker to understand an Algerian one, and there are even differences within the same country.
Standard Arabic has 34 basic phonemes, of which six are vowels and 28 are consonants [11]. Several factors affect the pronunciation of phonemes. One example is the position of the phoneme in the syllable: initial, closing, intervocalic, or suffix. The pronunciation of consonants may also be influenced by interaction (co-articulation) with other phonemes in the same syllable. Among these coarticulation effects are accentuation and nasalization. Arabic vowels are affected by adjacent phonemes as well; accordingly, each Arabic vowel has at least three allophones: the normal, the accentuated, and the nasalized allophone. In Classical Arabic, the consonants can be divided into three categories with respect to dilution and accentuation [12].
The Arabic language has five syllable patterns: CV, CW, CVC, CWC and CCV, where C represents a consonant, V represents a vowel and W represents a long vowel.
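The five syllable patterns above can be checked mechanically. The sketch below is illustrative only (the phoneme symbols and helper names are ours, not the paper's database):

```python
# Classify a phoneme sequence against the five Arabic syllable patterns.
# 'V' marks a short vowel, 'W' a long vowel, 'C' a consonant.
PATTERNS = {"CV", "CW", "CVC", "CWC", "CCV"}
SHORT_VOWELS = {"a", "u", "i"}
LONG_VOWELS = {"a:", "u:", "i:"}

def syllable_pattern(phonemes):
    shape = "".join(
        "V" if p in SHORT_VOWELS else "W" if p in LONG_VOWELS else "C"
        for p in phonemes
    )
    return shape if shape in PATTERNS else None

# /ba:b/ ("door") is a CWC syllable; /ka/ is CV
print(syllable_pattern(["b", "a:", "b"]))  # CWC
print(syllable_pattern(["k", "a"]))        # CV
```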
Table 1. Arabic consonants and vowels and their SAMPA phonetic notation

| SAMPA | Grapheme | SAMPA | Grapheme | SAMPA | Grapheme | SAMPA | Grapheme |
|-------|----------|-------|----------|-------|----------|-------|----------|
| j     | ي        | G     | غ        | r     | ر        |       | ء        |
| a     | َ        | f     | ف        | z     | ز        | b     | ب        |
| u     | ُ        | q     | ق        | s     | س        | t     | ت        |
| i     | ِ        | K     | ك        | S     | ش        | T     | ث        |
| an    | ً        | l     | ل        | s.    | ص        | Z     | ج        |
| un    | ٌ        | m     | م        | d.    | ض        | X     | ح        |
| in    | ٍ        | n     | ن        | t.    | ط        | X     | خ        |
| h     | ه        | z.    | ظ        | d     | د        | w     | و        |
| H     | ع        | D     | ذ        |       |          |       |          |
2.2. Database construction
The first step in constructing a diphone database for Arabic is to determine all possible diphone pairs of Arabic. In general, the typical diphone inventory size is the square of the number of phones for any language [13]. In practice, additional sound segments and various allophonic variations may in some cases also be included. The basic idea is to define classes of diphones, for example vowel-consonant, consonant-vowel, vowel-vowel, and consonant-consonant.
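As a rough illustration of these counts and classes, the inventory can be enumerated programmatically (the phone lists below are shortened placeholders, not the paper's database):

```python
from itertools import product

# Illustrative sketch: the diphone inventory is roughly the square of the
# phone count, partitioned into the four classes named above.
vowels = ["a", "u", "i", "a:", "u:", "i:"]
consonants = ["b", "t", "k", "s", "m", "n"]  # truncated for the example
phones = vowels + consonants

diphones = list(product(phones, repeat=2))
print(len(diphones))  # len(phones) ** 2 = 144

classes = {}
for a, b in diphones:
    key = ("V" if a in vowels else "C") + ("V" if b in vowels else "C")
    classes[key] = classes.get(key, 0) + 1
print(classes)  # {'VV': 36, 'VC': 36, 'CV': 36, 'CC': 36}
```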
The syllabic structure of the Arabic language is exploited here to simplify the required diphone database. The proposed sound segments may be considered as "sub-syllabic" units [10]. For good quality, the diphone boundaries are taken from the middle portion of vowels. Because diphones need to be clearly articulated, various techniques have been proposed to extract them from subjects. One technique uses words within carrier sentences to ensure that the diphones are pronounced with acceptable (i.e. consistent) duration and prosody [20]. Ideally, the diphones should come from a middle syllable of nonsense words, so that each one is fully articulated and the articulatory effects at the start and end of the word are minimized [14].
The second step is to record the corpus. The recording was made by a native speaker of Standard Arabic using a cardioid microphone with a high-quality flat frequency response. The signal was sampled at 16 kHz with 16-bit resolution.
The final step is segmentation and annotation: the recorded database must be prepared so that the selection method has all the information necessary for its operation. The database is first segmented into phones, then into diphones. This was done manually with the Diphone Studio software developed by the TCTS laboratory of Mons. The units were then checked and corrected with the Praat software (Boersma and Weenink, 2008). Prosodic analysis was performed on the corrected signal to determine the pitch and duration of each phone.
The result of this segmentation is a significant reduction in the number of units: the total does not exceed 400 units (diphones, phonemes and phones), which compares favourably with databases developed by other laboratories, namely the Chanfour database (CENT laboratory, Faculty of Sciences, Rabat), the diphone database of S. Baloul's thesis (LIUM, Le Mans, France) and the diphone database of Noufel Tounsi (TCTS laboratory, Mons). The SAMPA code (Speech Assessment Methods Phonetic Alphabet) is used for the grapheme-to-phoneme transformation.
3. SPEECH ANALYSIS AND SYNTHESIS
This section describes the procedures of pitch-synchronous analysis and synthesis using the TD-PSOLA modifier. Figure 2 presents the block diagram of these two stages.
3.1. Speech analysis
The first step in the speech analysis is to filter the speech signal with an FIR pre-emphasis filter. The next step is to produce a sequence of pitch marks and a voiced/unvoiced classification for each segment between two consecutive pitch marks. This decision is based on the zero-crossing rate and the short-time energy (Figure 1). A voicing coefficient (v/uv) can be computed in order to quantify the periodicity of the signal [15].
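The voiced/unvoiced decision described above can be sketched as follows (a minimal illustration; the function name and the threshold values are ours, not the paper's): a low zero-crossing rate combined with high short-time energy indicates a voiced frame.

```python
import numpy as np

def voicing_coefficient(frame, zcr_thresh=0.1, energy_thresh=0.01):
    """Voiced/unvoiced decision from zero-crossing rate and short-time
    energy. Thresholds are illustrative and depend on signal level."""
    frame = np.asarray(frame, dtype=float)
    # fraction of sample pairs where the sign flips
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    energy = np.mean(frame ** 2)
    return 1.0 if (zcr < zcr_thresh and energy > energy_thresh) else 0.0
```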
3.1.1. Segmentation
The segmentation of a speech signal is used to identify the voiced and unvoiced frames. This classification is based on the zero-crossing ratio and the energy value of each signal frame (Figure 1).
Figure 1. Automatic segmentation of Arabic speech: (a) «باب babun», (b) «شمس chamsun».
3.1.2. Speech marks
Different marking procedures place the marks t_a(i) according to the local features of the signal components. A prior segmentation of the signal into zones of identical features permits orienting the marking toward the suitable method. The results of this segmentation will also be needed in the synthesis stage.
3.1.2.1. Reading marks
The idea of our algorithm is to select pitch marks among local extrema of the speech signal.
Given a set of mark candidates which are all negative peaks or all positive peaks:

T_a = [ t_a(1) ... t_a(i) ... t_a(N) ]

where t_a(i) is the sample position of the i-th peak and N is the number of peaks extracted ([16] explains how these candidates are found). Pitch marks are a subset of the points of T_a, spaced by the pitch periods given by the pitch extraction algorithm. The selection can be represented by a sequence of indices:
J = [ j(1) ... j(k) ... j(K) ]    (1)

with K < N. J has to preserve the chronological order, which requires the monotonicity of j: j(k) < j(k+1).
The sequence of indices along with the corresponding peaks defines the set of pitch marks:

T_a = [ t_a(j(1)) ... t_a(j(k)) ... t_a(j(K)) ]    (2)
The determination of j requires a criterion expressing the reliability of two consecutive pitch
marks with respect to pitch values previously determined. The local criterion we chose is:
d( c(l); c(i) ) = | ( c(i) − c(l) ) − P_a( c(l) ) |    (3)

where l < i. This criterion takes into account the time interval between two marks compared to the pitch period P_a in samples. It returns zero if the two peaks are exactly P_a(c(l)) samples away from one another, and a positive value if the distance between these peaks is greater or less than the pitch period. The overall criterion is:
D = \sum_{k=1}^{K-1} [ d( t_a(j(k)), t_a(j(k+1)) ) − B( t_a(j(k+1)) ) ]    (4)
where B is a bonus for selecting an extremum as a pitch mark. Initially,
B( t_a(j(k)) ) = δ · amplitude( t_a(j(k)) )    (5)
The coefficient δ expresses the compromise between closeness to the pitch values and the strength of the pitch marks. Minimising D is achieved by dynamic programming. The pitch marking results are shown in Figure 2.
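The minimisation of D by dynamic programming can be sketched as follows (an illustrative implementation of the criteria of Eqs. (3)-(5); the variable names and backtracking details are ours):

```python
import numpy as np

def select_pitch_marks(peaks, amps, pitch_period, delta=0.5):
    """Select pitch marks among peak candidates by dynamic programming.
    peaks: candidate positions t_a(i) in samples (sorted);
    amps:  their amplitudes; pitch_period: estimated period P_a in samples."""
    n = len(peaks)
    cost = np.full(n, np.inf)
    back = np.full(n, -1)
    cost[0] = 0.0
    for i in range(1, n):
        for l in range(i):
            # Eq. (3): deviation of the inter-mark interval from the period
            d = abs((peaks[i] - peaks[l]) - pitch_period)
            # Eq. (5): bonus for selecting a strong extremum
            b = delta * amps[i]
            c = cost[l] + d - b          # Eq. (4), accumulated
            if c < cost[i]:
                cost[i], back[i] = c, l
    # backtrack the chain of selected marks
    marks, i = [], int(np.argmin(cost))
    while i != -1:
        marks.append(peaks[i])
        i = back[i]
    return marks[::-1]
```

A spurious low-amplitude peak that breaks the periodic spacing is skipped, because including it raises the interval cost more than its small bonus can compensate.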
Figure 2. Pitch marks of Arabic speech: (a) «باب babun», (b) «أكل akala».
3.1.2.2. Synthesis marks
The OLA synthesis is based on the overlap-addition of elementary signals Y_j(n), obtained from the X_i(n) placed at the new positions t_s(j). These positions are determined by the pitch and the length of the synthesis signal. In such a synthesis one can modify the temporal scale by a coefficient tscale. The position t_s(k−1) and the pitch periods P_a are assumed known, so t_s(k) can be deduced as [17]:
t_s(k) = t_s(k−1) + tscale · P_a( n(k) ),    n(k+1) = n(k) + 1 / tscale    (6)

tscale: coefficient of length modification
Figure 3. TD-PSOLA for pitch (F0) modification. To increase the pitch, the individual pitch-synchronous frames are extracted, Hanning-windowed, moved closer together and then added up. To decrease the pitch, the frames are moved further apart. Increasing the pitch results in a shorter signal, so frames must also be duplicated if the pitch is to be changed while holding the duration constant.
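The frame-respacing procedure described in the caption above can be sketched as follows (illustrative only, not the paper's exact code; function and variable names are ours). Frames are respaced by 1/factor to shift the pitch, and the nearest analysis frame is reused (duplicated or skipped) so the duration is preserved:

```python
import numpy as np

def tdpsola_pitch_shift(x, marks, factor):
    """Pitch-shift x by `factor` (>1 raises, <1 lowers the pitch) using
    two-period Hanning-windowed frames at the analysis pitch marks."""
    periods = np.diff(marks)
    out = np.zeros(len(x) + int(max(periods)) * 2)
    t_out = float(marks[0])
    while t_out < marks[-2]:
        # pick the analysis frame nearest the current synthesis time
        # (this duplicates or skips frames, keeping the duration constant)
        i = int(np.argmin(np.abs(np.asarray(marks) - t_out)))
        i = min(max(i, 1), len(marks) - 2)
        m = marks[i]
        left, right = m - marks[i - 1], marks[i + 1] - m
        start = int(t_out) - left
        if start >= 0:
            frame = x[m - left:m + right] * np.hanning(left + right)
            out[start:start + len(frame)] += frame
        t_out += periods[i - 1] / factor     # new, scaled mark spacing
    return out[:len(x)]
```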
3.2. Speech synthesis
Given the pitch mark and the synthesis mark of a given frame, we use a fast re-sampling method, described below, to shift the frame precisely to where it will appear in the new signal. Let x[n] be the original frame; the re-sampled signal is given by Oppenheim [18]:
x(t) = \sum_{n=-\infty}^{+\infty} x[n] · sinc( π (t − n·Ts) / Ts )    (7)
where Ts is the sampling period. Calculating the resulting frame y[m], corresponding to the frame x[n] shifted by a small delay δ, amounts to evaluating x(mTs − δ). Therefore, y[m] = x(mTs − δ), i.e.:
y[m] = \sum_{n=-\infty}^{+\infty} x[n] · sinc( π fs (mTs − nTs − δ) ) = \sum_{n=-\infty}^{+\infty} x[n] · sinc( π fs ((m − n)Ts − δ) )    (8)
where fs is the sampling frequency (1/Ts). Now, rewriting sinc(x) as sin(x)/x and using the expansion

sin( π fs ((m − n)Ts − δ) ) = sin( π(m − n) ) cos( π fs δ ) − cos( π(m − n) ) sin( π fs δ ),

with cos( π(m − n) ) = ±1 = (−1)^{m−n} and sin( π(m − n) ) = 0, we get
y[m] = \sum_{n=-\infty}^{+\infty} (−1)^{m−n+1} x[n] · sin( π fs δ ) / ( π fs ((m − n)Ts − δ) )    (9)
As 0 < δ < Ts (resp. −Ts < δ < 0), we define δ = αTs, where 0 < α < 1 (resp. −1 < α < 0). The synthesized speech is then
y[m] = \sum_{n=-\infty}^{+\infty} (−1)^{m−n+1} x[n] · sin( απ ) / ( π (m − n − α) )    (10)
The result is shown in Figure 4.
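In practice the infinite sum of Eq. (10) is truncated. The sketch below illustrates this (the truncation width is an implementation choice of ours, not from the paper):

```python
import numpy as np

def fractional_shift(x, alpha, width=32):
    """Shift frame x by a fractional delay alpha*Ts (0 < alpha < 1)
    using a truncated form of the sum in Eq. (10)."""
    m = np.arange(len(x))
    y = np.zeros(len(x))
    for k in range(-width, width + 1):       # k = m - n, kept near each sample
        n = m - k
        valid = (n >= 0) & (n < len(x))
        # (-1)**(k+1) * sin(alpha*pi) / (pi*(k - alpha))  ==  sinc(k - alpha)
        coeff = np.sin(np.pi * alpha) * (-1.0) ** (k + 1) / (np.pi * (k - alpha))
        y[valid] += coeff * x[n[valid]]
    return y
```

For a band-limited input, the result matches the continuous signal sampled half a sample later, up to a small truncation error near the frame edges.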
Figure 4. A waveform of a human utterance and its synthesized equivalent using TD-PSOLA tools: «أكل akala».
4. RESULTS AND EVALUATION
Two types of tests were applied to evaluate the speech of the developed system with regard to intelligibility and naturalness. The first test, which measures intelligibility, is the Diagnostic Rhyme Test (DRT). In this test, twenty pairs of words that differ only in a single consonant are uttered, and the listeners are asked to mark on an answer sheet which word of each pair they think they heard [21]. In the second evaluation test, Categorical Estimation (CE), the listeners were asked a few questions about attributes of the speech such as speed, pronunciation and stress [22], and were asked to rank the voice quality on a five-level scale. The test group consisted of sixteen persons, and the two tests were repeated twice to see whether the results would improve through a learning effect, i.e. whether the listeners become accustomed to the synthesized speech and understand it better after every listening session. The following tables and charts illustrate the results of these tests.
For both listening tests we prepared listening test programs, and a brief introduction was given before each test. In the first listening test, each sound was played once at 4-second intervals and the listeners wrote down the word they heard on the given answer sheet. In the second listening test, all 15 sentences were played together, in random order, for each listener. Each subject listened to the 15 sentences and gave a judgment score through the listening test program, rating the quality as follows: (5 – Excellent, 4 – Good, 1 – Bad). They evaluated the system with respect to naturalness. Each listener performed the listening test fifteen times, and we kept the last ten results, treating the first five as training.
After collecting all the listeners' responses, we calculated the average values and obtained the following results. In the first listening test, the average correct-rate for the original and analysis-synthesis sounds was 98%, while that of the rule-based synthesized sounds was 90%. We found the synthesized words to be very intelligible (Figure 5).
Figure 5. Average scores for the first test (Euler system, our system, natural speech and the Acapela system) for the intelligibility of speech.
5. CONCLUSION
In this work, a voice quality conversion algorithm with a TD-PSOLA modifier was implemented and tested in the Matlab environment using our database. The results of the perceptual evaluation tests indicate that the algorithm can effectively convert a modal voice into the desired voice quality. The simulation results verify that the quality of the signal synthesized by TD-PSOLA depends on the precision of the analysis marking as well as of the synthesis marking, which must be placed precisely to avoid phase errors. Our higher-precision algorithm for pitch marking during the synthesis stage increases the signal quality; this gain in accuracy reduces the difference between the original and synthetic signals. We have shown that syllables produce reasonably natural-sounding speech, that durational modelling is crucial for naturalness, and that this is achieved with a significant reduction in the number of units in the total database developed. This quality is confirmed by the listening tests and by objective evaluation comparing the original and synthetic speech.
REFERENCES
[1] Huang, X., A. Acero and H. W. Hon (2001), Spoken Language Processing, Prentice Hall PTR, New
Jersey.
[2] Greenwood, A. R.(1997) “Articulatory Speech Synthesis Using Diphone Units”, IEEE international
Conference on Acoustics, Speech and Signal Processing, pp. 1635–1638.
[3] Sagisaka, Y., N. Iwahashi and K. Mimura, (1992) “ATR v-TALK Speech Synthesis System”,
Proceedings of the ICSLP, Vol. 1, pp. 483–486.
[4] Black, A. W. and P. Taylor, (1994) “CHATR: A Generic Speech Synthesis System”, Proceedings of
the International Conference on Computational Linguistics, Vol. 2, pp. 983–986.
Signal & Image Processing : An International Journal (SIPIJ) Vol.2, No.4, December 2011
36
[5] Childers, D.G. «Glottal source modeling for voice conversion». Speech communication, 16(2): 127-
138, 1995.
[6] Childers, D.G., and Lee, C.K. «Vocal quality factors: Analysis, synthesis, and perception».Journal of
the Acoustical Society of America, 1991.
[7] Acero A. «Source-filter Models for Time-Scale Pitch-Scale Modification of Speech». IEEE
International Conference on Acoustics, Speech, and Signal Processing, Seattle, USA, pp.881-884.
May, 1998.
[8] Dutoit, T., Pagel, V., Pierret, N., Bataille, F. & van der Vrecken, O. (1996) "The MBROLA Project: Towards a Set of High-Quality Speech Synthesizers Free of Use".
[9] Moulines, E., and Charpentier, F. «Pitch-Synchronous Waveform Processing Techniques for TTS
Synthesis».Speech communication Vol 9, pp453-467, 1990.
[10] Alghmadi, M., (2003) “KACST Arabic Phonetic Database”, the Fifteenth International Congress of
Phonetics Science, Barcelona, pp 3109-3112.
[11] Assaf, M.,(2005). “A Prototype of an Arabic Diphone Speech Synthesizer in Festival,” Master Thesis,
Department of Linguistics and Philology, Uppsala University.
[12] Al-Zabibi, M., (1990) “An Acoustic–Phonetic Approach in Automatic Arabic Speech Recognition,”
The British Library in Association with UMI.
[13] Ibraheem A.(1990). “Al-Aswat Al-Arabia”, Arabic title, Anglo-Egyptian Publisher, Egypt.
[14] Maria M., “A Prototype of an Arabic Diphone Speech Synthesizer in Festival", Master Thesis in
Computational Linguistics, Uppsala university, 2004.
[15] Laprie, Y. and Colotte, V. (1998) “Automatic pitch marking for speech transformations via TD-
PSOLA”. In IX European Signal Processing Conference, Rhodes, Greece, 1998.
[16] Mower, L., Boeffard, O., Cherbonnel, B. (1991) “An algorithm of speech synthesis high-quality”
Proceeding of a Seminar SFA/GCP, pp 104-107.
[17] Oppenheim, A. V. and Schafer, R. W. (1975) Digital Signal Processing. Prentice-Hall, Inc.
[18] Oppenheim, A. V. and Schafer, R. W. (1975) Digital Signal Processing. Prentice-Hall, Inc., New York.
[19] Walker, J. and Murphy, P. (2007) "A Review of Glottal Waveform Analysis". In: Progress in Nonlinear Speech Processing.
[20] Demenko, G., Grocholewski, S., Wagner, A. & Szymański, M. (2006). “Prosody Annotation for
Corpus Based Speech Synthesis”. In: Proceedings of the Eleventh Australasian International
Conference on Speech Science and Technology. Auckland, New Zealand, pp. 460-465.
[21] Maria M. (2004) "A Prototype of an Arabic Diphone Speech Synthesizer in Festival", Master Thesis in Computational Linguistics, Uppsala University.
[22] Kraft V., Portele T.,(1995) "Quality Evaluation of Five German Speech Synthesis Systems" Acta
Acustica 3, pp. 351-365.
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert SystemModeling of Speech Synthesis of Standard Arabic Using an Expert System
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
 
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTSEFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
EFFECT OF MFCC BASED FEATURES FOR SPEECH SIGNAL ALIGNMENTS
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
 
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
IMPROVING MYANMAR AUTOMATIC SPEECH RECOGNITION WITH OPTIMIZATION OF CONVOLUTI...
 
A new framework based on KNN and DT for speech identification through emphat...
A new framework based on KNN and DT for speech  identification through emphat...A new framework based on KNN and DT for speech  identification through emphat...
A new framework based on KNN and DT for speech identification through emphat...
 
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITION
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITIONFORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITION
FORMANT ANALYSIS OF BANGLA VOWEL FOR AUTOMATIC SPEECH RECOGNITION
 
Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...Hindi digits recognition system on speech data collected in different natural...
Hindi digits recognition system on speech data collected in different natural...
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
 
F017163443
F017163443F017163443
F017163443
 
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) SynthesizerImplementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
 
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
PERFORMANCE ANALYSIS OF DIFFERENT ACOUSTIC FEATURES BASED ON LSTM FOR BANGLA ...
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
5215ijcseit01
5215ijcseit015215ijcseit01
5215ijcseit01
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMARSYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
SYLLABLE-BASED SPEECH RECOGNITION SYSTEM FOR MYANMAR
 

Recently uploaded

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

High Quality Arabic Concatenative Speech Synthesis

Articulatory synthesis tries to model the human articulatory system, i.e. the vocal cords, vocal tract, etc. Formant synthesis employs a set of rules to synthesize speech from the formants, the resonance frequencies of the vocal tract [19]. Since the formants constitute the main frequencies that make sounds distinct, speech is synthesized using these estimated frequencies. Several speech synthesis systems have been developed, such as vocoders and LPC synthesizers [5][6], but most of them do not reproduce synthetic speech of as high a quality as PSOLA-based systems [7] such as MBROLA synthesizers [8]. The TD-PSOLA (Time Domain Pitch Synchronous Overlap-Add) method in particular is among the most efficient methods for producing satisfactory speech [9] and is one of the most popular concatenative synthesis techniques today. LP-PSOLA (Linear Predictive PSOLA) and FD-PSOLA (Frequency Domain PSOLA), though able to produce equivalent results, require much more computational power.

The first step of TD-PSOLA is to run a pitch detection algorithm and generate pitch marks through overlapping windowed speech. To synthesize speech, the short-time signals (ST signals) are simply overlapped and added with the desired spacing between them.

2. THE ARABIC DATABASE

2.1. Introduction to the Arabic language

The Arabic language is spoken throughout the Arab world and is the liturgical language of Islam, which means that Arabic is widely known by Muslims around the world. "Arabic" refers either to Standard Arabic or to the many dialectal variations of Arabic. Standard Arabic is the language used by the media and the language of the Qur'an. Modern Standard Arabic is generally adopted as the common medium of communication throughout the Arab world today, while Dialectal Arabic refers to the dialects derived from Classical Arabic [10]. These dialects sometimes differ so much that it is hard for a Lebanese speaker to understand an Algerian, and it is worth mentioning that there are differences even within the same country.

Standard Arabic has 34 basic phonemes, of which six are vowels and 28 are consonants [11]. Several factors affect the pronunciation of phonemes; an example is the position of the phoneme in the syllable (initial, closing, intervocalic, or suffix). The pronunciation of consonants may also be influenced by interaction (co-articulation) with other phonemes in the same syllable. Among these co-articulation effects are accentuation and nasalization. Arabic vowels are affected by the adjacent phonemes as well. Accordingly, each Arabic vowel has at least three allophones: the normal, the accentuated, and the nasalized allophone.
In Classical Arabic, the consonants can be divided into three categories with respect to dilution and accentuation [12]. The Arabic language has five syllable patterns: CV, CW, CVC, CWC and CCV, where C represents a consonant, V a short vowel and W a long vowel.

Table 1. Arabic consonants and vowels and their SAMPA phonetic notations

SAMPA  Grapheme | SAMPA  Grapheme | SAMPA  Grapheme | SAMPA  Grapheme
j      ي        | G      غ        | r      ر        | ?      ء
a      َ        | f      ف        | z      ز        | b      ب
u      ُ        | q      ق        | s      س        | t      ت
i      ِ        | K      ك        | S      ش        | T      ث
an     ً        | l      ل        | s.     ص        | Z      ج
un     ٌ        | m      م        | d.     ض        | X      ح
in     ٍ        | n      ن        | t.     ط        | X      خ
h      ه        | z.     ظ        | d      د        |
w      و        | H      ع        | D      ذ        |
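The five syllable patterns above can be checked mechanically. The following minimal sketch (phone names and vowel sets are illustrative assumptions, not the paper's actual inventory) maps a phone sequence to its CV-style pattern:

```python
# Hypothetical sketch: classify a phone sequence into one of the five
# Arabic syllable patterns named in the text (C = consonant, V = short
# vowel, W = long vowel). The phone classes are illustrative assumptions.
SHORT_VOWELS = {"a", "u", "i"}
LONG_VOWELS = {"a:", "u:", "i:"}

def syllable_pattern(phones):
    """Map a list of phones to a pattern string, e.g. ['b','a','b'] -> 'CVC'."""
    pattern = ""
    for p in phones:
        if p in SHORT_VOWELS:
            pattern += "V"
        elif p in LONG_VOWELS:
            pattern += "W"
        else:
            pattern += "C"
    return pattern

def is_valid_arabic_syllable(phones):
    """True if the sequence matches one of the five legal patterns."""
    return syllable_pattern(phones) in {"CV", "CW", "CVC", "CWC", "CCV"}
```

For instance, «باب» splits into the syllable /b a b/, pattern CVC, which is legal; a vowel-initial sequence is not.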
2.2. Database construction

The first step in constructing a diphone database for Arabic is to determine all possible diphone pairs of Arabic. In general, the typical diphone inventory size is the square of the number of phones for any language [13]. In practice, additional sound segments and various allophonic variations may in some cases also be included. The basic idea is to define classes of diphones, for example vowel-consonant, consonant-vowel, vowel-vowel, and consonant-consonant. The syllabic structure of the Arabic language is exploited here to simplify the required diphone database; the proposed sound segments may be considered "sub-syllabic" units [10]. For good quality, the diphone boundaries are taken from the middle portion of vowels. Because diphones need to be clearly articulated, various techniques have been proposed for extracting them from subjects. One technique uses words within carrier sentences to ensure that the diphones are pronounced with acceptable duration and prosody [20] (i.e. consistently). Ideally, each diphone should come from a middle syllable of a nonsense word, so that it is fully articulated and the articulatory effects at the start and end of the word are minimized [14].

The second step is to record the corpus. The recording was made by a native speaker of Standard Arabic using a cardioid microphone with a high-quality flat frequency response. The signal was sampled at 16 kHz with 16-bit resolution.

The final step is segmentation and annotation: the recorded database must be prepared so that the selection method has all the information necessary for its operation. The base is first segmented into phones and then into diphones. This was done by hand with the Diphone Studio software developed by the TCTS laboratory of Mons.
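The inventory-design step described above (all ordered pairs, then class labels such as vowel-consonant) can be sketched in a few lines. The phone list here is a toy subset, not the paper's 34-phoneme set:

```python
from itertools import product

# Illustrative sketch of the database-design step: enumerate the raw
# diphone inventory (|phones|^2 ordered pairs) and sort each pair into
# the classes named in the text. The vowel set is an assumption.
VOWELS = {"a", "u", "i", "a:", "u:", "i:"}

def diphone_class(p1, p2):
    """Classify a diphone as VV, VC, CV or CC."""
    c1 = "V" if p1 in VOWELS else "C"
    c2 = "V" if p2 in VOWELS else "C"
    return c1 + c2

def diphone_inventory(phones):
    """All ordered phone pairs with their class labels."""
    return {p1 + "-" + p2: diphone_class(p1, p2)
            for p1, p2 in product(phones, repeat=2)}
```

For a 6-phone toy set this yields 36 pairs; for the full 34-phoneme inventory the raw upper bound would be 34^2 = 1156 diphones, which motivates the syllable-based reduction described in the text.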
A correction of the units to ensure quality was made with the Praat software (Boersma and Weenink, 2008). Prosodic analysis was then performed on the corrected signal to determine the pitch and duration of each phone. This segmentation provides a significant reduction in the number of units, the total not exceeding 400 (diphones, phonemes and phones), compared with databases developed by other laboratories: that of Chanfour at the CENT laboratory of the Faculty of Sciences of Rabat, the diphone database of S. Baloul's thesis at LIUM, Le Mans, France, and a diphone database by Noufel Tounsi at the TCTS laboratory of Mons. The SAMPA code (Speech Assessment Methods Phonetic Alphabet) is used for the grapheme-to-phoneme transformation.

3. SPEECH ANALYSIS AND SYNTHESIS

This section describes the procedures of pitch-synchronous analysis and synthesis using the TD-PSOLA modifier; Figure 2 presents the block diagram of these two stages.

3.1. Speech analysis

The first step in the speech analysis is to filter the speech signal with an FIR filter (pre-emphasis). The next step is to produce a sequence of pitch marks and a voiced/unvoiced classification for each segment between two consecutive pitch marks. This decision is based on the zero-crossing rate and the short-time energy (Figure 1). A voicing coefficient (v/uv) can be computed in order to quantify the periodicity of the signal [15].
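The voiced/unvoiced decision from zero-crossing rate and short-time energy can be sketched as follows. The frame length (20 ms at 16 kHz) and both thresholds are illustrative assumptions that would have to be tuned on the corpus:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs."""
    return np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return np.mean(frame ** 2)

def classify_frames(signal, frame_len=320, zcr_max=0.1, energy_min=1e-3):
    """Label frames voiced (True) / unvoiced (False): voiced speech has
    high short-time energy and a low zero-crossing rate. The 20 ms frame
    at 16 kHz and the thresholds are illustrative assumptions."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        labels.append(short_time_energy(frame) > energy_min
                      and zero_crossing_rate(frame) < zcr_max)
    return labels
```

A low-frequency periodic signal (few zero crossings, high energy) comes out voiced, while broadband noise (sign changes roughly every other sample) comes out unvoiced.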
3.1.1. Segmentation

The segmentation of a speech signal is used to identify the voiced and unvoiced frames. This classification is based on the zero-crossing ratio and the energy value of each signal frame.

Figure 1. Automatic segmentation of Arabic speech: (a) «باب ; babun», (b) «شمس ; chamsun»

3.1.2. Speech marks

Different procedures for placing the pitch marks t_a(i) are used according to the local features of the signal components. A prior segmentation of the signal into zones of identical features makes it possible to direct the marking toward the suitable method; the results of this segmentation are also needed at the synthesis stage.

3.1.2.1. Analysis marks

The idea of our algorithm is to select pitch marks among local extrema of the speech signal. Given a set of mark candidates, which are all negative peaks or all positive peaks,

T_a = [t_a(i)] = [t_a(1), ..., t_a(i), ..., t_a(N)]

where t_a(i) is the sample index of the i-th peak and N is the number of peaks extracted ([16] explains how these candidates are found), the pitch marks are a subset of the points of T_a, spaced by the pitch periods given by the pitch extraction algorithm. The selection can be represented by a sequence of indices

J = [j(k)] = [j(1), ..., j(k), ..., j(K)]     (1)

with K < N. J has to preserve the chronological order, which requires the monotonicity of j: j(k) < j(k+1).
The sequence of indices, along with the corresponding peaks, defines the set of pitch marks:

T_a[J] = [t_a(j(k))] = [t_a(j(1)), ..., t_a(j(k)), ..., t_a(j(K))]     (2)

The determination of j requires a criterion expressing the reliability of two consecutive pitch marks with respect to the pitch values previously determined. The local criterion we chose is

d(c(l); c(i)) = |(c(i) − c(l)) − P_a(c(l))|     (3)

where l < i. It takes into account the time interval between two marks compared to the pitch period P_a, expressed in samples. This criterion returns zero if the two peaks are exactly P_a(c(l)) samples away from one another, and a positive value if the distance between the peaks is greater or less than the pitch period. The overall criterion is

D = Σ_{k=1}^{K−1} [ d(t_a(j(k)), t_a(j(k+1))) − B(t_a(j(k+1))) ]     (4)

where B is the bonus for selecting an extremum as a pitch mark, defined in a first step as

B(t_a(j(k))) = δ · amplitude(t_a(j(k)))     (5)

The coefficient δ expresses the compromise between closeness to the pitch values and strength of the pitch marks. Minimizing D is achieved by dynamic programming. The pitch marking results are shown in Figure 2.
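A compact dynamic-programming sketch of this selection is given below. It is a simplification of Eqs. (3)-(5): the pitch period is taken constant over the excerpt (the paper uses the local estimate P_a), and the O(N^2) loop structure is illustrative rather than the authors' implementation:

```python
def select_pitch_marks(candidates, amplitudes, pitch_period, delta=0.1):
    """Among peak candidates (sample positions), choose the chain whose
    consecutive spacings best match pitch_period, minus a bonus
    delta * amplitude for keeping strong peaks (Eqs. 3-5 simplified to a
    constant pitch period)."""
    n = len(candidates)
    best = [-delta * amplitudes[i] for i in range(n)]  # chain cost ending at i
    prev = [-1] * n
    for i in range(n):
        for l in range(i):
            d = abs((candidates[i] - candidates[l]) - pitch_period)  # Eq. (3)
            cost = best[l] + d - delta * amplitudes[i]  # Eq. (4) with bonus Eq. (5)
            if cost < best[i]:
                best[i] = cost
                prev[i] = l
    # backtrack from the cheapest chain end
    i = min(range(n), key=lambda t: best[t])
    marks = []
    while i != -1:
        marks.append(candidates[i])
        i = prev[i]
    return marks[::-1]
```

With candidates at roughly 100-sample spacing and a few weak spurious peaks, the chain that keeps the strong, correctly spaced peaks minimizes D.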
Figure 2. Pitch marks of Arabic speech: (a) «باب ; babun», (b) «أكل ; akala»

3.1.2.2. Synthesis marks

The OLA synthesis is based on the overlap-addition of elementary signals Y_j(n), obtained from the X_i(n), placed at the new positions t_s[j]. These positions are determined by the pitch and the length of the synthesis signal. In such a synthesis one can modify the temporal scale by a coefficient tscale. The position t_s(k−1) and the pitch period P_a(k) being known, we can deduce t_s(k) as [17]:

t_s(k) = t_s(k−1) + tscale · P_a(n_s(k))
n_s(k+1) = n_s(k) + 1/tscale     (6)

where tscale is the coefficient of length modification.
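Eq. (6) can be sketched as follows. Note this mirrors one possible reading of the (garbled) original equation, where each synthesis mark advances by the scaled analysis period while the fractional analysis index n_s advances by 1/tscale; the exact indexing is an assumption:

```python
def synthesis_marks(analysis_periods, tscale=1.0, t0=0):
    """Sketch of Eq. (6): place synthesis marks
    t_s(k) = t_s(k-1) + tscale * P_a(n_s(k)), advancing the fractional
    analysis index n_s by 1/tscale per mark. analysis_periods holds the
    analysis pitch periods P_a in samples."""
    marks = [t0]
    n = 0.0  # fractional analysis index n_s
    while int(n) < len(analysis_periods):
        marks.append(marks[-1] + tscale * analysis_periods[int(n)])
        n += 1.0 / tscale
    return marks
```

With tscale = 1 the synthesis marks simply accumulate the analysis periods; with tscale > 1 more marks are generated and the signal is lengthened.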
Figure 3. TD-PSOLA for pitch (F0) modification. In order to increase the pitch, the individual pitch-synchronous frames are extracted, Hanning-windowed, moved closer together and then added up. To decrease the pitch, we move the frames further apart. Increasing the pitch results in a shorter signal, so we also need to duplicate frames if we want to change the pitch while holding the duration constant.

3.2. Speech synthesis

Given the pitch mark and the synthesis mark of a given frame, we use the fast re-sampling method described below to shift the frame precisely to where it will appear in the new signal. Let x[n] be the original frame; the re-sampled signal is given by A. Oppenheim [18]:

x(t) = Σ_{n=−∞}^{+∞} x[n] · sinc(π (t − n·Ts) / Ts)     (7)

where Ts is the sampling period. Calculating the result frame y[m] corresponding to the frame x[n] shifted by a small delay δ amounts to evaluating x(m·Ts − δ), i.e. y[m] = x(m·Ts − δ):

y[m] = Σ_{n=−∞}^{+∞} x[n] · sinc(π·fs·(m·Ts − n·Ts − δ)) = Σ_{n=−∞}^{+∞} x[n] · sinc(π·fs·((m − n)·Ts − δ))     (8)

where fs is the sampling frequency (1/Ts). Now, by rewriting sinc(x) as sin(x)/x and using the expansion

sin(π·fs·((m − n)·Ts − δ)) = sin(π(m − n)) cos(π·fs·δ) − cos(π(m − n)) sin(π·fs·δ)

with cos(π(m − n)) = (−1)^(m−n) and sin(π(m − n)) = 0, we get

y[m] = Σ_{n=−∞}^{+∞} x[n] · (−1)^(m−n+1) · sin(π·fs·δ) / (π·fs·((m − n)·Ts − δ))     (9)

As 0 < δ < Ts (resp. −Ts < δ < 0), we define δ = α·Ts, where 0 < α < 1 (resp. −1 < α < 0). The synthesized frame is then

y[m] = Σ_{n=−∞}^{+∞} x[n] · (−1)^(m−n+1) · sin(α·π) / (π·(m − n − α))     (10)

and the result is shown in Figure 4.
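Eq. (10) is straightforward to implement by truncating the infinite sum to the frame length; the sketch below does exactly that (the truncation, which introduces small edge errors, is our simplification):

```python
import numpy as np

def fractional_shift(x, alpha):
    """Shift the frame x by a fraction alpha of a sample (0 < alpha < 1)
    with the truncated sinc interpolator of Eq. (10):
    y[m] = sum_n x[n] * (-1)**(m-n+1) * sin(alpha*pi) / (pi*(m-n-alpha))."""
    n = np.arange(len(x))
    s = np.sin(alpha * np.pi)
    y = np.empty(len(x))
    for m in range(len(x)):
        # the sum runs over the available frame samples only
        y[m] = np.sum(x * (-1.0) ** (m - n + 1) * s / (np.pi * (m - n - alpha)))
    return y
```

For a band-limited input, the interior of the output matches the input delayed by alpha samples, which is what lets the synthesizer place a frame between two integer sample positions.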
Figure 4. Waveform of a human utterance and its synthesized equivalent using the TD-PSOLA tools: «أكل ; akala»

4. RESULTS AND EVALUATION

Two types of tests were applied to evaluate the speech of the developed system regarding intelligibility and naturalness. The first test, which measures intelligibility, is the Diagnostic Rhyme Test (DRT). In this test, twenty pairs of words that differ only in a single consonant are uttered, and the listeners are asked to mark on an answer sheet which word of each pair they think is correct [21]. In the second evaluation test, Categorical Estimation (CE), the listeners were asked a few questions about several attributes of the speech, such as speed, pronunciation and stress [22], and were asked to rank the voice quality on a five-level scale. The test group consisted of sixteen persons, and the two tests were repeated twice to see whether the results would improve through a learning effect, i.e. whether the listeners become accustomed to the synthesized speech and understand it better after every listening session. The following tables and charts illustrate the results of these tests.

For both listening tests we prepared listening-test programs, and a brief introduction was given before each test. In the first listening test, each sound was played once at 4-second intervals and the listeners wrote the script corresponding to the word they heard on the given answer sheet. In the second listening test, we played all 15 sentences together, in random order, for each listener. Each subject listened to the 15 sentences and gave a judgment score using the listening-test program, rating the quality as follows: 5 - Excellent, 4 - Good, 1 - Bad. They evaluated the system by considering the naturalness aspect.
Each listener did the listening test fifteen times and we took the last ten results, considering the first five tests as training.
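The score-aggregation protocol just described (fifteen sessions per listener, the first five discarded as training) can be sketched as follows; the scores below are invented placeholder data, not the paper's measurements:

```python
import numpy as np

# Hypothetical 5-point CE scores: one row per listener, one column per
# session (15 sessions each; only the last 10 count, the first 5 are training).
scores = np.array([
    [3, 3, 4, 4, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4],
    [2, 3, 3, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 5, 4],
])

evaluated = scores[:, 5:]            # discard the 5 training sessions
mos_per_listener = evaluated.mean(axis=1)
mos = evaluated.mean()               # overall mean opinion score
```

Averaging only the post-training sessions separates the learning effect from the quality judgment itself.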
After collecting all listeners' responses, we calculated the average values and found the following results. In the first listening test, the average correct-rate for original and analysis-synthesis sounds was 98%, and that for rule-based synthesized sounds was 90%. We found the synthesized words to be very intelligible (Figure 5).

Figure 5. Average scores for speech intelligibility in the first test (Euler system, our system, natural speech and the Acapela system).

5. CONCLUSION

In this work, a voice quality conversion algorithm with a TD-PSOLA modifier was implemented and tested in a Matlab environment using our database. The results of the perceptual evaluation test indicate that the algorithm can effectively convert modal voice into the desired voice quality. The simulation results verify that the quality of the signal synthesized with the TD-PSOLA technique depends on the precision of the analysis marking as well as the synthesis marking, which must be placed precisely to avoid phase errors. Our higher-precision algorithm for pitch marking during the synthesis stage increases the signal quality; this gain in accuracy reduces the difference between the original and synthetic signals. We have shown that syllables produce reasonably natural-quality speech and that durational modeling is crucial for naturalness, with a significant reduction in the number of units in the total database developed. This quality is confirmed by the listening tests and by an objective evaluation comparing the original and synthetic speech.

REFERENCES

[1] Huang, X., Acero, A. and Hon, H. W. (2001) Spoken Language Processing, Prentice Hall PTR, New Jersey.
[2] Greenwood, A. R. (1997) "Articulatory Speech Synthesis Using Diphone Units", IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1635–1638.
[3] Sagisaka, Y., Iwahashi, N. and Mimura, K. (1992) "ATR v-TALK Speech Synthesis System", Proceedings of the ICSLP, Vol. 1, pp. 483–486.
[4] Black, A. W. and Taylor, P. (1994) "CHATR: A Generic Speech Synthesis System", Proceedings of the International Conference on Computational Linguistics, Vol. 2, pp. 983–986.
[5] Childers, D. G. (1995) "Glottal source modeling for voice conversion", Speech Communication, 16(2), pp. 127-138.
[6] Childers, D. G. and Lee, C. K. (1991) "Vocal quality factors: Analysis, synthesis, and perception", Journal of the Acoustical Society of America.
[7] Acero, A. (1998) "Source-filter Models for Time-Scale Pitch-Scale Modification of Speech", IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, USA, pp. 881-884.
[8] Dutoit, T., Pagel, V., Pierret, N., Bataille, F. and van der Vrecken, O. (1996) "The MBROLA Project: Towards a Set of High-Quality Speech Synthesizers Free of Use".
[9] Moulines, E. and Charpentier, F. (1990) "Pitch-Synchronous Waveform Processing Techniques for TTS Synthesis", Speech Communication, Vol. 9, pp. 453-467.
[10] Alghmadi, M. (2003) "KACST Arabic Phonetic Database", Fifteenth International Congress of Phonetic Sciences, Barcelona, pp. 3109-3112.
[11] Assaf, M. (2005) "A Prototype of an Arabic Diphone Speech Synthesizer in Festival", Master Thesis, Department of Linguistics and Philology, Uppsala University.
[12] Al-Zabibi, M. (1990) "An Acoustic-Phonetic Approach in Automatic Arabic Speech Recognition", The British Library in Association with UMI.
[13] Ibraheem, A. (1990) "Al-Aswat Al-Arabia" (Arabic title), Anglo-Egyptian Publisher, Egypt.
[14] Maria, M. (2004) "A Prototype of an Arabic Diphone Speech Synthesizer in Festival", Master Thesis in Computational Linguistics, Uppsala University.
[15] Laprie, Y. and Colotte, V. (1998) "Automatic pitch marking for speech transformations via TD-PSOLA", IX European Signal Processing Conference, Rhodes, Greece.
[16] Mower, L., Boeffard, O. and Cherbonnel, B. (1991) "An algorithm of high-quality speech synthesis", Proceedings of a Seminar SFA/GCP, pp. 104-107.
[17] Oppenheim, A. V. and Schafer, R. W. (1975) Digital Signal Processing, Prentice-Hall, Inc.
[18] Oppenheim, A. V. and Schafer, R. W. (1975) Digital Signal Processing, Prentice-Hall, Inc., New York.
[19] Walker, J. and Murphy, P. (2007) "A review of glottal waveform analysis", in Progress in Nonlinear Speech Processing.
[20] Demenko, G., Grocholewski, S., Wagner, A. and Szymański, M. (2006) "Prosody Annotation for Corpus Based Speech Synthesis", Proceedings of the Eleventh Australasian International Conference on Speech Science and Technology, Auckland, New Zealand, pp. 460-465.
[21] Maria, M. (2004) "A Prototype of an Arabic Diphone Speech Synthesizer in Festival", Master Thesis in Computational Linguistics, Uppsala University.
[22] Kraft, V. and Portele, T. (1995) "Quality Evaluation of Five German Speech Synthesis Systems", Acta Acustica, 3, pp. 351-365.