World Academy of Science, Engineering and Technology
International Journal of Computer, Information Science and Engineering Vol:2 No:7, 2008

Investigation of Combined use of MFCC and
LPC Features in Speech Recognition Systems


K.R. Aida-Zade, C. Ardil and S.S. Rustamov
Abstract—This paper states the automatic speech recognition problem, describes the purpose of speech recognition, and surveys its application fields. The principles of building a speech recognition system for Azerbaijani speech and the problems arising in such a system are investigated.
The algorithms for computing speech features, the core of any speech recognition system, are analyzed. In particular, algorithms for determining the Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients that express the basic speech features are developed. The combined use of MFCC and LPC cepstra is suggested to improve the reliability of the speech recognition system. To this end, the recognition system is divided into MFCC-based and LPC-based recognition subsystems. Training and recognition are carried out separately in both subsystems, and the system accepts a decision only when both subsystems return the same result. This decreases the error rate during recognition.
Training and recognition in the automatic speech recognition system are realized by artificial neural networks, trained by the conjugate gradient method. The paper investigates the problems caused by the number of speech features when training the neural networks of the MFCC- and LPC-based subsystems.
The variation among neural networks trained from different initial points is analyzed. A methodology for the combined use of neural networks trained from different initial points is suggested to improve the reliability of the recognition system and increase recognition quality, and the practical results obtained are reported.

Keywords—speech recognition, cepstral analysis, voice activation detection algorithm, Mel Frequency Cepstral Coefficients, speech features, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding

I. INTRODUCTION

Recently, as a result of the wide development of computers, various forms of information exchange between man and computer have appeared. At present, inputting data into the computer by speech and its recognition by the computer is one of the actively developed scientific fields. Because each language has its specific features, different speech recognition systems are investigated for different languages.

Kamil Aida-Zade is with the Institute of Cybernetics of the National Academy of Sciences, Baku, Azerbaijan.
Cemal Ardil is with the National Academy of Aviation, Baku, Azerbaijan.
Samir Rustamov is with the Institute of Cybernetics of the National Academy of Sciences, Baku, Azerbaijan.

This is why we propose a speech recognition system for the Azerbaijani language.
The subject of this paper is the construction of a structured Azerbaijani speech recognition system, the analysis involved in investigating that system, and the recognition results. The speech input to our system consists of a finite number of words, clearly pronounced with definite time intervals between them. Depending on the application field, the recognizable words (speech) can be used for various purposes.
II. PROBLEM STATEMENT
Automatic speech recognition by computer is a process
where speech signals are automatically converted into the
corresponding sequence of words in text.
Automatic speech recognition involves a number of
disciplines such as physiology, acoustics, signal processing,
pattern recognition, and linguistics. The difficulty of automatic speech recognition comes from many aspects of these areas.
Variability from speakers: A word may be uttered differently by the same speaker because of illness or emotion. It may be articulated differently depending on whether it is planned, read speech or spontaneous conversation. Speech produced in noise differs from speech produced in a quiet environment, because speakers change their speech production in an effort to communicate more effectively across a noisy environment. Since no two persons share identical vocal cords and vocal tracts, they cannot produce the same acoustic signal. Typically, female voices differ from male voices, and children's voices from adults'. There is also variability due to dialect and foreign accent.
Variability from environments: The acoustical environment in which recognizers are used introduces another layer of corruption into speech signals, caused by background noise, reverberation, microphones, and transmission channels.
III. THE METHODS OF SOLUTION

At first the speech signal is transformed into an electric oscillation by a sound recorder (for example, a microphone). The signal is then passed through an analog-to-digital converter and transformed into digital form at some sampling frequency $f_d$ and quantization level. The sampling frequency determines how often the analog signal must be sampled so that it can be represented without losing its important information.
The main part of a speech recognition system consists of the training and recognition processes. Initially, basic features
characterizing the speech signal are computed in both processes. The efficiency of this stage is one of the significant factors affecting the behavior of the next stages and the exactness of speech recognition. Using the time function of the signal as a feature is ineffective, because when the same person says the same word, its time function varies significantly.
At present the methods of calculating MFCC (Mel
Frequency Cepstral Coefficients) and LPC (Linear Predictive
Coding) are widely used in speech recognition as speech
features.
Let us explain the essence of these methods separately.
The model of speech generation consists of two parts: the generation of the excitation signal and the vocal tract filter. The excitation signal is spectrally shaped by a vocal tract equivalent filter; the outcome of this process is the speech. If $e(n)$ denotes the excitation signal and $\theta(n)$ denotes the impulse response of the vocal tract equivalent filter, the speech signal equals the excitation signal convolved with the impulse response of the vocal tract filter, as shown in equation (1):

$$s(n) = e(n) * \theta(n). \qquad (1)$$
A convolution in the time domain corresponds to a multiplication in the frequency domain:

$$S(\omega) = E(\omega) \cdot \theta(\omega). \qquad (2)$$

In the MFCC method, taking the logarithm of equation (2) turns the multiplied spectra into a sum:

$$\log S(\omega) = \log\bigl(E(\omega) \cdot \theta(\omega)\bigr) = \log E(\omega) + \log \theta(\omega).$$
It is possible to separate the excitation spectrum $E(\omega)$ from the vocal system spectrum $\theta(\omega)$ by remembering that $E(\omega)$ is responsible for the "fast" spectral variations, while $\theta(\omega)$ is responsible for the "slow" ones. Frequency components corresponding to $E(\omega)$ appear at "large values" on the horizontal axis in the "new frequency domain", whereas frequency components corresponding to $\theta(\omega)$ appear at "small values". The new domain obtained after taking the logarithm and the inverse Fourier transform is called the cepstrum domain, and the word quefrency is used to describe the "frequencies" in the cepstrum domain.
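As an illustration of this transform chain (our sketch, not part of the original derivation), the real cepstrum of a single windowed frame `s` can be computed with NumPy as follows:

```python
import numpy as np

def real_cepstrum(s: np.ndarray) -> np.ndarray:
    """Log-magnitude spectrum followed by an inverse FFT.

    Low quefrencies capture the slowly varying vocal-tract
    spectrum theta(w); high quefrencies capture the fast-varying
    excitation spectrum E(w).
    """
    spectrum = np.fft.fft(s)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small constant avoids log(0)
    return np.fft.ifft(log_mag).real
```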
In the same way, in the LPC method the $z$-transform is applied to the convolution in the time domain:

$$S(z) = E(z) \cdot \theta(z).$$
The main idea behind linear prediction is to extract the vocal tract parameters. Given speech samples at time $n$, $s(n)$ can be modeled as a linear combination of the past $p$ speech samples:

$$\hat{s}(a; n) = \sum_{k=1}^{p} a_p(k)\, s(n-k),$$

where $a_p = (a_p(1), a_p(2), \ldots, a_p(p))$ are the unknown LPC coefficients ($p \in [8, 12]$).
Summing the real and predicted samples, we get the following error signal:

$$e(a; n) = s(n) + \hat{s}(a; n) = s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k). \qquad (3)$$

Applying the $z$-transform to this signal gives $R(z) = S(z) A(z)$, where the filter

$$A(z) = 1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}$$

is called the prediction error filter. This filter is the inverse of the vocal tract equivalent filter:

$$A(z) = \frac{1}{\theta(z)}.$$

To find the vocal tract filter $\theta(z)$, we must first find the LPC coefficients $a_p$. To this end, the following functional is minimized:

$$\varepsilon_p(a) = \sum_{n=1}^{M} e^2(a; n) \to \min, \qquad (4)$$

where $M$ is the number of frames.
We use the necessary condition of a minimum to solve the problem:

$$\frac{\partial \varepsilon_p(a)}{\partial a_p(k)} = \frac{\partial}{\partial a_p(k)} \sum_{n=1}^{M} e^2(a; n) = 2 \sum_{n=1}^{M} e(a; n)\, \frac{\partial}{\partial a_p(k)} \left[ s(n) + \sum_{l=1}^{p} a_p(l)\, s(n-l) \right] = 2 \sum_{n=1}^{M} e(a; n)\, s(n-k) = 0, \qquad k = 1, 2, \ldots, p.$$

Then we get

$$\sum_{n=1}^{M} \left[ s(n) + \sum_{l=1}^{p} a_p(l)\, s(n-l) \right] s(n-k) = 0. \qquad (5)$$

Let us denote $r_x(k) = \sum_{n=1}^{M} s(n)\, s(n-k)$. Consequently, equation (5) can be written in the following form:

$$r_x(k) + \sum_{l=1}^{p} a_p(l)\, r_x(l-k) = 0, \quad \text{or} \quad \sum_{l=1}^{p} a_p(l)\, r_x(k-l) = -r_x(k), \qquad k = 1, \ldots, p. \qquad (6)$$

Equation (6) is called the normal equation or the Yule-Walker equation.
Using expression (3) in the functional (4), we get:

$$\varepsilon_p(a) = \sum_{n=1}^{M} e^2(a; n) = \sum_{n=1}^{M} e(a; n)\, e(a; n) = \sum_{n=1}^{M} e(a; n) \left[ s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) \right] = \sum_{n=1}^{M} e(a; n)\, s(n) + \sum_{k=1}^{p} a_p(k) \sum_{n=1}^{M} e(a; n)\, s(n-k).$$

Since $\sum_{n=1}^{M} e(a; n)\, s(n-k) = 0$, the functional (4) takes the following form:

$$\varepsilon_{p,\min}(a) = \sum_{n=1}^{M} e(a; n)\, s(n) = \sum_{n=1}^{M} \left[ s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) \right] s(n) = r_x(0) + \sum_{k=1}^{p} a_p(k)\, r_x(k).$$

The coefficients $a_p(k)$ giving the minimum of the functional are found by the following Levinson-Durbin recursion:
1. a) $a_0(0) = 1$; b) $E_0 = r_x(0)$.
2. For $j = 0, 1, \ldots, p-1$ calculate the following expressions:
   a) $\gamma_j = r_x(j+1) + \sum_{i=1}^{j} a_j(i)\, r_x(j-i+1)$;
   b) $\Gamma_{j+1} = -\gamma_j / E_j$;
   c) $a_{j+1}(i) = a_j(i) + \Gamma_{j+1}\, a_j(j-i+1)$, for $i = 1, 2, \ldots, j$;
   d) $a_{j+1}(j+1) = \Gamma_{j+1}$;
   e) $E_{j+1} = E_j \left[ 1 - \Gamma_{j+1}^2 \right]$.
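The recursion above translates directly into code. The following NumPy sketch (our illustration, not the authors' implementation) computes the autocorrelation $r_x$ of one frame and then carries out steps 1-2 exactly as listed:

```python
import numpy as np

def levinson_durbin(frame: np.ndarray, p: int = 12) -> np.ndarray:
    """Return the LPC coefficients a_p(1..p) for one speech frame."""
    M = len(frame)
    # Autocorrelation r_x(k) = sum_n s(n) s(n-k), k = 0..p.
    r = np.array([np.dot(frame[k:], frame[:M - k]) for k in range(p + 1)])

    a = np.zeros(p + 1)
    a[0] = 1.0                   # a_0(0) = 1
    E = r[0]                     # E_0 = r_x(0)
    for j in range(p):
        # gamma_j = r_x(j+1) + sum_{i=1}^{j} a_j(i) r_x(j-i+1)
        gamma = r[j + 1] + sum(a[i] * r[j - i + 1] for i in range(1, j + 1))
        G = -gamma / E           # reflection coefficient Gamma_{j+1}
        a_new = a.copy()
        for i in range(1, j + 1):
            a_new[i] = a[i] + G * a[j - i + 1]
        a_new[j + 1] = G         # a_{j+1}(j+1) = Gamma_{j+1}
        a = a_new
        E = E * (1.0 - G * G)    # E_{j+1} = E_j (1 - Gamma^2)
    return a[1:]                 # a_p(1), ..., a_p(p)
```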


IV. ALGORITHM OF CALCULATION OF SPEECH FEATURES

The combined use of LPC and MFCC cepstra in the speech recognition system requires calculating the speech features. The algorithm for calculating the speech features is defined in the following form.
1. Pre-processing. The amplitude spectrum of a speech signal is dominant at "low frequencies" (up to approximately 4 kHz). The speech signal is passed through a first-order FIR high-pass filter:

$$s_p(n) = s_{in}(n) - \alpha \cdot s_{in}(n-1),$$

where $\alpha$ is the filter coefficient ($\alpha \in (0.95, 1)$) and $s_{in}(n)$ is the input signal.
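A one-line NumPy version of this pre-emphasis step (a sketch; the coefficient 0.97 is our assumed value inside the stated interval):

```python
import numpy as np

def pre_emphasis(s_in: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order FIR high-pass filter: s_p(n) = s_in(n) - alpha * s_in(n-1)."""
    return np.append(s_in[0], s_in[1:] - alpha * s_in[:-1])
```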
2. Voice activation detection (VAD). Locating the endpoints of an utterance in a speech signal is a major problem for the speech recognizer: inaccurate endpoint detection decreases its performance. Some commonly used measurements for finding speech are the short-term energy estimate $E_s$, the short-term power estimate $P_s$, and the short-term zero crossing rate $Z_s$. For the speech signal $s_p(n)$ these measures are calculated as follows:

$$E_s(m) = \sum_{n=m-L+1}^{m} s_p^2(n), \qquad P_s(m) = \frac{1}{L} \sum_{n=m-L+1}^{m} s_p^2(n),$$

$$Z_s(m) = \frac{1}{L} \sum_{n=m-L+1}^{m} \frac{\left| \mathrm{sgn}(s_p(n)) - \mathrm{sgn}(s_p(n-1)) \right|}{2},$$

where

$$\mathrm{sgn}(s_p(n)) = \begin{cases} 1, & s_p(n) \geq 0, \\ -1, & s_p(n) < 0. \end{cases}$$

These measures are calculated for each block of $L = 100$ samples. The short-term zero crossing rate gives a measure of how many times the signal $s_p(n)$ changes sign; it tends to be larger during unvoiced regions.
These measures need triggers for making the decision about where utterances begin and end. To create a trigger, one needs some information about the background noise. This is obtained by assuming that the first 5 blocks are background noise; with this assumption, the mean and variance of the measures are calculated. For a more convenient approach, the following function is used:

$$W_s(m) = P_s(m) \cdot (1 - Z_s(m)) \cdot S_c.$$

Using this function, both the short-term power and the zero crossing rate are taken into account. $S_c$ is a scale factor for avoiding small values; in a typical application $S_c = 1000$. The trigger for this function can be described as

$$t_W = \mu_W + \alpha\, \delta_W,$$

where $\mu_W$ is the mean and $\delta_W$ is the variance of $W_s(m)$ calculated for the first 5 blocks. The term $\alpha$ is a constant that has to be fine-tuned according to the characteristics of the signal. After some testing, the following approximation of $\alpha$ gives quite good voice activation detection at various levels of additive background noise:

$$\alpha = 0.2 \cdot \delta_W^{-0.4}.$$

The voice activation detection function $VAD(m)$ is then

$$VAD(m) = \begin{cases} 1, & W_s(m) \geq t_W, \\ 0, & W_s(m) < t_W. \end{cases}$$

By using this function we can detect the endpoints of an utterance.
3. Framing. The input signal is divided into overlapping frames of $N$ samples:

$$s_{frame}(n) = s_p(n) \cdot w(n), \qquad w(n) = \begin{cases} 1, & K \cdot r < n \leq K \cdot r + N, \quad r = 0, 1, 2, \ldots, M-1, \\ 0, & \text{otherwise}, \end{cases}$$

where $M$ is the number of frames, $f_s$ is the sampling frequency, $t_{frame}$ is the frame length measured in time, $K$ is the frame step, and $N = f_s \cdot t_{frame}$.

TABLE I
VALUES OF FRAME LENGTH AND FRAME STEP INTERVAL DEPENDING ON THE SAMPLING FREQUENCY

Sampling frequency (f_s)   16 kHz   11 kHz   8 kHz
Frame length (N)           400      256      200
Frame step (K)             160      110      80

We use the sampling frequency $f_s = 16$ kHz in our system.
4. Windowing. There are a number of different window functions to choose from to minimize the signal discontinuities. One of the most commonly used for windowing a speech signal before the Fourier transform is the Hamming window:

$$s_w(n) = \left[ 0.54 - 0.46 \cos\!\left( \frac{2\pi (n-1)}{N-1} \right) \right] s_{frame}(n), \qquad 1 \leq n \leq N.$$
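A compact sketch of steps 2-4 under the formulas above (our illustration; the block, frame, and step sizes follow the 16 kHz column of Table I):

```python
import numpy as np

def vad_mask(s_p: np.ndarray, L: int = 100, S_c: float = 1000.0) -> np.ndarray:
    """Per-block voice activity decisions using W_s = P_s * (1 - Z_s) * S_c."""
    n_blocks = len(s_p) // L
    W = np.empty(n_blocks)
    for m in range(n_blocks):
        block = s_p[m * L:(m + 1) * L]
        P = np.mean(block ** 2)                      # short-term power P_s
        sg = np.where(block >= 0, 1.0, -1.0)         # sgn(s_p(n))
        Z = np.mean(np.abs(np.diff(sg)) / 2.0)       # zero crossing rate Z_s
        W[m] = P * (1.0 - Z) * S_c
    mu, var = W[:5].mean(), W[:5].var()              # first 5 blocks = noise
    alpha = 0.2 * var ** (-0.4) if var > 0 else 1.0  # guard against var = 0
    t_W = mu + alpha * var
    return W >= t_W                                  # VAD(m)

def frame_and_window(s_p: np.ndarray, N: int = 400, K: int = 160) -> np.ndarray:
    """Split into overlapping frames of N samples (step K), apply a Hamming window."""
    M = 1 + (len(s_p) - N) // K                      # number of frames
    n = np.arange(N)
    hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
    return np.stack([s_p[r * K:r * K + N] * hamming for r in range(M)])
```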
Calculation of MFCC features.
Fast Fourier transform (FFT). Applying the FFT to the windowed frames, the spectrum of each frame is calculated:

$$bin_k = \sum_{n=1}^{N} s_w(n)\, e^{-i (n-1) k \frac{2\pi}{N}}, \qquad k = 0, 1, 2, \ldots, N-1.$$

Mel filtering. The low-frequency components of the magnitude spectrum are ignored. The useful frequency band
lies between 64 Hz and half of the actual sampling frequency. This band is divided into 23 channels, equidistant in the mel frequency domain. Each channel has a triangular-shaped frequency window, and consecutive channels are half-overlapping.
The choice of the starting frequency of the filter bank, $f_{start} = 64$ Hz, roughly corresponds to the case where the full frequency band is divided into 24 channels and the first channel is discarded, using any of the three possible sampling frequencies.
The centre frequencies of the channels in terms of FFT bin indices ($cbin_i$ for the $i$-th channel) are calculated as follows:

$$Mel(x) = 2595 \lg\!\left(1 + \frac{x}{700}\right), \qquad Mel^{-1}(mel) = 700 \cdot \left(10^{\frac{mel}{2595}} - 1\right),$$

$$f_{c_i} = Mel^{-1}\!\left\{ Mel\{f_{start}\} + \frac{Mel\{f_s/2\} - Mel\{f_{start}\}}{NF}\, i \right\}, \qquad i = 1, 2, 3, \ldots, NF-1,$$

$$cbin_i = \mathrm{round}\!\left\{ \frac{f_{c_i}}{f_s}\, N \right\},$$

where $\mathrm{round}(\cdot)$ stands for rounding towards the nearest integer and $NF = 24$ is the number of channels of the filter bank.
The output of the mel filter is the weighted sum of the FFT magnitude spectrum values ($bin_i$) in each band. Triangular, half-overlapped windowing is used as follows:

$$fbank_k = \sum_{i=cbin_{k-1}}^{cbin_k} \frac{i - cbin_{k-1} + 1}{cbin_k - cbin_{k-1} + 1}\, bin_i + \sum_{i=cbin_k+1}^{cbin_{k+1}} \left(1 - \frac{i - cbin_k}{cbin_{k+1} - cbin_k + 1}\right) bin_i, \qquad k = 1, 2, \ldots, NF-1,$$

where $cbin_0$ and $cbin_{24}$ denote the FFT bin indices corresponding to the starting frequency and half of the sampling frequency, respectively:

$$cbin_0 = \mathrm{round}\!\left\{ \frac{f_{start}}{f_s}\, N \right\}, \qquad cbin_{24} = \mathrm{round}\!\left\{ \frac{f_s/2}{f_s}\, N \right\} = \frac{N}{2}.$$

Non-linear transformation. The output of the mel filtering is subjected to a logarithm function (natural logarithm):

$$f_i = \ln(fbank_i), \qquad i = 1, 2, \ldots, NF-1.$$

Cepstral coefficients. 12 cepstral coefficients are calculated from the output of the non-linear transformation block:

$$C_i = \sum_{j=1}^{NF-1} f_j \cdot \cos\!\left( \frac{\pi \cdot i}{NF-1}\, (j - 0.5) \right), \qquad i = 1, \ldots, 12.$$

We apply cepstral mean subtraction to these 12 cepstral coefficients and enter them into the feature vector in the next step.
Cepstral Mean Subtraction (CMS). A speech signal may be subjected to some channel noise when recorded, also referred to as the channel effect. A problem arises if the channel effect when recording training data for a given person is different from the channel effect in later recordings when the person uses the system: a false distance between the training data and the newly recorded data is introduced by the different channel effects. The channel effect is eliminated by subtracting from the mel-cepstrum coefficients the mean mel-cepstrum coefficients:

$$mc_j(q) = C_j(q) - \frac{1}{M} \sum_{i=1}^{M} C_i(q), \qquad q = 1, 2, \ldots, 12.$$

Calculation of LPC features.
The LPC coefficients of each frame are found by applying the Levinson-Durbin algorithm, and the following cepstra are calculated:

$$c(k) = -a_p(k) - \sum_{i=1}^{k-1} \left(1 - \frac{i}{k}\right) a_p(i)\, c(k-i), \qquad k = 1, \ldots, 12.$$

We apply cepstral mean subtraction to these 12 LPC cepstra and enter them into the feature vector in the next step.
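To make the feature pipeline concrete, here is a sketch (ours, with assumed helper names) of the mel filterbank, DCT, cepstral mean subtraction, and LPC-to-cepstrum steps. It assumes `frames` holds the Hamming-windowed frames from the earlier sketch and reuses the `levinson_durbin` function sketched above:

```python
import numpy as np

NF, F_START = 24, 64.0                     # filter bank channels, start frequency (Hz)

def mel(x):     return 2595.0 * np.log10(1.0 + x / 700.0)
def mel_inv(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frames: np.ndarray, fs: float = 16000.0, n_ceps: int = 12) -> np.ndarray:
    N = frames.shape[1]
    bins = np.abs(np.fft.fft(frames, axis=1))          # |bin_k| per frame
    # Channel centre frequencies equidistant on the mel scale -> FFT bin indices.
    f_c = mel_inv(mel(F_START) + (mel(fs / 2) - mel(F_START)) / NF * np.arange(NF + 1))
    cbin = np.round(f_c / fs * N).astype(int)
    cbin[0], cbin[NF] = round(F_START / fs * N), N // 2
    fbank = np.zeros((frames.shape[0], NF - 1))
    for k in range(1, NF):                             # triangular, half-overlapping
        for i in range(cbin[k - 1], cbin[k] + 1):
            fbank[:, k - 1] += (i - cbin[k - 1] + 1) / (cbin[k] - cbin[k - 1] + 1) * bins[:, i]
        for i in range(cbin[k] + 1, cbin[k + 1] + 1):
            fbank[:, k - 1] += (1 - (i - cbin[k]) / (cbin[k + 1] - cbin[k] + 1)) * bins[:, i]
    f = np.log(fbank + 1e-12)                          # non-linear transformation
    j = np.arange(1, NF)                               # DCT: C_i = sum_j f_j cos(...)
    C = np.stack([(f * np.cos(np.pi * i / (NF - 1) * (j - 0.5))).sum(axis=1)
                  for i in range(1, n_ceps + 1)], axis=1)
    return C - C.mean(axis=0)                          # cepstral mean subtraction

def lpc_cepstra(a: np.ndarray, n_ceps: int = 12) -> np.ndarray:
    """c(k) = -a_p(k) - sum_{i=1}^{k-1} (1 - i/k) a_p(i) c(k-i)."""
    p, c = len(a), np.zeros(n_ceps)
    for k in range(1, n_ceps + 1):
        acc = a[k - 1] if k <= p else 0.0
        for i in range(1, k):
            if i <= p:
                acc += (1 - i / k) * a[i - 1] * c[k - i - 1]
        c[k - 1] = -acc
    return c
```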
V. CONSTRUCTION OF NEURAL NETWORK

There are various mathematical models which form the basis of speech recognition systems. A widely used model is the Multilayer Artificial Neural Network (MANN). Let us briefly describe the structure of a MANN.
Generally, a MANN is an incompletely connected graph. Let $L$ be the number of the MANN's layers; $N_l$ the number of neurons on layer $l$, $l = 1, \ldots, L$; $I_j^l$ the set of neurons of layer $(l-1)$ which are connected to neuron $j$ on layer $l$; $\theta_j^l$ the bias of neuron $j$ on layer $l$; $w_{ij}^l$ the weight coefficient (synapse) of the connection between neuron $i$ on layer $(l-1)$ and neuron $j$ on layer $l$; and $s_{j,p}^l$ and $y_{j,p}^l$ the state and output value of neuron $j$ on layer $l$ for the input signal $x_p \in X$ of the MANN.

Forward propagation of the MANN for an input signal $x_p \in X$ is described by the following expressions (figures 1, 2):

$$s_{j,p}^l = \sum_{i \in I_j^l} w_{ij}^l \cdot y_{i,p}^{l-1} + \theta_j^l, \qquad (7)$$

$$y_{j,p}^l = f(s_{j,p}^l), \qquad j = 1, \ldots, N_l, \quad l = 1, \ldots, L, \qquad (8)$$

$$y_{j,p}^0 = x_{j,p}, \qquad j = 1, \ldots, N_0, \qquad (9)$$

where $f(\cdot)$ is a given nonlinear activation function. The logistic or hyperbolic tangent functions can be used as activation functions:

$$f_{log}(z) = \frac{1}{1 + e^{-\alpha z}}, \qquad f_{tan}(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}.$$

Their derivatives can be calculated from the function values:

$$\frac{d f_{log}(z)}{dz} = \alpha \cdot f_{log}(z) \cdot (1 - f_{log}(z)), \qquad \frac{d f_{tan}(z)}{dz} = 1 - f_{tan}^2(z).$$

Let a training set of pairs $\{x_p, d_p\}$, $p = 1, \ldots, P$, be given, where $d_p = (d_{1,p}, \ldots, d_{N_L,p})$ is the desired output for the input signal $x_p$. Training the MANN consists in finding such $w_{ij}^l$ and $\theta_j^l$, $i \in I_j^l$, $j = 1, \ldots, N_l$, $l = 1, \ldots, L$, that for the input signal $x_p$ the MANN produces an output $y_p$ maximally close to the desired output $d_p$. Usually, the training quality is defined by the mean square error function:

$$E(w, \theta; x, s, y) = \frac{1}{P} \sum_{p=1}^{P} \eta_p\, E_p(w, \theta; x_p, s_p, y_p), \qquad (10)$$

$$E_p(w, \theta; x_p, s_p, y_p) = \frac{1}{2} \sum_{j=1}^{N_L} \left( y_{j,p}^{L} - d_{j,p} \right)^2,$$

where $\eta_p$ is a coefficient determining the "quality" of the belonging of the input $x_p$ to its "ideal" pattern, $p = 1, \ldots, P$, $j = 1, \ldots, N_L$.
The task of MANN training constitutes the minimization of criterion (10) with respect to the parameters $(w, \theta)$ under conditions (7)-(9). The MANN of the developed system was trained by the conjugate gradient method.
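A minimal forward pass matching equations (7)-(9) for a fully connected two-layer network (a sketch of the structure described above, not the authors' code; training by conjugate gradients is omitted):

```python
import numpy as np

def f_log(z, alpha=1.0):
    """Logistic activation f_log(z) = 1 / (1 + exp(-alpha z))."""
    return 1.0 / (1.0 + np.exp(-alpha * z))

def forward(x_p, weights, biases):
    """Equations (7)-(9): y^0 = x_p, s^l = W^l y^{l-1} + theta^l, y^l = f(s^l)."""
    y = x_p                                   # (9): layer-0 output is the input
    for W, theta in zip(weights, biases):
        s = W @ y + theta                     # (7): weighted sum plus bias
        y = f_log(s)                          # (8): nonlinear activation
    return y

# Dimensions from the paper: 420 inputs, 50 hidden neurons, 10 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((50, 420)) * 0.01,
           rng.standard_normal((10, 50)) * 0.01]
biases = [np.zeros(50), np.zeros(10)]
y_out = forward(rng.standard_normal(420), weights, biases)   # 10 class outputs
```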

Fig. 1 MANN with two layers

Fig. 2 One neuron description

VI. THE RECOGNITION PROCESS

The speech recognition system consists of two subsystems, MFCC-based and LPC-based. These subsystems are trained as neural networks on the MFCC and LPC features, respectively. The recognition process is realized in two stages:
1. The recognition processes of the MFCC and LPC-based recognition subsystems are carried out in parallel.
2. The recognition results of the MFCC and LPC-based subsystems are compared, and the speech recognition system confirms only a result that is confirmed by both subsystems.
Since the MFCC and LPC methods are applied to the overlapping frames of the speech signal, the dimension of the feature vector depends on the number of frames. The number of frames in turn depends on the length of the speech signal, the sampling frequency, the frame step, and the frame length. In our system the sampling frequency is 16 kHz, the frame step is 160 samples, and the frame length is 400 samples.
Another problem of speech recognition is that the same speech has different time durations: even when the same person repeats the same utterance, it has a different duration. To partially remove this problem, the time durations are led to the same scale. When the dimension of the scale defined for the speech signal increases, the dimension of the feature vector corresponding to the signal also increases.
The dimension of a neural network is taken as the total number of weights and biases of the network. A large dimension of the feature vector strongly affects the dimension of the neural network. For example, our neural network consists of 2 layers: the number of input parameters is 420, the number of neurons in the first layer is 50, and the number of neurons in the output layer is 10. The dimension of our neural network is 420 × 50 + 50 × 10 + 60 = 21560.
Since the dimension of the neural network is less than the number of training samples, there exists a set of different weights and biases giving a minimum to the minimization criterion (10), and applying these different weights and biases in the recognition system gives different results.
Here we suggest constructing the following system by applying neural networks trained from different initial points. Depending on the aim of the user, the speech recognition system presents a recognition system of different quality. With respect to the percentage of recognition errors, the systems are conditionally called strong, intermediate and weak reliability systems.
Strong reliability system. This is a system that confirms a recognition only when every neural network trained from the different initial points confirms it. If any of these networks rejects the recognition, the system does not accept any recognition. This system prevents errors in the recognition process and is therefore the most reliable.
Intermediate reliability system. This system uses voting between the neural networks trained from different initial points, and the recognition system confirms the result of the voting. For example, if the number of neural networks trained by changing the initial points is 3, then the system accepts a result confirmed by some two of them. Although the confidence of this system is lower than that of the strong reliability system, its recognition percentage is higher.
Weak reliability system. The method we suggest for this system is a sequential method. Let us explain its essence. The recognition system is trained from different initial points in the training process. The first trained neural network is used initially; then the second neural network is applied to the patterns left unrecognized by the first network. Similarly, the third neural network is applied to the patterns left unrecognized by the second neural network, and so on. This approach minimizes the number of unrecognized patterns; however, it has weak reliability in terms of the error rate.
Note that the number of neural networks trained from the different initial points depends on the computer processing power. Apparently, as the number of neural networks used in the system increases, the strong reliability system makes fewer errors and the recognition reliability of the system increases. In the weak reliability system, despite the increase in the correct recognition percentage, the error recognition percentage also relatively increases and the reliability of the system decreases. The number of neural networks does not affect the results of the intermediate reliability system.
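The three decision rules can be sketched as follows (our illustration; `nets` is assumed to be a list of prediction functions, one per trained network, each returning a class label or None when it rejects the pattern):

```python
from collections import Counter

def strong(nets, x):
    """Accept only if every network returns the same label; otherwise reject."""
    labels = [net(x) for net in nets]
    return labels[0] if all(l == labels[0] and l is not None for l in labels) else None

def intermediate(nets, x):
    """Majority voting between the networks."""
    votes = Counter(l for l in (net(x) for net in nets) if l is not None)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count > len(nets) / 2 else None

def weak(nets, x):
    """Sequential: try each network in turn on patterns the previous one rejected."""
    for net in nets:
        label = net(x)
        if label is not None:
            return label
    return None
```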

VII. EXPERIMENTAL RESULTS

For training and recognition of differently scaled speech in the neural network, it is necessary to lead the signals to the same scale. The dimension of the scale is taken as 5840 samples, which corresponds to 35 frames; every frame has 12 features. Our neural network consists of 2 layers: the number of input parameters is 420, the number of neurons in the first layer is 50, and the number of neurons in the output layer is the number of tested words (10). The testing speech consists of the Azerbaijani digits.
For the training process, 140-150 patterns of every spoken digit are entered into the system. The neural networks of the developed system were trained by the conjugate gradient method. In the following tables, the results of the MFCC and LPC-based subsystems are shown separately. The results of the speech recognition system with combined use of the MFCC and LPC-based subsystems are also shown.

TABLE II
THE RESULTS OF THE MFCC AND LPC-BASED SPEECH RECOGNITION SUBSYSTEMS TRAINED FROM DIFFERENT INITIAL POINTS

Training  Features  Testing patterns  Recognized      Errors       Unrecognized
1         MFCC      312               291 (93.27%)    7 (2.24%)    14 (4.49%)
1         LPC       312               288 (92.31%)    5 (1.60%)    19 (6.09%)
2         MFCC      312               288 (92.31%)    9 (2.88%)    15 (4.81%)
2         LPC       312               292 (93.59%)    7 (2.24%)    13 (4.17%)
3         MFCC      312               289 (92.63%)    5 (1.60%)    18 (5.77%)
3         LPC       312               293 (93.91%)    6 (1.92%)    13 (4.17%)

TABLE III
THE RESULTS OF THE STRONG RELIABILITY SYSTEM

Features  Testing patterns  Recognized      Errors       Unrecognized
MFCC      312               273 (87.50%)    1 (0.32%)    38 (12.18%)
LPC       312               286 (91.67%)    2 (0.64%)    24 (7.69%)
Combined  312               264 (84.60%)    1 (0.32%)    47 (15.10%)

TABLE IV
THE RESULTS OF THE INTERMEDIATE RELIABILITY SYSTEM

Features  Testing patterns  Recognized      Errors       Unrecognized
MFCC      312               294 (94.23%)    3 (0.96%)    15 (4.80%)
LPC       312               290 (92.95%)    5 (1.60%)    17 (5.45%)
Combined  312               286 (91.67%)    2 (0.64%)    24 (7.69%)

TABLE V
THE RESULTS OF THE WEAK RELIABILITY SYSTEM

Features  Testing patterns  Recognized      Errors       Unrecognized
MFCC      312               299 (95.83%)    10 (3.20%)   3 (0.96%)
LPC       312               296 (94.87%)    10 (3.20%)   6 (1.92%)
Combined  312               294 (94.23%)    4 (1.28%)    14 (4.49%)

More Related Content

What's hot

Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...IJECEIAES
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundTELKOMNIKA JOURNAL
 
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic Introduction
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic IntroductionFuzzy Logic and Neuro-fuzzy Systems: A Systematic Introduction
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic IntroductionWaqas Tariq
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
 
11.a novel neural network classifier for brain computer
11.a novel neural network classifier for brain computer11.a novel neural network classifier for brain computer
11.a novel neural network classifier for brain computerAlexander Decker
 
A novel neural network classifier for brain computer
A novel neural network classifier for brain computerA novel neural network classifier for brain computer
A novel neural network classifier for brain computerAlexander Decker
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signalVinodhini
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow
 
The NAIST Text-to-Speech System for Blizzard Challenge 2015
The NAIST Text-to-Speech System for Blizzard Challenge 2015The NAIST Text-to-Speech System for Blizzard Challenge 2015
The NAIST Text-to-Speech System for Blizzard Challenge 2015Shinnosuke Takamichi
 
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingDemixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Maskingdhia_naruto
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A novel speech enhancement technique
A novel speech enhancement techniqueA novel speech enhancement technique
A novel speech enhancement techniqueeSAT Publishing House
 

What's hot (13)

Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...Chaotic signals denoising using empirical mode decomposition inspired by mult...
Chaotic signals denoising using empirical mode decomposition inspired by mult...
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
 
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic Introduction
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic IntroductionFuzzy Logic and Neuro-fuzzy Systems: A Systematic Introduction
Fuzzy Logic and Neuro-fuzzy Systems: A Systematic Introduction
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk System
 
11.a novel neural network classifier for brain computer
11.a novel neural network classifier for brain computer11.a novel neural network classifier for brain computer
11.a novel neural network classifier for brain computer
 
A novel neural network classifier for brain computer
A novel neural network classifier for brain computerA novel neural network classifier for brain computer
A novel neural network classifier for brain computer
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
The NAIST Text-to-Speech System for Blizzard Challenge 2015
The NAIST Text-to-Speech System for Blizzard Challenge 2015The NAIST Text-to-Speech System for Blizzard Challenge 2015
The NAIST Text-to-Speech System for Blizzard Challenge 2015
 
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingDemixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A novel speech enhancement technique
A novel speech enhancement techniqueA novel speech enhancement technique
A novel speech enhancement technique
 

Viewers also liked

Literary devices
Literary devicesLiterary devices
Literary devicesShee Luh
 
Key features of speech v2
Key features of speech v2Key features of speech v2
Key features of speech v2Neill Ford
 
Narrative Techniques & Other Literary Devices
Narrative Techniques & Other Literary DevicesNarrative Techniques & Other Literary Devices
Narrative Techniques & Other Literary Devicesaplitper7
 
Language, society and culture (PPT)
Language, society and culture (PPT)Language, society and culture (PPT)
Language, society and culture (PPT)KarlaEssmann
 
Literature Ii Elements Of Literature
Literature Ii  Elements Of LiteratureLiterature Ii  Elements Of Literature
Literature Ii Elements Of LiteratureXimena Gómez
 
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.Pptx
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.PptxSpeech Acts And Speech Events, By Dr.Shadia Yousef Banjar.Pptx
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.PptxDr. Shadia Banjar
 

Viewers also liked (8)

Literary devices
Literary devicesLiterary devices
Literary devices
 
Key features of speech v2
Key features of speech v2Key features of speech v2
Key features of speech v2
 
Narrative Techniques & Other Literary Devices
Narrative Techniques & Other Literary DevicesNarrative Techniques & Other Literary Devices
Narrative Techniques & Other Literary Devices
 
Figures of speech
Figures of speechFigures of speech
Figures of speech
 
Language, society and culture (PPT)
Language, society and culture (PPT)Language, society and culture (PPT)
Language, society and culture (PPT)
 
Literature Ii Elements Of Literature
Literature Ii  Elements Of LiteratureLiterature Ii  Elements Of Literature
Literature Ii Elements Of Literature
 
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.Pptx
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.PptxSpeech Acts And Speech Events, By Dr.Shadia Yousef Banjar.Pptx
Speech Acts And Speech Events, By Dr.Shadia Yousef Banjar.Pptx
 
Figures of speech interactive presentation
Figures of speech interactive presentationFigures of speech interactive presentation
Figures of speech interactive presentation
 

Similar to Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-systems

Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechIOSR Journals
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accentsipij
 
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION optljjournal
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)TANVIRAHMED611926
 
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Venkata Sudhir Vedurla
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter Systemkkkseld
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depressionijsrd.com
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationeSAT Journals
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Publishing House
 
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...CSCJournals
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...sipij
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...TELKOMNIKA JOURNAL
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...csandit
 
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION optljjournal
 

Similar to Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-systems (20)

Emotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio SpeechEmotion Recognition Based On Audio Speech
Emotion Recognition Based On Audio Speech
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
 
D111823
D111823D111823
D111823
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
 
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter System
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
 
Isolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantizationIsolated word recognition using lpc &amp; vector quantization
Isolated word recognition using lpc &amp; vector quantization
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...
Arabic Phoneme Recognition using Hierarchical Neural Fuzzy Petri Net and LPC ...
 
Ky2418521856
Ky2418521856Ky2418521856
Ky2418521856
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
 
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...
 
Real Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform DomainReal Time Speech Enhancement in the Waveform Domain
Real Time Speech Enhancement in the Waveform Domain
 
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
VOICED SPEECH CHARACTERISATION BASED ON EMPIRICAL MODE DECOMPOSITION
 

More from Cemal Ardil

Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...
Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...
Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...Cemal Ardil
 
The main-principles-of-text-to-speech-synthesis-system
The main-principles-of-text-to-speech-synthesis-systemThe main-principles-of-text-to-speech-synthesis-system
The main-principles-of-text-to-speech-synthesis-systemCemal Ardil
 
The feedback-control-for-distributed-systems
The feedback-control-for-distributed-systemsThe feedback-control-for-distributed-systems
The feedback-control-for-distributed-systemsCemal Ardil
 
System overflow blocking-transients-for-queues-with-batch-arrivals-using-a-fa...
System overflow blocking-transients-for-queues-with-batch-arrivals-using-a-fa...System overflow blocking-transients-for-queues-with-batch-arrivals-using-a-fa...
System overflow blocking-transients-for-queues-with-batch-arrivals-using-a-fa...Cemal Ardil
 
Sonic localization-cues-for-classrooms-a-structural-model-proposal
Sonic localization-cues-for-classrooms-a-structural-model-proposalSonic localization-cues-for-classrooms-a-structural-model-proposal
Sonic localization-cues-for-classrooms-a-structural-model-proposalCemal Ardil
 
Robust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systemsRobust fuzzy-observer-design-for-nonlinear-systems
Robust fuzzy-observer-design-for-nonlinear-systemsCemal Ardil
 
Response quality-evaluation-in-heterogeneous-question-answering-system-a-blac...
Response quality-evaluation-in-heterogeneous-question-answering-system-a-blac...Response quality-evaluation-in-heterogeneous-question-answering-system-a-blac...
Response quality-evaluation-in-heterogeneous-question-answering-system-a-blac...Cemal Ardil
 
Reduction of-linear-time-invariant-systems-using-routh-approximation-and-pso
Reduction of-linear-time-invariant-systems-using-routh-approximation-and-psoReduction of-linear-time-invariant-systems-using-routh-approximation-and-pso
Reduction of-linear-time-invariant-systems-using-routh-approximation-and-psoCemal Ardil
 
Real coded-genetic-algorithm-for-robust-power-system-stabilizer-design
Real coded-genetic-algorithm-for-robust-power-system-stabilizer-designReal coded-genetic-algorithm-for-robust-power-system-stabilizer-design
Real coded-genetic-algorithm-for-robust-power-system-stabilizer-designCemal Ardil
 
Performance of-block-codes-using-the-eigenstructure-of-the-code-correlation-m...
Performance of-block-codes-using-the-eigenstructure-of-the-code-correlation-m...Performance of-block-codes-using-the-eigenstructure-of-the-code-correlation-m...
Performance of-block-codes-using-the-eigenstructure-of-the-code-correlation-m...Cemal Ardil
 
Optimal supplementary-damping-controller-design-for-tcsc-employing-rcga
Optimal supplementary-damping-controller-design-for-tcsc-employing-rcgaOptimal supplementary-damping-controller-design-for-tcsc-employing-rcga
Optimal supplementary-damping-controller-design-for-tcsc-employing-rcgaCemal Ardil
 
Optimal straight-line-trajectory-generation-in-3 d-space-using-deviation-algo...
Optimal straight-line-trajectory-generation-in-3 d-space-using-deviation-algo...Optimal straight-line-trajectory-generation-in-3 d-space-using-deviation-algo...
Optimal straight-line-trajectory-generation-in-3 d-space-using-deviation-algo...Cemal Ardil
 
On the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesOn the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesCemal Ardil
 
On the-joint-optimization-of-performance-and-power-consumption-in-data-centers
On the-joint-optimization-of-performance-and-power-consumption-in-data-centersOn the-joint-optimization-of-performance-and-power-consumption-in-data-centers
On the-joint-optimization-of-performance-and-power-consumption-in-data-centersCemal Ardil
 
On the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equationOn the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equationCemal Ardil
 
On problem-of-parameters-identification-of-dynamic-object
On problem-of-parameters-identification-of-dynamic-objectOn problem-of-parameters-identification-of-dynamic-object
On problem-of-parameters-identification-of-dynamic-objectCemal Ardil
 
Numerical modeling-of-gas-turbine-engines
Numerical modeling-of-gas-turbine-enginesNumerical modeling-of-gas-turbine-engines
Numerical modeling-of-gas-turbine-enginesCemal Ardil
 
New technologies-for-modeling-of-gas-turbine-cooled-blades
New technologies-for-modeling-of-gas-turbine-cooled-bladesNew technologies-for-modeling-of-gas-turbine-cooled-blades
New technologies-for-modeling-of-gas-turbine-cooled-bladesCemal Ardil
 
Neuro -fuzzy-networks-for-identification-of-mathematical-model-parameters-of-...
Neuro -fuzzy-networks-for-identification-of-mathematical-model-parameters-of-...Neuro -fuzzy-networks-for-identification-of-mathematical-model-parameters-of-...
Neuro -fuzzy-networks-for-identification-of-mathematical-model-parameters-of-...Cemal Ardil
 
Multivariate high-order-fuzzy-time-series-forecasting-for-car-road-accidents
Multivariate high-order-fuzzy-time-series-forecasting-for-car-road-accidentsMultivariate high-order-fuzzy-time-series-forecasting-for-car-road-accidents
Multivariate high-order-fuzzy-time-series-forecasting-for-car-road-accidentsCemal Ardil
 

More from Cemal Ardil (20)

Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...
Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...
Upfc supplementary-controller-design-using-real-coded-genetic-algorithm-for-d...
 
The main-principles-of-text-to-speech-synthesis-system
The main-principles-of-text-to-speech-synthesis-systemThe main-principles-of-text-to-speech-synthesis-system
can be used for various purposes.
II. PROBLEM STATEMENT

Automatic speech recognition is the process by which a computer converts a speech signal into the corresponding sequence of words. It draws on a number of disciplines, including physiology, acoustics, signal processing, pattern recognition, and linguistics, and its difficulty stems from many aspects of these areas.

Variability from speakers: A word may be uttered differently by the same speaker because of illness or emotion, and it may be articulated differently depending on whether it is planned, read speech or spontaneous conversation. Speech produced in noise differs from speech produced in a quiet environment, because speakers change their articulation in an effort to communicate more effectively across the noise. Since no two persons have identical vocal cords and vocal tracts, they cannot produce identical acoustic signals: female voices typically differ from male voices, and children's from adults'. There is also variability due to dialect and foreign accent.

Variability from environments: The acoustic environment in which a recognizer is used introduces another layer of corruption into the speech signal, caused by background noise, reverberation, microphones, and transmission channels.

III. THE METHODS OF SOLUTION

First the speech signal is transformed into an electric oscillation by a sound recorder (for example, a microphone). The signal is then passed through an analog-to-digital converter and transformed into digital form at some sampling frequency $f_d$ and quantization level. The sampling frequency must be chosen high enough to represent the analog signal without losing its important information.

The main part of a speech recognition system consists of the training and recognition processes. In both processes, the basic features characterizing the speech signal are computed first. The quality of this stage is one of the significant factors affecting the behavior of the subsequent stages and the accuracy of recognition. Using the time function of the signal directly as a feature is ineffective, because the time function varies significantly even when the same person says the same word. At present, Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients are widely used as speech features. Let us explain the essence of these methods separately.

The model of speech generation consists of two parts: the generation of the excitation signal and the vocal tract filter. The excitation signal is spectrally shaped by a vocal tract equivalent filter, and the outcome of this process is the speech. If $e(n)$ denotes the excitation sequence and $\theta(n)$ the impulse response of the vocal tract equivalent filter, the speech sequence equals the excitation signal convolved with the impulse response of the vocal tract filter:

$$s(n) = e(n) * \theta(n). \qquad (1)$$

A convolution in the time domain corresponds to a multiplication in the frequency domain:

$$S(\omega) = E(\omega) \cdot \theta(\omega). \qquad (2)$$

In the MFCC method, taking the logarithm of equation (2) turns the multiplied spectra into a sum:

$$\log S(\omega) = \log\bigl(E(\omega)\cdot\theta(\omega)\bigr) = \log E(\omega) + \log \theta(\omega).$$

It is possible to separate the excitation spectrum $E(\omega)$ from the vocal system spectrum $\theta(\omega)$ by remembering that $E(\omega)$ is responsible for the "fast" spectral variations, while $\theta(\omega)$ is responsible for the "slow" ones. Frequency components corresponding to $E(\omega)$ appear at large values on the horizontal axis of the "new frequency domain", whereas components corresponding to $\theta(\omega)$ appear at small values. The domain obtained after taking the logarithm and then the inverse Fourier transform is called the cepstrum domain, and the word "quefrency" is used to describe the "frequencies" in it.
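This separation of slow and fast spectral variations can be illustrated with a short numerical sketch. The following Python fragment is an illustration only, not part of the described system; it assumes NumPy is available and uses a synthetic frame in place of recorded speech:

```python
import numpy as np

def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Log-magnitude spectrum followed by an inverse FFT.

    Low quefrencies describe the slowly varying vocal-tract
    envelope theta(w); high quefrencies describe the fast
    excitation component E(w).
    """
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.real(np.fft.ifft(log_mag))

# Example with a synthetic frame instead of recorded speech.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400) * np.hamming(400)
cep = real_cepstrum(frame)
envelope_part = cep[:30]    # "slow" variations (vocal tract)
excitation_part = cep[30:]  # "fast" variations (excitation)
```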
In the LPC method, the Z-transform is applied in the same way to the convolution in the time domain:

$$S(z) = E(z)\cdot\theta(z).$$

The main idea behind linear prediction is to extract the vocal tract parameters. The speech sample $s(n)$ at time $n$ can be modeled as a linear combination of the past $p$ speech samples:

$$\hat{s}(a; n) = \sum_{k=1}^{p} a_p(k)\, s(n-k),$$

where $a_p = (a_p(1), a_p(2), \ldots, a_p(p))$ are the unknown LPC coefficients; the order is typically taken in the range $p \in [8, 12]$. Summing the real and predicted samples, we get the prediction error signal

$$e(a; n) = s(n) + \hat{s}(a; n) = s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k). \qquad (3)$$

Applying the Z-transform to this signal gives $R(z) = S(z)\,A(z)$, where

$$A(z) = 1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}$$

is called the prediction error filter. This filter is the inverse of the vocal tract equivalent filter, $A(z) = 1/\theta(z)$. To find the vocal tract filter $\theta(z)$, we must first find the LPC coefficients $a_p$. To this end, the following functional is minimized:

$$\varepsilon_p(a) = \sum_{n=1}^{M} e^2(a; n) \to \min_{a}, \qquad (4)$$

where $M$ is the number of samples in the analysed frame. We use the necessary condition of a minimum to solve the problem:

$$\frac{\partial \varepsilon_p(a)}{\partial a_p(k)} = 2\sum_{n=1}^{M} e(a;n)\,\frac{\partial e(a;n)}{\partial a_p(k)} = 2\sum_{n=1}^{M} e(a;n)\, s(n-k) = 0, \qquad k = 1, 2, \ldots, p,$$

since $\partial\bigl[s(n) + \sum_{l=1}^{p} a_p(l)\, s(n-l)\bigr]/\partial a_p(k) = s(n-k)$. Then we get

$$\sum_{n=1}^{M}\Bigl[s(n) + \sum_{l=1}^{p} a_p(l)\, s(n-l)\Bigr] s(n-k) = 0, \qquad k = 1, 2, \ldots, p. \qquad (5)$$

Let us denote $r_x(k) = \sum_{n=1}^{M} s(n)\, s(n-k)$. Consequently we can write equation (5) in the following form:

$$r_x(k) + \sum_{l=1}^{p} a_p(l)\, r_x(l-k) = 0, \quad\text{or}\quad \sum_{l=1}^{p} a_p(l)\, r_x(k-l) = -r_x(k), \qquad k = 1, \ldots, p. \qquad (6)$$

Equation (6) is called the normal equation, or the Yule-Walker equation. Using expression (3) in the functional (4), we get

$$\varepsilon_p(a) = \sum_{n=1}^{M} e(a;n)\Bigl[s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k)\Bigr] = \sum_{n=1}^{M} e(a;n)\, s(n) + \sum_{k=1}^{p} a_p(k) \sum_{n=1}^{M} e(a;n)\, s(n-k).$$

Since $\sum_{n=1}^{M} e(a;n)\, s(n-k) = 0$ at the minimum by (5), the minimal value of the functional (4) is

$$\varepsilon_{p,\min}(a) = \sum_{n=1}^{M} e(a;n)\, s(n) = \sum_{n=1}^{M}\Bigl[s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k)\Bigr] s(n) = r_x(0) + \sum_{k=1}^{p} a_p(k)\, r_x(k).$$
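Equation (6) is a Toeplitz linear system in the unknowns $a_p(1), \ldots, a_p(p)$, so before turning to the Levinson-Durbin recursion it can also be solved directly. A minimal NumPy sketch (illustrative only; the test signal and order are arbitrary, and the autocorrelation is treated as symmetric, as is standard for the autocorrelation method):

```python
import numpy as np

def lpc_normal_equations(s: np.ndarray, p: int) -> np.ndarray:
    """Solve sum_l a_p(l) r_x(k-l) = -r_x(k), k = 1..p, directly."""
    M = len(s)
    # Autocorrelation r_x(k) = sum_n s(n) s(n-k), k = 0..p.
    r = np.array([np.dot(s[k:], s[:M - k]) for k in range(p + 1)])
    # Toeplitz matrix R[k, l] = r_x(|k - l|), assuming symmetry.
    R = np.array([[r[abs(k - l)] for l in range(p)] for k in range(p)])
    return np.linalg.solve(R, -r[1:p + 1])  # a_p(1..p)

a = lpc_normal_equations(np.sin(0.1 * np.arange(400)), p=10)
```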
The coefficients $a_p(k)$ giving the minimum of the functional are found by the following Levinson-Durbin recursion (a reference implementation is sketched below):

1. a) $a_0(0) = 1$;  b) $E_0 = r_x(0)$.
2. For $j = 0, 1, \ldots, p-1$ the following expressions are calculated:
   a) $\gamma_j = r_x(j+1) + \sum_{i=1}^{j} a_j(i)\, r_x(j-i+1)$;
   b) $\Gamma_{j+1} = -\gamma_j / E_j$;
   c) $a_{j+1}(i) = a_j(i) + \Gamma_{j+1}\, a_j(j-i+1)$, $i = 1, 2, \ldots, j$;
   d) $a_{j+1}(j+1) = \Gamma_{j+1}$;
   e) $E_{j+1} = E_j\bigl[1 - \Gamma_{j+1}^2\bigr]$.
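The following Python sketch transcribes the recursion step by step. It assumes NumPy and an autocorrelation array `r` with `r[k]` equal to $r_x(k)$, for example as computed in the previous sketch; with the same `r`, `levinson_durbin(r, p)[1:]` should agree with the direct solve up to numerical error.

```python
import numpy as np

def levinson_durbin(r: np.ndarray, p: int) -> np.ndarray:
    """Levinson-Durbin recursion as given in the text.

    r[k] is the autocorrelation r_x(k), k = 0..p. Returns an
    array where index 0 holds a_p(0) = 1 and 1..p hold a_p(k).
    """
    a = np.zeros(p + 1)
    a[0] = 1.0                      # step 1a: a_0(0) = 1
    E = r[0]                        # step 1b: E_0 = r_x(0)
    for j in range(p):              # step 2: j = 0..p-1
        # 2a) gamma_j = r_x(j+1) + sum_i a_j(i) r_x(j-i+1)
        gamma = r[j + 1] + sum(a[i] * r[j - i + 1] for i in range(1, j + 1))
        Gamma = -gamma / E          # 2b) reflection coefficient
        prev = a.copy()
        for i in range(1, j + 1):   # 2c) a_{j+1}(i) = a_j(i) + Gamma a_j(j-i+1)
            a[i] = prev[i] + Gamma * prev[j - i + 1]
        a[j + 1] = Gamma            # 2d) a_{j+1}(j+1) = Gamma_{j+1}
        E = E * (1.0 - Gamma ** 2)  # 2e) E_{j+1} = E_j [1 - Gamma^2]
    return a
```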
IV. ALGORITHM OF CALCULATION OF SPEECH FEATURES

Both LPC and MFCC cepstrals are used in the speech recognition system, so both kinds of speech features must be calculated. The algorithm for calculating the speech features is defined in the following form.

1. Pre-processing. The amplitude spectrum of a speech signal is dominant at low frequencies (up to approximately 4 kHz). The speech signal is passed through a first-order FIR high-pass filter

$$s_p(n) = s_{in}(n) - \alpha \cdot s_{in}(n-1),$$

where $\alpha \in (0.95, 1)$ is the filter coefficient and $s_{in}(n)$ is the input signal.

2. Voice activation detection (VAD). Locating the endpoints of an utterance in a speech signal is a major problem for the speech recognizer, and inaccurate endpoint detection decreases its performance. Commonly used measurements for finding speech are the short-term energy estimate $E_s$, the short-term power estimate $P_s$, and the short-term zero crossing rate $Z_s$. For the speech signal $s_p(n)$, these measures are calculated over blocks of $L = 100$ samples as follows:

$$E_s(m) = \sum_{n=m-L+1}^{m} s_p^2(n), \qquad P_s(m) = \frac{1}{L}\sum_{n=m-L+1}^{m} s_p^2(n),$$

$$Z_s(m) = \frac{1}{L}\sum_{n=m-L+1}^{m} \frac{\bigl|\operatorname{sgn}(s_p(n)) - \operatorname{sgn}(s_p(n-1))\bigr|}{2}, \qquad \operatorname{sgn}(s_p(n)) = \begin{cases} 1, & s_p(n) \ge 0, \\ -1, & s_p(n) < 0. \end{cases}$$

The short-term zero crossing rate measures how many times the signal $s_p(n)$ changes sign; it tends to be larger during unvoiced regions. These measures need triggers for deciding where an utterance begins and ends, and creating a trigger requires some information about the background noise. This is obtained by assuming that the first 5 blocks are background noise; under this assumption, the mean and variance of the measures are calculated. For a more convenient decision rule, the following function, which takes both the short-term power and the zero crossing rate into account, is used:

$$W_s(m) = P_s(m)\,\bigl(1 - Z_s(m)\bigr)\, S_c,$$

where $S_c$ is a scale factor for avoiding small values; in a typical application $S_c = 1000$. The trigger for this function can be described as

$$t_W = \mu_W + \alpha\,\delta_W,$$

where $\mu_W$ is the mean and $\delta_W$ the variance of $W_s(m)$ calculated over the first 5 blocks. The term $\alpha$ is a constant that has to be fine-tuned according to the characteristics of the signal. After some testing, the following approximation of $\alpha$ gives fairly good voice activation detection at various levels of additive background noise:

$$\alpha = 0.2\,\delta_W^{-0.4}.$$

The voice activation detection function is then

$$VAD(m) = \begin{cases} 1, & W_s(m) \ge t_W, \\ 0, & W_s(m) < t_W, \end{cases}$$

and by using this function we can detect the endpoints of an utterance.
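The following Python sketch puts the pre-emphasis filter and the VAD measures together. It is illustrative only: the handling of the first sample, the block partitioning at the signal edges, and the guard for a perfectly silent noise estimate are assumptions, not part of the original description.

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order FIR high-pass: s_p(n) = s_in(n) - alpha * s_in(n-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def vad(sp: np.ndarray, L: int = 100, Sc: float = 1000.0) -> np.ndarray:
    """Block-wise voice activation decisions W_s(m) >= t_W."""
    nblocks = len(sp) // L
    W = np.empty(nblocks)
    for m in range(nblocks):
        block = sp[m * L:(m + 1) * L]
        P = np.mean(block ** 2)                      # short-term power P_s
        signs = np.sign(block); signs[signs == 0] = 1
        Z = np.mean(np.abs(np.diff(signs)) / 2)      # zero crossing rate Z_s
        W[m] = P * (1.0 - Z) * Sc
    mu, var = W[:5].mean(), W[:5].var()              # first 5 blocks = noise
    alpha = 0.2 * var ** (-0.4) if var > 0 else 1.0  # guard: silent recording
    return (W >= mu + alpha * var).astype(int)       # VAD(m)
```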
3. Framing. The input signal is divided into overlapping frames of $N$ samples:

$$s_{frame}(n) = s_p(n)\, w(n), \qquad w(n) = \begin{cases} 1, & K \cdot r < n \le K \cdot r + N, \quad r = 0, 1, 2, \ldots, M-1, \\ 0, & \text{otherwise}, \end{cases}$$

where $M$ is the number of frames, $K$ is the frame step, and $N = f_s \cdot t_{frame}$, with $f_s$ the sampling frequency and $t_{frame}$ the frame length measured in time.

TABLE I
VALUES OF FRAME LENGTH AND FRAME STEP DEPENDING ON THE SAMPLING FREQUENCY

  Sampling frequency (f_s)   Frame length (N)   Frame step (K)
  16 kHz                     400                160
  11 kHz                     256                110
  8 kHz                      200                80

We use the $f_s = 16$ kHz sampling frequency in our system.

4. Windowing. There are a number of window functions to choose from to minimize the signal discontinuities at the frame boundaries. One of the most commonly used for windowing a speech signal before Fourier transformation is the Hamming window:

$$s_w(n) = \Bigl[0.54 - 0.46 \cos\Bigl(\frac{2\pi(n-1)}{N-1}\Bigr)\Bigr] s_{frame}(n), \qquad 1 \le n \le N.$$

Calculating the MFCC features.

Fast Fourier transform (FFT). The spectrum of each windowed frame is calculated by applying the FFT:

$$bin_k = \sum_{n=1}^{N} s_w(n)\, e^{-i(n-1)k\frac{2\pi}{N}}, \qquad k = 0, 1, 2, \ldots, N-1.$$
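A minimal sketch of framing, windowing, and the FFT step for the 16 kHz parameters of Table I, assuming NumPy and a signal at least one frame long:

```python
import numpy as np

def frame_signal(sp: np.ndarray, N: int = 400, K: int = 160) -> np.ndarray:
    """Split into overlapping frames (16 kHz: N = 400, K = 160, Table I)."""
    M = 1 + (len(sp) - N) // K             # number of complete frames
    return np.stack([sp[r * K : r * K + N] for r in range(M)])

def windowed_spectra(frames: np.ndarray) -> np.ndarray:
    """Hamming-window each frame, then take the FFT magnitudes |bin_k|."""
    N = frames.shape[1]
    n = np.arange(1, N + 1)
    hamming = 0.54 - 0.46 * np.cos(2 * np.pi * (n - 1) / (N - 1))
    return np.abs(np.fft.fft(frames * hamming, axis=1))
```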
Mel filtering. The low-frequency components of the magnitude spectrum are ignored; the useful frequency band lies between 64 Hz and half of the actual sampling frequency. This band is divided into 23 channels, equidistant in the mel frequency domain. Each channel has a triangular-shaped frequency window, and consecutive channels are half-overlapping. The choice of the starting frequency of the filter bank, $f_{start} = 64$ Hz, roughly corresponds to the case where the full frequency band is divided into 24 channels and the first channel is discarded, for any of the three possible sampling frequencies. The mel scale and its inverse are

$$Mel(x) = 2595 \lg\Bigl(1 + \frac{x}{700}\Bigr), \qquad x = 700\Bigl(10^{\frac{mel}{2595}} - 1\Bigr),$$

and the centre frequencies of the channels are

$$f_{c_i} = Mel^{-1}\Bigl\{ Mel\{f_{start}\} + i\,\frac{Mel\{f_s/2\} - Mel\{f_{start}\}}{NF} \Bigr\}, \qquad i = 1, 2, 3, \ldots, NF - 1,$$

where $NF = 24$ is the number of channels of the filter bank. In terms of FFT bin indices, the centre of the $i$-th channel is

$$cbin_i = round\Bigl\{\frac{f_{c_i}}{f_s} N\Bigr\},$$

where $round(\cdot)$ stands for rounding towards the nearest integer, and

$$cbin_0 = round\Bigl\{\frac{f_{start}}{f_s} N\Bigr\}, \qquad cbin_{24} = round\Bigl\{\frac{f_s/2}{f_s} N\Bigr\} = \frac{N}{2}$$

denote the FFT bin indices corresponding to the starting frequency and half of the sampling frequency, respectively. The output of the mel filter is the weighted sum of the FFT magnitude spectrum values $bin_i$ in each band. Triangular, half-overlapped windowing is used as follows:

$$fbank_k = \sum_{i=cbin_{k-1}}^{cbin_k} \frac{i - cbin_{k-1} + 1}{cbin_k - cbin_{k-1} + 1}\, bin_i \;+\; \sum_{i=cbin_k+1}^{cbin_{k+1}} \Bigl(1 - \frac{i - cbin_k}{cbin_{k+1} - cbin_k + 1}\Bigr) bin_i, \qquad k = 1, 2, \ldots, NF - 1.$$

Non-linear transformation. The output of the mel filtering is subjected to a natural logarithm:

$$f_i = \ln(fbank_i), \qquad i = 1, 2, \ldots, NF - 1.$$

Cepstral coefficients. 12 cepstral coefficients are calculated from the output of the non-linear transformation block:

$$C_i = \sum_{j=1}^{NF-1} f_j \cos\Bigl(\frac{\pi\, i}{NF-1}\,(j - 0.5)\Bigr), \qquad i = 1, \ldots, 12.$$

Cepstral Mean Subtraction (CMS). A speech signal may be subjected to some channel noise when recorded, also referred to as the channel effect. A problem arises if the channel effect present when recording the training data for a given person differs from the channel effect in later recordings, when the person uses the system: the differing channel effects introduce a false distance between the training data and the newly recorded data. The channel effect is eliminated by subtracting from the mel-cepstrum coefficients their mean over the frames:

$$mc_j(q) = C_j(q) - \frac{1}{M}\sum_{i=1}^{M} C_i(q), \qquad q = 1, 2, \ldots, 12,$$

where $M$ is the number of frames. These 12 mean-subtracted cepstral coefficients enter the feature vector in the next step.
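A sketch of the mel filtering, logarithm, cepstrum, and CMS steps, following the formulas above literally. It assumes NumPy, magnitude spectra as produced by `windowed_spectra` in the earlier sketch, and the 16 kHz / $N = 400$ configuration; the small epsilon inside the logarithm is an implementation guard against empty bands, not part of the text.

```python
import numpy as np

F_START, NF = 64.0, 24  # starting frequency (Hz), number of channels

def mel(x):      return 2595.0 * np.log10(1.0 + x / 700.0)
def mel_inv(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_from_magnitude(bins: np.ndarray, fs: float = 16000.0) -> np.ndarray:
    """12 MFCCs of one frame from its FFT magnitudes |bin_k|."""
    N = len(bins)
    # Channel centre frequencies, equidistant on the mel scale.
    centres = mel_inv(mel(F_START)
                      + np.arange(1, NF) * (mel(fs / 2) - mel(F_START)) / NF)
    cbin = np.concatenate(([round(F_START / fs * N)],
                           np.round(centres / fs * N).astype(int),
                           [N // 2]))
    # Triangular, half-overlapping filter outputs fbank_k, k = 1..NF-1.
    fbank = np.empty(NF - 1)
    for k in range(1, NF):
        left = np.arange(cbin[k - 1], cbin[k] + 1)
        right = np.arange(cbin[k] + 1, cbin[k + 1] + 1)
        fbank[k - 1] = np.sum((left - cbin[k - 1] + 1)
                              / (cbin[k] - cbin[k - 1] + 1) * bins[left]) \
                     + np.sum((1 - (right - cbin[k])
                              / (cbin[k + 1] - cbin[k] + 1)) * bins[right])
    f = np.log(fbank + 1e-12)          # natural logarithm (epsilon guard)
    i = np.arange(1, 13)[:, None]      # cepstral index 1..12
    j = np.arange(1, NF)[None, :]      # channel index 1..NF-1
    return (f[None, :] * np.cos(np.pi * i * (j - 0.5) / (NF - 1))).sum(axis=1)

def cepstral_mean_subtraction(C: np.ndarray) -> np.ndarray:
    """CMS: C has shape (M frames, 12); subtract the per-coefficient mean."""
    return C - C.mean(axis=0, keepdims=True)
```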
Calculating the LPC features. The LPC coefficients of each frame are found by applying the Levinson-Durbin algorithm, and the following cepstrals are calculated from them:

$$c(k) = -a_p(k) - \sum_{i=1}^{k-1}\Bigl(1 - \frac{i}{k}\Bigr) a_p(i)\, c(k-i), \qquad k = 1, \ldots, 12.$$

We apply the cepstral mean subtraction to these 12 LPC cepstrals as well, and they enter the feature vector in the next step.
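A direct transcription of this recursion, assuming NumPy and that `a` holds $a_p(1), \ldots, a_p(p)$ (for example `levinson_durbin(r, p)[1:]` from the earlier sketch); taking $a_p(k) = 0$ for $k > p$ is the usual convention when the number of cepstral coefficients exceeds the LPC order:

```python
import numpy as np

def lpc_to_cepstrum(a: np.ndarray, n_cep: int = 12) -> np.ndarray:
    """c(k) = -a_p(k) - sum_{i=1}^{k-1} (1 - i/k) a_p(i) c(k-i)."""
    p = len(a)
    c = np.zeros(n_cep + 1)  # c[0] unused; indexing follows the formula
    for k in range(1, n_cep + 1):
        ak = a[k - 1] if k <= p else 0.0
        c[k] = -ak - sum((1.0 - i / k) * a[i - 1] * c[k - i]
                         for i in range(1, min(k, p + 1)))
    return c[1:]
```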
V. CONSTRUCTION OF THE NEURAL NETWORK

Various mathematical models form the basis of speech recognition systems; a widely used one is the Multilayer Artificial Neural Network (MANN). Let us briefly describe its structure. In general, a MANN is an incompletely connected graph. Let $L$ be the number of layers of the MANN; $N_l$ the number of neurons on layer $l$, $l = 1, \ldots, L$; $I_j^l$ the set of neurons of layer $l-1$ connected to neuron $j$ on layer $l$; $\theta_j^l$ the bias of neuron $j$ on layer $l$; $w_{ij}^l$ the weight (synapse) of the connection between neuron $i$ on layer $l-1$ and neuron $j$ on layer $l$; and $s_j^{l,p}$ and $y_j^{l,p}$ the state and output value of neuron $j$ on layer $l$ for an input signal $x_p \in X$ of the MANN.

Forward propagation of the MANN for an input signal $x_p \in X$ is described by the following expressions (Figures 1 and 2):

$$s_j^{l,p} = \sum_{i \in I_j^l} w_{ij}^l\, y_i^{l-1,p} + \theta_j^l, \qquad (7)$$

$$y_j^{l,p} = f\bigl(s_j^{l,p}\bigr), \qquad j = 1, \ldots, N_l, \quad l = 1, \ldots, L, \qquad (8)$$

$$y_j^{0,p} = x_{j,p}, \qquad j = 1, \ldots, N_0, \qquad (9)$$

where $f(\cdot)$ is a given nonlinear activation function. The logistic or hyperbolic tangent functions can be used as activation functions:

$$f_{log}(z) = \frac{1}{1 + e^{-\alpha z}}, \qquad f_{tan}(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}.$$

Their derivatives can be calculated from the function values:

$$\frac{df_{log}(z)}{dz} = \alpha\, f_{log}(z)\bigl(1 - f_{log}(z)\bigr), \qquad \frac{df_{tan}(z)}{dz} = 1 - f_{tan}^2(z).$$

[Fig. 1. A MANN with two layers]
[Fig. 2. Description of one neuron]

Let a training set of pairs $\{x_p, d_p\}$, $p = 1, \ldots, P$, be given, where $d_p = (d_{1,p}, \ldots, d_{N_L,p})$ is the desired output for the input signal $x_p$. Training the MANN consists in finding such $w_{ij}^l$ and $\theta_j^l$, $i \in I_j^l$, $j = 1, \ldots, N_l$, $l = 1, \ldots, L$, that for each input signal $x_p$ the MANN produces an output $y_p$ maximally close to the desired output $d_p$. Usually the training quality is defined by the mean square error function

$$E(w, \theta; x, s, y) = \frac{1}{P}\sum_{p=1}^{P} \eta_p\, E_p(w, \theta; x_p, s_p, y_p), \qquad E_p = \frac{1}{2}\sum_{j=1}^{N_L}\bigl(y_j^{L,p} - d_{j,p}\bigr)^2, \qquad (10)$$

where $\eta_p$ is a coefficient determining the "quality" of the input $x_p$, i.e. its closeness to its "ideal" pattern, $p = 1, \ldots, P$. The task of MANN training constitutes the minimization of criterion (10) with respect to the parameters $(w, \theta)$ under conditions (7)-(9). The MANN of the developed system was trained by the conjugate gradient method.
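A minimal sketch of the forward propagation (7)-(9) for the fully connected case, where $I_j^l$ is the whole previous layer. It assumes NumPy; the random initialization and the 420-50-10 shape (taken from Section VI below) are illustrative, not the trained weights of the system.

```python
import numpy as np

def forward(x: np.ndarray, weights, biases, alpha: float = 1.0) -> np.ndarray:
    """Forward pass: weights[l] has shape (N_{l-1}, N_l), biases[l] (N_l,)."""
    def f_log(z):                   # logistic activation function
        return 1.0 / (1.0 + np.exp(-alpha * z))
    y = x                           # (9): layer-0 output is the input
    for W, theta in zip(weights, biases):
        s = y @ W + theta           # (7): neuron states
        y = f_log(s)                # (8): neuron outputs
    return y

# Example with the 420-50-10 architecture described in the paper.
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (420, 50)), rng.normal(0, 0.1, (50, 10))]
biases = [np.zeros(50), np.zeros(10)]
out = forward(rng.normal(size=420), weights, biases)
```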
The other problem of speech recognition is the same speech has different time duration. Even when the same Intermediate reliability system. This system uses voting between neural networks trained from different initial points, and recognition system confirms the result of the voting. For example, if the number of neural networks trained by changing the initial points are 3, then the system accepts the same result confirmed by some two networks of them. In spite of the fact that the confidence of the system is lower than “strong reliability system”, the recognition percent is high. Weak reliability system. Our suggested method in this system is a sequential method. Let’s explain the main essence of the method. The recognition system is trained from different initial points in the trained process. First trained neural network is used initially, then second neural network is applied to the by the first network. Similarly the third neural network is applied to the unrecognized patterns by the second neural network and so on. This approach minimizes the number of unrecognized patterns. However, it has got weak reliability in terms of error rate. Note that the number of neural networks trained from the different initial points depends on the computer processing power. Apparently when the number of neural networks used in the system increases, the strong reliability system minimizes the error and the recognition reliability of the system increases. In weak reliability system despite of increasing the correct recognition percent, the error 40
VII. EXPERIMENTAL RESULTS

For training and recognition of differently scaled speech in the neural network, the utterances must be normalized to the same scale. The scale was taken as 5840 samples, which corresponds to 35 frames, and every frame yields 12 features. Our neural network consists of 2 layers: the number of input parameters is 420 (35 frames by 12 features), the number of neurons in the first layer is 50, and the number of neurons in the output layer equals the number of tested words (10). The test vocabulary consists of the Azerbaijani digits; for the training process, 140-150 patterns of every spoken digit were entered into the system. The neural networks of the developed system were trained by the conjugate gradient method.

The following tables show the results of the MFCC- and LPC-based subsystems separately, as well as the results of the speech recognition system that combines the MFCC- and LPC-based subsystems.

TABLE II
THE RESULTS OF THE MFCC AND LPC-BASED SPEECH RECOGNITION SUBSYSTEMS TRAINED FROM DIFFERENT INITIAL POINTS

  Training  Features  Testing patterns  Recognized     Errors      Unrecognized
  1         MFCC      312               291 (93.27%)   7 (2.24%)   14 (4.49%)
  1         LPC       312               288 (92.31%)   5 (1.6%)    19 (6.09%)
  2         MFCC      312               288 (92.31%)   9 (2.88%)   15 (4.81%)
  2         LPC       312               292 (93.59%)   7 (2.24%)   13 (4.17%)
  3         MFCC      312               289 (92.63%)   5 (1.6%)    18 (5.77%)
  3         LPC       312               293 (93.91%)   6 (1.92%)   13 (4.17%)

TABLE III
THE RESULTS OF THE STRONG RELIABILITY SYSTEM

  Features  Testing patterns  Recognized     Errors      Unrecognized
  MFCC      312               273 (87.5%)    1 (0.32%)   38 (12.18%)
  LPC       312               286 (91.61%)   2 (0.64%)   24 (7.69%)
  Combined  312               264 (84.6%)    1 (0.32%)   47 (15.1%)

TABLE IV
THE RESULTS OF THE INTERMEDIATE RELIABILITY SYSTEM

  Features  Testing patterns  Recognized     Errors      Unrecognized
  MFCC      312               299 (95.83%)   10 (3.2%)   3 (0.96%)
  LPC       312               296 (94.87%)   10 (3.2%)   6 (1.92%)
  Combined  312               294 (94.23%)   4 (1.28%)   14 (4.49%)

TABLE V
THE RESULTS OF THE WEAK RELIABILITY SYSTEM

  Features  Testing patterns  Recognized     Errors       Unrecognized
  MFCC      312               294 (94.23%)   15 (4.8%)    3 (0.96%)
  LPC       312               290 (92.95%)   17 (5.45%)   5 (1.6%)
  Combined  312               286 (91.67%)   24 (7.69%)   2 (0.64%)