International Journal of Electronics and Communication Engineering & Technology
(IJECET)
Volume 6, Issue 9, Sep 2015, pp. 82-96, Article ID: IJECET_06_09_010
Available online at
http://www.iaeme.com/IJECETissues.asp?JType=IJECET&VType=6&IType=9
ISSN Print: 0976-6464 and ISSN Online: 0976-6472
© IAEME Publication
COMPARATIVE STUDY OF LPCC AND FUSED MEL FEATURE SETS FOR SPEAKER IDENTIFICATION USING GMM-UBM
Anagha S. Bawaskar
Dept. of Electronics & Telecommunication,
M. E. S. College of Engineering, Pune, India
Prabhakar N. Kota
Dept. of Electronics & Telecommunication,
M. E. S. College of Engineering, Pune, India
ABSTRACT
Biometric identifiers are measurable characteristics used to label and describe individuals. They combine physiological and behavioral characteristics. Physiological characteristics relate to the shape of the body; examples include, but are not limited to, fingerprints, palm prints, hand geometry, iris patterns, and the retina. Behavioral characteristics relate to a person's patterns of behavior, including but not limited to typing rhythm and voice. Biometric technology, which analyzes such human body characteristics, is now well established, and speech is one of the most active biometric modalities. Speech-related tasks include language recognition, speech recognition, and speaker recognition. Speaker recognition, broadly, is the task of identifying the correct speaker from a group of people; it is further classified into speaker identification and speaker verification. This paper concentrates on speaker identification: the aim is to identify the correct speaker from given speech samples, from which features are extracted and used for modeling. The standard TIMIT database is used for identification. The paper considers several feature extraction algorithms: Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC), and Linear Predictive Cepstral Coefficients (LPCC). The term Fusion refers to the combination of the two algorithms MFCC and IMFCC. The comparison is made between the results of Fusion and LPCC; the results show that, on average, Fusion is better than LPCC.
Index Terms: Gaussian Mixture Models (GMM), Inverted Mel Frequency
Cepstral Coefficients (IMFCC), Linear Predictive Cepstral Coefficients
(LPCC), Mel Frequency Cepstral Coefficients (MFCC), Universal
Background Model (UBM)
Cite this Article: Anagha S. Bawaskar and Prabhakar N. Kota. Comparative
Study of LPCC and Fused Mel Feature Sets For Speaker Identification Using
GMM-UBM, International Journal of Electronics and Communication
Engineering & Technology, 6(9), 2015, pp. 82-96.
http://www.iaeme.com/IJECET/issues.asp?JType=IJECET&VType=6&IType=9
1. INTRODUCTION
In recent decades, interest in security systems has risen steadily, and many such systems rely on biometric schemes. Biometrics refers to technologies that measure and analyze human body characteristics. Many biometric methods are used for authentication, including face recognition, retina and iris recognition, fingerprints, DNA, and hand measurements; speech signal processing is another well-known addition to this list. Speech is one of the most natural forms of communication, and speech recognition has applications ranging from voice input on ordinary personal computers to biometric and forensic systems; much of the recent development has been driven by security applications. There are two main techniques in speech processing, speaker recognition and speech recognition; this paper focuses on speaker recognition. Speaker recognition is further divided into speaker identification and speaker verification.
In speaker identification, an unknown speaker is identified among the enrolled speakers, whereas in speaker verification a claimed identity is accepted or rejected. Speaker identification is therefore a 1:N comparison, while speaker verification is a 1:1 comparison. In this paper, a text-independent speaker identification system is used. In speaker identification, speaker-specific characteristics are first extracted from a given voice sample; this step is known as feature extraction. The speaker model is then trained and stored in the system database. Feature extraction yields feature vectors that carry speaker-specific information derived from one or more of the following: the vocal tract, the excitation source, and behavioral traits. All speaker recognition systems use a set of scores to improve the probability and reliability of the recognizer. Before feature extraction, the signal passes through a pre-processing stage. Pre-processing plays an important role in speaker identification: it reduces the amount of variation in the data that carries no useful speech information and removes irrelevant content, and is considered good practice.
The algorithms used here for feature extraction are Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC), and Linear Predictive Cepstral Coefficients (LPCC); feature extraction is performed with all three. Researchers have found that speaker-specific information complementary to MFCC can be captured by the Inverse Mel Frequency Cepstral Coefficients (IMFCC). This complementary information is combined with the MFCC at the score level, and the resulting combination is named the Fused Mel Feature Set; the models involved are simply mathematical representations of the underlying system [1]. An inverted filter bank is used to capture this complementary information from the high-frequency part of the energy spectrum, so IMFCC captures information that MFCC neglects. The respective features are modeled using a Gaussian Mixture Model with a Universal Background Model (GMM-UBM). All algorithms used in this paper are based on Gaussian filters, and the results are verified on the standard TIMIT database.

The final result is a comparison between the LPCC results and the Fused Mel Feature Set results. The next sections of this paper describe the Fused Mel Feature Set using Gaussian filters and the Linear Predictive Cepstral Coefficients using Gaussian filters, followed by comparative results for both.
2. FEATURE EXTRACTION AND FILTER DESIGN
The goal of feature extraction is to represent a speech signal by a finite number of measures. Features are simply a representation of the spectrum of the speech signal in each window frame. The cepstral vectors are derived from a filter bank designed according to some model of the auditory system [2]. Most feature extraction methods use a standard triangular filter bank: the triangular filters filter the spectrum of the speech signal in a way that simulates the characteristics of the human ear. This approach has disadvantages, however: the triangular filters give a sharp, crisp partition of the energy spectrum, and some information is lost as a result. In this paper, Gaussian filters are used instead. Replacing the triangular filters with Gaussian filters avoids the crisp, sharp transitions in the energy spectrum and results in a smoother adaptation from one sub-band to the next. Because of this smoother shape, a degree of correlation is always maintained between neighbouring filters, both at the midpoints of the triangular filters' bases and at their end points. Mathematical calculations with Gaussian filters are also simple. Because of these advantages over triangular filters, Gaussian filters are used here. The motivation for using Mel-Frequency Cepstral Coefficients is that the auditory response of the human ear resolves frequencies nonlinearly; the mapping from linear frequency to the Mel frequency scale is defined as [3]
$f_{mel} = 2595 \log_{10}\!\left(1 + \dfrac{f}{700}\right)$    (1)

where $f_{mel}$ is the subjective pitch in mels corresponding to the actual frequency f measured in Hz.
MFCCs are among the most popular parameterization methods used by researchers in the speech technology field. They have the benefit of capturing the phonetically important characteristics of speech, and since they are band-limited in nature they can easily be adapted to applications such as telephone speech.

Feature extraction using MFCC generally employs a triangular filter bank. Triangular filters taper asymmetrically and do not provide correlation between sub-bands and the nearby spectral components, so some information is lost. Using Gaussian filters avoids these drawbacks: Gaussian filters taper towards both ends and provide correlation between sub-bands and their neighbouring spectral components [4].
IMFCC is another feature extraction technique; it captures the complementary information present in the high-frequency part of the spectrum. Figure 1 shows the steps involved in extracting both Gaussian MFCC and IMFCC features. Let the preprocessed frame of the input speech signal be y(n), n = 1, ..., M. First, y(n) is converted to the frequency domain by a DFT, which yields the energy spectrum; this is followed by the Gaussian filter bank block.
Figure 1 Steps involved in extraction of Gaussian MFCC and IMFCC [5]: the preprocessed speech signal passes through an |FFT|² stage, then through the Gaussian MFCC filter bank and the Gaussian IMFCC filter bank, and each branch is followed by a log10 stage and a DCT.

Mathematically, the equation for the Gaussian filter is written as

$\psi_i^{g_{MFCC}}(k) = \exp\!\left\{-\dfrac{(k - k_{b_i})^2}{2\sigma_i^2}\right\}$    (2)

where k is the coefficient index in the N-point DFT, $k_{b_i}$ is the boundary point of the i-th triangular filter at its base, taken as the mean of the i-th Gaussian filter, and $\sigma_i$ is the standard deviation (square root of the variance), which can be written as

$\sigma_i = \alpha\,\bigl(k_{b_{i+1}} - k_{b_i}\bigr)$    (3)

where α is the parameter by which the variance is controlled.

Figure 2 Filterbank design [5]
Figure 2 shows two plots in a single figure: one for the triangular filter and the other for the Gaussian filter, drawn for a single value of σ; plots for different values of σ can be drawn in the same way. Figures 3 and 4 show the individual responses of the Gaussian filter banks for MFCC and IMFCC.
Figure 3 Mel scale Gaussian filter bank (filter weight versus frequency in Hz) [6]
Figure 4 Inverted Mel scale Gaussian filter bank (filter weight versus frequency in Hz) [6]
Mathematically, the boundary points of the Gaussian MFCC filter bank can be written as

$k_{b_i} = \dfrac{M_s}{F_s}\, f_{mel}^{-1}\!\left(f_{mel}(f_{low}) + i\,\dfrac{f_{mel}(f_{high}) - f_{mel}(f_{low})}{Q+1}\right)$    (4)

where $M_s$ is the number of points in the DFT, $F_s$ is the sampling frequency, $f_{low}$ and $f_{high}$ are the low- and high-frequency boundaries of the filter bank, Q is the number of filters in the bank, and $f_{mel}^{-1}$ is the inverse of the mel transformation:

$f_{mel}^{-1}(f_{mel}) = 700\left(10^{f_{mel}/2595} - 1\right)$    (5)
The inverted Mel scale filter bank structure can be obtained by simply flipping the original filter bank around the midpoint of the frequency range being considered:

$\hat{\psi}_i(k) = \psi'_{Q+1-i}\!\left(\dfrac{M_s}{2} + 1 - k\right)$    (6)

where $\psi'_i(k)$ is the original MFCC filter bank response.
These filter banks are applied to the energy spectrum obtained by taking the Fast Fourier Transform of the preprocessed signal:

$e_i^{g_{MFCC}} = \sum_{k=1}^{M_s/2} |Y(k)|^2 \, \psi_i^{g}(k)$    (7)

where $\psi_i(k)$ is the respective filter response and $|Y(k)|^2$ is the energy spectrum.

Figure 5 Response $\psi_i(k)$ (amplitude versus DFT coefficient index) of a typical Mel scale filter, with base points $k_{b_{i-1}}$, $k_{b_i}$, $k_{b_{i+1}}$ [5]
Finally, the DCT is taken on the log filter bank energies $\{\log(e_i^{g_{MFCC}})\}_{i=1}^{Q}$, and the final MFCC coefficients can be written as

$C_m^{g_{MFCC}} = \sqrt{\dfrac{2}{Q}}\, \sum_{l=0}^{Q-1} \log\!\left(e_{l+1}^{g_{MFCC}}\right) \cos\!\left[\dfrac{m\,(2l+1)\,\pi}{2Q}\right]$    (8)

where $0 \le m \le R-1$ and R is the desired number of cepstral features. The same procedure is followed for extracting the IMFCC features [4], which are denoted

$C_m^{g_{IMFCC}} = \sqrt{\dfrac{2}{Q}}\, \sum_{l=0}^{Q-1} \log\!\left(e_{l+1}^{g_{IMFCC}}\right) \cos\!\left[\dfrac{m\,(2l+1)\,\pi}{2Q}\right]$    (9)
3. LINEAR PREDICTIVE CEPSTRAL COEFFICIENTS (LPCC)
The predictor coefficients are determined by minimizing the squared differences between the actual speech samples and the linearly predicted values; this yields a unique set of parameters. In practice, the actual predictor coefficients are never used directly because of their high variance; they are transformed into a more robust set of parameters known as cepstral coefficients. The overall procedure for extracting LPCC parallels that of MFCC and IMFCC, and a Gaussian filter bank is again used.
Figure 6 Block diagram of the LPCC algorithm: speech sequence, pre-emphasis and Hamming window, linear predictive analysis, cepstral analysis, LPCC [7]
• Pre-emphasis and Hamming Window
The input signal is first given to the pre-emphasis block. The idea of pre-emphasis is to spectrally flatten the speech signal and equalize its inherent spectral tilt [8]. Pre-emphasis is implemented by a first-order FIR digital filter whose transfer function is

$H_p(z) = 1 - \alpha z^{-1}$    (10)

where α is a constant with a typical value of 0.97.
After pre-emphasis, the speech signal is subdivided into frames. This process is
the same as multiplying the entire speech sequence by a windowing function,
$s_m[n] = s[n]\, w[n - m]$    (11)

where s[n] is the entire speech sequence, $s_m[n]$ is a windowed speech frame at time m, and w[n] is the windowing function.
The typical length of a frame is about 20-30 milliseconds. In the above equation,
m is the time shift or the step size of the windowing function. A new frame is obtained
by shifting the windowing function to a subsequent time. The amount of shifting is
typically 10 milliseconds. The shape of the windowing function is important.
Rectangular window is not recommended since it causes severe spectral distortion
(leakage) to the speech frames [9]. Other types of windowing function, which
minimize the spectral distortion, should be used. One of the most commonly used windows is the Hamming window:

$w[n] = 0.54 - 0.46\cos\!\left(\dfrac{2\pi n}{N-1}\right)$    (12)
In the above equation, N is the length of the windowing function. After Hamming
windowing, the speech frame is passed to the next stage for further processing.
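A minimal sketch of the pre-emphasis, framing, and Hamming-windowing steps of Equations (10) to (12) might look as follows; the 25 ms frame length and 10 ms shift sit within the typical values quoted above, and the remaining parameter names are illustrative.

```python
import numpy as np

def preprocess(signal, fs=16000, alpha=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasis (Eq. 10), framing, and Hamming windowing (Eqs. 11-12)."""
    # Pre-emphasis: y[n] = s[n] - alpha * s[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(fs * frame_ms / 1000)     # e.g. 400 samples at 16 kHz
    frame_shift = int(fs * shift_ms / 1000)   # e.g. 160 samples at 16 kHz
    window = np.hamming(frame_len)            # 0.54 - 0.46 cos(2*pi*n/(N-1))
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, frame_shift):
        frames.append(emphasized[start:start + frame_len] * window)
    return np.array(frames)
```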
• Linear Predictive Analysis
In human speech production, the shape of the vocal tract governs the nature of the sound being produced. The basic speech production model assumes that the vocal tract can be modeled by an all-pole filter; the LPC coefficients are simply the coefficients of this all-pole filter and correspond to a smooth envelope of the log spectrum of speech. The main idea behind LPC is that a given speech sample can be approximated as a linear combination of past speech samples: LPC models the signal s(n) as a linear combination of its past values and the present input (the vocal-cord excitation). If the signal is represented only in terms of the linear combination of past values, the difference between the real and predicted output is called the prediction error, and LPC finds the coefficients that minimize it.

The cepstrum is the inverse transform of the logarithm of the magnitude of the spectrum. It is useful for separating convolved signals, such as the source and the filter in the speech production model: the log operation separates the vocal tract transfer function, which shows slow spectral variations, from the voice source, which shows fast spectral variations. Cepstral coefficients generally provide more efficient and robust coding of speech information than the LPC coefficients themselves.
Figure 7 LPCC [10]
The predictor coefficients are rarely used directly as features; they are transformed into the more robust Linear Predictive Cepstral Coefficient (LPCC) features. The LPC are obtained using the Levinson-Durbin recursive algorithm, a procedure known as LP analysis. The difference between the actual and the predicted sample value is termed the prediction error or residual [11] and is given by

$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k)$    (13)

$e(n) = \sum_{k=0}^{p} a_k\, s(n-k), \qquad a_0 = 1$    (14)
The optimal predictor coefficients minimize the mean squared prediction error E. At the minimum of E,

$\dfrac{\partial E}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p$    (15)

Differentiating and equating to zero gives the normal equations

$\mathbf{R}\,\mathbf{a} = \mathbf{r}$    (16)

where

$\mathbf{a} = [a_1 \; a_2 \; \ldots \; a_p]^{T}$    (17)

$\mathbf{r} = [r(1) \; r(2) \; \ldots \; r(p)]^{T}$    (18)

and R is the Toeplitz symmetric autocorrelation matrix

$\mathbf{R} = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}$
Equation (16) can be solved for the predictor coefficients using the Levinson-Durbin recursive algorithm as follows:

$E^{(0)} = r[0]$    (19)

$k_i = \dfrac{r[i] - \sum_{j=1}^{i-1} a_j^{(i-1)}\, r[\,i-j\,]}{E^{(i-1)}}, \qquad 1 \le i \le p$    (20)

$a_i^{(i)} = k_i$    (21)

$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$    (22)

$E^{(i)} = \bigl(1 - k_i^{2}\bigr)\, E^{(i-1)}$    (23)

The above set of equations is solved recursively for i = 1, 2, ..., p, and the final solution is given by

$a_m = a_m^{(p)}, \qquad 1 \le m \le p$    (24)

where the $a_m$ are the linear predictive coefficients (LPC).
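The Levinson-Durbin recursion of Equations (19) to (24), applied to the autocorrelation of a windowed frame, could be sketched as follows; the predictor order of 12 is an assumed value, and silent (all-zero) frames are not handled.

```python
import numpy as np

def lpc(frame, order=12):
    """Levinson-Durbin recursion for the LPC coefficients, Eqs. (19)-(24).

    Returns (a, E): a[1..order] are the predictor coefficients of Eq. (13)
    and E is the final prediction-error energy (gain term).
    """
    # Autocorrelation r[0..order] of the windowed frame.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] is unused; a[1..order] filled below
    E = r[0]                         # Eq. (19)
    for i in range(1, order + 1):
        k_i = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E   # Eq. (20)
        a_prev = a.copy()
        a[i] = k_i                                         # Eq. (21)
        a[1:i] = a_prev[1:i] - k_i * a_prev[i - 1:0:-1]    # Eq. (22)
        E = (1.0 - k_i ** 2) * E                           # Eq. (23)
    return a, E                                            # Eq. (24): a holds a_m^{(p)}
```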
• Cepstral Analysis
In practice, the actual predictor coefficients are never used directly for recognition, since they typically show high variance; they are transformed into a more robust set of parameters known as cepstral coefficients.

Before defining the cepstral coefficients, consider the definition of the cepstrum. The cepstrum is the result of taking the Fourier transform of the logarithm of the estimated spectrum of a signal. There are three types of cepstrum: the power cepstrum, the complex cepstrum, and the real cepstrum; of these, the power cepstrum in particular finds application in the analysis of human speech. The name cepstrum was derived from the word spectrum by reversing its first four letters.

The input speech signal passes through preprocessing, then feature extraction, and finally modeling. Preprocessing reduces the complexity of operating on the speech signal by reducing the number of samples to be processed: since it is very difficult to work on a huge set of samples, the operations are restricted to frames of sufficiently reduced length. After this signal conditioning, the speech signal passes through the feature extraction stage, where the coefficients are calculated using the DCT.
Mathematically,

$\mathrm{Ceps} = \mathrm{dct}\bigl(\log\bigl(\mathrm{abs}\bigl(\mathrm{FFT}(y_{window})\bigr)\bigr)\bigr)$    (25)
The principal advantage of cepstral coefficients is that they are generally decorrelated, which allows diagonal covariances to be used. One minor problem, however, is that the higher-order cepstral coefficients are numerically quite small, which results in a very wide range of variances from the low- to the high-order coefficients.
Cepstral coefficient can be used to separate the excitation signal (which contains
the words and the pitch) and the transfer function (which contains the voice quality).
The cepstrum can be seen as information about rate of change in the different
spectrum bands. The recursive relation between the predictor coefficients and the cepstral coefficients is used to convert the LP coefficients (LPC) into LP cepstral coefficients $c_k$:

$c_0 = \ln \sigma^2$    (26)

$c_m = a_m + \sum_{k=1}^{m-1} \dfrac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p$    (27)

$c_m = \sum_{k=m-p}^{m-1} \dfrac{k}{m}\, c_k\, a_{m-k}, \qquad m > p$    (28)

where $\sigma^2$ is the gain term in the LP analysis and d is the desired number of LP cepstral coefficients.
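A minimal sketch of the LPC-to-LPCC conversion of Equations (26) to (28) is given below; discarding c0 from the returned feature vector and the argument names are our assumptions.

```python
import numpy as np

def lpc_to_lpcc(a, gain, n_ceps):
    """Convert LP coefficients a[1..p] to LP cepstral coefficients, Eqs. (26)-(28).

    `gain` is the prediction-error energy sigma^2 from the LP analysis and
    n_ceps is the desired number of cepstral coefficients.
    """
    p = len(a) - 1                        # a[0] is the unused leading slot
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain)                   # Eq. (26)
    for m in range(1, n_ceps + 1):
        acc = sum((k / m) * c[k] * a[m - k] for k in range(max(1, m - p), m))
        c[m] = (a[m] if m <= p else 0.0) + acc   # Eq. (27) for m <= p, Eq. (28) for m > p
    return c[1:]                          # keep c_1 .. c_{n_ceps}, drop c_0
```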
4. GAUSSIAN MIXTURE MODEL (GMM) AND UNIVERSAL BACKGROUND MODEL
The text-independent speaker recognition system used in this paper adopts the GMM-UBM approach for modeling. Two models are developed: a target speaker model and an impostor model (the UBM). The approach has the generalization ability to handle unseen acoustic patterns [12].

In biometric systems, the GMM is commonly used as a parametric model of the probability distribution of continuous measurements or features; in a speaker identification system these are generally vocal tract features. GMMs are widely used for text-independent speaker identification because there is no prior knowledge of what the speaker will say, so modeling is generally done with GMMs. A Gaussian mixture model is a weighted sum of M component Gaussian densities, as given by [13]

$p(\mathbf{x}\mid\lambda) = \sum_{i=1}^{M} w_i\, g(\mathbf{x}\mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$    (29)

where x is a D-dimensional continuous-valued data vector (a measurement or feature vector), $w_i$, i = 1, ..., M, are the mixture weights, and $g(\mathbf{x}\mid\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i)$, i = 1, ..., M, are the component Gaussian densities. Each component density is a D-variate Gaussian function of the form

$g(\mathbf{x}\mid\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i) = \dfrac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)'\,\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right\}$    (30)

with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$; the mixture weights satisfy the constraint $\sum_{i=1}^{M} w_i = 1$. The complete Gaussian mixture model is parameterized by the mean vectors, covariance matrices, and mixture weights of all component densities, collectively represented by the notation

$\lambda = \{w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}, \qquad i = 1, \ldots, M$    (31)
For a sequence of T training vectors $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}$, the GMM likelihood, assuming independence between the vectors, can be written as

$p(X\mid\lambda) = \prod_{t=1}^{T} p(\mathbf{x}_t\mid\lambda)$    (32)

For an utterance with T frames, the log-likelihood of a speaker model $\lambda_s$ is

$L_s(X) = \log p(X\mid\lambda_s) = \sum_{t=1}^{T} \log p(\mathbf{x}_t\mid\lambda_s)$    (33)

For speaker identification, the value of $L_s(X)$ is computed for all speaker models $\lambda_s$ enrolled in the system, and the owner of the model that generates the highest value is returned as the identified speaker. During the training phase, the feature vectors are used to train the models with the Expectation-Maximization (EM) algorithm, which iteratively updates each of the parameters in $\lambda$ with an increase in the log-likelihood at each step.
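As an illustration of the enrolment and identification steps built on Equations (29) to (33), the sketch below trains one GMM per speaker with the EM algorithm and picks the model with the highest log-likelihood. The use of scikit-learn's GaussianMixture, diagonal covariances, and 16 mixtures are our assumptions; the Bayesian adaptation from a UBM described in this section is not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def enroll(speaker_features, n_mixtures=16):
    """Train one GMM per speaker; speaker_features maps speaker id -> (T, D) feature matrix."""
    models = {}
    for spk, feats in speaker_features.items():
        gmm = GaussianMixture(n_components=n_mixtures, covariance_type='diag', max_iter=200)
        models[spk] = gmm.fit(feats)      # EM training of lambda = {w_i, mu_i, Sigma_i}
    return models

def identify(models, test_features):
    """Return the speaker whose model gives the highest log-likelihood L_s(X), Eq. (33)."""
    scores = {spk: gmm.score_samples(test_features).sum() for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```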
GMMs are generally used for text-independent speaker identification, and the GMM-UBM approach overcomes drawbacks of earlier systems: it reduces the modeling cost compared with building independent GMMs, it requires no large vocabulary or phoneme database, and the GMM is more advantageous than the HMM for this task.

The basic idea of the UBM is to capture the general characteristics of a speaker population and then adapt them to individual speakers. Put more precisely, the UBM is a model, used in many application areas including biometric systems, against which a person's speaker-independent feature characteristics are compared with the person-specific feature model when deciding acceptance or rejection. The UBM can also be regarded simply as a GMM trained on a large set of speakers.
The UBM is trained with the EM algorithm on its own training data. For the speaker recognition process it fulfils two main roles: it is the a priori model for all target speakers when applying Bayesian adaptation to derive the speaker models, and it allows the log-likelihood ratio to be computed much faster by selecting, for each frame, only the best Gaussians for which the likelihood is relevant. This work uses the UBM as a guide to discriminative training of speakers [14].
5. COMPARATIVE RESULTS OF FUSED MEL FEATURE SETS AND LINEAR PREDICTIVE CEPSTRAL COEFFICIENTS
The main method of interest in this paper is the fusion of the two algorithms MFCC and IMFCC. The aim is to compare the fused results with those obtained from LPCC, that is, the identification accuracy obtained from the fused Mel feature set is compared with that of the Linear Predictive Cepstral Coefficients; the better of the two gives the more accurately identified speaker over the database used. A combined system performs better when the two (or more) classifiers are supplied with information that is complementary in nature, so the MFCC and IMFCC features, which are complementary to each other, can be fused to improve identification accuracy. Many combination rules are possible, such as product, sum, minimum, maximum, median, and average; the sum rule outperforms the other combinations and is the most resilient to estimation errors.
Figures 8 and 9 show the block diagrams of the fused Mel feature set system and of the LPCC system, both using the GMM-UBM modeling technique.
From Figures 8 and 9, the system includes training and testing both for the fused Mel feature set and for the LPCC feature set. The implementation is done on the TIMIT database. The TIMIT corpus is one of the standard databases used by many researchers for speaker identification, and this paper also concentrates on it; the subset used here comprises 16 speakers.
Figure 8 Steps involved in speaker identification system (fused Mel features sets) [5]
Figure 9 Speaker identification system (LPCC) [6]
The recordings cover 8 dialect regions. Each speaker has 10 utterances, giving 160 recorded sentences in total (10 recordings per speaker). The audio is in .wav format: single channel, 16 kHz sampling, 16-bit samples, PCM encoding.
The features are extracted using the Gaussian Mel scale filter bank, and the feature vectors are trained using the Expectation-Maximization algorithm. As the diagrams show, a separate model is created for each speaker [5]. In the testing step, features are extracted from the incoming test signal and the likelihood of these features given each speaker model is determined; this is done for MFCC and IMFCC as well as for LPCC. Two separate block diagrams are drawn for the fused Mel feature sets and for LPCC. In the first, a uniform weighted sum rule is adopted to fuse the scores from the two classifiers:

$S^{i}_{com} = w\, S^{i}_{MFCC} + (1 - w)\, S^{i}_{IMFCC}$    (34)
where $S^{i}_{com}$ is the combined score of MFCC and IMFCC, $S^{i}_{MFCC}$ and $S^{i}_{IMFCC}$ are the scores generated by the MFCC model and the IMFCC model, and w is the fusion coefficient. Along similar lines, the scores for LPCC are calculated and denoted $S^{i}_{LPCC}$. The accuracies for Fusion and LPCC are calculated and compared, and the weights and the number of mixtures can be varied to test the system for the optimum result.
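The weighted sum rule of Equation (34) reduces to a one-line combination of the per-speaker scores; a minimal sketch, with an illustrative default weight, is:

```python
def fuse_scores(s_mfcc, s_imfcc, w=0.5):
    """Uniform weighted sum rule of Eq. (34); w = 0.5 is only an illustrative fusion weight."""
    return w * s_mfcc + (1.0 - w) * s_imfcc
```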
Table I shows the performance of the proposed system for different weights and numbers of mixtures. As stated, the standard TIMIT subset of 16 speakers is divided for training and testing: the UBM is built from 5 speakers and the GMMs from the remaining 11 speakers. The background model is generated by the UBM, and the filter constant α is kept at 0.97. Accuracy is calculated on the basis of false positives and false negatives: in a false positive, a false speaker is accepted as the true one, while in a false negative a true speaker is rejected as an impostor. The formula for the accuracy calculation is

Accuracy (%) = 100 − (FP + FN) × 100 / (M × N)

where M × N is the size of the confusion matrix.
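One possible reading of this accuracy formula, assuming FP + FN is the total count of off-diagonal entries of the confusion matrix, is sketched below; the function name and this interpretation are ours.

```python
import numpy as np

def accuracy_percent(confusion):
    """Accuracy from a confusion matrix using the paper's formula.

    Off-diagonal entries are counted together as FP + FN (an assumption),
    and M * N is the total number of cells in the matrix.
    """
    confusion = np.asarray(confusion)
    errors = confusion.sum() - np.trace(confusion)     # FP + FN
    return 100.0 - errors * 100.0 / confusion.size     # M * N = confusion.size
```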
Table I. Comparative results for different numbers of mixtures and weights for the proposed system

No. of    Score threshold=0.6    Score threshold=0.77   Score threshold=0.8    Score threshold=0.97
mixtures  Fusion (%)  LPCC (%)   Fusion (%)  LPCC (%)   Fusion (%)  LPCC (%)   Fusion (%)  LPCC (%)
4         92.56       84.29      92.56       85.95      92.56       85.95      91.73       86.77
8         94.21       86.77      92.56       86.77      92.56       87.60      92.56       87.60
16        92.56       71.07      95.04       73.55      95.04       74.38      92.56       76.85
Figure 10 Graphical representation of Table I
The table shows the accuracy percentages obtained for the different numbers of mixtures and the different score thresholds. As the threshold increases, the accuracy tends to increase accordingly, but in all cases the accuracy for Fusion is better than for LPCC; the performance of the fused system exceeds that of LPCC. The maximum performance is 95.04%, demonstrating good identification with limited errors.
6. CONCLUSION
Many methods have been used for feature extraction, including MFCC and IMFCC. Individually, these two algorithms work well and give good accuracy; since IMFCC helps MFCC improve its accuracy further, the two are combined into what is called the fused Mel feature set. The Gaussian Mixture Model is then evaluated for speaker identification, and the performance is increased by fusing the complementary information. As shown in Table I, the accuracy calculated for Fusion reaches 95.04% at score thresholds 0.77 and 0.8, which is better than the 73.55% and 74.38% obtained by LPCC at the same thresholds. Further enhancement may be possible by changing the modeling technique and by trying various combinations of weights.

Future work may include applying the same database approach to develop a real-time application, and developing the system using an artificial neural network based approach.
REFERENCES
[1] J. Kittler, M. Hatef, R. Duin and J. Matas, On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), pp. 226-239, March 1998.
[2] Rana, Mukesh, and Saloni Miglani, Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition, International Journal of Engineering and Computer Science, 3(8), pp. 7727-7732, August 2014.
[3] Sridharan, Sridha and Wong, Eddie, Comparison of Linear Prediction Cepstrum Coefficients and Mel-Frequency Cepstrum Coefficients for Language Identification, Proceedings of the International Symposium on Intelligent Multimedia, Video and Speech Processing, pp. 95-98, 2-4 May 2001.
[4] Chakroborty, Sandipan, and Goutam Saha, Improved Text-Independent Speaker Identification using Fused MFCC & IMFCC Feature Sets based on Gaussian Filter, International Journal of Signal Processing, 5(1), pp. 11-19, 2009.
[5] R. Shantha Selva Kumari, S. Selva Nidhyananthan and Anand, Fused Mel Feature Sets Based Text-Independent Speaker Identification using Gaussian Mixture Model, International Conference on Communication Technology and System Design, Procedia Engineering, 30, pp. 319-326, 2012.
[6] Anagha S. Bawaskar and Prabhakar N. Kota, Speaker Identification Based on MFCC and IMFCC Using GMM-UBM, International Organization of Scientific Research (IOSR Journals), 5(2), pp. 53-60, March-April 2015.
[7] Cheng, Octavian, Waleed Abdulla, and Zoran Salcic, Performance Evaluation of Front-End Processing for Speech Recognition Systems, School of Engineering Report, The University of Auckland, Electrical and Computer Engineering, 2005.
[8] Rabiner, L. and Juang, B., Fundamentals of Speech Recognition, Prentice Hall, Inc., Upper Saddle River, New Jersey, 22 April 1993.
[9] Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals, Prentice Hall, 1978.
[10] Pallavi P. Ingale and S. L. Nalbalwar, Novel Approach to Text Independent Speaker Identification, International Journal of Electronics and Communication Engineering & Technology, 3(2), 2012, pp. 87-93.
[11] Chang, Wen-Wen, Time-Frequency Analysis for Voiceprint (Speaker) Recognition, Time Frequency Analysis and Wavelet Transform Tutorial, National Taiwan University.
[12] Pazhanirajan, S., and P. Dhanalakshmi, EEG Signal Classification using Linear Predictive Cepstral Coefficient Features, International Journal of Computer Applications, 73(1), 2013.
[13] Chao, Yi-Hsiang, Tsai, W.-H. and Hsin-Min Wang, Discriminative Feedback Adaptation for GMM-UBM Speaker Verification, 6th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-4, 16-19 Dec. 2008.
[14] Manan Vyas, A Gaussian Mixture Model Based Speech Recognition System Using Matlab, Signal & Image Processing: An International Journal (SIPIJ), 4(4), August 2013.
[15] Amr Rashed, Fast Algorithm for Noisy Speaker Recognition Using ANN, International Journal of Computer Engineering & Technology, 5(2), 2014, pp. 12-18.
[16] Viplav Gautam, Saurabh Sharma, Swapnil Gautam and Gaurav Sharma, Identification and Verification of Speaker Using Mel Frequency Cepstral Coefficient, International Journal of Electronics and Communication Engineering & Technology, 3(2), 2012, pp. 413-423.
[17] Scheffer, N. and Bonastre, J.-F., UBM-GMM Driven Discriminative Approach for Speaker Verification, Speaker and Language Recognition Workshop (IEEE Odyssey), pp. 1-7, 28-30 June 2006.
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...IAEME Publication
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...IAEME Publication
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTIAEME Publication
 

More from IAEME Publication (20)

IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdfIAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME_Publication_Call_for_Paper_September_2022.pdf
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURSA STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURSBROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONSDETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONSANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINOVOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMYVISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
 
GANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICEGANDHI ON NON-VIOLENT POLICE
GANDHI ON NON-VIOLENT POLICE
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENTA MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
 

Recently uploaded

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 

Recently uploaded (20)

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 

Ijecet 06 09_010

  • 1. http://www.iaeme.com/IJECET/index.asp 82 editor@iaeme.com International Journal of Electronics and Communication Engineering & Technology (IJECET) Volume 6, Issue 9, Sep 2015, pp. 82-96, Article ID: IJECET_06_09_010 Available online at http://www.iaeme.com/IJECETissues.asp?JType=IJECET&VType=6&IType=9 ISSN Print: 0976-6464 and ISSN Online: 0976-6472 © IAEME Publication COMPARATIVE STUDY OF LPCC AND FUSED MEL FEATURE SETS FOR SPEAKER IDENTIFICATION USING GMM- UBM Anagha S. Bawaskar Dept. of Electronics & Telecommunication, M. E. S. College of Engineering, Pune, India Prabhakar N. Kota Dept. of Electronics & Telecommunication, M. E. S. College of Engineering, Pune, India ABSTRACT Biometrics identifiers are typically measurable characteristics used to label and describe the individual respectively. Biometric identifiers are the combination of both physiological and behavioral characteristics. The physiological characteristics include the characteristics related to the shape of the body. There are various examples for physiological characteristics but not limited. Examples include fingerprint, palm, hand geometry, iris recognition, and retina. Behavioral characteristics are related to the pattern behavior of a person including but not limited to typing rhythm and voice. Biometric system technology is now a day’s a well-furnished technology, it analyzes human body characteristics. It is also known as one of the active biometric tasks. There is much speech related activities such as language recognition, speech recognition, and speaker recognition respectively. Speaker recognition superficially defines as to identify the accurate speaker from the group of various people. It is a very broad term and is further classified as speaker identification and speaker verification. The paper is concentrating on the term speaker identification. The main aim is to identify the accurate speaker from the given speech samples. These samples are obtained by extracting features and are used for modeling purpose. Standard database TIMIT is being used for identification. The paper comprises of various algorithms for feature extraction, they are Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficient (IMFCC) and linear predictive Cepstral Coefficients (LPCC). The term Fusion came from the combination of the two algorithms namely MFCC and IMFCC. The
characteristics. Many biometric methods exist for authentication, including face recognition, retina and iris recognition, fingerprint, DNA, and hand measurements. Speech signal processing is another well-known addition to this list. Speech is one of the most natural forms of communication, and speech-based recognition finds application from voice access on ordinary personal computers to biometric and forensic systems; security systems in particular have developed rapidly in recent years.

There are two main techniques in speech processing: speech recognition and speaker recognition. This paper focuses on speaker recognition, which is further divided into speaker identification and speaker verification. In speaker identification an unknown speaker is matched against all enrolled speakers (a 1:N comparison), whereas in speaker verification a claimed identity is accepted or rejected (a 1:1 comparison). A text-independent speaker identification system is used in this paper.

In speaker identification, speaker-specific characteristics are extracted from a given voice sample; this step is known as feature extraction. A speaker model is then trained on these features and stored in the system database. Feature extraction yields feature vectors that carry speaker-specific information derived from one or more of the following: the vocal tract, the excitation source, and behavioral traits. Speaker recognition systems combine sets of scores to improve the probability and reliability of the recognizer. Before feature extraction, the signal passes through a pre-processing stage. Pre-processing plays an important role in speaker identification: it reduces the variation in the data that carries no useful information about the speech and removes irrelevant content, and it is considered good practice.
The various algorithms used for feature extraction are Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC), and Linear Predictive Cepstral Coefficients (LPCC). In this paper, feature extraction is carried out with all of these algorithms. Researchers have identified speaker-specific information that is complementary to MFCC; the corresponding features are called Inverse Mel Frequency Cepstral Coefficients (IMFCC). This complementary information is combined with MFCC at the score level, and the resulting model is named the Fused Mel Feature Set; such models are simply mathematical representations of the underlying system [1]. An inverted filter bank is used to capture this complementary information from the high-frequency part of the energy spectrum, so IMFCC captures information that MFCC neglects. The features are modeled using a Gaussian Mixture Model with a Universal Background Model (GMM-UBM). All feature extraction algorithms in this paper are based on Gaussian filters, and the results are verified on the standard TIMIT database. The final results compare LPCC against the Fused Mel Feature Set. The next sections describe the Fused Mel Feature Set using Gaussian filters and the Linear Predictive Cepstral Coefficients using Gaussian filters, followed by the comparative results of both.

2. FEATURE EXTRACTION AND FILTER DESIGN

The goal of feature extraction is to represent a speech signal by a finite number of measures. Features are representations of the spectrum of the speech signal in each window frame, and the cepstral vectors are derived from a filter bank designed according to some model of the auditory system [2]. Most feature extraction methods use standard triangular filters to filter the spectrum of the speech signal, simulating the characteristics of the human ear. Triangular filters, however, impose a sharp, crisp partition of the energy spectrum, and some information is lost because of it. In this paper Gaussian filters are used instead: they avoid the crisp transitions and give a smoother adaptation from one sub-band to the next, so a degree of correlation is always maintained, both around the mid-points and around the end-points of the base of the original triangular filters, and the mathematical calculations remain simple. These advantages over triangular filters motivate the use of Gaussian filters here.

The motivation for Mel Frequency Cepstral Coefficients is that the auditory response of the human ear resolves frequencies nonlinearly. The mapping from linear frequency to Mel frequency is defined as [3]

    f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)                 (1)

where f_{mel} is the subjective pitch in Mel corresponding to the frequency f measured in Hz. MFCCs are one of the more popular parameterization methods used by researchers in the speech technology field; they capture the phonetically important characteristics of speech, are band-limited in nature, and can easily be adapted to applications such as telephone speech.
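As a quick illustration of equation (1), the following minimal Python sketch implements the mel mapping together with its inverse (which appears later as equation (5)); the function names are ours, not from the paper:

    import numpy as np

    def hz_to_mel(f_hz):
        # Equation (1): subjective pitch in mel for a linear frequency in Hz
        return 2595.0 * np.log10(1.0 + f_hz / 700.0)

    def mel_to_hz(f_mel):
        # Inverse mapping (equation (5) later in the paper)
        return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

    print(hz_to_mel(1000.0))             # approximately 1000 mel
    print(mel_to_hz(hz_to_mel(440.0)))   # round trip recovers 440 Hz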
Generally, feature extraction with MFCC uses a triangular filter. Triangular filters taper asymmetrically and do not provide correlation between a sub-band and the nearby spectral components, which causes some information loss. Gaussian filters avoid these drawbacks: they taper towards both ends and provide correlation between a sub-band and its neighbouring spectral components [4]. IMFCC is a feature extraction technique that captures the complementary information present in the high-frequency part of the spectrum. Figure 1 shows the steps involved in extracting both Gaussian MFCC and Gaussian IMFCC features. Let the input speech signal be y(n), n = 1, ..., M, representing a preprocessed frame of the signal. The frame y(n) is first converted to the frequency domain by a DFT, which yields the energy spectrum; this is followed by the Gaussian filter bank block.

Figure 1 Steps involved in extraction of Gaussian MFCC and IMFCC (pre-processing, FFT and squared magnitude, Gaussian MFCC/IMFCC filter banks, log, DCT) [5]

Mathematically, the Gaussian filter is written as

    \psi_i^{g,\mathrm{MFCC}}(k) = \exp\left(-\frac{(k - k_{b_i})^2}{2\sigma_i^2}\right)                 (2)

where k is the coefficient index in the N-point DFT, k_{b_i} is the point at the base of the i-th triangular filter taken as the mean of the i-th Gaussian filter, and \sigma_i is the standard deviation (the square root of the variance), written as

    \sigma_i = \alpha\,(k_{b_{i+1}} - k_{b_i})                 (3)

where \alpha is the parameter that controls the variance.
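To make the tapering comparison concrete, here is a small Python sketch (not from the paper) that evaluates one triangular filter and the corresponding Gaussian filter of equations (2) and (3) on the same DFT bins; the boundary bins and the value of alpha are illustrative assumptions:

    import numpy as np

    # Illustrative filter boundaries in DFT-bin units (assumed values)
    k = np.arange(0, 129)          # DFT coefficient index
    k_left, k_center, k_right = 40.0, 50.0, 60.0
    alpha = 0.5                    # variance-control parameter of equation (3)

    # Standard triangular filter rising from k_left to k_center, falling to k_right
    triangular = np.clip(np.minimum((k - k_left) / (k_center - k_left),
                                    (k_right - k) / (k_right - k_center)), 0.0, None)

    # Gaussian filter of equation (2) with sigma from equation (3)
    sigma = alpha * (k_right - k_center)
    gaussian = np.exp(-((k - k_center) ** 2) / (2.0 * sigma ** 2))

    # The Gaussian weights never drop abruptly to zero, so neighbouring
    # sub-bands stay correlated instead of being cut off at the filter edges.
    print(triangular[62], gaussian[62])    # 0.0 versus a small positive weight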
Figure 2 Filter bank design [5]

Figure 2 shows two plots in a single figure, one for the triangular filter and one for the Gaussian filter, drawn for a single value of sigma; plots for other values of sigma can be drawn in the same way. Figures 3 and 4 show the individual responses of the Gaussian filter banks for MFCC and IMFCC.

Figure 3 Mel scale Gaussian filter bank (weight versus frequency, 0-6000 Hz) [6]

Figure 4 Inverted Mel scale Gaussian filter bank (weight versus frequency, 0-6000 Hz) [6]

Mathematically, the boundary points of the Gaussian MFCC filter bank can be written as

    k_{b_i} = \frac{M_s}{F_s}\, f_{mel}^{-1}\!\left( f_{mel}(f_{low}) + \frac{i\,\left[ f_{mel}(f_{high}) - f_{mel}(f_{low}) \right]}{Q+1} \right)                 (4)

where M_s is the number of points in the DFT, F_s is the sampling frequency, f_{low} and f_{high} are the low- and high-frequency boundaries of the filter bank, Q is the number of filters in the bank, and f_{mel}^{-1} is the inverse of the transformation in (1):

    f_{mel}^{-1}(f_{mel}) = 700\left(10^{\,f_{mel}/2595} - 1\right)                 (5)

The inverted Mel scale filter bank structure is obtained by flipping the original filter bank around the midpoint of the frequency range under consideration:

    \psi'_i(k) = \psi_{Q+1-i}\!\left(\frac{M_s}{2} - k + 1\right)                 (6)

where \psi_i(k) is the original MFCC filter bank response and \psi'_i(k) is the inverted one.
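A sketch of how the Gaussian mel filter bank and its inverted counterpart could be built from equations (2)-(6) is given below. The choice of Q+2 equally spaced mel points and the way sigma is tied to the next boundary are our assumptions about the convention followed in [4] and [5], not a definitive implementation:

    import numpy as np

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)    # equation (5)

    def gaussian_mel_filterbank(Q=20, n_fft=512, fs=16000.0, f_low=0.0, f_high=8000.0, alpha=0.5):
        # Q+2 boundary points equally spaced on the mel scale, mapped to DFT bins (cf. equation (4))
        mel_pts = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), Q + 2)
        k_b = (n_fft / fs) * mel_to_hz(mel_pts)
        k = np.arange(1, n_fft // 2 + 1)               # DFT coefficient index, 1 .. Ms/2
        bank = np.zeros((Q, k.size))
        for i in range(1, Q + 1):
            sigma = alpha * (k_b[i + 1] - k_b[i])      # spread from the next boundary (equation (3))
            bank[i - 1] = np.exp(-((k - k_b[i]) ** 2) / (2.0 * sigma ** 2))   # equation (2)
        return bank

    def inverted_mel_filterbank(bank):
        # Equation (6): flip the bank around the midpoint of the frequency range
        return bank[::-1, ::-1]

    mfcc_bank = gaussian_mel_filterbank()
    imfcc_bank = inverted_mel_filterbank(mfcc_bank)
    print(mfcc_bank.shape, imfcc_bank.shape)           # (20, 256) each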
These filter banks are applied to the energy spectrum obtained by taking the Fast Fourier transform of the preprocessed signal:

    e^{g,\mathrm{MFCC}}(i) = \sum_{k=1}^{M_s/2} |Y(k)|^2\, \psi_i^{g}(k)                 (7)

where \psi_i(k) is the respective filter response and |Y(k)|^2 is the energy spectrum.

Figure 5 Response \psi_i(k) of a typical Mel scale filter, centred at k_{b_i} with base points k_{b_{i-1}} and k_{b_{i+1}} (amplitude versus DFT coefficient index) [5]

Finally, the DCT is taken on the log filter bank energies \{\log[e^{g,\mathrm{MFCC}}(i)]\}_{i=1}^{Q}, and the final MFCC coefficients can be written as

    C_m^{g,\mathrm{MFCC}} = \sqrt{\frac{2}{Q}} \sum_{l=0}^{Q-1} \log\!\left[e^{g,\mathrm{MFCC}}(l+1)\right] \cos\!\left[\frac{m(2l+1)\pi}{2Q}\right]                 (8)

where 0 \le m \le R-1 and R is the desired number of cepstral features. The same procedure is followed for the IMFCC features as well [4], denoted as

    C_m^{g,\mathrm{IMFCC}} = \sqrt{\frac{2}{Q}} \sum_{l=0}^{Q-1} \log\!\left[e^{g,\mathrm{IMFCC}}(l+1)\right] \cos\!\left[\frac{m(2l+1)\pi}{2Q}\right]                 (9)
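Equations (7)-(9) amount to projecting the frame's energy spectrum onto the filter bank, taking logarithms, and applying a DCT. A self-contained sketch is shown below; the tiny triangular demo bank is only a stand-in (in practice the Gaussian MFCC and IMFCC banks sketched above would be used), and all names are ours:

    import numpy as np

    def cepstral_from_filterbank(frame, bank, R=13):
        # Equations (7)-(9): filter-bank energies, log, then DCT -> cepstral features
        Ms = 2 * bank.shape[1]
        Y = np.fft.fft(frame, n=Ms)
        energy = np.abs(Y[1:Ms // 2 + 1]) ** 2            # |Y(k)|^2 for k = 1 .. Ms/2
        e = bank @ energy                                 # equation (7)
        log_e = np.log(e + 1e-12)                         # small floor to avoid log(0)
        Q = bank.shape[0]
        l = np.arange(Q)
        C = np.array([np.sqrt(2.0 / Q) *
                      np.sum(log_e * np.cos(m * (2 * l + 1) * np.pi / (2 * Q)))
                      for m in range(R)])                 # equations (8) and (9)
        return C

    # Tiny demo: an assumed 4-filter bank over 64 bins and a random stand-in frame
    rng = np.random.default_rng(0)
    demo_bank = np.maximum(0.0, 1.0 - np.abs(np.arange(64) - np.array([[8], [24], [40], [56]])) / 8.0)
    print(cepstral_from_filterbank(rng.standard_normal(128), demo_bank, R=4))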
3. LINEAR PREDICTIVE CEPSTRAL COEFFICIENTS (LPCC)

The predictor coefficients are determined by minimizing the squared differences between the actual speech samples and the linearly predicted values, and they form a unique set of parameters. In practice, the raw predictor coefficients are never used directly because of their high variance; they are transformed into a more robust set of parameters known as cepstral coefficients. The overall procedure for extracting LPCC features mirrors that of MFCC and IMFCC, and a Gaussian filter bank is used here as well.

Figure 6 Block diagram of the LPCC algorithm: speech sequence, pre-emphasis and Hamming window, linear predictive analysis, cepstral analysis, LPCC [7]

Pre-emphasis and Hamming Window

The first block is pre-emphasis, which is the first step of the algorithm applied to the input signal. The idea of pre-emphasis is to spectrally flatten the speech signal and equalize the inherent spectral tilt in speech [8]. Pre-emphasis is implemented by a first-order FIR digital filter with transfer function

    H_p(z) = 1 - \alpha z^{-1}                 (10)

where \alpha is a constant with a typical value of 0.97. After pre-emphasis, the speech signal is subdivided into frames. This is equivalent to multiplying the entire speech sequence by a windowing function,

    s_m[n] = s[n]\, w[n - m]                 (11)

where s[n] is the entire speech sequence, s_m[n] is a windowed speech frame at time m, and w[n] is the windowing function. The typical length of a frame is about 20-30 milliseconds. In the above equation, m is the time shift or step size of the windowing function; a new frame is obtained by shifting the windowing function to a subsequent time, typically by 10 milliseconds. The shape of the windowing function is important. A rectangular window is not recommended since it causes severe spectral distortion (leakage) in the speech frames [9]; windows that minimize the spectral distortion should be used instead. One of the most commonly used windows is the Hamming window,

    w[n] = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right)                 (12)

where N is the length of the windowing function. After Hamming windowing, the speech frame is passed to the next stage for further processing.
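A minimal sketch of the pre-emphasis, framing, and Hamming windowing steps of equations (10)-(12) might look like this; the 25 ms frame length and 10 ms shift follow the typical values mentioned above, and the helper names are ours:

    import numpy as np

    def preemphasize(signal, alpha=0.97):
        # Equation (10): y[n] = s[n] - alpha * s[n-1]
        return np.append(signal[0], signal[1:] - alpha * signal[:-1])

    def frame_and_window(signal, fs=16000, frame_ms=25, shift_ms=10):
        # Equations (11)-(12): slice into overlapping frames, apply a Hamming window
        N = int(fs * frame_ms / 1000)
        step = int(fs * shift_ms / 1000)
        n = np.arange(N)
        hamming = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1))
        starts = range(0, len(signal) - N + 1, step)
        return np.array([signal[s:s + N] * hamming for s in starts])

    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)          # one second of stand-in "speech"
    frames = frame_and_window(preemphasize(speech))
    print(frames.shape)                          # (number of frames, 400)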
Linear Predictive Analysis

In human speech production, the shape of the vocal tract governs the nature of the sound produced. The basic speech production model states that the vocal tract can be modeled by an all-pole filter; the LP coefficients are simply the coefficients of this all-pole filter and correspond to a smooth envelope of the log spectrum of the speech. The main idea behind LPC is that a given speech sample can be approximated as a linear combination of past speech samples: LPC models the signal s(n) as a linear combination of its past values and the present input (the vocal-cord excitation). When the signal is represented only in terms of its past values, the difference between the real and the predicted output is called the prediction error, and LPC finds the coefficients by minimizing this error. The cepstrum, the inverse transform of the log magnitude of the spectrum, is useful for separating convolved signals such as the source and filter in the speech production model: the log operation separates the vocal tract transfer function (slow spectral variations) from the voice source (fast spectral variations), and it generally provides more efficient and robust coding of speech information than the LPC coefficients themselves.

Figure 7 LPCC [10]

The predictor coefficients are rarely used as features directly; they are transformed into the more robust Linear Predictive Cepstral Coefficient (LPCC) features. The LPC are obtained using the Levinson-Durbin recursive algorithm; this is known as LP analysis. The difference between the actual and the predicted sample value is termed the prediction error or residual [11] and is given by

    e(n) = s(n) - \hat{s}(n), \qquad \hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k)                 (13)

    e(n) = \sum_{k=0}^{p} a_k\, s(n-k), \qquad a_0 = 1                 (14)

The optimal predictor coefficients minimize this mean square error E. At the minimum of E,

    \frac{\partial E}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p                 (15)

Differentiating and equating to zero, we get

    R\,a = r                 (16)

where

    a = [\,a_1\; a_2\; \ldots\; a_p\,]^{T}                 (17)

    r = [\,r(1)\; r(2)\; \ldots\; r(p)\,]^{T}                 (18)

and R is the Toeplitz symmetric autocorrelation matrix

    R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & & \ddots & \vdots \\ r(p-1) & r(p-2) & \cdots & r(0) \end{bmatrix}

This system is solved for the predictor coefficients using the Levinson-Durbin algorithm as follows:

    E^{(0)} = r[0]                 (19)

    k_i = \frac{r[i] - \sum_{j=1}^{i-1} a_j^{(i-1)}\, r[\,|i-j|\,]}{E^{(i-1)}}                 (20)
where 1 \le i \le p, and

    a_i^{(i)} = k_i                 (21)

    a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}                 (22)

    E^{(i)} = (1 - k_i^2)\, E^{(i-1)}                 (23)

The above set of equations is solved recursively for i = 1, 2, ..., p. The final solution is given by

    a_m = a_m^{(p)}, \qquad 1 \le m \le p                 (24)

where the a_m are the linear predictive coefficients (LPC).
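The Levinson-Durbin recursion of equations (19)-(24) translates into a short routine such as the one below; this is a generic textbook-style implementation offered as an illustration, not the authors' exact code:

    import numpy as np

    def levinson_durbin(frame, p=12):
        # Solve the LP normal equations R a = r recursively (equations (19)-(24))
        r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(p + 1)])
        a = np.zeros(p + 1)
        a[0] = 1.0
        E = r[0]                                        # equation (19)
        for i in range(1, p + 1):
            k_i = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E    # equation (20)
            a_new = a.copy()
            a_new[i] = k_i                              # equation (21)
            a_new[1:i] = a[1:i] - k_i * a[i - 1:0:-1]   # equation (22)
            a = a_new
            E *= (1.0 - k_i ** 2)                       # equation (23)
        return a[1:], E                                 # predictor coefficients a_1..a_p and gain

    rng = np.random.default_rng(0)
    lpc, gain = levinson_durbin(rng.standard_normal(400) * np.hamming(400))
    print(lpc.shape, gain)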
Cepstral Analysis

In practice, the actual predictor coefficients are never used directly in recognition, since they typically show high variance; they are more efficiently transformed into a robust set of parameters known as cepstral coefficients. Before defining the cepstral coefficients, consider the definition of the cepstrum: a cepstrum is the result of taking the Fourier transform of the logarithm of the estimated spectrum of a signal. The three types of cepstrum are the power cepstrum, the complex cepstrum and the real cepstrum; the power cepstrum, in particular, finds application in the analysis of human speech. The name "cepstrum" was derived from the word "spectrum" by reversing the first four letters.

The input speech signal passes through preprocessing, then feature extraction, and then modeling. Preprocessing reduces the complexity of operating on the speech signal by reducing the number of samples handled per operation: it is very difficult to work on a huge set of samples, so instead of operating on the entire signal we restrict operations to frames of sufficiently reduced length. After signal conditioning, the speech signal goes through the feature extraction stage, where the coefficients are calculated using the DCT. Mathematically,

    Ceps = \mathrm{dct}\big(\log\,|\mathrm{FFT}(y \cdot window)|\big)                 (25)

The principal advantage of cepstral coefficients is that they are generally decorrelated, which allows diagonal covariances. A minor problem is that the higher-order cepstral coefficients are numerically quite small, resulting in a very wide range of variances when going from low to high cepstral coefficients. Cepstral coefficients can be used to separate the excitation signal (which carries the words and the pitch) from the transfer function (which carries the voice quality), and the cepstrum can be seen as information about the rate of change in the different spectrum bands.

The recursive relation between the predictor coefficients and the cepstral coefficients is used to convert the LP coefficients (LPC) into the LP cepstral coefficients c_k:

    c_0 = \ln \sigma^2                 (26)

    c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p                 (27)

    c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p                 (28)

where \sigma^2 is the gain term in the LP analysis; the recursion is evaluated up to the desired number d of LP cepstral coefficients.
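Equations (26)-(28) translate directly into a short routine such as the following; the gain term and the number of cepstral coefficients are passed in as parameters, and the function name and example values are ours:

    import numpy as np

    def lpc_to_lpcc(a, gain, d=16):
        # Convert LP coefficients a_1..a_p into d LP cepstral coefficients (equations (26)-(28))
        p = len(a)
        c = np.zeros(d + 1)
        c[0] = np.log(gain)                             # equation (26): c_0 = ln(sigma^2)
        for m in range(1, d + 1):
            acc = sum((k / m) * c[k] * a[m - k - 1] for k in range(max(1, m - p), m))
            if m <= p:
                c[m] = a[m - 1] + acc                   # equation (27)
            else:
                c[m] = acc                              # equation (28)
        return c[1:]

    # Example with small made-up LPC values (illustrative only)
    print(lpc_to_lpcc(np.array([0.9, -0.4, 0.1]), gain=1.5, d=6))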
4. GAUSSIAN MIXTURE MODEL (GMM) AND UNIVERSAL BACKGROUND MODEL

The text-independent speaker recognition system used in this paper uses the GMM-UBM approach for modeling. Two models are developed: a target speaker model and an impostor model (the UBM). The approach has the generalization ability to handle unseen acoustic patterns [12]. In a biometric system, a GMM is commonly used as a parametric model of the probability distribution of continuous measurements or features; in a speaker identification system these are generally vocal tract features. GMMs are well suited to text-independent speaker identification because there is no prior knowledge of what the speaker will say, so modeling is generally done with a GMM. A Gaussian mixture model is a weighted sum of M component Gaussian densities, as given by [13]

    p(x\,|\,\lambda) = \sum_{i=1}^{M} w_i\, g(x\,|\,\mu_i, \Sigma_i)                 (29)

where x is a D-dimensional continuous-valued data vector (a measurement or feature vector), w_i, i = 1, ..., M, are the mixture weights, and g(x | \mu_i, \Sigma_i), i = 1, ..., M, are the component Gaussian densities. Each component density is a D-variate Gaussian function of the form

    g(x\,|\,\mu_i, \Sigma_i) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}} \exp\!\left\{ -\frac{1}{2}\,(x - \mu_i)'\,\Sigma_i^{-1}\,(x - \mu_i) \right\}                 (30)

with mean vector \mu_i and covariance matrix \Sigma_i, and the mixture weights satisfy the constraint \sum_{i=1}^{M} w_i = 1. The complete Gaussian mixture model is parameterized by the mean vectors, covariance matrices and mixture weights of all component densities, collectively represented by the notation

    \lambda = \{\, w_i, \mu_i, \Sigma_i \,\}, \qquad i = 1, \ldots, M                 (31)

For a sequence of T training vectors X = \{x_1, \ldots, x_T\}, the GMM likelihood, assuming independence between the vectors, can be written as

    p(X\,|\,\lambda) = \prod_{t=1}^{T} p(x_t\,|\,\lambda)                 (32)

For utterances with T frames, the log-likelihood of a speaker model is

    L_s(X) = \log p(X\,|\,\lambda_s) = \sum_{t=1}^{T} \log p(x_t\,|\,\lambda_s)                 (33)

For speaker identification, the value of L_s(X) is computed for all speaker models s enrolled in the system, and the owner of the model that generates the highest value is returned as the identified speaker. During the training phase, the feature vectors are used to train the models with the Expectation-Maximization (EM) algorithm, an iterative update of each of the parameters in \lambda with a consecutive increase in the log-likelihood at each step.

GMMs are generally used for text-independent speaker identification, and the drawbacks of earlier systems are overcome by using GMM-UBM: the model is not as expensive as a plain GMM, and there is no need for a large phoneme or vocabulary database; GMM also has advantages over HMM in this setting. The basic idea of the UBM is to capture the general characteristics of a population and then adapt them to an individual speaker. More briefly, the UBM is a model, used in many application areas including biometric systems, against which person-independent feature characteristics are compared with the person-specific feature model when deciding acceptance or rejection; the UBM can also be viewed simply as a GMM trained on a large set of speakers. The UBM is trained with the EM algorithm on its training data. For the speaker recognition process it fulfils two main roles: it is the a priori model for all target speakers when applying Bayesian adaptation to derive speaker models, and it allows the log-likelihood ratio to be computed much faster by selecting, for each frame, the best Gaussians for which the likelihood is relevant. This work proposes to use the UBM as a guide to discriminative training of speakers [14].
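A minimal sketch of equations (29)-(33) with diagonal covariances is given below. In a real system the speaker models would be obtained by EM training and adaptation from the UBM; here random parameters stand in for trained models, and all names are ours:

    import numpy as np

    def log_gaussian_diag(X, mu, var):
        # log of equation (30) for a diagonal covariance matrix
        return -0.5 * (np.sum(np.log(2.0 * np.pi * var))
                       + np.sum((X - mu) ** 2 / var, axis=1))

    def gmm_log_likelihood(X, weights, means, variances):
        # Equations (29), (32), (33): sum of per-frame log p(x_t | lambda)
        comp = np.stack([np.log(w) + log_gaussian_diag(X, m, v)
                         for w, m, v in zip(weights, means, variances)])
        frame_ll = np.logaddexp.reduce(comp, axis=0)    # log-sum-exp over components
        return np.sum(frame_ll)

    # Toy example: two assumed speaker models and one test utterance of random features
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 13))                  # T frames of 13-dimensional features
    models = {
        "speaker_A": (np.array([0.5, 0.5]), rng.standard_normal((2, 13)), np.ones((2, 13))),
        "speaker_B": (np.array([0.3, 0.7]), rng.standard_normal((2, 13)), np.ones((2, 13))),
    }
    scores = {s: gmm_log_likelihood(X, *params) for s, params in models.items()}
    print(max(scores, key=scores.get))                  # identified speaker = highest L_s(X)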
5. COMPARATIVE RESULTS OF FUSED MEL FEATURE SETS AND LINEAR PREDICTIVE CEPSTRAL COEFFICIENTS

The main focus of this paper is the fusion of the two algorithms, MFCC and IMFCC, and the comparison of the fused results with those obtained from LPCC; that is, the accuracy obtained from the fused Mel feature set is compared with that of the Linear Predictive Cepstral Coefficients, and the better of the two gives the more accurately identified speaker for the database used. A combined system performs better when the two or more combined components supply information that is complementary in nature, so the mutually complementary MFCC and IMFCC features can be fused together to improve identification accuracy. Many combination rules are possible, such as product, sum, minimum, maximum, median and average; the sum rule outperforms the other combinations and is the most resilient to estimation errors. The block diagrams of the fused Mel feature set and the LPCC system, both with GMM-UBM modeling, are shown below.

From Figures 8 and 9 it can be seen that the system includes training and testing for both the fused Mel feature set and the LPCC feature set. The implementation is done on the TIMIT database; the TIMIT corpus is one of the standard databases used by many researchers for speaker identification, and this paper also uses it. The subset used here comprises 16 speakers.

Figure 8 Steps involved in the speaker identification system (fused Mel feature sets) [5]

Figure 9 Speaker identification system (LPCC) [6]

The recordings come from 8 dialect regions, and each speaker has 10 utterances, giving 160 sentence recordings in total (10 recordings per speaker). The audio format is .wav, single channel, 16 kHz sampling, 16-bit samples, PCM encoding. The features are extracted using the Gaussian Mel scale filter bank, and the feature vectors are trained using the Expectation-Maximization algorithm. As the diagrams show, a separate model is created for each speaker [5]. In the testing step, features are extracted from the incoming test signal and the likelihood of these features against each speaker model is determined; the likelihood is determined for MFCC and IMFCC as well as for LPCC. Two separate block diagrams are drawn for the fused Mel feature sets and for LPCC. In the first, a uniformly weighted sum rule is adopted to fuse the scores from the two classifiers:

    S_{com}^{i} = w\, S_{MFCC}^{i} + (1 - w)\, S_{IMFCC}^{i}                 (34)

where S_{com}^{i} is the combined score of MFCC and IMFCC, S_{MFCC}^{i} and S_{IMFCC}^{i} are the scores generated by the MFCC model and the IMFCC model, and w is the fusion coefficient. Along similar lines, the values for LPCC are calculated and denoted S_{LPCC}^{i}.
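Equation (34) is a simple weighted sum of the two classifier scores; a minimal sketch, with illustrative score values of our own, is:

    import numpy as np

    def fuse_scores(s_mfcc, s_imfcc, w=0.5):
        # Equation (34): S_com = w * S_MFCC + (1 - w) * S_IMFCC
        return w * np.asarray(s_mfcc) + (1.0 - w) * np.asarray(s_imfcc)

    # Illustrative per-speaker log-likelihood scores from the two classifiers
    s_mfcc = np.array([-1250.0, -1230.5, -1275.2])
    s_imfcc = np.array([-1248.3, -1235.0, -1260.1])
    fused = fuse_scores(s_mfcc, s_imfcc, w=0.6)
    print(int(np.argmax(fused)))      # index of the identified speaker under the fused score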
The accuracies for Fusion and LPCC are calculated and compared. The weights and the number of mixtures can be changed to different values to test the system for the optimum result; Table I shows the performance of the proposed system for different weights and mixtures. As stated, the standard TIMIT database of 16 speakers is used, divided into two sets for training and testing: the UBM is built from 5 speakers and the GMM speaker models from the remaining 11 speakers. The background model is generated by the UBM, and the pre-emphasis constant alpha is kept at 0.97. The accuracy is calculated on the basis of false positives and false negatives: in a false positive, a false speaker is accepted as the true one, while in a false negative a true speaker is rejected as an impostor. The formula for the accuracy calculation is

    Accuracy (%) = 100 - (FP + FN) * 100 / (M * N)

where M * N is the size of the confusion matrix.

Table I. Comparative results for different numbers of mixtures and weights for the proposed system

    No. of    Score threshold=0.6   Score threshold=0.77  Score threshold=0.8   Score threshold=0.97
    Mixtures  Fusion(%)  LPCC(%)    Fusion(%)  LPCC(%)     Fusion(%)  LPCC(%)    Fusion(%)  LPCC(%)
    4         92.56      84.29      92.56      85.95       92.56      85.95      91.73      86.77
    8         94.21      86.77      92.56      86.77       92.56      87.60      92.56      87.60
    16        92.56      71.07      95.04      73.55       95.04      74.38      92.56      76.85

Figure 10 Graphical representation of Table I

The table shows the accuracy percentages obtained for the different numbers of mixtures and different values of the score threshold. As the threshold increases, the accuracy tends to increase, and in every case the accuracy of Fusion is better than that of LPCC; the performance of the fused system exceeds the performance of LPCC. The maximum performance is 95.04%, demonstrating good identification with limited errors.
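The accuracy formula above can be evaluated directly from a confusion matrix. The sketch below is a hypothetical reading of that formula in which the off-diagonal entries of the matrix are counted as the FP + FN errors; the example matrix is made up for illustration:

    import numpy as np

    def identification_accuracy(confusion):
        # Accuracy (%) = 100 - (FP + FN) * 100 / (M * N), where M * N is the size of
        # the confusion matrix and FP + FN is taken as the total off-diagonal mass
        # (our interpretation of the paper's formula).
        confusion = np.asarray(confusion, dtype=float)
        fp_plus_fn = confusion.sum() - np.trace(confusion)
        M, N = confusion.shape
        return 100.0 - fp_plus_fn * 100.0 / (M * N)

    # Hypothetical 4-speaker confusion matrix (rows: true speaker, columns: decision)
    cm = np.array([[9, 1, 0, 0],
                   [0, 10, 0, 0],
                   [0, 0, 8, 2],
                   [1, 0, 0, 9]])
    print(round(identification_accuracy(cm), 2))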
6. CONCLUSION

Many methods have been used for feature extraction, including MFCC and IMFCC. Each of these algorithms works well individually and gives good accuracy, but since IMFCC helps MFCC to improve its accuracy further, the two are combined into the fused Mel feature set. The Gaussian Mixture Model is evaluated here for speaker identification, and performance is increased by fusing the complementary information. As shown in Table I, Fusion reaches an accuracy of 95.04% at weights 0.77 and 0.8, which is better than the 73.55% and 74.38% obtained for LPCC at the same weights. Further enhancements may be obtained by changing the modeling technique and by trying various combinations of weights. Future work may apply the same database and approach to develop a real-time application, and the system could also be developed using an artificial neural network based approach.

REFERENCES

[1] J. Kittler, M. Hatef, R. Duin and J. Matas, On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), pp. 226-239, March 1998.

[2] Rana, Mukesh, and Saloni Miglani, Performance Analysis of MFCC and LPCC Techniques in Automatic Speech Recognition, International Journal of Engineering and Computer Science, 3(8), pp. 7727-7732, August 2014.

[3] Sridharan, Sridha and Wong, Eddie, Comparison of Linear Prediction Cepstrum Coefficients and Mel-Frequency Cepstrum Coefficients for Language Identification, Proceedings of the International Symposium on Intelligent Multimedia, Video and Speech Processing, pp. 95-98, 2-4 May 2001.

[4] Chakroborty, Sandipan, and Goutam Saha, Improved Text-Independent Speaker Identification using Fused MFCC & IMFCC Feature Sets based on Gaussian Filter, International Journal of Signal Processing, 5(1), pp. 11-19, 2009.

[5] R. Shantha Selva Kumari, S. Selva Nidhyananthan and Anand, Fused Mel Feature Sets Based Text-Independent Speaker Identification using Gaussian Mixture Model, International Conference on Communication Technology and System Design, Procedia Engineering, 30, pp. 319-326, 2012.

[6] Anagha S. Bawaskar and Prabhakar N. Kota, Speaker Identification Based on MFCC and IMFCC Using GMM-UBM, International Organization of Scientific Research (IOSR Journals), 5(2), pp. 53-60, March-April 2015.

[7] Cheng, Octavian, Waleed Abdulla, and Zoran Salcic, Performance Evaluation of Front-End Processing for Speech Recognition Systems, School of Engineering Report, The University of Auckland, Electrical and Computer Engineering, 2005.

[8] Rabiner, L. and Juang, B., Fundamentals of Speech Recognition, Prentice Hall, Upper Saddle River, New Jersey, 1993.

[9] Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals, Prentice Hall, 1978.

[10] Pallavi P. Ingale and S. L. Nalbalwar, Novel Approach to Text Independent Speaker Identification, International Journal of Electronics and Communication Engineering & Technology, 3(2), 2012, pp. 87-93.

[11] Chang, Wen-Wen, Time-Frequency Analysis and Wavelet Transform Tutorial: Time-Frequency Analysis for Voiceprint (Speaker) Recognition, National Taiwan University.
[12] Pazhanirajan, S., and P. Dhanalakshmi, EEG Signal Classification using Linear Predictive Cepstral Coefficient Features, International Journal of Computer Applications, 73(1), 2013.

[13] Chao, Yi-Hsiang, Tsai, W.-H., and Hsin-Min Wang, Discriminative Feedback Adaptation for GMM-UBM Speaker Verification, 6th International Symposium on Chinese Spoken Language Processing, pp. 1-4, 16-19 Dec. 2008.

[14] Manan Vyas, A Gaussian Mixture Model Based Speech Recognition System Using Matlab, Signal & Image Processing: An International Journal (SIPIJ), 4(4), August 2013.

[15] Amr Rashed, Fast Algorithm for Noisy Speaker Recognition Using ANN, International Journal of Computer Engineering & Technology, 5(2), 2014, pp. 12-18.

[16] Viplav Gautam, Saurabh Sharma, Swapnil Gautam and Gaurav Sharma, Identification and Verification of Speaker Using Mel Frequency Cepstral Coefficient, International Journal of Electronics and Communication Engineering & Technology, 3(2), 2012, pp. 413-423.

[17] Scheffer, N. and Bonastre, J.-F., UBM-GMM Driven Discriminative Approach for Speaker Verification, Speaker and Language Recognition Workshop (IEEE Odyssey), pp. 1-7, 28-30 June 2006.