SlideShare a Scribd company logo
1 of 9
Download to read offline
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 6, December 2017, pp. 3655~3663
ISSN: 2088-8708, DOI: 10.11591/ijece.v7i6.pp3655-3663  3655
Journal homepage: http://iaesjournal.com/online/index.php/IJECE
Towards an Optimal Speaker Modeling in Speaker Verification
Systems using Personalized Background Models
Ayoub Bouziane, Jamal Kharroubi, Arsalane Zarghili
Laboratory of Intelligent Systems and Applications, University Sidi Mohamed Ben Abdellah
Fez, Morocco.
Article Info ABSTRACT
Article history:
Received Mar 19, 2017
Revised Jun 29, 2017
Accepted Jul 14, 2017
This paper presents a novel speaker modeling approachfor speaker
recognition systems. The basic idea of this approach consists of deriving the
target speaker model from a personalized background model, composed only
of the UBM Gaussian components which are really present in the speech of
the target speaker. The motivation behind the derivation of speakers’ models
from personalized background models is to exploit the observeddifference
insome acoustic-classes between speakers, in order to improve the
performance of speaker recognition systems.
The proposed approach was evaluatedfor speaker verification task using
various amounts of training and testing speech data. The experimental results
showed that the proposed approach is efficientin termsof both verification
performance and computational cost during the testing phase of the system,
compared to the traditional UBM based speaker recognition systems.
Keyword:
Speaker modeling
Speaker verification
Gaussian mixture models
Universal background model
Maximum a posteriori (MAP)
Personalized background
models
Copyright © 2017Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Ayoub Bouziane,
Laboratory of Intelligent Systems and Applications,
Sidi Mohamed Ben Abdellah University,
P.B: 2202 Immouzer Road, Fez, Morocco.
Email: ayoub.bouziane@usmba.ac.ma
1. INTRODUCTION
The GMM-UBM based speaker modeling approach was firstly introduced to the speaker recognition
community by Reynolds, in 2000 [1]–[3]. Since then, it have become the predominant approach for speaker
modeling in text-independent speaker recognition systems, and the basis of the most successful approaches
that have been emerged in the last decade: the hybrid GMM-SVM approach [4], [5], the joint factor analysis
approach [6]–[8], and the recently introduced i-vectors approach [9], [10]. The motivation behind the use of
adapted Gaussian mixture models for speaker modeling is generally based on the assumption that Gaussian
densities may model a set of hidden acoustical classes that reflect some general speaker dependent vocal tract
characteristics.
The main idea of the UBM-based speaker recognition systems consist of deriving the target speaker
model from a universal background model that represents the set of all human acoustic classes (e.g., gender
dependent acoustic classes, age-dependent acoustic classes, accent dependent acoustic classes…). The target
speaker modelwill therefore be composed of an adapted version of the various acoustic classes defined in the
universal background model. However, as it sounds to the human ear, speakers don’t share the same acoustic
classes. For example, the acoustic classes of male speakers are different from those of female speakers and
the acoustic classes of adult speakers are different from those of minor speakers, as well as, the acoustic
classes of timid speakers are different from those of bold speakers, etc. Thereby, it seems that it is not logical
to model a speaker by an adapted version of anacoustic class which may not exist in its voice. Furthermore,
the difference of acoustic classes between speakers can be exploited to discriminate between them in speaker
recognition systems.
 ISSN:2088-8708
IJECE Vol. 7, No. 6, December 2017 : 3655–3663
3656
In view of these observations, the present study attempts, firstly,to experimentally investigate the
degree of acoustic-classes’difference between speakers,and secondly, to propose a speaker modeling
approach that takes into account these observations and exploits the differences in acoustic classes between
speakers to improve the performance of speaker recognition systems.
The remainder of this paper is organized as follows. The first section gives a brief overview of the
traditional GMM-UBM based Speaker verification systems. Next, the proposed speaker modeling approach,
based on personalized background models, is introduced. Afterward, experimental resultsand discussions are
presented in the third section. Finally, conclusions and future research directions are drawn in the last section.
2. TRADITIONAL GMM-UBM BASED SPEAKER VERIFICATION SYSTEMS
The GMM-UBM based speaker verification system was originally proposed by Reynolds in 2000
[1], [11]. Since then, it has become the predominant approach for speaker modeling in text-independent
speaker recognition systems and the basis of the most successful approaches that have been emerged in the
last decade. The main idea of the GMM-UBM approach consists, as shown in Figure 1, of deriving speakers’
models from a universal background model using maximum a posteriori (MAP) adaptation [1]. The
Universal Background Model (UBM) is typically a Gaussian mixture model (GMM) that represents the
distribution of the entire acoustic space of speech [2], [3].
Figure 1. A MAP adaptation of a GMM comprises 5 Gaussian densities. Original Gaussian densities of the
UBM are depicted as unfilled ellipses (dotted line), whereas the adapted Gaussian densities are denoted by
filled ellipses, and the observed feature vectors are depicted as small circles
The MAP adaptation process is generally composed of two steps. First, the sufficient statistics
estimates of the speaker’s training data are computed for each mixture in the UBM. Next, the computed
sufficient statistics estimates are combined with the old sufficient statistics from the UBM mixture
parameters using a data-dependent mixing coefficient. The specifics of the adaptation are as follows. Given
that the universal background model is composed of M Gaussian components, each of which is
parameterized by a mean vectorμi, variance matrix σ2
i and its weightwiin the mixture model. Initially, the
posteriori probability of each UBM component {μi, σi, wi} given the feature vector xt is computed as follows:
𝑃𝑟(𝑖|𝑥𝑡) =
𝑤𝑖 𝑝𝑖(𝑥𝑡)
∑ 𝑤𝑖 𝑝𝑗(𝑥𝑡)𝐿
𝑗=1
(1)
where pi(xt) is the density of the probability, given by:
𝑝𝑖(𝑥𝑡) =
1
(2𝜋)
𝐷
2⁄
|𝜎𝑖|
1
2⁄
𝑒𝑥𝑝 {−
1
2
(𝑥𝑡 − 𝜇𝑖) 𝑇
𝜎𝑖
−1
(𝑥𝑡 − 𝜇𝑖)} (2)
The sufficient statistics for the weight, mean, and variance parameters are then computed as follows:
Universal Background Model Universal Background
Model Specific Speaker ModelTesting phase
MAP Adaptation
IJECE ISSN: 2088-8708 
Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane)
3657
𝑛𝑖 = ∑ 𝑃𝑟(𝑖|𝑥𝑡)
𝑇
𝑡=1
; 𝐸𝑖(𝑥) =
1
𝑛𝑖
∑ 𝑃𝑟(𝑖|𝑥𝑡)
𝑇
𝑡=1
𝑥𝑡 ; 𝐸𝑖(𝑥2
) =
1
𝑛𝑖
∑ 𝑃𝑟(𝑖|𝑥𝑡)
𝑇
𝑡=1
𝑥𝑡
2 (3)
Thereafter, the computed sufficient statistics are used for estimating the adapted mixture weights wî,
means μî and variances σî of the given speaker:
𝜇𝑖̂ = [𝛽𝑖
𝜇
𝐸𝑖(𝑥) + (1 − 𝛽𝑖
𝜇
)𝜇𝑖] (4)
𝑤𝑖̂ = [𝛽𝑖
𝑤
𝑛𝑖)/𝑇 + (1 − 𝛽𝑖
𝑤
)𝑤𝑖]𝛾 (4)
𝜎2̂ 𝑖 = [𝛽𝑖
𝜎2
𝐸𝑖(𝑥2) + (1 − 𝛽𝑖
𝜎2
)(𝜎2
𝑖 − 𝛽𝑖
𝜎2
) − 𝜇𝑖̂2
] (5)
with, 𝛽𝑖
𝜌
= ni/(ni + 𝑟 𝜌
), ρ ∈ {w, 𝜇, 𝜎2
} (6)
Here, γ is a scale factor computed over all adapted mixture weights to ensure that they sum to unity,
βi
ρ
, ρ ∈ {w, μ, σ2} are the adaptation coefficients that control how the adapted GMM parameters will be
affected by the observed speaker data, and rρ
is a fixed relevance factor for parameter ρ.
An overall diagram of the GMM-UBM based speaker verification system is shown in Figure 2. The
basic operating structure of the system, as shown in Figure 2, is composed of three phases: the training phase,
the enrollment phase and the testing phase. During the first phase, i.e. the training phase, a large collection of
speech utterances is collected from a background population of speakers, their corresponding feature vectors
are extracted and used to train the universal background model. The training process of the UBM is done
generally using maximum likelihood estimation via the EM algorithm. In the second phase, i.e. the
enrollment phase, speaker models of new client speakers are derived from the universal background model
through MAP adaptation using the speakers’ training feature vectors [13]. While in the testing phase, the
extracted feature vectors of the unknown speaker’s utterance Xu
= {xu
1, xu
2, . . ., xu
N} are compared against
both the claimed target speaker model and the background model. The log likelihood ratio
LLR(Xu
; λspk, λUBM) between the claimed speaker model and the universal background model is then
calculated and used to make a decision about the acceptance/rejection of the claimed identity. The log
likelihood ratio (LLR) of the test utterance Xu
between the speaker model λj and the UBM model λUBM:
𝐿𝑅(𝑋 𝑢
; 𝜆 𝑠𝑝𝑘, 𝜆 𝑈𝐵𝑀) =
1
𝑁
[∑ 𝑙𝑜𝑔 𝑝(𝑥 𝑢
𝑖|𝜆 𝑠𝑝𝑘)
𝑁
𝑖=1
− ∑ 𝑙𝑜𝑔 𝑃(𝑥 𝑢
𝑖|𝜆 𝑈𝐵𝑀)
𝑁
𝑖=1
] (7)
With, 𝑝(𝑥𝑡|𝜆) = ∑ 𝑤𝑖 𝑏𝑖(𝑥𝑡)𝑀
𝑖=1 (8)
 ISSN:2088-8708
IJECE Vol. 7, No. 6, December 2017 : 3655–3663
3658
Figure 2. Block diagram of the GMM-UBM based speaker verification system
3. SPEAKER RECOGNITION USING PERSONALIZED BACKGROUND MODELS (PBMS)
The main idea of the proposed PBM-based speaker modeling approach consistsof adapting the
target speaker model from a personalized background model (PBM), composed only of the UBM Gaussian
components which are actually present in the speaker’s speech. The MAP adaptation step of traditional UBM
based systems will, therefore, be preceded by a selection step that selectsthe background Gaussian
component swhich reflect the general form of the acoustic classes characterizing the speaker, see Figure 3.
Figure 3. A MAP adaptation of a GMM comprises 5 Gaussian densities. Original Gaussian densities of the
UBM are depicted as unfilled ellipses (dotted line), whereas the adapted Gaussian densities are denoted by
filled ellipses, and the observed feature vectors are depicted as small circles.
Given a UBM of M Gaussian component λUBM = {μUBMi
, σUBMi
, wUBMi
}/ i ∈ {1,2, … , N}, and M
training feature vectors X = {x1, x2, … , xN}, extracted from the target speaker’s speech.The PBMGaussian
components are generally chosen from the UBM using a winner-take-all based strategy. A UBM Gaussian
IJECE ISSN: 2088-8708 
Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane)
3659
component θUBM 𝑖
= (μUBM 𝑖
, σUBM 𝑖
, wUBM 𝑖
) isselected to belong to the personalized background model λPBM
of the target speaker, if there is at least one feature vector xn ∈ X, where the UBM Gaussian component
θUBMi
achieves the maximum posterior probability of xn belongingness:
𝑃𝑟(θ 𝑈𝐵𝑀 𝑖
|𝑥n) ≥ 𝑃𝑟(θ 𝑈𝐵𝑀 𝑙
|𝑥n) , ∀𝑙 ∈ {1,2, … , N} (9)
Once the PBM Gaussian components are selected, the weights w 𝑈𝐵𝑀j
of the selected components are
divided by their sum so that the total weight is equal to unity.
A block diagram of the speaker modeling process using personalized background models is shown
in Figure 4. Firstly, the training feature vectors of the target speaker are extracted from itsenrollment
utterances. Next, the extracted feature vectors are used to select the UBM Gaussian components which will
compose the speaker’s personalized background model.Afterwards, the composed personalized background
model is utilized do derive the speaker model using the MAP adaptation procedure. Finally, the adapted
model is stored together with the corresponding indices of the PBM Gaussian components in the UBM.
An example of a two-dimensional projectionof two speakers’ features and the means of their
corresponding UBM and PBM adapted modelsis shown in Figure 5. As it can be seen from this figure, the
means of the PBM adapted models fit the speakers’ features better than the means of the traditional UBM
adapted models. Moreover, it seems that the adaptation of the UBM Gaussian components which haven't any
relationship with the target speakerinfluence on the feature vectors belongingness to the appropriate Gaussian
components, which therefore affect negatively the adapted model.
Figure 4. Block diagram of speaker modeling process using personalized background models
The speaker’s feature vectors The Universal Background Model
The UBM adapted model The PBM adapted model
 ISSN:2088-8708
IJECE Vol. 7, No. 6, December 2017 : 3655–3663
3660
Figure 5. A two-dimensional projection of two speakers’ features (blue points), the means of their
corresponding UBM and PBM adapted models (in green and yellow points, respectively) and the means of
the UBM model (red points)
During the test phase, the log-likelihood ratio LLR(Xu
; λspk, λPBM) between the claimed speaker
model and the personalized background model is used to make a decision about the acceptance or the
rejection of the claimed identity:
The motivation behind the use of the personalized background modelinstead of the universal
background model is to penalize the decision score of impostor speakers who don’t share the same acoustic
classes with the target speaker.
4. EXPERIMENTS, RESULTS AND DISCUSSION
4.1. The Experimental Protocol
The performed experiments in this study were conducted on the THUYG-20 SRE database [14].
This database isgenerally composed of 353 speakers, collected in a controlled environment (silent office by
the samecarbon Microphone). The entire speech corpuswas divided into three data sets: the first dataset
consists of 200 genderbalanced speakers (100 Male and 100 Female) and devoted totrain the Universal
Background Model (UBM), the second and the third data sets are composed of the same set of 153 client
speakers.The first dataset comprises the training speech data, whereas the second comprises the testing
speech data of the 153 speakers.During the testing phase of the system, the client speakers were tested
against each other,resulting in total of 896,886 trials of 4 seconds and 441,558 trials of 8 seconds.
The feature vectors of the overall speech utterances were extracted using the MFCC approach: the
digitized speech is firstly emphasized using a simple first order digital filter with transfer function H (z) = 1 –
0.95z. Next, the emphasized speech signal is blocked into Hamming-windowed frames of 25 ms (400
samples) in length with 10 ms (160 samples) overlap between any two adjacent frames.Finally, 19 Mel-
Frequency Cepstral Coefficients were extracted from each frame [15]. During the training phase, a universal
background model (UBM) of 1024 Gaussian components was trained on the overall training data (7.5 hours
of speech) using the EM algorithm.
4.2. Investigation on the Degree of Difference in Acoustic Classes between Speakers
Theaim of the performed experiments in this sectionisto investigate the degree of difference in
acoustic classesbetween speakers. For this purpose, we have carried out several speaker verification
experiments in which we have basedonly on the difference between the acoustic classes present in the target
speaker’s speech and those present in the claimed speaker’s speech. To proceed, we have represented the
training speech utterance(s) of the target speaker and the test speech utterance of the claimed speaker by
histograms. Each histogramis composed of 1024 bins, whereeach bin represents aUBM Gaussian component.
𝐿𝑅(𝑋 𝑢
; 𝜆 𝑠𝑝𝑘, 𝜆 𝑃𝐵𝑀) =
1
𝑁
[∑ 𝑙𝑜𝑔 𝑝(𝑥 𝑢
𝑖|𝜆 𝑠𝑝𝑘)
𝑁
𝑖=1
− ∑ 𝑙𝑜𝑔 𝑃(𝑥 𝑢
𝑖|𝜆 𝑃𝐵𝑀)
𝑁
𝑖=1
] (10)
With, 𝑝(𝑥𝑡|𝜆) = ∑ 𝑤𝑖 𝑏𝑖(𝑥𝑡)𝑀
𝑖=1 (11)
IJECE ISSN: 2088-8708 
Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane)
3661
The value of each bin is defined as the number of times that the corresponding UBM Gaussian component
has the maximum posterior probability over the feature vectors of the speaker. The specifics of the histogram
construction process are shown in Algorithm 1.
Algorithm 1 The histogram construction process
Input:The feature vectorsX = {x1, x2, … , xN} of the speech utterance,
The universal background model λUBM = {μUBMi
, σUBMi
, wUBMi
}/ i ∈ {1,2, … , N}.
Output: The corresponding histogram ℋof the speech utterance.
ℋ = 𝑧𝑒𝑟𝑜𝑠(1,1024)
𝐅𝐎𝐑 𝐄𝐀𝐂𝐇 𝒙𝒊 𝐈𝐍 𝐗
𝑗 = 𝑎𝑟𝑔 𝑚𝑎𝑥
𝑖
𝑃𝑟(𝑖|𝑥𝑡)
ℋ(𝑗) = ℋ(𝑗) + 1
END
Once the histogram of the target and the claimed speakers are constructed, ℋT and ℋC respectively, the
comparison between themis done using the Bhattacharyya distance:
D(ℋT, ℋC) = − log ∑ √ℋT(𝑖). ℋC(𝑖)
1024
𝑖=1
(12)
The computed distance D(ℋ 𝑇, ℋC) is then used to make a decision about the acceptance or the
rejection of the claimed speaker.The obtained results of the performed experiments, while varying the
amount of training and testing speech data, are shown in Table1.
Table 1. The obtained EERs using several amount of training and testing speech data.
Amount of Enrollment Speech Data
20 seconds 40 seconds 60 seconds
Amount of
Testing
Speech Data
4 seconds 6.38 5.90 4.64
8 seconds 5.86 5.31 3.98
First and foremost, as it can be seen from Table 1, the obtained results are highly encouraging. The
lower obtainedequal error rates, based only on the difference in hidden acoustic classes between speakers,
reflect a great difference in those hidden acoustic classes between speakers. Additionally, it appears that each
increase in the amount of training or testing speech data is translated into better verification performance.This
proportional relationshipbetween the amounts of speech data and the performance of the system reflects the
fact that the overall acoustic classes of a speaker cannot be assembled in its pronunciation of one or two
utterances.
4.3. Assessment of the Proposed GMM-PBM Approach Compared to the Traditional GMM-UBM
Approach
The performed experiments in this section attemptto assess the performance of the proposed GMM-
PBM based speaker modeling approach, compared to the traditional GMM-UBM based approach. Hence,
various experiments were carried out using the two approaches while varying the amount of training and
testing speech data. The obtained results are illustrated in Figure 6.
The experimental results show, across the various amounts of training and testing speech data that
our proposed approach has achieved a better verification performance compared to the traditional UBM
based approach. Even in little amounts of training speech data, where the speaker’s acoustic classes may not
be all present, our proposed approach demonstrates its enhanced verification performance as compared with
the UBM based approach. Furthermore, we can see that the obtained performance under the UBM based
system using test utterances of 8 seconds was obtained under our proposed PBM based system using test
utterances of only 4 seconds. Moreover, it can be seen that the relative error reduction was doubled when we
have doubled the amount of testing utterances.
In addition to its performance advantage, the proposed approach can significantly reduce the CPU
time required for speaker verification during the testing phase of the system compared to the traditional
system, see Figure 7. In fact, the derivation of speakers’ models from personalized background models
reduces the order of their adapted models, which consequently, reduces theirstorage and computational costs.
 ISSN:2088-8708
IJECE Vol. 7, No. 6, December 2017 : 3655–3663
3662
Figure 6. The obtained Equal Error Rates using test utterances of 4 seconds (left figure) and 8 seconds (right
figure)
Figure 7. CPU time required for speaker verification using the UBM based approach and the PBM based
approach
5. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
The aim of the present study was two-fold. The first aim was to investigate the degree of difference
in hidden acoustic-classes between speakers. The second aim was to propose a novel speaker modeling
approach that exploits this difference to improve the performance of speaker recognition systems. The
findings of the study revealed that there is a great difference in hidden acoustic classes between speakers.
Additionally, the evaluation of the proposed approach demonstrates its efficiency in terms of both
verification performance and computational cost during the verification phase of the system, compared to the
traditional approach.Future researchwill concentrate on applying the proposed approach within the hybrid
GMM-SVM and the i-vectors based systems.
REFERENCES
[1] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models”,
Digit. Signal Process, vol. 10, no. 1, pp. 19–41, Jan. 2000.
[2] D. Reynolds, “Universal Background Models”, in Encyclopedia of Biometrics, S. Z. Li and A. K. Jain, Eds.
Springer US, 2015, pp. 1547–1550.
[3] T. Kinnunen and H. Li, “An overview of text-independent speaker recognition: From features to supervectors”,
Speech Commun., vol. 52, no. 1, pp. 12–40, Jan. 2010.
[4] N. Dehak and G. Chollet, “Support vector GMMs for speaker verification”, in 2006 IEEE Odyssey-The Speaker
and Language Recognition Workshop, 2006, pp. 1–4.
2
2.3
2.6
2.9
3.2
3.5
3.8
20 seconds 40 seconds 60 seconds
Amount of Enrollment speech data
GMM-UBM system
GMM-PBM system
EqualErrorRate(EER)
2
2.3
2.6
2.9
3.2
3.5
3.8
20 seconds 40 seconds 60 seconds
Amount of Enrollment speech data
GMM-UBM system
GMM-PBM system
EqualErrorRate(EER)
0
0.05
0.1
0.15
0.2
0.25
20 seconds 40 seconds 60 seconds
Amount of Enrollment speech data
GMM-UBM based system GMM-PUBM based system
CPUtime(inseconds)
IJECE ISSN: 2088-8708 
Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane)
3663
[5] W. M. Campbell, D. E. Sturim, and D. A. Reynolds, “Support vector machines using GMM supervectors for
speaker verification”, IEEE Signal Process. Lett., vol. 13, no. 5, pp. 308–311, mai 2006.
[6] N. Dehak et al., “Support vector machines and Joint Factor Analysis for speaker verification”, in 2009 IEEE
International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 4237–4240.
[7] N. Dehak, R. Dehak, P. Kenny, and P. Dumouchel, “Comparison between factor analysis and GMM support vector
machines for speaker verification”, in Odyssey, 2008, p. 9.
[8] P. Kenny, “Joint factor analysis of speaker and session variability: Theory and algorithms”, CRIM MontrealReport
CRIM-0608-13, 2005.
[9] P. Kenny, “A small footprint i-vector extractor”, in Odyssey, 2012, pp. 1–6.
[10] D. Wu, J. Cao, and H. Wang, “Speaker Recognition Based on i-vector and Improved Local Preserving Projection”,
Indones. J. Electr. Eng. Comput. Sci., vol. 12, no. 6, pp. 4299–4305, Jun. 2014.
[11] H. Beigi, Fundamentals of Speaker Recognition. Springer, 2011.
[12] D. A. Reynolds, “Comparison of background normalization methods for text-independent speaker verification”, in
Eurospeech, 1997.
[13] D. Reynolds, “Gaussian Mixture Models”, in Encyclopedia of Biometrics, S. Z. Li and A. K. Jain, Eds. Boston,
MA: Springer US, 2015, pp. 827–832.
[14] A. Rozi, D. Wang, Z. Zhang, and T. F. Zheng, “An open/free database and Benchmark for Uyghur speaker
recognition”, in Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and
Evaluation (O-COCOSDA/CASLRE), 2015 International Conference, 2015, pp. 81–85.
[15] B. Ayoub, K. Jamal, and Z. Arsalane, “An analysis and comparative evaluation of MFCC variants for speaker
identification over VoIP networks”, in 2015 World Congress on Information Technology and Computer
Applications Congress (WCITCA), 2015, pp. 1–6.
BIOGRAPHIES OF AUTHORS
Ayoub Bouziane received the M.Sc. degree in intelligent systems and networks from the Faculty
of Sciences and Technologies, Fez, Morocco, in 2012. He is currently pursuing the Ph.D. degree
in the intelligent systems and applications laboratory, Sidi Mohammed ben Abdullah University,
Morocco. His research interests cover signal processing and machine learning, mostly with
applications in automatic speaker recognition. In parallel with his research activities, he teaches
undergraduate level courses in computer science, at the Faculty of Sciences and Technologies,
Fez. Additionally, He is a member of IEEE Signal Processing Society, IEEE Computational
Intelligence Society, IEEE Computer Society and the International Speech and Communication
Association (ISCA).
Jamal Kharroubi has his B.Sc. in Computer Sciencefrom Sidi Mohamed Ben Abdellah
University (Fez-Morocco) in 1996. Two years after, he got his postgraduate degree in the
domain of Artificial Intelligence from Galilee’s Institute - Paris XIII University. In 2002,
Hereceived his Ph.D. degree in automatic speaker recognition systems from Telecom Paris Tech
(Ecole Nationale Supérieure des Télécommunications de Paris-France)”. Since January 2003, he
isan associate professor in the Department of Computer Scienceat the Faculty of Science and
Technology. In 2008, he received his HDR diploma (Habilitation to conduct research).
Additionally, He iscurrently the coordinator of the Master of Intelligent Systems and Networks.
Moreover, he is the author of more than thirty publications in peer-reviewed scientific journals &
conference proceedings. His research interests are focused onsignal andimage processing, pattern
recognition, etc.
Arsalane Zarghili is a Doctor of Sciencefrom Sidi Mohamed Ben Abdellah University (Fez-
Morocco). He received his Ph.D. in 2001 and joined the same University in 2002 as Professor at
thecomputer science department of the Faculty of Science and Technology (FST). In 2007 he
was head of the computer sciences department andchair of the Software Quality Master in the
FST-Fez. Helectures Programming, Distributed, compilation and Information processing, for
both under graduate and masterlevels. In 2008 he obtained his HDR in information processing.
In 2011, he is the co-founder and the head of the Laboratory of Intelligent Systems and
Applications in the FST of Fez. He is amember of the steering committee of the department of
computer sciences and was a member of the faculty board. He is also IEEE member since 2011.
His main research is about pattern recognition, image indexing and retrieval systems incultural
heritage, biometric, etc.

More Related Content

Similar to Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models

Speaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMSpeaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMinventionjournals
 
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
Probabilistic Self-Organizing Maps for Text-Independent Speaker IdentificationProbabilistic Self-Organizing Maps for Text-Independent Speaker Identification
Probabilistic Self-Organizing Maps for Text-Independent Speaker IdentificationTELKOMNIKA JOURNAL
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...IJECEIAES
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Priyanka Reddy
 
A Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformA Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformCSCJournals
 
High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...IJECEIAES
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYIJCERT
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Waqas Tariq
 
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...sipij
 
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_Mike Zhang
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionIDES Editor
 
Improvement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodImprovement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...IJECEIAES
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...IRJET Journal
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...CSCJournals
 
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...sipij
 

Similar to Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models (20)

Speaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBMSpeaker Identification based on GFCC using GMM-UBM
Speaker Identification based on GFCC using GMM-UBM
 
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
Probabilistic Self-Organizing Maps for Text-Independent Speaker IdentificationProbabilistic Self-Organizing Maps for Text-Independent Speaker Identification
Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
 
A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...A novel automatic voice recognition system based on text-independent in a noi...
A novel automatic voice recognition system based on text-independent in a noi...
 
J016577185
J016577185J016577185
J016577185
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
V4101134138
V4101134138V4101134138
V4101134138
 
A Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak TransformA Text-Independent Speaker Identification System based on The Zak Transform
A Text-Independent Speaker Identification System based on The Zak Transform
 
High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...Parameters Optimization for Improving ASR Performance in Adverse Real World N...
Parameters Optimization for Improving ASR Performance in Adverse Real World N...
 
L046056365
L046056365L046056365
L046056365
 
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
Speech Recognition Using HMM with MFCC-An Analysis Using Frequency Specral De...
 
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_
A Novel Parallel Model Method for Noise Speech Recognition_正式投稿_
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
 
Improvement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation methodImprovement of minimum tracking in Minimum Statistics noise estimation method
Improvement of minimum tracking in Minimum Statistics noise estimation method
 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
 
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...
Image Filtering Using all Neighbor Directional Weighted Pixels: Optimization ...
 

More from IJECEIAES

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...IJECEIAES
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...IJECEIAES
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...IJECEIAES
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningIJECEIAES
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...IJECEIAES
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...IJECEIAES
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...IJECEIAES
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...IJECEIAES
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyIJECEIAES
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationIJECEIAES
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...IJECEIAES
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...IJECEIAES
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...IJECEIAES
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...IJECEIAES
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...IJECEIAES
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...IJECEIAES
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...IJECEIAES
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...IJECEIAES
 

More from IJECEIAES (20)

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...
 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regression
 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learning
 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...
 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...
 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...
 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...
 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancy
 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimization
 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performance
 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...
 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...
 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...
 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...
 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...
 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
 

Recently uploaded

(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 

Recently uploaded (20)

(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 

Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 7, No. 6, December 2017, pp. 3655~3663 ISSN: 2088-8708, DOI: 10.11591/ijece.v7i6.pp3655-3663  3655 Journal homepage: http://iaesjournal.com/online/index.php/IJECE Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models Ayoub Bouziane, Jamal Kharroubi, Arsalane Zarghili Laboratory of Intelligent Systems and Applications, University Sidi Mohamed Ben Abdellah Fez, Morocco. Article Info ABSTRACT Article history: Received Mar 19, 2017 Revised Jun 29, 2017 Accepted Jul 14, 2017 This paper presents a novel speaker modeling approachfor speaker recognition systems. The basic idea of this approach consists of deriving the target speaker model from a personalized background model, composed only of the UBM Gaussian components which are really present in the speech of the target speaker. The motivation behind the derivation of speakers’ models from personalized background models is to exploit the observeddifference insome acoustic-classes between speakers, in order to improve the performance of speaker recognition systems. The proposed approach was evaluatedfor speaker verification task using various amounts of training and testing speech data. The experimental results showed that the proposed approach is efficientin termsof both verification performance and computational cost during the testing phase of the system, compared to the traditional UBM based speaker recognition systems. Keyword: Speaker modeling Speaker verification Gaussian mixture models Universal background model Maximum a posteriori (MAP) Personalized background models Copyright © 2017Institute of Advanced Engineering and Science. All rights reserved. Corresponding Author: Ayoub Bouziane, Laboratory of Intelligent Systems and Applications, Sidi Mohamed Ben Abdellah University, P.B: 2202 Immouzer Road, Fez, Morocco. Email: ayoub.bouziane@usmba.ac.ma 1. INTRODUCTION The GMM-UBM based speaker modeling approach was firstly introduced to the speaker recognition community by Reynolds, in 2000 [1]–[3]. Since then, it have become the predominant approach for speaker modeling in text-independent speaker recognition systems, and the basis of the most successful approaches that have been emerged in the last decade: the hybrid GMM-SVM approach [4], [5], the joint factor analysis approach [6]–[8], and the recently introduced i-vectors approach [9], [10]. The motivation behind the use of adapted Gaussian mixture models for speaker modeling is generally based on the assumption that Gaussian densities may model a set of hidden acoustical classes that reflect some general speaker dependent vocal tract characteristics. The main idea of the UBM-based speaker recognition systems consist of deriving the target speaker model from a universal background model that represents the set of all human acoustic classes (e.g., gender dependent acoustic classes, age-dependent acoustic classes, accent dependent acoustic classes…). The target speaker modelwill therefore be composed of an adapted version of the various acoustic classes defined in the universal background model. However, as it sounds to the human ear, speakers don’t share the same acoustic classes. For example, the acoustic classes of male speakers are different from those of female speakers and the acoustic classes of adult speakers are different from those of minor speakers, as well as, the acoustic classes of timid speakers are different from those of bold speakers, etc. Thereby, it seems that it is not logical to model a speaker by an adapted version of anacoustic class which may not exist in its voice. Furthermore, the difference of acoustic classes between speakers can be exploited to discriminate between them in speaker recognition systems.
  • 2.  ISSN:2088-8708 IJECE Vol. 7, No. 6, December 2017 : 3655–3663 3656 In view of these observations, the present study attempts, firstly,to experimentally investigate the degree of acoustic-classes’difference between speakers,and secondly, to propose a speaker modeling approach that takes into account these observations and exploits the differences in acoustic classes between speakers to improve the performance of speaker recognition systems. The remainder of this paper is organized as follows. The first section gives a brief overview of the traditional GMM-UBM based Speaker verification systems. Next, the proposed speaker modeling approach, based on personalized background models, is introduced. Afterward, experimental resultsand discussions are presented in the third section. Finally, conclusions and future research directions are drawn in the last section. 2. TRADITIONAL GMM-UBM BASED SPEAKER VERIFICATION SYSTEMS The GMM-UBM based speaker verification system was originally proposed by Reynolds in 2000 [1], [11]. Since then, it has become the predominant approach for speaker modeling in text-independent speaker recognition systems and the basis of the most successful approaches that have been emerged in the last decade. The main idea of the GMM-UBM approach consists, as shown in Figure 1, of deriving speakers’ models from a universal background model using maximum a posteriori (MAP) adaptation [1]. The Universal Background Model (UBM) is typically a Gaussian mixture model (GMM) that represents the distribution of the entire acoustic space of speech [2], [3]. Figure 1. A MAP adaptation of a GMM comprises 5 Gaussian densities. Original Gaussian densities of the UBM are depicted as unfilled ellipses (dotted line), whereas the adapted Gaussian densities are denoted by filled ellipses, and the observed feature vectors are depicted as small circles The MAP adaptation process is generally composed of two steps. First, the sufficient statistics estimates of the speaker’s training data are computed for each mixture in the UBM. Next, the computed sufficient statistics estimates are combined with the old sufficient statistics from the UBM mixture parameters using a data-dependent mixing coefficient. The specifics of the adaptation are as follows. Given that the universal background model is composed of M Gaussian components, each of which is parameterized by a mean vectorμi, variance matrix σ2 i and its weightwiin the mixture model. Initially, the posteriori probability of each UBM component {μi, σi, wi} given the feature vector xt is computed as follows: 𝑃𝑟(𝑖|𝑥𝑡) = 𝑤𝑖 𝑝𝑖(𝑥𝑡) ∑ 𝑤𝑖 𝑝𝑗(𝑥𝑡)𝐿 𝑗=1 (1) where pi(xt) is the density of the probability, given by: 𝑝𝑖(𝑥𝑡) = 1 (2𝜋) 𝐷 2⁄ |𝜎𝑖| 1 2⁄ 𝑒𝑥𝑝 {− 1 2 (𝑥𝑡 − 𝜇𝑖) 𝑇 𝜎𝑖 −1 (𝑥𝑡 − 𝜇𝑖)} (2) The sufficient statistics for the weight, mean, and variance parameters are then computed as follows: Universal Background Model Universal Background Model Specific Speaker ModelTesting phase MAP Adaptation
  • 3. IJECE ISSN: 2088-8708  Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane) 3657 𝑛𝑖 = ∑ 𝑃𝑟(𝑖|𝑥𝑡) 𝑇 𝑡=1 ; 𝐸𝑖(𝑥) = 1 𝑛𝑖 ∑ 𝑃𝑟(𝑖|𝑥𝑡) 𝑇 𝑡=1 𝑥𝑡 ; 𝐸𝑖(𝑥2 ) = 1 𝑛𝑖 ∑ 𝑃𝑟(𝑖|𝑥𝑡) 𝑇 𝑡=1 𝑥𝑡 2 (3) Thereafter, the computed sufficient statistics are used for estimating the adapted mixture weights wî, means μî and variances σî of the given speaker: 𝜇𝑖̂ = [𝛽𝑖 𝜇 𝐸𝑖(𝑥) + (1 − 𝛽𝑖 𝜇 )𝜇𝑖] (4) 𝑤𝑖̂ = [𝛽𝑖 𝑤 𝑛𝑖)/𝑇 + (1 − 𝛽𝑖 𝑤 )𝑤𝑖]𝛾 (4) 𝜎2̂ 𝑖 = [𝛽𝑖 𝜎2 𝐸𝑖(𝑥2) + (1 − 𝛽𝑖 𝜎2 )(𝜎2 𝑖 − 𝛽𝑖 𝜎2 ) − 𝜇𝑖̂2 ] (5) with, 𝛽𝑖 𝜌 = ni/(ni + 𝑟 𝜌 ), ρ ∈ {w, 𝜇, 𝜎2 } (6) Here, γ is a scale factor computed over all adapted mixture weights to ensure that they sum to unity, βi ρ , ρ ∈ {w, μ, σ2} are the adaptation coefficients that control how the adapted GMM parameters will be affected by the observed speaker data, and rρ is a fixed relevance factor for parameter ρ. An overall diagram of the GMM-UBM based speaker verification system is shown in Figure 2. The basic operating structure of the system, as shown in Figure 2, is composed of three phases: the training phase, the enrollment phase and the testing phase. During the first phase, i.e. the training phase, a large collection of speech utterances is collected from a background population of speakers, their corresponding feature vectors are extracted and used to train the universal background model. The training process of the UBM is done generally using maximum likelihood estimation via the EM algorithm. In the second phase, i.e. the enrollment phase, speaker models of new client speakers are derived from the universal background model through MAP adaptation using the speakers’ training feature vectors [13]. While in the testing phase, the extracted feature vectors of the unknown speaker’s utterance Xu = {xu 1, xu 2, . . ., xu N} are compared against both the claimed target speaker model and the background model. The log likelihood ratio LLR(Xu ; λspk, λUBM) between the claimed speaker model and the universal background model is then calculated and used to make a decision about the acceptance/rejection of the claimed identity. The log likelihood ratio (LLR) of the test utterance Xu between the speaker model λj and the UBM model λUBM: 𝐿𝑅(𝑋 𝑢 ; 𝜆 𝑠𝑝𝑘, 𝜆 𝑈𝐵𝑀) = 1 𝑁 [∑ 𝑙𝑜𝑔 𝑝(𝑥 𝑢 𝑖|𝜆 𝑠𝑝𝑘) 𝑁 𝑖=1 − ∑ 𝑙𝑜𝑔 𝑃(𝑥 𝑢 𝑖|𝜆 𝑈𝐵𝑀) 𝑁 𝑖=1 ] (7) With, 𝑝(𝑥𝑡|𝜆) = ∑ 𝑤𝑖 𝑏𝑖(𝑥𝑡)𝑀 𝑖=1 (8)
  • 4.  ISSN:2088-8708 IJECE Vol. 7, No. 6, December 2017 : 3655–3663 3658 Figure 2. Block diagram of the GMM-UBM based speaker verification system 3. SPEAKER RECOGNITION USING PERSONALIZED BACKGROUND MODELS (PBMS) The main idea of the proposed PBM-based speaker modeling approach consistsof adapting the target speaker model from a personalized background model (PBM), composed only of the UBM Gaussian components which are actually present in the speaker’s speech. The MAP adaptation step of traditional UBM based systems will, therefore, be preceded by a selection step that selectsthe background Gaussian component swhich reflect the general form of the acoustic classes characterizing the speaker, see Figure 3. Figure 3. A MAP adaptation of a GMM comprises 5 Gaussian densities. Original Gaussian densities of the UBM are depicted as unfilled ellipses (dotted line), whereas the adapted Gaussian densities are denoted by filled ellipses, and the observed feature vectors are depicted as small circles. Given a UBM of M Gaussian component λUBM = {μUBMi , σUBMi , wUBMi }/ i ∈ {1,2, … , N}, and M training feature vectors X = {x1, x2, … , xN}, extracted from the target speaker’s speech.The PBMGaussian components are generally chosen from the UBM using a winner-take-all based strategy. A UBM Gaussian
  • 5. IJECE ISSN: 2088-8708  Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane) 3659 component θUBM 𝑖 = (μUBM 𝑖 , σUBM 𝑖 , wUBM 𝑖 ) isselected to belong to the personalized background model λPBM of the target speaker, if there is at least one feature vector xn ∈ X, where the UBM Gaussian component θUBMi achieves the maximum posterior probability of xn belongingness: 𝑃𝑟(θ 𝑈𝐵𝑀 𝑖 |𝑥n) ≥ 𝑃𝑟(θ 𝑈𝐵𝑀 𝑙 |𝑥n) , ∀𝑙 ∈ {1,2, … , N} (9) Once the PBM Gaussian components are selected, the weights w 𝑈𝐵𝑀j of the selected components are divided by their sum so that the total weight is equal to unity. A block diagram of the speaker modeling process using personalized background models is shown in Figure 4. Firstly, the training feature vectors of the target speaker are extracted from itsenrollment utterances. Next, the extracted feature vectors are used to select the UBM Gaussian components which will compose the speaker’s personalized background model.Afterwards, the composed personalized background model is utilized do derive the speaker model using the MAP adaptation procedure. Finally, the adapted model is stored together with the corresponding indices of the PBM Gaussian components in the UBM. An example of a two-dimensional projectionof two speakers’ features and the means of their corresponding UBM and PBM adapted modelsis shown in Figure 5. As it can be seen from this figure, the means of the PBM adapted models fit the speakers’ features better than the means of the traditional UBM adapted models. Moreover, it seems that the adaptation of the UBM Gaussian components which haven't any relationship with the target speakerinfluence on the feature vectors belongingness to the appropriate Gaussian components, which therefore affect negatively the adapted model. Figure 4. Block diagram of speaker modeling process using personalized background models The speaker’s feature vectors The Universal Background Model The UBM adapted model The PBM adapted model
  • 6.  ISSN:2088-8708 IJECE Vol. 7, No. 6, December 2017 : 3655–3663 3660 Figure 5. A two-dimensional projection of two speakers’ features (blue points), the means of their corresponding UBM and PBM adapted models (in green and yellow points, respectively) and the means of the UBM model (red points) During the test phase, the log-likelihood ratio LLR(Xu ; λspk, λPBM) between the claimed speaker model and the personalized background model is used to make a decision about the acceptance or the rejection of the claimed identity: The motivation behind the use of the personalized background modelinstead of the universal background model is to penalize the decision score of impostor speakers who don’t share the same acoustic classes with the target speaker. 4. EXPERIMENTS, RESULTS AND DISCUSSION 4.1. The Experimental Protocol The performed experiments in this study were conducted on the THUYG-20 SRE database [14]. This database isgenerally composed of 353 speakers, collected in a controlled environment (silent office by the samecarbon Microphone). The entire speech corpuswas divided into three data sets: the first dataset consists of 200 genderbalanced speakers (100 Male and 100 Female) and devoted totrain the Universal Background Model (UBM), the second and the third data sets are composed of the same set of 153 client speakers.The first dataset comprises the training speech data, whereas the second comprises the testing speech data of the 153 speakers.During the testing phase of the system, the client speakers were tested against each other,resulting in total of 896,886 trials of 4 seconds and 441,558 trials of 8 seconds. The feature vectors of the overall speech utterances were extracted using the MFCC approach: the digitized speech is firstly emphasized using a simple first order digital filter with transfer function H (z) = 1 – 0.95z. Next, the emphasized speech signal is blocked into Hamming-windowed frames of 25 ms (400 samples) in length with 10 ms (160 samples) overlap between any two adjacent frames.Finally, 19 Mel- Frequency Cepstral Coefficients were extracted from each frame [15]. During the training phase, a universal background model (UBM) of 1024 Gaussian components was trained on the overall training data (7.5 hours of speech) using the EM algorithm. 4.2. Investigation on the Degree of Difference in Acoustic Classes between Speakers Theaim of the performed experiments in this sectionisto investigate the degree of difference in acoustic classesbetween speakers. For this purpose, we have carried out several speaker verification experiments in which we have basedonly on the difference between the acoustic classes present in the target speaker’s speech and those present in the claimed speaker’s speech. To proceed, we have represented the training speech utterance(s) of the target speaker and the test speech utterance of the claimed speaker by histograms. Each histogramis composed of 1024 bins, whereeach bin represents aUBM Gaussian component. 𝐿𝑅(𝑋 𝑢 ; 𝜆 𝑠𝑝𝑘, 𝜆 𝑃𝐵𝑀) = 1 𝑁 [∑ 𝑙𝑜𝑔 𝑝(𝑥 𝑢 𝑖|𝜆 𝑠𝑝𝑘) 𝑁 𝑖=1 − ∑ 𝑙𝑜𝑔 𝑃(𝑥 𝑢 𝑖|𝜆 𝑃𝐵𝑀) 𝑁 𝑖=1 ] (10) With, 𝑝(𝑥𝑡|𝜆) = ∑ 𝑤𝑖 𝑏𝑖(𝑥𝑡)𝑀 𝑖=1 (11)
  • 7. IJECE ISSN: 2088-8708  Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane) 3661 The value of each bin is defined as the number of times that the corresponding UBM Gaussian component has the maximum posterior probability over the feature vectors of the speaker. The specifics of the histogram construction process are shown in Algorithm 1. Algorithm 1 The histogram construction process Input:The feature vectorsX = {x1, x2, … , xN} of the speech utterance, The universal background model λUBM = {μUBMi , σUBMi , wUBMi }/ i ∈ {1,2, … , N}. Output: The corresponding histogram ℋof the speech utterance. ℋ = 𝑧𝑒𝑟𝑜𝑠(1,1024) 𝐅𝐎𝐑 𝐄𝐀𝐂𝐇 𝒙𝒊 𝐈𝐍 𝐗 𝑗 = 𝑎𝑟𝑔 𝑚𝑎𝑥 𝑖 𝑃𝑟(𝑖|𝑥𝑡) ℋ(𝑗) = ℋ(𝑗) + 1 END Once the histogram of the target and the claimed speakers are constructed, ℋT and ℋC respectively, the comparison between themis done using the Bhattacharyya distance: D(ℋT, ℋC) = − log ∑ √ℋT(𝑖). ℋC(𝑖) 1024 𝑖=1 (12) The computed distance D(ℋ 𝑇, ℋC) is then used to make a decision about the acceptance or the rejection of the claimed speaker.The obtained results of the performed experiments, while varying the amount of training and testing speech data, are shown in Table1. Table 1. The obtained EERs using several amount of training and testing speech data. Amount of Enrollment Speech Data 20 seconds 40 seconds 60 seconds Amount of Testing Speech Data 4 seconds 6.38 5.90 4.64 8 seconds 5.86 5.31 3.98 First and foremost, as it can be seen from Table 1, the obtained results are highly encouraging. The lower obtainedequal error rates, based only on the difference in hidden acoustic classes between speakers, reflect a great difference in those hidden acoustic classes between speakers. Additionally, it appears that each increase in the amount of training or testing speech data is translated into better verification performance.This proportional relationshipbetween the amounts of speech data and the performance of the system reflects the fact that the overall acoustic classes of a speaker cannot be assembled in its pronunciation of one or two utterances. 4.3. Assessment of the Proposed GMM-PBM Approach Compared to the Traditional GMM-UBM Approach The performed experiments in this section attemptto assess the performance of the proposed GMM- PBM based speaker modeling approach, compared to the traditional GMM-UBM based approach. Hence, various experiments were carried out using the two approaches while varying the amount of training and testing speech data. The obtained results are illustrated in Figure 6. The experimental results show, across the various amounts of training and testing speech data that our proposed approach has achieved a better verification performance compared to the traditional UBM based approach. Even in little amounts of training speech data, where the speaker’s acoustic classes may not be all present, our proposed approach demonstrates its enhanced verification performance as compared with the UBM based approach. Furthermore, we can see that the obtained performance under the UBM based system using test utterances of 8 seconds was obtained under our proposed PBM based system using test utterances of only 4 seconds. Moreover, it can be seen that the relative error reduction was doubled when we have doubled the amount of testing utterances. In addition to its performance advantage, the proposed approach can significantly reduce the CPU time required for speaker verification during the testing phase of the system compared to the traditional system, see Figure 7. In fact, the derivation of speakers’ models from personalized background models reduces the order of their adapted models, which consequently, reduces theirstorage and computational costs.
  • 8.  ISSN:2088-8708 IJECE Vol. 7, No. 6, December 2017 : 3655–3663 3662 Figure 6. The obtained Equal Error Rates using test utterances of 4 seconds (left figure) and 8 seconds (right figure) Figure 7. CPU time required for speaker verification using the UBM based approach and the PBM based approach 5. CONCLUSION AND FUTURE RESEARCH DIRECTIONS The aim of the present study was two-fold. The first aim was to investigate the degree of difference in hidden acoustic-classes between speakers. The second aim was to propose a novel speaker modeling approach that exploits this difference to improve the performance of speaker recognition systems. The findings of the study revealed that there is a great difference in hidden acoustic classes between speakers. Additionally, the evaluation of the proposed approach demonstrates its efficiency in terms of both verification performance and computational cost during the verification phase of the system, compared to the traditional approach.Future researchwill concentrate on applying the proposed approach within the hybrid GMM-SVM and the i-vectors based systems. REFERENCES [1] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models”, Digit. Signal Process, vol. 10, no. 1, pp. 19–41, Jan. 2000. [2] D. Reynolds, “Universal Background Models”, in Encyclopedia of Biometrics, S. Z. Li and A. K. Jain, Eds. Springer US, 2015, pp. 1547–1550. [3] T. Kinnunen and H. Li, “An overview of text-independent speaker recognition: From features to supervectors”, Speech Commun., vol. 52, no. 1, pp. 12–40, Jan. 2010. [4] N. Dehak and G. Chollet, “Support vector GMMs for speaker verification”, in 2006 IEEE Odyssey-The Speaker and Language Recognition Workshop, 2006, pp. 1–4. 2 2.3 2.6 2.9 3.2 3.5 3.8 20 seconds 40 seconds 60 seconds Amount of Enrollment speech data GMM-UBM system GMM-PBM system EqualErrorRate(EER) 2 2.3 2.6 2.9 3.2 3.5 3.8 20 seconds 40 seconds 60 seconds Amount of Enrollment speech data GMM-UBM system GMM-PBM system EqualErrorRate(EER) 0 0.05 0.1 0.15 0.2 0.25 20 seconds 40 seconds 60 seconds Amount of Enrollment speech data GMM-UBM based system GMM-PUBM based system CPUtime(inseconds)
  • 9. IJECE ISSN: 2088-8708  Towards an Optimal Speaker Modeling in Speaker Verification Systems … (Ayoub Bouziane) 3663 [5] W. M. Campbell, D. E. Sturim, and D. A. Reynolds, “Support vector machines using GMM supervectors for speaker verification”, IEEE Signal Process. Lett., vol. 13, no. 5, pp. 308–311, mai 2006. [6] N. Dehak et al., “Support vector machines and Joint Factor Analysis for speaker verification”, in 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 4237–4240. [7] N. Dehak, R. Dehak, P. Kenny, and P. Dumouchel, “Comparison between factor analysis and GMM support vector machines for speaker verification”, in Odyssey, 2008, p. 9. [8] P. Kenny, “Joint factor analysis of speaker and session variability: Theory and algorithms”, CRIM MontrealReport CRIM-0608-13, 2005. [9] P. Kenny, “A small footprint i-vector extractor”, in Odyssey, 2012, pp. 1–6. [10] D. Wu, J. Cao, and H. Wang, “Speaker Recognition Based on i-vector and Improved Local Preserving Projection”, Indones. J. Electr. Eng. Comput. Sci., vol. 12, no. 6, pp. 4299–4305, Jun. 2014. [11] H. Beigi, Fundamentals of Speaker Recognition. Springer, 2011. [12] D. A. Reynolds, “Comparison of background normalization methods for text-independent speaker verification”, in Eurospeech, 1997. [13] D. Reynolds, “Gaussian Mixture Models”, in Encyclopedia of Biometrics, S. Z. Li and A. K. Jain, Eds. Boston, MA: Springer US, 2015, pp. 827–832. [14] A. Rozi, D. Wang, Z. Zhang, and T. F. Zheng, “An open/free database and Benchmark for Uyghur speaker recognition”, in Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2015 International Conference, 2015, pp. 81–85. [15] B. Ayoub, K. Jamal, and Z. Arsalane, “An analysis and comparative evaluation of MFCC variants for speaker identification over VoIP networks”, in 2015 World Congress on Information Technology and Computer Applications Congress (WCITCA), 2015, pp. 1–6. BIOGRAPHIES OF AUTHORS Ayoub Bouziane received the M.Sc. degree in intelligent systems and networks from the Faculty of Sciences and Technologies, Fez, Morocco, in 2012. He is currently pursuing the Ph.D. degree in the intelligent systems and applications laboratory, Sidi Mohammed ben Abdullah University, Morocco. His research interests cover signal processing and machine learning, mostly with applications in automatic speaker recognition. In parallel with his research activities, he teaches undergraduate level courses in computer science, at the Faculty of Sciences and Technologies, Fez. Additionally, He is a member of IEEE Signal Processing Society, IEEE Computational Intelligence Society, IEEE Computer Society and the International Speech and Communication Association (ISCA). Jamal Kharroubi has his B.Sc. in Computer Sciencefrom Sidi Mohamed Ben Abdellah University (Fez-Morocco) in 1996. Two years after, he got his postgraduate degree in the domain of Artificial Intelligence from Galilee’s Institute - Paris XIII University. In 2002, Hereceived his Ph.D. degree in automatic speaker recognition systems from Telecom Paris Tech (Ecole Nationale Supérieure des Télécommunications de Paris-France)”. Since January 2003, he isan associate professor in the Department of Computer Scienceat the Faculty of Science and Technology. In 2008, he received his HDR diploma (Habilitation to conduct research). Additionally, He iscurrently the coordinator of the Master of Intelligent Systems and Networks. Moreover, he is the author of more than thirty publications in peer-reviewed scientific journals & conference proceedings. His research interests are focused onsignal andimage processing, pattern recognition, etc. Arsalane Zarghili is a Doctor of Sciencefrom Sidi Mohamed Ben Abdellah University (Fez- Morocco). He received his Ph.D. in 2001 and joined the same University in 2002 as Professor at thecomputer science department of the Faculty of Science and Technology (FST). In 2007 he was head of the computer sciences department andchair of the Software Quality Master in the FST-Fez. Helectures Programming, Distributed, compilation and Information processing, for both under graduate and masterlevels. In 2008 he obtained his HDR in information processing. In 2011, he is the co-founder and the head of the Laboratory of Intelligent Systems and Applications in the FST of Fez. He is amember of the steering committee of the department of computer sciences and was a member of the faculty board. He is also IEEE member since 2011. His main research is about pattern recognition, image indexing and retrieval systems incultural heritage, biometric, etc.