SlideShare a Scribd company logo
1 of 7
Download to read offline
International Journal of Electrical and Computer Engineering (IJECE)
Vol. 12, No. 1, February 2022, pp. 376~382
ISSN: 2088-8708, DOI: 10.11591/ijece.v12i1.pp376-382 ๏ฒ 376
Journal homepage: http://ijece.iaescore.com
Comparative study to realize an automatic speaker recognition
system
Fadwa Abakarim, Abdenbi Abenaou
Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences,
Ibn Zohr University, Agadir, Morocco
Article Info ABSTRACT
Article history:
Received Jan 15, 2021
Revised Jul 13, 2021
Accepted Jul 26, 2021
In this research, we present an automatic speaker recognition system based
on adaptive orthogonal transformations. To obtain the informative features
with a minimum dimension from the input signals, we created an adaptive
operator, which helped to identify the speakerโ€™s voice in a fast and efficient
manner. We test the efficiency and the performance of our method by
comparing it with another approach, mel-frequency cepstral coefficients
(MFCCs), which is widely used by researchers as their feature extraction
method. The experimental results show the importance of creating the
adaptive operator, which gives added value to the proposed approach. The
performance of the system achieved 96.8% accuracy using Fourier transform
as a compression method and 98.1% using Correlation as a compression
method.
Keywords:
Adaptive orthogonal transform
Automatic speech recognition
DTW
MFCCs
Speaker recognition system This is an open access article under the CC BY-SA license.
Corresponding Author:
Fadwa Abakarim
Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied
Sciences, Ibn Zohr University
B.P 1136, Agadir 80000, Morocco
Email: fadwa.abakarim@gmail.com
1. INTRODUCTION
Automatic speech recognition is a computer technique that is commonly used in several systems. In
car systems, speech recognition identifies the driver's voice to start responding to their commands such as
playing music, activating global positioning system (GPS), launching phone calls, and selecting radio
stations. In the field of language education, speech recognition can teach proper pronunciation and help
people to develop their oral expression, and also it facilitates education for blind students. In this context, the
research proposed a method to identify the speaker's voice using adaptive orthogonal transformations [1] and
comparing it with the method of mel-frequency cepstral coefficients (MFCCs) [2]-[4].
In order to identify the speakerโ€™s voice several methods are used to extract the special features of
each voice, among them mel-frequency cepstral coefficients. Although numerous researchers chose it as their
feature extraction method because of its several advantages [5], [6], it reaches its limit in the improvement of
automatic speaker recognition system as described by references [7]-[9]. It needs a large voice training
dataset and a long execution time to identify the voice of each speaker [10] and the same goes for other
approaches such as principal component analysis (PCA), discrete wavelet transform (DWT) and empirical
modal decomposition (EMD) as revealed by reference [11].
Janse et al. [12] presented a comparative study between mel-frequency cepstral coefficients and
discrete wavelet transform, where it mentioned that MFCCs values are not very robust in the presence of
additive noise and that DWT requires a longer compression time. Winursito et al. [13] combined MFCCs
with data reduction methods with the aim of improving the accuracy and increasing the computational speed
Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
377
of the classification process by decreasing the dimensions of the feature data. The data reduction process is
designed in two versions: MFCC+SVD version 1 and MFCC+PCA version 2. The results showed a
performance improvement for the proposed approach. Wang and Lawlor [14] proposed a method for a
speaker recognition system by combining MFCCs with back-propagation neural networks. It revealed that
this approach works successfully only when the number of unfamiliar speakers is not too large.
From these research studies, we deduced that many authors have used MFCCs as a feature
extraction approach and to strengthen their methodologies, they have used other approaches in the
classification process to obtain the desired results. In addition, other authors have developed new methods by
addressing the limitations of MFCCs in order to obtain an improved algorithm, which is not sensitive to
noise, and has a fast execution time. The goal of this study is to solve the problems mentioned above by
developing a fast algorithm based on adaptive orthogonal transformations for the extraction of the
informative features from the voice signal using the smallest possible training dataset, inspired by references
[1], [15]. This paper is organized as follows: Section 2 describes the new approach of orthogonal operators,
then the comparison results obtained between MFCCs and the proposed method are discussed in section 3, and
finally section 4 concludes the paper.
2. RESEARCH METHOD
2.1. Pre-processing
Before starting to apply the proposed approach, it is first necessary to pre-process the signals as
shown in Figure 1. This involves firstly removing silence, then secondly detecting the beginning and the end
of the speech by using the zero-crossing rate (ZCR) [16]-[19]. The third step is making them equal in length
by using zero padding [20], [21] because the training dataset can contain several signals that do not have the
same length. The final step is compressing their size without losing quality to avoid the problem of system
slowness by using Fourier transform [22], [23] or correlation [24], [25]. Figure 2 shows the input signal
before and after removing silence with detection of the beginning and the end of speech. Figure 3 shows the
speech signal after applying the Fourier transform method to detect the informative intervals. Figure 4 shows
the speech signal after applying correlation to detect the informative intervals.
Figure 1. The pre-processing part
Figure 2. The input signal before and after removing silence with detection of the beginning and the end of
speech
๏ฒ ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
378
Figure 3. The speech signal after applying the Fourier transform
Figure 4. The speech signal after applying correlation
2.2. Theoretical background
Our approach consists in searching the informative features of the signal by using the operator H,
which is a matrix operator of the transform (dimension Nร—N) whose number of rows corresponds to the
number of basic functions. To decompose the vector X, the calculation of the discrete spectrum Y with the
numerical methods can be represented by the following matrix [1], [15], [26]:
๐‘Œ =
1
๐‘
๐ป๐‘‹ (1)
where
๐‘‹ = [๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘]๐‘‡
is the initial signal to be transformed (size ฮ = 2๐‘›
).
๐‘Œ = [๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘]๐‘‡
is the vector of the spectral coefficients, calculated by the operator orthogonal H.
The calculation of the spectrum ๐‘Œ by using (1) requires ๐‘2
multiplication and addition operations.
The most efficient way to reduce the number of operations is to use a sparse matrix [27] where most of its
elements are zero, which will make the calculation and execution time of the algorithm faster. The method of
Good [28] which is used in the construction of fast transformation algorithms consists in expressing the
Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
379
orthogonal spectral operator ๐ป as a product of sparse matrices ๐บ๐‘– composed by minimum dimensional
matrices called spectral kernels, where these matrices take the following form:
๐‘‰๐‘–,๐‘—(๐›ผ๐‘–,๐‘—) = [
cos (๐›ผ๐‘–,๐‘—) sin (๐›ผ๐‘–,๐‘—)
sin (๐›ผ๐‘–,๐‘—) โˆ’cos (๐›ผ๐‘–,๐‘—)
] (2)
with ๐›ผ ๐œ– [0,2๐œ‹].
Then ๐บ๐‘– will be written as (3):
๐บ๐‘– =
[
๐‘‰๐‘–,1 0 โ€ฆ 0
0 ๐‘‰๐‘–,2 0 0
โ‹ฎ 0 โ‹ฑ 0
0 0 0 ๐‘‰๐‘–,๐‘
2
โ„ ]
(3)
with ๐‘– = 1 โ€ฆ ๐‘› , ๐‘› = log2 ๐‘ which is the number of matrices ๐บ๐‘–, then each matrix ๐บ๐‘– contains
๐‘
2
spectral
kernels of ๐‘‰๐‘–,๐‘—(๐›ผ๐‘–,๐‘—) of dimension 2 ร— 2 .
So, the formula of ๐‘Œ will be:
๐‘Œ =
1
๐‘
๐ป๐‘‹ =
1
๐‘
๐บ1๐บ2 โ€ฆ ๐บ๐‘›๐‘‹ =
1
๐‘
โˆ ๐บ๐‘–๐‘‹
๐‘›
๐‘–=1 (4)
The algorithm goes through a procedure of adaptation of the operator ๐ป to a class of input signals. It
consists in calculating the average of the statistical features at the pre-processing part (section 2 part 1) to
form the standard vector ๐‘…
ฬ‚๐‘ ๐‘‘. We can say that the operator ๐ป is adapted to a class of signals represented by a
standard vector ๐‘…
ฬ‚๐‘ ๐‘‘ if it verifies the following condition:
1
๐›ฎ
๐ป๐‘Ž๐‘…
ฬ‚๐‘ ๐‘‘ = ๐‘Œ๐‘ก = [๐‘ฆ๐‘ก1, 0, โ€ฆ ,0]๐‘‡
with ๐‘ฆ๐‘ก1 โ‰  0 (5)
where ๐‘Œ๐‘ก is the target vector that constructs the adaptation criterion of the operator ๐ป๐‘Ž to ๐‘…
ฬ‚๐‘ ๐‘‘ .
The target vector ๐‘Œ๐‘ก is calculated as (6):
๐‘Œ๐‘– = ๐บ๐‘–๐‘Œ๐‘–โˆ’1 (6)
with ๐‘– = 1 โ€ฆ log2 ๐‘ and ๐‘Œ0 = ๐‘…
ฬ‚๐‘ ๐‘‘.
In a simplified way, the synthesis procedure of the operator of orthogonal transformation is as follows:
For ๐‘– = 1, ๐‘Œ1 = ๐บ1๐‘…
ฬ‚๐‘ ๐‘‘ with ๐‘Œ1 contains
๐‘
21 non-zero number of elements.
For ๐‘– = 2, ๐‘Œ2 = ๐บ2๐‘Œ1 with ๐‘Œ2 contains
๐‘
22 non-zero number of elements.
For ๐‘– = ๐‘›, ๐‘Œ๐‘› = ๐‘Œ๐‘ก = ๐บ๐‘›๐‘Œ๐‘›โˆ’1 with ๐‘Œ๐‘ก contains
๐‘
2๐‘› non-zero number of elements.
Then the calculation of the orthogonal spectral operator is:
๐ป๐‘Ž = ๐บ๐‘›๐บ๐‘›โˆ’1 โ€ฆ ๐บ1 (7)
Figure 5 shows the overall process to extract the informative features from the input signals by using
our approach:
As shown in Figure 5, the extraction of the informative features consists of 7 steps:
โˆ’ Step 1: Input signals go through a pre-processing process (section 2 part 1).
โˆ’ Step 2: We calculate the average of the statistical features obtained during the pre-processing part (using
Fourier transform and correlation).
โˆ’ Step 3: The operator synthesis algorithm is applied to the average of the statistical features obtained from
the previous step.
โˆ’ Step 4: The output of the algorithm is the adaptive operator H.
โˆ’ Step 5: The projection multiplication is applied between the operator H and the rest of the statistical
features.
โˆ’ Step 6: The result of the previous operation is a set of informative features that characterize each signal of
the class.
โˆ’ Step 7: The average of the feature vectors is calculated. The result is an informative feature vector with a
minimum dimension that characterizes the whole class.
๏ฒ ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
380
Figure 5. The overall process to extract the informative features from the input signals by using the adaptive
orthogonal transform method
2.3. Datasets
The speech signals are recorded and filtered by the Audacity program. Each signal is recorded by
default at 22 kHz with a duration between 1 s and 2 s. The training dataset contains 10 classes, where each
class contains 100 voice recordings of the speaker (Class 1 of Speaker A contains 100 of his voice
recordings, the same applies to Class 2 of Speaker B up to Class 10 of Speaker J).
The test dataset contains 6000 voice recordings of speaker A, B, โ€ฆ, J and other unfamiliar speakers.
To test the similarity between the speakerโ€™s voice in the training dataset and the speakerโ€™s voice in the test
dataset, dynamic time wrapping (DTW) is used. Dynamic time wrapping or DTW [29]-[32] consists in
comparing two voice signals by considering the Euclidean distance between the two vectors obtained by the
applied method, which is defined by (8):
๐ท๐‘– = โˆšโˆ‘ (๐‘Ž๐‘– โˆ’ ๐‘๐‘–)2
๐‘›
๐‘–=1 (8)
with
๐ท๐‘– : The distance between the ๐‘– vector of the spectrum ๐‘Ž and the ๐‘– vector of the spectrum ๐‘.
๐‘› : Dimension of ๐‘Ž and ๐‘ spectrum.
Therefore, the vector ๐‘Ž๐‘– will correspond to class i if ๐ท๐‘– = min (๐ท๐‘–=1โ€ฆ๐ถ) where ๐ถ is the number of classes.
3. RESULTS AND DISCUSSION
The quality of recognition is measured by calculating the recognition rate which is defined as (9).
๐‘…๐‘Ž๐‘ก๐‘’ =
๐‘กโ„Ž๐‘’ ๐‘Ÿ๐‘’๐‘๐‘œ๐‘”๐‘›๐‘–๐‘ง๐‘’๐‘‘ ๐‘ ๐‘๐‘’๐‘Ž๐‘˜๐‘’๐‘Ÿ๐‘  ๐‘๐‘œ๐‘ข๐‘›๐‘ก
๐‘กโ„Ž๐‘’ ๐‘ก๐‘’๐‘ ๐‘ก ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ ๐‘’๐‘ก ๐‘ ๐‘–๐‘ง๐‘’
โˆ— 100 (9)
Tables 1 and 2 show the voice recognition rate according to the size of the interval of the analysis. From
these tables, we observe that the adaptive orthogonal transform method gives good results compared to the
MFCCs approach. As mentioned in section 2 part 1, we used correlation and Fourier transform to work only
with the informative intervals of the signal instead of working with the whole signal. As we can see there is a
47.5% difference in voice identification rates with Fourier transform intervals between using our approach
(96.8%) and MFCCs (49.3%). On the other hand, we found a 45.0% difference in voice identification rates
with correlation intervals between using our approach (98.1%) and the MFCCs (53.1%). Correlation intervals
Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ
Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim)
381
give us better results than Fourier transform intervals, either for our approach or for the MFCCs. The
proposed method has succeeded in identifying 5886 voice recordings among 6000 voice recordings of the
test dataset (rate 98.1%) compared to MFCCs that identified only 3186 voice recordings (rate 53.1%), and
these results show the efficiency of our algorithm.
Table 1. The voice recognition rate according to the size of the interval using Fourier transform with MFCCs
and the adaptive orthogonal transform method.
Size of interval using Fourier
Transform
The voice recognition rate with
MFCCs (%)
The voice recognition rate with the adaptive orthogonal
transform method (%)
128
256
512
1024
2048
4096
23.8
32.8
35.6
38.5
45.6
49.3
68.9
75.3
82.3
91.5
93.1
96.8
Table 2. The voice recognition rate according to the size of the interval using correlation with MFCCs and
the adaptive orthogonal transform method
Size of interval using
Correlation
The voice recognition rate with MFCCs
(%)
The voice recognition rate with the adaptive orthogonal
transform method (%)
128
256
512
1024
2048
4096
25.6
30.3
37.3
43.2
49.1
53.1
65.8
73.2
81.7
90.2
95.6
98.1
4. CONCLUSION
MFCCs is one of the most usable and well-known methods in the field of signal processing.
However, it needs a large training dataset and a long execution time to extract the important features if the
number of test dataset unfamiliar speakers is large, so for these reasons we developed a new method based on
the creation of the operator H which is adaptable to any input signal. Even though its creation goes through
several iterations log2 ๐‘ iterations where N is the length of the signal, an advantage of working with a sparse
matrix where most of its elements are zero is that it makes the calculation and execution time of the
algorithm faster. Our future goal is to increase the number of voice recordings in the test dataset and to
decrease the number of voice recordings in the training dataset to see if the method continues to give
successful results or not. In addition, we will combine it with other methods that are commonly used as
classification methods such as hidden markov model (HMM) and artificial neural networks (ANN).
ACKNOWLEDGEMENTS
A special thanks to Mr. T. Hobson from Anglosphere English Center for reviewing for spelling and
grammatical mistakes, and to all the participants who recorded their voices for this research.
REFERENCES
[1] A. Abenaou, F. Ataa Allah, and B. Nsiri, โ€œTowards an automatic speech recognition system in amazigh based on orthogonal
transformations,โ€ Asinag, pp. 133-145, 2014.
[2] N. Easwari and P. Ponmuthuramalingam, โ€œA comparative study on feature extraction technique for isolated word speech
recognition,โ€ International Journal of Engineering and Techniques, vol. 1, no. 6, pp. 108-115, 2015.
[3] L. Muda, M. Begam, and I. Elamvazuthi, โ€œVoice recognition algorithms using mel frequency cepstral coefficient (MFCC) and
dynamic time warping (DTW) techniques,โ€ Journal of Computing, vol. 2, no. 3, 2010.
[4] P. P. Singh and P. Rani., โ€œAn approach to extract feature using MFCC,โ€ IOSR Journal of Engineering, vol. 4, no. 8, pp. 21-25,
2014, doi: 10.9790/3021-04812125.
[5] X. Wu, H. Gong, P. Chen, Z. Zhong, and Y. Xu, โ€œSurveillance robot utilizing video and audio information,โ€ Journal of Intelligent
and Robotic Systems, vol. 55, no. 4, pp. 403-421, 2009, doi: 10.1007/s10846-008-9297-3.
[6] A. N. A. Kumar and S. A. Muthukumaraswamy, โ€œText dependent voice recognition system using MFCC and VQ for security
applications,โ€ 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 2, 2017,
pp. 130-136, doi: 10.1109/ICECA.2017.8212779.
[7] W. Zunjing and C. Zhigang, โ€œImproved MFCC-based feature for robust speaker identification,โ€ Tsinghua Science and
Technology, vol. 10, no. 2, pp. 158-161, 2005, doi: 10.1016/S1007-0214(05)70048-1.
[8] W. Chen, M. Zhenjiang, and M. Xiao, โ€œDifferential MFCC and vector quantization used for real-time speaker recognition
system,โ€ 2008 Congress on Image and Signal Processing, vol. 5, pp. 319-323, 2008, doi: 10.1109/CISP.2008.492.
๏ฒ ISSN: 2088-8708
Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382
382
[9] F. Z. Chelali and A. Djeradi, โ€œText dependant speaker recognition using MFCC, LPC and DWT,โ€ International Journal of Speech
Technology, vol. 20, no. 3, pp. 725-740, 2017, doi: 10.1007/s10772-017-9441-1.
[10] M. Cutajar, E. Gatt, I. Grech, O. Casha, and J. Micallef, โ€œComparative study of automatic speech recognition techniques,โ€ IET
Signal Processing, vol. 7, no. 1, pp. 25-46, 2013, doi: 10.1049/iet-spr.2012.0151.
[11] F. Abakarim and A. Abenaou, โ€œAmazigh isolated word speech recognition system using the adaptive orthogonal transform
method,โ€ 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020, pp. 1-6,
doi: 10.1109/ISCV49265.2020.9204291.
[12] P. V. Janse et al., โ€œA comparative study between MFCC and DWT feature extraction technique,โ€ International Journal of
Engineering Research and Technology, vol. 3, no. 1, pp. 3124-3127, 2014.
[13] A. Winursito, R. Hidayat, A. Bejo, and M. N. Y. Utomo, โ€œFeature data reduction of MFCC using PCA and SVD in speech
recognition system,โ€ 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1-6, 2018,
doi: 10.1109/ICSCEE.2018.8538414.
[14] Y. Wang and B. Lawlor, โ€œSpeaker recognition based on MFCC and BP neural networks,โ€ 2017 28th Irish Signals and Systems
Conference (ISSC) , 2017, pp. 1-4, doi: 10.1109/ISSC.2017.7983644.
[15] M. Azergui, A. Abenaou, and H. Bouzahir, โ€œBearing fault classification based on the adaptive orthogonal transform method,โ€
International Journal of Advanced Computer Science and Applications (IJACSA), vol. 9, no. 1, pp. 375-380, 2018,
doi: 10.14569/IJACSA.2018.090151.
[16] D. S. Sheteb and P. S. B. Patil, โ€œZero crossing rate and energy of the speech signal of devanagari script,โ€ IOSR Journal of VLSI
and Signal Processing, vol. 4, no. 1, pp. 1-5, 2014, doi: 10.9790/4200-04110105.
[17] R. G. Bachu, S. Kopparthi, B. Adapa, and B. D. Barkana, โ€œVoiced/unvoiced decision for speech signals based on zero-crossing
rate and energy,โ€ Advanced Techniques in Computing Sciences and Software Engineering, 2010, pp. 279-282, doi: 10.1007/978-
90-481-3660-5_47.
[18] H. Aouani and Y. Ben Ayed, โ€œSpeech emotion recognition with deep learning,โ€ Procedia Computer Science, vol. 176,
pp. 251-260, 2020, doi: 10.1016/j.procs.2020.08.027.
[19] T. Ijitona, H. Yue, J. Soraghan, and A. Lowit, โ€œImproved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech
signals using linear prediction error variance,โ€ 2020 5th International Conference on Computer and Communication Systems
(ICCCS), 2020, pp. 685-690, doi: 10.1109/ICCCS49078.2020.9118462.
[20] J. Luo, Z. Xie, and M. Xie, โ€œInterpolated DFT algorithms with zero padding for classic windows,โ€ Mechanical Systems and
Signal Processing, vol. 70, pp. 1011-1025, 2016, doi: 10.1016/j.ymssp.2015.09.045.
[21] B. Zhang, T. Xiao, and J. Zhong, โ€œA simple determination approach for zero-padding of FFT method in focal spot calculation,โ€
Optics Communications, vol. 451, pp. 260-264, 2019, doi: 10.1016/j.optcom.2019.06.065.
[22] J. Obuchowski, A. Wyล‚omaล„ska, and R. Zimroz, โ€œSelection of informative frequency band in local damage detection in rotating
machinery,โ€ Mechanical Systems and Signal Processing, vol. 48, no. 1-2, pp. 138-152, 2014, doi: 10.1016/j.ymssp.2014.03.011.
[23] R. Mankar, M. J. Walsh, R. Bhargava, S. Prasad, and D. Mayerich, โ€œSelecting optimal features from fourier transform infrared
spectroscopy for discrete-frequency imaging,โ€ The Analyst, vol. 143, no. 5, pp. 1147-1156, 2018, doi: 10.1039/c7an01888f.
[24] C. Charayaphan, A. E. Marble, S. T. Nugent, and D. Swingler, โ€œCorrelation algorithm and sampling techniques for estimating the
signal-to-noise ratio of the electrocardiogram,โ€ Journal of biomedical engineering, vol. 14, no. 6, pp. 516-520, 1992,
doi: 10.1016/0141-5425(92)90106-U.
[25] J. Zhou and A. Qu, โ€œInformative estimation and selection of correlation structure for longitudinal data,โ€ Journal of the American
statistical Association, vol. 107, no. 498, pp. 701-710, 2012, doi: 10.1080/01621459.2012.682534.
[26] A. Abenaou, โ€œDevelopment of a method for the synthesis of adaptive spectral operators for the analysis of random,โ€ International
Journal of Computer Theory and Engineering, vol. 9, no. 4, pp. 273-276, 2017, doi: 10.7763/IJCTE.2017.V9.1150.
[27] S. Wang, J. Liu, and N. Shroff, โ€œCoded sparse matrix multiplication,โ€ 2018 International Conference on Machine Learning
(ICML), 2018, pp. 5152-5160.
[28] I. J. Good, โ€œThe interaction algorithm and practical fourier analysis,โ€ Journal of the Royal Statistical Society: Series B
(Methodological), vol. 20, no. 2, pp. 361-372, 1958, doi: 10.1111/j.2517-6161.1958.tb00300.x.
[29] P. Senin, โ€œDynamic time warping algorithm review,โ€ Information and Computer Science Department University of Hawaii at
Manoa Honolulu USA, vol. 855, no. 1-23, pp. 40, 2008.
[30] A. Ismail, S. Abdlerazek, and I. M. El-Henawy, โ€œDevelopment of smart healthcare system based on speech recognition using
support vector machine and dynamic time warping,โ€ Sustainability, vol. 12, no. 6, pp. 2403, 2020, doi: 10.3390/su12062403.
[31] I. D. G. Y. A. Wibawa and I. D. M. B. A. Darmawan, โ€œImplementation of audio recognition using mel frequency cepstrum
coefficient and dynamic time warping in wirama praharsini,โ€ Journal of Physics: Conference Series, vol. 1722, no. 1, 2021, doi:
10.1088/1742-6596/1722/1/012014.
[32] S. Kinkiri and S. Keates, โ€œSpeaker identification: Variations of a human voice,โ€ in 2020 International Conference on Advances in
Computing and Communication Engineering (ICACCE), 2020, pp. 1-4, doi:10.1109/ICACCE49060.2020.9154998.

More Related Content

What's hot

Mixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qosMixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qoseSAT Journals
ย 
Motion compensation for hand held camera devices
Motion compensation for hand held camera devicesMotion compensation for hand held camera devices
Motion compensation for hand held camera deviceseSAT Journals
ย 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12Alexander Decker
ย 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...ijcax
ย 
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...IJECEIAES
ย 
Kv3419501953
Kv3419501953Kv3419501953
Kv3419501953IJERA Editor
ย 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...eSAT Journals
ย 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...eSAT Publishing House
ย 
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
 IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
IRJET - Wavelet based Image Fusion using FPGA for Biomedical ApplicationIRJET Journal
ย 
A040101001006
A040101001006A040101001006
A040101001006ijceronline
ย 
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET Journal
ย 
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET Journal
ย 
Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...elelijjournal
ย 
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Cemal Ardil
ย 
Optimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionOptimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionIAEME Publication
ย 
V04507125128
V04507125128V04507125128
V04507125128IJERA Editor
ย 

What's hot (17)

Mixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qosMixed approach for scheduling process in wimax for high qos
Mixed approach for scheduling process in wimax for high qos
ย 
Motion compensation for hand held camera devices
Motion compensation for hand held camera devicesMotion compensation for hand held camera devices
Motion compensation for hand held camera devices
ย 
1.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-121.meena tushir finalpaper-1-12
1.meena tushir finalpaper-1-12
ย 
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
Cell Charge Approximation for Accelerating Molecular Simulation on CUDA-Enabl...
ย 
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
A new design of fuzzy logic controller optimized by PSO-SCSO applied to SFO-D...
ย 
Kv3419501953
Kv3419501953Kv3419501953
Kv3419501953
ย 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
ย 
A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...A vm scheduling algorithm for reducing power consumption of a virtual machine...
A vm scheduling algorithm for reducing power consumption of a virtual machine...
ย 
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
 IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
IRJET - Wavelet based Image Fusion using FPGA for Biomedical Application
ย 
A040101001006
A040101001006A040101001006
A040101001006
ย 
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
IRJET- Image Reconstruction Algorithm for Electrical Impedance Tomographic Sy...
ย 
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...IRJET-  	  A New Approach to Economic Load Dispatch by using Improved QEMA ba...
IRJET- A New Approach to Economic Load Dispatch by using Improved QEMA ba...
ย 
Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...Power system transient stability margin estimation using artificial neural ne...
Power system transient stability margin estimation using artificial neural ne...
ย 
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
Hybrid neuro-fuzzy-approach-for-automatic-generation-control-of-two--area-int...
ย 
Optimal neural network models for wind speed prediction
Optimal neural network models for wind speed predictionOptimal neural network models for wind speed prediction
Optimal neural network models for wind speed prediction
ย 
V04507125128
V04507125128V04507125128
V04507125128
ย 
IMQA Paper
IMQA PaperIMQA Paper
IMQA Paper
ย 

Similar to Fast and efficient speaker recognition system using adaptive orthogonal transforms

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...IJECEIAES
ย 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...IJERA Editor
ย 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...IJECEIAES
ย 
D04812125
D04812125D04812125
D04812125IOSR-JEN
ย 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET Journal
ย 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewIJAEMSJORNAL
ย 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...IJCSEA Journal
ย 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCTELKOMNIKA JOURNAL
ย 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
ย 
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...IJCSEA Journal
ย 
Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...TELKOMNIKA JOURNAL
ย 
An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...IJECEIAES
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcsandit
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONcscpconf
ย 
Automatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmAutomatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmTELKOMNIKA JOURNAL
ย 
Parallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsParallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsIOSR Journals
ย 
F43063841
F43063841F43063841
F43063841IJERA Editor
ย 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A ReviewIRJET Journal
ย 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...TELKOMNIKA JOURNAL
ย 
Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...IJECEIAES
ย 

Similar to Fast and efficient speaker recognition system using adaptive orthogonal transforms (20)

Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...Intelligent Arabic letters speech recognition system based on mel frequency c...
Intelligent Arabic letters speech recognition system based on mel frequency c...
ย 
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
Compressive Sensing in Speech from LPC using Gradient Projection for Sparse R...
ย 
Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...Compressive speech enhancement using semi-soft thresholding and improved thre...
Compressive speech enhancement using semi-soft thresholding and improved thre...
ย 
D04812125
D04812125D04812125
D04812125
ย 
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
IRJET- Device Activation based on Voice Recognition using Mel Frequency Cepst...
ย 
Comparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: ReviewComparative Study of Different Techniques in Speaker Recognition: Review
Comparative Study of Different Techniques in Speaker Recognition: Review
ย 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
ย 
FPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCCFPGA-based implementation of speech recognition for robocar control using MFCC
FPGA-based implementation of speech recognition for robocar control using MFCC
ย 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
ย 
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ENERGY COMPUTATION FOR BCI USING DCT AND MOVING AVERAGE WINDOW FOR NOISE SMOO...
ย 
Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...Stereo matching algorithm using census transform and segment tree for depth e...
Stereo matching algorithm using census transform and segment tree for depth e...
ย 
An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...An efficient hardware logarithm generator with modified quasi-symmetrical app...
An efficient hardware logarithm generator with modified quasi-symmetrical app...
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
ย 
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITIONSYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
SYNTHETICAL ENLARGEMENT OF MFCC BASED TRAINING SETS FOR EMOTION RECOGNITION
ย 
Automatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithmAutomatic sound synthesis using the fly algorithm
Automatic sound synthesis using the fly algorithm
ย 
Parallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic MathematicsParallel Hardware Implementation of Convolution using Vedic Mathematics
Parallel Hardware Implementation of Convolution using Vedic Mathematics
ย 
F43063841
F43063841F43063841
F43063841
ย 
IRJET- Emotion recognition using Speech Signal: A Review
IRJET-  	  Emotion recognition using Speech Signal: A ReviewIRJET-  	  Emotion recognition using Speech Signal: A Review
IRJET- Emotion recognition using Speech Signal: A Review
ย 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
ย 
Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...Improved autocorrelation method for time synchronization in filtered orthogon...
Improved autocorrelation method for time synchronization in filtered orthogon...
ย 

More from IJECEIAES

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...IJECEIAES
ย 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionIJECEIAES
ย 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...IJECEIAES
ย 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...IJECEIAES
ย 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningIJECEIAES
ย 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...IJECEIAES
ย 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...IJECEIAES
ย 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...IJECEIAES
ย 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...IJECEIAES
ย 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyIJECEIAES
ย 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationIJECEIAES
ย 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceIJECEIAES
ย 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...IJECEIAES
ย 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...IJECEIAES
ย 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...IJECEIAES
ย 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...IJECEIAES
ย 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...IJECEIAES
ย 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...IJECEIAES
ย 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...IJECEIAES
ย 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...IJECEIAES
ย 

More from IJECEIAES (20)

Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...Cloud service ranking with an integration of k-means algorithm and decision-m...
Cloud service ranking with an integration of k-means algorithm and decision-m...
ย 
Prediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regressionPrediction of the risk of developing heart disease using logistic regression
Prediction of the risk of developing heart disease using logistic regression
ย 
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...Predictive analysis of terrorist activities in Thailand's Southern provinces:...
Predictive analysis of terrorist activities in Thailand's Southern provinces:...
ย 
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...Optimal model of vehicular ad-hoc network assisted by  unmanned aerial vehicl...
Optimal model of vehicular ad-hoc network assisted by unmanned aerial vehicl...
ย 
Improving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learningImproving cyberbullying detection through multi-level machine learning
Improving cyberbullying detection through multi-level machine learning
ย 
Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...Comparison of time series temperature prediction with autoregressive integrat...
Comparison of time series temperature prediction with autoregressive integrat...
ย 
Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...Strengthening data integrity in academic document recording with blockchain a...
Strengthening data integrity in academic document recording with blockchain a...
ย 
Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...Design of storage benchmark kit framework for supporting the file storage ret...
Design of storage benchmark kit framework for supporting the file storage ret...
ย 
Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...Detection of diseases in rice leaf using convolutional neural network with tr...
Detection of diseases in rice leaf using convolutional neural network with tr...
ย 
A systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancyA systematic review of in-memory database over multi-tenancy
A systematic review of in-memory database over multi-tenancy
ย 
Agriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimizationAgriculture crop yield prediction using inertia based cat swarm optimization
Agriculture crop yield prediction using inertia based cat swarm optimization
ย 
Three layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performanceThree layer hybrid learning to improve intrusion detection system performance
Three layer hybrid learning to improve intrusion detection system performance
ย 
Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...Non-binary codes approach on the performance of short-packet full-duplex tran...
Non-binary codes approach on the performance of short-packet full-duplex tran...
ย 
Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...Improved design and performance of the global rectenna system for wireless po...
Improved design and performance of the global rectenna system for wireless po...
ย 
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...Advanced hybrid algorithms for precise multipath channel estimation in next-g...
Advanced hybrid algorithms for precise multipath channel estimation in next-g...
ย 
Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...Performance analysis of 2D optical code division multiple access through unde...
Performance analysis of 2D optical code division multiple access through unde...
ย 
On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...On performance analysis of non-orthogonal multiple access downlink for cellul...
On performance analysis of non-orthogonal multiple access downlink for cellul...
ย 
Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...Phase delay through slot-line beam switching microstrip patch array antenna d...
Phase delay through slot-line beam switching microstrip patch array antenna d...
ย 
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...A simple feed orthogonal excitation X-band dual circular polarized microstrip...
A simple feed orthogonal excitation X-band dual circular polarized microstrip...
ย 
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
A taxonomy on power optimization techniques for fifthgeneration heterogenous ...
ย 

Recently uploaded

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
ย 
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...9953056974 Low Rate Call Girls In Saket, Delhi NCR
ย 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
ย 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
ย 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
ย 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
ย 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
ย 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
ย 
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”soniya singh
ย 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
ย 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
ย 
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube ExchangerAnamika Sarkar
ย 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
ย 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
ย 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
ย 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
ย 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
ย 

Recently uploaded (20)

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
ย 
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
๐Ÿ”9953056974๐Ÿ”!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
ย 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
ย 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
ย 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
ย 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
ย 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
ย 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
ย 
โ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
โ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCRโ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
โ˜… CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
ย 
young call girls in Rajiv Chowk๐Ÿ” 9953056974 ๐Ÿ” Delhi escort Service
young call girls in Rajiv Chowk๐Ÿ” 9953056974 ๐Ÿ” Delhi escort Serviceyoung call girls in Rajiv Chowk๐Ÿ” 9953056974 ๐Ÿ” Delhi escort Service
young call girls in Rajiv Chowk๐Ÿ” 9953056974 ๐Ÿ” Delhi escort Service
ย 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
ย 
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”
Model Call Girl in Narela Delhi reach out to us at ๐Ÿ”8264348440๐Ÿ”
ย 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
ย 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
ย 
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ๏ปฟTube Exchanger
ย 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
ย 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
ย 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
ย 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
ย 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
ย 

Fast and efficient speaker recognition system using adaptive orthogonal transforms

  • 1. International Journal of Electrical and Computer Engineering (IJECE) Vol. 12, No. 1, February 2022, pp. 376~382 ISSN: 2088-8708, DOI: 10.11591/ijece.v12i1.pp376-382 ๏ฒ 376 Journal homepage: http://ijece.iaescore.com Comparative study to realize an automatic speaker recognition system Fadwa Abakarim, Abdenbi Abenaou Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences, Ibn Zohr University, Agadir, Morocco Article Info ABSTRACT Article history: Received Jan 15, 2021 Revised Jul 13, 2021 Accepted Jul 26, 2021 In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain the informative features with a minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speakerโ€™s voice in a fast and efficient manner. We test the efficiency and the performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which is widely used by researchers as their feature extraction method. The experimental results show the importance of creating the adaptive operator, which gives added value to the proposed approach. The performance of the system achieved 96.8% accuracy using Fourier transform as a compression method and 98.1% using Correlation as a compression method. Keywords: Adaptive orthogonal transform Automatic speech recognition DTW MFCCs Speaker recognition system This is an open access article under the CC BY-SA license. Corresponding Author: Fadwa Abakarim Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences, Ibn Zohr University B.P 1136, Agadir 80000, Morocco Email: fadwa.abakarim@gmail.com 1. INTRODUCTION Automatic speech recognition is a computer technique that is commonly used in several systems. In car systems, speech recognition identifies the driver's voice to start responding to their commands such as playing music, activating global positioning system (GPS), launching phone calls, and selecting radio stations. In the field of language education, speech recognition can teach proper pronunciation and help people to develop their oral expression, and also it facilitates education for blind students. In this context, the research proposed a method to identify the speaker's voice using adaptive orthogonal transformations [1] and comparing it with the method of mel-frequency cepstral coefficients (MFCCs) [2]-[4]. In order to identify the speakerโ€™s voice several methods are used to extract the special features of each voice, among them mel-frequency cepstral coefficients. Although numerous researchers chose it as their feature extraction method because of its several advantages [5], [6], it reaches its limit in the improvement of automatic speaker recognition system as described by references [7]-[9]. It needs a large voice training dataset and a long execution time to identify the voice of each speaker [10] and the same goes for other approaches such as principal component analysis (PCA), discrete wavelet transform (DWT) and empirical modal decomposition (EMD) as revealed by reference [11]. Janse et al. [12] presented a comparative study between mel-frequency cepstral coefficients and discrete wavelet transform, where it mentioned that MFCCs values are not very robust in the presence of additive noise and that DWT requires a longer compression time. Winursito et al. [13] combined MFCCs with data reduction methods with the aim of improving the accuracy and increasing the computational speed
  • 2. Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 377 of the classification process by decreasing the dimensions of the feature data. The data reduction process is designed in two versions: MFCC+SVD version 1 and MFCC+PCA version 2. The results showed a performance improvement for the proposed approach. Wang and Lawlor [14] proposed a method for a speaker recognition system by combining MFCCs with back-propagation neural networks. It revealed that this approach works successfully only when the number of unfamiliar speakers is not too large. From these research studies, we deduced that many authors have used MFCCs as a feature extraction approach and to strengthen their methodologies, they have used other approaches in the classification process to obtain the desired results. In addition, other authors have developed new methods by addressing the limitations of MFCCs in order to obtain an improved algorithm, which is not sensitive to noise, and has a fast execution time. The goal of this study is to solve the problems mentioned above by developing a fast algorithm based on adaptive orthogonal transformations for the extraction of the informative features from the voice signal using the smallest possible training dataset, inspired by references [1], [15]. This paper is organized as follows: Section 2 describes the new approach of orthogonal operators, then the comparison results obtained between MFCCs and the proposed method are discussed in section 3, and finally section 4 concludes the paper. 2. RESEARCH METHOD 2.1. Pre-processing Before starting to apply the proposed approach, it is first necessary to pre-process the signals as shown in Figure 1. This involves firstly removing silence, then secondly detecting the beginning and the end of the speech by using the zero-crossing rate (ZCR) [16]-[19]. The third step is making them equal in length by using zero padding [20], [21] because the training dataset can contain several signals that do not have the same length. The final step is compressing their size without losing quality to avoid the problem of system slowness by using Fourier transform [22], [23] or correlation [24], [25]. Figure 2 shows the input signal before and after removing silence with detection of the beginning and the end of speech. Figure 3 shows the speech signal after applying the Fourier transform method to detect the informative intervals. Figure 4 shows the speech signal after applying correlation to detect the informative intervals. Figure 1. The pre-processing part Figure 2. The input signal before and after removing silence with detection of the beginning and the end of speech
  • 3. ๏ฒ ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 378 Figure 3. The speech signal after applying the Fourier transform Figure 4. The speech signal after applying correlation 2.2. Theoretical background Our approach consists in searching the informative features of the signal by using the operator H, which is a matrix operator of the transform (dimension Nร—N) whose number of rows corresponds to the number of basic functions. To decompose the vector X, the calculation of the discrete spectrum Y with the numerical methods can be represented by the following matrix [1], [15], [26]: ๐‘Œ = 1 ๐‘ ๐ป๐‘‹ (1) where ๐‘‹ = [๐‘ฅ1, ๐‘ฅ2, โ€ฆ , ๐‘ฅ๐‘]๐‘‡ is the initial signal to be transformed (size ฮ = 2๐‘› ). ๐‘Œ = [๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘]๐‘‡ is the vector of the spectral coefficients, calculated by the operator orthogonal H. The calculation of the spectrum ๐‘Œ by using (1) requires ๐‘2 multiplication and addition operations. The most efficient way to reduce the number of operations is to use a sparse matrix [27] where most of its elements are zero, which will make the calculation and execution time of the algorithm faster. The method of Good [28] which is used in the construction of fast transformation algorithms consists in expressing the
  • 4. Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 379 orthogonal spectral operator ๐ป as a product of sparse matrices ๐บ๐‘– composed by minimum dimensional matrices called spectral kernels, where these matrices take the following form: ๐‘‰๐‘–,๐‘—(๐›ผ๐‘–,๐‘—) = [ cos (๐›ผ๐‘–,๐‘—) sin (๐›ผ๐‘–,๐‘—) sin (๐›ผ๐‘–,๐‘—) โˆ’cos (๐›ผ๐‘–,๐‘—) ] (2) with ๐›ผ ๐œ– [0,2๐œ‹]. Then ๐บ๐‘– will be written as (3): ๐บ๐‘– = [ ๐‘‰๐‘–,1 0 โ€ฆ 0 0 ๐‘‰๐‘–,2 0 0 โ‹ฎ 0 โ‹ฑ 0 0 0 0 ๐‘‰๐‘–,๐‘ 2 โ„ ] (3) with ๐‘– = 1 โ€ฆ ๐‘› , ๐‘› = log2 ๐‘ which is the number of matrices ๐บ๐‘–, then each matrix ๐บ๐‘– contains ๐‘ 2 spectral kernels of ๐‘‰๐‘–,๐‘—(๐›ผ๐‘–,๐‘—) of dimension 2 ร— 2 . So, the formula of ๐‘Œ will be: ๐‘Œ = 1 ๐‘ ๐ป๐‘‹ = 1 ๐‘ ๐บ1๐บ2 โ€ฆ ๐บ๐‘›๐‘‹ = 1 ๐‘ โˆ ๐บ๐‘–๐‘‹ ๐‘› ๐‘–=1 (4) The algorithm goes through a procedure of adaptation of the operator ๐ป to a class of input signals. It consists in calculating the average of the statistical features at the pre-processing part (section 2 part 1) to form the standard vector ๐‘… ฬ‚๐‘ ๐‘‘. We can say that the operator ๐ป is adapted to a class of signals represented by a standard vector ๐‘… ฬ‚๐‘ ๐‘‘ if it verifies the following condition: 1 ๐›ฎ ๐ป๐‘Ž๐‘… ฬ‚๐‘ ๐‘‘ = ๐‘Œ๐‘ก = [๐‘ฆ๐‘ก1, 0, โ€ฆ ,0]๐‘‡ with ๐‘ฆ๐‘ก1 โ‰  0 (5) where ๐‘Œ๐‘ก is the target vector that constructs the adaptation criterion of the operator ๐ป๐‘Ž to ๐‘… ฬ‚๐‘ ๐‘‘ . The target vector ๐‘Œ๐‘ก is calculated as (6): ๐‘Œ๐‘– = ๐บ๐‘–๐‘Œ๐‘–โˆ’1 (6) with ๐‘– = 1 โ€ฆ log2 ๐‘ and ๐‘Œ0 = ๐‘… ฬ‚๐‘ ๐‘‘. In a simplified way, the synthesis procedure of the operator of orthogonal transformation is as follows: For ๐‘– = 1, ๐‘Œ1 = ๐บ1๐‘… ฬ‚๐‘ ๐‘‘ with ๐‘Œ1 contains ๐‘ 21 non-zero number of elements. For ๐‘– = 2, ๐‘Œ2 = ๐บ2๐‘Œ1 with ๐‘Œ2 contains ๐‘ 22 non-zero number of elements. For ๐‘– = ๐‘›, ๐‘Œ๐‘› = ๐‘Œ๐‘ก = ๐บ๐‘›๐‘Œ๐‘›โˆ’1 with ๐‘Œ๐‘ก contains ๐‘ 2๐‘› non-zero number of elements. Then the calculation of the orthogonal spectral operator is: ๐ป๐‘Ž = ๐บ๐‘›๐บ๐‘›โˆ’1 โ€ฆ ๐บ1 (7) Figure 5 shows the overall process to extract the informative features from the input signals by using our approach: As shown in Figure 5, the extraction of the informative features consists of 7 steps: โˆ’ Step 1: Input signals go through a pre-processing process (section 2 part 1). โˆ’ Step 2: We calculate the average of the statistical features obtained during the pre-processing part (using Fourier transform and correlation). โˆ’ Step 3: The operator synthesis algorithm is applied to the average of the statistical features obtained from the previous step. โˆ’ Step 4: The output of the algorithm is the adaptive operator H. โˆ’ Step 5: The projection multiplication is applied between the operator H and the rest of the statistical features. โˆ’ Step 6: The result of the previous operation is a set of informative features that characterize each signal of the class. โˆ’ Step 7: The average of the feature vectors is calculated. The result is an informative feature vector with a minimum dimension that characterizes the whole class.
  • 5. ๏ฒ ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 380 Figure 5. The overall process to extract the informative features from the input signals by using the adaptive orthogonal transform method 2.3. Datasets The speech signals are recorded and filtered by the Audacity program. Each signal is recorded by default at 22 kHz with a duration between 1 s and 2 s. The training dataset contains 10 classes, where each class contains 100 voice recordings of the speaker (Class 1 of Speaker A contains 100 of his voice recordings, the same applies to Class 2 of Speaker B up to Class 10 of Speaker J). The test dataset contains 6000 voice recordings of speaker A, B, โ€ฆ, J and other unfamiliar speakers. To test the similarity between the speakerโ€™s voice in the training dataset and the speakerโ€™s voice in the test dataset, dynamic time wrapping (DTW) is used. Dynamic time wrapping or DTW [29]-[32] consists in comparing two voice signals by considering the Euclidean distance between the two vectors obtained by the applied method, which is defined by (8): ๐ท๐‘– = โˆšโˆ‘ (๐‘Ž๐‘– โˆ’ ๐‘๐‘–)2 ๐‘› ๐‘–=1 (8) with ๐ท๐‘– : The distance between the ๐‘– vector of the spectrum ๐‘Ž and the ๐‘– vector of the spectrum ๐‘. ๐‘› : Dimension of ๐‘Ž and ๐‘ spectrum. Therefore, the vector ๐‘Ž๐‘– will correspond to class i if ๐ท๐‘– = min (๐ท๐‘–=1โ€ฆ๐ถ) where ๐ถ is the number of classes. 3. RESULTS AND DISCUSSION The quality of recognition is measured by calculating the recognition rate which is defined as (9). ๐‘…๐‘Ž๐‘ก๐‘’ = ๐‘กโ„Ž๐‘’ ๐‘Ÿ๐‘’๐‘๐‘œ๐‘”๐‘›๐‘–๐‘ง๐‘’๐‘‘ ๐‘ ๐‘๐‘’๐‘Ž๐‘˜๐‘’๐‘Ÿ๐‘  ๐‘๐‘œ๐‘ข๐‘›๐‘ก ๐‘กโ„Ž๐‘’ ๐‘ก๐‘’๐‘ ๐‘ก ๐‘‘๐‘Ž๐‘ก๐‘Ž๐‘ ๐‘’๐‘ก ๐‘ ๐‘–๐‘ง๐‘’ โˆ— 100 (9) Tables 1 and 2 show the voice recognition rate according to the size of the interval of the analysis. From these tables, we observe that the adaptive orthogonal transform method gives good results compared to the MFCCs approach. As mentioned in section 2 part 1, we used correlation and Fourier transform to work only with the informative intervals of the signal instead of working with the whole signal. As we can see there is a 47.5% difference in voice identification rates with Fourier transform intervals between using our approach (96.8%) and MFCCs (49.3%). On the other hand, we found a 45.0% difference in voice identification rates with correlation intervals between using our approach (98.1%) and the MFCCs (53.1%). Correlation intervals
  • 6. Int J Elec & Comp Eng ISSN: 2088-8708 ๏ฒ Comparative study to realize an automatic speaker recognition system (Fadwa Abakarim) 381 give us better results than Fourier transform intervals, either for our approach or for the MFCCs. The proposed method has succeeded in identifying 5886 voice recordings among 6000 voice recordings of the test dataset (rate 98.1%) compared to MFCCs that identified only 3186 voice recordings (rate 53.1%), and these results show the efficiency of our algorithm. Table 1. The voice recognition rate according to the size of the interval using Fourier transform with MFCCs and the adaptive orthogonal transform method. Size of interval using Fourier Transform The voice recognition rate with MFCCs (%) The voice recognition rate with the adaptive orthogonal transform method (%) 128 256 512 1024 2048 4096 23.8 32.8 35.6 38.5 45.6 49.3 68.9 75.3 82.3 91.5 93.1 96.8 Table 2. The voice recognition rate according to the size of the interval using correlation with MFCCs and the adaptive orthogonal transform method Size of interval using Correlation The voice recognition rate with MFCCs (%) The voice recognition rate with the adaptive orthogonal transform method (%) 128 256 512 1024 2048 4096 25.6 30.3 37.3 43.2 49.1 53.1 65.8 73.2 81.7 90.2 95.6 98.1 4. CONCLUSION MFCCs is one of the most usable and well-known methods in the field of signal processing. However, it needs a large training dataset and a long execution time to extract the important features if the number of test dataset unfamiliar speakers is large, so for these reasons we developed a new method based on the creation of the operator H which is adaptable to any input signal. Even though its creation goes through several iterations log2 ๐‘ iterations where N is the length of the signal, an advantage of working with a sparse matrix where most of its elements are zero is that it makes the calculation and execution time of the algorithm faster. Our future goal is to increase the number of voice recordings in the test dataset and to decrease the number of voice recordings in the training dataset to see if the method continues to give successful results or not. In addition, we will combine it with other methods that are commonly used as classification methods such as hidden markov model (HMM) and artificial neural networks (ANN). ACKNOWLEDGEMENTS A special thanks to Mr. T. Hobson from Anglosphere English Center for reviewing for spelling and grammatical mistakes, and to all the participants who recorded their voices for this research. REFERENCES [1] A. Abenaou, F. Ataa Allah, and B. Nsiri, โ€œTowards an automatic speech recognition system in amazigh based on orthogonal transformations,โ€ Asinag, pp. 133-145, 2014. [2] N. Easwari and P. Ponmuthuramalingam, โ€œA comparative study on feature extraction technique for isolated word speech recognition,โ€ International Journal of Engineering and Techniques, vol. 1, no. 6, pp. 108-115, 2015. [3] L. Muda, M. Begam, and I. Elamvazuthi, โ€œVoice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques,โ€ Journal of Computing, vol. 2, no. 3, 2010. [4] P. P. Singh and P. Rani., โ€œAn approach to extract feature using MFCC,โ€ IOSR Journal of Engineering, vol. 4, no. 8, pp. 21-25, 2014, doi: 10.9790/3021-04812125. [5] X. Wu, H. Gong, P. Chen, Z. Zhong, and Y. Xu, โ€œSurveillance robot utilizing video and audio information,โ€ Journal of Intelligent and Robotic Systems, vol. 55, no. 4, pp. 403-421, 2009, doi: 10.1007/s10846-008-9297-3. [6] A. N. A. Kumar and S. A. Muthukumaraswamy, โ€œText dependent voice recognition system using MFCC and VQ for security applications,โ€ 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), vol. 2, 2017, pp. 130-136, doi: 10.1109/ICECA.2017.8212779. [7] W. Zunjing and C. Zhigang, โ€œImproved MFCC-based feature for robust speaker identification,โ€ Tsinghua Science and Technology, vol. 10, no. 2, pp. 158-161, 2005, doi: 10.1016/S1007-0214(05)70048-1. [8] W. Chen, M. Zhenjiang, and M. Xiao, โ€œDifferential MFCC and vector quantization used for real-time speaker recognition system,โ€ 2008 Congress on Image and Signal Processing, vol. 5, pp. 319-323, 2008, doi: 10.1109/CISP.2008.492.
  • 7. ๏ฒ ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 12, No. 1, February 2022: 376-382 382 [9] F. Z. Chelali and A. Djeradi, โ€œText dependant speaker recognition using MFCC, LPC and DWT,โ€ International Journal of Speech Technology, vol. 20, no. 3, pp. 725-740, 2017, doi: 10.1007/s10772-017-9441-1. [10] M. Cutajar, E. Gatt, I. Grech, O. Casha, and J. Micallef, โ€œComparative study of automatic speech recognition techniques,โ€ IET Signal Processing, vol. 7, no. 1, pp. 25-46, 2013, doi: 10.1049/iet-spr.2012.0151. [11] F. Abakarim and A. Abenaou, โ€œAmazigh isolated word speech recognition system using the adaptive orthogonal transform method,โ€ 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020, pp. 1-6, doi: 10.1109/ISCV49265.2020.9204291. [12] P. V. Janse et al., โ€œA comparative study between MFCC and DWT feature extraction technique,โ€ International Journal of Engineering Research and Technology, vol. 3, no. 1, pp. 3124-3127, 2014. [13] A. Winursito, R. Hidayat, A. Bejo, and M. N. Y. Utomo, โ€œFeature data reduction of MFCC using PCA and SVD in speech recognition system,โ€ 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), pp. 1-6, 2018, doi: 10.1109/ICSCEE.2018.8538414. [14] Y. Wang and B. Lawlor, โ€œSpeaker recognition based on MFCC and BP neural networks,โ€ 2017 28th Irish Signals and Systems Conference (ISSC) , 2017, pp. 1-4, doi: 10.1109/ISSC.2017.7983644. [15] M. Azergui, A. Abenaou, and H. Bouzahir, โ€œBearing fault classification based on the adaptive orthogonal transform method,โ€ International Journal of Advanced Computer Science and Applications (IJACSA), vol. 9, no. 1, pp. 375-380, 2018, doi: 10.14569/IJACSA.2018.090151. [16] D. S. Sheteb and P. S. B. Patil, โ€œZero crossing rate and energy of the speech signal of devanagari script,โ€ IOSR Journal of VLSI and Signal Processing, vol. 4, no. 1, pp. 1-5, 2014, doi: 10.9790/4200-04110105. [17] R. G. Bachu, S. Kopparthi, B. Adapa, and B. D. Barkana, โ€œVoiced/unvoiced decision for speech signals based on zero-crossing rate and energy,โ€ Advanced Techniques in Computing Sciences and Software Engineering, 2010, pp. 279-282, doi: 10.1007/978- 90-481-3660-5_47. [18] H. Aouani and Y. Ben Ayed, โ€œSpeech emotion recognition with deep learning,โ€ Procedia Computer Science, vol. 176, pp. 251-260, 2020, doi: 10.1016/j.procs.2020.08.027. [19] T. Ijitona, H. Yue, J. Soraghan, and A. Lowit, โ€œImproved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech signals using linear prediction error variance,โ€ 2020 5th International Conference on Computer and Communication Systems (ICCCS), 2020, pp. 685-690, doi: 10.1109/ICCCS49078.2020.9118462. [20] J. Luo, Z. Xie, and M. Xie, โ€œInterpolated DFT algorithms with zero padding for classic windows,โ€ Mechanical Systems and Signal Processing, vol. 70, pp. 1011-1025, 2016, doi: 10.1016/j.ymssp.2015.09.045. [21] B. Zhang, T. Xiao, and J. Zhong, โ€œA simple determination approach for zero-padding of FFT method in focal spot calculation,โ€ Optics Communications, vol. 451, pp. 260-264, 2019, doi: 10.1016/j.optcom.2019.06.065. [22] J. Obuchowski, A. Wyล‚omaล„ska, and R. Zimroz, โ€œSelection of informative frequency band in local damage detection in rotating machinery,โ€ Mechanical Systems and Signal Processing, vol. 48, no. 1-2, pp. 138-152, 2014, doi: 10.1016/j.ymssp.2014.03.011. [23] R. Mankar, M. J. Walsh, R. Bhargava, S. Prasad, and D. Mayerich, โ€œSelecting optimal features from fourier transform infrared spectroscopy for discrete-frequency imaging,โ€ The Analyst, vol. 143, no. 5, pp. 1147-1156, 2018, doi: 10.1039/c7an01888f. [24] C. Charayaphan, A. E. Marble, S. T. Nugent, and D. Swingler, โ€œCorrelation algorithm and sampling techniques for estimating the signal-to-noise ratio of the electrocardiogram,โ€ Journal of biomedical engineering, vol. 14, no. 6, pp. 516-520, 1992, doi: 10.1016/0141-5425(92)90106-U. [25] J. Zhou and A. Qu, โ€œInformative estimation and selection of correlation structure for longitudinal data,โ€ Journal of the American statistical Association, vol. 107, no. 498, pp. 701-710, 2012, doi: 10.1080/01621459.2012.682534. [26] A. Abenaou, โ€œDevelopment of a method for the synthesis of adaptive spectral operators for the analysis of random,โ€ International Journal of Computer Theory and Engineering, vol. 9, no. 4, pp. 273-276, 2017, doi: 10.7763/IJCTE.2017.V9.1150. [27] S. Wang, J. Liu, and N. Shroff, โ€œCoded sparse matrix multiplication,โ€ 2018 International Conference on Machine Learning (ICML), 2018, pp. 5152-5160. [28] I. J. Good, โ€œThe interaction algorithm and practical fourier analysis,โ€ Journal of the Royal Statistical Society: Series B (Methodological), vol. 20, no. 2, pp. 361-372, 1958, doi: 10.1111/j.2517-6161.1958.tb00300.x. [29] P. Senin, โ€œDynamic time warping algorithm review,โ€ Information and Computer Science Department University of Hawaii at Manoa Honolulu USA, vol. 855, no. 1-23, pp. 40, 2008. [30] A. Ismail, S. Abdlerazek, and I. M. El-Henawy, โ€œDevelopment of smart healthcare system based on speech recognition using support vector machine and dynamic time warping,โ€ Sustainability, vol. 12, no. 6, pp. 2403, 2020, doi: 10.3390/su12062403. [31] I. D. G. Y. A. Wibawa and I. D. M. B. A. Darmawan, โ€œImplementation of audio recognition using mel frequency cepstrum coefficient and dynamic time warping in wirama praharsini,โ€ Journal of Physics: Conference Series, vol. 1722, no. 1, 2021, doi: 10.1088/1742-6596/1722/1/012014. [32] S. Kinkiri and S. Keates, โ€œSpeaker identification: Variations of a human voice,โ€ in 2020 International Conference on Advances in Computing and Communication Engineering (ICACCE), 2020, pp. 1-4, doi:10.1109/ICACCE49060.2020.9154998.