TELKOMNIKA, Vol.17, No.4, August 2019, pp.1914~1922
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v17i4.12615
Received July 1, 2018; Revised February 23, 2019; Accepted March 27, 2019
FPGA-based implementation of speech recognition for robocar control using MFCC

Bayuaji Kurniadhani*1, Sugondo Hadiyoso2, Suci Aulia3, Rita Magdalena4
1,4School of Electrical Engineering, Telkom University, Terusan Buah Batu, Bandung, Indonesia
2,3School of Applied Science, Telkom University, Terusan Buah Batu, Bandung, Indonesia
*Corresponding author, e-mail: kurniadhani.bayuaji@gmail.com1, sugondo@telkomuniversity.ac.id2, suciaulia@telkomuniversity.ac.id3, ritamagdalena@telkomuniversity.ac.id4
Abstract
This research proposes a simulation of the digital logic design for FPGA-based speech recognition using MFCC (Mel Frequency Cepstral Coefficients) and Euclidean distance to control the motion of a robotic car. The recognized speech is used as a command to operate the robotic car. MFCC is used in the feature extraction process, while Euclidean distance is applied in the classification of the features of each utterance, whose result is forwarded to the decision block that supplies the control logic to the robot motors. The tests conducted, which examined the mel frequency warping and power cepstrum stages, show that the designed logic circuit is accurate. Because the logic design is validated by comparing MATLAB computation against Xilinx simulation, the results allow researchers to proceed with its implementation on FPGA hardware.
Keywords: Euclidean distance, FPGA, MFCC, speech recognition
Copyright © 2019 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Research on Automatic Speech Recognition (ASR) remains a topic of great interest because of its important role in daily life. Speech Recognition (SR) has been applied in smart homes [1-4], in artificial intelligence such as the classification of human emotion from speech [5-9], in robotics [10], in student learning [11-13], and in the medical sector, where SR is used to detect a person's stress level [14]. Therefore, optimization of SR is still in progress to obtain better accuracy, for example through speech enhancement [15-17] or noise reduction [18-20]. ASR is the process of recognizing a speech signal and converting it into a sequence of word commands. The concept has been developed since 2000, starting with the Hidden Markov Model-based CMU Sphinx-N, followed by Google Voice Recognition on Android in 2010; in 2012, Apple released its application named Siri [21]. Since the end of 2017, many practical speech recognition applications have appeared along with the increasing use of speech technology in real-life environments.
Among the ASR applications above, the speech features most frequently used in the extraction process are [22, 23] LPC (Linear Predictive Coding), MFCC (Mel Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction), and PLP-RASTA (PLP-Relative Spectra). A technical review of the performance of speech feature extraction techniques (MFCC, LPC, PLP, GFCC) combined with classification techniques (DTW, HMM, MLP, SVM, and DT) was carried out for SR of Tamil spoken words by Vimala [24]. Based on the test results across the five combinations of feature extraction and classification methods, GFCC outperformed the other algorithms [25]. Gurban also took an MFCC-based approach to audio-visual SR and performed optimization with two methods, the CMI (Conditional Mutual Information) algorithm and the MIFS (Mutual Information Feature Selection) algorithm; the result was best in very noisy conditions. Another SR study was conducted by Gupta [26] using LPC and LPCC as feature extraction techniques, with results showing that for speech identification the LPC parameters were more accurate than LPCC, while in terms of reliability and robustness LPCC was superior. A research study group on Speech and Language Technology conducted a three-year study on the effect of stress on speech production using HMM and MFCC approaches [27]. Another SR study with the hidden Markov model is the experiment conducted by Farsi [28], whose results showed that the HMM process still needed to be combined with a GA (Genetic Algorithm) to obtain good results. In comparison with [27], another MFCC study conducted by Mohan [29] combined MFCC with the DTW algorithm and showed good SR results with ten features for each word in the training phase.
Based on the comparison across the studies above (MFCC, LPC, PLP, GFCC), MFCC was chosen for feature extraction of the speech signal in this study because it commonly offers a high recognition rate with low complexity [30]. In this research, a prototype of an SR system was designed for an FPGA, whose output will later be used as the input of a robotic car (further research). One example application of a word recognition system is a simple robotic car that can distinguish the words pronounced by its users. By applying word recognition to the robotic car, the users can control the direction of the robot's motion without touching a button or being close to the robot.
2. Mel Frequency Cepstrum Coefficients (MFCC)
Feature extraction is the process of determining a value or vector that can be used as an object or individual identifier and is subsequently used in the classification process [31, 32]. MFCC analysis is a standard method [33] used to represent the parameters of a speech signal. The mechanism of MFCC is based on the frequency differences that the human ear can perceive, commonly expressed on the Mel scale (derived from Melody), in which the sound signal is filtered linearly on the mel frequency scale for low frequencies below 1 kHz and logarithmically for high frequencies above 1 kHz [34]. The block diagram of the feature extraction process using MFCC is presented in Figure 1.
Figure 1. MFCC block diagram
As shown in Figure 1, MFCC consists of the following computational steps:
Step 1: Preprocessing
As the initial step, the signal is passed through a high-pass filter known as pre-emphasis [35]. Its purpose is to increase the energy of the signal at high frequencies [36], so that its output becomes as in (1):

$y(n) = x(n) - \alpha x(n-1)$, where $0.9 \le \alpha \le 1$ (1)
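As an illustration of (1), a minimal Python/NumPy sketch of the pre-emphasis filter; the value α = 0.95 is an assumed coefficient inside the stated range, not a value taken from the paper:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Equation (1): y(n) = x(n) - alpha * x(n-1); the first sample is passed through unchanged."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y
```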
Step 2: Frame Blocking and Windowing
The speech signal enters a frame blocking process, in which it is divided into short frames with a duration of 10-30 ms. Cutting the signal into frames, however, can introduce spectral leakage or discontinuities at the frame edges. To cope with this problem, a windowing process must be applied before the signal continues to the FFT stage. The window is defined as a function $w(n)$, $0 \le n \le (N-1)$, which depends on the value of N, the number of samples in a frame. Each frame is multiplied point by point with the window function. The window used in this study is the Hamming window shown in (2) [37, 38]:

$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$ (2)

where $0 \le n \le (N-1)$. The frame length should be chosen as a power of two to facilitate the FFT process in the next block.
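For illustration, a short sketch of frame blocking followed by Hamming windowing; the frame length of 256 samples and 50% overlap are assumptions, not values stated at this point in the paper:

```python
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split the signal into overlapping frames and multiply each frame by a Hamming window.
    Assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))  # equation (2)
    frames = np.stack([x[k * hop:k * hop + frame_len] for k in range(n_frames)])
    return frames * window
```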
Step 3: DFT (Discrete Fourier Transform)
The output of the Hamming window is still in the time domain. To facilitate the computation in the next stage (the mel filter bank), the signal is transformed from the discrete time domain to the frequency domain using the DFT [39]; the algorithm used to perform the transformation is the FFT. Mathematically, the DFT can be formulated as in (3) [40, 41]:

$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi n k / N}$, $k = 0, 1, 2, \ldots, (N-1)$ (3)

The FFT yields the values $X[k]$, the transform of the input samples $x(n)$, where each value represents a frequency component of the input signal. $X[k]$ is commonly called the spectrum or periodogram.
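A one-line sketch of this stage with NumPy, taking the magnitude of the FFT of every windowed frame; only the first N/2 + 1 bins of the N-point transform are kept, since the input is real:

```python
import numpy as np

def magnitude_spectrum(frames: np.ndarray) -> np.ndarray:
    """Compute |X[k]| of equation (3) for each windowed frame of shape (n_frames, N)."""
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (n_frames, N//2 + 1)
```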
Step 4: Mel Frequency Filter Bank
This phase filters the frequency spectrum $X[k]$ of each frame using a bank of M filters. The filter bank is built following the perception of the mel frequency scale, represented as a set of triangular filter functions, and the mel-scale frequency is obtained by converting the linear frequency. For linear frequencies below 1 kHz the mel scale is approximately equal to the linear frequency, while above 1 kHz it grows logarithmically; the conversion is given by (4) [29, 35, 42]:

$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$ (4)

Warping the signal in the frequency domain yields the mel spectrum coefficients through the process shown in (5):

$X_i = \log_{10}\left(\sum_{k=0}^{N-1} |X(k)|\, H_i(k)\right)$ (5)

where $X_i$ is the i-th mel spectrum value, N is the number of FFT coefficients, and $H_i(k)$ is the value of the i-th filter at frequency bin k.
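A compact sketch of building M triangular mel filters and applying (5); M = 20 follows the 20 filter banks mentioned in section 4.3, while the 8 kHz sample rate and 256-point FFT are assumptions:

```python
import numpy as np

def mel_filterbank(n_filters: int = 20, n_fft: int = 256, fs: int = 8000) -> np.ndarray:
    """Build M triangular filters spaced evenly on the mel scale; shape (n_filters, n_fft//2 + 1)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)        # equation (4)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # inverse of (4)
    mel_points = np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_points) / fs).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        H[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        H[i - 1, center:right + 1] = (right - np.arange(center, right + 1)) / max(right - center, 1)
    return H

def mel_log_energies(spectrum: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Equation (5): X_i = log10(sum_k |X(k)| * H_i(k)) for every frame."""
    return np.log10(spectrum @ H.T + 1e-10)  # small offset guards against log10(0)
```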
Step 5: Cepstrum
At this phase, the mel cepstrum is converted into the time domain using the Discrete Cosine Transform (DCT). The result is called the Mel Frequency Cepstrum Coefficients (MFCC), as shown in (6) [41, 42], in which n is the index of the coefficient and M is the number of filter banks. The word cepstrum originates from the word spectrum with its first syllable reversed, i.e., spec becomes ceps [43]. The cepstrum is obtained mathematically through a logarithmic computation on the power spectrum [44, 45].
$C_n = \sum_{i=1}^{M} (\log X_i)\, \cos\left[n\left(i - \frac{1}{2}\right)\frac{\pi}{M}\right]$ (6)
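A short sketch of applying the DCT of (6) to the log mel energies; keeping 12 coefficients matches the 12 coefficients mentioned in section 4.5:

```python
import numpy as np

def mfcc_from_log_mel(log_mel: np.ndarray, n_coeffs: int = 12) -> np.ndarray:
    """Apply the DCT of (6) to the log mel energies X_i of (5), keeping n = 1..n_coeffs."""
    M = log_mel.shape[-1]
    i = np.arange(1, M + 1)                            # filter index i = 1..M
    n = np.arange(1, n_coeffs + 1)                     # cepstral coefficient index
    basis = np.cos(np.outer(n, i - 0.5) * np.pi / M)   # cos[n (i - 1/2) pi / M]
    return log_mel @ basis.T                           # shape: (n_frames, n_coeffs)
```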
3. Digital Design and Speech Recognition Simulation
This section discusses in detail the design of the blocks of the speech recognition (SR) system. The main parts are the audio codec interface, feature extraction using MFCC, and classification using Euclidean distance. It also discusses the simulation of the digital logic design for the FPGA using the Xilinx software. In the initial condition, the system waits for input from a switch to select training or recognition. In training mode, the incoming sound goes through feature extraction using MFCC, and the resulting cepstral coefficients are stored in the database. In recognition mode, the sound entering the system also goes through feature extraction; the cepstral coefficients of the input sound are then compared with the cepstral coefficients in the database using the Euclidean distance method. The entry with the lowest Euclidean distance determines the recognized word. The logic on the output pins is then set according to the recognized word. The value of the FPGA output pins is used as the logic input of the motor driver to control the motion of the robotic car, as shown in Table 1.
Table 1. Logic Output FPGA
Instruction   First Motor        Second Motor       Output Logic
              En  Dir 1  Dir 2   En  Dir 1  Dir 2   of FPGA
"Right"       1   1      1       1   1      1       111111
"Left"        1   0      1       1   1      0       101110
"Forward"     0   0      0       1   1      0       000110
"Backward"    0   0      0       1   0      1       000101
"Stop"        0   0      0       0   0      0       000000
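For illustration, a minimal sketch of the recognition decision (assuming the word templates are stored as vectors keyed by command name, which is an assumption about the database layout): the template with the smallest Euclidean distance selects the word, which is then mapped to the 6-bit motor logic of Table 1.

```python
import numpy as np

# 6-bit motor logic from Table 1: En, Dir 1, Dir 2 for the first and second motor.
OUTPUT_LOGIC = {
    "right": "111111", "left": "101110", "forward": "000110",
    "backward": "000101", "stop": "000000",
}

def recognize(features: np.ndarray, database: dict[str, np.ndarray]) -> tuple[str, str]:
    """Pick the stored template with the smallest Euclidean distance and return its motor logic."""
    word = min(database, key=lambda w: float(np.linalg.norm(features - database[w])))
    return word, OUTPUT_LOGIC[word]
```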
The simulation output is then compared with the MATLAB result. The architecture of the speech recognition system is shown in Figure 2. It consists of five main blocks: microphone, audio codec, FPGA, motor driver, and DC motors. Inside the FPGA, eight digital circuits act as the core of the system: codec interface, preprocessing, control unit, clock divider, MFCC, database, Euclidean distance, and output logic.
Figure 2. Architecture of the proposed speech recognition system
4. Results and Analysis
Simulation of the logic design was conducted in Xilinx ISE Project Navigator. Computation was also performed in MATLAB to obtain reference data for comparison with the designed system. The test was conducted by observing and comparing the data in each sub-system of the speech recognition chain. The synthesized logic circuit of the proposed system is shown in Figure 3.
Figure 3. RTL Design schematic
4.1. Pre-emphasis Filter
This circuit improves the signal-to-noise ratio (SNR) by maintaining the high frequencies of the spectrum that are attenuated in the process of sound production. Comparing the simulation results with manual calculation gives similar values for each signal sample. The results are shown in Table 2.
Table 2. Comparison of Manual Calculation and Xilinx Simulation for the Pre-emphasis Filter
No   Manual Calculation   Xilinx Simulation
1    380                  380
2    -166.25              -167
3    282.875              282
4    -25.1875             -26
5    -436.563             -437
6    431.5625             431
7    23.75                23
8    -519.25              -520
9    -92.1875             -93
10   -42.3125             -43
4.2. FFT
The FFT computation on the FPGA is performed separately, producing its output in two parts. As can be seen in Figure 4, the result differs only slightly from the MATLAB computation, because the arithmetic in this design does not use a floating-point representation. The comparison of the FFT results is plotted in Figure 4.
4.3. Mel Frequency Warping
Mel frequency warping acts as a filter applied to the frequency spectrum produced by the FFT. The multiplication is performed in parallel over 20 filter banks to make this process faster. The outputs of the 20 filter banks are magnitude values, as shown in Figure 5. It can be seen in Figure 5 that features 5-10 from the Xilinx simulation approach the MATLAB computation, which shows that the logic design is accurate.
Figure 4. Graph of the 256-point FFT calculation results in MATLAB and in the Xilinx simulation

Figure 5. Graph of the mel frequency warping results in MATLAB and in the Xilinx simulation
4.4. Cepstrum
This block converts the mel cepstrum from the previous block into the time domain using the Discrete Cosine Transform. The coefficient values were converted into a 16-bit fixed-point representation and stored in ROM. The block also has a RAM to store the output of the previous process. The results of this block differ from the MATLAB calculation shown in Figure 6, because the arithmetic in this block involves numbers with several digits after the decimal point, while the number representation used has limited precision, so the values are rounded to the nearest representable numbers.
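To illustrate the rounding effect described above, a small sketch of quantizing a value to 16-bit fixed point; the split of 4 fractional bits is an assumption, since the paper only states that a 16-bit fixed-point representation is used:

```python
def to_fixed_point(value: float, frac_bits: int = 4, total_bits: int = 16) -> int:
    """Quantize a real value to a signed fixed-point integer with frac_bits fractional bits."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))  # saturate to the 16-bit range

def from_fixed_point(q: int, frac_bits: int = 4) -> float:
    """Convert back to a real value; the difference from the original is the rounding error."""
    return q / (1 << frac_bits)

# Example: from_fixed_point(to_fixed_point(-436.563)) == -436.5625, i.e. the value is
# rounded to the nearest step of 1/16.
```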
4.5. Decision
In the testing phase, the 12 coefficients resulting from feature extraction of the input sound are compared with the coefficients stored in the database, and the matching instruction is then executed. The decision is made by finding the closest feature vector using the Euclidean distance. The database block stores one feature sample for each instruction word, so five feature sets are kept in the database. Once the closest entry is found, the decision block regulates the motor control in the format of 4-bit data, as shown in Figure 7.
Figure 6. Graph of the calculation results in MATLAB and in the Xilinx simulation

Figure 7. Result of the simulation of the output logic block in Xilinx
5. Conclusion
This research has successfully designed and simulated the logic circuits of a speech recognition system using the MFCC method and Euclidean distance to control the motion of a robotic car. The MFCC method was used to extract features from the spoken command input, consisting of the words right, left, forward, backward, and stop. The signal features produced by MFCC were then compared for similarity with the signal features in the database using the Euclidean distance, which provides the control logic for the motors. Process simplifications were also made to keep the FPGA memory resources as small as possible while maintaining good performance. Validation was carried out by comparing the output values of the designed logic, for each component of the speech recognition chain, with the computation in MATLAB. There were differences between the results of the logic circuits and MATLAB, because the arithmetic operations involve numbers with several digits after the decimal point while the number representation used has limited precision, so values are rounded to the nearest representable numbers. However, this difference is relatively insignificant, and the system still performs well in separating the signal features of one class from another, as shown by the results. Future work will implement the design on an FPGA board and analyze the synthesis of the logic circuits to determine the performance parameters of the IC design.
References
[1] M Vacher, A Fleury, F Portet, J-F Serignat, N Noury. Complete Sound and Speech Recognition
System for Health Smart Homes: Application to the Recognition of Activities of Daily Living. New
Developments in Biomedical Engineering. InTech. 2010.
[2] P Putthapipat, C Woralert, P Sirinimnuankul. Speech recognition gateway for home automation on
open platform. International Conference on Electronics, Information, and Communication (ICEIC)
2018. Hawaii, USA. 2018: 1–4.
[3] T Ayres, B Nolan. Voice activated command and control with speech recognition over WiFi. Sci.
Comput. Program. 2006; 59(1–2): 109–126.
[4] A Mohanta, VK Mittal. Human Emotional States Classification Based upon Changes in Speech
Production Features in Vowel Region. International Conference on Telecommunication and Networks
(TEL-NET) 2017. Noida, India. 2017.
[5] KF Akingbade, OM Umanna, IA Alimi. Voice-Based Door Access Control System Using the Mel
Frequency Cepstrum Coefficients and Gaussian Mixture Model. Int. J. Electr. Comput. Eng. 2014;
4(5): 643–647.
[6] W Rao et al. Investigation of fixed-dimensional speech representations for real-time speech emotion
recognition system. International Conference on Orange Technologies (ICOT) 2017: 197–200.
[7] J Yadav, MS Fahad, KS Rao. Epoch detection from emotional speech signal using zero time
windowing. Speech Commun. 2017; 96(November): 142–149.
[8] HK Palo, MN Mohanty. Classification of Emotional Speech of Children Using Probabilistic Neural
Network. Int. J. Electr. Comput. Eng. 2015; 5(2): 311–317.
[9] FE Gunawan, K Idananta. Predicting the Level of Emotion by Means of Indonesian Speech Signal.
TELKOMNIKA. 2017; 15(2): 665–670.
[10] J Twiefel, X Hinaut, S Wermter, J Twiefel, X Hinaut, S. Wermter. Syntactic Reanalysis in Language
Models for Speech Recognition. Int. Conf. Dev. Learn. Epigenetic Robot. Lisbon, Portugal. 2017:
215–220.
[11] D Recasens, C Rodríguez. Contextual and syllabic effects in heterosyllabic consonant sequences.
An ultrasound study. Speech Commun. 2017; 96(December): 150–167.
[12] M Jia, J Sun, C Bao, C Ritz. Separation of multiple speech sources by recovering sparse and
non-sparse components from B-format microphone recordings. Speech Commun. 2017; 96(May):
184–196.
[13] M Tahon, G Lecorve, D Lolive. Can we Generate Emotional Pronunciations for Expressive Speech
Synthesis?. IEEE Trans. Affect. Comput. 2018; 3045: 1-1.
[14] K Li, S Mao, X Li, Z Wu, H Meng. Automatic lexical stress and pitch accent detection for L2 English
speech using multi-distribution deep neural networks. Speech Commun. 2017; 96(September):
28–36.
[15] RK Kandagatla, PV. Subbaiah. Speech enhancement using MMSE estimation of amplitude and
complex speech spectral coefficients under phase-uncertainty. Speech Commun. 2017;
96(November): 10–27.
[16] A Misra, JHL. Hansen. Modelling and compensation for language mismatch in speaker verification.
Speech Commun. 2017; 96(January): 58–66.
[17] B Wiem, BM Mohamed anouar, P Mowlaee, B Aicha. Unsupervised single channel speech
separation based on optimized subspace separation. Speech Commun. 2017; 96 (April): 93–101.
[18] R Soleymani, IW. Selesnick, DM. Landsberger. SEDA: A tunable Q-factor wavelet-based noise
reduction algorithm for multi-talker babble. Speech Commun. 2017; 96(October): 102–115.
[19] Y Tang, Q Liu, W Wang, TJ Cox. A non-intrusive method for estimating binaural speech intelligibility
from noise-corrupted signals captured by a pair of microphones. Speech Commun. 2017; 96(June):
116–128.
[20] F Saki, A Bhattacharya, N Kehtarnavaz. Real-time Simulink implementation of noise adaptive speech
processing pipeline of cochlear implants. Speech Commun. 2017; 96(April): 197–206.
[21] S Naragan, MD Gupta. Speech Feature Extraction Technique: A Review. Int. J. Comput. Sci. Mob.
Comput. 2015; 4(3): 107–114.
[22] N Dave. Feature Extraction Methods LPC, PLP and MFCC In Speech Recognition. Int. J. Adv. Res.
Eng. Technol. 2013; 1(VI): 1–5.
[23] Chulhee Lee, Donghoon Hyun, Euisun Choi, Jinwook Go, and Chungyong Lee. Optimizing feature
extraction for speech recognition. IEEE Trans. Speech Audio Process. 2003; 11(1): 80–87.
[24] C Vimala and V. Radha. Suitable Feature Extraction and Speech Recognition Technique for Isolated
Tamil Spoken Words. Int. J. Comput. Sci. Inf. Technol. 2014; 5(1): 378–383.
[25] M Gurban, JP Thiran. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition.
IEEE Trans. Signal Process. 2009; 57(12): 4765–4776.
[26] H Gupta, D Gupta. LPC and LPCC method of feature extraction in Speech Recognition System.
Proc. 6th Int. Conf. Cloud Syst. Big Data Eng. Conflu. (2016). Noida, India: 498–502.
[27] HJM Steeneken, JHL Hansen. Speech under stress conditions: overview of the effect on speech
production and on system performance. IEEE International Conference on Acoustics, Speech, and
Signal Processing. Phoenix, USA. 1999: 2079–2082.
[28] H Farsi, R Saleh. Implementation and optimization of a speech recognition system based on hidden
Markov model using genetic algorithm. Iranian Conference on Intelligent Systems (ICIS). Bam, Iran.
2014: 1–5.
[29] Bhadragiri Jagan Mohan and Ramesh Babu N. Speech recognition using MFCC and DTW.
International Conference on Advances in Electrical Engineering (ICAEE). Vellore, India. 2014: 1–4.
[30] I McLoughlin. Applied Speech and Audio Processing. Cambridge: Cambridge University Press. 2009;
9780521519.
[31] J Yang, J Yang. Generalized K–L transform based combined feature extraction. Pattern Recognit.
2002; 35(1): 295–297.
[32] I Guyon, A Elisseeff. An Introduction to Feature Extraction. Feature Extraction. 2006; 207: 1–25.
[33] P Motlicek. Feature Extraction in Speech Coding and Recognition. Report of PhD research internship
in ASP Group. 2003: 1–50.
[34] GSVS Sivaram, H Hermansky. Sparse Multilayer Perceptron for Phoneme Recognition. IEEE Trans.
Audio. Speech. Lang. Processing. 2012; 20(1): 23–29.
[35] L Muda, M Begam, I Elamvazuthi. Voice Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques. J. Comput. 2010; 2(3): 138–143.
[36] A Přibilová. Preemphasis influence on harmonic speech model with autoregressive parameterization.
Radioengineering. 2003; 12(3): 32–36.
[37] R Goldberg, L Riek. A practical handbook of speech coders, 1st ed. Boca Raton: CRC Press, 2000.
[38] MA Samad. A Novel Window Function Yielding Suppressed Mainlobe Width and Minimum Sidelobe
Peak. Int. J. Comput. Sci. Eng. Inf. Technol. 2012; 2(2): 91–103.
[39] KR Ghule, RR Deshmukh. Feature Extraction Techniques for Speech Recognition: A Review. Int. J.
Sci. Eng. Res. 2015; 6(5): 143–147.
[40] SC Joshi, AN Cheeran. MATLAB Based Feature Extraction Using Mel Frequency Cepstrum
Coefficients for Automatic Speech Recognition. Int. J. Sci. Eng. Technol. Res. 2014; 3(6):
1820–1823.
[41] SE Levinson. Mathematical Models for Speech Technology. Chichester, UK: John Wiley & Sons, Ltd.
2005.
[42] S Dhingra, G Nijhawan, P Pandit. Isolated speech recognition using MFCC and DTW. International
Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering. 2013; 2(8):
4085–4092.
[43] S Bedi, R Singh. Hamming Generalized Low Time Liftered Cepstral Analysis. Int. J. Adv. Res.
Comput. Sci. Softw. Eng. 2017; 7(6): 157–160.
[44] NA Michael. Short-Time Spectrum and ‘Cepstrum’ Techniques for Vocal-Pitch Detection. J. Acoust.
Soc. Am. 1964; 36(2): 296–302.
[45] RM Martin, CL Burley. Power Cepstrum Technique With Application to Model Helicopter Acoustic
Data. NASA Technical Paper 2586. 1986.
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
PauloRodrigues104553
 
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.pptPROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
bhadouriyakaku
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
Ethernet Routing and switching chapter 1.ppt
Ethernet Routing and switching chapter 1.pptEthernet Routing and switching chapter 1.ppt
Ethernet Routing and switching chapter 1.ppt
azkamurat
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
obonagu
 

Recently uploaded (20)

14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
 
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.pptPROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
Ethernet Routing and switching chapter 1.ppt
Ethernet Routing and switching chapter 1.pptEthernet Routing and switching chapter 1.ppt
Ethernet Routing and switching chapter 1.ppt
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 

Based on the test results for the five varieties of feature extraction and classification methods, GFCC outperformed the other algorithms [25]. Gurban also applied an MFCC-based audio-visual SR approach and optimized it with two methods, the Conditional Mutual Information (CMI) algorithm and the Mutual Information Feature Selection (MIFS) algorithm; the result was best in very noisy conditions. Another SR study, by Gupta [26], used LPC and LPCC as the feature extraction techniques and showed that, in terms of speech identification, the LPC parameters were more precise than LPCC, whereas LPCC was superior in reliability and robustness. A three-year study by the Speech and Language Technology research group investigated how stress affects speech production using HMM and MFCC approaches [27].
Another SR study used the hidden Markov model: the experiment conducted by Farsi [28] showed that the HMM procedure still needed to be combined with a genetic algorithm (GA) to obtain good results. Compared with [27], another MFCC study, by Mohan [29], combined MFCC with the DTW algorithm and showed good SR results using ten features for each word in the training phase. Based on the comparison of the studies above (MFCC, LPC, PLP, GFCC) with this study, MFCC was chosen for feature extraction of the speech signal because it generally offers a high recognition rate with low complexity [30]. In this research, a prototype of an SR system was designed for FPGA that will later be used as the input stage of a robotic car (further research). A simple robot car is one example of a word recognition application: the robot can distinguish the words pronounced by the user, so the user can control the direction of the robot's motion without touching a button or being close to the robot.

2. Mel Frequency Cepstrum Coefficients (MFCC)
Feature extraction is the process of determining a value or vector that can be used as an identifier of an object or individual and that is subsequently used in the classification process [31, 32]. MFCC analysis is a standard method [33] for representing the parameters of a speech signal. MFCC is based on the frequency resolution of human hearing, commonly expressed on the Mel scale (from "melody"): the sound signal is filtered linearly on the Mel frequency scale for low frequencies below 1 kHz and logarithmically for high frequencies above 1 kHz [34]. The block diagram of the MFCC feature extraction process is presented in Figure 1.

Figure 1. MFCC block diagram

As shown in Figure 1, MFCC consists of the following computational steps.

Step 1: Preprocessing. As the initial step, the signal is passed through a high-pass pre-emphasis filter [35]. Its purpose is to boost the high-frequency energy of the signal [36], so that its output becomes (1):

$$y(n) = x(n) - \alpha\, x(n-1), \quad 0.9 \le \alpha \le 1 \qquad (1)$$
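As an illustration of (1), the pre-emphasis step can be modeled in a few lines of Python. This is only a floating-point reference sketch, with an assumed alpha = 0.95 and made-up sample values; it is not the FPGA circuit itself.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """Apply the pre-emphasis filter y(n) = x(n) - alpha * x(n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                      # no previous sample for n = 0
    y[1:] = x[1:] - alpha * x[:-1]
    return y

# Example with arbitrary 16-bit-style sample values (illustrative only)
samples = np.array([400, 213, 485, 244, -413, 17, 440, -497, -569, -130])
print(pre_emphasis(samples, alpha=0.95))
```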
Step 2: Frame blocking and windowing. The speech signal is divided into short frames of 10-30 ms. In this process, spectral leakage caused by discontinuities at the frame edges frequently occurs, so each frame is windowed before the signal continues to the FFT stage. The window is defined as a function $w(n)$, $0 \le n \le N-1$, which depends on N, the number of samples in the frame, and it is applied by multiplying it sample by sample with the frame. The window used in this study is the Hamming window, as shown in (2) [37, 38]:

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (2)$$

The frame length should be a power of two to facilitate the FFT in the next block.

Step 3: DFT (Discrete Fourier Transform). The output of the Hamming window is in the time domain. To facilitate the computation in the following Mel filter bank stage, the signal is transformed from the discrete time domain to the frequency domain using the DFT [39]; the algorithm used to perform the transformation is the FFT. Mathematically, the DFT is formulated as (3) [40, 41]:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi n k / N}, \quad k = 0, 1, \ldots, N-1 \qquad (3)$$

The FFT yields the values X[k], the transform of the input samples x(n), where each value represents a frequency component of the input signal. X[k] is commonly called the spectrum or periodogram.

Step 4: Mel frequency filter bank. This stage filters the frequency spectrum X[k] of each frame with a bank of M filters. The filters follow the perceptual Mel frequency scale and are represented as triangular filter functions, and the Mel frequency is obtained by converting the linear frequency. A linear frequency f below 1 kHz is kept as is, while above 1 kHz it is converted with (4) [29], [35], [42]:

$$\mathrm{Mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \qquad (4)$$

Warping the signal in the frequency domain yields the Mel spectrum coefficients through the process shown in (5):

$$X_i = \log_{10}\!\left(\sum_{k=0}^{N-1} |X(k)|\, H_i(k)\right) \qquad (5)$$

where $X_i$ is the i-th Mel spectrum value, N is the number of FFT coefficients, and $H_i(k)$ is the value of the i-th filter at frequency bin k.
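The windowing, FFT, and Mel filter bank stages of (2)-(5) can likewise be sketched as a software reference model. The parameters below (16 kHz sampling rate, 256-point frames, 20 triangular filters) are assumptions chosen to match the 256-point FFT and 20 filter banks discussed later; the filter-bank construction is a common textbook formulation, not necessarily the exact one used in the hardware.

```python
import numpy as np

def hamming(N):
    """Hamming window, eq. (2)."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def hz_to_mel(f):
    """Mel scale conversion, eq. (4)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, fs):
    """Triangular Mel filter bank H_i(k) spanning 0 .. fs/2."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((nfft + 1) * hz_points / fs).astype(int)
    H = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            H[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            H[i - 1, k] = (right - k) / max(right - center, 1)
    return H

def log_mel_spectrum(frame, H):
    """Eq. (5): log10 of filter-bank energies of one windowed frame."""
    spec = np.abs(np.fft.rfft(frame * hamming(len(frame))))
    return np.log10(np.maximum(H @ spec, 1e-10))  # guard against log of zero

# Illustrative usage with assumed parameters (not taken from the hardware design)
fs, nfft, num_filters = 16000, 256, 20
frame = np.random.randn(nfft)            # stand-in for one 256-sample speech frame
H = mel_filterbank(num_filters, nfft, fs)
print(log_mel_spectrum(frame, H).shape)  # -> (20,)
```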
Step 5: Cepstrum. In this phase, the Mel spectrum is converted back to the time domain using the Discrete Cosine Transform (DCT). The result is called the Mel Frequency Cepstrum Coefficients (MFCC), as shown in (6) [41, 42], where n is the coefficient index and M is the number of filter banks. The word "cepstrum" comes from "spectrum" with its first syllable reversed, from "spec" to "ceps" [43]; the cepstrum is the power spectrum obtained mathematically through a logarithmic computation [44, 45].

$$C_n = \sum_{i=1}^{M} (\log X_i)\, \cos\!\left[n\left(i - \tfrac{1}{2}\right)\frac{\pi}{M}\right] \qquad (6)$$

3. Digital Design and Speech Recognition Simulation
This section discusses in detail the design of the speech recognition (SR) system blocks. The main parts are the audio codec interface, feature extraction using MFCC, and classification using Euclidean distance. It also discusses the simulation of the digital logic design for the FPGA using the Xilinx tools. Initially, the system waits for a switch input to select training or recognition. In training mode, the incoming sound goes through MFCC feature extraction, and the resulting cepstral coefficients are stored in the database. In recognition mode, the incoming sound also goes through feature extraction; the cepstral coefficients of the input sound are then compared with the cepstral coefficients in the database using the Euclidean distance method, and the word with the lowest Euclidean distance is selected. The logic on the output pins is then set according to the recognized word. The FPGA output pins are used as the logic inputs of the motor driver to control the motion of the robotic car, as shown in Table 1.

Table 1. Logic output of the FPGA
Instruction    First motor (En, Dir1, Dir2)    Second motor (En, Dir1, Dir2)    FPGA output logic
"Right"        1  1  1                          1  1  1                          111111
"Left"         1  0  1                          1  1  0                          101110
"Forward"      0  0  0                          1  1  0                          000110
"Backward"     0  0  0                          1  0  1                          000101
"Stop"         0  0  0                          0  0  0                          000000

The simulation output is then compared with the MATLAB results. The system architecture of the speech recognition block is shown in Figure 2. It consists of a microphone, an audio codec, the FPGA, a motor driver, and DC motors. Inside the FPGA, eight digital blocks act as the cores of the system: codec interface, preprocessing, control unit, clock divider, MFCC, database, Euclidean distance, and output logic.

Figure 2. Architecture of the proposed speech recognition
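To make the recognition flow described above concrete, the sketch below models the last two blocks in plain Python: the Euclidean-distance comparison against stored templates and the mapping of the recognized word to the 6-bit motor logic of Table 1. The 12-coefficient random templates are placeholders; only the output bit patterns come from Table 1.

```python
import numpy as np

# 6-bit output logic per recognized word, taken from Table 1
OUTPUT_LOGIC = {
    "right":    "111111",
    "left":     "101110",
    "forward":  "000110",
    "backward": "000101",
    "stop":     "000000",
}

def euclidean_distance(a, b):
    """Distance between two cepstral-coefficient vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def recognize(features, database):
    """Return (word, FPGA output bits) for the closest stored template."""
    word = min(database, key=lambda w: euclidean_distance(features, database[w]))
    return word, OUTPUT_LOGIC[word]

# Illustrative usage with made-up 12-coefficient templates
rng = np.random.default_rng(0)
database = {w: rng.normal(size=12) for w in OUTPUT_LOGIC}          # training mode: one template per word
test_features = database["forward"] + 0.05 * rng.normal(size=12)   # recognition mode input
print(recognize(test_features, database))                          # -> ('forward', '000110')
```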
4. Results and Analysis
The logic circuit design was simulated in Xilinx ISE Project Navigator, and the same computation was carried out in MATLAB to obtain comparison data for the designed system. The test was conducted by observing and comparing the data in each sub-system of the speech recognition chain. The synthesis of the logic circuit of the proposed system can be seen in Figure 3.

Figure 3. RTL design schematic

4.1. Pre-emphasis Filter
This circuit improves the signal-to-noise ratio (SNR) by preserving the high frequencies of the spectrum that are attenuated during sound production. Comparing the simulation results with a manual calculation gives similar values for each signal sample; the results are shown in Table 2.

Table 2. Comparison of the manual calculation and the Xilinx simulation for the pre-emphasis filter
No    Manual calculation    Xilinx simulation
1       380                   380
2      -166.25               -167
3       282.875               282
4       -25.1875              -26
5      -436.563              -437
6       431.5625              431
7        23.75                 23
8      -519.25               -520
9       -92.1875              -93
10      -42.3125              -43

4.2. FFT
The FFT computation on the FPGA is done separately, producing two parts of output. As can be seen in Figure 4, the results differ only slightly from the reference, because the arithmetic in this design does not use floating point. The comparison of the FFT results is plotted in Figure 4.
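The small deviations in Table 2 and Figure 4 are attributed to the absence of floating-point arithmetic. As a hedged illustration of that effect, the sketch below compares a floating-point pre-emphasis with a version whose outputs are truncated to integers; the truncation (floor) rounding mode and the sample values are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def pre_emphasis_float(x, alpha=0.95):
    """Floating-point reference of y(n) = x(n) - alpha * x(n - 1)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([x[0]], x[1:] - alpha * x[:-1]))

def pre_emphasis_truncated(x, alpha=0.95):
    """Same filter with every output truncated toward -inf, as integer
    hardware without a fractional datapath might do (assumed rounding mode)."""
    return np.floor(pre_emphasis_float(x, alpha))

# Illustrative input samples (not the paper's test vector)
x = np.array([380.0, 195.0, 125.0, 243.0, -413.0, 17.0, 433.0, -497.0])
ref = pre_emphasis_float(x)
hw = pre_emphasis_truncated(x)
print(np.column_stack((ref, hw)))                        # float reference vs truncated model
print("max abs difference:", np.max(np.abs(ref - hw)))   # always less than one integer step
```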
4.3. Mel Frequency Warping
Mel frequency warping acts as a filter on the frequency spectrum produced by the FFT. The multiplication with the 20 filter banks is performed in parallel to make this process faster. The outputs of the 20 filter banks are magnitude values, as shown in Figure 5. Features 5-10 of the Xilinx simulation approach the MATLAB computation, which indicates that the logic design is accurate.

Figure 4. Graph of the 256-point FFT calculation in MATLAB and in the Xilinx simulation
Figure 5. Graph of the Mel frequency warping result in MATLAB and in the Xilinx simulation

4.4. Cepstrum
This block converts the Mel spectrum from the previous block into the time domain using the Discrete Cosine Transform. The coefficient values were converted to a 16-bit fixed-point representation and stored in ROM; the block also has a RAM to store the output of the previous process. Compared with the MATLAB calculation, the results of this block differ slightly, as shown in Figure 6. This is because the arithmetic in this block involves numbers with fractional digits, while the number representation used has limited precision, so values are rounded to the nearest representable numbers.
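Section 4.4 stores the DCT coefficients as 16-bit fixed-point words in ROM. One way to model such quantization in software is sketched below; the Q2.13 format (13 fractional bits) and the example coefficient row are assumptions made for illustration, since the paper does not state the exact scaling.

```python
import numpy as np

FRAC_BITS = 13          # assumed fractional bits of the 16-bit word (Q2.13)
SCALE = 1 << FRAC_BITS

def to_fixed16(x):
    """Quantize floats to signed 16-bit fixed point (round to nearest)."""
    q = np.clip(np.round(np.asarray(x) * SCALE), -32768, 32767)
    return q.astype(np.int16)

def from_fixed16(q):
    """Convert the stored 16-bit words back to floats."""
    return q.astype(float) / SCALE

# Illustrative DCT-II coefficient row such as might be stored in the ROM
coeffs = np.cos(np.pi * (np.arange(20) + 0.5) * 3 / 20)   # made-up example
stored = to_fixed16(coeffs)
error = coeffs - from_fixed16(stored)
print("max quantization error:", np.max(np.abs(error)))   # about 2**-14
```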
4.5. Decision
In the testing phase, the 12 coefficients produced by feature extraction and stored in the database are compared with the coefficients of the input sound feature, and the corresponding instruction is then executed. The decision is taken by finding the closest feature coefficients using the Euclidean distance. The database block stores one sample of each instruction word, so five feature sets are saved in the database. Once the closest entry is found, a decision is made to set the motor control in the format of 4-bit data, as shown in Figure 7.

Figure 6. Graph of the calculation result in MATLAB and in the Xilinx simulation
Figure 7. Result of the simulation of the output logic block in Xilinx

5. Conclusion
This research has produced a design and simulation of the logic circuits of a speech recognition system using the MFCC method and Euclidean distance to control the motion of a robotic car. The MFCC method extracts the features of the spoken command input, consisting of right, left, forward, backward, and stop. The resulting signal features are then compared for similarity with the signal features stored in the database using the Euclidean distance, which provides the control logic for the motors. The processing was also simplified to keep the FPGA memory resources as small as possible while maintaining good performance. Validation was conducted by comparing the output values of the logic design of each speech recognition component with the calculation in MATLAB. There is a difference between the results of the logic circuits and those of MATLAB because the arithmetic involves numbers with fractional digits, while the number representation used does not have sufficient precision, so values are rounded to the closest representable numbers. However, this difference is relatively insignificant, and the system still performs well in separating the signal features of one class from another, as shown in the results. The next research will implement the design on an FPGA board and analyze the synthesis of the logic circuits to determine the performance parameters of the IC design.
References
[1] M Vacher, A Fleury, F Portet, J-F Serignat, N Noury. Complete Sound and Speech Recognition System for Health Smart Homes: Application to the Recognition of Activities of Daily Living. New Developments in Biomedical Engineering. InTech. 2010.
[2] P Putthapipat, C Woralert, P Sirinimnuankul. Speech recognition gateway for home automation on open platform. International Conference on Electronics, Information, and Communication (ICEIC). Hawaii, USA. 2018: 1-4.
[3] T Ayres, B Nolan. Voice activated command and control with speech recognition over WiFi. Sci. Comput. Program. 2006; 59(1-2): 109-126.
[4] A Mohanta, VK Mittal. Human Emotional States Classification Based upon Changes in Speech Production Features in Vowel Region. International Conference on Telecommunication and Networks (TEL-NET). Noida, India. 2017.
[5] KF Akingbade, OM Umanna, IA Alimi. Voice-Based Door Access Control System Using the Mel Frequency Cepstrum Coefficients and Gaussian Mixture Model. Int. J. Electr. Comput. Eng. 2014; 4(5): 643-647.
[6] W Rao et al. Investigation of fixed-dimensional speech representations for real-time speech emotion recognition system. International Conference on Orange Technologies (ICOT). 2017: 197-200.
[7] J Yadav, MS Fahad, KS Rao. Epoch detection from emotional speech signal using zero time windowing. Speech Commun. 2017; 96(November): 142-149.
[8] HK Palo, MN Mohanty. Classification of Emotional Speech of Children Using Probabilistic Neural Network. Int. J. Electr. Comput. Eng. 2015; 5(2): 311-317.
[9] FE Gunawan, K Idananta. Predicting the Level of Emotion by Means of Indonesian Speech Signal. TELKOMNIKA. 2017; 15(2): 665-670.
[10] J Twiefel, X Hinaut, S Wermter. Syntactic Reanalysis in Language Models for Speech Recognition. Int. Conf. Dev. Learn. Epigenetic Robot. Lisbon, Portugal. 2017: 215-220.
[11] D Recasens, C Rodríguez. Contextual and syllabic effects in heterosyllabic consonant sequences. An ultrasound study. Speech Commun. 2017; 96(December): 150-167.
[12] M Jia, J Sun, C Bao, C Ritz. Separation of multiple speech sources by recovering sparse and non-sparse components from B-format microphone recordings. Speech Commun. 2017; 96(May): 184-196.
[13] M Tahon, G Lecorve, D Lolive. Can we Generate Emotional Pronunciations for Expressive Speech Synthesis?. IEEE Trans. Affect. Comput. 2018; 3045: 1-1.
[14] K Li, S Mao, X Li, Z Wu, H Meng. Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks. Speech Commun. 2017; 96(September): 28-36.
[15] RK Kandagatla, PV Subbaiah. Speech enhancement using MMSE estimation of amplitude and complex speech spectral coefficients under phase-uncertainty. Speech Commun. 2017; 96(November): 10-27.
[16] A Misra, JHL Hansen. Modelling and compensation for language mismatch in speaker verification. Speech Commun. 2017; 96(January): 58-66.
[17] B Wiem, BM Mohamed Anouar, P Mowlaee, B Aicha. Unsupervised single channel speech separation based on optimized subspace separation. Speech Commun. 2017; 96(April): 93-101.
[18] R Soleymani, IW Selesnick, DM Landsberger. SEDA: A tunable Q-factor wavelet-based noise reduction algorithm for multi-talker babble. Speech Commun. 2017; 96(October): 102-115.
[19] Y Tang, Q Liu, W Wang, TJ Cox. A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones. Speech Commun. 2017; 96(June): 116-128.
[20] F Saki, A Bhattacharya, N Kehtarnavaz. Real-time Simulink implementation of noise adaptive speech processing pipeline of cochlear implants. Speech Commun. 2017; 96(April): 197-206.
[21] S Naragan, MD Gupta. Speech Feature Extraction Technique: A Review. Int. J. Comput. Sci. Mob. Comput. 2015; 4(3): 107-114.
[22] N Dave. Feature Extraction Methods LPC, PLP and MFCC in Speech Recognition. Int. J. Adv. Res. Eng. Technol. 2013; 1(VI): 1-5.
[23] Chulhee Lee, Donghoon Hyun, Euisun Choi, Jinwook Go, Chungyong Lee. Optimizing feature extraction for speech recognition. IEEE Trans. Speech Audio Process. 2003; 11(1): 80-87.
[24] C Vimala, V Radha. Suitable Feature Extraction and Speech Recognition Technique for Isolated Tamil Spoken Words. Int. J. Comput. Sci. Inf. Technol. 2014; 5(1): 378-383.
[25] M Gurban, JP Thiran. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition. IEEE Trans. Signal Process. 2009; 57(12): 4765-4776.
[26] H Gupta, D Gupta. LPC and LPCC method of feature extraction in Speech Recognition System. 6th Int. Conf. Cloud Syst. Big Data Eng. (Confluence). Noida, India. 2016: 498-502.
[27] HJM Steeneken, JHL Hansen. Speech under stress conditions: overview of the effect on speech production and on system performance. IEEE International Conference on Acoustics, Speech, and Signal Processing. Phoenix, USA. 1999: 2079-2082.
[28] H Farsi, R Saleh. Implementation and optimization of a speech recognition system based on hidden Markov model using genetic algorithm. Iranian Conference on Intelligent Systems (ICIS). Bam, Iran. 2014: 1-5.
[29] Bhadragiri Jagan Mohan, Ramesh Babu N. Speech recognition using MFCC and DTW. International Conference on Advances in Electrical Engineering (ICAEE). Vellore, India. 2014: 1-4.
[30] I McLoughlin. Applied Speech and Audio Processing. Cambridge: Cambridge University Press. 2009.
[31] J Yang, J Yang. Generalized K-L transform based combined feature extraction. Pattern Recognit. 2002; 35(1): 295-297.
[32] I Guyon, A Elisseeff. An Introduction to Feature Extraction. Feature Extraction. 2006; 207: 1-25.
[33] P Motlicek. Feature Extraction in Speech Coding and Recognition. Report of PhD research internship in ASP Group. 2003: 1-50.
[34] GSVS Sivaram, H Hermansky. Sparse Multilayer Perceptron for Phoneme Recognition. IEEE Trans. Audio Speech Lang. Processing. 2012; 20(1): 23-29.
[35] L Muda, M Begam, I Elamvazuthi. Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques. J. Comput. 2010; 2(3): 138-143.
[36] A Přibilová. Preemphasis influence on harmonic speech model with autoregressive parameterization. Radioengineering. 2003; 12(3): 32-36.
[37] R Goldberg, L Riek. A Practical Handbook of Speech Coders. 1st ed. Boca Raton: CRC Press. 2000.
[38] MA Samad. A Novel Window Function Yielding Suppressed Mainlobe Width and Minimum Sidelobe Peak. Int. J. Comput. Sci. Eng. Inf. Technol. 2012; 2(2): 91-103.
[39] KR Ghule, RR Deshmukh. Feature Extraction Techniques for Speech Recognition: A Review. Int. J. Sci. Eng. Res. 2015; 6(5): 143-147.
[40] SC Joshi, AN Cheeran. MATLAB Based Feature Extraction Using Mel Frequency Cepstrum Coefficients for Automatic Speech Recognition. Int. J. Sci. Eng. Technol. Res. 2014; 3(6): 1820-1823.
[41] SE Levinson. Mathematical Models for Speech Technology. Chichester, UK: John Wiley & Sons, Ltd. 2005.
[42] S Dhingra, G Nijhawan, P Pandit. Isolated speech recognition using MFCC and DTW. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering. 2013; 2(8): 4085-4092.
[43] S Bedi, R Singh. Hamming Generalized Low Time Liftered Cepstral Analysis. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2017; 7(6): 157-160.
[44] NA Michael. Short-Time Spectrum and 'Cepstrum' Techniques for Vocal-Pitch Detection. J. Acoust. Soc. Am. 1964; 36(2): 296-302.
[45] RM Martin, CL Burley. Power Cepstrum Technique With Application to Model Helicopter Acoustic Data. NASA Technical Paper 2586. 1986.