2. Miss Gaganpreet Kaur Cheema, Mr. Sukhveer Singh and Ms. Jagminder Kaur Cheema
http://www.iaeme.com/IJCIET/index.asp 52 editor@iaeme.com
Speech recognition is a part of pattern recognition that includes two processes:
speech training and speech recognition. The first stage is training, also known as
the modeling stage. In this stage, the system learns and summarizes human
language, and the learned knowledge is stored to establish a language reference
model. The second stage is identification, also known as the testing stage.
The system matches incoming voice input against the reference models in
the library and returns the nearest meaning or semantic recognition result.
2. WORKING OF SPEECH RECOGNITION SYSTEM
2.1. Creation of database
This section shows the step-by-step procedure used to create the database,
followed by waveform figures extracted from the signal that has to be matched.
Acquiring Data with a Sound Card: As shown in Figure 1, a typical data
acquisition session consists of these four steps:
1. Initialization: Creating a device object.
2. Configuration: Adding channels and controlling acquisition behavior with properties.
3. Execution: Starting the device object and acquiring or sending data.
4. Termination: Deleting the device object.
Figure 1 Data acquisition system
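The subsections below implement these four steps with the MATLAB Data Acquisition Toolbox. As a rough, hypothetical sketch of the same acquisition parameters in Python, the following synthesizes a 2-second, 8000 Hz signal in place of live sound-card input (the 440 Hz tone is an arbitrary stand-in for the tuning fork):

```python
import numpy as np

# Configuration: the same parameters as the MATLAB session below
Fs = 8000                            # sample rate (Hz)
duration = 2                         # acquisition length (seconds)
samples_per_trigger = duration * Fs  # total samples to collect

# Execution: synthesize a 440 Hz tone standing in for sound-card input
t = np.arange(samples_per_trigger) / Fs
data = np.sin(2 * np.pi * 440 * t)

print(len(data))  # 16000 samples = 2 s at 8000 Hz
```

In a real session the synthesized tone would be replaced by samples streamed from the sound card.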
2.2. Initialization
The first step is to create the analog input object (AI) for the sound card.
AI = analoginput('winsound');
2.3. Configuration
Next, we add a single channel to AI and set the sample rate to 8000 Hz with an
acquisition duration of 2 seconds:
addchannel(AI, 1);
Fs = 8000; % Sample rate is 8000 Hz
set(AI, 'SampleRate', Fs)
duration = 2; % 2 second acquisition
set(AI, 'SamplesPerTrigger', duration*Fs);
2.4. Execution
Now, we are ready to start the acquisition. The default trigger behavior is to start
collecting data as soon as the start command is issued. Before doing so, you should strike
the tuning fork to begin supplying a tone to the microphone (whistling will work as well).
start(AI);
To retrieve all the data:
data = getdata(AI);
2.5. Termination
The acquisition ends once all the data is acquired. To end the acquisition session, we
can delete the AI object from the workspace:
delete(AI)
2.6. Results
Let’s now determine the frequency components of the tuning fork and plot the results.
First, we calculate the absolute value of the FFT of the data.
xfft = abs(fft(data));
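As an illustration of this step outside MATLAB, the sketch below computes the same magnitude spectrum with NumPy and locates the dominant frequency; a 440 Hz synthetic tone stands in for the recorded tuning fork:

```python
import numpy as np

Fs = 8000                                # sample rate used during acquisition (Hz)
duration = 2
t = np.arange(duration * Fs) / Fs
data = np.sin(2 * np.pi * 440 * t)       # stand-in for the recorded tuning fork

xfft = np.abs(np.fft.fft(data))          # magnitude spectrum, as in the MATLAB line

# FFT bin k corresponds to frequency k*Fs/N; search only the first half
# of the spectrum (up to the Nyquist frequency Fs/2)
N = len(data)
freqs = np.arange(N) * Fs / N
peak = freqs[np.argmax(xfft[:N // 2])]
print(peak)  # 440.0 Hz for this synthetic tone
```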
3. FREQUENCY SPECTRUM USING EUCLIDEAN DISTANCE
The frequency spectrum of a time-domain signal is a representation of that signal
in the frequency domain. (The time domain is the analysis of mathematical
functions or physical signals with respect to time; in the time domain, the signal
or function's value is known for all real numbers. The frequency domain, by
contrast, refers to the analysis of mathematical functions or signals with respect
to frequency rather than time.) The frequency spectrum can be generated via a
Fourier transform of the signal, and the resulting values are usually presented as
amplitude and phase, both plotted versus frequency. A musical tone's timbre
is characterized by its harmonic spectrum.
is characterized by its harmonic spectrum. Spectrum analysis, also referred to as
frequency domain analysis or spectral density estimation, is the technical process of
decomposing a complex signal into simpler parts. As described above, many physical
processes are best described as a sum of many individual frequency components. Any
process that quantifies the various amounts (e.g. amplitudes, powers, intensities, or
phases), versus frequency can be called spectrum analysis. When a sound signal contains
frequencies, distributed equally over the audio spectrum, it is called white noise [2]. In
mathematics, the Euclidean distance or Euclidean metric is the “ordinary” distance
between two points that one would measure with a ruler, and is given by the Pythagorean
formula. The theorem can be written as an equation relating the lengths of the sides a, b
and c, often called the Pythagorean equation.
a² + b² = c² (4.1)
where c represents the length of the hypotenuse, and a and b represent the lengths of
the other two sides. By using this formula as distance, Euclidean space (or even any
inner product space) becomes a metric space. The associated norm is called the
Euclidean norm. The Euclidean distance between points p and q is the length of the
line segment connecting them. The squared distance between two vectors x =
[x1 x2] and y = [y1 y2] is the sum of squared differences in their coordinates. To
denote the distance between vectors x and y we can use the notation dxy, so that this
last result can be written as:
dxy² = (x1 - y1)² + (x2 - y2)² (4.2)
i.e., the distance itself is the square root:
dxy = √((x1 - y1)² + (x2 - y2)²) (4.3)
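Equations (4.2) and (4.3) can be checked with a few lines of Python; the 3-4-5 right triangle is the classic Pythagorean case of equation (4.1):

```python
import math

def euclidean_distance(x, y):
    """d_xy = sqrt of the sum of squared coordinate differences, eq. (4.3)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# 3-4-5 right triangle: distance from the origin to (3, 4) is 5
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```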
4. PROPOSED SPEECH PROCESSING
4.1. Procedure
The technique works as follows to recognize a person's speech and control appliances:
1. Record the voices of two or three persons separately. These are treated as inputs to the system, and their frequency ranges are observed through plots; separate plots show the frequency range of each input.
2. Record the voices of ten persons, with the voices of the above three included among the ten. This is done to establish the identity of a person through voice; the voices are matched with the help of English speech recognition software.
3. The ten recorded voices, including the first three, are taken as the database for the whole technique. A second database is created for storing English commands using the data acquisition.
4. The technique considers the ten voices and the first three simultaneously and runs the system to obtain results. If the first voice matches one of the ten, the corresponding results are obtained; likewise, if the second voice matches one of the ten, the corresponding frequency range is obtained.
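A minimal sketch of the matching step, assuming each database entry is an FFT magnitude spectrum and the match is the entry with the smallest Euclidean distance to the input. The tone frequencies and person names here are invented for illustration; real entries would be spectra of recorded voices:

```python
import numpy as np

Fs = 8000
t = np.arange(Fs) / Fs  # 1 second of samples

def spectrum(freq):
    """Magnitude spectrum of a pure tone, standing in for a recorded voice."""
    return np.abs(np.fft.fft(np.sin(2 * np.pi * freq * t)))

# Database of ten "voices" (hypothetical tone frequencies for illustration)
database = {f"person_{i}": spectrum(200 + 50 * i) for i in range(10)}

def identify(sample_spectrum):
    # Nearest entry by Euclidean distance, as in eq. (4.3)
    return min(database,
               key=lambda name: np.linalg.norm(database[name] - sample_spectrum))

# An input matching person_3's tone (350 Hz) is identified as person_3
print(identify(spectrum(350)))  # person_3
```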
5. CONCLUSION
The proposed technique (FSAED) gives accuracy of up to 96% for different user
voices, as compared with the conventional technique, i.e. Fourier-Bessel cepstral
coefficients for robust speech recognition [2]. Results for different SNRs (dB) show
accuracy in %. With white noise added at 30 dB, the conventional technique gives
92.3% accuracy; similarly, at 30 dB, car noise gives 90.8% and music noise gives
92.6%. The proposed technique gives 94% accuracy with white noise at 30 dB,
90.9% with car noise and 93.4% with music noise at 30 dB. Performance evaluation
results are shown graphically and indicate that the proposed algorithm gives
considerably better results than the existing conventional model.
REFERENCES
[1] Resmi K, Satish Kumar, H. K. Sardana, Radhika Chhabra, "Graphical Speech Training System for Hearing Impaired", 2011 International Conference on Image Information Processing (ICIIP), 3-5 Nov 2011, ISBN 978-1-61284-859-4, pp. 1-6.
[2] Prakash Chetana, Gangashetty Suryakanth V., "Fourier-Bessel Cepstral Coefficients for Robust Speech Recognition", 2012 International Conference on Signal Processing and Communications (SPCOM), 22-25 Aug 2012, ISBN 978-1-4673-2013-9, pp. 1-5.
[3] He Guangji, Sugahara Takanobu, Miyamoto Yuki, Fujinaga Tsuyoshi, Hiroki Noguchi, Shintaro Izumi, "A 40 nm 144 mW VLSI Processor for Real-Time 60-kWord Continuous Speech Recognition", IEEE Transactions on Circuits and Systems, vol. 59, no. 8, Aug 2012, pp. 1656-1666.
[4] Virginia Estellers, Mihai Gurban, Jean-Philippe Thiran, "On Dynamic Stream Weighting for Audio-Visual Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, May 2012, pp. 1145-1157.
[5] Shing-Tai Pan, Xu-Yu Li, "An FPGA-Based Embedded Robust Speech Recognition System Designed by Combining Empirical Mode Decomposition and a Genetic Algorithm", IEEE Transactions on Instrumentation and Measurement, vol. 61, no. 9, Sept 2012, pp. 2560-2572.
[6] Punit Kumar Sharma, B. R. Lakshmikantha, K. Shanmukha Sundar, "Real Time Control of DC Motor Drive using Speech Recognition", 2010 India International Conference on Power Electronics (IICPE), 28-30 Jan 2011, ISBN 978-1-4244-7883-5, pp. 1-5.
[7] Qun Feng Tan, Panayiotis G. Georgiou, Shrikanth Narayanan, "Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, Nov 2011, pp. 2418-2429.
[8] Nam Soo Kim, Tae Gyoon Kang, Shin Jae Kang, Chang Woo Han, Doo Hwa Hong, "Speech Feature Mapping Based on Switching Linear Dynamic System", IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 2, Feb 2012, pp. 620-631.
[9] Dexin Zhou, Jiacang Kang, Zhicheng Fan, Wenlin Zhang, "The Application of Improved Apriori Algorithm in Continuous Speech Recognition", 2011 Second International Conference on Mechanic Automation and Control Engineering (MACE), 15-17 July 2011, ISBN 978-1-4244-9436-1, pp. 756-758.
[10] Baifen Liu, "Research and Implementation of the Speech Recognition Technology Based on DSP", 2011 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC), 8-10 Aug 2011, ISBN 978-1-4577-0535-9, pp. 4188-4191.
[11] Shing-Tai Pan, Sheng-Fu Liang, Tzung-Pei Hong, Jian-Hong Zeng, "Apply Fuzzy Vector Quantization to Improve the Observation-Based Discrete Hidden Markov Model: An Example on Electroencephalogram (EEG) Signals Recognition", 2011 IEEE International Conference on Fuzzy Systems (FUZZ), 27-30 June 2011, ISBN 978-3-642-13497-5, pp. 1674-1680.
[12] Yong Lu, Haining Huang, "Research on a Kind of Noisy Tibetan Speech Recognition Algorithm Based on WNN", 2011 Seventh International Conference on Natural Computation (ICNC), vol. 2, July 2011, pp. 605-608.
[13] Jian Wang, Zhiyan Han, Shuxian Lun, "Speech Emotion Recognition System Based on Genetic Algorithm and Neural Network", 2011 International Conference on Image Analysis and Signal Processing (IASP), 21-23 Oct 2011, ISBN 978-1-61284-879-2, pp. 578-582.
[14] Nemanja Majstorovic, Milenko Andric, Davorin Mikluc, "Entropy-Based Algorithm for Speech Recognition in Noisy Environment", 2011 Telecommunications Forum (TELFOR), Nov 2011, ISBN 978-1-4577-1499-3, pp. 667-670.
[15] Shing-Tai Pan, Ching-Fa Chen, Wei-Der Chang, Yi-Heng Tsai, "Performance Comparison between Improved DHMM and Gaussian Mixture HMM for Speech Recognition", 2011 4th International Congress on Image and Signal Processing (CISP), vol. 5, Oct 2011, pp. 2426-2430.
[16] Qinglin Qu, Liangguang Li, "Realization of Embedded Speech Recognition Module Based on STM32", 11th International Symposium on Communications and Information Technologies (ISCIT), Oct 2011, ISBN 978-1-4577-1294-4, pp. 73-77.
[17] M. Kudinov, "Comparison of Some Algorithms for Endpoint Detection for Speech Recognition Device Used in Cars", 2011 International Siberian Conference on Control and Communications (SIBCON), Sept 2011, ISBN 978-1-4577-1069-8, pp. 230-233.
[18] C. Y. Fook, M. Hariharan, Sazali Yaacob, A. H. Adom, "A Review: Malay Speech Recognition and Audio Visual Speech Recognition", 2012 International Conference on Biomedical Engineering (ICoBE), Feb 2012, ISBN 978-1-4577-1990-5, pp. 479-484.
[19] Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein, "Articulatory Information for Noise Robust Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, Sept 2011, pp. 1913-1924.
[20] Gabriele Fanelli, Juergen Gall, Harald Romsdorfer, "A 3-D Audio-Visual Corpus of Affective Communication", IEEE Transactions on Multimedia, vol. 12, no. 6, Oct 2010, pp. 591-598.
[21] Panikos Heracleous, Viet-Anh Tran, Takayuki Nagai, Kiyohiro Shikano, "Analysis and Recognition of NAM Speech Using HMM Distances and Visual Information", IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, Aug 2010, pp. 1528-1538.
[22] Mihai Gurban, Jean-Philippe Thiran, "Information Theoretic Feature Extraction for Audio-Visual Speech Recognition", IEEE Transactions on Signal Processing, vol. 57, Dec 2009, pp. 4765-4776.
[23] Jong-Seok Lee, Cheol Hoon Park, "Robust Audio-Visual Speech Recognition Based on Late Integration", IEEE Transactions on Multimedia, vol. 10, Aug 2008, pp. 767-779.
[24] Bengt Jonas Borgström, Abeer Alwan, "A Low-Complexity Parabolic Lip Contour Model with Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual Speech Recognition", IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 38, Nov 2008, pp. 1273-1280.
[25] Valentin Ion, Reinhold Haeb-Umbach, "A Novel Uncertainty Decoding Rule With Applications to Transmission Error Robust Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 5, July 2008, pp. 1047-1060.
[26] Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang, Brian Pianfetti, Dan Roth, Stephen Levinson, "Audio-Visual Affect Recognition", IEEE Transactions on Multimedia, vol. 9, no. 2, July 2005, pp. 424-428.
[27] Carlos Busso, Shrikanth S. Narayanan, "Interrelation between Speech and Facial Gestures in Emotional Utterances: A Single Subject Study", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, Nov 2007, pp. 2331-2347.
[28] Belgacem Ben Mosbah, "Speech Recognition for Disabilities People", IEEE Information and Communication Technologies, vol. 1.
[29] http://www.mathworks.in/help/techdoc/matlab_prog/f2-43934.html
[30] Jane J. Stephan, Rasha H. Ali, "Speech Recognition using Genetic Algorithm", International Journal of Computer Engineering and Technology, 5(5), 2015, pp. 76-81.