1. International Journal of Electrical & Electronics Engineering 16 www.ijeee-apm.com
IJEEE, Vol. 1, Spl. Issue 1 (March 2014) e-ISSN: 1694-2310 | p-ISSN: 1694-2426
Automatic Control of Instruments Using Efficient Speech Recognition Algorithm
Abhishek Thakur1, Rajesh Kumar2, Amandeep Batth3, Jitender Sharma4
1,2,3,4 Electronics & Communication Engineering Department, Indo Global College of Engineering, Punjab, India
1 abhithakur25@gmail.com, 2 errajeshkumar2002@gmail.com, 3 amandeep_batth@rediffmail.com, 4 er_jitender2007@yahoo.co.in
Abstract- Matlab's straightforward programming interface makes it an ideal tool for Hindi key word recognition. For feature extraction, a Hindi key word database has been designed using Matlab 7.5. The database consists of eight key words. Each key word has been stored in the database by ten speakers (eight male and two female), giving a total of 80 samples for the eight commands. Features of the speech signal are extracted in the form of MFCC coefficients, and Dynamic Time Warping (DTW) is used as the feature matching technique. This thesis presents a technique that detects an utterance using end point detection, extracts features using MFCC and compares test patterns using DTW. The recognition results are tested for clean and noisy test data. The system can be said to be robust, as the average accuracy is 97.50 % for clean data and 91.25 % for noisy data. Accuracy at this level or above is acceptable, since most people would not mind repeating a command to the system one out of ten times or less. The system can be implemented using one of the common microcontrollers with a small amount of dedicated memory and an analog to digital converter to accept the input speech. Such a system would be fast, small and cost efficient enough to be incorporated into a wide variety of consumer electronics. The aim of this thesis is therefore to develop a speaker dependent, isolated word, limited vocabulary speech recognition system that is small enough to fit in a small household appliance and that can be operated in real time.
Index Terms- Automatic Speech Recognition (ASR), Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW)
I. INTRODUCTION
Although many systems exist for speech recognition, none of them addresses the needs of consumer level applications. For a system to be incorporated into the everyday life of a consumer, it must be speaker independent, fast, low cost, require no training and be small enough to fit inside a consumer appliance. Such a system will move speech recognition from the domain of academic or industrial applications to that of the common home user. It can be implemented using current technology once a certain number of compromises are made. For example, suppose a speech recognition system is to be incorporated into a home microwave oven. One can immediately see that there is no need for a 60,000 word vocabulary in such a system; a dozen words, including the digits, are sufficient for its operation. The system can be further simplified if the user is not allowed to change the number of words in the vocabulary. The second aspect of the system is that it does not have to accept continuous speech. For example, a common command may be "Move.... Forward.... Fast.... Start".
The proposed home automation system and Matlab based Hindi key word speech recognition system is intended for disabled persons who are not able to move from one place to another and can't locate switches. This paper attempts to provide them with a solution: while sitting in a wheelchair or on a bed, they can switch home appliances on and off and also control parameters such as wheelchair direction, fan speed and heater temperature. Physically challenged persons find it difficult to power their home loads, such as a fan, light or AC, on and off, and they require an attendant to do these things. In the absence of the attendant, their world becomes more difficult. This design helps physically disabled and elderly persons to navigate easily within their home in a wheelchair by giving voice commands. The systems in [3-5] were designed for navigation of a robot and a forklift by voice commands. Some voice based designs use a voice recognition chip with an integrated or interfaced memory chip, which has the drawback of supporting only a limited number of voice commands. The reported design, Speech Recognition Based Wireless Automation of Home Appliances for Disabled Persons, automates home loads through voice commands in a wireless environment.
II. SYSTEM OVERVIEW
This paper is concerned with controlling electronic/electrical equipment using voice key words. The system recognizes the Hindi key word spoken by a person and controls the desired parameter. The goal of the thesis is to help disabled and handicapped persons who are not able to locate switches or to reach them. The system can also serve a security purpose, since the machines are operated by voice and respond to only one person. It can also be used for home automation, replacing switches and remotes with voice commands. This is done using software designed in Matlab 7.5 using MFCC and DTW. With this software, Hindi key words spoken in real time are matched against pre-recorded samples and an ASCII code is generated. These ASCII codes are sent to the microcontroller over an RS-232 serial link. All peripherals are controlled by the microcontroller. On receiving input from the software, the microcontroller output controls the various applications. Relays connected to the microcontroller ports are switched to activate the appliance connected to a particular port.
Fig. 1: Microcontroller Interfacing.
The port connections between the automatic speech recognition and home automation system and the external peripherals are shown in Table 1. Each peripheral is connected to the corresponding port pin of the microcontroller (89C52) as given in Table 1. These peripherals operate according to the program discussed in the software design section. When the user speaks a command word into the microphone, it is recognized by the proposed algorithm and an ASCII code is generated. This ASCII code is passed to the 89C52 microcontroller; if the received code matches, the appliance performs the operation associated with that key word.
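The key word lookup described above and listed in Table 1 can be sketched in Matlab as follows. This is a minimal illustration, not the authors' program: the key words, ports and device functions are copied from Table 1, while the single-character ASCII codes 'A' to 'H' are hypothetical, since the paper does not state which codes are sent to the 89C52.

```matlab
% Key words, ports and controlled functions as listed in Table 1.
keywords = {'BAND', 'TIESH', 'PACHAS', 'AAGE', 'PICHE', 'RUKO', 'DHEERE', 'TEJ'};
ports    = {'P1.0, P1.1, P1.2', 'P1.4', 'P1.5', 'P1.6', 'P1.7', ...
            'P1.6, P1.7', 'P2.2', 'P2.3'};
actions  = {'ADC', 'Temperature 30 deg.', 'Temperature 50 deg.', 'Go', ...
            'Reverse', 'Break', 'Fan low set', 'Fan medium set'};

% Hypothetical ASCII command codes 'A'..'H' (an assumption, not from the paper).
codes = char('A' + (0:numel(keywords)-1));

recognized = 'AAGE';                         % example output of the recognizer
idx = find(strcmp(keywords, recognized));    % row of Table 1 for this key word
asciiCode = codes(idx);                      % character that would be sent over RS-232

fprintf('%s -> code %c (%d), ports %s, action: %s\n', ...
        recognized, asciiCode, double(asciiCode), ports{idx}, actions{idx});
```

On the microcontroller side, the same table would be indexed by the received character to decide which port pins to switch.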
TABLE 1: MICROCONTROLLER PORT CONNECTION
S.N.   Ports of 89C52 µC    Hardware Devices Control   Hindi Key Word
1      P1.0, P1.1, P1.2     ADC                        BAND
2      P1.4                 Temperature 30 deg.        TIESH
3      P1.5                 Temperature 50 deg.        PACHAS
4      P1.6                 Go                         AAGE
5      P1.7                 Reverse                    PICHE
6      P1.6, P1.7           Break                      RUKO
7      P2.2                 Fan low set                DHEERE
8      P2.3                 Fan medium set             TEJ
As we can see in Table 1, if the AAGE key word is recognized then Port 1.7 goes to logic one and Port 1.6 goes to logic zero, which means that the robot moves in the forward direction. The ports that change logic state for each key word are listed in Table 1.
III. HARDWARE DESIGN
a) Voice processor:
The next stage is the voice processor stage, which consists of the .m voice processor file. After comparison in the voice processor, data is sent to the microcontroller for the control or driving action, using RS-232 as the communication protocol. The whole process works as follows: if we say the key word “AAGE”, the action related to “AAGE” is performed, and if we say the key word “PICHE”, the action related to “PICHE” is performed. As shown in Fig. 2, when we say any key word the microphone converts the acoustic signal into an electrical signal, which is then attenuated by the attenuator. The attenuated signal is passed to the voice processor, where the .m files are executed and an ASCII code is transferred to the microcontroller through the RS-232 standard communication protocol. In this manner the voice controls the action of the machine or electric appliance.
Fig. 2: Speaker recognition process.
b) Temperature sensor circuit:
The circuit accepts a wide range of supply voltages: a single supply of 3 V to 30 V (3 V to 26 V for the LM2902 and LM2902Q) or dual supplies. The common mode input voltage range includes ground, which allows direct sensing near ground. The low supply current drain (0.8 mA typ.) is independent of the supply voltage. The low input bias and offset parameters include an input offset voltage of 3 mV typ., an input offset current of 2 nA typ. and an input bias current of 20 nA typ. The differential input voltage range is equal to the maximum rated supply voltage of 32 V, and the open loop differential voltage amplification is 100 V/mV typ.
Fig. 3: LM 35 Interface.
c) Analog to digital converter:
The device is a monolithic integrated, high voltage, high current four channel driver designed to accept standard DTL or TTL logic levels and to drive inductive loads (such as relays, solenoids, DC and stepping motors) and switching power transistors. To simplify use as two bridges, each pair of channels is equipped with an enable input. A separate supply input is provided for the logic, allowing operation at a lower voltage, and internal clamp diodes are included. This device is suitable for use in switching applications at frequencies up to 5 kHz. The L293D is assembled in a 16 lead plastic package which has 4 center pins connected together and used for heat sinking. The L293DD is assembled in a 20 lead surface mount package which has 8 center pins connected together and used for heat sinking. Its features are 600 mA output current capability per channel, 1.2 A peak output current per channel, an enable facility, over temperature protection, a logical “0” input voltage of up to 1.5 V and internal clamp diodes.
Fig. 4: Functional block diagram of A to D converter.
d) Building a wireless remote control:
The question now arises: how can you get rid of the long wired tail dangling out of your remote control robot? Transforming a wired remote control into a wireless one isn't as difficult as you may think. The easiest solution would be to hack a cheap wireless toy car, take its electronic guts out and use them in your robot. But if you want more flexibility, you can build a custom remote control system. The idea is to use off the shelf RF Tx/Rx modules. These modules, once a rare commodity, are now widely and cheaply available. In this discussion, we use an ASK (Amplitude Shift Keying) based Tx/Rx pair operating at 433 MHz.
Fig. 5: ASK Transmitter and Receiver.
The transmitter module accepts serial data at a maximum of XX baud rate. The modules can be interfaced directly to a microcontroller or used in remote control applications with the help of encoder/decoder ICs. The encoder IC takes in parallel data at the Tx side, packages it into serial format and transmits it with the help of an RF transmitter module. At the Rx end, the decoder IC receives the signal via the RF receiver module, decodes the serial data and reproduces the original data in parallel format. To control one motor we require 2 bits of information, while we need 4 bits of information to control 2 motors. The HT12E and HT12D are 4 channel encoder/decoder ICs directly compatible with the specified RF module.
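The bit budget mentioned above (2 bits per motor, 4 bits for two motors) can be illustrated with a short Matlab sketch. The 2-bit direction codes and the assignment of the two motors to the upper and lower bit pairs are assumptions made for illustration; the paper does not specify them.

```matlab
% Hypothetical 2-bit direction codes (an assumption, not taken from the paper).
STOP = 0;   % binary 00
FWD  = 1;   % binary 01
REV  = 2;   % binary 10

leftMotor  = FWD;
rightMotor = REV;

% Pack both motors into the 4 parallel data bits offered to the encoder:
% upper bit pair = left motor, lower bit pair = right motor.
nibble = bitor(bitshift(leftMotor, 2), rightMotor);

% At the decoder output the two fields are separated again.
leftRx  = bitshift(nibble, -2);   % upper two bits
rightRx = bitand(nibble, 3);      % lower two bits

fprintf('4-bit word: %s, left = %d, right = %d\n', dec2bin(nibble, 4), leftRx, rightRx);
```

With two bits per motor, each motor can be stopped or driven in either direction, which is why two motors need the full four data bits.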
e) Wheel chair control:
The receiver receives the data in serial form, decodes it and converts it back into parallel form before passing it to the receiver side CPU. At the receiver side, the HT12D IC is used as the decoder: the codes arrive at the decoder in serial form and are converted into parallel form. These decoded signals are then given as input to the CPU. At the receiver side, the IC MN4519 is used as the buffer.
Fig. 6: Robot Control (DC motor control card schematic with LM339 comparators, 7400 NAND and 7404 NOT gates, and TIP-122/TIP-127 transistors driving the DC motor; DIR 1, DIR 2 and PULSE control inputs).
This buffer is a FIFO (First In First Out) buffer. To drive the motors, a suitable motor driver must be connected at the output of the decoder IC. The motor driver circuit can consist of relays, a transistorized H-bridge or a motor driver IC such as the L293D or L298.
IV. SOFTWARE DESIGN
The keyword recognition algorithm is designed according to the block diagram shown in Fig. 7.
Fig. 7: Block diagram of Mel Frequency Cepstral Coefficient
The speech recognition algorithm is written in Matlab 7.0 and the results are tested on clean and noisy test data. The main program is explained step by step below, together with the results.
Step1. Declare variables:
clear all; % clear all variables
close all; % close all figure windows
clc; % clear the command window
ncoeff = 13; % required number of MFCC coefficients
N = 8; % number of words in the vocabulary
k = 4; % number of nearest neighbors to choose
fs = 16000; % sampling rate in Hz
duration1 = 0.15; % initial silence duration in seconds
duration2 = 2; % recording duration in seconds
G = 2; % gain factor; vary it to compensate for amplitude variations
NSpeakers = 5; % number of training speakers
Step2. Input Keyword and perform EPD:
Fig. 8: End Point Detection for Hindi Key word “AAGE”.
for i=1:8 % check 8 key words in real time
fprintf('Press any key to start %g seconds of speech recording...', duration2);
pause; % wait for a key press
silence = wavrecord(duration1*fs, fs); % record 0.15 s of initial silence
fprintf('Recording speech...');
speechIn = wavrecord(duration2*fs, fs); % duration2*fs is the total number of sample points
Fig. 9: After End Point Detection for Hindi Key word
“AAGE”.
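End point detection itself is not listed in the program. A common way to do it, shown here as a minimal sketch under stated assumptions rather than as the authors' method, is to compare the short-time frame energy with a threshold estimated from the initial silence. The 20 ms frame length, the threshold factor of 10 and the synthetic test signal are all assumptions.

```matlab
fs = 16000;                                  % sampling rate, as in Step 1
n  = (0:fs-1)';                              % one second of samples
x  = 0.001*sin(2*pi*50*n/fs);                % weak background hum (synthetic)
x(4801:11200) = x(4801:11200) + 0.5*sin(2*pi*200*n(4801:11200)/fs);  % synthetic "word"

frameLen = round(0.02*fs);                   % 20 ms frames (assumed)
nFrames  = floor(length(x)/frameLen);
frames   = reshape(x(1:nFrames*frameLen), frameLen, nFrames);
energy   = sum(frames.^2, 1);                % short-time energy of each frame

nSil = floor(0.15*fs/frameLen);              % whole frames inside the 0.15 s initial silence
thr  = 10*max(energy(1:nSil));               % threshold factor 10 is an assumption

active      = find(energy > thr);            % frames classified as speech
startSample = (active(1)-1)*frameLen + 1;    % first sample of the first active frame
endSample   = active(end)*frameLen;          % last sample of the last active frame
word        = x(startSample:endSample);      % utterance with leading/trailing silence removed

fprintf('Utterance from sample %d to %d\n', startSample, endSample);
```

Only the trimmed segment would then be passed on to feature extraction, which keeps silence frames out of the DTW comparison.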
Step3. Addition of silence:
p = length(speechIn) - length(silence);
silence = [silence; zeros(p,1)]; % zero-pad the silence segment to the length of speechIn
fprintf('Finished recording.\n');
fprintf('System is trying to recognize what you have spoken...\n');
speechIn1 = [silence; speechIn]; % prepend the padded silence segment to the recording
speechIn2 = speechIn1.*G; % apply the gain factor
Fig. 10: Addition of silence 0.15 seconds in Hindi key word
“AAGE”.
Step4. Noise Reduction:
speechIn3 = speechIn2 - mean(speechIn2); % DC offset elimination
speechIn = nreduce(speechIn3,fs); % applies spectral subtraction
Fig. 11: After noise reduction for Hindi key word “AAGE”.
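The function nreduce is not listed in the paper. The sketch below shows one basic form of magnitude spectral subtraction that such a function could implement; it is an illustration under assumptions, not the authors' code. The noise magnitude spectrum is estimated from the initial silence and subtracted from every frame. The non-overlapping 20 ms frames, the flooring at zero and the synthetic signal are assumptions.

```matlab
fs = 16000;
n  = (0:fs-1)';
x  = 0.01*sin(2*pi*1000*n/fs);               % synthetic stationary noise
x(4801:11200) = x(4801:11200) + 0.5*sin(2*pi*200*n(4801:11200)/fs);  % synthetic "speech"
x  = x - mean(x);                            % DC offset elimination, as in Step 4

frameLen = 320;                              % 20 ms frames (assumed)
nFrames  = floor(length(x)/frameLen);
X   = fft(reshape(x(1:nFrames*frameLen), frameLen, nFrames));  % one FFT per column
mag = abs(X);                                % magnitude spectrum of each frame
ph  = angle(X);                              % phase spectrum of each frame

nSil     = 7;                                % whole frames in the 0.15 s initial silence
noiseMag = mean(mag(:, 1:nSil), 2);          % noise magnitude estimate per frequency bin

cleanMag = max(mag - repmat(noiseMag, 1, nFrames), 0);  % subtract and floor at zero
y = real(ifft(cleanMag .* exp(1i*ph)));      % resynthesize using the noisy phase
y = y(:);                                    % back to a single column vector

fprintf('Noise-only RMS before: %.4f, after: %.4f\n', ...
        sqrt(mean(x(1:2240).^2)), sqrt(mean(y(1:2240).^2)));
```

Practical implementations usually add overlapping windowed frames and a spectral floor above zero to reduce musical noise; those refinements are omitted here for brevity.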
Step5. Windowing, DFT and Mel filter bank:
rMatrix1 = mfccf(ncoeff,speechIn,fs); % compute test feature vector
Fig. 12: Shows the time signal of the Hindi key word AAGE
and Mel filter bank of the word computed via FFT.
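The function mfccf is not listed in the paper. The chain named in this step and in Fig. 7 (windowing, DFT, Mel filter bank, logarithm and DCT) can be sketched for a single frame as follows. The 25 ms frame, the 512 point FFT, the 20 triangular filters and the synthetic input frame are assumptions; only the 13 output coefficients come from Step 1.

```matlab
fs = 16000; ncoeff = 13;                     % values from Step 1
N = 400; nfft = 512; nFilt = 20;             % 25 ms frame, FFT size, filter count (assumed)

n     = (0:N-1)';
frame = sin(2*pi*440*n/fs);                  % synthetic test frame
w     = 0.54 - 0.46*cos(2*pi*n/(N-1));       % Hamming window
P     = abs(fft(frame.*w, nfft)).^2;         % power spectrum
P     = P(1:nfft/2+1);                       % keep non-negative frequencies

hz2mel = @(f) 2595*log10(1 + f/700);         % Hz -> mel
mel2hz = @(m) 700*(10.^(m/2595) - 1);        % mel -> Hz
edges  = mel2hz(linspace(0, hz2mel(fs/2), nFilt+2));  % filter edges, equally spaced in mel
bins   = floor((nfft+1)*edges/fs) + 1;       % corresponding 1-based FFT bins

H = zeros(nFilt, nfft/2+1);                  % triangular Mel filter bank
for m = 1:nFilt
    lo = bins(m); mid = bins(m+1); hi = bins(m+2);
    for b = lo:mid
        H(m,b) = (b - lo)/max(mid - lo, 1);  % rising edge
    end
    for b = mid:hi
        H(m,b) = (hi - b)/max(hi - mid, 1);  % falling edge (peak of 1 at mid)
    end
end

logE = log(H*P + eps);                       % log filter bank energies
q = (0:ncoeff-1)';
j = 1:nFilt;
D = cos(pi*q*(j - 0.5)/nFilt);               % DCT-II basis, ncoeff x nFilt
c = D*logE;                                  % MFCC vector for this frame

fprintf('Computed %d MFCCs for one frame\n', numel(c));
```

Repeating this for every frame of the utterance and stacking the vectors column by column gives a feature matrix of the kind that the matching step compares.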
Step6. Inverse DFT:
rMatrix = CMN(rMatrix1); %Removes convolutional noise
Sco = DTWScores(rMatrix,N); %computes all DTW scores
[SortedScores,EIndex] = sort(Sco); %Sort scores increasing
K_Vector = EIndex(1:k); %Gets k lowest scores
Neighbors = zeros(1,k); %will hold k-N neighbors
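The function DTWScores is not listed in the paper. The sketch below shows the standard dynamic time warping recursion with a Euclidean local distance, which is the kind of computation such a function would perform for each stored template. The toy one-dimensional sequences stand in for MFCC matrices and are assumptions made for illustration.

```matlab
% Toy feature sequences: each column is one frame
% (in the real system these would be ncoeff x frames MFCC matrices).
testSeq   = [1 2 3 4 3 2 1];
templates = {[1 1 2 3 4 4 3 2 1], ...        % time-stretched copy of testSeq
             [4 3 2 1 1 2 3 4], ...
             [2 2 2 2 2]};

scores = zeros(1, numel(templates));
for t = 1:numel(templates)
    ref = templates{t};
    nT  = size(testSeq, 2);
    nR  = size(ref, 2);
    D   = inf(nT+1, nR+1);                   % accumulated cost matrix
    D(1,1) = 0;
    for i = 1:nT
        for j = 1:nR
            cost = norm(testSeq(:,i) - ref(:,j));   % local Euclidean distance
            D(i+1,j+1) = cost + min([D(i,j), D(i,j+1), D(i+1,j)]);
        end
    end
    scores(t) = D(nT+1, nR+1);               % DTW distance to template t
end

[bestScore, best] = min(scores);             % lowest score = closest template
fprintf('Best match: template %d (DTW score %g)\n', best, bestScore);
```

The first template is the test sequence spoken "more slowly", so warping aligns it at zero cost; this tolerance to speaking rate is the reason DTW is used for template matching.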
Fig. 13: DCT and Spectrogram for “AAGE” Key Word.
% The code below uses the indices of the k lowest scores to determine their classes
for t = 1:k
    u = K_Vector(t);
    for r = 1:NSpeakers-1
        if u <= N
            break
        else
            u = u - N;
        end
    end
    Neighbors(t) = u;
end
Fig. 14: Result for keyword recognition of the “AAGE” Key Word.
% Apply the k-Nearest Neighbor rule
Nbr = Neighbors;
[Modal,Freq] = mode(Nbr); % most frequent value and its count
Word = strvcat('Forward-AAGE', 'Reverse-PICHE', 'Break-RUKO', 'Thirty-TEESH', 'Fifty-PACHAS', 'low-DHERE', 'Medium-TEJ', 'Stop-BAND');
if mean(abs(speechIn)) < 0.01
    fprintf('No microphone connected or you have not said anything.\n');
elseif ((k/(Freq)) > 2) % if no majority
    fprintf('The word you have said could not be properly recognised.\n');
else
    fprintf('You have just said %s.\n',Word(Modal,:)); % prints the recognized word
end
end % closes the for loop over the 8 key words opened in Step 2
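The class mapping and majority rule can be checked on a small example. The sketch assumes, as the subtraction loop in the program suggests, that the templates are stored speaker by speaker with N key words per speaker, so that the class of template index u is mod(u-1, N) + 1. The four template indices are made-up values for illustration.

```matlab
N = 8;                                   % words in the vocabulary
k = 4;                                   % nearest neighbors considered
K_Vector = [12 4 20 7];                  % hypothetical indices of the k lowest DTW scores

% Map each template index to its key word class (1..N).
Neighbors = mod(K_Vector - 1, N) + 1;

% Majority vote: most frequent class and how often it occurs.
[Modal, Freq] = mode(Neighbors);

if (k/Freq) > 2                          % same rejection rule as in the program
    fprintf('No majority among the %d neighbors.\n', k);
else
    fprintf('Recognized class %d with %d of %d votes.\n', Modal, Freq, k);
end
```

With k = 4 the rejection rule k/Freq > 2 triggers only when no class collects at least two of the four votes.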
V. RESULT DISCUSSION
We carried out two experiments, in noisy and in clean environments: one using the traditional method (Md. Rashidul Hasan et al. 2004) and the other using the developed technique. The templates were used as input to the same recognition system using DTW in order to measure the performance of each method. The first experiment uses the traditional method (Md. Rashidul Hasan et al. 2004). The dictionary contains Hindi key words and digits. For each Hindi key word and digit, a number of templates were selected from several training candidates (4-10), while the second experiment uses 8 templates. A newly generated template was used for each key word and digit. Both experiments were speaker dependent. The test was made using 8 test records for each key word and digit. The accuracy of Hindi key word recognition is calculated by speaking one command 10 times, at different rates of speech, and counting how many times the key word is recognized. The chart in Fig. 15 shows approximately 91.25 % accuracy with end point detection when user 1 says the key words in a 10 × 12 room with a noisy environment (fan on, TV on and cooking in the kitchen); without end point detection the average accuracy is 80.00 %.
Fig. 15: Results for the noisy environment with and without EPD.
Fig. 16: Results for the clean environment with and without EPD.
The chart in Fig. 16 shows approximately 97.50 % accuracy with end point detection when user 1 says the key words in a 10 × 12 room with a clean environment (fan off, TV off, no cooking in the kitchen); without end point detection the average accuracy is 87.50 %. After the MFCC features are calculated, DTW finds the nearest distance between the spoken word and the recorded samples of 10 speakers. If the nearest distances match five or more recorded samples of a key word, the system shows the output and performs the operation related to that key word; if fewer than five samples match, it plays the recording “word not recognized, please try again”.
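As a check on the reported figures, the percentages can be converted back to counts. The sketch assumes that each of the 8 key words was spoken 10 times, giving 80 trials per condition; the paper does not report the counts themselves, so the values computed here are inferred, not measured.

```matlab
nTrials  = 8*10;                             % 8 key words x 10 utterances (assumed)
labels   = {'clean, with EPD', 'clean, without EPD', ...
            'noisy, with EPD', 'noisy, without EPD'};
reported = [97.50 87.50 91.25 80.00];        % accuracies reported in the text (%)

correct = round(reported/100*nTrials);       % implied number of correct recognitions
for c = 1:numel(labels)
    fprintf('%-20s %6.2f %%  ~ %d of %d correct\n', ...
            labels{c}, reported(c), correct(c), nTrials);
end
```

Under this assumption every reported percentage corresponds to a whole number of correct recognitions out of 80, which is consistent with the test procedure described above.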
VI. CONCLUSION AND FUTURE WORK
This paper presents a simple technique for word detection using end point detection, feature extraction using Mel frequency cepstral coefficients and feature matching using dynamic time warping. The implemented algorithm and control system control fan speed, heater temperature and robot direction using voice key words. The system demonstrates reliability and ease of future development. The experimental results demonstrate that the proposed algorithm is functional and can be used in voice key word recognition for home automation systems and industrial robots. The percentage of correctly recognized key words is high. The recognition results are tested for clean and noisy test data. The system can be said to be robust, as the average accuracy is 97.50 % for clean data and 91.25 % for noisy data.
The main contribution of this study is that it presents the idea of Hindi key word recognition combined with a home automation system. The experiments also show that the approach works well for Hindi key word recognition. The proposed ASR and control system has been completely implemented; future effort will be directed toward developing a more appropriate and convenient method.
REFERENCES
[1] A. Rathinavelu, G.Anupriya, A.S.Muthanantha murugavel,
“Speech Recognition Model for Tamil Stops”, Proceedings of
the World Congress on Engineering, ISBN:978-988-98671-5-7,
Vol I, pp. 543 – 547, July 2 - 4, 2007.
[2] Adriana. Tapus and Brian Scassellati, “The grand challenges in
helping humans through social robotics”, IEEE Robotics &
Automation Magazine, Vol 14, Issue 1, pp. 35–42, 2007.
[3] Anjli Bala, Abhijeet Kumar and Nidhika Birla, “Voice
Command Recognition System Based on MFCC and DTW”,
International Journal of Engineering Science and Technology,
ISSN: 0975-5462, Vol. 2, No 12, pp. 7335-7342, Dec. 2010.
[4] Atanas Ouzounov, “Cepstral Features and Text-Dependent Speaker Identification - A Comparative Study”, Cybernetics and Information Technologies, Vol. 10, No. 1, pp. 1-12, 2010.
[5] B. H. Juang and Lawrence R. Rabiner, “Automatic Speech
Recognition – A Brief History of the Technology”, Vol. 10, No.
3, August 2004
[6] Bengt J. Borgstrom, “HMM-Based Reconstruction of Unreliable Spectrographic Data for Noise Robust Speech Recognition”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 6, pp. 1612-1623, August 2010.
[7] Bharti W. Gawali, Santosh Gaikwad, Pravin Yannawar, Suresh
C.Mehrotra, “Marathi Isolated Word Recognition System using
MFCC and DTW features”, ACEEE Int. J. on Information
Technology, Vol. 01, No. 01, Mar 2011.
[8] Cini Kurian and Kannan Balakrishnan, “Automated Transcription System for Malayalam Language”, International Journal of Computer Applications, Vol. 19, No. 5, April 2011.
[9] F. K. Soong, A. E. Rosenberg, L. R. Rabiner and B. H. Juang,
“A Vector Quantization Approach to Speaker Recognition”,
Acoustics, Speech, and Signal Processing, IEEE International
Conference on ICASSP '85, vol 10, No 3, pp. 387-390, 1985.
[10] Fausto “Tito” Poza and Durand R. Begault, “Voice Identification and Elimination Using Aural Spectrographic Protocol”, AES 26th International Conference, Denver, Colorado, USA, 7–9 July 2005.
[11] Josef Rajnoha et al. (2011) “ASR systems in Noisy
Environment: Analysis and Solutions for Increasing Noise
Robustness”, Radioengineering, Vol. 20, No. 1, April 2011.
[12] K. H. Davis, R. Biddulph and S. Balashek, “Automatic Recognition of Spoken Digits”, The Journal of the Acoustical Society of America, Vol. 24, No. 6, November 1952.
[13] K. M. Ravikumar, R. Rajagopal and H. C. Nagaraj, “An
Approach for Objective Assessment of Stuttered Speech Using
MFCC Features”, DSP Journal, Volume 9, Issue 1, June, 2009.
[14] Khalid Saeed, “Sound and Voice Verification and Identification
A Brief Review of Töeplitz Approach”, Znalosti 2008, pp. 22-
27, 2008.
[15] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, “Voice
Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW)
Techniques”, Journal of Computing, ISSN 2151-9617, Volume
2, Issue 3, March 2010
[16] M. A. Anusuya and S. K. Katti, “Speech Recognition by
Machine: A Review”, (IJCSIS) International Journal of
Computer Science and Information Security, Vol. 6, No. 3,
2009.
[17] Maayan Geffet, Yair Wiseman and Dror Feitelson, “Automatic
Alphabet Recognition”, Springer Science, Vol. 8, pp. 25–40,
2005.
[18] Mark D. Skowronski and John G. Harris, “Improving the Filter Bank of a Classic Speech Feature Extraction Algorithm”, IEEE Intl Symposium on Circuits and Systems, Bangkok, Thailand, vol 4, pp. 281-284, May 25 - 28, 2003.
AUTHORS
First Author– Abhishek Thakur: M.
Tech. in Electronics and
Communication Engineering from
Punjab Technical University, MBA
in Information Technology from
Symbiosis Pune, M.H. Bachelor in
Engineering (B.E.- Electronics) from
Shivaji University Kolhapur, M.H.
Five years of work experience in
teaching and one year of work experience in industry. Area of
interest: Digital Image and Speech Processing, Antenna
Design and Wireless Communication. International
Publication: 7, National Conferences and Publication: 6, Book
Published: 4 (Microprocessor and Assembly Language
Programming, Microprocessor and Microcontroller, Digital
Communication and Wireless Communication). Working
with Indo Global College of Engineering Abhipur, Mohali,
P.B. since 2011.
Email: abhithakur25@gmail.com
Second Author – Rajesh Kumar is
working as Associate Professor at
Indo Global College of Engineering,
Mohali, Punjab. He is pursuing Ph.D
from NIT, Hamirpur, H.P. and has
completed his M.Tech from GNE,
Ludhiana, India. He completed his
B.Tech from HCTM, Kaithal, India.
He has 11 years of academic
experience. He has authored many research papers in reputed
International Journals, International and National conferences.
His areas of interest are VLSI, Microelectronics and Image &
Speech Processing.
Third Author – Amandeep Batth:
M. Tech. in Electronics and
Communication Engineering from
Punjab Technical University, MBA
in Human Resource Management
from Punjab Technical University ,
Bachelor in Technology (B-Tech.)
from Punjab Technical University .
Six years of work experience in
teaching. Area of interest: Antenna Design and Wireless
Communication. International Publication: 1, National
Conferences and Publication: 4. Working with Indo Global
College of Engineering Abhipur, Mohali, P.B. since 2008.
Email: amandeep_batth@rediffmail.com
Fourth Author – Jitender Sharma: M. Tech. in Electronics and Communication Engineering from Mullana University, Ambala, Bachelor in Technology (B-Tech.) from Punjab Technical University. Five years of work experience in teaching. Area of interest: Antenna Design and Wireless Communication. International Publication: 1, National Conferences and Publication: 6. Working with Indo Global college since 2008.
E-mail: er_jitender2007@yahoo.co.in