- 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 76-81 © IAEME
SPEECH RECOGNITION USING GENETIC ALGORITHM
¹Asst.Prof. Dr. Jane J.Stephan, ²Asst.Lecture. Rasha H.Ali
¹Iraqi Commission for Computers and Informatics, Baghdad, Iraq
²Computer Department, Education College for Women, Baghdad University, Baghdad, Iraq
ABSTRACT
Speech recognition is an important field with many applications, such as banking and transactions
over the telephone network, database access services, voice email, investigations, and management.
In this paper, an approach for recognizing isolated Arabic words is presented. The Discrete Wavelet
Transform (DWT) with the Haar wavelet (at the third and fourth levels), together with the magnitude,
is used in the feature extraction stage, and a Genetic Algorithm (GA) is used in the classification
stage. The results showed a recognition rate of 90% at the third level and 87.5% at the fourth
level.
Keywords: Speech Recognition, Genetic Algorithm, Wavelet Transform, Magnitude.
1. INTRODUCTION
Speech signals are composed of a sequence of sounds. These sounds and the transitions
between them serve as a symbolic representation of information. The arrangement of these sounds
(symbols) is governed by the rules of language. The study of these rules and their implications in
human communication is the domain of linguistics. The study and classification of the sounds of
speech is called phonetics. Speech can be represented in terms of its message content, or
information. An alternative way of characterizing speech is in terms of the signal carrying the
message information, i.e., the acoustic waveform [1]. Speech is one of the most important tools for
communication between a human and his environment; therefore, building Automatic Speech
Recognition (ASR) systems has long been desirable [2].
Speech recognition systems are divided into several classes according to the types of
utterances they are able to recognize. These classes are based on the fact that one of the
difficulties of ASR is determining when a speaker starts and finishes an utterance [3]. The types of
speech are isolated words, connected words, and continuous speech. In this paper, isolated Arabic
words are treated for recognition.
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI)
www.jifactor.com
2. GENETIC ALGORITHM
Genetic algorithms are based on natural genetics and therefore share its terminology. A
genetic algorithm is a stochastic search technique (a stochastic search uses probability to help
guide the search) inspired by the mechanics of natural selection and natural genetics [4].
The basic idea behind genetic algorithms is to maintain a population of strings, or
chromosomes, each of which encodes a potential solution to the problem being investigated. Each
chromosome is evaluated with a fitness function to determine how good a solution it is. A new
population is created by selecting chromosomes from the old population. The new population is
re-evaluated, and the process continues until a solution is found [5]. The strings of an artificial
genetic system are analogous to chromosomes in a biological system. Chromosomes are composed of
features, or detectors, called genes, which may take on some number of values, called alleles.
Features may be located at different positions on the string. The position of a gene, its
locus, is identified separately from the gene's function. Thus, we can talk of a particular gene,
for example an animal's eye color gene, its locus, position 10, and its allele value, blue eyes.
The total package of strings (chromosomes) is called a structure. These structures are decoded to
form a particular parameter set [4].
In natural systems, one or more chromosomes combine to form the total genetic prescription
for the construction and operation of some organism. The total genetic package (structure) is called
the genotype. The organism formed by the interaction of the total genetic package with its
environment is called the phenotype [5].
3. THE DATABASE OF SPEECH
The system was applied to eleven Arabic words. The words were recorded with a microphone
by five independent speakers. Each speaker repeated each word 7 times: three repetitions are stored
as references and four are used for testing. The files are in wave format. The database therefore
contains 385 utterances in total, of which 165 are stored as references and 220 are used for testing
the proposed algorithm. The words have different lengths. These words are:
(تمارا، رفيف، ياسين، عمان، رمان، سلطان، بغداد، حسين، يوسف، شبكة، الخير)
The words were recorded at a sampling rate of 22 kHz, coded in 8 bits, on one channel. The
signal was digitized with the professional software "Sound Forge version 9.0" [6].
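The utterance counts above follow directly from the recording protocol. The short sketch below enumerates the utterances and splits them into reference and test sets. Which three of the seven repetitions serve as references is not stated in the paper, so taking the first three is an assumption, as are all names used here.

```python
from itertools import product

N_WORDS, N_SPEAKERS, N_REPEATS, N_REFERENCE = 11, 5, 7, 3


def split_utterances():
    """Return (reference, test) lists of (word, speaker, repetition) index triples."""
    reference, test = [], []
    for word, speaker, rep in product(range(N_WORDS), range(N_SPEAKERS), range(N_REPEATS)):
        # Assumption: the first three repetitions are the stored references.
        if rep < N_REFERENCE:
            reference.append((word, speaker, rep))
        else:
            test.append((word, speaker, rep))
    return reference, test


reference, test = split_utterances()
print(len(reference), len(test), len(reference) + len(test))  # 165 220 385
```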
4. ARCHITECTURE OF THE PROPOSED APPROACH
The proposed approach for speech recognition consists of three stages: the preprocessing
stage, the feature extraction stage, and the classification stage. Figure (1) shows the architecture of
the proposed approach.
Figure 1: The proposed approach for recognition
4.1 Preprocessing stage
This stage represents the signal processing part, i.e. converting the signal to its parametric
representation. The signal must first be converted from analog sound to digital sound [7].
The speech signal is blocked into frames of m samples. Because speech is a non-stationary
signal (it varies with time), framing is essential: processing is applied to the frames rather than
to the original signal. After this stage the speech signal consists of many frames, and the number
of frames depends on the number of samples in each word. Each frame contains 256 samples, and
consecutive frames overlap by 128 samples.
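The framing step can be sketched as follows, using the frame length (256) and overlap (128) given above. The function name is hypothetical, and dropping trailing samples that do not fill a whole frame is an assumption; the paper does not say how the end of a word is handled.

```python
def frame_signal(samples, frame_len=256, overlap=128):
    """Split a sequence of samples into overlapping frames of frame_len samples."""
    hop = frame_len - overlap  # distance between the starts of consecutive frames
    frames = []
    # Assumption: a trailing partial frame is discarded.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


frames = frame_signal(list(range(1024)))
print(len(frames))  # 7 frames, starting at samples 0, 128, ..., 768
```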
4.2 Feature extraction stage
A speech signal carries a very large amount of information, so the most important parts must
be extracted. A direct comparison of such signals is impractical because there is too much
information [7].
The discrete wavelet transform (DWT) with Haar function coefficients is therefore applied to
all frames of all words used in this work. The filter bank used for feature extraction has three or
four levels. With three levels, the DWT of an extracted frame (256 samples) yields four sub-bands,
called (d1, d2, d3, a3). The first level produces 128 samples from the (high pass filter + down
sampling), called (d1), and 128 samples from the (low pass filter + down sampling), called (a1). In
the second level, the same filters are applied to the output (a1), which produces 64 samples from
the (high pass filter + down sampling), called (d2), and 64 samples from the (low pass filter + down
sampling), called (a2). The third level [using the same filters on (a2)] produces 32 samples from
the (high pass filter + down sampling), called (d3), and 32 samples from the (low pass filter + down
sampling), called (a3).
Since most of the information is concentrated in the low-frequency components, only the 32
samples produced by the low pass filters (a3) are taken as the features of the input signal.
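The cascade described above can be illustrated with a minimal pure-Python Haar decomposition. This is a sketch rather than the authors' implementation: the orthonormal scaling by 1/sqrt(2) and the sign convention of the detail coefficients are assumptions, and the function names are hypothetical.

```python
import math


def haar_step(signal):
    """One Haar analysis level: (approximation, detail), each half the input length."""
    s = 1.0 / math.sqrt(2.0)  # assumed orthonormal scaling
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]  # low pass + down sampling
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]  # high pass + down sampling
    return approx, detail


def haar_dwt(frame, levels=3):
    """Return (a_levels, [d1, d2, ..., d_levels]) for one frame."""
    approx, details = list(frame), []
    for _ in range(levels):
        approx, detail = haar_step(approx)  # the same filters are re-applied to the approximation
        details.append(detail)
    return approx, details


a3, (d1, d2, d3) = haar_dwt([float(i % 16) for i in range(256)], levels=3)
print(len(d1), len(d2), len(d3), len(a3))  # 128 64 32 32
```

With `levels=4` the same frame yields 16 approximation coefficients, which corresponds to the fourth-level configuration mentioned in the text.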
[Figure 1 flowchart. Reference path: Input signal -> Preprocessing stage -> Feature extraction
stage -> Store as reference. Test path: Input signal -> Preprocessing stage -> Feature extraction
stage -> Classification using Genetic Algorithm -> Word recognized.]
The DWT using (Haar) scaling function coefficients is applied to all frames of all words used
in this work. The magnitude of each frame is then computed to produce the resultant features, also
called feature vectors, which are classified by the GA in the classification stage. Eq. (1) shows
the computation of the magnitude of each frame. This stage reduces the amount of data
characterizing the signal to a limited number of parameters or coefficients [2].
$$\mathrm{magn}[N] = \sum_{M=0}^{\mathrm{Nosamples}-1} \mathrm{Data}[M] \qquad (1)$$
where Nosamples is the number of samples, M is the sample index, and Data[M] is the value of the
signal at sample M.
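A minimal sketch of the magnitude computation of Eq. (1) is given below. The equation as printed shows a plain sum of the sample values; this sketch sums absolute values, which is the usual definition of a short-time magnitude and is an assumption here. The function names are hypothetical.

```python
def frame_magnitude(data):
    """Magnitude of one frame: sum over its samples (absolute values assumed)."""
    return sum(abs(x) for x in data)


def magnitudes(frames):
    """One magnitude value per frame, giving a compact feature vector."""
    return [frame_magnitude(frame) for frame in frames]


print(magnitudes([[1, -2, 3], [0, 0, 4]]))  # [6, 4]
```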
4.3 Classification stage
Recognition is performed by a genetic algorithm that compares the test and reference
files. Before the Genetic Algorithm is applied, the features produced by the feature extraction
stage are divided into blocks (for example, 5 blocks) that the Genetic Algorithm uses for matching
in order to recognize the spoken word. Fig. (2) shows the segmentation of the extracted signal. The
following points clarify how the genetic algorithm is used for speech recognition.
Figure 2: Segmentation for Speech Signal
1. Initialization of the population
The reference files form the population managed by the Genetic Algorithm; the population
contains 165 individuals. The initial population is chosen at random for each word to be
recognized.
[Figure 2 flowchart: Start -> Input the signal produced from feature extraction -> Input the no. of
blocks -> Compute magnitude for each block -> Applying GA steps on blocks -> End.]
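The segmentation of Figure 2 can be sketched as splitting the extracted feature sequence into a fixed number of contiguous blocks and computing one magnitude per block. How a sequence whose length is not a multiple of the block count is divided is not specified in the paper; giving the first blocks one extra element is an assumption, as are the function names and the use of absolute values.

```python
def split_into_blocks(features, n_blocks=5):
    """Split features into n_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(features), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)  # assumed handling of the remainder
        blocks.append(features[start:end])
        start = end
    return blocks


def block_magnitudes(features, n_blocks=5):
    """One magnitude per block; the result is the vector handed to the GA."""
    return [sum(abs(x) for x in block) for block in split_into_blocks(features, n_blocks)]


print(block_magnitudes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_blocks=5))  # [3, 7, 11, 15, 19]
```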
2. Encoding
Binary encoding is used to encode the magnitudes of the feature vector of the speech signal,
as shown in Table (1).
Table (1) Magnitude computation for each block (as chromosome)
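As an illustration of the binary encoding, the sketch below converts block magnitudes to binary strings and back, using the block values 81, 140, 38, 255 and 231 that appear in the paper's tabulated example. Rounding to the nearest integer before encoding is an assumption, and the function names are hypothetical.

```python
def encode_blocks(block_values):
    """Encode each block magnitude as a binary string without leading zeros."""
    return [format(int(round(value)), "b") for value in block_values]


def decode_blocks(bit_strings):
    """Inverse of encode_blocks: binary strings back to integers."""
    return [int(bits, 2) for bits in bit_strings]


chromosome = encode_blocks([81, 140, 38, 255, 231])
print(chromosome)  # ['1010001', '10001100', '100110', '11111111', '11100111']
```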
3. Fitness Function
To evaluate the population, the fitness of each individual (chromosome) is calculated. It
measures the difference between the word to be recognized and each word in the database: when the
distance value is zero or close to zero, the word is recognized. Its formula is derived from the
Euclidean distance using the Mean Square Error (MSE), as shown in Eq. (2) [8]. The fitness function
of Eq. (3) is then applied to bound the fitness value between 0 and 1: a fitness value close to one
indicates the best match between the test and reference files, and vice versa.
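1- Distance:
$$\mathrm{Distance}(A,B) = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(A_i - B_i)^2 \qquad (2)$$
where A is the test file vector, B is the reference file vector, and N is the number of blocks.
2- Fitness:
$$\mathrm{Fitness} = \frac{1}{\mathrm{MSE}+1} \qquad (3)$$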
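Eqs. (2) and (3) translate directly into code. The sketch below is a minimal version with hypothetical function names; the example vectors are the test and reference block magnitudes listed in Table (2).

```python
def mse(a, b):
    """Mean square error between two block-magnitude vectors of equal length, Eq. (2)."""
    if len(a) != len(b):
        raise ValueError("test and reference vectors must have the same number of blocks")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fitness(a, b):
    """Fitness of Eq. (3): equals 1 for identical vectors and decreases as the MSE grows."""
    return 1.0 / (1.0 + mse(a, b))


test_vec = [53, 49, 53, 55, 49]
ref_vec = [54, 57, 55, 54, 26]
print(mse(test_vec, ref_vec))      # (1 + 64 + 4 + 1 + 529) / 5 = 119.8
print(fitness(test_vec, test_vec)) # 1.0
```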
4. Selecting
After the individuals of the population are evaluated, the elitist selection method is used.
This method allows the Genetic Algorithm to retain a number of the best individuals for the next
generation; otherwise these individuals may be lost if they are not selected to reproduce.
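Elitist selection can be sketched as keeping the highest-fitness individuals unchanged. The number of individuals retained is not given in the paper, so `n_elite` is a hypothetical parameter chosen only for illustration.

```python
def select_elite(population, fitnesses, n_elite=2):
    """Return the n_elite individuals with the highest fitness, best first."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in ranked[:n_elite]]


print(select_elite(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7]))  # ['b', 'd']
```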
5. Crossover
Single-point crossover is applied to the magnitude feature blocks by exchanging the values of
randomly selected blocks between the test and the reference. The exchange is restricted to the
first digit value of the block, so that the features of the block are preserved and the value of
the block does not change drastically. The probability of crossover is 0.7. Table (2) shows an
example.
Table (2): Single Crossover for Magnitude blocking

Word: بغداد (Baghdad)      Magnitude value for each block
Test                       53       49       53       55       49
In binary                  011101   110001   110101   110111   110001
Reference                  54       57       55       54       26
In binary                  101101   111001   110111   110110   11010
Test after crossover       101101   110001   110101   110111   110001

Test file after processing by Eq. (1)    81        140        38       255        231
In binary as block                       1010001   10001100   100110   11111111   11100111
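The crossover description leaves some room for interpretation, so the following is only a sketch of a standard single-point crossover applied at block level: blocks before the cut point come from the reference and the remaining blocks from the test chromosome. Operating on whole blocks is an assumption; with the cut point after the first block it reproduces the "Test after crossover" row of Table (2). The `point` argument and the function name are hypothetical.

```python
import random


def single_point_crossover(test, reference, p_crossover=0.7, point=None, rng=random):
    """Return a child chromosome built from reference[:point] and test[point:]."""
    if len(test) != len(reference):
        raise ValueError("chromosomes must have the same number of blocks")
    if len(test) < 2:
        return list(test)
    if point is None:
        if rng.random() >= p_crossover:
            return list(test)  # no crossover for this pair
        point = rng.randint(1, len(test) - 1)  # random cut between two blocks
    return list(reference[:point]) + list(test[point:])


test_bits = ["011101", "110001", "110101", "110111", "110001"]
ref_bits = ["101101", "111001", "110111", "110110", "11010"]
print(single_point_crossover(test_bits, ref_bits, point=1))
# ['101101', '110001', '110101', '110111', '110001']
```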
6. Mutation
The mutation operation in this paper is a simple change, because preserving the extracted
features is very important in this work. If the word is not recognized, mutation is performed by
replacing the individual. The probability of mutation is 0.001.
7. The Stop Criterion
The proposed approach stops when the word is recognized, when all sub-populations have been
covered, or when the maximum number of generations is reached.
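The three stopping conditions can be illustrated with the simplified matching loop below. It is not the full algorithm of the paper: crossover and mutation are omitted, one sub-population is examined per generation, and the sub-population size, generation limit and fitness threshold are all hypothetical values chosen for the example.

```python
def fitness(a, b):
    """Fitness 1 / (MSE + 1) between two equal-length vectors."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mse)


def recognize(test, references, sub_size=15, max_generations=20, threshold=0.5):
    """references is a list of (label, vector) pairs; returns (label or None, reason)."""
    subpops = [references[i:i + sub_size] for i in range(0, len(references), sub_size)]
    for generation, subpop in enumerate(subpops):
        if generation >= max_generations:
            return None, "generation limit reached"
        label, vector = max(subpop, key=lambda ref: fitness(test, ref[1]))
        if fitness(test, vector) >= threshold:  # assumed recognition threshold
            return label, "word recognized"
    return None, "all sub-populations covered"


refs = [("a", [1, 2, 3]), ("b", [10, 20, 30]), ("c", [5, 5, 5])]
print(recognize([10, 20, 30], refs, sub_size=2))     # ('b', 'word recognized')
print(recognize([100, 100, 100], refs, sub_size=2))  # (None, 'all sub-populations covered')
```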
CONCLUSION
In this paper, we propose a system for automatic recognition of isolated Arabic words with a
small vocabulary of 11 words, each recorded 7 times per speaker, that represent characteristics of
the Arabic language. The system uses the discrete wavelet transform (DWT) and magnitude (Magn) in
the feature extraction stage, and a genetic algorithm with binary encoding and single-point
crossover in the classification stage. The recognition rate is 90% with the third level of the DWT
and 87.5% with the fourth level.
REFERENCES
[1] Holmes J. and Holmes W., "Speech Synthesis and Recognition", Second Edition, Taylor and
Francis e-Library, London and New York, 2001.
[2] Rabiner L.R. and Schafer R.W., "Digital Processing of Speech Signals", Prentice Hall,
Englewood Cliffs, New Jersey, 1978.
[3] Cook S., "Speech Recognition Howto",
http://www.gear21.com/speech/big_html/speech_big.html, April 19, 2007.
[4] Goldberg D., "Genetic Algorithms in Search, Optimization and Machine Learning",
Addison Wesley Longman, Boston, MA, 1989.
[5] Mitchell M., "An Introduction to Genetic Algorithms", The MIT Press, Cambridge, MA, 1996.
[6] Sound Forge 9.0, http://www.wsystem.com/html/sound_forge.html.
[7] Maouche F. and Benmohamed M., "Automatic Recognition of Arabic Word by Genetic
Algorithm and MFCC Modeling", Faculty of Informatics, Mentouri University,
Constantine, Algeria, 2009.
[8] Mahalakshmi P. and Reddy M.R., "Speech Processing Strategies for Cochlear Prostheses-The
Past, Present and Future: A Tutorial Review", International Journal of Advanced Research in
Engineering & Technology (IJARET), Volume 3, Issue 2, 2012, pp. 197-206, ISSN Print:
0976-6480, ISSN Online: 0976-6499.
[9] Al-Hassani M.D. and Kadhim A.A., "Design a Text-Prompt Speaker Recognition System using
LPC-Derived Features", International Journal of Information Technology and Management
Information Systems (IJITMIS), Volume 4, Issue 3, 2013, pp. 68-84, ISSN Print: 0976-6405,
ISSN Online: 0976-6413.
[10] Ingale P.P. and Nalbalwar S.L., "Novel Approach to Text Independent Speaker
Identification", International Journal of Electronics and Communication Engineering &
Technology (IJECET), Volume 3, Issue 2, 2012, pp. 87-93, ISSN Print: 0976-6464,
ISSN Online: 0976-6472.