International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 5, May (2014), pp. 76-81 © IAEME

SPEECH RECOGNITION USING GENETIC ALGORITHM

¹Asst. Prof. Dr. Jane J. Stephan, ²Asst. Lecturer Rasha H. Ali
¹Iraqi Commission for Computers and Informatics, Baghdad, Iraq
²Computer Department, Education College for Women, Baghdad University, Baghdad, Iraq

ABSTRACT

Speech recognition is an important field with many applications, such as banking and transactions over the telephone network, database access services, voice email, investigations, and management. In this paper, an approach for recognizing isolated Arabic words is presented. The Discrete Wavelet Transform (DWT) with the Haar wavelet (at the third and fourth levels) and the magnitude are used in the feature extraction stage, and a Genetic Algorithm (GA) is used in the classification stage. The results showed a recognition rate of 90% at the third level and 87.5% at the fourth level.

Keywords: Speech Recognition, Genetic Algorithm, Wavelet Transform, Magnitude.

1. INTRODUCTION

Speech signals are composed of a sequence of sounds. These sounds and the transitions between them serve as a symbolic representation of information. The arrangement of these sounds (symbols) is governed by the rules of language. The study of these rules and their implications in human communication is the domain of linguistics. The study and classification of the sounds of speech is called phonetics. Speech can be represented in terms of its message content, or information. An alternative way of characterizing speech is in terms of the signal carrying the message information, i.e., the acoustic waveform [1]. Speech is one of the most important tools for communication between humans and their environment; building Automatic Speech Recognition (ASR) systems has therefore long been a goal [2].
Speech recognition systems are separated into several classes according to the types of utterances they are able to recognize. These classes are based on the fact that one of the difficulties of ASR is determining when a speaker starts and finishes an utterance [3]. The types of speech are isolated words, connected words, and continuous speech. In this paper, isolated Arabic words are treated for recognition.

© IAEME: www.iaeme.com/ijcet.asp. Journal Impact Factor (2014): 8.5328 (Calculated by GISI), www.jifactor.com
2. GENETIC ALGORITHM

Genetic algorithms are based on natural genetics and therefore share its terminology. A genetic algorithm is a stochastic search technique (a stochastic search uses probability to help guide the search) inspired by the mechanics of natural selection and natural genetics [4]. The basic idea behind genetic algorithms is to maintain a population of strings, or chromosomes, each of which encodes a potential solution to the problem being investigated. Each chromosome is tested with a fitness function to determine how good a solution it is. A new population is created by selecting chromosomes from the old population. The new population is re-evaluated, and the process continues until a solution is found [5].

The strings of an artificial genetic system are analogous to chromosomes in a biological system. Chromosomes are composed of features, or detectors, called genes. A gene may take on some number of values, called alleles. Features may be located at different positions on the string; the position of a gene, its locus, is identified separately from the gene's function. Thus, we can talk about a particular gene, for example an animal's eye color gene, its locus, position 10, and its allele value, blue eyes. The total package of strings (chromosomes) is called a structure. These structures are decoded to form a particular parameter set [4].

In natural systems, one or more chromosomes combine to form the total genetic prescription for the construction and operation of an organism. The total genetic package (structure) is called the genotype. The organism formed by the interaction of the total genetic package with its environment is called the phenotype [5].

3. THE DATABASE OF SPEECH

The system was applied to eleven Arabic words, recorded by microphone from 5 independent speakers. Each speaker repeated each word 7 times: three repetitions are stored as references and four are used for testing. The files are in wave format. The database therefore contains 385 utterances in total, of which 165 are stored as references and 220 are used for testing the proposed algorithm. The words have different lengths. The words are:

(تمارا, رفيف, ياسين, عمان, رمان, سلطان, بغداد, حسين, يوسف, شبكة, الخير)

The words were recorded at a sampling rate of 22 kHz, coded in 8 bits, on one channel. The signal was digitized with the professional software "Sound Forge version 9.0" [6].

4. ARCHITECTURE OF THE PROPOSED APPROACH

The proposed approach for speech recognition consists of three stages: the preprocessing stage, the feature extraction stage, and the classification stage. Figure (1) shows the architecture of the proposed approach.
Figure 1: The proposed approach for recognition

4.1 Preprocessing stage

This stage is the signal processing part (i.e., converting the signal to its parametric representation). The signal must first be converted from analog sound to digital sound [7]. The speech signal is then blocked into frames of m samples. Because speech is a non-stationary signal (it varies with time), framing is essential: processing operates on frames rather than on the original signal. After this stage the speech signal consists of many frames, and the number of frames depends on the number of samples in each word. Each frame contains 256 samples, and the overlap is 128 samples.

4.2 Feature extraction stage

The speech signal carries a very large amount of information, so the most important parts must be extracted. A direct comparison on this kind of signal is impractical because there is too much information [7]. The discrete wavelet transform (DWT) with Haar function coefficients is therefore applied to all frames of all words used in this work. The filter bank used for feature extraction has three or four levels. With three levels, the DWT of an extracted frame (256 samples) yields four sub-bands: d1, d2, d3, and a3. The first level produces 128 samples from the high-pass filter followed by downsampling, called d1, and 128 samples from the low-pass filter followed by downsampling, called a1. In the second level, the same filters are applied to the output a1, which produces 64 samples from the high-pass branch, called d2, and 64 samples from the low-pass branch, called a2. The third level, using the same filters on a2, produces 32 samples from the high-pass branch, called d3, and 32 samples from the low-pass branch, called a3.
Since most of the information is concentrated in the low-frequency components, only the 32 samples that result from the low-pass filters (a3) are considered as the features of the input signal.

[Figure 1 flowchart. Reference path: Input signal -> Preprocessing stage -> Feature extraction stage -> Store as reference. Test path: Input signal -> Preprocessing stage -> Feature extraction stage -> Classification using Genetic Algorithm -> Word recognized.]
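The framing and three-level Haar decomposition described in Sections 4.1 and 4.2 can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`frame_signal`, `haar_step`, `haar_dwt`) are hypothetical, the orthonormal 1/sqrt(2) scaling is an assumption (the paper does not state which normalization was used), and the input is a synthetic sine wave standing in for a recorded word.

```python
import math

def frame_signal(samples, frame_len=256, hop=128):
    """Split a sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

def haar_step(x):
    """One Haar analysis level: (low-pass, high-pass) outputs, each downsampled by 2."""
    s = math.sqrt(2.0)  # assumed orthonormal scaling
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_dwt(frame, levels=3):
    """Return (a_L, [d1, ..., dL]) by re-filtering the approximation at each level."""
    approx, details = list(frame), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

# Synthetic stand-in for a digitized word (assumption: 1024 samples).
signal = [math.sin(0.05 * n) for n in range(1024)]
frames = frame_signal(signal)             # 256-sample frames, 128-sample overlap
a3, (d1, d2, d3) = haar_dwt(frames[0], levels=3)
print(len(frames), len(d1), len(d2), len(d3), len(a3))  # prints 7 128 64 32 32
```

Calling `haar_dwt(frame, levels=4)` gives the fourth-level variant, in which the retained approximation has 16 samples per frame.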
The DWT with Haar scaling function coefficients is applied to all frames of all words used in this work. The magnitude of each frame is then computed to produce the resultant features, also called feature vectors, which are classified by the GA in the classification stage. Eq. (1) shows the computation of the magnitude of each frame. This stage reduces the amount of data characterizing the signal to a limited number of parameters or coefficients [2].

$$\mathrm{magn}[N] = \sum_{M=0}^{\mathrm{Nosamples}-1} \mathrm{Data}[M] \tag{1}$$

where N indexes the frame, Nosamples is the number of samples, and Data[M] is the value of the signal.

4.3 Classification stage

Recognition is performed by a genetic algorithm that compares the test and reference files. Before the Genetic Algorithm is applied, the features produced by the feature extraction stage are divided into blocks (for example, 5 blocks) that the Genetic Algorithm uses for matching in order to recognize the spoken word. Figure (2) shows the segmentation of the extracted signal. The following points clarify how the genetic algorithm is used for speech recognition.

Figure 2: Segmentation for Speech Signal
[Figure 2 flowchart: Start -> Input the signal produced from feature extraction -> Input the no. of blocks -> Compute magnitude for each block -> Applying GA steps on blocks -> End.]

1. Initialization of the population
The reference files are the population managed by the Genetic Algorithm; the population contains 165 individuals. The initial population is chosen at random for each word to be recognized.
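The segmentation step of Figure 2 (divide the extracted features into blocks, then compute one magnitude per block) can be illustrated with a short sketch. The helper names `split_blocks` and `block_magnitudes` are hypothetical, the magnitude is taken as the plain sum of Eq. (1), and the rule for distributing leftover samples when the length is not divisible by the number of blocks is an assumption, since the paper does not specify it. The input values are placeholders, not real features.

```python
def split_blocks(features, n_blocks=5):
    """Split a feature vector into n_blocks contiguous blocks of near-equal size.

    Assumption: when the length is not divisible by n_blocks, the first
    blocks receive one extra sample each.
    """
    size, extra = divmod(len(features), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(features[start:end])
        start = end
    return blocks

def block_magnitudes(features, n_blocks=5):
    """Magnitude of each block, computed as the sum of its values (Eq. (1))."""
    return [sum(block) for block in split_blocks(features, n_blocks)]

# Placeholder feature vector of 32 values (the size of a3 for one frame).
features = list(range(1, 33))
mags = block_magnitudes(features, n_blocks=5)
print(mags)  # prints [28, 77, 105, 141, 177]
```

Each resulting list of block magnitudes plays the role of one individual that the later GA steps compare against the references.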
2. Encoding
Binary encoding is used to encode the magnitudes of the feature vector of the speech signal, as shown in Table (1).

Table (1): Magnitude computation for each block (as chromosome)

Test file after processing by Eq. (1)   81        140        38       255        231
In binary, as blocks                    1010001   10001100   100110   11111111   11100111

3. Fitness Function
To evaluate the population, the fitness of each individual (chromosome) is calculated from the difference between the word to be recognized and each word in the database; the closer the distance is to zero, the better the match, and the word is recognized when the distance is at or near zero. The formula is derived from the Euclidean distance using the Mean Square Error (MSE), as shown in Eq. (2) [8]. The fitness function of Eq. (3) is then applied to bound the fitness value between 0 and 1: a fitness value near one means the best match between the test and reference files, and vice versa.

$$\mathrm{Distance}(A, B) = \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (A_i - B_i)^2 \tag{2}$$

where A is the test file vector, B is the reference file vector, and N is the number of blocks.

$$\mathrm{Fitness} = \frac{1}{\mathrm{MSE} + 1} \tag{3}$$

4. Selection
After the individuals of the population are evaluated, the elitist selection method is used. This method allows the Genetic Algorithm to retain a number of the best individuals for the next generation; without it, these individuals could be lost if they were not selected to reproduce.

5. Crossover
Single-point crossover is applied to the magnitude feature blocks by replacing the values of some randomly selected blocks between the test and reference files. The replacement is applied only to the first digit value of a block, in order to keep the features of the block and avoid a large change in the block's value. The crossover probability is 0.7. Table (2) shows an example.

Table (2): Single crossover for magnitude blocks. Word: بغداد (Baghdad)

                          Magnitude value for each block
Test                      53       49       53       55       49
Test, in binary           011101   110001   110101   110111   110001
Reference                 54       57       55       54       26
Reference, in binary      101101   111001   110111   110110   11010
Test after crossover      101101   110001   110101   110111   110001
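The encoding, fitness, and elitist ranking steps can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function names and the reference labels `ref_a`, `ref_b`, `ref_c` are hypothetical, `ref_b` and `ref_c` contain made-up magnitudes, and only the test vector and `ref_a` reuse the decimal magnitudes printed in Table (2). Crossover and mutation are left out because the paper does not describe them in enough detail to reproduce.

```python
def encode(magnitudes):
    """Binary-encode each block magnitude (no leading zeros, as in Table (1))."""
    return [format(m, "b") for m in magnitudes]

def mse(a, b):
    """Eq. (2): mean squared difference between test and reference blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fitness(test, ref):
    """Eq. (3): equals 1 for a perfect match and approaches 0 as the MSE grows."""
    return 1.0 / (mse(test, ref) + 1.0)

def elite(test, references, keep=2):
    """Elitist ranking: the `keep` references with the highest fitness."""
    scored = [(label, fitness(test, vec)) for label, vec in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:keep]

test = [53, 49, 53, 55, 49]            # decimal test magnitudes from Table (2)
references = {
    "ref_a": [54, 57, 55, 54, 26],     # decimal reference magnitudes from Table (2)
    "ref_b": [53, 50, 52, 55, 49],     # hypothetical near match
    "ref_c": [10, 20, 30, 40, 50],     # hypothetical poor match
}

print(encode([81, 140, 38, 255, 231]))
print(mse(test, references["ref_a"]))
print(elite(test, references))
```

With these inputs the MSE between the Table (2) test and reference magnitudes is about 119.8, giving a fitness of roughly 0.008, so the hypothetical near match `ref_b` ranks first.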
6. Mutation
The mutation operation in this paper is a simple change, because preserving the extracted features is very important in this work. If the word is not recognized, mutation is performed by replacing the individual. The mutation probability is 0.001.

7. The Stop Criterion
The proposed approach stops when the word is recognized, when all sub-populations have been covered, or when the number of generations is exhausted.

CONCLUSION

In this paper, we propose a system for automatic recognition of isolated Arabic words with a small vocabulary of 11 words, each recorded 7 times per speaker, that represent characteristics of the Arabic language. The system uses the discrete wavelet transform (DWT) and the magnitude (Magn) in the feature extraction stage, and a genetic algorithm with binary encoding and single-point crossover in the classification stage. The recognition rate is 90% with the third level of the DWT and 87.5% with the fourth level.

REFERENCES

[1] Holmes J. and Holmes W., "Speech Synthesis and Recognition", Second Edition, Taylor and Francis e-Library, London and New York, 2001.
[2] Rabiner L.R. and Schafer R.W., "Digital Processing of Speech Signals", Englewood Cliffs, New Jersey: Prentice Hall, 1978.
[3] Cook S., "Speech Recognition Howto", http://www.gear21.com/speech/big_html/speech_big.html, April 19, 2007.
[4] Goldberg D., "Genetic Algorithm in search, Optimization and Machine Learning", Addison-Wesley Longman, Boston, MA, 1989.
[5] Mitchell M., "An Introduction to Genetic Algorithms", The MIT Press, Cambridge, MA, 1996.
[6] Sound Forge 9.0, http://www.wsystem.com/html/sound_forge.html.
[7] Maouche F., Benmohamed M., "Automatic Recognition of Arabic Word by Genetic Algorithm and MFFC Modeling", Faculty of Informatics, Mentouri University, Constantine, Algeria, 2009.
[8] P Mahalakshmi and M R Reddy, "Speech Processing Strategies for Cochlear Prostheses-The Past, Present and Future: A Tutorial Review", International Journal of Advanced Research in Engineering & Technology (IJARET), Volume 3, Issue 2, 2012, pp. 197-206, ISSN Print: 0976-6480, ISSN Online: 0976-6499.
[9] Dr. Mustafa Dhiaa Al-Hassani and Dr. Abdulkareem A. Kadhim, "Design a Text-Prompt Speaker Recognition System using LPC-Derived Features", International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 4, Issue 3, 2013, pp. 68-84, ISSN Print: 0976-6405, ISSN Online: 0976-6413.
[10] Pallavi P. Ingale and Dr. S.L. Nalbalwar, "Novel Approach to Text Independent Speaker Identification", International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 2, 2012, pp. 87-93, ISSN Print: 0976-6464, ISSN Online: 0976-6472.