International Journal of Electronic Engineering Research
ISSN 0975-6450, Volume 2, Number 3 (2010), pp. 377-381
© Research India Publications, http://www.ripublication.com/ijeer.htm

Independent Speaker Recognition for Native English Vowels

1 G.N. Kodandaramaiah, 2 M.N. Giriprasad and 3 M. Mukunda Rao
1 HOD, Department of Electronics and Communications Engineering, Madanapalli Institute of Technology, Madanapalli, India
2 Principal, Jawaharlal Nehru Technological University, Pulivendula, India
3 Honorary Research Professor, Biomedical Sciences, Sri Ramachandra Medical College & Research Institute, Chennai, India
E-mail: kodandramaiah@yahoo.com

Abstract
The standard method for vocal tract shape estimation has been the basis for many successful automatic speech recognition (ASR) systems. The analytic results presented here demonstrate that estimation of vocal tract shape, based on reflection coefficients obtained from LPC analysis of speech, is satisfactory and is related to the place of articulation of the vowels. We describe a "standard" approach for classifying vowels based on formants, which are meaningfully distinguishable frequency components of human speech. These formant frequencies depend upon the shape and dimensions of the vocal tract: the vocal tract shape is characterized by a set of formant frequencies, and different sounds are produced by varying that shape, which gives spoken speech its distinctive properties. This approach has been used in many speech-related applications, such as speech and speaker recognition. This work applies a Euclidean distance measure to assess the similarity or dissimilarity between two spoken words after quantizing each spoken word into its codebook.

Keywords: Speech, Vocal tract, Formants, Euclidean distance.

Introduction
Fig 1.1 shows the block diagram of Independent Speaker Recognition for vowels. Let S(n) be the test sample of a vowel. Its parameters, i.e. the formants F1 and F2, are extracted.
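The paper does not spell out the formant extraction step, so the following is only a minimal sketch of one common approach, assuming NumPy: estimate LPC coefficients by the autocorrelation method and take the angles of the strongest complex roots of the prediction polynomial as formant estimates. The test signal, sample rate, and LPC order below are illustrative, not taken from the paper.

```python
import numpy as np

def lpc_coefficients(x, order):
    """LPC via the autocorrelation method: solve the normal equations R a = r."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))  # prediction-error polynomial A(z)

def estimate_formants(x, fs, order=8, n=2):
    """Take the n sharpest resonances (largest-magnitude complex roots of A(z))."""
    roots = np.roots(lpc_coefficients(x, order))
    roots = roots[np.imag(roots) > 0]              # one of each conjugate pair
    roots = roots[np.argsort(-np.abs(roots))][:n]  # strongest resonances first
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

fs = 8000
rng = np.random.default_rng(0)
t = np.arange(0, 0.03, 1 / fs)
# Synthetic two-resonance signal near 700 Hz and 1200 Hz (roughly /a/-like)
x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
x = (x + 0.01 * rng.standard_normal(len(t))) * np.hamming(len(t))
f1, f2 = estimate_formants(x, fs)
print(f1, f2)  # estimates close to the two synthetic resonances
```

Production systems typically also check root bandwidths and pre-emphasize the speech frame, but the root-angle idea above is the core of LPC-based formant estimation.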
The extracted formants are compared with the thresholds of the reference formants. A Euclidean distance measure is applied to assess the similarity or dissimilarity between two spoken words after quantizing each spoken word into its codebook. The matching of an unknown vowel is performed by measuring the Euclidean distance between the feature vector (formants F1, F2) of the unknown vowel and the reference model (codebook) of known vowel formants F1, F2 in the database. The goal is to find the codebook entry with the minimum distance in order to identify the unknown vowel (Franti et al., 1997). For example, in the testing or identification session, the Euclidean distance between the feature vector (formants F1, F2) and the codebook for each spoken vowel is calculated, and the vowel with the smallest average minimum distance is picked, as shown in Eq. (1.1):

d(x, y) = √[ Σ(i=1..D) wi (xi − yi)² ]    (1.1)

where D = 2, xi is the ith input feature (formants F1, F2), yi is the ith feature in the codebook (reference model), d is the distance between x and y, and wi is the weight associated with the ith feature.

Decision Rule
The weights w are important when the information contained in the underlying features is not proportional to the feature variances. In this case of vowel recognition based on formants F1 and F2, the two formants do not contribute uniformly to vowel recognition. Based on our study, the relative weights F1 = 2 and F2 = 1 are used, normalized such that the sum of the weights is 1.0. We refer to classification based on this distance as Maximum Likelihood Regression (MLR), since it is based on the Gaussian assumptions used to obtain the parameters in the classifier.
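The weighted distance of Eq. (1.1) and the minimum-distance rule can be sketched as follows. The codebook values below are hypothetical placeholders (the paper does not list its reference formants); only the weighted distance and the argmin selection follow the text.

```python
import numpy as np

# Hypothetical reference codebook of (F1, F2) pairs in Hz, one per vowel.
# These numbers are illustrative only, not the paper's trained codebook.
CODEBOOK = {
    "/a/": (730.0, 1090.0),
    "/e/": (530.0, 1840.0),
    "/i/": (270.0, 2290.0),
    "/o/": (570.0, 840.0),
    "/u/": (300.0, 870.0),
}

# Relative weights F1 = 2, F2 = 1, normalized so they sum to 1.0
W = np.array([2.0, 1.0])
W = W / W.sum()

def weighted_distance(x, y, w=W):
    """d(x, y) = sqrt(sum_i w_i (x_i - y_i)^2), Eq. (1.1) with D = 2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

def classify(features):
    """Pick the codebook vowel with the minimum weighted distance."""
    return min(CODEBOOK, key=lambda v: weighted_distance(features, CODEBOOK[v]))

print(classify((710.0, 1100.0)))  # nearest codebook entry is /a/
```

In the paper the codebook is built by vector quantization of training utterances; here a single mean (F1, F2) pair per vowel stands in for it.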
To verify that the displayed vowels produce accurate results, the MLR calculates the distance of the average features for the given vowels. If the feature distance is within the threshold criterion Di(F1, F2), then Eq. (1.1) becomes

Di(f) < α√m    (1.2)

where m is the number of features (i.e., F1 and F2) and α is an arbitrary scale factor used for performance tuning. The vector xi is then identified as the vector yi; otherwise it is rejected. If α is too small, the MLR rejects many correct vowel samples; if it is too large, out-of-category vowels will not be rejected. In our work the threshold α = x has given optimum results.
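The acceptance test of Eq. (1.2) amounts to a single comparison. A minimal sketch, assuming a normalized distance value; the paper does not report its final α, so the default below is an arbitrary placeholder:

```python
import math

def accept(distance, m=2, alpha=1.0):
    """Eq. (1.2): accept the match only if distance < alpha * sqrt(m).

    m is the number of features (F1 and F2, so m = 2); alpha is the
    performance-tuning scale factor, set here to a placeholder value.
    """
    return distance < alpha * math.sqrt(m)

print(accept(1.2))  # 1.2 < 1 * sqrt(2), so the match is accepted
print(accept(2.0))  # 2.0 exceeds the threshold, so it is rejected
```

Raising alpha loosens the gate (fewer rejections of correct vowels, more false accepts); lowering it does the opposite, matching the trade-off described above.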
Figure 1.1: Block diagram of vowel recognition.

Results of Vowel Recognition for Male and Female Speakers
Male Speakers
Table 3.1 gives the results for male vowel recognition based on the MLR method. Vowel /a/ achieved the best classification compared to the other vowels. The detection rate for vowels /u/ and /e/ is better than for vowels /o/ and /i/ across all tested samples. Vowels /e/ and /i/ tend to be misclassified as each other due to the variation of utterances across different speakers. Fig 3.1 shows vowel 'X' versus % vowel recognition for 50 male samples, where 'X' is the actual vowel. For vowel /a/: /a/ recognized as /a/ is 46; as /e/ is 0; as /i/ is 4; as /o/ is 0; as /u/ is 0. Hence the percentage of correct recognition of vowel /a/ is (/a/ recognized as /a/) × 100 ÷ (/a/ over all predictions) = 46 × 100 ÷ (46 + 0 + 4 + 0 + 0) = 46 × 100 / 50 = 92%.

Figure 3.1: Vowel vs. % vowel recognition for male speakers.
Table 3.1: Percentage recognition of vowels for male speakers.

                    Predicted
Actual    /a/   /e/   /i/   /o/   /u/   % correct
/a/        46     0     4     0     0      92%
/e/         2    44     0     4     0      89%
/i/         6     0    40     0     4      80%
/o/         0     3     0    44     3      88%
/u/         3     1     1     0    45      90%

Female Speakers
Table 3.2 gives the results for female vowel recognition based on the MLR method. Vowel /o/ achieved the best classification compared to the other vowels. The detection rate for vowels /u/ and /e/ is better than for vowels /a/ and /i/ across all tested samples. Vowels /a/ and /i/ tend to be misclassified as each other due to the variation of utterances across different speakers. Fig 3.2 shows the percentage recognition of vowels for 40 female samples. For vowel /o/: /o/ recognized as /a/ is 0; as /e/ is 0; as /i/ is 0; as /o/ is 39; as /u/ is 1. Hence the percentage of correct recognition of vowel /o/ is (/o/ recognized as /o/) × 100 ÷ (/o/ over all predictions) = 39 × 100 ÷ (0 + 0 + 0 + 39 + 1) = 39 × 100 / 40 = 98%.

Figure 3.2: Vowel vs. % vowel recognition for female speakers.
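The percentage-correct figures for male speakers follow directly from the confusion-matrix rows: each row holds the counts of how the actual vowel was predicted, and the rate is the diagonal count over the row total. A short sketch using the Table 3.1 counts:

```python
# Confusion matrix for male speakers (Table 3.1): rows = actual vowel,
# columns = predicted vowel, 50 samples per vowel.
vowels = ["/a/", "/e/", "/i/", "/o/", "/u/"]
confusion = [
    [46,  0,  4,  0,  0],  # actual /a/
    [ 2, 44,  0,  4,  0],  # actual /e/
    [ 6,  0, 40,  0,  4],  # actual /i/
    [ 0,  3,  0, 44,  3],  # actual /o/
    [ 3,  1,  1,  0, 45],  # actual /u/
]

rates = {}
for k, (vowel, row) in enumerate(zip(vowels, confusion)):
    rates[vowel] = 100.0 * row[k] / sum(row)  # e.g. /a/: 46 * 100 / 50 = 92.0
    print(f"{vowel}: {rates[vowel]:.0f}%")
```

The same computation applied to the Table 3.2 rows reproduces the female-speaker percentages.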
Table 3.2: Percentage recognition of vowels for female speakers.

                    Predicted
Actual    /a/   /e/   /i/   /o/   /u/   % correct
/a/        34     4     0     0     2      85%
/e/         0    37     0     3     0      92%
/i/         0     4    34     0     2      86%
/o/         0     0     0    39     1      98%
/u/         3     0     0     0    37      94%

Conclusion
This work applied the standard method for vocal tract shape estimation, which has been the basis for many successful automatic speech recognition (ASR) systems, and described a "standard" approach for classifying vowels based on formants. We achieved 80 to 95 percent speaker recognition using the Euclidean distance measure.

Acknowledgements
We would like to thank the Management and the Principal of Madanapalli Institute of Technology and Science, Madanapalli, A.P., for their cooperation and encouragement.

References
[1] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Dorling Kindersley (India) Pvt. Ltd., licensees of Pearson Education in South Asia, 1978, pp. 54-101, 412-460.
[2] T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 2002, pp. 56-59.
[3] P. Ladefoged, R. Harshman, L. Goldstein, and L. Rice, "Generating vocal tract shapes from formant frequencies," J. Acoust. Soc. Am., vol. 64, no. 4, 1978, pp. 1027-1035.
[4] M. Bhaowal and K. Chawla, "Isolated Word Recognition for English Language Using LPC, VQ and HMM," pp. 2-4.
[5] G.E. Peterson and H.L. Barney, "Control methods used in a study of the vowels," J. Acoust. Soc. Am., vol. 24, pp. 175-184.
[6] P. Rose, "Long- and short-term within-speaker differences in the formants of Australian hello," J. Int. Phonetic Assoc., 29(1), 1999, pp. 1-31.
[7] A.A.S. Bhatti and M. Sleem Mian, "Formants based analysis for speech recognition," IEEE, 2006.
