International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.1, February 2012
DOI: 10.5121/ijcseit.2012.2101
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL

Md. Sadek Ali¹, Md. Shariful Islam¹ and Md. Alamgir Hossain¹
¹Dept. of Information & Communication Engineering, Islamic University, Kushtia 7003, Bangladesh.
E-mail: {sadek_ice, afmsi76, alamgir_ict}@yahoo.com
ABSTRACT
This paper presents a system developed for speech encoding, analysis, synthesis and gender identification. A typical gender recognition system can be divided into a front-end system and a back-end system. The task of the front-end system is to extract gender-related information from a speech signal and represent it by a set of vectors called features. Features such as the power spectral density and the frequency at maximum power carry speaker information. The features are extracted using the Fast Fourier Transform (FFT) algorithm. The task of the back-end system (also called the classifier) is to create a gender model and, in the recognition phase, to recognize the gender from his or her speech signal. The paper also presents the digital processing of speech signals (the pronounced phonemes “A” and “B”) taken from 10 persons, 5 of them male and the rest female. The power spectrum of each signal is estimated, and the frequency at maximum power of the English phonemes is extracted from the estimated power spectrum. The system uses a threshold technique as the identification tool. The recognition accuracy of the system is 80% on average.
KEYWORDS
Gender Recognition, Feature Extraction, Fast Fourier Transform (FFT), Front-end, Back-end.
1. INTRODUCTION
According to Webster’s Dictionary, “speech” is the “communication or expression of thoughts in spoken words”. A speech signal not only carries the information needed for communication among people but also contains information about the particular speaker. The nonlinguistic characteristics of a speaker help to classify the speaker as male or female. Features such as the power spectral density and the frequency at maximum power carry speaker information. These speaker features can be tracked well through the varying frequency characteristics of the vocal tract and the variation in the excitation. The speech signal also carries speaker-specific information, including social factors, affective factors and the properties of the physical voice-production apparatus, which is why human beings can easily recognize whether a speaker is male or female, for example during a telephone conversation or when the speaker is otherwise hidden [1][2]. With the current worldwide concern about security, gender classification has received a great deal of attention among speech researchers. In a rapidly developing environment of computerization, gender recognition is also one of the most important issues in the developing world.
Gender recognition can be divided into two different tasks: gender identification and gender verification. In the identification task, or 1:N matching, an unknown speaker is compared against a database of N known speakers, and the best-matching speaker is returned as the recognition decision. The verification task, or 1:1 matching, consists of deciding whether a given voice sample was produced by the claimed speaker. An identity claim (e.g., a PIN
code) is given to the system, and the unknown speaker’s voice sample is compared against the claimed speaker’s voice template. If the degree of similarity between the voice sample and the template exceeds a predefined decision threshold, the speaker is accepted; otherwise he or she is rejected.
The rest of the paper is organized as follows. Section II reviews related work. Section III describes the mathematical tools and techniques for gender recognition systems. Section IV describes the speech recording and feature extraction process. Computation of the power and frequency spectrum is described in Section V. System implementation is detailed in Section VI. Recognition and experimental results are given in Section VII. Finally, Section VIII concludes the paper.
2. RELATED WORKS
Gender recognition is the task of recognizing a person’s gender from his or her voice. With the current worldwide concern about security, speaker identification has received a great deal of attention among speech researchers, and in a rapidly developing environment of computerization it has become one of the most important issues in the developing world. Several lines of speech processing research have been pursued over the past few decades within the field of digital signal processing (DSP). The most closely related work is “Speaker recognition in a multi-speaker environment”, presented in Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, 2001, pp. 787-790 [3]. Another related work is “Spectral Features for Automatic Text-Independent Speaker Recognition”, developed in the Department of Computer Science, Joensuu University, Finland, in 2003 [4]. From the study of previous research it was observed that, among the different features, the power spectrum gives the best classification rate. Based on the power spectrum, we compute the frequency at maximum power of the speech signal, and we have implemented a complete gender recognition system that identifies the gender (male/female) using this frequency component. In addition to the description and the theoretical and experimental analysis, we provide implementation details as well.
3. MATHEMATICAL TOOLS AND TECHNIQUES
For digital communication or digital signal synthesis, it is necessary to convey an analog signal such as speech or music as a sequence of digitized numbers. This is commonly done by sampling the speech signal, denoted by $X_a(t)$, periodically to produce the sequence

$$X(n) = X_a(nT), \qquad -\infty < n < \infty \qquad (1)$$

where $n$ takes only integer values and $T$ is the sampling period. In this paper we have used the pulse code modulation (PCM) technique to digitize the speech signal.
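As a small illustration of this digitization step (not part of the original paper), the following Python sketch samples a synthetic tone at 8 kHz and quantizes it to 8-bit unsigned PCM, the format used later in Section IV; the tone frequency and duration are arbitrary placeholders.

```python
import numpy as np

FS = 8000          # sampling rate in Hz (8 kHz, as used in Section IV)
DURATION = 0.5     # seconds of signal to generate (arbitrary)
TONE_HZ = 440.0    # placeholder "analog" signal: a pure tone

# Sample the continuous-time signal X_a(t) at t = nT, i.e. X(n) = X_a(nT)
t = np.arange(int(FS * DURATION)) / FS
x_analog = np.sin(2 * np.pi * TONE_HZ * t)

# 8-bit unsigned PCM quantization: map [-1, 1] onto the integer codes 0..255
pcm8 = np.clip(np.round((x_analog + 1.0) * 127.5), 0, 255).astype(np.uint8)

print(pcm8[:10])   # first few quantized sample values
```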
The sampled data are then processed to obtain the different parameters. The Discrete Fourier Transform (DFT) computes the frequency information of the equivalent time-domain signal [5]. Since a speech signal contains only real-valued samples, we can exploit this fact and use a real-input Fast Fourier Transform (FFT) for increased efficiency. The resulting output contains both the magnitude and phase information of the original time-domain signal. Short-time Fourier analysis of the windowed speech signal can produce a reasonable feature space for recognition [6]. The Fourier Transform of a discrete-time signal $f(kT)$ is given by
$$F(n) = \sum_{k=0}^{N-1} f(kT)\, e^{-j 2\pi n k / N} \qquad (2)$$

Equation (2) can be written as

$$F(n) = \sum_{k=0}^{N-1} f(k)\, W_N^{-nk} \qquad (3)$$

where $f(k) = f(kT)$ and $W_N = e^{j 2\pi / N}$; $W_N$ is usually referred to as the kernel of the transform.
There are several algorithms that can considerably reduce the number of computations in a DFT. A DFT implemented using such a scheme is referred to as a Fast Fourier Transform (FFT). Among the FFT algorithms, two are particularly popular: decimation-in-time and decimation-in-frequency. Throughout this paper, the DFT is computed using the decimation-in-time algorithm.
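The paper does not list its FFT implementation; purely as a sketch of the decimation-in-time idea, a compact recursive radix-2 FFT in Python could look like the following (in practice a library routine such as numpy.fft.fft would normally be used).

```python
import cmath

def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])   # transform of the even-indexed samples
    odd = fft_dit(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        # twiddle factor exp(-j*2*pi*k/N), matching the sign convention of (2)
        w = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + w
        out[k + N // 2] = even[k] - w
    return out

# Example: 8-point FFT of a short test sequence
print(fft_dit([1, 2, 3, 4, 0, 0, 0, 0]))
```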
4. SPEECH RECORDING AND FEATURE EXTRACTION
A human speech signal contains a significant amount of energy within 2.5 kHz. We therefore chose a sampling rate of 8 kHz, 8-bit mono, which is sufficient for representing signals up to 4 kHz without aliasing. To record the speech signal we used an Intel(R) integrated sound card, an ordinary microphone and the default Windows sound recorder software. Speech was recorded in a room environment. The recorded sound was stored in PCM (.wav) format. The file header is stored at the beginning of the PCM file and occupies 44 bytes [7]. The actual wave data begin 58 bytes from the start of the file, so to extract the wave data we first discard the initial 58 bytes of the wave file and then read the wave data byte by byte. These data are stored in a text file (.txt) as integer values.
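A rough Python equivalent of this extraction step is sketched below. The 58-byte offset and the text-file dump follow the description above, the file names are placeholders, and in practice the standard wave or scipy.io.wavfile readers, which parse the RIFF header properly, would be the safer choice.

```python
import numpy as np

WAV_PATH = "phoneme_A_speaker1.wav"   # placeholder file name

# Read the raw bytes of the 8-bit mono PCM file and discard the first
# 58 bytes, keeping only the wave data, as described above.
with open(WAV_PATH, "rb") as f:
    raw = f.read()

# Interpret each remaining byte as an unsigned 8-bit sample, centred around 0
samples = np.frombuffer(raw[58:], dtype=np.uint8).astype(np.int16) - 128

# Store the integer sample values in a text file, one value per line
np.savetxt("phoneme_A_speaker1.txt", samples, fmt="%d")
```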
Feature extraction is the process of converting the original speech signal into a parametric representation that gives a set of meaningful features useful for recognition. Feature extraction combines several signal processing steps: deriving the sample data from the recorded wave sound, computing the Fast Fourier Transform (FFT) and the power spectrum, finding the sample point at maximum power and, finally, computing the corresponding frequency.
5. COMPUTATION OF POWER AND FREQUENCY
Power spectrum estimation uses an estimator called the periodogram [5][6]. The power spectrum is defined at $N/2 + 1$ frequencies as

$$P(f_k) = \frac{1}{N^2}\left[\,|F_k|^2 + |F_{N-k}|^2\,\right], \qquad k = 1, 2, \ldots, \tfrac{N}{2} - 1$$

where $f_k$ is defined only for the zero and positive frequencies,

$$f_k \equiv \frac{k}{N\Delta} = 2 f_c \frac{k}{N}, \qquad k = 0, 1, \ldots, \tfrac{N}{2},$$

with $\Delta$ the sampling interval and $f_c = 1/(2\Delta)$ the Nyquist frequency.
To compute the power spectrum of speech, the speech data are segmented into K segments of N = 2M points each, where N, the length of a window, is taken as a power of 2 for convenient computation of the FFT. Each segment is transformed separately with the FFT, and the resulting K periodograms are averaged to obtain a power spectrum estimate at M frequency values between 0 and $f_c$.
Figures 1 and 2 show the signal waveform (Fig. 1a and Fig. 2a) and the power spectrum (Fig. 1b and Fig. 2b) of a male and a female speaker for the phoneme “A”.
Figure 1. The signal waveform (a) and power spectrum (b) of a male speaker for phoneme “A”.

Figure 2. The signal waveform (a) and power spectrum (b) of a female speaker for phoneme “A”.

[Power spectrum axes in both figures: frequency in Hz (0-800) on the horizontal axis, power magnitude on the vertical axis.]
The computation steps for frequency can be summarized as shown in figure 3.
Figure 3. The sequence of operations in converting a speech signal into Frequency features
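A minimal NumPy sketch of this sequence of operations is given below, assuming the integer samples have already been extracted as in Section IV; the window length of 1024 points is an assumed value, since the paper does not state the one it used.

```python
import numpy as np

FS = 8000  # sampling rate in Hz

def frequency_at_max_power(samples, n_fft=1024):
    """Average periodograms of consecutive length-n_fft segments and
    return the frequency (Hz) at which the estimated power is largest."""
    x = np.asarray(samples, dtype=float)
    n_seg = len(x) // n_fft
    if n_seg == 0:
        raise ValueError("signal shorter than one analysis window")
    psd = np.zeros(n_fft // 2 + 1)
    for i in range(n_seg):
        seg = x[i * n_fft:(i + 1) * n_fft]
        spec = np.fft.rfft(seg)                  # real-input FFT of the segment
        psd += np.abs(spec) ** 2 / n_fft ** 2    # periodogram of this segment
    psd /= n_seg                                 # average of the K periodograms
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)   # frequency bins from 0 to fc = FS/2
    return freqs[1 + np.argmax(psd[1:])]         # skip the DC bin

# Example (samples extracted as in Section IV):
# print(frequency_at_max_power(samples))
```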
6. SYSTEM IMPLEMENTATION
Figure 4 shows the structure of a gender recognition system. Regardless of the type of task (classification or verification), a gender recognition system operates in two modes: a training mode and a recognition mode. In the training mode, a new person’s voice is recorded and analyzed. In the recognition mode, a person of unknown gender provides a speech input and the system makes a decision about the speaker’s identity. Both the training and the recognition modes include feature extraction, sometimes called the front-end of the system. The feature extractor converts the digital speech signal into a sequence of numerical descriptors, called feature vectors. The features provide a more stable, robust and compact representation than the raw input signal. Feature extraction can be considered a data reduction process that attempts to capture the essential characteristics of the speaker at a small data rate.
[Figure 3 blocks: speech signal → derive data from the recorded wave sound → FFT → power spectrum → sample point at maximum power → frequency determination at maximum power.]
Figure 4. Block diagram of the Gender Recognition System
The software system performs the following steps:
1. Code and record the speech data as a disk file.
2. Compute the Fast Fourier Transform of the data from the recorded file.
3. Compute the power spectrum from the transformed data of step 2.
4. Compute the frequency from the power spectrum of step 3.
5. Identify the speaker as male or female, or report that the speaker could be recognized as neither male nor female.
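The paper does not publish its threshold values, so the decision of step 5 is sketched below with hypothetical frequency bands chosen only to be roughly consistent with the phoneme “A” values in Table 1; they are illustrative placeholders, not the authors’ actual thresholds.

```python
# Hypothetical decision bands in Hz; illustrative only, not the paper's values.
MALE_BAND = (400.0, 600.0)
FEMALE_BAND = (650.0, 750.0)

def classify_gender(peak_freq_hz):
    """Threshold decision: 'male', 'female', or neither."""
    if MALE_BAND[0] <= peak_freq_hz <= MALE_BAND[1]:
        return "male"
    if FEMALE_BAND[0] <= peak_freq_hz <= FEMALE_BAND[1]:
        return "female"
    return "neither male nor female"

# Example: classify the peak frequency computed by frequency_at_max_power()
print(classify_gender(570.0))   # -> "male" with these placeholder bands
```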
The developed system was applied to store and analyze the English phonemes “A” and “B”. The speech data were recorded for the male and female speakers, and the FFT and power spectrum were calculated from these data files by the developed system. The results are shown in Table 1.
[Figure 4 blocks. Training phase: input speech → feature extraction → store extracted features (data) in the database. Recognition phase: input speech → feature extraction → pattern matching → decision logic → decision.]
Table 1. Frequency (Hz) at maximum power for male and female speakers, English phonemes “A” and “B”.

Speaker ID | Phoneme | No. of sample | Frequency at maximum power (Male voice) | Frequency at maximum power (Female voice)
1 | A | 1 | 570 | 672
2 | A | 1 | 588 | 720
3 | A | 1 | 498 | 672
4 | A | 1 | 462 | 729
5 | A | 1 | 432 | 687
7. SYSTEM RESULTS
7.1. RECOGNITION RESULTS
The detailed recognition results are shown in Table 2.
Table 2. Recognition results for sample #1

Speaker No. | Gender recognized | Accuracy (%)
1 | Male | 100
2 | Male | 100
3 | Male | 100
4 | Male | 100
5 | Male | 100
6 | Female | 100
7 | Female | 100
8 | Female | 100
9 | Female | 100
10 | Female | 100
Table 1 (continued). Phoneme “B”:

Speaker ID | Phoneme | No. of sample | Frequency at maximum power (Male voice) | Frequency at maximum power (Female voice)
1 | B | 2 | 432 | 618
2 | B | 2 | 342 | 402
3 | B | 2 | 543 | 666
4 | B | 2 | 366 | 612
5 | B | 2 | 510 | 414
Table 2. Recognition results for sample #2

Speaker No. | Gender recognized | Accuracy (%)
1 | Male | 100
2 | Male | 0
3 | Male | 100
4 | Male | 0
5 | Male | 100
6 | Female | 100
7 | Female | 0
8 | Female | 100
9 | Female | 100
10 | Female | 0
7.2. EXPERIMENTAL RESULTS
The system was tested with 10 speakers (5 male and 5 female). The utterances of the phonemes “A” and “B” were recorded separately for each of the 10 speakers, and the threshold technique was used for recognition. The objective of the experiment was to study the recognition performance of the system across different speakers. The recognition accuracy was calculated using the following equation:

$$\text{Recognition Accuracy (\%)} = \frac{\text{No. of accurately recognized genders}}{\text{No. of correct genders expected}} \times 100 \qquad (4)$$

From the recognition results, the number of accurately recognized genders is 16 and the number of correct genders expected is 20, so the recognition accuracy is 16 × 100 / 20 = 80%.
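As a simple check of equation (4) using the per-speaker outcomes in Table 2, the short tally below reproduces the reported figure; the two lists simply mirror the 100/0 entries of the two parts of Table 2.

```python
# 1 = correctly recognized, 0 = not, copied from the Accuracy (%) columns of Table 2
sample_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]   # Table 2, sample #1: all 10 speakers correct
sample_2 = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]   # Table 2, sample #2: 6 of 10 speakers correct

correct = sum(sample_1) + sum(sample_2)      # 16 accurately recognized genders
expected = len(sample_1) + len(sample_2)     # 20 decisions expected
print(100.0 * correct / expected)            # 80.0
```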
8. CONCLUSION
The main goal of this paper was to develop a gender recognition system using the speech signal. Feature selection is one of the most important factors in designing a gender recognition system. From the study of previous research it was observed that, among the different features, the power spectrum gives the best classification rate; the power spectrum has therefore been selected as the feature for classification. Among the different techniques, statistical analysis combined with a threshold is simple to compute and produces very good results, which is why this method was selected for pattern comparison in the recognition process. The average recognition accuracy is 80%. The experimental results show that the recognition rate decreases as the number of speakers increases; the system efficiency therefore drops as the number of reference speakers in the speaker database grows. Constant thresholds were used for recognition; using a dynamic threshold might produce more accurate recognition results.
REFERENCES
[1] Prabhakar, S., Pankanti, S., and Jain, A., “Biometric recognition: security and privacy concerns”, IEEE Security and Privacy Magazine 1 (2003), pp. 33-42.
[2] Huang, X., Acero, A., and Hon, H.-W., “Spoken Language Processing: a Guide to Theory, Algorithm and System Development”, Prentice-Hall, New Jersey, 2001.
[3] Martin, A., and Przybocki, M., “Speaker recognition in a multi-speaker environment”, in Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), Aalborg, Denmark, 2001, pp. 787-790.
[4] Kinnunen, T., “Spectral Features for Automatic Text-Independent Speaker Recognition”, Department of Computer Science, Joensuu University, Finland, December 21, 2003.
[5] Deller, J. R., Proakis, J. G., and Hansen, J. H. L., “Discrete-Time Processing of Speech Signals”, Macmillan Publishing Company, 866 Third Avenue, New York 10022.
[6] Rabiner, L., and Juang, B.-H., “Fundamentals of Speech Recognition”, Prentice Hall, New Jersey, 1993, ISBN 0-13-015157-2.
[7] Rahman, Md. S., “Small Vocabulary Speech Recognition in Bangla Language”, M.Sc. Thesis, Dept. of Computer Science & Engineering, Islamic University, Kushtia-7003, July 2004.
Biography
Md. Sadek Ali received the Bachelor’s and Master’s degrees from the Department of Information and Communication Engineering (ICE), Islamic University, Kushtia, in 2004 and 2005, respectively. He is currently a Lecturer in the Department of ICE, Islamic University, Kushtia, Bangladesh. Since 2003 he has been working as a research scientist at the Signal and Communication Research Laboratory, Department of ICE, Islamic University, Kushtia, where he belongs to the spread-spectrum research group. He has three published papers in international journals and one in a national journal in the same areas, as well as two papers in international conferences and one in a national conference. His areas of interest include signal processing, wireless communications, optical fiber communication, spread spectrum and mobile communication.
Md. Shariful Islam received the Bachelor’s and Master’s degrees in Applied Physics, Electronics & Communication Engineering from Islamic University, Kushtia, Bangladesh, in 1999 and 2000, respectively. He is currently an Assistant Professor in the Department of ICE, Islamic University, Kushtia-7003, Bangladesh. He has five published papers in international and national journals. His areas of interest include signal processing and mobile communication.
Md. Alamgir Hossain received the Bachelor’s and Master’s degrees from the Department of Information and Communication Engineering (ICE), Islamic University, Kushtia, in 2003 and 2004, respectively. He is currently a Lecturer in the Department of ICE, Islamic University, Kushtia-7003, Bangladesh. He was a lecturer in the Dept. of Computer Science & Engineering, Institute of Science Trade & Technology (under National University), Dhaka, Bangladesh, from 23 October 2007 to 18 April 2010. Since 2003 he has been working as a research scientist at the Communication Research Laboratory, Department of ICE, Islamic University, Kushtia, where he belongs to the spread-spectrum research group. He has two published papers in international journals and one in a national journal in the same areas. His current research interests include wireless communications, spread spectrum and mobile communication, signal processing, and data mining.
  • 9. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.1, February 2012 9 REFERENCES [1]. Prabhakar, S., Pankanti, S., and Jain, A. “Biometric recognition: security and privacy concerns” IEEE Security and Privacy Magazine 1(2003), 33-42. [2]. Huang X., Acero, A., and Hon, H.-W. “Spoken Language Processing: a Guideto Theory, Algorithm and System Development” prentice-Hall, New Jersey, 2001. [3]. Martin, A., and Przybocki, M. Speaker recognition in a multi-speaker environment.In Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001) (Aalborg, Denmark, 2001), pp. 787–90. [4]. Tomi Kinnunen “ Spectral Feature for Automatic Voice-independent Speaker Recognition” Depertment of Computer Science, Joensuu University,Finland. December 21, 2003. [5]. John R. Deller, John G Proakis and John H. L. Hansen, “Discrete- Time Processing of Speech Signals” Macmillan Publishing company, 866 Third avenue, New York 10022. [6]. Rabiner Lawrence, Juang Bing-Hwang, “Fundamentals of Speech Recognitions”, Prentice Hall New Jersey, 1993, ISBN 0-13-015157-2. [7]. Md. Saidur Rahman, “Small Vocabulary Speech Recognition in Bangla Language”, M.Sc. Thesis, Dept. of Computer Science & Engineering, Islamic University, Kushtia-7003, July-2004. Biography Md. Sadek Ali received the Bachelor’s and Master‘s Degree in the Department of Information and Communication Engineering (ICE) from Islamic University, Kushtia, in 2004 and 2005 respectively. He is currently a Lecturer in the department of ICE, Islamic University, Kushtia-Bangladesh. Since 2003, he has been working as a Research Scientist at the Signal and Communication Research Laboratory, Department of ICE, Islamic University, Kushtia, where he belongs to the spread- spectrum research group. He has three published paper in international and one national journal in the same areas. He has also two published paper in international and one national conference in the same areas. His areas of interest include Signal processing, Wireless Communications, optical fiber communication, Spread Spectrum and mobile communication. Md. Shariful Islam received the Bachelor’s and Master‘s Degree in Applied Physics, Electronics & Communication Engineering from Islamic University, Kushtia, Bangladesh in 1999, and 2000 respectively. He is currently Assistant Professor in the department of ICE, Islamic University, Kushtia-7003 Bangladesh. He has five published papers in international and national journals. His areas of interest include signal processing & mobile communication. Md. Alamgir Hossain received the Bachelor’s and Master‘s Degree in the Dept. of Information and Communication Engineering (ICE) from Islamic University, Kushtia, in 2003 and 2004, respectively. He is currently Lecturer in the department of ICE, Islamic University, Kushtia-7003, and Bangladesh. He was a lecturer in the Dept. of Computer Science & Engineering from Institute of Science Trade & Technology (Under National University), Dhaka, Bangladesh. from 23th October, 2007 to 18th April 2010. Since 2003, he has been working as a Research Scientist at the Communication Reasearch Laboratory, Department of ICE, Islamic University, Kushtia, where he belongs to the spread-spectrum research group. He has two published paper in international and one national journal in the same areas. His current research interests include Wireless Communications, Spread Spectrum and mobile communication, Signal processing, Data Ming.