© May 2019 | IJIRT | Volume 5 Issue 12 | ISSN: 2349-6002
INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY | IJIRT 148155 | 190
Novel Methodologies for Classifying Gender and Emotions
Using Machine Learning Algorithms
G. Neema¹, Mr. C. J. Profun, M.E. (Ph.D.)²
¹PG Student, Department of Electronics and Communication Engineering,
²Assistant Professor, Department of Electronics and Communication Engineering,
DMI College of Engineering, Chennai, Tamil Nadu, India
Abstract- In this paper, we propose a system that detects and classifies both emotion and gender from the human voice. In the real world, security plays a very important role. To improve security, we combine emotion and gender detection in a single system, which is better than existing systems, in which gender is classified using only images or video and emotion detection is done separately. The proposed system performs voice-based emotion and gender detection through noise classification, noise removal based on a Hidden Markov Model (HMM), feature extraction using the Discrete Wavelet Transform (DWT), and finally classification using K-nearest neighbours (KNN). The system achieves an accuracy of 97%, compared with 75% to 86% for existing systems. The final output is audio corresponding to the detected emotion.
Index Terms- emotion and gender detection, Discrete Wavelet Transform, Hidden Markov Model (HMM), K-nearest neighbours (KNN).
I. INTRODUCTION
… human life. From a human perspective, emotions are an essential medium for expressing one's psychological state to others. Humans have the natural ability to recognize the emotions of their communication partner by using all their available senses: they hear the sound, read lips, and interpret gestures and facial expressions. Humans can readily recognize an emotion from spoken words, but machines lack this capability, so emotion recognition from the speech signal is a very difficult task for a machine. Automatic emotion recognition, which identifies the emotional state of a speaker from the voice signal, has received close attention because emotions play a key role in decision making and there is a strong need for intelligent human-machine interfaces.
Speech emotion recognition is a complex task because, for a given speech sample, several tentative answers may be found for the recognized emotion. Vocal emotions may be acted or elicited from "real" life situations. Emotion recognition through speech means identifying and detecting a person's emotional state from his or her voice signal, or from features extracted from the speech signal. It is principally useful for applications that require natural human-machine interaction, such as e-tutoring, electronic pets, storytelling, intelligent sensing toys, and in-car board systems, where knowing the user's emotion makes the system more practical. Emotion recognition from speech is useful for enhancing the naturalness of speech-based human-machine interaction. Beyond improving the human-machine interface, automatic emotion recognition through speech has other applications: in aircraft cockpits it can analyse the psychological state of the pilot to help avoid accidents; it can recognize stress in speech for lie detection; in call-centre conversations it supports behavioural studies of customers, which helps improve the quality of service of call attendants; in the medical field it can aid psychiatric diagnosis; and emotion analysis of conversations between criminals could help crime investigation departments. If machines are able to understand human emotions, conversations with robotic toys would be more realistic and enjoyable, and interactive movies and remote teaching would be more practical.
Emotion recognition from a speaker's voice is difficult for several reasons. Differences in speaking styles, speakers, sentences, languages, and speaking rates introduce acoustic variability that affects different voice features, so particular speech features may not be able to distinguish between the various emotions. Moreover, each emotion may correspond to different portions of the spoken utterance. In this project, a K-nearest-neighbour classifier is used to classify six basic emotional states: anger, happiness, sadness, fear, disgust, and the neutral state, in which no distinct emotion is observed.
At present, the use of emotion in computers is becoming an increasingly important field for human-computer interaction. Indeed, affective computing is becoming a focus in interactive technological systems and is increasingly essential for communication, decision-making, and behaviour. There is a rising need for emotional state recognition in several domains, such as health monitoring, video games, and human-computer interaction.
Fig 1: Three Layer Model
II. LITERATURE REVIEW
These systems also differ in the features extracted and in the classifiers used for classification. Both spectral and prosodic features can be used to recognize emotion from the speech signal, because both contain a large amount of emotional information. Spectral features include Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC). Prosodic features include formants, fundamental frequency, loudness, pitch, energy, speech intensity, and glottal parameters. Semantic labels and linguistic and phonetic features are also used for detecting emotions through speech. To make human-machine interaction more powerful, various classifiers are used for emotion recognition, such as the Gaussian Mixture Model (GMM), k-nearest neighbours (KNN), Hidden Markov Model (HMM), Artificial Neural Network (ANN), GMM-supervector-based SVM, and Support Vector Machine (SVM). A. Bombatkar et al. studied a K-nearest-neighbour classifier that gave an emotion classification accuracy of up to 86.02% using energy, entropy, MFCC, ZCC, and pitch features. Xianglin et al. performed emotion classification using GMM and obtained a recognition rate of 79% for the best features. When the study was limited to pitch and MFCC features, typical GMM performance was 75% for speaker-independent recognition and 89.12% for speaker-dependent recognition. M. Khan et al. performed emotion classification using a K-NN classifier and obtained an average accuracy of 91.71% with forward feature selection, while an SVM classifier reached an accuracy of 76.57%, with the SVM classification shown for the neutral and fear emotions.
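The frame-level features listed for the KNN study above (energy, entropy, and zero crossings) are simple to compute. The sketch below is a minimal Python illustration assuming common textbook definitions (mean-square energy, zero-crossing rate, and the entropy of sub-block energies); the exact definitions used in the cited work are not given here, and the function names and the sub-block count `n_sub` are hypothetical.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign changes (0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def energy_entropy(frame, n_sub=4):
    """Shannon entropy (bits) of the energy distribution over n_sub equal sub-blocks.

    Low values indicate abrupt energy changes inside the frame.
    """
    size = len(frame) // n_sub
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size]) for i in range(n_sub)]
    total = sum(energies)
    if total == 0:
        return 0.0  # silent frame: no energy to distribute
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0)

frame = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, -0.5, -0.5]
print(short_time_energy(frame))  # 0.625
print(zero_crossing_rate(frame))
print(energy_entropy(frame))
```

Pitch and MFCC features need more machinery (autocorrelation or a mel filter bank) and are not shown.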
III. PROPOSED ARCHITECTURE
The proposed system combines emotion and gender detection and classification using the human voice signal. The human voice signal is given as the input signal, and a pre-processing step is carried out. Noise is then classified depending on its frequency range. Next, the Fast Fourier Transform (FFT) is used to convert the signal from the time domain to the frequency domain, and the noise is removed from the original signal using a Hidden Markov Model. A Hidden Markov Model (HMM) is a powerful statistical tool with many practical applications in temporal pattern recognition, including speech enhancement, speech de-noising, speech recognition, and related tasks. At present there are only a limited number of efficient approaches to speech de-noising based on single-channel operation (i.e., where only one sensor or microphone is available in the system under consideration). The HMM-based approach provides a viable alternative to other methods such as spectral subtraction and, generally speaking, is considered more powerful in many ways. The main reason is that spectral subtraction assumes that the distractor (i.e., the undesired signal, such as noise) is stationary, whereas the HMM is not bound by this limiting assumption: it is intended to work with non-stationary distractors as well.
Feature extraction is then done using the Discrete Wavelet Transform (DWT). Feature selection prior to classification plays a vital role, and the feature selection technique used combines the DWT with a moving-window technique. The approximation coefficients of the DWT, together with some useful features from the high-frequency coefficients selected by the maximum modulus method, are used as features. A novel way to think of microarray data is as a set of signals: the number of genes is the length of the signals, and hence signal processing techniques such as the wavelet transform can be used to perform microarray data analysis.
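The selection step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: a single-level Haar wavelet stands in for the unspecified mother wavelet, and the "moving window" is taken to be a non-overlapping window over the detail (high-frequency) coefficients, from which the largest-magnitude (maximum modulus) coefficient is kept. The helper names and the `window` parameter are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT, returning (approximation, detail).

    A trailing odd sample, if any, is dropped.
    """
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def max_modulus_per_window(detail, window):
    """Keep the largest-|value| detail coefficient from each non-overlapping window."""
    return [max(detail[start:start + window], key=abs)
            for start in range(0, len(detail), window)]

def dwt_feature_vector(signal, window=2):
    """Approximation coefficients followed by the selected detail coefficients."""
    approx, detail = haar_step(signal)
    return approx + max_modulus_per_window(detail, window)

print(len(dwt_feature_vector([4, 2, 5, 5, 1, 3, 0, 0])))  # 6
```

Keeping the sign of the selected coefficient (rather than its magnitude) is a design choice; either variant reduces the detail band by a factor of `window`.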
Finally, classification is done with a KNN classifier. KNN is a non-parametric, lazy learning algorithm. Non-parametric means that there is no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, where most real-world datasets do not follow theoretical mathematical assumptions. Lazy means that the algorithm does not use the training data points to build an explicit model; all the training data are used in the testing phase. This makes training faster and the testing phase slower and costlier in both time and memory: in the worst case, KNN must scan all data points for each query, and all training data must be kept in memory.
BLOCK DIAGRAM:
Fig 2: Overall Block Diagram
IV. IMPLEMENTATION
A. PRE-PROCESSING
In speech processing it is often advantageous to divide the signal into frames to achieve stationarity. This section describes how to split speech into frames and how to combine the frames back into a speech signal. A speech signal is normally not stationary, but seen over a short time span it is, because the glottal system cannot change instantaneously. XXX states that a speech signal is typically stationary in windows of 20 ms.
When the signal is framed, it is necessary to consider how to treat the edges of each frame, because the abrupt edges add harmonics. It is therefore expedient to apply a window that tones down the edges. As a consequence, the samples are not assigned the same weight in the subsequent computations, and for this reason it is prudent to use an overlap between frames.
Fig 3: Illustration of Framing
Figure 3 shows how a speech signal is divided into frames. Each frame shares its first part with the previous frame and its last part with the next frame. The frame time step t_fs is the time between the start times of consecutive frames. The overlap is defined as the time from when a new frame starts until the current frame stops.
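Framing, windowing, and recombination can be sketched in a few lines of Python. The 8 kHz sampling rate, the 50% overlap, and the Hamming window are assumptions chosen for illustration; only the 20 ms frame length comes from the text above. Here `step` plays the role of the frame time step t_fs, and the overlap is `frame_len - step` samples.

```python
import math

def frame_signal(x, frame_len, step):
    """Split x into frames of frame_len samples, one starting every step samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [x[start:start + frame_len] for start in range(0, len(x) - frame_len + 1, step)]

def hamming(n):
    """Symmetric Hamming window of length n; it tapers the frame edges toward 0.08."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def overlap_add(frames, step):
    """Recombine unwindowed frames into one signal, averaging where frames overlap."""
    frame_len = len(frames[0])
    length = step * (len(frames) - 1) + frame_len
    total = [0.0] * length
    count = [0] * length
    for k, frame in enumerate(frames):
        for i, v in enumerate(frame):
            total[k * step + i] += v
            count[k * step + i] += 1
    return [t / c for t, c in zip(total, count)]

fs = 8000                        # assumed sampling rate (Hz)
frame_len = round(0.020 * fs)    # 20 ms frames -> 160 samples
step = frame_len // 2            # assumed 50% overlap -> 10 ms frame step
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]  # 100 ms test tone
frames = frame_signal(x, frame_len, step)
window = hamming(frame_len)
windowed = [[s * w for s, w in zip(f, window)] for f in frames]
print(len(frames), len(windowed[0]))  # 9 160
```

Recombining windowed frames would additionally require dividing by the summed window at each sample; the unwindowed version keeps the example short.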
B. FAST FOURIER TRANSFORM:
The FFT is a fast algorithm for computing the DFT. If we take the 2-point and 4-point DFTs and generalize them to 8-point, 16-point, ..., $2^r$-point DFTs, we get the FFT algorithm. Computing the DFT of an N-point sequence directly from its definition,

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1, \qquad (1)$$

takes $O(N^2)$ multiplies and adds. The FFT algorithm computes the DFT using $O(N \log N)$ multiplies and adds. There are many variants of the FFT algorithm; we discuss one of them, the decimation-in-time FFT algorithm for sequences whose length is a power of two ($N = 2^r$ for some integer $r$). The FFT algorithm decomposes the DFT into $\log_2 N$ stages, each of which consists of $N/2$ butterfly computations. Each butterfly takes two complex numbers $p$ and $q$ and computes from them two other numbers, $p + \alpha q$ and $p - \alpha q$, where $\alpha$ is a complex twiddle factor, a power of $W_N = e^{-j 2\pi / N}$.
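The recursive form of the radix-2 decimation-in-time FFT makes the butterfly structure explicit. The following Python sketch splits the input into even- and odd-indexed samples, transforms each half, and combines them with the butterflies $p + \alpha q$ and $p - \alpha q$, where $\alpha = W_N^k$. The direct DFT of equation (1) is included only as a reference for checking the result; this is an illustrative implementation, not an optimized one.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        alpha = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor W_N^k
        p, q = even[k], odd[k]
        out[k] = p + alpha * q            # butterfly output p + alpha*q
        out[k + n // 2] = p - alpha * q   # butterfly output p - alpha*q
    return out

def dft(x):
    """Direct O(N^2) evaluation of equation (1), used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

print([round(abs(v), 6) for v in fft([1, 1, 1, 1])])  # [4.0, 0.0, 0.0, 0.0]
```

Each recursion level corresponds to one of the $\log_2 N$ stages, and the loop over `k` performs that stage's $N/2$ butterflies.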
C. HIDDEN MARKOV MODEL:
As mentioned earlier, the Hidden Markov Model (HMM) is the method used for recognition. A model is trained with this method to represent an utterance of a spoken word; the trained model is later used to test an utterance by computing the probability that the model generated the observed sequence of feature vectors.
Once the MFCCs have been computed, all the given training utterances must be normalized. The feature matrix is divided into several coefficients per state, and these are used to calculate the mean and variance. During experiments on the re-estimation of the transition matrix A, the final estimated values of A were seen to deviate considerably from the initial estimate. The initial estimates of A were therefore set to values closer to the likely re-estimated values (the re-estimation issue is dealt with later in this section). Changes in the initial values are not critical, because the re-estimation procedure adjusts the values towards the correct ones.
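The testing step, computing the probability that a trained model generated an observation sequence, is commonly done with the forward algorithm. The sketch below assumes a discrete-observation HMM (symbol indices instead of continuous MFCC vectors), uses per-step scaling to avoid numerical underflow, and picks the label whose model gives the highest log-likelihood. Training (Baum-Welch re-estimation of A and B) is not shown, and the toy model parameters and labels are invented for illustration.

```python
import math

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction step: sum over predecessor states, then emit obs[t].
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)             # P(obs[t] | obs[:t]); assumed non-zero here
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(obs, models):
    """Return the label whose HMM most probably generated obs."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))

# Toy 2-state, 2-symbol models (parameters invented purely for illustration).
models = {
    "angry":   ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "neutral": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
print(classify([0, 0, 0, 1, 0], models))  # angry
```

Summing the logs of the per-step scale factors recovers $\log P(O \mid \lambda)$ exactly, which is why long utterances do not underflow.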
D. FEATURE EXTRACTION: DISCRETE WAVELET TRANSFORM:
The Wavelet Transform (WT) is a technique for analyzing signals. It was developed as an alternative to the Short-Time Fourier Transform (STFT) to overcome problems related to its frequency and time resolution properties. More specifically, unlike the STFT, which provides uniform time resolution for all frequencies, the DWT provides high time resolution and low frequency resolution for high frequencies, and high frequency resolution and low time resolution for low frequencies. In that respect it is similar to the human ear, which exhibits similar time-frequency resolution characteristics. The Discrete Wavelet Transform (DWT) is a special case of the WT that provides a compact representation of a signal in time and frequency and can be computed efficiently.
As a multirate filter bank, the DWT can be viewed as a constant-Q filter bank with octave spacing between the centres of the filters. Each subband contains half the samples of the neighbouring higher-frequency subband. In the pyramidal algorithm, the signal is analyzed at different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The coarse approximation is then further decomposed using the same wavelet decomposition step. This is achieved by successive high-pass and low-pass filtering of the time-domain signal.
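The pyramidal algorithm can be illustrated with the Haar wavelet, whose low-pass and high-pass filters reduce to scaled pairwise sums and differences followed by downsampling by two. This Python sketch assumes the Haar wavelet (the text does not name one) and returns the coarsest approximation followed by the detail subbands from coarsest to finest, so each subband has half as many samples as the next finer one.

```python
import math

def haar_dwt(x, levels):
    """Pyramidal (Mallat) Haar DWT.

    Returns [cA_L, cD_L, ..., cD_1]: the coarsest approximation followed by the
    detail subbands from coarsest to finest. len(x) must be divisible by 2**levels.
    """
    s = 1 / math.sqrt(2)
    approx = [float(v) for v in x]
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        # Low-pass (scaled pairwise sum) and high-pass (scaled pairwise difference),
        # each already downsampled by 2.
        low = [(approx[i] + approx[i + 1]) * s for i in range(0, len(approx), 2)]
        high = [(approx[i] - approx[i + 1]) * s for i in range(0, len(approx), 2)]
        details.append(high)
        approx = low   # only the coarse approximation is decomposed further
    return [approx] + details[::-1]

coeffs = haar_dwt([4, 2, 5, 5, 1, 3, 0, 0], levels=3)
print([len(band) for band in coeffs])  # [1, 1, 2, 4]
```

Because this Haar transform is orthonormal, the total energy of the coefficients equals the energy of the input signal.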
The extracted wavelet coefficients provide a compact representation that shows the energy distribution of the signal in time and frequency. To further reduce the dimensionality of the extracted feature vectors, statistics over the set of wavelet coefficients are used. In this way the statistical characteristics of the "texture" or "music surface" of the piece can be represented; for example, the distribution of energy in time and frequency for music is different from that of speech.
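Summarizing each subband with a few statistics gives a fixed-length feature vector regardless of signal length. The particular statistics below (mean absolute value, standard deviation, and relative energy per subband) are an assumption for illustration; the text does not specify which statistics are used. The input is any list of subbands, for example the output of a multilevel DWT ordered from coarsest to finest.

```python
import statistics

def subband_statistics(coeffs):
    """Summarise each wavelet subband with a few statistics.

    coeffs: list of coefficient lists, e.g. [cA_L, cD_L, ..., cD_1].
    Returns one flat feature vector: for every subband, the mean absolute value,
    the population standard deviation, and the subband's share of total energy.
    """
    energies = [sum(c * c for c in band) for band in coeffs]
    total = sum(energies) or 1.0   # avoid division by zero for an all-zero input
    features = []
    for band, energy in zip(coeffs, energies):
        features.append(sum(abs(c) for c in band) / len(band))
        features.append(statistics.pstdev(band))
        features.append(energy / total)
    return features

print(subband_statistics([[3.0], [1.0, -1.0], [0.0, 0.0, 0.0, 0.0]]))
```

The relative-energy entries sum to one, so they describe how the signal's energy is distributed across frequency bands independently of overall loudness.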
E. CLASSIFICATION: KNN CLASSIFIER:
In pattern recognition, the k-nearest neighbours algorithm (k-NN) is a non-parametric method used for classification and regression. The input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbours and is assigned to the class most common among its k nearest neighbours (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbour. In k-NN regression, the output is the property value for the object, which is the average of the values of its k nearest neighbours. k-NN is a type of instance-based, or lazy, learning, in which the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. For both classification and regression, it can be useful to weight the contributions of the neighbours so that nearer neighbours contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbour a weight of 1/d, where d is the distance to that neighbour.
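A minimal k-NN classifier with optional 1/d weighting can be written directly from this description. The Python sketch below uses Euclidean distance (an assumption; the text does not fix the metric) and shows the lazy-learning property: nothing is fitted, and every training point is scanned for each query. The gender labels and feature values in the usage are hypothetical.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, query, k=3, weighted=True):
    """Classify query by a (optionally 1/d-weighted) vote of its k nearest neighbours."""
    # Lazy learning: no model is built; all training points are scanned per query.
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = defaultdict(float)
    for d, label in dists[:k]:
        if weighted:
            if d == 0:
                return label           # exact match: its 1/d weight would be infinite
            votes[label] += 1.0 / d
        else:
            votes[label] += 1.0        # plain majority vote
    return max(votes, key=votes.get)

# Toy 2-D feature vectors (hypothetical values, e.g. two summary features per utterance).
X = [(0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
y = ["male", "female", "female"]
print(knn_predict(X, y, (0.0, 0.0), k=3, weighted=False))  # female
print(knn_predict(X, y, (0.0, 0.0), k=3, weighted=True))   # male
```

In the toy data the two votes disagree: the plain vote is 2 to 1 for "female", while with 1/d weighting the single close neighbour (weight 10) outweighs the two distant ones (about 1.91 combined).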
The k-nearest neighbours (kNN) method of classification is one of the simplest methods in machine learning and a good introduction to machine learning and classification in general. At its most basic level, it classifies a point by finding the most similar data points in the training data and making an educated guess based on their classifications.
V. RESULTS
Several experiments were performed to evaluate the accuracy of the classifiers in determining emotion and gender. In the DWT analysis, HMM gave the highest classification accuracy for the two emotions angry and excited. The KNN classifier is used for gender classification. In the analysis of the K-nearest-neighbour classifier, the average correct classification rate for emotions is 97%. The features extracted from the speech signal are HMM-based, and the KNN classifier is employed to obtain the emotion class label. The KNN experiments were conducted in MATLAB using the KNN classifier, and all results are based on cross-validation [13]. The acoustic features were extracted from the best feature combination among all features by this classifier. Experiments were also conducted using MATLAB and RapidMiner with a Naive Bayes classifier, again with all results based on cross-validation. Acoustic features such as shimmer, jitter, energy, and pitch were extracted from the best feature combination among all features by this classifier [14].
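The text states that results are based on cross-validation but does not give the number of folds. The sketch below assumes plain k-fold cross-validation with k = 5 and a seeded random shuffle; the 1-NN classifier is only a stand-in so the loop can be exercised, and all data in the usage are synthetic.

```python
import math
import random

def k_fold_accuracy(X, y, predict, k=5, seed=0):
    """Mean accuracy of `predict` over k folds (requires len(X) >= k).

    predict(train_X, train_y, test_X) -> list of predicted labels.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k disjoint, near-equal folds
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / k

def nearest_neighbour(train_X, train_y, test_X):
    """1-NN stand-in classifier used only to exercise the cross-validation loop."""
    return [min(zip(train_X, train_y), key=lambda p: math.dist(p[0], q))[1] for q in test_X]

# Two well-separated synthetic clusters.
X = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
y = [0] * 10 + [1] * 10
print(k_fold_accuracy(X, y, nearest_neighbour, k=5))  # 1.0
```

Every sample is tested exactly once, by a model that never saw it during training, which is what makes the averaged accuracy an estimate of performance on unseen speakers or utterances.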
1. INPUT SIGNAL
2. OUTPUT: MALE
3. OUTPUT: FEMALE
4. PERFORMANCE MATRIX
VI. CONCLUSION
Our proposed system combines voice-based emotion and gender detection using noise classification, noise removal based on a Hidden Markov Model (HMM), feature extraction using the Discrete Wavelet Transform (DWT), and classification using K-nearest neighbours (KNN). It produces more accurate output than the other classifiers it was compared with. In our project, the limitation of existing systems, which detect only two emotions, is overcome by the proposed system.
REFERENCES
[1]. Ayadi M. E., Kamel M. S. and Karray F., "Survey on Speech Emotion Recognition: Features, Classification Schemes, and Databases", Pattern Recognition, 44 (16), 572-587, 2011.
[2]. A. S. Utane, Dr. S. L. Nalbalwar, "Emotion Recognition through Speech Using Gaussian Mixture Model & Support Vector Machine", International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May 2013.
[3]. Chiriacescu I., "Automatic Emotion Analysis Based On Speech", M.Sc. Thesis, Department of Electrical Engineering, Delft University of Technology, 2009.
[4]. N. Thapliyal, G. Amoli, "Speech based Emotion Recognition with Gaussian Mixture Model", International Journal of Advanced Research in Computer Engineering & Technology, Volume 1, Issue 5, July 2012.
[5]. Zhou Y., Sun Y., Zhang J., Yan Y., "Speech Emotion Recognition using Both Spectral and Prosodic Features", IEEE, 23(5), 545-549, 2009.

Recognition of Facial Emotions Based on Sparse CodingRecognition of Facial Emotions Based on Sparse Coding
Recognition of Facial Emotions Based on Sparse Coding
 
Effect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal AlignmentsEffect of MFCC Based Features for Speech Signal Alignments
Effect of MFCC Based Features for Speech Signal Alignments
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Classification of Language Speech Recognition System
Classification of Language Speech Recognition SystemClassification of Language Speech Recognition System
Classification of Language Speech Recognition System
 
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficientMalayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
Malayalam Isolated Digit Recognition using HMM and PLP cepstral coefficient
 

More from BRIGHT WORLD INNOVATIONS (6)

2018 2019 embedded titles
2018 2019 embedded titles2018 2019 embedded titles
2018 2019 embedded titles
 
Electrical
ElectricalElectrical
Electrical
 
JAVA Real time
JAVA Real timeJAVA Real time
JAVA Real time
 
Robotics
RoboticsRobotics
Robotics
 
matlab titles
matlab titlesmatlab titles
matlab titles
 
Embedded
EmbeddedEmbedded
Embedded
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Classifying Gender and Emotions from Voice Using Machine Learning

© May 2019 | IJIRT | Volume 5 Issue 12 | ISSN: 2349-6002
INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN TECHNOLOGY | IJIRT 148155 | 190

Novel Methodologies for Classifying Gender and Emotions Using Machine Learning Algorithms

G. Neema(1), Mr. C. J. Profun, M.E. (Ph.D.)(2)
(1) PG Student, Department of Electronics and Communication Engineering
(2) Assistant Professor, Department of Electronics and Communication Engineering
DMI College of Engineering, Chennai, Tamil Nadu, India

Abstract- In this paper, we propose a system that detects and classifies both emotion and gender from the human voice. Security plays a very important role in the real world, and to improve it we combine emotion and gender detection in a single system, which performs better than existing systems: in existing systems, gender is classified from images or video only, and emotion detection is done separately. The proposed voice-based system uses noise classification, feature extraction with the Discrete Wavelet Transform (DWT), noise removal based on a Hidden Markov Model (HMM), and, finally, classification with K-nearest neighbours (KNN). It achieves an average emotion classification accuracy of 97%, compared with 75% to 86% for existing systems. The final output is audio that corresponds to the detected emotion.

Index Terms- emotion and gender detection, Discrete Wavelet Transform, Hidden Markov Model (HMM), K-nearest neighbours (KNN).

I. INTRODUCTION

… human life. From a human perspective, emotions are an essential medium for expressing one's psychological state to others. Humans have a natural ability to recognize the emotions of their communication partner using all of their available senses.
They hear sounds, read lips, and interpret gestures and facial expressions. Humans can readily recognize an emotion from spoken words, but machines lack this capability, so emotion recognition from the speech signal is a difficult task for a machine. Automatic emotion recognition, which identifies the emotional state of a speaker from the voice signal, has therefore received close attention. Emotion plays a key role in decision making, and there is a strong need for intelligent human-machine interfaces. Speech emotion recognition is a complex task because, for a given speech sample, several tentative answers may be found for the recognized emotion. Vocal emotions may be acted or elicited from "real" life situations. Emotion recognition through speech means identifying and detecting a person's emotional state from his or her voice signal, or from features extracted from that signal. It is particularly useful for applications that require natural human-machine interaction, such as e-tutoring, electronic pets, storytelling, and intelligent sensing toys, and in in-car on-board systems, where knowing the user's emotion makes the system more practical. Emotion recognition from speech enhances the naturalness of speech-based human-machine interaction. Beyond improving the human-machine interface, automatic speech emotion recognition has other applications; for example, in aircraft cockpits it can analyse the psychological state of the pilot to help avoid accidents.
Speech emotion recognition systems are also used to recognize stress in speech for better lie detection; in call-centre conversations, to analyse customer behaviour and help improve the quality of service of call attendants; in medicine, for psychiatric diagnosis; and in crime investigation, where emotion analysis of conversations between criminals can help investigators. If machines could understand human emotions, conversations with robotic toys would be more realistic and enjoyable, and interactive movies and remote teaching would be more practical. Emotion recognition from a speaker's voice is difficult for several reasons. Differences in speaking styles, speakers, sentences, languages, and speaking rates introduce acoustic variability that affects the voice features, so particular speech features may not be able to distinguish between emotions; in addition, each emotion may correspond to different portions of the spoken utterance. In this project, a K-nearest neighbour classifier is used to classify the six basic emotional states: anger, happiness, sadness, fear, disgust, and the neutral state, in which no distinct emotion is observed. At present, the use of emotion in computers is becoming an increasingly important field in human-computer interaction. Affective computing is becoming a focus of interactive technological systems and is increasingly essential for communication, decision making, and behaviour. There is a rising need for emotional state recognition in several domains, such as health monitoring, video games, and human-computer interaction.

Fig 1: Three Layer Model

II. LITERATURE REVIEW

Emotion recognition systems differ in the features they extract and the classifiers they use. Both spectral and prosodic features can be used to recognize emotion from the speech signal, because both contain a large amount of emotional information. Spectral features include Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC). Prosodic features include formants, fundamental frequency, loudness, pitch, energy, speech intensity, and glottal parameters. Semantic labels and linguistic and phonetic features are also used for detecting emotions in speech.
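Energy and pitch (fundamental frequency), two of the prosodic features listed above, can each be estimated per frame in a few lines. This sketch is illustrative and is not taken from the paper: it assumes NumPy, defines short-time energy as the mean squared amplitude, and estimates pitch from the highest autocorrelation value within an assumed 50 to 400 Hz search range. `short_time_energy` and `autocorr_pitch` are hypothetical helper names.

```python
import numpy as np

def short_time_energy(frame):
    """Mean squared amplitude of one frame (one common definition of frame energy)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))

def autocorr_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Picks the lag with the largest autocorrelation inside the lag range that
    corresponds to [fmin, fmax]; the default range is an assumption.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                      # remove any DC offset
    # Non-negative lags of the linear (non-circular) autocorrelation.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs // fmax)                         # shortest period considered
    lag_max = min(int(fs // fmin), len(frame) - 1)    # longest period considered
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag
```

Practical pitch trackers add voicing decisions and octave-error checks, which are omitted here to keep the idea visible.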
Various classifiers are used for emotion recognition to make human-machine interaction more powerful, such as the Gaussian mixture model (GMM), k-nearest neighbours (KNN), the hidden Markov model (HMM), artificial neural networks (ANN), GMM-supervector-based SVM classifiers, and the support vector machine (SVM). A. Bombatkar et al. studied the K-nearest neighbour classifier and reported emotion recognition performance of up to 86.02% classification accuracy using energy, entropy, MFCC, ZCC, and pitch features. Xianglin et al. performed emotion classification using GMM and obtained a recognition rate of 79% with the best features. When GMM-based recognition was limited to pitch and MFCC features, typical performance was 75% for speaker-independent recognition and 89.12% for speaker-dependent recognition. M. Khan et al. performed emotion classification with a K-NN classifier and obtained an average accuracy of 91.71% using forward feature selection, whereas an SVM classifier achieved 76.57%; the SVM results are shown for the neutral and fear emotions.

III. PROPOSED ARCHITECTURE

The proposed system combines emotion and gender detection and classification from the human voice signal. The voice signal is given as the input, a pre-processing step is carried out, and the noise is then classified according to its frequency range. Next, the fast Fourier transform (FFT) converts the signal from the time domain to the frequency domain, and the noise is removed from the original signal using a hidden Markov model. A hidden Markov model (HMM) is a powerful statistical tool with many practical applications in temporal pattern recognition, including speech enhancement, speech denoising, speech recognition, and related tasks.
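As a concrete illustration of how an HMM is used in temporal pattern recognition, the forward algorithm computes the probability that a model generated a given observation sequence. The sketch below is a generic textbook version for discrete observation symbols, assuming NumPy; it is not the authors' denoising model, and `forward_log_likelihood` and the two-state parameters in the test are hypothetical. Rescaling alpha at every step keeps the recursion from underflowing on long sequences.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Return log P(obs | model) for a discrete-observation HMM (scaled forward algorithm).

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = P(next state j | current state i)
    B   : (N, M) emission matrix,   B[i, k] = P(symbol k | state i)
    obs : non-empty sequence of symbol indices in range(M)
    """
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    alpha = pi * B[:, obs[0]]               # alpha_1(i) = pi_i * b_i(o_1)
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) * A[i, j] * b_j(o_t)
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()                 # equals P(o_t | o_1, ..., o_{t-1})
        log_lik += np.log(scale)
        alpha = alpha / scale               # rescale so alpha sums to 1
    return float(log_lik)
```

In a typical HMM recognizer, one model is trained per class and a test sequence is assigned to the model with the highest log-likelihood.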
At present there are few efficient approaches to speech denoising based on single-channel operation (i.e., where only one sensor or microphone is available in the system under consideration). The HMM-based approach provides a viable alternative to other methods such as spectral subtraction and is, generally speaking, considered more powerful in many ways. The main reason is that spectral subtraction assumes the distractor (i.e., the undesired signal, such as noise) is stationary, whereas the HMM is not bound by this limiting assumption: it is intended to work with non-stationary distractors as well.

Feature extraction is then performed using the discrete wavelet transform (DWT). Feature selection prior to classification plays a vital role; one such technique combines the DWT with a moving-window technique, in which the approximation coefficients of the DWT, together with useful features selected from the high-frequency coefficients by the maximum modulus method, are used as features. A novel way to think of microarray data is as a set of signals: the number of genes is the length of the signals, so signal processing techniques such as the wavelet transform can be used to perform microarray data analysis.

Finally, classification is performed with a KNN classifier. KNN is a non-parametric, lazy learning algorithm. Non-parametric means that it makes no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, because most real-world datasets do not follow theoretical mathematical assumptions. Lazy means that the algorithm does not use the training data points to build a model in advance; all of the training data are used in the testing phase. This makes training faster and the testing phase slower and costlier in both time and memory. In the worst case, KNN must scan all data points, which takes more time, and it must store all of the training data, which requires more memory.

BLOCK DIAGRAM:
Fig 2: Overall block diagram

IV. IMPLEMENTATION

A. PRE-PROCESSING

In speech processing, it is often advantageous to divide the signal into frames to achieve stationarity. This section describes how to split speech into frames and how to combine the frames back into a speech signal. A speech signal is normally not stationary, but seen from a short-time point of view it is. This results from the fact that the glottal system cannot change instantaneously. XXX states that a speech signal is typically stationary in windows of 20 ms.
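Splitting speech into frames and combining the frames back into a signal can be sketched with NumPy. The 8 kHz sampling rate, the 20 ms frame, the 50% overlap, and the periodic Hann window are assumptions (the paper does not give them), and `frame_signal`, `overlap_add`, and `periodic_hann` are hypothetical names. With this window and overlap the windowed frames sum back to the original samples wherever two frames overlap, so recombination is exact away from the first and last half-frame.

```python
import numpy as np

def frame_signal(x, frame_len, frame_step):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a complete frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // frame_step
    # Row i holds samples [i*frame_step, i*frame_step + frame_len).
    idx = frame_step * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

def overlap_add(frames, frame_step):
    """Recombine (possibly windowed) frames by summing their overlapping parts."""
    n_frames, frame_len = frames.shape
    out = np.zeros(frame_step * (n_frames - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * frame_step : i * frame_step + frame_len] += frame
    return out

def periodic_hann(n):
    """Hann window w[k] = 0.5 - 0.5*cos(2*pi*k/n); copies shifted by n/2 sum to 1."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

fs = 8000                          # assumed sampling rate (Hz)
frame_len = round(0.020 * fs)      # 20 ms -> 160 samples
frame_step = frame_len // 2        # frame time step of 10 ms (50% overlap)

# Usage: frame 1 s of random samples, window each frame, and recombine.
x = np.random.default_rng(0).standard_normal(fs)
frames = frame_signal(x, frame_len, frame_step) * periodic_hann(frame_len)
y = overlap_add(frames, frame_step)
print(frames.shape)                # (99, 160)
```

The window tapers each frame's edges, and the overlap ensures every interior sample still receives a total weight of one.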
When the signal is framed, it is necessary to consider how to treat the edges of each frame, because the abrupt edges add harmonics. It is therefore expedient to use a window to taper the edges. As a consequence, the samples are not all assigned the same weight in the subsequent computations, and for this reason it is prudent to use an overlap between frames.

Fig 3: Illustration of framing

Figure 3 shows how a speech signal is divided into frames. Each frame shares its first part with the previous frame and its last part with the next frame. The frame time step t_fs is the time between the start times of consecutive frames. The overlap is the time from when a new frame starts until the current frame ends.

B. FAST FOURIER TRANSFORM

The FFT is a fast algorithm for computing the discrete Fourier transform (DFT). Generalizing the 2-point and 4-point DFTs to 8-point, 16-point, ..., $2^r$-point transforms yields the FFT algorithm. Computing the DFT of an N-point sequence directly from its definition,

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1, \qquad (1)$$

takes $O(N^2)$ multiplications and additions, whereas the FFT computes the DFT with $O(N \log N)$ multiplications and additions. There are many variants of the FFT algorithm; we discuss one of them, the decimation-in-time FFT for sequences whose length is a power of two ($N = 2^r$ for some integer r). It decomposes the DFT into $\log_2 N$ stages, each consisting of N/2 butterfly computations. Each butterfly takes two complex numbers p and q and computes from them two other numbers, p + αq and p − αq, where α is a complex root of unity.
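The decimation-in-time recursion can be sketched in plain Python: split the sequence into even- and odd-indexed samples, transform each half, and combine them with N/2 butterflies. This is a textbook recursive radix-2 version written for clarity, not the authors' MATLAB implementation; `fft_dit` and `dft_direct` are hypothetical names, and the direct O(N^2) version is included only to check the result against equation (1).

```python
import cmath

def fft_dit(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])                        # DFT of the even-indexed samples
    odd = fft_dit(x[1::2])                         # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        alpha = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor, a root of unity
        p, q = even[k], odd[k]
        out[k] = p + alpha * q                     # butterfly: upper output
        out[k + n // 2] = p - alpha * q            # butterfly: lower output
    return out

def dft_direct(x):
    """Direct O(N^2) evaluation of the DFT definition, for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Each recursion level performs N/2 butterflies in total and there are log2 N levels, which gives the O(N log N) cost stated in the text.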
C. HIDDEN MARKOV MODEL

As mentioned in the introduction, the hidden Markov model (HMM) is the method used here for speech recognition. It is used to train models that represent an utterance of a spoken word. Each model is later used to test an utterance by calculating the probability that the model generated the observed sequence of feature vectors. Once the MFCCs have been obtained, all the training utterances must be normalized. The feature matrices are divided into a number of coefficient segments, one per state, and these are used to calculate the mean and variance of each state. During experimentation with the re-estimation of the transition matrix A, the final estimated values of A were seen to deviate considerably from the initial estimate. The initial values of A were therefore set closer to the values that re-estimation is likely to produce. The choice of initial values is not critical, because the re-estimation procedure adjusts them toward the correct values.

D. FEATURE EXTRACTION: DISCRETE WAVELET TRANSFORM

The wavelet transform (WT) is a technique for analysing signals. It was developed as an alternative to the short-time Fourier transform (STFT) to overcome problems related to the STFT's time and frequency resolution properties. More specifically, whereas the STFT provides uniform time resolution for all frequencies, the DWT provides high time resolution and low frequency resolution for high frequencies, and high frequency resolution and low time resolution for low frequencies. In that respect it is similar to the human ear, which exhibits similar time-frequency resolution characteristics.
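In practice the DWT is computed as a pyramid of repeated low-pass/high-pass filtering and downsampling steps. A minimal sketch using the Haar wavelet, the simplest DWT filter pair, assuming NumPy: the paper does not state which wavelet, how many levels, or which statistics it uses, so the Haar filters, the three statistics per subband (mean absolute value, standard deviation, energy), and the helper names below are assumptions for illustration. Each level halves the number of samples, matching the octave-band structure of the DWT.

```python
import numpy as np

def haar_dwt_step(x):
    """One analysis step: half-length approximation (low-pass) and detail (high-pass)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])              # pad odd lengths by repeating the last sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)       # Haar low-pass filter, then downsample by 2
    detail = (even - odd) / np.sqrt(2)       # Haar high-pass filter, then downsample by 2
    return approx, detail

def wavelet_decompose(x, levels):
    """Pyramidal decomposition: repeatedly split the current coarse approximation.

    Returns [detail_1, ..., detail_L, approx_L], finest band first.
    """
    bands, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def subband_features(bands):
    """Per-subband statistics (mean |c|, std, energy) concatenated into one vector."""
    feats = []
    for c in bands:
        feats.extend([np.mean(np.abs(c)), np.std(c), np.sum(c ** 2)])
    return np.array(feats)
```

Because the Haar filters are orthonormal, the subband energies sum to the energy of the input, so the energy features describe how the signal's energy is distributed across octave bands.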
The discrete wavelet transform (DWT) is a special case of the WT that provides a compact representation of a signal in time and frequency and can be computed efficiently. Viewed as a multirate filter bank, the DWT is a constant-Q filter bank with octave spacing between the centres of the filters. Each subband contains half the samples of the neighbouring higher-frequency subband. In the pyramidal algorithm, the signal is analysed at different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The coarse approximation is then further decomposed using the same wavelet decomposition step. This is achieved by successive high-pass and low-pass filtering of the time-domain signal. The extracted wavelet coefficients provide a compact representation that shows the energy distribution of the signal in time and frequency. To further reduce the dimensionality of the extracted feature vectors, statistics over the set of wavelet coefficients are used. In this way, the statistical characteristics of the "texture" or "music surface" of a piece can be represented; for example, the distribution of energy in time and frequency for music differs from that for speech.

E. CLASSIFICATION: KNN CLASSIFIER

In pattern recognition, the k-nearest neighbours algorithm (k-NN) is a non-parametric method used for classification and regression. The input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is classified by a majority vote of its neighbours and is assigned to the class most common among its k nearest neighbours (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of that single nearest neighbour. In k-NN regression, the output is the property value for the object.
This value is the average of the values of its k nearest neighbours. k-NN is a type of instance-based, or lazy, learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. For both classification and regression, it can be useful to weight the contributions of the neighbours so that nearer neighbours contribute more to the average than more distant ones; for example, a common weighting scheme gives each neighbour a weight of 1/d, where d is the distance to that neighbour. Because of this simplicity, k-NN classification is a good introduction to machine learning and classification in general. At its most basic level, it classifies a new point by finding the most similar data points in the training data and making an educated guess based on their classes.
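The voting rule described in this section, including the optional 1/d weighting, can be sketched with NumPy. This is a generic k-NN classifier, not the authors' MATLAB code: the Euclidean distance, the default k = 3, and the tie-breaking rule (ties go to the label whose first neighbour is closest) are assumptions, and `knn_predict` is a hypothetical name. There is no training step beyond keeping the data, which is the "lazy" behaviour described above.

```python
import numpy as np
from collections import defaultdict

def knn_predict(X_train, y_train, x, k=3, weighted=False):
    """Classify one feature vector x by a vote of its k nearest training points.

    weighted=False: plain majority vote.
    weighted=True:  each neighbour votes with weight 1/d (d = its distance to x).
    """
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)  # Euclidean
    nearest = np.argsort(dists, kind="stable")[:k]   # indices of the k closest points
    votes = defaultdict(float)
    for i in nearest:
        if weighted:
            if dists[i] == 0:                        # exact match: weight would be infinite
                return y_train[i]
            votes[y_train[i]] += 1.0 / dists[i]      # nearer neighbours count more
        else:
            votes[y_train[i]] += 1.0
    # max() keeps the first maximal key, i.e. the label met first among sorted neighbours.
    return max(votes, key=votes.get)
```

With training points at 0.0 (class "a") and at 1.0 and 1.1 (class "b"), a query at 0.1 with k = 3 is labelled "b" by the plain vote but "a" by the 1/d-weighted vote, because the single very close neighbour outweighs the two distant ones.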
V. RESULTS

Several experiments were performed to evaluate the accuracy of the classifiers in determining emotion and gender. In the DWT analysis, the highest classification accuracy for the two emotions angry and excited was obtained with HMM. The KNN classifier is used for gender classification. With the k-nearest neighbour classifier, the average correct classification rate for emotions is 97%. Features are extracted from the HMM-processed speech signal, and the KNN classifier is employed to obtain the emotion class label. The KNN experiments were conducted in MATLAB, and all results are based on cross-validation [13]. The acoustic features were extracted from the best feature combination among all features for this classifier. Experiments were also conducted using MATLAB and RapidMiner with a naive Bayes classifier, with all results based on cross-validation; acoustic features such as shimmer, jitter, energy, and pitch were extracted from the best feature combination among all features for this classifier [14].

Figures: 1. Input signal. 2. Output: male. 3. Output: female. 4. Performance matrix.

VI. CONCLUSION

Our proposed system combines voice-based emotion and gender detection using noise classification, feature extraction with the discrete wavelet transform (DWT), noise removal based on a hidden Markov model (HMM), and classification with k-nearest neighbours (KNN). Compared with the other classifiers, it gives more accurate output. The proposed system also overcomes the limitation of existing systems, which detect only two emotions.

REFERENCES

[1] M. E. Ayadi, M. S. Kamel and F. Karray, "Survey on Speech Emotion Recognition: Features, Classification Schemes, and Databases," Pattern Recognition, vol. 44, no. 16, pp. 572-587, 2011.
[2] A. S. Utane and S. L. Nalbalwar, "Emotion Recognition through Speech Using Gaussian Mixture Model & Support Vector Machine," International Journal of Scientific & Engineering Research, vol. 4, no. 5, May 2013.
[3] I. Chiriacescu, "Automatic Emotion Analysis Based On Speech," M.Sc. thesis, Department of Electrical Engineering, Delft University of Technology, 2009.
[4] N. Thapliyal and G. Amoli, "Speech based Emotion Recognition with Gaussian Mixture Model," International Journal of Advanced Research in Computer Engineering & Technology, vol. 1, no. 5, July 2012.
[5] Y. Zhou, Y. Sun, J. Zhang and Y. Yan, "Speech Emotion Recognition using Both Spectral and Prosodic Features," IEEE, vol. 23, no. 5, pp. 545-549, 2009.