SPEECH BASED EMOTION RECOGNITION
MAJOR PROJECT REVIEW
BY
G.HANNAH SANJANA 17P71A1209
MANASA MITTIPALLY 17P71A1218
VAMSHIDHAR SINGH 17P71A1248
UNDER THE GUIDANCE OF
MRS. M. SUPRIYA, ASSOCIATE PROFESSOR
DEPARTMENT OF INFORMATION TECHNOLOGY
SWAMI VIVEKANANDA INSTITUTE OF TECHNOLOGY
Mahbub College Campus, R.P Road, Secunderabad-03
(Affiliated to JNTUH)
2017-2021
CONTENTS:
 Abstract
 Introduction
 Existing System
 Disadvantages of Existing System
 Proposed System
 Advantages of Proposed System
 System Specifications
 UML Diagrams
 Output Screens
 Conclusion
ABSTRACT
 Speech emotion recognition is a trending research topic, with the main aim of
improving human-machine interaction. At present, most of the work in this area
extracts discriminatory features in order to classify emotions into various
categories.
 Most of the present work involves the utterance of words, which is used for
lexical analysis in emotion recognition. In our project, a technique is used to
classify emotions into the 'Angry', 'Calm', 'Fearful', 'Happy', and 'Sad'
categories.
ABSTRACT (CONTD.)
 In previous works, the maximum cross-correlation between audio files was
computed to label the speech data into one of only a few (three) emotion
categories. A further work, developed in MATLAB, identified the emotion of any
audio file passed as an argument.
 A variety of classifiers were applied through the MATLAB Classification Learner
toolbox, but again to only a few emotion categories. The proposed technique paves
the way for a real-time speech emotion recognition prototype built on open-source
tools.
INTRODUCTION
 Speech emotion recognition is a technology that extracts emotion features from speech
signals, compares them, and analyzes how the feature parameters relate to changes in
emotion. Recognizing emotions from audio signals requires feature extraction and classifier
training.
 The feature vector is composed of audio signal elements that characterize the speaker
(such as pitch, volume, and energy); it is essential for training the classifier model
to accurately recognize specific emotions.
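To make the feature-extraction step above concrete, here is a minimal sketch that computes two simple per-frame cues (energy and zero-crossing rate) from a raw signal using NumPy. The frame length, hop size, and test tone are illustrative choices, not values taken from the project:

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])

def extract_features(signal):
    """Per-frame energy and zero-crossing rate: two simple prosodic cues."""
    frames = frame_signal(signal)
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([energy, zcr])

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
feats = extract_features(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 2): one (energy, zcr) row per frame
```

A real system would stack richer features (pitch, MFCCs) in the same per-frame layout before passing them to the classifier.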
EXISTING SYSTEM
 A review of the existing work in this area reveals that most of it relies on lexical
analysis for emotion recognition, classifying emotions into three categories, i.e.,
Angry, Happy, and Neutral. The maximum cross-correlation between the discrete-time
sequences of the audio signals is computed, and the highest degree of correlation
between the testing audio file and a training audio file is used as the key parameter
for identifying a particular emotion type.
 A second technique extracts discriminatory features and applies a Cubic SVM classifier,
again recognizing only Angry, Happy, and Neutral emotion segments.
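The cross-correlation matching described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the idea, not the original MATLAB code; the normalization step is an assumption added so that louder or longer signals do not dominate the score:

```python
import numpy as np

def best_match(test_sig, train_sigs):
    """Return the index of the training signal with the highest peak
    normalized cross-correlation against the test signal."""
    scores = []
    for ref in train_sigs:
        c = np.correlate(test_sig, ref, mode="full")
        scores.append(np.max(np.abs(c)) /
                      (np.linalg.norm(test_sig) * np.linalg.norm(ref)))
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
a = rng.standard_normal(500)
b = rng.standard_normal(500)
# A delayed, noisy copy of `a` should correlate best with `a`
test = np.concatenate([np.zeros(50), a]) + 0.1 * rng.standard_normal(550)
print(best_match(test, [a, b]))  # 0
```

The label of the best-matching training file would then be taken as the predicted emotion, which is why this approach scales poorly: every test file must be correlated against the whole dataset.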
DISADVANTAGES OF EXISTING
SYSTEM:
 The system is static in nature and performs
poorly in real-time settings.
 The system is slow, since it must compare
correlations for the complete dataset against a
single audio file.
 Variable-length audio files are not
supported.
 Long pre-processing steps are required before
the model can interpret the audio signal.
 Expensive and not upgradable.
PROPOSED SYSTEM
 In this project, MFCCs (Mel-frequency cepstral coefficients) are used as the features
for classifying speech data into various emotion categories, employing artificial neural
networks. Using neural networks gives us the advantage of classifying many different
emotions from variable-length audio signals in a real-time environment.
 This technique strikes a good balance between computational cost and classification
accuracy for real-time processing.
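One common way to feed variable-length clips into a fixed-input network, as the proposed system must, is to pad or truncate the MFCC matrix along its time axis. The sketch below shows that step in NumPy; the target length and placeholder feature values are assumptions for illustration (real MFCCs would come from a library such as librosa):

```python
import numpy as np

def fix_length(mfcc, target_frames=200):
    """Pad with zeros or truncate an (n_mfcc, n_frames) matrix along time,
    so every clip yields the same input shape for the network."""
    n_mfcc, n_frames = mfcc.shape
    if n_frames >= target_frames:
        return mfcc[:, :target_frames]
    pad = np.zeros((n_mfcc, target_frames - n_frames))
    return np.concatenate([mfcc, pad], axis=1)

# Two clips of different lengths (placeholder values standing in
# for real MFCC coefficients)
short = np.ones((40, 120))
long_ = np.ones((40, 350))
batch = np.stack([fix_length(short), fix_length(long_)])
print(batch.shape)  # (2, 40, 200): a uniform batch for the classifier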
ADVANTAGES OF PROPOSED
SYSTEM:
 Can be implemented on any hardware
that supports the Python language.
 Very fast at processing audio and easy
to use.
 Variable-length audio files are understood
by the system.
SYSTEM SPECIFICATIONS:
 HARDWARE:
Processor : CORE i3
Hard disk : 250GB
RAM : 8 GB
 SOFTWARE:
Operating system : WINDOWS 10
Programming language : PYTHON 3.8.6
System Architecture:
CNN Model for
Speech Recognition:
UML DIAGRAMS
USE CASE
DIAGRAM:
SEQUENCE
DIAGRAM:
CLASS
DIAGRAM:
ACTIVITY
DIAGRAM:
OUTPUT SCREENS
GUI FOR EMOTION RECOGNISER:
OUTPUT FOR HAPPY EMOTION:
OUTPUT FOR
DIFFERENT
EMOTIONS:
OUTPUT FOR
DIFFERENT
EMOTIONS:
CONCLUSION
 The CNN model was trained, and based on it we were able to predict the
emotion of a person from their speech.
 The trained model achieves an F1 score of 91.04.
 'Happy', 'Sad', 'Fearful', 'Calm', and 'Angry' are the five different
emotions recognized by this project.
 This speech-based emotion recognition can be used to understand the
opinions/sentiments people express, regarding a product or a political issue
for example, by giving the audio as input to this model.
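The F1 score reported above combines precision and recall per class. As a reference for how such a score is computed, here is a minimal macro-averaged F1 in plain Python; the label sequences are made-up examples, not the project's evaluation data:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1, then an unweighted mean."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["Happy", "Sad", "Angry", "Happy", "Calm", "Fearful"]
y_pred = ["Happy", "Sad", "Angry", "Sad", "Calm", "Fearful"]
print(round(100 * macro_f1(y_true, y_pred), 2))  # 86.67
```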
FUTURE ENHANCEMENTS
 Making the system more accurate.
 Adding further emotions, such as Disgusted and Surprised.
 Integrating the system with different platforms.
THANK YOU!
