Advancing Sentiment Analysis in Audio:
Deep Learning & NLP Approaches for Emotion Detection in Spoken Language
AGENDA:
• Introduction
• Technologies Used
• Speech Dataset
• Block Diagram
• Process Flow
• GUI
• Real World Applications
• Conclusion
INTRODUCTION
• Speech Emotion Recognition (SER) is a technology designed to detect and interpret human emotions through spoken language. By analyzing acoustic properties of speech such as pitch, tone, and energy, SER systems can identify the emotion a person feels, like happiness, sadness, anger, or calmness.
• SER enables computers to sense and respond to emotional states, making interactions with technology more intuitive and human-like.
• This project specifically focuses on developing a model that will take audio input, analyze it, and output the detected emotion.
• This project detects human emotion from speech using spectrograms, identifying the emotion by analyzing the information contained in the spectrogram.
• We use Convolutional Neural Networks (CNNs) to classify emotions, focusing on contrasting emotional states like happiness and sadness.
• There are a variety of temporal and spectral features that can be extracted from human speech. We use statistics relating to pitch, Mel-Frequency Cepstral Coefficients (MFCCs), and formants of speech as inputs to classification algorithms.
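As a minimal sketch of how such per-clip feature vectors can be built with Librosa (the helper names here are illustrative, not part of the project code):

```python
import numpy as np

def summarize_features(feature_matrix):
    """Collapse a (n_features, n_frames) matrix into per-feature
    mean and standard deviation: one fixed-length vector per clip."""
    return np.concatenate([feature_matrix.mean(axis=1),
                           feature_matrix.std(axis=1)])

def extract_mfcc_stats(path, n_mfcc=13):
    """Load an audio file and return summary statistics of its MFCCs."""
    import librosa  # deferred import: the statistics helper above is pure NumPy
    signal, sr = librosa.load(path, sr=None)           # keep native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return summarize_features(mfcc)                    # shape: (2 * n_mfcc,)
```

Pitch and formant statistics can be concatenated onto the same vector before it is fed to a classifier.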
TECHNOLOGIES USED
• Audio Processing
• Librosa: Used to analyze and process speech signals.
• Extracts audio features like MFCCs (Mel-frequency Cepstral Coefficients), which are
crucial for representing the frequency characteristics of speech.
• Deep Learning
• sklearn (scikit-learn): For preprocessing tasks such as label encoding and splitting datasets.
• tensorflow / keras: For implementing and training neural networks.
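The scikit-learn preprocessing steps mentioned above can be sketched as follows (the function name is ours, for illustration only):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

def prepare_dataset(features, emotions, test_size=0.2, seed=42):
    """Encode string emotion labels as integers and split the data
    into stratified train/test sets."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(emotions)   # e.g. 'angry' -> 0, 'happy' -> 1, ...
    X_train, X_test, y_train, y_test = train_test_split(
        features, y, test_size=test_size, random_state=seed, stratify=y)
    return X_train, X_test, y_train, y_test, encoder
```

Keeping the fitted `LabelEncoder` around lets the GUI map integer predictions back to emotion names later.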
TECHNOLOGIES USED
• Data Handling
• NumPy: Enables efficient computation on large numerical datasets during feature processing and model training.
• os: For file and directory operations.
• Visualization
• matplotlib: For visualizing training metrics, feature distributions, or results.
• GUI Development
• Tkinter: For creating the graphical user interface (GUI) of the emotion detection application.
• Pillow (PIL): For handling images within the GUI.
SPEECH DATASET
• OAF_angry 200
• OAF_disgust 200
• OAF_Fear 200
• OAF_happy 200
• OAF_neutral 200
• OAF_Pleasant_surprise 200
• OAF_Sad 200
• YAF_angry 200
• YAF_disgust 200
• YAF_fear 200
• YAF_happy 200
• YAF_pleasant_surprised 200
• YAF_neutral 200
• YAF_sad 200
• Total samples: 2800
• We used the TESS (Toronto Emotional Speech Set) dataset.
• The speech data covers happy, sad, angry, fearful, neutral, pleasantly surprised, and disgusted emotions.
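The per-folder sample counts above can be verified with a small stdlib helper (the function name and folder layout are illustrative):

```python
import os

def count_samples(dataset_root):
    """Count .wav files in each emotion folder (e.g. OAF_angry, YAF_sad),
    returning a {folder_name: count} mapping."""
    counts = {}
    for folder in sorted(os.listdir(dataset_root)):
        folder_path = os.path.join(dataset_root, folder)
        if os.path.isdir(folder_path):
            counts[folder] = sum(1 for f in os.listdir(folder_path)
                                 if f.lower().endswith('.wav'))
    return counts
```

Summing the returned values should give the 2800 total reported above.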
BLOCK DIAGRAM
PROCESS FLOW
Audio Data Input
• Load speech audio files from a dataset (e.g., .wav files).
Preprocessing
• Noise Reduction: Apply techniques to filter out unwanted noise
• Resampling: Convert audio to a standard sample rate (e.g., 16
kHz).
• Normalization: Normalize the audio signal to ensure consistent
amplitude levels.
• Segmentation: If necessary, split long recordings into smaller
frames or windows.
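The normalization and segmentation steps above can be sketched in NumPy (a minimal illustration; the helper names are ours):

```python
import numpy as np

def normalize(signal):
    """Peak-normalize so amplitudes lie in [-1, 1], for consistent levels."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def frame_signal(signal, frame_len, hop_len):
    """Split a long recording into overlapping frames (segmentation step).
    Assumes len(signal) >= frame_len."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    return np.stack([signal[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])
```

Resampling to a standard rate (e.g., 16 kHz) can be done at load time, for instance via `librosa.load(path, sr=16000)`.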
Feature Extraction
• MFCC (Mel-Frequency Cepstral Coefficients): Extract tonal features that capture frequency characteristics (library: Librosa).
• Spectrograms: Visual representations of audio used for CNN-based models.
• Chroma Features: Additional features for capturing pitch and frequency patterns.
Model Training
• Train Deep Learning Models: Use models such as CNNs (Convolutional Neural Networks) for emotion classification.
• CNNs: Work with spectrograms or image-like audio features.
• Train on Features: Input the extracted features into the model for training.
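A minimal Keras sketch of a CNN over per-clip feature vectors follows; the architecture and layer sizes are illustrative choices, not the project's exact model:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(40, 1), num_classes=7):
    """1D CNN over MFCC-style feature vectors; seven emotion classes
    to match the TESS categories."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv1D(64, kernel_size=5, activation='relu'),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation='relu'),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

Training is then a call like `model.fit(X_train, y_train, epochs=50, validation_split=0.1)` on the extracted features; spectrogram inputs would use `Conv2D` layers instead.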
Emotion Classification
• Emotion Categories: Classify audio into predefined emotional states such as Happy, Sad, Angry, Neutral, Surprised, etc.
• Model Evaluation: Measure performance using accuracy, confusion matrix, etc.
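The evaluation step can be sketched with scikit-learn (helper name ours):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_predictions(y_true, y_pred):
    """Return overall accuracy and the confusion matrix
    (rows = true emotions, columns = predicted emotions)."""
    return accuracy_score(y_true, y_pred), confusion_matrix(y_true, y_pred)
```

The diagonal of the confusion matrix shows correctly classified clips per emotion; off-diagonal cells reveal which emotions the model confuses (e.g., fear vs. surprise).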
GUI Development
• Tkinter: Provides the graphical user interface (GUI) for the project, allowing users to interact with the emotion detection system intuitively.
• Enables functionality like uploading audio files, displaying predictions, and integrating visual elements like emojis for emotions.
• Pillow (PIL): Supports image processing tasks in the GUI, such as displaying emotion-related images or emojis.
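A minimal Tkinter sketch of the upload-and-predict flow follows; the widget layout, emoji mapping, and function names are illustrative assumptions, not the project's actual GUI code:

```python
import tkinter as tk
from tkinter import filedialog

# Illustrative emoji mapping for the seven emotion categories.
EMOJI = {'happy': '😊', 'sad': '😢', 'angry': '😠', 'fear': '😨',
         'disgust': '🤢', 'neutral': '😐', 'surprise': '😲'}

def format_prediction(emotion):
    """Build the text shown in the result label for a predicted emotion."""
    return f"Detected emotion: {emotion.capitalize()} {EMOJI.get(emotion, '')}"

def build_app(predict_fn):
    """Assemble a minimal window: an upload button and a result label.
    predict_fn maps a .wav file path to an emotion string."""
    root = tk.Tk()
    root.title("Speech Emotion Recognition")
    result = tk.Label(root, text="Upload a .wav file", font=("Arial", 14))
    result.pack(pady=10)

    def on_upload():
        path = filedialog.askopenfilename(filetypes=[("WAV files", "*.wav")])
        if path:
            result.config(text=format_prediction(predict_fn(path)))

    tk.Button(root, text="Upload Audio", command=on_upload).pack(pady=10)
    return root
```

Calling `build_app(model_predict).mainloop()` would start the interface, with `model_predict` wrapping feature extraction and the trained CNN.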
SPECTROGRAMS
[Spectrogram images of an ANGRY sample and a HAPPY sample]
GUI
[Screenshots: home page and emotion prediction views]
REAL WORLD APPLICATIONS
• Customer Service and Call Centers
• Healthcare and Mental Health Support
• Smart Assistants and Human-Computer Interaction (HCI)
• Education and Online Learning
• Entertainment and Gaming
CONCLUSION
• Speech Emotion Recognition helps machines understand and respond to human emotions, making interactions more natural and user-friendly.
• Our project thus presents a way to give machines the ability to determine emotion from the human voice, enabling conversations that are smoother and more natural, much like those between humans.
Thank
you
