Advancing Sentiment Analysis in Audio:
Deep Learning & NLP Approaches for Emotion Detection in Spoken Language
AGENDA:
• Introduction
• Technologies Used
• Speech Dataset
• Block Diagram
• Process Flow
• GUI
• Real World Applications
• Conclusion
INTRODUCTION
• Speech Emotion Recognition (SER) is a technology designed to detect and interpret human emotions through spoken language. By analyzing acoustic properties of speech such as pitch, tone, and energy, SER systems can identify the emotion a person feels, like happiness, sadness, anger, or calmness.
• SER enables computers to sense and respond to emotional states, making interactions with technology more intuitive and human-like.
• This project specifically focuses on developing a model that will take audio input, analyze it, and output the detected emotion.
• This project detects human emotion from speech using spectrograms, identifying the emotion by analyzing the information contained in the spectrogram.
• We use Convolutional Neural Networks (CNNs) to classify emotions, focusing on contrasting emotional states like happiness and sadness.
• There are a variety of temporal and spectral features that can be extracted from human speech. We use statistics relating to pitch, Mel-Frequency Cepstral Coefficients (MFCCs), and formants of speech as inputs to classification algorithms.
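As a minimal sketch of how such per-clip feature vectors can be built with Librosa (the helper names here are illustrative, not part of the project code):

```python
import numpy as np

def summarize_features(feature_matrix):
    """Collapse a (n_features, n_frames) matrix into per-feature
    mean and standard deviation: one fixed-length vector per clip."""
    return np.concatenate([feature_matrix.mean(axis=1),
                           feature_matrix.std(axis=1)])

def extract_mfcc_stats(path, n_mfcc=13):
    """Load an audio file and return summary statistics of its MFCCs."""
    import librosa  # deferred import: the statistics helper above is pure NumPy
    signal, sr = librosa.load(path, sr=None)           # keep native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return summarize_features(mfcc)                    # shape: (2 * n_mfcc,)
```

Pitch and formant statistics can be concatenated onto the same vector before it is fed to a classifier.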
TECHNOLOGIES USED
• Audio Processing
• Librosa: Used to analyze and process speech signals.
• Extracts audio features like MFCCs (Mel-frequency Cepstral Coefficients), which are
crucial for representing the frequency characteristics of speech.
• Deep Learning
• sklearn (scikit-learn): For preprocessing tasks such as label encoding and splitting datasets.
• tensorflow / keras: For implementing and training neural networks.
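The scikit-learn preprocessing steps mentioned above can be sketched as follows (the function name is ours, for illustration only):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

def prepare_dataset(features, emotions, test_size=0.2, seed=42):
    """Encode string emotion labels as integers and split the data
    into stratified train/test sets."""
    encoder = LabelEncoder()
    y = encoder.fit_transform(emotions)   # e.g. 'angry' -> 0, 'happy' -> 1, ...
    X_train, X_test, y_train, y_test = train_test_split(
        features, y, test_size=test_size, random_state=seed, stratify=y)
    return X_train, X_test, y_train, y_test, encoder
```

Keeping the fitted `LabelEncoder` around lets the GUI map integer predictions back to emotion names later.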
TECHNOLOGIES USED
• Data Handling
• NumPy: Enables efficient computation on large numerical datasets during feature processing and model training.
• os: For file and directory operations.
• Visualization
• matplotlib: For visualizing training metrics, feature distributions, or results.
• GUI Development
• Tkinter: For creating the graphical user interface (GUI) of the emotion detection application.
• Pillow (PIL): For handling images within the GUI.
SPEECH DATASET
• OAF_angry 200
• OAF_disgust 200
• OAF_Fear 200
• OAF_happy 200
• OAF_neutral 200
• OAF_Pleasant_surprise 200
• OAF_Sad 200
• YAF_angry 200
• YAF_disgust 200
• YAF_fear 200
• YAF_happy 200
• YAF_pleasant_surprised 200
• YAF_neutral 200
• YAF_sad 200
• Total samples: 2800
• We used the TESS (Toronto Emotional Speech Set) dataset.
• The speech data covers happy, sad, angry, fearful, neutral, pleasantly surprised, and disgusted emotions.
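The per-folder sample counts above can be verified with a small stdlib helper (the function name and folder layout are illustrative):

```python
import os

def count_samples(dataset_root):
    """Count .wav files in each emotion folder (e.g. OAF_angry, YAF_sad),
    returning a {folder_name: count} mapping."""
    counts = {}
    for folder in sorted(os.listdir(dataset_root)):
        folder_path = os.path.join(dataset_root, folder)
        if os.path.isdir(folder_path):
            counts[folder] = sum(1 for f in os.listdir(folder_path)
                                 if f.lower().endswith('.wav'))
    return counts
```

Summing the returned values should give the 2800 total reported above.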
BLOCK DIAGRAM
PROCESS FLOW
Audio Data Input
• Load speech audio files from a dataset (e.g., .wav files).
Preprocessing
• Noise Reduction: Apply techniques to filter out unwanted noise
• Resampling: Convert audio to a standard sample rate (e.g., 16
kHz).
• Normalization: Normalize the audio signal to ensure consistent
amplitude levels.
• Segmentation: If necessary, split long recordings into smaller
frames or windows.
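The normalization and segmentation steps above can be sketched in NumPy (a minimal illustration; the helper names are ours):

```python
import numpy as np

def normalize(signal):
    """Peak-normalize so amplitudes lie in [-1, 1], for consistent levels."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def frame_signal(signal, frame_len, hop_len):
    """Split a long recording into overlapping frames (segmentation step).
    Assumes len(signal) >= frame_len."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    return np.stack([signal[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])
```

Resampling to a standard rate (e.g., 16 kHz) can be done at load time, for instance via `librosa.load(path, sr=16000)`.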
Feature Extraction
• MFCC (Mel-Frequency Cepstral Coefficients): Extract tonal features that capture frequency characteristics (library: Librosa).
• Spectrograms: Visual representations of audio used for CNN-based models.
• Chroma Features: Additional features for capturing pitch and frequency patterns.
Model Training
• Train Deep Learning Models: Use models such as CNNs (Convolutional Neural Networks) for emotion classification.
• CNNs: Work with spectrograms or image-like audio features.
• Train on Features: Input the extracted features into the model for training.
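A minimal Keras sketch of a CNN over per-clip feature vectors follows; the architecture and layer sizes are illustrative choices, not the project's exact model:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(40, 1), num_classes=7):
    """1D CNN over MFCC-style feature vectors; seven emotion classes
    to match the TESS categories."""
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv1D(64, kernel_size=5, activation='relu'),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation='relu'),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

Training is then a call like `model.fit(X_train, y_train, epochs=50, validation_split=0.1)` on the extracted features; spectrogram inputs would use `Conv2D` layers instead.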
Emotion Classification
• Emotion Categories: Classify audio into predefined emotional states such as Happy, Sad, Angry, Neutral, Surprised, etc.
• Model Evaluation: Measure performance using accuracy, confusion matrix, etc.
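The evaluation step can be sketched with scikit-learn (helper name ours):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_predictions(y_true, y_pred):
    """Return overall accuracy and the confusion matrix
    (rows = true emotions, columns = predicted emotions)."""
    return accuracy_score(y_true, y_pred), confusion_matrix(y_true, y_pred)
```

The diagonal of the confusion matrix shows correctly classified clips per emotion; off-diagonal cells reveal which emotions the model confuses (e.g., fear vs. surprise).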
GUI Development
• Tkinter: Provides the graphical user interface (GUI) for the project, allowing users to interact with the emotion detection system intuitively.
• Enables functionality like uploading audio files, displaying predictions, and integrating visual elements like emojis for emotions.
• Pillow (PIL): Supports image processing tasks in the GUI, such as displaying emotion-related images or emojis.
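A minimal Tkinter sketch of the upload-and-predict flow follows; the widget layout, emoji mapping, and function names are illustrative assumptions, not the project's actual GUI code:

```python
import tkinter as tk
from tkinter import filedialog

# Illustrative emoji mapping for the seven emotion categories.
EMOJI = {'happy': '😊', 'sad': '😢', 'angry': '😠', 'fear': '😨',
         'disgust': '🤢', 'neutral': '😐', 'surprise': '😲'}

def format_prediction(emotion):
    """Build the text shown in the result label for a predicted emotion."""
    return f"Detected emotion: {emotion.capitalize()} {EMOJI.get(emotion, '')}"

def build_app(predict_fn):
    """Assemble a minimal window: an upload button and a result label.
    predict_fn maps a .wav file path to an emotion string."""
    root = tk.Tk()
    root.title("Speech Emotion Recognition")
    result = tk.Label(root, text="Upload a .wav file", font=("Arial", 14))
    result.pack(pady=10)

    def on_upload():
        path = filedialog.askopenfilename(filetypes=[("WAV files", "*.wav")])
        if path:
            result.config(text=format_prediction(predict_fn(path)))

    tk.Button(root, text="Upload Audio", command=on_upload).pack(pady=10)
    return root
```

Calling `build_app(model_predict).mainloop()` would start the interface, with `model_predict` wrapping feature extraction and the trained CNN.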
SPECTROGRAMS
[Spectrogram images of an ANGRY sample and a HAPPY sample]
GUI
[Screenshots: home page and emotion prediction views]
REAL WORLD APPLICATIONS
• Customer Service and Call Centers
• Healthcare and Mental Health Support
• Smart Assistants and Human-Computer Interaction (HCI)
• Education and Online Learning
• Entertainment and Gaming
CONCLUSION
• Speech Emotion Recognition helps machines understand and respond to human emotions, making interactions more natural and user-friendly.
• Our project thus presents a way to give machines the ability to determine emotion from the human voice, enabling conversations that are smoother and more natural, much like those between humans.
Thank
you
