AGENDA:
• Introduction
• Technologies Used
• Speech Dataset
• Block Diagram
• Process Flow
• GUI
• Real-World Applications
• Conclusion
INTRODUCTION
• Speech Emotion Recognition (SER) is a technology designed to detect and interpret human emotions through spoken words. By analyzing the acoustic properties of speech, SER systems can identify the emotion a person feels, such as happiness, sadness, anger, or calmness.
• SER enables computers to sense and respond to emotional states, making interactions with technology more intuitive and human-like.
• This project specifically focuses on developing a model that takes audio input, analyzes it, and outputs the detected emotion.
• This project detects human emotion from speech using speech spectrograms, analyzing the information contained within the spectrogram.
• We use Convolutional Neural Networks (CNNs) to classify emotions, focusing on contrasting emotional states like happiness and sadness.
• A variety of temporal and spectral features can be extracted from human speech. We use statistics relating to pitch, Mel-Frequency Cepstral Coefficients (MFCCs), and formants of speech as inputs to the classification algorithms.
TECHNOLOGIES USED
• Audio Processing
• Librosa: Used to analyze and process speech signals.
• Extracts audio features like MFCCs (Mel-frequency Cepstral Coefficients), which are
crucial for representing the frequency characteristics of speech.
• Deep Learning
• sklearn (scikit-learn): For preprocessing tasks such as label encoding and splitting datasets.
• tensorflow / keras: For implementing and training neural networks.
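A minimal sketch of how the two scikit-learn utilities named above are typically combined; the emotion labels and one-value feature vectors are made-up placeholders.

```python
# Encode string emotion labels as integers, then split into train/test sets.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

labels = ["happy", "sad", "angry", "sad", "happy", "neutral"]   # placeholder labels
features = [[0.1], [0.4], [0.9], [0.5], [0.2], [0.3]]           # placeholder features

encoder = LabelEncoder()
y = encoder.fit_transform(labels)   # classes are sorted alphabetically

X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.33, random_state=42)
```

`encoder.inverse_transform` maps predicted integers back to emotion names for display.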
TECHNOLOGIES USED
• Data Handling
• NumPy: Enables efficient computation on large numerical datasets during feature
processing and model training.
• os: For file and directory operations.
• Visualization
• matplotlib: For visualizing training metrics, feature distributions, or results.
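For instance, one might plot accuracy per epoch after training; the history values below are invented for illustration.

```python
# Plot made-up training/validation accuracy curves and save them to a file.
import matplotlib
matplotlib.use("Agg")               # render off-screen (no display needed)
import matplotlib.pyplot as plt

epochs = range(1, 6)
train_acc = [0.52, 0.63, 0.71, 0.76, 0.80]   # illustrative values
val_acc = [0.50, 0.60, 0.66, 0.69, 0.70]

plt.plot(epochs, train_acc, label="train accuracy")
plt.plot(epochs, val_acc, label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.savefig("training_curves.png")
```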
• GUI Development
• Tkinter: For creating the graphical user interface (GUI) of the emotion detection application.
• Pillow (PIL): For handling images within the GUI.
PROCESS FLOW
Audio Data Input
• Load speech audio files from a dataset (e.g., .wav files).
Preprocessing
• Noise Reduction: Apply techniques to filter out unwanted noise.
• Resampling: Convert audio to a standard sample rate (e.g., 16
kHz).
• Normalization: Normalize the audio signal to ensure consistent
amplitude levels.
• Segmentation: If necessary, split long recordings into smaller
frames or windows.
Feature Extraction
• MFCC (Mel-Frequency Cepstral Coefficients): Extract tonal features that capture frequency characteristics. Library used: Librosa.
• Spectrograms: Visual representations of audio used for CNN-based models.
• Chroma Features: Additional features for capturing pitch and frequency patterns.
Model Training
• Train Deep Learning Models: Use models such as CNNs (Convolutional Neural Networks) for emotion classification.
• CNNs: Work with spectrograms or image-like audio features.
• Train on Features: Input the extracted features into the model for training.
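A small Keras CNN of the kind described might look like the following. The input shape (128 mel bands x 64 frames), layer sizes, and five-class output are illustrative assumptions, not fixed choices of the project.

```python
# Sketch of a compact CNN for spectrogram-shaped inputs.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 5  # e.g. happy, sad, angry, neutral, surprised (assumed)

model = keras.Sequential([
    layers.Input(shape=(128, 64, 1)),        # mel bands x frames x 1 channel
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would then feed in the extracted features, e.g.:
# model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
```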
Emotion Classification
• Emotion Categories: Classify audio into predefined emotional states such as Happy, Sad, Angry, Neutral, Surprised, etc.
• Model Evaluation: Measure performance using accuracy, a confusion matrix, etc.
GUI Development
• Tkinter: Provides the graphical user interface (GUI) for the project, allowing users to interact with the emotion detection system intuitively. Enables functionality like uploading audio files, displaying predictions, and integrating visual elements such as emojis for emotions.
• Pillow (PIL): Supports image processing tasks in the GUI, such as displaying emotion-related images or emojis.
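The evaluation step can be sketched with scikit-learn; the true and predicted labels below are made up to show the shape of the output.

```python
# Accuracy and confusion matrix from predicted vs. true emotion labels.
from sklearn.metrics import accuracy_score, confusion_matrix

emotions = ["angry", "happy", "neutral", "sad"]          # assumed label set
y_true = ["happy", "sad", "angry", "neutral", "happy", "sad"]
y_pred = ["happy", "sad", "angry", "neutral", "sad", "sad"]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=emotions)
print(f"accuracy: {acc:.2f}")
print(cm)  # rows = true emotion, columns = predicted emotion
```

Each off-diagonal cell counts one kind of mistake, e.g. here one "happy" clip was predicted as "sad".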
REAL-WORLD APPLICATIONS
• Customer Service and Call Centers
• Healthcare and Mental Health Support
• Smart Assistants and Human-Computer Interaction (HCI)
• Education and Online Learning
• Entertainment and Gaming
CONCLUSION
• Speech Emotion Recognition helps machines understand and respond to human emotions, making interactions more natural and user-friendly.
• Our project gives machines the ability to determine emotion from the human voice, enabling them to hold more natural, seamless conversations, much as humans do.