This research is based on the identification and thorough analysis of
musical data extracted by various methods. The extracted information can
be utilized by deep learning algorithms to identify emotion from the
hidden features of the dataset. Deep learning-based convolutional neural
network (CNN) and long short-term memory-gated recurrent unit (LSTM-GRU)
models were developed to predict emotion from the musical information.
The musical features are extracted using fast Fourier transform (FFT)
based methods. Three deep learning models were developed in this work:
the first was based on extracted features such as the zero-crossing rate
and spectral roll-off; the second was built on Mel-frequency cepstral
coefficient (MFCC) features, using a deep and wide CNN combined with a
bidirectional LSTM-GRU model; and the third was built on features
extracted from Mel-spectrograms, feeding these two-dimensional (2D)
representations to a 2D CNN alongside LSTM models. The performance of
the proposed Mel-spectrogram model is compared on the F1 score,
precision, and classification report, showing better accuracy with
improved F1 and recall values compared with existing approaches.
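As a concrete illustration of the kind of feature extraction described above, the following sketch uses the librosa library to compute the zero-crossing rate, spectral roll-off, MFCCs, and a Mel-spectrogram from an audio file. The file path and parameter values are illustrative assumptions, not the paper's actual settings.

```python
# Minimal feature-extraction sketch with librosa; the file path and
# parameter choices are illustrative, not the paper's actual settings.
import librosa
import numpy as np

y, sr = librosa.load("song.wav", sr=22050)  # hypothetical input file

# Frame-level features used by the first model in the abstract.
zcr = librosa.feature.zero_crossing_rate(y)              # zero-crossing rate
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)   # spectral roll-off

# MFCC features for the second (CNN + bidirectional LSTM-GRU) model.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Mel-spectrogram as 2D input for the third (2D CNN + LSTM) model.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

print(zcr.shape, rolloff.shape, mfcc.shape, mel_db.shape)
```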
Mood Sensitive Music Recommendation System (IRJET Journal)
The document describes a mood-sensitive music recommendation system that uses facial expression analysis to determine a user's mood and recommend matching music. It analyzes the user's facial expressions in real-time using a webcam to infer their emotional state. The system then selects music that corresponds to the user's mood based on attributes like tempo and genre. For example, if the user seems sad, it may recommend slower, melancholic music, and if happy, more upbeat music. The goal is to provide a personalized listening experience and potentially improve the user's mood. The system could be applied in music streaming or retail environments.
IRJET - EMO-MUSIC (Emotion based Music Player) (IRJET Journal)
This document describes a proposed emotion-based music player system called EMO-MUSIC. The system uses facial expression recognition via a Haar cascade classifier to identify a user's emotion in real-time. It then generates a playlist of songs matching the detected emotion by accessing pre-defined music directories for each emotion category. This provides a more automated music selection process compared to traditional music players that require manual playlist selection. The system aims to reduce the time users spend browsing for music that suits their mood.
IRJET - Emotion based Music Recommendation System (IRJET Journal)
This document discusses an emotion-based music recommendation system that uses facial expression recognition to determine a user's mood and generate an appropriate playlist. It first discusses existing approaches to emotion detection in music, including models that represent emotions in two-dimensional spaces of arousal and valence. It then outlines the objectives of developing a new system using machine learning approaches to detect emotions from facial images in order to provide personalized music recommendations based on the user's mood. Finally, it reviews several related works that use techniques like Active Appearance Models, Bezier curve fitting, and support vector machines for facial expression analysis and emotion recognition from images.
Speech emotion recognition using 2D-convolutional neural network (IJECE, IAES)
This research proposes a speech emotion recognition model to predict human emotions using a convolutional neural network (CNN) by learning segmented audio of specific emotions. Speech emotion recognition utilizes features extracted from audio waves to learn speech emotion characteristics; one of them is the mel frequency cepstral coefficient (MFCC). The dataset plays a vital role in obtaining valuable results in model learning; hence, this research leverages a combination of datasets. The model learns the combined dataset with audio segmentation and zero padding using a 2D-CNN; segmentation and zero padding equalize the lengths of the extracted audio features so the model can learn their characteristics. The model achieves 83.69% accuracy in predicting seven emotions (neutral, happy, sad, angry, fear, disgust, and surprise) from the combined dataset with segmentation of the audio files.
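To make the segmentation and zero-padding idea concrete, here is a minimal sketch (my own illustration, not the paper's code) that cuts an audio signal into fixed-length segments and pads the tail with zeros so every segment has equal length for a 2D-CNN input.

```python
# Illustrative sketch of audio segmentation with zero padding so that
# every segment has the same length; not the paper's actual pipeline.
import numpy as np

def segment_with_padding(y, sr, seconds=3.0):
    """Split signal y into fixed-length segments, zero-padding the tail."""
    seg_len = int(sr * seconds)
    n_segments = int(np.ceil(len(y) / seg_len))
    padded = np.zeros(n_segments * seg_len, dtype=y.dtype)
    padded[: len(y)] = y  # original samples first, zeros fill the remainder
    return padded.reshape(n_segments, seg_len)

sr = 22050
y = np.random.randn(sr * 7)           # stand-in for a 7-second clip
segments = segment_with_padding(y, sr)
print(segments.shape)                 # (3, 66150): three 3-second segments
```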
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECHNIQUES (AM Publications)
Audio signals, which include speech, music, and environmental sounds, are important types of media. The problem of distinguishing audio signals into these different audio types is thus becoming increasingly significant. A human listener can easily distinguish between different audio types by listening to just a short segment of an audio signal; however, solving this problem with computers has proven very difficult. Nevertheless, many systems with modest accuracy can still be implemented. The experimental results demonstrate the effectiveness of our classification system. The complete system is developed using ANN techniques with an autonomic computing system.
SPEECH EMOTION RECOGNITION SYSTEM USING RNN (IRJET Journal)
This document discusses a speech emotion recognition system using recurrent neural networks (RNNs). It begins with an abstract describing speech emotion recognition and its importance. Then it provides background on speech emotion databases, feature extraction using MFCC, and classification approaches like RNNs. It reviews related work on speech emotion recognition using various methods. Finally, it concludes that MFCC feature extraction and RNN classification were used in the proposed system to take advantage of their performance in machine learning applications. The system aims to help machines understand human interaction and respond based on the user's emotion.
KNN: a machine learning approach to recognize a musical instrument (IJARIIT)
An outline is provided of a proposed system to recognize musical instruments using machine learning techniques. The system first extracts features from audio files using the MIR toolbox in Matlab. It then uses a hybrid feature selection method and vector quantization to identify instruments. Specifically, the key audio descriptors are selected and feature vectors are generated and matched to standard vectors to classify the instrument. The k-nearest neighbors algorithm is used for classification. Preliminary results show the system can accurately recognize instruments based on extracted acoustic features.
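The k-NN matching step can be sketched in a few lines; the feature matrix below is a random stand-in for the MIR-toolbox descriptors the summary mentions, so labels and dimensions are assumptions.

```python
# Minimal k-NN classification sketch; random features stand in for the
# acoustic descriptors extracted with the MIR toolbox in the summary.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 12))       # 120 clips, 12 descriptors
y_train = rng.integers(0, 4, size=120)     # 4 hypothetical instruments

knn = KNeighborsClassifier(n_neighbors=3)  # match against 3 nearest vectors
knn.fit(X_train, y_train)

X_new = rng.normal(size=(1, 12))
print(knn.predict(X_new))                  # predicted instrument label
```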
IRJET - Musical Therapy using Facial Expressions (IRJET Journal)
This document describes a proposed system for musical therapy using facial expression recognition. The system aims to automatically analyze a patient's facial expressions to determine their mood or emotion, then recommend a tailored music playlist intended to soothe or improve their mood. It discusses prior work on musical therapy and emotion recognition technologies. The proposed system would use image processing and machine learning algorithms like Fisherfaces to recognize emotions from facial images. It would also analyze music features and classify songs by genre using clustering algorithms. The goal is to match the identified emotion to an appropriate genre of music for therapy in a way that is more personalized and adaptive than traditional musical therapy.
An evolutionary optimization method for selecting features for speech emotion... (TELKOMNIKA Journal)
Human-computer interactions benefit greatly from emotion recognition from speech. To promote a contact-free environment during the coronavirus disease 2019 (COVID-19) pandemic, most digitally based systems used speech-based devices; consequently, emotion detection from speech has many beneficial applications for pathology. The vast majority of speech emotion recognition (SER) systems are designed based on machine learning or deep learning models and therefore need greater computing power and resources. This issue was addressed by developing traditional algorithms for feature selection. Recent research has shown that nature-inspired or evolutionary algorithms such as equilibrium optimization (EO) and cuckoo search (CS) based meta-heuristic approaches are superior to traditional feature selection (FS) models in terms of recognition performance. The purpose of this study is to investigate the impact of meta-heuristic feature selection approaches on emotion recognition from speech. To achieve this, we selected the Ryerson audio-visual database of emotional speech and song (RAVDESS) and obtained maximum recognition accuracy of 89.64% using the EO algorithm and 92.71% using the CS algorithm. As a final step, we plotted the associated precision and F1 score for each of the emotional classes.
IRJET - Study of Effect of PCA on Speech Emotion Recognition (IRJET Journal)
This document discusses speech emotion recognition using principal component analysis (PCA). It analyzes speech features like mel frequency cepstral coefficients, pitch, energy, and formant frequency from the Berlin database containing emotions like anger, sadness, happiness, and fear. PCA is applied to reduce the feature dimension and decorrelate features. A support vector machine classifier is then used to classify emotions based on the PCA-processed features. Results show that applying PCA improves classification accuracy compared to not using PCA, with accuracy increasing from 64.5% to 68% on average.
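A minimal sklearn pipeline mirroring this setup (PCA for dimensionality reduction followed by an SVM) might look like the following; the synthetic data and component count are placeholders, not the Berlin-database experiment.

```python
# Sketch of the PCA + SVM setup described above; the synthetic data and
# component count are placeholders, not the actual experiment.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 40))       # 200 utterances, 40 speech features
y = rng.integers(0, 4, size=200)     # anger / sadness / happiness / fear

model = make_pipeline(
    StandardScaler(),                # scale before PCA
    PCA(n_components=10),            # reduce and decorrelate features
    SVC(kernel="rbf"),               # classify in the reduced space
)
model.fit(X, y)
print(model.score(X, y))
```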
IRJET - Implementation of Emotion based Music Recommendation System using SVM... (IRJET Journal)
This document describes a proposed emotion-based music recommendation system that uses facial expression recognition and an SVM algorithm. The system aims to suggest songs to users based on their detected emotion state in order to save them time in manually selecting songs. It would use computer vision components like OpenCV to determine a user's emotion from facial expressions. Once an emotion is recognized, the SVM model would suggest a song matching that emotion. The system aims to automate mood-based playlist creation and improve the music enjoyment experience. It outlines the methodology, including using OpenCV for facial recognition, an SVM algorithm to classify emotions detected, natural language processing for chatbot responses, and IFTTT for response recording.
We propose a model for carrying out deep learning based multimodal sentiment analysis. The MOUD dataset is taken for experimentation purposes. We developed two parallel text-based and audio-based models and further fused these heterogeneous feature maps taken from intermediate layers to complete the architecture. Performance measures (accuracy, precision, recall, and F1-score) are observed to outperform the existing models.
Speech emotion recognition with light gradient boosting decision trees machine (IJECE, IAES)
Speech emotion recognition aims to identify the emotion expressed in the speech by analyzing the audio signals. In this work, data augmentation is first performed on the audio samples to increase the number of samples for better model learning. The audio samples are comprehensively encoded as the frequency and temporal domain features. In the classification, a light gradient boosting machine is leveraged. The hyperparameter tuning of the light gradient boosting machine is performed to determine the optimal hyperparameter settings. As the speech emotion recognition datasets are imbalanced, the class weights are regulated to be inversely proportional to the sample distribution where minority classes are assigned higher class weights. The experimental results demonstrate that the proposed method outshines the state-of-the-art methods with 84.91% accuracy on the Berlin database of emotional speech (emo-DB) dataset, 67.72% on the Ryerson audio-visual database of emotional speech and song (RAVDESS) dataset, and 62.94% on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
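The inverse-frequency class weighting described here is available directly in LightGBM's sklearn-style interface; below is a hedged sketch with synthetic imbalanced data standing in for the encoded speech-emotion features.

```python
# Sketch of class-weighted LightGBM training; synthetic imbalanced data
# stands in for the encoded frequency and temporal domain features.
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 60))
y = np.concatenate([np.zeros(240, int), np.ones(60, int)])  # imbalanced

# class_weight="balanced" makes weights inversely proportional to the
# class frequencies, as the abstract describes for minority classes.
clf = LGBMClassifier(class_weight="balanced", n_estimators=100)
clf.fit(X, y)
print(clf.predict(X[:5]))
```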
A Study to Assess the Effectiveness of Planned Teaching Programme on Knowledg... (IJTSRD)
Suctioning is a common procedure performed by nurses to maintain gas exchange, adequate oxygenation, and alveolar ventilation in critically ill patients under mechanical ventilation, and the aim of this research is to provide knowledge regarding maintaining airway patency with suctioning care that will help improve the quality of nursing care and eventually lead to better results. The planned study is a pre-experimental study to assess the effectiveness of a planned teaching programme on knowledge regarding airway patency in patients on mechanical ventilators among the B.Sc. internship students of a selected college of nursing at Moradabad. Its objectives are to assess the level of knowledge regarding maintaining airway patency in patients with mechanical ventilators among B.Sc. nursing internship students, and to assess the effectiveness of the planned teaching programme in terms of knowledge regarding airway patency among those students. The purpose of this study is to examine the association between knowledge and effectiveness regarding airway patency among B.Sc. nursing internship students and their selected demographic variables. A pre-experimental study was conducted among 86 participants, selected by a non-probability convenience sampling method. A demographic proforma and a self-structured questionnaire were used to collect the data from the B.Sc. internship students. Nafees Ahmed | Sana Usmani "A Study to Assess the Effectiveness of Planned Teaching Programme on Knowledge Regarding Maintaining Airway Patency in Patients with Mechanical Ventilator" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-1, December 2021, URL: https://www.ijtsrd.com/papers/ijtsrd47917.pdf Paper URL: https://www.ijtsrd.com/medicine/nursing/47917/a-study-to-assess-the-effectiveness-of-planned-teaching-programme-on-knowledge-regarding-maintaining-airway-patency-in-patients-with-mechanical-ventilator/nafees-ahmed
Speech Emotion Recognition Using Neural Networks (IJTSRD)
Speech is the most natural and easy method for people to communicate, and interpreting speech is one of the most sophisticated tasks that the human brain conducts. The goal of Speech Emotion Recognition (SER) is to identify human emotion from speech, since the tone and pitch of the voice frequently reflect underlying emotions. Librosa was used to analyse audio and music, SoundFile was used to read and write sampled sound file formats, and sklearn was used to create the model. The current study looked at the effectiveness of Convolutional Neural Networks (CNN) in recognising spoken emotions. The network's input characteristics are spectrograms of voice samples, and Mel Frequency Cepstral Coefficients (MFCC) are used to extract characteristics from the audio. Our own voice dataset is utilised to train and test the algorithms. The emotions of the speech (happy, sad, angry, neutral, shocked, disgusted) are determined based on the evaluation. Anirban Chakraborty "Speech Emotion Recognition Using Neural Networks" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-1, December 2021, URL: https://www.ijtsrd.com/papers/ijtsrd47958.pdf Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/47958/speech-emotion-recognition-using-neural-networks/anirban-chakraborty
Mehfil: Song Recommendation System Using Sentiment Detected (IRJET Journal)
This document describes a song recommendation system called Mehfil that uses sentiment analysis to recommend songs based on a user's detected mood. It has three main modules: sentiment analysis using facial recognition and emotion detection on images via a deep learning model, music recommendation by classifying songs based on audio features and assigning mood labels, and integration using the Spotify API to generate personalized playlists based on the detected sentiment. The system aims to make creating mood-based playlists easier by analyzing a user's facial expression in real-time with their webcam to infer their mood and select an appropriate playlist of songs. It discusses the technologies used like Haar Cascade for face detection, MobileNetV2 for sentiment classification, and the Spotify API for music metadata and
Emotion Recognition based on Speech and EEG Using Machine Learning Techniques... (HanzalaSiddiqui8)
This research investigates emotion recognition using both speech and EEG data through machine learning techniques, achieving 97% accuracy. An artificial neural network model is designed to effectively extract features and learn representations from the integrated speech and EEG data, demonstrating the potential of this multimodal approach. The high accuracy attained underscores applications in human-computer interaction, healthcare, and more.
A Review Paper on Speech Based Emotion Detection Using Deep Learning (IRJET Journal)
This document reviews techniques for speech-based emotion detection using deep learning. It discusses how deep learning techniques have been proposed as an alternative to traditional methods for speech emotion recognition. Feature extraction is an important part of speech emotion recognition, and deep learning can help minimize the complexity of extracted features. The document surveys related work on speech emotion recognition using techniques like deep neural networks, convolutional neural networks, recurrent neural networks, and more. It examines the limitations of current approaches and the potential for deep learning to improve speech-based emotion detection.
This document describes a student project on speech-based emotion recognition. The project uses convolutional neural networks (CNN) and mel-frequency cepstral coefficients (MFCC) to classify emotions in speech into categories like happy, sad, fearful, calm and angry. The proposed system provides advantages over existing systems by allowing variable length audio inputs, faster processing, and real-time classification of more emotion categories. It achieves a test accuracy of 91.04% according to the document.
Design and Analysis System of KNN and ID3 Algorithm for Music Classification... (IJECE, IAES)
Each piece of music that has been created has its own mood that it emits; therefore, much research in the Music Information Retrieval (MIR) field has been done on recognizing the mood of music. This research produced software to classify music by mood using the K-Nearest Neighbor and ID3 algorithms. Accuracy and average classification time are compared, based on the values produced by the music feature extraction process, which uses 9 types of spectral analysis on 400 training samples and 400 testing samples. The system outputs a classification label of mood type: contentment, exuberance, depression, or anxious. Classification using the KNN algorithm is good enough at 86.55% with k = 3 and an average processing time of 0.01021 seconds, whereas ID3 yields an accuracy of 59.33% and an average processing time of 0.05091 seconds.
Convolutional neural network with binary moth flame optimization for emotion... (IAES IJAI)
Electroencephalograph (EEG) signals can reflect brain activity in real time, and utilizing EEG signals to analyze human emotional states is a common line of study. The EEG signals of emotions are not distinctive and differ from one person to another, as everyone has different emotional responses to the same stimuli. This is why EEG signals are subject dependent and are proven effective for subject-dependent detection of emotions. To achieve enhanced accuracy and a high true positive rate, the suggested system proposes a binary moth flame optimization (BMFO) algorithm for feature selection and convolutional neural networks (CNNs) for classification. In this proposal, optimum features are chosen using accuracy as the objective function; the optimally chosen features are then classified with a CNN to discriminate between different emotion states.
Speech emotion recognition is a recent research topic in the Human Computer Interaction (HCI) field. The need has arisen for a more natural communication interface between humans and computers, as computers have become an integral part of our lives, and a lot of work is currently going on to improve that interaction. To achieve this goal, a computer would have to be able to assess its present situation and respond differently depending on that observation. Part of this process involves understanding a user's emotional state; to make human-computer interaction more natural, the objective is that a computer should be able to recognize emotional states in the same way a human does. The efficiency of an emotion recognition system depends on the type of features extracted and the classifier used for detection of emotions. The proposed system aims at identifying basic emotional states such as anger, joy, neutral, and sadness from human speech. While classifying different emotions, features like MFCC (Mel Frequency Cepstral Coefficient) and energy are used. In this paper, a standard emotional database, i.e., an English database, is used, which gives more satisfactory detection of emotions than recorded samples. This methodology describes and compares the performances of a Learning Vector Quantization Neural Network (LVQ NN), a multiclass Support Vector Machine (SVM), and their combination for emotion recognition.
Music Emotion Classification based on Lyrics-Audio using Corpus based Emotion... (IJECE, IAES)
Music has lyrics and audio, and both components can serve as features for music emotion classification. Lyric features were extracted from text data and audio features from audio signal data. In the classification of emotions, an emotion corpus is required for lyric feature extraction. Corpus Based Emotion (CBE) succeeded in increasing the F-measure for emotion classification on text documents. A music document has an unstructured format compared with an article text document, so it requires good preprocessing and conversion before the classification process. We used the MIREX dataset for this research. Psycholinguistic and stylistic features were used as lyric features. Psycholinguistic features relate to the category of emotion, and in this research CBE is used to support their extraction. Stylistic features relate to the use of unique words in the lyrics, e.g. 'ooh', 'ah', 'yeah', etc. Energy, temporal, and spectral features were extracted as audio features. The best test result for music emotion classification was achieved by applying Random Forest methods to the lyrics and audio features, with an F-measure of 56.8%.
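A toy sketch of this feature fusion plus Random Forest idea follows; random vectors stand in for the lyric and audio features, and the label set is an assumption.

```python
# Sketch of concatenating lyric and audio feature vectors and classifying
# emotion with a Random Forest; features here are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
lyric_feats = rng.normal(size=(150, 30))    # psycholinguistic + stylistic
audio_feats = rng.normal(size=(150, 20))    # energy, temporal, spectral
X = np.hstack([lyric_feats, audio_feats])   # fused feature vector
y = rng.integers(0, 5, size=150)            # MIREX-style emotion clusters

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]))
```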
This document provides a review of various emotion recognition systems using machine learning and deep learning techniques. It summarizes several papers that used different methods and algorithms for emotion recognition from facial expressions, speech signals, and handwriting. Convolutional neural networks (CNNs), support vector machines (SVMs), and recurrent neural networks (RNNs) were among the algorithms applied. The papers extracted features like MFCCs and analyzed techniques like data augmentation, but some methods had limitations like difficulty recognizing certain emotions or being affected by pose and illumination. Overall, the document reviewed emotion recognition research utilizing a range of inputs and machine learning approaches.
This document presents an audio-visual emotion recognition system that uses multiple modalities and machine learning techniques. It extracts audio features like MFCCs and visual features like facial landmarks from video clips. It uses classifiers like CNNs and stacks their confidence outputs to predict emotions. The system achieves state-of-the-art performance on several databases according to experiments. It represents an improvement over previous work by combining audio, visual and classifier fusion approaches for multimodal emotion recognition.
IRJET - Music Genre Recognition using Convolution Neural Network (IRJET Journal)
1. The document describes a study that uses a Convolutional Neural Network (CNN) model to classify music genres based on labeled Mel spectrograms of audio clips.
2. A CNN model is trained on a dataset of 1000 audio clips across 10 genres. The trained model is then used to classify new, unlabeled audio clips by genre based on their Mel spectrogram representation.
3. CNNs are well-suited for this task as their convolutional layers can extract hierarchical features from the Mel spectrogram images that are indicative of different genres. The study aims to develop an automated music genre classification system using deep learning techniques.
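A compact Keras sketch of such a CNN over Mel-spectrogram "images" follows; the layer sizes and input shape are illustrative assumptions, not the study's actual architecture.

```python
# Illustrative Keras CNN over Mel-spectrogram inputs for 10 genres;
# the layer sizes are assumptions, not the study's actual architecture.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),           # Mel-spectrogram "image"
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),      # one output per genre
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```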
Because of the rapid growth in technology breakthroughs, including
multimedia and cell phones, Telugu character recognition (TCR) has recently
become a popular study area. It is still necessary to construct automated and
intelligent online TCR models, even if many studies have focused on offline
TCR models. The Telugu character dataset construction and validation using
an Inception and ResNet-based model are presented. The collection of 645
letters in the dataset includes 18 Achus, 38 Hallus, 35 Othulu, 34×16
Guninthamulu, and 10 Ankelu. The proposed technique aims to efficiently
recognize and identify distinctive Telugu characters online. This model's main
pre-processing steps to achieve its goals include normalization, smoothing,
and interpolation. Improved recognition performance can be attained by using
stochastic gradient descent (SGD) to optimize the model's hyperparameters.
Scientific workload execution on a distributed computing platform such as a
cloud environment is time-consuming and expensive. The scientific workload
has task dependencies with different service level agreement (SLA)
prerequisites at different levels. Existing workload scheduling (WS) designs
are not efficient in assuring SLA at the task level. In addition, they induce
higher costs, as the majority of scheduling mechanisms reduce either time or
energy. To reduce cost, both energy and makespan must be optimized together
when allocating resources. No prior work has considered optimizing energy and
processing time together while meeting task-level SLA requirements. This paper
presents the task-level energy and performance assurance-workload scheduling
(TLEPA-WS) algorithm for distributed computing environments. The
TLEPA-WS guarantees energy minimization with the performance
requirement of the parallel application under a distributed computational
environment. Experimental results show a significant reduction in energy use
and makespan, thereby reducing the cost of workload execution in comparison
with various standard workload execution models.
Predicting human emotions in real-world scenarios requires investigating
human subjects. A significant number of psychological affects (feelings)
must be produced to directly elicit human emotions. The development of
affect theory leads one to believe that one must be aware of one's
sentiments and emotions to forecast one's behavior. The proposed line of
inquiry focuses on developing a reliable model that maps
neurophysiological data to actual feelings. Any change in emotional
affect directly elicits a response in the body's physiological systems.
This approach is based on the notion of Gaussian mixture models (GMM).
The statistical response after data processing, quantitative findings on
emotion labels, and coincident responses with training samples all
directly impact the outcomes achieved. The suggested method is evaluated
against a state-of-the-art technique in terms of statistical parameters
such as population mean and standard deviation. The proposed system
determines an individual's emotional state after a minimum of six
learning iterations of the Gaussian expectation-maximization (GEM)
statistical model, in which the iterations tend toward zero error. Each
iteration improves predictions while increasing the amount of value
extracted.
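For readers unfamiliar with fitting Gaussian mixtures via expectation-maximization, a minimal sklearn sketch follows; the 2D synthetic data is a stand-in for the paper's physiological features, and the component count and iteration cap are assumptions.

```python
# Minimal sketch of EM-fitted Gaussian mixtures for labelling emotional
# states; the 2D synthetic data stands in for physiological features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 2)),      # one emotional state
               rng.normal(4, 1, (100, 2))])     # another emotional state

# max_iter bounds the EM iterations; the abstract reports convergence
# after roughly six iterations of its GEM procedure.
gmm = GaussianMixture(n_components=2, max_iter=20, random_state=0)
gmm.fit(X)
print(gmm.predict(X[:5]), gmm.means_)
```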
Early diagnosis of cancers is a major requirement for patients and a
complicated job for the oncologist. If cancer is diagnosed early, the
patient is more likely to survive. For a few decades, fuzzy logic has been an
emphatic technique in the identification of diseases like different types of
cancers. The recognition of cancer diseases mostly operated with inexactness,
inaccuracy, and vagueness. This paper aims to design the fuzzy expert system
(FES) and its implementation for the detection of prostate cancer. Specifically,
prostate-specific antigen (PSA), prostate volume (PV), age, and percentage
free PSA (%FPSA) are used to determine prostate cancer risk (PCR), while
PCR serves as an output parameter. Mamdani fuzzy inference method is used
to calculate a range of PCR. The system provides a scale of risk of prostate
cancer and clears the path for the oncologist to determine whether their
patients need a biopsy. The system is fast, as it requires minimal
calculation and hence comparatively little time, which reduces mortality
and morbidity; it is more reliable than other economical systems and can
be used frequently by doctors.
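A hedged scikit-fuzzy sketch of a Mamdani-style rule base in this spirit is shown below; the universes, membership partitions, and the two rules are invented for illustration and are not the paper's actual fuzzy expert system.

```python
# Toy Mamdani-style fuzzy inference with scikit-fuzzy; the universes,
# membership partitions, and rules are invented for illustration only.
import numpy as np
from skfuzzy import control as ctrl

psa = ctrl.Antecedent(np.arange(0, 21, 1), "psa")     # ng/mL, assumed range
age = ctrl.Antecedent(np.arange(40, 91, 1), "age")    # years, assumed range
risk = ctrl.Consequent(np.arange(0, 101, 1), "risk")  # 0-100 risk scale

psa.automf(3, names=["low", "medium", "high"])
age.automf(3, names=["young", "middle", "old"])
risk.automf(3, names=["low", "medium", "high"])

rules = [
    ctrl.Rule(psa["high"] | age["old"], risk["high"]),
    ctrl.Rule(psa["low"] & age["young"], risk["low"]),
]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["psa"] = 12.0
sim.input["age"] = 67
sim.compute()
print(sim.output["risk"])   # defuzzified prostate-cancer-risk score
```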
The biomedical profession has gained importance due to the rapid and accurate diagnosis of clinical patients using computer-aided diagnosis (CAD) tools.
The diagnosis and treatment of Alzheimer’s disease (AD) using complementary multimodalities can improve the quality of life and mental state of patients.
In this study, we integrated a lightweight custom convolutional neural network
(CNN) model and nature-inspired optimization techniques to enhance the performance, robustness, and stability of progress detection in AD. A multi-modal
fusion database approach was implemented, including positron emission tomography (PET) and magnetic resonance imaging (MRI) datasets, to create a fused
database. We compared the performance of custom and pre-trained deep learning models with and without optimization and found that employing nature-inspired algorithms like the particle swarm optimization (PSO) algorithm significantly improved system performance. The proposed methodology,
which includes a fused multimodality database and optimization strategy, improved performance metrics such as training, validation, test accuracy, precision, and recall. Furthermore, PSO was found to improve the performance of
pre-trained models by 3-5% and custom models by up to 22%. Combining different medical imaging modalities improved the overall model performance by
2-5%. In conclusion, a customized lightweight CNN model and nature-inspired
optimization techniques can significantly enhance progress detection, leading to
better biomedical research and patient care.
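The abstract does not specify which hyperparameters PSO tunes, so the following is a minimal, self-contained PSO loop over an assumed two-dimensional space (learning rate, dropout), with a toy objective standing in for "train the CNN and return validation loss".

```python
# Bare-bones particle swarm optimization; search space and objective assumed.
import numpy as np

rng = np.random.default_rng(1)

def objective(p):
    # Placeholder for training the CNN at learning rate p[0], dropout p[1]
    # and returning validation loss; minimized near (1e-3, 0.3) by design.
    return (np.log10(p[0]) + 3) ** 2 + (p[1] - 0.3) ** 2

lo, hi = np.array([1e-5, 0.0]), np.array([1e-1, 0.8])
pos = rng.uniform(lo, hi, size=(20, 2))              # 20 particles
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()]

w, c1, c2 = 0.7, 1.5, 1.5                            # inertia and acceleration
for _ in range(50):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)                 # keep particles in bounds
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()]

print("best (learning rate, dropout):", gbest)
```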
Class imbalance is a pervasive issue in the field of disease classification from
medical images. It is necessary to balance out the class distribution while training a model. However, in the case of rare medical diseases, images from affected
patients are much harder to come by compared to images from non-affected
patients, resulting in unwanted class imbalance. Various processes of tackling
class imbalance issues have been explored so far, each having its fair share of
drawbacks. In this research, we propose an outlier detection based image classification technique which can handle even the most extreme case of class imbalance. We have utilized a dataset of malaria parasitized and uninfected cells. An
autoencoder model titled AnoMalNet is trained with only the uninfected cell images at the beginning and then used to classify both the affected and non-affected
cell images by thresholding a loss value. We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively,
performing better than large deep learning models and other published works.
As our proposed approach can provide competitive results without needing the
disease-positive samples during training, it should prove to be useful in binary
disease classification on imbalanced datasets.
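A minimal sketch of this outlier-detection idea, assuming a tiny convolutional autoencoder and random arrays in place of cell images (the published AnoMalNet architecture and threshold are not reproduced here):

```python
# Train an autoencoder on healthy images only; flag high reconstruction
# error as disease-positive. Architecture and threshold rule are assumed.
import numpy as np
from tensorflow.keras import Sequential, layers

auto = Sequential([
    layers.Input((64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(8, 3, activation="relu", padding="same"),
    layers.UpSampling2D(2),
    layers.Conv2D(3, 3, activation="sigmoid", padding="same"),  # reconstruction
])
auto.compile(optimizer="adam", loss="mse")

x_healthy = np.random.rand(256, 64, 64, 3)     # stand-in for uninfected images
auto.fit(x_healthy, x_healthy, epochs=5, batch_size=32, verbose=0)

def is_parasitized(images, threshold):
    # High reconstruction error means the image does not resemble the
    # healthy training distribution, so it is classified as affected.
    recon = auto.predict(images, verbose=0)
    errors = np.mean((images - recon) ** 2, axis=(1, 2, 3))
    return errors > threshold

# The threshold would be chosen from the error distribution on held-out
# healthy images (e.g., mean plus a few standard deviations).
```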
Recently, plant identification has become an active trend due to encouraging
results achieved in plant species detection and plant classification fields
among numerous available plants using deep learning methods. Therefore,
plant classification analysis is performed in this work to address the problem
of accurate plant species detection in the presence of multiple leaves together,
flowers, and noise. Thus, a convolutional neural network based deep feature
learning and classification (CNN-DFLC) model is designed to analyze
patterns of plant leaves and perform classification using generated fine-grained feature weights. The proposed CNN-DFLC model precisely estimates which plant species a given image belongs to. Several layers and
blocks are utilized to design the proposed CNN-DFLC model. Fine-grained
feature weights are obtained using convolutional and pooling layers. The
obtained feature maps in training are utilized to predict labels and model
performance is tested on the Vietnam plant image (VPN-200) dataset. This
dataset consists of a total of 20,000 images, and testing results are
achieved in terms of classification accuracy, precision, recall, and other
performance metrics. The mean classification accuracy obtained using the
proposed CNN-DFLC model is 96.42% considering all 200 classes from the
VPN-200 dataset.
Big data as a service (BDaaS) platform is widely used by various
organizations for handling and processing the high volume of data generated
from different internet of things (IoT) devices. Data generated from these IoT
devices are kept in the form of big data with the help of cloud computing
technology. Researchers are putting efforts into providing a more secure and
protected access environment for the data available on the cloud. In order to
create a safe, distributed, and decentralised environment in the cloud,
blockchain technology has emerged as a useful tool. In this research paper, we
have proposed a system that uses blockchain technology as a tool to regulate
data access that is provided by BDaaS platforms. We are securing the access
policy of data by using a modified form of ciphertext policy-attribute based
encryption (CP-ABE) technique with the help of blockchain technology. For
secure data access in BDaaS, algorithms have been created using a mix of CP-ABE and blockchain technology. The proposed smart contract algorithms are
implemented using Eclipse 7.0 IDE and the cloud environment has been
simulated using the CloudSim tool. Results for key generation time, encryption time, and decryption time have been calculated and compared with an access control mechanism without blockchain technology.
Internet of things (IoT) has become one of the eminent phenomena in human
life along with its collaboration with wireless sensor networks (WSNs), due
to enormous growth in the domain; there has been a demand to address the
various issues regarding it such as energy consumption, redundancy, and
overhead. Data aggregation (DA) is considered the basic mechanism for minimizing energy consumption and communication overhead; however,
security plays an important role where node security is essential due to the
volatile nature of WSN. Thus, we design and develop proximate node aware
secure data aggregation (PNA-SDA). In the PNA-SDA mechanism, additional
data is used to secure the original data, and further information is shared with
the proximate node; moreover, further security is achieved by updating the
state each time. Moreover, the node that does not have updated information is
considered a compromised node and discarded. PNA-SDA is evaluated considering different parameters such as average energy consumption and the average number of dead nodes; a comparative analysis is also carried out with the existing model in terms of throughput and correct packet identification.
Drones offer a promising direction in security applications since they are capable of conducting autonomous investigations. A recent advancement in unmanned aerial vehicle (UAV) communication is the internet of drones combined with 5G networks. Because of the rapid adoption of advanced computing frameworks alongside 5G networks, user information is continuously refreshed and pooled. Safety and confidentiality among clients are therefore vital, requiring an efficient authentication methodology built on a robust security key. Conventional procedures offer a few safeguards but struggle to handle attack patterns in data transmission over internet of drones (IoD) environments. A unique hyperelliptic curve (HEC) cryptography-based authentication system is proposed to provide protected data services among drones. The proposed method has been compared with existing methods in terms of packet loss rate, computational cost, and delay, thereby providing better insight into efficient and secure communication. Finally, the simulation results show that our strategy is efficient in both computation and communication.
Monitoring behavior, numerous actions, or any such information is considered
as surveillance and is done for information gathering, influencing, managing,
or directing purposes. Citizens employ surveillance to safeguard their
communities. Governments do this for the purposes of intelligence collection,
including espionage, crime prevention, the defense of a method, a person, a
group, or an item; or the investigation of criminal activity. Using an internet of things (IoT) rover instead of humans, an area can be secured with better secrecy and efficiency, providing an additional layer of safety. This paper discusses an IoT rover for remote surveillance built around a Raspberry Pi that can monitor a closed or open space. The rover allows for safer survey operations and helps reduce the risks involved.
In a world where climate change looms large the spotlight often shines on
greenhouse gases, but the shadow of man-made aerosols should not be
underestimated. These tiny particles play a pivotal role in disrupting Earth's
radiative equilibrium, yet many mysteries surround their influence on various
physical aspects of our planet. The root of these mysteries lies in the limited
data we have on aerosol sources, formation processes, conversion dynamics,
and collection methods. Aerosols, composed of particulate matter (PM),
sulfates, and nitrates, hold significant sway across the hemisphere. Accurate
measurement demands the refinement of in-situ, satellite, and ground-based
techniques. As aerosols interact intricately with the environment, their full
impact remains an enigma. A groundbreaking study in Morocco compared an internet of things (IoT) system with satellite-based atmospheric models, focusing on fine particles below 10 and 2.5 micrometers in diameter. The initial results, particularly in regions abundant
with extraction pits, shed light on the IoT system's potential to decode
aerosols' role in the grand narrative of climate change. These findings inspire
hope as we confront the formidable global challenge of climate change.
The use of technology can significantly reduce the consequences of accidents. Sensors, small components that detect the interactions experienced by various objects, play a crucial role in this regard. This study focuses on
how the MPU6050 sensor module can be used to detect the movement of
people who are falling, defined as the inability of the lower body, including
the hips and feet, to support the body effectively. An airbag system is
proposed to reduce the impact of a fall. The data processing method in this
study involves the use of a threshold value to identify falling motion. The
results of the study have identified a threshold value for falling motion,
including an acceleration relative (AR) value of less than or equal to 0.38 g,
an angle slope of more than or equal to 40 degrees, and an angular velocity
of more than or equal to 30 °/s. The airbag system is designed to inflate
faster than the time of impact, with a gas flow rate of 0.04876 m³/s and an inflating time of 0.05 s. The overall system has a specificity of 100%,
a sensitivity of 85%, and an accuracy of 94%.
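The reported thresholds translate directly into a small decision rule; the sketch below assumes the three conditions are combined conjunctively and that the quantities are already derived from the raw MPU6050 readings:

```python
# Direct encoding of the reported fall-detection thresholds; combining the
# three conditions with a logical AND is an assumption of this sketch.
def detect_fall(accel_relative_g: float, angle_deg: float,
                angular_vel_dps: float) -> bool:
    """Return True when all three reported thresholds are met."""
    return (
        accel_relative_g <= 0.38     # free-fall-like drop in relative acceleration
        and angle_deg >= 40.0        # body tilted past the slope threshold
        and angular_vel_dps >= 30.0  # fast rotation toward the ground
    )

# Example: a reading taken mid-fall would trip the airbag trigger.
print(detect_fall(0.25, 55.0, 42.0))   # True
print(detect_fall(0.95, 10.0, 5.0))    # False (normal standing)
```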
The fundamental principle of the paper is that the soil moisture sensor obtains
the moisture content level of the soil sample. The water pump is automatically
activated if the moisture content is insufficient, which causes water to flow
into the soil. The water pump is immediately turned off when the moisture
content is high enough. Smart home, smart city, smart transportation, and
smart farming are just a few of the new intelligent ideas that internet of things
(IoT) includes. The goal of this method is to increase productivity and
decrease manual labour among farmers. In this paper, we present a system for
monitoring and regulating water flow that employs a soil moisture sensor to
keep track of soil moisture content as well as the land’s water level to keep
track of and regulate the amount of water supplied to the plant. The device
also includes an automated LED lighting system.
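The pump logic amounts to threshold control with hysteresis; a minimal simulation, with assumed threshold values and a simulated sensor in place of real hardware, might look like this:

```python
# Threshold control with hysteresis for the watering loop; thresholds and
# the simulated sensor readings are assumptions, not the paper's wiring.
import itertools

DRY_THRESHOLD = 30   # % moisture at or below which the pump turns on (assumed)
WET_THRESHOLD = 60   # % moisture at or above which the pump turns off (assumed)

readings = itertools.cycle([25, 28, 40, 55, 62, 58, 35, 27])  # simulated sensor %

pump_on = False
for moisture in itertools.islice(readings, 16):
    if not pump_on and moisture <= DRY_THRESHOLD:
        pump_on = True            # soil too dry: start watering
    elif pump_on and moisture >= WET_THRESHOLD:
        pump_on = False           # soil wet enough: stop watering
    print(f"moisture={moisture}% pump={'ON' if pump_on else 'OFF'}")
```

The two separate thresholds prevent the pump from chattering on and off around a single set point.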
In order to provide sensing services to low-powered IoT devices, wireless sensor networks (WSNs) organize specialized transducers into networks. Energy usage is one of the most important design concerns in WSNs because it is very hard to replace or recharge the batteries in sensor nodes. For an energy-constrained network, the clustering technique is crucial in preserving battery life. By strategically selecting a cluster head (CH), a network's load can be balanced, resulting in decreased energy usage and an extended system life. Although clustering has been used predominantly in the literature, the concept of chain-based clustering has not yet been fully explored. As a result, in this paper, we employ a chain-based clustering architecture for data dissemination in the network. Furthermore, for CH selection, we employ the recently proposed coati optimization algorithm, which has demonstrated significant improvement over other optimization algorithms. In this method, the parameters considered for selecting the CH are energy, node density, distance, and the network's average energy. The simulation results show tremendous improvement over competitive cluster-based routing algorithms in terms of network lifetime, stability period (first node dead), transmission rate, and the network's power reserves.
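A hedged sketch of the kind of CH-selection fitness the abstract lists (energy, node density, distance, and average network energy); the weights and combination formula are assumptions, and a metaheuristic such as the coati optimization algorithm would search over candidate CH sets rather than simply ranking nodes as done here:

```python
# Score cluster-head candidates from the four stated criteria; weights,
# radii, and the linear combination are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n = 50
energy = rng.uniform(0.2, 1.0, n)           # residual energy per node (J)
positions = rng.uniform(0, 100, (n, 2))     # node coordinates (m)
sink = np.array([50.0, 50.0])

dist_to_sink = np.linalg.norm(positions - sink, axis=1)
# Density: neighbors within an assumed 20 m communication radius.
pairwise = np.linalg.norm(positions[:, None] - positions[None, :], axis=2)
density = (pairwise < 20).sum(axis=1) - 1
avg_energy = energy.mean()

# Higher energy and density are rewarded; long distance to the sink is
# penalized; being above the network's average energy earns a bonus.
w1, w2, w3, w4 = 0.4, 0.2, 0.3, 0.1
fitness = (
    w1 * energy / energy.max()
    + w2 * density / density.max()
    - w3 * dist_to_sink / dist_to_sink.max()
    + w4 * (energy > avg_energy)
)
print("best cluster-head candidates:", np.argsort(fitness)[-5:][::-1])
```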
The construction industry is always surrounded by uncertainties and risks. It has complex, tedious layouts and techniques characterized by unpredictable circumstances, and it requires a variety of human talents and the coordination of many areas and activities. In this competitive era, delays and cost overruns are common to almost every project, and their causes are equally common. One of the problems we address is the improper handling of materials at the construction site. In this paper, we propose a system capable of tracking construction material on site, which would give the contractor and client better control over on-site inventory and minimize the loss of material caused by theft and misplacement.
Today, health monitoring relies heavily on technological advancements. This
study proposes a low-power wide-area network (LPWAN) based, multinodal
health monitoring system to monitor vital physiological data. The suggested
system consists of two nodes, an indoor node, and an outdoor node, and the
nodes communicate via long range (LoRa) transceivers. Outdoor nodes use an
MPU6050 module, heart rate, oxygen pulse, temperature, and skin resistance
sensors and transmit sensed values to the indoor node. We transferred the data
received by the master node to the cloud using the Adafruit cloud service. The
system can operate with a coverage of 4.5 km, where the optimal distance
between the outdoor sensor nodes and the indoor master node is 4 km. To further detect falls, various machine learning classification techniques have been applied. Upon comparing the various classifiers, the decision tree
method achieved an accuracy of 0.99864 with a training and testing ratio of
70:30. By developing accurate prediction models, we can identify high-risk
individuals and implement preventative measures to reduce the likelihood of
a fall occurring. Remote monitoring of the health and physical status of elderly
people has proven to be the most beneficial application of this technology.
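The classification step can be sketched with scikit-learn using the stated 70:30 split; the synthetic features below stand in for the MPU6050 and vital-sign readings, so the printed accuracy will not match the reported 0.99864:

```python
# Decision-tree fall classifier with a 70:30 train/test split; the
# synthetic feature matrix and labeling rule are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 6))              # accel x/y/z, gyro x/y/z (assumed)
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)   # toy fall/no-fall labeling rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42   # the paper's 70:30 ratio
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```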
The effectiveness of an adaptive filter depends mainly on the design technique and the adaptation algorithm. The most common adaptation technique is least mean square (LMS), owing to its computational simplicity. The application depends on the adaptive filter configuration used; such filters are well known for system identification and real-time applications. In this work, a modified delayed μ-law proportionate normalized least mean square (DMPNLMS) algorithm is proposed. It is an improved version of the μ-law proportionate normalized least mean square (MPNLMS) algorithm. The algorithm is realized using a Ladner-Fischer type parallel prefix logarithmic adder to reduce the silicon area. The simulation and implementation of the very large-scale integration (VLSI) architecture are done using MATLAB, the Vivado suite, and a complementary metal-oxide-semiconductor (CMOS) 90 nm technology node with the Cadence RTL and Genus compilers, respectively. The DMPNLMS method exhibits a reduction in mean square error, a higher rate of convergence, and greater stability. The synthesis results demonstrate that it is area- and delay-effective, making it practical for applications where a faster operating speed is required.
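For orientation, the baseline normalized-LMS adaptation that the DMPNLMS family builds on can be written in a few lines; the μ-law proportionate step-size scaling and the delayed, parallel-prefix hardware realization are deliberately omitted, so this is the underlying rule, not the proposed algorithm:

```python
# Normalized LMS for system identification: adapt weights w so that the
# filter output tracks the unknown system h_true. Parameters are assumed.
import numpy as np

rng = np.random.default_rng(4)
L = 16                                  # filter length (assumed)
h_true = rng.normal(size=L)             # unknown system to identify
x = rng.normal(size=5000)               # excitation signal
d = np.convolve(x, h_true)[: len(x)] + 0.01 * rng.normal(size=len(x))

w = np.zeros(L)                         # adaptive weights
mu, eps = 0.5, 1e-6                     # step size and regularizer (assumed)
for n in range(L - 1, len(x)):
    u = x[n - L + 1 : n + 1][::-1]      # most recent L input samples
    e = d[n] - w @ u                    # a priori error
    w += mu * e * u / (u @ u + eps)     # NLMS: step normalized by input power

print("weight error norm:", np.linalg.norm(w - h_true))
```

Proportionate variants such as MPNLMS replace the single step size with a per-coefficient gain, which speeds up convergence for sparse systems.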
The increasing demand for faster, robust, and efficient devices, and the development of enabling technology for the mass production of industrial circuit designs, present challenges such as size, efficiency, power, and scalability. This paper presents the design and analysis of a low-power, high-speed full adder using negative capacitance field-effect transistors. A comprehensive study is performed with adiabatic logic and reversible logic. The performance of the full adder is studied with the metal-oxide-semiconductor field-effect transistor (MOSFET) and the negative capacitance field-effect transistor (NCFET). The NCFET-based full adder offers low power and high speed compared with a conventional MOSFET design. The complete design and analysis are performed using Cadence Virtuoso. The adiabatic logic offers a low delay of 0.023 ns, and the reversible logic offers a low power of 7.19 mW.
The global agriculture system faces significant challenges in meeting the
growing demand for food production, particularly given projections that food
production will need to increase by roughly 70% by 2050. Hydroponic farming is an
increasingly popular technique in this field, offering a promising solution to
these challenges. This paper will present the improvement of the current
traditional hydroponic method by providing a system that can be used to
monitor and control the important elements to help plants grow smoothly. The proposed system is efficient and user-friendly enough to be used by anyone. It combines a traditional hydroponic system,
an automatic control system and a smartphone. The primary objective is to
develop a smart system capable of monitoring and controlling potential
hydrogen (pH) levels, a key factor that affects hydroponic plant growth.
Ultimately, this paper offers an alternative approach to address the challenges
of the existing agricultural system and promote the production of clean,
disease-free, and healthy food for a better future.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper functioning of integrated circuits (ICs) in a hostile electromagnetic environment has been a serious concern throughout the decades of revolution in electronics, from discrete devices to today's integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry, and smart vehicles in particular, confronts design issues such as susceptibility to electromagnetic interference (EMI). Electronic control devices compute incorrect outputs because of EMI, and sensors report misleading values, which can prove fatal in automotive applications. In this paper, the authors review, non-exhaustively, research work concerned with the investigation of EMI in ICs and the prediction of this EMI using various modelling methodologies and measurement setups.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and Long Short-Term Memory (LSTM) algorithms. We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
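A compact Keras sketch of a hybrid CNN-LSTM classifier of this type, where convolutions extract local patterns from each traffic window and an LSTM models their order; the window length, feature count, and layer sizes are assumptions rather than the paper's exact topology:

```python
# Hybrid CNN-LSTM intrusion classifier; input shape and sizes are assumed.
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Input((50, 30)),                   # 50 timesteps x 30 DNP3 features (assumed)
    layers.Conv1D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 3, activation="relu", padding="same"),
    layers.LSTM(64),                          # temporal modelling of conv features
    layers.Dense(32, activation="relu"),
    layers.Dense(2, activation="softmax"),    # attack vs. normal traffic
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```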
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet has forced the United Nations and governments to promote green energy and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems has gained stronger momentum due to their numerous advantages over fossil-fuel alternatives. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces a hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system, including the required equations to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram, which sets the priorities and requirements of the system, is presented. The proposed approach allows setups to improve their power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy farmer support the theoretical work and highlight its benefits to existing plants. The short return on investment supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Emotion classification for musical data using deep learning techniques
International Journal of Reconfigurable and Embedded Systems (IJRES)
Vol. 12, No. 2, July 2023, pp. 240-247
ISSN: 2089-4864, DOI: 10.11591/ijres.v12.i2.pp240-247
Journal homepage: http://ijres.iaescore.com

Emotion classification for musical data using deep learning techniques
Gaurav Agarwal1, Sachi Gupta2, Shivani Agarwal3, Atul Kumar Rai4
1 Department of Computer Science and Engineering, KIET Group of Institutions, Ghaziabad, India
2 Department of Information Technology, IMS Engineering College, Ghaziabad, India
3 Department of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India
4 Department of Computer Science and Engineering, Kothiwal Institute of Technology and Professional Studies, Moradabad, India
Article history: Received Oct 8, 2022; Revised Dec 10, 2022; Accepted Feb 18, 2023

ABSTRACT
This research is done based on the identification and thorough analyzing
musical data that is extracted by the various method. This extracted
information can be utilized in the deep learning algorithm to identify the
emotion, based on the hidden features of the dataset. Deep learning-based
convolutional neural network (CNN) and long short-term memory-gated
recurrent unit (LSTM-GRU) models were developed to predict the
information from the musical information. The musical dataset is extracted
using the fast Fourier transform (FFT) models. The three deep learning
models were developed in this work the first model was based on the
information of extracted information such as zero-crossing rate, and spectral
roll-off. Another model was developed on the information of Mel frequency-
based cepstral coefficient (MFCC) features, the deep and wide CNN
algorithm with LSTM-GRU bidirectional model was developed. The third
model was developed on the extracted information from Mel-spectrographs
and untied these graphs based on two-dimensional (2D) data information to
the 2D CNN model alongside LSTM models. Proposed model performance
on the information from Mel-spectrographs is compared on the F1 score,
precision, and classification report of the models. Which shows better
accuracy with improved F1 and recall values as compared with existing
approaches.
Keywords: Deep belief network; Deep convolutional neural network; Deep neural networks; Fast Fourier transform; Long short-term memory
This is an open access article under the CC BY-SA license.
Corresponding Author:
Gaurav Agarwal
Department of Computer Science and Engineering, KIET Group of Institutions
Ghaziabad, Uttar Pradesh, India
Email: gaurav13shaurya@gmail.com
1. INTRODUCTION
In human-to-human communication, emotion plays a significant role because human contact does not include only language but also non-verbal cues like body gestures, tone of voice, hand gestures, and facial expressions [1]-[3]. Thus, emotion recognition has been an essential inter-disciplinary research topic in various fields, such as call center applications, psychology, and text-to-speech engines. Nowadays, emotion in music, audio, or lyrics plays a vital role in day-to-day human life and even more
so in the digital age. There is a strong relation between music and emotion. Over the last quarter-century,
many researchers recognized and classified emotion in music [4], [5]. In music information retrieval (MIR),
automatic emotion recognition from music is an active task. There are many applications for music emotion
recognition in the field of music information retrieval, such as classification of music emotion, music generation, instrument recognition, music source separation, and music recommender systems. Consequently,
emotion-based MIR has attracted attention in both academia and industry. Academic emotion-analysis systems for music signals include Mood Track, Music Sense, Mood Cloud, and Moody. On the industry side, MIR has also received attention because many music companies use emotion as a cue for retrieval, as in services such as Gracenote, MoodLogic, Musicovery, and Syntonetic [6], [7].
Determining the emotional category of music from audio or lyrics is quite challenging because of the emotion labeling of music excerpts, feature extraction from the audio signals, and the choice of classification algorithm in the music emotion recognition (MER) system. A MER system is a subpart of music information retrieval. The MER system has multiple application areas in emotion recognition, for example, music suggestion systems, automatic creation of music playlists, and music therapy. Music emotion recognition systems are built on music databases. There are two types of approaches to databases in the MER system: categorical and dimensional. The dimensional approach contains the two dimensions valence and arousal. On the other side, emotions in the categorical approach are characterized as sad, happy, angry, and fearful [8]-[11]. The main aim of a music emotion recognition system is to determine the emotional content of music by using deep learning or machine learning algorithms to classify emotion from music signals, together with signal processing techniques such as the discrete Fourier transform and fast Fourier transform, respectively, for converting windowed frames into a magnitude spectrum. The most widely utilized energy-based features for effective emotion recognition in music are perceptual-based linear prediction cepstrum coefficients (PLPCC), Mel frequency-based cepstral coefficients (MFCC), Mel energy-based cepstrum dynamic coefficients (MEDC), and linear predictor coefficients (LPC), respectively [12].
Further, emotion recognition from lyrics features is based on natural language processing, which uses various text classification methods for feature extraction from text data. The final stage of emotion recognition from audio or lyrics is the classification stage, and the most recent work addresses emotion classification by machine learning or deep learning. Deep learning-based models include the deep belief network (DBN), deep convolutional neural network (DCNN) [13]-[15], deep neural networks (CNN or DNN), and long short-term memory (LSTM); on the other side, machine learning-based models such as the Naive Bayes classifier, Gaussian mixture model (GMM), random forest (RF), K-nearest neighbor (KNN), and support vector machine (SVM) are used as classification models for music emotion recognition [16], [17]. Also, emotion classification from music has been done
using combinations of models represented as multi-modal systems, such as bidirectional encoder representations from transformers (BERT) with LSTM, CNN with LSTM, and CNN with Bi-LSTM. Using a multi-modal emotion classification model improves accuracy for emotion classification in both audio and lyrics [18]-[20].
Music emotion detection is a challenging task because emotions are subjective [21]. So, an
optimized approach is required to improve the music emotion recognition accuracy. The major contribution
of the presented methodology is to recognize the emotions of music with enhanced accuracy. The recognition
accuracy generally depends on the effective feature extraction as well as on the selection of the features.
Therefore, the proposed methodology uses an innovative pre-processing approach and also develops a
hybrid classifier for emotion classification. As motivation, a summary of music emotion classification (MEC) shows that MEC is an essential technique for dealing with huge amounts of information about music. Traditional algorithm-based methods for music emotion classification are at a disadvantage because they are time consuming and have low accuracy; a single-model analysis cannot fully express music emotion, so multi-model analysis is utilized for music emotion recognition from the audio signal [22]-[24]. Fast learning speed and high classification accuracy are therefore achieved in music emotion classification by using a multi-modal fusion model based on audio cues. Both the performance of the multimodal fusion analysis and the dynamic classification performance can be improved by fusing the emotional information in music audio. Thus, in this research work, a deep learning-based convolutional neural network (CNN) and an ensemble LSTM-GRU model are utilized for music emotion classification, where feature extraction is done by the CNN and emotion classification by the ensemble LSTM-GRU model.
2. RESEARCH METHOD
2.1. Data collection
For this research work, we used the music emotion classes dataset, which is available as Kaggle open-source data. This dataset is used to perform music emotion classification for different emotions. It consists of 10,133 data samples covering five different emotions in music signals, namely happy, sad, romantic, dramatic, and aggressive. The dataset contains various audio features and MFCC features for musical clips, with the MFCC feature comprising 20 coefficients.
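The paper does not state which toolkit produced the dataset, but features of this kind are commonly extracted with librosa; the sketch below, with a placeholder clip path, illustrates the 20-coefficient MFCC and the audio features named in this section:

```python
# Extract the MFCC and spectral features named in the text with librosa;
# "clip.wav" is a placeholder path, not a file from the paper's dataset.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav")

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # 20 MFCC coefficients
chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # chroma STFT
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
zcr = librosa.feature.zero_crossing_rate(y)
rmse = librosa.feature.rms(y=y)

# One fixed-length sample per clip: average each feature over time.
features = np.hstack([f.mean(axis=1) for f in
                      (mfcc, chroma, centroid, bandwidth, rolloff, zcr, rmse)])
print(features.shape)
```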
2.2. Data visualization
We analyze the percentage of each of the five emotions (happy, aggressive, sad, romantic, and dramatic) in the music emotion classes dataset. Dramatic and happy emotions have higher percentages, 23.5% and 22.2% respectively, than the other three emotions across the audio and MFCC features. After analyzing the correlation between the audio features, scatter plots are drawn for the five emotion types between i) spectral bandwidth and zero-crossing rate and ii) chroma short-time Fourier transform (STFT) and spectral roll-off. These plots cluster the different audio emotion categories according to the various features, and all these characteristics are further utilized in the deep learning model for recognizing the different emotion patterns.
2.3. Proposed methodology
A CNN is a deep learning technique. For classification purposes, the CNN relies on a SoftMax unit. CNNs are commonly used to analyze visual images, as in facial recognition, pattern recognition, image processing, and object detection, and they provide good classification of data. The CNN is a systematic neural network structure with multiple layers. A CNN consists of three kinds of layers: i) convolution layers, ii) pooling layers, and iii) a fully connected layer with a SoftMax unit, arranged as two-dimensional planes. The CNN models the input by extracting features through an abstract feature pipeline. One or more convolutional layers are utilized. Each convolution layer produces a feature map for each of its filters. During the convolution process, each local feature is calculated using a convolutional kernel, which comprises a weight and a bias. Each local component of the input vector is convolved with the filter, which is the basis of regional connections. For a single convolution kernel, the weight and height are represented by 𝑊 and 𝐹 respectively, and the input vector and bias by 𝑋𝑖 and 𝑁𝑖 respectively. The output of the convolution layers is then obtained by applying the rectified linear unit (ReLU) activation function or another non-linear function, which computes the final convolution feature. The output of the convolutional layer passes to the pooling layers. After the convolution process 𝑟𝑖, the pooling layer aims to reduce the pattern resolution and the computational load by taking the mean value of every subsection of the n*n matrix. During the pooling process, all the feature maps are reduced by the max-pooling operation using the filter size. As in any other neural network, the fully connected part consists of an input layer, hidden layers, and an output layer, all fully connected. The outcome of the pooling process is passed to a fully connected layer and classified by the SoftMax unit after the flatten layer transforms the features into a feature vector.
2.4. CNN and LSTM model layers and configurations
2.4.1. A deep learning model was developed based on the audio features dataset
For this model, the following six features of the extracted musical data are utilized to train the model. Musical attributes are generally organized into four to eight diverse categories, with each category specifying different concepts. The categories include dynamics, tone color/timbre, rhythm, harmony, musical form, melody, musical texture, and expressive techniques.
- Chroma STFT
- RMSE
- Spectral centroid of the music
- Spectral bandwidth of the music
- Roll-off features
- Zero crossing rate of the music
Based on this information, the audio file features were trained with the CNN and bi-directional LSTM models using the categorical cross-entropy loss function. Table 1 contains the information about the CNN-LSTM model parameters for the various audio features. For this model, embedding layers were first utilized to combine the two deep learning models, CNN and bi-directional LSTM. The initial dimension of the embedding vector is 32. The sequential model uses 64 filters with a kernel size of 5 and the ReLU activation function in the 1-D convolution layers. The next layer is max pooling, which reduces the resolution with 4*4 filters, after which the information is flattened and passed to the bi-directional LSTM model, configured with 400 gated recurrent unit (GRU) perceptrons. After this, the CNN and LSTM models are embedded together.
Figure 1 shows the training and validation accuracy and loss of the first model trained with the various extracted music features.
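For concreteness, the layer stack of Table 1 can be reproduced in Keras; the vocabulary size of 500 is inferred from the 16,000 embedding parameters (500 x 32) and the dense-layer activations are assumptions, while the remaining sizes follow the table:

```python
# Reproduce the Table 1 layer stack; vocabulary size and dense activations
# are assumptions, the layer sizes and parameter counts match the table.
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Input((20,)),                       # 20 feature tokens per clip
    layers.Embedding(input_dim=500, output_dim=32),  # 16,000 params (500 x 32, vocab assumed)
    layers.Conv1D(64, kernel_size=5, padding="same", activation="relu"),  # 10,304 params
    layers.MaxPooling1D(4),                    # (None, 5, 64)
    layers.LSTM(300),                          # 438,000 params
    layers.Dense(256, activation="relu"),      # 77,056 params
    layers.Dense(128, activation="relu"),      # 32,896 params
    layers.Dense(64, activation="relu"),       # 8,256 params
    layers.Dense(5, activation="softmax"),     # 325 params: five emotion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # total 582,837 trainable parameters, matching Table 1
```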
Table 1. CNN-Bi-LSTM model for the MFCC features

Layer (type)                 | Output shape   | Param #
Embedding (Embedding)        | (None, 20, 32) | 16,000
Conv1d_1 (Conv1D)            | (None, 20, 64) | 10,304
Max_pooling1d (MaxPooling1D) | (None, 5, 64)  | 0
LSTM (LSTM)                  | (None, 300)    | 438,000
Dense_1 (Dense)              | (None, 256)    | 77,056
Dense_2 (Dense)              | (None, 128)    | 32,896
Dense_3 (Dense)              | (None, 64)     | 8,256
Dense_4 (Dense)              | (None, 5)      | 325

Total params: 582,837; Trainable params: 582,837; Non-trainable params: 0
Figure 1. Training accuracy and loss of the first model trained with various music extracted features
2.4.2. Deep CNN and LSTM model
The deep CNN and LSTM model for the MFCC channel dataset utilizes the 20 MFCC channels. All these features are thoroughly analyzed statistically, and based on this information the hybrid deep learning model was developed. Figure 2 depicts the training-validation accuracy and loss of the model trained with the MFCC features. Table 2 shows the CNN-Bi-LSTM model parameters for the MFCC features. This model is configured very similarly to Table 1, but the deep learning model configuration and the embedding layer were reconfigured based on the dataset information. The model is reconfigured with 32 filters, increasing exponentially in filter count, and a max-pooling layer configured with 64 filters for extracting features from the dataset. The dataset consists of 20 features, which is more than the first model, so the filters are able to extract a greater number of features with the embedded bi-directional LSTM model; the model configuration is displayed in Table 1.
Figure 2. Training-validation accuracy and loss of the model trained with the MFCC features
Table 2. First model classification report (performance analysis)

Class        | Precision | Recall | F1-score | Support
0            | 0.44      | 0.32   | 0.37     | 487
1            | 0.42      | 0.27   | 0.33     | 578
2            | 0.32      | 0.68   | 0.45     | 576
3            | 0.37      | 0.24   | 0.29     | 408
4            | 0.85      | 0.61   | 0.71     | 485
Accuracy     |           |        | 0.73     | 2534
Micro avg    | 0.75      | 0.73   | 0.73     | 2534
Weighted avg | 0.75      | 0.73   | 0.73     | 2534
Dataset used: the international society for music information retrieval 2012 dataset (ISMIR2012) [25]. ISMIR2012 is a popular music dataset used for emotion recognition that consists of various types of songs in two languages, English and Hindi. The total number of songs in the English language is 2,886, of which 759 are sad songs, 746 are happy songs, 636 are angry songs, and 745 are relaxing songs. Similarly, the total number of songs in the Hindi language is 1,037, of which 200 are happy, 216 are sad, 283 are angry, and 338 are relaxing. This collection of emotional songs is used for the categorization of different emotions.
3. RESULTS AND DISCUSSION
As shown in Figure 1, model 1 was developed based on the musical features and achieves a training accuracy of only about 40%. The features become stuck in a local minimum, in which the training parameters and the weights of the model saturate, sometimes because of too few epochs or too few training features. This is why the model plateaus at 40% accuracy with a saturated loss; with these parameters, the model cannot be trained or tuned further for predicting and analyzing emotion from the musical information.

The second model was developed based on musical information extracted as the 20-channel MFCC features. These 20 features are able to extract a greater number of hidden features from the music. This model was trained for 200 epochs, but once training crossed 100 epochs the model saturated at 72% accuracy, stuck with limited information about the musical features. From these two conditions of model 1 and model 2, it is clear that identifying emotion from music requires extracting more hidden features; for that purpose, Mel-spectrographs are utilized in the third model. The classification report of the model trained with MFCC information can be analyzed in Table 3.
Table 3. Classification report of model trained with MFCC information

Class        | Precision | Recall | F1-score | Support
0            | 0.62      | 0.74   | 0.67     | 487
1            | 0.68      | 0.83   | 0.75     | 578
2            | 0.88      | 0.67   | 0.76     | 576
3            | 0.71      | 0.80   | 0.76     | 408
4            | 0.85      | 0.61   | 0.71     | 485
Accuracy     |           |        | 0.73     | 2534
Micro avg    | 0.75      | 0.73   | 0.73     | 2534
Weighted avg | 0.75      | 0.73   | 0.73     | 2534
In the third model, the Mel-spectrographs of the music are utilized to train the model, based on the outcomes of the first two models. The training accuracy and loss of the third model with the Mel-spectrographs of the musical information are depicted in Figure 3. This third model is trained for 210 epochs and reaches 94% accuracy during training; during validation and testing, the average and weighted accuracy is around 95%, as can be analyzed in Table 4. Mel-spectrographs are 2D datasets, so a 2D CNN model is utilized. This model extracts a greater number of hidden features by applying multiple filters to the dataset, which is not possible with the 1D data used in the two previous models, and this is why it generates better prediction accuracy than the other two models.
Figure 3. Training accuracy and loss of the model with the Mel-spectrographs of the musical information
Table 4. Model performance on the information from Mel-spectrographs

Class        | Precision | Recall | F1-score | Support
0            | 1.00      | 1.00   | 1.00     | 620
1            | 0.90      | 1.00   | 0.95     | 735
2            | 0.92      | 1.00   | 0.96     | 633
3            | 1.00      | 0.88   | 0.94     | 493
4            | 1.00      | 0.85   | 0.92     | 559
Accuracy     |           |        | 0.95     | 3040
Micro avg    | 0.96      | 0.95   | 0.95     | 3040
Weighted avg | 0.96      | 0.95   | 0.95     | 3040
4. CONCLUSION
With the development of deep learning techniques and digital audio technology, the music emotion recognition (MER) system has gradually emerged as a research hotspot. At present, deep learning techniques are gradually becoming mainstream for the MEC problem. The focus of music emotion classification research is to design a proficient and robust model for the recognition of emotion. This research work therefore provided a detailed review of deep learning-based architectures and acoustic features for music emotion classification. The proposed method is evaluated on the music emotion classes dataset. The deep learning models and their layer-wise architectures are explained in detail for the classification of the emotions dramatic, happy, aggressive, sad, and romantic. A CNN and an ensemble LSTM-GRU method are proposed for music emotion classification. First, audio and spectrogram features are extracted from the music signal; various feature extraction techniques based on audio and spectrograph features were examined. After that, additional training samples are created from the spectrogram images through data augmentation, and the CNN performs feature extraction from the spectrogram images. Deep learning models were built for both the audio and spectrograph features, and from this analysis the deep learning model based on spectrograph features obtained higher accuracy than the one based on audio features.
ACKNOWLEDGEMENTS
The authors thank their organizations for providing full support in the completion of this manuscript, as well as all the authors whose manuscripts were reviewed and considered for this work.
REFERENCES
[1] K. Pyrovolakis, P. Tzouveli, and G. Stamou, “Multi-modal song mood detection with deep learning,” Sensors, vol. 22, no. 3, Jan.
2022, doi: 10.3390/s22031065.
[2] R. Panda, R. Malheiro, and R. P. Paiva, “Audio features for music emotion recognition: a survey,” IEEE Transactions on
Affective Computing, vol. 14, no. 1, pp. 68–88, Jan. 2023, doi: 10.1109/TAFFC.2020.3032373.
[3] B. Maji, M. Swain, and Mustaqeem, “Advanced fusion-based speech emotion recognition system using a dual-attention
mechanism with conv-caps and Bi-GRU features,” Electronics (Switzerland), vol. 11, no. 9, Apr. 2022, doi:
10.3390/electronics11091328.
[4] R. A. Khalil, E. Jones, M. I. Babar, T. Jan, M. H. Zafar, and T. Alhussain, “Speech emotion recognition using deep learning
techniques: a review,” IEEE Access, vol. 7, pp. 117327–117345, 2019, doi: 10.1109/ACCESS.2019.2936124.
[5] G. Agarwal and H. Om, “An efficient supervised framework for music mood recognition using autoencoder-based optimised
support vector regression model,” IET Signal Processing, vol. 15, no. 2, pp. 98–121, Apr. 2021, doi: 10.1049/sil2.12015.
[6] J. Yang, “A novel music emotion recognition model using neural network technology,” Frontiers in Psychology, vol. 12, Sep.
2021, doi: 10.3389/fpsyg.2021.760060.
[7] C. Chen and Q. Li, “A multimodal music emotion classification method based on multifeature combined network classifier,”
Mathematical Problems in Engineering, vol. 2020, pp. 1–11, Aug. 2020, doi: 10.1155/2020/4606027.
[8] N. Zafar, I. U. Haq, J. U. R. Chughtai, and O. Shafiq, “Applying hybrid LSTM-GRU model based on heterogeneous data sources
for traffic speed prediction in urban areas,” Sensors, vol. 22, no. 9, p. 3348, Apr. 2022, doi: 10.3390/s22093348.
[9] M. S. N. V Jitendra, “A review: music feature extraction from an audio signal,” International Journal of Advanced Trends in
Computer Science and Engineering, vol. 9, no. 2, pp. 973–980, Apr. 2020, doi: 10.30534/ijatcse/2020/11922020.
[10] G. Liu and Z. Tan, “Research on multi-modal music emotion classification based on audio and lyirc,” in 2020 IEEE 4th
Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Jun. 2020, pp. 2331–2335. doi:
10.1109/ITNEC48623.2020.9084846.
[11] X. Jia, “Music emotion classification method based on deep learning and improved attention mechanism,” Computational
Intelligence and Neuroscience, vol. 2022, pp. 1–8, Jun. 2022, doi: 10.1155/2022/5181899.
[12] F. Chenchah and Z. Lachiri, “Acoustic emotion recognition using linear and nonlinear cepstral coefficients,” International
Journal of Advanced Computer Science and Applications, vol. 6, no. 11, 2015, doi: 10.14569/ijacsa.2015.061119.
[13] G. Agarwal, V. Maheshkar, S. Maheshkar, and S. Gupta, “Recognition of emotions of speech and mood of music: a review,” in
WIDECOM 2018: International Conference on Wireless, Intelligent, and Distributed Environment for Communication, vol. 18,
2018, pp. 181–197, doi: 10.1007/978-3-319-75626-4_14.
[14] I. J. Tashev, Z. Q. Wang, and K. Godin, “Speech emotion recognition based on gaussian mixture models and deep neural
networks,” in 2017 Information Theory and Applications Workshop, ITA 2017, Feb. 2017, pp. 1–4, doi:
10.1109/ITA.2017.8023477.
[15] X. Jia, “A music emotion classification model based on the improved convolutional neural network,” Computational Intelligence
and Neuroscience, vol. 2022, pp. 1–11, Feb. 2022, doi: 10.1155/2022/6749622.
[16] G. Tong, “Music emotion classification method using improved deep belief network,” Mobile Information Systems, vol. 2022, pp.
1–7, Mar. 2022, doi: 10.1155/2022/2715765.
[17] G. Agarwal, H. Om, and S. Gupta, “A learning framework of modified deep recurrent neural network for classification and
recognition of voice mood,” International Journal of Adaptive Control and Signal Processing, vol. 36, no. 8, pp. 1835–1859,
Aug. 2022, doi: 10.1002/acs.3425.
[18] F. Zamani and R. Wulansari, “Emotion classification using 1D-CNN and RNN based on DEAP dataset,” in Natural Language
Processing, Dec. 2021, pp. 363–378, doi: 10.5121/csit.2021.112328.
[19] R. Chauhan, K. K. Ghanshala, and R. C. Joshi, “Convolutional neural network (CNN) for image detection and recognition,” in
ICSCCC 2018 - 1st International Conference on Secure Cyber Computing and Communications, Dec. 2018, pp. 278–282, doi:
10.1109/ICSCCC.2018.8703316.
[20] S. Albawi, T. A. Mohammed, and S. Al-Zawi, “Understanding of a convolutional neural network,” in 2017 International
Conference on Engineering and Technology (ICET), Aug. 2017, pp. 1–6, doi: 10.1109/ICEngTechnol.2017.8308186.
[21] T. M. Wani, T. S. Gunawan, S. A. A. Qadri, M. Kartiwi, and E. Ambikairajah, “A comprehensive review of speech emotion
recognition systems,” IEEE Access, vol. 9, pp. 47795–47814, 2021, doi: 10.1109/ACCESS.2021.3068045.
[22] Y. Yu, “Research on music emotion classification based on CNN-LSTM network,” in Proceedings of 2021 5th Asian Conference
on Artificial Intelligence Technology, ACAIT 2021, Oct. 2021, pp. 473–476, doi: 10.1109/ACAIT53529.2021.9731277.
[23] T. L. New, S. W. Foo, and L. C. De Silva, “Detection of stress and emotion in speech using traditional and FFT based log energy
features,” in ICICS-PCM 2003 - Proceedings of the 2003 Joint Conference of the 4th International Conference on Information,
Communications and Signal Processing and 4th Pacific-Rim Conference on Multimedia, 2003, vol. 3, pp. 1619–1623, doi:
10.1109/ICICS.2003.1292741.
[24] A. B. Kandali, A. Routray, and T. K. Basu, “Emotion recognition from assamese speeches using MFCC features and GMM
classifier,” in TENCON 2008 - 2008 IEEE Region 10 Conference, Nov. 2008, pp. 1–5, doi: 10.1109/TENCON.2008.4766487.
[25] ISMIR community, “International society for music information retrieval 2012 dataset (ISMIR2012),” 2012. Distributed by
ISMIR. https://ismir.net/resources/datasets/.
BIOGRAPHIES OF AUTHORS
Gaurav Agarwal is an Assistant Professor at KIET Group of Institutions, Ghaziabad, with more than 19 years of teaching and research experience. He received his Ph.D. in Computer Science and Engineering in 2022 from the Indian Institute of Technology, Dhanbad, his M.Tech. degree in Computer Engineering from Shobhit University, Meerut, India, in 2010, and his B.E. degree in Computer Engineering from North Maharashtra University, Jalgaon, Maharashtra, India, in 2003. He also worked as a Senior Faculty Research Fellow at IIT Delhi in 2015. He has contributed more than thirty-five research papers to international and national journals and conference proceedings of high repute. His main areas of research interest are speech signal processing, genetic algorithms, and web search engines. He has also filed 4 Indian patents, of which 2 have been granted. He is a reviewer for more than 10 SCI journals. He can be contacted at email: gaurav13shaurya@gmail.com.
Dr. Sachi Gupta is a Professor and Head of the IT Department at IMS Engineering College, Ghaziabad, with more than 18 years of teaching and research experience. She completed her Ph.D. and M.Tech. (gold medalist) degrees at Banasthali Vidyapith, Rajasthan, in the Computer Science domain, and her B.Tech. at UPTU, Lucknow. She has filed six patents, of which three have been granted, and published more than twenty papers in national and international conferences and journals of repute. She is an active member of CSI, Vibha, IACSIT, and IAENG. She has also served as a national and international advisory board member for various reputed conferences. Her areas of interest include genetic algorithms, machine learning, and fuzzy logic. She can be contacted at email: shaurya13@gmail.com.
Dr. Shivani Agarwal is currently working as an Assistant Professor in the Department of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad. She completed her Ph.D. and M.Tech. in Computer Science and Engineering at Uttarakhand Technical University, Dehradun, and her B.Tech. in IT at UPTU, Lucknow. Dr. Shivani Agarwal has more than 14 years of experience in teaching and research. Her areas of interest are soft computing, machine learning, and bioinformatics. She has published many research papers in reputed international journals and conferences. She can be contacted at email: kasishivani@gmail.com.
Dr. Atul Kumar Rai is currently working as an Associate Professor and Dean at Kothiwal Institute of Technology and Professional Studies, Moradabad, with more than 16 years of teaching and 1 year of industrial experience. He completed his Ph.D. at Shri Venkateshwara University, Gajraula, J.P. Nagar, his M.Tech. degree at Rajiv Gandhi Technical University, Bhopal, in the Computer Science domain, and his B.Tech. at UPTU, Lucknow. He has filed two patents and published more than ten papers in national and international conferences and journals of repute. He is a member of the advisory board of QUBITOS Technology Pvt Ltd, Noida, and a Cloud Consultant at Smart Brains Engineers & Technology Pvt Ltd (hiring for HCL). His areas of interest include genetic algorithms, machine learning, fuzzy logic, and cloud computing. He can be contacted at email: atulrocks@gmail.com.