The Munich LSTM-RNN Approach 
to the MediaEval 2014 “Emotion in Music” Task 
Subtask 2: estimation of Arousal and Valence scores continuously in time 
Eduardo Coutinho1,2,3, Felix Weninger1, Björn Schuller3,1,4 and Klaus R. Scherer3,5 
1 Machine Intelligence & Signal Processing Group, Technische Universität München, Munich, Germany 
2 School of Music, University of Liverpool, Liverpool, UK 
3 Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland 
4 Department of Computing, Imperial College London, London, United Kingdom 
5 Department of Psychology, Ludwig-Maximilians-Universität München, Munich, Germany
Method: feature sets 
Feature Set 1 (FS1): 
• 2013 INTERSPEECH Computational Paralinguistics Challenge 
• 65 (energy, spectrum and voice-related) LLDs (plus first-order derivatives) 
covering a broad set of descriptors from the fields of speech processing, 
Music Information Retrieval, and general sound analysis 
• We computed the mean and standard deviation functionals of each feature 
over 1s time windows with 50% overlap 
• Final set: 260 features extracted at a rate of 2Hz. 
• All features were extracted with openSMILE
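The windowing step above can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic (130 frame-level values, two functionals each, 260 features at 2 Hz), not the openSMILE configuration used for the task. The 100 Hz frame rate of the LLDs and the function name `window_functionals` are assumptions made for the example.

```python
import numpy as np

def window_functionals(lld, frame_rate_hz=100.0, win_s=1.0, hop_s=0.5):
    """Mean and standard deviation of each LLD over sliding windows.

    lld: array of shape (n_frames, n_llds), e.g. 130 columns for
    65 LLDs plus their first-order derivatives.
    frame_rate_hz is an assumed LLD frame rate, not taken from the slides.
    Returns an array of shape (n_windows, 2 * n_llds).
    """
    lld = np.asarray(lld, dtype=float)
    win = int(round(win_s * frame_rate_hz))  # frames per 1 s window
    hop = int(round(hop_s * frame_rate_hz))  # 50% overlap -> 0.5 s hop (2 Hz)
    rows = []
    for start in range(0, lld.shape[0] - win + 1, hop):
        chunk = lld[start:start + win]
        rows.append(np.concatenate([chunk.mean(axis=0), chunk.std(axis=0)]))
    return np.array(rows)

# Synthetic stand-in for 10 s of frame-level descriptors.
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 130))
features = window_functionals(frames)
print(features.shape)
```

Each output row holds the 130 means followed by the 130 standard deviations for one window.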
Method: feature sets 
Feature Set 2 (FS2) 
• FS1 plus four new features 
- Roughness (R) and Sensory Dissonance (SDiss) 
- Tempo (T) and Event Density (ED). 
• Correspond to two psychoacoustic dimensions consistently associated with the 
communication of emotion in music and speech - Roughness and Duration 
(Coutinho & Dibben, 2013) 
• The four features were extracted with the MIR Toolbox 
- mirroughness: SDiss (Sethares formula) and R (Vassilakis algorithm) 
- mirtempo (T) 
- mireventdensity (ED) 
E. Coutinho and N. Dibben. Psychoacoustic cues to emotion in speech prosody and music. Cognition & 
Emotion, 27(4):658–684, 2013.
Method: regressors 
Given the importance of the temporal context to the perception of emotion 
in music we considered time-sensitive models 
• Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) 
LSTM-RNN networks make use of special memory blocks, which endow the 
model with the capacity of accessing (long-range) temporal context and 
predicting the outputs based on such information. 
An LSTM-RNN is similar to a conventional RNN, except that the nonlinear hidden 
units are replaced by LSTM memory blocks (and memory is implemented 
explicitly).
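As a rough illustration of what a memory block computes, the sketch below implements one time step of a standard LSTM layer in NumPy. It assumes the common formulation without peephole connections, which may differ from the variant used for the submissions. The weight layout and the names `lstm_step`, `W`, `U` and `b` are choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM layer (no peephole connections).

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,).
    Rows are stacked as [input gate, forget gate, cell candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell content
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # explicit memory (cell state)
    h_t = o * np.tanh(c_t)         # block output passed to the next layer
    return h_t, c_t

# Run a short synthetic sequence through a layer of 5 blocks.
rng = np.random.default_rng(0)
D, H = 260, 5
W = rng.uniform(-0.1, 0.1, size=(4 * H, D))
U = rng.uniform(-0.1, 0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(20, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state `c` is the explicit memory: it is carried from step to step and only modified through the gates.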
Method: model training 
Joint learning of Arousal and Valence time-continuous values (multitask) 
Cross-validation 
• The fold subdivision followed a modulus based scheme 
- instance ID modulus 11 
• The instances yielding a remainder of 10 were left out to create a small test set 
for performance estimation 
• On the remaining instances, a 10-fold cross-validation was performed. 
• We computed 5 trials of the same model each with randomized initial weights 
in the range [-0.1,0.1].
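The modulus-based partition can be sketched as follows. The instance IDs are hypothetical and the helper names `assign_partition` and `cross_validation_splits` are introduced only for illustration.

```python
def assign_partition(instance_id, n_parts=11):
    """Return 'test' for remainder 10, otherwise the CV fold index 0..9."""
    remainder = instance_id % n_parts
    return "test" if remainder == n_parts - 1 else remainder

def cross_validation_splits(instance_ids, n_parts=11):
    """Yield (fold, train_ids, validation_ids) for the 10 CV folds."""
    dev = [i for i in instance_ids if assign_partition(i, n_parts) != "test"]
    for fold in range(n_parts - 1):
        val = [i for i in dev if i % n_parts == fold]
        train = [i for i in dev if i % n_parts != fold]
        yield fold, train, val

ids = list(range(1, 45))  # hypothetical instance IDs
held_out = [i for i in ids if assign_partition(i) == "test"]
splits = list(cross_validation_splits(ids))
```

Because the assignment depends only on the instance ID, the same partition can be reproduced for every trial.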
Method: model training (cont.) 
Basic architecture: deep LSTM-RNN (2 hidden layers) 
Optimised parameters 
• number of LSTM blocks in each hidden layer, 
• learning rate 
• standard deviation of the Gaussian noise applied to the input activations 
- used to alleviate the effects of over-fitting 
A momentum of 0.9 was used for all tests 
Early stopping strategy (to avoid overfitting the training data) 
• training was stopped after 20 iterations without improvement of the validation set 
performance 
For each fold, instances were presented in random order 
The input (acoustic features) and output (emotion features) data were 
standardised to zero mean and unit variance (on the corresponding training 
sets used in each cross-validation fold)
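Two of the steps on this slide lend themselves to a short sketch: standardising with statistics taken from the training fold only, and stopping after a fixed number of epochs without improvement. The code below is a minimal illustration with synthetic data. The names `fit_standardiser`, `standardise` and `EarlyStopping` are assumptions for the example, and the demo uses a patience of 3 rather than 20 to keep it short.

```python
import numpy as np

def fit_standardiser(train):
    """Per-feature mean and standard deviation from the training fold only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return mean, std

def standardise(data, mean, std):
    return (data - mean) / std

class EarlyStopping:
    """Stop after `patience` epochs without a new best validation error."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def update(self, val_error):
        """Record one validation error; return True when training should stop."""
        if val_error < self.best:
            self.best = val_error
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

rng = np.random.default_rng(0)
train = rng.normal(loc=3.0, scale=2.0, size=(500, 4))
val = rng.normal(loc=3.0, scale=2.0, size=(100, 4))
mean, std = fit_standardiser(train)
train_z = standardise(train, mean, std)
val_z = standardise(val, mean, std)  # validation data reuse the training statistics

stopper = EarlyStopping(patience=3)
errors = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]  # made-up validation errors
stopped_at = next(i for i, e in enumerate(errors) if stopper.update(e))
```

Fitting the statistics on the training fold alone keeps information from the validation instances out of the model inputs.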
Method: auto-encoders 
In four of our five runs we pre-trained the first hidden layer. 
The unsupervised pre-training strategy consisted of de-noising LSTM-RNN 
auto-encoders. 
• We first created an LSTM-RNN with a single hidden layer trained to predict the 
input features (y(t) = x(t)) 
• In order to avoid over-fitting, in each training epoch and timestep t, we added a 
noise vector n to x(t), sampled from a Gaussian distribution with zero mean and 
variance σ_n² 
• Both the development and test set instances were used to train the DAE. 
After determining the auto-encoder weights a second hidden layer was 
added ...
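The de-noising setup can be sketched as a data-preparation step: the network input is the feature sequence plus fresh Gaussian noise, and the target is the clean sequence. The function name `dae_training_pair` and the noise variance of 0.1 are illustrative assumptions. The value used for pre-training is not stated on the slides.

```python
import numpy as np

def dae_training_pair(x, noise_var, rng):
    """Return (noisy input, clean target) for one sequence x of shape (T, D)."""
    noise = rng.normal(loc=0.0, scale=np.sqrt(noise_var), size=x.shape)
    return x + noise, x  # target y(t) = x(t)

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 260))  # synthetic 30 s sequence at 2 Hz, 260 features

# New noise is drawn every epoch, so the network never sees the same input twice.
noisy_1, target_1 = dae_training_pair(x, noise_var=0.1, rng=rng)
noisy_2, target_2 = dae_training_pair(x, noise_var=0.1, rng=rng)
```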
Submissions 
Run 1: 
• basic architecture trained using the regression targets 
• inputs: FS1 
Run 2: 
• basic architecture + pre-trained 1st layer trained using the regression targets 
• 1st layer weights were not re-trained 
• inputs: FS1 
Run 3: 
• same as Run 2, but 1st layer weights were also re-trained 
• inputs: FS1 
Run 4: 
• same as Run 2, but using FS2 
Run 5: 
• same as Run 3, but using FS2 
Submitted results: the average outputs of the five best models (across all folds 
and trials) as estimated using the left out instances
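The final averaging step could look like the sketch below, assuming each trained model yields a prediction array of shape (time steps, 2) for arousal and valence and one error value on the left-out instances. The synthetic numbers and the name `average_best_models` are illustrative only.

```python
import numpy as np

def average_best_models(predictions, heldout_errors, n_best=5):
    """Average the outputs of the n_best models with the lowest held-out error.

    predictions: (n_models, T, 2) arousal/valence outputs per model.
    heldout_errors: (n_models,) error of each model on the left-out instances.
    """
    order = np.argsort(heldout_errors)[:n_best]
    return predictions[order].mean(axis=0), order

rng = np.random.default_rng(0)
preds = rng.normal(size=(50, 60, 2))      # 10 folds x 5 trials = 50 models
errors = rng.uniform(0.05, 0.2, size=50)  # made-up held-out errors
ensemble, chosen = average_best_models(preds, errors)
```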
Results 
Arousal 
• Similarity was significantly higher (higher r) in Run 4 compared to all other runs 
• Precision was significantly higher (lower RMSE) in Run 4 compared to all other runs 
Valence 
• Similarity was higher (higher r) in Run 4 compared to all other runs (but only 
significantly higher than Run 3) 
• Precision was significantly higher (lower RMSE) in Run 4 compared to all other runs 
(except Run 5) 

[...] of 0.9 was used for all tests), and the standard deviation of the Gaussian noise applied to the input activations (used to alleviate the effects of over-fitting). An early stopping strategy was also used to avoid overfitting the training data: training was stopped after 20 iterations without improvement of the validation set performance (sum of squared errors). The instances in the 10 training sets were presented in random order to the model during training. The input (acoustic features) and output (emotion features) data were standardised to zero mean and unit variance on the corresponding training sets used in each cross-validation fold. 
In four of our five runs (see next subsection) we pre-trained the first hidden layer. Our unsupervised pre-training strategy consisted of de-noising LSTM-RNN auto-encoders. We first created an LSTM-RNN with a single hidden layer trained to predict the input features (y(t) = x(t)). Both the development and test set instances were used to train the DAE. In order to avoid over-fitting, in each training epoch and timestep t, we added a noise vector n to x(t), sampled from a Gaussian distribution with zero mean and variance σ_n². After determining the auto-encoder weights a second hidden layer was added. In two of the runs, all of the weights were trained using the regression targets and keeping the first layer weights constant. In the other two, the first layer weights were retrained. 
2.2 Runs 
We submitted five runs for Subtask 2. All runs consisted of LSTM-RNNs using two hidden layers in order to attempt modeling high-level abstractions in the data (Deep Learning). The specifics of each run are as follows: Run 1) The basic architecture was directly trained using the regression targets and FS1; Run 2) We pre-trained the first layer, added a second one, and all weights (with the exception of the first layer weights that were kept constant) were trained using the regression targets and FS1; Run 3) Same as Run 2, but all weights (including the first layer weights) were trained using the regression targets and FS1; Run 4) Same as Run [...] 
[...] rate of 10^-6 and Gaussian noise with a variance of 0.5 applied to the inputs during development (no noise added when processing the test set). 
Table 1: Official results of the MISP-TUM team for the five runs submitted. 
Metric  Run    Arousal        Valence 
r       Run 1  0.247±0.456    0.170±0.458 
r       Run 2  0.246±0.458    0.181±0.503 
r       Run 3  0.291±0.479    0.152±0.503 
r       Run 4  0.354±0.455    0.198±0.492 
r       Run 5  0.232±0.434    0.172±0.450 
RMSE    Run 1  0.134±0.062    0.096±0.056 
RMSE    Run 2  0.121±0.058    0.090±0.055 
RMSE    Run 3  0.120±0.059    0.090±0.056 
RMSE    Run 4  0.102±0.052    0.079±0.048 
RMSE    Run 5  0.112±0.055    0.082±0.050 
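For reference, the two metrics in the table can be computed as in the sketch below. It assumes that r is the Pearson correlation between predicted and annotated time series, and that each "mean±std" entry summarises per-song scores. Both are assumptions about the evaluation protocol rather than something the slides spell out, and the data here are synthetic.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between two time series (NaN if either is constant)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root mean squared error between two time series."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def summarise(per_song_scores):
    """Mean and standard deviation of a metric across songs."""
    scores = np.asarray(per_song_scores, dtype=float)
    return float(scores.mean()), float(scores.std())

t = np.linspace(0.0, 1.0, 60)
truth = np.sin(2 * np.pi * t)  # synthetic annotation
pred = 0.5 * truth + 0.1       # synthetic prediction: right shape, wrong scale
r = pearson_r(truth, pred)
e = rmse(truth, pred)
```

In this example the correlation is perfect while the RMSE is not zero, which is why the slides report similarity (r) and precision (RMSE) separately.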
4. CONCLUSIONS 
The LSTM-RNN approaches to the 2014 MediaEval “Emotion in Music” task all delivered consistent improvements over the baselines. The results reveal the importance of fine-tuning the feature set and the deep learning strategy, which could be attributed to the relatively small training set. 
5. ACKNOWLEDGMENTS 
This work was partially supported by the ERC in the European Community’s 7th Framework Program under grant agreements No. 338164 (Starting Grant iHEARu to Björn Schuller) and 230331 (Advanced Grant PROPEREMO to Klaus Scherer). 
Best submission 
Run 4: 
• LSTM-RNN with 200 and 5 LSTM blocks (first and second layers, respectively) 
• Trained with 
- Learning rate of 10^-6 
- Gaussian noise with a variance of 0.5 applied to the inputs during development (no 
noise added when processing the test set). 
• Pre-trained first layer 
- weights in the 1st layer kept constant while training using the regression targets 
• FS2 as input
Conclusions 
The LSTM-RNN approaches all delivered consistent improvements over the 
baselines 
The results reveal 
• the importance of fine-tuning the feature set 
- emotional power of music is grounded in animal communication and vocal prosody 
- a focus on relevant features in all these domains is beneficial in terms of Machine 
Learning and coherence with research in Music Psychology 
• the fundamental role of time 
- affect is conveyed as patterns in time, so time-sensitive models are necessary 
• the relevance of a deep learning strategy 
- to tackle the problem of data scarcity and huge variety of music
Thank you! 
Eduardo Coutinho 
• e.coutinho@tum.de 
• www.eadward.org 

More Related Content

What's hot

Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networksananth
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep LearningOswald Campesato
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksJosh Patterson
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Universitat Politècnica de Catalunya
 
Recent Progress in RNN and NLP
Recent Progress in RNN and NLPRecent Progress in RNN and NLP
Recent Progress in RNN and NLPhytae
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Balázs Hidasi
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsBenjamin Le
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysisodsc
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Larry Guo
 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processingananth
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a functionTaisuke Oe
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksTaegyun Jeon
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural NetworksCloudxLab
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Universitat Politècnica de Catalunya
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaAndre Pemmelaar
 

What's hot (20)

Foundations: Artificial Neural Networks
Foundations: Artificial Neural NetworksFoundations: Artificial Neural Networks
Foundations: Artificial Neural Networks
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep Learning
 
Modeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural NetworksModeling Electronic Health Records with Recurrent Neural Networks
Modeling Electronic Health Records with Recurrent Neural Networks
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
 
Recent Progress in RNN and NLP
Recent Progress in RNN and NLPRecent Progress in RNN and NLP
Recent Progress in RNN and NLP
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
Recurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text AnalysisRecurrent Neural Networks for Text Analysis
Recurrent Neural Networks for Text Analysis
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
 
Overview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language ProcessingOverview of TensorFlow For Natural Language Processing
Overview of TensorFlow For Natural Language Processing
 
Deep learning
Deep learningDeep learning
Deep learning
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Neural Network as a function
Neural Network as a functionNeural Network as a function
Neural Network as a function
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
Generating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in juliaGenerating Sequences with Deep LSTMs & RNNS in julia
Generating Sequences with Deep LSTMs & RNNS in julia
 

Viewers also liked

Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...
Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...
Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...multimediaeval
 
Emnlp2015 reading festival_lstm_cws
Emnlp2015 reading festival_lstm_cwsEmnlp2015 reading festival_lstm_cws
Emnlp2015 reading festival_lstm_cwsAce12358
 
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Universitat Politècnica de Catalunya
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
The Distribution of Household Income, Federal Taxes, and Government Spending
The Distribution of Household Income, Federal Taxes, and Government SpendingThe Distribution of Household Income, Federal Taxes, and Government Spending
The Distribution of Household Income, Federal Taxes, and Government SpendingCongressional Budget Office
 
同玩節海報事件
同玩節海報事件同玩節海報事件
同玩節海報事件lalacamp07
 
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...Nish Mahanty
 
Thought Leadership from Social Sector Masters
Thought Leadership from Social Sector MastersThought Leadership from Social Sector Masters
Thought Leadership from Social Sector MastersSalesforce.org
 
A look back bkelly duo farewell - june 2015
A look back   bkelly duo farewell - june 2015A look back   bkelly duo farewell - june 2015
A look back bkelly duo farewell - june 2015Brian Kelly
 
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Compulsiva Accesorios
 
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Chanita Anuwong
 
Patrick Shields Digitising the Public Sector
Patrick Shields   Digitising the Public SectorPatrick Shields   Digitising the Public Sector
Patrick Shields Digitising the Public SectorSoftware AG South Africa
 

Viewers also liked (18)

Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...
Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...
Music Emotion Tracking with Continuous Conditional Neural Fields and Relative...
 
Emnlp2015 reading festival_lstm_cws
Emnlp2015 reading festival_lstm_cwsEmnlp2015 reading festival_lstm_cws
Emnlp2015 reading festival_lstm_cws
 
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model (UP...
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
KungFuPR
KungFuPRKungFuPR
KungFuPR
 
The Distribution of Household Income, Federal Taxes, and Government Spending
The Distribution of Household Income, Federal Taxes, and Government SpendingThe Distribution of Household Income, Federal Taxes, and Government Spending
The Distribution of Household Income, Federal Taxes, and Government Spending
 
同玩節海報事件
同玩節海報事件同玩節海報事件
同玩節海報事件
 
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
 
Thought Leadership from Social Sector Masters
Thought Leadership from Social Sector MastersThought Leadership from Social Sector Masters
Thought Leadership from Social Sector Masters
 
A look back bkelly duo farewell - june 2015
A look back   bkelly duo farewell - june 2015A look back   bkelly duo farewell - june 2015
A look back bkelly duo farewell - june 2015
 
Driving school
Driving schoolDriving school
Driving school
 
Your Garden and Global Warming
Your Garden and Global WarmingYour Garden and Global Warming
Your Garden and Global Warming
 
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
 
4º básico a semana 25 abril al 29 abril
4º básico a  semana 25 abril  al 29 abril4º básico a  semana 25 abril  al 29 abril
4º básico a semana 25 abril al 29 abril
 
Comic Analysis
Comic AnalysisComic Analysis
Comic Analysis
 
Quick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & MicroformatsQuick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & Microformats
 
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
 
Patrick Shields Digitising the Public Sector
Patrick Shields   Digitising the Public SectorPatrick Shields   Digitising the Public Sector
Patrick Shields Digitising the Public Sector
 

Similar to The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task

20211008 修論中間発表
20211008 修論中間発表20211008 修論中間発表
20211008 修論中間発表Tomoya Koike
 
Speaker identification system using close set
Speaker identification system using close setSpeaker identification system using close set
Speaker identification system using close seteSAT Journals
 
Speaker identification system using close set
Speaker identification system using close setSpeaker identification system using close set
Speaker identification system using close seteSAT Publishing House
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningJunaid Bhat
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesMohammed Bennamoun
 
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptxthanhdowork
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsMatīss ‎‎‎‎‎‎‎  
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterYousef Fadila
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Fundamentals of music processing chapter 5 발표자료
Fundamentals of music processing chapter 5 발표자료Fundamentals of music processing chapter 5 발표자료
Fundamentals of music processing chapter 5 발표자료Jeong Choi
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...multimediaeval
 
machine learning for engineering students
machine learning for engineering studentsmachine learning for engineering students
machine learning for engineering studentsKavitabani1
 
Continuous human action recognition in ambient assisted living scenarios
Continuous human action recognition in ambient assisted living scenariosContinuous human action recognition in ambient assisted living scenarios
Continuous human action recognition in ambient assisted living scenariosFrancisco (Paco) Florez-Revuelta
 
ML_Unit_2_Part_A
ML_Unit_2_Part_AML_Unit_2_Part_A
ML_Unit_2_Part_ASrimatre K
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...CSCJournals
 
Toward wave net speech synthesis
Toward wave net speech synthesisToward wave net speech synthesis
Toward wave net speech synthesisNAVER Engineering
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.pptyang947066
 
Lifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksLifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksNAVER Engineering
 

Similar to The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task (20)

Lec 6-bp
Lec 6-bpLec 6-bp
Lec 6-bp
 
20211008 修論中間発表
20211008 修論中間発表20211008 修論中間発表
20211008 修論中間発表
 
Speaker identification system using close set
Speaker identification system using close setSpeaker identification system using close set
Speaker identification system using close set
 
Speaker identification system using close set
Speaker identification system using close setSpeaker identification system using close set
Speaker identification system using close set
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rules
 
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx
240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Fundamentals of music processing chapter 5 발표자료
Fundamentals of music processing chapter 5 발표자료Fundamentals of music processing chapter 5 발표자료
Fundamentals of music processing chapter 5 발표자료
 
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
MediaEval 2015 - UNIZA System for the "Emotion in Music" task at MediaEval 20...
 
machine learning for engineering students
machine learning for engineering studentsmachine learning for engineering students
machine learning for engineering students
 
Continuous human action recognition in ambient assisted living scenarios
Continuous human action recognition in ambient assisted living scenariosContinuous human action recognition in ambient assisted living scenarios
Continuous human action recognition in ambient assisted living scenarios
 
ML_Unit_2_Part_A
ML_Unit_2_Part_AML_Unit_2_Part_A
ML_Unit_2_Part_A
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
 
D111823
D111823D111823
D111823
 
Toward wave net speech synthesis
Toward wave net speech synthesisToward wave net speech synthesis
Toward wave net speech synthesis
 
deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Lifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksLifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable Networks
 

The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task

  • 1. The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task Subtask 2: estimation of Arousal and Valence scores continuously in time Eduardo Coutinho1,2,3, Felix Weninger1, Björn Schuller3,1,4 and Klaus R. Scherer3,5 1 Machine Intelligence & Signal Processing Group, Technische Universität München, Munich, Germany 2 School of Music, University of Liverpool, Liverpool, UK 3 Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland 4 Department of Computing, Imperial College London, London, United Kingdom 5 Department of Psychology, Ludwig-Maximilians-Universität München, Munich,
  • 2. Method: feature sets Feature Set 1 (FS1): • 2013 INTERSPEECH Computational Paralinguistics Challenge • 65 (energy, spectrum and voice-related) LLDs (plus first order derivatives) covering a broad set of descriptors from the fields of speech processing, Music Information Retrieval, and general sound analysis • We computed the mean and standard deviation functionals of each feature over 1s time windows with 50% overlap • Final set: 260 features (65 LLDs and their 65 derivatives, times 2 functionals) extracted at a rate of 2Hz. • All features were extracted with openSMILE
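The windowing described on this slide can be sketched as follows. This is a minimal illustration, not the openSMILE configuration the authors used: the function name `window_functionals`, the assumed frame rate of 100 frames per second, and the random input are all hypothetical. It only shows how mean and standard deviation functionals over 1 s windows with 50% overlap turn 130 contours (65 LLDs plus derivatives) into 260 features at roughly 2 Hz.

```python
import numpy as np

def window_functionals(lld, frame_rate_hz, win_s=1.0, overlap=0.5):
    """Mean and std of each LLD contour over sliding windows.

    lld: array of shape (n_frames, n_lld).
    Returns an array of shape (n_windows, 2 * n_lld).
    """
    win = int(round(win_s * frame_rate_hz))        # frames per window
    hop = int(round(win * (1.0 - overlap)))        # frames between window starts
    rows = []
    for start in range(0, lld.shape[0] - win + 1, hop):
        seg = lld[start:start + win]
        rows.append(np.concatenate([seg.mean(axis=0), seg.std(axis=0)]))
    return np.asarray(rows)

# Synthetic stand-in for 65 LLDs plus their first-order derivatives (130 contours),
# 5 seconds long at an assumed 100 frames per second.
rng = np.random.default_rng(0)
lld = rng.normal(size=(500, 130))
feats = window_functionals(lld, frame_rate_hz=100)
print(feats.shape)
```

With a 1 s window and a 0.5 s hop, one feature vector is produced every half second, which is where the 2 Hz rate comes from.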
  • 3. Method: feature sets Feature Set 2 (FS2) • FS1 plus four new features - Roughness (R) and Sensory Dissonance (SDiss) - Tempo (T) and Event Density (ED). • Correspond to two psychoacoustic dimensions consistently associated with the communication of emotion in music and speech - Roughness and Duration (Coutinho & Dibben, 2013) • The four features were extracted with the MIR Toolbox - mirroughness: SDiss (Sethares formula) and R (Vassilakis algorithm) - mirtempo (T) - mireventdensity (ED) E. Coutinho and N. Dibben. Psychoacoustic cues to emotion in speech prosody and music. Cognition & Emotion, 27(4):658–684, 2013.
  • 4. Method: regressors Given the importance of the temporal context to the perception of emotion in music, we considered time-sensitive models • Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) LSTM-RNN networks make use of special memory blocks, which endow the model with the capacity of accessing (long-range) temporal context and predicting the outputs based on such information. An LSTM-RNN network is similar to an RNN except that the nonlinear hidden units are replaced by LSTM memory blocks (and memory is implemented explicitly).
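To make the idea of an explicit memory concrete, here is a minimal sketch of one time step of a standard LSTM memory block in NumPy. It assumes the common formulation with input, forget and output gates and no peephole connections; the slides do not say which variant the authors used, and the names `lstm_step`, `W`, `U` and `b` are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,). Gate order: i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # explicit memory cell update
    h = o * np.tanh(c)             # block output
    return h, c

# Run a short random sequence through a block layer (sizes are illustrative).
rng = np.random.default_rng(0)
D, H = 260, 5
W = rng.uniform(-0.1, 0.1, size=(4 * H, D))
U = rng.uniform(-0.1, 0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is carried from step to step and only changed through the gates, which is what lets the network retain long-range temporal context.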
  • 5. Method: model training Joint learning of Arousal and Valence time-continuous values (multitask) Cross-validation • The fold subdivision followed a modulus-based scheme - instance ID modulus 11 • The instances yielding a remainder of 10 were left out to create a small test set for performance estimation • On the remaining instances, a 10-fold cross-validation was performed. • We computed 5 trials of the same model, each with randomized initial weights in the range [-0.1,0.1].
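The modulus-based split can be sketched as below. The mapping of remainder r (0 to 9) to fold r is an assumption made for illustration; the slide only states that the instance ID modulus 11 was used and that remainder 10 was held out. The function name `assign_folds` and the example IDs are hypothetical.

```python
def assign_folds(instance_ids):
    """Split IDs into 10 cross-validation folds plus a held-out set (ID mod 11)."""
    folds = {k: [] for k in range(10)}
    held_out = []
    for i in instance_ids:
        r = i % 11
        if r == 10:
            held_out.append(i)      # small test set for performance estimation
        else:
            folds[r].append(i)      # assumed: remainder r goes to fold r
    return folds, held_out

folds, held_out = assign_folds(range(1, 45))
print(held_out)
print(folds[0])
```

A deterministic split like this is reproducible without storing fold lists, since fold membership follows from the instance ID alone.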
  • 6. Method: model training (cont.) Basic architecture: deep LSTM-RNN (2 hidden layers) Optimised parameters • number of LSTM blocks in each hidden layer, • learning rate • standard deviation of the Gaussian noise applied to the input activations - used to alleviate the effects of over-fitting A momentum of 0.9 was used for all tests Early stopping strategy (to avoid overfitting the training data) • training was stopped after 20 iterations without improvement of the validation set performance For each fold, instances were presented in random order The input (acoustic features) and output (emotion features) data were standardised to zero mean and unit variance (on the corresponding training sets used in each cross-validation fold)
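Two of the steps on this slide, per-fold standardisation and early stopping with a patience of 20 iterations, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' training code: the helper names are hypothetical, constant features are given unit scale to avoid division by zero, and the validation error curve is made up.

```python
import numpy as np

def fit_standardiser(train):
    """Mean and std estimated on the training fold only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0             # assumed guard for constant features
    return mean, std

def standardise(x, mean, std):
    return (x - mean) / std

def early_stopping_epoch(val_errors, patience=20):
    """Index of the epoch at which training would stop."""
    best, since = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, since = err, 0    # improvement: reset the counter
        else:
            since += 1
            if since >= patience:
                return epoch        # 20 epochs without improvement
    return len(val_errors) - 1

train = np.array([[1.0, 2.0], [3.0, 2.0]])
mean, std = fit_standardiser(train)
z = standardise(train, mean, std)

errs = [1.0, 0.9] + [0.95] * 30     # made-up validation errors
stop = early_stopping_epoch(errs)
```

Fitting the statistics on the training fold and reusing them for validation data keeps information from the validation instances out of the model.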
  • 7. Method: auto-encoders In four of our five runs we pre-trained the first hidden layer. The unsupervised pre-training strategy consisted of de-noising LSTM-RNN auto-encoders (DAE). • We first created an LSTM-RNN with a single hidden layer trained to predict the input features (y(t) = x(t)) • In order to avoid over-fitting, in each training epoch and timestep t, we added a noise vector n to x(t), sampled from a Gaussian distribution with zero mean and variance n • Both the development and test set instances were used to train the DAE. After determining the auto-encoder weights a second hidden layer was added ...
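The de-noising setup, corrupted input and clean target with fresh noise at every epoch, can be sketched as below. Only the data preparation is shown; the auto-encoder itself is omitted. The function name `noisy_inputs`, the noise level and the sequence shape are assumptions for illustration.

```python
import numpy as np

def noisy_inputs(x, noise_std, rng):
    """Add zero-mean Gaussian noise to every timestep of a feature sequence."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 260))       # one sequence: 60 timesteps, 260 features
noise_std = 0.5 ** 0.5               # hypothetical value (variance of 0.5)

pairs = []
for epoch in range(3):
    x_noisy = noisy_inputs(x, noise_std, rng)   # new noise sample each epoch
    target = x                                  # the network must reconstruct clean x(t)
    pairs.append((x_noisy, target))
```

Because the noise is resampled every epoch, the network never sees the same corrupted input twice, which discourages it from learning an identity mapping.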
  • 8. Submissions Run 1: • basic architecture trained using the regression targets • inputs: FS1 Run 2: • basic architecture + pre-trained 1st layer, trained using the regression targets • 1st layer weights were not re-trained • inputs: FS1 Run 3: • same as Run 2, but 1st layer weights were also re-trained • inputs: FS1 Run 4: • same as Run 2, but using FS2 Run 5: • same as Run 3, but using FS2 Submitted results: the average outputs of the five best models (across all folds and trials) as estimated using the left-out instances
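Averaging the outputs of the five best models could look like the sketch below. The slide does not state which score ranked the models, so ranking by an error on the left-out instances where lower is better is an assumption, as are the function name `average_best` and the toy numbers.

```python
import numpy as np

def average_best(predictions, scores, k=5):
    """Average the outputs of the k models with the lowest score.

    predictions: (n_models, n_timesteps, 2) arousal/valence outputs.
    scores: (n_models,) error on the left-out instances (lower is better).
    """
    order = np.argsort(scores)[:k]
    return predictions[order].mean(axis=0)

# Six toy models whose outputs are constant 0, 1, ..., 5.
predictions = np.stack([np.full((4, 2), float(m)) for m in range(6)])
scores = np.array([0.5, 0.1, 0.2, 0.9, 0.3, 0.4])
ensemble = average_best(predictions, scores)
```

Here the model with score 0.9 is excluded and the remaining five outputs are averaged per timestep and per dimension.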
  • 9. Results Arousal • Similarity was significantly higher (higher r) in Run 4 compared to all other runs • Precision was significantly higher (lower RMSE) in Run 4 compared to all other runs Valence • Similarity was higher (higher r) in Run 4 compared to all other runs (but only significantly higher than Run 3) • Precision was significantly higher (lower RMSE) in Run 4 compared to all other runs (except Run 5)
    Table 1: Official results of the MISP-TUM team for the five runs submitted.
                Arousal        Valence
    r
    Run 1       0.247±0.456    0.170±0.458
    Run 2       0.246±0.458    0.181±0.503
    Run 3       0.291±0.479    0.152±0.503
    Run 4       0.354±0.455    0.198±0.492
    Run 5       0.232±0.434    0.172±0.450
    RMSE
    Run 1       0.134±0.062    0.096±0.056
    Run 2       0.121±0.058    0.090±0.055
    Run 3       0.120±0.059    0.090±0.056
    Run 4       0.102±0.052    0.079±0.048
    Run 5       0.112±0.055    0.082±0.050
    4. CONCLUSIONS The LSTM-RNN approaches to the 2014 MediaEval “Emotion in Music” task all delivered consistent improvements over the baselines. The results reveal the importance of fine-tuning the feature set and the deep learning strategy, which could be attributed to the relatively small training set.
    5. ACKNOWLEDGMENTS This work was partially supported by the ERC in the European Community’s 7th Framework Program under grant agreements No. 338164 (Starting Grant iHEARu to Björn Schuller) and 230331 (Advanced Grant PROPEREMO to Klaus Scherer).
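The two metrics reported in the results, Pearson's r (similarity) and RMSE (precision), can be computed for one annotation contour as in the sketch below. The mean ± standard deviation figures are assumed here to aggregate per-song values; the slides do not spell out the aggregation, and the toy contours are invented.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between a target and a predicted contour."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root mean squared error between a target and a predicted contour."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy contour: the prediction follows the target with a constant offset of 0.1.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = y_true + 0.1
r_value = pearson_r(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
```

The example shows why both metrics are reported: a constant offset leaves r at 1 while RMSE still registers the 0.1 error.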
  • 10. Best submission Run 4: • LSTM-RNN with 200 and 5 LSTM blocks (first and second layers, respectively) • Trained with - Learning rate of 10^-6 - Gaussian noise with a variance of 0.5 applied to the inputs during development (no noise added when processing the test set). • Pre-trained first layer (weights in the 1st layer kept constant while training using the regression targets) • FS2 as input
  • 11. Conclusions The LSTM-RNN approaches all delivered consistent improvements over the baselines. The results reveal • the importance of fine-tuning the feature set - the emotional power of music is grounded in animal communication and vocal prosody - a focus on relevant features in all these domains is beneficial in terms of Machine Learning and coherence with research in Music Psychology • the fundamental role of time - affect is conveyed as patterns in time, so time-sensitive models are necessary • the relevance of a deep learning strategy - to tackle the problem of data scarcity and the huge variety of music
  • 12. Thank you! Eduardo Coutinho • e.coutinho@tum.de • www.eadward.org Eduardo Felix Björn Klaus