1
Personalized Music Emotion
Recognition via Model Adaptation
Ju-Chiang Wang, Yi-Hsuan Yang,
Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica,
National Taiwan University,
Taipei, Taiwan
2
Outline
• Introduction
• The Acoustic Emotion Gaussians (AEG) Model
• Personalization via MAP Adaptation
• Music Emotion Recognition using AEG
• Evaluation and Result
• Conclusion
3
Introduction
• Develop a computational model that comprehends the affective content of musical audio signals, for automatic music emotion recognition and content-based music retrieval
• Emotion perception in music is by nature subjective (i.e., fairly user-dependent)
– A general music emotion recognition (MER) system may therefore be insufficient
– Ideally, one's personal device should understand his/her perception of music emotion
– Goal: an adaptive MER method that is both efficient and effective
4
Basic Idea
• The UBM-GMM framework for speaker adaptation
– The state of the art in speaker recognition
– A large background GMM (UBM) represents the speaker-independent distribution of acoustic features
– The speaker-dependent GMM is obtained via model adaptation using the speech data of a specific speaker
• Adaptive MER method for personalization
– A probabilistic background emotion model learns the broad emotion perception of music from general users
– The background emotion model is personalized via model adaptation in an online, dynamic fashion
5
Multi-Dimensional Emotion
• Emotions are represented as numerical values (instead of discrete labels) over two emotion dimensions, i.e., Valence and Arousal (Activation)
• Good visualization; a unified model
[Figure: the Mr. Emo system, developed by Yang and Chen]
6
The Valence-Arousal Annotations
• Different emotions may be elicited from a song
• Assumption: the VA annotations of a song are drawn from a Gaussian distribution, as observed empirically
• Learn from the multiple annotations and the acoustic
features of the corresponding song
• Predict the emotion as a single Gaussian
7
The Acoustic Emotion Gaussians Model
• Represent the acoustic features of a song by a
probabilistic histogram vector
• Develop a model to comprehend the relationship
between acoustic features and VA annotations
– Wang et al. (2012), “The acoustic emotion Gaussians model for
emotion-based music annotation and retrieval,” Proc. ACM Multimedia
(full paper)
[Figure: acoustic GMM posterior representation and the resulting VA distributions]
8
Construct Feature Reference Model
[Figure: from a universal music database, frame-based features are extracted from the music tracks and audio signals; a global set of frame vectors, randomly selected from each track, is used for EM training of the acoustic GMM with components A1, ..., AK, each representing a specific acoustic pattern.]
9
Represent a Song into Probabilistic Space
1
2
K-1
K…
Posterior
Probabilities over
the Acoustic GMM
…
A1
A2
AK-1
Acoustic GMM
AK
…
Feature Vectors
Histogram:
Acoustic GMM Posterior
prob
1 2 K-1 K…
10
Generative Process of VA GMM
• Key idea: Each component in acoustic GMM can generate
a component VA Gaussian
[Figure: the audio signal of each clip is represented over the acoustic GMM components A1, ..., AK, viewed as a set of acoustic codewords; each acoustic component generates a corresponding VA Gaussian, yielding a mixture of Gaussians in the VA space.]
11
The Likelihood Function of VA GMM
• Each training clip is annotated by multiple users {uj},
indexed by j
• An annotated corpus: assume each annotation eij of clip si is generated by a VA GMM weighted by {qik}
• Form the corpus-level likelihood and maximize it using the EM algorithm
Corpus-level likelihood (each user contributes equally; {μk, Σk} are the parameters of the latent VA Gaussians to learn):

$$p(\mathcal{E} \mid \Theta) = \prod_{i=1}^{N} p(\mathcal{E}_i \mid s_i) = \prod_{i=1}^{N} \prod_{j=1}^{U_i} \sum_{k=1}^{K} q_{ik}\,\mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

Clip-level (annotation-level) likelihood, with qik the acoustic GMM posterior of clip si:

$$p(\mathbf{e}_{ij} \mid s_i) = \sum_{k=1}^{K} q_{ik}\,\mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
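The clip-level mixture can be evaluated directly; this is a minimal sketch with made-up VA Gaussian parameters (in the actual model, the {μk, Σk} are learned by EM):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical VA GMM: K latent 2-D Gaussians in valence-arousal space,
# weighted per clip by its acoustic GMM posterior q_i = (q_i1, ..., q_iK).
rng = np.random.default_rng(1)
K = 4
mus = rng.uniform(-1, 1, size=(K, 2))          # latent VA means (made up)
covs = np.array([0.1 * np.eye(2)] * K)         # latent VA covariances (made up)
q = np.full(K, 1.0 / K)                        # acoustic posterior of clip s_i

def clip_likelihood(e, q, mus, covs):
    """p(e | s_i) = sum_k q_ik N(e; mu_k, Sigma_k)."""
    return sum(q[k] * multivariate_normal.pdf(e, mus[k], covs[k])
               for k in range(len(q)))

e_ij = np.array([0.3, -0.2])                   # one user's VA annotation
lik = clip_likelihood(e_ij, q, mus, covs)
```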
12
Personalizing VA GMM via MAP
• Apply the Maximum A Posteriori (MAP) adaptation
• Suppose we have a set of personally annotated songs
{ei, qi}, i=1,…,M
• The posterior probability of each component zk given ei:

$$p(z_k \mid \mathbf{e}_i, \theta) = \frac{q_{ik}\,\mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{q=1}^{K} q_{iq}\,\mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)}$$

• The expected sufficient statistics, computed from the posteriors and the ei:

$$E_k(\boldsymbol{\mu}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)\,\mathbf{e}_i}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)}, \qquad E_k(\boldsymbol{\Sigma}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)\,\mathbf{e}_i \mathbf{e}_i^{T}}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)}$$
13
MAP for GMM: Parameter Interpolation
• The updated parameters of the personalized VA GMM are derived by interpolation:

$$\boldsymbol{\mu}_k' \leftarrow \alpha_k E_k(\boldsymbol{\mu}) + (1-\alpha_k)\,\boldsymbol{\mu}_k$$

$$\boldsymbol{\Sigma}_k' \leftarrow \alpha_k E_k(\boldsymbol{\Sigma}) + (1-\alpha_k)\left(\boldsymbol{\Sigma}_k + \boldsymbol{\mu}_k \boldsymbol{\mu}_k^{T}\right) - \boldsymbol{\mu}_k' \boldsymbol{\mu}_k'^{T}$$

• The effective number of component zk for the target user:

$$M_k = \sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)$$

• The interpolation factors (data-dependent) can be set by

$$\alpha_k = \frac{M_k}{M_k + r}$$

with r a relevance factor: parameter interpolation between the expectation and the background model.
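Continuing the sketch, the interpolation update itself is only a few lines; the relevance factor r and all parameter values here are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the MAP interpolation step, assuming the expected
# statistics E_mu (K, 2), E_sigma (K, 2, 2) and effective counts Mk (K,)
# were already computed from the target user's annotations.
rng = np.random.default_rng(3)
K = 4
mus = rng.uniform(-1, 1, size=(K, 2))          # background VA means
covs = np.array([0.1 * np.eye(2)] * K)         # background VA covariances
Mk = rng.uniform(0, 5, size=K)                 # effective counts (made up)
E_mu = rng.uniform(-1, 1, size=(K, 2))         # expected means (made up)
E_sigma = np.array([0.1 * np.eye(2)] * K)      # expected 2nd moments (made up)

r = 2.0                                        # assumed relevance factor
alpha = Mk / (Mk + r)                          # data-dependent factors

# mu'_k = a_k E_k(mu) + (1 - a_k) mu_k
mus_new = alpha[:, None] * E_mu + (1 - alpha[:, None]) * mus
# Sigma'_k = a_k E_k(Sigma) + (1 - a_k)(Sigma_k + mu_k mu_k^T) - mu'_k mu'_k^T
covs_new = (alpha[:, None, None] * E_sigma
            + (1 - alpha[:, None, None])
              * (covs + np.einsum('kd,ke->kde', mus, mus))
            - np.einsum('kd,ke->kde', mus_new, mus_new))
```

Components with little personal data (small Mk) stay close to the background model; heavily annotated components move toward the user's statistics.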
14
Graphical Interpretation – MAP Adaptation
[Figure: the interpolation factor, driven by the acoustic GMM posterior, shifts the background VA Gaussians toward the personal annotations. The personal annotation can be applied to clips exclusive to the background training set.]
15
Music Emotion Recognition
• Given the acoustic GMM posterior of a test song, predict
the emotion as a single VA Gaussian
1
2
K-1
K
…
Acoustic GMM Posterior Learned VA GMM Predicted Single Gaussian
1
ˆˆ( | ) ( | , )
K
k ij k k
k
p s q
=
= åe e m S
^
^
^
^
…
{ , }*
m *
S
16
Find the Representative Gaussian
• Minimize the cumulative weighted relative entropy
– The representative Gaussian has the minimal cumulative
distance from all the component VA Gaussians
• The optimal parameters of the Gaussian are
$$p(\mathbf{e} \mid \boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*) = \operatorname*{arg\,min}_{\{\boldsymbol{\mu},\,\boldsymbol{\Sigma}\}} \sum_{k=1}^{K} q_k\, D_{\mathrm{KL}}\!\left(p(\mathbf{e} \mid \hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k)\,\middle\|\,p(\mathbf{e} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\right)$$

$$\boldsymbol{\mu}^* = \sum_{k=1}^{K} q_k\,\hat{\boldsymbol{\mu}}_k, \qquad \boldsymbol{\Sigma}^* = \sum_{k=1}^{K} q_k\left(\hat{\boldsymbol{\Sigma}}_k + (\hat{\boldsymbol{\mu}}_k - \boldsymbol{\mu}^*)(\hat{\boldsymbol{\mu}}_k - \boldsymbol{\mu}^*)^{T}\right)$$
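These optimal parameters amount to moment matching, which can be sketched as follows (the weights and component parameters are made up):

```python
import numpy as np

# Moment-matching collapse of a predicted VA GMM {q_k, mu_k, Sigma_k}
# into one representative Gaussian, the minimizer of the cumulative
# weighted KL divergence from the component Gaussians.
rng = np.random.default_rng(4)
K = 4
q = np.full(K, 1.0 / K)                        # mixture weights (made up)
mus = rng.uniform(-1, 1, size=(K, 2))          # component VA means
covs = np.array([0.1 * np.eye(2)] * K)         # component VA covariances

mu_star = q @ mus                              # mu* = sum_k q_k mu_k
diff = mus - mu_star                           # (K, 2), mu_k - mu*
# Sigma* = sum_k q_k (Sigma_k + (mu_k - mu*)(mu_k - mu*)^T)
sigma_star = (np.einsum('k,kde->de', q, covs)
              + np.einsum('k,kd,ke->de', q, diff, diff))
```

The second term accounts for the spread of the component means, so Σ* is never smaller than the average component covariance.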
17
Evaluation – Dataset and Acoustic Features
• MER60
– 60 music clips, each 30 seconds long
– 99 users in total; each clip annotated by 40 subjects
– 6 users annotated all the clips
– Personalization is evaluated on these 6 users
• Bag-of-frames representation; emotion analysis is performed at the clip level instead of the frame level
– 70-dim features: dynamic, spectral, timbre (13 MFCCs, 13 delta MFCCs, and 13 delta-delta MFCCs), and tonal
18
Evaluation – Incremental Setting
• Incremental adaptation experiment per target user
– Randomly split all the clips (with annotations) into 6 folds
– Perform 6-fold CV
• Hold out one fold for testing
• On the remaining 5 folds, train a background VA GMM using all annotations except the target user's
• Add one fold of the target user's annotations to the adaptation pool per iteration (loop of P = 5 iterations)
– Use the adaptation pool to adapt the background VA GMM
– Evaluate prediction performance on the test fold
19
Evaluation – Result
• Metric (ALLi): the log-likelihood of the target user's ground-truth annotation under the predicted Gaussian
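A sketch of this metric, assuming the predicted Gaussian {μ*, Σ*} and one ground-truth annotation (all values here are hypothetical):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Log-likelihood of the target user's ground-truth VA annotation under
# the predicted single Gaussian (illustrative values, not real results).
mu_star = np.array([0.2, 0.1])                 # predicted VA mean
sigma_star = 0.2 * np.eye(2)                   # predicted VA covariance
e_truth = np.array([0.3, -0.1])                # user's ground-truth annotation

all_i = multivariate_normal.logpdf(e_truth, mu_star, sigma_star)
```

Higher (less negative) values mean the predicted Gaussian places more density on the user's actual annotation.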
20
Conclusion and Future Work
• The AEG model provides a principled probabilistic framework that is technically sound and flexible for adaptation
• We have presented a novel MAP-based adaptation technique that is very efficient for personalizing the AEG model
• Demonstrated the effectiveness of the proposed method for personalizing MER in an incremental learning manner
• Future work: investigate maximum likelihood linear regression (MLLR), which learns a linear transformation over the parameters of the AEG model