TIMBRAL MODELING FOR MUSIC
ARTIST RECOGNITION USING
I-VECTORS
Hamid Eghbal-zadeh, Markus Schedl, Gerhard Widmer
Johannes Kepler University
Linz, Austria
1
EUSIPCO 2015
Overview
• Introduction
o Artist recognition
o I-vector based systems
• I-vector Frontend
o Calculate statistics [GMM supervectors]
o Factor analysis [estimate hidden factors to extract I-vectors]
• Proposed method:
o Normalization and compensation techniques
o Backends
• Experiments
o Setup
o Evaluation
o Baselines
o Results
• Conclusion
2
Introduction – Artist recognition
• Artist recognition:
Recognizing the artist using a part of a song
Artist refers to the singer or the band of a song.
• Difficulties:
– Musical instruments
– Effects of genre and instrumentation
– Singer’s voice + instruments
3
Audio example: Major Lazer & DJ Snake - Lean On [clips: singing voice / music]
Introduction – I-vector based systems
• I-vectors:
– Introduced in speaker verification in 2010
– Provide a compact and low dimensional representation
• Also used for:
– Emotion recognition, language recognition, audio scene detection
• Use Factor Analysis:
– Estimate hidden factors that can help us recognize an artist from a song
• Introducing Artist and Session factors in a song:
– Artist variability: the variability that appears between songs of different artists.
– Session variability: the variability that appears within the songs of a single artist.
4
Pipeline: song → frame-level features → estimate hidden factors → song-level features
I-vector Factor Analysis – Terminology
5
Features, spaces, and factors:
• Frame-level feature — frame-level feature space [~20] — hidden factors
• GMM supervector — GMM space [~20,000] — hidden factors
• i-vector — Total Variability Space (TVS) [~400] — total factors
Steps: Step 1: Feature extraction → Step 2: Statistics calculation → Step 3: Factor Analysis
Artist variability: the variability that appears between different artists.
Session variability: the variability that appears within the songs of a single artist.
Total variability: artist + session variability
I-vectors – Statistics calculation
6
Step 2: extract GMM supervectors*
• A UBM (GMM) is trained, unsupervised, on the development db {Songs → MFCCs}
• Each song of the train/test db is matched against the UBM (UBM → Song 1, Song 2, …)
• Baum-Welch (BW) statistics per UBM component c:
– 0th-order BW: N_c = Σ_t γ_t(c)
– 1st-order BW: F_c = Σ_t γ_t(c) ∗ X_t
– γ_t(c): posterior probability of frame X_t given component c
• These statistics form the GMM supervector*
* Similar to: Charbuillet et al., GMM-Supervector for Content-based Music Similarity, DAFx 2011.
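As an illustration of this step, a minimal sketch assuming a scikit-learn GaussianMixture is used as the UBM; the supervector shown simply stacks the posterior-weighted component means (normalized 1st-order statistics), whereas Charbuillet et al. may use full MAP adaptation, so treat the construction as an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def baum_welch_stats(ubm: GaussianMixture, X: np.ndarray):
    """0th/1st-order Baum-Welch statistics of the frames X (T x D) under the UBM."""
    gamma = ubm.predict_proba(X)        # (T, C): posterior of each frame per component
    N = gamma.sum(axis=0)               # (C,)   : 0th-order statistics
    F = gamma.T @ X                     # (C, D) : 1st-order statistics
    return N, F


def gmm_supervector(ubm: GaussianMixture, X: np.ndarray) -> np.ndarray:
    """Stack the posterior-weighted component means into a (C*D,) supervector."""
    N, F = baum_welch_stats(ubm, X)
    means = F / np.maximum(N[:, None], 1e-10)   # normalized 1st-order statistics
    return means.reshape(-1)
```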
I-vectors - Factor analysis
7
Step 3: estimate hidden factors
Goal:
• Reduce the dimensionality
• Separate desired factors from undesired factors in feature space
• Estimate hidden variables related to desired factors
Assumption:
M(s) = m + Oₛ
where M(s) is the GMM supervector of song s, m is the mean vector of the UBM, and Oₛ is an offset vector.
I-vectors - Factor analysis
8
Step 3: estimate hidden factors - previous methods
Joint Factor Analysis (JFA):
M(s) = m + V∗y + U∗x + D∗z
where M(s) is the GMM supervector of song s, m is the mean vector of the UBM, V is the artist subspace matrix (y: artist factors), U is the session subspace matrix (x: session factors), and D is the residual matrix (z: residual term).
• JFA assumes Oₛ consists of separate artist and session factors.
• JFA showed better performance than previous FA methods.
I-vectors - Factor analysis
9
Step 3: estimate hidden factors - current method
I-vector extraction:
M(s) = m + T∗y
where M(s) is the GMM supervector of song s, m is the mean vector of the UBM, T is the low-rank TVS matrix, and y is the i-vector with a standard normal prior (~N(0, I)).
• TVS: contains both artist and session factors
• T is initialized randomly and is learned using the EM algorithm from training data
I-vectors – Learning T
10
Step 3: estimate hidden factors - expectation maximization
• E step:
For each song s, use the current estimate of T to find the i-vector y that maximizes the likelihood of the GMM supervector of song s, M(s):
y(s) = argmax_y P(M(s) | m + T∗y, Σ)
• M step:
Update T by maximizing P(M(s) | m + T∗y, Σ) over the training songs.
(m: UBM mean vector, Σ: covariance matrix of the UBM)
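The slide does not spell out the update equations; below is a sketch of the standard total-variability EM iteration from the speaker-verification literature, assuming a diagonal-covariance UBM and per-song statistics N(s) (0th order) and F(s) (1st order, centered on the UBM means). The minimum-divergence step is omitted, and the code is illustrative rather than the authors' implementation.

```python
import numpy as np


def em_update_T(T, Sigma, stats):
    """One EM iteration for the TVS matrix T (CD x R).
    Sigma: (CD,) diagonal UBM covariance supervector.
    stats: list of (N, F) per song; N: (C,), F: (CD,) centered 1st-order stats."""
    CD, R = T.shape
    C = stats[0][0].shape[0]
    D = CD // C
    A = np.zeros((C, R, R))                  # accumulates sum_s N_c(s) E[y y^T]
    B = np.zeros((CD, R))                    # accumulates sum_s F(s) E[y]^T
    for N, F in stats:
        Nexp = np.repeat(N, D)                                  # (CD,)
        L = np.eye(R) + T.T @ ((Nexp / Sigma)[:, None] * T)     # posterior precision of y
        Cov = np.linalg.inv(L)
        Ey = Cov @ (T.T @ (F / Sigma))                          # E step: posterior mean of y
        Eyy = Cov + np.outer(Ey, Ey)
        A += N[:, None, None] * Eyy
        B += np.outer(F, Ey)
    for c in range(C):                       # M step: solve each component block of T
        T[c * D:(c + 1) * D] = B[c * D:(c + 1) * D] @ np.linalg.inv(A[c])
    return T
```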
I-vectors – Proposed system
1. I-vectors are centered by removing the mean
2. I-vectors are length normalized
3. LDA is used for compensation and dimensionality reduction
11
Length normalization: yₙ = y / ‖y‖ (y: i-vector, yₙ: length-normalized i-vector)
Pipeline: Song → Extract features {MFCC} → Extract GMM supervectors → Front end {I-vector extraction} → Compensation/Normalization {LDA / length norm} → Backend {DA, 3NN, NB, PLDA}
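A minimal sketch of these three steps with scikit-learn; centering with the training mean and the 19-dimensional LDA projection (number of artists minus one) are assumptions, not details stated on the slide.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def normalize_and_project(train_iv, test_iv, train_labels, n_lda_dims=19):
    """Center, length-normalize, and LDA-project i-vectors."""
    mean = train_iv.mean(axis=0)                                         # 1. centering
    train_c, test_c = train_iv - mean, test_iv - mean
    train_n = train_c / np.linalg.norm(train_c, axis=1, keepdims=True)   # 2. length norm
    test_n = test_c / np.linalg.norm(test_c, axis=1, keepdims=True)
    lda = LinearDiscriminantAnalysis(n_components=n_lda_dims)            # 3. compensation + dim. reduction
    return lda.fit_transform(train_n, train_labels), lda.transform(test_n)
```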
Backends
• Discriminant Analysis classifier
• Nearest neighbor classifier with cosine distance (k=3)
• Naïve Bayes classifier
• Probabilistic Linear Discriminant Analysis
12
PLDA model: y = m + Φ∗l + e
where y is the i-vector, m is the mean of the training i-vectors, Φ is the latent matrix, l is the latent factor, and e is the residual term.
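Three of the four backends can be sketched directly with scikit-learn (PLDA usually needs a dedicated implementation and is omitted here); classifier settings other than k=3 and the cosine metric are assumptions.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

backends = {
    "DA":  LinearDiscriminantAnalysis(),                          # Discriminant Analysis
    "3NN": KNeighborsClassifier(n_neighbors=3, metric="cosine"),  # cosine-distance 3-NN
    "NB":  GaussianNB(),                                          # Naive Bayes
}


def evaluate_backends(train_x, train_y, test_x, test_y):
    """Fit each backend on compensated i-vectors and report its test accuracy."""
    return {name: clf.fit(train_x, train_y).score(test_x, test_y)
            for name, clf in backends.items()}
```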
Experiments – Setup
• A 30-second excerpt is randomly selected from the middle part of each song
• 13- and 20-dimensional MFCCs are used as frame-level features
• A 1024-component GMM is trained as the UBM
• The TVS matrix is trained with 400 factors
• LDA is applied for compensation and dimensionality reduction
• Development db = train set
13
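A sketch of the feature extraction and UBM training described above, using librosa and scikit-learn; the sample rate, the exact excerpt selection, and the GMM training settings are assumptions (the slide only fixes 13/20-dim MFCCs and 1024 components).

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def song_mfccs(path, n_mfcc=20, excerpt_sec=30.0, sr=22050):
    """30-second excerpt from the middle of a song, as frame-level MFCCs (T x n_mfcc)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mid, half = len(y) // 2, int(excerpt_sec * sr / 2)
    y = y[max(0, mid - half): mid + half]
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T


def train_ubm(dev_frames, n_components=1024):
    """Unsupervised UBM: a diagonal-covariance GMM over the pooled development frames."""
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           max_iter=100).fit(dev_frames)
```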
Experiments – Evaluation
• “Artist20” dataset: 1413 tracks, mostly rock and pop, composed of six albums from each of 20 artists
• The 6-fold cross-validation split provided with the Artist20 dataset is used
• In each fold, one of the six albums of every artist is held out for testing.
14
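The official album-filtered folds shipped with Artist20 should be used directly; the helper below only illustrates the idea of holding out one album per artist in each fold (all names are hypothetical).

```python
from collections import defaultdict
import numpy as np


def album_filtered_folds(artists, albums, n_folds=6):
    """Yield (train_idx, test_idx): in fold k, the k-th album of every artist is held out."""
    per_artist = defaultdict(list)                      # artist -> ordered list of its albums
    for artist, album in zip(artists, albums):
        if album not in per_artist[artist]:
            per_artist[artist].append(album)
    for k in range(n_folds):
        held_out = {(a, albs[k % len(albs)]) for a, albs in per_artist.items()}
        test_idx = np.array([i for i, pair in enumerate(zip(artists, albums))
                             if pair in held_out])
        train_idx = np.setdiff1d(np.arange(len(artists)), test_idx)
        yield train_idx, test_idx
```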
Experiments – Baselines
Best artist recognition performances reported on the Artist20 db:
1. Single GMM: [D. P. W. Ellis, 2007]
– Provided with the dataset
2. Signature-based approach: [S. Shirali, 2009]
– Generates compact signatures and compares them using graph matching
3. Sparse modelling: [L. Su, 2013]
– Sparse feature learning method with a ‘bag of features’ using the magnitude and phase parts of the spectrum
4. Multivariate kernels: [P. Kuksa, 2014]
– Uses multivariate kernels with direct uniform quantization
5. Alternative:
– Uses the same structure as the proposed method, but the i-vector extraction block is replaced with PCA (a sketch follows the pipeline below)
15
Alternative pipeline: Song → Extract features {MFCC} → GMM supervectors → Front end {PCA} → Compensation/Normalization {LDA / length norm} → Backend {DA}
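A sketch of the "Alternative" baseline: the i-vector front end is swapped for PCA over the GMM supervectors while the rest of the chain is unchanged; the 400-component PCA (mirroring the TVS rank) is an assumption.

```python
from sklearn.decomposition import PCA


def pca_front_end(train_supervectors, test_supervectors, n_components=400):
    """PCA front end of the 'Alternative' baseline, replacing i-vector extraction."""
    pca = PCA(n_components=n_components)
    train_feats = pca.fit_transform(train_supervectors)   # (n_train_songs, n_components)
    test_feats = pca.transform(test_supervectors)
    # The same length normalization / LDA compensation and DA backend follow.
    return train_feats, test_feats
```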
I-vectors – Results
16
[Results figure — series: Best 13, Alt. 13, Best 20, Alt. 20 (best baseline and PCA alternative, for 13- and 20-dim MFCCs)]
I-vectors – Results
• Results for different numbers of Gaussian components, with the proposed method and the DA classifier
17
[Results figure — reference lines: Best 13, Best 20]
Conclusion
18
• Total factors can model an artist
• Compact representation, low dimensionality
• Song-level features
• Robust to multiple backends
Acknowledgement
19
• We would like to acknowledge the tremendous help of Dan Ellis of Columbia University, who provided tools and resources for feature extraction and shared the details of his work, which enabled us to reproduce his experimental results.
• Thanks also to Pavel Kuksa from the University of Pennsylvania for sharing the details of his work with us.
• We appreciate the helpful suggestions of Marko Tkalcic from Johannes Kepler University Linz.
• This work was supported by the EU-FP7 project no. 601166 “Performances as Highly Enriched aNd Interactive Concert eXperiences (PHENICX)”.
Questions
20
Thank you for your time!
I-vectors - GMM supervector
21
Step 1
Baum-Welch (BW) statistics per UBM component c:
• 0th-order BW: N_c = Σ_t γ_t(c)
• 1st-order BW: F_c = Σ_t γ_t(c) ∗ X_t
• γ_t(c): posterior probability of frame X_t given component c
These statistics form the GMM supervector.
Example:
UBM: 1024 components; feature: 20 dim
0th BW = 1024 x 1
1st BW = 20 x 1024
I-vectors - Factor analysis
22
Step 2: Closed form
I-vector of song s:
y = (I + Tᵗ Σ⁻¹ N(s) T)⁻¹ Tᵗ Σ⁻¹ F(s)
where y is the i-vector, I is the identity matrix, T is the TVS matrix, Σ is the covariance matrix of the UBM, and N(s), F(s) are the Baum-Welch (BW) statistics used for the GMM supervector:
• N(s): 0th-order BW, Σ_t γ_t(c)
• F(s): 1st-order BW, Σ_t γ_t(c) ∗ X_t
• γ_t(c): posterior probability of X_t given component c
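A direct transcription of this closed form into code, assuming a diagonal UBM covariance and that F(s) has been centered on the UBM means, as is usual in i-vector implementations:

```python
import numpy as np


def extract_ivector(T, Sigma, N, F):
    """Closed-form i-vector of one song.
    T: (CD, R) TVS matrix; Sigma: (CD,) diagonal UBM covariance supervector;
    N: (C,) 0th-order BW stats; F: (CD,) 1st-order BW stats centered on the UBM means."""
    CD, R = T.shape
    D = CD // N.shape[0]
    Nexp = np.repeat(N, D)                                        # expand N(s) to supervector size
    precision = np.eye(R) + T.T @ ((Nexp / Sigma)[:, None] * T)   # I + T' Σ^-1 N(s) T
    return np.linalg.solve(precision, T.T @ (F / Sigma))          # (...)^-1 T' Σ^-1 F(s)
```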
I-vector Extraction Routine
– Step 1: Feature extraction
– Step 2: Statistics calculation
• Extract GMM-supervectors from frame-level features (MFCCs)
– Step 3: Factor analysis
• Apply factor analysis to estimate hidden variables in GMM space
23
Pipeline: Frames → Extract features {MFCC} → Extract GMM supervectors → Front end {I-vector extraction} → Compensation/Normalization {LDA / length norm} → Backend {PLDA, …}
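Putting the routine together, a short end-to-end sketch wiring up the illustrative helpers from the earlier snippets (song_mfccs, baum_welch_stats, extract_ivector); ubm and T are assumed to have been trained as sketched above, and the file path is a placeholder.

```python
import numpy as np

X = song_mfccs("some_song.mp3", n_mfcc=20)              # Step 1: frame-level features
N, F = baum_welch_stats(ubm, X)                         # Step 2: Baum-Welch statistics
F_c = (F - N[:, None] * ubm.means_).reshape(-1)         # center on UBM means, stack to (CD,)
Sigma = ubm.covariances_.reshape(-1)                    # diagonal covariance supervector
iv = extract_ivector(T, Sigma, N, F_c)                  # Step 3: ~400-dim i-vector
# iv is then length-normalized, LDA-compensated, and passed to a backend classifier.
```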
