A small footprint for audio and music classification

Hamid Eghbal-zadeh

Outline
1. Introduction
2. I-Vector representation
3. Some results
4. Conclusion

A small footprint for audio and music classification
[Figure: Audio → Acoustic features → Front-end → Small footprint ($a_1, a_2, \ldots, a_n$) → Classifier. Stage labels: Signal processing, Machine learning.]
o Front-end:
• Block-level features (Genre classification) [Seyerlehner, 2010]
• Adapted GMM means (Genre classification) [Charbuillet, 2011]
• Adapted RBM weights (Speaker verification) [Ghahabi, 2014]
• Factor Analysis (Artist recognition) [Eghbal-zadeh, 2015]

[Figure: system overview. Stage labels: Signal processing, Machine learning.]
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM Adaptation → Adapted UBM params → Classifier (train)
Test: Test db + UBM → UBM Adaptation → Adapted UBM params → Classifier (test)
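The UBM Adaptation step can be illustrated with a minimal sketch. The slides do not say which adaptation scheme is used, so this assumes relevance-MAP adaptation of the means of a diagonal-covariance GMM, a common choice. The function names, the `relevance` value of 16 and the array shapes are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Per-frame responsibilities of a diagonal-covariance GMM.

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    Returns a (T, C) array whose rows sum to 1.
    """
    diff = X[:, None, :] - means[None, :, :]              # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                     # (T, C)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)       # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means to the frames in X.

    The relevance factor (assumed value) controls how much data a
    component needs before its mean moves away from the UBM mean.
    """
    post = gmm_posteriors(X, weights, means, variances)
    n = post.sum(axis=0)                                  # zeroth-order stats, (C,)
    f = post.T @ X                                        # first-order stats, (C, D)
    ex = f / np.maximum(n, 1e-10)[:, None]                # per-component data mean
    alpha = (n / (n + relevance))[:, None]                # adaptation weight in [0, 1)
    return alpha * ex + (1.0 - alpha) * means             # adapted means, (C, D)
```

Components that see little data keep their UBM mean, while well-populated components move toward the mean of the frames assigned to them. Stacking the adapted means gives the high-dimensional "adapted UBM params" that the later Factor Analysis step compresses.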

[Figure: system overview with a Factor Analysis step. Stage labels: Signal processing, Machine learning.]
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM Adaptation → Adapted UBM params → Factor analysis (train) → Classifier (train)
Test: Test db + UBM → UBM Adaptation → Adapted UBM params → Factor analysis (test) → Classifier (test)

Effect of Factor Analysis step
[Figure: an example of songs from 3 genres in the GTZAN dataset [Eghbal-zadeh, ISMIR2015]. Right: without Factor Analysis. Left: with Factor Analysis.]
[Figure: artist recognition performance on Artist20 with and without Factor Analysis [Eghbal-zadeh, Eusipco2015]. Legend: Without FA, With FA.]

Other benefits:
• Noise-robust features [Eghbal-zadeh, ISMIR2016]
• Combined with Neural Nets [Eghbal-zadeh, DAFx2016]
• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification

Why apply Factor Analysis?
• The extracted factors (i-vectors) provide an information-rich, fixed-length, low-dimensional representation
• They follow a single-Gaussian distribution
– We can use the properties of Gaussians
• They can be easily scored
– Using cosine distance
• They are estimated latent factors with good discriminative power, resulting from a Factor Analysis procedure
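Cosine scoring of fixed-length vectors takes only a few lines. The sketch below computes the cosine similarity (one minus the cosine distance) and picks the class whose model vector scores highest. The helper names and the idea of one model i-vector per class are illustrative assumptions, not details given in the slides.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors (1.0 means same direction)."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def classify(test_ivector, class_models):
    """Return the label whose model i-vector has the highest cosine score.

    class_models: dict mapping a label to one model i-vector
    (hypothetical layout, e.g. the mean of that class's training i-vectors).
    """
    return max(
        class_models,
        key=lambda label: cosine_score(test_ivector, class_models[label]),
    )
```

Because the score depends only on the angle between vectors, it ignores their length, which is one reason scoring is cheap once the i-vectors are extracted.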

I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT

[Figure: i-vector system overview. Stage labels: Signal processing, Machine learning.]
Dev: Dev db → UBM (GMM)
Train: Train db + UBM → Adapted GMM params (statistical representation) → Factor analysis (train) → Classifier (train)
Test: Test db + UBM → Adapted GMM params (statistical representation) → Factor analysis (test) → Classifier (test)

Different Factor Analysis approaches:
Eigenvoice FA: $M = m + V y$
– $M$: adapted GMM mean; $m$: UBM mean; $V$: eigenvoice subspace; $y$: hidden vector
Joint Factor Analysis (JFA): $M = m + V y + U x + D z$
– $M$: adapted GMM mean; $m$: UBM mean; $V$: artist subspace; $U$: song subspace; $D z$: residual
I-vector FA: $M = m + T y$
– $M$: adapted GMM mean; $m$: UBM mean; $T$: low-rank matrix modelling both artist and song together; $y$: hidden vector (i-vector)
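For the i-vector model M = m + T y, the hidden vector is usually estimated as the posterior mean of y given the zeroth- and first-order statistics of a recording against the UBM. A minimal sketch of that estimate follows, assuming a diagonal-covariance UBM and an already trained matrix T. Training T (typically with EM) is not shown, and the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def extract_ivector(n, f, ubm_means, ubm_vars, T):
    """Posterior mean of the hidden vector y in M = m + T y.

    n: (C,) zeroth-order stats; f: (C, D) first-order stats;
    ubm_means, ubm_vars: (C, D) diagonal-covariance UBM;
    T: (C*D, R) low-rank matrix, assumed to be trained already.
    """
    C, D = ubm_means.shape
    R = T.shape[1]
    # Centre the first-order stats on the UBM means and stack them
    # component by component into one supervector of length C*D.
    f_centered = (f - n[:, None] * ubm_means).reshape(C * D)
    inv_var = (1.0 / ubm_vars).reshape(C * D)
    n_expanded = np.repeat(n, D)                  # n_c repeated for each of the D dims
    Tt_inv = T.T * inv_var                        # T^T Sigma^{-1}, shape (R, C*D)
    precision = np.eye(R) + (Tt_inv * n_expanded) @ T   # I + T^T Sigma^{-1} N T
    return np.linalg.solve(precision, Tt_inv @ f_centered)
```

The identity matrix in the precision comes from the standard-normal prior on y. With few frames the estimate shrinks toward zero, and with many frames it approaches the least-squares fit of the supervector offset M - m in the column space of T.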

An example of an i-vector based system
Extract features {MFCC} → Compute statistics → Extract i-vectors {I-vector extraction} → Post-processing {LDA/WCCN/…} → Classification {Cosine score, …}
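Within-Class Covariance Normalization
$$W = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{n_c} \sum_{i=1}^{n_c} (w_i^c - \bar{w}_c)(w_i^c - \bar{w}_c)^T, \qquad B B^T = W^{-1}$$
– $\bar{w}_c$: averaged i-vectors for class $c$
– $w_i^c$: $i$-th i-vector from class $c$
– $n_c$: number of i-vectors from class $c$
– $C$: number of classes
– $B$: WCCN projection matrix
– $W$: within-class covariance matrix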
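A minimal NumPy sketch of WCCN under the usual definition: average the per-class covariances of the i-vectors, then take the Cholesky factor of the inverse as the projection matrix. The function names are illustrative, and a real system may need to regularise the covariance when classes have few i-vectors, which is not shown here.

```python
import numpy as np

def within_class_covariance(ivectors, labels):
    """Average of the per-class covariance matrices of the i-vectors."""
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        class_vecs = ivectors[labels == c]
        centered = class_vecs - class_vecs.mean(axis=0)   # subtract the class mean
        W += centered.T @ centered / len(class_vecs)
    return W / len(classes)

def wccn_projection(ivectors, labels):
    """Return B such that B @ B.T equals the inverse within-class covariance."""
    W = within_class_covariance(ivectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))           # lower-triangular factor

# Usage: project row-wise i-vectors with `ivectors @ B`, i.e. w' = B^T w.
```

After projection the within-class covariance becomes the identity, so directions with large within-class variability are scaled down before cosine scoring.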

Within-Class Covariance Normalization
[Figure: Class A and Class B before and after the WCCN projection]
The within-class variability is reduced

Tasks
• Audio Scene Classification
– DCASE-2016 challenge
– 15 different scenes (30-second recordings from train, tram, office, outdoor, etc.)
– We won the challenge!
• Music Similarity
– GTZAN and 1517Artists
– Evaluated using genre labels
• Music Artist Recognition (MAR)
– Artist20 and MSD
– Noise-robust MAR using 12 different kinds and levels of noise

Audio Scene Classification Challenge (DCASE-2016 [1])
• Our approach: an i-vector/DNN hybrid (4 submissions among 49 participants)
– 1st place: hybrid
– 2nd place: i-vector
– 5th place: i-vector
– 14th place: DNN
[1] http://www.cs.tut.fi/sgn/arg/dcase2016/

Music Similarity [ISMIR-2015]
• UBM trained on the 1517Artists db, tested on GTZAN
• I-vectors are extracted unsupervised
• Evaluated with genre labels

Music Artist Recognition [Eusipco-2015]
• Artist20 db
– 20 artists
– 1413 songs

Music Artist Recognition [DAFx-2016]
• MSD db
– 50 artists
– 5,000 songs
[Figure labels: CDB-Net; Experiment 2 – Raw i-vectors]

Noise-Robust Music Artist Recognition [ISMIR-2016]
• Artist20 db
– 4 different noises:
• festival noise
• humming noise
• pink noise
• pub noise
– 3 different SNR levels

Conclusion:
• A small footprint using FA
• Useful for different audio and music related tasks
• Robustness against noise
• Useful as Neural Net features
