4. A small footprint for Audio and Music classification
[Figure: Audio → Acoustic features → Front-end → Small footprint $(a_1, a_2, \ldots, a_n)$ → Classifier]
o Front-end:
• Block-level features (genre classification) [Seyerlehner, 2010]: signal processing
• Adapted GMM means (genre classification) [Charbuillet, 2011]: machine learning
• Adapted RBM weights (speaker verification) [Ghahabi, 2014]: machine learning
• Factor Analysis (artist recognition) [Eghbal-zadeh, 2015]: machine learning
5.
[Figure: UBM-based machine-learning front-end]
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → adapted UBM params → Classifier (training)
Test: Test db + UBM → UBM adaptation → adapted UBM params → trained Classifier (testing)
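The UBM adaptation step can be illustrated with relevance-MAP adaptation of the UBM means, the classic GMM-UBM recipe. This is an assumption: the slide does not name the adaptation rule, and the helper names, the relevance factor of 16, and the random stand-in data are hypothetical. The adapted means, stacked into a supervector, have length K·D however many frames the song has, which is what makes the representation fixed-length.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Per-frame component posteriors under a diagonal-covariance GMM (the UBM).

    X: (T, D) frames; weights: (K,); means, variances: (K, D). Returns (T, K).
    """
    diff = X[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_lik = -0.5 * (log_det[None, :] + np.sum(diff**2 / variances[None], axis=2))
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)             # log-sum-exp stabilization
    gamma = np.exp(log_post)
    return gamma / gamma.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Hypothetical helper: relevance-MAP adaptation of the UBM means (assumed rule)."""
    gamma = gmm_posteriors(X, weights, means, variances)
    n = gamma.sum(axis=0)                             # soft frame count per component
    f = gamma.T @ X                                   # posterior-weighted frame sums
    data_mean = f / np.maximum(n, 1e-10)[:, None]     # per-component data mean
    alpha = (n / (n + relevance))[:, None]            # more data -> stronger adaptation
    return alpha * data_mean + (1.0 - alpha) * means  # adapted means, (K, D)

# Usage with random stand-in data (hypothetical sizes, not from the slides)
rng = np.random.default_rng(0)
K, D = 4, 3
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
frames = rng.normal(size=(200, D))                    # stand-in for one song's MFCCs
supervector = map_adapt_means(frames, ubm_w, ubm_mu, ubm_var).ravel()  # length K * D
```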
6.
[Figure: UBM-based front-end with a Factor Analysis step]
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → UBM adaptation → adapted UBM params → Factor analysis → Classifier (training)
Test: Test db + UBM → UBM adaptation → adapted UBM params → Factor analysis → trained Classifier (testing)
7. Effect of the Factor Analysis step
[Figure: An example of songs from 3 genres in the GTZAN dataset; left: with Factor Analysis, right: without Factor Analysis] [Eghbal-zadeh, ISMIR 2015]
[Figure: Artist recognition performance on Artist20 without FA and with FA] [Eghbal-zadeh, Eusipco 2015]
8. Other benefits:
• Noise-robust features [Eghbal-zadeh, ISMIR 2016]
• Can be combined with neural networks [Eghbal-zadeh, DAFx 2016]
• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification
9. Why apply Factor Analysis?
• The factors provide an information-rich, fixed-length, low-dimensional representation
• They follow a single Gaussian distribution
– so the properties of Gaussians can be exploited
• They are easy to score
– using the cosine distance
• They are latent factors estimated by a Factor Analysis procedure, with good discriminative power
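Cosine scoring compares two factor vectors by the angle between them, ignoring their lengths. A minimal sketch; representing each class by a single model vector (for example, the mean i-vector of an artist) is an assumption, and the names and toy vectors are hypothetical.

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity between two i-vectors (1 = same direction, -1 = opposite)."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def score_against_models(w_test, class_models):
    """Score a test i-vector against one (hypothetical) model i-vector per class."""
    return {label: cosine_score(w_test, w_model) for label, w_model in class_models.items()}

# Toy usage: the test vector points mostly along artist_a's direction
models = {"artist_a": np.array([1.0, 0.0, 0.0]), "artist_b": np.array([0.0, 1.0, 0.0])}
scores = score_against_models(np.array([2.0, 0.5, 0.0]), models)
best = max(scores, key=scores.get)   # predicted class = highest cosine score
```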
11.
[Figure: GMM-UBM front-end with a Factor Analysis step]
Dev: Dev db → UBM (GMM)
Train: Train db + UBM → adapted GMM params (statistical representation) → Factor analysis → Classifier (training)
Test: Test db + UBM → adapted GMM params (statistical representation) → Factor analysis → trained Classifier (testing)
12. Different Factor Analysis approaches:
• Eigenvoice FA: $M = m + Vy$
– $M$: adapted GMM mean, $m$: UBM mean, $V$: eigenvoice subspace, $y$: hidden vector
• Joint Factor Analysis (JFA): $M = m + Vy + Ux + Dz$
– $V$: artist subspace, $U$: song subspace, $Dz$: residual
• I-vector FA: $M = m + Ty$
– $T$: low-rank matrix that models both artist and song together, $y$: hidden vector (i-vector)
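In the i-vector model $M = m + Ty$, the i-vector of a recording is commonly taken as the posterior mean of $y$ given its Baum-Welch statistics: $y = \left(I + \sum_c N_c T_c^\top \Sigma_c^{-1} T_c\right)^{-1} \sum_c T_c^\top \Sigma_c^{-1} \tilde{F}_c$, where $T_c$ is the block of $T$ for UBM component $c$, $N_c$ the zeroth-order and $\tilde{F}_c$ the centered first-order statistics. A sketch assuming a diagonal-covariance UBM and an already-trained $T$ (EM training of $T$ is not shown); the function name and random inputs are hypothetical.

```python
import numpy as np

def extract_ivector(N, F_centered, T, ubm_var):
    """Posterior mean of y in M = m + T y, given Baum-Welch statistics.

    N: (K,) zeroth-order stats; F_centered: (K, D) first-order stats centered on UBM means;
    T: (K*D, R) total-variability matrix; ubm_var: (K, D) diagonal UBM covariances.
    """
    K, D = F_centered.shape
    R = T.shape[1]
    T_k = T.reshape(K, D, R)            # per-component blocks T_c (component-major rows)
    inv_var = 1.0 / ubm_var             # diagonal Sigma_c^{-1}
    # Posterior precision: L = I + sum_c N_c T_c^T Sigma_c^{-1} T_c
    L = np.eye(R) + np.einsum("k,kdr,kd,kds->rs", N, T_k, inv_var, T_k)
    # Linear term: b = sum_c T_c^T Sigma_c^{-1} F~_c
    b = np.einsum("kdr,kd,kd->r", T_k, inv_var, F_centered)
    return np.linalg.solve(L, b)        # i-vector, shape (R,)

# Usage with random stand-in statistics (hypothetical sizes)
rng = np.random.default_rng(1)
K, D, R = 8, 4, 3
T = rng.normal(scale=0.5, size=(K * D, R))
ubm_var = np.ones((K, D))
N = rng.uniform(5.0, 20.0, size=K)
F_centered = rng.normal(size=(K, D)) * np.sqrt(N)[:, None]
y = extract_ivector(N, F_centered, T, ubm_var)
```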
13. An example of an i-vector based system
[Figure: Extract features {MFCC} → Compute statistics → Extract i-vectors {I-vector extraction} → Post-processing {LDA/WCCN/…} → Classification {Cosine score, …}]
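The "Compute statistics" step collects, per recording, the zeroth-order and UBM-centered first-order Baum-Welch statistics that i-vector extraction consumes. A sketch assuming a diagonal-covariance UBM given as `weights`, `means`, `variances` arrays; the UBM and the random frames standing in for MFCCs are hypothetical. The soft counts `N` always sum to the number of frames, while the statistics themselves have a fixed size.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order and UBM-centered first-order statistics of one recording.

    X: (T, D) frames; weights: (K,); means, variances: (K, D) of a diagonal UBM.
    """
    diff = X[:, None, :] - means[None, :, :]                        # (T, K, D)
    log_lik = -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
                      + np.sum(diff**2 / variances[None], axis=2))   # (T, K)
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)                  # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # frame posteriors
    N = gamma.sum(axis=0)                                            # (K,) soft counts
    F_centered = gamma.T @ X - N[:, None] * means                    # sum_t gamma (x_t - m_c)
    return N, F_centered

# Usage with a random stand-in UBM and recording (hypothetical)
rng = np.random.default_rng(3)
K, D = 16, 13                       # 13 coefficients, a common MFCC size
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
frames = rng.normal(size=(500, D))  # one recording with 500 frames
N, F_centered = baum_welch_stats(frames, ubm_w, ubm_mu, ubm_var)
```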
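14. Within-Class Covariance Normalization (WCCN)
$$W = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}\left(w_i^c - \bar{w}_c\right)\left(w_i^c - \bar{w}_c\right)^\top, \qquad \bar{w}_c = \frac{1}{n_c}\sum_{i=1}^{n_c} w_i^c$$
$$W^{-1} = BB^\top$$
• $W$: within-class covariance matrix
• $B$: WCCN projection matrix (Cholesky factor of $W^{-1}$; i-vectors are projected as $B^\top w$)
• $w_i^c$: $i$-th i-vector from class $c$
• $\bar{w}_c$: averaged i-vectors for class $c$
• $n_c$: number of i-vectors from class $c$
• $C$: number of classes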
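A minimal WCCN sketch: the within-class covariance $W$ is computed per class, averaged over the $C$ classes, and i-vectors are projected as $B^\top w$ with $W^{-1} = BB^\top$ (Cholesky). The function names and the synthetic three-class data are hypothetical; after projection, the within-class covariance of the training data becomes the identity.

```python
import numpy as np

def train_wccn(ivectors, labels):
    """Return the WCCN projection B with W^{-1} = B B^T.

    ivectors: (n, R) training i-vectors as rows; labels: length-n class labels.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    R = ivectors.shape[1]
    W = np.zeros((R, R))
    for c in classes:
        w_c = ivectors[labels == c]
        centered = w_c - w_c.mean(axis=0)        # w_i^c minus the class average
        W += centered.T @ centered / len(w_c)    # (1/n_c) * sum of outer products
    W /= len(classes)                            # average over the C classes
    return np.linalg.cholesky(np.linalg.inv(W))  # lower-triangular B

def apply_wccn(B, ivectors):
    """Project row i-vectors as B^T w (i.e. ivectors @ B)."""
    return ivectors @ B

# Usage with synthetic data (hypothetical): three classes, anisotropic spread
rng = np.random.default_rng(2)
scales = np.diag([1.0, 2.0, 0.5, 1.0, 3.0])
ivecs = np.vstack([rng.normal(loc=c, size=(30, 5)) @ scales for c in range(3)])
labels = np.repeat([0, 1, 2], 30)               # e.g. three artists
B = train_wccn(ivecs, labels)
ivecs_wccn = apply_wccn(B, ivecs)
```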
17. Tasks
• Audio Scene Classification
– DCASE-2016 challenge
– 15 different scenes (30-second recordings from: train, tram, office, outdoor, etc.)
– We won the challenge!
• Music Similarity
– GTZAN and 1517Artists
– Evaluated using genre labels
• Music Artist Recognition
– Artist20 and MSD
– Noise-robust music artist recognition under 12 different noise conditions (4 noise types × 3 levels)
19. Music Similarity [ISMIR-2015]
• UBM trained on the 1517Artists db, tested on GTZAN
• I-vectors are extracted without supervision
• Evaluated with genre labels
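The slide does not state the evaluation metric. One common way to evaluate an unsupervised similarity measure with genre labels is the genre precision of each song's k nearest neighbors; the sketch below assumes that metric and cosine similarity between i-vectors, and the function name and toy data are hypothetical.

```python
import numpy as np

def genre_precision_at_k(ivectors, genres, k=5):
    """Mean fraction of each song's k nearest neighbors (cosine) that share its genre."""
    genres = np.asarray(genres)
    unit = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    sim = unit @ unit.T                       # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)            # never retrieve the query song itself
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    return float(np.mean(genres[neighbors] == genres[:, None]))

# Toy usage: two "rock" and two "jazz" songs in a 2-D space
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
genres = ["rock", "rock", "jazz", "jazz"]
p1 = genre_precision_at_k(vecs, genres, k=1)  # nearest neighbor always shares the genre
p2 = genre_precision_at_k(vecs, genres, k=2)  # second neighbor is always the other genre
```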
20. Music Artist Recognition [Eusipco-2015]
• Artist20 db
– 20 artists
– 1413 songs
21. Music Artist Recognition [DAFx-2016]
• MSD db
– 50 artists
– 5,000 songs
[Figure: CDB-Net; Experiment 2: raw i-vectors]
22. Noise-Robust Music Artist Recognition [ISMIR-2016]
• Artist20 db
– 4 different noises:
• festival noise
• humming noise
• pink noise
• PUB noise
– 3 different SNR levels
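The slide lists three SNR levels without giving their values. The sketch below shows a standard way to corrupt a clean track with a noise recording at a chosen SNR, scaling the noise so that $10\log_{10}(P_\text{signal}/P_\text{noise})$ equals the target. The function name, the 5 dB example, and looping a short noise clip to the track length are assumptions, not the ISMIR-2016 setup.

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Hypothetical helper: scale `noise` to reach `snr_db` relative to `signal`, then add it."""
    noise = np.resize(noise, signal.shape)    # loop (or trim) the noise to the signal length
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

# Usage with synthetic audio (hypothetical): 1 s tone plus a short white-noise clip
rng = np.random.default_rng(4)
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
noise = rng.normal(size=4000)
noisy = add_noise_at_snr(clean, noise, snr_db=5.0)
```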
24. Conclusion:
• A small-footprint representation using FA
• Useful for various audio- and music-related tasks
• Robust against noise
• Useful as neural network features