2. Outline
Introduction
Feature Extraction
Dataset
MFCC
Algorithm and Training
2-Step Classification
Various Techniques used
Reasons for Each
Performance Evaluation
Confusion Matrix and Comparison
Conclusions, Future Work and Improvements
3. Problem Statement
Given a music waveform, tag it with one of the known genres (Rock, Pop, Classical, Country, Jazz, etc.)
Use as little manual effort as possible: build models that can do this for us.
Why Music Genre Prediction?
Instant ability to predict a genre without listening to the track.
Millions of older songs with no associated genre can now be tagged appropriately.
Every genre has intrinsic patterns, which gives good scope for prediction.
4. Pipeline Overview
Feature Extraction
2-level classification process:
Clustering of like-wise genres
Building the level-1 classifier, which predicts the cluster ID to which a song belongs
Building classifiers at level 2, one for each cluster
Using this 2-level classifier for prediction
Libraries used:
Matlab (MFCC library)
Python (sklearn)
5. Dataset and Data Preparation
Marsyas dataset [3], from their website
(Music Analysis, Retrieval, and Synthesis of Audio Signals)
Total dataset size: 1.2 GB
1000 music tracks, with 100 songs from each of the 10 genres
22050 Hz mono, 16-bit, in ".au" format
Each music file is 30 seconds long
70:30 split for training and testing within each genre:
700 music files for training the model (70 × 10)
300 music files for testing performance (30 × 10)
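A minimal sketch of this split, assuming the MFCC features X and genre labels y have already been extracted into arrays (stratifying on y keeps 70 training and 30 test songs per genre):

```python
from sklearn.model_selection import train_test_split

# X: (1000, n_features) feature matrix, y: genre label per track (assumed precomputed).
# Stratifying on y preserves the 70:30 ratio within every genre.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```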
6. MFCC
Mel Frequency Cepstral Coefficients
MFCC features represent the signal in the cepstral domain.
Cepstral domain:
The zeroth-order cepstral coefficient is the energy (average of the Mel bands).
The first-order coefficient captures slow variation in the spectrum (such as spectral tilt).
The second-order coefficient captures progressively finer variation in the spectrum, and so on.
Higher-order coefficients represent the fine harmonic structure of the audio signal.
We consider the first 15 coefficients in this study, for simplicity.
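The deck extracted MFCCs with a Matlab library [6]; as an illustration only, a roughly equivalent Python sketch using librosa (an assumption, not the toolchain used here). Averaging over frames is one common way to get a single vector per song; the deck does not state which aggregation was used.

```python
import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=15):
    """Return one feature vector per song: the frame-averaged first 15 MFCCs."""
    signal, sr = librosa.load(path, sr=22050, mono=True)  # matches the dataset format
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)  # collapse frames into one vector (simple aggregation)
```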
7. 2-Level Classification
Why?
The literature survey showed that classification accuracy drops once the number of genres to predict goes above 4.
Validation? Train a model on the most prominent genres: Classical, Jazz, Metal, Pop (a sketch of this run follows below).
Accuracy on these 4 genres:
KNN: 81.35% (k = 4, uniform weights)
SVM: 82.76% (regularization C = 1.0)
Random Forests: 81.51% (n_estimators = 10, min_samples_split = 2)
Accuracy with a model trained on all 10 genres:
SVM: 28.4%
KNN: 22.7%
Neural Nets: 26.9% (backpropagation, initial bias = 0.1, n = 500)
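A sketch of the 4-genre validation with sklearn, using the hyper-parameters quoted above. X4_train/y4_train and X4_test/y4_test are assumed to hold features and labels for only the four prominent genres, and the SVM kernel is assumed linear (as used later in the deck):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Hyper-parameters as reported on this slide.
models = {
    'KNN': KNeighborsClassifier(n_neighbors=4, weights='uniform'),
    'SVM': SVC(kernel='linear', C=1.0),
    'Random Forest': RandomForestClassifier(n_estimators=10, min_samples_split=2),
}
for name, model in models.items():
    model.fit(X4_train, y4_train)               # 4-genre training subset (assumed)
    print(name, model.score(X4_test, y4_test))  # accuracy on the 4-genre test subset
```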
8. 2-Level Classification
Solution?
A model performs better when given at most 4 genres to predict.
But we need to make predictions for 10 genres.
Go back to the basic reason for studying this problem: there are patterns in music that can be common across genres.
How can we find these patterns, and what could they be?
Leave it to the machine to find them!
Use a clustering algorithm and let it group together the songs it finds most alike.
9. Clustering
K-Means clustering (Euclidean distance)
Number of clusters = 4. Why? A classifier still has to be trained on top of the clusters, and its performance again degrades when it must classify among more than 4 classes.
Songs in each genre and their distribution across the clusters (the sketch below prints this table):
A similarity picture can be drawn:
Pop and Rock are closer in behaviour.
Other mappings can be drawn similarly.
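A sketch of this clustering step and the per-genre distribution table, assuming X_train and y_train from the 70:30 split above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Group the training songs into 4 clusters of "like-wise" genres
# (sklearn's KMeans uses Euclidean distance).
kmeans = KMeans(n_clusters=4, random_state=0)
cluster_ids = kmeans.fit_predict(X_train)

# Distribution of each genre's songs across the 4 clusters.
for genre in np.unique(y_train):
    counts = np.bincount(cluster_ids[y_train == genre], minlength=4)
    print(genre, counts)
```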
10. Algorithm Parameters and Approach
Transform the initial training data so that the class label (y) becomes the cluster ID.
Use this transformed data to train the level-1 classifier.
Aim of this classifier: given a song, tag it with the cluster ID to which it would go.
Classifier used: SVM with a linear kernel (regularization C = 1.0).
SVM gave the best performance of all the classifiers tried.
(Why not an RBF kernel? We tried it; it led to overfitting.)
Each cluster is then given its own classifier, which tags a song with one of the genres belonging to that cluster (a combined sketch follows).
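Putting the two levels together, a minimal sketch (cluster_ids from the K-Means step above; both levels use the linear SVM with C = 1.0 stated on the slide):

```python
from sklearn.svm import SVC

# Level 1: relabel the training data with cluster IDs and fit a linear SVM.
level1 = SVC(kernel='linear', C=1.0).fit(X_train, cluster_ids)

# Level 2: one linear SVM per cluster, trained only on that cluster's songs,
# with the original genre labels restored as targets.
level2 = {}
for c in range(4):
    mask = cluster_ids == c
    level2[c] = SVC(kernel='linear', C=1.0).fit(X_train[mask], y_train[mask])

def predict_genre(x):
    """Route a song through level 1, then through that cluster's genre classifier."""
    c = level1.predict(x.reshape(1, -1))[0]
    return level2[c].predict(x.reshape(1, -1))[0]
```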
11. Performance Evaluation
First-level classifier accuracy: 76%
After the full two-step classification (level 1 followed by level 2):
Average F1 score: 50.45%
Overall accuracy: 52%
In more detail:
I used an SVM with a linear kernel (C = 1.0) for the first step of classification, then used the AdaBoost algorithm on the SVM classifier to correctly identify the cluster to which a song must belong. The number of estimators used for AdaBoost is 200. In the second step, each cluster gets its own SVM classifier, again with a linear kernel and C = 1.0.
The first-level accuracy was 76.254%, whereas at the end of the second phase the overall classification accuracy was found to be 50.167%. This is a tremendous increase over the 28% accuracy obtained when all 10 genres are involved directly in classification in the first step.

Genre       Cluster ID   Precision, Recall   F1 Score
Blues           1             24, 24            24
Classical       2             92, 80            79
Country         1             45, 70            55
Disco           0             41, 43            42
Hiphop          3             51, 46            49
Jazz            2             85, 63            70
Metal           0             84, 53            65
Pop             0             75, 70            72
Reggae          3             21, 16            18
Rock            0             26, 36            30

Average F1 score: 50.45; accuracy: 51%
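A sketch of the boosted level-1 classifier described above (n_estimators = 200 as stated). The SAMME variant is an assumption, needed because a plain SVC exposes no class probabilities; on older sklearn versions the keyword is base_estimator rather than estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Boost the linear SVM used at level 1; 200 estimators as reported above.
# algorithm='SAMME' works with decision_function-only estimators such as SVC.
level1 = AdaBoostClassifier(
    estimator=SVC(kernel='linear', C=1.0),
    n_estimators=200,
    algorithm='SAMME',
).fit(X_train, cluster_ids)
```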
12. Confusion Matrix
Confusion matrix (rows and columns in genre order: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock):

[[10  3  2  1  2  2  1  0  2  1]
 [ 0 26  0  0  0  9  0  0  0  0]
 [ 5  0 17  0  5  0  0  1  4  5]
 [ 1  0  1 14  1  0  3  5  0  8]
 [ 1  0  0  0 12  0  1  0  8  1]
 [ 1  0  0  1  0 18  0  0  1  0]
 [ 0  0  0  0  1  0 17  0  0  2]
 [ 1  0  0  3  2  0  0 22  5  1]
 [ 1  0  5  3  3  0  0  1  5  3]
 [ 9  1  5  8  4  1  8  1  5  9]]

Observe the last two genres, Reggae and Rock (even their F1 scores were very low).
The genres that performed well are the ones that had a clear-cut cluster ID with little scattering.
1) How can we achieve this for the other genres?
2) Solution: use hybrid clustering.
13. Alternative Approach
All the classification algorithms were tried; the results posted above are from the best-performing classifier.
Can the clustering accuracy be improved?
Let's try hybrid clustering:
K-Means followed by agglomerative (bottom-up hierarchical) clustering.
We take the centroids of each genre (obtained from k-means) as the inputs at the first level of agglomerative clustering.
We cut the tree at the level where the number of clusters is 4 (a sketch follows).
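A sketch of the hybrid step, assuming the per-genre centroid is the mean feature vector of that genre's training songs (equivalent to running k-means with k = 1 inside each genre) and Ward linkage, which the slides do not specify:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

genres = np.unique(y_train)
# One centroid per genre: the mean MFCC vector over that genre's training songs.
centroids = np.array([X_train[y_train == g].mean(axis=0) for g in genres])

# Agglomerate the 10 genre centroids and cut the tree at 4 clusters.
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
genre_to_cluster = dict(zip(genres, agg.fit_predict(centroids)))
```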
14. Agglomerative Clustering
Hierarchical Clustering (Bottom-Up, Agglomerative)
We start with each example as an individual cluster and keep merging "nearby" clusters until we reach the desired number of clusters.
Initially, each of the 1000 examples was treated as an individual cluster.
The clustering results weren't much different from K-Means.
We introduced a multiple-cluster-ID mapping per genre, instead of mapping each genre to a single cluster ID:
init_mapping = {0: [2], 1: [0, 2], 2: [2, 3], 3: [1, 3], 4: [1], 5: [0, 2], 6: [3], 7: [1], 8: [1, 2], 9: [3]}
First-level accuracy rose by only 2%.
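One plausible reading of how this multi-ID mapping is scored at level 1 (the helper below is illustrative, not from the slides): a prediction counts as correct if it lands in any of the clusters mapped to the song's genre.

```python
# Genre index -> admissible cluster IDs, copied from the slide.
init_mapping = {0: [2], 1: [0, 2], 2: [2, 3], 3: [1, 3], 4: [1],
                5: [0, 2], 6: [3], 7: [1], 8: [1, 2], 9: [3]}

def level1_correct(genre_idx, predicted_cluster):
    """Count a level-1 prediction as correct if it falls in any cluster
    mapped to the song's true genre (illustrative scoring rule)."""
    return predicted_cluster in init_mapping[genre_idx]
```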
15. Hybrid Clustering Performance
Choice of features for agglomerative hybrid clustering:
Centroids of clusters (each genre taken as an individual cluster), which capture the variance within that genre.
Using these centroids gives a more precise estimate of which genres are close to each other (without a few individual songs in a genre being scattered across different clusters).
First-level accuracy improved to 82.3% from 72.16%.
Let's try adding AdaBoost (at the cluster level) to the classification, which might reduce the bias in each classifier (it might also backfire through overfitting!).
Overall accuracy improved to 58.18%.
16. Future Work and Limitations
Consider using larger training data (the Million Song subset).
All these results come from training on only 700 songs and testing on 300, which may have caused overfitting in a few algorithms and weakened the analysis.
Better feature engineering: consider taking more than 15 coefficients, or polynomial relationships across the coefficients.
Instead of using Euclidean distance in K-Means, we could have used a better metric for measuring distances.
This was done for demo purposes and tuned for quick implementation, so it has a lot of scope for improvement.
17. References
1. Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
2. Goodfellow, Ian J., et al. "Pylearn2: A Machine Learning Research Library." arXiv preprint arXiv:1308.4214 (2013).
3. Marsyas. "Data Sets." http://marsyas.info/download/data_sets/
4. Logan, Beth. "Mel Frequency Cepstral Coefficients for Music Modeling." ISMIR. 2000.
5. Fu, Z., Lu, G., Ting, K. M., and Zhang, D. "A Survey of Audio-Based Music Classification and Annotation." IEEE Transactions on Multimedia.
6. Mehta, Kanu. "MFCC." http://www.mathworks.com/matlabcentral/fileexchange/23119-mfcc