2. Outline
Introduction
Feature Extraction
Dataset
MFCC
Algorithm and Training
2-Step Classification
Various Techniques used
Reasons for Each
Performance Evaluation
Confusion Matrix and Comparison
Conclusions, Future Work and Improvements
3. Problem Statement
Given a music waveform, tag it with one of the known genres (Rock, Pop, Classical, Country, Jazz, etc.)
Use as little manual effort as possible: build models that can do this for us.
Why Music Genre Prediction?
Instant ability to predict a genre without listening to the track.
Millions of older songs with no associated genre can now be tagged appropriately.
Every genre has intrinsic patterns, which gives good scope for prediction.
4. Pipeline Overview
Feature Extraction
2-level classification process:
Clustering of like-wise genres
Building the level-1 classifier, which predicts the cluster ID to which a song belongs
Building classifiers at level 2, one for each cluster
Using this 2-level classifier for prediction
Libraries used:
Matlab (MFCC library)
Python (sklearn)
5. Dataset and Data Preparation
Marsyas dataset [3], from their website
(Music Analysis, Retrieval, and Synthesis of Audio Signals)
Total dataset size: 1.2 GB
1000 music tracks, with 100 songs from each of the 10 genres
22050 Hz mono, 16-bit, in ".au" format
Each music file is 30 seconds long
70:30 split for training and testing within each genre:
700 music files for training the model (70 × 10)
300 music files for testing performance (30 × 10)
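A minimal sketch of this split, assuming the MFCC features X and genre labels y have already been extracted into arrays (stratifying on y keeps 70 training and 30 test songs per genre):

```python
from sklearn.model_selection import train_test_split

# X: (1000, n_features) feature matrix, y: genre label per track (assumed precomputed).
# Stratifying on y preserves the 70:30 ratio within every genre.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```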
6. MFCC
Mel Frequency Cepstral Coefficients
MFCC features represent the signal in the cepstral domain.
Cepstral domain:
The zeroth-order cepstral coefficient is the energy (average of the Mel bands).
The first-order coefficient captures slow variation in the spectrum (such as spectral tilt).
The second-order coefficient captures progressively finer variation in the spectrum, and so on.
Higher-order coefficients represent the fine harmonic structure of the audio signal.
We consider the first 15 coefficients in this study, for simplicity.
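The deck extracted MFCCs with a Matlab library [6]; as an illustration only, a roughly equivalent Python sketch using librosa (an assumption, not the toolchain used here). Averaging over frames is one common way to get a single vector per song; the deck does not state which aggregation was used.

```python
import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=15):
    """Return one feature vector per song: the frame-averaged first 15 MFCCs."""
    signal, sr = librosa.load(path, sr=22050, mono=True)  # matches the dataset format
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)  # collapse frames into one vector (simple aggregation)
```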
7. 2-Level Classification
Why?
The literature survey showed that classification accuracy drops once the number of genres to predict goes above 4.
Validation? Train a model on the most prominent genres: Classical, Jazz, Metal, Pop (a sketch of this run follows below).
Accuracy on these 4 genres:
KNN: 81.35% (k = 4, uniform weights)
SVM: 82.76% (regularization C = 1.0)
Random Forests: 81.51% (n_estimators = 10, min_samples_split = 2)
Accuracy with a model trained on all 10 genres:
SVM: 28.4%
KNN: 22.7%
Neural Nets: 26.9% (backpropagation, initial bias = 0.1, n = 500)
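A sketch of the 4-genre validation with sklearn, using the hyper-parameters quoted above. X4_train/y4_train and X4_test/y4_test are assumed to hold features and labels for only the four prominent genres, and the SVM kernel is assumed linear (as used later in the deck):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Hyper-parameters as reported on this slide.
models = {
    'KNN': KNeighborsClassifier(n_neighbors=4, weights='uniform'),
    'SVM': SVC(kernel='linear', C=1.0),
    'Random Forest': RandomForestClassifier(n_estimators=10, min_samples_split=2),
}
for name, model in models.items():
    model.fit(X4_train, y4_train)               # 4-genre training subset (assumed)
    print(name, model.score(X4_test, y4_test))  # accuracy on the 4-genre test subset
```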
8. 2-Level Classification
Solution?
A model performs better when given at most 4 genres to predict.
But we need to make predictions for 10 genres.
Go back to the basic reason for studying this problem: there are patterns in music that can be common across genres.
How can we find these patterns, and what could they be?
Leave it to the machine to find them!
Use a clustering algorithm and let it group together the songs it finds most alike.
9. Clustering
K-Means clustering (Euclidean distance)
Number of clusters = 4. Why? A classifier still has to be trained on top of the clusters, and its performance again degrades when it must classify among more than 4 classes.
Songs in each genre and their distribution across the clusters (the sketch below prints this table):
A similarity picture can be drawn:
Pop and Rock are closer in behaviour.
Other mappings can be drawn similarly.
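A sketch of this clustering step and the per-genre distribution table, assuming X_train and y_train from the 70:30 split above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Group the training songs into 4 clusters of "like-wise" genres
# (sklearn's KMeans uses Euclidean distance).
kmeans = KMeans(n_clusters=4, random_state=0)
cluster_ids = kmeans.fit_predict(X_train)

# Distribution of each genre's songs across the 4 clusters.
for genre in np.unique(y_train):
    counts = np.bincount(cluster_ids[y_train == genre], minlength=4)
    print(genre, counts)
```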
10. Algorithm Parameters and Approach
Transform the initial training data so that the class label (y) becomes the cluster ID.
Use this transformed data to train the level-1 classifier.
Aim of this classifier: given a song, tag it with the cluster ID to which it would go.
Classifier used: SVM with a linear kernel (regularization C = 1.0).
SVM gave the best performance of all the classifiers tried.
(Why not an RBF kernel? We tried it; it led to overfitting.)
Each cluster is then given its own classifier, which tags a song with one of the genres belonging to that cluster (a combined sketch follows).
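Putting the two levels together, a minimal sketch (cluster_ids from the K-Means step above; both levels use the linear SVM with C = 1.0 stated on the slide):

```python
from sklearn.svm import SVC

# Level 1: relabel the training data with cluster IDs and fit a linear SVM.
level1 = SVC(kernel='linear', C=1.0).fit(X_train, cluster_ids)

# Level 2: one linear SVM per cluster, trained only on that cluster's songs,
# with the original genre labels restored as targets.
level2 = {}
for c in range(4):
    mask = cluster_ids == c
    level2[c] = SVC(kernel='linear', C=1.0).fit(X_train[mask], y_train[mask])

def predict_genre(x):
    """Route a song through level 1, then through that cluster's genre classifier."""
    c = level1.predict(x.reshape(1, -1))[0]
    return level2[c].predict(x.reshape(1, -1))[0]
```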
11. Performance Evaluation
First-level classifier accuracy: 76%
After the full two-step classification (level 1 followed by level 2):
Average F1 score: 50.45%
Overall accuracy: 52%
In more detail:
I used an SVM with a linear kernel (C = 1.0) for the first step of classification, then used the AdaBoost algorithm on the SVM classifier to correctly identify the cluster to which a song must belong. The number of estimators used for AdaBoost is 200. In the second step, each cluster gets its own SVM classifier, again with a linear kernel and C = 1.0.
The first-level accuracy was 76.254%, whereas at the end of the second phase the overall classification accuracy was found to be 50.167%. This is a tremendous increase over the 28% accuracy obtained when all 10 genres are involved directly in classification in the first step.

Genre       Cluster ID   Precision, Recall   F1 Score
Blues           1             24, 24            24
Classical       2             92, 80            79
Country         1             45, 70            55
Disco           0             41, 43            42
Hiphop          3             51, 46            49
Jazz            2             85, 63            70
Metal           0             84, 53            65
Pop             0             75, 70            72
Reggae          3             21, 16            18
Rock            0             26, 36            30

Average F1 score: 50.45; accuracy: 51%
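A sketch of the boosted level-1 classifier described above (n_estimators = 200 as stated). The SAMME variant is an assumption, needed because a plain SVC exposes no class probabilities; on older sklearn versions the keyword is base_estimator rather than estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Boost the linear SVM used at level 1; 200 estimators as reported above.
# algorithm='SAMME' works with decision_function-only estimators such as SVC.
level1 = AdaBoostClassifier(
    estimator=SVC(kernel='linear', C=1.0),
    n_estimators=200,
    algorithm='SAMME',
).fit(X_train, cluster_ids)
```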
12. Confusion Matrix
Confusion matrix (rows and columns in genre order: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock):

[[10  3  2  1  2  2  1  0  2  1]
 [ 0 26  0  0  0  9  0  0  0  0]
 [ 5  0 17  0  5  0  0  1  4  5]
 [ 1  0  1 14  1  0  3  5  0  8]
 [ 1  0  0  0 12  0  1  0  8  1]
 [ 1  0  0  1  0 18  0  0  1  0]
 [ 0  0  0  0  1  0 17  0  0  2]
 [ 1  0  0  3  2  0  0 22  5  1]
 [ 1  0  5  3  3  0  0  1  5  3]
 [ 9  1  5  8  4  1  8  1  5  9]]

Observe the last two genres, Reggae and Rock (even their F1 scores were very low).
The genres that performed well are the ones that had a clear-cut cluster ID with little scattering.
1) How can we achieve this for the other genres?
2) Solution: use hybrid clustering.
13. Alternative Approach
All the classification algorithms were tried; the results posted above are from the best-performing classifier.
Can the clustering accuracy be improved?
Let's try hybrid clustering:
K-Means followed by agglomerative (bottom-up hierarchical) clustering.
We take the centroids of each genre (obtained from k-means) as the inputs at the first level of agglomerative clustering.
We cut the tree at the level where the number of clusters is 4 (a sketch follows).
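A sketch of the hybrid step, assuming the per-genre centroid is the mean feature vector of that genre's training songs (equivalent to running k-means with k = 1 inside each genre) and Ward linkage, which the slides do not specify:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

genres = np.unique(y_train)
# One centroid per genre: the mean MFCC vector over that genre's training songs.
centroids = np.array([X_train[y_train == g].mean(axis=0) for g in genres])

# Agglomerate the 10 genre centroids and cut the tree at 4 clusters.
agg = AgglomerativeClustering(n_clusters=4, linkage='ward')
genre_to_cluster = dict(zip(genres, agg.fit_predict(centroids)))
```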
14. Agglomerative Clustering
Hierarchical Clustering (Bottom-Up, Agglomerative)
We start with each example as an individual cluster and keep merging "nearby" clusters until we reach the desired number of clusters.
Initially, each of the 1000 examples was treated as an individual cluster.
The clustering results weren't much different from K-Means.
We introduced a multiple-cluster-ID mapping per genre, instead of mapping each genre to a single cluster ID:
init_mapping = {0: [2], 1: [0, 2], 2: [2, 3], 3: [1, 3], 4: [1], 5: [0, 2], 6: [3], 7: [1], 8: [1, 2], 9: [3]}
First-level accuracy rose by only 2%.
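One plausible reading of how this multi-ID mapping is scored at level 1 (the helper below is illustrative, not from the slides): a prediction counts as correct if it lands in any of the clusters mapped to the song's genre.

```python
# Genre index -> admissible cluster IDs, copied from the slide.
init_mapping = {0: [2], 1: [0, 2], 2: [2, 3], 3: [1, 3], 4: [1],
                5: [0, 2], 6: [3], 7: [1], 8: [1, 2], 9: [3]}

def level1_correct(genre_idx, predicted_cluster):
    """Count a level-1 prediction as correct if it falls in any cluster
    mapped to the song's true genre (illustrative scoring rule)."""
    return predicted_cluster in init_mapping[genre_idx]
```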
15. Hybrid Clustering Performance
Choice of features for agglomerative hybrid clustering:
Centroids of clusters (each genre taken as an individual cluster), which capture the variance within that genre.
Using these centroids gives a more precise estimate of which genres are close to each other (without a few individual songs in a genre being scattered across different clusters).
First-level accuracy improved to 82.3% from 72.16%.
Let's try adding AdaBoost (at the cluster level) to the classification, which might reduce the bias in each classifier (it might also backfire through overfitting!).
Overall accuracy improved to 58.18%.
16. Future Work and Limitations
Consider using larger training data (the Million Song subset).
All these results come from training on only 700 songs and testing on 300, which may have caused overfitting in a few algorithms and weakened the analysis.
Better feature engineering: consider taking more than 15 coefficients, or polynomial relationships across the coefficients.
Instead of using Euclidean distance in K-Means, we could have used a better metric for measuring distances.
This was done for demo purposes and tuned for quick implementation, so it has a lot of scope for improvement.
17. References
1. Pedregosa, Fabian, et al. "Scikit-learn: Machine Learning in Python." Journal of Machine Learning Research 12 (2011): 2825-2830.
2. Goodfellow, Ian J., et al. "Pylearn2: A Machine Learning Research Library." arXiv preprint arXiv:1308.4214 (2013).
3. Marsyas. "Data Sets." http://marsyas.info/download/data_sets/
4. Logan, Beth. "Mel Frequency Cepstral Coefficients for Music Modeling." ISMIR. 2000.
5. Fu, Z., Lu, G., Ting, K. M., and Zhang, D. "A Survey of Audio-Based Music Classification and Annotation." IEEE Transactions on Multimedia.
6. Mehta, Kanu. "MFCC." http://www.mathworks.com/matlabcentral/fileexchange/23119-mfcc