International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 07 | July 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1487
Implementing Musical Instrument Recognition using CNN and SVM
Prabhjyot Singh1, Dnyaneshwar Bachhav2, Omkar Joshi3, Nita Patil4
1,2,3Student, Computer Department, Datta Meghe College of Engineering, Maharashtra, India
4Professor, Computer Department, Datta Meghe College of Engineering, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Recognizing the instruments in a music track is an important problem. It speeds up tagging for search indexing, and
knowing which instruments are used in a track helps with genre detection (e.g., an electric guitar strongly suggests that a
track is not classical). This is a classification problem, which we address using a CNN (Convolutional Neural Network) and an
SVM (Support Vector Machine). The audio excerpts used for training are preprocessed into images (visual representations of the
frequencies in the sound). CNNs have achieved high accuracy in image processing tasks; the goal is to carry this accuracy over
to music. This will make organizing the abundant digital music produced nowadays easier and more efficient, and will be helpful
in the growing field of Music Information Retrieval (MIR). Using both the CNN and the SVM to recognize the instrument and
taking a weighted average of their predictions is expected to result in higher accuracy.
Key Words: CNN, SVM, MFCC, Kernel, Dataset, Features
1. INTRODUCTION
Music is usually made with a variety of instruments and, in most cases, human vocals. These instruments are hard to classify
because many of them differ only slightly in their sound signature. Even humans sometimes find it hard to differentiate
between similar-sounding instruments. The sound of an instrument also varies between artists, who each have their own style
and character, and depends on the quality of the instrument. Recognizing which instrument is playing in a piece of music can
make song recommendations more accurate: by knowing a person's listening habits, we can check whether they prefer a particular
instrument and tailor suggestions to their liking. Search also benefits. For the longest time, the only way to search for
music has been to type the name of a song or an instrument; with instrument recognition we can add more filters to the search
and fine-tune its parameters based on instruments. A person searching for an artist can easily filter songs by the instruments
playing in the background. It can also help to check which instruments are more popular and where, which helps to cater more
precisely to a specific portion of the population. Using CNNs to classify images has been extensively researched and has
produced highly accurate results. Applying these results to classify instruments is our goal. Ample research has also been
carried out on classifying instruments without CNNs. Combining the results of CNN classification with classification by an SVM
to get better results is the eventual goal; the weighted average of the two predictions is used. SVMs have been used for
classification for a long time with great results, and applying them to instrument recognition in music is the key idea that
may help to achieve satisfactory results. Every audio file is converted to an image format by feature extraction using the
LibROSA Python package; melspectrogram and mfcc are the features used.
2. LITERATURE SURVEY
Yoonchang Han et al. [1] present a convolutional neural network framework for predominant instrument recognition in
real-world polyphonic music. It includes features like MFCCs, Histogram, Facial expression, Finger features like Minutiae. The
classifiers used are SVM, KNN, NN and Softmax. The IRMAS dataset is used for training and testing. The reported evaluation
values are precision = 0.7, recall = 0.25 and F1 measure = 0.25.
Emmanouil Benetos et al. [2] address automatic musical instrument identification using a variety of classifiers.
Experiments are performed on a large set of recordings that stem from 20 instrument classes. Several features from
general audio data classification applications, as well as MPEG-7 descriptors, are measured for 1000 recordings.
Branch-and-bound is used to select the features. The first classifier is based on non-negative matrix factorization (NMF)
techniques, and a novel NMF testing method is proposed. A 3-layered multilayer perceptron, normalized Gaussian radial basis
function networks and support vector machines (SVMs) employing a polynomial kernel are also used as classifiers for testing.
Athanasia Zlatintsi et al. [3] propose new algorithms and features based on multiscale fractal exponents, which are validated
by static and dynamic classification algorithms, and then compare their descriptiveness with a standard feature set of MFCCs,
which are efficient in musical instrument recognition. Markov models are used for the experimental results. The experiments are carried
out using 1331 notes from 7 different instruments. The first set of experiments used Gaussian Mixture Models (GMMs) with up to
3 mixtures. The second set of experiments used Hidden Markov Models (HMMs). For modeling the structure of the instruments'
tones, a left-right topology is used. The combination of the proposed features with the MFCCs gives slightly better results
than the MFCCs alone in most cases.
Arie A. Livshin et al. [4] study solo instrument recognition using a custom dataset of 108 different solos performed by
different artists. The features used were temporal, energy, spectral, harmonic and perceptual features; initially 62 features
of these types were used, later reduced to 20. Classification is performed by first computing Linear Discriminant Analysis
(LDA) on the learning set, then multiplying the coefficient matrix with the test set and classifying using K nearest
neighbours (KNN). The best K is found using leave-one-out. To find the best features, Gradual Descriptor Elimination is used,
which repeatedly applies LDA to eliminate the least significant descriptor.
Lin Zhang et al. [5] propose a system made up of two parts, the peripheral auditory system and the auditory central
nervous system. The entire system comprises 27 parallel channels. It includes a posteroventral cochlear nucleus (PVCN) model,
and a Self-Organizing Map Network (SOMN) is used as the classifier. The data is a large solo database that consists of 243
acoustic and synthetic solo tones over the full pitch ranges of seven different instruments. The sampling frequency used is
44.1 kHz.
Manuel Sabin [6] proposes a methodology in which a specifically tuned neural network is trained on recordings of a guitar, a
piano and a mandolin. Each recording was exported as a ‘.wav’ file. The final topology has 5,001 nodes in the input layer and
115 in the hidden layer (both counts including the bias node) and 3 in the output layer. For this problem the learning rate is
set quite low, at 0.005. Momentum is an optimization meant to prevent very slow convergence; the fraction of the previous
update that the momentum carries over is usually large, and is set to 0.95 in that work.
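The role of the two values quoted above (learning rate 0.005, momentum 0.95) can be illustrated with a minimal sketch of a gradient step with momentum. This is a generic textbook update, not the code from [6]; the function name `momentum_step` and the toy objective f(w) = w^2 are assumptions made for illustration only.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.005, momentum=0.95):
    """One gradient-descent update with classical momentum.

    The new velocity keeps `momentum` (here 95%) of the previous jump and
    adds a small step against the gradient.
    """
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


# Toy objective (assumed for illustration): f(w) = w**2, so the gradient is 2*w.
w, v = 1.0, 0.0
for _ in range(500):
    w, v = momentum_step(w, v, 2.0 * w)

print(w)  # close to the minimum at w = 0
```

With such a small learning rate, most of each update comes from the accumulated velocity, which is why a large momentum fraction helps against slow convergence.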
Juan J. Bosch et al. [7] address the identification of predominant musical instruments in polytimbral audio by first dividing
the original signal into several streams. The Fuhrmann algorithm, LRMS and the FASST algorithm are used in the methodology,
with an SVM as the classifier. The dataset used is the one proposed by Fuhrmann. The reported evaluation values are
precision = 0.708, recall = 0.258 and F1 measure = 0.378.
Toni Heittola et al. [8] propose a novel approach to musical instrument recognition in polyphonic audio signals by using a
source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. In the recognition
stage, Mel-frequency cepstral coefficients are used to represent the coarse shape of the power spectrum of the sound sources,
and Gaussian mixture models (GMMs) are used to model instrument-conditional densities of the extracted features. The method is
evaluated with polyphonic signals randomly generated from 19 instrument classes: accordion, bassoon, clarinet, contrabass,
electric bass, electric guitar, electric piano, flute, guitar, harmonica, horn, oboe, piano, piccolo, recorder, saxophone,
trombone, trumpet and tuba. The instrument instances are randomized into a training (70%) or testing (30%) set. The method
gives good results when classifying into 19 instrument classes and with high-polyphony signals, implying robust separation
even with more complex signals.
Philippe Hamel et al. [9] propose a model able to determine which classes of musical instrument are present in a given musical
audio example, without access to any information other than the audio itself. A Support Vector Machine (SVM), a Multilayer
Perceptron (MLP) and a Deep Belief Network (DBN) are used as classifiers. To test for generalization, the instruments are
divided into three independent sets: 50% of the instruments were placed in a training set, 20% in a validation set and the
remaining 30% in a test set. MFCCs are used for feature extraction, along with spectral features: centroid, spread, skewness,
kurtosis, decrease, slope, flux and roll-off. The F-score, a measure that balances precision and recall, is used as the
performance measure to compare the three models. The DBN performed better than the other two models.
3. METHODOLOGY
The model proposed in this project is a combination of a Convolutional Neural Network and a Support Vector Machine. The model
consists of the convolutional layers shown below, followed by one fully connected layer and one SoftMax layer, together with an
SVM. It has been shown that the deeper a convolutional layer is, the more abstract the features it learns [15]. MFCC is used for feature
extraction for the SVM. The results obtained from the CNN and the SVM are combined as a weighted average, which is expected to
give better performance in terms of instrument identification.
Fig-1: System Architecture
Fig.3 Sample Spectrogram
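The weighted averaging of the two classifiers' outputs can be sketched as follows. This is a minimal illustration, assuming both models produce one probability per class in the same class order; the function name `fuse_predictions`, the weight of 0.3 and the example scores are hypothetical and are not taken from the experiments.

```python
def fuse_predictions(cnn_probs, svm_probs, cnn_weight=0.5):
    """Weighted average of two per-class probability lists.

    cnn_weight is the share given to the CNN; the SVM gets the remainder.
    """
    if len(cnn_probs) != len(svm_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= cnn_weight <= 1.0:
        raise ValueError("cnn_weight must lie in [0, 1]")
    svm_weight = 1.0 - cnn_weight
    return [cnn_weight * c + svm_weight * s
            for c, s in zip(cnn_probs, svm_probs)]


CLASSES = ["gac", "gel", "org", "pia", "voi"]

# Hypothetical scores for one excerpt (illustrative values only).
cnn_probs = [0.10, 0.20, 0.10, 0.20, 0.40]
svm_probs = [0.05, 0.10, 0.05, 0.70, 0.10]

fused = fuse_predictions(cnn_probs, svm_probs, cnn_weight=0.3)
predicted = CLASSES[fused.index(max(fused))]
print(predicted)
```

In this example the CNN alone would pick voice, but because the SVM is weighted more heavily and is confident about piano, the fused prediction follows the SVM. In practice the weight would be chosen on held-out data.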
3.1 CNN
Convolutional Neural Networks are similar to ordinary Neural Networks in that they have weights which change over time to
improve learning. The big difference is that CNNs operate on 3-D inputs (the height, width and channels of an image), whereas
the fully connected layers of an ordinary ANN operate on flat vectors. CNNs are straightforward to understand with prior
knowledge of ANNs. CNNs use a kernel, also known as a filter; the output it produces is called a feature map. The filter is
slid over the input image and, as it moves block by block, we multiply the filter element-wise with the patch of the image
under it and sum the result. This is repeated until the filter has moved over the entire image. There is not just a single
filter but many of them, depending on the application or model. Here we describe a single convolution layer, i.e. a bunch of
filters, but there are usually multiple convolution layers. As we progress through the layers, the output of the previous
layer becomes the input of the next layer, where the same operation is performed again to capture finer details and help the
model learn more in depth. It takes a lot of computing power to handle this much data, as real-world images have lots of
detail and are large, which increases the size of each feature map and the computation required.
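The sliding-filter step described above can be sketched in plain Python. At each position the filter and the image patch under it are multiplied element by element and summed, which gives one value of the output feature map. The function name `conv2d_valid`, the toy 4x4 image and the 2x2 edge filter are assumptions chosen for illustration; a real model would use a deep learning library and learned filter values.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Element-wise product of the filter and the patch, then sum.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the middle column, where the filter straddles the edge.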
To overcome this, we pool the feature maps. Pooling reduces the size of each feature map and is also called down-sampling or
sub-sampling. It works on each feature map individually. Reducing the size lowers the computation required by the network. The
most used technique is Max Pooling, in which we again move a window over the feature map but, instead of computing a weighted
sum, simply select the maximum value present in the window. It also has other advantages, such as lessening the influence of
small changes in the orientation and scale of the image, which can make a lot of difference if not taken into consideration.
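Max pooling can be sketched in a few lines. The example assumes a 2x2 window with stride 2, a common choice; the function name `max_pool2d` and the toy values are illustrative only.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]

pooled = max_pool2d(feature_map)
print(pooled)
```

The 4x4 input shrinks to 2x2, so later layers process a quarter of the values while the strongest response in each region is kept.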
Flattening is the intermediate step between the convolution layers and the fully connected layers. As we have 3-D data after
pooling, we convert it to a 1-D vector for the fully connected layer. This is done by the flatten operation.
The fully connected layer is where the actual classification happens. The features are extracted from the image in the
convolution part; this part is like a simple ANN which takes those features as input and adjusts its weights through
back propagation.
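The flatten and fully connected steps can be sketched as below: a small channels x height x width volume is unrolled into a 1-D vector, passed through one dense layer and turned into class probabilities by softmax. The helper names, the two-class setup and the weight values are assumptions for illustration; in a trained network the weights are learned by back propagation.

```python
import math


def flatten(volume):
    """Unroll a channels x height x width nested list into a flat list."""
    return [value for channel in volume for row in channel for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two pooled feature maps of size 2x2 (illustrative values).
volume = [[[1.0, 2.0], [3.0, 4.0]],
          [[0.0, 1.0], [0.0, 1.0]]]

features = flatten(volume)                # 8 values
weights = [[0.1] * 8, [-0.1] * 8]         # hypothetical weights, 2 classes
biases = [0.0, 0.0]

probs = softmax(dense(features, weights, biases))
print(features)
print(probs)
```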
CNNs have been researched a lot in the past few years with impressive results. Karen Simonyan and Andrew Zisserman of
the University of Oxford created a 19-layer CNN model more commonly known as VGG Net. It uses 3x3 filters with a padding of 1
and 2x2 max pooling layers with a stride of 2.
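The effect of these settings on the spatial size of the data can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1, for input size n, kernel size k, padding p and stride s. The sketch below assumes a 224x224 input and a convolution stride of 1; the function name `output_size` is illustrative.

```python
def output_size(n, kernel, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


size = 224  # assumed input height/width

# A 3x3 convolution with padding 1 (stride 1 assumed) keeps the size unchanged.
after_conv = output_size(size, kernel=3, padding=1, stride=1)

# A 2x2 max pooling with stride 2 halves the size.
after_pool = output_size(after_conv, kernel=2, padding=0, stride=2)

print(after_conv, after_pool)
```

This is why such a network can stack several 3x3 convolutions between pooling layers: only the pooling layers reduce the resolution.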
GoogLeNet was one of the first CNN architectures that did not just stack convolution and pooling layers on top of each other.
It had a top-5 error rate of only 6.67%. Its authors did not simply add more layers to the model, as that increases the
computation required.
3.2 SVM
A “Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and
regression, although it is mostly used for classification. The SVM is a strong classification algorithm because it is simple
in structure and requires a small number of features. SVMs are widely regarded as an efficient family of algorithms in machine
learning because they are computationally efficient and robust in high dimensions. During the training of the SVM, a feature
extractor converts each input to a feature set, and these feature sets capture the basic information about each input.
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (where N is the
number of features) that distinctly classifies the data points.
Kernel: In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the
support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example
clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these
tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a
user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over
pairs of data points in raw representation. In its simplest form, the kernel trick means implicitly mapping the data into
another, typically higher-dimensional, space in which there is a clear dividing margin between the classes of data.
Radial Basis Function Kernel: In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel
function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine
classification.
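The RBF kernel has the standard form K(x, y) = exp(-gamma * ||x - y||^2). A minimal sketch follows, using gamma = 0.1 as the default; the function name `rbf_kernel` and the sample points are illustrative assumptions.

```python
import math


def rbf_kernel(x, y, gamma=0.1):
    """Similarity between two feature vectors under the RBF kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


a = [1.0, 2.0]
b = [1.0, 2.0]   # identical to a
c = [4.0, 6.0]   # squared distance to a is 3**2 + 4**2 = 25

same = rbf_kernel(a, b)              # identical points give the maximum, 1.0
far = rbf_kernel(a, c)               # exp(-0.1 * 25) = exp(-2.5)
far_sharp = rbf_kernel(a, c, gamma=1.0)

print(same, far, far_sharp)
```

A larger gamma makes the similarity fall off faster with distance, so each training example influences a smaller neighbourhood.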
Parameters of SVM:
C: Parameter C balances the trade-off between model complexity and empirical error. Since the empirical error is
estimated from the samples in the training set, it is related to the training error. When C is large, the SVM tends to
overfit; when C is small, the SVM tends to underfit. We have set C to 110.
Gamma: The gamma parameter defines how far the influence of a single training example reaches, with low values meaning
‘far’ and high values meaning ‘close’. Gamma can be seen as the inverse of the radius of influence of the samples
selected by the model as support vectors. We have set gamma to 0.1.
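For reference, the roles of C and gamma can be read off the standard soft-margin SVM formulation with an RBF kernel. The notation below (weights w, bias b, slack variables \xi_i, labels y_i in {-1, +1}, feature map \phi) is the usual textbook notation and is an assumption here, since the paper does not define symbols.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,

K(x, x') = \phi(x)^{\top}\phi(x') = \exp\bigl(-\gamma\,\lVert x - x'\rVert^{2}\bigr)
```

A large C penalizes every margin violation \xi_i heavily, so the model fits the training set closely; a large \gamma makes K fall off quickly with distance, so each support vector only influences nearby points.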
4. FEATURE SELECTION
Identifying the components of the audio signal is important. Feature extraction is used to identify the important basic data
in a given input, and it also removes unnecessary data such as emotion, noise, etc.
A spectrogram is used for the CNN. MFCC, a feature used in automatic speech and speaker recognition, is used for the SVM.
Standard 32 ms windows are used for the extraction of features.
In this section our goal is to obtain a smaller feature set which gives quick results, i.e. recognizes the input in real
time, without compromising the recognition rate.
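The windowing behind a spectrogram can be sketched in plain Python: the signal is cut into short overlapping frames, each frame is tapered with a Hann window, and the magnitude of its discrete Fourier transform gives one column of the spectrogram. This is only a stand-in for what a library such as LibROSA computes, not the extraction pipeline used in this work. The 50% overlap, the Hann window, the helper names and the 1 kHz toy signal are assumptions made to keep the example small.

```python
import cmath
import math


def frame_length(sample_rate, window_ms):
    """Number of samples in a window of `window_ms` milliseconds."""
    return int(round(sample_rate * window_ms / 1000.0))


def split_frames(signal, length, hop):
    """Cut the signal into frames of `length` samples, `hop` samples apart."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, hop)]


def magnitude_spectrum(frame):
    """Hann-window a frame and return |DFT| for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(windowed)))
            for b in range(n // 2 + 1)]


# A 32 ms window at the dataset's 44.1 kHz sampling rate.
samples_per_window = frame_length(44100, 32)

# Toy signal: one second of a 125 Hz tone sampled at 1 kHz.
rate = 1000
tone = [math.sin(2 * math.pi * 125 * t / rate) for t in range(rate)]

length = frame_length(rate, 32)                 # 32 samples
frames = split_frames(tone, length, hop=length // 2)
spectrogram = [magnitude_spectrum(f) for f in frames]

first = spectrogram[0]
peak_bin = max(range(len(first)), key=first.__getitem__)
print(samples_per_window, len(frames), peak_bin * rate / length)
```

Each bin covers rate / length = 31.25 Hz in the toy setup, so the 125 Hz tone shows up as a peak in bin 4 of every frame.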
5. DATASET
The IRMAS dataset is divided into training and testing data. Audio files are sampled at 44.1 kHz in 16-bit stereo WAV format.
A total of 9,579 audio files are present. The classes used for recognition are acoustic guitar (gac), electric guitar (gel),
organ (org), piano (pia) and human singing voice (voi).
The dataset is well constructed: it takes into consideration the change of styles over the past decades, the different ways in
which people play the instruments and the associated production values. The training dataset contains 3-second excerpts taken
from 2000 distinct recordings.
The training set contains 6,705 audio files and 2,874 audio files are kept for testing. The length of the sound excerpts for
testing is between 5 and 20 seconds.
6. RESULT & ANALYSIS
Fig.4 CNN Confusion Matrix Fig. 5 SVM Confusion Matrix
Result for CNN:
Class   Precision   Recall   F1-score   Support
gac     0.50        0.23     0.31       535
gel     0.49        0.32     0.39       942
org     0.18        0.35     0.24       361
pia     0.54        0.35     0.43       995
voi     0.35        0.56     0.43       1044
Result for SVM:
Class   Precision   Recall   F1-score   Support
gac     0.78        0.77     0.78       127
gel     0.79        0.79     0.75       152
org     0.83        0.08     0.81       137
pia     0.78        0.78     0.77       144
voi     0.74        0.74     0.77       156
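Per-class precision, recall, F1-score and support of the kind reported above are derived from a confusion matrix. The sketch below shows the computation on a made-up two-class matrix; the function name `per_class_metrics` and the counts are hypothetical and do not reproduce the confusion matrices in Fig. 4 and Fig. 5.

```python
def per_class_metrics(confusion, labels):
    """Compute (precision, recall, f1, support) for every class.

    confusion[i][j] is the number of class-i samples predicted as class j.
    """
    report = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column sum
        actual = sum(confusion[k])                    # row sum = support
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = (precision, recall, f1, actual)
    return report


# Hypothetical counts: 10 piano and 10 organ excerpts.
labels = ["pia", "org"]
confusion = [[8, 2],   # piano: 8 correct, 2 mistaken for organ
             [4, 6]]   # organ: 4 mistaken for piano, 6 correct

report = per_class_metrics(confusion, labels)
for label, (p, r, f1, support) in report.items():
    print(f"{label}  {p:.2f}  {r:.2f}  {f1:.2f}  {support}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two values, which is a quick sanity check for any row of a classification report.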
From the results above we can see that in our case the SVM gives better results than the CNN. This is contrary to the
expectation that the CNN would be more accurate. It is due to the limitations put on the implementation of the project by the
lack of a high-performance GPU in the system for running the CNN. We were not able to run a high number of epochs, as it
caused overheating of the mobile GPU in the laptop.
Also, from the results of the CNN we can see that the main problem encountered by the model was distinguishing between
instruments of the same type. Our dataset had acoustic guitar and electric guitar, which are of the same type, and piano and
organ, which are very similar. Human voice was almost always classified easily, as it is very different from the instruments.
Another limitation was that many audio files in the test data had human singing in them. As the dataset consists of excerpts
from songs spanning many years, many of them contain voice. This resulted in noisy data, which explains why files were often
wrongly classified as human voice.
A useful insight from the results is that pianos were classified very well. This is encouraging because there were two
string-based instruments, and the piano, a percussion-based instrument that is not far from the string category, was not
affected much. During testing we saw that rap songs were in most cases classified first as human voice and then as the
instrument playing in the background.
7. CONCLUSIONS
In musical instrument recognition, we try to identify the instrument playing in a given audio excerpt. Automatic sound source
recognition is an important task in developing numerous applications, including database retrieval and automatic indexing of
instruments in music.
We saw that the SVM performed better than expected and the CNN did not, due to the hardware limitations; this is one weak link
that we would like to improve in the future. Another goal is to add more instruments to the four that we have now.
The precision we get for the CNN is 0.50 for acoustic guitar, 0.49 for electric guitar, 0.18 for organ, 0.54 for piano and
0.35 for voice. The precision we get for the SVM is 0.78 for acoustic guitar, 0.79 for electric guitar, 0.83 for organ, 0.78
for piano and 0.74 for voice.
The classes, from easiest to hardest, were human voice, piano, electric guitar, guitar and organ. Human voice was the easiest
because distinguishing humans singing from instruments is easier. This gives a great opportunity to make this a multi-label
classification in the future.
We would like to improve our system, as it has not reached its full potential due to hardware limitations. It took a lot of
time to run the CNN model. Also, as acoustic guitar and electric guitar, and piano and organ, are similar in nature, it was
hard for the models to distinguish them. The SVM model performed very well on the data.
In the future we would like to implement and add the following features:
● Support for more instruments.
● More accurate results.
● Ability for users to save their history.
● To implement a system for users to add more data to the dataset with their submissions.
REFERENCES
[1] Yoonchang Han, Jaehun Kim and Kyogu Lee, “Deep convolutional neural networks for predominant instrument recognition
in polyphonic music”, Journal Of Latex Class Files, Vol. 14, No. 8, May 2016
[2] Emmanouil Benetos, Margarita Kotti and Constantine Kotropoulos, “Large Scale Musical Instrument Identification”,
Department of Informatics, Aristotle Univ. of Thessaloniki, Box 451, Thessaloniki 541 24, Greece
[3] Athanasia Zlatintsi and Petros Maragos, “Musical Instruments Signal Analysis And Recognition Using Fractal Features”,
School of Electr. and Comp. Enginr., National Technical University of Athens, 19th European Signal Processing Conference
(EUSIPCO 2011)
[4] Arie Livshin and Xavier Rodet, “Instrument Recognition Beyond Separate Notes - Indexing Continuous Recordings”, ICMC 2004,
Nov 2004, Miami, United States.
[5] Lin Zhang, Shan Wang and Lianming Wang, “Musical Instrument Recognition Based on the Bionic Auditory Model”, 2013
International Conference on Information Science and Cloud Computing Companion
[6] Manuel Sabin, “Musical Instrument Recognition With Neural Networks”, Computer Science 180, Dr. Scott Gordon, 24 April
2013
[7] Juan J. Bosch, Jordi Janer, Ferdinand Fuhrmann and Perfecto Herrera, “A comparison of sound segregation technique for
predominant instrument recognition in musical audio signals”, Universitat Pompeu Fabra, Music Technology Group, Roc
Boronat 138, Barcelona, 13th International Society for Music Information Retrieval Conference (ISMIR 2012)
[8] Toni Heittola, Anssi Klapuri and Tuomas Virtanen, “Musical Instrument Recognition in Polyphonic Audio Using source filter
model for sound separation”, Department of Signal Processing, Tampere University of Technology, 2009 International Society
for Music Information Retrieval.
[9] Philippe Hamel, Sean Wood and Douglas Eck, “Automatic Identification of instrument classes in Polyphonic and
Poly-Instrument Audio”, Departement d’informatique et de recherche opérationnelle, Université de Montréal, 2009 International
Society for Music Information Retrieval.

More Related Content

What's hot

Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image ...
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...
奈良先端大 情報科学研究科
 
Speaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract FeaturesSpeaker Recognition Using Vocal Tract Features
Speaker Recognition Using Vocal Tract Features
International Journal of Engineering Inventions www.ijeijournal.com
 
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET Journal
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
IJCSEIT Journal
 
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
Comparison of Feature Extraction MFCC and LPC in Automatic Speech Recognition...
TELKOMNIKA JOURNAL
 
IRJET- Voice based Gender Recognition
IRJET- Voice based Gender RecognitionIRJET- Voice based Gender Recognition
IRJET- Voice based Gender Recognition
IRJET Journal
 
Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24Deep learning for music classification, 2016-05-24
Deep learning for music classification, 2016-05-24
Keunwoo Choi
 
On the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionOn the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognition
journalBEEI
 
1801 1805
1801 18051801 1805
1801 1805
Editor IJARCET
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
CSCJournals
 
Adaptive wavelet thresholding with robust hybrid features for text-independe...
Adaptive wavelet thresholding with robust hybrid features  for text-independe...Adaptive wavelet thresholding with robust hybrid features  for text-independe...
Adaptive wavelet thresholding with robust hybrid features for text-independe...
IJECEIAES
 
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...
cscpconf
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
ijceronline
 
Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
editor1knowledgecuddle
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - Overview
Keunwoo Choi
 
Noise reduction in speech processing using improved active noise control (anc...
Noise reduction in speech processing using improved active noise control (anc...Noise reduction in speech processing using improved active noise control (anc...
Noise reduction in speech processing using improved active noise control (anc...
eSAT Journals
 
Noise reduction in speech processing using improved active noise control (anc...
Noise reduction in speech processing using improved active noise control (anc...Noise reduction in speech processing using improved active noise control (anc...
Noise reduction in speech processing using improved active noise control (anc...
eSAT Publishing House
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
ijsrd.com
 
Text-Independent Speaker Verification
Text-Independent Speaker VerificationText-Independent Speaker Verification
Text-Independent Speaker Verification
Cody Ray
 
T26123129
T26123129T26123129
T26123129
IJERA Editor
 

 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 

IRJET- Implementing Musical Instrument Recognition using CNN and SVM

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 07 | July 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1487

Implementing Musical Instrument Recognition using CNN and SVM
Prabhjyot Singh1, Dnyaneshwar Bachhav2, Omkar Joshi3, Nita Patil4
1,2,3Student, Computer Department, Datta Meghe College of Engineering, Maharashtra, India
4Professor, Computer Department, Datta Meghe College of Engineering, Maharashtra, India

Abstract - Recognizing instruments in music tracks is an important problem. It speeds up tagging for search indexing, and knowing which instruments appear in a track helps genre detection (e.g., an electric guitar strongly indicates that a track is not classical). This is a classification problem, which we address using a CNN (Convolutional Neural Network) and an SVM (Support Vector Machine). The audio excerpts used for training are preprocessed into images (visual representations of the frequencies in the sound). CNNs have achieved high accuracy in image processing tasks; the goal is to carry this accuracy over to music. Doing so will make organizing the abundant digital music produced nowadays easier and more efficient, and will help the growing field of Music Information Retrieval (MIR). Using both a CNN and an SVM to recognize the instrument and taking a weighted average of their predictions is expected to result in higher accuracy.

Key Words: CNN, SVM, MFCC, Kernel, Dataset, Features

1. INTRODUCTION
Music is made using various instruments and, in most cases, human vocals. These instruments are hard to classify because many of them differ only slightly in their sound signature.
It is sometimes hard even for humans to differentiate between similar-sounding instruments. The same instrument also sounds different when played by different artists, since each has their own style and character, and the sound further depends on the quality of the instrument.

Recognizing which instrument is playing in a piece of music helps make song recommendations more accurate: by knowing a person's listening habits, we can check whether they prefer a particular instrument and tailor suggestions to their liking. Music search also benefits. For the longest time, the only way to search for songs has been to type the name of a song or an instrument; with instrument recognition, we can add more filters to the search and fine-tune its parameters based on instruments. A person searching for an artist can easily filter songs by the instruments playing in the background. It can also show which instruments are more popular and where, which helps cater more precisely to a specific portion of the population.

Using CNNs to classify images has been extensively researched and has produced highly accurate results. Applying these results to classify instruments is our goal. Ample research has also been carried out on classifying instruments without CNNs. Combining the results of CNN classification with classification by an SVM to get better results is the eventual goal; the weighted average of the two predictions will be used. SVMs have been used for classification for a long time with great results, and applying them to instrument recognition in music is the key idea that may help achieve satisfactory results. Every audio file is converted to an image format by feature extraction using the LibROSA Python package; the mel-spectrogram and MFCC features are used.

2. LITERATURE SURVEY
Yoonchang Han et al. [1] present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music.
It includes features like MFCCs, Histogram, Facial expression, Finger features like Minutiae. The classifiers used are SVM, KNN, NN and Softmax. The IRMAS dataset is used for training and testing. The evaluation metrics and their reported values are precision = 0.7, recall = 0.25 and F1 measure = 0.25.

Emmanouil Benetos et al. [2] address automatic musical instrument identification using a variety of classifiers. Experiments are performed on a large set of recordings that stem from 20 instrument classes. Several features from general audio data classification applications, as well as MPEG-7 descriptors, are measured for 1000 recordings. Here, branch-and-bound is used to select features. The first classifier is based on non-negative matrix factorization (NMF) techniques, for which a novel NMF testing method is proposed. A 3-layer multilayer perceptron, normalized Gaussian radial basis function networks, and support vector machines (SVMs) employing a polynomial kernel are also used as classifiers for testing.

Athanasia Zlatintsi et al. [3] propose new algorithms and features based on multiscale fractal exponents, validate them with static and dynamic classification algorithms, and compare their descriptiveness with a standard MFCC feature set, which is efficient in musical instrument recognition. The experiments are carried out using 1331 notes from 7 different instruments. The first set of experiments used Gaussian Mixture Models (GMMs) with up to 3 mixtures. The second set used Hidden Markov Models (HMMs); to model the structure of the instruments' tones, a left-right topology is used. The combination of the proposed features with the MFCCs gives slightly better results than the MFCCs alone in most cases.

Arie A. Livshin et al. [4] perform solo instrument recognition using a custom dataset of 108 different solos performed by different artists. The features used were temporal, energy, spectral, harmonic and perceptual features; initially 62 features of these types were used, later reduced to 20. Classification is performed by first applying Linear Discriminant Analysis (LDA) to the learning set, then multiplying the resulting coefficient matrix with the test set and classifying with K-nearest neighbours (KNN). The best K is found using leave-one-out. To find the best features, Gradual Descriptor Elimination is used: it repeatedly applies LDA to eliminate the least significant descriptor.

Lin Zhang et al. [5] propose a system made up of two parts, the peripheral auditory system and the auditory central nervous system. The entire system comprises 27 parallel channels. The system includes a posteroventral cochlear nucleus (PVCN) model, and a Self-Organizing Map Network (SOMN) is used as the classifier. The evaluation uses a large solo database consisting of 243 acoustic and synthetic solo tones over the full pitch ranges of seven different instruments, sampled at 44.1 kHz.

Manuel Sabin [6] proposes a methodology in which a specifically tuned neural network is trained on recordings of a guitar, piano and mandolin. Each recording was exported as a '.wav' file. The final topology has 5,001 nodes in the input layer and 115 in the hidden layer, each count including a bias node, and 3 in the output layer. The learning rate is set significantly low, at 0.005. Momentum is an optimization meant to prevent very slow convergence; the fraction of the previous step carried over by momentum is usually large, and is set to 0.95 in that program.

Juan J. Bosch et al. [7] address the identification of predominant musical instruments in polytimbral audio by first dividing the original signal into several streams. The Fuhrmann algorithm, LRMS and the FASST algorithm are used in the methodology, with an SVM as the classifier. The dataset is the one proposed by Fuhrmann. The evaluation metrics and their reported values are precision = 0.708, recall = 0.258 and F1 measure = 0.378.

Toni Heittola et al. [8] propose a novel approach to musical instrument recognition in polyphonic audio signals, using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation.
In the recognition stage, Mel-frequency cepstral coefficients represent the coarse shape of the power spectrum of the sound sources, and Gaussian mixture models (GMMs) model the instrument-conditional densities of the extracted features; the GMM thus acts as the classifier. The method is evaluated with polyphonic signals randomly generated from 19 instrument classes: accordion, bassoon, clarinet, contrabass, electric bass, electric guitar, electric piano, flute, guitar, harmonica, horn, oboe, piano, piccolo, recorder, saxophone, trombone, trumpet and tuba. The instrument instances are randomly split into a training set (70%) and a testing set (30%). The method gives good results when classifying into 19 instrument classes, even with high-polyphony signals, implying robust separation even for more complex signals.

Philippe Hamel et al. [9] propose a model able to determine which classes of musical instrument are present in a given musical audio example, without access to any information other than the audio itself. Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Deep Belief Network (DBN) classifiers are used. To test for generalization, the instruments are divided into three independent sets: 50% in a training set, 20% in a validation set and the remaining 30% in a test set. MFCCs are used for feature extraction, along with spectral features: centroid, spread, skewness, kurtosis, decrease, slope, flux and roll-off. To compare the three models, the F-score, a measure that balances precision and recall, is used as the performance measure. The DBN performed better than the other two models.

3. METHODOLOGY
The model proposed in this project is a combination of a Convolutional Neural Network and a Support Vector Machine. It consists of convolutional layers, followed by one fully connected layer and one softmax layer, together with an SVM.
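The weighted combination of the two classifiers that the abstract and introduction describe can be sketched as follows. The paper does not state the weights or the exact fusion rule, so the helper names `fuse_predictions` and `predict_label`, the weight values and the example probability vectors are all illustrative assumptions, not the authors' implementation.

```python
# Class labels of the IRMAS subset used in this paper.
CLASSES = ["gac", "gel", "org", "pia", "voi"]


def fuse_predictions(cnn_probs, svm_probs, cnn_weight=0.5, svm_weight=0.5):
    """Weighted average of two per-class probability vectors (hypothetical helper)."""
    if len(cnn_probs) != len(svm_probs):
        raise ValueError("both models must score the same classes")
    total = cnn_weight + svm_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalising by the weight sum keeps the result a probability vector.
    return [(cnn_weight * c + svm_weight * s) / total
            for c, s in zip(cnn_probs, svm_probs)]


def predict_label(fused_probs, classes=CLASSES):
    """Return the class whose fused score is highest."""
    best = max(range(len(fused_probs)), key=fused_probs.__getitem__)
    return classes[best]


# Made-up scores for a single excerpt, for illustration only.
cnn_probs = [0.10, 0.15, 0.05, 0.30, 0.40]
svm_probs = [0.05, 0.10, 0.05, 0.70, 0.10]

# Assumed weights: trust the SVM more than the CNN.
fused = fuse_predictions(cnn_probs, svm_probs, cnn_weight=0.3, svm_weight=0.7)
label = predict_label(fused)
print(label, [round(p, 3) for p in fused])
```

In this made-up case the CNN alone would pick voice, but the fused score favours piano because the more heavily weighted SVM is confident about it. In practice the weights would be chosen on held-out validation data.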
It has been shown that the deeper a convolutional layer is, the more abstract the features it learns [15]. MFCCs are used for feature extraction for the SVM. The results obtained from the CNN and the SVM are combined by a weighted average, which is expected to give better performance in instrument identification.

Fig-1: System Architecture
Fig.3 Sample Spectrogram

3.1 CNN
Convolutional Neural Networks are similar to ordinary neural networks in that they have weights that change over time to improve learning. The big difference is that CNNs operate on image-like input with height, width and channel dimensions, whereas a plain ANN takes a flat vector. CNNs are fairly straightforward to understand given prior knowledge of ANNs.

CNNs use a kernel, also known as a filter. The filter is slid over the input image, and at each position we take the element-wise product of the filter with the underlying image patch and sum the results. The filter moves over the entire image in this way, producing a feature map. There is not just a single filter but many of them, depending on the application or model. A single convolution layer is a set of such filters, and there are usually multiple convolution layers. As we progress through the layers, the output of one layer becomes the input of the next, where the same operation is applied again, which lets the model pick out finer details and learn more in depth.

It takes a lot of computing power to handle this much data, since real-world images have lots of detail and are large, which increases the size of each feature map and the computation required. To overcome this, we pool the feature maps. Pooling reduces the size of each feature map; it is also called down-sampling or sub-sampling, and it works on each feature map individually.
Reducing the size improves the performance of the network. The most used pooling technique is max pooling: we again move a window over the feature map, but instead of taking a product, we simply select the maximum value in the window. Pooling also has other advantages, such as reducing the influence of changes in orientation and scale of the image, which can make a lot of difference if not taken into consideration.
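The sliding-filter and max-pooling steps described above can be illustrated with a small pure-Python sketch. The 5x5 input, the 2x2 filter and the function names `conv2d_valid` and `max_pool2d` are made up for illustration; a real CNN would learn its filter values and use an optimized library. As in most deep learning frameworks, the "convolution" here does not flip the filter.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    Each output value is the sum of element-wise products between the
    kernel and the image patch under it.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the maximum of each `size` x `size` window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


# Toy 5x5 "image" (for example, a tiny patch of a spectrogram).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
# Hand-picked filter that responds to changes along the diagonal.
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)           # 2x2 after 2x2 max pooling
print(feature_map)
print(pooled)
```

The 4x4 feature map shrinks to 2x2 after pooling, which is the reduction in size that makes later layers cheaper to compute.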
Flattening is the intermediate step between the convolution layers and the fully connected layers. Since the data is 3-D after pooling, we convert it to a 1-D vector for the fully connected layer.

The fully connected layer is where the actual classification happens. Features are extracted from the image in the convolution part; the fully connected part is like a simple ANN that takes these features as input and adjusts its weights by backpropagation.

CNNs have been researched extensively in the past few years, with strong results. Karen Simonyan and Andrew Zisserman of the University of Oxford created a 19-layer CNN model more commonly known as VGG Net. It uses 3x3 filters with a pad of 1 and 2x2 max pooling layers with stride 2. GoogLeNet was one of the first CNN architectures that did not just stack convolution and pooling layers on top of each other; it had an error rate of only 6.67%. Its authors did not simply add more layers to the model, as that increases the computational cost.

3.2 SVM
The Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification. SVM is a strong classification algorithm because it is simple in structure and requires a smaller number of features. It is also computationally efficient and robust in high dimensions. During the training of the SVM, a feature extractor converts each input into a feature set that captures the basic information about that input.
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly classifies the data points.

Kernel: In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best-known member is the SVM. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation. In its simplest form, the kernel trick means transforming data into another dimension that has a clear dividing margin between classes of data.

Radial Basis Function Kernel: The radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification.

Parameters of SVM:
C: The parameter C balances the trade-off between model complexity and empirical error. Since the empirical error is estimated from the samples in the training set, it is related to the training error. When C is large, the SVM tends to overfit; when C is small, it tends to underfit. We have set C to 110.
Gamma: The gamma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. It can be seen as the inverse of the radius of influence of the samples selected by the model as support vectors.
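To make the role of gamma concrete, the RBF kernel in its standard form is K(x, y) = exp(-gamma * ||x - y||^2). The short sketch below evaluates it for a nearby and a distant point at several gamma values; the sample points and the function name `rbf_kernel` are illustrative assumptions and are not taken from the paper's data or code.

```python
import math


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x = [1.0, 2.0]
near = [1.5, 2.0]  # squared distance to x: 0.25
far = [4.0, 6.0]   # squared distance to x: 25.0

for gamma in (0.01, 0.1, 1.0):
    print(f"gamma={gamma}: "
          f"near={rbf_kernel(x, near, gamma):.4f}, "
          f"far={rbf_kernel(x, far, gamma):.4f}")
```

With a small gamma the distant point still has noticeable similarity to x (its influence reaches 'far'), while with a large gamma the similarity drops to almost zero for anything but very close points. If scikit-learn were used (the paper does not say which library was), the settings reported in this section would correspond to `SVC(kernel="rbf", C=110, gamma=0.1)`.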
We have set the gamma value to 0.1.

4. FEATURE SELECTION
Identifying the components of the audio signal is important. Feature extraction is therefore used to identify the essential data in a given input and to remove unnecessary data such as emotion and noise. A spectrogram is used for the CNN, and MFCCs, a feature commonly used in automatic speech and speaker recognition, are used for the SVM. Standard 32 ms windows are used for feature extraction.
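As a rough worked example of what a 32 ms window means in samples, the sketch below converts the window length to a frame size at the 44.1 kHz sampling rate of the dataset and counts how many frames fit in a 3-second training excerpt. The 50% overlap (hop length), the rounding up to a power of two for the FFT size and all function names are assumptions made for illustration; the paper does not specify them.

```python
def frame_length_samples(sample_rate_hz, window_ms):
    """Convert a window duration in milliseconds to a sample count."""
    return round(sample_rate_hz * window_ms / 1000)


def next_power_of_two(n):
    """Smallest power of two that is >= n (a common choice of FFT size)."""
    return 1 << (n - 1).bit_length()


def num_frames(total_samples, frame_len, hop_len):
    """Number of full frames that fit in the signal without padding."""
    if total_samples < frame_len:
        return 0
    return 1 + (total_samples - frame_len) // hop_len


sample_rate = 44100                                # 44.1 kHz audio
frame_len = frame_length_samples(sample_rate, 32)  # 32 ms window
n_fft = next_power_of_two(frame_len)               # assumed FFT size
hop_len = frame_len // 2                           # assumed 50% overlap
excerpt_samples = 3 * sample_rate                  # 3-second excerpt

frames = num_frames(excerpt_samples, frame_len, hop_len)
print(frame_len, n_fft, hop_len, frames)
```

Under these assumptions a 32 ms window is 1411 samples, and a 3-second excerpt yields 186 frames, so each excerpt becomes an image roughly 186 columns wide before any resizing.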
In this section our goal is to arrive at a smaller feature set that gives quick results, i.e. recognizes the input in real time, without compromising the recognition rate.

5. DATASET
The IRMAS dataset is divided into training and testing data. Audio files are sampled at 44.1 kHz in 16-bit stereo WAV format. A total of 9,579 audio files are present. The instruments present for recognition are acoustic guitar (gac), electric guitar (gel), organ (org), piano (pia) and human singing voice (voi). The dataset is well constructed, taking into consideration changes of style over the past decades, the different ways in which people play the instruments, and the associated production values. The training dataset contains 3-second excerpts taken from 2000 distinct recordings. The training set contains 6705 audio files, and 2874 audio files are kept for testing. The sound excerpts for testing are between 5 and 20 seconds long.

6. RESULT & ANALYSIS:
Fig.4 CNN Confusion Matrix
Fig. 5 SVM Confusion Matrix

Result for CNN:
class   precision   recall   f1-score   support
gac     0.50        0.23     0.31       535
gel     0.49        0.32     0.39       942
org     0.18        0.35     0.24       361
pia     0.54        0.35     0.43       995
voi     0.35        0.56     0.43       1044

Result for SVM:
class   precision   recall   f1-score   support
gac     0.78        0.77     0.78       127
gel     0.79        0.79     0.75       152
org     0.83        0.08     0.81       137
pia     0.78        0.78     0.77       144
voi     0.74        0.74     0.77       156

From the results above we can see that in our case the SVM gives better results than the CNN. This is contrary to the expectation that the CNN would be more accurate. It is due to the limitations put on the implementation by the lack of a high-performance GPU in the system used for running the CNN: we were not able to run a high number of epochs, as doing so caused the laptop's mobile GPU to overheat.

Also, from the results of the CNN we can see that the main problem the model encountered was distinguishing between instruments of the same type. Our dataset has acoustic guitar and electric guitar, which are of the same type, and piano and organ, which are very similar. Human voice was almost always classified easily, as it is very different from the instruments.

Another limitation was that many audio files in the test data had human singing in them. Since the dataset consists of audio files from songs spanning many years, many of them contain vocals. This resulted in noisy data, which explains why files were often wrongly classified as human voice.

A notable observation from the results is that pianos were classified very well. This is encouraging because there were two string-based instruments in the dataset, and the piano, a percussion-based instrument that is not far from the string category, was not affected much. During testing we saw that rap songs were in most cases classified as human voice first, followed by the instrument playing in the background.

7. CONCLUSIONS
In musical instrument recognition, we try to identify the instrument playing in a given audio excerpt. Automatic sound source recognition is an important task in developing numerous applications, including database retrieval and automatic indexing of instruments in music. We saw that the SVM performed better than expected, while the CNN did not, owing to the hardware limitations; this is the one weak link that we would like to improve in the future.
Another goal is to add more instruments to the four that we have now.

The precision we get for the CNN is 0.50 for acoustic guitar, 0.49 for electric guitar, 0.18 for organ, 0.54 for piano and 0.35 for voice. The precision we get for the SVM is 0.78 for acoustic guitar, 0.79 for electric guitar, 0.83 for organ, 0.78 for piano and 0.74 for voice. The easiest class was human voice, followed by piano, electric guitar, guitar and organ, in that order. This was due to the fact that distinguishing human singing from instruments is easier. This gives a great opportunity to make this a multi-label classification in the future.

We would like to improve our system, as it has not reached its full potential due to hardware limitations: it took a lot of time to run the CNN model. Also, as guitar and electric guitar, and piano and organ, are similar in nature, it was hard for the models to distinguish them. The SVM model performed very well on the data. In the future we would like to implement and add the following features:
● Support for more instruments.
● More accurate results.
● Ability for users to save their history.
● A system for users to add more data to the dataset with their submissions.

REFERENCES
[1] Yoonchang Han, Jaehun Kim, and Kyogu Lee, "Deep convolutional neural networks for predominant instrument recognition in polyphonic music", Journal Of Latex Class Files, Vol. 14, No. 8, May 2016.
[2] Emmanouil Benetos, Margarita Kotti, and Constantine Kotropoulos, "Large Scale Musical Instrument Identification", Department of Informatics, Aristotle Univ. of Thessaloniki, Box 451, Thessaloniki 541 24, Greece.
[3] Athanasia Zlatintsi and Petros Maragos, "Musical Instruments Signal Analysis And Recognition Using Fractal Features", School of Electr. and Comp. Enginr., National Technical University of Athens, 19th European Signal Processing Conference (EUSIPCO 2011).
[4] Arie Livshin and Xavier Rodet, "Instrument Recognition Beyond Separate Notes - Indexing Continuous Recordings", ICMC 2004, Nov 2004, Miami, United States.
[5] Lin Zhang, Shan Wang, and Lianming Wang, "Musical Instrument Recognition Based on the Bionic Auditory Model", 2013 International Conference on Information Science and Cloud Computing Companion.
[6] Manuel Sabin, "Musical Instrument Recognition With Neural Networks", Computer Science 180, Dr. Scott Gordon, 24 April 2013.
[7] Juan J. Bosch, Jordi Janer, Ferdinand Fuhrmann, and Perfecto Herrera, "A comparison of sound segregation technique for predominant instrument recognition in musical audio signals", Universitat Pompeu Fabra, Music Technology Group, Roc Boronat 138, Barcelona, 13th International Society for Music Information Retrieval Conference (ISMIR 2012).
[8] Toni Heittola, Anssi Klapuri, and Tuomas Virtanen, "Musical Instrument Recognition in Polyphonic Audio Using source filter model for sound separation", Department of Signal Processing, Tampere University of Technology, 2009 International Society for Music Information Retrieval.
[9] Philippe Hamel, Sean Wood, and Douglas Eck, "Automatic Identification of instrument classes in Polyphonic and Poly-Instrument Audio", Departement d'informatique et de recherche opérationnelle, Université de Montréal, 2009 International Society for Music Information Retrieval.