MUSIC GENRE
DETECTION
Literature Survey
WAV File format
■ Waveform Audio File Format (WAVE, or more commonly known asWAV due to
its filename extension) is a Microsoft and IBM audio file format standard for
storing an audio bitstream on PCs.
■ It is the main format used onWindows systems for raw and typically
uncompressed audio.The usual bitstream encoding is the linear pulse-code
modulation (LPCM) format.
Mel Frequency Cepstral Coefficient (MFCC)
■ In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-
term power spectrum of a sound
■ Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an
MFC
■ MFCCs are commonly derived as follows:
– Take the Fourier transform of (a windowed excerpt of) a signal.
– Map the powers of the spectrum obtained above onto the mel scale, using triangular
overlapping windows.
– Take the logs of the powers at each of the mel frequencies.
– Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
– The MFCCs are the amplitudes of the resulting spectrum.
Audio genres
■ BLUES:
– The blues form is a cyclic musical form in which a repeating progression of chords
mirrors the call and response scheme commonly found in African and African-
American music.Typical instruments are Guitar, bass guitar, piano, harmonica,
upright bass, drums, saxophone, vocals, trumpet, cornet, trombone
■ CLASSICAL:
– The Classical era stringed instruments were the four instruments which form the
string section of the orchestra: the violin, viola, cello and contrabass.Woodwinds
included the basset clarinet, basset horn, the Classical clarinet, the chalumeau, the
flute, oboe and bassoon. Keyboard instruments included the clavichord and the
fortepiano
■ JAZZ:
– It is music that includes qualities such as swing, improvising, group interaction,
developing an 'individual voice', and being open to different musical possibilities.
Typical instruments are Double bass, drums, guitar (typically electric guitar), piano,
saxophone, trumpet, clarinet, trombone, vocals, vibraphone, Hammond organ and
harmonica. In jazz fusion of the 1970s, electric bass, electric piano and synthesizer
were common.
■ ROCK:
– Rock music is a genre of popular music that originated as "rock and roll" in the
United States in the 1950s, and developed into a range of different styles in the
1960s and later, particularly in the United Kingdom and the United States.Typical
instruments are Electric guitar, bass guitar, acoustic guitar, drums, piano,
synthesizer, keyboards
Collection of dataset
■ We have considered four genres for this project
– Jazz
– Rock
– Classical
– blues
HIDDEN MARKOV MODEL (HMM)
■ A hidden Markov model (HMM) is a statistical Markov model in which the system
being modeled is assumed to be a Markov process with unobserved (hidden) states.A
HMM can be presented as the simplest dynamic Bayesian network.
■ In a hidden Markov model, the state is not directly visible, but output, dependent on
the state, is visible. Each state has a probability distribution over the possible output
tokens.
■ The sequence of tokens generated by an HMM gives some information about the
sequence of states.
■ The adjective 'hidden' refers to the state sequence through which the model passes,
not to the parameters of the model; the model is still referred to as a 'hidden' Markov
model even if these parameters are known exactly.
HMM
■ if the n'th observation in a chain of observations is influenced by a corresponding
latent (i.e. hidden) variable ie.., If the latent variables are discrete and form a Markov
chain, then it is a hidden Markov model (HMM)
PROBLEM STATEMENT
■ Music can be classified into genres based on the style it follows
■ The classification of music is an important part as a person’s taste in music is limited to
a few genres, personalized to them
■ Due to the boom in information, huge amount of music data can be found without
proper description or classification
■ This project aims at categorizing music samples into four genres – jazz, blues, rock,
classic
Steps involved
■ Collection of dataset
■ Preprocessing of dataset
■ Training of classifier
■ Classification of music samples
Collection of dataset
■ Each of these genres have distinctive characteristics
■ Music samples for each of these genres have been collected
■ Each genre has 100 music samples
■ This number of samples have been considered for a better efficiency of the classifier
■ The ratio considered for training and testing is 3:1 ie.., 75% samples will be used for
training and 25%, for testing
■ Out of 400 samples, 300 samples will be used for training and 100 samples will be used
for testing
DESIGN Unprocessed
audio files
Processed audio
files
Audio PreprocessorFeature Extractor
Feature files
Classifier
(HMM)
Output
IMPLEMENTATION
■ PRE-PROCESSING STAGE
– This module automates the audio conversion process.This module scans the current
directory to find all the mp3 files.Then, each file is loaded one by one. An audio clip
may be one-channeled or two channeled.The final step in this automated process is
to decompress the audio and store it in wave format, which is a lossless compression
format. After the process is complete, the current directory contains all wave files.
This process has to be repeated for every genre tag being considered.
start
get list of mp3 files in pwd
for every item in the list do :
load file
set # of channels to 1
export as wave
end
■ FEATURE EXTRACTION STAGE
– The python_speech_features library provides common speech features for ASR
including MFCCs and filterbank energies.
■ features.mfcc() - Mel Frequency Cepstral Coefficients
■ features.logfbank() - Log Filterbank Energies
■ CLASSIFICATION STAGE
– Classifier used is Hidden markov model
– Four HMMs used to classify into 4 genres
– Maximum probability predicts genre of music
TESTINGAND SCREEN SHOTS
■ TESTING STAGE
– Input sample gives probability for each genre when it is tested with HMM classifier
– 30% of dataset is used for testing classifier
– Confusion matrix is as given
RESULTSAND DISCUSSION
■ Training was done with two sizes of datasets-
– Containing 400 audio files
– Containing 100 audio files
■ For bigger dataset, higher efficiency was achieved
■ For smaller dataset, lesser efficiency was achieved.
■ The genres of blues and rock have similar elements, and therefore some of the
samples were classified into the other category, and hence the overall efficiency of
smaller datasets were reduced
CONCLUSION
■ The Hidden markov model approach for music genre classification has an efficiency of
around 72% when the dataset size considered for training is huge.
FUTUREWORK
■ GUI for the application
■ Input file from user
■ Include more classes
■ Display classification and find if audio sample belong to more than one genre
REFERENCES
[1] Molau, Pitz, Schlueter, Ney :Computing Mel- Frequency Cepstral Coefficients
on the Power Spectrum, RWTH Aachen – University ofTechnology, 52056
Aachen,Germany. IEEETransactions on Communications 26 (6)
[2] Frigo, M.; Johnson, S. G. (2005). "The Design and Implementation of FFTW3"
(PDF). Proceedings of the IEEE 93: 216–231. 10.1109/jproc.2004.840301
[3] Narasimha, M.; Peterson, A. (June 1978). "On the Computation of the
Discrete CosineTransform". IEEETransactions on Communications 26 (6): 934–
936. 10.1109/TCOM.1978.1094144
[4] S. B. Kotsiantis : Supervised Machine Learning:A Review of Classification
Techniques, 2007, University of Peloponnesse,Greece
[5] DavidA. Freedman (2009). Statistical Models:Theory and Practice.
Cambridge University Press. p. 128.
[6] Musical genre classification of audio signals, Speech and Audio Processing,
IEEETransactions on (Volume:10 , Issue: 5 ), 293-302, November 2002,
http://dspace.library.uvic.ca:8080/bitstream/handle/1828/1344/tsap02gtzan.pdf
■ WEB LINKSAND OTHER RESOURCES
[7] http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
[8] en.wikipedia.org/wiki/Hidden_Markov_Model
[9] en.wikipedia.org/wiki/Mel_Frequency_Ceptral
[10] en.wikipedia.org/wiki/Window_function
[11] http://practicalcryptography.com/miscellaneous/machine-learning/guide-
mel-frequency-cepstral-coefficients-mfccs/
THANKYOU

Music genre detection using hidden markov models

  • 1.
  • 2.
    Literature Survey WAV Fileformat ■ Waveform Audio File Format (WAVE, or more commonly known asWAV due to its filename extension) is a Microsoft and IBM audio file format standard for storing an audio bitstream on PCs. ■ It is the main format used onWindows systems for raw and typically uncompressed audio.The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.
  • 3.
    Mel Frequency CepstralCoefficient (MFCC) ■ In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short- term power spectrum of a sound ■ Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC ■ MFCCs are commonly derived as follows: – Take the Fourier transform of (a windowed excerpt of) a signal. – Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows. – Take the logs of the powers at each of the mel frequencies. – Take the discrete cosine transform of the list of mel log powers, as if it were a signal. – The MFCCs are the amplitudes of the resulting spectrum.
  • 4.
    Audio genres ■ BLUES: –The blues form is a cyclic musical form in which a repeating progression of chords mirrors the call and response scheme commonly found in African and African- American music.Typical instruments are Guitar, bass guitar, piano, harmonica, upright bass, drums, saxophone, vocals, trumpet, cornet, trombone ■ CLASSICAL: – The Classical era stringed instruments were the four instruments which form the string section of the orchestra: the violin, viola, cello and contrabass.Woodwinds included the basset clarinet, basset horn, the Classical clarinet, the chalumeau, the flute, oboe and bassoon. Keyboard instruments included the clavichord and the fortepiano
  • 5.
    ■ JAZZ: – Itis music that includes qualities such as swing, improvising, group interaction, developing an 'individual voice', and being open to different musical possibilities. Typical instruments are Double bass, drums, guitar (typically electric guitar), piano, saxophone, trumpet, clarinet, trombone, vocals, vibraphone, Hammond organ and harmonica. In jazz fusion of the 1970s, electric bass, electric piano and synthesizer were common. ■ ROCK: – Rock music is a genre of popular music that originated as "rock and roll" in the United States in the 1950s, and developed into a range of different styles in the 1960s and later, particularly in the United Kingdom and the United States.Typical instruments are Electric guitar, bass guitar, acoustic guitar, drums, piano, synthesizer, keyboards
  • 6.
    Collection of dataset ■We have considered four genres for this project – Jazz – Rock – Classical – blues
  • 7.
    HIDDEN MARKOV MODEL(HMM) ■ A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.A HMM can be presented as the simplest dynamic Bayesian network. ■ In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. ■ The sequence of tokens generated by an HMM gives some information about the sequence of states. ■ The adjective 'hidden' refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a 'hidden' Markov model even if these parameters are known exactly.
  • 8.
    HMM ■ if then'th observation in a chain of observations is influenced by a corresponding latent (i.e. hidden) variable ie.., If the latent variables are discrete and form a Markov chain, then it is a hidden Markov model (HMM)
  • 9.
    PROBLEM STATEMENT ■ Musiccan be classified into genres based on the style it follows ■ The classification of music is an important part as a person’s taste in music is limited to a few genres, personalized to them ■ Due to the boom in information, huge amount of music data can be found without proper description or classification ■ This project aims at categorizing music samples into four genres – jazz, blues, rock, classic
  • 10.
    Steps involved ■ Collectionof dataset ■ Preprocessing of dataset ■ Training of classifier ■ Classification of music samples
  • 11.
    Collection of dataset ■Each of these genres have distinctive characteristics ■ Music samples for each of these genres have been collected ■ Each genre has 100 music samples ■ This number of samples have been considered for a better efficiency of the classifier ■ The ratio considered for training and testing is 3:1 ie.., 75% samples will be used for training and 25%, for testing ■ Out of 400 samples, 300 samples will be used for training and 100 samples will be used for testing
  • 12.
    DESIGN Unprocessed audio files Processedaudio files Audio PreprocessorFeature Extractor Feature files Classifier (HMM) Output
  • 13.
    IMPLEMENTATION ■ PRE-PROCESSING STAGE –This module automates the audio conversion process.This module scans the current directory to find all the mp3 files.Then, each file is loaded one by one. An audio clip may be one-channeled or two channeled.The final step in this automated process is to decompress the audio and store it in wave format, which is a lossless compression format. After the process is complete, the current directory contains all wave files. This process has to be repeated for every genre tag being considered. start get list of mp3 files in pwd for every item in the list do : load file set # of channels to 1 export as wave end
  • 14.
    ■ FEATURE EXTRACTIONSTAGE – The python_speech_features library provides common speech features for ASR including MFCCs and filterbank energies. ■ features.mfcc() - Mel Frequency Cepstral Coefficients ■ features.logfbank() - Log Filterbank Energies ■ CLASSIFICATION STAGE – Classifier used is Hidden markov model – Four HMMs used to classify into 4 genres – Maximum probability predicts genre of music
  • 15.
    TESTINGAND SCREEN SHOTS ■TESTING STAGE – Input sample gives probability for each genre when it is tested with HMM classifier – 30% of dataset is used for testing classifier – Confusion matrix is as given
  • 16.
    RESULTSAND DISCUSSION ■ Trainingwas done with two sizes of datasets- – Containing 400 audio files – Containing 100 audio files ■ For bigger dataset, higher efficiency was achieved ■ For smaller dataset, lesser efficiency was achieved. ■ The genres of blues and rock have similar elements, and therefore some of the samples were classified into the other category, and hence the overall efficiency of smaller datasets were reduced
  • 17.
    CONCLUSION ■ The Hiddenmarkov model approach for music genre classification has an efficiency of around 72% when the dataset size considered for training is huge.
  • 18.
    FUTUREWORK ■ GUI forthe application ■ Input file from user ■ Include more classes ■ Display classification and find if audio sample belong to more than one genre
  • 19.
    REFERENCES [1] Molau, Pitz,Schlueter, Ney :Computing Mel- Frequency Cepstral Coefficients on the Power Spectrum, RWTH Aachen – University ofTechnology, 52056 Aachen,Germany. IEEETransactions on Communications 26 (6) [2] Frigo, M.; Johnson, S. G. (2005). "The Design and Implementation of FFTW3" (PDF). Proceedings of the IEEE 93: 216–231. 10.1109/jproc.2004.840301 [3] Narasimha, M.; Peterson, A. (June 1978). "On the Computation of the Discrete CosineTransform". IEEETransactions on Communications 26 (6): 934– 936. 10.1109/TCOM.1978.1094144
  • 20.
    [4] S. B.Kotsiantis : Supervised Machine Learning:A Review of Classification Techniques, 2007, University of Peloponnesse,Greece [5] DavidA. Freedman (2009). Statistical Models:Theory and Practice. Cambridge University Press. p. 128. [6] Musical genre classification of audio signals, Speech and Audio Processing, IEEETransactions on (Volume:10 , Issue: 5 ), 293-302, November 2002, http://dspace.library.uvic.ca:8080/bitstream/handle/1828/1344/tsap02gtzan.pdf
  • 21.
    ■ WEB LINKSANDOTHER RESOURCES [7] http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/ [8] en.wikipedia.org/wiki/Hidden_Markov_Model [9] en.wikipedia.org/wiki/Mel_Frequency_Ceptral [10] en.wikipedia.org/wiki/Window_function [11] http://practicalcryptography.com/miscellaneous/machine-learning/guide- mel-frequency-cepstral-coefficients-mfccs/
  • 22.