SlideShare a Scribd company logo
1 of 49
Download to read offline
Introduction to Audio Content Analysis
module 12.2: musical genre classification
alexander lerch
overview intro genre MGC features example summary
introduction
overview
corresponding textbook section
section 12.2
lecture content
• musical genre
• processing steps in basic genre classifiers
• example: genre classification with a kNN
learning objectives
• discuss ambiguities in the definition of musical genre and the possible impact on
automatic systems
• describe the processing steps for traditional musical genre classifiers
• implement your own music genre classifier with Matlab
module 12.2: musical genre classification 1 / 20
overview intro genre MGC features example summary
introduction
overview
corresponding textbook section
section 12.2
lecture content
• musical genre
• processing steps in basic genre classifiers
• example: genre classification with a kNN
learning objectives
• discuss ambiguities in the definition of musical genre and the possible impact on
automatic systems
• describe the processing steps for traditional musical genre classifiers
• implement your own music genre classifier with Matlab
module 12.2: musical genre classification 1 / 20
overview intro genre MGC features example summary
musical genre classification
introduction
one of the early/seminal research topics in MIR
classic machine learning task
• features → classification
related tasks:
• speech-music classification
• instrument recognition
• artist identification
• music emotion recognition
module 12.2: musical genre classification 2 / 20
overview intro genre MGC features example summary
musical genre classification
introduction
one of the early/seminal research topics in MIR
classic machine learning task
• features → classification
related tasks:
• speech-music classification
• instrument recognition
• artist identification
• music emotion recognition
module 12.2: musical genre classification 2 / 20
overview intro genre MGC features example summary
musical genre classification
introduction
one of the early/seminal research topics in MIR
classic machine learning task
• features → classification
related tasks:
• speech-music classification
• instrument recognition
• artist identification
• music emotion recognition
module 12.2: musical genre classification 2 / 20
overview intro genre MGC features example summary
musical genre classification
applications
large music databases:
• annotation
• sorting, browsing, retrieving
recommendation and music discovery systems
automatic playlist generation
improving downstream MIR tasks by using side information/conditioning
module 12.2: musical genre classification 3 / 20
overview intro genre MGC features example summary
musical genre classification
applications
large music databases:
• annotation
• sorting, browsing, retrieving
recommendation and music discovery systems
automatic playlist generation
improving downstream MIR tasks by using side information/conditioning
module 12.2: musical genre classification 3 / 20
overview intro genre MGC features example summary
musical genre classification
genre: definition
what is musical genre
module 12.2: musical genre classification 4 / 20
overview intro genre MGC features example summary
musical genre classification
genre: definition
what is musical genre
clusters of musical similarity?
→ hard to answer in general, there are many systematic problems
module 12.2: musical genre classification 4 / 20
overview intro genre MGC features example summary
musical genre classification
genre: definition
what is musical genre
clusters of musical similarity?
→ hard to answer in general, there are many systematic problems
1 non-agreement on taxonomies
▶ e.g., AllMusic vs. Pandora
2 genre label scope
▶ e.g., song, album, artist, piece of a song
3 ill-defined genre labels
▶ e.g., geographic (indian music), historic (baroque), technical (barbershop),
instrumentation (symphonic music), usage (christmas songs)
4 taxonomy scalability
▶ e.g., genres and subgenres evolve over time
5 non-orthogonality
▶ e.g., several genres for one piece of music
module 12.2: musical genre classification 4 / 20
overview intro genre MGC features example summary
musical genre classification
genre: taxonomy examples
Speech
Music
Male
Female
Sports
Disco
Country
Hip Hop
Rock
Blues
Reggae
Pop
Metal
Classical
Jazz
Choir
Orchestra
Piano
String Quartet
Big Band
Cool
Fusion
Piano
Quartet
Swing
Background
Speech
Music
Male
Female
+Background
Classical
Non-Classical
Chamber
Orchestra
Rock
Electro/Pop
Jazz/Blues
Piano
Solo
String Quartet
Other
Symphonic
+Choir
+Soloist
Soft Rock
Hard Rock
Hip Hop
Techno/Dance
Pop
module 12.2: musical genre classification 5 / 20
overview intro genre MGC features example summary
musical genre classification
observations with humans
1 human classification far from perfect:
75–90 % for limited set of classes
2 for many genres, humans need only a
fraction of a second to classify
⇒ short time timbre features sufficient?
plots from1,2
1
S. Lippens, J.-P. Martens, T. D. Mulder, et al., “A Comparison of Human and Automatic Musical Genre Classification,” in Proceedings of the
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004.
2
R. O. Gjerdingen and D. Perrott, “Scanning the Dial: The Rapid Recognition of Music Genres,” Journal of New Music Research, vol. 37, no. 2,
pp. 93–100, Jun. 2008, 00067, issn: 0929-8215.
module 12.2: musical genre classification 6 / 20
overview intro genre MGC features example summary
musical genre classification
observations with humans
1 human classification far from perfect:
75–90 % for limited set of classes
2 for many genres, humans need only a
fraction of a second to classify
⇒ short time timbre features sufficient?
plots from1,2
1
S. Lippens, J.-P. Martens, T. D. Mulder, et al., “A Comparison of Human and Automatic Musical Genre Classification,” in Proceedings of the
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004.
2
R. O. Gjerdingen and D. Perrott, “Scanning the Dial: The Rapid Recognition of Music Genres,” Journal of New Music Research, vol. 37, no. 2,
pp. 93–100, Jun. 2008, 00067, issn: 0929-8215.
module 12.2: musical genre classification 6 / 20
overview intro genre MGC features example summary
musical genre classification
overview
Audio
Signal
Feature
Extraction
Classification
Genre
Label
1 feature extraction
• compressed, meaningful representation
2 classification
• map or convert feature to comprehensible domain
module 12.2: musical genre classification 7 / 20
overview intro genre MGC features example summary
musical genre classification
overview
Audio
Signal
Feature
Extraction
Classification
Genre
Label
1 feature extraction
• compressed, meaningful representation
2 classification
• map or convert feature to comprehensible domain
module 12.2: musical genre classification 7 / 20
overview intro genre MGC features example summary
musical genre classification
feature categories
high level similarities?
• melody, hook lines, bass lines, harmony progression
• rhythm & tempo
• structure
• instrumentation & timbre
technical feature categories
• tonal
• technical
• timbral
• temporal
• intensity
extracted features should be
• extractable (not: time envelope in polyphonic signals)
• relevant (not: pitch chroma for instrument ID)
• non-redundant
• have discriminative power
module 12.2: musical genre classification 8 / 20
overview intro genre MGC features example summary
musical genre classification
feature categories
high level similarities?
• melody, hook lines, bass lines, harmony progression
• rhythm & tempo
• structure
• instrumentation & timbre
technical feature categories
• tonal
• technical
• timbral
• temporal
• intensity
extracted features should be
• extractable (not: time envelope in polyphonic signals)
• relevant (not: pitch chroma for instrument ID)
• non-redundant
• have discriminative power
module 12.2: musical genre classification 8 / 20
overview intro genre MGC features example summary
musical genre classification
feature categories
high level similarities?
• melody, hook lines, bass lines, harmony progression
• rhythm & tempo
• structure
• instrumentation & timbre
technical feature categories
• tonal
• technical
• timbral
• temporal
• intensity
extracted features should be
• extractable (not: time envelope in polyphonic signals)
• relevant (not: pitch chroma for instrument ID)
• non-redundant
• have discriminative power
module 12.2: musical genre classification 8 / 20
overview intro genre MGC features example summary
musical genre classification
instantaneous features
spectral features (timbre):
Spectral Centroid, MFCCs, Spectral Flux, . . .
pitch features (tonal):
pitch chroma distribution/change, . . .
rhythm features (temporal):
onset density, beat histogram features, . . .
statistical features (technical):
standard deviation, skewness, zero crossings, . . .
intensity features:
level variation, number of “pauses”, . . .
module 12.2: musical genre classification 9 / 20
overview intro genre MGC features example summary
musical genre classification
instantaneous features
spectral features (timbre):
Spectral Centroid, MFCCs, Spectral Flux, . . .
pitch features (tonal):
pitch chroma distribution/change, . . .
rhythm features (temporal):
onset density, beat histogram features, . . .
statistical features (technical):
standard deviation, skewness, zero crossings, . . .
intensity features:
level variation, number of “pauses”, . . .
module 12.2: musical genre classification 9 / 20
overview intro genre MGC features example summary
musical genre classification
instantaneous features
spectral features (timbre):
Spectral Centroid, MFCCs, Spectral Flux, . . .
pitch features (tonal):
pitch chroma distribution/change, . . .
rhythm features (temporal):
onset density, beat histogram features, . . .
statistical features (technical):
standard deviation, skewness, zero crossings, . . .
intensity features:
level variation, number of “pauses”, . . .
module 12.2: musical genre classification 9 / 20
overview intro genre MGC features example summary
musical genre classification
instantaneous features
spectral features (timbre):
Spectral Centroid, MFCCs, Spectral Flux, . . .
pitch features (tonal):
pitch chroma distribution/change, . . .
rhythm features (temporal):
onset density, beat histogram features, . . .
statistical features (technical):
standard deviation, skewness, zero crossings, . . .
intensity features:
level variation, number of “pauses”, . . .
module 12.2: musical genre classification 9 / 20
overview intro genre MGC features example summary
musical genre classification
instantaneous features
spectral features (timbre):
Spectral Centroid, MFCCs, Spectral Flux, . . .
pitch features (tonal):
pitch chroma distribution/change, . . .
rhythm features (temporal):
onset density, beat histogram features, . . .
statistical features (technical):
standard deviation, skewness, zero crossings, . . .
intensity features:
level variation, number of “pauses”, . . .
module 12.2: musical genre classification 9 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
feature extraction process
1 extract instantaneous features
2 compute derived features (derivatives etc.)
3 compute long term features & subfeatures per texture window or file
4 normalize subfeatures
5 (select or) transform subfeatures
6 feature vector → classifier input
module 12.2: musical genre classification 10 / 20
overview intro genre MGC features example summary
musical genre classification
long term features 1/2
derived from beat histogram3
statistical histogram features
number and values of top
maxima
location (relation) of top
maxima
. . .
3
G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” Transactions on Speech and Audio Processing, vol. 10, no. 5,
pp. 293–302, Jul. 2002, issn: 1063-6676. doi: 10.1109/TSA.2002.800560.
module 12.2: musical genre classification 11 / 20
overview intro genre MGC features example summary
musical genre classification
long term features 2/2
derived from pitch histogram or pitch chroma4
statistical histogram features
number and values of top
maxima
location (relation) of top
maxima
. . .
4
G. Tzanetakis, A. Ermolinskyi, and P. Cook, “Pitch Histograms in Audio and Symbolic Music Information Retrieval,” in Proceedings of the 3rd
International Conference on Music Information Retrieval (ISMIR), Paris, 2002.
module 12.2: musical genre classification 12 / 20
overview intro genre MGC features example summary
musical genre classification
additional possible features
stereo features
• mid channel energy vs. side channel energy
• spectral channel differences
features at higher semantic levels:
• tempo, structure, harmonic complexity, instrumentation
module 12.2: musical genre classification 13 / 20
overview intro genre MGC features example summary
musical genre classification
additional possible features
stereo features
• mid channel energy vs. side channel energy
• spectral channel differences
features at higher semantic levels:
• tempo, structure, harmonic complexity, instrumentation
module 12.2: musical genre classification 13 / 20
overview intro genre MGC features example summary
musical genre classification
results
classification results depend on training set, test set, and number of classes
typical range: ≈ 10 classes ⇒ 50–80%
main challenges
• ill-defined genre boundaries
• non-uniformly distributed classes
• overfitting through songs from same album or artist
• . . .
module 12.2: musical genre classification 14 / 20
overview intro genre MGC features example summary
musical genre classification
results
classification results depend on training set, test set, and number of classes
typical range: ≈ 10 classes ⇒ 50–80%
main challenges
• ill-defined genre boundaries
• non-uniformly distributed classes
• overfitting through songs from same album or artist
• . . .
module 12.2: musical genre classification 14 / 20
overview intro genre MGC features example summary
musical genre classification
results
classification results depend on training set, test set, and number of classes
typical range: ≈ 10 classes ⇒ 50–80%
main challenges
• ill-defined genre boundaries
• non-uniformly distributed classes
• overfitting through songs from same album or artist
• . . .
module 12.2: musical genre classification 14 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification baseline example
binary classification task
1 extract features
2 represent each file with its 2-dimensional feature vector
3 kNN to classify unknown audio files
4 evaluate classification performance
module 12.2: musical genre classification 15 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: features 1/2
for each audio file
1 split input signal into (overlapping) blocks
n
n + 1
n + 2
n + 3
H K
i
2 compute 2 feature series (spectral centroid, RMS)
3 aggregate feature series to one value per file
• mean of Spectral Centroid µSC
• standard deviation of RMS σRMS
4 represent each file as 2-dimensional vector
module 12.2: musical genre classification 16 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: features 1/2
for each audio file
1 split input signal into (overlapping) blocks
2 compute 2 feature series (spectral centroid, RMS)
3 aggregate feature series to one value per file
• mean of Spectral Centroid µSC
• standard deviation of RMS σRMS
4 represent each file as 2-dimensional vector
module 12.2: musical genre classification 16 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: features 1/2
for each audio file
1 split input signal into (overlapping) blocks
2 compute 2 feature series (spectral centroid, RMS)
3 aggregate feature series to one value per file
• mean of Spectral Centroid µSC
µSC =
1
N
X
∀n
vSC(n)
• standard deviation of RMS σRMS
σRMS =
s
1
N
X
∀n
(vRMS(n) − µRMS)2
4 represent each file as 2-dimensional vector
module 12.2: musical genre classification 16 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: features 1/2
for each audio file
1 split input signal into (overlapping) blocks
2 compute 2 feature series (spectral centroid, RMS)
3 aggregate feature series to one value per file
• mean of Spectral Centroid µSC
• standard deviation of RMS σRMS
4 represent each file as 2-dimensional vector
µSC, σRMS
T
module 12.2: musical genre classification 16 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: features 2/2
module 12.2: musical genre classification 17 / 20
matlab
source:
plotFeaturespace.m
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: training set
use dataset annotated as speech and music:
• requirements
▶ large compared to number of features
▶ representative for use case (diverse)
• here (toy example):
▶ 64 speech files
▶ 64 music files
extract the features for the dataset
• centroid mean
• rms std
use 3NN classifier
procedure: Leave-One-Out-Cross-Validation
module 12.2: musical genre classification 18 / 20
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: results (kNN)
confusion matrix:
speech music # files
gt speech 51 13 64
gt music 11 53 64
⇒ classification rate:
53 + 54
64 + 64
= 81.25%
single feature classification results
• Spectral Centroid: 63.28%
• RMS: 73.44%
module 12.2: musical genre classification 19 / 20
matlab
source:
matlab/ExampleMusicSpeechClassification.m
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: results (kNN)
confusion matrix:
speech music # files
gt speech 51 13 64
gt music 11 53 64
⇒ classification rate:
53 + 54
64 + 64
= 81.25%
single feature classification results
• Spectral Centroid: 63.28%
• RMS: 73.44%
module 12.2: musical genre classification 19 / 20
matlab
source:
matlab/ExampleMusicSpeechClassification.m
overview intro genre MGC features example summary
musical genre classification
speech/music classification example: results (kNN)
confusion matrix:
speech music # files
gt speech 51 13 64
gt music 11 53 64
⇒ classification rate:
53 + 54
64 + 64
= 81.25%
single feature classification results
• Spectral Centroid: 63.28%
• RMS: 73.44%
module 12.2: musical genre classification 19 / 20
matlab
source:
matlab/ExampleMusicSpeechClassification.m
overview intro genre MGC features example summary
summary
lecture content
musical genre
• ill-defined, subjective, no general agreement
• some human agreement
MGC: features
• from all possible categories as all categories might depend on genre
• timbre seems most meaningful feature
MGC: classifier
• any classifier works, and most have been used
MGC: standard baseline
1 MFCCs
2 SVM
module 12.2: musical genre classification 20 / 20

More Related Content

More from AlexanderLerch4

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdfAlexanderLerch4
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdfAlexanderLerch4
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdfAlexanderLerch4
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdfAlexanderLerch4
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdfAlexanderLerch4
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdfAlexanderLerch4
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdfAlexanderLerch4
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdfAlexanderLerch4
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdfAlexanderLerch4
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdfAlexanderLerch4
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdfAlexanderLerch4
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdfAlexanderLerch4
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdfAlexanderLerch4
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdfAlexanderLerch4
 
07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdf07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdfAlexanderLerch4
 
03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdf03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdfAlexanderLerch4
 
07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdf07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdfAlexanderLerch4
 

More from AlexanderLerch4 (20)

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf
 
10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf
 
13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf
 
08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf
 
07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdf07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdf
 
03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdf03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdf
 
07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdf07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdf
 

Recently uploaded

Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistandanishmna97
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)Wonjun Hwang
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxjbellis
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهMohamed Sweelam
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxCyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxMasterG
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 

Recently uploaded (20)

Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptxCyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
Cyber Insurance - RalphGilot - Embry-Riddle Aeronautical University.pptx
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 

12-02-ACA-Genre.pdf

  • 1. Introduction to Audio Content Analysis module 12.2: musical genre classification alexander lerch
  • 2. overview intro genre MGC features example summary introduction overview corresponding textbook section section 12.2 lecture content • musical genre • processing steps in basic genre classifiers • example: genre classification with a kNN learning objectives • discuss ambiguities in the definition of musical genre and the possible impact on automatic systems • describe the processing steps for traditional musical genre classifiers • implement your own music genre classifier with Matlab module 12.2: musical genre classification 1 / 20
  • 3. overview intro genre MGC features example summary introduction overview corresponding textbook section section 12.2 lecture content • musical genre • processing steps in basic genre classifiers • example: genre classification with a kNN learning objectives • discuss ambiguities in the definition of musical genre and the possible impact on automatic systems • describe the processing steps for traditional musical genre classifiers • implement your own music genre classifier with Matlab module 12.2: musical genre classification 1 / 20
  • 4. overview intro genre MGC features example summary musical genre classification introduction one of the early/seminal research topics in MIR classic machine learning task • features → classification related tasks: • speech-music classification • instrument recognition • artist identification • music emotion recognition module 12.2: musical genre classification 2 / 20
  • 5. overview intro genre MGC features example summary musical genre classification introduction one of the early/seminal research topics in MIR classic machine learning task • features → classification related tasks: • speech-music classification • instrument recognition • artist identification • music emotion recognition module 12.2: musical genre classification 2 / 20
  • 6. overview intro genre MGC features example summary musical genre classification introduction one of the early/seminal research topics in MIR classic machine learning task • features → classification related tasks: • speech-music classification • instrument recognition • artist identification • music emotion recognition module 12.2: musical genre classification 2 / 20
  • 7. overview intro genre MGC features example summary musical genre classification applications large music databases: • annotation • sorting, browsing, retrieving recommendation and music discovery systems automatic playlist generation improving downstream MIR tasks by using side information/conditioning module 12.2: musical genre classification 3 / 20
  • 8. overview intro genre MGC features example summary musical genre classification applications large music databases: • annotation • sorting, browsing, retrieving recommendation and music discovery systems automatic playlist generation improving downstream MIR tasks by using side information/conditioning module 12.2: musical genre classification 3 / 20
  • 9. overview intro genre MGC features example summary musical genre classification genre: definition what is musical genre module 12.2: musical genre classification 4 / 20
  • 10. overview intro genre MGC features example summary musical genre classification genre: definition what is musical genre clusters of musical similarity? → hard to answer in general, there are many systematic problems module 12.2: musical genre classification 4 / 20
  • 11. overview intro genre MGC features example summary musical genre classification genre: definition what is musical genre clusters of musical similarity? → hard to answer in general, there are many systematic problems 1 non-agreement on taxonomies ▶ e.g., AllMusic vs. Pandora 2 genre label scope ▶ e.g., song, album, artist, piece of a song 3 ill-defined genre labels ▶ e.g., geographic (indian music), historic (baroque), technical (barbershop), instrumentation (symphonic music), usage (christmas songs) 4 taxonomy scalability ▶ e.g., genres and subgenres evolve over time 5 non-orthogonality ▶ e.g., several genres for one piece of music module 12.2: musical genre classification 4 / 20
  • 12. overview intro genre MGC features example summary musical genre classification genre: taxonomy examples Speech Music Male Female Sports Disco Country Hip Hop Rock Blues Reggae Pop Metal Classical Jazz Choir Orchestra Piano String Quartet Big Band Cool Fusion Piano Quartet Swing Background Speech Music Male Female +Background Classical Non-Classical Chamber Orchestra Rock Electro/Pop Jazz/Blues Piano Solo String Quartet Other Symphonic +Choir +Soloist Soft Rock Hard Rock Hip Hop Techno/Dance Pop module 12.2: musical genre classification 5 / 20
  • 13. overview intro genre MGC features example summary musical genre classification observations with humans 1 human classification far from perfect: 75–90 % for limited set of classes 2 for many genres, humans need only a fraction of a second to classify ⇒ short time timbre features sufficient? plots from1,2 1 S. Lippens, J.-P. Martens, T. D. Mulder, et al., “A Comparison of Human and Automatic Musical Genre Classification,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004. 2 R. O. Gjerdingen and D. Perrott, “Scanning the Dial: The Rapid Recognition of Music Genres,” Journal of New Music Research, vol. 37, no. 2, pp. 93–100, Jun. 2008, 00067, issn: 0929-8215. module 12.2: musical genre classification 6 / 20
  • 14. overview intro genre MGC features example summary musical genre classification observations with humans 1 human classification far from perfect: 75–90 % for limited set of classes 2 for many genres, humans need only a fraction of a second to classify ⇒ short time timbre features sufficient? plots from1,2 1 S. Lippens, J.-P. Martens, T. D. Mulder, et al., “A Comparison of Human and Automatic Musical Genre Classification,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, 2004. 2 R. O. Gjerdingen and D. Perrott, “Scanning the Dial: The Rapid Recognition of Music Genres,” Journal of New Music Research, vol. 37, no. 2, pp. 93–100, Jun. 2008, 00067, issn: 0929-8215. module 12.2: musical genre classification 6 / 20
  • 15. overview intro genre MGC features example summary musical genre classification overview Audio Signal Feature Extraction Classification Genre Label 1 feature extraction • compressed, meaningful representation 2 classification • map or convert feature to comprehensible domain module 12.2: musical genre classification 7 / 20
  • 16. overview intro genre MGC features example summary musical genre classification overview Audio Signal Feature Extraction Classification Genre Label 1 feature extraction • compressed, meaningful representation 2 classification • map or convert feature to comprehensible domain module 12.2: musical genre classification 7 / 20
  • 17. overview intro genre MGC features example summary musical genre classification feature categories high level similarities? • melody, hook lines, bass lines, harmony progression • rhythm & tempo • structure • instrumentation & timbre technical feature categories • tonal • technical • timbral • temporal • intensity extracted features should be • extractable (not: time envelope in polyphonic signals) • relevant (not: pitch chroma for instrument ID) • non-redundant • have discriminative power module 12.2: musical genre classification 8 / 20
  • 18. overview intro genre MGC features example summary musical genre classification feature categories high level similarities? • melody, hook lines, bass lines, harmony progression • rhythm & tempo • structure • instrumentation & timbre technical feature categories • tonal • technical • timbral • temporal • intensity extracted features should be • extractable (not: time envelope in polyphonic signals) • relevant (not: pitch chroma for instrument ID) • non-redundant • have discriminative power module 12.2: musical genre classification 8 / 20
  • 19. overview intro genre MGC features example summary musical genre classification feature categories high level similarities? • melody, hook lines, bass lines, harmony progression • rhythm & tempo • structure • instrumentation & timbre technical feature categories • tonal • technical • timbral • temporal • intensity extracted features should be • extractable (not: time envelope in polyphonic signals) • relevant (not: pitch chroma for instrument ID) • non-redundant • have discriminative power module 12.2: musical genre classification 8 / 20
  • 20. overview intro genre MGC features example summary musical genre classification instantaneous features spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, . . . pitch features (tonal): pitch chroma distribution/change, . . . rhythm features (temporal): onset density, beat histogram features, . . . statistical features (technical): standard deviation, skewness, zero crossings, . . . intensity features: level variation, number of “pauses”, . . . module 12.2: musical genre classification 9 / 20
  • 21. overview intro genre MGC features example summary musical genre classification instantaneous features spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, . . . pitch features (tonal): pitch chroma distribution/change, . . . rhythm features (temporal): onset density, beat histogram features, . . . statistical features (technical): standard deviation, skewness, zero crossings, . . . intensity features: level variation, number of “pauses”, . . . module 12.2: musical genre classification 9 / 20
  • 22. overview intro genre MGC features example summary musical genre classification instantaneous features spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, . . . pitch features (tonal): pitch chroma distribution/change, . . . rhythm features (temporal): onset density, beat histogram features, . . . statistical features (technical): standard deviation, skewness, zero crossings, . . . intensity features: level variation, number of “pauses”, . . . module 12.2: musical genre classification 9 / 20
  • 23. overview intro genre MGC features example summary musical genre classification instantaneous features spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, . . . pitch features (tonal): pitch chroma distribution/change, . . . rhythm features (temporal): onset density, beat histogram features, . . . statistical features (technical): standard deviation, skewness, zero crossings, . . . intensity features: level variation, number of “pauses”, . . . module 12.2: musical genre classification 9 / 20
  • 24. overview intro genre MGC features example summary musical genre classification instantaneous features spectral features (timbre): Spectral Centroid, MFCCs, Spectral Flux, . . . pitch features (tonal): pitch chroma distribution/change, . . . rhythm features (temporal): onset density, beat histogram features, . . . statistical features (technical): standard deviation, skewness, zero crossings, . . . intensity features: level variation, number of “pauses”, . . . module 12.2: musical genre classification 9 / 20
  • 25. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 26. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 27. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 28. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 29. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 30. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 31. overview intro genre MGC features example summary musical genre classification feature extraction process 1 extract instantaneous features 2 compute derived features (derivatives etc.) 3 compute long term features & subfeatures per texture window or file 4 normalize subfeatures 5 (select or) transform subfeatures 6 feature vector → classifier input module 12.2: musical genre classification 10 / 20
  • 32. overview intro genre MGC features example summary musical genre classification long term features 1/2 derived from beat histogram3 statistical histogram features number and values of top maxima location (relation) of top maxima . . . 3 G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, Jul. 2002, issn: 1063-6676. doi: 10.1109/TSA.2002.800560. module 12.2: musical genre classification 11 / 20
  • 33. overview intro genre MGC features example summary musical genre classification long term features 2/2 derived from pitch histogram or pitch chroma4 statistical histogram features number and values of top maxima location (relation) of top maxima . . . 4 G. Tzanetakis, A. Ermolinskyi, and P. Cook, “Pitch Histograms in Audio and Symbolic Music Information Retrieval,” in Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR), Paris, 2002. module 12.2: musical genre classification 12 / 20
  • 34. overview intro genre MGC features example summary musical genre classification additional possible features stereo features • mid channel energy vs. side channel energy • spectral channel differences features at higher semantic levels: • tempo, structure, harmonic complexity, instrumentation module 12.2: musical genre classification 13 / 20
  • 35. overview intro genre MGC features example summary musical genre classification additional possible features stereo features • mid channel energy vs. side channel energy • spectral channel differences features at higher semantic levels: • tempo, structure, harmonic complexity, instrumentation module 12.2: musical genre classification 13 / 20
  • 36. overview intro genre MGC features example summary musical genre classification results classification results depend on training set, test set, and number of classes typical range: ≈ 10 classes ⇒ 50–80% main challenges • ill-defined genre boundaries • non-uniformly distributed classes • overfitting through songs from same album or artist • . . . module 12.2: musical genre classification 14 / 20
  • 37. overview intro genre MGC features example summary musical genre classification results classification results depend on training set, test set, and number of classes typical range: ≈ 10 classes ⇒ 50–80% main challenges • ill-defined genre boundaries • non-uniformly distributed classes • overfitting through songs from same album or artist • . . . module 12.2: musical genre classification 14 / 20
  • 38. overview intro genre MGC features example summary musical genre classification results classification results depend on training set, test set, and number of classes typical range: ≈ 10 classes ⇒ 50–80% main challenges • ill-defined genre boundaries • non-uniformly distributed classes • overfitting through songs from same album or artist • . . . module 12.2: musical genre classification 14 / 20
  • 39. overview intro genre MGC features example summary musical genre classification speech/music classification baseline example binary classification task 1 extract features 2 represent each file with its 2-dimensional feature vector 3 kNN to classify unknown audio files 4 evaluate classification performance module 12.2: musical genre classification 15 / 20
  • 40. overview intro genre MGC features example summary musical genre classification speech/music classification example: features 1/2 for each audio file 1 split input signal into (overlapping) blocks n n + 1 n + 2 n + 3 H K i 2 compute 2 feature series (spectral centroid, RMS) 3 aggregate feature series to one value per file • mean of Spectral Centroid µSC • standard deviation of RMS σRMS 4 represent each file as 2-dimensional vector module 12.2: musical genre classification 16 / 20
  • 41. overview intro genre MGC features example summary musical genre classification speech/music classification example: features 1/2 for each audio file 1 split input signal into (overlapping) blocks 2 compute 2 feature series (spectral centroid, RMS) 3 aggregate feature series to one value per file • mean of Spectral Centroid µSC • standard deviation of RMS σRMS 4 represent each file as 2-dimensional vector module 12.2: musical genre classification 16 / 20
  • 42. overview intro genre MGC features example summary musical genre classification speech/music classification example: features 1/2 for each audio file 1 split input signal into (overlapping) blocks 2 compute 2 feature series (spectral centroid, RMS) 3 aggregate feature series to one value per file • mean of Spectral Centroid µSC µSC = 1 N X ∀n vSC(n) • standard deviation of RMS σRMS σRMS = s 1 N X ∀n (vRMS(n) − µRMS)2 4 represent each file as 2-dimensional vector module 12.2: musical genre classification 16 / 20
  • 43. overview intro genre MGC features example summary musical genre classification speech/music classification example: features 1/2 for each audio file 1 split input signal into (overlapping) blocks 2 compute 2 feature series (spectral centroid, RMS) 3 aggregate feature series to one value per file • mean of Spectral Centroid µSC • standard deviation of RMS σRMS 4 represent each file as 2-dimensional vector µSC, σRMS T module 12.2: musical genre classification 16 / 20
  • 44. overview intro genre MGC features example summary musical genre classification speech/music classification example: features 2/2 module 12.2: musical genre classification 17 / 20 matlab source: plotFeaturespace.m
  • 45. overview intro genre MGC features example summary musical genre classification speech/music classification example: training set use dataset annotated as speech and music: • requirements ▶ large compared to number of features ▶ representative for use case (diverse) • here (toy example): ▶ 64 speech files ▶ 64 music files extract the features for the dataset • centroid mean • rms std use 3NN classifier procedure: Leave-One-Out-Cross-Validation module 12.2: musical genre classification 18 / 20
  • 46. overview intro genre MGC features example summary musical genre classification speech/music classification example: results (kNN) confusion matrix: speech music # files gt speech 51 13 64 gt music 11 53 64 ⇒ classification rate: 53 + 54 64 + 64 = 81.25% single feature classification results • Spectral Centroid: 63.28% • RMS: 73.44% module 12.2: musical genre classification 19 / 20 matlab source: matlab/ExampleMusicSpeechClassification.m
  • 47. overview intro genre MGC features example summary musical genre classification speech/music classification example: results (kNN) confusion matrix: speech music # files gt speech 51 13 64 gt music 11 53 64 ⇒ classification rate: 53 + 54 64 + 64 = 81.25% single feature classification results • Spectral Centroid: 63.28% • RMS: 73.44% module 12.2: musical genre classification 19 / 20 matlab source: matlab/ExampleMusicSpeechClassification.m
  • 48. overview intro genre MGC features example summary musical genre classification speech/music classification example: results (kNN) confusion matrix: speech music # files gt speech 51 13 64 gt music 11 53 64 ⇒ classification rate: 53 + 54 64 + 64 = 81.25% single feature classification results • Spectral Centroid: 63.28% • RMS: 73.44% module 12.2: musical genre classification 19 / 20 matlab source: matlab/ExampleMusicSpeechClassification.m
  • 49. overview intro genre MGC features example summary summary lecture content musical genre • ill-defined, subjective, no general agreement • some human agreement MGC: features • from all possible categories as all categories might depend on genre • timbre seems most meaningful feature MGC: classifier • any classifier works, and most have been used MGC: standard baseline 1 MFCCs 2 SVM module 12.2: musical genre classification 20 / 20