Unsupervised learning approach for
identifying sub-genres in music scores
Girija Shingte and Mathieu d'Aquin (@mdaquin)
NUI Galway, Data Science Institute (@DSIatNUIG)
Insight SFI Research Centre for Data Analytics
Photo by Caitlin Bussey - flickr
How to represent the tunes in those collections
so they become comparable?
Symbolic music representation in MIDI
Event-based representation:
note-on <note: 64, velocity: 100, time: 1>
note-off <note: 64, velocity: 0, time: 240>
note-on <note: 66, velocity: 80, time: 1>
note-off <note: 66, velocity: 0, time: 480>
note-on <note: 63, velocity: 100, time: 1>
note-off <note: 63, velocity: 0, time: 240>
note-on <note: 64, velocity: 80, time: 1>
note-off <note: 64, velocity: 0, time: 240>
note-on <note: 64, velocity: 80, time: 1>
note-off <note: 64, velocity: 0, time: 240>
note-on <note: 66, velocity: 80, time: 1>
note-off <note: 66, velocity: 0, time: 480>
note-on <note: 64, velocity: 80, time: 1>
note-off <note: 64, velocity: 0, time: 240>
Can be simplified as a sequence of notes, each having two main components:
Pitch: represented by the number of the note
Time: represented by the number of ticks since the last event.
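As a minimal sketch (not necessarily the exact pipeline used here), such an event stream could be reduced to (pitch, duration-in-ticks) pairs with the mido library; the function and variable names below are illustrative only, and note-offs may also appear as note-ons with velocity 0.

```python
# Minimal sketch: reduce a monophonic MIDI track to (pitch, duration_in_ticks) pairs.
# Assumes the `mido` library and a single, monophonic track.
import mido

def midi_to_notes(path):
    mid = mido.MidiFile(path)
    track = mid.tracks[0]          # assumption: the tune lives in the first track
    notes = []                     # list of (pitch, duration_in_ticks)
    pending = {}                   # pitch -> absolute tick of its note-on
    now = 0                        # absolute time in ticks
    for msg in track:
        now += msg.time            # msg.time is the delta in ticks within a track
        if msg.type == 'note_on' and msg.velocity > 0:
            pending[msg.note] = now
        elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
            if msg.note in pending:
                notes.append((msg.note, now - pending.pop(msg.note)))
    return notes, mid.ticks_per_beat

# For the example above this would yield:
# [(64, 240), (66, 480), (63, 240), (64, 240), (64, 240), (66, 480), (64, 240)]
```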
Representing pitch features
Basic approach: As a vector of note numbers
[64,66,63,64,64,66,64]
But the same tune transposed would be completely different...
Vector of differences with the first note:
[0,+2,-1,0,0,+2,0]
Vector of differences with the average note:
[-0.43,+1.57,-1.43,-0.43,-0.43,+1.57,-0.43]
Vector of differences with the previous note:
[0,+2,-3,+1,0,+2,-2]
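A minimal sketch of these three pitch representations on the example sequence, using NumPy (illustrative only, not the study's exact code):

```python
import numpy as np

pitches = np.array([64, 66, 63, 64, 64, 66, 64], dtype=float)

diff_first = pitches - pitches[0]                  # [0, +2, -1, 0, 0, +2, 0]
diff_avg   = pitches - pitches.mean()              # [-0.43, +1.57, -1.43, -0.43, -0.43, +1.57, -0.43]
diff_prev  = np.diff(pitches, prepend=pitches[0])  # [0, +2, -3, +1, 0, +2, -2]
```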
Representing time features
Basic approach: Vector of durations
[240,480,240,240,240,480,240]
But this depends on the file's ticks per beat. Vector of durations in beats:
[0.5,1,0.5,0.5,0.5,1,0.5]
But this depends on the specific timing of the tune. Binary vector of note occurrence,
dividing each beat into 24 possible locations (here reduced to 4):
[1,0,1,0,0,0,1,0,1,0,1,0,1,0,0,0,1,0]
Also, beats extracted from WAV files and other “spectral” features.
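A minimal sketch of the timing representations above, assuming 480 ticks per beat (implied by the example) and, as on the slide, 4 grid positions per beat instead of 24:

```python
import numpy as np

ticks_per_beat = 480
durations_ticks = np.array([240, 480, 240, 240, 240, 480, 240], dtype=float)

# Durations in beats, independent of the file's tick resolution
durations_beats = durations_ticks / ticks_per_beat   # [0.5, 1, 0.5, 0.5, 0.5, 1, 0.5]

# Binary onset grid: divide each beat into `positions_per_beat` slots and
# mark the slots where a note starts (4 here; the slides mention 24).
positions_per_beat = 4
onsets_beats = np.concatenate(([0.0], np.cumsum(durations_beats)[:-1]))
grid = np.zeros(int(round(durations_beats.sum() * positions_per_beat)), dtype=int)
grid[np.round(onsets_beats * positions_per_beat).astype(int)] = 1
# grid == [1,0,1,0,0,0,1,0,1,0,1,0,1,0,0,0,1,0]
```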
Average similarities with jigs from TheSession.org
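The figures for these average similarities are not reproduced here, and the slides do not spell out the similarity measure used. A minimal sketch assuming cosine similarity between feature vectors truncated to a common length:

```python
import numpy as np

def cosine_sim(a, b):
    # Compare two feature vectors; truncating to a common length is an
    # assumption here (padding or alignment would be alternatives).
    n = min(len(a), len(b))
    a, b = np.asarray(a[:n], dtype=float), np.asarray(b[:n], dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def average_similarity(tune, collection):
    # Mean similarity of one tune against a collection (e.g. all jigs)
    return float(np.mean([cosine_sim(tune, other) for other in collection]))
```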
Clustering based on features (K-Means)
Feature representation                  | Silhouette coefficient
Pitch values                            | -0.02
Difference with first pitch             | -0.02
Difference with average pitch           | 0.03
Difference with previous note’s pitch   | -0.02
Note duration                           | -0.07
Binary representation of note beats     | 0.2
Beats from wav files                    | 0.159
Applied to a sample of 400 tunes from TheSession.org with k=4
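A minimal sketch of this evaluation with scikit-learn, assuming each tune has already been mapped to a fixed-length feature vector (the slides do not say how variable-length tunes were aligned):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_silhouette(features, k=4, seed=0):
    """Cluster tunes with K-Means and report the silhouette coefficient.

    `features` is an (n_tunes, n_dims) array built from one of the
    representations in the table above (pitch differences, binary beats, ...).
    """
    X = np.asarray(features, dtype=float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)
```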
Pre-weighted combined features for clustering
Weight pitch (avg) | Weight timing (0/1) | Weight beats (wav) | Silhouette The Session | Silhouette ITMA
0.33               | 0.33                | 0.34               | 0.31                   | 0.19
0.2                | 0.2                 | 0.6                | 0.22                   | 0.03
0.45               | 0.45                | 0.1                | 0.17                   | 0.25
0.55               | 0.25                | 0.2                | 0.31                   | 0.19
0.1                | 0.1                 | 0.8                | 0.33                   | 0.21
Applied to a sample of 32 common tunes from TheSession.org and ITMA with k=4
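A minimal sketch of how such pre-weighted combinations could be assembled before clustering; standardising each feature block first is an assumption on our part, as the slides only list the weights:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def weighted_clustering(pitch_avg, timing_01, beats_wav,
                        weights=(0.33, 0.33, 0.34), k=4):
    # Standardise each feature block, scale it by its weight, then concatenate
    # into a single matrix of shape (n_tunes, total_dims) for K-Means.
    blocks = []
    for block, w in zip((pitch_avg, timing_01, beats_wav), weights):
        blocks.append(w * StandardScaler().fit_transform(np.asarray(block, dtype=float)))
    X = np.hstack(blocks)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return labels, silhouette_score(X, labels)
```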
Applications
Music information retrieval
Currently being extended to search across collections and to enable customising the similarity function
Applications
Studying genres
Applications
Studying tunes and their connections
Conclusions
Music tunes are indeed complex objects whose analysis through machine learning
and data mining approaches requires careful consideration.
They encapsulate rich knowledge about composition, styles
and culture, which we hope can be extracted.
Further work is needed on automatically extracting comparable
components of tunes, rather than treating tunes as atomic
artefacts.
Unsupervised learning approach for
identifying sub-genres in music scores
Girija Shingte and Mathieu d'Aquin (@mdaquin)
NUI Galway, Data Science Institute (@DSIatNUIG)
Insight SFI Research Centre for Data Analytics
