Computational models of symphonic music: challenges and opportunities for melody extraction and structural analysis

Computational models of symphonic music:
challenges and opportunities
Emilia Gómez
Universitat Pompeu Fabra, Barcelona, Spain

Juan J. Bosch, Julio Carabias-Orti, Jordi Janer, Agustín Martorell,
Oscar Mayor, Marius Miron, Álvaro Sarasúa
Universitat Pompeu Fabra, Barcelona, Spain
Cynthia Liem, TU Delft, Netherlands

2
Introduction
What are the main challenges people face
when confronted with a piece of music
they are unfamiliar with?
Can computational models
make these challenges easier?

3
Introduction
Are state-of-the-art automatic
description methods ready for
that?

4
Coordinators:
Musical institutions:
Technological partners:
Introduction
PHENICX: Performances as Highly Enriched aNd
Interactive Concert eXperiences
Feb. 2013 – Feb. 2016

5
Transform music concert performances into multi-modal,
multi-layer and multi-perspective digital artefacts.
Introduction
Concert performance
Multi-perspective
Multimodal
Multilayer

6
Provide facilities to explore, (re)enjoy and share concerts.
• Before, During & After:
1. Digital program notes
2. Virtual concert guide
3. Overseeing the music
4. Focusing attention and switching viewpoints
5. Comparing different performances
6. Capturing the moment, sharing the magic
7. Joining the orchestra
• Research goals:
• Automatic music description.
• Visualization & interaction.
Introduction
Liem, C. C. S., R. van der Sterren, M. van Tilburg, Á. Sarasúa, J. J. Bosch, J. Janer, M. Melenhorst, E. Gómez, and A. Hanjalic, "Innovating
the Classical Music Experience in the PHENICX Project: Use Cases and Initial User Feedback", 1st International Workshop on
Interactive Content Consumption (WSICC) at EuroITV 2013, Como, Italy, 06/2013
http://www.concertgebouworkest.nl/en/rco-editions/

7
Introduction
Challenges for current technologies: Beethoven, Symphony No. 3 "Eroica"
• Classical - Romantic.
• Paradigm of formal complexity: vast literature.
• Significant usage of symphonic resources, yet not exploiting full possibilities.
• Variety of problems specific to the symphonic repertoire / avoiding a complex task.
                                Literature focus    Symphonic repertoire
Duration                        Short (song, 3’)    Long (57’, 4 movements)
Musical complexity              Low                 High
Instrumentation                 Simple              Complex (13 staves)
Overlapping sources in audio    Small number        High number (instrument sections in unison)
Modalities                      Score or audio      17 audio, 8 video, score (15 Gbytes)
Performance by the
Royal Concertgebouw Orchestra (RCO).

8
Introduction
Beethoven Symphony No. 3 Eroica (4 movements) by RCO:
• 17 microphones, 7 cameras, 1 edited video, Aligned score, Automatic descriptors,
isolated tracks (Source separation).
• 57 min (15 Gbytes)
Beethoven Symphony No. 9 (4 movements) by Orquesta Sinfónica del Vallés:
• 32 microphones, 2 cameras (1 orchestra, 1 conductor RGB+ RGBD), Kinect
Sensor (skeleton joints and bones), Aligned score, Automatic descriptors,
isolated tracks (Source separation).
• 1 hour (20 Gbytes)
Mahler Symphony No. 4 (4 movements) by RCO:
• 27 microphones, 8 cameras, 1 edited video.
• 1h 6’ (75 Gbytes)
Brahms Symphony No. 3 Op. 90 (4 movements) by RCO:
• 23 microphones, 7 cameras, 1 edited video
• 40 min (26 Gbytes)
http://repovizz.upf.edu

9
Introduction
• Data repository for synchronization, visualization,
interaction, computation.
• Repovizz
http://repovizz.upf.edu
Mayor, O., Llimona Q., Marchini M., Papiotis P., & Maestre E. (2013). “repoVizz: a Framework for Remote Storage, Browsing, Annotation,
and Exchange of Multi-modal Data”. ACM International Conference on Multimedia (MM'13).

10
Approaches
Research topics
1. Melody extraction
2. Structural analysis
3. Source separation
4. Music visualization
Linked to information needs.

11
Approaches
Research topics
Linked to information needs.
Methodology
• Data gathering.
• Analysis of human annotations.
• Evaluation of existing methods.
• Adaptation and improvement.

12
Approaches
Research topics

13
Definition:
Sequence of fundamental frequency values representing the pitch of
the lead voice or instrument (Salamon, 2012).
Sequence of pitches that people hum or sing to represent a music
piece (Poliner et al. 2007).
Hypothesis: more intuitive for non-expert users than the traditional score.
Challenges:
• High number of overlapping sources.
• Melody played by different/multiple instruments or sections, unison,
octave relation, or with harmonized melodic lines.
(Salamon and Gómez, 2012)
E. Gómez, A. Klapuri, B. Meudic, “Melody description and extraction in the context of music content processing”, JNMR 32(1), 2003.
J. Salamon, E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics”, IEEE TASLP 20(6), 2012.

14
State-of-the-art
Audio:
• Multiple f0 estimation: 69% note accuracy for simple material.
• Predominant f0 estimation: vocal pop and jazz (85% frame
accuracy) vs other instruments (68% frame accuracy).
Score:
• Methods to select the predominant melodic line (Uitdenbogerd and Zobel, 1999).
Methodology
• Music collection building.
• Analysis of human annotations.
• Evaluation of existing methods: audio centred.
J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and
Challenges", IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

15
Music collection building and annotation
• Symphonies and symphonic poems, ballets, suites.
• Mostly romantic period, also classical and 20th century.
• 64 excerpts of 10 to 32 seconds each (94% voiced frames).
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans
and algorithms”, In Proc. of 9th Conference on Interdisciplinary Musicology, Berlin, December 2014
Figure: sections of the dominant instruments playing the main melody
(ST: strings, BR: brass, WW: woodwinds), where Alt- denotes alternation.

16
Music collection building and annotation
• Subjects singing along with music.
• Measure inter-subject & algorithm mutual agreement.
• Ground truth generation.
• Algorithm combination.
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans
and algorithms”, In Proc. of 9th Conference on Interdisciplinary Musicology, Berlin, December 2014
• Measure correlation with melodic features: range, density,
tessitura (Hippel, 2000), complexity (Eerola, 2000),
melodiousness (Leman, 1995), originality (Simonton, 1984),
as computed with the MIDI Toolbox (Eerola & Toiviainen, 2004).
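As a rough illustration of two of the melodic features listed above, the sketch below computes melodic range and note density from a list of notes. The note representation (onset in seconds, duration in seconds, MIDI pitch) and both function names are assumptions made for this example; the MIDI Toolbox has its own definitions, which may differ in detail.

```python
def melodic_range(notes):
    """Distance in semitones between the highest and the lowest pitch."""
    pitches = [pitch for _, _, pitch in notes]
    return max(pitches) - min(pitches)


def note_density(notes):
    """Number of notes per second between the first onset and the last offset."""
    start = min(onset for onset, _, _ in notes)
    end = max(onset + duration for onset, duration, _ in notes)
    return len(notes) / (end - start) if end > start else 0.0


# Hypothetical excerpt: (onset in s, duration in s, MIDI pitch).
excerpt = [(0.0, 0.5, 63), (0.5, 0.5, 67), (1.0, 1.0, 70), (2.0, 2.0, 75)]
print(melodic_range(excerpt), note_density(excerpt))
```

Features of this kind can then be correlated, excerpt by excerpt, with the agreement scores obtained from the singers and the algorithms.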

17
Annotation (MIDI based)
Female singing
Male singing
Algorithms

18
Mutual agreement
• Melodic range, note density & complexity have a negative correlation
with people’s ability and agreement.
• Algorithms differ more in excerpts with high note density and pitch
complexity.

19
Evaluation of 13 state-of-the-art approaches
• 5 pitch salience functions (SF)
• 4 multi-pitch estimation methods (MP)
• 4 melody extraction methods (ME)
• Novel approach: combination of salience functions, refinement &
tracking method.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”,
submitted.

20
Evaluation of state-of-the-art approaches
• ME 67%, MP (10 estimates) 94.2%.
Metrics: Raw Pitch, Weighted Raw Pitch,
Raw Chroma, Overall Accuracy
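For readers unfamiliar with these metrics, here is a minimal sketch of frame-level raw pitch, raw chroma and overall accuracy. It assumes the common convention of a 50-cent tolerance and of 0 Hz marking an unvoiced frame; the function names and the simplified handling of voicing are choices made for this example, not the evaluation code used in the study. Weighted Raw Pitch is not covered.

```python
import math


def hz_to_cents(f_hz, ref_hz=55.0):
    """Convert a frequency in Hz to cents above a reference frequency."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def melody_metrics(ref_f0, est_f0, tol_cents=50.0):
    """Frame-level metrics for two equal-length f0 sequences (0 Hz = unvoiced)."""
    voiced = pitch_ok = chroma_ok = overall_ok = 0
    for ref, est in zip(ref_f0, est_f0):
        if ref <= 0:
            # Unvoiced reference frame: correct only if the estimate is unvoiced too.
            overall_ok += int(est <= 0)
            continue
        voiced += 1
        if est <= 0:
            continue  # voiced frame missed by the estimate
        diff = hz_to_cents(est) - hz_to_cents(ref)
        if abs(diff) <= tol_cents:
            pitch_ok += 1
            overall_ok += 1
        # Fold the error into one octave so that octave errors are forgiven.
        folded = diff - 1200.0 * round(diff / 1200.0)
        if abs(folded) <= tol_cents:
            chroma_ok += 1
    n = len(ref_f0)
    return {
        "raw_pitch": pitch_ok / voiced if voiced else 0.0,
        "raw_chroma": chroma_ok / voiced if voiced else 0.0,
        "overall_accuracy": overall_ok / n if n else 0.0,
    }


# Toy example: one correct frame, one octave error, one unvoiced frame, one miss.
print(melody_metrics([440.0, 440.0, 0.0, 220.0], [440.0, 880.0, 0.0, 0.0]))
```

The octave error in the second frame counts against raw pitch but not against raw chroma, which is why the two metrics diverge when the melody is doubled at the octave.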

21
Conclusions:
• Signal processing front-ends do not generally estimate
the melody pitch as the most salient one.
• Current methods (estimation & tracking) are biased to
singing voice. Best results with source-filter models
(Durrieu et al., 2010).
• Difficulty in selecting the melody instrument and pitch.
• Algorithm performance is correlated with melodic
complexity (especially pitch complexity) and density.
• Need for a combined audio + score melody estimation.

22
Approaches
Research topics

23
Motivation
• Navigation and orientation along the piece.
Goals
• Characterize specificities for symphonic repertoire: musical
characteristics and signal properties unique to symphonic music.
• Use this knowledge to improve state-of-the-art algorithms.
Methodology:
• Expert analyses.
• Evaluation of existing methods.

24
Structure in symphonic music
Tonality
• A main factor contributing to musical form.
• Ex: sonata form, used in the 1st movement of many symphonies: clear in the simplest
(earliest) symphonies, otherwise serving as a ‘structural reference’.
• Practice: tonality in constant evolution, short-term keys (‘tonicisations’).
Orchestration
• Combination of instrumentation and pitch content.
• Richness of sonority: combination of instrument families (woodwinds, brass,
strings and percussion), pitch content (register).
• Well-known effects: contrast between sections, dynamic transformations
(‘orchestral crescendo’).
• Impact on listeners: solo vs tutti.

25
State of the art
Tonal description
• Key estimation (template-based models): compare pitch-class profiles against templates
(learnt from data, perceptual experiments or music theory), e.g. (Krumhansl and Kessler, 1982).
• Key tracking: probabilistic inference (hidden Markov models, neural networks).
• Annotations for short excerpts (Chuan and Chew, 2012).
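Template-based key estimation can be sketched as follows: a 12-bin pitch-class profile is correlated with the major and minor templates rotated to each of the 12 tonics, and the best match is returned. The profile values are the Krumhansl and Kessler (1982) ratings as commonly quoted; the function names, the use of Pearson correlation and the note spelling are assumptions for this example.

```python
import math

# Krumhansl-Kessler probe-tone profiles, index 0 = tonic (values as commonly quoted).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def estimate_key(pcp):
    """Return (tonic, mode) of the template that best matches a 12-bin profile."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            # Rotate the template so that its tonic weight lands on pitch class `tonic`.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(pcp, rotated)
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[tonic], mode)
    return best[1], best[2]


# Synthetic profile shaped like E-flat major (the home key of the Eroica).
synthetic_pcp = [KK_MAJOR[(pc - 3) % 12] for pc in range(12)]
print(estimate_key(synthetic_pcp))
```

Running the same function over sliding windows of different lengths gives a simple form of multi-scale key tracking, at the cost of ignoring the temporal smoothing that probabilistic models provide.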
Structural analysis
• Wealth of research from both audio and score
(Mueller and Smith, ISMIR tutorial, 2014).
• Structure as related to:
• Self-similarity analysis: timbre, tonality.
• Homogeneity, Novelty: abrupt changes.
(Foote 2001)
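The novelty idea can be illustrated with a small sketch: build a self-similarity matrix from frame-wise features and slide a checkerboard kernel along its diagonal, so that peaks mark abrupt changes. This is a simplified, untapered variant written for illustration (the published method uses a Gaussian-tapered kernel); the function names, cosine similarity and the toy features are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def novelty_curve(features, half_width=2):
    """Correlate a checkerboard kernel along the diagonal of the self-similarity matrix."""
    n = len(features)
    ssm = [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]
    novelty = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for a in range(-half_width, half_width):
            for b in range(-half_width, half_width):
                # +1 when both frames lie on the same side of t, -1 otherwise.
                sign = 1.0 if (a < 0) == (b < 0) else -1.0
                total += sign * ssm[t + a][t + b]
        novelty[t] = total
    return novelty


# Toy features: four frames of one texture followed by four frames of another.
toy_features = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
curve = novelty_curve(toy_features)
print(curve.index(max(curve)))
```

With timbre or tonal descriptors as features, the same curve highlights homogeneous regions separated by abrupt changes, which is the sense of structure listed above.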
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-304, 2006.

26
Analysis of expert information
Survey of 8 music analysts on “Eroica” + own music analysis
(1st movement, most complex one).
Large discrepancies / high level of conceptualization.
Voting scheme: number of supporting scholars, th=3 (minimum consensus)
→ 16 segment boundaries in the Eroica exposition.
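The voting scheme can be sketched as a simple count: each scholar contributes one vote per boundary, and boundaries supported by at least th scholars are kept. The analyst labels and bar numbers below are hypothetical placeholders, not the survey data, and matching boundaries by exact bar number is an assumption of this example.

```python
from collections import Counter


def consensus_boundaries(analyses, threshold=3):
    """Keep the boundaries supported by at least `threshold` scholars."""
    votes = Counter()
    for boundaries in analyses.values():
        votes.update(set(boundaries))  # one vote per scholar per boundary
    return sorted(bar for bar, count in votes.items() if count >= threshold)


# Hypothetical segment boundaries (bar numbers) proposed by four analysts.
analyses = {
    "analyst_a": [1, 15, 37, 45, 83],
    "analyst_b": [1, 37, 57, 83],
    "analyst_c": [1, 15, 45, 83],
    "analyst_d": [1, 37, 45],
}
print(consensus_boundaries(analyses))
```

Raising the threshold yields fewer, less disputed boundaries; lowering it admits more of the individual readings.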
Plantinga, L. Romantic Music: a History of Musical Style in Nineteenth-Century Europe. New York: Norton, 1984.
Sipe, T. Beethoven: Eroica Symphony. Cambridge University Press, 1998.
Taruskin, R. The Oxford History of Western Music, vol.2., 2005.
Dahlhaus, C. Ludwig van Beethoven: Approaches to His Music. Oxford: Clarendon, 1991.
Schenker, H. 'Beethoven's Third Symphony: Its True Content Described for the First Time', in Heinrich Schenker, The Masterwork
in Music: A Yearbook, vol.3, Cambridge Studies in Music Theory and Analysis 10, Cambridge: Cambridge University Press, 1997.
Webster, J. 'Sonata Form', in New Grove Dictionary of Music and Musicians, vol.23, London: Macmillan, 2001.
Horne, W. 'The hidden trellis: where does the second group begin in the first movement of Beethoven's Eroica Symphony?'
Beethoven Forum, vol.13, nº.2 (2006), pp. 95-147.
Grove, G. Beethoven and his nine symphonies. London: Novello (1890)
Additional scholarly sources: Kerman, J.; Lockwood, L.; Kegan, P.; Dent, J. M.; Mathews, D.; Hopkins, A.; Christopher, H.
Program notes from 10 Symphony Orchestras: Philadelphia, Utah, Florida, Atlantic, Oregon, Jacksonville, St. Louis, San Francisco,
New York and Boston ... almost no structural information at all !!!

27
Key estimation
Multi-scale key estimation & representation method (audio - MIDI)
(Sapp’s keyscapes)
Several of the ground-truth boundaries are well defined by key estimation.
Many other short tonicisations are present.
Some of the segments are not bounded by tonal shifts.
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-304, 2006.
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
audio
MIDI
Exposition

28
Cadence finding
Key-independent cadential analysis based on transposition-invariant set-classes.
Some important cadential procedures in symphonies, which contribute
to large-scale structure, go beyond the description of cadences as
plain sequences of chords and require sophisticated hierarchical
interpretation.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.

29
Orchestration
Pitches, pitch classes & instruments.
Instrumentation
All the structural boundaries correspond to important changes in the
instrumentation.

30
Conclusions
Symphonic pieces are long and complex in terms of structure.
A combination of audio and score descriptors seems to
capture structural boundaries.
Difficulty of evaluation,
lack of consensus.
Different layers:
• Non-expert users:
instrumentation.
Ex: solo vs tutti.
• Experts: key.
A. Martorell and E. Gómez, “Systematic multi-scale set-class analysis”, 15th ISMIR Conference, Taipei, 2014.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.

31
Approaches
Research topics

32
Goals: interact with symphonic music by listening to
different instrument sections separately.
Tasks:
Multi-channel source separation of orchestral
instruments/sections.
Note-level alignment for refined separation.
Instrument emphasis.
Source localization & rendering.

33
Multi-channel audio source separation
Number of sources ≤ number of microphone signals.
Each source has a channel in which it is predominant (highest direct-to-reverberant ratio).
Number & type of instruments known in advance.
J.J. Carabias-Orti, M. Cobos, P. Vera-Candeas and F.J. Rodriguez-Serrano, “Nonnegative signal factorization with learnt instrument
models for sound source separation in close-microphone recordings”, in EURASIP Journal on Advances in Signal Processing.

34
Multi-channel source separation
1. Panning matrix estimation using score information to find isolated time-frequency
source locations from the multi-channel input (17 channels, 12 instruments).
2. NMF-based signal factorization on the selected channel per instrument, with trained
instrument models (RWC), to estimate the spectrograms of the separated sources.
3. Wiener mask separation to perform the reconstruction of each instrument source.
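Step 3 can be illustrated in isolation. Given estimated magnitude spectrograms for each source (for instance from the factorization in step 2), a Wiener-style soft mask assigns each time-frequency bin of the mixture to the sources in proportion to their estimated power. The sketch below uses plain nested lists and hypothetical function names; it is not the implementation used in the project.

```python
def wiener_masks(source_mags, eps=1e-12):
    """source_mags[j][f][t]: estimated magnitude of source j at bin f, frame t."""
    n_src = len(source_mags)
    n_bins = len(source_mags[0])
    n_frames = len(source_mags[0][0])
    masks = [[[0.0] * n_frames for _ in range(n_bins)] for _ in range(n_src)]
    for f in range(n_bins):
        for t in range(n_frames):
            powers = [source_mags[j][f][t] ** 2 for j in range(n_src)]
            total = sum(powers) + eps  # eps avoids division by zero in silent bins
            for j in range(n_src):
                masks[j][f][t] = powers[j] / total
    return masks


def apply_mask(mask, mixture):
    """Multiply a (possibly complex) mixture spectrogram by a mask, bin by bin."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture)]


# Toy example: one bin, one frame, two sources with magnitudes 3 and 4.
masks = wiener_masks([[[3.0]], [[4.0]]])
mixture = [[10.0 + 0.0j]]
separated = [apply_mask(mask, mixture) for mask in masks]
print(separated)
```

Because the masks sum to (almost exactly) one in every bin, the separated sources add back up to the mixture, so no energy is invented or lost by this step.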

35
Note-level alignment
The quality of the separation relies strongly on the
quality of the alignment!
Current audio-to-score methods are evaluated using a
tolerance window → beat level (Cont et al., 2007).
Even manual alignment (usually at beat level) has
inaccuracies in the onsets/offsets.
Figure: misaligned vs aligned

36
Note-level alignment
The reconstructed signal can be seen as the product of the
harmonic components (A) and the gains (B).
After NMF, the resulting gains (C) are split into submatrices
and used to detect blobs (boundaries and shapes) (D).
M. Miron, J. J. Carabias-Orti, and J. Janer, “Audio-to-score alignment at the note level for orchestral recordings”, in 15th International
Society for Music Information Retrieval Conference, Taipei, Taiwan, 2014

37
Demo
http://repovizz.upf.edu/phenicx/

38
Conclusions
• Symphonic music is challenging for audio source
separation due to the high number of overlapping
sources.
• Take advantage of multi-channel recordings, redundant
information.
• Need for an informed approach, note-level score
alignment for better quality.
• Quality still far from what a musician would expect.

39
Approaches
Research topics

40
Goal: provide meaningful music visualizations.
Research questions:
• Which information to present users?
• How to visualize this information?
Challenges:
• Not much research on visualizing descriptors.
• Expert vs non-expert users.
• Short-time (local) vs long-time (global).
• Off-line (after the concert) vs on-line (during the concert).
• Cope with errors of current technologies.

41
State-of-the-art approaches
Short-term visualizations: “now” (24-100 fps)
Score
Piano roll
Instrumentation
Large-scale visualizations “piece”
Structure: instrumentation & key

42
Score
• Eroica 13 staves
(vs 25 staves mainstream
symphonic repertoire).
• Physical space.
• Need of musical knowledge.
A. Arzt, Böck, S., Flossmann, S., Frostel, H., Gasser, M., and Widmer, G., “The Complete Classical Music Companion V0.9”, in 53rd
AES Conference on Semantic Audio, London, UK, 2014

43
Score reduction
• Predominant melodic lines.

44
Tonality
• Key (local) – geometrical models
(Krumhansl 1990)

45
Tonality
• Keyscapes (global)
(C. S. Sapp)

46
Instrumentation
• Instrumentation & physical space (local).

47
Instrumentation
• Instrumentation & physical space (global)

48
Structure
• Segmentation

49
Evaluation strategy
• 2 focus groups: Amsterdam (casual, heavy consumers)
vs Barcelona (musicians).
• Show different visualization concepts, discussion and
questionnaire.

50
Results
• Need for specific information at a specific moment:
• Experts: score, structure.
• Naïve: melodic line, instrumentation, structure.
• Need to have control of this information.
• Scenarios: during (musicians, learning tool), after (concert goers).
• Interest (Gareth Loy keynote):
• Surprise factor vs overview of what is coming.
• Attracting attention towards specific elements vs
overstimulation.
• Design recommendations → minimalistic, unobtrusive, appealing,
adaptable.

51
Screen, tech audience, educators, around 1000 people.
Young orchestra
Prometheus Overture, Beethoven

53
4 visualizations: The sound, The piece, The orchestra,
The conductor.
+ Some quotes about the Prometheus legend
Design support by http://www.hand-coded.net/

The Conductor
Work by A. Sarasúa & E. Guaus
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among
Subjects“. In Proceedings of the International Conference on New Interfaces for Musical Expression, NIME’14,
pages 195-200. Goldsmiths London, UK, 2014. https://github.com/asarasua/ofxKinectFeatures

73
Technical setup
• openFrameworks / OpenGL
• Cable connection
• Kinect & audio input

74
Real-time feature extraction:
• Chord probabilities.
• Loudness.
• Assisted score following method:
• Off-line analysis: meter, average tempo, pauses.
• Probabilistic model for tempo prediction and
tracking.
• Video mixer.

75
Mocap feature extraction for visualization
ofxKinectFeatures
• openFrameworks addon for real-time feature extraction
• Mapping to animation:
• Loudness
• Beat detection
• Quantity of motion
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among Subjects“. In Proceedings
of the International Conference on New Interfaces for Musical Expression, NIME’14, pages 195-200. Goldsmiths London, UK,
2014. https://github.com/asarasua/ofxKinectFeatures

76
Very good feedback from attendees &
media:
• Spanish press: ABC, El País.
• PHENICX-A live concert, in Digital
Agenda for Europe.
http://europa.eu/!yx93gV
• Twitter:
• Novelty of gestures.
• Interest in technology among
young audiences.
• Educational applications.

77
Screen, tech audience, children's orchestra
Video

78
Introduction
Are state-of-the-art automatic
description methods ready for
that?

79
• Technology can facilitate the appreciation of classical
music by new audiences.
• Current technologies have limitations when:
• Addressing symphonic music.
• On a real concert setup.
• Opportunity to address tasks in a different way and
improve state-of-the-art methods.
• User-centred paradigms: visualization, interaction,
adaptation.
Conclusions
