Computational models of symphonic music face various challenges due to the genre's formal complexity, long durations, complex instrumentation, and overlapping sources. Researchers are developing approaches to address melody extraction, structural analysis, source separation, and music visualization for symphonic works. For melody extraction, current methods perform best on simple excerpts but struggle with density and complexity, indicating the need for combined audio-score approaches. Structural analysis of symphonies requires consideration of tonality, orchestration, and discrepancies between expert analyses. Source separation aims to isolate instrument sections from multi-channel recordings.
Computational models of symphonic music: challenges and opportunities for melody extraction and structural analysis
1. Computational models of symphonic music:
challenges and opportunities
Emilia Gómez
Universitat Pompeu Fabra, Barcelona, Spain
Juan J. Bosch, Julio Carabias-Orti, Jordi Janer, Agustín Martorell,
Oscar Mayor, Marius Miron, Álvaro Sarasúa
Universitat Pompeu Fabra, Barcelona, Spain
Cynthia Liem, TU Delft, Netherlands
3. 2
Introduction
Which are the main challenges people experience
when confronted with a piece of music
they are unfamiliar with?
Can computational models
make these challenges easier?
6. 5
Transform music concert performances into multi-modal,
multi-layer and multi-perspective digital artefacts.
Introduction
Concert performance
Multi-perspective
Multimodal
Multilayer
7. 6
Provide facilities to explore, (re)enjoy and share concerts.
• Before, During & After:
1. Digital program notes
2. Virtual concert guide
3. Overseeing the music
4. Focusing attention and switching viewpoints
5. Comparing different performances
6. Capturing the moment, sharing the magic
7. Joining the orchestra
• Research goals:
• Automatic music description.
• Visualization & interaction.
Introduction
Liem, C. C. S., R. van der Sterren, M. van Tilburg, Á. Sarasúa, J. J. Bosch, J. Janer, M. Melenhorst, E. Gómez, and A. Hanjalic, "Innovating
the Classical Music Experience in the PHENICX Project: Use Cases and Initial User Feedback", 1st International Workshop on
Interactive Content Consumption (WSICC) at EuroITV 2013, Como, Italy, 06/2013
http://www.concertgebouworkest.nl/en/rco-editions/
8. 7
Introduction
Challenges for current technologies: Beethoven, Symphony No. 3 "Eroica"
• Classical - Romantic.
• Paradigm of formal complexity: vast literature.
• Significant usage of symphonic resources, yet not exploiting full possibilities.
• Covers a variety of problems specific to the symphonic repertoire while avoiding an overly complex task.
                              Literature focus    Symphonic repertoire
Duration                      Short (song, 3')    Long (57', 4 movements)
Musical complexity            Low                 High
Instrumentation               Simple              Complex (13 staves)
Overlapping sources in audio  Small number        High number (instrument sections in unison)
Modalities                    Score or audio      17 audio, 8 video, score (15 Gbytes)
Performance by the
Royal Concertgebouw Orchestra (RCO).
10. 9
Introduction
• Data repository for synchronization, visualization,
interaction, computation.
• Repovizz
http://repovizz.upf.edu
Mayor, O., Llimona Q., Marchini M., Papiotis P., & Maestre E. (2013). “repoVizz: a Framework for Remote Storage, Browsing, Annotation,
and Exchange of Multi-modal Data”. ACM International Conference on Multimedia (MM'13).
12. 11
Approaches
Research topics
1. Melody extraction
2. Structural analysis
3. Source separation
4. Music visualization
Linked to information needs.
Methodology
• Data gathering.
• Analysis of human annotations.
• Evaluation of existing methods.
• Adaptation and improvement.
14. 13
Definition:
Sequence of fundamental frequency values representing the pitch of
the lead voice or instrument (Salamon, 2012).
Sequence of pitches that people hum or sing to represent a music
piece (Poliner et al. 2007).
Hypothesis: more intuitive for non-expert users than the traditional score.
Challenges:
• High number of overlapping sources.
• Melody played by different/multiple instruments or sections, unison,
octave relation, or with harmonized melodic lines.
1. Melody extraction
(Salamon and Gómez, 2012)
E. Gómez, A. Klapuri, B. Meudic, “Melody description and extraction in the context of music content processing”, JNMR 32(1), 2003.
J. Salamon, E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics”, IEEE TASLP 20(6), 2012.
15. 14
1. Melody extraction
State-of-the-art
Audio:
• Multiple f0 estimation: 69% note accuracy for simple material.
• Predominant f0 estimation: vocal pop and jazz (85% frame
accuracy) vs other instruments (68% frame accuracy).
Score:
• Methods to select the predominant melodic line (Uitdenbogerd and Zobel,1999)
Methodology
• Music collection building.
• Analysis of human annotations.
• Evaluation of existing methods: audio centred.
• Adaptation and improvement.
J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and
Challenges", IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
16. 15
1. Melody extraction
Music collection building and annotation
• Symphonies and symphonic poems, ballets, suites.
• Mostly romantic period, also classical and 20th century.
• 64 excerpts of 10 to 32 seconds each (94% voiced frames).
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans
and algorithms”, In Proc. of 9th Conference on Interdisciplinary Musicology, Berlin, December 2014
Sections of dominant instruments playing the main melody
(ST: Strings, BR: Brass, WW: Woodwinds), where Alt- denotes alternation.
17. 16
1. Melody extraction
Music collection building and annotation
• Subjects singing along with music.
• Measure inter-subject & algorithm mutual agreement.
• Ground truth generation.
• Algorithm combination.
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans
and algorithms”, In Proc. of 9th Conference on Interdisciplinary Musicology, Berlin, December 2014
• Measure correlation with melodic features: range, density,
tessitura (Hippel, 2000), complexity (Eerola, 2000),
melodiousness (Leman, 1995), originality (Simonton, 1984),
computed with the MIDI Toolbox (Eerola & Toiviainen 2004).
19. 18
1. Melody extraction
Mutual agreement
• Melodic range, note density & complexity have a negative correlation
with people’s ability and agreement.
• Algorithms differ more in excerpts with high note density and pitch
complexity.
20. 19
1. Melody extraction
Evaluation of 13 state-of-the-art approaches
• 5 pitch salience functions (SF)
• 4 multi-pitch estimation methods (MP)
• 4 melody extraction methods (ME)
• Novel approach: combination of salience functions, refinement &
tracking method.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”,
submitted.
21. 20
1. Melody extraction
Evaluation of state-of-the-art approaches
• ME 67%, MP (10 estimates) 94.2%.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”,
submitted.
Evaluation metrics: Raw Pitch, Weighted Raw Pitch,
Raw Chroma, Overall Accuracy.
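The frame-wise metrics named above can be sketched as follows. This is a minimal illustration under common conventions, not the evaluation code used in the study: the 50-cent tolerance, the use of 0.0 Hz for unvoiced frames, and the function names are assumptions. The weighted raw pitch variant is left out because the slides do not define its weighting.

```python
import math


def cents(f_est, f_ref):
    """Signed distance in cents between two positive frequencies (Hz)."""
    return 1200.0 * math.log2(f_est / f_ref)


def melody_metrics(ref, est, tol_cents=50.0):
    """Frame-wise melody scores; 0.0 Hz marks an unvoiced frame.

    Simplification (assumption): an unvoiced estimate on a voiced
    reference frame always counts as a pitch and chroma miss.
    """
    if len(ref) != len(est):
        raise ValueError("ref and est must have the same number of frames")
    voiced = pitch_hits = chroma_hits = overall_hits = 0
    for r, e in zip(ref, est):
        if r <= 0:
            # Unvoiced reference: correct only if the estimate is unvoiced too.
            overall_hits += e <= 0
            continue
        voiced += 1
        if e <= 0:
            continue
        d = cents(e, r)
        if abs(d) <= tol_cents:
            pitch_hits += 1
            overall_hits += 1
        # Fold the error onto the nearest octave so octave errors are forgiven.
        if abs(d - 1200.0 * round(d / 1200.0)) <= tol_cents:
            chroma_hits += 1
    return {
        "raw_pitch": pitch_hits / voiced if voiced else 0.0,
        "raw_chroma": chroma_hits / voiced if voiced else 0.0,
        "overall_accuracy": overall_hits / len(ref) if ref else 0.0,
    }
```

An estimate that is one octave off counts as a raw chroma hit but a raw pitch miss, which is why the two scores can diverge when the melody is doubled in octaves.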
22. 21
1. Melody extraction
Conclusions:
• Signal processing front-ends do not generally estimate
the melody pitch as the most salient one.
• Current methods (estimation & tracking) are biased to
singing voice. Best results with source-filter models
(Durrieu et al., 2010).
• Difficulty in selecting the melody instrument and pitch.
• Algorithm performance is correlated with melodic
complexity (especially pitch complexity) and density.
• Need for a combined audio + score melody estimation.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”,
submitted.
24. 23
2. Structural analysis
Motivation
• Navigation and orientation along the piece.
Goals
• Characterize specificities for symphonic repertoire: musical
characteristics and signal properties unique to symphonic music.
• Use this knowledge to improve state-of-the-art algorithms.
Methodology:
• Expert analyses.
• Evaluation of existing methods.
• Adaptation and improvement.
25. 24
2. Structural analysis
Structure in symphonic music
Tonality
• A main factor contributing to musical form.
• Ex: “sonata form”, 1st movement of many symphonies: the tonal plan is clear only in the
simplest (earliest) symphonies; otherwise the main keys serve as ‘structural references’.
• Practice: tonality in constant evolution, short-term keys (‘tonicisations’).
Orchestration
• Combination of instrumentation and pitch content.
• Richness of sonority: combination of instrument families (woodwinds, brass,
strings and percussion), pitch content (register).
• Well-known effects: contrast between sections, dynamic transformations
(‘orchestral crescendo’).
• Impact on listeners: solo vs tutti.
26. 25
2. Structural analysis
State of the art
Tonal description
• Key estimation (template-based models): pitch-class profiles vs templates (learnt
from data, perceptual experiments, music theory) ex: (Krumhansl and Kessler, 1982)
• Key tracking: probabilistic inference (Hidden-Markov Models, Neural Networks).
• Annotations for short excerpts (Chuan and Chew, 2012).
Structural analysis
• Wealth of research from both audio and score
(Mueller and Smith, ISMIR tutorial, 2014).
• Structure as related to:
• Self-similarity analysis: timbre, tonality.
• Homogeneity, Novelty: abrupt changes.
(Foote 2001)
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-
304, 2006.
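Template-based key estimation, as listed above, can be sketched by correlating a 12-bin pitch-class profile with the 24 rotations of the Krumhansl and Kessler (1982) major and minor profiles. This is a generic illustration, not the method of the cited paper: the function names and the flat-based note spelling are assumptions, and the input profile would in practice come from audio or score analysis.

```python
import math

# Krumhansl-Kessler probe-tone profiles, indexed from the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def estimate_key(pcp):
    """Return (tonic name, mode, correlation) of the best-matching key template."""
    if len(pcp) != 12:
        raise ValueError("pcp must have 12 bins (C, Db, ..., B)")
    best = None
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            # Rotate the profile so that its first entry lands on the tonic.
            template = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(pcp, template)
            if best is None or score > best[0]:
                best = (score, NOTE_NAMES[tonic], mode)
    return best[1], best[2], best[0]
```

Running the estimator over sliding windows of several lengths gives a multi-scale view of the same information, at the cost of reporting many short tonicisations alongside the long-term keys.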
27. 26
Analysis of expert information
Survey of 8 music analysts on “Eroica” + own music analysis
(1st movement, most complex one).
Large discrepancies / high level of conceptualization.
Voting scheme: number of supporting scholars, th=3 (minimum consensus)
16 segment boundaries in Eroica exposition.
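The voting scheme can be sketched as a simple count of how many analysts support each boundary, keeping those that reach the threshold. The data layout (boundaries as bar numbers per analyst) and the function name are assumptions for illustration; the bar numbers in the usage note are made up and are not the Eroica annotations.

```python
from collections import Counter


def consensus_boundaries(analyses, threshold=3):
    """Keep boundaries supported by at least `threshold` analysts.

    analyses: dict mapping an analyst label to a list of boundary
    positions (hypothetical layout, e.g. bar numbers).
    """
    votes = Counter()
    for boundaries in analyses.values():
        # A set ensures each analyst votes at most once per position.
        votes.update(set(boundaries))
    return sorted(pos for pos, n in votes.items() if n >= threshold)
```

For example, with four hypothetical analysts `{"A": [1, 45, 83], "B": [1, 45, 57], "C": [1, 57, 83], "D": [45, 83, 83]}`, positions 1, 45 and 83 each collect three votes and are kept, while 57 collects two and is dropped.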
2. Structural analysis
Plantinga, L. Romantic Music: a History of Musical Style in Nineteenth-Century Europe. New York: Norton, 1984.
Sipe, T. Beethoven: Eroica Symphony. Cambridge University Press, 1998.
Taruskin, R. The Oxford History of Western Music, vol.2., 2005.
Dahlhaus, C. Ludwig van Beethoven: Approaches to His Music. Oxford: Clarendon, 1991.
Schenker, H. 'Beethoven's Third Symphony: Its True Content Described for the First Time', in Heinrich Schenker, The Masterwork
in Music: A Yearbook, vol.3, Cambridge Studies in Music Theory and Analysis 10, Cambridge: Cambridge University Press, 1997.
Webster, J. 'Sonata Form', in New Grove Dictionary of Music and Musicians, vol.23, London: Macmillan, 2001.
Horne, W. 'The hidden trellis: where does the second group begin in the first movement of Beethoven's Eroica Symphony?'
Beethoven Forum, vol.13, nº.2 (2006), pp. 95-147.
Grove, G. Beethoven and his nine symphonies. London: Novello (1890)
Additional scholarly sources: Kerman, J.; Lockwood, L.; Kegan, P.; Dent, J. M.; Mathews, D.; Hopkins, A.; Christopher, H.
Program notes from 10 symphony orchestras (Philadelphia, Utah, Florida, Atlantic, Oregon, Jacksonville, St. Louis, San Francisco,
New York and Boston): almost no structural information at all!
28. 27
Key estimation
Multi-scale key estimation & representation method (audio - MIDI)
(Sapp’s keyscapes)
Several of the ground-truth boundaries are well defined by key estimation.
Many other short tonicisations are present.
Some of the segments are not bounded by tonal shifts.
2. Structural analysis
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-304, 2006.
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
audio
MIDI
Exposition
29. 28
Cadence finding
Key-independent cadential analysis based on transposition-
invariant set-classes.
Some important cadential procedures in symphonies,
contribution to large-scale structures, are beyond the
description of cadences as plain sequences of chords, and
require sophisticated hierarchical interpretation.
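What "transposition-invariant" means here can be illustrated with a small sketch that maps any chord to a canonical representative of its transposition class, so that the same chord type gets the same label in every key. This is a generic illustration and not the authors' framework; the function name is hypothetical, and the tie-breaking rule (compare the largest interval first, then the next) is one common convention among several.

```python
def tn_class(pitches):
    """Canonical transposition-invariant form of a chord.

    pitches: iterable of MIDI note numbers. Octave and transposition
    are discarded; inversion is NOT (major and minor triads differ).
    """
    pcs = sorted({p % 12 for p in pitches})
    if not pcs:
        return ()
    candidates = []
    for root in pcs:
        # Re-express the set as intervals above each possible starting note.
        candidates.append(tuple(sorted((pc - root) % 12 for pc in pcs)))
    # Prefer the most compact ordering: smallest outer interval first,
    # then the next-to-last interval, and so on.
    return min(candidates, key=lambda t: tuple(reversed(t)))
```

With this labelling, a dominant seventh followed by a major triad yields the sequence `(0, 3, 6, 8)`, `(0, 4, 7)` in any key, which is what allows cadential patterns to be searched for without first estimating the key.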
2. Structural analysis
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.
30. 29
Orchestration
Pitches, pitch classes & instruments.
Instrumentation
All the structural boundaries correspond to important changes in the
instrumentation.
2. Structural analysis
31. 30
Conclusions
Symphonic pieces are long and structurally complex.
A combination of audio and score descriptors seems to
capture structural boundaries.
Difficulty of evaluation,
lack of consensus.
Different layers:
• Non-expert users:
instrumentation.
Ex: solo vs tutti.
• Experts: key.
2. Structural analysis
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
A. Martorell and E. Gómez, “Systematic multi-scale set-class analysis”, 15th ISMIR Conference, Taipei, 2014.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.
33. 32
3. Source separation
Goals: interact with symphonic music by listening to
different instrument sections separately.
Tasks:
Multi-channel source separation of orchestral
instruments/sections.
Note-level alignment for refined separation.
Instrument emphasis.
Source localization & rendering.
34. 33
Multi-channel audio source separation
Nb of sources ≤ Nb of microphone signals.
Each source has a channel in which it is predominant (highest direct-to-reverberant ratio).
Nb & type of instruments known in advance
3. Source separation
J.J. Carabias-Orti, M. Cobos, P. Vera-Candeas and F.J. Rodriguez-Serrano, “Nonnegative signal factorization with learnt instrument
models for sound source separation in close-microphone recordings”, in EURASIP Journal on Advances in Signal Processing.
35. 34
Multi-channel source separation
1. Panning Matrix estimation using score information to find isolated time-
frequency source locations from multi-channel input (17 channels, 12 instruments).
2. NMF-based signal factorization on the selected channel per instrument with
trained instrument models (RWC) to estimate the separated sources spectrogram.
3. Wiener mask separation to perform the reconstruction of each instrument source.
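Step 3 can be sketched as a soft mask: for each time-frequency bin, every source receives the share of the mixture given by its estimated power relative to the sum over all sources. This is the textbook form of a Wiener-style mask, assumed here for illustration; the slides do not give the exact formulation, and the function names and nested-list layout `[frequency][time]` are hypothetical.

```python
def wiener_masks(source_mags, eps=1e-12):
    """Soft masks from estimated source magnitude spectrograms.

    source_mags: list of spectrograms, each a [freq][time] nested list.
    Returns one mask per source; masks sum to (almost) 1 in every bin.
    """
    n_f, n_t = len(source_mags[0]), len(source_mags[0][0])
    masks = [[[0.0] * n_t for _ in range(n_f)] for _ in source_mags]
    for f in range(n_f):
        for t in range(n_t):
            powers = [s[f][t] ** 2 for s in source_mags]
            total = sum(powers) + eps  # eps avoids division by zero in silent bins
            for i, p in enumerate(powers):
                masks[i][f][t] = p / total
    return masks


def apply_mask(mask, mixture):
    """Multiply a (possibly complex) mixture spectrogram by a mask, bin by bin."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture)]
```

Because the mask scales the complex mixture spectrogram, each separated source keeps the mixture phase, and the separated sources add back up to the original mixture.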
3. Source separation
36. 35
3. Source separation
Note-level alignment
The quality of the separation relies strongly on the
quality of the alignment!
Current audio-to-score methods are evaluated using a
beat-level tolerance window (Cont et al., 2007).
Even manual alignment (usually at beat level) has
inaccuracies in the onsets/offsets.
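The effect of the tolerance window can be illustrated with a small sketch that scores an alignment as the fraction of note onsets falling within a given tolerance of the reference. The function name, the pairing of onsets by index, and the tolerance values in the usage note are assumptions for illustration, not the evaluation protocol of the cited work.

```python
def alignment_rate(ref_onsets, est_onsets, tolerance):
    """Fraction of estimated onsets within `tolerance` seconds of the reference.

    Onsets are paired by index, i.e. both lists describe the same notes
    in the same order (an assumption of this sketch).
    """
    if len(ref_onsets) != len(est_onsets):
        raise ValueError("both lists must describe the same notes")
    if not ref_onsets:
        return 0.0
    hits = sum(1 for r, e in zip(ref_onsets, est_onsets) if abs(e - r) <= tolerance)
    return hits / len(ref_onsets)
```

With reference onsets `[0.0, 0.5, 1.0, 1.5]` and estimates `[0.02, 0.62, 1.01, 1.9]`, a coarse 0.25 s window accepts three of the four notes, while a 0.05 s window accepts only two: an alignment can look good at beat level and still be too loose for note-level separation.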
misaligned vs aligned
37. 36
3. Source separation
The reconstructed signal can
be seen as the product
between the several harmonic
components (A) and the gains
(B).
After NMF, the resulting gains
(C) are split in submatrices
and used to detect blobs
(boundaries and shapes) (D).
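The factorization into harmonic components and gains can be sketched with the standard multiplicative update for the gains, keeping the components fixed as if they came from pre-trained instrument models. This is a generic NMF illustration with a Euclidean cost, not the authors' algorithm: the function names, the cost function, the all-ones initialization and the iteration count are assumptions, and the blob detection step is not covered.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def reconstruction_error(v, w, h):
    """Squared Euclidean distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def estimate_gains(v, w, n_iter=200, eps=1e-12):
    """Estimate non-negative gains H such that V is approximately W H.

    v: [freq][time] magnitude spectrogram.
    w: [freq][component] fixed harmonic components (assumed pre-trained).
    """
    n_comp, n_t = len(w[0]), len(v[0])
    h = [[1.0] * n_t for _ in range(n_comp)]
    wt = transpose(w)
    wtv = matmul(wt, v)   # constant numerator W^T V
    wtw = matmul(wt, w)   # constant factor of the denominator W^T W H
    for _ in range(n_iter):
        denom = matmul(wtw, h)
        h = [[h[i][j] * wtv[i][j] / (denom[i][j] + eps) for j in range(n_t)]
             for i in range(n_comp)]
    return h
```

Each row of the resulting gain matrix traces when one component is active, which is the kind of time activation the alignment step can then inspect for note boundaries.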
Note-level alignment
M. Miron, J. J. Carabias-Orti, and J. Janer, “Audio-to-score alignment at the note level for orchestral recordings”, in 15th International
Society for Music Information Retrieval Conference, Taipei, Taiwan, 2014
39. 38
Conclusions
• Symphonic music is challenging for audio source
separation due to the high number of overlapping
sources.
• Take advantage of multi-channel recordings, redundant
information.
• Need for an informed approach, note-level score
alignment for better quality.
• Quality still far from what a musician would expect.
3. Source separation
41. 40
4. Music visualization
Goal: provide meaningful music visualizations.
Research questions:
• Which information to present users?
• How to visualize this information?
Challenges:
• Not much research on visualizing descriptors.
• Expert vs non-expert users.
• Short-time (local) vs long-time (global).
• Off-line (after the concert) vs on-line (during the concert).
• Cope with errors of current technologies.
50. 49
4. Music visualization
Evaluation strategy
• 2 focus groups: Amsterdam (casual, heavy consumers)
vs Barcelona (musicians).
• Show different visualization concepts, discussion and
questionnaire.
51. 50
4. Music visualization
Results
• Need for specific information on a specific moment:
• Experts: score, structure.
• Naïve: melodic line, instrumentation, structure.
• Need to have control of this information.
• Scenarios: during (musicians, learning tool), after (concert goers).
• Interest (Gareth Loy keynote):
• Surprise factor vs overview of what is coming.
• Attracting attention towards specific elements vs
overstimulation.
• Design recommendations: minimalistic, unobtrusive, appealing,
adaptable.
52. 51
Screen, tech audience, educators, around 1000 people.
Young orchestra
Prometheus Overture, Beethoven
4. Music visualization
54. 53
4 visualizations: The sound, The piece, The orchestra,
The conductor.
+ Some quotes about the Prometheus legend
Design support by http://www.hand-coded.net/
4. Music visualization
68. The Conductor
Work by A. Sarasúa & E. Guaus
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among
Subjects“. In Proceedings of the International Conference on New Interfaces for Musical Expression, NIME’14,
pages 195-200. Goldsmiths London, UK, 2014. https://github.com/asarasua/ofxKinectFeatures
75. 74
Real-time feature extraction:
• Chord probabilities.
• Loudness.
• Assisted score following method:
• Off-line analysis: meter, average tempo, pauses.
• Probabilistic model for tempo prediction and
tracking.
• Video mixer.
4. Music visualization
76. 75
Mocap feature extraction for visualization
ofxKinectFeatures
o openFrameworks addon for real-time feature extraction
o Mapping to animation
Loudness
Beat detection
Quantity of motion
4. Music visualization
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among Subjects“. In Proceedings
of the International Conference on New Interfaces for Musical Expression, NIME’14, pages 195-200. Goldsmiths London, UK,
2014. https://github.com/asarasua/ofxKinectFeatures
77. 76
Very good feedback from attendees &
media:
• Spanish press: ABC, El País.
• PHENICX-A live concert, in Digital
Agenda for Europe.
http://europa.eu/!yx93gV
• Twitter:
• Novelty of gestures.
• Interest for technology by
young audiences.
• Educational applications.
4. Music visualization
80. 79
• Technology can facilitate the appreciation of classical
music by new audiences.
• Current technologies have limitations when:
• Addressing symphonic music.
• On a real concert setup.
• Opportunity to address tasks in a different way and
improve state of the art methods.
• User-centred paradigms: visualization, interaction,
adaptation.
Conclusions
81. Computational models of symphonic music:
challenges and opportunities
Emilia Gómez
Universitat Pompeu Fabra, Barcelona, Spain
Editor's Notes
A classical orchestra concert embraces a wealth of musical information, which general audiences may not easily perceive or understand.
Diminish physical, social barriers. Attract new audiences.
An orchestra classical concert embraces a wealth of musical information, which may not be easily perceived or understood for general audiences. Mariss Janson.
Consortium partners
Beethoven’s 3rd symphony ’Eroica’, is generally agreed as a pivotal composition between the Classical and Romantic periods.
This work also constitutes a paradigm of formal complexity, as evidenced by the vast literature analysing the symphony.
It also involves a significant usage of the symphonic resources, yet not exploiting the full possibilities of later works.
This constitutes a proper compromise for analysing a variety of problems specific for the symphonic repertoire, yet avoiding a too complex task. The length of this symphony is also a convenient feature for our purposes, as this duration is comparable to that of the mainstream symphonic repertoire. We considered a performance by the Royal Concertgebouw Orchestra (RCO).
Length: 1 hour! 4 movements
Free score edition CCARH
Synchronization
Different octaves
Algorithms
Tonality is a main factor contributing to musical form. For instance, “sonata form”, which scaffolds the first movement of many symphonies, is usually built from its tonal plan. This tonal plan is generally clear only in the simplest (early) symphonies, which are rarely performed today. These general keys often serve more as ’structural references’ (to be implied by the listener) than as tonalities in which large segments of music are actually written.
The actual tonality is often in constant evolution, passing through many short-term keys (referred to as ’tonicisations’). One exponent of this practice is usually found at the development sections of many allegro-sonata forms, in which the composition is intended to challenge or disorient the listener with frequent and unexpected tonal shifts. By estimating these long-term tonal references, we would provide a means for explaining some aspects of the musical form to different users, which could be conveyed through visualizations.
ORCHESTRATION
Much symphonic writing exploits instrumentation/orchestration to create contrast between different sections, as well as to develop dynamic transformations between them (e.g. ’orchestral crescendo’). The relation between instrumentation/orchestration and musical structure motivates the research on orchestral description. Moreover, the impact of orchestration on listeners has a strong and direct perceptual basis, as much of its effect does not require musical training to be understood.
For instance, any normal-hearing person would distinguish a solo from a tutti section. Many of the sound effects created by composers are rooted in orchestration, and its effect on general audiences has been extensively exploited (e.g. in movie soundtracks). Additionally, orchestration has barely been studied by the MIR community, so it constitutes a novel research path.
Aside from some standard structures (such as the ’rondo’, or the literal repetition of the exposition in allegro sonata forms), close timbral or tonal relations between significant sections, of the kind common in popular music, are generally unlikely.
Many symphonies make extensive use of restatements of tonal material (e.g. themes or motives), but they often appear transformed in many varied ways. This is particularly complex in symphonic works from the Romantic period onwards. This limits the practical use of the most common recurrence-finding methods for the kind of repertoire likely to be found in orchestral concerts.
We also analysed the problem of assigning labels to the segments and/or boundaries, but this information will be considered in future work. The segmentation problem alone proved complex enough to deserve a careful consideration.
A qualitative similarity analysis reveals that the key estimation method from audio can be
roughly compared to its counterpart from the score. However, the problem of structural segmentation
based on key estimation is clearly manifested for music of this tonal complexity. Several of
the ground truth structural boundaries are well defined by the tonal estimation, from both score
and audio. However, it is also clear that many other short tonicisations are present as well,
and some of the segments are not bounded by tonal shifts.
This is the case for the controversial definition of the second theme group in the first movement of the ’Eroica’, in which several (technically speaking) perfect cadences towards the second theme’s tonality occur before the (mostly agreed) establishment of the second theme proper. Similar long and complex cadential procedures feature in much Romantic repertoire and are not symphony-specific, although the extended possibilities of the symphonic resources favor this practice.
Orchestration: This representation informs about chord complexity (number of pitch-classes), octavations (ratio between the number of pitches and pitch-classes), and unisons (ratio between number of voices and pitches).
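The three descriptors defined above can be computed directly from the notes sounding at a given moment. The sketch below follows the definitions in the note; the function name and the input layout (one MIDI pitch per sounding voice) are assumptions.

```python
def orchestration_descriptors(voices):
    """Descriptors for one time slice of the score.

    voices: list with the MIDI pitch sounding in each active voice
    (hypothetical layout, one entry per instrument part).
    """
    if not voices:
        raise ValueError("at least one sounding voice is required")
    pitches = set(voices)
    pitch_classes = {p % 12 for p in pitches}
    return {
        # Number of distinct pitch classes in the chord.
        "chord_complexity": len(pitch_classes),
        # Pitches per pitch class: above 1 when notes are doubled at the octave.
        "octavation": len(pitches) / len(pitch_classes),
        # Voices per pitch: above 1 when several voices share the same pitch.
        "unison": len(voices) / len(pitches),
    }
```

For six voices playing `[48, 60, 60, 64, 67, 72]` (a C major chord with the C doubled in three octaves and one unison), the chord complexity is 3, the octavation ratio is 5/3 and the unison ratio is 6/5.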
Cadential processes are among the most common structure-defining resources. We propose a
key-independent cadential analysis, based on the description in terms of transposition-invariant
set-classes. The general computational framework is described in [29], and information retrieval
applications of this kind are described in [30]. We performed a systematic cadential analysis
from the first movement of the ’Eroica’, inspecting a variety of common cadential sets. Our
preliminary analysis confirms that some important cadential procedures in symphonies, contributing
to large-scale structures, are beyond the description of cadences as plain sequences
of chords, and require sophisticated hierarchical interpretation. This is the case for the controversial
definition of the second theme group in the first movement of the ’Eroica’, in which several
(technically speaking) perfect cadences towards the second theme’s tonality occur before the
(mostly agreed) establishment of the second theme proper. Similar long and complex cadential
procedures feature in much Romantic repertoire and are not symphony-specific, although
the extended possibilities of the symphonic resources favor this practice.
Mixing matrix, detect the channel where the instrument is more predominant, NMF just on this channel.
Converge to local solutions.
Restrictions: train with instrument models (RWC)
Use the score to see where the instrument plays
Eliminate the partials that are overlapping with other instruments
Use of ofxKinectFeatures (openFrameworks addon for real-time feature extraction) for enhanced live visualization of conducting gestures.
Beats, Quantity of Motion mapped to different aspects of animation.