Computational models of symphonic music face challenges stemming from the genre's formal complexity, long durations, rich instrumentation and many overlapping sources. This work addresses melody extraction, structural analysis, source separation and music visualization for symphonic works. Current melody-extraction methods perform best on simple excerpts but struggle with note density and pitch complexity, pointing to the need for combined audio-score approaches. Structural analysis of symphonies must consider tonality, orchestration and the discrepancies between expert analyses. Source separation aims to isolate instrument sections from multi-channel recordings.
1. Computational models of symphonic music: challenges and opportunities
Emilia Gómez
Universitat Pompeu Fabra, Barcelona, Spain
2. Computational models of symphonic music: challenges and opportunities
Juan J. Bosch, Julio Carabias-Orti, Jordi Janer, Agustín Martorell, Oscar Mayor, Marius Miron, Álvaro Sarasúa
Universitat Pompeu Fabra, Barcelona, Spain
Cynthia Liem, TU Delft, Netherlands
3. Introduction
What are the main challenges people experience when confronted with a piece of music they are unfamiliar with? Can computational models ease these challenges?
6. Introduction
Transform music concert performances into multi-modal, multi-layer and multi-perspective digital artefacts.
[Diagram: a concert performance captured as a multi-perspective, multimodal, multilayer digital artefact.]
7. Introduction
Provide facilities to explore, (re)enjoy and share concerts.
• Before, during & after the concert:
1. Digital program notes
2. Virtual concert guide
3. Overseeing the music
4. Focusing attention and switching viewpoints
5. Comparing different performances
6. Capturing the moment, sharing the magic
7. Joining the orchestra
• Research goals:
• Automatic music description.
• Visualization & interaction.
Liem, C. C. S., R. van der Sterren, M. van Tilburg, Á. Sarasúa, J. J. Bosch, J. Janer, M. Melenhorst, E. Gómez, and A. Hanjalic, “Innovating the Classical Music Experience in the PHENICX Project: Use Cases and Initial User Feedback”, 1st International Workshop on Interactive Content Consumption (WSICC) at EuroITV 2013, Como, Italy, 06/2013.
http://www.concertgebouworkest.nl/en/rco-editions/
8. Introduction
Challenges for current technologies: Beethoven's Symphony No. 3, “Eroica”
• Classical-Romantic.
• A paradigm of formal complexity: vast literature.
• Significant usage of symphonic resources, yet not exploiting their full possibilities.
• A variety of problems specific to the symphonic repertoire, which the literature tends to avoid as too complex a task.
Typical literature focus vs the symphonic repertoire (the “Eroica” as example):
• Duration: short (a song, ~3') vs long (57', 4 movements).
• Musical complexity: low vs high.
• Instrumentation: simple vs complex (13 staves).
• Overlapping sources in audio: small number vs high number (instrument sections in unison).
• Modalities: score or audio vs 17 audio streams, 8 video streams and the score (15 GB).
Performance by the Royal Concertgebouw Orchestra (RCO).
10. Introduction
• Data repository for synchronization, visualization, interaction and computation: repoVizz, http://repovizz.upf.edu
Mayor, O., Llimona, Q., Marchini, M., Papiotis, P., & Maestre, E. (2013). “repoVizz: a Framework for Remote Storage, Browsing, Annotation, and Exchange of Multi-modal Data”. ACM International Conference on Multimedia (MM'13).
12. Approaches
Research topics, linked to users' information needs:
1. Melody extraction
2. Structural analysis
3. Source separation
4. Music visualization
Methodology:
• Data gathering.
• Analysis of human annotations.
• Evaluation of existing methods.
• Adaptation and improvement.
14. 1. Melody extraction
Definition:
• A sequence of fundamental frequency values representing the pitch of the lead voice or instrument (Salamon, 2012).
• The sequence of pitches that people hum or sing to represent a music piece (Poliner et al., 2007).
Hypothesis: the melody is more intuitive for non-expert users than the traditional score.
Challenges:
• High number of overlapping sources.
• Melody played by different/multiple instruments or sections, in unison or octave relation, or with harmonized melodic lines.
(Salamon and Gómez, 2012)
E. Gómez, A. Klapuri, B. Meudic, “Melody description and extraction in the context of music content processing”, JNMR 32(1), 2003.
J. Salamon, E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics”, IEEE TASLP 20(6), 2012.
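To make the task concrete, here is a minimal, hedged sketch of the salience-based paradigm underlying Salamon and Gómez (2012): harmonic summation over a spectrogram followed by per-frame peak picking. It is not the published algorithm (which builds and characterizes pitch contours); the input file, frequency range, harmonic weights and voicing threshold are all illustrative assumptions.

```python
# Toy salience-based predominant-f0 sketch (not the Melodia algorithm).
import numpy as np
import librosa

y, sr = librosa.load("excerpt.wav", sr=44100)         # hypothetical input file
S = np.abs(librosa.stft(y, n_fft=8192, hop_length=1024))
freqs = librosa.fft_frequencies(sr=sr, n_fft=8192)

# Candidate f0 grid: 110 Hz to 1.76 kHz, 10-cent resolution (4 octaves).
f0_grid = 110.0 * 2.0 ** (np.arange(481) * 10.0 / 1200.0)

# Harmonic summation: salience(f0) = sum_h w_h * |X(h * f0)|.
weights = 0.8 ** np.arange(5)                         # decaying harmonic weights
salience = np.zeros((len(f0_grid), S.shape[1]))
for h, w in enumerate(weights, start=1):
    idx = np.clip(np.searchsorted(freqs, h * f0_grid), 0, len(freqs) - 1)
    salience += w * S[idx, :]

# Per-frame salience peak as the melody f0; crude energy gate for voicing
# (0 Hz marks unvoiced frames, following the MIREX convention).
f0 = f0_grid[salience.argmax(axis=0)]
peak = salience.max(axis=0)
f0[peak < 0.1 * np.median(peak)] = 0.0
times = librosa.frames_to_time(np.arange(S.shape[1]), sr=sr, hop_length=1024)
```

On symphonic material, exactly the failure modes listed above (many overlapping sources, melody doubled in unison or octaves) make this per-frame maximum unreliable, which motivates the evaluation and combination strategies on the following slides.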
15. 1. Melody extraction
State of the art
Audio:
• Multiple-f0 estimation: 69% note accuracy on simple material.
• Predominant-f0 estimation: vocal pop and jazz (85% frame accuracy) vs other instruments (68% frame accuracy).
Score:
• Methods to select the predominant melodic line (Uitdenbogerd and Zobel, 1999).
Methodology:
• Music collection building.
• Analysis of human annotations.
• Evaluation of existing methods: audio-centred.
• Adaptation and improvement.
J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
16. 1. Melody extraction
Music collection building and annotation
• Symphonies and symphonic poems, ballets, suites.
• Mostly Romantic period, also Classical and 20th century.
• 64 excerpts of 10 to 32 seconds (94% voiced frames).
Excerpts are labelled by the sections of dominant instruments playing the main melody (ST: strings, BR: brass, WW: woodwinds; Alt- denotes alternation).
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans and algorithms”, in Proc. of the 9th Conference on Interdisciplinary Musicology, Berlin, December 2014.
17. 1. Melody extraction
Music collection building and annotation
• Subjects sing along with the music.
• Measure inter-subject and algorithm mutual agreement.
• Ground-truth generation.
• Algorithm combination.
• Measure correlation with melodic features: range, density, tessitura (Hippel, 2000), complexity (Eerola, 2000), melodiousness (Leman, 1995), originality (Simonton, 1984), as implemented in the MIDI Toolbox (Eerola & Toiviainen, 2004); see the sketch after this slide.
J. J. Bosch and E. Gómez, “Melody extraction in symphonic classical music: a comparative study on mutual agreement between humans and algorithms”, in Proc. of the 9th Conference on Interdisciplinary Musicology, Berlin, December 2014.
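Descriptors of this kind are easy to approximate outside the Matlab MIDI Toolbox. The following sketch computes a few rough analogues (range, note density, a tessitura proxy, and interval entropy as a stand-in for complexity) with pretty_midi; the file name and the exact feature definitions are assumptions, not those used in the study.

```python
# Rough melodic descriptors from a monophonic MIDI melody (illustrative only).
import numpy as np
import pretty_midi

pm = pretty_midi.PrettyMIDI("melody_annotation.mid")   # hypothetical file
notes = sorted(pm.instruments[0].notes, key=lambda n: n.start)
pitches = np.array([n.pitch for n in notes])

# Interval-histogram entropy as a crude stand-in for pitch complexity.
_, counts = np.unique(np.diff(pitches), return_counts=True)
p = counts / counts.sum()

features = {
    "range_semitones": int(pitches.max() - pitches.min()),
    "note_density": len(notes) / (notes[-1].end - notes[0].start),  # notes/s
    "mean_pitch": float(pitches.mean()),         # crude tessitura proxy
    "interval_entropy": float(-(p * np.log2(p)).sum()),
}
print(features)
```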
19. 1. Melody extraction
Mutual agreement
• Melodic range, note density and complexity correlate negatively with people's ability to sing along and with their mutual agreement.
• Algorithms differ more on excerpts with high note density and pitch complexity.
20. 1. Melody extraction
Evaluation of 13 state-of-the-art approaches:
• 5 pitch salience functions (SF)
• 4 multi-pitch estimation methods (MP)
• 4 melody extraction methods (ME)
• Novel approach: a combination of salience functions with a refinement and tracking method.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”, submitted.
21. 1. Melody extraction
Evaluation of state-of-the-art approaches
• ME: 67%; MP (10 estimates): 94.2%.
[Figure: results reported as Raw Pitch, Weighted Raw Pitch, Raw Chroma and Overall Accuracy.]
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”, submitted.
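The metrics named in the figure are the standard MIREX melody-extraction measures, and the mir_eval library implements them under the same names. A minimal usage sketch with invented arrays (a real evaluation would load the dataset's reference annotations):

```python
# Computing the standard melody-extraction metrics with mir_eval.
# 0 Hz denotes an unvoiced frame in both reference and estimate.
import numpy as np
import mir_eval

ref_time = np.arange(0.0, 10.0, 0.01)                  # 10 ms grid (invented)
ref_freq = np.full_like(ref_time, 440.0)               # dummy reference melody
est_time = ref_time.copy()
est_freq = ref_freq * 2.0 ** (np.random.randn(len(est_time)) * 0.01)

scores = mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq)
print(scores["Raw Pitch Accuracy"],    # voiced frames within 50 cents
      scores["Raw Chroma Accuracy"],   # as above, forgiving octave errors
      scores["Overall Accuracy"])      # pitch + voicing decisions combined
```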
22. 1. Melody extraction
Conclusions:
• Signal-processing front-ends do not generally estimate the melody pitch as the most salient one.
• Current methods (estimation and tracking) are biased towards the singing voice; the best results are obtained with source-filter models (Durrieu et al., 2010).
• Difficulty in selecting the melody instrument and pitch.
• Algorithm performance is correlated with melodic complexity (especially pitch complexity) and note density.
• Need for combined audio + score melody estimation.
J. J. Bosch and E. Gómez, “Evaluation and Combination of Pitch Estimation Methods for Melody Extraction in Symphonic Classical Music”, submitted.
24. 2. Structural analysis
Motivation
• Navigation and orientation along the piece.
Goals
• Characterize specificities of the symphonic repertoire: musical characteristics and signal properties unique to symphonic music.
• Use this knowledge to improve state-of-the-art algorithms.
Methodology:
• Expert analyses.
• Evaluation of existing methods.
• Adaptation and improvement.
25. 2. Structural analysis
Structure in symphonic music
Tonality
• A main factor contributing to musical form.
• Example: “sonata form” in the first movement of many symphonies; clear in the simplest (earliest) symphonies, otherwise serving as a “structural reference”.
• In practice, tonality is in constant evolution, with short-term keys (“tonicisations”).
Orchestration
• Combination of instrumentation and pitch content.
• Richness of sonority: combination of instrument families (woodwinds, brass, strings and percussion) and pitch content (register).
• Well-known effects: contrast between sections, dynamic transformations (“orchestral crescendo”).
• Impact on listeners: solo vs tutti.
2. Structural analysis
State of the art
Tonal description
• Key estimation (template-based models): correlate pitch-class profiles with key
templates learnt from data, perceptual experiments or music theory, e.g.
(Krumhansl and Kessler, 1982); see the sketch after this list.
• Key tracking: probabilistic inference (Hidden-Markov Models, Neural Networks).
• Annotations for short excerpts (Chuan and Chew, 2012).
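A minimal sketch of the template-matching idea, given a 12-bin chroma vector averaged over the segment of interest. The Krumhansl-Kessler profiles below are the published values; the rest is illustrative:

```python
import numpy as np

# Krumhansl-Kessler major/minor key profiles (C-based).
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma):
    """chroma: length-12 pitch-class profile (C..B).
    Returns (key name, correlation) of the best-matching template."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    r, tonic, mode = best
    return "C C# D D# E F F# G G# A A# B".split()[tonic] + " " + mode, r
```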
Structural analysis
• Wealth of research from both audio and score
(Müller and Smith, ISMIR tutorial, 2014).
• Structure as related to:
• Self-similarity analysis: timbre, tonality.
• Homogeneity, Novelty: abrupt changes.
(Foote, 2001); see the sketch after the reference below.
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-
304, 2006.
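A minimal sketch of the self-similarity / novelty idea (Foote, 2001): build a self-similarity matrix from frame-wise features and slide a checkerboard kernel along its diagonal; peaks in the resulting curve suggest section boundaries. This is the plain method, without the Gaussian taper often added to the kernel:

```python
import numpy as np

def foote_novelty(features, kernel_size=32):
    """features: (n_frames, n_dims) array, e.g. chroma or MFCCs.
    Returns a novelty curve whose peaks suggest boundaries."""
    # Cosine self-similarity matrix.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    ssm = f @ f.T
    # Checkerboard kernel: +1 within past/future blocks, -1 across them.
    half = kernel_size // 2
    sign = np.sign(np.arange(-half, half) + 0.5)
    kernel = np.outer(sign, sign)
    novelty = np.zeros(len(ssm))
    padded = np.pad(ssm, half, mode="constant")
    for i in range(len(ssm)):
        patch = padded[i:i + 2 * half, i:i + 2 * half]
        novelty[i] = np.sum(patch * kernel)
    return novelty
```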
Analysis of expert information
Survey of 8 music analysts on the “Eroica” (1st movement, the most
complex one), plus our own music analysis.
Large discrepancies / high level of conceptualization.
Voting scheme: count the scholars supporting each boundary, keep those with
th=3 or more (minimum consensus); see the sketch after the sources below.
Result: 16 segment boundaries in the Eroica exposition.
2. Structural analysis
Plantinga, L. Romantic Music: a History of Musical Style in Nineteenth-Century Europe. New York: Norton, 1984.
Sipe, T. Beethoven: Eroica Symphony. Cambridge University Press, 1998.
Taruskin, R. The Oxford History of Western Music, vol.2., 2005.
Dahlhaus, C. Ludwig van Beethoven: Approaches to His Music. Oxford: Clarendon, 1991.
Schenker, H. 'Beethoven's Third Symphony: Its True Content Described for the First Time', in Heinrich Schenker, The Masterwork
in Music: A Yearbook, vol.3, Cambridge Studies in Music Theory and Analysis 10, Cambridge: Cambridge University Press, 1997.
Webster, J. 'Sonata Form', in New Grove Dictionary of Music and Musicians, vol.23, London: Macmillan, 2001.
Horne, W. 'The hidden trellis: where does the second group begin in the first movement of Beethoven's Eroica Symphony?'
Beethoven Forum, vol.13, nº.2 (2006), pp. 95-147.
Grove, G. Beethoven and his nine symphonies. London: Novello (1890)
Additional scholarly sources: Kerman, J.; Lockwood, L.; Kegan, P.; Dent, J. M.; Mathews, D.; Hopkins, A.; Christopher, H.
Program notes from 10 symphony orchestras: Philadelphia, Utah, Florida, Atlantic, Oregon, Jacksonville, St. Louis, San Francisco,
New York and Boston: almost no structural information at all!
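As a rough illustration of the voting scheme (the tolerance value and the greedy clustering are hypothetical details, not the study's exact procedure):

```python
def vote_boundaries(annotations, tol=2.0, th=3):
    """annotations: list (one per scholar) of boundary times in seconds.
    A boundary is kept if at least `th` scholars place a boundary
    within `tol` seconds of it."""
    times = sorted(t for ann in annotations for t in ann)
    kept, used = [], set()
    for t in times:
        if t in used:
            continue
        # Scholars supporting a boundary near t (at most one vote each).
        votes = sum(any(abs(t - u) <= tol for u in ann) for ann in annotations)
        if votes >= th:
            kept.append(t)
            used.update(u for u in times if abs(t - u) <= tol)
    return kept
```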
Key estimation
Multi-scale key estimation & representation method (audio and MIDI):
Sapp’s keyscapes (sketched below the figure).
Several of the ground-truth boundaries are well defined by key estimation;
many other short tonicisations are present as well;
some of the segments are not bounded by tonal shifts.
2. Structural analysis
E. Gómez, Tonal description of polyphonic audio for music content processing, INFORMS Journal on Computing 18(3), pp. 294-304, 2006.
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
[Figure: keyscapes of the Eroica exposition, computed from audio and from MIDI]
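The keyscape idea can be sketched by reusing the estimate_key function from the earlier key-estimation sketch: estimate the key over windows of every scale and position, which yields the triangle plotted in the figure. A rough, illustrative version:

```python
import numpy as np

def keyscape(chromagram, n_scales=20):
    """chromagram: (n_frames, 12). Returns a dict mapping
    (window_length, start_frame) -> estimated key label, i.e. one row
    of the keyscape triangle per analysis scale.
    Reuses estimate_key() from the sketch above."""
    n = len(chromagram)
    ks = {}
    for scale in range(1, n_scales + 1):
        win = max(1, n * scale // n_scales)        # window grows with scale
        for start in range(0, n - win + 1, max(1, win // 2)):
            segment = chromagram[start:start + win].mean(axis=0)
            ks[(win, start)] = estimate_key(segment)[0]
    return ks
```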
Cadence finding
Key-independent cadential analysis based on transposition-invariant
set-classes (see the sketch below).
Some important cadential procedures in symphonies, which contribute to
large-scale structures, go beyond the description of cadences as plain
sequences of chords and require sophisticated hierarchical interpretation.
2. Structural analysis
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.
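A minimal sketch of a transposition-invariant set-class representation: put a pitch-class set into its most compact rotation and transpose it to start on 0, so that all transpositions of a chord map to the same class. This simplifies full Forte prime forms (inversion is ignored):

```python
def set_class(pitches):
    """Transposition-invariant class of a chord: the most compact
    rotation of its pitch-class set, transposed to start on 0."""
    pcs = sorted(set(p % 12 for p in pitches))
    n = len(pcs)
    rotations = [pcs[i:] + [p + 12 for p in pcs[:i]] for i in range(n)]
    # Most compact: smallest span, then smallest intervals from the bottom.
    best = min(rotations,
               key=lambda r: (r[-1] - r[0], [x - r[0] for x in r]))
    return tuple((x - best[0]) % 12 for x in best)

# All transpositions of a dominant seventh map to the same class:
assert set_class([60, 64, 67, 70]) == set_class([62, 66, 69, 72])
```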
Orchestration
Pitches, pitch classes & instruments (a minimal description is sketched below).
Instrumentation
All the structural boundaries correspond to important changes in the
instrumentation.
2. Structural analysis
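A minimal sketch of such an orchestral description, following the representation described in the notes: chord complexity (number of pitch classes), octavations (ratio of pitches to pitch classes), and unisons (ratio of voices to pitches):

```python
def orchestral_description(voices):
    """voices: list of MIDI pitches currently sounding, one per
    instrumental voice (repeats allowed for doublings)."""
    pitches = set(voices)
    pcs = set(p % 12 for p in pitches)
    return {
        "chord_complexity": len(pcs),             # number of pitch classes
        "octavations": len(pitches) / len(pcs),   # octave doublings
        "unisons": len(voices) / len(pitches),    # unison doublings
    }

# Full-orchestra C major chord with doublings:
print(orchestral_description([36, 48, 48, 52, 55, 60, 60, 64, 67, 72]))
```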
Conclusions
Symphonic pieces are long and complex in terms of structure.
A combination of audio and score descriptors seems to
capture structural boundaries.
Difficulty of evaluation,
lack of consensus.
Different layers:
• Non-expert users:
instrumentation.
Ex: solo vs tutti.
• Experts: key.
2. Structural analysis
A. Martorell and E. Gómez, “Hierarchical multi-scale set-class analysis”, Journal of Mathematics and Music, pp. 1-14, 2014.
A. Martorell and E. Gómez, “Systematic multi-scale set-class analysis”, 15th ISMIR Conference, Taipei, 2014.
A. Martorell, “Systematic set-class surface analysis: a hierarchical multi-scale approach”, in 8th European Music Analysis Conference, Leuven, 2014.
3. Source separation
Goals: interact with symphonic music by listening to
different instrument sections separately.
Tasks:
Multi-channel source separation of orchestral
instruments/sections.
Note-level alignment for refined separation.
Instrument emphasis.
Source localization & rendering.
Multi-channel audio source separation
Number of sources ≤ number of microphone signals.
Each source has a channel in which it is predominant (highest direct-
to-reverberant ratio).
Number & type of instruments known in advance.
3. Source separation
J.J. Carabias-Orti, M. Cobos, P. Vera-Candeas and F.J. Rodriguez-Serrano, “Nonnegative signal factorization with learnt instrument
models for sound source separation in close-microphone recordings”, in EURASIP Journal on Advances in Signal Processing.
Multi-channel source separation
1. Panning matrix estimation using score information to find isolated time-
frequency source locations in the multi-channel input (17 channels, 12 instruments).
2. NMF-based signal factorization on the selected channel per instrument, with
trained instrument models (RWC), to estimate each separated source’s spectrogram.
3. Wiener-mask separation to reconstruct each instrument source
(steps 2-3 are sketched below).
3. Source separation
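A minimal sketch of steps 2-3: multiplicative NMF updates with a fixed, pre-trained basis, followed by Wiener-mask reconstruction. The RWC-trained instrument models and the panning-based channel selection of the actual system are assumed given; this uses the plain Euclidean-distance NMF update, not the paper's exact signal model:

```python
import numpy as np

def separate(V, W_list, n_iter=100, eps=1e-9):
    """V: (freq, time) magnitude spectrogram of the selected channel.
    W_list: per-instrument basis matrices (freq, components), pre-trained
    (e.g. on RWC samples) and kept fixed; only the gains are updated."""
    W = np.concatenate(W_list, axis=1)
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):                      # multiplicative updates
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)   # Euclidean-distance NMF
    # Wiener masks: each source's model spectrogram over the total model.
    models, k = [], 0
    for Wi in W_list:
        models.append(Wi @ H[k:k + Wi.shape[1]])
        k += Wi.shape[1]
    total = sum(models) + eps
    return [V * (m / total) for m in models]     # masked source spectrograms
```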
3. Source separation
Note-level alignment
The quality of the separation strongly relies on the
quality of the alignment!
Current audio-to-score methods are evaluated using a
tolerance window at the beat level (Cont et al., 2007).
Even manual alignment (usually at the beat level) has
inaccuracies in the onsets/offsets.
Audio example: misaligned vs aligned.
3. Source separation
The reconstructed signal can be seen as the product of the
harmonic components (A) and the gains (B).
After NMF, the resulting gains (C) are split into submatrices and
used to detect blobs (boundaries and shapes) (D); a sketch of the
blob detection follows the citation below.
Note-level alignment
M. Miron, Carabias, J. José, and Janer, J., “Audio-to-score alignment at the note level for orchestral recordings”, in 15th International
Society for Music Information Retrieval Conference, Taipei, Taiwan, 2014
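A minimal sketch of the blob-detection step on one instrument's NMF gain submatrix, using scipy's connected-component labelling (the shape constraints of the actual system are not reproduced; the threshold is a hypothetical parameter):

```python
import numpy as np
from scipy import ndimage

def gain_blobs(gains, rel_threshold=0.1):
    """gains: (pitch, time) NMF gain submatrix for one instrument.
    Returns per-blob bounding boxes: candidate note onsets/offsets."""
    mask = gains > rel_threshold * gains.max()
    labels, n_blobs = ndimage.label(mask)          # connected components
    blobs = []
    for sl in ndimage.find_objects(labels):
        pitch_slice, time_slice = sl
        blobs.append({"pitch_bin": pitch_slice.start,
                      "onset_frame": time_slice.start,
                      "offset_frame": time_slice.stop})
    return blobs
```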
Conclusions
• Symphonic music is challenging for audio source
separation due to the high number of overlapping
sources.
• Take advantage of multi-channel recordings and their redundant
information.
• Need for an informed approach, note-level score
alignment for better quality.
• Quality still far from what a musician would expect.
3. Source separation
4. Music visualization
Goal: provide meaningful music visualizations.
Research questions:
• Which information to present to users?
• How to visualize this information?
Challenges:
• Not much research on visualizing descriptors.
• Expert vs non-expert users.
• Short-time (local) vs long-time (global).
• Off-line (after the concert) vs on-line (during the concert).
• Cope with errors of current technologies.
4. Music visualization
Evaluation strategy
• 2 focus groups: Amsterdam (casual, heavy consumers)
vs Barcelona (musicians).
• Show different visualization concepts, discussion and
questionnaire.
4. Music visualization
Results
• Need for specific information at a specific moment:
• Experts: score, structure.
• Naïve: melodic line, instrumentation, structure.
• Need to have control over this information.
• Scenarios: during (musicians, learning tool), after (concert goers).
• Interest (Gareth Loy keynote):
• Surprise factor vs overview of what is coming.
• Attracting attention towards specific elements vs
overstimulation.
• Design recommendations: minimalistic, unobtrusive, appealing,
adaptable.
Screen, tech audience, educators; around 1,000 people.
Young orchestra
Prometheus Overture, Beethoven
4. Music visualization
4 visualizations: The sound, The piece, The orchestra,
The conductor.
Plus some quotes about the Prometheus legend.
Design support by http://www.hand-coded.net/
4. Music visualization
The Conductor
Work by A. Sarasúa & E. Guaus
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among
Subjects“. In Proceedings of the International Conference on New Interfaces for Musical Expression, NIME’14,
pages 195-200. Goldsmiths London, UK, 2014. https://github.com/asarasua/ofxKinectFeatures
Real-time feature extraction:
• Chord probabilities.
• Loudness.
• Assisted score following method:
• Off-line analysis: meter, average tempo, pauses.
• Probabilistic model for tempo prediction and
tracking.
• Video mixer.
(The audio features above are sketched below.)
4. Music visualization
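For illustration, the same kinds of audio features can be prototyped offline with librosa (a sketch only: the live system works on streaming audio, the filename is hypothetical, and the normalized chroma is a crude stand-in for real chord probabilities):

```python
import librosa
import numpy as np

y, sr = librosa.load("concert.wav")               # hypothetical recording
rms = librosa.feature.rms(y=y)[0]                 # loudness proxy per frame
chroma = librosa.feature.chroma_stft(y=y, sr=sr)  # 12 x frames
# Crude chord probabilities: normalize chroma to sum to 1 per frame.
chord_probs = chroma / (chroma.sum(axis=0, keepdims=True) + 1e-9)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)  # average tempo estimate
```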
Mocap feature extraction for visualization
ofxKinectFeatures
• openFrameworks addon for real-time feature extraction
• Mapping to animation: loudness, beat detection, quantity of motion
(a rough analogue of quantity of motion is sketched below).
4. Music visualization
Á. Sarasúa and E. Guaus, “Dynamics in Music Conducting: A Computational Comparative Study Among Subjects“. In Proceedings
of the International Conference on New Interfaces for Musical Expression, NIME’14, pages 195-200. Goldsmiths London, UK,
2014. https://github.com/asarasua/ofxKinectFeatures
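A rough Python analogue of the quantity-of-motion feature (the actual addon is C++/openFrameworks; this only illustrates the idea of summed joint displacement per frame, with hypothetical smoothing):

```python
import numpy as np

def quantity_of_motion(joints):
    """joints: (n_frames, n_joints, 3) array of Kinect joint positions.
    QoM per frame = total Euclidean displacement of all joints,
    lightly smoothed; useful for mapping gesture energy to animation."""
    disp = np.linalg.norm(np.diff(joints, axis=0), axis=2)  # (frames-1, joints)
    qom = disp.sum(axis=1)
    kernel = np.ones(5) / 5.0                               # moving average
    return np.convolve(qom, kernel, mode="same")
```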
Very good feedback from attendees &
media:
• Spanish press: ABC, El País.
• PHENICX-A live concert, in Digital
Agenda for Europe.
http://europa.eu/!yx93gV
• Twitter:
• Novelty of gestures.
• Interest in technology among
young audiences.
• Educational applications.
4. Music visualization
• Technology can facilitate the appreciation of classical
music by new audiences.
• Current technologies have limitations when:
• Addressing symphonic music.
• Operating in a real concert setup.
• Opportunity to address tasks in a different way and
improve state of the art methods.
• User-centred paradigms: visualization, interaction,
adaptation.
Conclusions
Computational models of symphonic music:
challenges and opportunities
Emilia Gómez
Universitat Pompeu Fabra, Barcelona, Spain
Editor's Notes
An orchestral classical concert embraces a wealth of musical information, which may not be easily perceived or understood by general audiences.
Diminish physical, social barriers. Attract new audiences.
An orchestral classical concert embraces a wealth of musical information, which may not be easily perceived or understood by general audiences. Mariss Jansons.
Consortium partners.
Beethoven’s 3rd symphony ’Eroica’, is generally agreed as a pivotal composition between the Classical and Romantic periods.
This work also constitutes a paradigm of formal complexity, as evidenced by the vast literature analysing the symphony.
It also involves a significant usage of the symphonic resources, yet not exploiting the full possibilities of later works.
This makes it a good compromise for analysing a variety of problems specific to the symphonic repertoire, while avoiding an overly complex task. The length of this symphony is also convenient for our purposes, as it is comparable to that of the mainstream symphonic repertoire. We considered a performance by the Royal Concertgebouw Orchestra (RCO).
Length: about 1 hour, in 4 movements.
Free score edition CCARH
Synchronization
Different octaves
Algorithms
Tonality is a main factor contributing to musical form. For instance “sonata-form”, which scaffolds the first movement of many symphonies, is usually built from its tonal plan. This tonal plan is generally clear “only” on the simplest (early) symphonies, which are barely performed today. These general keys often serve more as ’structural references’ (to be implied
by the listener) rather than having large segments of music actually written ’in’ these tonalities.
The actual tonality is often in constant evolution, passing through many short-term keys (referred to as ’tonicisations’). One exponent of this practice is usually found at the development sections of many allegro-sonata forms, in which the composition is intended to challenge or disorient the listener with frequent and unexpected tonal shifts. By estimating these long-term tonal references, we would provide a means for explaining some aspects of the musical form to different users, which could be conveyed through visualizations.
ORCHESTRATION
Much symphonic writing exploits instrumentation/orchestration for creating contrast between different sections, as well as for developing dynamic transformations between them (e.g. ’orchestral crescendo’). The relation between instrumentation/orchestration and musical structure motivates the research on orchestral description. Moreover, the impact of orchestration in listeners has a strong and direct perceptual basis, as much of its effect does not require musical training to be understood.
For instance, any normal-hearing person would distinguish a solo from a tutti section. Many of the sound effects created by composers are rooted in orchestration, and its effect on general
audiences has been extensively exploited (e.g. in movie soundtracks). Additionally, research on orchestration has been barely considered by the MIR community, so it constitutes a novel research path.
Aside from some standard structures (such as the ’rondo’, or the literal repetition of the exposition in allegro-sonata forms), it is generally unlikely to find close timbral or tonal relations between significant sections, in terms comparable to popular music.
Many symphonies make extensive usage of tonal material restatements (e.g. themes or motives), but they often appear transformed in many varied ways. This is particularly complex in symphonic works from the Romantic period onwards. This limits the practical usage of the most common recurrence-finding methods for
the kind of repertoire likely to be found in orchestral concerts.
We also analysed the problem of assigning labels to the segments and/or boundaries, but this information will be considered in future work. The segmentation problem alone proved complex enough to deserve a careful consideration.
A qualitative similarity analysis reveals that the key estimation method from audio can be
roughly compared to its counterpart from the score. However, the problem of structural segmentation
based on key estimation is clearly manifested for music of this tonal complexity. Several of
the ground truth structural boundaries are well defined by the tonal estimation, from both score
and audio. However, it is also clear that many other short tonicisations are present as well,
and some of the segments are not bounded by tonal shifts.
Orchestration: This representation informs about chord complexity (number of pitch-classes), octavations (ratio between the number of pitches and pitch-classes), and unisons (ratio between number of voices and pitches).
Cadential processes are among the most common structure-defining resources. We propose a
key-independent cadential analysis, based on the description in terms of transposition-invariant
set-classes. The general computational framework is described in [29], and information retrieval
applications of this kind are described in [30]. We performed a systematic cadential analysis
from the first movement of the ’Eroica’, inspecting a variety of common cadential sets. Our
preliminary analysis confirms that some important cadential procedures in symphonies, contributing
to large-scale structures, are beyond the description of cadences as plain sequences
of chords, and require sophisticated hierarchical interpretation. It is the case of the controversial
definition of the second theme group in the first movement of the ’Eroica’, for which several
-technically speaking- perfect cadences towards the second theme’s tonality occur before the
(mostly agreed) establishment of the second theme proper. Similar long and complex cadential
procedures are featured by much Romantic repertoire which is not symphony-specific, although
the extended possibilities of the symphonic resources favors this practice.
Mixing matrix: detect the channel where the instrument is most predominant, and run NMF just on this channel.
Converge to local solutions.
Restrictions: train with instrument models (RWC)
Use the score to see where the instrument plays
Eliminate the partials that are overlapping with other instruments
Use of ofxKinectFeatures (openFrameworks addon for real-time feature extraction) for enhanced live visualization or conducting gestures.
Beats, Quantity of Motion mapped to different aspects of animation.