The convergence of "hard" and "soft" in music technology, Rolf Inge Godøy, UiO




VERDIKT conference 2013





    • The convergence of "hard" and "soft" in music technology Rolf Inge Godøy Department of Musicology, University of Oslo e-mail: r.i.godoy@imv.uio.no fourMs Music, Mind, Motion, Machines www.fourms.uio.no
    • The "hard" and "soft" in music: • "hard" = physics of sound and instruments since time immemorial, more recently also including physics of recording, processing and distribution • "soft" = subjective experiences of musical sound, including innumerable sensory and affective features • Music (as we know it) always presupposes both "hard" and "soft" • The challenge, and marvelous future prospects, of music technology is in the convergence of "hard" and "soft", or of "objective" physics and nothing less than the full extent of human social identity, well-being and passions
    • The "hard" and "soft" in music: • As an initial example, let's listen to this short tune by Carole King: • No doubt an expressive little piece of music that we have no trouble perceiving • Yet there are also a great many details here that we can zoom into by stretching its duration four-fold: • To get a further impression of what goes on, we can look at the following representations of this little tune:
    • The "hard" and "soft" in music: • Continuous sound signal, the sub-symbolic • Somehow extract tone-level symbolic information • And also perceive nuances, i.e. inflections, vibratos, glides, transients and bursts of noise (chaotic components) • And: perceive the fusion of tones into phrases • Question: how is such feature-rich sound produced? • And: could we produce such sound by digital synthesis? • What would the control input for such synthesis be? • These and numerous similar questions relate "hard" and "soft" in music
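To make the question about control input concrete, here is a minimal sketch, assuming that three of the nuances named above (a glide, a vibrato and an attack/release envelope) are supplied as control parameters to a plain sine oscillator. The function and parameter names are hypothetical, and a sine tone is of course far poorer than a sung phrase; the point is only that each nuance must be spelled out as a control signal.

```python
import numpy as np

def synth_tone(f_start, f_end, duration, sr=44100,
               vib_rate=5.5, vib_depth=0.01, amp=0.5):
    """Sine tone driven by three control inputs: pitch glide, vibrato and envelope.

    f_start, f_end : glide start and end frequencies (Hz)
    vib_rate       : vibrato rate (Hz); vib_depth is a fraction of the frequency
    """
    n = int(duration * sr)
    t = np.arange(n) / sr
    f_glide = np.linspace(f_start, f_end, n)                       # control 1: glide
    f_inst = f_glide * (1.0 + vib_depth * np.sin(2 * np.pi * vib_rate * t))  # control 2: vibrato
    # Integrating the instantaneous frequency gives a continuous (click-free) phase
    phase = 2 * np.pi * np.cumsum(f_inst) / sr
    # Control 3: linear 20 ms attack and 50 ms release, capped at 1
    env = np.minimum(1.0, np.minimum(t / 0.02, (duration - t) / 0.05))
    return amp * env * np.sin(phase)
```

For example, `synth_tone(440.0, 466.16, 0.5)` produces half a second of A4 gliding up a semitone with a light vibrato.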
    • The "hard" and "soft" in music: • The "hard"-"soft" duality has a long-standing history in Western culture, extending back to Pre-Socratic times • Music linked with mathematical thinking (Pythagoras) • Music linked with passion and even rage (Plato) • Music linked with cosmology (medieval times) • Music linked with affect (baroque period) • Music linked with drama and longing (romantic period) • Music linked with psychophysics (Helmholtz, 1862) • Yet all through this, music linked with increasingly sophisticated technologies for sound production
    • A quick historical tour of music technology: • Current research suggests that musical instruments emerged approximately 40,000 years ago (Geißenklösterle Cave flute, southwestern Germany): • Ancient Greeks and Romans had water-powered organs:
    • A quick historical tour of music technology: • An advanced mechanical musical instrument like this one from 1650: • Bearing some resemblance to present MIDI principles:
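One way to read the comparison with MIDI is that both a mechanical instrument and MIDI store performance instructions (which note, when, how hard) rather than sound. Assuming that reading, the sketch below encodes the standard three-byte MIDI Note On (status 0x90 plus channel) and Note Off (status 0x80 plus channel) messages and the usual equal-tempered note-to-frequency mapping; the helper names and the tiny example score are illustrative only.

```python
def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def _check(channel: int, note: int, velocity: int) -> None:
    if not (0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128):
        raise ValueError("channel must be 0-15; note and velocity 0-127")

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note On: status byte 0x90 plus channel, then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Note Off: status byte 0x80 plus channel, then note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

# A score as timed instructions (time in beats, message); no audio is stored.
score = [(0.0, note_on(0, 60, 100)), (1.0, note_off(0, 60)),
         (1.0, note_on(0, 64, 90)), (2.0, note_off(0, 64))]
```

The sound itself only comes into existence when some instrument or synthesizer interprets these messages.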
    • Current music technology research: • Very extensive, including the whole chain from production to perception • Sound synthesis, various models • Physical model synthesis, particularly interesting • Effects processing • Diffusion (spatialization) • Online distribution of music • Search engines for music • Analytical tools for music research • Music Information Retrieval (MIR)
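Physical model synthesis simulates the vibrating system rather than the waveform. The slide does not name a specific model; as a minimal illustration, here is the classic Karplus-Strong plucked-string algorithm, in which a noise burst circulates in a delay line whose length sets the pitch while an averaging filter damps it like a real string. The `decay` value is an assumed parameter.

```python
import numpy as np

def karplus_strong(freq, duration, sr=44100, decay=0.996, seed=0):
    """Plucked-string tone: noise in a delay line, low-pass filtered on every pass."""
    rng = np.random.default_rng(seed)
    period = int(round(sr / freq))            # delay-line length sets the pitch
    buf = rng.uniform(-1.0, 1.0, period)      # initial excitation: the "pluck"
    out = np.empty(int(duration * sr))
    for i in range(out.size):
        j = i % period
        out[i] = buf[j]
        # Average with the neighbouring sample and apply a loss factor (string damping)
        buf[j] = decay * 0.5 * (buf[j] + buf[(j + 1) % period])
    return out
```

Changing `decay` or the averaging filter changes the perceived material of the string, which is one reason such models are attractive: their parameters correspond to physical properties.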
    • Music Information Retrieval • How could an artificial system perceive sound? • A spectrogram of someone saying "shoe" in a quiet (left) and a noisy (right) environment (illustrations from Bregman, A. (1990), Auditory Scene Analysis, MIT Press):
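A spectrogram such as Bregman's "shoe" example is usually the first representation an artificial system computes. A minimal short-time Fourier transform sketch follows; the window length and hop size are assumed values, not taken from the illustration.

```python
import numpy as np

def spectrogram(x, sr, n_fft=1024, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))             # (frames, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)            # bin frequencies in Hz
    times = (np.arange(n_frames) * hop + n_fft / 2) / sr  # frame centre times in s
    return mag.T, freqs, times                            # frequency on the vertical axis
```

Adding broadband noise to `x` raises the floor of every bin, which is why picking out the "shoe" pattern in the noisy case is so much harder for a machine than for a listener.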
    • Music Information Retrieval • Would an artificial system be able to pick out single sounds in a noisy environment? • Would such a system work in spite of occlusion?
    • Music Information Retrieval • Information extraction from continuous and often messy signals is a major challenge in current music technology research • One application here is automatic transcription (subsymbolic to symbolic), but does it work? • Finnish folksong • Stevie Wonder • And: could a computer find the beat in the music, as in this excerpt?
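Finding the beat usually starts from an onset-strength envelope. The sketch below is a deliberately bare-bones tempo estimator, assuming spectral flux as the onset measure and a tempo range of 60 to 180 BPM; it finds the dominant periodicity but not the beat positions, and practical beat trackers are considerably more elaborate.

```python
import numpy as np

def onset_envelope(x, sr, n_fft=1024, hop=512):
    """Spectral flux: summed increase in log-magnitude from one frame to the next."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    logmag = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    flux = np.maximum(0.0, np.diff(logmag, axis=0)).sum(axis=1)  # keep only increases
    return flux, sr / hop                       # envelope and its frame rate in Hz

def estimate_tempo(env, frame_rate, bpm_range=(60, 180)):
    """Tempo (BPM) from the strongest autocorrelation peak in a plausible range."""
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1 :]   # lags 0, 1, 2, ...
    lo = int(frame_rate * 60 / bpm_range[1])                   # shortest beat period
    hi = int(frame_rate * 60 / bpm_range[0])                   # longest beat period
    best = lo + int(np.argmax(ac[lo : hi + 1]))
    return 60.0 * frame_rate / best
```

On real recordings such as the excerpt above, syncopation and expressive timing make the autocorrelation peaks far less clear-cut than on a click track.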
    • Music Information Retrieval • Musical sound is usually multidimensional, so one major challenge is to define features: what features are we interested in selecting? • And: what about similarity? Similarity measures are a tricky and fascinating topic in music information retrieval • Now an overview of areas related to music technology:
    • International Computer Music Conference: Digital Audio Signal Processing; Sound Synthesis and Analysis; Music Analysis; Music Information Retrieval; Representation and Models for Computer Music; Artificial Intelligence and Music; Languages for Computer Music; Printing and Optical Recognition of Music; Mathematical Music Theory; Psychoacoustics; Music Perception; Acoustics of Music; Aesthetics, Philosophy and Criticism of Music; History of Electroacoustic Music; Computer Systems in Music Education; Composition Systems and Techniques; Interactive Performance Systems; Software and Hardware Systems; General and Miscellaneous Issues in Computer Music; Studio Reports
    • ISMIR - International Society for Music Information Retrieval: Music perception and cognition; Musical knowledge and meaning; Content-based querying and retrieval; Automatic classification; Music recommendation and playlist generation; Fingerprinting and digital rights management; Score following, audio alignment, and music synchronization; Transcription and annotation; Music summarisation; Music structure analysis; Optical music recognition; Music signal processing; Libraries, archives and digital collections; Database systems, indexing and query languages; Text and web mining; Compression and streaming; Modification and transformation of music data; Evaluation of MIR systems; Knowledge representation, social tags, and metadata; Melody and motives; Harmony, chords and tonality; Rhythm, beat, tempo and form; Timbre, instrumentation and voice; Genre, style and mood; Performance analysis; Similarity metrics; Computational musicology; User interfaces and user models; Emotion and aesthetics; Applications of MIR to the performing arts and multimedia
    • NIME - New Interfaces for Musical Expression: Novel controllers and interfaces for musical expression; Novel musical instruments; Augmented/hyper instruments; Novel controllers for collaborative performance; Interfaces for dance and physical expression; Interactive game music; Robotic music; Interactive sound and multimedia installations; Interactive sonification; Sensor and actuator technologies; Haptic and force feedback devices; Interface protocols and data formats; Motion, gesture and music; Perceptual and cognitive issues; Interactivity design and software tools; Sonic interaction design; NIME intersecting with game design; Musical mapping strategies; Performance analysis; Performance rendering and generative algorithms; Machine learning in performance systems; Experiences with novel interfaces in live performance and composition; Surveys of past work and stimulating ideas for future research; Historical studies in twentieth-century instrument design; Experiences with novel interfaces in education and entertainment; Reports on student projects in the framework of NIME related courses; Artistic, cultural, and social impact of NIME technology; Biological and bio-inspired systems; Mobile music technologies; Musical human-computer interaction; Multimodal expressive interfaces; Practice-based research approaches/methodologies/criticism
    • International Conference on Music Perception and Cognition: Acoustics and psychoacoustics; Aesthetic perception and response; Cognitive modeling of music; Cognitive musicology; Composition and improvisation; Cross-cultural studies of music; Memory and music; Musical development; Musical timbre; Music and emotions; Music and evolution; Music and language; Music and meaning; Music and movement; Music and neuroscience; Music and personality; Music and well-being; Music education; Music performance; Music therapy; Pitch and tonal perception; Rhythm, meter, and timing; Social psychology of music
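Returning to the question of similarity raised in the Music Information Retrieval slides: one common measure for comparing feature sequences is dynamic time warping (DTW), which tolerates differences in tempo. The sketch below assumes 1-D features such as pitch expressed as MIDI note numbers; it is one of many possible similarity measures, not one proposed in this presentation.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves: match, insertion, deletion
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]
```

A melody played at half speed gets distance zero from the original, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths.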
    • Our own research focus: • Point of departure: phenomenological approach to music research, exploring sound and music-related body motion with the working hypothesis that music = sound + body motion • Perception and action inseparable in musical experience, hence a so-called motor theory perspective: • This means that music is fundamentally multimodal:
    • Cross-domain mapping: which figure is maluma and which is takete?
    • fourMs research, past and present: • The Musical Gestures Project (RCN), 2004-2007 • Sensing Music-Related Actions (RCN, Verdikt), 2008-2012 • ConGAS, Gesture Controlled Audio Systems (EU-COST), 2004-2007 • SID, Sonic Interaction Design (EU-COST), 2008-2011 • EPiCS: Engineering Proprioception in Computing Systems (EU-7th FW), 2010-2014 • Research heavily dependent on technology, both for sound and motion analysis, hence the idea of the 4 Ms, i.e. Music, Mind, Motion, Machines, in our acronym • The fourMs group was established on the basis of several years of research cooperation between the departments of musicology, informatics and psychology at the University of Oslo
    • Imitation in musical behavior • The motor theory of perception posits an incessant mental simulation of actions: both assumed actions associated with sounds we hear and objects we see, and the actions of other people • Assumption: people's music-related actions tell us something about how they perceive music • Question: how much have listeners with different levels of musical training learned about sound-related actions? • First, an example of air guitar, then of what we have called sound tracing:
    • Sound-tracing: In the top part of the figure, we see the spectrogram of a noise sound with a falling spectral centroid, with a jagged descending curve indicating the calculated spectral centroid trajectory (frequency in Hz against time in s). In the bottom part of the figure, we see the corresponding sound-tracing hand motion made by one subject to this sound (position in m along X (sideways), Y (back/forth) and Z (up/down), against time in s). The slide also reproduces a partial, column-truncated text excerpt on storing such recordings in a common database format with metadata for searches.
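The spectral centroid trajectory in the figure is the magnitude-weighted mean frequency of each short-time spectrum. A minimal sketch follows; the frame and hop sizes are assumptions, since the analysis settings behind the figure are not given.

```python
import numpy as np

def spectral_centroid(x, sr, n_fft=2048, hop=512):
    """Frame-wise spectral centroid in Hz: sum(f_k * |X_k|) / sum(|X_k|)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return (mag @ freqs) / (mag.sum(axis=1) + 1e-12)   # small constant avoids 0/0 in silence
```

For a noise sound the frame-to-frame estimates fluctuate, which is why the computed trajectory in the figure looks jagged even though the perceived "falling" contour is smooth.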
    • Canonical correlation analysis of sound and motion features: • (1) Start with a sound file and a corresponding motion recording. • (2) Represent the sound and the motion by features; for the sake of clarity, only two sound features (pitch and loudness) and two motion features (hand distance and velocity) are used in this example, all normalised. • (3) To see more clearly what happens, represent each pair of features in a two-dimensional feature space. • (4) Project the features (i.e. scale, stretch, and rotate them) into a new space spanned by the canonical components. This space is found by seeking the maximum correlation between the first canonical components; the process is handled by the canoncorr function in Matlab. • (5) Plotting the canonical components on a timeline shows that the 1st canonical components of sound and motion bear similarities to each other. In this case, the correlation between the 1st canonical components is 0.75, and between the 2nd canonical components it is 0.25.
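The analysis above uses Matlab's canoncorr. For readers working in Python, a NumPy sketch of canonical correlation analysis using the standard QR/SVD formulation is given below; it is not the authors' code, and the feature matrices (rows are time frames, columns are features such as pitch and loudness, or hand distance and velocity) are assumed to be time-aligned already.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlation analysis of two time-aligned feature matrices.

    X : (n_frames, p) sound features;  Y : (n_frames, q) motion features
    Returns the canonical correlations (descending) and the canonical
    components of X and Y, one column per component.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)             # orthonormal basis of the centred X features
    Qy, Ry = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)            # projection weights for X
    B = np.linalg.solve(Ry, Vt.T)         # projection weights for Y
    return s, Xc @ A, Yc @ B
```

Plotting the first columns of the two returned component matrices against time gives the kind of timeline comparison described in step (5).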
    • Measuring and comparing motion capture systems as well as developing various sensor-based input devices: The analyses in this paper are based on data from three separate data sets. • Data set #1: 6 recordings of one marker lying still on the floor, and markers on the neck and right foot of 2 subjects standing still [1], recorded with the Qualisys system in a setup covering a space of 5 m x 7 m x 3 m, with a sampling rate of 100 Hz and a duration of 10 minutes. • Data set #2: 25 recordings of one marker fixed to a pole standing still, and markers on the head of 3 subjects standing still [2], recorded with the Qualisys … • Figure 1: Picture from the lab used to record Data set #2 and most of #3, showing one of the setups for the study (cameras, and stands with markers). • Figure 2: Example of a recording of XYZ positions in mm (centred around the mean value) of one static marker (left) and one head marker (right), from Data set #2; duration = 10 min, SR = 20 Hz. The scaling of the Y-axes differs between the left and right plots by a factor of 1000. • When recordings have different durations, it is better to divide the cumulative distance by the time taken, which gives the average speed with which the marker travelled. We will use this measure to describe the quantity of motion (QoM) of the marker: $\mathrm{QoM} = \frac{1}{T}\sum_{n=2}^{N} \lVert p(n) - p(n-1) \rVert$ (1), where $p(n)$ is the marker position at sample $n$, $N$ is the number of samples and $T$ is the duration of the recording. • But also ordinary video can be quite useful:
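Equation (1) translates almost directly into code. In this sketch the duration $T$ is taken as $(N-1)/\text{SR}$, the time between the first and last sample; that choice is an assumption, since the source only says "the time taken".

```python
import numpy as np

def quantity_of_motion(p, sr):
    """QoM as in Eq. (1): cumulative marker displacement divided by duration.

    p  : (N, 3) marker positions (e.g. in mm) sampled at sr Hz
    Returns the average speed, e.g. in mm/s.
    """
    p = np.asarray(p, dtype=float)
    steps = np.linalg.norm(np.diff(p, axis=0), axis=1)  # ||p(n) - p(n-1)|| for n = 2..N
    T = (len(p) - 1) / sr                                # assumed duration convention
    return steps.sum() / T
```

Because the sum is normalised by duration, a 10-minute standstill recording and a shorter one can be compared directly, which is the motivation given above.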
    • Here is a video-based motiongram of a dance sequence: 8.4.2 Reading Motiongrams. It takes some time to get used to reading motiongrams. To guide one's attention, I have added some descriptors to the horizontal motiongram of the 5-minute dance sequence (from Video 7.1) shown in Figure 8.17. Note that time runs from left to right in the display, and that the motiongram only represents the vertical movement of the dancer. Figure 8.17: Motiongram of 5 minutes of free dance movements from Video 7.1. The dancer moved to 5 different musical excerpts (a-e) and each excerpt was repeated three times (1-3). Although quite rough, it is easy to see differences in the QoM and similarities in the upward/downward patterns between the sequences. As opposed to a spectrogram, in which the colours are used to show the energy level of frequency bands, colours in a motiongram come from the video used to create the motiongram (here the motion image). In this example, the body looks blue, which is because the dancer wore black clothes, and thus the colour of the background (the bluescreen) becomes visible. It is also possible to see traces of the red and yellow coloured … • And cumulative motion images of a piano and marimba performance sequence:
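The construction behind a horizontal motiongram can be sketched in a few lines: compute motion images by frame differencing, then collapse each motion image along the horizontal axis so that one column per frame keeps only vertical position. This sketch assumes greyscale frames scaled to [0, 1] and a hypothetical noise threshold; the motiongrams described above are computed from colour video, which is where their colours come from.

```python
import numpy as np

def motiongram(frames, threshold=0.05):
    """Horizontal motiongram from a greyscale video.

    frames : (T, H, W) array with values in [0, 1]
    Returns an (H, T-1) image: time runs left to right, rows are vertical position.
    """
    frames = np.asarray(frames, dtype=float)
    motion = np.abs(np.diff(frames, axis=0))   # motion images by frame differencing
    motion[motion < threshold] = 0.0           # suppress small changes (sensor noise)
    return motion.mean(axis=2).T               # average over x, stack frames as columns
```

Averaging over the horizontal axis is what discards sideways movement, matching the remark that the motiongram only represents the vertical movement of the dancer.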
    • … similar to what can be seen, e.g., in golf or tennis. [Figure: a short notated phrase, with plots of vertical (Z) position (mm), Z velocity (mm/s) and Z acceleration (mm/s²) against time (s) for four conditions labelled RR, RL, LL and LR.]
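Velocity and acceleration curves like those in the figure are typically derived from the recorded positions by numerical differentiation. A minimal sketch with central differences follows; in practice motion capture data are usually low-pass filtered first, because each differentiation amplifies measurement noise, and that filtering step is omitted here.

```python
import numpy as np

def derivatives(z, sr):
    """Velocity and acceleration from a sampled position signal (central differences).

    z  : vertical marker position in mm, sampled at sr Hz
    Returns (velocity in mm/s, acceleration in mm/s^2).
    """
    dt = 1.0 / sr
    v = np.gradient(z, dt)   # second-order accurate in the interior, first-order at the ends
    a = np.gradient(v, dt)
    return v, a
```

The end samples are less accurate than the interior ones, so the first and last few values are best ignored when reading peaks off the plots.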
    • Interactive performance: Fig. 5: Rehearsal before the performance at the Norwegian Academy of Music. Parts of the computer and mixer setup are to the left, the camera hangs from the ceiling, and the 8 loudspeakers are placed in two squares around the square stage area. Fig. 6: An image from the concert on 3 September 2010. A visual element, the white carpet, also marked the boundaries of the video analysis area. … the limitations found in video cameras: speed and resolution. We will explore using high-speed and high-resolution cameras to improve the response time. This will also be combined with small 6D Zigbee-based sensor devices containing accelerometers, gyroscopes and magnetometers [11]. Adaptability: A drawback of the current system is the need for manual calibration. We hope to improve this by creating an auto-calibration routine so that the system can adjust itself to a new location and light conditions. Motion feature database: CataRT is based on extracting various features that are perceptually relevant. We are currently exploring similar feature extraction and classification techniques for motion capture data, so that motion features can be treated in the same way as we now work with sound data [2]. Action-sound synthesis: Based on a future database of motion capture data, we aim at creating a system with relationships between motion segments (i.e. actions) and sound objects. This will open up more complex mappings between action and sound. A prototype study of this has already been presented in [7], and will be further refined in future experimentation. References: [1] M. Casey. General sound classification and similarity in MPEG-7. Organised Sound, 6(2):153–164, 2001. [2] K. Glette, A. R. Jensenius, and R. I. Godøy. Extracting action-sound features from a sound-tracing study. In Proceedings of the Norwegian Artificial Intelligence Symposium, Gjøvik, 22 November 2010, 2010. [3] K. Guettler, H. Wilmers, and V. Johnson. Victoria counts – a case study with electronic violin bow. In Proceedings of the 2008 International Computer Music Conference, Belfast, 2008. [4] A. R. Jensenius, R. I. Godøy, and M. M. Wanderley. Developing tools for studying musical gestures within the Max/MSP/Jitter environment. In Proceedings of the International Computer Music Conference, 4-10 September 2005, pages 282–285, Barcelona, 2005. [5] K. A. McMillen. Stage-worthy sensor bows for stringed instruments. In Proceedings of New Interfaces for Musical Expression, pages 347–348, Genova, 2008. [6] F. R. Moore. The dysfunctions of MIDI. Computer Music Journal …
    • And motion capture in sound synthesis, sound saber: … reverb to create a more holistic soundscape. This is done by using a simple mono reverb effect on each grain.
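The performance excerpt above mentions CataRT, a corpus-based system that chooses sound units by their position in a space of perceptually relevant descriptors. The sketch below is a simplified nearest-neighbour unit selection, not CataRT's own code; it assumes that motion features have already been mapped to target descriptor values, and the descriptors themselves (for instance a brightness and a loudness value per unit) are hypothetical.

```python
import numpy as np

def select_units(targets, unit_features):
    """Index of the closest sound unit for each target descriptor vector.

    targets       : (M, D) target descriptors, one row per control frame
    unit_features : (K, D) descriptors of pre-analysed sound units (grains)
    """
    # z-score with the corpus statistics so that no descriptor dominates the distance
    mu = unit_features.mean(axis=0)
    sd = unit_features.std(axis=0) + 1e-12
    U = (unit_features - mu) / sd
    T = (targets - mu) / sd
    d2 = ((T[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)   # squared Euclidean distances
    return np.argmin(d2, axis=1)
```

Richer action-sound mappings, as envisaged in the excerpt, would replace the fixed motion-to-descriptor mapping with relationships learned from a motion capture database.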
    • Standstill: Figure 1: Plots of the normalised vertical position (mm) of a head marker for six different ten-minute standstill recordings of two persons (time in samples). … and durations (Clarke, 1999; Gabrielsson, 1999). But there are also some studies that have taken a more embodied approach, looking at the multidimensional aspects of rhythm (Waadeland, 2001), the importance of microrhythmic effects (Danielsen, 2010), and the coupling between rhythm and motor …
    • Musical Motion Database: … of the database will have to sign a formal agreement before having access to the database. In this connection, users will also agree to follow the data format schemes so as to allow material to be easily accessed by other users. Figure 2: A conceptual overview of the GDIF scheme for the storage, classification, analysis and annotation of music-related motion and sound, starting from raw input data at the bottom to perceptually more pertinent features at the top. The horizontal axis represents time, and the vertical axis represents the bottom-to-top layers, i.e. the quantitative to qualitative dimension. (Layers from bottom to top: raw data, velocity, intensity, segmentation 1, segmentation 2, trajectories, sections 1a-3c, and annotations such as "head up", "breathing" and "left arm turn".)
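The lower, quantitative layers of the GDIF figure (raw data, velocity, intensity, segmentation) can be illustrated with a small processing chain. This is a hypothetical sketch of the layering idea only; it does not implement the GDIF format itself, and the 100 ms smoothing window and the speed threshold used for segmentation are assumed values.

```python
import numpy as np

def layered_features(pos, sr, threshold):
    """Derive successively higher-level layers from raw marker positions.

    pos       : (N, 3) raw positions in mm (bottom layer: raw data)
    sr        : sampling rate in Hz
    threshold : speed (mm/s) above which the marker counts as moving (hypothetical)
    """
    pos = np.asarray(pos, dtype=float)
    velocity = np.gradient(pos, 1.0 / sr, axis=0)                    # layer: velocity
    speed = np.linalg.norm(velocity, axis=1)
    win = max(1, int(0.1 * sr))                                       # 100 ms moving average
    intensity = np.convolve(speed, np.ones(win) / win, mode="same")   # layer: intensity
    moving = intensity > threshold
    # Layer: segmentation, split wherever the moving/still state changes
    edges = np.flatnonzero(np.diff(moving.astype(int))) + 1
    bounds = np.concatenate(([0], edges, [len(moving)]))
    segments = [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:]) if moving[s]]
    return {"velocity": velocity, "intensity": intensity, "segments": segments}
```

The upper, qualitative layers (sections and annotations) would typically be added by human annotators on top of such automatically derived segments.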
    • Future trends and hopes for music and ICT: • Continuing convergence of "hard" and "soft" in music technology • Need to explore and formalize "soft" features in order to use them in various ICT contexts • Need to develop enhanced means for control of musical features, cf. simulating Carole King-type expressivity • Need to recruit more researchers from musicology and psychology to advance music technology (and also to redress severe gender imbalances) • Recognize that social and affective computing are growing fast, and that content-based (signal-based) "soft" descriptors could become integral to music technology
    • Thank you for your attention! For more information, publications, and software: www.fourms.uio.no
    • And have a look at this: http://www.biomotionlab.ca/Demos/BMLwalker.html