
Introduction to Music Information Retrieval



Music Information Retrieval is about retrieving information from music entities.
The slides introduce the basic concepts of the musical language, go through different kinds of music representation, and end by describing some low-level features that are used when dealing with music entities.

Published in: Software


  1. London Information Retrieval Meetup, 19 Feb 2019. Introduction to Music Information Retrieval: Thoughts from a former bass player. Andrea Gazzarini, Software Engineer
  2. Who I am: Andrea Gazzarini, “Gazza” ▪ Software Engineer (1999-) ▪ “Hermit” Software Engineer (2010-) ▪ Passionate about Java & Information Retrieval ▪ Apache Qpid (past) Committer ▪ Husband & Father ▪ Bass Player
  3. Sease Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr experts ● Community Contributors ● Active Researchers ● Hot Trends: Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
  4. Agenda ✓ Music Information Retrieval (MIR) ➢ Music Essentials ➢ Audio Processing ➢ Q&A
  5. Music Information Retrieval (MIR)
     “MIR is concerned with the extraction, analysis and usage of information about any kind of music entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI representation of a piece of music, or name of a music artist).” Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web. Dissertation, Johannes Kepler University, Wien (2003)
     Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications. Those involved in MIR may have a background in musicology, psychoacoustics, psychology, academic music study, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these.
  7. Factors in Music Perception (computational factors)
     • Music Content includes all the low-level things we can extract from the audio signal (e.g. time, frequencies, loudness)
     • Music Context defines additional metadata that cannot be extracted from the audio signal (e.g. lyrics, tags, artists, feedback, posts)
     • Listener State includes the state of the user at a given moment (e.g. mood, musical knowledge, preferences)
     • Listener Context relates to the environment the listener is in at a given moment (e.g. political, geographical, social)
  8. Agenda ➢ Music Information Retrieval (MIR) ✓ Music Essentials ‣ Essentials ‣ Score Music Representation ‣ Symbolic Representations ‣ Audio Representation ➢ Audio Processing ➢ Q&A
  9. Music Language Essentials
     • A sound is the audio signal produced by a vibrating body.
     • A note denotes a sound, its pitch and its duration. Notes are associated with graphical symbols that indicate the pitch and the duration.
     • Two notes whose fundamental frequencies are in a ratio equal to an integer power of two are perceived as similar. As a consequence, we say they belong to the same pitch class.
     • A note is also used to denote a pitch class. Traditional music theory identifies 12 pitch classes.
     • Notes and pitch classes are associated with mnemonic codes (e.g. C, D, E, F, G, A, B or DO, RE, MI, FA, SOL, LA, SI).
     [Figure: the scale C D E F G A B C ascending and descending, with the sharps C# D# F# G# A# and the flats Bb Ab Gb Eb Db]
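The octave relationship above can be sketched in a few lines of Python. This is only an illustration, not something taken from the slides: it assumes 12-tone equal temperament, a reference tuning of A4 = 440 Hz and the MIDI convention that A4 is note number 69; the function name `pitch_class` is made up for this example.

```python
import math

# The 12 pitch classes, indexed so that 0 = C (MIDI convention).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, a4_hz=440.0):
    """Return the name of the pitch class closest to a fundamental frequency.

    Assumes equal temperament with A4 = a4_hz (an assumption of this sketch).
    """
    # Doubling the frequency adds 12 semitones, so log2 measures octaves.
    semitones_from_a4 = round(12 * math.log2(frequency_hz / a4_hz))
    # A4 is MIDI note 69; the note number modulo 12 is the pitch class.
    return PITCH_CLASSES[(69 + semitones_from_a4) % 12]


# Frequencies related by powers of two fall into the same pitch class.
for frequency in (110.0, 220.0, 440.0, 880.0):
    print(frequency, pitch_class(frequency))
```

All four frequencies differ by a factor of two from their neighbours, so they map to the same pitch class (A).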
  10. Text vs Music [Figure: comparison of text and music units; labels: Text, Music, Letter, Note, Word, Phrase, Sentence, Chord, Ghost Note, Phrase]
  11. Score music representation [Figure: annotated score; labels: Time Signature, Key Signature, Clef, Tempo, Note Reference, Chord]
  12. Symbolic music representation: Piano Roll & MIDI
     Symbolic music representations comprise any kind of score representation with an explicit encoding of notes or other musical events.
     The term Piano Roll originally denoted rolls of paper with holes that controlled the execution of a melody on a self-playing device; nowadays it refers to a digital visualisation that shows pitches over time.
     Musical Instrument Digital Interface (MIDI) is another widely adopted representation of music events (e.g. pitch, velocity, duration, intensity).
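As a rough illustration of "pitches over time", the sketch below renders a handful of note events as a text piano roll. The `NoteEvent` tuple and the `piano_roll` function are hypothetical names invented for this example; they only borrow MIDI note numbers (60 = middle C) and are not a real MIDI parser or any library's API.

```python
from collections import namedtuple

# Hypothetical event record: MIDI pitch number, start step, length in steps,
# and velocity (how hard the note is played).
NoteEvent = namedtuple("NoteEvent", "pitch start duration velocity")


def piano_roll(events, steps):
    """Return text rows (highest pitch first) with '#' where a note sounds."""
    pitches = sorted({event.pitch for event in events}, reverse=True)
    rows = {pitch: ["."] * steps for pitch in pitches}
    for event in events:
        # Mark every time step covered by the note, clipped to the grid width.
        for step in range(event.start, min(event.start + event.duration, steps)):
            rows[event.pitch][step] = "#"
    return ["%3d %s" % (pitch, "".join(rows[pitch])) for pitch in pitches]


# A C major arpeggio: C4 (60), E4 (64), G4 (67).
events = [
    NoteEvent(pitch=60, start=0, duration=2, velocity=90),
    NoteEvent(pitch=64, start=2, duration=2, velocity=90),
    NoteEvent(pitch=67, start=4, duration=4, velocity=100),
]
for row in piano_roll(events, steps=8):
    print(row)
```

Time runs left to right and pitch runs bottom to top, which is the layout most sequencers use for their piano roll view.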
  13. MusicXML
     MusicXML [1] is an XML dialect for expressing music. As the example suggests, encoding a whole song results in a huge and verbose textual representation (that’s XML!). For that reason MusicXML 2.0 introduced a compressed format with a .mxl suffix.
     • Widely supported (scorewriting, OCR, sequencer)
     • Easy to understand
     • Full support of music features
     [Figure: MusicXML example annotated with Part, Time, Clef, Note(s)]
     [1] MusicXML
  14. Parsons Code (1/4)
     The Parsons code, formally named the Parsons code for melodic contours, is a simple notation used to identify a piece of music through melodic motion, that is, movements of the pitch up and down. The encoding focuses on the pitch relation between subsequent notes. The main points about this method are:
     • Simplicity
     • Being a textual encoding, it poses interesting challenges for text search engines
     • Limited: it ignores important features such as time, intervals, pauses and ghost notes

     Symbol   Description
     *        First note of a sequence
     u, /     “up”: the note is higher than the previous one
     d, \     “down”: the note is lower than the previous one
     r, -     “repeat”: the note is the same as the previous one
  15. Parsons Code (2/4)
  16. Parsons Code (3/4): * * r u u rr u r u r d r d r d r d r u r u r u r u r * u d d d u u uX u d d d u u uXd Money, Pink Floyd
  17. Parsons Code (4/4): Tempo (Time), Intervals, Rests, Ghost Notes
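The Parsons encoding described in these slides is small enough to sketch directly. The function below is an illustration only: it assumes the melody is already available as a list of MIDI pitch numbers (an assumption of this example, the slides do not prescribe an input format), and the name `parsons_code` is made up.

```python
def parsons_code(pitches):
    """Encode a melody (a list of comparable pitch values) as a Parsons code."""
    if not pitches:
        return ""
    symbols = ["*"]  # the first note of a sequence is always '*'
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("u")  # up
        elif current < previous:
            symbols.append("d")  # down
        else:
            symbols.append("r")  # repeat
    return "".join(symbols)


# "Twinkle, Twinkle, Little Star": C C G G A A G
twinkle = [60, 60, 67, 67, 69, 69, 67]
print(parsons_code(twinkle))  # *rururd

# A different melody with the same contour but smaller intervals.
other = [60, 60, 62, 62, 64, 64, 62]
print(parsons_code(other))  # *rururd
```

The two melodies produce the same code, which shows the limitation listed above: the contour is kept, while intervals, durations and rests are lost.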
  18. Audio Representation: Time Domain
     An audio signal is a representation of sound: it describes the fluctuation in air pressure caused by a vibration as a function of time. Unlike sheet music or symbolic representations, audio representations encode everything that is necessary to reproduce an acoustic realization of a piece of music.
     Digital computers can only capture this data at discrete moments in time. The rate at which a computer captures audio data is called the sampling frequency or sampling rate.
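To make the sampling rate concrete, here is a minimal sketch that samples a pure sine tone at discrete moments and then quantizes the samples to signed 16-bit integers. The function names and the choice of 16 bits are assumptions of this example; real audio would come from a microphone or a file rather than from `math.sin`.

```python
import math


def sample_sine(frequency_hz, sampling_rate_hz, duration_s):
    """Sample a unit-amplitude sine tone at sampling_rate_hz samples per second."""
    n_samples = round(sampling_rate_hz * duration_s)
    # Sample n is taken at time n / sampling_rate_hz seconds.
    return [
        math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]


def quantize(samples, bits=16):
    """Map samples in [-1.0, 1.0] to signed integers with the given bit depth."""
    max_value = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    return [int(round(sample * max_value)) for sample in samples]


# One millisecond of a 1 kHz tone at 8 kHz: exactly one period, 8 samples.
samples = sample_sine(frequency_hz=1000, sampling_rate_hz=8000, duration_s=0.001)
print(len(samples))
print(quantize(samples))
```

A higher sampling rate captures more points per period, and a higher bit depth gives finer amplitude steps.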
  19. Frequency Domain
     The Frequency Domain (FD) representation decomposes the audio signal into a number of waves oscillating at different frequencies. The FD plots the frequencies on the horizontal axis against their corresponding magnitude (power) on the vertical axis. This representation, among other things, can be used for highlighting the dominant frequencies of a musical tone.
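The decomposition can be illustrated with a naive discrete Fourier transform. This sketch is deliberately simple and slow (quadratic in the number of samples); in practice an FFT implementation would be used. The function names are invented for this example.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return the DFT magnitudes for bins 0 .. N/2 of a real-valued signal."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid of k cycles per frame.
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples)
        )
        spectrum.append(abs(coefficient))
    return spectrum


def dominant_frequency(samples, sampling_rate_hz):
    """Return the centre frequency (Hz) of the bin with the largest magnitude."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    # Bin k corresponds to k * sampling_rate / N hertz.
    return peak_bin * sampling_rate_hz / len(samples)


# A loud 50 Hz component plus a quieter 120 Hz component, sampled at 800 Hz.
rate = 800
signal = [
    math.sin(2 * math.pi * 50 * i / rate) + 0.5 * math.sin(2 * math.pi * 120 * i / rate)
    for i in range(80)
]
print(dominant_frequency(signal, rate))
```

With 80 samples at 800 Hz each bin is 10 Hz wide, so both components fall exactly on a bin and the louder one is reported as dominant.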
  20. Agenda ➢ Music Information Retrieval (MIR) ➢ Music Essentials ✓ Audio Processing ‣ Basic Pipeline ‣ Time Domain Features ‣ Frequency Domain Features ‣ Chroma Features ➢ Q&A
  21. Basic Audio Processing Pipeline
     Analog Signal → Sampling / Quantization → Framing → Time Domain Features Extraction
     Framing → Windowing → FFT → Frequency Domain Features Extraction
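The framing and windowing stages of the pipeline can be sketched as follows. The frame size, the hop size and the choice of a Hann window are assumptions of this example (the slides name the stages but not their parameters), and the function names are made up. The FFT stage is not shown here.

```python
import math


def split_into_frames(samples, frame_size, hop_size):
    """Cut a signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_size
    return [
        samples[start:start + frame_size]
        for start in range(0, last_start + 1, hop_size)
    ]


def apply_hann_window(frame):
    """Taper a frame (at least 2 samples) towards zero at both ends."""
    n = len(frame)
    return [
        x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


signal = list(range(10))
frames = split_into_frames(signal, frame_size=4, hop_size=2)
print(frames)  # each frame overlaps the previous one by 2 samples
print(apply_hann_window([1.0] * 5))
```

Time domain features would be computed on the raw frames, while the windowed frames would be passed on to the FFT; tapering reduces the spectral leakage caused by cutting the signal abruptly.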
  22. Time Domain Features
     Feature                          Description
     Amplitude Envelope (AE)          Max amplitude within a frame
     Root-Mean-Square Energy (RMS)    Perceived sound intensity
     Zero Crossing Rate (ZCR)         Number of times the amplitude changes its sign within a frame
     Example usage: Loudness Estimation, Timbre Analysis, Speech Recognition, Audio Segmentation, Onset Detection
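The three time domain features are easy to compute on a single frame. The sketch below follows the descriptions on the slide; the function names are made up, and treating a sample of exactly zero as positive when counting sign changes is a convention chosen for this example.

```python
import math


def amplitude_envelope(frame):
    """Maximum absolute amplitude within the frame."""
    return max(abs(x) for x in frame)


def rms_energy(frame):
    """Root of the mean of the squared amplitudes within the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossings(frame):
    """Number of sign changes between consecutive samples of the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


frame = [0.5, -0.5, 0.25, -1.0]
print(amplitude_envelope(frame))
print(rms_energy(frame))
print(zero_crossings(frame))
```

Dividing the crossing count by the frame length turns it into a rate that can be compared across frame sizes.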
  23. Frequency Domain Features
     Feature                    Description
     Band Energy Ratio (BER)    Ratio between the energy of the lower and the higher frequency bands
     Spectral Centroid          Frequency band where most of the energy is concentrated
     Bandwidth (BW)             Spectral range of the interesting part of a signal
     Spectral Flux              How much the spectrum changes between consecutive frames
     Example usage: Timbre Analysis, Speech Recognition, Onset Detection, Speech/Music Discrimination
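The following sketch computes these four features from magnitude spectra. Each feature has several definitions in the literature; the formulas below (magnitude-weighted mean for the centroid, magnitude-weighted standard deviation for the bandwidth, squared magnitudes for energy, sum of squared differences for the flux) are one common choice picked for this example, not necessarily the ones the speaker had in mind. The function names are made up, and the spectra are assumed to contain some energy.

```python
import math


def spectral_centroid(frequencies, magnitudes):
    """Magnitude-weighted mean frequency of the spectrum."""
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / sum(magnitudes)


def spectral_bandwidth(frequencies, magnitudes):
    """Magnitude-weighted standard deviation around the centroid."""
    centroid = spectral_centroid(frequencies, magnitudes)
    spread = sum(m * (f - centroid) ** 2 for f, m in zip(frequencies, magnitudes))
    return math.sqrt(spread / sum(magnitudes))


def band_energy_ratio(magnitudes, split_bin):
    """Energy below split_bin divided by the energy at and above it."""
    low = sum(m * m for m in magnitudes[:split_bin])
    high = sum(m * m for m in magnitudes[split_bin:])
    return low / high


def spectral_flux(previous_magnitudes, current_magnitudes):
    """Sum of squared differences between two consecutive spectra."""
    return sum((c - p) ** 2 for p, c in zip(previous_magnitudes, current_magnitudes))


frequencies = [0.0, 100.0, 200.0, 300.0]
spectrum_a = [0.0, 1.0, 2.0, 1.0]
spectrum_b = [0.0, 2.0, 2.0, 0.0]
print(spectral_centroid(frequencies, spectrum_a))
print(spectral_bandwidth(frequencies, spectrum_a))
print(band_energy_ratio(spectrum_a, split_bin=2))
print(spectral_flux(spectrum_a, spectrum_b))
```

The centroid and the bandwidth describe the shape of one spectrum, the band energy ratio compares two regions of it, and the flux compares one frame with the next.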
  24. Chroma Features (1/2)
     Chroma features are a powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave. It’s a kind of analysis which bridges low-level and middle-level features, moving the audio signal representation toward something which is more readable from a functional perspective.
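The projection onto 12 bins can be sketched as below. The example assumes equal temperament with A4 = 440 Hz and assigns each spectral bin to its nearest semitone; production chroma extractors use smoother filter banks and tuning estimation, so this is only an illustration. The name `chroma_vector` is made up.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(frequencies_hz, magnitudes, a4_hz=440.0):
    """Fold a magnitude spectrum onto 12 pitch-class bins (index 0 = C)."""
    chroma = [0.0] * 12
    for frequency, magnitude in zip(frequencies_hz, magnitudes):
        if frequency <= 0:
            continue  # the 0 Hz bin has no pitch
        # Nearest MIDI note number, assuming A4 (note 69) = a4_hz.
        midi_note = round(69 + 12 * math.log2(frequency / a4_hz))
        # All octaves of the same note are accumulated in one bin.
        chroma[midi_note % 12] += magnitude
    return chroma


# Spectral peaks: three octaves of A plus one C.
frequencies = [220.0, 261.63, 440.0, 880.0]
magnitudes = [1.0, 1.5, 2.0, 0.5]
chroma = chroma_vector(frequencies, magnitudes)
for name, energy in zip(PITCH_CLASSES, chroma):
    print(name, energy)
```

The three A peaks collapse into a single bin, which is why chroma vectors are largely insensitive to the octave in which a note is played.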
  25. Chroma Features (2/2) [Figure: chromagram with a time axis and a pitch-class axis (C, C#, D, D#, E, F, F#, G, G#, A, A#, B); the frames are annotated with their dominant pitch classes (mostly A, F, C and B) and one region is labelled NOISE]
  26. Interesting Projects: FALCON: FAst Lucene-based Cover sOng identification | chromaprint (part of AcoustID)
  27. Agenda ➢ Music Information Retrieval (MIR) ➢ Music Essentials ➢ Audio Processing ✓ Q&A
  28. Thank you!