Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Musical Information Retrieval Take 2: Interval Hashing Based Ranking

434 views

Published on

Retrieving musical records from a corpus of Information, using an audio input as a query is not an easy task. Various approaches try to solve the problem modelling the query and the corpus of Information as an array of hashes calculated from the chroma features of the audio input.
Scope of this talk is to introduce a novel approach in calculating such hashes, considering the intervals of the most intense pitches of sequential chroma vectors.
Building on the theoretical introduction, a prototype will show you this approach in action with Apache Solr with a sample dataset and the benefits of positional queries.
Challenges and future works will follow up as conclusive considerations.

Published in: Software
  • Be the first to comment

  • Be the first to like this

Musical Information Retrieval Take 2: Interval Hashing Based Ranking

  1. 1. London Information Retrieval Meetup 21 OCT 2019 Music Information Retrieval Take 2 Interval Hashing Based Ranking Andrea Gazzarini, Software Engineer 21th October 2019
  2. 2. London Information Retrieval Meetup ▪ Software Engineer (1999-) ▪ “Hermit” Software Engineer (2010-) ▪ Java & Information Retrieval Passionate ▪ Apache Qpid (past) Committer ▪ Husband & Father ▪ Bass Player Andrea Gazzarini, “Gazza”
  3. 3. London Information Retrieval Meetup Sease Search Services ● Open Source Enthusiasts ● Apache Lucene/Solr experts ! Community Contributors ● Active Researchers ● Hot Trends : Learning To Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
  4. 4. London Information Retrieval Meetup ✓Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ➢ Chroma Feature ➢ Music Language Essentials ➢ Ranked Interval Hashing ➢ Next steps ➢ Q&A Agenda
  5. 5. London Information Retrieval Meetup MIR is concerned with the extraction, analysis and usage of information about any kind of music entity (e.g. a song or a music artist) on any representation level (for example, audio signal, symbolic MIDI representation of a piece of music, or name of a music artist).” Schedl, M.: Automatically extracting, analyzing and visualizing information on music artists from the world wide web. Dissertation, Johannes Kepler University, Wien (2003) Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications. Those involved in MIR may have a background in in musicology, psychoacoustics, psychology, academic music study, signal processing, informatics, machine learning, optical music recognition, computational intelligence or some combination of these. https://en.wikipedia.org/wiki/Music_information_retrieval Music Information Retrieval (MIR)
  6. 6. London Information Retrieval Meetup AUDIO IDENTIFICATION GENRE IDENTIFICATION TRANSCRIPTION RECOMMENDATION SYMBOLIC SIMILARITY MOOD SOURCE SEPARATION INSTRUMENT RECOGNITION TEMPO ESTIMATION SCORE ALIGNMENT BEAT TRACKING QUERY BY HUMMINGAUDIO IDENTIFICATION INSTRUMENT RECOGNITION GENRE IDENTIFICATION TRANSCRIPTION RECOMMENDATION TEMPO ESTIMATION SONG STRUCTURE SCORE ALIGNMENT COVER SONG DETECTION SYMBOLIC SIMILARITY KEY DETECTION BEAT TRACKING MOOD SOURCE SEPARATION Music Information Retrieval (MIR)
  7. 7. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ✓Query By Humming / Cover Song Detection ➢ Chroma Feature ➢ Music Language Essentials ➢ Ranked Interval Hashing ➢ Next steps ➢ Q&A Agenda
  8. 8. London Information Retrieval Meetup Query by humming (QbH) is a music retrieval system that branches off the original classification systems of title, artist, composer, and genre. It normally applies to songs or other music with a distinct single theme or melody. The system involves taking a user-hummed melody (input query) and comparing it to an existing database. (Wikipedia) What is it? Query by Humming/Playing The main challenge of such query paradigm is the difference, in terms of audio signal, between the input query and the stored dataset.
  9. 9. London Information Retrieval Meetup A cover song is an alternative version/recording of a previously recorded song. A cover can differ from the original song in several features: key, timbre, tempo, structure, arrangement, language, even melody and lyrics. The goal of a cover song detection search algorithm is to identify, for a query audio track, recordings that belong to the same conceptual work/composition. The main challenges are: • find a set of features that capture the common aspects of the target compositions • give a rank to each response items (generally not so important) What is it? Cover Song Detection
  10. 10. London Information Retrieval Meetup QBH + CSD
  11. 11. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ✓Chroma Feature ➢ Music Language Essentials ➢ Ranked Interval Hashing ➢ Next steps ➢ Q&A Agenda
  12. 12. London Information Retrieval Meetup Chroma features are a powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave. The base assumption is that in music, notes exactly one octave apart are perceived as “similar”. As consequence of that, working on the distribution of chroma without the absolute frequency can give useful musical information about the audio It’s a kind of analysis which bridges between low- level and middle-level features, moving the audio signal representation towards something which is more readable, from a functional perspective. Chroma Feature Chroma Feature (1/5) Source: Wikipedia
  13. 13. London Information Retrieval Meetup Chroma Feature (2/5) C D D# D DD D D DC C C E F F# A A# B F# F# BB EF D C D F
  14. 14. London Information Retrieval Meetup A Chroma Matrix consists of a 12 * tn matrix where each vector represents the twelve pitches strengths for a given instant t. Sometimes the matrix is inverted so the same information described above is available at row-level. Chroma features can be computed using different tools available on the internet including: • LabROSA audio analysis tools (partially portable to Octave) • opensource tools written in several languages Chroma Matrix Chroma Feature (3/5) Pitches strengths at instant t1 where you can see the top (i.e. most intense) 3 classes are A#, B and A
  15. 15. London Information Retrieval Meetup Time C D E F G A B C# D# F# G# A# A A A A A A C A F F F F F F FG C C C C C C D C B B B B B B C B N O I S E Chroma Feature (4/5)
  16. 16. London Information Retrieval Meetup Chroma Feature (5/5)
  17. 17. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ➢ Chroma Feature ✓Music Language Essentials ➢ Ranked Interval Hashing ➢ Next steps ➢ Q&A Agenda
  18. 18. London Information Retrieval Meetup A note is used for denoting a sound, its pitch and duration A sound is the audio signal produced by a vibrating body Notes are associated to graphical symbols (for indicating the pitch and the duration) Two notes with the same fundamental frequency in a ratio of any integer power of two are perceived as similar. As consequence of that, we say they belong to the same pitch class A note is also used for denoting a pitch class. The Western Music Theory individuates 12 pitch classes Notes and Pitch classes are associated to mnemonic codes (e.g. C,D,E,F,G,A,B or DO,RE,MI,FA,SOL,LA,SI) C D E F G A B C B A G F E D C C# D# F# G# A# Bb Ab Gb Eb Db Music Language Essentials The distance between two subsequent notes is called a semitone
  19. 19. London Information Retrieval Meetup Text Music Letter Note Word Phrase Sentence Chord Ghost Note Phrase Text vs Music
  20. 20. London Information Retrieval Meetup Time Signature Key Signature Clef Tempo Note Reference Chord Chord Score music representation
  21. 21. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ➢ Chroma Feature ➢ Music Language Essentials ✓Ranked Interval Hashing ➢ Next steps ➢ Q&A Agenda
  22. 22. London Information Retrieval Meetup We’ve already seen Chroma Features analysis normalises the entire spectrogram in classes, loosing the absolute frequency. That introduces a normalisation which actually puts on the same layer notes belonging to the same class. This strictly relates to a concept known in music theory as “Inversion” “Inversion, in music, is the rearrangement of the top- to-bottom elements in an interval, a chord, a melody, or a group of contrapuntal lines of music.” (Source: Encyclopedia Britannica) Inversions Assumption #1: Inversions as “Synonyms” = = = AND = IF AND THEN ~ ~ =~ =~ =~
  23. 23. London Information Retrieval Meetup In Music Theory, an interval is defined as the distance between two notes. In Western Music Theory the minimum distance between two notes is called semitone. As consequence of that, an interval measures the number of semitones that occurs between two notes. The intervals concept allows to translate any sequence of notes. If we have a melody starting from a given note (e.g. C) we can translate the whole sequence using a different starting point (e.g. A): the most important thing is that intervals must remain the same. Intervals Assumption #2: Intervals instead of Notes!
  24. 24. London Information Retrieval Meetup A simple matrix which defines all the possible intervals between two notes. Each interval has a precise name in Western Music Theory (e.g. 0 = unison), but for our purposes seeing the number (of semitones) is more important The interval between two arbitrary notes can be easily computed by using the following formula: (defn semitones [x y] (if (>= y x) (- y x) (+ (- 11 (- x 1)) y))) Intervals Table Intervals Table C C# D D# E F F# G G# A A# B C 0 1 2 3 4 5 6 7 8 9 10 11 C# 11 0 1 2 3 4 5 6 7 8 9 10 D 10 11 0 1 2 3 4 5 6 7 8 9 D# 9 10 11 0 1 2 3 4 5 6 7 8 E 8 9 10 11 0 1 2 3 4 5 6 7 F 7 8 9 10 11 0 1 2 3 4 5 6 F# 6 7 8 9 10 11 0 1 2 3 4 5 G 5 6 7 8 9 10 11 0 1 2 3 4 G # 4 5 6 7 8 9 10 11 0 1 2 3 A 3 4 5 6 7 8 9 10 11 0 1 2 A# 2 3 4 5 6 7 8 9 10 11 0 1 B 1 2 3 4 5 6 7 8 9 10 11 0
  25. 25. London Information Retrieval Meetup For each Chroma Vector we keep the top k most intense pitch energies. Each class is denoted by its ordinal position (the examples on the left provides, for simplicity, the corresponding mnemonic code as well). The underlying assumption is that in order to represent and summarise a chroma vector, the stronger a class energy is, the better. • INPUT => n Chroma vectors • OUTPUT => n Rank vectors Top K classes Step #1: Ranks 0.28956729767718553 0.29918483818230696 0.29750046253718804 0.26970075757159021 0.26570971680808941 0.28762092331555628 0.27230350897098865 0.26479060237734442 0.24563184404410343 0.35945972816613164 0.20234795661509283 0.19691813882136551 0.21751928582300245 0.31682129361484279 0.21327678582822374 0.19345434437896755 0.19067933460430175 0.19899201366683592 0.20626346569405868 0.11096279563650845 0.18864723689615112 0.15732581183385463 0.16582685461399227 0.13454776152251802 0.17659990800974107 0.12781101466846442 0.12998904712393031 0.17361679104251077 0.15805875199726557 0.11070659008562342 0.12210923914829071 0.14658222225322309 0.14652990682934677 0.10462518086147332 0.11147004101461088 0.11962189455960689 0.13012578265891636 0.08802297436725277 0.08795354261988087 0.07726694847307212 0.10615064267867232 0.06285819867921958 0.06280150731135973 0.04137698779765421 0.07978125487239938 0.04066800213125138 0.03900988562456922 0.01849582605529331 0 - C 2 - D 0 - C 0 - C 1 - C# 3 - D# 1 - C# 1 - C# 2 - D 0 - C 3 - D# 2 - D 3 - D# 1 - C# 4 - E 3 - D# 4 - E 4 - E 2 - D 6 - F#
  26. 26. London Information Retrieval Meetup For each pair of Rank vectors, for each pair of pitch classes sharing the same index, we compute the corresponding interval. The underlying assumption is that we want to have a relative model which works regardless the input key/tonality. • INPUT => n Rank Vectors • OUTPUT => n - 1 Interval Vectors Distances Step #2: Intervals 0 - C 2 - D 0 - C 0 - C 1 - C# 3 - D# 1 - C# 1 - C# 2 - D 0 - C 3 - D# 2 - D 3 - D# 1 - C# 4 - E 3 - D# 4 - E 4 - E 2 - D 6 - F# (defn semitones [x y] (if (>= y x) (- y x) (+ (- 11 (- x 1)) y))) 2 10 0 2 10 0 10 3 11 10 3 11 0 10 4
  27. 27. London Information Retrieval Meetup Step #3: Hashing Each Interval Vector is reduced to a number (hash). The hash uniquely represents a given intervals distribution. • INPUT => n-1 Interval Vectors • OUTPUT => 1 Hash Vector Summarisation 2 10 0 … 2 10 0 … 10 3 11 … 10 3 11 … 0 10 4 … ∑xi *12^i 18476 213106 103536 …
  28. 28. London Information Retrieval Meetup Step #4: Partitions The hash vector is partitioned using a size and an overlap parameters (see examples on the left). An audio input is therefore represented as a list of partitions derived from the hash vector where • each partition is a term • ordering is preserved: • in the term metadata (e.g. position increment) • implicitly in the term value Bags of Bags of Words 39483 4149 1440 3213 33232 44256 6654 4432 4149 7843 … Partition of 3 hashes without overlap [39483, 4149, 1440] [3213, 33232, 44256] [6654,4432, 4149], … Partition of 3 hashes with overlap = 1 [39483, 4149, 1440] [1440, 3213, 33232] [33232, 44256, 6654], … Partition of 5 hashes without overlap [39483,4149,1440,3213,33232] [44256,6654,4432,4149,7843] … (Examples)
  29. 29. London Information Retrieval Meetup Step #5: Multiple choices for the output { "title": "Zombie", "hashes": [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ] } Only one partition type Only one partition type with different overlaps { "title": "Zombie", “hashes_overlap_0”: [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ], “hashes_overlap_1”: [“39483,4149,1440”, “1440,3213,33232”, “33232,44256,6654”, … ], … } Multiple partition types { "title": "Zombie", “hashes_3”: [“39483,4149,1440”, “3213,33232,44256”, “6654,4432, 4149”, … ], “hashes_5”: [“39483,4149,1440,3213,33232”, “33232,44256,6654,4432,4149”, … ], … }
  30. 30. London Information Retrieval Meetup Sample test query Sample Data Sample test data Money, Pink Floyd https://www.youtube.com/watch?v=-0kcet4aPpQ {Q1 - Main Bass Riff (original tonality) Q2 - Main Bass Riff (different tonality) Q3 - Descendent pentatonic scale (different tonality) Zombie, Cranberries https://www.youtube.com/watch?v=6Ejga4kJUts { Q1 - Main Bass Riff (original tonality) Q2 - Main Melody Q3 - Chorus Stand By Me, John Lennon https://www.youtube.com/watch?v=YqB8Dm65X18 {Q1 - Main Bass Riff (different tonality) Q2 - Main Melody
  31. 31. London Information Retrieval Meetup V1.5 Sample Config V1.0 <requestHandler name="standard"> <lst name="defaults"> <str name="defType">edismax</str> <str name="qf">hash</str> <str name="mm">10%</str> <str name="fl">*,score</str> </lst> </requestHandler><lst name="defaults"> <str name="defType">edismax</str> <str name="qf">hash_3</str> <str name="fl">*,score</str> <str name="pf2">hash_2 hash_3^10 hash_4^100 hash_5^1000</str> <str name="ps2">3</str> <str name="pf3">hash_2^100 hash_3^1000 hash_4^10000 hash_5^100000</str> <str name="ps3">2</str> <str name="boost">product(query({!lucene df=hash_2 v=$q}), 10)</str> <str name="boost">product(query({!lucene df=hash_3 v=$q}), 100)</str> <str name="boost">product(query({!lucene df=hash_4 v=$q}), 1000)</str> <str name="boost">product(query({!lucene df=hash_5 v=$q}), 10000)</str> <str name="tie">0.1</str>
  32. 32. London Information Retrieval Meetup Nice shot #1 Q2 - Money (Pink Floyd): main bass riff (different tonality) Q1 - Money (Pink Floyd): main bass riff (original tonality) Same melody (bass riff) in different tonalities yield similar results! P@1 P@2 P@10 NDCG
  33. 33. London Information Retrieval Meetup Nice shot #2 Q2 - Zombie (Cranberries): main melody (different tonality) Q1 - Zombie (Cranberries): main bass riff (original tonality) Position matters! P@1 P@2 P@10 NDCG
  34. 34. London Information Retrieval Meetup Nice shot #3 Q1 - Zombie (Cranberries): main bass riff (original tonality) Good similarity even if the query pattern is not in result. P@1
  35. 35. London Information Retrieval Meetup To be improved Q1 - Zombie (Cranberries): main bass riff (original tonality) Same query as before, red items have a very different “shape” P@1
  36. 36. London Information Retrieval Meetup To be improved If we were looking for “Money” these would look like great results… …but unfortunately the input query here is the “Stand by me” bass line
  37. 37. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ➢ Chroma Feature ➢ Music Language Essentials ➢ Ranked Interval Hashing ✓Next Steps ➢ Q&A Agenda
  38. 38. London Information Retrieval Meetup Further areas of investigation • Different types of Chroma Features • “Stop” chroma vectors / intervals / energies • Input data understanding & pre-processing • Complementary approaches • Additional audio features • Query time • Machine Learning
  39. 39. London Information Retrieval Meetup Chroma Matrix Clean Up Original Threshold = 50% Threshold = 70% Threshold = 80% Top frequency
  40. 40. London Information Retrieval Meetup ➢ Music Information Retrieval (MIR) ➢ Query by Humming and Cover Song Detection ➢ Chroma Feature ➢ Music Language Essentials ➢ Ranked Interval Hashing ➢ Next Steps ✓Q&A Agenda
  41. 41. London Information Retrieval Meetup Links Chroma Feature Analysis and Synthesis https://labrosa.ee.columbia.edu/matlab/chroma-ansyn/ Chroma (Java) https://github.com/adiyoss/Chroma CHROMA TOOLBOX: MATLAB IMPLEMENTATIONS FOR EXTRACTING VARIANTS OF CHROMA-BASED AUDIO FEATURES https://www.academia.edu/30760846/ CHROMA_TOOLBOX_MATLAB_IMPLEMENTATIONS_FOR_EXTRACTING_VARIANTS_OF_CH ROMA-BASED_AUDIO_FEATURES FALCON: FAst Lucene-based Cover sOng identification https://www.researchgate.net/publication/221571301_FALCON_FAst_Lucene-based_Cover_sOng_identification Music Theory https://en.wikipedia.org/wiki/Music_theory Inversions https://en.wikipedia.org/wiki/Inversion_(music) Intervals https://en.wikipedia.org/wiki/Interval_(music) Chroma Feature https://en.wikipedia.org/wiki/Chroma_feature
  42. 42. London Information Retrieval Meetup 21 OCT 2019 Thank you! Music Information Retrieval Take 2 Interval Hashing Based Ranking Andrea Gazzarini, Software Engineer 21th October 2019

×