Music Information Retrieval
DNA-like classification of publicly accessible broadcast stations
Author:
Ganesh K Harugeri
Examiner:
apl. Prof. Dr. habil. Marcus Liwicki
What is Information Retrieval?
● Information retrieval (IR) is the discipline concerned with retrieving unstructured data
● Mainly textual documents
● Retrieval is performed in response to a query or topic statement
● The query itself may be unstructured or structured
● Example:
○ A sentence or even another document (unstructured)
○ A Boolean expression (structured) [1]
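To make the structured case concrete, here is a minimal sketch of Boolean retrieval over an inverted index. The documents, the tokenizer (lower-casing and splitting on whitespace) and the function names are hypothetical illustrations, not part of the study.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Conjunctive query: documents containing every term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Disjunctive query: documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


# Hypothetical toy collection
docs = {
    "d1": "radio station playlist",
    "d2": "audio fingerprint of a radio broadcast",
    "d3": "music metadata service",
}
index = build_index(docs)
print(sorted(boolean_and(index, "radio", "fingerprint")))  # ['d2']
print(sorted(boolean_or(index, "playlist", "metadata")))   # ['d1', 'd3']
```

An unstructured query (a sentence) would instead be tokenized and scored against each document, for example by term overlap, rather than evaluated as a set expression.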
What is Music Information Retrieval?
● Music information retrieval (MIR) is an interdisciplinary research field involving
musicology, psychology, signal processing, machine learning, or a combination
of these disciplines.
● Music information retrieval covers the following tasks on large amounts of music
or music-related data:
○ Discovery
○ Analysis
○ Collection
Objective of the Study
● Germany has over 200 publicly accessible radio stations
● Systematically classify these radio stations
● Additionally, characterise each station by a set of empirical factors
● After evaluation, each station can be assigned a DNA-like identification
● Examples of empirical factors:
○ Proportion of German-speaking artists (%)
○ Proportion of newcomer artists (%)
○ Proportion of distinct titles (%), etc.
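Each factor is a share of a station's airplay. A minimal sketch of that calculation, assuming a play log of (artist, title) pairs and a hypothetical set of German-speaking artists. It also assumes that "distinct titles" means the number of unique artist-title pairs divided by the number of plays; the study's exact definition is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so plays can go into a set
class Play:
    artist: str
    title: str


def share(count: int, total: int) -> float:
    """Percentage rounded to one decimal place; 0.0 for an empty log."""
    return round(100.0 * count / total, 1) if total else 0.0


def german_speaking_share(plays: list[Play], german_artists: set[str]) -> float:
    """Factor 1 (assumed form): % of plays by artists in the given set."""
    hits = sum(1 for p in plays if p.artist in german_artists)
    return share(hits, len(plays))


def distinct_titles_share(plays: list[Play]) -> float:
    """Assumed definition: unique (artist, title) pairs as % of all plays."""
    return share(len(set(plays)), len(plays))


# Hypothetical play log for one station
log = [
    Play("Artist A", "Song 1"),
    Play("Artist B", "Song 2"),
    Play("Artist A", "Song 1"),  # the same title aired a second time
    Play("Artist C", "Song 3"),
]
print(german_speaking_share(log, {"Artist A"}))  # 50.0
print(distinct_titles_share(log))                # 75.0
```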
The Study Workflow
Part 1
The study is divided into three tasks:
● Data collection
● Data validation
● Data analysis
Part 2
A research experiment:
● Use of audio fingerprinting in broadcast monitoring
Data collection
● Data is collected from multiple sources:
○ A commercial music monitoring service
○ Web scrapers (for most of the classical radio stations)
● Commercial source
○ Data delivered as CSV files
● Web scrapers
○ Built with jsoup (a Java HTML parser library)
● Requester programs (collective work)
○ These collect music metadata from music information services
○ Two types: programs that take an artist as input, and programs that take an
artist-title combination as input
○ The services queried are Deezer, Discogs, MusicBrainz, SoundCloud and Spotify
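The two requester types differ mainly in the query they send. As an illustration using one of the listed services, the sketch below builds (but does not send) MusicBrainz Web Service v2 search requests against `/ws/2/artist` and `/ws/2/recording`, using a Lucene-style `query` parameter and `fmt=json`. The function names and the User-Agent string are hypothetical, and the study's requesters may have been implemented differently. MusicBrainz asks clients to send an identifying User-Agent and to respect its rate limit.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

BASE = "https://musicbrainz.org/ws/2"
# Hypothetical identifier: MusicBrainz expects an application name and a contact.
USER_AGENT = "StationDNA/0.1 (contact@example.org)"


def phrase(value: str) -> str:
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def artist_request(artist: str, limit: int = 5) -> Request:
    """Requester type 1 (hypothetical helper): artist name only."""
    params = {"query": f"artist:{phrase(artist)}", "fmt": "json", "limit": limit}
    return Request(f"{BASE}/artist?{urlencode(params)}",
                   headers={"User-Agent": USER_AGENT})


def recording_request(artist: str, title: str, limit: int = 5) -> Request:
    """Requester type 2 (hypothetical helper): artist-title combination."""
    query = f"recording:{phrase(title)} AND artist:{phrase(artist)}"
    params = {"query": query, "fmt": "json", "limit": limit}
    return Request(f"{BASE}/recording?{urlencode(params)}",
                   headers={"User-Agent": USER_AGENT})


req = recording_request("Kraftwerk", "Autobahn")
print(req.full_url)
print(parse_qs(urlsplit(req.full_url).query)["query"])
# Sending it: urllib.request.urlopen(req), then json.load() on the response.
```

Keeping request construction separate from sending makes it easy to put a cache in front of the network call, so that an artist or artist-title pair already looked up is not requested again.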
Data Flow and Data Collection Process
Input for Requesters and their Output Usage
Experiment on Audio Fingerprinting
● What is audio fingerprinting?
○ A compact summary of an audio object using a limited number of bits [2]
● Why audio fingerprinting?
○ To find resource-efficient ways of accelerating the process
○ To analyse how effectively audio fingerprinting can be used in broadcast monitoring
○ To reduce the number of repeated requests to music information services
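To illustrate what "a summary using a limited number of bits" can look like, here is a minimal sketch of one classic scheme, the energy-difference sub-fingerprint of Haitsma and Kalker (Philips). This is a swapped-in technique for illustration, not the musicg algorithm used in the study. It assumes per-frame band energies have already been computed (for example from an FFT); the original scheme uses 33 bands, giving 32-bit sub-fingerprints. Two recordings are compared by their bit error rate.

```python
def sub_fingerprints(energies: list[list[float]]) -> list[int]:
    """One integer per frame n >= 1. Bit m is 1 when the band-energy difference
    E[n][m] - E[n][m+1] grew compared with the previous frame, i.e.
    (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0."""
    prints = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        bits = 0
        for m in range(len(cur) - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if delta > 0 else 0)
        prints.append(bits)
    return prints


def bit_error_rate(a: list[int], b: list[int], bits_per_frame: int) -> float:
    """Fraction of differing bits between two equal-length fingerprint blocks."""
    if len(a) != len(b) or not a:
        raise ValueError("blocks must be non-empty and of equal length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * bits_per_frame)


# Hypothetical band energies: 3 frames x 4 bands -> 3 bits per sub-fingerprint
E = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 3, 4]]
fp = sub_fingerprints(E)
print(fp)                             # [7, 0]
print(bit_error_rate(fp, fp, 3))      # 0.0
print(bit_error_rate(fp, [5, 0], 3))  # one of six bits differs
```

Because only the signs of energy differences are kept, moderate changes in loudness or noise tend to flip few bits, which is what makes such fingerprints usable for recognising the same song across broadcasts.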
Audio Fingerprint Realisation
● Several recent fingerprinting APIs were analysed
● The musicg API was chosen as the best fit for the system requirements
● The system takes a “.wav” file as input and generates its fingerprint
● Processing steps:
○ A 24-hour audio stream is split into individual songs (.mp3)
○ The “.mp3” files are converted to “.wav” with ffmpeg
○ Each “.wav” input yields a fingerprint as a byte stream
○ The fingerprint can be saved as a file or stored as a byte[] in a database
○ The API also provides a function to compute the similarity between songs
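The processing steps above can be sketched as follows. The sketch assumes song boundaries are known as start offsets (in seconds) into the 24-hour recording, for example from the monitoring playlist. It only builds ffmpeg command lines (`-y` overwrite, `-ss` start, `-t` duration, `-i` input; the output format follows from the `.wav` extension) and stores fingerprint bytes as a SQLite BLOB. File names, the table layout and the helper names are hypothetical, and the fingerprint bytes would come from musicg, which is not called here.

```python
import sqlite3
from typing import Optional


def song_segments(starts: list[float], stream_length: float) -> list[tuple[float, float]]:
    """Turn sorted song start offsets into (start, duration) pairs;
    the last song runs until the end of the recording."""
    ends = starts[1:] + [stream_length]
    return [(s, e - s) for s, e in zip(starts, ends)]


def ffmpeg_cut_to_wav(stream: str, start: float, duration: float, out_wav: str) -> list[str]:
    """argv that cuts one song out of the stream and writes it as WAV.
    Run it with subprocess.run(argv, check=True)."""
    return ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
            "-i", stream, out_wav]


def save_fingerprint(conn: sqlite3.Connection, song_id: str, fp: bytes) -> None:
    """Store the fingerprint bytes (the byte[] in the Java system) as a BLOB."""
    conn.execute("INSERT OR REPLACE INTO fingerprints (song_id, fp) VALUES (?, ?)",
                 (song_id, fp))


def load_fingerprint(conn: sqlite3.Connection, song_id: str) -> Optional[bytes]:
    row = conn.execute("SELECT fp FROM fingerprints WHERE song_id = ?",
                       (song_id,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fingerprints (song_id TEXT PRIMARY KEY, fp BLOB)")

# Hypothetical playlist offsets for a 600-second excerpt of the recording
segments = song_segments([0.0, 185.0, 410.0], stream_length=600.0)
for i, (start, dur) in enumerate(segments):
    print(" ".join(ffmpeg_cut_to_wav("stream_24h.mp3", start, dur, f"song_{i}.wav")))

save_fingerprint(conn, "song_0", b"\x01\x02\x03")  # placeholder, not a real fingerprint
print(load_fingerprint(conn, "song_0"))
```

Placing `-ss` before `-i` makes ffmpeg seek in the input, which is much faster than decoding from the start when cutting songs late in a 24-hour file.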
Audio Fingerprinting Framework
Fingerprint Time Evaluation Analysis
Conclusions
Share of airplay per empirical factor (factor numbers in parentheses; "Serious/upscale" = E-/gehobene Musik)
Station   German-language (1)  Serious/upscale (2)  Regional (6)  Niche (7)  Newcomer (8)  Distinct titles (10)
Station1  16%                  0%                   1%            17%        26%           3%
Station2  12%                  0%                   0%            31%        10%           6%
Station3  17%                  0%                   0%            29%        25%           15%
Station4  18%                  0%                   0%            18%        21%           7%
Station5  3%                   1%                   0%            8%         10%           4%
Conclusions
● About 16% of the songs aired by Station1 were by German-speaking artists (factor 1).
● 25% of the songs aired by Station3 were by so-called newcomers, i.e. artists with
no more than two published albums (factor 8).
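The newcomer rule behind factor 8 can be written down directly. A minimal sketch, assuming album counts per artist have already been obtained from one of the music information services (e.g. Discogs). The data below is made up, and leaving plays by artists with an unknown album count out of the denominator is an assumption, not necessarily what the study did.

```python
from typing import Optional

NEWCOMER_MAX_ALBUMS = 2  # "not more than 2 albums published"


def is_newcomer(album_count: int) -> bool:
    return album_count <= NEWCOMER_MAX_ALBUMS


def newcomer_share(played_artists: list[str], album_counts: dict[str, int]) -> Optional[float]:
    """% of plays by newcomer artists. Plays whose artist has no known
    album count are excluded from the denominator (assumption)."""
    known = [a for a in played_artists if a in album_counts]
    if not known:
        return None
    newcomers = sum(1 for a in known if is_newcomer(album_counts[a]))
    return round(100.0 * newcomers / len(known), 1)


# Hypothetical plays and album counts; "Artist D" has no known count
plays = ["Artist A", "Artist B", "Artist A", "Artist D", "Artist C"]
counts = {"Artist A": 1, "Artist B": 7, "Artist C": 2}
print(newcomer_share(plays, counts))  # 75.0
```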
Other Benefits
● Investigations can easily be conducted over a defined period, such as a month
● Documenting the empirical factors supports the calculation of royalties or the
identification of copyright infringements
● Content categorisation makes it possible to give each station a unique description
based on the variety of its content
References
● [1] Ed Greengrass. “Information Retrieval: A Survey”. 2000.
● [2] S. Shum. “An Introduction to Audio Fingerprinting”. SLS Group Meeting, October 2011.
URL: http://people.csail.mit.edu/sshum/talks/audio_fingerprinting_sls_24Oct2011.pdf
