This document discusses music information retrieval and classifying German radio stations based on their content. It involves collecting music metadata from over 200 publicly accessible German radio stations using various data sources. The stations are then analyzed and classified based on empirical factors like the proportion of German artists played, newcomer artists, distinct song titles, etc. An experiment is also described to use audio fingerprinting to identify songs from radio broadcasts to reduce requests to music databases and help analyze broadcast monitoring. The results show classifications of 5 sample stations based on the empirical factors. Overall, the study aims to systematically classify radio stations in Germany based on their music content.
Thesis presentation on Music Information Retrieval
1. Music Information Retrieval
DNA-like classification of publicly accessible broadcast stations
Author:
Ganesh K Harugeri
Examiner:
apl. Prof. Dr. habil. Marcus Liwicki
2. What is Information Retrieval?
● It is a discipline that deals with the retrieval of unstructured data
● Mainly textual documents
● Retrieval happens in response to a query or topic statement
● The query itself may be unstructured or structured
● Example:
○ A sentence or even another document (unstructured)
○ A boolean expression (structured) [1]
3. What is Music Information Retrieval?
● Music information retrieval is an interdisciplinary research field involving
musicology, psychology, signal processing, machine learning, or a combination
of these disciplines.
● For large amounts of music or music-related data, music information retrieval includes
○ Discovery
○ Analysis and
○ Collection
4. The Objective of the Study
● Germany has over 200 publicly accessible radio stations
● Evaluate and systematically classify these radio stations
● Additionally, interpret these stations by a set of empirical factors
● After evaluation, the study assigns each station a DNA-like identification
● A few of the empirical factors are:
○ Proportion of German-speaking artists (%)
○ Proportion of newcomer artists (%)
○ Proportion of distinct titles (%) etc.
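A minimal sketch of how two of these factors could be computed from a station's playlist. The slides do not give the exact formulas, so this assumes both proportions are taken over all plays in the observed period; the class and field names (Play, StationFactors, germanSpeakingArtist) are illustrative and not from the study.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** One aired song. Field names are hypothetical, not taken from the study. */
final class Play {
    final String artist;
    final String title;
    final boolean germanSpeakingArtist;

    Play(String artist, String title, boolean germanSpeakingArtist) {
        this.artist = artist;
        this.title = title;
        this.germanSpeakingArtist = germanSpeakingArtist;
    }
}

final class StationFactors {
    /** Percentage of plays whose artist is flagged as German speaking. */
    static double germanSpeakingShare(List<Play> plays) {
        if (plays.isEmpty()) {
            return 0.0;
        }
        long german = plays.stream().filter(p -> p.germanSpeakingArtist).count();
        return 100.0 * german / plays.size();
    }

    /** Distinct artist-title pairs as a percentage of all plays (assumed definition). */
    static double distinctTitleShare(List<Play> plays) {
        if (plays.isEmpty()) {
            return 0.0;
        }
        Set<String> distinct = new HashSet<>();
        for (Play p : plays) {
            // Case-insensitive key so "Song" and "SONG" count as one title.
            distinct.add(p.artist.toLowerCase(Locale.ROOT) + "\t" + p.title.toLowerCase(Locale.ROOT));
        }
        return 100.0 * distinct.size() / plays.size();
    }
}
```

A station that repeats the same few songs all day gets a low distinct-title share, while a station with a broad rotation approaches 100%.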
5. The Study Workflow
Part-1
The study is divided into three tasks:
● Data collection
● Data validation and
● Data analysis
Part-2
As a research experiment
● Audio fingerprint use in broadcast monitoring
6. Data collection
● Multiple sources were needed to collect all required data
○ A commercial music monitoring service
○ Web scrapers (for most of the classical radio stations)
● Commercial sources
○ Data delivered as CSV
● Web scrapers
○ Library used: jsoup
● Requester programs (collective work)
○ These collect music metadata from music information services
○ Two types: programs taking an artist as input, and programs taking an
artist-title combination as input
○ The services are Deezer, Discogs, MusicBrainz, SoundCloud and Spotify
9. Experiment on Audio Fingerprint
● What is audio fingerprinting?
○ It is a summary of an audio object using a limited number of bits [2].
● Why audio fingerprinting is needed
○ Resourceful ways to accelerate the process are crucial
○ To analyse how effectively audio fingerprinting can be used in broadcast monitoring
○ To reduce the number of repeated requests to music information services
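The request-saving idea can be sketched as a cache keyed by fingerprint: a music information service is asked only when a fingerprint has not been seen before. This is an illustration under a simplifying assumption, not the system from the study: it matches fingerprints byte for byte, whereas two recordings of the same song usually need a similarity comparison. MetadataCache and remoteLookup are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class MetadataCache {
    // ByteBuffer compares by content, so equal byte arrays map to the same key.
    private final Map<ByteBuffer, String> byFingerprint = new HashMap<>();
    private int remoteRequests = 0;

    /** Returns cached metadata; calls remoteLookup only for fingerprints not seen before. */
    String lookup(byte[] fingerprint, Function<byte[], String> remoteLookup) {
        // Clone so later changes to the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(fingerprint.clone());
        String cached = byFingerprint.get(key);
        if (cached != null) {
            return cached;
        }
        remoteRequests++;
        String metadata = remoteLookup.apply(fingerprint);
        byFingerprint.put(key, metadata);
        return metadata;
    }

    /** Number of lookups that actually reached the remote service. */
    int remoteRequests() {
        return remoteRequests;
    }
}
```

With a cache like this, a song played twenty times a day would trigger one remote request instead of twenty, provided its fingerprint is stable across plays.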
10. Audio Fingerprint Realisation
● Various recent APIs were analysed
● Finally, the musicg API was used, as it best suited the system requirements
● The system takes a “.wav” file as input and generates fingerprints
● Use of the system:
○ A 24-hour audio stream is split into songs (.mp3)
○ The “.mp3” files are converted to “.wav” (ffmpeg)
○ The wave input generates a fingerprint as a byte stream
○ The fingerprint can be saved as a file or as byte[] in a database
○ It also has a function to find the similarity with other songs
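The conversion and storage steps above could look roughly like the following. This is a sketch under assumptions: ffmpeg is expected on the PATH, the class and method names (FingerprintPipeline, convertToWav, saveFingerprint) are hypothetical, and the musicg call that produces the fingerprint bytes is left out, so the byte[] is simply passed in.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class FingerprintPipeline {
    /** Builds the ffmpeg command that decodes an .mp3 file into a .wav file. */
    static List<String> ffmpegCommand(Path mp3, Path wav) {
        // -y overwrites an existing output file, -i names the input file.
        return List.of("ffmpeg", "-y", "-i", mp3.toString(), wav.toString());
    }

    /** Runs ffmpeg and fails if it exits with a non-zero status. */
    static void convertToWav(Path mp3, Path wav) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(ffmpegCommand(mp3, wav)).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("ffmpeg exited with status " + exit);
        }
    }

    /** Stores the raw fingerprint bytes as a file. */
    static void saveFingerprint(byte[] fingerprint, Path target) throws IOException {
        Files.write(target, fingerprint);
    }

    /** Reads a previously stored fingerprint back into memory. */
    static byte[] loadFingerprint(Path source) throws IOException {
        return Files.readAllBytes(source);
    }
}
```

Storing the bytes in a database instead of a file would only change the two storage methods; the conversion step stays the same.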
14. Conclusions
● About 16% of the songs Station1 aired were by German artists (Factor 1).
● 25% of the songs Station3 aired were by so-called newcomers, identified
as artists with no more than 2 published albums (Factor 8).
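The newcomer rule (no more than 2 published albums) can be sketched as below. The threshold comes from the slide; everything else is an assumption for illustration: the share is computed per play, album counts are given as a plain map, and artists with an unknown album count are not treated as newcomers. NewcomerFactor and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class NewcomerFactor {
    /** Threshold from the slide: a newcomer has published at most 2 albums. */
    static final int MAX_NEWCOMER_ALBUMS = 2;

    static boolean isNewcomer(int publishedAlbums) {
        return publishedAlbums <= MAX_NEWCOMER_ALBUMS;
    }

    /** Percentage of plays whose artist counts as a newcomer. */
    static double newcomerShare(List<String> playedArtists, Map<String, Integer> albumCounts) {
        if (playedArtists.isEmpty()) {
            return 0.0;
        }
        long newcomerPlays = playedArtists.stream()
            // Assumption: an artist with no known album count is not a newcomer.
            .filter(artist -> isNewcomer(albumCounts.getOrDefault(artist, Integer.MAX_VALUE)))
            .count();
        return 100.0 * newcomerPlays / playedArtists.size();
    }
}
```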
15. Other Benefits
● It is easy to conduct an investigation over a defined period, such as a month
● Documentation of the empirical factors helps in the calculation of royalties or
the identification of copyright infringements
● Content categorisation helps in assigning a unique description to stations by
the variety of their content.
16. References
● [1] Ed Greengrass. “Information Retrieval: A Survey”. 2000.
● [2] S. Shum. “An Introduction to Audio Fingerprinting”. SLS Group Meeting,
October 2011. URL: http://people.csail.mit.edu/sshum/talks/audio_fingerprinting_sls_24Oct2011.pdf