More info: http://mtg.upf.edu/node/3108
Abstract: Discovery of repeating structures in music is fundamental to its analysis, understanding and interpretation. We present a data-driven approach for the discovery of short-time melodic patterns in large collections of Indian art music. The approach first discovers melodic patterns within an audio recording and subsequently searches for their repetitions in the entire music collection. We compute similarity between melodic patterns using dynamic time warping (DTW). Furthermore, we investigate four different variants of the DTW cost function for rank refinement of the obtained results. The music collection used in this study comprises 1,764 audio recordings with a total duration of 365 hours. Over 13 trillion DTW distance computations are done for the entire dataset. Due to the computational complexity of the task, different lower bounding and early abandoning techniques are applied during DTW distance computation. An evaluation based on expert feedback on a subset of the dataset shows that the discovered melodic patterns are musically relevant. Several musically interesting relationships are discovered, yielding further scope for establishing novel similarity measures based on melodic patterns. The discovered melodic patterns can further be used in challenging computational tasks such as automatic raga recognition, composition identification and music recommendation.
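The DTW computation with early abandoning mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the squared-difference local cost, and the row-wise abandoning test are all assumptions.

```python
def dtw_distance(x, y, abandon_at=float("inf")):
    """Squared-difference DTW cost between two pitch sequences.

    The cost matrix is filled one row at a time; if every cell in the
    current row already exceeds `abandon_at` (e.g. the best distance
    found so far in a search), the computation is abandoned early.
    """
    n, m = len(x), len(y)
    INF = float("inf")
    prev = [INF] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        if min(curr) > abandon_at:
            return INF  # early abandon: this pair cannot beat the best so far
        prev = curr
    return prev[m]
```

In a large-scale search like the one described, `abandon_at` would be set to the distance of the current best match, so most of the trillions of comparisons terminate after a few rows.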
Constantine Kotropoulos, Associate Professor, Aristotle University of Thessaloniki, Department of Informatics, Sparse and Low Rank Representations in Music Signal Analysis
Phrase-based Rāga Recognition Using Vector Space Modeling, by Sankalp Gulati
This document describes an approach for automatic raga recognition in Indian art music using phrase-based vector space modeling. It involves discovering melodic patterns from a Carnatic music collection through intra-recording and inter-recording analysis. The patterns are then clustered into communities based on their network of similarities. Features are extracted from the pattern-recording relationships using term frequency-inverse document frequency weighting. These features are used to build a feature matrix for raga recognition.
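The term frequency-inverse document frequency weighting step can be sketched as below. This is a minimal illustration using a smoothed-IDF variant common in text retrieval; the paper's exact weighting scheme may differ, and all names are assumptions.

```python
import math

def tfidf_matrix(counts):
    """counts[r][p] = occurrences of pattern community p in recording r.

    Returns the TF-IDF weighted feature matrix: term frequency is
    normalized per recording, IDF is smoothed so unseen patterns do not
    divide by zero.
    """
    n_rec, n_pat = len(counts), len(counts[0])
    # document frequency: in how many recordings does each pattern occur?
    df = [sum(1 for row in counts if row[p] > 0) for p in range(n_pat)]
    idf = [math.log((1 + n_rec) / (1 + df[p])) + 1.0 for p in range(n_pat)]
    matrix = []
    for row in counts:
        total = sum(row) or 1
        matrix.append([(c / total) * idf[p] for p, c in enumerate(row)])
    return matrix
```

Each row of the result is one recording's feature vector; a standard classifier over these vectors then predicts the raga label.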
Landmark Detection in Hindustani Music Melodies, by Sankalp Gulati
More info: http://mtg.upf.edu/node/2998
Abstract: Musical melodies contain hierarchically organized events, where some events are more salient than others, acting as melodic landmarks. In Hindustani music melodies, an important landmark is the occurrence of a nyas. Occurrences of nyas are crucial for building and sustaining the format of a rag and for marking the boundaries of melodic motifs. Detection of nyas segments is relevant to tasks such as melody segmentation, motif discovery and rag recognition. However, it is challenging because these segments do not follow an explicit set of rules in terms of segment length, contour characteristics, or melodic context. In this paper we propose a method for the automatic detection of nyas segments in Hindustani music melodies. It consists of two main steps: a segmentation step that incorporates domain knowledge in order to facilitate the placement of nyas boundaries, and a segment classification step based on a series of musically motivated pitch contour features. The proposed method obtains high accuracies on a heterogeneous dataset of 20 audio music recordings containing 1257 nyas svar occurrences with a total duration of 1.5 hours. Further, we show that the proposed segmentation strategy significantly improves over a classical piece-wise linear segmentation approach.
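As a hedged sketch of the segmentation idea only: since a nyas is a held svar, candidate segments can be found as maximal runs of near-steady pitch. The paper's actual segmentation incorporates more domain knowledge; the `band` and `min_len` parameters below are illustrative assumptions.

```python
def flat_segments(pitch, band=0.3, min_len=5):
    """Return (start, end) index pairs of maximal near-flat pitch runs.

    A run continues while the pitch stays within `band` of the run's
    first sample; runs shorter than `min_len` samples are discarded.
    """
    segs, start = [], 0
    for i in range(1, len(pitch) + 1):
        if i == len(pitch) or abs(pitch[i] - pitch[start]) > band:
            if i - start >= min_len:
                segs.append((start, i))
            start = i
    return segs
```

The segment classification step described in the abstract would then compute pitch-contour features over each candidate and decide whether it is a nyas.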
Discovery and Characterization of Melodic Motives in Large Audio Music Collec..., by Sankalp Gulati
Sankalp Gulati proposed a methodology for discovering and characterizing melodic motives in large audio music collections using domain knowledge of Indian art music. The methodology involves extracting pitch, loudness, and timbre features from audio signals, representing melodies, calculating melodic similarity, extracting repeated patterns as motives, and analyzing the extracted motives. Gulati aims to apply this methodology to a collection of over 550 hours of Indian art music audio and evaluate the results through listening tests and user feedback.
Computational Melodic Analysis of Indian Art Music, by Sankalp Gulati
This document summarizes research on computational melodic analysis of Indian art music. Key areas discussed include tonic identification, predominant melody estimation, motif discovery, melodic similarity, melodic pattern networks, raga recognition using melodic patterns, and resources like datasets and demonstrations of the techniques.
This document discusses using machine learning techniques for classifying Indian music. It provides examples of applying supervised learning methods like neural networks to tasks like raag recognition, composer identification, and melody tracking. The document also proposes representing Indian music structures like ragas and talas in a structured format to enable applications like music visualization, composition and query processing. Automatic recognition of ragas is framed as a sequential pattern classification problem that can use hidden Markov models.
Performance Comparison of Musical Instrument Family Classification Using Soft..., by Waqas Tariq
Designing automatic and effective classification algorithms for musical instruments has become essential. Automatic classification of musical instruments is performed by extracting relevant features from audio samples; a classification algorithm then uses these features to decide which of a set of classes a sound sample belongs to. The aim of this paper is to demonstrate the viability of soft sets for audio signal classification. A dataset of 104 pieces (single monophonic notes) of traditional Pakistani musical instruments was created. Feature extraction uses two feature sets, namely perception-based features and mel-frequency cepstral coefficients (MFCCs). Two classification techniques are then applied: soft sets (comparison table) and fuzzy soft sets (similarity measurement). Experimental results show that both classifiers perform well on numerical data; the soft set classifier achieved accuracy of up to 94.26% on the best generated dataset. These promising results open new possibilities for soft sets in classifying musical instrument sounds, and the analysis offers a new view on automatic instrument classification.
The document outlines Olmo Cornelis' research on opportunities for symbiosis between Western and non-Western musical idioms from 2008-2014. It discusses his background, previous work digitizing an ethnomusicological archive, challenges in accurately describing non-Western music, and a proposed methodology using music information retrieval techniques to objectively analyze ethnic music and inform new compositions blending musical elements from different cultures.
1. The document describes an experiment that tested participants' ability to perceive simple spatial figures (circles, squares, triangles) rendered over loudspeaker arrays under different conditions of array configuration, spatial rendering technique, source velocity, and reverberation.
2. A 48-speaker circular array was used to render precomputed audio simulations of rotating figures, and participants indicated which figure they perceived.
3. Statistical analysis found significant effects of trajectory figure, speaker configuration, spatial renderer, and velocity on participant response accuracy. Circles and triangles were better perceived than squares.
DETECTING AND LOCATING PLAGIARISM OF MUSIC MELODIES BY PATH EXPLORATION OVER ..., by csandit
To the best of our knowledge, automatic detection of music plagiarism has not been addressed before. This paper presents the design of an Automatic Music Melody Plagiarism Detection (AMMPD) method to detect and locate possible plagiarism in music melodies. The key contribution is an algorithm that addresses the challenging issues in the AMMPD problem: (1) inexact matching of the noisy, inaccurate pitches of music audio, and (2) fast detection and positioning of similar subsegments between suspicious pieces of music audio. The main novelty of the proposed method is that both issues are addressed in the temporal domain by means of a novel path-finding approach over a binarized 2-D bit mask in the spatial domain. The proposed AMMPD method can not only identify similar passages within two suspicious melodies, but also retrieve music audio with similar melodies from a music database given a humming or singing query. Experiments have been conducted to assess the overall performance and to examine the effects of the various parameters introduced in the method.
Graphical visualization of musical emotions, by Pranay Prasoon
The document discusses graphical visualization of musical emotions using artificial neural networks. 13 audio features are extracted from Hindustani classical music clips labeled as happy or sad. An ANN model with backpropagation algorithm is trained on 70% of data, validated on 15% and tested on 15%. The model correctly classified 15 of 17 happy clips and 21 of 22 sad clips. Testing was repeated 10 times with over 90% accuracy each time, showing the model effectively recognizes musical emotions. Future work involves expanding the model to recognize additional emotions and incorporating physiological features.
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that proposes a system for identifying ragas in North Indian classical music. It begins with background on ragas and their identifying musical elements. The system first uses a pitch detection tool to convert an audio file into a note sequence. It then applies two pakad matching methods - n-gram matching and delta occurrence with alpha-bounded gaps - to identify which raga's pakad most closely matches the note sequence. This identifies the raga of the input music. The document provides details on each step of the system to enable accurate raga identification from an audio file.
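The n-gram pakad-matching step can be sketched as follows. This is an illustrative assumption rather than the article's exact formulation: each raga's pakad (catch phrase) is scored against the transcribed note sequence by the fraction of its n-grams that appear in the sequence, and the raga names and note sequences in the test are placeholders.

```python
def ngrams(seq, n):
    """Set of all contiguous n-grams of a note sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def ngram_score(notes, pakad, n=3):
    """Fraction of the pakad's n-grams found in the note sequence."""
    p = ngrams(pakad, n)
    return len(p & ngrams(notes, n)) / len(p) if p else 0.0

def identify_raga(notes, pakads, n=3):
    """Return the raga whose pakad best matches the note sequence."""
    return max(pakads, key=lambda raga: ngram_score(notes, pakads[raga], n))
```

The article's second method (delta occurrence with alpha-bounded gaps) would relax the contiguity requirement, allowing a bounded number of intervening notes between pakad notes.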
Musical Information Retrieval Take 2: Interval Hashing Based Ranking, by Sease
Retrieving musical records from a corpus of information using an audio input as a query is not an easy task. Various approaches model the query and the corpus as arrays of hashes calculated from the chroma features of the audio input.
The scope of this talk is to introduce a novel approach to calculating such hashes, considering the intervals between the most intense pitches of sequential chroma vectors.
Building on the theoretical introduction, a prototype will show this approach in action with Apache Solr, a sample dataset, and the benefits of positional queries.
Challenges and future work follow as concluding considerations.
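The interval-hash idea can be sketched as below, under stated assumptions: the talk's actual construction may keep several intense pitch classes per frame, whereas this sketch keeps only the single most intense one and hashes the semitone interval between the dominant pitches of adjacent frames.

```python
def top_pitch(chroma):
    """Index (0-11) of the most intense pitch class in a chroma vector."""
    return max(range(12), key=lambda i: chroma[i])

def interval_hashes(chroma_frames):
    """Semitone intervals between dominant pitches of adjacent frames.

    Intervals are taken modulo 12, so the representation is invariant
    to transposition, which is the point of hashing intervals rather
    than absolute pitches.
    """
    tops = [top_pitch(c) for c in chroma_frames]
    return [(b - a) % 12 for a, b in zip(tops, tops[1:])]
```

Indexed as terms in Solr, these interval sequences support the positional (phrase) queries the talk demonstrates.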
Case-based Sequential Ordering of Songs for Playlist Recommendation, by Uri Levanon
A model of contiguous sequential patterns of songs, built to learn which songs sound good together in a playlist.
A research paper that I have presented in September 2006 at the European Conference on Case Based Reasoning (ECCBR '06).
By Claudio Baccigalupo and Enric Plaza
SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMP..., by chakravarthy Gopi
This document proposes using robust principal component analysis (RPCA) to separate singing voices from music accompaniment in monaural recordings. RPCA decomposes an audio signal matrix into a low-rank matrix representing the repetitive music accompaniment and a sparse matrix representing the more variable singing voice. RPCA is applied to the short-time Fourier transform (STFT) of an audio signal. A binary time-frequency mask is then used to separate the estimated singing voice and accompaniment based on their STFT magnitudes in the low-rank and sparse matrices. Evaluations on the MIR-1K dataset show this RPCA-based method achieves 1-1.4 dB higher source-to-distortion ratio than other state-of-the-art approaches.
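The final binary-masking step can be sketched as follows. This covers only the masking, not the RPCA decomposition itself (e.g. via principal component pursuit), which is assumed to have already produced the low-rank matrix `L` and sparse matrix `S`; the function name is an assumption.

```python
import numpy as np

def binary_mask_separate(stft, L, S):
    """Assign each time-frequency bin to voice or accompaniment.

    A bin goes to the (sparse) voice estimate where |S| > |L|,
    otherwise to the (low-rank) accompaniment estimate, so the two
    estimates always sum back to the original STFT.
    """
    mask = np.abs(S) > np.abs(L)
    voice = np.where(mask, stft, 0)
    accomp = np.where(mask, 0, stft)
    return voice, accomp
```

Inverting the two masked STFTs with the original phase yields the separated time-domain signals.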
Computational Approaches for Melodic Description in Indian Art Music Corpora, by Sankalp Gulati
Presentation for my PhD defense, Music Technology Group, Barcelona, Spain.
Resources: http://compmusic.upf.edu/node/304
Short abstract:
Automatically describing contents of recorded music is crucial for interacting with large volumes of audio recordings, and for developing novel tools to facilitate music pedagogy. Melody is a fundamental facet in most music traditions and, therefore, is an indispensable component in such description. In this thesis, we develop computational approaches for analyzing high-level melodic aspects of music performances in Indian art music (IAM), with which we can describe and interlink large amounts of audio recordings. With its complex melodic framework and well-grounded theory, the description of IAM melody beyond pitch contours offers a very interesting and challenging research topic. We analyze melodies within their tonal context, identify melodic patterns, compare them both within and across music pieces, and finally, characterize the specific melodic context of IAM, the rāgas. All these analyses are done using data-driven methodologies on sizable curated music corpora. Our work paves the way for addressing several interesting research problems in the field of music information research, as well as developing novel applications in the context of music discovery and music pedagogy.
IJERA (International Journal of Engineering Research and Applications) is an international, online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Music objects are classified into monophonic and polyphonic. A monophonic object has only one track, the main melody that leads the song. Polyphonic objects have several tracks that accompany the main melody; each track is a sequence of notes played simultaneously with the other tracks. The main melody captures the essence of the music and plays a vital role in MIR. MIR involves representing the main melody as a sequence of notes, extracting repeating patterns from it, and matching a query sequence against the frequent repeating sequential patterns that constitute the music object. Repeating patterns are subsequences of notes that occur again and again in a main melody, with variations in the notes up to a tolerable extent. Similarly, a query sequence used for retrieving a music object may not contain the repeating patterns of the main melody in their exact form. Hence, extraction of approximate patterns is essential for an MIR system. This paper proposes a novel method for finding approximate repeating patterns for the purpose of MIR. The effectiveness of the methodology is tested and found satisfactory on real-world data, namely 'Raga Surabhi', an Indian Carnatic music portal.
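A naive way to extract approximate repeating patterns can be sketched as below. This is a hedged illustration, not the paper's algorithm: notes are assumed to be numeric, the window length `k` and per-note tolerance `tol` are illustrative parameters, and occurrences are required not to overlap.

```python
def approx_repeats(notes, k, tol=1):
    """Length-k note subsequences that recur elsewhere within tolerance.

    Two windows match when every pairwise note difference is at most
    `tol`; windows closer than k positions apart are ignored so a
    pattern is not matched against its own overlap.
    """
    def close(a, b):
        return all(abs(x - y) <= tol for x, y in zip(a, b))
    windows = [tuple(notes[i:i + k]) for i in range(len(notes) - k + 1)]
    return [w for i, w in enumerate(windows)
            if any(close(w, v) for j, v in enumerate(windows) if abs(i - j) >= k)]
```

This brute-force version is quadratic in the melody length; the paper's contribution lies in doing this search efficiently.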
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSIS, by ijscai
This document analyzes the structure of the Indian classical raga Bhairavi through simple exponential smoothing. It finds that the smoothing factor that best fits the raga's note sequence is 0.481384. It identifies the raga based on its ascent-descent pattern and characteristic note combinations. Melody analysis reveals the most significant melodies in the sequence, with lengths of 12 and 6 notes. The paper concludes that the simple exponential model successfully captures the raga's structure and proposes analyzing the structures of other ragas using this and more advanced smoothing techniques.
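Simple exponential smoothing of a note sequence can be written in a few lines; here notes are assumed to be encoded as numbers (e.g. scale degrees), and 0.481384 is the smoothing factor the summary reports as the best fit for raga Bhairavi.

```python
def exp_smooth(series, alpha):
    """Simple exponential smoothing, seeded with the first observation.

    s_t = alpha * x_t + (1 - alpha) * s_{t-1}
    """
    s = [series[0]]
    for x in series[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s
```

Fitting, as in the paper, means choosing `alpha` to minimize the one-step forecast error of `s` against the observed sequence.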
This document discusses music data mining. It provides an overview of music data mining tasks and approaches. Some key tasks discussed are music similarity search, clustering, and music sequence mining. Music similarity search involves finding music files similar to a given file based on features and similarity measures. Clustering divides music data into groups of similar objects based on pairwise similarity. Music sequence mining finds frequent patterns that appear in the ordered sequences of elements in music data instances.
Transferring Singing Expressions from One Voice to Another for a Given Song, by NAVER Engineering
Presenter: Sangeon Yong (PhD student, KAIST)
Date: June 2018
This research aims to let people who cannot sing well still enjoy singing and recording. The talk introduces the current focus of that work: a method for transferring the musical expression of a skilled singer onto the timbre of an unskilled one. It covers the basic structure of the system, its current limitations, and how it can be applied and extended.
Audio descriptive analysis of singer and musical instrument identification in..., by eSAT Journals
Abstract: Music information retrieval (MIR) has reached a reasonably stable state after advances in low-level audio descriptors (LLDs) and feature extraction techniques. The analysis of sound has become simpler through two decades of continuous effort and research by the MIR community in signal processing. In North Indian classical music, a singer is accompanied by instruments such as harmonium, violin, or flute, which are tuned to the same musical scale (pitch range) in which the singer is singing. Separate studies in the recent past have identified musical instruments and singers. In this paper, we analyze low-level audio descriptors for singing voice and musical instrument sounds that appear similar to the human ear with respect to timbre, to see whether we can treat them the same and use identification/classification routines to assign them to their classes. We use a hybrid selection algorithm from the wrapper technique (one that also uses the classifier in the feature selection process) to identify and extract features, and K-means and k-nearest-neighbor classifiers to classify and cross-verify the accuracy of classification. The classification accuracy achieved was 91.1%, which shows that musical instruments and singing voices that sound similar in timbre can be grouped together, and that classification is possible with a mixed database of instruments and singing voices. Keywords: music information retrieval (MIR), timbre, singing voice, low-level descriptors (LLDs), North Indian classical music, MIRtoolbox.
A Computational Framework for Sound Segregation in Music Signals using Marsyas, by Luís Gustavo Martins
This document discusses a computational framework for sound segregation in music signals. It begins with acknowledgments of collaborators on the work. It then provides an overview of the research project, which involves developing an auditory scene analysis framework for sound segregation in polyphonic music signals. The document outlines the problem statement, main challenges, current state of research, related research areas, and the main contributions and proposed approach of the framework. It involves applying ideas from computational auditory scene analysis to define perceptual grouping cues and implement a flexible and efficient sound segregation system based on these cues.
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica..., by Lushanthan Sivaneasharajah
This document describes research applying Fisher Linear Discriminant Analysis (LDA) and K-Nearest Neighbors (K-NN) algorithms to classify speech and music audio clips. It finds that Fisher LDA using single features like mel-frequency cepstral coefficients achieves classification error rates below 5%, outperforming K-NN. While combining multiple features does not improve LDA results, combining the outputs of LDA and K-NN classifiers using majority voting further lowers the error rate to 4.5%, demonstrating the benefit of classifier ensembles for this task.
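A minimal two-class Fisher LDA sketch is given below; it is illustrative, not the paper's implementation, and the feature extraction (e.g. MFCC computation) is omitted. The classifier projects onto w = Sw⁻¹(m1 − m0) and thresholds at the midpoint of the projected class means.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Fit a two-class Fisher discriminant from two feature matrices."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class scatter (sums of squared deviations per class)
    Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
    # small ridge keeps the solve stable if Sw is near-singular
    w = np.linalg.solve(Sw + 1e-9 * np.eye(len(m0)), m1 - m0)
    thresh = 0.5 * (X0 @ w).mean() + 0.5 * (X1 @ w).mean()
    return w, thresh

def fisher_lda_predict(X, w, thresh):
    """1 where the projection exceeds the threshold (class 1), else 0."""
    return (X @ w > thresh).astype(int)
```

The majority-voting ensemble mentioned in the summary would combine these 0/1 predictions with those of a K-NN classifier, taking the majority label per clip.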
This was my technical article presented at the Symposium on Vision (Smart India 2017), held at the CSI-MCKVIE Student Chapter, MCKV Institute of Engineering.
This document presents a method called Spectrographic Image Template Matching (SITM) for the automatic classification of birdsong notes. SITM performs correlation calculations on spectrographic images to determine the similarity between notes in a bird's vocal repertoire. The method was tested on recordings of three Rufous-bellied Thrush individuals, correctly classifying 77-97% of notes from one recording, 64-82% from another, and 74-97% from the third. The automated classification of birdsong notes using SITM can help analyze long, varied songs and dialects with more precision and scalability than human classification alone.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
The music objects are classified into Monophonic and Polyphonic. In Monophonic there is only one track
which is the main melody that leads the song. In Polyphonic objects, there are several tracks that
accompany the main melody. Each track is a sequence of notes played simultaneously with other tracks.
But, the main melody captures the essence of the music and plays vital role in MIR. The MIR involves
representation of main melody as a sequence of notes played, extraction of repeating patterns from it and
matching of query sequence with frequent repeating sequential patterns constituting the music object.
Repeating patterns are subsequences of notes played time and again in a main melody with possible
variations in the notes to a tolerable extent. Similarly, the query sequence meant for retrieving a music
object may not contain the repeating patterns of the main melody in its exact form. Hence, extraction of
approximate patterns is essential for a MIR system. This paper proposes a novel method of finding
approximate repeating patterns for the purpose of MIR. The effectiveness of methodology is tested and
found satisfactory on real world data namely ‘Raga Surabhi’ an Indian Carnatic Music portal.
AN INTERESTING APPLICATION OF SIMPLE EXPONENTIAL SMOOTHING IN MUSIC ANALYSISijscai
This document analyzes the structure of the Indian classical raga Bhairavi through simple exponential smoothing. It finds the smoothing factor that best fits the raga's note sequence is 0.481384. It identifies the raga based on its ascent-descent pattern and characteristic note combinations. Melody analysis reveals the most significant melodies in the sequence, with lengths of 12 and 6 notes. The paper concludes the simple exponential model successfully captures the raga's structure and proposes further analyzing other ragas' structures using this and more advanced smoothing techniques.
This document discusses music data mining. It provides an overview of music data mining tasks and approaches. Some key tasks discussed are music similarity search, clustering, and music sequence mining. Music similarity search involves finding music files similar to a given file based on features and similarity measures. Clustering divides music data into groups of similar objects based on pairwise similarity. Music sequence mining finds frequent patterns that appear in the ordered sequences of elements in music data instances.
Transferring Singing Expressions from One Voice to Another for a Given SongNAVER Engineering
발표자: 용상언(KAIST 박사과정)
발표일: 2018.6.
노래를 못부르는 사람도 노래를 부르고, 녹음하는 행위를 즐길 수 있게 하는 방법에 대한 연구를 진행하고 있습니다. 그 중에서도 현재 중점적으로 진행하고 있는 노래를 잘 부르는 사람의 음악적 표현을 못 부르는 사람의 음색에 이식하는 방법에 대해 소개합니다. 해당 시스템의 간단한 구조부터 시스템이 현재 가지고 있는 한계점, 그리고 어떻게 응용 및 확장될 수 있는지에 대해서 이야기 해보려고 합니다.
Audio descriptive analysis of singer and musical instrument identification in...eSAT Journals
Abstract Music information retrieval (MIR) has reached to a reasonably stable state after advancement in the Low Level audio Descriptors (LLDs) and feature extraction techniques. The analysis of sound has now become simple by the continuous efforts and research of MIR community in the field of signal processing from last two decades. In north Indian classical music, a singer is accompanied by some instruments such as harmonium, violin or flute. These instruments are tuned in the same musical scale (pitch range) in which the singer is signing. Separate researches have been made in recent past to identify a musical instrument and a singer. In this paper, we have analyzed the low level audio descriptors, for singing voice and musical instrument sound together, that appears to human ear as similar with respect to ‘timbre’, to see if we could treat them same and use identification/ classification routines to classify them into their classes. We have used Hybrid Selection algorithm from wrapper technique(the one that uses classifier also in feature selection process) to identify and extract the features and K-Means and K nearest neighbor classifiers to classify and cross verify the accuracy of classification. The accuracy of classification achieved was 91.1% which clearly proves that musical instruments and singing voice that sounds similar in timbral aspect can be grouped together and classification is possible with mixed database of instruments and singing voices. Keywords: Music Information Retrieval (MIR), Timbre, Singing Voice, Low level Descriptors (LLD, North Indian Classical music. MIRTOOL BOX
A Computational Framework for Sound Segregation in Music Signals using MarsyasLuís Gustavo Martins
This document discusses a computational framework for sound segregation in music signals. It begins with acknowledgments of collaborators on the work. It then provides an overview of the research project, which involves developing an auditory scene analysis framework for sound segregation in polyphonic music signals. The document outlines the problem statement, main challenges, current state of research, related research areas, and the main contributions and proposed approach of the framework. It involves applying ideas from computational auditory scene analysis to define perceptual grouping cues and implement a flexible and efficient sound segregation system based on these cues.
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...Lushanthan Sivaneasharajah
This document describes research applying Fisher Linear Discriminant Analysis (LDA) and K-Nearest Neighbors (K-NN) algorithms to classify speech and music audio clips. It finds that Fisher LDA using single features like mel-frequency cepstral coefficients achieves classification error rates below 5%, outperforming K-NN. While combining multiple features does not improve LDA results, combining the outputs of LDA and K-NN classifiers using majority voting further lowers the error rate to 4.5%, demonstrating the benefit of classifier ensembles for this task.
This was my Technical Article paper presented on Symposium on Vision-Smart India 2017 held at CSI-MCKVIE Student Chapter and MCKV Institute of Engineering
This document presents a method called Spectrographic Image Template Matching (SITM) for the automatic classification of birdsong notes. SITM performs correlation calculations on spectrographic images to determine the similarity between notes in a bird's vocal repertoire. The method was tested on recordings of three Rufous-bellied Thrush individuals, correctly classifying 77-97% of notes from one recording, 64-82% from another, and 74-97% from the third. The automated classification of birdsong notes using SITM can help analyze long, varied songs and dialects with more precision and scalability than human classification alone.
Similar to Mining Melodic Patterns in Large Audio Collections of Indian Art Music (20)
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
1. Mining Melodic Patterns in Large Audio
Collections of Indian Art Music
Sankalp Gulati*, Joan Serrà†, Vignesh Ishwar* and Xavier Serra*
*Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
†Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
SITIS – 2014 (MIRA), Marrakech, Morocco
2. Multimedia Content – Music Recordings
>37 million songs
>20,000 songs added per day!
3. Multimedia Content – Audio Music
>37 million songs
>20,000 songs added per day!
Meaningful navigation
Efficient search and discovery
Relevant recommendation
10. Music Collection
§ Indian Art Music
§ Carnatic music
§ Hindustani music
§ Rāga: melodic framework
§ Swar (note)
§ Swar prominence/role (Vādi, Samvādi, Nyās, Grah)
§ Pakads (characteristic melodic patterns)
11. Music Collection – Carnatic music
§ Why this music tradition?
§ Signal processing steps relatively easier
§ Main challenge due to melodic characteristics and
improvisation
§ Melodic patterns cues to rāga identification
12. Music Collection – Carnatic music
§ Dataset details
§ 1764 commercially available polyphonic audio
recordings (subset of CompMusic collection)
§ 365 hours of music ( > 50 billion audio samples)
§ Diverse dataset – gender, #ragas, #compositions…
X. Serra, “Creating research corpora for the computational study of music: the case of the Compmusic project,” in Proc. of the 53rd AES
International Conference on Semantic Audio, London, Jan. 2014.
13. Previous Work – Symbolic Data
§ Hungarian, Slovak, French, Sicilian, Bulgarian and Appalachian folk melodies – (Juhász, 2006)
§ Cretan, Nova Scotia and Essen folk melodies – (Conklin and Anagnostopoulou, 2010, 2006)
§ Tunisian modal music – (Lartillot & Ayari, 2006)
Juhász, Z. (2006, June). A systematic comparison of different European folk music traditions using self-organizing maps. Journal of New
Music Research, 35(2), 95–112.
Conklin, D., & Anagnostopoulou, C. (2006). Segmental pattern discovery in music. INFORMS Journal on Computing, 18(3), 285–293.
Lartillot, O., & Ayari, M. (2006). Motivic pattern extraction in music, and application to the study of Tunisian modal music. South
African Computer Journal, 36, 16–28.
14. Previous Work – Audio Data (IAM)
§ Spotting motifs in Carnatic Music
§ Detecting melodic motifs in Hindustani music
§ Classification of melodic motifs
§ Discovering typical melodic motifs from one-liners
[Embedded excerpt from Ishwar et al. (2013): motif spotting in a Carnatic Alapana. A queried motif is searched against a set of Alapanas of a particular rāga to locate its occurrences. The task is non-trivial since no percussion instrument maintains the rhythm in an Alapana, and since the Alapana is much longer than the motif, the search is like looking for a needle in a haystack. After analyzing pitch contours and consulting professional musicians, the authors quantize the pitch contour at saddle points to reduce the search space. Figure 2 of that paper shows (a) motifs interspersed in an Alapana and (b) a magnified motif, with frequency in cents plotted against time.]
V. Ishwar, S. Dutta, A. Bellur, and H. Murthy, “Motif spotting in an Alapana in Carnatic music,” in Proc. of Int. Conf. on Music Information
Retrieval (ISMIR), 2013, pp. 499–504.
J. C. Ross, T. P. Vinutha, and P. Rao, “Detecting melodic motifs from audio for Hindustani classical music,” in Proc. of Int. Conf. on Music
Information Retrieval (ISMIR), 2012, pp. 193–198.
P. Rao, J. C. Ross, K. K. Ganguli, V. Pandit, V. Ishwar, A. Bellur, and H. A. Murthy, “Classification of melodic motifs in raga music with
time-series matching,” Journal of New Music Research, vol. 43, no. 1, pp. 115–131, Jan. 2014.
15. Previous Work – Time Series Analysis
§ Motif Discovery approaches
§ Exact, Online, Probabilistic discovery
§ Lower bounding (indexing techniques)
§ Symbolic representation - SAX
§ Lower bounds on dynamic time warping (DTW)
[Embedded excerpt from Lin et al. (2003): the SAX symbolic representation. A time series of length n is reduced to w dimensions (w << n) via Piecewise Aggregate Approximation (PAA): the series is divided into w equal-sized frames and the mean value within each frame forms the reduced representation; each series is first normalized to zero mean and unit standard deviation. PAA is intuitive and simple, yet has been shown to rival more sophisticated dimensionality-reduction techniques such as Fourier transforms and wavelets. Figure 3 of that paper visualizes PAA as modeling a time series with a linear combination of box basis functions, reducing a sequence of length 128 to 8 dimensions.]
A. Mueen, E. Keogh, Q. Zhu, S. Cash, and B. Westover, “Exact discovery of time series motifs,” in Proc. of SIAM Int. Con. on Data
Mining (SDM), 2009, pp. 1–12.
B. Chiu, E. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” Proc. ninth ACM SIGKDD Int. Conf. Knowl. Discov.
data Min. - KDD ’03, p. 493, 2003.
J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A symbolic representation of time series, with implications for streaming algorithms,” Proc.
8th ACM SIGMOD Work. Res. issues data Min. Knowl. Discov. - DMKD ’03, p. 2, 2003.
T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh, “Addressing big data time
series: mining trillions of time series subsequences under dynamic time warping,” ACM Transactions on Knowledge Discovery from Data
(TKDD), vol. 7, no. 3, pp. 10:1–10:31, Sep. 2013.
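The PAA step underlying the SAX representation cited above reduces a length-n series to w segment means. A minimal sketch (function name is ours; it assumes w divides n for simplicity):

```python
def paa(series, w):
    """Piecewise Aggregate Approximation: reduce a series of length n
    to w segment means (assumes w divides n)."""
    n = len(series)
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]
```

Each of the w output values is the mean of one equal-sized frame of the input, which is the data-reduced representation SAX then discretizes into symbols.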
19. Data processing: Pre-processing
§ Predominant pitch estimation – Melodia
§ Designed for polyphonic music audio
§ Uses melodic contour characteristics
Signal processing
Learning
Salamon, Justin, and Emilia Gómez. "Melody extraction from polyphonic music signals using pitch contour characteristics." Audio,
Speech, and Language Processing, IEEE Transactions on 20.6 (2012): 1759-1770.
§ Essentia implementation of Melodia
§ Use default parameters in Essentia
§ Performed well for Indian art music dataset in MIREX'11¹
¹ http://nema.lis.illinois.edu/nema_out/mirex2011/results/ame/indian08/summary.html
20. Data processing: Pre-processing
§ Hertz to Cents conversion
§ Musically meaningful (logarithmic) scale
§ Tonic normalization
§ Robustness against different tonic pitches of the lead artists
§ Automatic tonic identification (Essentia implementation)
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra, "Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation," Journal of New Music Research, vol. 43, no. 01, pp. 55–73, 2014.
P_cents = 1200 · log2(P_Hz / f_tonic)
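The conversion can be sketched in Python as follows (a minimal illustration; the function name and the handling of unvoiced frames are our own):

```python
import math

def hz_to_cents(pitch_hz, tonic_hz):
    """Convert a pitch value in Hz to cents relative to the tonic.

    Unvoiced frames (pitch <= 0) are passed through as None.
    """
    if pitch_hz <= 0:
        return None
    return 1200.0 * math.log2(pitch_hz / tonic_hz)
```

One octave above the tonic maps to 1200 cents, so contours sung by lead artists with different tonic pitches become directly comparable.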
21. Data processing: Pre-processing
§ Down-sampling
§ Histogram of Auto-correlation (ACF) at each lag value
§ Segments of 2 seconds
§ Significant drop in ACF when the sampling period of the pitch
contour exceeds 22.2 ms
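The normalized autocorrelation at a given lag, as used here to choose the contour sampling rate, can be sketched as follows (plain-Python illustration; names are ours):

```python
def autocorrelation(x, lag):
    """Normalized autocorrelation of a pitch segment at a single lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0.0:
        return 1.0
    # Covariance between the segment and a lag-shifted copy of itself
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

Evaluating this over 2-second segments and building a histogram per lag value reveals at which lag (i.e. sampling period) the ACF starts to drop sharply.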
22. Data processing: Subsequence generation
§ Melody segmentation – hard task
§ Nyās based segmentation
§ Brute-force segmentation
§ Sliding window with constant hop
§ 2 second window
§ Remove segments across silence regions ( > 0.5 seconds)
S. Gulati, J. Serrà, K. K. Ganguli, and X. Serra, “Landmark detection in hindustani music melodies,” in Proc. of Int. Computer Music Conf.,
Sound and Music Computing Conf., Athens, Greece, 2014, pp. 1062–1068.
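The brute-force segmentation can be sketched as a sliding window over the per-frame pitch contour, discarding any window that spans a long silence (a minimal sketch; parameter names and the frame period are illustrative):

```python
def generate_subsequences(pitch_cents, frame_s, win_s=2.0, max_silence_s=0.5):
    """Sliding-window segmentation of a pitch contour with a constant hop.

    pitch_cents: per-frame pitch in cents, None for unvoiced/silent frames.
    frame_s: frame period in seconds. Windows containing a silent run
    longer than max_silence_s are discarded.
    """
    win_len = int(round(win_s / frame_s))
    max_sil = int(round(max_silence_s / frame_s))
    segments = []
    for start in range(len(pitch_cents) - win_len + 1):
        window = pitch_cents[start:start + win_len]
        # find the longest run of silent frames inside the window
        run = longest = 0
        for v in window:
            run = run + 1 if v is None else 0
            longest = max(longest, run)
        if longest <= max_sil:
            segments.append((start, window))
    return segments
```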
23. Data processing: Subsequence generation
§ Uniform time-scaling
§ 5 scale factors: {0.9, 0.95, 1.0, 1.05, 1.1}
§ Similarity computation for 16 of the 25 factor combinations saved,
since only the difference between the two factors matters
(e.g. S(1.0→1.05) = S(1.05→1.1))
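Uniform time-scaling by linear interpolation, and the symmetry that leaves only 9 of the 25 factor pairs to compute, can be sketched as follows (illustrative names; resampling details may differ from the actual system):

```python
def time_scale(seq, factor):
    """Uniformly time-scale a sequence by linear interpolation."""
    n_out = max(2, int(round(len(seq) * factor)))
    out = []
    for i in range(n_out):
        pos = i * (len(seq) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1.0 - frac) + seq[hi] * frac)
    return out

FACTORS = [0.9, 0.95, 1.0, 1.05, 1.1]
# Pairs with the same factor difference yield the same similarity,
# e.g. S(1.0 -> 1.05) = S(1.05 -> 1.1), so only the distinct
# differences need computing: 9 of 25, saving the other 16.
unique_diffs = sorted({round(a - b, 6) for a in FACTORS for b in FACTORS})
```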
25. Data processing: Mridangam Segments
§ Model based filtering
§ MFCC, spectral centroid & flatness
§ 46 ms frame size
§ Aggregate duration 2 seconds
§ Classifiers: Tree, KNN, NB, LR, SVM
§ Median filtering 20 seconds
D. Bogdanov, N. Wack, E. Gómez, S. Gulati, P. Herrera, O. Mayor, G. Roma, J. Salamon, J. Zapata, and X. Serra, "Essentia: an audio
analysis library for music information retrieval,” in Proc. of Int. Society for Music Information Retrieval Conf. (ISMIR), 2013, pp. 493–498
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J.
Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: machine learning in Python,” Journal of
Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
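The 20-second median filtering of frame-wise classifier decisions can be sketched as a sliding-median over binary labels (a minimal illustration; the window width in frames depends on the hop size):

```python
def median_filter_binary(labels, width):
    """Smooth frame-wise binary decisions with a sliding median.

    Isolated spurious decisions shorter than half the window are
    flipped to match their neighbourhood.
    """
    half = width // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half):i + half + 1]
        out.append(1 if sum(window) * 2 > len(window) else 0)
    return out
```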
26. Intra-recording Discovery
§ Melodic Similarity
§ Dynamic time warping (DTW)
§ Cost matrix – Sq. Euclidean distance
§ 10% Sakoe-Chiba band
§ Step size [(1,0),(1,1),(0,1)]
§ No local constraint or penalties
§ Lower bounds
§ FL bound
§ LB_Keogh_EC / EQ
§ Statistics
§ 25 patterns per song
§ 79,000 total melodic patterns
§ 1.43 trillion similarity computations
§ 76% of computations avoided!!
H. Sakoe and S. Chiba, “Dynamic programming algorithm
optimization for spoken word recognition,” IEEE Trans. on
Acoustics, Speech, and Language Processing, vol. 26, no. 1,
pp. 43–50, 1978.
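The DTW configuration above can be sketched as follows (a plain-Python illustration with squared-difference local cost, the standard (1,0)/(1,1)/(0,1) steps and a Sakoe-Chiba band; the actual system additionally applies lower bounding and early abandoning):

```python
def dtw_distance(x, y, band_frac=0.1):
    """DTW between two pitch sequences (in cents).

    Local cost: squared difference. Global constraint: Sakoe-Chiba
    band of half-width band_frac * max(len(x), len(y)).
    """
    n, m = len(x), len(y)
    band = max(1, int(round(band_frac * max(n, m))))
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # only fill cells within the band around the diagonal
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]
```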
27. Inter-recording Search
§ Melodic Similarity
§ Dynamic time warping (DTW)
§ Cost matrix – Sq. Euclidean distance
§ 10% Sakoe-Chiba band
§ Step size [(1,0),(1,1),(0,1)]
§ No local constraint or penalties
§ Lower bounds
§ FL bound
§ LB_Keogh_EC / EQ
§ Statistics
§ 200 nearest neighbors
§ 15.8 million melodic patterns
§ 12.4 trillion similarity computations
§ 99% of computations avoided!!
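One of the lower bounds listed above, LB_Keogh, can be sketched as an envelope test that often lets a candidate be discarded without a full DTW computation (minimal equal-length illustration; the EC/EQ variants differ in which of the two sequences the envelope is built around):

```python
def lb_keogh(query, candidate, band):
    """LB_Keogh lower bound on DTW with squared-difference local cost.

    Builds upper/lower envelopes around the query within +/- band
    frames and sums the squared amounts by which the candidate
    escapes the envelope. Assumes equal-length sequences.
    """
    total = 0.0
    n = len(query)
    for i, c in enumerate(candidate):
        window = query[max(0, i - band):min(n, i + band + 1)]
        upper, lower = max(window), min(window)
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return total
```

A candidate whose bound already exceeds the current best distance cannot be a nearer neighbor, so the full DTW computation for it is skipped.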
28. Rank Refinement
§ Melodic Similarity
§ Dynamic time warping (DTW)
§ Cost matrix – 4 distance measures
§ 10% Sakoe-Chiba band
§ Step size [(2,1),(1,1),(1,2)]
§ Local constraint, no penalties
§ No lower bounds
29. Evaluation
(Seed pattern categories S1, S2 and S3)
§ 79,000 seed patterns, 15.8 million searched patterns
§ 4 different distance measures for rank refinement
§ 200 seed pattern pairs
§ Top 10 searched patterns for 4 methods
§ Total of 8000 patterns (200*10*4)
30. Evaluation - Annotations
§ Professional musician with over 20 years of formal training
§ Listening to short audio fragments (melodic patterns)
§ Melodically similar: 1 (Good)
§ Melodically dissimilar: 0 (Bad)
31. Evaluation - Measures
§ Mean Average Precision (MAP)
§ Statistical significance
§ Mann-Whitney U test (P < 0.05)
§ Multiple comparison compensation
§ Holm-Bonferroni method
C. D. Manning, P. Raghavan, and H. Schütze, Introduction to information retrieval. Cambridge University Press, Cambridge, 2008, vol. 1.
H. B. Mann and D. R. Whitney, "On a test of whether one of two random variables is stochastically larger than the other," The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947.
S. Holm, "A simple sequentially rejective multiple test procedure," Scandinavian Journal of Statistics, vol. 6, no. 2, pp. 65–70, 1979.
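Mean average precision over the annotated top-10 result lists can be computed as follows (a standard sketch; variable names are ours):

```python
def average_precision(relevance):
    """Average precision of one ranked list.

    relevance: list of 1 (melodically similar) / 0 (dissimilar)
    judgements, best-ranked first.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at each relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    """MAP: mean of the per-query average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```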
33. Results – Intra recording patterns
§ Fraction of melodically similar seed patterns
§ S1 (0.98), S2 (0.67) and S3 (0.31)
§ Well separated distance distributions
34. Results – Inter recording patterns
Table II: MAP scores for four variants of the rank refinement method (Vi) for each seed category (S1, S2 and S3).

Seed Category   V1     V2     V3     V4
S1              0.92   0.92   0.91   0.89
S2              0.68   0.73   0.73   0.66
S3              0.35   0.34   0.35   0.35
35. Results – Inter recording patterns
(Table II as above, shown per seed category S1, S2 and S3)
36. Conclusions and Future work
§ Data driven unsupervised approach – melodic pattern discovery
§ DTW based distance measure is good for intra recording discovery
§ Need informed distance measures for inter song pattern search
§ DTW using Cityblock distance performs slightly better than the rest
§ Closer seed pattern pairs have higher MAP scores → higher number of repetitions
§ Future Work
§ Similar analysis on Hindustani music
§ Transposition invariance
§ Network analysis from mined patterns
39. Mining Melodic Patterns in Large Audio
Collections of Indian Art Music
Sankalp Gulati*, Joan Serrà†, Vignesh Ishwar* and Xavier Serra*
*Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
†Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
SITIS – 2014 (MIRA), Marrakech, Morocco