More info: http://mtg.upf.edu/node/2998
Abstract: Musical melodies contain hierarchically organized events, where some events are more salient than others and act as melodic landmarks. In Hindustani music melodies, an important landmark is the occurrence of a nyas. Occurrences of nyas are crucial for building and sustaining the format of a rag and for marking the boundaries of melodic motifs. Detection of nyas segments is relevant to tasks such as melody segmentation, motif discovery and rag recognition. However, detecting nyas segments is challenging, as these segments do not follow an explicit set of rules in terms of segment length, contour characteristics, and melodic context. In this paper we propose a method for the automatic detection of nyas segments in Hindustani music melodies. It consists of two main steps: a segmentation step that incorporates domain knowledge in order to facilitate the placement of nyas boundaries, and a segment classification step based on a series of musically motivated pitch contour features. The proposed method obtains significant accuracies on a heterogeneous data set of 20 audio music recordings containing 1257 nyas svar occurrences and a total duration of 1.5 hours. Further, we show that the proposed segmentation strategy significantly improves over a classical piece-wise linear segmentation approach.
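Since a nyas is a sustained (held) svar, a natural first-pass segmentation is to flag near-flat regions of the pitch contour as candidates. The sketch below is only an illustration of that idea, not the paper's actual algorithm; the deviation tolerance and minimum-duration thresholds are invented for the example.

```python
import numpy as np

def steady_segments(pitch_cents, hop_s=0.01, max_dev_cents=35.0, min_dur_s=0.15):
    """Find segments where the pitch stays near-constant (candidate nyas svars).

    pitch_cents: numpy array of frame-wise pitch in cents; NaN marks unvoiced frames.
    A frame extends the current segment while it stays within max_dev_cents of the
    segment's running median. Segments shorter than min_dur_s are discarded.
    """
    segments = []
    start = None
    for i, p in enumerate(pitch_cents):
        if start is None:
            if not np.isnan(p):
                start = i
        else:
            seg = pitch_cents[start:i + 1]
            if np.isnan(p) or abs(p - np.median(seg[~np.isnan(seg)])) > max_dev_cents:
                if (i - start) * hop_s >= min_dur_s:
                    segments.append((start, i))
                start = None if np.isnan(p) else i
    if start is not None and (len(pitch_cents) - start) * hop_s >= min_dur_s:
        segments.append((start, len(pitch_cents)))
    return segments
```

On a contour with two plateaus joined by a glide, this returns the two held regions and drops the short quasi-steady fragments inside the glide.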
Sparse and Low Rank Representations in Music Signal Analysis
Constantine Kotropoulos, Associate Professor, Department of Informatics, Aristotle University of Thessaloniki
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
Sankalp Gulati
More info: http://mtg.upf.edu/node/3108
Abstract: Discovery of repeating structures in music is fundamental to its analysis, understanding and interpretation. We present a data-driven approach for the discovery of short-time melodic patterns in large collections of Indian art music. The approach first discovers melodic patterns within an audio recording and subsequently searches for their repetitions in the entire music collection. We compute similarity between melodic patterns using dynamic time warping (DTW). Furthermore, we investigate four different variants of the DTW cost function for rank refinement of the obtained results. The music collection used in this study comprises 1,764 audio recordings with a total duration of 365 hours. Over 13 trillion DTW distance computations are done for the entire dataset. Due to the computational complexity of the task, different lower bounding and early abandoning techniques are applied during DTW distance computation. An evaluation based on expert feedback on a subset of the dataset shows that the discovered melodic patterns are musically relevant. Several musically interesting relationships are discovered, yielding further scope for establishing novel similarity measures based on melodic patterns. The discovered melodic patterns can further be used in challenging computational tasks such as automatic raga recognition, composition identification and music recommendation.
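As a concrete illustration of the DTW machinery the abstract mentions (a banded distance plus the LB_Keogh lower bound commonly used for early abandoning), here is a minimal sketch. It is not the authors' implementation; the cost function is plain absolute difference and the band/envelope parameters are illustrative.

```python
import numpy as np

def dtw_distance(x, y, band=None):
    """Plain DTW between two 1-D sequences, with optional Sakoe-Chiba band width."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def lb_keogh(x, y, r=5):
    """LB_Keogh lower bound: a cheap check before running full banded DTW."""
    total = 0.0
    for i, q in enumerate(x):
        window = y[max(0, i - r):i + r + 1]
        lo, hi = min(window), max(window)
        if q > hi:
            total += q - hi    # above the upper envelope
        elif q < lo:
            total += lo - q    # below the lower envelope
    return total
```

Because the lower bound never exceeds the banded DTW distance, candidates whose bound already exceeds the best distance so far can be discarded without computing DTW at all, which is how searches over trillions of comparisons become feasible.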
Application of Fisher Linear Discriminant Analysis to Speech/Music Classifica...
Lushanthan Sivaneasharajah
This document describes research applying Fisher Linear Discriminant Analysis (LDA) and K-Nearest Neighbors (K-NN) algorithms to classify speech and music audio clips. It finds that Fisher LDA using single features like mel-frequency cepstral coefficients achieves classification error rates below 5%, outperforming K-NN. While combining multiple features does not improve LDA results, combining the outputs of LDA and K-NN classifiers using majority voting further lowers the error rate to 4.5%, demonstrating the benefit of classifier ensembles for this task.
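For intuition, a two-class Fisher LDA fits in a few lines: project onto the direction that maximizes between-class scatter relative to within-class scatter, then threshold at the midpoint of the projected class means. This is a generic numpy sketch on synthetic data, not the features or experimental setup of the study.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Two-class Fisher LDA: w = Sw^{-1} (m1 - m0), midpoint threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter = sum of the per-class scatter matrices
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(len(m0)), m1 - m0)
    threshold = 0.5 * ((X0 @ w).mean() + (X1 @ w).mean())
    return w, threshold

def fisher_lda_predict(X, w, threshold):
    """Label 1 if the projection falls on class 1's side of the midpoint."""
    return (X @ w > threshold).astype(int)
```

In the study, a classifier like this would be trained per feature (e.g. on MFCC vectors), with several such classifiers combined by majority voting.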
This document proposes a melody extraction method using multi-column deep neural networks (MCDNNs). The key points are:
1. An MCDNN architecture is used to classify frames into multiple pitch resolutions (e.g. 1 semitone, 0.5 semitone) for improved accuracy and resolution.
2. Data augmentation by pitch shifting and a singing voice detector are used to increase training data.
3. Hidden Markov models provide temporal smoothing of MCDNN outputs.
4. Evaluation on various datasets shows the MCDNN approach outperforms state-of-the-art methods for melody extraction.
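Step 3 (HMM-based temporal smoothing) can be illustrated with a toy Viterbi decoder over frame-wise class posteriors. The uniform stay/switch transition model below is an invented stand-in for whatever transition matrix the paper actually uses.

```python
import numpy as np

def viterbi_smooth(posteriors, switch_penalty=2.0):
    """Temporal smoothing of frame-wise posteriors (e.g. over pitch bins).

    Transition log-score is 0 for staying in the same state and
    -switch_penalty for changing state, so isolated noisy frames are
    overridden by their temporal context.
    """
    T, S = posteriors.shape
    logp = np.log(posteriors + 1e-12)
    delta = logp[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # trans[prev, cur]: score of arriving in `cur` from `prev`
        trans = delta[:, None] - switch_penalty * (1 - np.eye(S))
        back[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) + logp[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

A single frame whose posterior weakly favors a different pitch bin is flipped back, because the evidence gain is smaller than the two state-switch penalties it would incur.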
IJERA (International Journal of Engineering Research and Applications) is an international, online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Revolution Analytics
Everything happens somewhere and spatial analysis attempts to use location as an explanatory variable. Such analysis is made complex by the very many ways we habitually record spatial location, the complexity of spatial data structures, and the wide variety of possible domain-driven questions we might ask. One option is to develop and use software for specific types of spatial data, another is to use a purpose-built geographical information system (GIS), but determined work by R enthusiasts has resulted in a multiplicity of packages in the R environment that can also be used.
This document summarizes a presentation on using string kernels for text classification. It introduces text classification and the challenge of representing text documents as feature vectors. It then discusses how kernel methods can be used as an alternative, by mapping documents into a feature space without explicitly extracting features. Different string kernel algorithms are described that measure similarity between documents based on common subsequences of characters. The document evaluates the performance of these kernels on a text dataset and explores ways to improve efficiency, such as through kernel approximation.
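The simplest member of this kernel family is the p-spectrum kernel, which compares documents through their shared contiguous character p-grams; the subsequence kernels in the talk generalize this to gappy subsequences with decay weights. A minimal sketch:

```python
from collections import Counter

def p_spectrum_kernel(s, t, p=3):
    """p-spectrum string kernel: inner product of character p-gram counts."""
    a = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    b = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(a[g] * b[g] for g in a if g in b)
```

The kernel value can feed any kernel method (e.g. an SVM) directly, without ever materializing the p-gram feature vectors.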
This document describes a method for automatically extracting key terms from spoken documents. It uses branching entropy to identify phrases, then extracts prosodic, lexical, and semantic features for machine learning. Three learning methods - K-means, AdaBoost, and neural networks - are evaluated. The best performance is from neural networks using all feature types. When applied to lecture transcripts, it achieves an F-measure of 67.31% for key terms, only slightly lower than human annotations.
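Branching entropy scores how unpredictable the next token is after a given context; an entropy spike suggests a phrase boundary. A toy token-level version (the paper works on spoken-document transcripts; the details here are purely illustrative):

```python
import math
from collections import defaultdict

def branching_entropy(tokens, context):
    """Entropy of the distribution of tokens that follow `context` in `tokens`."""
    followers = defaultdict(int)
    n = len(context)
    for i in range(len(tokens) - n):
        if tuple(tokens[i:i + n]) == tuple(context):
            followers[tokens[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())
```

Contexts followed by many different tokens score high (likely phrase ends); contexts with a single continuation score zero (likely phrase-internal).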
This document discusses sequence learning from acoustic models to end-to-end automatic speech recognition (ASR) systems. It covers feedforward neural networks, recurrent neural networks including LSTM, connectionist temporal classification, and building an end-to-end ASR system. Experimental results on a low-resource language are also presented. Key papers on the topics are referenced.
Phrase-based Rāga Recognition Using Vector Space Modeling
Sankalp Gulati
This document describes an approach for automatic raga recognition in Indian art music using phrase-based vector space modeling. It involves discovering melodic patterns from a Carnatic music collection through intra-recording and inter-recording analysis. The patterns are then clustered into communities based on their network of similarities. Features are extracted from the pattern-recording relationships using term frequency-inverse document frequency weighting. These features are used to build a feature matrix for raga recognition.
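The tf-idf weighting over pattern-recording relationships works exactly like term-document weighting in text retrieval, with melodic patterns playing the role of terms. A small numpy sketch; the smoothing follows the common sklearn-style formula, and the paper's exact variant may differ.

```python
import numpy as np

def tf_idf(counts):
    """counts[r, p] = occurrences of melodic pattern p in recording r.

    Returns the tf-idf weighted feature matrix used as classifier input.
    """
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                       # recordings containing each pattern
    idf = np.log((1 + counts.shape[0]) / (1 + df)) + 1.0  # smoothed idf
    return tf * idf
```

Patterns that occur in few recordings (e.g. rāga-characteristic phrases) are weighted up relative to ubiquitous ones, which is precisely what a rāga classifier wants.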
Metric Learning for Music Discovery with Source and Target Playlists
Ying-Shu Kuo
Playlist generation for music exploration: sets of source songs and target songs are defined, and a playlist between them is derived through metric learning and boundary constraints.
https://github.com/hank5925/mlmdstp
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
IJERA Editor
This work presents an application of fundamental frequency (pitch), Linear Predictive Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC) to identifying the sex of the speaker in speech recognition research. The aim of this article is to compare the performance of these three methods for identifying the speakers' sex. A successful speech recognition system can help in non-critical operations such as presenting the driving route to the driver, dialing a phone number, or turning a light switch or coffee machine on and off, apart from speaker verification (caste-wise, community-wise and locality-wise), including identification of sex. Here an attempt has been made to identify the sex of Bodo speakers through vowel utterances using pitch, LPCC and MFCC techniques. It is found that the feature vector organization of LPCC coefficients provides a more promising approach to speech and speaker recognition for the Bodo language than pitch or MFCC.
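Of the three features, pitch is the most directly interpretable cue for speaker sex, since adult male and female voices occupy largely different F0 ranges. A rough autocorrelation-based sketch; the frequency ranges and decision boundary are textbook approximations, not values from this study.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Rough autocorrelation-based F0 estimate for one voiced frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search only plausible pitch lags
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def guess_sex(f0, boundary_hz=165.0):
    """Crude decision: typical adult male F0 is ~85-180 Hz, female ~165-255 Hz."""
    return "female" if f0 > boundary_hz else "male"
```

Real systems (including this study) use statistics over many voiced frames rather than a single frame, and combine pitch with spectral envelope features like LPCC/MFCC.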
Transferring Singing Expressions from One Voice to Another for a Given Song
NAVER Engineering
Presenter: Sangeon Yong (PhD student, KAIST)
Date: June 2018
We are researching ways to let people who cannot sing well still enjoy singing and recording. This talk introduces our current focus within that work: a method for transplanting the musical expression of a skilled singer onto the voice timbre of an unskilled singer. It covers the basic structure of the system, its current limitations, and how it can be applied and extended.
Capturing Themed Evidence, a Hybrid Approach
Enrico Daga
The task of identifying pieces of evidence in texts is of fundamental importance in supporting qualitative studies in various domains, especially in the humanities. In this paper, we coin the expression themed evidence, to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest and study the problem of identifying them in texts. We devise a generic framework aimed at capturing themed evidence in texts based on a hybrid approach, combining statistical natural language processing, background knowledge, and Semantic Web technologies. The effectiveness of the method is demonstrated on a case study of a digital humanities database aimed at collecting and curating a repository of evidence of experiences of listening to music. Extensive experiments demonstrate that our hybrid approach outperforms alternative solutions. We also evidence its generality by testing it on a different use case in the digital humanities.
Locality sensitive hashing (LSH) is a technique for improving the efficiency of near-neighbor searches in high-dimensional spaces. LSH uses hash functions that map similar items to the same buckets with high probability. The document discusses applications of near-neighbor searching, defines the near-neighbor reporting problem, and introduces LSH. It also covers techniques such as gap amplification to improve LSH performance and parameter optimization to minimize query time.
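A minimal random-hyperplane LSH for cosine similarity makes the bucket idea concrete: each hyperplane contributes one sign bit, and nearby vectors agree on most bits. The dimensions and plane count below are purely illustrative.

```python
import numpy as np

def lsh_signature(v, planes):
    """One sign bit per random hyperplane; the tuple is the bucket key."""
    return tuple(int(b) for b in (planes @ v > 0))

rng = np.random.default_rng(42)
planes = rng.normal(size=(16, 64))        # 16 random hyperplanes in 64 dimensions

base = rng.normal(size=64)
near = base + 0.01 * rng.normal(size=64)  # small perturbation of base
far = rng.normal(size=64)                 # unrelated vector
```

Similar items collide (or nearly collide) in signature space, so a query is compared only against its bucket's contents rather than the whole collection; gap amplification repeats this with several independent hash tables to boost recall.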
Prosody is an essential component of human speech. Broadly, prosody describes all of the production qualities of speech that are not involved in conveying lexical information: where the words are "what is said", prosody is "how it is said". Prosody plays an important role not only in communicating the syntax, semantics and pragmatics of spoken language, but also in conveying information about the speaker and their internal state (e.g. emotion or fatigue).
Understanding prosody is critical to understanding speech communication. Spoken language processing (SLP) technology that approaches human levels of competence will necessarily include automatic analysis of prosody. Despite the importance of prosody in spoken communication, researchers are often unable to reliably incorporate prosodic information into applications. One explanation is a lack of compact, consistent, and universal representations of prosodic information. This talk will describe the state of the art in prosodic analysis and its use in spoken language processing with a focus on the development of new representations of prosody.
Musical Information Retrieval Take 2: Interval Hashing Based Ranking
Sease
Retrieving musical records from a corpus of information using an audio input as a query is not an easy task. Various approaches address the problem by modelling the query and the corpus as arrays of hashes calculated from the chroma features of the audio input.
The scope of this talk is to introduce a novel approach to calculating such hashes, based on the intervals between the most intense pitches of sequential chroma vectors.
Building on the theoretical introduction, a prototype will show this approach in action with Apache Solr, using a sample dataset, and demonstrate the benefits of positional queries.
Challenges and future work will follow as concluding considerations.
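One way to read the core idea: take the strongest pitch class of each chroma frame and hash the interval (mod 12) between consecutive frames, which keeps the hash stable under transposition. This sketch is my reduction of the talk's scheme to the single strongest pitch, so treat it as an assumption, not the presented algorithm.

```python
import numpy as np

def interval_hashes(chroma_seq):
    """Pitch-class interval between the strongest bins of adjacent chroma frames."""
    tops = [int(np.argmax(frame)) for frame in chroma_seq]
    return [(b - a) % 12 for a, b in zip(tops, tops[1:])]
```

Because only intervals are stored, the same melody played in a different key produces identical hashes, and a positional query over the hash sequence can match transposed renditions.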
This document provides an overview of RNA-seq and its applications. It discusses key aspects of RNA-seq including transcriptome profiling, alignment, quantification, differential expression analysis, clustering and visualization. It also covers experimental design considerations and highlights some commonly used tools and software. The document is a comprehensive guide that describes the RNA-seq workflow and analysis from start to finish.
Colloquium talk on modal sense classification using a convolutional neural ne...
Ana Marasović
Modal sense classification (MSC) is a special case of sense disambiguation relevant for distinguishing facts from hypotheses and speculations, or apprehended, planned and desired states of affairs. Prior approaches showed that even with carefully designed semantic feature sets, the models have difficulties beating the majority sense baseline in cases of difficult sense distinctions and when applying the models to heterogeneous text genres. Another drawback of former approaches is that feature implementation heavily depends on external language-specific resources such as dependency or constituency parse trees and lexical databases such as WordNet or CELEX. To alleviate manual crafting of the features and to obtain a model which is easily portable to novel languages, we propose to cast MSC as a sentence classification task with a fixed sense inventory in a convolutional neural network (CNN) architecture. Our performance study shows that CNN is an appropriate model for MSC and its special properties motivate us to investigate it as a formal framework for general word sense disambiguation tasks.
The document discusses formal language theory and its applications in natural language processing (NLP). It covers two main goals in computational linguistics - theoretical interest in formally characterizing natural language and practical interest in using well-understood frameworks like finite state models to solve NLP problems. Finite state devices are widely used in NLP tasks due to their efficiency and ability to model linguistic phenomena like words through dictionaries and rules. While finite state models provide a useful approximation of language, natural languages pose challenges like ambiguity, long distance dependencies and non-regular features that require extensions to basic finite state models.
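As a concrete example of the finite-state devices the document refers to, here is a tiny DFA runner; the language it accepts (strings over {a, b} ending in "ab") is an invented example, standing in for the dictionary and rule automata used in real NLP pipelines.

```python
def make_dfa(transitions, start, accepting):
    """transitions[(state, symbol)] -> next state; missing entries reject."""
    def accepts(s):
        state = start
        for ch in s:
            if (state, ch) not in transitions:
                return False
            state = transitions[(state, ch)]
        return state in accepting
    return accepts

# DFA for strings over {a, b} that end in "ab"
ends_in_ab = make_dfa(
    {(0, "a"): 1, (0, "b"): 0,
     (1, "a"): 1, (1, "b"): 2,
     (2, "a"): 1, (2, "b"): 0},
    start=0, accepting={2},
)
```

Recognition is a single left-to-right pass, linear in the input length, which is exactly the efficiency argument the document makes for finite-state models in NLP.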
The document presents a novel method for processing automatic speech recognition (ASR) lattices using convolutional neural networks (CNNs). The method generalizes the convolutional layer to process both the posterior probabilities and lexical information in an ASR lattice. It was used in a CNN-based hierarchical discriminative model (HDM) for spoken language understanding. Evaluation on two corpora showed the CNN-based HDM improved over an original HDM based on support vector machines, achieving up to a 3% absolute improvement in concept accuracy.
This document summarizes an academic presentation about algorithms for aligning genomic sequences. It discusses edit distance models for comparing sequences and finding the best alignment. Local, global, and glocal alignment algorithms are introduced, including CHAOS for local alignment and LAGAN and Shuffle-LAGAN for multiple global and glocal alignments of whole genomes. Applications to aligning the cystic fibrosis gene and analyzing rearrangements between human and mouse are also summarized.
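The edit-distance-style scoring behind global alignment (Needleman-Wunsch, the building block that tools like LAGAN extend) fits in a short dynamic program; the match/mismatch/gap scores below are arbitrary example values, not the presentation's parameters.

```python
def global_align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score between two sequences."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap            # align a's prefix against gaps
    for j in range(1, m + 1):
        D[0][j] = j * gap            # align b's prefix against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            D[i][j] = max(diag, D[i - 1][j] + gap, D[i][j - 1] + gap)
    return D[n][m]
```

Local alignment (Smith-Waterman, and heuristics like CHAOS) differs mainly in clamping each cell at zero and taking the best cell anywhere in the table rather than the corner.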
“Automatically learning multiple levels of representations of the underlying distribution of the data to be modelled”
Deep learning algorithms have shown superior learning and classification performance in areas such as transfer learning, speech and handwritten character recognition, and face recognition, among others.
(I have referred to many articles and experimental results provided by Stanford University.)
[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music
Sankalp Gulati
Computational Approaches to Melodic Analysis of Indian Art Music discusses various computational approaches to analyzing the melody of Indian art music, including tonic identification and predominant pitch estimation. For tonic identification, the document describes using multipitch analysis of the audio signal along with a drone background to identify the tonic note, with reported 90% accuracy. It also discusses signal processing techniques used such as STFT and spectral peak picking. For predominant pitch estimation, it reviews various pitch estimation algorithms including autocorrelation-based, frequency-domain, and multipitch approaches, highlighting the YIN algorithm which produces fewer errors than other methods.
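The STFT-plus-spectral-peak-picking step mentioned above reduces, per frame, to windowing, an FFT, and local-maximum selection over the magnitude spectrum. A single-frame numpy sketch with illustrative parameters (not the tutorial's actual settings):

```python
import numpy as np

def spectral_peaks(signal, sr, n_fft=4096, top=3):
    """Pick the `top` strongest local maxima of one Hann-windowed FFT frame."""
    frame = signal[:n_fft] * np.hanning(n_fft)
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    # local maxima of the magnitude spectrum (excluding the edges)
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    peaks.sort(key=lambda i: mag[i], reverse=True)
    return [float(freqs[i]) for i in peaks[:top]]
```

In a tonic-identification pipeline, peaks like these are collected across many frames (and across the drone's multiple pitches in a multipitch analysis) before the tonic candidate is chosen.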
Computational Melodic Analysis of Indian Art Music
Sankalp Gulati
This document summarizes research on computational melodic analysis of Indian art music. Key areas discussed include tonic identification, predominant melody estimation, motif discovery, melodic similarity, melodic pattern networks, raga recognition using melodic patterns, and resources like datasets and demonstrations of the techniques.
Similar to Landmark Detection in Hindustani Music Melodies
Computational Melodic Analysis of Indian Art MusicSankalp Gulati
This document summarizes research on computational melodic analysis of Indian art music. Key areas discussed include tonic identification, predominant melody estimation, motif discovery, melodic similarity, melodic pattern networks, raga recognition using melodic patterns, and resources like datasets and demonstrations of the techniques.
Computational Approaches for Melodic Description in Indian Art Music CorporaSankalp Gulati
Presentation for my PhD defense, Music Technology Group, Barcelona, Spain.
Resources: http://compmusic.upf.edu/node/304
Short abstract:
Automatically describing contents of recorded music is crucial for interacting with large volumes of audio recordings, and for developing novel tools to facilitate music pedagogy. Melody is a fundamental facet in most music traditions and, therefore, is an indispensable component in such description. In this thesis, we develop computational approaches for analyzing high-level melodic aspects of music performances in Indian art music (IAM), with which we can describe and interlink large amounts of audio recordings. With its complex melodic framework and well-grounded theory, the description of IAM melody beyond pitch contours offers a very interesting and challenging research topic. We analyze melodies within their tonal context, identify melodic patterns, compare them both within and across music pieces, and finally, characterize the specific melodic context of IAM, the rāgas. All these analyses are done using data-driven methodologies on sizable curated music corpora. Our work paves the way for addressing several interesting research problems in the field of music information research, as well as developing novel applications in the context of music discovery and music pedagogy.
Our presentation of Hindify at MusicHackDay Barcelona, 2013. Hindify is a music hack that automatically transforms an audio song and embed characteristics of Hindustani music such as slow/relaxed tempo, a drone (tanpura) in the background and tabla as the percussion instrument. This hack was done by Varun Jewalikar and Sankalp Gulati.
Some audio examples: https://soundcloud.com/sankalpg/sets/hindify
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Sankalp Gulati
Sankalp Gulati proposed a methodology for discovering and characterizing melodic motives in large audio music collections using domain knowledge of Indian art music. The methodology involves extracting pitch, loudness, and timbre features from audio signals, representing melodies, calculating melodic similarity, extracting repeated patterns as motives, and analyzing the extracted motives. Gulati aims to apply this methodology to a collection of over 550 hours of Indian art music audio and evaluate the results through listening tests and user feedback.
Tonic Identification System for Indian Art MusicSankalp Gulati
The document describes a system for identifying the tonic pitch in Hindustani and Carnatic music recordings. The system utilizes both audio signals and metadata. It analyzes the audio to extract sinusoidal components and computes pitch salience over time. Potential tonic candidates are identified and further processed using signal processing techniques like harmonic summation. The system then identifies the tonic pitch class using a multi-pitch histogram and machine learning. It also estimates the correct octave of the tonic using predominant melody extraction and classification methods. The goal is to automatically label the tonic in large music databases, which provides fundamental information for music analysis tasks.
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
This presentation was provided by Rebecca Benner, Ph.D., of the American Society of Anesthesiologists, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
The chapter Lifelines of National Economy in Class 10 Geography focuses on the various modes of transportation and communication that play a vital role in the economic development of a country. These lifelines are crucial for the movement of goods, services, and people, thereby connecting different regions and promoting economic activities.
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Landmark Detection in Hindustani Music Melodies
1. Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece
2. Indian Art Music
§ Hindustani music (North Indian music)
§ Carnatic music
4. Melodies in Hindustani Music
§ Rāg: melodic framework of Indian art music
S r R g G m M P d D n N
Do Re Mi Fa Sol La Ti
Svaras
5. Melodies in Hindustani Music
§ Rāg: melodic framework of Indian art music
S r R g G m M P d D n N
Do Re Mi Fa Sol La Ti
Svaras
Bhairavi Thaat
S r g m P d n
6. Melodies in Hindustani Music
§ Rāg: melodic framework of Indian art music
S r R g G m M P d D n N
Do Re Mi Fa Sol La Ti
Svaras
Bhairavi Thaat: S r g m P d n
Nyās Svar* (Rāg Bilaskhani todi): S r* g* m* P d* n
Nyās translates to home/residence
7. Melodic Landmark: Nyās Occurrences
§ Example
[Figure: predominant F0 contour (cents) over time (s), with five nyās occurrences N1–N5 marked]
A. K. Dey, Nyāsa in rāga: the pleasant pause in Hindustani music. Kanishka Publishers, Distributors,
2008.
10. Goal and Motivation
§ Methodology for detecting nyās occurrences
§ Motivation
§ Melodic motif discovery [Ross and Rao 2012]
§ Melodic segmentation
§ Music transcription
J. C. Ross and P. Rao, “Detection of raga-characteristic phrases from Hindustani classical music audio,” in Proc. of 2nd CompMusic Workshop, 2012, pp. 133–138.
11. Methodology: Block Diagram
[Block diagram of the proposed methodology]
Audio → Predominant pitch estimation + Tonic identification (pitch representation) → Histogram computation → Svar identification → Segmentation → Feature extraction (local, contextual, local + contextual) → Segment classification → Segment fusion → Nyās svars
12. Methodology: Pred. Pitch Estimation
[Block diagram repeated, highlighting the predominant pitch estimation and representation stage]
13. Methodology: Segmentation
[Block diagram repeated, highlighting the segmentation stage]
16. Methodology: Segment Classification
[Block diagram repeated, highlighting the segment classification and fusion stage; segments are labeled nyās / non-nyās]
17. Pred. Pitch Estimation and Representation
§ Predominant pitch estimation
§ Method by Salamon and Gómez (2012)
§ Favorable results in MIREX’11
§ Tonic Normalization
§ Pitch values converted from Hertz to Cents
§ Multi-pitch approach by Gulati et al. (2014)
J. Salamon and E. Gómez, “Melody extraction from polyphonic music signals using pitch contour characteristics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, 2012.
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra, “Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation,” Journal of New Music Research, vol. 43, no. 1, pp. 55–73, 2014.
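As a sketch of the tonic normalization step: a pitch value f (Hz) maps to 1200·log2(f/tonic) cents, so the tonic sits at 0 cents and each octave spans 1200 cents. This is a minimal illustration; the hop size and voicing conventions of the actual system are not shown, and passing unvoiced frames through as None is an assumption.

```python
import math

def hz_to_cents(f0_hz, tonic_hz):
    """Convert a pitch value in Hz to cents relative to the tonic.

    Unvoiced frames (f0 <= 0) are passed through as None (an assumed
    convention for this sketch).
    """
    if f0_hz <= 0:
        return None
    return 1200.0 * math.log2(f0_hz / tonic_hz)

# With a tonic of 220 Hz, the tonic maps to 0 cents and the octave
# above (440 Hz) maps to 1200 cents.
```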
18. Melody Segmentation
[Figure: fundamental frequency contour (cents) vs. time (s), showing candidate boundaries T1–T9, segments Sn−1, Sn, Sn+1, parameters ε, ρ1, ρ2, and a marked nyās segment]
§ Baseline: Piecewise linear
segmentation (PLS)
E. Keogh, S. Chu, D. Hart, and M. Pazzani, “Segmenting time series: A survey and novel approach,” Data Mining in Time Series Databases, vol. 57, pp. 1–22, 2004.
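The PLS baseline can be sketched as a sliding-window segmentation in the spirit of Keogh et al.: grow a segment until the maximum deviation of its points from the straight line joining the segment endpoints exceeds a threshold, then cut. The error norm and the threshold value below are illustrative assumptions, not the paper's settings.

```python
def pls_segment(pitch, eps):
    """Sliding-window piecewise linear segmentation (sketch).

    `pitch` is a list of pitch values (cents per frame); `eps` is the
    maximum allowed deviation (cents) from the line joining the current
    segment's endpoints. Returns segment boundary indices.
    """
    boundaries = [0]
    start = 0
    for end in range(2, len(pitch)):
        # Line through the current segment's endpoints.
        slope = (pitch[end] - pitch[start]) / (end - start)
        err = max(abs(pitch[i] - (pitch[start] + slope * (i - start)))
                  for i in range(start + 1, end))
        if err > eps:
            boundaries.append(end - 1)  # cut just before the violation
            start = end - 1
    boundaries.append(len(pitch) - 1)
    return boundaries

# A flat region followed by a 600-cent jump yields boundaries
# flanking the transition.
```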
25. Feature Extraction: Local
§ Segment Length
§ μ and σ of pitch values
§ μ and σ of differences in adjacent peak and valley locations
[d1–d5: distances between adjacent peaks and valleys]
26. Feature Extraction: Local
§ Segment Length
§ μ and σ of pitch values
§ μ and σ of differences in adjacent peak and valley locations
§ μ and σ of the peak and valley amplitudes
[p1–p6: peak and valley amplitudes]
27. Feature Extraction: Local
§ Segment Length
§ μ and σ of pitch values
§ μ and σ of differences in adjacent peak and valley locations
§ μ and σ of the peak and valley amplitudes
§ Temporal centroid (length normalized)
28. Feature Extraction: Local
§ Segment Length
§ μ and σ of pitch values
§ μ and σ of differences in adjacent peak and valley locations
§ μ and σ of the peak and valley amplitudes
§ Temporal centroid (length normalized)
§ Binary flatness measure
Flat or non-flat
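A few of the local features listed above can be sketched as follows. The flatness threshold is an assumed value for illustration, not the one used in the paper, and the thresholded standard deviation stands in for whatever flatness criterion the method actually applies.

```python
import statistics

def local_features(seg, flat_thresh=25.0):
    """Sketch of some local features for one pitch segment (cents).

    `flat_thresh` (cents) is an assumed threshold for the binary
    flatness measure.
    """
    n = len(seg)
    mu = statistics.fmean(seg)
    sigma = statistics.pstdev(seg)
    # Temporal centroid, normalized by segment length: where the
    # magnitude of the contour is concentrated within the segment.
    total = sum(abs(v) for v in seg)
    centroid = (sum(i * abs(v) for i, v in enumerate(seg)) / total / (n - 1)
                if total else 0.5)
    return {
        "length": n,
        "pitch_mean": mu,
        "pitch_std": sigma,
        "temporal_centroid": centroid,
        "is_flat": sigma < flat_thresh,  # flat vs. non-flat
    }

# A held (flat) segment: zero spread, centroid in the middle.
```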
33. Feature Extraction: Contextual
§ 4 different normalized segment lengths
§ Time difference from the succeeding and preceding
breath pauses
[Figure: time differences d1, d2 from the preceding and succeeding breath pauses]
34. Feature Extraction: Contextual
§ 4 different normalized segment lengths
§ Time difference from the succeeding and preceding unvoiced regions
§ Local features of neighboring segments (9*2=18)
35. Segment Classification
§ Class: Nyās and Non-nyās
§ Classifiers:
§ Trees (min_samples_split=10)
§ K nearest neighbors (n_neighbors=5)
§ Naive bayes (fit_prior=False)
§ Logistic regression (class_weight=‘auto’)
§ Support vector machines (RBF)(class weight=‘auto’)
§ Testing methodology
§ Cross-fold validation
§ Software: Scikit-learn, version 0.14.1
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
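The cross-fold validation setup can be illustrated with a stratified split sketch: indices of each class are dealt round-robin into k folds, so every fold keeps roughly the nyās / non-nyās balance of the whole set. The fold count, seed, and round-robin assignment are illustrative assumptions, not details taken from the slide.

```python
import random

def stratified_folds(labels, k=10, seed=42):
    """Deal the indices of each class round-robin into k folds.

    Each class's indices are shuffled first, so folds are random but
    class-balanced. Returns a list of k index lists.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# With 10 nyās and 20 non-nyās segments and k=5, each fold gets
# 2 nyās and 4 non-nyās indices.
```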
36. Evaluation: Dataset
§ Audio:
§ Number of recordings: 20 Ālap vocal pieces
§ Duration of recordings: 1.5 hours
§ Number of artists: 8
§ Number of rāgs: 16
§ Type of audio: 15 polyphonic commercial recordings, 5
in-house monophonic recordings**
§ Annotations: Musician with > 15 years of training
§ Statistics:
§ 1257 nyās segments
§ Durations: 150 ms to 16.7 s (mean 2.46 s, median 1.47 s)
**Openly available under CC license in freesound.org
37. Evaluation: Measures
§ Boundary annotations (F-scores)
§ Hit rate
§ Allowed deviation: 100 ms
§ Label annotations (F-scores):
§ Pairwise frame clustering method [Levy and Sandler
2008]
§ Statistical significance: Mann-Whitney U test
(p=0.05)
§ Multiple comparison: Holm-Bonferroni method
M. Levy and M. Sandler, “Structural segmentation of musical audio by constrained clustering,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318–326, 2008.
H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, vol. 18, no. 1, pp. 50–60, 1947.
S. Holm, “A simple sequentially rejective multiple test procedure,” Scandinavian Journal of Statistics, pp. 65–70, 1979.
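The boundary hit rate with a 100 ms allowed deviation can be sketched as follows. The greedy one-to-one matching shown here is an assumption for illustration; the exact matching procedure is not detailed on the slide.

```python
def boundary_fscore(ref, est, tol=0.1):
    """F-score of boundary hits: an estimated boundary (seconds) is a
    hit if it lies within `tol` seconds of an unmatched reference
    boundary. Matching is greedy and one-to-one."""
    matched = set()
    hits = 0
    for e in est:
        # Match e to the nearest unmatched reference boundary within tol.
        best = None
        for i, r in enumerate(ref):
            if i in matched or abs(r - e) > tol:
                continue
            if best is None or abs(r - e) < abs(ref[best] - e):
                best = i
        if best is not None:
            matched.add(best)
            hits += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```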
38. Evaluation: Baseline Approach
§ DTW based kNN classification (k=5)
§ Frequently used for time series classification
§ Random baselines
§ Randomly planting boundaries
§ Evenly planting boundaries at every 100 ms *
§ Ground truth boundaries, randomly assign class labels
X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana, “Fast time series classification using numerosity reduction,” in Proc. of the Int. Conf. on Machine Learning, 2006, pp. 1033–1040.
X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. J. Keogh, “Experimental comparison of representation methods and distance measures for time series data,” Data Mining and Knowledge Discovery, vol. 26, no. 2, pp. 275–309, 2013.
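A minimal version of the DTW distance underlying the kNN baseline can be sketched as below. The squared-difference local cost and the absence of lower bounding and early abandoning are simplifications for illustration, not the baseline's exact configuration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two pitch sequences.

    Classic O(n*m) dynamic program with a squared-difference local
    cost (an assumption for this sketch).
    """
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Step pattern: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# DTW absorbs local time stretching: [0, 0, 1] and [0, 1] align
# perfectly, at zero cost.
```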
39. Results: Nyās Boundary Annotation
Feat.    DTW    Tree   KNN    NB     LR     SVM
A  L     0.356  0.407  0.447  0.248  0.449  0.453
   C     0.284  0.394  0.387  0.383  0.389  0.406
   L+C   0.289  0.414  0.426  0.409  0.432  0.437
B  L     0.524  0.672  0.719  0.491  0.736  0.749
   C     0.436  0.629  0.615  0.641  0.621  0.673
   L+C   0.446  0.682  0.708  0.591  0.725  0.735

Table 1. F-scores for nyās boundary detection using the PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C), and local together with contextual (L+C) features. DTW is the baseline method used for comparison. The F-score for the random baseline obtained using RB2 is 0.184.
45. Results: Nyās Label Annotation
Feat.    DTW    Tree   KNN    NB     LR     SVM
A  L     0.553  0.685  0.723  0.621  0.727  0.722
   C     0.251  0.639  0.631  0.690  0.688  0.674
   L+C   0.389  0.694  0.693  0.708  0.722  0.706
B  L     0.546  0.708  0.754  0.714  0.749  0.758
   C     0.281  0.671  0.611  0.697  0.689  0.697
   L+C   0.332  0.672  0.710  0.730  0.743  0.731

Table 2. F-scores for the nyās and non-nyās label annotation task using the PLS method (A) and the proposed segmentation method (B). Results are shown for different classifiers (Tree, KNN, NB, LR, SVM) and local (L), contextual (C), and local together with contextual (L+C) features. DTW is the baseline method used for comparison. The best random baseline F-score is 0.153, obtained using RB2.

The proposed segmentation method consistently performs better than PLS. However, the differences are not statistically significant.
46. Conclusions and Future work
§ Proposed segmentation performs better than the PLS method
§ Proposed methodology performs better than standard DTW-based kNN classification
§ Local features yield the highest accuracy
§ Contextual features are also informative (though perhaps not complementary to the local features)
§ Future work
§ Perform similar analysis on Bandish performances
§ Incorporate Rāga specific knowledge
47. Landmark Detection in Hindustani
Music Melodies
Sankalp Gulati1, Joan Serrà2, Kaustuv K. Ganguli3 and Xavier Serra1
sankalp.gulati@upf.edu, jserra@iiia.csic.es, kaustuvkanti@ee.iitb.ac.in, xavier.serra@upf.edu
1Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
2Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Barcelona, Spain
3Indian Institute of Technology Bombay, Mumbai, India
SMC-ICMC 2014, Athens, Greece