In this paper, we present a benchmark dataset based on the KUSC classical music collection and provide
baseline key-finding comparison results. Audio key finding is a basic music information retrieval task; it
forms an essential component of systems for music segmentation, similarity assessment, and mood
detection. Due to copyright restrictions and a labor-intensive annotation process, audio key finding
algorithms have only been evaluated using small proprietary datasets to date. To create a common base for
systematic comparisons, we have constructed a dataset comprising more than 3,000 excerpts of classical
music. The excerpts are made publicly accessible via commonly used acoustic features such as pitch-based
spectrograms and chromagrams. We introduce a hybrid annotation scheme that combines the use of title
keys with expert validation and correction of only the challenging cases. The expert musicians also provide
ratings of key recognition difficulty. Other meta-data include instrumentation. As a demonstration of the dataset's use, and to provide initial benchmark comparisons for evaluating new algorithms, we conduct a series of experiments reporting the key-determination accuracy of four state-of-the-art algorithms. We further
show the importance of considering factors such as estimated tuning frequency, key strength or confidence
value, and key recognition difficulty in key finding. In the future, we plan to expand the dataset to include
meta-data for other music information retrieval tasks.
The goal of this project is to build a classifier that predicts whether a song is happy or sad by analysing its lyrics. Most research on music classification is based on features extracted from audio signals. However, exploring lyrics alone as a source of information can be relevant for music classification. It is an interesting problem that has not been widely explored in the literature.
Retrieving musical records from a corpus of information using an audio input as a query is not an easy task. Various approaches model both the query and the corpus as arrays of hashes calculated from the chroma features of the audio. The scope of this talk is to introduce a novel approach to calculating such hashes, based on the intervals between the most intense pitches of sequential chroma vectors. Building on the theoretical introduction, a prototype will show this approach in action with Apache Solr on a sample dataset, along with the benefits of positional queries. Challenges and future work conclude the talk.
Musical Information Retrieval Take 2: Interval Hashing Based Ranking (Sease)
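As a rough illustration of the hashing idea in this talk, the sketch below takes the single most intense pitch class per chroma frame and hashes the interval between consecutive peaks. The function names and the one-peak simplification are illustrative assumptions, not the talk's actual scheme (which considers several of the most intense pitches).

```python
# Each chroma vector has 12 bins (pitch classes C..B). For every pair of
# consecutive frames, take the most intense pitch class in each and hash
# the interval between them (difference in semitones, mod 12).

def top_pitch(chroma):
    """Index (0-11) of the most intense pitch class in one chroma vector."""
    return max(range(len(chroma)), key=lambda i: chroma[i])

def interval_hashes(frames):
    """One hash per consecutive frame pair: the semitone interval mod 12."""
    hashes = []
    for prev, cur in zip(frames, frames[1:]):
        interval = (top_pitch(cur) - top_pitch(prev)) % 12
        hashes.append(interval)
    return hashes

# Toy example: three frames whose strongest bins are C (0), E (4), G (7).
frames = [
    [1.0] + [0.1] * 11,             # peak at bin 0 (C)
    [0.1] * 4 + [1.0] + [0.1] * 7,  # peak at bin 4 (E)
    [0.1] * 7 + [1.0] + [0.1] * 4,  # peak at bin 7 (G)
]
print(interval_hashes(frames))  # [4, 3]: a major third, then a minor third
```

A sequence of such interval hashes is transposition-invariant, which is one reason interval-based features are attractive for audio-as-query matching.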
The document discusses audio mining, which uses speech recognition technology to analyze digitized audio content like newscasts and meetings and create searchable indexes. It describes two main approaches: text-based indexing that converts speech to text, and phoneme-based indexing that works with sounds instead of text. Several challenges of audio mining are discussed, such as improving precision for applications like medical transcription. Potential uses of audio mining include analyzing customer service calls and intercepted phone conversations.
KNN: a machine learning approach to recognize a musical instrument (IJARIIT)
An outline is provided of a proposed system to recognize musical instruments using machine learning techniques. The system first extracts features from audio files using the MIR toolbox in Matlab. It then uses a hybrid feature selection method and vector quantization to identify instruments. Specifically, the key audio descriptors are selected and feature vectors are generated and matched to standard vectors to classify the instrument. The k-nearest neighbors algorithm is used for classification. Preliminary results show the system can accurately recognize instruments based on extracted acoustic features.
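The classification step can be illustrated with a minimal k-nearest-neighbours sketch. The toy descriptors and instrument labels below are invented stand-ins for the MIR-toolbox features the paper actually extracts in Matlab.

```python
# Minimal k-NN: label a query feature vector by majority vote among its
# k nearest training vectors under Euclidean distance.
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs."""
    by_dist = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy descriptors: (spectral centroid, zero-crossing rate), normalised.
train = [
    ((0.20, 0.10), "flute"), ((0.25, 0.12), "flute"), ((0.22, 0.09), "flute"),
    ((0.80, 0.70), "violin"), ((0.75, 0.68), "violin"), ((0.82, 0.72), "violin"),
]
print(knn_classify(train, (0.78, 0.70)))  # violin
```

In the paper's pipeline the query vector would instead be matched against vector-quantized standard vectors per instrument; the voting logic is the same.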
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit... (TELKOMNIKA JOURNAL)
Traditional packet-level Forward Error Correction approaches can limit errors for small, sporadic network losses, but when large portions drop out, listening quality becomes an issue. Services such as audio-on-demand drastically increase network loads, so new, robust and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, takes account of the semantics and natural repetition of music through meta-data tagging. Similarity detection within polyphonic audio has presented challenging problems in the field of Music Information Retrieval. We present a system that works at the content level, rendering it applicable to existing streaming services. The MPEG–7 Audio Spectrum Envelope (ASE) provides the extracted features, which, combined with k-means clustering, enable self-similarity analysis within polyphonic audio.
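The clustering stage can be sketched as plain k-means over per-frame feature vectors, with frames that land in the same cluster treated as self-similar. The toy 2-D vectors below stand in for the MPEG-7 ASE features the paper uses.

```python
# Plain Lloyd's k-means; deterministic init from the first k points
# (fine for a sketch, real systems use better seeding).
def kmeans(points, k, iters=20):
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # recompute centroids (keep the old one if a cluster emptied)
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(vals) / len(members)
                                     for vals in zip(*members))
    return centroids, clusters

# Two well-separated groups of "frames" should fall into two clusters.
frames = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.10),
          (0.90, 0.90), (0.88, 0.91), (0.92, 0.90)]
centroids, clusters = kmeans(frames, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```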
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ... (Lifeng (Aaron) Han)
"LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors"
Publisher: Association for Computational Linguistics, December 2012
Authors: Aaron Li-Feng Han, Derek F. Wong and Lidia S. Chao
Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pages 441–450, Mumbai, December 2012. Open tool https://github.com/aaronlifenghan/aaron-project-lepor
High Efficiency Video Coding (HEVC) achieves significant coding-efficiency improvement over existing video coding standards by employing several new coding tools. The Deblocking Filter, Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) are currently introduced for in-loop filtering in the HEVC standard. However, these filters operate in the spatial domain despite the temporal correlation present in video sequences. To reduce artifacts and better align object boundaries in video, a new in-loop filtering algorithm is proposed and implemented in the HM-11.0 software. The proposed algorithm yields an average bitrate reduction of about 0.7% and improves the PSNR of the decoded frame by 0.05%, 0.30% and 0.35% in the luminance and the two chroma components, respectively.
An optimized framework for detection and tracking of video objects in challen... (ijma)
Segmentation and tracking are two important aspects of visual surveillance systems. Many barriers, such as cluttered backgrounds, camera movement, and occlusion, make robust detection and tracking difficult, especially with multiple moving objects. Object detection in the presence of camera noise and under variable or unfavourable luminance conditions is still an active area of research. This paper proposes a framework that can effectively detect moving objects and track them despite occlusion and without a priori knowledge of the objects in the scene. The segmentation step uses a robust threshold-decision algorithm built on a multi-background model. The video object tracker follows multiple objects and their trajectories using Continuous Energy Minimization. In this work, an effective formulation of multi-target tracking as minimization of a continuous energy is combined with multi-background registration. Unlike recent approaches, it focuses on an energy that corresponds to a more complete representation of the problem, rather than one amenable to global optimization. Besides the image evidence, the energy function considers physical constraints such as target dynamics, mutual exclusion, and track persistence. The proposed framework is able to track multiple objects despite occlusions under dynamic background conditions.
Column store decision tree classification of unseen attribute set (ijma)
A decision tree can be used to cluster frequently used attributes to improve tuple reconstruction time in column-store databases. Due to the ad-hoc nature of queries, strongly correlated attributes are grouped together using a decision tree so that they share a common minimum-support probability distribution. At the same time, to predict the cluster for an unseen attribute set, the decision tree can work as a classifier. In this paper we propose classification and clustering of unseen attribute sets using a decision tree to improve tuple reconstruction time.
Mining in Ontology with Multi Agent System in Semantic Web: A Novel Approach (ijma)
The document proposes using a multi-agent system and ontology to extract useful information from large, unstructured datasets on the web. It discusses challenges with current information retrieval techniques, and how semantic web, ontologies, and multi-agent systems can help address these challenges by structuring data and allowing agents to cooperate. The proposed solution involves developing an ontology of the data domain, using this to build a structured dataset, and employing a multi-agent system using the JADE framework to analyze the dataset with data mining techniques to extract relevant information for users.
System analysis and design for multimedia retrieval systems (ijma)
Due to the extensive use of information technology and the recent developments in multimedia systems, the
amount of multimedia data available to users has increased exponentially. Video is an example of
multimedia data as it contains several kinds of data such as text, image, meta-data, visual and audio.
Content-based video retrieval is an approach for facilitating the searching and browsing of large multimedia collections over the WWW. To create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique employing multiple features for indexing and retrieval would be more effective in video discrimination and search tasks. To validate this, content-based indexing and retrieval systems were implemented using color histograms, texture features (GLCM), edge density, and motion.
On deferred constraints in distributed database systems (ijma)
An atomic commit protocol (ACP) is a distributed algorithm used to ensure the atomicity property of
transactions in distributed database systems. Although ACPs are designed to guarantee atomicity, they add
a significant extra cost to each transaction execution time. This added cost is due to the overhead of the
required coordination messages and log writes at each involved database site to achieve atomicity. For this
reason, the continuing research efforts led to a number of optimizations that reduce the aforementioned
cost. The most commonly adopted optimizations in the database standards and commercial database
management systems are those designed around the early release of read locks of transactions. In this type
of optimizations, certain participating sites may start releasing the read locks held by transactions before
they are fully terminated across all participants. This greatly enhances concurrency among executing transactions and, consequently, overall system performance. However, this type of optimization
introduces possible “execution infections” in the presence of deferred consistency constraints; a
devastating complication that may lead to non-serializable executions of transactions. Thus, this type of
optimizations could be considered useless, given the importance of preserving the consistency of the
database in presence of deferred constraints, unless this complication is resolved in a practical and
efficient manner. This is the essence of the “unsolicited deferred consistency constraints validation”
mechanism presented in this paper.
Integrated system for monitoring and recognizing students during class session (ijma)
In this paper we propose a new student attendance system based on a biometric authentication protocol. The system uses face detection and recognition to facilitate checking students' attendance in the classroom. In the proposed system, the classroom camera captures the students' photos, and face detection and recognition are then performed directly to produce the attendance report for the instructor. This system is more efficient than other student attendance methods, since detection and recognition are considered among the best and fastest methods for biometric attendance. On the students' and instructor's side, the system works without any preparation or additional effort.
A new keyphrases extraction method based on suffix tree data structure for ar... (ijma)
The document presents a new method for extracting keyphrases from Arabic documents using a suffix tree data structure. The method first cleans documents by removing stop words and special characters. It then generates a suffix tree document model to represent documents as sequences of words. Nodes in the suffix tree are scored based on factors like phrase length and term frequency-inverse document frequency. Highly scored nodes that are 1-3 words are selected as keyphrases. The keyphrases are used instead of full text for clustering Arabic documents with hierarchical clustering algorithms and various similarity measures. Experimental results showed the keyphrase extraction approach improved the quality of document clustering.
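A drastically simplified stand-in for the scoring step is sketched below: it enumerates 1-3-word candidate phrases and scores each occurrence by phrase length, a crude proxy for the length- and tf-based node scores of the real suffix-tree method. The stop-word list is a toy, and no actual suffix tree is built here.

```python
# Score candidate phrases: each occurrence of a phrase contributes its
# word count, so frequent multi-word phrases rank highest.
from collections import Counter

STOP = {"the", "a", "of", "and", "in"}

def candidate_phrases(text, max_len=3):
    words = [w for w in text.lower().split() if w not in STOP]
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def top_keyphrases(text, k=2):
    scores = Counter()
    for phrase in candidate_phrases(text):
        scores[phrase] += len(phrase.split())
    return [p for p, _ in scores.most_common(k)]

doc = "suffix tree clustering of arabic documents uses the suffix tree model"
print(top_keyphrases(doc))  # "suffix tree" ranks first (2 occurrences x 2 words)
```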
The application of computer aided learning to learn basic concepts of branchi... (ijma)
This document summarizes a study on the development of a Computer Aided Learning (CAL) application to help students learn basic concepts of branching and looping in logic algorithms. The CAL application includes 5 multimedia modules that teach key concepts through interactive exercises and video tutorials. A trial with 40 students found that most students scored 80% or higher on tests of each module, indicating the CAL application was effective at improving learning outcomes. The researchers conclude CAL applications have potential as an instructional tool to help more students succeed in introductory computer programming courses.
A survey on components of virtual 'umrah application (ijma)
The document summarizes the components of a proposed Virtual 'Umrah application. It discusses five main components: 1) Content about 'Umrah rituals and activities; 2) Use of virtual reality technology to provide realistic experiences; 3) Inclusion of multimedia elements like text, images, video and audio; 4) User profiles to understand target users; and 5) Usability evaluation to test the application. The application aims to educate users about performing 'Umrah through an interactive virtual simulation using these various components. It was developed using a user-centered design approach to focus on involving users throughout the process.
Real time realistic illumination and rendering of cumulus clouds (ijma)
This document proposes an efficient and computationally inexpensive phenomenological approach for modeling and rendering cumulus clouds in real-time. It draws upon several existing approaches that it combines and extends. Specifically, it models the cloud shape using a hierarchy of agglutinated spheres or particles recursively defined on top of each other. It also develops a rendering approach based on observations of cumulus cloud characteristics, using Lambert's law of reflection for the cloud body and accounting for forward scattering at the corona. This allows for a realistic and credible cloud appearance while achieving real-time performance by implementing the rendering function using graphics hardware shaders. The approach is aimed at applications requiring realistic cloud synthesis like flight simulators.
A robust audio watermarking in cepstrum domain composed of sample's relation ... (ijma)
This document proposes a robust audio watermarking technique that embeds watermarks in the cepstrum domain based on the relationship between mean values of consecutive sample groups. The host audio is divided into frames, which are then divided into four equal-sized sub-frames. The sub-frames are transformed to the cepstrum domain. Watermarks are embedded by either interchanging or updating the differences between the mean values of the first two sub-frames and last two sub-frames, selectively distorting sample values in the sub-frames. This allows imperceptible embedding while making extraction computationally simple without needing to know distortion amounts. Simulation results show the technique is robust and imperceptible against attacks.
Image denoising is an important part of diverse image processing and computer vision problems. An important property of a good image denoising model is that it should remove noise as completely as possible while preserving edges. One of the most powerful and promising approaches in this area is image denoising using the discrete wavelet transform (DWT). In this paper, various wavelets are compared at different decomposition levels. As the number of levels increases, the Peak Signal-to-Noise Ratio (PSNR) of the image decreases, whereas the Mean Absolute Error (MAE) and Mean Square Error (MSE) increase. A comparison of filters and various wavelet-based denoising methods has also been carried out. The simulation results reveal that the wavelet-based Bayes shrinkage method outperforms the other methods.
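The wavelet-shrinkage idea underlying these methods can be shown in one dimension: a single-level Haar DWT, soft thresholding of the detail coefficients, and the inverse transform. The threshold value here is an arbitrary toy choice; BayesShrink estimates it adaptively from the noise statistics, and real image denoising applies the same steps to 2-D multi-level decompositions.

```python
# One Haar DWT level, soft thresholding of details, inverse transform.
def haar_fwd(x):
    """Return (approximation, detail) coefficient lists."""
    s = 2 ** 0.5
    approx = [(a + b) / s for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / s for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_inv(approx, detail):
    s = 2 ** 0.5
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (soft thresholding)."""
    return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

signal = [4.0, 4.1, 3.9, 4.0, 8.0, 8.1, 7.9, 8.0]  # smooth steps + tiny noise
approx, detail = haar_fwd(signal)
denoised = haar_inv(approx, soft_threshold(detail, t=0.2))
print([round(v, 2) for v in denoised])
# [4.05, 4.05, 3.95, 3.95, 8.05, 8.05, 7.95, 7.95]: noise gone, step kept
```

Note how the large step between 4.0 and 8.0 survives untouched (it lives in the approximation band), while the small within-pair wiggles fall below the threshold and are removed; this is the edge-preserving property praised above.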
Indoor 3D video monitoring using multiple Kinect depth cameras (ijma)
The document describes a system for remote indoor 3D video monitoring using multiple Kinect depth cameras. The system addresses hardware limitations of connecting multiple Kinect cameras to individual computers by implementing a client-server architecture that allows an unlimited number of Kinects to be connected across different computers. 3D data from the Kinects is compressed before being sent over the network to reconstruct the scene and merge skeleton detections in the central client. An optimal camera layout is also proposed to minimize infrared interference while ensuring overlapping coverage for robust skeleton tracking of moving subjects.
Satellite image compression algorithm based on the FFT (ijma)
Image compression minimizes the size in bytes of a graphics file without degrading image quality to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space, and it reduces the time required for images to be transmitted. This paper presents a new coding scheme for satellite images. In this study we apply the fast Fourier transform and scalar quantization (SQ) to the standard LENA image and a satellite image. The results obtained after the SQ phase are encoded using entropy coding. After decompression, the results show that it is possible to achieve compression ratios higher than 78%; the results are discussed in the paper.
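The transform-plus-scalar-quantization pipeline can be sketched on a 1-D toy signal: take the DFT, uniformly quantize the coefficients, and count the zeros an entropy coder would then compress away. The step size and signal below are arbitrary illustrative choices, not the paper's settings, and a real codec would use a 2-D FFT over image blocks.

```python
# DFT -> uniform scalar quantization -> count zero coefficients.
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def quantize(coeffs, step):
    """Uniform scalar quantization of real and imaginary parts."""
    return [complex(round(c.real / step), round(c.imag / step)) for c in coeffs]

signal = [10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.1, 10.0]  # nearly constant
q = quantize(dft(signal), step=4.0)
zeros = sum(1 for c in q if c == 0)
print(f"{zeros}/{len(q)} coefficients quantized to zero")  # 7/8
```

Only the DC coefficient survives quantization here; the long run of zeros is exactly what the entropy-coding phase exploits to reach high compression ratios.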
Consideration of reputation prediction of ladygaga using the mathematical mod... (ijma)
A mathematical model for the hit phenomenon in entertainment within a society is presented as a stochastic process of interactions of human dynamics. Calculations for the Japanese motion picture market based on the mathematical model agree very well with the actual revenue distribution in time. LADYGAGA is also analyzed using SNS data.
This new algorithm mixes two or more images of different types and sizes by employing a shuffling
procedure combined with S-box substitution to perform lossless image encryption. This combines stream
cipher with block cipher, on the byte level, in mixing the images. When this algorithm was implemented,
empirical analysis using test images of different types and sizes showed that it is effective and resistant to
attacks.
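The two ingredients named above can be sketched on a byte string rather than real images: a keyed position shuffle (the stream-cipher-like step) followed by S-box substitution (the block-cipher-like byte map). Both steps are bijections, so decryption recovers the input losslessly. The key derivation below is a toy `random.Random` construction for illustration, not the paper's scheme.

```python
# Keyed shuffle + S-box substitution, with the exact inverse for decryption.
import random

def make_key(seed, n):
    rng = random.Random(seed)
    sbox = list(range(256))   # byte-substitution table (a permutation)
    rng.shuffle(sbox)
    perm = list(range(n))     # position permutation for n bytes
    rng.shuffle(perm)
    return sbox, perm

def encrypt(data, seed):
    sbox, perm = make_key(seed, len(data))
    shuffled = bytes(data[perm[i]] for i in range(len(data)))
    return bytes(sbox[b] for b in shuffled)

def decrypt(cipher, seed):
    sbox, perm = make_key(seed, len(cipher))
    inv_sbox = [0] * 256
    for i, v in enumerate(sbox):
        inv_sbox[v] = i
    unsubbed = [inv_sbox[b] for b in cipher]
    plain = [0] * len(cipher)
    for i in range(len(cipher)):
        plain[perm[i]] = unsubbed[i]   # undo the shuffle
    return bytes(plain)

data = bytes(range(32))  # stand-in for mixed image bytes
cipher = encrypt(data, seed=42)
assert decrypt(cipher, seed=42) == data  # lossless round trip
print(cipher != data)  # True: the ciphertext differs from the plaintext
```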
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu... (ijma)
This document discusses the implementation and performance comparison of two fast Fourier transform (FFT) algorithms - the Cooley-Tukey FFT and the Grigoryan FFT - on three Xilinx FPGAs. The Grigoryan FFT uses a decomposition based on paired transforms, while the Cooley-Tukey FFT uses a radix-2 decomposition. Both algorithms were implemented on Virtex-II Pro, Virtex-5, and Virtex-4 FPGAs. The results showed that the Grigoryan FFT operated at higher sampling rates and was faster than the Cooley-Tukey FFT. Additionally, the Virtex-5 FPGA provided the highest speed for implementing the Grigoryan FFT compared
Exploring the Self-Enhanced Mechanism of Interactive Advertising... (ijma)
The document discusses interactive advertising and explores its self-enhanced dissemination mechanism. It analyzes three cases of interactive ads: on official websites, social media, and mobile media. It identifies micro, meso, and macro level mechanisms. At the micro level, core, inner, and outer interaction reveal the process of interacting with content, people, and computers. The meso level shows self-fission-type spread and communication approaches. At the macro level, the communication effect of interactive advertising achieves a spiral increasing effect. The analysis enriches research on the communication effects of interactive advertising.
This document describes a student project to create an algorithm that generates a short music playlist based on one or more seed songs. The algorithm is based on a previous published method called AutoDJ that uses Gaussian process regression with a kernel function to predict a user's preference for additional songs based on attributes of the seed songs. The key aspects of the student's algorithm include using a kernel that is trained on song attribute data to learn song similarities, and generating a playlist sorted by predicted user preference for each song. The student's model and data differ from the original AutoDJ method primarily due to having a mix of continuous and categorical song attributes rather than purely categorical data.
The document discusses developing a model to compose monophonic world music using deep learning techniques. It proposes using a bi-axial recurrent neural network with one axis representing time and the other representing musical notes. The network will be trained on a dataset of MIDI files describing pitch, timing, and velocity of notes. It will also incorporate information from music theory on scales, chords, and other elements extracted from sheet music files. The goal is to generate unique musical sequences while adhering to music theory rules. The model aims to address the problem of composing long durations of background music for public spaces in an automated way.
Computational Approaches for Melodic Description in Indian Art Music Corpora (Sankalp Gulati)
Presentation for my PhD defense, Music Technology Group, Barcelona, Spain.
Resources: http://compmusic.upf.edu/node/304
Short abstract:
Automatically describing contents of recorded music is crucial for interacting with large volumes of audio recordings, and for developing novel tools to facilitate music pedagogy. Melody is a fundamental facet in most music traditions and, therefore, is an indispensable component in such description. In this thesis, we develop computational approaches for analyzing high-level melodic aspects of music performances in Indian art music (IAM), with which we can describe and interlink large amounts of audio recordings. With its complex melodic framework and well-grounded theory, the description of IAM melody beyond pitch contours offers a very interesting and challenging research topic. We analyze melodies within their tonal context, identify melodic patterns, compare them both within and across music pieces, and finally, characterize the specific melodic context of IAM, the rāgas. All these analyses are done using data-driven methodologies on sizable curated music corpora. Our work paves the way for addressing several interesting research problems in the field of music information research, as well as developing novel applications in the context of music discovery and music pedagogy.
This document summarizes a comparison of six sound analysis/synthesis systems conducted at the 2000 International Computer Music Conference. Each system analyzed the same set of 27 varied input sounds and output the results in a common format (SDIF). The comparison describes each system, compares them in terms of availability, sound models used, interpolation models, noise modeling, parameter mutability, required analysis parameters, and artifacts. The goal was not competition but providing information to help musicians choose appropriate analysis/synthesis tools.
An optimized framework for detection and tracking of video objects in challen...ijma
Segmentation and tracking are two important aspects of visual surveillance systems. Many barriers, such as
cluttered backgrounds, camera movement, and occlusion, make robust detection and tracking a difficult
problem, especially in the case of multiple moving objects. Object detection in the presence of camera noise
and under variable or unfavourable luminance conditions is still an active area of research. This paper
proposes a framework that can effectively detect moving objects and track them despite occlusion and
without a priori knowledge of the objects in the scene. The segmentation step uses a robust threshold decision
algorithm built on a multi-background model. The video object tracker can follow multiple objects
along with their trajectories based on continuous energy minimization. In this work, an effective
formulation of multi-target tracking as the minimization of a continuous energy is combined with
multi-background registration. Unlike recent approaches, it focuses on an energy that
corresponds to a more complete representation of the problem, rather than one that is amenable to global
optimization. Besides the image evidence, the energy function considers physical constraints such as target
dynamics, mutual exclusion, and track persistence. The proposed tracking framework is able to track
multiple objects despite occlusions under dynamic background conditions.
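The continuous energy described in this abstract combines image evidence with physical constraints. As a toy illustration (a 1-D sketch with made-up weights, not the authors' formulation), such an energy can be written as:

```python
def tracking_energy(tracks, detections, lam_dyn=1.0, lam_excl=10.0, min_sep=1.0):
    """Toy continuous tracking energy for 1-D targets.
    tracks: list of per-target position sequences over time.
    detections: matching per-target observed positions (image evidence)."""
    e = 0.0
    # data term: each track should stay close to its detections
    for trk, det in zip(tracks, detections):
        e += sum((x - d) ** 2 for x, d in zip(trk, det))
    # dynamics term: penalize sudden acceleration (second difference)
    for trk in tracks:
        e += lam_dyn * sum(
            (trk[t + 1] - 2 * trk[t] + trk[t - 1]) ** 2
            for t in range(1, len(trk) - 1)
        )
    # mutual exclusion: two targets cannot occupy the same place
    for t in range(len(tracks[0])):
        for i in range(len(tracks)):
            for j in range(i + 1, len(tracks)):
                gap = abs(tracks[i][t] - tracks[j][t])
                if gap < min_sep:
                    e += lam_excl * (min_sep - gap) ** 2
    return e
```

A minimizer would then lower this energy over track positions; track-persistence terms are omitted here for brevity.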
Column store decision tree classification of unseen attribute set (ijma)
A decision tree can be used to cluster frequently used attributes to improve tuple reconstruction time
in column-store databases. Due to the ad-hoc nature of queries, strongly correlated attributes are grouped
together using a decision tree so that they share a common minimum support probability distribution. At the same
time, in order to predict the cluster for an unseen attribute set, the decision tree can work as a classifier. In
this paper we propose classification and clustering of unseen attribute sets using a decision tree to improve
tuple reconstruction time.
Mining in Ontology with Multi Agent System in Semantic Web: A Novel Approach (ijma)
The document proposes using a multi-agent system and ontology to extract useful information from large, unstructured datasets on the web. It discusses challenges with current information retrieval techniques, and how semantic web, ontologies, and multi-agent systems can help address these challenges by structuring data and allowing agents to cooperate. The proposed solution involves developing an ontology of the data domain, using this to build a structured dataset, and employing a multi-agent system using the JADE framework to analyze the dataset with data mining techniques to extract relevant information for users.
System analysis and design for multimedia retrieval systems (ijma)
Due to the extensive use of information technology and recent developments in multimedia systems, the
amount of multimedia data available to users has increased exponentially. Video is an example of
multimedia data, as it contains several kinds of data such as text, images, meta-data, and visual and audio streams.
Content-based video retrieval is an approach for facilitating the searching and browsing of large
multimedia collections over the WWW. In order to create an effective video retrieval system, visual perception
must be taken into account. We conjectured that a technique employing multiple features for indexing
and retrieval would be more effective in the discrimination and search tasks for videos. To validate
this, content-based indexing and retrieval systems were implemented using the color histogram, texture
features (GLCM), edge density, and motion.
On deferred constraints in distributed database systems (ijma)
An atomic commit protocol (ACP) is a distributed algorithm used to ensure the atomicity property of
transactions in distributed database systems. Although ACPs are designed to guarantee atomicity, they add
a significant extra cost to each transaction execution time. This added cost is due to the overhead of the
required coordination messages and log writes at each involved database site to achieve atomicity. For this
reason, the continuing research efforts led to a number of optimizations that reduce the aforementioned
cost. The most commonly adopted optimizations in the database standards and commercial database
management systems are those designed around the early release of transactions' read locks. In this type
of optimization, certain participating sites may start releasing the read locks held by transactions before
they are fully terminated across all participants, greatly enhancing concurrency among executing
transactions and, consequently, overall system performance. However, this type of optimization
introduces possible “execution infections” in the presence of deferred consistency constraints, a
devastating complication that may lead to non-serializable executions of transactions. Thus, such
optimizations could be considered useless, given the importance of preserving the consistency of the
database in the presence of deferred constraints, unless this complication is resolved in a practical and
efficient manner. This is the essence of the “unsolicited deferred consistency constraints validation”
mechanism presented in this paper.
Integrated system for monitoring and recognizing students during class session (ijma)
In this paper we propose a new student attendance system based on a biometric authentication protocol. The
system uses face detection and recognition to facilitate checking students'
attendance in the classroom. In the proposed system, the classroom camera captures the students'
photo, and the face detection and recognition processes are then applied directly to produce the instructor's
attendance report. This system is more efficient than other student attendance methods, since face
detection and recognition are considered among the best and fastest methods for biometric attendance.
From the students' and instructor's point of view, the system works without any preparation and
with no extra effort.
A new keyphrases extraction method based on suffix tree data structure for ar... (ijma)
The document presents a new method for extracting keyphrases from Arabic documents using a suffix tree data structure. The method first cleans documents by removing stop words and special characters. It then generates a suffix tree document model to represent documents as sequences of words. Nodes in the suffix tree are scored based on factors like phrase length and term frequency-inverse document frequency. Highly scored nodes that are 1-3 words are selected as keyphrases. The keyphrases are used instead of full text for clustering Arabic documents with hierarchical clustering algorithms and various similarity measures. Experimental results showed the keyphrase extraction approach improved the quality of document clustering.
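The node-scoring idea, combining phrase length with frequency, can be illustrated without a suffix tree. The sketch below scores raw 1-3 word n-grams by frequency times length, a simplification of the paper's suffix-tree scoring (the function name and weighting are ours, not the paper's):

```python
from collections import Counter

def score_phrases(words, max_len=3):
    """Rank every 1..max_len word phrase by frequency * phrase length,
    a toy stand-in for suffix-tree node scoring."""
    counts = Counter(
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    )
    return sorted(counts, key=lambda p: counts[p] * len(p), reverse=True)
```

A real system would also apply stop-word removal and TF-IDF weighting, as the abstract describes.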
The application of computer aided learning to learn basic concepts of branchi... (ijma)
This document summarizes a study on the development of a Computer Aided Learning (CAL) application to help students learn basic concepts of branching and looping in logic algorithms. The CAL application includes 5 multimedia modules that teach key concepts through interactive exercises and video tutorials. A trial with 40 students found that most students scored 80% or higher on tests of each module, indicating the CAL application was effective at improving learning outcomes. The researchers conclude CAL applications have potential as an instructional tool to help more students succeed in introductory computer programming courses.
A survey on components of virtual 'Umrah application (ijma)
The document summarizes the components of a proposed Virtual 'Umrah application. It discusses five main components: 1) Content about 'Umrah rituals and activities; 2) Use of virtual reality technology to provide realistic experiences; 3) Inclusion of multimedia elements like text, images, video and audio; 4) User profiles to understand target users; and 5) Usability evaluation to test the application. The application aims to educate users about performing 'Umrah through an interactive virtual simulation using these various components. It was developed using a user-centered design approach to focus on involving users throughout the process.
Real time realistic illumination and rendering of cumulus clouds (ijma)
This document proposes an efficient and computationally inexpensive phenomenological approach for modeling and rendering cumulus clouds in real-time. It draws upon several existing approaches that it combines and extends. Specifically, it models the cloud shape using a hierarchy of agglutinated spheres or particles recursively defined on top of each other. It also develops a rendering approach based on observations of cumulus cloud characteristics, using Lambert's law of reflection for the cloud body and accounting for forward scattering at the corona. This allows for a realistic and credible cloud appearance while achieving real-time performance by implementing the rendering function using graphics hardware shaders. The approach is aimed at applications requiring realistic cloud synthesis like flight simulators.
A robust audio watermarking in cepstrum domain composed of sample's relation ... (ijma)
This document proposes a robust audio watermarking technique that embeds watermarks in the cepstrum domain based on the relationship between mean values of consecutive sample groups. The host audio is divided into frames, which are then divided into four equal-sized sub-frames. The sub-frames are transformed to the cepstrum domain. Watermarks are embedded by either interchanging or updating the differences between the mean values of the first two sub-frames and last two sub-frames, selectively distorting sample values in the sub-frames. This allows imperceptible embedding while making extraction computationally simple without needing to know distortion amounts. Simulation results show the technique is robust and imperceptible against attacks.
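The embedding rule, updating the difference between the mean values of two coefficient groups, can be sketched as follows (a simplified stand-in operating on plain coefficient lists; the function names and the margin `delta` are ours, not the paper's exact scheme):

```python
def embed_bit(a, b, bit, delta=1.0):
    """Embed one watermark bit into two coefficient groups by forcing
    mean(a) - mean(b) to at least +delta (bit 1) or at most -delta (bit 0).
    Both groups are shifted symmetrically to limit distortion."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    target = delta if bit else -delta
    need_shift = (ma - mb < target) if bit else (ma - mb > target)
    if need_shift:
        shift = (target - (ma - mb)) / 2
        a = [x + shift for x in a]
        b = [x - shift for x in b]
    return a, b

def extract_bit(a, b):
    """Extraction only compares the two means; no distortion amounts needed."""
    return 1 if sum(a) / len(a) > sum(b) / len(b) else 0
```

This mirrors the abstract's point that extraction stays computationally simple: only the sign of the mean difference is read.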
Image denoising is an important part of diverse image processing and computer vision problems. The
important property of a good image denoising model is that it should remove noise as completely as
possible while preserving edges. One of the most powerful and promising approaches in this area is
image denoising using the discrete wavelet transform (DWT). In this paper, various wavelets are compared at
different decomposition levels. As the number of levels increases, the Peak Signal-to-Noise Ratio
(PSNR) of the image decreases, whereas the Mean Absolute Error (MAE) and Mean Square Error (MSE)
increase. A comparison of filters and various wavelet-based methods has also been carried out to denoise
the image. The simulation results reveal that the wavelet-based Bayes shrinkage method outperforms the other
methods.
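The quality metrics used in this comparison are standard; a minimal sketch of their definitions over flat pixel sequences (function names are ours):

```python
import math

def mse(ref, test):
    """Mean Square Error between two equal-length pixel sequences."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def mae(ref, test):
    """Mean Absolute Error."""
    return sum(abs(r - t) for r, t in zip(ref, test)) / len(ref)

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    m = mse(ref, test)
    return float("inf") if m == 0 else 10 * math.log10(peak ** 2 / m)
```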
Indoor 3D video monitoring using multiple Kinect depth cameras (ijma)
The document describes a system for remote indoor 3D video monitoring using multiple Kinect depth cameras. The system addresses hardware limitations of connecting multiple Kinect cameras to individual computers by implementing a client-server architecture that allows an unlimited number of Kinects to be connected across different computers. 3D data from the Kinects is compressed before being sent over the network to reconstruct the scene and merge skeleton detections in the central client. An optimal camera layout is also proposed to minimize infrared interference while ensuring overlapping coverage for robust skeleton tracking of moving subjects.
Satellite image compression algorithm based on the FFT (ijma)
Image compression is minimizing the size in bytes of a graphics file without degrading the quality of the
image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount
of disk or memory space, and it also reduces the time required for images to be sent to the ground. This paper
presents a new coding scheme for satellite images. In this study we apply the fast Fourier transform (FFT) and
scalar quantization (SQ) to the standard LENA image and a satellite image. The results obtained after the SQ phase
are encoded using entropy coding. After decompression, the results show that it is possible to achieve
compression ratios higher than 78%; the results are discussed in the paper.
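The two stages of the scheme, an FFT followed by scalar quantization, can be sketched in a few lines (a toy radix-2 FFT and uniform quantizer, not the paper's actual coder; quantized magnitudes with many zeros are what the later entropy coder exploits):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def scalar_quantize(coeffs, step):
    """Uniform scalar quantization of coefficient magnitudes;
    small coefficients map to zero, which compresses well."""
    return [round(abs(c) / step) for c in coeffs]
```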
Consideration of reputation prediction of ladygaga using the mathematical mod... (ijma)
A mathematical model for the hit phenomenon in entertainment within a society is presented as a stochastic
process of interactions of human dynamics. Calculations for the Japanese motion picture market based
on the mathematical model agree very well with the actual residue distribution in time. Lady Gaga is
also analyzed using SNS data.
This new algorithm mixes two or more images of different types and sizes by employing a shuffling
procedure combined with S-box substitution to perform lossless image encryption, combining a stream
cipher with a block cipher at the byte level when mixing the images. When the algorithm was implemented,
empirical analysis using test images of different types and sizes showed that it is effective and resistant to
attacks.
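A shuffle-then-substitute byte cipher of the kind described can be sketched as follows (a toy, seeded stand-in, not the published algorithm; a real design would derive the permutation and S-box from a secret key rather than a PRNG seed):

```python
import random

def make_sbox(seed=42):
    """A toy key-dependent S-box: a seeded permutation of the 256 byte values."""
    rng = random.Random(seed)
    box = list(range(256))
    rng.shuffle(box)
    return box

def encrypt_bytes(data, seed=42):
    """Shuffle byte positions (stream-like step), then substitute each byte
    through the S-box (block-cipher-like step). Both steps are invertible,
    so the scheme is lossless."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    sbox = make_sbox(seed)
    return bytes(sbox[data[i]] for i in order)

def decrypt_bytes(cipher, seed=42):
    """Invert the S-box, then undo the position shuffle."""
    rng = random.Random(seed)
    order = list(range(len(cipher)))
    rng.shuffle(order)
    sbox = make_sbox(seed)
    inv = [0] * 256
    for i, v in enumerate(sbox):
        inv[v] = i
    plain = [0] * len(cipher)
    for pos, i in enumerate(order):
        plain[i] = inv[cipher[pos]]
    return bytes(plain)
```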
Implementation Of Grigoryan FFT For Its Performance Case Study Over Cooley-Tu... (ijma)
This document discusses the implementation and performance comparison of two fast Fourier transform (FFT) algorithms - the Cooley-Tukey FFT and the Grigoryan FFT - on three Xilinx FPGAs. The Grigoryan FFT uses a decomposition based on paired transforms, while the Cooley-Tukey FFT uses a radix-2 decomposition. Both algorithms were implemented on Virtex-II Pro, Virtex-5, and Virtex-4 FPGAs. The results showed that the Grigoryan FFT operated at higher sampling rates and was faster than the Cooley-Tukey FFT. Additionally, the Virtex-5 FPGA provided the highest speed for implementing the Grigoryan FFT compared to the other FPGAs.
Exploring the Self-Enhanced Mechanism of Interactive Advertising... (ijma)
The document discusses interactive advertising and explores its self-enhanced dissemination mechanism. It analyzes three cases of interactive ads: on official websites, social media, and mobile media. It identifies micro, meso, and macro level mechanisms. At the micro level, core, inner, and outer interaction reveal the process of interacting with content, people, and computers. The meso level shows self-fission-type spread and communication approaches. At the macro level, the communication effect of interactive advertising achieves a spiral increasing effect. The analysis enriches research on the communication effects of interactive advertising.
This document describes a student project to create an algorithm that generates a short music playlist based on one or more seed songs. The algorithm is based on a previous published method called AutoDJ that uses Gaussian process regression with a kernel function to predict a user's preference for additional songs based on attributes of the seed songs. The key aspects of the student's algorithm include using a kernel that is trained on song attribute data to learn song similarities, and generating a playlist sorted by predicted user preference for each song. The student's model and data differ from the original AutoDJ method primarily due to having a mix of continuous and categorical song attributes rather than purely categorical data.
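The core idea, scoring candidate songs by kernel similarity to the seed songs and sorting by predicted preference, can be sketched without full Gaussian process regression (an RBF kernel over numeric attribute vectors is assumed here for illustration; the actual AutoDJ-style kernel is learned from song attribute data):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: similarity of two numeric attribute vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def rank_playlist(seeds, candidates, gamma=0.5):
    """Score each candidate by its mean kernel similarity to the seed songs
    and return candidates sorted from most to least preferred."""
    def score(song):
        return sum(rbf_kernel(song, s, gamma) for s in seeds) / len(seeds)
    return sorted(candidates, key=score, reverse=True)
```

Categorical attributes, which the student's data also contains, would need a different kernel than this purely numeric one.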
The document discusses developing a model to compose monophonic world music using deep learning techniques. It proposes using a bi-axial recurrent neural network with one axis representing time and the other representing musical notes. The network will be trained on a dataset of MIDI files describing pitch, timing, and velocity of notes. It will also incorporate information from music theory on scales, chords, and other elements extracted from sheet music files. The goal is to generate unique musical sequences while adhering to music theory rules. The model aims to address the problem of composing long durations of background music for public spaces in an automated way.
Computational Approaches for Melodic Description in Indian Art Music Corpora (Sankalp Gulati)
Presentation for my PhD defense, Music Technology Group, Barcelona, Spain.
Resources: http://compmusic.upf.edu/node/304
Design and Analysis System of KNN and ID3 Algorithm for Music Classification ... (IJECEIAES)
Every piece of music that has been created conveys its own mood; therefore, much research in the Music Information Retrieval (MIR) field has been done on recognizing the mood of music. This research produced software to classify music by mood using the K-Nearest Neighbor (KNN) and ID3 algorithms. We compare the accuracy and measure the average classification time of the two algorithms, based on the values produced by the music feature extraction process. Feature extraction uses 9 types of spectral analysis, with 400 training samples and 400 testing samples. The system outputs one of four mood labels: contentment, exuberance, depression, and anxious. Classification using KNN performs well, reaching 86.55% accuracy at k = 3 with an average processing time of 0.01021 seconds, whereas ID3 reaches 59.33% accuracy with an average processing time of 0.05091 seconds.
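A k-NN mood classifier of the kind described is straightforward to sketch (Euclidean distance over extracted feature vectors is assumed; the mood labels follow the four classes named in the abstract):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify a feature vector by majority vote among its k nearest
    training examples (Euclidean distance).
    `train` is a list of (feature_vector, mood_label) pairs."""
    dists = sorted((math.dist(vec, query), label) for vec, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```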
This document presents an architecture for automatically tagging and clustering song files according to mood. It uses audio content analysis to extract music features like intensity, timbre and rhythm to map songs to a 2D emotional space. It also analyzes song lyrics to map songs to a 1D emotional space representing positive or negative emotion. The results from both analyses are merged and used to classify songs into mood clusters like contentment, depression, exuberance and anxious/frantic. The system aims to enable context-aware playlist generation and music recommendation based on mood. It achieves 70.7% accuracy in mood detection on over 70 Indian songs.
Identifying Successful Melodic Similarity Algorithms for use in Music (inscit2006)
The document discusses research on identifying successful algorithms for measuring melodic similarity that can be used in music information retrieval applications. It describes a listening experiment conducted with 34 subjects to rate the similarity of melodic variations on a theme. Correlation analysis was used to check the consistency of subject ratings. Two main algorithms are discussed that measure melodic similarity based on pitch differences between notes over time windows, with weights applied based on duration or metrical stress. The researchers are analyzing the results to determine which musical features lead to the most accurate algorithms according to human similarity judgments.
Multiple machine learning models were developed and compared to predict if a song would become a top 10 hit based on audio features. The models included logistic regression, decision trees, random forests, and naive Bayes. Random forests had the highest accuracy of 86.78%, slightly outperforming decision trees and logistic regression. The most important predictor was found to be the timbre_0 feature. Overall, the research demonstrated that popularity can be predicted based on analyzing music signals with machine learning.
IRJET- Implementing Musical Instrument Recognition using CNN and SVM (IRJET Journal)
This document summarizes research on implementing musical instrument recognition using convolutional neural networks (CNNs) and support vector machines (SVMs). The researchers aim to preprocess audio excerpts into images and use CNNs to achieve high accuracy in instrument classification. They will then combine CNN and SVM classifications and take a weighted average to achieve even higher accuracy. The document reviews several related works that used features like MFCCs and classifiers like SVMs, GMMs, and neural networks for instrument recognition. The researchers intend to use mel spectrograms and MFCCs to represent audio as images for CNN classification and improve music information retrieval and organization.
Music fingerprinting based on bhattacharya distance for song and cover song r... (IJECEIAES)
People often have trouble recognizing a song, especially if it is sung by someone other than the original artist, in which case it is called a cover song. Hence, an identification system might be used to help recognize a song or to detect copyright violations. In this study, we try to recognize a song and a cover song using the fingerprint of the song, represented by features extracted from MPEG-7. The fingerprint of the song is represented by the Audio Signature Type, while the fingerprint of the cover song is represented by Audio Spectrum Flatness and Audio Spectrum Projection. Furthermore, we propose a sliding algorithm and k-Nearest Neighbor (k-NN) with the Bhattacharyya distance for song recognition and cover song recognition. The results of this experiment show that the proposed fingerprint technique has an accuracy of 100% for song recognition and 85.3% for cover song recognition.
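The Bhattacharyya distance used for matching fingerprints compares two discrete distributions via the Bhattacharyya coefficient; a minimal sketch (the function name is ours):

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions
    (each a sequence of non-negative values summing to 1).
    0 for identical distributions; grows as they diverge."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc)
```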
Harmony Search as a Metaheuristic Algorithm (Xin-She Yang)
This document discusses Harmony Search, a metaheuristic algorithm inspired by music improvisation. It outlines the fundamental steps of Harmony Search, analyzing why it is an effective metaheuristic approach. It also briefly reviews other popular metaheuristics like particle swarm optimization, comparing their similarities and differences to Harmony Search. The document provides examples applying Harmony Search to optimization problems and discusses ways to improve and develop new variants of the algorithm.
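The fundamental steps mentioned here (harmony memory consideration, pitch adjustment, and random selection) can be sketched as a small minimizer (parameter values are common illustrative defaults, not taken from the document):

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, iters=2000, seed=0):
    """Minimize f over the box `bounds` with basic Harmony Search:
    keep a memory of `hms` candidate solutions; improvise new ones by
    picking each variable from memory (prob. hmcr), optionally pitch-
    adjusting it (prob. par), or drawing it at random; replace the worst
    memory entry whenever the new harmony is better."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    for _ in range(iters):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                x = rng.choice(memory)[j]  # harmony memory consideration
                if rng.random() < par:     # pitch adjustment: small nudge
                    x = min(hi, max(lo, x + rng.uniform(-1.0, 1.0) * 0.05 * (hi - lo)))
            else:
                x = rng.uniform(lo, hi)    # random selection
            new.append(x)
        worst = max(range(hms), key=lambda i: f(memory[i]))
        if f(new) < f(memory[worst]):
            memory[worst] = new
    return min(memory, key=f)
```

The memory plays the role a population does in other metaheuristics, which is part of the comparison the document draws with particle swarm optimization.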
Human Perception and Recognition of Musical Instruments: A Review (Editor IJCATR)
Musical instruments are the soul of music; the instrument and the player are its two fundamental components.
Over the past decade, a research field targeting musical instrument identification, retrieval, classification,
recognition, and the management of large music collections has grown, known as Music Information Retrieval.
This paper reviews the methods, features, and databases used.
Performance Comparison of Musical Instrument Family Classification Using Soft... (Waqas Tariq)
Nowadays, it appears essential to design automatic and effective classification algorithms for musical instruments. Automatic classification of musical instruments is performed by extracting relevant features from audio samples; a classification algorithm then uses these extracted features to identify which of a set of classes the sound sample is likely to fit. The aim of this paper is to demonstrate the viability of soft sets for audio signal classification. A dataset of 104 pieces (single monophonic notes) of traditional Pakistani musical instruments was designed. Feature extraction is done using two feature sets, namely perception-based features and mel-frequency cepstral coefficients (MFCCs). Then, two different classification techniques are applied: soft set (comparison table) and fuzzy soft set (similarity measurement). Experimental results show that both classifiers perform well on numerical data; however, the soft set achieved accuracy of up to 94.26% with the best generated dataset. These promising results open new possibilities for soft sets in classifying musical instrument sounds. Based on the analysis of the results, this study offers a new view on automatic instrument classification.
Survey on Hybrid recommendation mechanism to get effective ranking results fo... (Suraj Ligade)
These days users have high expectations of technology: they want to search for
songs even when they cannot recall the song title or related details. Retrieving music
content is one of the hardest and most challenging tasks in the field of Music
Information Retrieval (MIR). Various search techniques have been developed and
implemented, yet they are unable to find the songs users want, and they face related
problems such as automatic playlist creation, music recommendation, and music search.
In previous systems, users searched for a song using its title, artist name, or other
related details, which is very time-consuming. To overcome this, searching for a song
by singing or humming a portion of it is the most natural approach. This search method
is most helpful when the user does not have access to an audio device or cannot recall
attributes of the song such as its title, artist, or album. In the proposed system,
users do not have to worry about remembering song information, and the method is not
time-consuming. It uses information from a user's search history as well as the common
preferences of users with similar backgrounds. The hybrid recommendation mechanism
uses a content-based retrieval framework built on audio information such as tone,
pitch, and rhythm, which is used to return accurate results to the user. More
importantly, users are able to operate their devices without entering commands by
hand. It is an easy and simple system for performing music search.
This document discusses analyzing music data extracted from Spotify using various data science techniques. It describes collecting audio feature data for Billboard top songs from 1958-2017 and Grammy winning albums from 1959-2018. Exploratory analysis was performed on song attributes like duration, tempo and loudness. Principal component analysis and k-means clustering were used to group songs based on similarities in audio features like energy, danceability and acousticness. The analysis revealed patterns in attributes of popular music over time that could aid music recommendation systems.
This document discusses the use of artificial intelligence in organized sound as surveyed in the journal Organised Sound. It provides an overview of key AI technologies like Auto-Tune audio processing that can correct pitch and organize sound. Applications discussed include general sound classification, open sound control for music networking, and time-frequency representations for sound analysis and resynthesis. The document also outlines recent research on intelligent composer assistants, responsive instruments, and recognition of musical sounds. Finally, it discusses the future of AI in organizing sound through planning and machine learning.
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar... (IRJET Journal)
This document presents research on classifying music genres using machine learning algorithms. The researchers built multiple classification models using the Free Music Archive dataset and compared the models' performance in predicting genre accuracy. Some models were trained on mel-spectrograms of songs and their audio features, while others used only spectrograms. The researchers found that a convolutional neural network model trained solely on spectrograms achieved the highest accuracy among the tested models. The goal of the research was to develop a machine learning approach for automatic music genre classification that performs better than existing methods.
Graphical visualization of musical emotions (Pranay Prasoon)
The document discusses graphical visualization of musical emotions using artificial neural networks. Thirteen audio features are extracted from Hindustani classical music clips labeled as happy or sad. An ANN model with the backpropagation algorithm is trained on 70% of the data, validated on 15%, and tested on 15%. The model correctly classified 15 of 17 happy clips and 21 of 22 sad clips. Testing was repeated 10 times with over 90% accuracy each time, showing the model effectively recognizes musical emotions. Future work involves expanding the model to recognize additional emotions and incorporating physiological features.
IRJET- Music Genre Recognition using Convolution Neural Network (IRJET Journal)
1. The document describes a study that uses a Convolutional Neural Network (CNN) model to classify music genres based on labeled Mel spectrograms of audio clips.
2. A CNN model is trained on a dataset of 1000 audio clips across 10 genres. The trained model is then used to classify new, unlabeled audio clips by genre based on their Mel spectrogram representation.
3. CNNs are well-suited for this task as their convolutional layers can extract hierarchical features from the Mel spectrogram images that are indicative of different genres. The study aims to develop an automated music genre classification system using deep learning techniques.
This document provides an overview of optical music recognition (OMR) systems. It discusses the typical architecture of an OMR system, which includes image preprocessing, recognition of musical symbols, reconstruction of musical notation, and construction of a machine-readable symbolic representation. The document also reviews common techniques used in OMR systems, such as binarization, and describes some of the challenges of OMR, particularly for handwritten musical scores which exhibit greater variability in symbols.
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
The kusc classical music dataset for audio key finding
The International Journal of Multimedia & Its Applications (IJMA) Vol.6, No.4, August 2014
THE KUSC CLASSICAL MUSIC DATASET FOR
AUDIO KEY FINDING
Ching-Hua Chuan1 and Elaine Chew2
1School of Computing, University of North Florida, Jacksonville, Florida, USA
2School of Engineering and Computer Science, Queen Mary, University of London,
London, UK
ABSTRACT
In this paper, we present a benchmark dataset based on the KUSC classical music collection and provide
baseline key-finding comparison results. Audio key finding is a basic music information retrieval task; it
forms an essential component of systems for music segmentation, similarity assessment, and mood
detection. Due to copyright restrictions and a labor-intensive annotation process, audio key finding
algorithms have only been evaluated using small proprietary datasets to date. To create a common base for
systematic comparisons, we have constructed a dataset comprising more than 3,000 excerpts of classical
music. The excerpts are made publicly accessible via commonly used acoustic features such as pitch-based
spectrograms and chromagrams. We introduce a hybrid annotation scheme that combines the use of title
keys with expert validation and correction of only the challenging cases. The expert musicians also provide
ratings of key recognition difficulty. Other meta-data include instrumentation. As a demonstration of
the dataset's use, and to provide initial benchmark comparisons for evaluating new algorithms, we conduct a
series of experiments reporting key determination accuracy of four state-of-the-art algorithms. We further
show the importance of considering factors such as estimated tuning frequency, key strength or confidence
value, and key recognition difficulty in key finding. In the future, we plan to expand the dataset to include
meta-data for other music information retrieval tasks.
KEYWORDS
Audio Key Finding, Ground Truth, Tuning, Dataset, Evaluation
1. INTRODUCTION
In this paper, we present a publicly available dataset for audio key finding. In tonal music, the key
establishes the context in which the roles of pitches and chords are defined. Determining key
from audio recordings is one of the most important music information retrieval (MIR) tasks; key
finding forms an essential component of systems for music segmentation, similarity assessment,
and mood detection. Due to issues such as copyright restrictions and the lack of annotated ground
truth, algorithms are often evaluated using relatively small datasets from private collections. In
addition, tonal complexity varies significantly in different genres and stylistic periods. Therefore,
simply using accuracy rates and dataset size to judge the performance of an audio key finding
system is a far from ideal approach. A commonly accessible dataset consisting of adequate
examples with sufficient variety and accurate annotations is thus needed.
The KUSC dataset presented in this paper consists of 15-second excerpts of compositions by
Bach, Mozart, and Schubert, including symphonies, concertos, preludes, fugues, sonatas, quartets
and more. We focus on the global key by examining only the first and last 15 seconds of the
DOI: 10.5121/ijma.2014.6401
recording—typically when the key of the piece is established and when the piece returns to the
original key, respectively—to reduce the possibility of key modulations. A common key labelling
method simply uses the key in the title, such as Symphony in G major, as ground truth. While
highly efficient, this method, as we show, can sometimes be prone to error. We introduce a
hybrid key annotation scheme that combines the use of title keys with selective manual validation
and correction of challenging cases. The manual annotations by expert musicians are
accompanied by ratings of key recognition difficulty, on a 5-point Likert scale, that provide
valuable information for evaluation of key-finding algorithms. Meta-data related to
instrumentation are also listed with the excerpt as tags.
For copyright reasons, the excerpts are shared in the form of commonly used and irreversible
acoustic features instead of audio waves. The acoustic features include the constant-Q
spectrogram, chromagram, and other mid-level features proposed by existing algorithms.
With the goal of developing the dataset as a benchmark for audio key finding, we conducted a series
of experiments using existing algorithms to provide an initial baseline for performance
comparisons. We use the dataset to test four existing audio key finding algorithms and report their
individual and combined scores for key finding accuracy. We further demonstrate that it is
important to consider in the evaluations factors such as tuning frequency, key
strength/confidence, and key recognition difficulty.
The recorded ensemble’s or instrument’s tuning directly impacts an algorithm’s chance of
success. Instrumentalists and ensembles may tune to reference frequencies different from A440;
for example, early music ensembles may tune to an A of 415 Hz, which would currently be heard
as an Ab. We first apply two tuning estimation methods and compare the difference between their
tuning frequency outputs. We also study the difference in estimated tuning between the first and
last 15 seconds of the same piece.
Using straightforward methods for determining key strength or confidence from the respective
models, we assess the relation between each algorithm's accuracy and its confidence or key
strength value. Based on the combined score, we investigate the relationship between key
recognition difficulty, instruments/music styles, and the algorithms' accuracy.
Although the dataset was originally designed for audio key finding, we plan to expand the dataset
with more data for other MIR tasks such as automatic chord recognition and instrument
identification.
The remainder of the paper is organized as follows: Section 2 provides a general introduction to
audio key finding systems and current methods for evaluating them; Section 3 describes the
KUSC Classical music dataset, and the meta-data provided with the dataset; Section 4 introduces
the hybrid approach to key annotation and the rating of key-finding difficulty; Section 5 presents
the benchmark evaluations of four state-of-the-art audio key finding algorithms using the KUSC
dataset, followed by the conclusions.
2. AUDIO KEY FINDING AND RELATED WORK
2.1. Problem Definition and Background
The task of audio key finding is to identify the key of a music composition from its audio
recording. Polyphonic music comprises multiple streams of notes sounded simultaneously, such
as music with multiple parts played simultaneously by different instruments, as shown in the score
in Figure 1. Each part is a sequence of note and silent events, and each note event can be
described by a vector of properties <pitch, onset time, duration>. For example, the first three note
events in the part shown at the top of the score in Figure 1 are <·, 1, 0.5>, <·, 1.5, 0.125>, and
<·, 1.625, 0.125>, where · denotes the pitch shown in the score and onset time and duration are
given in beats. A pitch (e.g., bb4) consists of a pitch class (bb) and a register (4) that indicates the
height of the pitch. There exist 12 pitch classes: {c, c#/db, d, d#/eb, e, f, f#/gb, g, g#/ab, a, a#/bb, b}.
In Western tonal music, the key refers to the pitch context of a piece of music, and is represented
by the tonal center (a.k.a. the tonic), the most stable pitch class (for example, A or B) in the key.
The tonal center provides a reference for the identity and function of all other pitches in the piece.
For example, in a piece in the key of Eb major, as shown in Figure 1, pitch eb is the tonic and pitch bb is
the fifth (five scale steps up from eb); these are the two most stable pitches in the piece. In
contrast, pitches such as a and b are less likely to appear in a piece in Eb major. Therefore, the
occurrence of the 12 pitch classes is an important indicator for the key. There are 24 keys in
Western tonal music, 12 major keys and 12 minor keys. In this study we concentrate on the global
key, the one key that operates over the entire length of a piece of music.
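The scale-membership relationships described above can be made concrete with a short sketch (the names and indexing below are illustrative, not part of the dataset's tooling): building the major scale of a given tonic shows why eb and bb occur in Eb major while a and b do not.

```python
# Pitch classes indexed 0-11: c=0, c#/db=1, ..., a#/bb=10, b=11.
PITCH_CLASSES = ["c", "c#/db", "d", "d#/eb", "e", "f",
                 "f#/gb", "g", "g#/ab", "a", "a#/bb", "b"]

# Interval pattern of the major scale, in semitones above the tonic.
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]

def major_scale(tonic):
    """Return the set of pitch classes belonging to the major key of `tonic`."""
    return {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}

eb_major = major_scale(3)  # eb is pitch class 3
# eb (3) and bb (10) belong to Eb major; a (9) and b (11) do not.
```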
2.2. A General Architecture of Audio Key Finding Systems
A general architecture of audio key finding systems is illustrated in Figure 1. The input of the
system is an audio recording of a music composition or improvisation, and the output is the key.
An audio key finding system generally consists of two components: the first component focuses
on audio signal processing to extract features related to the pitch classes content of the sample,
while the second component determines the key based on the extracted pitch class information.
Figure 1. A general architecture of audio key finding systems
In the audio signal-processing component, the musical signal is first pre-processed by removing
silence and reducing noise. It is then divided into overlapping frames for spectral analysis. The
purpose of spectral analysis on short-duration frames is to obtain local information about note
events, particularly pitch information. Pitch information is related to the frequency of the signal.
In the equal temperament system, adjacent pitches are separated by a constant frequency ratio of
2^(1/12), so any pitch's frequency can be computed from a reference pitch. Using the standard
tuning of 440 Hz for pitch A4, the frequency of the pitch p that is n semitones away from A4 can
be calculated by:

Frequency(p) = 440 × 2^(n/12),

where n is a positive integer if the pitch is higher than A4 and a negative integer if it is lower.
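The frequency formula above can be sketched directly (the function name is illustrative):

```python
def pitch_frequency(n):
    """Frequency in Hz of the pitch n semitones away from A4 (A440).

    n > 0 for pitches above A4, n < 0 for pitches below.
    """
    return 440.0 * 2 ** (n / 12)

# A4 itself, the A one octave up, and C5 (3 semitones above A4):
# pitch_frequency(0) -> 440.0, pitch_frequency(12) -> 880.0,
# pitch_frequency(3) -> approximately 523.25
```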
The result of spectral analysis is further converted into a distribution representing the occurrence
of the 12 pitch classes. Various approaches have been proposed for the generation of pitch class
distributions with the goal of increasing the accuracy of pitch information. A detailed survey of
existing algorithms can be found in [1].
The key finding component in Figure 1 consists of representation models for the 24 keys and an
algorithm for determining the key by comparing the generated pitch class distribution and the
representation models. The representation model is usually defined as a typical pitch class
distribution for a specific key. The model can be constructed based on psychoacoustic
experiments [2], music theory [3], or learned from a dataset [4]. The comparison between the
generated pitch class distribution and the representation model in a key finding algorithm usually
involves calculating a distance function [5], correlations [6], or posterior probabilities [4]. The
key that has the shortest distance, or the highest correlation or probability, is chosen as the output.
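A minimal sketch of this template-correlation idea follows, using the widely cited Krumhansl-Kessler profiles as the representation model. This is an illustration of the general architecture, not the exact model or code used by any of the four algorithms evaluated later; the function names are ours.

```python
import math

# Krumhansl-Kessler key profiles (perceptual stability ratings per pitch class),
# a common choice of representation model; learned profiles would work too.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def correlation(x, y):
    """Pearson correlation coefficient between two 12-value vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def find_key(chroma):
    """Return the key whose rotated profile best correlates with the chroma."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so its tonic lands on `tonic`.
            template = [profile[(p - tonic) % 12] for p in range(12)]
            r = correlation(chroma, template)
            if best is None or r > best[0]:
                best = (r, NAMES[tonic] + " " + mode)
    return best[1]
```

For a chroma vector dominated by the Eb major scale degrees, the highest correlation is obtained for the Eb major template, so "Eb major" is returned.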
2.3. Related Work
Table 1 summarizes existing approaches for audio key finding with their reported accuracy and
datasets used for evaluations.
Table 1. A summary of existing approaches for audio key finding
Approach | Accuracy | Dataset, Ground Truth, and Availability
Chai & Verco [7] | 84% | 10 classical piano pieces; ground truth manually annotated by the author.
Chuan & Chew [5, 8] | 75% (410 classical), 70% (Chopin) | 410 classical pieces / 24 Chopin preludes; title key used as ground truth.
Gómez & Herrera [6] | 64% | 878 excerpts of classical music; title key used as ground truth.
İzmirli [9] (template-based) | 86% | 85 classical pieces.
İzmirli [10] (local key) | 84% (pop), 76.9% (classical) | 17 pop / 17 classical excerpts; ground truth manually annotated.
Peeters [4] | 81% | 302 classical pieces; title key used as ground truth.
Papadopoulos & Peeters [1] | 80.21% (classical), 61.31% (pop) | 2 datasets: (1) 5 movements of Mozart piano sonatas; ground truth (key and chord progression) manually annotated; info. about composer and composition title available. (2) 16 real audio songs annotated by IRCAM (QUAERO project).
Schuller & Gollan [11] | 78.5% (pop), 88.8% (classical), 63.4% (jazz) | 520 pieces in pop, jazz, classical and others; ground truth labeled by three musicians without disagreements; tracks with key modulation removed.
Sun, Li & Ma [12] | 68.7% on average | 228 pieces in classical, pop, jazz, new age, and folk; ground truth labeled either by students or from published music books.
Weiss [13] | 92.2%, 93.7%, and 97% | 3 datasets: (1) 115 tracks of 29 symphonies; title key used as ground truth; info. about composer and symphony no. available. (2) Saarland Music Data [14]: 126 tracks performed by students; ground truth manually annotated; recordings available in mp3. (3) 237 piano sonatas [15]: title key used as ground truth; info. about composer, composition title, and performer available.
3. THE KUSC CLASSICAL MUSIC DATASET
This section describes the KUSC classical music dataset for audio key finding, focusing on
acoustic features, meta-data, and data formats that are made available with the dataset.
3.1. Music in the Dataset
The dataset used in this study was provided by Classical KUSC, a classical public radio station.
We selected compositions by Bach, Mozart and Schubert for audio key finding because they
represent three different styles with distinguishable levels of tonal complexity. In addition,
tonality is more clearly defined in these pieces than in more contemporary compositions. The
selected compositions include symphonies, concertos, sonatas, quartets, preludes, fugues, and
other forms with instruments including the recorder, violin, flute, piano, oboe, guitar, and choir.
For multi-movement works, we used only the first and last movement because these segments are
generally in the title key.
The entire dataset consists of 3224 different excerpts. We extracted two excerpts from each
recording: one containing the first 15 seconds and the other representing the last 15 seconds of the
recording. We only considered the 15-second segments of the recording because the key is more
likely to remain in the global key without modulations in these parts of the piece. Silence at the
beginning and end of each recording was identified and removed using root-mean-square
energy, as described in [16].
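The RMS-based silence removal can be sketched as follows. This is a simplified illustration under our own assumptions (fixed frame size, fixed amplitude threshold); the actual procedure in [16] differs in its framing and threshold details.

```python
import math

def trim_silence(samples, frame_size=1024, threshold=0.01):
    """Drop leading and trailing frames whose RMS energy is below threshold."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples), frame_size)]

    def rms(frame):
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    # Indices of frames loud enough to count as signal.
    loud = [i for i, f in enumerate(frames) if rms(f) >= threshold]
    if not loud:
        return []
    # Keep everything from the first loud frame through the last one.
    return [s for f in frames[loud[0]:loud[-1] + 1] for s in f]
```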
The audio excerpts and their titles are not available for public access for copyright reasons.
However, low-level acoustic features related to audio key finding are made available for key
finding research as described in the following section.
3.2. Acoustic Features
Acoustic features related to tuning frequency and to presence of pitch classes are the most
important for audio key finding algorithms. Therefore, for each excerpt, we extracted frame-by-frame
features—the constant-Q spectrogram, chromagram, and pitch class profiles—related to
pitch classes using the fuzzy analysis center of effect generator algorithm proposed by Chuan and
Chew [5], Gómez’s harmonic pitch class profile [17], Lartillot and Toiviainen’s Matlab
implementation [16] of Gómez’s method, and Noland and Sandler’s hidden Markov models for
tonal estimation [18]. The features are given based on the tuning frequency estimated by two
existing methods [19, 20], as well as the standard tuning frequency (440 Hz for pitch A4).
3.2.1. Estimated Tuning Frequency
Here, we describe the two methods employed for estimating the tuning frequency.
The first is based on a method described by Müller and Ewert in [19]. The global tuning of a
recording is estimated by shifting the center of multi-rate filter banks to obtain an average
spectrogram based on several tuning frequencies. The deviation of the center that produces the
highest average value in the spectrogram is then identified as the estimated tuning frequency. In
this study, six deviation classes corresponding to shifts of {-1/2,-1/3,-1/4,0,1/4,1/3} semitones
from 440Hz were used.
The second is based on the technique outlined by Mauch and Dixon in [20]. Estimation of the
tuning frequency corresponds to the phase angle of the discrete Fourier transform, computed using
circular statistics. The frequency deviation is wrapped onto the interval [-π, π), where π represents a
quartertone. Details of obtaining the estimated frequency can be found in [21].
Note that equal temperament is assumed in both tuning methods. To compare the differences
between the estimated tuning frequencies from the two methods, the output of [20] is mapped to
one of the categories in [19]: {427.4741, 431.6092, 433.6918, 440, 446.3999, 448.5539} Hz for
A4.
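The six category frequencies follow from the semitone deviations via the equal-temperament relation f = 440 × 2^(d/12), and the mapping to the nearest category is a simple nearest-neighbor lookup. The sketch below is illustrative code of ours, not the implementation of [19] or [20]:

```python
# Semitone deviations used for the six classes, and the A4
# frequencies they imply: f = 440 * 2 ** (d / 12).
DEVIATIONS = [-1/2, -1/3, -1/4, 0, 1/4, 1/3]
CATEGORIES = [440 * 2 ** (d / 12) for d in DEVIATIONS]
# -> approximately [427.47, 431.61, 433.69, 440.00, 446.40, 448.55] Hz

def nearest_category(freq_hz):
    """Map an estimated A4 tuning frequency to the closest category."""
    return min(CATEGORIES, key=lambda c: abs(c - freq_hz))
```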
3.2.2. Pitch Class Distribution/Chromagram and Pitch-Based Spectrogram
Each pitch-based spectrogram consists of a series of vectors that represent spectral information
related to pitches as defined by their frequency ranges. For example, a pitch-based spectrogram
can be a matrix in which each row consists of 88 values for pitches A0 to C8. Pitch class
distributions or chromagrams, usually computed by folding values in pitch-based spectrograms
from pitches into pitch classes, are vectors of 12 values that relate to the distribution of 12 pitch
classes. Four algorithms [5, 16, 17, 18], to be outlined in Section 4.1.1, were used to extract
features for pitch-based spectrograms and pitch class distribution/chromagrams. In [5, 16, 17],
Fast Fourier Transform or Short Time Fourier Transform is used to produce spectral information.
In [18], the constant-Q transform is used. We briefly outline the concepts underlying these
methods.
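The folding step common to these methods can be sketched as follows (illustrative code assuming an 88-value pitch vector covering A0 to C8; the actual algorithms apply weighting and peak-picking that this sketch omits):

```python
def fold_to_chroma(pitch_vector):
    """Fold an 88-value pitch vector (A0..C8) into 12 pitch-class bins.

    A0 is MIDI note 21, so pitch index i maps to pitch class (21 + i) % 12,
    with c = 0, c# = 1, ..., b = 11.
    """
    chroma = [0.0] * 12
    for i, value in enumerate(pitch_vector):
        chroma[(21 + i) % 12] += value
    return chroma
```

Energy at A0 (index 0) and A4 (index 48) accumulates in the same bin, pitch class a (index 9), which is exactly the octave-invariance that makes chromagrams suitable for key finding.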
In [17], Gómez detected spectral peaks using 120 bins per octave, a higher number of bins than
the standard 12 per octave for pitches. A triangular weighting function is used to reduce boundary
errors in the adjacent frequency bins. The resulting Harmonic Pitch Class Profile (HPCP) is a
vector of 120 values for 12 pitch classes. The chromagram in Lartillot and Toiviainen’s MIR
Toolbox [16] is another implementation of Gómez’s approach with slightly different default
parameter settings including frame size, hop size, pitch range and sampling frequency.
Chuan and Chew’s fuzzy analysis technique in [5] also generates a 12-value Pitch Class Profile
(PCP) for audio key finding. The technique uses knowledge of the harmonic series to reduce the
errors in noisy low frequency pitches. Pitches in low frequency are important to key finding
because they usually indicate the root of the chord played by the bass. However, it is difficult to
identify these pitches correctly because of the logarithmic scale in pitch frequency.
We also used Queen Mary University of London’s Vamp plug-in [22] to generate a constant-Q
spectrogram [23] for each excerpt. The spectrogram is used in Noland and Sandler’s key finding
method in [18]. The constant-Q spectrogram is the pitch-based spectrogram used to create the
folded chromagram.
Figure 2 provides an example of the extracted acoustic features for a recording of Bach’s Prelude
BWV 552, shown in (a). The color indicates the significance of the observed frequency or
pitch, with red for high values and blue or black for lower values. The resolution of the
grids depends on each algorithm's default settings for parameters such as frame size and hop size.
Figures 2 (b) and (d) are generated by the QM Vamp plug-in, and (e) and (f) are generated using
Matlab, which results in similar color schemes for figures produced with the same software. The y-axis
represents the 12 pitch classes in (d), (e), and (f), while the y-axis in (b) lists a range of detected
pitches. In Figure 2 (c), the y-axis shows the 120 bins for the 12 pitch classes, which results in a
different visualization from the other panels in Figure 2.
Figure 2. (a) The first three bars of Bach Prelude in Eb major BWV552, and extracted acoustic
features: (b) spectrogram [22], (c) HPCP [17], (d) chromagram [23], (e) mirchroma [16] and (f)
fuzzy analysis [5].
3.3. Meta-data
In addition to acoustic features and manual annotations, we also provided meta-data related to the
instrumentation. The meta-data consist of the tags, listed in Table 2, associated with each piece of
music.
Two types of information were manually annotated for the excerpts: ground truth key and key
recognition difficulty. We took a hybrid approach, asking musicians to manually annotate the key
only for the challenging excerpts, so as not to exhaust the annotators. We also recorded the key
recognition difficulty by asking another three musicians to indicate how difficult it was for them
to determine the key using a 5-point Likert scale. The details for annotating ground truth key and
key recognition difficulty are described in Section 4.
For researchers to systematically compare the performance of their newly developed algorithm
with existing ones, we computed an average weighted score for each excerpt using the four audio
key finding algorithms [5, 16, 17, 18]. For each excerpt, each algorithm receives a score based on
the relation between its output key and the ground truth: 1 point (a full score) if the output is
identical to the ground truth, 0.5 if they are a perfect fifth apart, 0.3 if one is the relative
major/minor of the other, and 0.2 if one is the parallel major/minor of the other. An average is
then calculated using the four scores obtained by the algorithms. The details of the score
calculation and experiments are described in Section 5.
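The scoring scheme above can be sketched as follows. This is our own illustration, not code shipped with the dataset; representing a key as a (pitch class 0–11, 'maj'/'min') pair is an assumed encoding.

```python
# Illustrative scoring of an output key against the ground truth, using
# the weights described above: 1 identical, 0.5 perfect fifth, 0.3
# relative major/minor, 0.2 parallel major/minor, 0 otherwise.

def key_score(output, truth):
    (o_pc, o_mode), (t_pc, t_mode) = output, truth
    if output == truth:
        return 1.0                                  # identical key
    if o_mode == t_mode and (o_pc - t_pc) % 12 in (5, 7):
        return 0.5                                  # perfect fifth apart
    if o_mode != t_mode:
        if o_mode == 'maj' and (o_pc - t_pc) % 12 == 3:
            return 0.3                              # output is relative major of truth
        if o_mode == 'min' and (t_pc - o_pc) % 12 == 3:
            return 0.3                              # output is relative minor of truth
        if o_pc == t_pc:
            return 0.2                              # parallel major/minor
    return 0.0

def average_weighted_score(outputs, truth):
    """Average the four algorithms' scores for one excerpt."""
    return sum(key_score(o, truth) for o in outputs) / len(outputs)
```

For example, with ground truth C major, the outputs C major, G major, A minor, and C♯ major score 1, 0.5, 0.3, and 0, giving an average weighted score of 0.45.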
Table 2. Tags and their associated number
1 Soprano 7 Choir 13 Symphony 19 Harpsichord 25 Clarinet
2 Alto 8 Cello 14 Orchestra 20 Fortepiano 26 Oboe
3 Tenor 9 Violin 15 Concerto 21 Piano 27 Trumpet
4 Bass 10 Viola 16 Lute 22 Sonata 28 Bassoon
5 Baritone 11 String quartet 17 Guitar 23 Recorder 29 Horn
6 Chorus 12 Organ 18 Harp 24 Flute 30 Posthorn
3.4. Data Formats
The dataset is available via www.unf.edu/~c.chuan/KUSC/. The acoustic features described in Section
3.2.2 are pre-computed using different settings for the pre-defined parameters: frame size, hop
size, pitch range, sampling frequency, and number of frequency bins per octave. The dataset can be
downloaded as delimited text files and MySQL database files.
4. ANNOTATING GROUND TRUTH AND KEY RECOGNITION DIFFICULTY
This section describes our procedure for obtaining accurate ground truth key information for the
dataset by first identifying challenging musical fragments, then manually annotating them with
key information and key finding difficulty ratings.
4.1. A Hybrid Approach to Ground Truth Annotations
We first assigned the title key as the ground truth for every excerpt in the dataset. To avoid
exhaustive manual examination of the entire dataset, we implemented five audio key finding
systems and used them to focus the manual examination on a subset of challenging
excerpts for which the title key may not be the actual key. Each excerpt was first supplied as input to
the five systems, which produced five key outputs. If no more than two of the five systems
reported the same answer as the title key, the excerpt was labeled as a challenging case requiring
further manual examination. This challenging set was re-examined by three professional
musicians, who manually labeled its keys.
4.1.1. Key Finding Algorithms
In this section, we briefly describe the audio key finding systems implemented for the
construction of the challenging set, emphasizing the uniqueness of each system. More
details can be found in [24]. The methods we chose are mainly template-based approaches,
in which a pitch class distribution or chromagram is compared to a template key representation.
Note that systems relying on training data were not implemented because the title key (ground
truth) may not be the key of the excerpt.
The five systems implemented include algorithms using Krumhansl and Schmuckler’s (K-S)
probe tone profile [2], Temperley’s modified version of the K-S model [3], İzmirli’s template-based
correlation model [9], Gómez’s harmonic pitch class profile (HPCP) method [17], and our
fuzzy analysis center of effect generator (FACEG) algorithm [5]. The first four algorithms all
compute a correlation coefficient between the pitch class distribution of the excerpt and the
template pitch class profile for each key. They differ mostly in their template pitch class profiles
as shown in Figure 3. We thus implemented a basic approach to generate pitch class distributions
using the Fast Fourier Transform.
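As a sketch of this template-correlation family of methods, the following compares an average 12-bin pitch class distribution against the 24 templates obtained by rotating a major and a minor profile. The profiles shown are Krumhansl's published probe-tone values; the code is our simplified illustration, not any of the four actual implementations.

```python
# Template-based key finding sketch: correlate the excerpt's average
# pitch class distribution with each of the 24 rotated key templates
# and report the best match.
import numpy as np

# Krumhansl's probe-tone profiles for C major and C minor (bin 0 = C).
KS_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
KS_MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                     2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def find_key(pcp):
    """Return ((tonic pitch class, mode), correlation) of the best template."""
    best, best_r = None, -np.inf
    for tonic in range(12):
        for mode, profile in (('maj', KS_MAJOR), ('min', KS_MINOR)):
            template = np.roll(profile, tonic)      # transpose the profile
            r = np.corrcoef(pcp, template)[0, 1]
            if r > best_r:
                best, best_r = (tonic, mode), r
    return best, best_r
```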
In [5], we proposed an audio key finding system called fuzzy analysis center of effect generator.
A fuzzy analysis technique is used for PCP generation using the harmonic series to reduce the
errors in noisy low-frequency pitches. The PCP is further refined periodically using the current
key information. The representation model used in the system is Chew’s Spiral Array model [25,
26], a representation of pitches, chords, and keys in the same 3-dimensional space with distances
reflecting their musical relations. The Center of Effect Generator (CEG) key finding algorithm
determines key via a nearest-neighbor search in the Spiral Array space in real-time: an
instantaneous key answer is generated in each window based on past information. The CEG is the
only model that considers pitch spelling.
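The nearest-neighbor idea behind the CEG step can be sketched as follows. The radius and height constants, and the use of bare pitch positions as stand-ins for the Spiral Array's actual chord-derived key representations, are simplifying assumptions of this sketch.

```python
# Simplified CEG sketch: pitches sit on a helix indexed along the line
# of fifths, the center of effect (CE) is the duration-weighted centroid
# of the sounded pitches, and the key is the nearest key representation.
import math

R, H = 1.0, math.sqrt(2.0 / 15.0)   # assumed helix radius and per-step height

def pitch_position(k):
    """Position of pitch k, indexed along the line of fifths (C = 0)."""
    return (R * math.sin(k * math.pi / 2),
            R * math.cos(k * math.pi / 2),
            k * H)

def center_of_effect(weighted_pitches):
    """Duration-weighted centroid of (pitch index, duration) pairs."""
    total = sum(w for _, w in weighted_pitches)
    ce = [0.0, 0.0, 0.0]
    for k, w in weighted_pitches:
        pos = pitch_position(k)
        for i in range(3):
            ce[i] += w * pos[i] / total
    return tuple(ce)

def nearest_key(ce, key_points):
    """Nearest-neighbor search among precomputed key representations."""
    return min(key_points, key=lambda name: math.dist(ce, key_points[name]))
```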
Figure 3. C major and C minor key profiles proposed by Krumhansl, Temperley, İzmirli, and Gómez.
Figure 4. The flowchart of the hybrid approach for ground truth annotations
4.1.2. Experiments and Results
The process of examining the accuracy of the title key and re-labeling the ground truth is
illustrated in Figure 4. An excerpt was first examined by the five implemented audio key finding
systems. If more than two systems out of the five reported the key identical to the one in the title,
we saved the title key as the ground truth. If not, the excerpt was moved to the challenging set and
re-examined by two musicians. If the two musicians had the same key for the excerpt, then we
saved the re-labeled key as the ground truth. If the two musicians disagreed, the excerpt was
examined by the third musician. Experiment results were reported by comparing the accuracy of
the five audio key finding systems using the title key and the re-labeled key as ground truth.
Out of the 3324 excerpts, 727 (21.87%) were moved to the challenging set
examined by musicians. Table 3 lists the distribution of the challenging set, in absolute numbers
and as a percentage of the number of excerpts by each composer.
Table 3. Details of the entire dataset and the challenging set by Bach, Mozart, and Schubert
Composer Total # of recordings Challenging set (first 15 sec) Challenging set (last 15 sec)
Bach 553 245 (44.30%) 244 (44.12%)
Mozart 873 75 (8.59%) 98 (11.23%)
Schubert 236 24 (10.17%) 41 (17.37%)
Figures 5 and 6 show the experiment results using the title key and the re-labeled key,
respectively. The reported keys are divided into nine categories based on their relation to the ground
truth: correct (Cor), dominant (Dom), sub-dominant (Sub), parallel major/minor (Par), relative
major/minor (Rel), same mode with the root one semitone higher (M+1), same mode with the root
one semitone lower (M-1), same mode but not in the previous categories (MO), and relations not
included in any of the previous categories (Others).
Figure 5. Key finding results for the challenging dataset using the title key as ground truth.
Figure 6. Key finding results for the challenging dataset using the re-labeled key as ground truth.
Comparing Figure 5 (a), (b) with Figure 6 (a), (b), it can be observed that most of the incorrect
answers using the title key fall into the last three categories, especially (M-1), which indicates that
tuning may be an issue in key finding for Bach’s pieces. Similar results can be observed in
Mozart’s pieces by comparing Figure 5 (c), (d) with Figure 6 (c), (d). However, the significant
proportion in parallel major/minor (Par) category in Figure 5 (d) indicates that many Mozart
pieces actually end in the parallel key with respect to the title key. In Figure 5 (e), the reported
keys are more evenly distributed among the categories for Schubert’s pieces using the title key as
the ground truth. This implies that the composer employed more complex compositional strategies
to open these pieces. The result in Figure 5 (f) for the last 15 seconds of Schubert’s pieces
is more similar to Mozart’s than to Bach’s, showing that the pieces end in the parallel major/minor
key or a key that has the same mode as the title key. More detailed analysis can be found in [24].
4.2. Key Recognition Difficulty Annotations
In addition to reporting the key of the excerpts, we also collected data about key recognition
difficulty. Three musicians were asked to listen to the excerpts in the challenging dataset, and use
the 5-point Likert scale, with 1 indicating the easiest level and 5 the most difficult, to rate the
difficulty that they experienced in recognizing the key. One objective of this data collection step
was to observe how individuals perceive this difficulty and the difference between individuals’
responses. Therefore, no formal definition of key recognition difficulty, nor examples of each
level, were given to the musicians.
We also studied the consistency of individual rater’s annotations by examining the rater’s
responses to multiple versions of the same compositions, i.e., recordings of the same piece by
different performers. A consistent rater should indicate similar difficulty levels and identical, or
closely related, keys for the different recordings of the same piece.
Figure 7 shows the raters’ consistency according to two measures: (a) the weighted average of
standard deviations of rated difficulty levels; and (b) the ratio of the number of distinct keys to the
total number of duplicate excerpts. The first measure is the average of the standard deviations,
computed for each set of multiple recordings of the same piece, weighted by the number of
recordings in the set. This measure is bounded between zero and 2.5, the upper bound reached when
the ratings sit at the two extreme ends of the scale. The second measure serves as a confusion
indicator for determining the key of different versions of the same piece. For the given dataset, this
measure is bounded between 0.248 and 1, the upper bound reached when every duplicated excerpt is
assigned a different key. According to both measures, the raters’ labels are found to be consistent, as
the results are closer to the lower bound of each measure.
on acoustic features as described in [27].
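The two consistency measures can be sketched as follows, assuming `groups` maps each composition to the (difficulty rating, labeled key) pairs a rater gave its different recordings; this data layout and the function names are our assumptions.

```python
# Two per-rater consistency measures over duplicate recordings:
# (a) weighted average of per-composition rating standard deviations,
# (b) ratio of distinct labeled keys to total duplicate excerpts.
import statistics

def weighted_avg_std(groups):
    """Average of per-composition rating standard deviations, weighted
    by the number of recordings in each group."""
    num = sum(len(pairs) * statistics.pstdev([r for r, _ in pairs])
              for pairs in groups.values())
    den = sum(len(pairs) for pairs in groups.values())
    return num / den

def distinct_key_ratio(groups):
    """Number of distinct keys over total number of duplicate excerpts;
    a perfectly consistent rater hits the lower bound (#groups / #excerpts)."""
    distinct = sum(len({k for _, k in pairs}) for pairs in groups.values())
    total = sum(len(pairs) for pairs in groups.values())
    return distinct / total
```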
Figure 7. Consistency of individual raters’ labels
5. EVALUATIONS USING THE KUSC DATASET
After we collected data for the ground truth and key recognition difficulty, we used the dataset
and the annotated data to test several state-of-the-art audio key-finding algorithms as described in
this section. The test focuses on five aspects: tuning, audio key finding accuracy against re-labeled
ground truth, the relation between an algorithm’s accuracy and its confidence value, the
relation between an algorithm’s accuracy and human perceived key recognition difficulty, and the
relation between an algorithm’s accuracy and instruments/musical styles. The experimental
results not only provide a detailed examination of the dataset, but also offer a baseline against which any
newly developed algorithm can be compared in the future.
5.1. Tuning
Figure 8 shows the difference, measured in number of semitones, between the estimated tuning
frequencies generated by the Chroma Toolbox [19] and non-negative least squares (NNLS) [20].
It can be observed that the outputs of the two tuning estimation methods are mostly the same
(2961 out of 3224 excerpts). For the remaining excerpts, however, the two methods reported different
estimated tuning frequencies, in some cases more than 0.8 semitones apart.
In addition to examining the difference between the outputs of the two tuning methods, we also
compared the estimated tuning frequencies of the first and last 15 seconds of the same
composition. Table 4 lists the percentage of excerpts in which the estimated tuning frequencies
are different for the beginning and ending segments. The percentages in Table 4 are relatively high
given that the tuning usually stays the same throughout a recording. More studies will be
conducted to examine the reasons for the observed discrepancies.
Figure 8. The distance between the tuning frequencies estimated using the Chroma Toolbox [19] and
NNLS [20]
Table 4. The percentage of the estimated tuning frequencies that differ in the first and last 15 seconds
Tuning method Bach Mozart Schubert
Chroma toolbox [19] 12.45% 23.22% 17.45%
NNLS [20] 10.65% 21.49% 17.02%
5.2. Audio Key Finding Accuracy
We tested four audio key finding algorithms [5, 16, 17, 18] using the dataset. Since it is not the
focus of this paper to optimize the algorithms, we only implemented the basic versions of these
algorithms. We used their default settings for parameters such as frame size, hop size, and pitch
range to generate frame-by-frame pitch class profiles or chromagrams. We then calculated the
average profile over all frames.
For the algorithms described in [16, 17], the average profile was then compared to the pre-defined
templates for the 24 keys, and the template with the highest correlation value was then reported as
the key. For FACEG [5], the only algorithm that considered pitch spelling, we removed the
requirement of pitch spelling by considering all spellings for the purpose of comparison. The
average profile was used to generate the Center of Effect (CE) representing the tonal center in the
3-dimensional Spiral Array model [25, 26]. The key was determined by searching for the key
representation, from among the pre-defined points representing the 24 keys, closest to the CE.
The algorithm in [18] constructs a hidden Markov model with 24 major and minor key states and
observations representing chord transitions. The key profiles in the implementation of the
algorithm are created from analysis of Bach’s Well Tempered Clavier, Book I. The implemented
program available via QM Vamp plugin [22] tracks local keys and key changes. For comparison
with the other approaches, we needed to select one key from the multiple local keys as the global
key for the excerpt. The selection was based on the total duration of the key, i.e., the local key
reported with the longest period was chosen to be the global key.
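The duration-based selection can be sketched as follows; `segments` is a hypothetical list of (key, start, end) tuples standing in for the plugin's local-key track:

```python
# Reduce a local-key track to one global key: sum the duration of each
# reported local key and keep the key with the longest total duration.
from collections import defaultdict

def global_key(segments):
    durations = defaultdict(float)
    for key, start, end in segments:
        durations[key] += end - start
    return max(durations, key=durations.get)
```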
To prevent tuning frequencies that deviated from the norm from affecting the key finding result,
we used the output of the tuning estimation methods to specify the reference frequency for the
algorithms instead of fixing it at 440Hz for A4. For conciseness, in this paper, we only report the
results using the estimated frequency from the Chroma Toolbox [19].
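To illustrate why the reference frequency matters, the sketch below maps a frequency to a pitch class under a given tuning of A4; the 425 Hz recording (roughly 60 cents flat) is a hypothetical example, not drawn from the dataset.

```python
# Map a frequency to a pitch class (0 = C) under a given A4 tuning.
# With a fixed 440 Hz reference, notes from a flat-tuned recording can
# fall into the neighboring semitone bin; the estimated reference
# recovers the intended pitch class.
import math

def pitch_class(freq_hz, a4_hz=440.0):
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)   # MIDI-style pitch number
    return round(midi) % 12
```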
Table 5 shows the key finding accuracy, overall weighted scores, for the four algorithms. Each
weighted score is calculated based on the relation of the output key to the ground truth key (as
described in Section 3.3), and converted to a percentage by dividing by the total number of
excerpts. The key finding program in the MIR Toolbox [16] reported the highest
weighted score, with the HPCP [17] reporting one very close to it. The low percentage in the
result of [18] may be because the program is overly sensitive to local key changes. The
availability of the KUSC dataset will allow researchers to further investigate the effect of local
key modulations on global key determination.
For each excerpt, we calculated the average of the four scores from the four algorithms. Figure 9
shows the distribution of average weighted scores over the dataset. A peak can be observed at 0.7,
with lower counts on either side. This information is helpful for evaluating a new audio key finding
algorithm: excerpts that receive an average score above 0.7 can be used as benchmarks for
evaluating the basic performance of a key finding algorithm.
Table 5. The four algorithms’ key finding accuracy scores for the KUSC classical music dataset
MIR HPCP FACEG QM
Weighted score %* 91.70% 90.34% 82.64% 59.74%
*identical = 1, dominant = 0.5, relative = 0.3, parallel = 0.2
Figure 9. The histogram showing the distribution of average weighted scores of the four algorithms
5.3. Key Strength and Confidence Value
We also used the dataset to examine the relation between key strength/confidence value and key
determination accuracy. In addition to the output key, audio key finding algorithms also report a
numeric key strength or confidence value indicating the certainty of the output key. Such
key strength/confidence values have also been used by İzmirli as the weights in a
voting scheme to determine the global key from multiple local keys [9].
For the key finding algorithm in HPCP [17] and the MIR Toolbox [16], we used the correlation
between the pitch class profile and the 24 keys as the key strength value. The confidence value in
FACEG [5] is calculated using the distance between the CE and the two closest keys. Suppose d1
is the distance between the CE and the closest key and d2 is the same for the second closest key.
The confidence value is calculated as (d2 – d1) / d1, the scaled difference between the distances to
the closest and second-closest keys. The key detector program in the QM Vamp plugin [22]
outputs key probabilities for all 24 keys for each frame. In this study, the key strength value is
given by the average of the global key’s probabilities across all frames.
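The FACEG- and QM-style values described above can be sketched as follows; the function and variable names are ours.

```python
# Two kinds of confidence values: the scaled distance gap used by FACEG,
# and the average per-frame probability of the global key used for QM.

def faceg_confidence(d1, d2):
    """Scaled gap (d2 - d1) / d1 between the distances from the CE to the
    closest key (d1) and the second-closest key (d2)."""
    return (d2 - d1) / d1

def qm_strength(frame_probs, key):
    """Average probability of the chosen global key across all frames;
    `frame_probs` is a list of per-frame {key: probability} dicts."""
    return sum(p[key] for p in frame_probs) / len(frame_probs)
```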
Table 6 lists the correlation between key strength/confidence value and accuracy as represented
by the weighted score. Although some algorithms use this value to determine key, the value is not
strongly correlated with the accuracy of the output key.
The correlation coefficients between the key strength/confidence values of pairs of the four
algorithms are shown in Table 7. Except for the higher correlation between the key finding
programs in the MIR Toolbox and HPCP, both based on the Krumhansl-Schmuckler template-
matching algorithm [2], no strong correlations are observed between any other pairs of
algorithms.
Table 6. Correlation between the key strength/confidence value and accuracy scores for the four
algorithms.
MIR HPCP FACEG QM
Correlation 0.2666 0.2922 0.092 -0.0592
Table 7. Correlation coefficients between the key strength/confidence values of pairs of the four algorithms
MIR HPCP FACEG QM
MIR 1 - - -
HPCP 0.5479 1 - -
FACEG 0.1124 0.0653 1 -
QM 0.0633 -0.0144 -0.0030 1
5.4. Key Recognition Difficulty
To understand the connection between challenges faced by the algorithms and musicians, we
calculated the correlation between the algorithms’ average weighted accuracy score and the
annotators’ average key recognition difficulty ratings. Although the musicians were instructed to
use the 5-point scale for their ratings, they may not have used the scale in the same way. Therefore, in
addition to simply calculating the average of the three ratings, we also normalized and
standardized their ratings before calculating the correlation coefficient.
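These two preprocessing steps can be sketched as follows (the function names are ours): min-max normalization maps each rater's ratings to [0, 1], and standardization converts them to z-scores.

```python
# Per-rater preprocessing before averaging the three difficulty ratings.
import statistics

def normalize(ratings):
    """Min-max normalization to [0, 1]."""
    lo, hi = min(ratings), max(ratings)
    return [(r - lo) / (hi - lo) for r in ratings]

def standardize(ratings):
    """Z-score standardization (population standard deviation)."""
    mu = statistics.mean(ratings)
    sigma = statistics.pstdev(ratings)
    return [(r - mu) / sigma for r in ratings]
```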
Table 8 shows the correlation results. The values in the table show negative correlation between
the difficulty ratings and the average weighted score generated by the four algorithms. However,
the correlation is not strong.
Table 8. Correlation between the key recognition difficulty and weighted accuracy scores.
Unprocessed difficulty Normalized difficulty Standardized difficulty
Correlation -0.3026 -0.2817 -0.2970
Since the correlation in Table 8 was not strong, we further conducted t-tests. Based on the
distribution of the average weighted scores shown in Figure 9, we propose a null hypothesis
(H0) that excerpts with an average weighted score less than 0.7 received the same mean difficulty
rating as those with scores equal to or above 0.7. The alternative hypothesis (H1) is that excerpts
with a score less than 0.7 have a significantly higher mean difficulty rating than ones with scores
equal to or above 0.7. The results of the t-tests, shown in Table 9, reject the null hypothesis. Thus
excerpts receiving a weighted accuracy score of 0.7 or higher from the algorithms do have a mean
difficulty rating lower than those with weighted accuracy scores below 0.7.
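The test logic can be sketched with Welch's t statistic and a normal approximation for the one-sided critical value; this is our illustration, and the paper's exact test procedure may differ.

```python
# One-sided two-sample test sketch: do excerpts with low weighted scores
# carry a higher mean difficulty rating than excerpts with high scores?
import math
import statistics

def welch_t(low_group, high_group):
    """Welch's t statistic for the difference of the two group means."""
    m1, m2 = statistics.mean(low_group), statistics.mean(high_group)
    v1 = statistics.variance(low_group) / len(low_group)
    v2 = statistics.variance(high_group) / len(high_group)
    return (m1 - m2) / math.sqrt(v1 + v2)

def rejects_h0(low_group, high_group, critical=1.645):
    """One-sided test at roughly the 5% level (normal approximation,
    reasonable for large samples)."""
    return welch_t(low_group, high_group) > critical
```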
Table 9. The results of the t-tests
Unprocessed difficulty Normalized difficulty Standardized difficulty
H 1 1 1
p-value 1.4808 × 10^-8 1.2505 × 10^-7 2.0356 × 10^-8
5.5. Instrumentation
For excerpts that contain instrument and music style tags as listed in Table 2, we calculated the
mean of average weighted scores for each tag to examine how instrumentation impacts the
algorithms’ accuracy. The results, shown in Figure 10, indicate that wind instruments (tags
no. 23–30), except for the oboe (tag no. 26), have higher average weighted scores than the others. It
is also observed that excerpts labeled as vocal tracks (tags no. 1–7) show higher standard errors
than the others. However, more data is needed to obtain more conclusive results.
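The per-tag aggregation can be sketched as follows; `tagged_scores`, mapping each tag to the average weighted scores of the excerpts carrying it, is an assumed data layout.

```python
# For each instrumentation tag, compute the mean of the excerpts'
# average weighted scores and its standard error (stdev / sqrt(n)).
import math
import statistics

def tag_stats(tagged_scores):
    out = {}
    for tag, scores in tagged_scores.items():
        mean = statistics.mean(scores)
        se = statistics.stdev(scores) / math.sqrt(len(scores))
        out[tag] = (mean, se)
    return out
```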
Figure 10. The average weighted scores with standard errors for the 30 instrumentation tags listed in
Table 2.
6. CONCLUSIONS
A dataset consisting of 3224 classical music excerpts for audio key finding was described in this
paper. To share the dataset with the research community without violating copyright, excerpts
were represented as commonly used and irreversible acoustic features such as chromagrams and
constant-Q spectrograms instead of audio source files. In addition, manual annotations such as
ground truth for the global key and key recognition difficulty were collected. Meta-data relating
to instrumentation was also made available.
We conducted a series of experiments with the dataset using several algorithms pertaining to
aspects of the audio key finding process. First, for tuning frequency estimation, the two methods
tested agreed on the same frequency for over 90% of the excerpts. However, 10% to 23% of the
beginning excerpts returned a different tuning frequency than that for the corresponding end
portion of the same piece. Four audio key finding algorithms were also tested with weighted
accuracy scores ranging from 60% to 92%. In addition to the estimated key, many algorithms also
produce a numerical value indicating the strength or confidence of the output key. We did not
observe clear relations between this key strength/confidence value and the algorithm’s accuracy.
We also analyzed the relations between the four algorithms’ key finding accuracy and musicians’
key recognition difficulty ratings. Although accuracy was not strongly correlated with difficulty
rating, the result showed that the mean key difficulty rating of excerpts receiving weighted scores
above 0.7 is significantly lower than that for excerpts with weighted accuracy scores lower than
0.7. Finally, we examined the relations between the algorithms’ accuracy and instruments tags.
Some patterns were observed, but these were not sufficient for drawing strong conclusions.
For future work, we plan to expand the dataset by including more excerpts and generating
features and annotations for other music information retrieval tasks.
REFERENCES
[1] Papadopoulos, H. and Peeters, G. (2012) “Local key estimation from an audio signal relying on
harmonic and metrical structures,” IEEE Transactions on Audio, Speech, and Language Processing,
Vol. 20, No. 4, pp. 1297–1312.
[2] Krumhansl, C. L. (1990) “Quantifying tonal hierarchies and key distances,” Cognitive Foundations of
Musical Pitch, chapter 2, pp. 16–49, Oxford University Press, New York.
[3] Temperley, D. (1999) “What’s Key for Key? The Krumhansl-Schmuckler Key-Finding Algorithm
Reconsidered,” Music Perception, Vol. 17, No. 1, pp. 65–100.
[4] Peeters, G. (2006) “Musical key estimation of audio signal based on HMM modeling of chroma
vectors,” Proceedings of the International Conference on Digital Audio Effects, 127-131.
[5] Chuan, C.-H. and Chew, E. (2005) “Fuzzy analysis in pitch class determination for polyphonic audio
key finding,” Proceedings of the 6th International Conference on Music Information Retrieval.
[6] Gómez, E. and Herrera, P. (2004) “Estimating the tonality of polyphonic audio files: Cognitive versus
machine learning modeling strategies,” Proceedings of the 5th International Conference on Music
Information Retrieval, 92-95.
[7] Chai, W. and Vercoe, B. (2005) “Detection of key change in classical piano music,” Proceedings of the
6th International Conference on Music Information Retrieval, 468-473.
[8] Chuan, C.-H. and Chew, E. (2006) “Audio key finding: considerations in system design, and case
studies on 24 Chopin’s preludes,” Ichiro Fujinaga, Masataka Goto, George Tzanetakis (eds),
EURASIP Journal on Applied Signal Processing, Special Issue on Music Information Retrieval.
[9] İzmirli, Ö. (2005) “Template based key finding from audio,” Proceedings of the International
Computer Music Conference.
[10] İzmirli, Ö. (2007) “Localized key finding from audio using non-negative matrix factorization for
segmentation,” Proceedings of the 8th International Conference on Music Information Retrieval, 195-
200.
[11] Schuller, B. and Gollan, B. (2012) “Music theoretic and perception-based features for audio key
determination,” Journal of New Music Research, Vol. 41, No. 2, pp. 175–193.
[12] Sun, J., Li, H., Ma, L. (2011) “A music key detection method based on pitch class distribution
theory,” International Journal of Knowledge-based and Intelligent Engineering Systems, Vol. 15, pp.
165–175.
[13] Weiss, C. (2013) “Global key extraction from classical music audio recordings based on the final
chord,” Proceedings of the Sound and Music Computing Conference, pp. 742–747.
[14] Müller, M., Konz, V., Bogler, W. and Arifi-Müller, V. (2011) “Saarland music data,” Proceedings of
the 12th International Conference on Music Information Retrieval.
[15] Pauws, S. (2004) “Musical key extraction from audio,” Proceedings of the 5th International
Conference on Music Information Retrieval.
[16] Lartillot, O. and Toiviainen, P. (2007) “A Matlab toolbox for musical feature extraction from audio,”
Proceedings of the 10th International Conference on Digital Audio Effects, pp. 237–244.
[17] Gómez, E. (2006) Tonal Description of Music Audio Signals, PhD thesis.
[18] Noland, K. and Sandler, M. (2007) “Signal processing parameters for tonality estimation,” Proceedings
of Audio Engineering Society 122nd Convention.
[19] Müller, M. and Ewert, S. (2011) “Chroma toolbox: matlab implementations for extracting variants of
chroma-based audio features,” Proceedings of the 12th International Society for Music Information
Retrieval Conference, pp. 215–220.
[20] Mauch, M. and Dixon, S. (2010) “Approximate note transcription for the improved identification of
difficult chords,” Proceedings of the 11th International Society for Music Information Retrieval
Conference, pp. 135–140.
[21] Mauch, M. (2010) Automatic Chord Transcription from Audio Using Computational Models of
Musical Context, PhD thesis, Queen Mary University of London.
[22] www.vamp-plugins.org/plugin-doc/qm-vamp-plugins.html#qm-keydetector, accessed July 2014.
[23] Brown, J. (1991) “Calculation of a constant Q spectral transform,” Journal of the Acoustical Society
of America, Vol. 89, No. 1, pp. 425–434.
[24] Chuan, C.-H. and Chew, E. (2012) “Generating ground truth for audio key finding: When the title key
may not be the key,” Proceedings of the 13th International Conference on Music Information
Retrieval.
[25] Chew, E. (2000) “Towards a mathematical model of tonality,” doctoral dissertation, Department of
Operations Research, Massachusetts Institute of Technology, Cambridge, Mass, USA.
[26] Chew, E. (2001) “Modeling tonality: applications to music cognition,” Proceedings of the 23rd
Annual Meeting of the Cognitive Science Society, pp. 206–211.
[27] Chuan, C.-H. and Charapko, A. (2013) “Predicting key recognition difficulty in polyphonic audio,”
Proceedings of the 9th IEEE International Workshop on Multimedia Information Processing and
Retrieval.
AUTHORS
Ching-Hua Chuan is an assistant professor of computing at the University of North
Florida. She received her Ph.D. in computer science from University of Southern
California Viterbi School of Engineering in 2008. She received her B.S. and M.S.
degrees in electrical engineering from National Taiwan University. Dr. Chuan’s
research interests include audio signal processing, music information retrieval,
artificial intelligence and machine learning. She was the recipient of the best new
investigator paper award at the Grace Hopper Celebration of Women in Computing
in 2010.
Elaine Chew received the B.A.S. degree in mathematical and computational
sciences with honors, and in music with distinction, from Stanford University,
Stanford, California, in 1992, and the S.M. and Ph.D. degrees in operations
research from the Massachusetts Institute of Technology, Cambridge,
Massachusetts, in 1998 and 2000, respectively, in the United States. She joined
Queen Mary, University of London, in the United Kingdom, as Professor of Digital
Media in Fall 2011, where she serves as Director of Music Initiatives in the Centre
for Digital Music. Her research interests center on mathematical and computational
modelling of music prosody and structure, on which she has authored over 80
refereed journal and conference articles. Prof. Chew was the recipient of a National Science Foundation
Faculty Early Career Award, Presidential Early Career Award in Science and Engineering, and Edward,
Frances, and Shirley B. Daniels Fellowship at the Radcliffe Institute for Advanced Study.