The document discusses developing a model to compose monophonic world music using deep learning techniques. It proposes using a bi-axial recurrent neural network with one axis representing time and the other representing musical notes. The network will be trained on a dataset of MIDI files describing pitch, timing, and velocity of notes. It will also incorporate information from music theory on scales, chords, and other elements extracted from sheet music files. The goal is to generate unique musical sequences while adhering to music theory rules. The model aims to address the problem of composing long durations of background music for public spaces in an automated way.
Query By Humming - Music Retrieval Technique (Shital Kat)
This seminar report summarizes query by humming technology. The basic architecture involves extracting melodic information from a hummed input, transcribing it, and comparing it to melodic contours in a database. Challenges include imperfect user queries and accurately capturing pitches from hums. Popular query by humming applications include Shazam, SoundHound, and Midomi. The report also discusses file formats like WAV and MIDI, and the Parsons code algorithm for representing melodies.
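As an illustration of that last idea, a minimal sketch of the Parsons code in Python (the example melody is the opening of "Twinkle Twinkle Little Star"):

    # Parsons code: encode a melody's contour as U (up), D (down),
    # R (repeat), ignoring exact pitch values.
    def parsons_code(pitches):
        """pitches: list of MIDI note numbers; returns e.g. '*UDDR'."""
        code = "*"  # conventional start symbol
        for prev, cur in zip(pitches, pitches[1:]):
            if cur > prev:
                code += "U"
            elif cur < prev:
                code += "D"
            else:
                code += "R"
        return code

    # C C G G A A G -> '*RURURD'
    print(parsons_code([60, 60, 67, 67, 69, 69, 67]))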
DETECTING AND LOCATING PLAGIARISM OF MUSIC MELODIES BY PATH EXPLORATION OVER ... (csandit)
To the best of our knowledge, the issue of automatic detection of music plagiarism has never been addressed before. This paper presents the design of an Automatic Music Melody Plagiarism Detection (AMMPD) method to detect and locate possible plagiarism in music melodies. The key contribution of the work is an algorithm that addresses the challenging issues of the AMMPD problem, namely (1) the inexact matching of the noisy and inaccurate pitches of music audio and (2) the fast detection and positioning of similar subsegments between suspicious pieces of music audio. The major novelty of the proposed method is that it addresses these two issues in the temporal domain by means of a novel path-finding approach on a binarized 2-D bit mask in the spatial domain. The proposed AMMPD method can not only identify similar passages inside two suspicious music melodies but also retrieve music audio of similar melodies from a music database given a humming or singing query. Experiments have been conducted to assess the overall performance and examine the effects of the various parameters introduced in the proposed method.
Yi-Hsuan Yang is an Associate Research Fellow with Academia Sinica. He received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010, and became an Assistant Research Fellow in Academia Sinica in 2011. He is also an Adjunct Associate Professor with the National Tsing Hua University, Taiwan. His research interests include music information retrieval, machine learning and affective computing. Dr. Yang was a recipient of the 2011 IEEE Signal Processing Society (SPS) Young Author Best Paper Award, the 2012 ACM Multimedia Grand Challenge First Prize, and the 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and Technology, Taiwan. He is an author of the book Music Emotion Recognition (CRC Press 2011) and a tutorial speaker on music affect recognition in the International Society for Music Information Retrieval Conference (ISMIR 2012). In 2014, he served as a Technical Program Co-chair of ISMIR, and a Guest Editor of the IEEE Transactions on Affective Computing and the ACM Transactions on Intelligent Systems and Technology.
This document summarizes a research paper that proposes a novel query-by-humming/singing method using a fuzzy inference system. The method translates hummed or sung melodies into MIDI format, extracts pitch contour information, and uses a fuzzy inference system to transform pitch intervals into symbolic representations. It then uses the Longest Common Subsequence algorithm to compare the symbolic query representation to representations in a MIDI database to identify similar melodies. An experiment using 718 vocal query samples achieved 85% accuracy in retrieving the top 5 matches from the database.
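As an illustration of the comparison step, a minimal sketch of Longest Common Subsequence scoring via dynamic programming (the interval-symbol strings are hypothetical):

    # Length of the Longest Common Subsequence, used to score a
    # symbolic query against a database melody.
    def lcs_length(query, target):
        m, n = len(query), len(target)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if query[i - 1] == target[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        return dp[m][n]

    # Interval symbols (e.g. from a fuzzy quantizer) for query and candidate:
    print(lcs_length("UUDRDU", "UDURDDU"))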
IRJET- Music Genre Recognition using Convolution Neural Network (IRJET Journal)
1. The document describes a study that uses a Convolutional Neural Network (CNN) model to classify music genres based on labeled Mel spectrograms of audio clips.
2. A CNN model is trained on a dataset of 1000 audio clips across 10 genres. The trained model is then used to classify new, unlabeled audio clips by genre based on their Mel spectrogram representation.
3. CNNs are well-suited for this task as their convolutional layers can extract hierarchical features from the Mel spectrogram images that are indicative of different genres. The study aims to develop an automated music genre classification system using deep learning techniques.
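A hedged sketch of this pipeline using librosa and Keras; the file name, layer sizes, and shapes are illustrative, not the paper's exact architecture:

    # Audio -> Mel spectrogram -> small CNN genre classifier.
    import librosa
    import numpy as np
    from tensorflow.keras import layers, models

    y, sr = librosa.load("clip.wav", duration=30.0)           # hypothetical clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)             # log-scaled "image"

    model = models.Sequential([
        layers.Input(shape=(128, mel_db.shape[1], 1)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),               # 10 genres
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")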
Computational models of symphonic music (Emilia Gómez)
Computational models of symphonic music face various challenges due to the genre's formal complexity, long durations, complex instrumentation, and overlapping sources. Researchers are developing approaches to address melody extraction, structural analysis, source separation, and music visualization for symphonic works. For melody extraction, current methods perform best on simple excerpts but struggle with density and complexity, indicating the need for combined audio-score approaches. Structural analysis of symphonies requires consideration of tonality, orchestration, and discrepancies between expert analyses. Source separation aims to isolate instrument sections from multi-channel recordings.
AI Models of Music Cognition - Keynote at the 65th Meeting of the Korean Society for Music Perception and Cognition (Keunwoo Choi)
The document discusses artificial intelligence models for music perception. It summarizes a talk that analyzes and classifies music AI into analysis, creation, signal generation, and signal processing. The analysis part is discussed in detail, divided into timbre, note, and lyric recognition. Through this, we can understand what music AI researchers aim for, assume, develop, neglect, and misunderstand.
Automatic Music Generation Using Deep Learning (IRJET Journal)
This document discusses automatic music generation using deep learning. It begins with an abstract describing how music is generated in the form of a sequence of ABC notes using deep learning concepts. LSTM or GRUs are commonly used for music generation as recurrent neural networks that can efficiently model sequences. The main purpose of the project described is to generate melodious and rhythmic music automatically using a recurrent neural network. It reviews approaches like WaveNet and LSTM for music generation and tools like Magenta and DeepJazz. The design uses a character RNN and LSTM network to classify and predict the next character in an ABC notation sequence to generate music.
This document discusses the use of artificial intelligence in organized sound as surveyed in the journal Organised Sound. It provides an overview of key AI technologies like Auto-Tune audio processing that can correct pitch and organize sound. Applications discussed include general sound classification, open sound control for music networking, and time-frequency representations for sound analysis and resynthesis. The document also outlines recent research on intelligent composer assistants, responsive instruments, and recognition of musical sounds. Finally, it discusses the future of AI in organizing sound through planning and machine learning.
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar... (IRJET Journal)
This document presents research on classifying music genres using machine learning algorithms. The researchers built multiple classification models using the Free Music Archive dataset and compared the models' performance in predicting genre accuracy. Some models were trained on mel-spectrograms of songs and their audio features, while others used only spectrograms. The researchers found that a convolutional neural network model trained solely on spectrograms achieved the highest accuracy among the tested models. The goal of the research was to develop a machine learning approach for automatic music genre classification that performs better than existing methods.
This document summarizes a comparison of six sound analysis/synthesis systems conducted at the 2000 International Computer Music Conference. Each system analyzed the same set of 27 varied input sounds and output the results in a common format (SDIF). The comparison describes each system, compares them in terms of availability, sound models used, interpolation models, noise modeling, parameter mutability, required analysis parameters, and artifacts. The goal was not competition but providing information to help musicians choose appropriate analysis/synthesis tools.
This document discusses a method for extracting vocals from songs and converting them to instrumental covers using deep learning techniques. It involves using the Spleeter library to separate vocals from music tracks. The extracted vocals can then be converted to instrumental covers for different instruments using a DDSP (Differentiable Digital Signal Processing) library combined with pretrained convolutional neural networks. This allows generating instrumental covers from songs to help music students learn instruments without relying on professionals to create covers. The proposed approach could make a variety of instrumental covers more widely available and assist those learning music.
MUSZIC GENERATION USING DEEP LEARNING PPT.pptx (life45165)
To create a Streamlit application for music generation using deep learning, you need to ensure that all the elements of your Python script are correctly set up and that you handle file paths correctly, especially given the specific paths on your system.
Application of Recurrent Neural Networks paired with LSTM - Music Generation (IRJET Journal)
This document discusses using recurrent neural networks and long short-term memory networks to generate music. It notes that producing music can be expensive, but an AI system could provide a cheaper alternative for businesses. The system would be trained on music theory concepts like notes, chords, scales, and keys to understand harmonious combinations. A web-based platform could then generate custom music from user selections fed into the trained machine learning model. The goal is an affordable way for companies to automatically produce unique music for branding and promotions.
This document provides an overview of optical music recognition (OMR) systems. It discusses the typical architecture of an OMR system, which includes image preprocessing, recognition of musical symbols, reconstruction of musical notation, and construction of a machine-readable symbolic representation. The document also reviews common techniques used in OMR systems, such as binarization, and describes some of the challenges of OMR, particularly for handwritten musical scores which exhibit greater variability in symbols.
This document proposes an end-to-end neural approach for optical music recognition (OMR) of monophonic scores. It trains a neural network model using a dataset of over 80,000 real musical score images paired with their symbolic transcripts. The model combines convolutional and recurrent neural networks to process the image and sequential output. Experimental results demonstrate the neural approach can successfully perform OMR in an end-to-end manner on the monophonic scores. The study provides a starting point for developing scalable neural models for OMR of various printed and handwritten musical scores.
IRJET- Implementation of Emotion based Music Recommendation System using SVM ... (IRJET Journal)
This document describes a proposed emotion-based music recommendation system that uses facial expression recognition and an SVM algorithm. The system aims to suggest songs to users based on their detected emotion state in order to save them time in manually selecting songs. It would use computer vision components like OpenCV to determine a user's emotion from facial expressions. Once an emotion is recognized, the SVM model would suggest a song matching that emotion. The system aims to automate mood-based playlist creation and improve the music enjoyment experience. It outlines the methodology, including using OpenCV for facial recognition, an SVM algorithm to classify emotions detected, natural language processing for chatbot responses, and IFTTT for response recording.
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK (Jennifer Roman)
The document describes an automated music making system using recurrent neural networks that allows users to generate music online by specifying genres, types, and length, with the goal of making music generation more accessible; it details the architecture of the system and the recurrent neural network model used to generate MIDI music files based on training from existing music datasets; and several challenges of music generation are discussed along with related work on algorithmic music generation.
Streaming Audio Using MPEG-7 Audio Spectrum Envelope to Enable Self-similarit... (TELKOMNIKA JOURNAL)
Traditional packet-level Forward Error Correction approaches can limit errors for small sporadic network losses, but when large portions drop out, listening quality becomes an issue. Services such as audio-on-demand drastically increase the load on networks, so new, robust, and highly efficient coding algorithms are necessary. One method overlooked to date, which can work alongside existing audio compression schemes, is one that takes account of the semantics and natural repetition of music through metadata tagging. Similarity detection within polyphonic audio has presented problematic challenges in the field of Music Information Retrieval. We present a system that works at the content level, rendering it applicable to existing streaming services. The MPEG-7 Audio Spectrum Envelope (ASE) provides features for extraction and, combined with k-means clustering, enables self-similarity detection within polyphonic audio.
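As an illustration of the clustering step, a minimal sketch with scikit-learn; the random frame features are stand-ins for real MPEG-7 ASE vectors, not an actual ASE extraction:

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for per-frame MPEG-7 ASE feature vectors (500 frames).
    frames = np.random.rand(500, 34)

    # Cluster frames; similar-sounding frames receive the same label, so a
    # repeated section shows up as a repeated run of labels at another offset.
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(frames)
    print(labels[:20])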
Tom Collins is a PhD student at the Centre for Research in Computing studying how current methods for pattern discovery in music can be improved and integrated into an automated composition system. He is improving pattern discovery algorithms in two ways: 1) developing a new formula to rate discovered patterns based on empirical user ratings, and 2) creating a new algorithm called SIACT that outperforms existing algorithms at finding translational patterns based on benchmarks set by a music analyst. His presentation will demonstrate these improvements and how they are incorporated into a user interface.
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECHNIQUES (SAM Publications)
Audio signals, which include speech, music, and environmental sounds, are important types of media. The problem of classifying audio signals into these different types is thus becoming increasingly significant. A human listener can easily distinguish between different audio types by listening to a short segment of a signal, but solving this problem with computers has proven very difficult. Nevertheless, many systems with modest accuracy have been implemented. The experimental results demonstrate the effectiveness of our classification system, which is developed using ANN techniques with an autonomic computing system.
Modeling of Song Pattern Similarity using Coefficient of Variance (Gobinda Karmakar)
This paper proposes a system to automatically identify the raga and raga cycle of a song by analyzing note frequencies. It calculates the coefficient of variance of note frequencies to measure similarity between songs. If the coefficient of variance is between 0-1, the songs are from the same raga cycle and are similar. The system was tested on songs and accurately identified whether songs were from the same or different raga cycles based on their coefficient of variance.
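A minimal sketch of the coefficient-of-variation measure itself; the per-note counts are hypothetical, and the paper's exact pairing rule is only summarized above, not re-derived here:

    import numpy as np

    def coefficient_of_variation(values):
        v = np.asarray(values, dtype=float)
        return v.std() / v.mean()

    # Hypothetical counts of how often each of 7 notes occurs in a song.
    song = [12, 9, 14, 7, 11, 8, 10]
    print(round(coefficient_of_variation(song), 3))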
This document provides an overview of a dissertation on Emofy, a classical music recommender system. The summary includes:
- Emofy is a music recommender system that recommends classical Indian music based on the user's mood by classifying moods and associating different ragas and genres with different moods.
- The dissertation discusses collecting and labeling a dataset of classical music, extracting features to classify mood, and using machine learning algorithms like random forests to achieve over 90% accuracy in mood classification.
- The recommender system uses mood classification to map users to appropriate ragas and playlists of classical music tracks on Spotify, aimed at therapeutic applications.
IRJET- A Personalized Music Recommendation System (IRJET Journal)
This document describes a personalized music recommendation system that uses collaborative filtering and convolutional neural networks. The system provides three types of recommendations: popularity-based recommendations based on the most popular songs among all users, item-based recommendations of similar songs based on a user's listening history using collaborative filtering, and genre-based recommendations based on the genres of songs a user has listened to previously as determined by a convolutional neural network classifier. The system was tested on a dataset of music listening logs and audio files and evaluated based on its ability to provide personalized music recommendations to users.
This document provides details about Katie Wilkie's PhD research which aims to identify conceptual metaphors used by musicians to understand musical concepts like pitch, melody, and harmony. The research will involve discussions with musicians to elicit these metaphors, which will then be used to evaluate existing music interaction designs and inform the creation of new, more intuitive designs. The contributions will include increased knowledge of how metaphors aid musical understanding and guidelines for designing music interactions based on conceptual metaphors.
The KUSC classical music dataset for audio key finding (ijma)
In this paper, we present a benchmark dataset based on the KUSC classical music collection and provide baseline key-finding comparison results. Audio key finding is a basic music information retrieval task; it forms an essential component of systems for music segmentation, similarity assessment, and mood detection. Due to copyright restrictions and a labor-intensive annotation process, audio key-finding algorithms have to date been evaluated only on small proprietary datasets. To create a common base for systematic comparisons, we have constructed a dataset comprising more than 3,000 excerpts of classical music. The excerpts are made publicly accessible via commonly used acoustic features such as pitch-based spectrograms and chromagrams. We introduce a hybrid annotation scheme that combines the use of title keys with expert validation and correction of only the challenging cases. The expert musicians also provide ratings of key recognition difficulty; other metadata include instrumentation. To demonstrate use of the dataset, and to provide initial benchmark comparisons for evaluating new algorithms, we conduct a series of experiments reporting the key determination accuracy of four state-of-the-art algorithms. We further show the importance of considering factors such as estimated tuning frequency, key strength or confidence value, and key recognition difficulty in key finding. In the future, we plan to expand the dataset to include metadata for other music information retrieval tasks.
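As background on the task itself, a minimal sketch of classic template-based key finding, correlating an averaged chromagram with rotated Krumhansl-Kessler profiles; this illustrates the task in general, not the four algorithms evaluated in the paper:

    import numpy as np

    # Krumhansl-Kessler major/minor key profiles (pitch class 0 = tonic).
    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                      2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                      2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
    NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def estimate_key(chroma):
        """chroma: length-12 vector of averaged pitch-class energy."""
        best = None
        for tonic in range(12):
            for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
                # Shift the profile so its tonic weight lands on `tonic`.
                r = np.corrcoef(np.roll(profile, tonic), chroma)[0, 1]
                if best is None or r > best[0]:
                    best = (r, NAMES[tonic] + " " + mode)
        return best[1]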
Survey on Hybrid recommendation mechanism to get effective ranking results fo... (Suraj Ligade)
Users today have high expectations of technology; they want to search for songs even when they cannot recall the title or other details. Retrieving music content is one of the hardest and most challenging tasks in the field of Music Information Retrieval (MIR). Various search techniques have been developed and deployed, yet they cannot always find the songs users need, and related problems such as automatic playlist creation, music recommendation, and music search remain open. In previous systems the user searched for a song by its title, artist name, or other details, which is very time-consuming. To overcome this, singing or humming a portion of the song is the most natural way to search; it is most helpful when the user lacks access to an audio device or cannot recall attributes such as the song title, artist name, or album name. In the proposed system the user need not worry about remembering song details, and the method is not time-consuming. It uses information from a user's search history as well as the common properties of users with similar backgrounds. The hybrid recommendation mechanism employs content-based retrieval built on audio information such as tone, pitch, and rhythm to return precise results. More importantly, users can operate their devices without entering commands by hand. It is a simple, straightforward system for music search.
Nithin Xavier research_proposal
Composing Monophonic World Music Using Deep Learning
Nithin Xavier
x17110530
MSc in Data Analytics
14th August 2018
Keywords: Music Generation; Computer Composing; Neural Networks; Deep Learning.
Abstract: Music is considered a universal language, loved by all. We enjoy music in any language, without a language barrier, since music is a medium that connects the mind and the soul. Computer-generated music is a relatively new term and the associated domain is still in its infancy; nevertheless, the limited research carried out in this field has yielded good results. The task of the proposed system is to emulate a human composer and generate good-sounding music. Alan Turing's test can be applied here: when a human cannot distinguish between a computer-generated piece and a human composer's work, the music generation system can be deemed perfect. The problem composers currently face in producing the long hours of music normally played in airports, restaurants, flights, malls, and other public places can be effectively solved by neural networks. Such networks can also synthesize sleep music, a growing genre that is becoming popular with people who have sleeping problems or discomfort. We generalize these problems and strive to develop a model to compose monophonic world music using a good training dataset and a novel application of music theory knowledge.
1 Introduction
Composition of music is considered creative and innovative work. It involves applying musical principles such as chord progressions, scales, harmony, and dynamics. Even though a composer ought to follow these musical rules, he or she may sometimes introduce out-of-the-box arrangements by altering chord progressions or by introducing accidental notes and styles. Because so much music already exists and is continually being created, the composer must also stay alert to the risk that his or her music resembles an existing piece. There are seven main notes in a scale, ranging from C to B; the C of the next octave constitutes the 8th note, completing the C scale. Between every pair of adjacent notes there is a sharp (or flat) note, except for two pairs (E-F and B-C). The same arrangement of keys repeats to form higher and lower octaves, so there are only 12 distinct notes in music. Composers have only these 12 notes with which to compose, and every piece of music produced or arranged is some permutation and combination of them. Sometimes, after composing a song, a composer finds a similar existing tune matching the work and is forced to change the entire affected sequence. Feeding musical rules and theory into neural networks to generate a unique sequence of musical arrangement forms the novelty of this project. Computer systems, if properly trained, can learn such patterns more effectively than humans. Computer-generated music has been researched for a long time, yet studies in this field remain few, owing to the lack of musical training or expertise and to the limited interest shown by sound technologists and musicians. Recurrent neural networks are employed in many studies because of their good performance in this arena. Through this research we address problems faced by music composers, such as composing long hours of meditation and leisure music to be played at public places like airports, flights, and malls. Composing music that lasts more than 2-3 hours can be a time-consuming and exhausting experience; such music may contain similar patterns throughout, which a well-trained neural network can generate effectively. Hence, this research poses the following research question: How effectively can deep learning techniques generate unique monophonic world music based on a music dataset and musical theory? We address this question by proposing an appropriate methodology, implementation, and evaluation of the expected results. In the next section we discuss the various research efforts related to our proposed objective.
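To make the 12-pitch-class arithmetic above concrete, a minimal sketch mapping MIDI note numbers to names and octaves (octave numbering follows the common MIDI convention that note 60 is C4):

    # Any MIDI note number reduces to one of the 12 pitch classes plus
    # an octave via modular arithmetic.
    NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def midi_to_name(n):
        return NOTE_NAMES[n % 12] + str(n // 12 - 1)  # MIDI 60 -> 'C4'

    print(midi_to_name(60), midi_to_name(61), midi_to_name(72))  # C4 C#4 C5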
2 State of the Art
In this section we analyse and survey various research papers related to our research objective. The first sub-section considers the format or type of dataset used in the referenced research.
2.1 Filetype of Dataset Used
Since our research question seeks to develop a monophonic music sample based on a contemporary music dataset and music theory rules, the processing of the dataset and the output format of the generated music play a crucial role. The type of data fed to the neural network determines both the quality of the generated output and the complexity of the network. There are two main audio formats in use: MIDI (Musical Instrument Digital Interface) and WAV (Waveform Audio File Format). Analysing the datasets used in the related work, we observe a clear dominance of the MIDI format for computer music generation. The following research works used MIDI files: Madhok et al. (2018), Mao (2018), Yang et al. (2017), Sabathe et al. (2017), Liang (2016), Lyu et al. (2015), Goel et al. (2014), Chung et al. (2014), Roig et al. (2014), Boulanger-Lewandowski et al. (2012), Yuksel et al. (2011), Oliwa and Wagner (2008), Cameron (2001), and Masako and Kazuyuki (1992). From the early 90s to the present, MIDI has thus been the preferred choice of researchers in this domain. The main reasons are its low computational complexity and the fact that the output is recognizable and editable in most digital audio workstations and music production software. A MIDI file stores information such as the pitch and velocity of each played note. Its main advantage is that the instrument of the generated music can be swapped for any desirable, high-quality virtual instrument to improve the sound quality of the final output. Its disadvantage is that direct playback of MIDI files yields substandard instrument sound quality compared to other formats.
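A minimal sketch of reading this note-level data, assuming the mido library and a placeholder file name:

    import mido

    mid = mido.MidiFile("song.mid")        # placeholder path
    for track in mid.tracks:
        t = 0
        for msg in track:
            t += msg.time                  # delta time in ticks
            if msg.type == "note_on" and msg.velocity > 0:
                print(t, msg.note, msg.velocity)  # tick, pitch, velocity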
WAV, the next file type used in this domain, is preferred in music production scenarios because it stores lossless audio, but its file size is significantly larger than MIDI's. Engel et al. (2017) and Mańdziuk et al. (2016) used the WAV format in their research. WAV files cannot be edited note by note; they must be edited at the waveform level, which is a more complex task. Because of the large file size, the computational complexity is high, and superior GPUs are required to run the machine learning algorithms.
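For contrast with the MIDI sketch above, a minimal sketch of accessing raw WAV samples with Python's standard wave module (placeholder file name; 16-bit PCM assumed):

    import wave
    import numpy as np

    with wave.open("track.wav", "rb") as wf:       # placeholder path
        sr = wf.getframerate()
        n = wf.getnframes()
        # Assumes 16-bit PCM samples.
        samples = np.frombuffer(wf.readframes(n), dtype=np.int16)

    print(f"{n} frames at {sr} Hz = {n / sr:.1f} s of audio")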
The third option is to extract information from sheet music formats. Lichtenwalter and Lichtenwalter (2009) used the MusicXML format to gather notes, time signature, pitch, dynamics, and other musical parameters, feeding them into the system to teach it chord progressions and other music theory for effective music generation. Eck and Schmidhuber (2002), by contrast, extracted musical information directly from sheet music to train their system on chords and note sequences. Extracting information from sheet music is not as straightforward as using the two directly playable audio formats mentioned above, MIDI and WAV; hence most research on computerised music generation and composition uses those file types.
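A minimal sketch of extracting such information from MusicXML, assuming the music21 toolkit, a placeholder file name, and a monophonic score of single Note objects:

    from music21 import converter, meter

    score = converter.parse("score.xml")   # placeholder path
    for ts in score.recurse().getElementsByClass(meter.TimeSignature):
        print("time signature:", ts.ratioString)        # e.g. '4/4'
    for note in score.recurse().notes:
        print(note.pitch.nameWithOctave, note.quarterLength)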
2.2 Model Generation
The model generation, i.e. the development of the neural network that learns to produce monophonic music samples, is the major part of this project. This section discusses and compares the methods used for network generation in the literature. Madhok et al. (2018) recorded seven major human emotions and, according to the detected emotion, generated music appropriate to the observed scenario using a dual-layer Long Short-Term Memory (LSTM) architecture. This work was evaluated by the correlation between the detected facial expression and the probability that the resulting music falls in the same category; the correlation of 0.93 is a good score. Mao (2018) used a dual-axis LSTM architecture in which one axis covers the desired duration of the generated music and the other the output notes; adding style and volume features enhanced the production quality. This approach was evaluated with a statistical hypothesis test at a 0.05 significance level; the value z = 0.945 indicates that the classification precision of human composers and the proposed approach was almost identical. Kitahara (2017) used Bayesian networks to solve three music generation problems: harmonization, chord inversion and voicing, and chord estimation. Yang et al. (2017) implemented a model based on Convolutional Neural Networks (CNN) and a Generative Adversarial Network (GAN) that incorporates information about the previous bar and the chord sequence, producing results comparable to others; its drawback is that it ignores velocity and musical pauses, which makes the generated music sound artificial. Sabathe et al. (2017) used LSTM networks with optimized parameters (167 LSTM units for both the encoding and decoding functions and 23 steps of sequential autoencoding); the major drawback of this approach was that pieces longer than the training samples could not be generated.
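As a concrete reference point, a minimal sketch in Keras of the kind of stacked (dual-layer) LSTM next-note predictor described above; the vocabulary size, layer widths, and sequence length are illustrative, not any paper's exact settings:

    from tensorflow.keras import layers, models

    VOCAB = 128        # MIDI pitches
    SEQ_LEN = 64       # notes of context

    model = models.Sequential([
        layers.Input(shape=(SEQ_LEN, VOCAB)),       # one-hot note sequence
        layers.LSTM(256, return_sequences=True),    # first LSTM layer
        layers.LSTM(256),                           # second LSTM layer
        layers.Dense(VOCAB, activation="softmax"),  # distribution over next note
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()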
Music theory and other musical features were used more effectively by Mańdziuk et al. (2016), who developed a combined algorithm, a genetic algorithm plus local optimization, that captures the necessary technicalities of music theory to produce aesthetically and theoretically superior music. Liang (2016) developed a sequential LSTM network trained to produce good-quality music without much explicit training in music theory concepts. Lyu et al. (2015) combined LSTM units with a Recurrent Temporal Restricted Boltzmann Machine (RTRBM) and obtained only average results, attributable to the absence of optimization techniques. Goel et al. (2014) used an RNN with two layers of Restricted Boltzmann Machines for sequence modelling; their results are only on par with other research, owing to the lack of optimization methods and of pretraining on music theory. Chung et al. (2014) showed that LSTM and GRU units fare better than tanh units in recurrent networks for raw speech and polyphonic music modelling. Eck and Schmidhuber (2002) first learned chord sequences and melody sequences and then fed the learnt information to LSTM networks. In summary, a majority of researchers have used recurrent neural networks, more specifically LSTM networks, to learn and generate music pieces based on an input work and musical technicalities.
3 Research Question
The research question for this project is as follows: How effectively can deep learning techniques generate
unique monophonic world music based on a music dataset and music theory?
This falls under the domain of computer music generation. Neural networks have been successful in producing
good musical recordings, as seen in the literature review above. In computerised music generation it is
necessary to evaluate and learn from the previous notes and chords played in a musical sequence; hence we use
recurrent neural networks (RNNs), whose nodes have recurrent loops. LSTM units emerge as the best-performing
RNN unit in the referenced papers. For the effective implementation of our research objective, we introduce a
novel treatment of music theory covering scales, time signatures, chord progressions, the velocity of each note
according to genre, and accidental chords and progressions that may be introduced unexpectedly into a piece.
This information is contained in MusicXML files, which are parsed into the system along with the MIDI
information. The related literature notes that such musical knowledge can complement and improve a music
generation system, and the researchers have suggested collaboration between musical experts and computer
experts. Since I am a musician and have the requisite music theory knowledge to train the neural networks,
this project is feasible and can improve upon prior research in this domain.
4 Proposed Approach
In this section the approach to be followed for the given research question is detailed and an overview of the
complete picture is given. We propose to implement recurrent neural networks (RNNs) to analyse a MIDI dataset
of global contemporary music and recreate similar sequences of music. MIDI files are used as input to the
network to train on known melody lines and musical sequences, and we test for the probability of occurrence of
similar but unique music. The dataset used is the Lakh MIDI Dataset v1.0, available at
http://colinraffel.com/projects/lmd/#get. It contains around one lakh (100,000) MIDI files of songs listed in
the Million Song Dataset, so it is a subset of the Million Song Dataset covering contemporary global music.
MIDI files contain only information about the notes played, their timing, the time signature, dynamics and
velocity. In addition, we feed information related to music theory, such as chord progressions, scales,
accidental note and chord usage and other musical rules, into the recurrent neural network in MusicXML format.
This information is present as sheet music in that format, from which it will be extracted and used to train
the proposed system.
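As an illustration of this parsing step, the following is a minimal sketch using the music21 Python library
(the file name and helper function are ours, not part of the proposal); music21 reads both MIDI and MusicXML,
so the same routine can serve both inputs.

    # Sketch of the parsing step; music21 handles both MIDI and MusicXML.
    from music21 import converter

    def parse_file(path):
        """Return (midi_pitch, offset, duration, velocity) tuples plus the
        analysed key and the time signature of a MIDI or MusicXML file."""
        score = converter.parse(path)
        notes = []
        for n in score.flatten().notes:
            if n.isNote:  # skip chord objects: our target music is monophonic
                notes.append((n.pitch.midi,
                              float(n.offset),
                              float(n.duration.quarterLength),
                              n.volume.velocity))  # may be None for MusicXML
        key = score.analyze('key')  # music-theory estimate, e.g. "C major"
        ts = score.recurse().getElementsByClass('TimeSignature')
        return notes, key, (ts[0].ratioString if ts else None)

    notes, key, time_sig = parse_file('example.mid')  # hypothetical file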
The desired design constraints of the network are as follows:
Time Signature The recurrent neural network should be able to identify the current position in time with
reference to the musical time signature. The time signature specifies the number of beats in a single bar of
music: in common time there are four beats per bar, denoted 4/4, and likewise there are many other time
signatures such as 3/4, 5/8, 6/8 and 7/8.
Invariance in Notes The music should be independent of the octave: changing octaves should not affect the
basic note, the chord structure or the progressions.
Repetition of Notes Sustaining a note over two bars should be distinguished from playing the same note twice.
Invariance in Time The network should be free to generate music independent of the time frame, like an ad lib,
as it is called in musical terms.
Accidental Note and Chord Changes Accidental notes or chords can be described as out-of-the-box devices which
do not feature in standard music theory. They can be innovative and enriching to hear if used correctly with
some developed rules.
Figure 1: Network Design
The property of time invariance is achievable with an RNN. However, note invariance is not achievable in a
plain RNN, because the fully connected layer has one node for every MIDI note: if we raise the pitch of every
note by one half step, the output of the network will be entirely different from the desired output. This
drawback can be resolved by borrowing from convolutional neural networks (CNNs). In image recognition, a CNN
kernel is applied across all pixels of the input image. Now suppose that the CNN kernel is replaced with an
RNN, so that the network consists of an RNN whose kernel is itself another RNN; each cell, or pixel, then
possesses a neural network of its own. Applying this idea to our study, we replace the pixels with notes, the
main elements of our research. If we implement a stack of identical RNNs, one provided for each note, then
every note gets a neighbourhood RNN covering the range from one octave below to one octave above its pitch.
Hence we achieve invariance in time as well as in notes.
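A small sketch of this octave-neighbourhood idea, under our own naming, is given below: each note's input is
restricted to a window one octave above and below it, so transposing the whole input by a half step only
shifts the windows instead of changing them.

    import numpy as np

    NUM_NOTES = 128   # MIDI pitch range
    WINDOW = 12       # one octave on either side

    def note_windows(frame):
        """frame: binary vector of length NUM_NOTES for one time step.
        Returns a (NUM_NOTES, 2*WINDOW+1) matrix whose row i is the
        vicinity of note i, zero-padded at the pitch extremes."""
        padded = np.pad(frame, WINDOW)
        return np.stack([padded[i:i + 2 * WINDOW + 1]
                         for i in range(NUM_NOTES)])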
Because the network retains memory of previous notes and sequences, we must now build a method to produce
good, innovative chords for the music. Hence we divide this approach into two sections. A bi-axial recurrent
neural network is suitable for meeting our research objective: in a bi-axial RNN, the first axis represents
time and the other axis represents the note. The network design is as shown in Figure 1.
The following are the details of the inputs and outputs of the proposed network. The inputs to the time layer
of the bi-axial RNN are:
1. Note Value: The note value refers to the MIDI value, which describes the register of the played note,
i.e. whether it lies in the lower or the higher register.
2. Pitch: The pitch value of the played note, where the note A has pitch value 0 and the value increases by
one for every half-step increase in pitch.
3. Scale: A scale is a sequence of notes following certain rules of music theory. The many scales of music
theory can be input to the system to emulate world music without mistakes.
4. Previous State: This input tells the network the number of times a particular note was played during the
previous time step.
5. Rhythm: A useful input that lets the network know the position of the current note with respect to the
measure and the time signature (a sketch of how these five inputs could be assembled follows this list).
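The sketch below shows one possible way of assembling these five inputs into a single feature vector per note
per time step; the function and its encoding choices are illustrative assumptions, not a fixed design.

    import numpy as np

    def note_features(midi_note, scale_pcs, prev_played, step, steps_per_bar=16):
        """scale_pcs: set of pitch classes in the current scale (0 = A here,
        matching the pitch convention above); prev_played: whether the note
        sounded during the previous time step."""
        pitch_class = np.zeros(12)
        pitch_class[(midi_note - 21) % 12] = 1        # pitch, with A as 0
        in_scale = float((midi_note - 21) % 12 in scale_pcs)   # scale input
        beat = np.zeros(steps_per_bar)
        beat[step % steps_per_bar] = 1                # rhythm: position in bar
        return np.concatenate(([midi_note / 127.0],   # note value / register
                               pitch_class,
                               [in_scale],
                               [float(prev_played)],  # previous state
                               beat))

    # e.g. middle C, A-minor scale pitch classes, played last step, 3rd sixteenth
    v = note_features(60, {0, 2, 3, 5, 7, 8, 10}, True, 2)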
Along the time axis, LSTM layers with recurrent connections form the first hidden layers. The LSTM layers
along the other axis, the note axis, scan the notes from the low registers up to the high registers. After
the last LSTM layer has run, a final fully connected, non-recurrent layer outputs two kinds of probabilities
(a small sketch of this layer follows the list):
1. The probability of each note being played.
2. For a note that is played, the probability that it is articulated, i.e. struck anew rather than sustained;
this also forms one of the outputs of the non-recurrent layer.
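A minimal Theano sketch of this final layer is given below, assuming a note-axis hidden state of width 50
(the second note-axis layer size proposed later); the two sigmoid outputs per note stand for the play and
articulation probabilities. Variable names are ours.

    import numpy as np
    import theano
    import theano.tensor as T

    N_HIDDEN = 50
    rng = np.random.RandomState(0)
    h = T.matrix('h')                     # note-axis states: (num_notes, N_HIDDEN)
    W = theano.shared(0.01 * rng.randn(N_HIDDEN, 2).astype(theano.config.floatX))
    b = theano.shared(np.zeros(2, dtype=theano.config.floatX))

    out = T.nnet.sigmoid(T.dot(h, W) + b)   # two sigmoids per note
    p_play = out[:, 0]                      # P(note is played)
    p_artic = out[:, 1]                     # P(articulated | note is played)

    probs = theano.function([h], [p_play, p_artic])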
Processing of the musical output: the generated MIDI file may then be fed into music production software and
edited to change the instrument. Since the default MIDI playback quality is very poor, high-quality virtual
instruments from third parties can be used to reach scoring-level quality. This option is available because
the MIDI format carries information about note, velocity, pitch and other musical parameters.
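As a sketch of this post-processing hand-off, the generated notes can be written out with the pretty_midi
library; the (pitch, start, end, velocity) tuple format is our own assumption about the generator's output.

    import pretty_midi

    def write_midi(generated_notes, path='generated.mid', program=0):
        """generated_notes: iterable of (midi_pitch, start_sec, end_sec,
        velocity) tuples; program 0 is acoustic grand piano."""
        pm = pretty_midi.PrettyMIDI()
        inst = pretty_midi.Instrument(program=program)
        for pitch, start, end, velocity in generated_notes:
            inst.notes.append(pretty_midi.Note(velocity=velocity, pitch=pitch,
                                               start=start, end=end))
        pm.instruments.append(inst)
        pm.write(path)  # then import into a DAW and swap in virtual instruments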
5 Proposed Implementation
Our proposed music generation model will be implemented in the Python programming language. In particular, we
will use the Python library Theano, which simplifies the computational work and provides flexibility in the
network architecture. The step-by-step implementation is described below.
Random small pieces of the MIDI files are fed into the recurrent neural network during training. The
cross-entropy is computed from the probabilities of all the outputs: the probabilities are log-transformed
and then negated. The result is fed as the cost into the AdaDelta optimizer for weight optimization. The
time-axis layers are trained by batching every note together, and the note-axis layers by batching every time
step together; the processing unit is better utilised this way because of the large matrix multiplications
involved. Dropout is used in the network to eliminate the problem of overfitting: applying dropout in each
layer removes 50% of the hidden nodes. The output of every layer is multiplied by a mask, so the dropped
nodes are eliminated by multiplying their output by zero. This achieves specialization and prevents the nodes
from drifting towards weak dependencies. The output of every node is then multiplied by 0.5 as a correction
factor, compensating for the fact that roughly twice as many nodes are active when dropout is switched off.
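The cost and the dropout mask described above can be sketched in Theano as follows; the AdaDelta update
itself is assumed to come from a separate helper and is not shown.

    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    srng = RandomStreams(seed=42)

    def dropout(layer_output, p_keep=0.5):
        # Binary mask zeroes out half of the hidden nodes during training.
        mask = srng.binomial(size=layer_output.shape, p=p_keep,
                             dtype=theano.config.floatX)
        return layer_output * mask

    def cross_entropy(p, t):
        # p: predicted play/articulation probabilities, t: 0/1 targets.
        # Log-transform the probabilities, negate, and average: the cost.
        return -T.mean(t * T.log(p) + (1 - t) * T.log(1 - p))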
For training the model we use an Amazon Web Services (AWS) instance. We use the cheaper option of so-called
spot instances, whose price ranges from 10 to 15 US cents per hour. Our proposed model consists of two
note-axis hidden layers and two time-axis hidden layers. The note-axis layers have 100 and 50 nodes
respectively, while the two time-axis hidden layers have 300 nodes each. All the MIDI files in our dataset
will be used for training by choosing 8-count segments of the MIDI clips and batching them together.
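For reference, the architecture and training settings above can be summarised in a configuration sketch (the
key names are ours; only the numbers come from the proposal):

    CONFIG = {
        'time_axis_layers': [300, 300],  # two time-axis hidden layers
        'note_axis_layers': [100, 50],   # two note-axis hidden layers
        'dropout': 0.5,
        'optimizer': 'adadelta',
        'segment_length_counts': 8,      # 8-count MIDI segments per batch
        'aws_pricing_usd_per_hour': (0.10, 0.15),  # spot instance range
    }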
Figure 2: Decision Model (pipeline: parsing the MIDI dataset and the music theory dataset, dimension
reduction, final features, training and testing combined datasets, training and testing feature vectors,
music generation, evaluation)
6 Proposed Evaluation
The output of our music generation model will be evaluated by conducting an open survey, whose details are
described below.
For the open survey we will select a group of 50 people, 80 percent of whom will have a musical background
and 20 percent of whom will not. There will be three sets of identification tasks, and each set will contain
three musical recordings: two composed by humans and one generated by our system. The participants will know
these rules and will have to identify, among the three recordings, the one generated by our system. They will
also be given the option to write comments about each recording in each set. The evaluation metrics are as
follows: the Recording Set Identifier denotes each of the three sets of recordings; Incorrect Identification
denotes the percentage of incorrectly identified recordings in a set; and Correct Identification denotes the
percentage of correctly identified recordings in a set. We estimate that the percentage of correctly
identified samples will range from 20 to 40 percent. The estimated results can be tabulated as follows:

Recording Set Identifier    Incorrect Identification    Correct Identification
1                           66%                         34%
2                           75%                         25%
3                           80%                         20%
Total                       73.60%                      26.40%

Hence, we anticipate the incorrect identification to be around 73.6 percent and the correct identification to
be around 26.4 percent. The premise of this survey is that when participants are asked to pick the
computer-generated piece out of the three recordings, they will tend to pick the least pleasing,
lowest-quality recording, owing to the limited advances in computer-generated music compared with the ability
of humans to create world-class, superior music. The incorrect identification rate therefore serves as the
accuracy of our system in this evaluation, and we expect around 70 to 75 percent accuracy in this respect.
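The two identification metrics can be computed from the raw survey responses as in the following sketch; the
response format (set identifier, participant's pick, index of the generated clip) is our own assumption.

    def identification_rates(responses):
        """responses: iterable of (set_id, picked_idx, generated_idx)."""
        by_set = {}
        for set_id, picked, generated in responses:
            hits, total = by_set.get(set_id, (0, 0))
            by_set[set_id] = (hits + (picked == generated), total + 1)
        return {s: {'correct': 100.0 * h / n,          # Correct Identification
                    'incorrect': 100.0 * (n - h) / n}  # Incorrect Identification
                for s, (h, n) in by_set.items()}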
In the next evaluation we survey the genre correctness of each recording. The same participants are given 20
recordings from the three genres selected for this evaluation: recordings of these genres from the training
dataset, together with generated recordings of similar genres, are selected for review. The metric used to
review how well the generated data matches the training data is a similarity score, and the mean of these
scores is taken per genre to obtain a genre-level similarity score. The similarity score ranges from 1 to 5,
where 1 denotes that a recording sounds very different from the training dataset and 5 denotes that it sounds
very similar. We expect a mean similarity score of 4.1 for the pop genre, since pop music is widely known
among the masses and differences in this genre are easy to identify; the accuracy of the proposed model will
also raise the similarity score. The similarity score for jazz is expected to be lower than for the other
genres because of the complexity of the genre's music and its smaller following. The following table shows
the expected mean similarity score for each genre:
Genre    Mean Similarity Score
Pop      4.1
Jazz     3.8
Blues    3.9
Hence, we compute the accuracy of the proposed music generation model from the survey reviews; the
anticipated results are shown in the tables above. We expect a state-of-the-art model from the proposed
methodology and implementation.
7 Conclusion
We have proposed the plan, or blueprint, of the research to be undertaken on composing monophonic music using
neural networks. The plan aims to guide the research so that the project is completed within its three-month
time frame and to give a better understanding ahead of the actual implementation of the methodology. We
anticipate state-of-the-art results compared with other research in this field, as set out in the proposed
evaluation, and we aim to detail the implementation steps and to enhance the proposed model further by trying
and testing different approaches.
References
Boulanger-Lewandowski, N., Bengio, Y. and Vincent, P. (2012). Modeling Temporal Dependencies in
High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.
URL: http://arxiv.org/abs/1206.6392
Cameron, B. B. (2001). System and Method for Automatic Music Generation using a Neural Network
Architecture, 2(12).
Chung, J., Gulcehre, C., Cho, K. and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent
Neural Networks on Sequence Modeling, pp. 1–9.
URL: http://arxiv.org/abs/1412.3555
Eck, D. and Schmidhuber, J. (2002). A First Look at Music Composition using LSTM Recurrent Neural
Networks, Idsia pp. 1–11.
URL: http://people.idsia.ch/~juergen/blues/IDSIA-07-02.pdf
Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K. and Norouzi, M. (2017). Neural
Audio Synthesis of Musical Notes with WaveNet Autoencoders.
URL: http://arxiv.org/abs/1704.01279
Goel, K., Vohra, R. and Sahoo, J. K. (2014). Polyphonic music generation by modeling temporal
dependencies using a RNN-DBN, Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8681 LNCS: 217–224.
Kitahara, T. (2017). Music Generation Using Bayesian Networks, pp. 3–6.
URL: http://www.kthrlab.jp/
Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales - Developing, analyzing,
and evaluating a deep LSTM model for musical style.
Lichtenwalter, R. and Lichtenwalter, K. (2009). Applying learning algorithms to music generation,
Proceedings of the 4th pp. 483–502.
URL: http://www.cse.nd.edu/Reports/2008/TR-2008-10.pdf
Lyu, Q., Wu, Z., Zhu, J. and Meng, H. (2015). Modelling high-dimensional sequences with LSTM-RTRBM:
Application to polyphonic music generation, IJCAI International Joint Conference on Artificial
Intelligence 2015: 4138–4139.
Madhok, R., Goel, S. and Garg, S. (2018). SentiMozart: Music Generation based on Emotions,
ICAART (2): 501–506.
Mańdziuk, J., Woźniczko, A. and Goss, M. (2016). A Neuro-memetic System for Music Composing.
Mao, H. H. (2018). DeepJ: Style-Specific Music Generation, Proceedings - 12th IEEE International
Conference on Semantic Computing, ICSC 2018: 377–382.
Masako, N. and Kazuyuki, W. (1992). Interactive Music Composer Based on Neural Networks.
Oliwa, T. and Wagner, M. (2008). Composing music with Neural Networks and probabilistic finite-
state machines, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics) 4974 LNCS: 503–508.
Roig, C., Tardón, L. J., Barbancho, I. and Barbancho, A. M. (2014). Automatic melody composition
based on a probabilistic model of music style and harmonic rules, Knowledge-Based Systems 71: 419–
434.
URL: http://dx.doi.org/10.1016/j.knosys.2014.08.018
Sabathe, R., Coutinho, E. and Schuller, B. (2017). Deep recurrent music writer: Memory-enhanced
variational autoencoder-based musical score composition and an objective measure, Proceedings of the
International Joint Conference on Neural Networks 2017-May: 3467–3474.
Yang, L.-C., Chou, S.-Y. and Yang, Y.-H. (2017). MidiNet: A Convolutional Generative Adversarial
Network for Symbolic-domain Music Generation.
URL: http://arxiv.org/abs/1703.10847
Yuksel, A., Karci, M. and Uyar, A. (2011). Automatic music generation using evolutionary algorithms
and neural networks, pp. 354–358.