Composing Monophonic World Music Using Deep Learning
Nithin Xavier
x17110530
MSc in Data Analytics
14th August 2018
Keywords: Music Generation; Computer Composing; Neural Networks; Deep Learning.
Abstract: Music is considered a universal language, loved by all. We enjoy music in any language, without a language barrier, since music acts as a medium connecting the mind and the soul. Computer-generated music is a relatively new term and the associated domain is still in its infancy; nevertheless, the limited research carried out so far has yielded good results. The task of the proposed system is to emulate a human composer and generate good-sounding music. Alan Turing's test can be applied here: when a human cannot distinguish between computer-generated music and a human composer's work, that music generation system can be deemed perfect. Neural networks can effectively solve the current problem faced by composers of producing long hours of music of the kind normally played in airports, restaurants, flights, malls and other public places. They can also synthesize sleep music, a growing genre that is becoming popular with people who have sleeping problems or discomforts. We generalize these problems and strive to develop a model to compose monophonic world music using a good training dataset and a novel application of music theory knowledge.
Contents
1 Introduction
2 State of the Art
   2.1 Filetype of Dataset used
   2.2 Model Generation
3 Research Question
4 Proposed Approach
5 Proposed Implementation
6 Proposed Evaluation
7 Conclusion
1 Introduction
Composition of music is considered a creative and innovative work. It involves applying music principles such as chord progressions, scales, harmony and dynamics. Even though a composer ought to follow these musical rules, he or she may sometimes introduce out-of-the-box arrangements by changing chord progressions or by introducing accidental notes and styles. Also, because a great deal of music already exists and is constantly being produced, the composer has to be alert to avoid his/her music sounding similar to work that is already available. There are seven main notes in a scale, for example C to B; the C of the higher octave constitutes the 8th note, completing the C scale. Every note has a sharp or flat neighbour except for two notes (there is no sharp between E and F or between B and C). This same arrangement of keys is repeated to form higher or lower octaves. Hence, there are only 12 distinct notes in music, and composers have only these 12 notes with which to compose. The permutations and combinations of these 12 notes are what we find in every piece of music produced or arranged. Sometimes, after composing a song, the composer may find a similar existing tune matching his/her work and is then forced to change the entire affected sequence. Musical rules and theory can be fed into neural networks to generate unique sequences of musical arrangement, which forms the novelty of this project. Computer systems, if properly trained, can learn such patterns effectively. Computer-generated music has been researched for a long time, but studies in this field remain few, owing to the lack of musical training or expertise among researchers and to limited interest from sound technologists and musicians. Recurrent neural networks are employed in many of these studies because of their good performance in this arena. Through this research we address the problems of music composers, such as composing long hours of music for meditation and leisure to be played at public places like airports, flights and malls. Composing music lasting more than 2-3 hours can be a time-consuming and exhausting experience. Such music may contain similar patterns throughout, which a well-trained neural network can generate effectively. Hence, the research undertaken in this regard has the following research question:
How effectively can deep learning techniques generate unique monophonic world music based on a music dataset and musical theory? We address this research question by proposing an appropriate methodology, implementation and expected result evaluation. In the next section we discuss the various studies related to our proposed objective.
2 State of the Art
In this section we analyse and survey various research papers related to our research objective. The first sub-section considers the format or type of dataset used in the referenced papers.
2.1 Filetype of Dataset used
Since our research question seeks to develop monophonic music samples based on a contemporary music dataset and music theory rules, the processing of the dataset and the output format of the generated music play a crucial role. The type of data used for processing decides both the quality of the generated music and the complexity of the neural network. There are two main file formats: MIDI (Musical Instrument Digital Interface) and WAV (Waveform Audio File Format). If we analyse the type of data used in the related work, we observe a clear dominance of the MIDI format for computer music generation. The following research works used the MIDI filetype: Madhok et al. (2018), Mao (2018), Yang et al. (2017), Sabathe et al. (2017), Liang (2016), Lyu et al. (2015), Goel et al. (2014), Chung et al. (2014), Roig et al. (2014), Boulanger-Lewandowski et al. (2012), Yuksel et al. (2011), Oliwa and Wagner (2008), Cameron (2001) and Masako and Kazuyuki (1992). This shows that from the early 90s until the present, the MIDI filetype has been the preferred choice of researchers in this domain. The main reasons are its low computational complexity and the fact that the output is recognizable and editable in most digital audio workstations and music production software. The MIDI format contains the pitch, velocity and other information about each played note. Its main advantage is that the instrument of the generated music can be changed to any desirable, high-quality virtual instrument to improve the sound quality of the final output. Its disadvantage is that, on direct playback, the built-in instrument sounds are substandard compared to all other formats. The sketch below illustrates reading this note-level information from a MIDI file; we then turn to the other filetypes used in this domain.
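As an illustrative sketch (assuming the pretty_midi library and a hypothetical file example.mid; neither is prescribed by this proposal, and other MIDI parsers would work equally well), the note-level information stored in a MIDI file can be read as follows:

```python
import pretty_midi

midi = pretty_midi.PrettyMIDI("example.mid")        # parse the MIDI file
for instrument in midi.instruments:                 # one entry per track/instrument
    for note in instrument.notes[:10]:              # inspect the first few notes
        print(instrument.name, note.pitch, note.velocity,
              round(note.start, 3), round(note.end, 3))
```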
WAV is the preferred filetype in music production scenarios because it is a lossless audio format. However, WAV files are significantly larger than MIDI files. Engel et al. (2017) and Mańdziuk et al. (2016) used the WAV format in their research. These files cannot be edited note by note but must be edited at the waveform level, which is a more complex task. Because of the large file size, the computational complexity is high, and high-end GPUs are required to run the machine learning algorithms. The next filetype concerns the extraction of information from sheet music to train the system.
Lichtenwalter and Lichtenwalter (2009) used the MusicXML format to gather information about notes, time signature, pitch, dynamics and other musical parameters, teaching the system about chord progressions and other music theory for effective music generation. Eck and Schmidhuber (2002), in contrast, extracted musical information directly from sheet music to train the system on the chords and note sequences to be played. Extracting information from sheet music is not as effective as the two commonly playable formats mentioned before, MIDI and WAV. Hence, most research in computerised music generation or music composition utilizes those filetypes.
2.2 Model Generation
Model generation, i.e. the development of the neural network that learns to produce monophonic music samples as per our objective, is the major part of this project. This section discusses and compares the various methods used for network generation. Madhok et al. (2018) recognised seven major human emotions and then, according to the detected emotion, generated music suited to the observed scenario using a dual-layer Long Short-Term Memory (LSTM) architecture. The work was evaluated using the correlation between the detected facial expression and the probability that the resulting music falls in the same category; the correlation was 0.93, a good score. Mao (2018) used a dual-axis LSTM architecture in which one axis models the time of the generated music and the other axis the notes to be output; adding style and volume features enhanced the production quality. The approach was evaluated with a statistical hypothesis test at a significance level of 0.05, and the value z = 0.945 indicates that the classification precision of human composers and of the proposed approach was almost the same. Solutions to three different music generation problems, namely harmonization, chord inversion and voicing, and chord estimation, were achieved by Kitahara (2017) using Bayesian networks. Yang et al. (2017) implemented a model based on Convolutional Neural Networks (CNN) and a Generative Adversarial Network (GAN) in which information about the previous bar and the chord sequence is incorporated, producing results similar to other work. The drawback of this model is that velocity and musical pauses are not considered, which makes the generated music sound artificial. Sabathe et al. (2017) used LSTM networks for music generation with optimized parameters, such as 167 LSTM units for both the encoding and decoding functions and 23 steps of sequential automatic encoding. The major drawback of this approach was that music pieces longer than the training samples could not be generated.
Music theory and other musical features were used more effectively by Mańdziuk et al. (2016), whose combined algorithm of a genetic algorithm and local optimization captures the necessary technicalities of music theory to produce aesthetically and theoretically superior music. Liang (2016) developed a sequential LSTM network trained to produce good quality music without much explicit training on music theory concepts. Lyu et al. (2015) combined LSTM units with a Recurrent Temporal Restricted Boltzmann Machine (RTRBM) and obtained only average results, attributed to the absence of optimization techniques. Goel et al. (2014) used an RNN with two layers of Restricted Boltzmann Machines for sequence modelling; the results are only on a par with other research, owing to the lack of optimization methods and of pretraining on music theory. Chung et al. (2014) show that LSTM and GRU units fare better than plain tanh units in recurrent networks for raw speech and polyphonic music modelling. Eck and Schmidhuber (2002) learnt chord sequences and melody sequences and fed the learnt information into LSTM networks. Hence, a majority of researchers have used recurrent neural networks, more specifically LSTM networks, to learn and generate music pieces based on an input corpus and musical technicalities.
3 Research Question
The research question for this project is as follows: How effectively can deep learning techniques generate unique monophonic world music based on a music dataset and musical theory?
This falls under the domain of computer music generation. Neural networks have been successful in producing good musical recordings, as seen in the literature review above. Being able to evaluate and learn the previous notes and chords played in a musical sequence is necessary in computerised music generation. Hence, we use recurrent neural networks (RNN), which have recurrent loops in their nodes. LSTM units appear to be the best RNN units in the referenced papers. For the effective implementation of our research objective, we additionally introduce a novel component that learns music theory: scales, time signatures, chord progressions, the typical velocity of each note according to genre, and accidental chords and progressions that may be introduced unexpectedly into a piece. This information is contained in MusicXML files, which are parsed into the system along with the MIDI information. The related literature mentions that musical knowledge can complement and improve a music generation system, and researchers have suggested collaboration between musical experts and computing experts. Since I am a musician and have the requisite music theory knowledge to train the neural networks, this project is feasible and can improve upon the other research in this domain.
4 Proposed Approach
In this section the approach to be followed for the given research question is detailed and an overview of the complete picture is given. We propose to implement recurrent neural networks (RNN) to analyse a global contemporary MIDI music dataset and recreate similar sequences of music. We use MIDI files as input to the network to train it on known melody lines and musical sequences and test for the probability of producing similar but unique music. The dataset used is the Lakh MIDI Dataset v1.0, available at http://colinraffel.com/projects/lmd/#get. It contains around one lakh (100,000) MIDI files of songs listed in the Million Song Dataset, so it is a subset of the Million Song Dataset covering contemporary global music. MIDI files consist only of information about the notes played, their timing, the time signature, dynamics and velocity. In addition, we use information related to music theory, such as chord progressions, scales, accidental note and chord usage and other musical rules, in the MusicXML format, to be fed into the recurrent neural network. This information is represented as sheet music in that format, from which it will be extracted and used to train the proposed system, as sketched below.
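A minimal sketch of this extraction step (assuming the music21 library and a hypothetical file lead_sheet.musicxml; the exact parsing toolkit is not fixed by this proposal) could look as follows:

```python
from music21 import converter

score = converter.parse("lead_sheet.musicxml")           # parse the MusicXML file
print("Estimated key / scale:", score.analyze("key"))    # e.g. C major

for ts in score.recurse().getElementsByClass("TimeSignature"):
    print("Time signature:", ts.ratioString)             # e.g. 4/4

for element in score.recurse().notes:                    # notes and chords only
    if element.isChord:
        print("Chord:", [p.nameWithOctave for p in element.pitches],
              "duration:", element.quarterLength)
    else:
        print("Note:", element.nameWithOctave, "duration:", element.quarterLength)
```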
The desired design constraints of the network are as follows:
Time Signature The Recurrent neural network should be able to identify the current playing time
with reference to the musical time signature. Time signature refers to the number of beats occurring
in one single bar of music. In common time, there are four beats in one bar which is denoted by 4/4.
Likewise, there are many time signatures like 3/4, 5/8, 6/8, 7/8, etc.
Invariance in Notes There should be independence in the music with respect to the octave. Changing octaves should not affect the basic note, chord structure and progressions.
Repetition of Notes The sustain of one note over two bars should be distinguished from playing that
same note twice.
Invariance in Time There should be freedom for the network to generate music independent of the time frame, like an ad lib, as it is called in musical terms.
Accidental Note and Chord Changes Accidental notes or chords can be termed out-of-the-box devices that do not feature in standard music theory. These can be innovative and enriching to hear if used correctly, with some developed rules.
Figure 1: Network Design
The property of being invariant in time is achievable with an RNN. However, note invariance is not, because the fully connected layer has one node to represent each note in the MIDI range: if we raise the pitch of every note by one half step, the output of the network will be entirely different from the desired output. This drawback can be resolved by borrowing an idea from convolutional neural networks (CNN). In image recognition applications, the same convolution kernel is applied across all pixels of the input image. Now assume that the kernel of the CNN is replaced with an RNN; the network then consists of the same RNN applied at every position, so each cell (pixel) effectively possesses a neural network of its own. Applying this idea to our study, we replace the pixels with notes, the main elements in our research. If we implement a stack of identical RNNs, one per note, every note receives a neighbourhood as input, and this neighbourhood spans one octave above and one octave below the note's own pitch. Hence, we achieve invariance in time as well as in notes. A sketch of constructing such a per-note neighbourhood is given below.
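A minimal NumPy sketch of this neighbourhood construction (illustrative only; the window radius of 12 semitones, one octave in each direction, follows the description above):

```python
import numpy as np

def note_vicinity(piano_roll, radius=12):
    """piano_roll: binary vector over the 128 MIDI pitches for one time step."""
    n = len(piano_roll)
    padded = np.pad(piano_roll, radius)                 # zero-pad beyond the octave edges
    # one row per pitch: its local window from one octave below to one octave above
    return np.stack([padded[i:i + 2 * radius + 1] for i in range(n)])

step = np.zeros(128)
step[[60, 64, 67]] = 1                                  # a C major triad at this time step
print(note_vicinity(step).shape)                        # (128, 25)
```

Transposing the input by one semitone simply shifts the rows of this matrix, which is exactly the invariance the stacked per-note RNNs exploit.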
Because memory is retained concerning previous notes and sequences, we must now build a method to produce good, innovative chords for the music. Hence, we divide this approach into two parts. A bi-axial recurrent neural network is suitable for meeting our research objective: the first axis represents time and the other axis represents the note. The network design is as shown in Figure 1.
The following are the details concerning the inputs and outputs of the proposed network. The inputs to the time layer of the bi-axial RNN are discussed first; a sketch assembling these inputs into a single vector follows the list.
1. Note Value: The note value refers to the MIDI value, which describes the register of the played note, i.e. whether it lies in a lower or a higher register.
2. Pitch: This refers to the pitch value of the played note, where the A note's pitch value is 0 and the value increases by one for every half-step increase in pitch.
3. Scale: The scale refers to a sequence of notes following certain musical rules. The many scales in music theory can be input to the system so that it emulates world music without mistakes.
4. Previous State: This input tells the network whether, and how often, a particular note was played during the previous time step.
5. Rhythm: This is a useful input that lets the network understand the position of the current note with respect to the time measure and time signature.
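The following sketch (illustrative only; the exact encoding and feature sizes are assumptions rather than fixed choices) assembles these five inputs into one vector for a single note at a single time step:

```python
import numpy as np

def note_input(midi_pitch, scale_pitch_classes, was_played, beat_in_bar, beats_per_bar=4):
    register = midi_pitch / 127.0                          # 1. note value / register
    pitch_class = np.zeros(12)
    pitch_class[(midi_pitch - 21) % 12] = 1                # 2. pitch class, A (MIDI 21) = 0
    in_scale = 1.0 if midi_pitch % 12 in scale_pitch_classes else 0.0   # 3. scale membership
    previous = float(was_played)                           # 4. previous state
    beat = np.zeros(beats_per_bar)
    beat[beat_in_bar] = 1                                  # 5. rhythm: position within the bar
    return np.concatenate([[register], pitch_class, [in_scale], [previous], beat])

c_major = {0, 2, 4, 5, 7, 9, 11}                           # pitch classes of the C major scale
print(note_input(60, c_major, was_played=True, beat_in_bar=2).shape)   # (19,)
```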
Along the time axis, LSTM layers with recurrent loops form the first hidden layers. The other axis, the note axis, scans the notes from the low registers up to the high registers. After the last LSTM layer has run, a final fully connected, non-recurrent layer outputs two kinds of probabilities (a sketch of sampling from them follows the list):
1. The probability of each note being played.
2. For a note that is played, the probability that it is articulated, i.e. struck again rather than held over; this also forms one of the outputs of the non-recurrent layer.
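A minimal sketch of one plausible sampling rule for these two probabilities (the exact procedure is an assumption, not fixed by this proposal):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_note(p_play, p_articulate):
    play = rng.random() < p_play                           # does the note sound at this step?
    articulate = play and (rng.random() < p_articulate)    # if so, is it struck again or held?
    return play, articulate

print(sample_note(p_play=0.8, p_articulate=0.6))
```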
Processing of the musical output: the MIDI file generated as output may then be loaded into music production software and edited to change the instrument. Since the built-in MIDI playback quality is very poor, high-quality virtual instruments from third parties can be used to provide scoring-level quality. This option is available because the MIDI format represents information about note, velocity, pitch and other musical parameters rather than raw audio; writing the generated notes out as MIDI is sketched below.
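A minimal sketch (again assuming pretty_midi and an arbitrary, made-up list of generated notes) of writing the output to a MIDI file that a digital audio workstation can then re-voice:

```python
import pretty_midi

generated = [(60, 0.0, 0.5), (62, 0.5, 1.0), (64, 1.0, 2.0)]   # (pitch, start, end) in seconds

out = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)                      # placeholder instrument
for pitch, start, end in generated:
    piano.notes.append(pretty_midi.Note(velocity=90, pitch=pitch, start=start, end=end))
out.instruments.append(piano)
out.write("generated.mid")                                     # open in a DAW and re-voice
```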
5 Proposed Implementation
Our proposed music generation model will be implemented in the Python programming language. In particular, we will use a Python library called Theano, which simplifies the computation and provides flexibility in the network architecture. The step-by-step implementation is described below.
Random small segments of the MIDI files are fed into the recurrent neural network during training. The cross-entropy cost is obtained from the probabilities of all the outputs: the probabilities are log-transformed and negated, and the result is fed as the cost into the AdaDelta optimizer for weight optimization. The time-axis layers are trained by batching all the notes together, and the note-axis layers by batching all the time steps together; the processor is better utilised because of the resulting large matrix multiplications. Dropout is used in our network to counter overfitting. Applying dropout in each layer eliminates 50% of the hidden nodes: the output of every layer is multiplied with a mask, so the dropped nodes are eliminated by multiplying their output by zero. This encourages specialization and prevents the nodes from relying on weak dependencies. We then multiply the output of each node by 0.5 as a correction factor, compensating for the larger number of nodes that are active when dropout is not applied. The cost and the dropout mask are sketched below.
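A minimal NumPy sketch of the two ideas above, the negative-log-probability cost handed to the optimizer and the 50% dropout mask with its 0.5 correction (illustrative only; the actual implementation would express these symbolically in Theano):

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy_cost(probs, targets, eps=1e-8):
    # negative log-likelihood of the observed notes under the predicted probabilities
    return -np.mean(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))

def dropout(layer_output, keep_prob=0.5, training=True):
    if training:
        mask = rng.random(layer_output.shape) < keep_prob   # drop roughly 50% of the nodes
        return layer_output * mask
    return layer_output * keep_prob                          # the 0.5 correction factor

probs = np.array([0.9, 0.2, 0.7])     # predicted note probabilities
targets = np.array([1.0, 0.0, 1.0])   # notes actually played
print(cross_entropy_cost(probs, targets))
```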
For training the model we use Amazon Web Services (AWS) instances, specifically the cheaper spot instances, which cost roughly 10 to 15 US cents per hour. Our proposed model consists of two note-axis hidden layers and two time-axis hidden layers. The note-axis layers have 100 and 50 nodes respectively, and the two time-axis hidden layers have 300 nodes each. Training over the MIDI files in our dataset is performed by choosing 8-count segments of the clips and batching them together, as sketched below.
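A minimal sketch (illustrative only; segment length and batch size are assumed values) of sampling fixed-length segments from piano-roll matrices and batching them for training:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(piano_rolls, segment_len=8, batch_size=4):
    """piano_rolls: list of (time_steps, 128) arrays, one per MIDI file."""
    batch = []
    for _ in range(batch_size):
        roll = piano_rolls[rng.integers(len(piano_rolls))]          # pick a random file
        start = rng.integers(0, roll.shape[0] - segment_len + 1)    # pick a random segment
        batch.append(roll[start:start + segment_len])
    return np.stack(batch)                                          # (batch_size, segment_len, 128)

rolls = [rng.integers(0, 2, size=(64, 128)) for _ in range(3)]      # dummy piano rolls
print(sample_batch(rolls).shape)                                    # (4, 8, 128)
```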
Figure 2: Decision Model (flowchart: Parsing MIDI Dataset and Parsing Dataset of Music Theory → Dimension Reduction → Final Features → Training and Testing Combined Datasets → Training and Testing Feature Vectors → Music Generation → Evaluation)
6 Proposed Evaluation
The evaluation of the output of our music generation model will be performed through an open survey, described below.
For the open survey we will select a group of 50 people, of whom 80 percent will have a musical background and 20 percent will not. There will be three sets of identification tasks; each set contains three musical recordings, of which two are composed by humans and one is generated by our system. The participants will know these rules and must identify, from the three recordings in each set, the one generated by our system. The participants will also be given the option of writing comments about each recording in each set. The evaluation metrics for this survey are as follows: the Recording Set Identifier denotes each of the three sets of recordings; Incorrect Identification denotes the percentage of incorrectly identified recordings in a set; and Correct Identification denotes the percentage of correctly identified recordings in a set. We estimate that the percentage of correctly identified samples will lie between 20 and 40 percent. The estimated results can be tabulated as follows:
Recording Set Identifier   Incorrect Identification   Correct Identification
1                          66%                        34%
2                          75%                        25%
3                          80%                        20%
Total                      73.60%                     26.40%

Hence, we anticipate the incorrect identification to be around 73.6 percent and the correct identification to be 26.4 percent. The assumption behind this survey is that when participants are asked to pick the computer-generated music from the three recordings, they will tend to pick the least pleasing, most inferior recording, because of the limited advances in computer-generated music compared to the ability of humans to create world-class, superior music. Hence, the incorrect identification rate serves as the accuracy of the evaluation of our system, and we expect around 70 to 75 percent accuracy in this respect.
In the next evaluation, we survey the genre fidelity of each recording. The same participants are given 20 recordings across the three genres selected for this evaluation. Recordings of these genres from the training dataset and generated recordings of the same genres are selected to be reviewed by the participants. The metric used to review how well the generated data matches the training data is the similarity score. The mean of these similarity scores is taken for each genre to obtain the similarity score of all recordings in that genre. The similarity score ranges from 1 to 5, where 1 denotes that the recordings sound very different from the training dataset and 5 denotes that they sound very similar to it. We expect a mean similarity score of 4.1 for the pop genre, since pop music is very widely known and there won't be much difficulty in identifying differences in this genre; moreover, the more accurate the proposed model, the higher the similarity score. The similarity score for jazz is expected to be lower than for the other genres because of the complexity of the genre's music and its smaller following. The following table shows the expected mean similarity score for each genre; a sketch of computing both survey metrics follows the table.
Genre Mean
Pop 4.1
Jazz 3.8
Blues 3.9
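A minimal sketch of how both survey metrics would be computed from the collected responses (the response values below are purely hypothetical placeholders, not results):

```python
# Hypothetical responses, used only to illustrate the metric arithmetic.
identified_correctly = {"Set 1": [True, False, False], "Set 2": [False, False, True]}
similarity_ratings = {"Pop": [5, 4, 4], "Jazz": [4, 3, 4], "Blues": [4, 4, 4]}

for set_id, answers in identified_correctly.items():
    correct = 100 * sum(answers) / len(answers)              # Correct Identification (%)
    print(set_id, "correct:", round(correct, 1), "% incorrect:", round(100 - correct, 1), "%")

for genre, ratings in similarity_ratings.items():
    print(genre, "mean similarity:", round(sum(ratings) / len(ratings), 1))
```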
Hence, we calculate the accuracy of the proposed music generation model from the survey reviews. The anticipated results were shown in the tables above. We expect a state-of-the-art model using the proposed methodology and implementation.
7 Conclusion
In conclusion, we have proposed the plan, or blueprint, of the research to be undertaken with regard to composing or generating monophonic music using neural networks. Our plan aims to guide the research so that the project is completed within the three-month time frame and to give a better understanding during the actual implementation of the methodology. We anticipate state-of-the-art results compared to other research in this field, as indicated in the proposed evaluation. We will detail the implementation steps and further enhance the proposed model after trying and testing different approaches.
References
Boulanger-Lewandowski, N., Bengio, Y. and Vincent, P. (2012). Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription.
URL: http://arxiv.org/abs/1206.6392
Cameron, B. B. (2001). System and Method for Automatic Music Generation using a Neural Network
Architecture, 2(12).
Chung, J., Gulcehre, C., Cho, K. and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent
Neural Networks on Sequence Modeling, pp. 1–9.
URL: http://arxiv.org/abs/1412.3555
Eck, D. and Schmidhuber, J. (2002). A First Look at Music Composition using LSTM Recurrent Neural
Networks, Idsia pp. 1–11.
URL: http://people.idsia.ch/~juergen/blues/IDSIA-07-02.pdf
Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K. and Norouzi, M. (2017). Neural
Audio Synthesis of Musical Notes with WaveNet Autoencoders.
URL: http://arxiv.org/abs/1704.01279
Goel, K., Vohra, R. and Sahoo, J. K. (2014). Polyphonic music generation by modeling temporal
dependencies using a RNN-DBN, Lecture Notes in Computer Science (including subseries Lecture
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8681 LNCS: 217–224.
Kitahara, T. (2017). Music Generation Using Bayesian Networks, pp. 3–6.
URL: http://www.kthrlab.jp/
Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales - Developing, analyzing,
and evaluating a deep LSTM model for musical style, (August).
Lichtenwalter, R. and Lichtenwalter, K. (2009). Applying learning algorithms to music generation,
Proceedings of the 4th pp. 483–502.
URL: http://www.cse.nd.edu/Reports/2008/TR-2008-10.pdf
Lyu, Q., Wu, Z., Zhu, J. and Meng, H. (2015). Modelling high-dimensional sequences with LSTM-RTRBM: Application to polyphonic music generation, IJCAI International Joint Conference on Artificial Intelligence 2015: 4138–4139.
Madhok, R., Goel, S. and Garg, S. (2018). SentiMozart: Music Generation based on Emotions, ICAART 2018, 2: 501–506.
Mańdziuk, J., Woźniczko, A. and Goss, M. (2016). A Neuro-memetic System for Music Composing.
Mao, H. H. (2018). DeepJ: Style-Specific Music Generation, Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018: 377–382.
Masako, N. and Kazuyuki, W. (1992). Interactive Music Composer Based on Neural Networks.
Oliwa, T. and Wagner, M. (2008). Composing music with Neural Networks and probabilistic finite-
state machines, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics) 4974 LNCS: 503–508.
Roig, C., Tardón, L. J., Barbancho, I. and Barbancho, A. M. (2014). Automatic melody composition
based on a probabilistic model of music style and harmonic rules, Knowledge-Based Systems 71: 419–
434.
URL: http://dx.doi.org/10.1016/j.knosys.2014.08.018
Sabathe, R., Coutinho, E. and Schuller, B. (2017). Deep recurrent music writer: Memory-enhanced
variational autoencoder-based musical score composition and an objective measure, Proceedings of the
International Joint Conference on Neural Networks 2017-May: 3467–3474.
Yang, L.-C., Chou, S.-Y. and Yang, Y.-H. (2017). MidiNet: A Convolutional Generative Adversarial
Network for Symbolic-domain Music Generation.
URL: http://arxiv.org/abs/1703.10847
Yuksel, A., Karci, M. and Uyar, A. (2011). Automatic music generation using evolutionary algorithms
and neural networks, pp. 354–358.
10

More Related Content

Similar to Nithin Xavier research_proposal

Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep Learning
IRJET Journal
 
AI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUNDAI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUND
Jaideep Ghosh
 
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET Journal
 
Analysis Synthesis Comparison
Analysis Synthesis ComparisonAnalysis Synthesis Comparison
Analysis Synthesis Comparison
Jim Webb
 
Extraction and Conversion of Vocals
Extraction and Conversion of VocalsExtraction and Conversion of Vocals
Extraction and Conversion of Vocals
IRJET Journal
 
MUSZIC GENERATION USING DEEP LEARNING PPT.pptx
MUSZIC GENERATION USING DEEP LEARNING  PPT.pptxMUSZIC GENERATION USING DEEP LEARNING  PPT.pptx
MUSZIC GENERATION USING DEEP LEARNING PPT.pptx
life45165
 
Application of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music GenerationApplication of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music Generation
IRJET Journal
 
2012 a rebeloijmir
2012 a rebeloijmir2012 a rebeloijmir
2012 a rebeloijmir
Miguel Ponce
 
Applsci 08-00606-v3
Applsci 08-00606-v3Applsci 08-00606-v3
Applsci 08-00606-v3
IsraelEbonko
 
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET Journal
 
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORKAUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
Jennifer Roman
 
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
TELKOMNIKA JOURNAL
 
Collins
CollinsCollins
Collins
anesah
 
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESCONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
AM Publications
 
Modeling of Song Pattern Similarity using Coefficient of Variance
Modeling of Song Pattern Similarity using Coefficient of VarianceModeling of Song Pattern Similarity using Coefficient of Variance
Modeling of Song Pattern Similarity using Coefficient of Variance
Gobinda Karmakar ☁
 
Emofy
Emofy Emofy
IRJET- A Personalized Music Recommendation System
IRJET- A Personalized Music Recommendation SystemIRJET- A Personalized Music Recommendation System
IRJET- A Personalized Music Recommendation System
IRJET Journal
 
Wilkie
WilkieWilkie
Wilkie
anesah
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
ijma
 
survey on Hybrid recommendation mechanism to get effective ranking results fo...
survey on Hybrid recommendation mechanism to get effective ranking results fo...survey on Hybrid recommendation mechanism to get effective ranking results fo...
survey on Hybrid recommendation mechanism to get effective ranking results fo...
Suraj Ligade
 

Similar to Nithin Xavier research_proposal (20)

Automatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep LearningAutomatic Music Generation Using Deep Learning
Automatic Music Generation Using Deep Learning
 
AI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUNDAI THROUGH THE EYES OF ORGANISE SOUND
AI THROUGH THE EYES OF ORGANISE SOUND
 
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
IRJET- Music Genre Classification using Machine Learning Algorithms: A Compar...
 
Analysis Synthesis Comparison
Analysis Synthesis ComparisonAnalysis Synthesis Comparison
Analysis Synthesis Comparison
 
Extraction and Conversion of Vocals
Extraction and Conversion of VocalsExtraction and Conversion of Vocals
Extraction and Conversion of Vocals
 
MUSZIC GENERATION USING DEEP LEARNING PPT.pptx
MUSZIC GENERATION USING DEEP LEARNING  PPT.pptxMUSZIC GENERATION USING DEEP LEARNING  PPT.pptx
MUSZIC GENERATION USING DEEP LEARNING PPT.pptx
 
Application of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music GenerationApplication of Recurrent Neural Networks paired with LSTM - Music Generation
Application of Recurrent Neural Networks paired with LSTM - Music Generation
 
2012 a rebeloijmir
2012 a rebeloijmir2012 a rebeloijmir
2012 a rebeloijmir
 
Applsci 08-00606-v3
Applsci 08-00606-v3Applsci 08-00606-v3
Applsci 08-00606-v3
 
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
IRJET- Implementation of Emotion based Music Recommendation System using SVM ...
 
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORKAUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
AUTOMATED MUSIC MAKING WITH RECURRENT NEURAL NETWORK
 
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
 
Collins
CollinsCollins
Collins
 
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUESCONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
CONTENT BASED AUDIO CLASSIFIER & FEATURE EXTRACTION USING ANN TECNIQUES
 
Modeling of Song Pattern Similarity using Coefficient of Variance
Modeling of Song Pattern Similarity using Coefficient of VarianceModeling of Song Pattern Similarity using Coefficient of Variance
Modeling of Song Pattern Similarity using Coefficient of Variance
 
Emofy
Emofy Emofy
Emofy
 
IRJET- A Personalized Music Recommendation System
IRJET- A Personalized Music Recommendation SystemIRJET- A Personalized Music Recommendation System
IRJET- A Personalized Music Recommendation System
 
Wilkie
WilkieWilkie
Wilkie
 
The kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key findingThe kusc classical music dataset for audio key finding
The kusc classical music dataset for audio key finding
 
survey on Hybrid recommendation mechanism to get effective ranking results fo...
survey on Hybrid recommendation mechanism to get effective ranking results fo...survey on Hybrid recommendation mechanism to get effective ranking results fo...
survey on Hybrid recommendation mechanism to get effective ranking results fo...
 

Recently uploaded

一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 

Recently uploaded (20)

一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 

Nithin Xavier research_proposal

  • 1. Composing Monophonic World Music Using Deep Learning Nithin Xavier x17110530 MSc in Data Analytics 14th August 2018 Keywords Music Generation; Computer Composing; Neural Networks; Deep Learning. Abstract: Music is considered as a universal language, loved by all. We enjoy music of languages, without any language barrier since music defines itself as a medium which connects the mind and the soul. Computer generated music is a relatively new term and the associated domain is still in its infancy. Still, there have been good result yielding researches in this field albeit there are limited researches. Emulating a human composer is the task of the system proposed to generate good sounding music. Alan Turings theory can be applied here that when a human cannot distinguish between a computer-generated music and a human composers work, that computer music generation system will be deemed as perfect. The current problems of music composers to generate long hours of music which are normally played in airports, restaurants, flights, malls and other public places can be effectively solved by neural networks. These neural networks can also synthesize Sleep Music, which also is a growing genre is becoming popular with people having sleeping problems or discomforts. We generalize these problems and strive to develop a model to compose monophonic world music using a good training dataset and a novel application of musical theory knowledge. 1
  • 2. Contents 1 Introduction 3 2 State of the Art 3 2.1 Filetype of Dataset used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Model Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3 Research Question 5 4 Proposed Approach 5 5 Proposed Implementation 7 6 Proposed Evaluation 8 7 Conclusion 9 2
  • 3. 1 Introduction Composition of music is considered a creative and innovative work. It involves application of music principles like chord progressions, scales, harmony, dynamics, etc. Even though a composer ought to follow these musical rules or theory, sometimes a composer may introduce out of the box arrangements by making changes in chord progressions or by inducing accidental notes and styles. Also, due to the fact that there are lot of music already available and under development, the composer has to be alert to avoid the chances of his/her music to be similar to another already available music. There are seven main notes in a scale ranging from C to B. The higher octave C will constitute the 8th note forming the C scale in music. For every note there are sharp notes or flat notes except for two notes. This same arrangement of keys are repeated to form higher or lower octaves. Hence, there are only 12 distinct notes in music. Composers have only these 12 notes to compose any music. The permutations and combinations of these 12 notes is what we find in every music produced or arranged. Sometimes, after composing a song, the composer may find a similar tune matching his/her work and is in a position to change the complete affected sequence. The musical rules and theory can be fed into neural networks to generate a unique sequence of musical arrangement which form the novelty in this project. Computer systems can learn effectively better than humans if properly trained. Computer generated music have been researched since long, but there are only a few researches in this field, owing to the lack of musical training or expertise and also due to the lack of interest shown by sound technologists and musicians. Recurrent neural networks are employed in many researches because of its good performance in this arena. Through this research we address the problems of the music composers such as composing long hours of music for meditation and leisure music to be played at public places like airports, flights, malls, etc. Composing music which has a duration for more than 2-3 hours can be time consuming and exhaustive experience. These music may contain similar patterns throughout which can be generated effectively by neural networks by training the system well. Hence, the research undertaken with this regard has the following research question: How effectively can deep learning techniques generate unique monophonic world music based on a music dataset and musical theory?. We address this research question to propose an appropriate methodology, implementation and expected result evaluation. In the next section we talk about the various researches carried out related to our proposed objective. 2 State of the Art In this section we analyse, and survey various research papers related to our research objective. The following first sub-section under consideration is the format or type of the dataset used in the researches of the referenced papers. 2.1 Filetype of Dataset used Since our research question seeks solution to develop a monophonic music sample based on a contem- porary music dataset and music theory rules, the processing of the dataset and the output format of the music generated plays a crucial role. The type of the dataset used for the processing in neural networks decides the quality of the output of generated music and the complexity of neural networks. 
There are two main types of audio formats which are MIDI (Musical Instrument Digital Interface Format) and WAV (Waveform Audio File Format) filetypes. If we analyse the type of data used in the research pa- pers of related work, we can observe that there is a clear dominance of the usage of MIDI format for computer music generation. The following are the citations of research works which have used MIDI filetype: Madhok et al. (2018), Mao (2018), Yang et al. (2017), Sabathe et al. (2017), Liang (2016), Lyu et al. (2015), Goel et al. (2014), Chung et al. (2014), Roig et al. (2014), Boulanger-Lewandowski et al. (2012), Yuksel et al. (2011), Oliwa and Wagner (2008), Cameron (2001) and Masako and Kazuyuki (1992). This shows that from the early 90s till the present time, MIDI filetype have been the preferred choice of researchers in this domain. The main reason of this choice is because of low complexity in computation and the output is recognizable and editable in most of the digital audio workstations or music production softwares. MIDI format contains information like the pitch, velocity and many other information regarding the played notation. The main advantage is that instrument of the music gener- ated can be changed to any desirable, high quality virtual instrument to increase the sound quality of the final output. The disadvantage of the direct playback of MIDI files is that the sound quality of the instrument is substandard, and poor compared to all other formats. Now, well see the next filetype used for the researches in this domain. 3
  • 4. WAV is the preferred filetype in music production scenarios because of the feature of lossless audio. But they face a drawback of a significantly large file size than compared to the MIDI format. Engel et al. (2017) and Ma´ndziuk et al. (2016) have used the WAV format in their researches. These files cannot be edited by note but will have to be edited by the waveform which is a more complex task. Because of the large file size, the computational complexity is high, and it will require superior level GPUs to execute the machine learning algorithms. The next filetype concerns extraction of information related to sheet music to train the system. Lichtenwalter and Lichtenwalter (2009) has used MusicXML format to garner information regarding the notes, time signature, pitch, dynamics and other musical parameters to feed into the system for teaching the system about chord progressions and other music theory for effective music generation. While, Eck and Schmidhuber (2002) have directly extracted musical information from sheet music to train the system of the chords and the sequence of notes to be played. Extracting information from sheet music is not as effective as the two commonly playable audio formats mentioned before which are MIDI and WAV. Hence, most of the researches in the domain of computerised music generation or music composing utilizes these filetypes. 2.2 Model Generation The model generation or the development of neural network is the major part of this project which learns and trains itself to produce monophonic music samples as per our objective. This section aims to discuss and compare the various methods used for the network generation. Madhok et al. (2018) in their research have recorded 7 major human emotions and then as per the detected emotion, generated music apt for that observed scenario and emotion using dual layer Long Short Term Memory Network (LSTM) architecture. The evaluation of this work was performed using a correlation between the facial expression detected and the probability that the resulting music falls in the same section. This correlation resulted in 0.93, which proves to be a good score. Mao (2018) have used a dual axis LSTM architecture wherein one axis provides provision for the desired time of generated music and the other axis facilitates the output of the desired notes. By the addition of style and volume features the music production quality was enhanced. The evaluation of this approach was done by a statistical hypothesis with the level of significance 0.05 and the value z = 0.945 conveys that the classification precision of human composers and the proposed approach was almost similar. Solutions to three different music generation problems like harmonization, chord inversion and voicings and chord estimation were achieved by Kitahara (2017) by using Bayesian Networks. Yang et al. (2017) have implemented model based on Convolutional Neural Networks (CNN) and Generative Adversarial Network (GAN) in which information regarding the previ- ous bar and the sequence of chord structure is incorporated and have produced similar results as others. The drawback of this model is highlighted by the absence of consideration to velocity and musical pauses which makes the music produced to be aligned towards artificial music. Sabathe et al. (2017) used LSTM networks for music generation with optimized parameters like 167 units of LSTM for both the decoding and encoding functions and 23 steps to perform sequential automatic encoding. 
The major drawback in this approach was that production of music pieces longer than the trained samples could not be generated. Music theory and other features of music were utilised more effectively by Ma´ndziuk et al. (2016) in which the authors developed a combined algorithm consisting of a genetic algorithm and local optimiza- tion which captures all necessary technicalities of music theory to produce aesthetically and theoretically superior music.Liang (2016) have developed a sequential LSTM network where they train the system to produce good quality music without much training of musical theory concepts. Lyu et al. (2015) have done an amalgamation of LSTM units to Recurrent Temporal Restricted Boltzmann Machine (RTRBM) and have secured average results which were caused by the absence of optimization techniques. Goel et al. (2014) used RNN with two layers of Restricted Boltzmann Machine for sequence modelling to produce music whose results are only at par with the other researches owing to lack of optimization methods and pretraining of musical theory.Chung et al. (2014) shows that LSTM and GRU units fare better in LSTM networks as opposed to tanh unit in the applications of raw speech and polyphonic music generation. Eck and Schmidhuber (2002) facilitated learning chord sequences and melody sequences to input the learnt information to LSTM networks. Hence, a majority of the researchers have used recurrent neural networks, more specifically LSTM networks to be able to learn and generate music pieces based on an input work and musical technicalities. 4
3 Research Question

The research question for this project is as follows: How effectively can deep learning techniques generate unique monophonic world music based on a music dataset and musical theory? This falls under the domain of computer music generation. Neural networks have been successful in producing good musical recordings, as seen in the literature review in the previous section. In computerised music generation it is necessary to evaluate and learn from the previous notes and chords played in a musical sequence; hence we use recurrent neural networks (RNNs), which have recurrent loops over their nodes. LSTM units have been shown to be the best-performing RNN unit in the referenced papers. For the effective implementation of our research objective, we additionally introduce a novel use of music theory: scales, time signatures, chord progressions, per-note velocity conventions for each genre, and accidental chords and progressions that may be introduced unexpectedly into a piece. This information is contained in the MusicXML format and is extracted into the system along with the MIDI information. The related literature observes that musical knowledge can complement and improve a music generation system, and the researchers suggest collaboration between music experts and computer experts. Since I am a musician and have the requisite music theory knowledge to train the neural networks, this project is feasible and can improve upon the other research in this domain.

4 Proposed Approach

In this section the approach to be followed for the given research question is detailed and an overview of the complete picture is given. We propose to implement recurrent neural networks (RNNs) to analyse global contemporary music from a MIDI dataset and recreate similar sequences of music. We use MIDI files as input to the network to train it on known melody lines and musical sequences, and test for the probability of producing similar but unique music. The dataset used is the Lakh MIDI Dataset v1.0, available at http://colinraffel.com/projects/lmd/#get. It contains around one lakh (100,000) MIDI files of songs listed in the Million Song Dataset, so this dataset is a subset of the Million Song Dataset containing contemporary global music. MIDI files contain only information about the notes played, the timing of the notes, the time signature, dynamics and velocity. In addition, we use information related to music theory, such as chord progressions, scales, accidental note and chord usage and other musical rules, in the MusicXML format, to be fed into the recurrent neural network; this information is present as sheet music in that format and will be extracted and used to train the proposed system (a short parsing sketch follows the list of design constraints below). The desired design constraints of the network are as follows:

Time Signature: The recurrent neural network should be able to identify the current playing position with reference to the musical time signature. The time signature refers to the number of beats occurring in one bar of music. In common time there are four beats in one bar, denoted by 4/4. Likewise, there are many other time signatures such as 3/4, 5/8, 6/8 and 7/8.

Invariance in Notes: The music should be independent of the octave. Changing octaves should not affect the basic note, chord structure or progressions.

Repetition of Notes: Sustaining one note over two bars should be distinguished from playing that same note twice.
Invariance in Time: The network should be free to generate music independent of a fixed time frame, like an ad lib, as it is called in musical terms.

Accidental Note and Chord Changes: Accidental notes or chords can be described as out-of-the-box devices that do not feature in standard music theory. They can be innovative and enriching to hear if used correctly under some developed rules.
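As referenced above, the following is a minimal sketch of how the note, timing, velocity and time-signature information in a Lakh MIDI file could be read into Python using the pretty_midi library; the file path is an illustrative assumption rather than a specific file from the dataset.

```python
import pretty_midi

# Load one file from the Lakh MIDI Dataset (the path is illustrative).
midi = pretty_midi.PrettyMIDI("lmd_full/0/example_song.mid")

# Time signature changes, e.g. 4/4 at time 0.0 for common time.
print(midi.time_signature_changes)

# A rough global tempo estimate for the piece.
print(midi.estimate_tempo())

# Collect (pitch, start, end, velocity) for every note of every pitched instrument.
notes = [(note.pitch, note.start, note.end, note.velocity)
         for instrument in midi.instruments if not instrument.is_drum
         for note in instrument.notes]
notes.sort(key=lambda n: n[1])  # chronological order for sequence modelling
```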
Figure 1: Network Design

The property of time invariance is achievable with an RNN. However, note invariance is not achievable with a plain RNN, because the fully connected layer has one node for every MIDI note: if we raise the pitch of every note by one half step, the output of the network will be entirely different from the desired output. This drawback can be resolved by borrowing an idea from convolutional neural networks (CNNs). In image recognition, the CNN kernel is applied across all pixels of the input image. Now suppose the CNN kernel is replaced with an RNN, so that the network consists of an RNN whose kernel is itself another RNN; this gives each cell, or pixel, a small neural network of its own. Applying this idea to our study, we replace the pixels in the analogy with notes, which are the main elements of our research. If we implement a stack of identical RNNs, one for each note, every note gets a neighbourhood of RNNs, and the neighbourhood RNNs of each note span one octave above and one octave below the note's normal pitch. Hence we achieve invariance in both time and notes. Because memory is retained about previous notes and sequences, we must also build a method to produce good, innovative chords; hence we divide the approach into two parts. A bi-axial recurrent neural network is suitable for meeting our research objective: in a bi-axial RNN, the first axis represents time and the other axis represents the note. The network design is shown in the figure. The inputs to the time-axis layer of the bi-axial RNN are as follows (an encoding sketch follows this list):

1. Note Value: The note value refers to the MIDI value, which describes the register of the played note, i.e. whether it lies in a lower or a higher register.

2. Pitch: This refers to the pitch class of the played note, where the note A has the value 0 and the value increases by one for every half-step increase in pitch.

3. Scale: The scale refers to a sequence of notes following certain musical rules. There are many scales in music theory that can be input to the system to emulate world music without mistakes.

4. Previous State: This input tells the network how the particular note was played during the previous time step.

5. Rhythm: This input lets the network understand the position of the current note with respect to the measure and the time signature.
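As referenced above, here is a minimal NumPy sketch of how these five per-note inputs could be encoded into a single feature vector for one time step. The exact dimensionality and encoding (one-hot pitch class, a 16th-note beat position, an in-scale flag) are assumptions made for illustration, not the layout of the actual implementation.

```python
import numpy as np

BEATS_PER_BAR = 16  # assume a 16th-note grid over a 4/4 bar

def note_input_vector(midi_pitch, prev_played, prev_articulated,
                      scale_pitch_classes, beat_index):
    """Build the input vector for one note at one time step (illustrative)."""
    pitch_class = (midi_pitch - 21) % 12          # MIDI 21 is A0, so A maps to 0
    features = [midi_pitch / 127.0]               # 1. note value / register

    pc_one_hot = [0.0] * 12                       # 2. pitch class, one-hot
    pc_one_hot[pitch_class] = 1.0
    features += pc_one_hot

    features.append(1.0 if pitch_class in scale_pitch_classes else 0.0)  # 3. scale membership

    features.append(1.0 if prev_played else 0.0)        # 4. previous state: was the note
    features.append(1.0 if prev_articulated else 0.0)   #    played / re-articulated last step?

    beat_one_hot = [0.0] * BEATS_PER_BAR          # 5. rhythm: position within the bar
    beat_one_hot[beat_index % BEATS_PER_BAR] = 1.0
    features += beat_one_hot

    return np.asarray(features, dtype=np.float32)

# Example: middle C (MIDI 60) in a C major scale, on the first beat of a bar.
c_major = {0, 2, 3, 5, 7, 8, 10}  # pitch classes of C major when A = 0
vec = note_input_vector(60, prev_played=True, prev_articulated=False,
                        scale_pitch_classes=c_major, beat_index=0)
```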
Along the time axis, LSTM layers with recurrent loops over time form the first hidden layers. The other axis, the note axis, runs over the notes from the low registers up to the high registers. After the last LSTM layer, a final fully connected, non-recurrent layer outputs two kinds of probabilities:

1. The probability of each note being played.

2. Given that a note is played, the probability that the note is articulated, which also forms one of the outputs of the non-recurrent layer.

Processing of the musical output: The MIDI file generated from the output can then be fed into music production software and edited to change the instrument. Since the default MIDI playback quality is poor, high-quality third-party virtual instruments can be used to provide scoring-level quality. This option is available because the MIDI format represents information about notes, velocity, pitch and other musical parameters.

5 Proposed Implementation

Our proposed music generation model will be implemented in the Python programming language. In particular, we will use the Python library Theano, which simplifies the computational work and provides flexibility in the network architecture. The step-by-step implementation is described below.

Random short segments of the MIDI files are fed into the recurrent neural network during training. The cross-entropy cost is obtained from the probabilities of all the outputs: the probabilities are log-transformed and negated, and the result is fed as the cost to the AdaDelta optimizer for the optimization of the weights. The time-axis layers are trained by batching all notes together, and the note-axis layers are trained by batching all time steps together; the processing unit of the computer is better utilised because of its ability to multiply large matrices.

Dropout is used in our network to reduce overfitting. Applying dropout in each layer eliminates 50% of the hidden nodes on each update: the output of each layer is multiplied by a mask, so the dropped nodes are removed by multiplying their output by zero. This encourages specialization and prevents nodes from relying on weak dependencies. We then multiply the output of every node by 0.5 as a correction factor, to compensate for the larger number of active nodes when dropout is not applied.

For training the model we use Amazon Web Services (AWS) instances; we use the cheaper spot instances, which cost around 10 to 15 US cents per hour. Our proposed model consists of two hidden note-axis layers and two hidden time-axis layers. The two note-axis layers have 100 and 50 nodes respectively, and each of the two time-axis layers has 300 nodes. Training on all the MIDI files in our dataset is performed by choosing 8-count segments of the MIDI clips and batching them together.
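The following is a minimal NumPy sketch of the cost computation and the hyperparameters described above. It illustrates the negated log-probability (cross-entropy) cost fed to AdaDelta; it is not the actual Theano computation graph, and the variable names are assumptions.

```python
import numpy as np

# Hyperparameters as stated in the text above.
TIME_AXIS_LAYERS = [300, 300]   # two hidden LSTM layers along the time axis
NOTE_AXIS_LAYERS = [100, 50]    # two hidden LSTM layers along the note axis
DROPOUT_RATE = 0.5              # half of the hidden nodes dropped per layer
SEGMENT_LENGTH = 8              # 8-count MIDI segments batched for training

def cross_entropy_cost(play_prob, artic_prob, played, articulated, eps=1e-7):
    """Negated log-likelihood of the observed notes under the model's two
    output probabilities (play and articulation), averaged over the batch."""
    play_prob = np.clip(play_prob, eps, 1.0 - eps)
    artic_prob = np.clip(artic_prob, eps, 1.0 - eps)
    log_lik = played * np.log(play_prob) + (1.0 - played) * np.log(1.0 - play_prob)
    # The articulation term only contributes where the note is actually played.
    log_lik += played * (articulated * np.log(artic_prob)
                         + (1.0 - articulated) * np.log(1.0 - artic_prob))
    return -np.mean(log_lik)
```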
Figure 2: Decision Model (parsing of the MIDI dataset and the music theory dataset, dimension reduction to the final features, training and testing feature vectors, music generation, and evaluation)

6 Proposed Evaluation

The output of our music generation model will be evaluated by conducting an open survey, described in this section. For the survey we will select a group of 50 people, of whom 80 percent will have a musical background and 20 percent will not. There will be three sets of identification tasks, and in each set there will be three musical recordings. Of these three recordings, two will be composed by humans and one will be generated by our system. The participants will know these rules and will have to identify, from the three recordings, the piece generated by our system. They will also be given the option to write comments on each recording in each set.

We now describe the evaluation metrics used for this survey. The Recording Set Identifier denotes each of the three sets of recordings. The metric Incorrect Identification denotes the percentage of incorrectly identified recordings in each set, and Correct Identification denotes the percentage of correctly identified recordings in each set. We estimate that the percentage of correctly identified samples will range from 20 percent to below 40 percent. The estimated results are tabulated below: we anticipate the incorrect identification to be around 73.6 percent and the correct identification to be around 26.4 percent. The highlight of this survey is that, when asked to pick the computer-generated music out of the three recordings, participants will tend to pick the least pleasing and most inferior recording, because of the limited advances in the field of computer-generated music and the ability of humans to create world-class, superior music.
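The sketch below shows how the per-set and overall identification percentages in the table that follows could be computed from raw survey answers; the response data here is hypothetical and only illustrates the calculation.

```python
# Hypothetical survey answers: True means the participant correctly picked
# the computer-generated recording in that set (50 participants per set).
responses = {
    1: [True] * 17 + [False] * 33,
    2: [True] * 12 + [False] * 38,
    3: [True] * 10 + [False] * 40,
}

for set_id, answers in responses.items():
    correct = 100.0 * sum(answers) / len(answers)
    print(f"Set {set_id}: correct {correct:.1f}%, incorrect {100.0 - correct:.1f}%")

all_answers = [a for answers in responses.values() for a in answers]
overall = 100.0 * sum(all_answers) / len(all_answers)
print(f"Overall: correct {overall:.1f}%, incorrect {100.0 - overall:.1f}%")
```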
Recording Set Identifier   Incorrect Identification   Correct Identification
1                          66%                        34%
2                          75%                        25%
3                          80%                        20%
Total                      73.60%                     26.40%

Hence, the incorrect identification rate serves as the accuracy of this evaluation of our system, and we expect an accuracy of around 70 to 75 percent in this respect.

In the next evaluation we survey the correctness of the genre of each recording. The same participants are given 20 recordings from the three genres selected for this evaluation: recordings of these genres from the training dataset, together with generated recordings of the same genres, are selected for review by the participants. The metric used to review how well the generated data matches the training data is the similarity score; the mean of these scores is taken for each genre to obtain the similarity score over all recordings in that genre. The similarity score ranges from 1 to 5, where 1 denotes that the recordings sound very different from the training dataset and 5 denotes that they sound very similar. We expect a mean similarity score of 4.1 for the pop genre, since pop music is very familiar to most listeners, so there will not be much difficulty in judging differences in this genre; in addition, a more accurate model will increase the similarity score. The similarity score for jazz is expected to be lower than for the other genres because of the complexity of the music and the smaller number of followers of this genre. The following table shows the expected mean similarity score for each genre:

Genre   Mean
Pop     4.1
Jazz    3.8
Blues   3.9

Hence, we estimate the accuracy of the proposed music generation model from the survey reviews; the anticipated results are shown in the tables above. We expect a state-of-the-art model using the proposed methodology and implementation.

7 Conclusion

Hence, we have proposed the plan, or blueprint, of the research to be undertaken on composing monophonic music using neural networks. The plan aims to guide the research so that the project is completed within the three-month time frame and to give a better understanding during the actual implementation of the methodology. We anticipate state-of-the-art results compared with other research in this field, as discussed in the proposed evaluation, and we intend to detail the implementation steps and enhance the proposed model further after trying and testing different approaches.

References

Boulanger-Lewandowski, N., Bengio, Y. and Vincent, P. (2012). Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. URL: http://arxiv.org/abs/1206.6392

Cameron, B. B. (2001). System and Method for Automatic Music Generation using a Neural Network Architecture, 2(12).

Chung, J., Gulcehre, C., Cho, K. and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, pp. 1–9. URL: http://arxiv.org/abs/1412.3555

Eck, D. and Schmidhuber, J. (2002). A First Look at Music Composition using LSTM Recurrent Neural Networks, IDSIA, pp. 1–11. URL: http://people.idsia.ch/~juergen/blues/IDSIA-07-02.pdf
Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K. and Norouzi, M. (2017). Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. URL: http://arxiv.org/abs/1704.01279

Goel, K., Vohra, R. and Sahoo, J. K. (2014). Polyphonic music generation by modeling temporal dependencies using a RNN-DBN, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8681 LNCS: 217–224.

Kitahara, T. (2017). Music Generation Using Bayesian Networks, pp. 3–6. URL: http://www.kthrlab.jp/

Liang, F. (2016). BachBot: Automatic composition in the style of Bach chorales - Developing, analyzing, and evaluating a deep LSTM model for musical style, (August).

Lichtenwalter, R. and Lichtenwalter, K. (2009). Applying learning algorithms to music generation, Proceedings of the 4th, pp. 483–502. URL: http://www.cse.nd.edu/Reports/2008/TR-2008-10.pdf

Lyu, Q., Wu, Z., Zhu, J. and Meng, H. (2015). Modelling high-dimensional sequences with LSTM-RTRBM: Application to polyphonic music generation, IJCAI International Joint Conference on Artificial Intelligence 2015-January (IJCAI): 4138–4139.

Madhok, R., Goel, S. and Garg, S. (2018). SentiMozart: Music Generation based on Emotions, 2(ICAART): 501–506.

Mańdziuk, J., Woźniczko, A. and Goss, M. (2016). A Neuro-memetic System for Music Composing.

Mao, H. H. (2018). DeepJ: Style-Specific Music Generation, Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018, pp. 377–382.

Masako, N. and Kazuyuki, W. (1992). Interactive Music Composer Based on Neural Networks.

Oliwa, T. and Wagner, M. (2008). Composing music with Neural Networks and probabilistic finite-state machines, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4974 LNCS: 503–508.

Roig, C., Tardón, L. J., Barbancho, I. and Barbancho, A. M. (2014). Automatic melody composition based on a probabilistic model of music style and harmonic rules, Knowledge-Based Systems 71: 419–434. URL: http://dx.doi.org/10.1016/j.knosys.2014.08.018

Sabathe, R., Coutinho, E. and Schuller, B. (2017). Deep recurrent music writer: Memory-enhanced variational autoencoder-based musical score composition and an objective measure, Proceedings of the International Joint Conference on Neural Networks 2017-May: 3467–3474.

Yang, L.-C., Chou, S.-Y. and Yang, Y.-H. (2017). MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation. URL: http://arxiv.org/abs/1703.10847

Yuksel, A., Karci, M. and Uyar, A. (2011). Automatic music generation using evolutionary algorithms and neural networks, pp. 354–358.