Roz: Automatic Music Creation
by Rakshak Talwar, ECE Student, and Dr. Kakadiaris, Computer Science
Methodology
• Other approaches:
• Compose one note at a time. Older systems use rule-based learning;
newer ones use LSTMs/RNNs.
• Our approach:
• A Convolutional Neural Network (CNN) trained to classify images. We
turn music into “pictures” (spectrograms) and use the CNN as a
generative model, following “A Neural Algorithm of Artistic
Style” (Gatys et al.). We add spectrogram and denoising functionality to
work with audio data.
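The music-to-“picture” conversion above can be sketched in a few lines of numpy. This is a minimal, hedged sketch: the frame size, hop length, window, and sample rate here are illustrative choices, not the poster's actual parameters.

```python
import numpy as np

def spectrogram(audio, n_fft=256, hop=128):
    """Magnitude spectrogram: frame the signal, apply a Hann window, FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq bins, time frames)

fs = 8000                             # sample rate in Hz (illustrative)
t = np.arange(fs) / fs                # one second of audio
audio = np.sin(2 * np.pi * 440 * t)   # a 440 Hz test tone

S = spectrogram(audio)
# Log-scale so quiet components stay visible, then normalize to [0, 1]
# like pixel intensities, giving the CNN an image-like input.
img = np.log1p(S)
img = (img - img.min()) / (img.max() - img.min())
```

The resulting 2-D array of normalized magnitudes is what stands in for an image when the classifier-turned-generator processes audio.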
Conclusions
• This technique was able to generate and denoise simple audio
inputs.
• However, it did not produce appreciable results on more complex
audio, e.g., pieces of music with many components.
• Further work is needed; it may entail building specialized
audio classifiers to use as generative models.
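As a rough illustration of the spectrogram-domain denoising mentioned above, here is a simple spectral-gating sketch: bins well below a magnitude threshold are zeroed, so broadband noise drops out while a strong tone survives. The threshold rule and all parameters are illustrative assumptions, not the poster's actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n_fft, hop = 8000, 256, 128        # illustrative parameters
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)    # a 440 Hz test tone
noisy = clean + 0.1 * rng.standard_normal(fs)

# Complex spectrogram of the noisy signal (Hann-windowed frames).
window = np.hanning(n_fft)
n_frames = 1 + (len(noisy) - n_fft) // hop
frames = np.stack([noisy[i * hop : i * hop + n_fft] * window
                   for i in range(n_frames)])
spec = np.fft.rfft(frames, axis=1)     # shape: (time frames, freq bins)

# Gate: keep only bins well above the median magnitude. Most
# noise-only bins fall below the threshold; the tone's bin stays.
mag = np.abs(spec)
mask = mag > 3 * np.median(mag)
denoised_spec = spec * mask
```

This kind of gating works for the simple inputs the conclusion describes, and hints at why dense, many-component music is harder: its quiet-but-meaningful bins are indistinguishable from noise under a global threshold.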
Acknowledgements
Dr. Christophoros Nikou, Evangelos Kazakos, Abraham Baez-Suarez, Nikolaos Sarafianos, Ali Memariani, Dr. Michalis Vrigkas, Ha A. Le, Desmond
Ikegwuono, Dr. Aaminah Durrani, Dr. Timothy Koozin, Dr. John L. Snyder, Kimberly Lopez, SURF UH, Wavtones.com
NVIDIA GPU Grant
References
D. Cope, The Algorithmic Composer, vol. 16. Madison, WI, USA: A-R Editions, Inc., 2000.
D. Cope, Computers and Musical Style. Madison, WI, USA: A-R Editions, Inc., 1991.
A. Coenen, "David Cope, Experiments in Musical Intelligence. A-R Editions, Madison, Wisconsin, USA, 1996," Org. Sound, vol. 2, pp. 57-60, Apr. 1997.
A. Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, 2015.
Understanding LSTM Networks, 2015.
B. Sturm, "Lisl's Stis": Recurrent Neural Networks for Folk Music Generation, 2015.
L. A. Gatys, A. S. Ecker, and M. Bethge, "A Neural Algorithm of Artistic Style," arXiv preprint arXiv:1508.06576, 2015.
S. van der Walt and J. L. Schönberger, "Scikit-image: Image Processing in Python," PeerJ, vol. 2, p. e453, Jun. 2014.
S. van der Walt, S. C. Colbert, and G. Varoquaux, "The NumPy Array: A Structure for Efficient Numerical Computation," Computing in Science & Engineering, vol. 13, pp. 22-30, Mar. 2011.
B. Thirion, E. Duchesnay, V. Michel, G. Varoquaux, O. Grisel, J. VanderPlas, et al., "Scikit-learn," 2016.
K. Simonyan and A. Zisserman, "Title," unpublished.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, et al., "Scikit-learn: Machine Learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
C. Olah, Conv Nets: A Modular Perspective, 2014.
W. McKinney, "Data Structures for Statistical Computing in Python," in Proceedings of the 9th Python in Science Conference, Austin, TX, 2010.
J. D. Hunter, "Matplotlib: A 2D Graphics Environment," Computing in Science & Engineering, vol. 9, pp. 90-95, May 2007.
L. A. Gatys, A. S. Ecker, and M. Bethge, "Texture Synthesis and the Controlled Generation of Natural Stimuli Using Convolutional Neural Networks," arXiv preprint arXiv:1505.07376, 2015.
D. Eck and J. Schmidhuber, "A First Look at Music Composition Using LSTM Recurrent Neural Networks," Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Manno, Switzerland, 2002.
S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, et al., "Lasagne: First Release," 2015.
Z. E. Dell, "From White Noise to Für Elise: What Makes Music Beautiful?," unpublished.
Spectrogram Calculation for NumPy, 2016.
Motivation
Can a computer combine and synthesize music as well as a professional DJ?
Cleanly combining two songs into a new mix demands a considerable level of musical skill, and the nuances of music composition and fusion are difficult to capture computationally. Here we take a small step toward answering this question: we use a deep learning network originally trained on image classification to fuse two audio inputs, with the goal of generating sound that borrows properties from both input pieces.
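The fusion objective borrowed from Gatys et al. combines a content loss with a Gram-matrix style loss. Below is a minimal numpy sketch of those two losses, with random arrays standing in for the CNN feature maps that a real run would use; the array shapes and loss weights are illustrative assumptions.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, H, W) feature map: measures which
    feature channels co-activate together -- Gatys et al.'s "style"."""
    flat = features.reshape(features.shape[0], -1)
    return flat @ flat.T / flat.shape[1]

def content_loss(gen, content):
    # Squared error between generated and content feature maps.
    return 0.5 * np.sum((gen - content) ** 2)

def style_loss(gen, style):
    # Squared error between Gram matrices, per Gatys et al.'s scaling.
    G, A = gram(gen), gram(style)
    return np.sum((G - A) ** 2) / (4 * gen.shape[0] ** 2)

rng = np.random.default_rng(0)
content = rng.standard_normal((8, 16, 16))  # stand-in CNN feature maps
style = rng.standard_normal((8, 16, 16))
gen = content.copy()                        # start from the content features

alpha, beta = 1.0, 100.0                    # illustrative loss weights
total = alpha * content_loss(gen, content) + beta * style_loss(gen, style)
```

In the full method, `total` would be minimized by gradient descent on the generated spectrogram itself, pulling it toward one input's content and the other's style.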
Results
Figure: Audio generation process using spectrograms. Spectrogram generation proceeds from top-left to bottom-right.
