*What is Machine Learning?
-Definition
-Explanation
*Difference between Machine Learning and Standard Programs
*Machine Learning Models
-Supervised Learning
--Classification
--Regression
-Unsupervised Learning
--Clustering
*AI Evolution
-History of AI
-Neural Networks and Deep Learning
-Simple Neural Network and Deep Neural Network
-Difference between AI, Machine Learning, and Deep Learning
This talk overviews my background as a female data scientist, introduces many types of generative AI, discusses potential use cases, highlights the need for representation in generative AI, and showcases a few tools that currently exist.
Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Vitaly Bondar)
A presentation about a new Google Research paper in the text-to-image task - Imagen.
This diffusion-based model outperforms DALL·E 2 and other models and produces strikingly realistic images.
A presentation about the development of the ideas from the autoencoder to the Stable Diffusion text-to-image model.
Models covered: autoencoder, VAE, VQ-VAE, VQ-GAN, latent diffusion, and Stable Diffusion.
Exploring the Deep Dream Generator (an Art-Making Generative AI), by Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public-facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
This session was presented at the AWS Community Day in Munich (September 2023). It's for builders who have heard the buzz about Generative AI but can't quite grok it yet. Useful if you are eager to connect the dots on Generative AI terminology and want a fast start to explore further and navigate the space. The session is largely product-agnostic and meant to give you the fundamentals to get started.
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY (Andre Muscat)
Discuss the impact and opportunity of using Generative AI to support your development and creative teams
* Explore business challenges in content creation
* Cost-per-unit of different types of content
* Use AI to reduce cost-per-unit
* New partnerships being formed that will have a material impact on the way we search and engage with content
Part 4 of a 9-part research series named "What Matters in AI", published on www.andremuscat.com
* "Responsible AI Leadership: A Global Summit on Generative AI"
* April 2023 guide for experts and policymakers
* Developing and governing generative AI systems
* 100+ thought leaders and practitioners participated
* Recommendations for responsible development, open innovation & social progress
* 30 action-oriented recommendations to help navigate AI complexities
Build an LLM-powered application using LangChain (StephenAmell4)
LangChain is an advanced framework that allows developers to create language model-powered applications. It provides a set of tools, components, and interfaces that make building LLM-based applications easier. With LangChain, managing interactions with language models, chaining together various components, and integrating resources like APIs and databases is a breeze. The platform includes a set of APIs that can be integrated into applications, allowing developers to add language processing capabilities without having to start from scratch.
I will talk about Generative AI and its applications to 2D art production in the gaming industry. We will explore the Stable Diffusion neural net and concepts such as Prompt Engineering, Image-to-Image, ControlNet, and Dreambooth and how they can enhance game development. Moreover, we will compare the pros and cons of Stable Diffusion with Midjourney. As a result, you will better understand the potential benefits of incorporating generative AI into your game development workflow.
GitHub Copilot vs Amazon CodeWhisperer for Java developers at JCON 2023 (Vadym Kazulkin)
In this talk I compare two services, GitHub Copilot (including Copilot X) and Amazon CodeWhisperer, from the perspective of Java developers: the quality of the recommendations for simple tasks, complex algorithms, Spring Boot and AWS development, IDE integration, and pricing.
Both are machine-learning-powered services that help improve developer productivity by generating code recommendations based on developers' natural-language comments and their code. Based on natural-language comments, these services can also automatically recommend unit-test code that matches your implementation code.
An Introduction to Generative AI - May 18, 2023 (CoriFaklaris1)
For this plenary talk at the Charlotte AI Institute for Smarter Learning, Dr. Cori Faklaris introduces her fellow college educators to the exciting world of generative AI tools. She gives a high-level overview of the generative AI landscape and how these tools use machine learning algorithms to generate creative content such as music, art, and text. She then shares some examples of generative AI tools and demonstrates how she has used some of these tools to enhance teaching and learning in the classroom and to boost her productivity in other areas of academic life.
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i... (Fordham University)
In recent years, the field of artificial intelligence (AI) has witnessed remarkable advancements, particularly in the domain of generative models. Generative AI, a subset of machine learning, focuses on developing systems that can create novel and realistic content, ranging from text, speech, and images to multimodal content. This burgeoning field has demonstrated unprecedented potential to revolutionize various industries, making it imperative to introduce dedicated study materials on the foundations of Generative AI. With the increasing integration of Generative AI across industries, professionals with expertise in this field are in high demand, and we therefore believe that publishing these slides is important to meet the current need. The proposed outline aims to equip students with the knowledge and skills required to harness the creative power of AI and navigate the ethical implications associated with generative technologies. * Materials used in this presentation were collected from Wikipedia, Google Images, and OpenAI GPT. No copyright is claimed by the author.
The Future of AI is Generative not Discriminative - 5/26/2021 (Steve Omohundro)
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recog... (multimediaeval)
Presenters: Dmitry Bogdanov, Universitat Pompeu Fabra, Spain
Alastair Porter, Universitat Pompeu Fabra, Spain
Hendrik Schreiber, tagtraum industries incorporated
Paper: http://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_6.pdf
Video: https://youtu.be/NpN2Fr3go_Y
Authors: Dmitry Bogdanov, Alastair Porter, Julián Urbano, Hendrik Schreiber
Abstract: This paper provides an overview of the AcousticBrainz Genre Task organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task is focused on content-based music genre recognition using genre annotations from multiple sources and large-scale music features data available in the AcousticBrainz database. The goal of our task is to explore how the same music pieces can be annotated differently by different communities following different genre taxonomies, and how this should be addressed by content-based genre recognition systems. We present the task challenges, the employed ground-truth information and datasets, and the evaluation methodology.
Computational models of symphonic musicEmilia Gómez
Computational models of symphonic music: challenges and opportunities
Keynote speech by Emilia Gómez
Mathematics and Computation in Music Conference
London, UK, June 2015
http://mcm2015.qmul.ac.uk/
Research at MAC Lab, Academia Sinica, in 2017 (Yi-Hsuan Yang)
Some research projects we did in 2017 at the Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan. It includes three parts: 1) vocal separation, 2) music generation, 3) AI DJ.
Distinguishing Violinists and Pianists Based on Their Brain Signals (Gianpaolo Coro)
Many studies in neuropsychology have highlighted that expert musicians, who started learning music in childhood, present structural differences in their brains with respect to non-musicians. This indicates that early music learning affects the development of the brain. Also, musicians’ neuronal activity is different depending on the played instrument and on the expertise. This difference can be analysed by processing electroencephalographic (EEG) signals through Artificial Intelligence models. This paper explores the feasibility to build an automatic model that distinguishes violinists from pianists based only on their brain signals. To this aim, EEG signals of violinists and pianists are recorded while they play classical music pieces and an Artificial Neural Network is trained through a cloud computing platform to build a binary classifier of segments of these signals. Our model has the best classification performance on 20 seconds EEG segments, but this performance depends on the involved musicians’ expertise. Also, the brain signals of a cellist are demonstrated to be more similar to violinists’ signals than to pianists’ signals. In summary, this paper demonstrates that distinctive information is present in the two types of musicians’ brain signals, and that this information can be detected even by an automatic model working with a basic EEG equipment.
Abstract of the scientific paper
Coro, G., Masetti, G., Bonhoeffer, P., & Betcher, M. (2019, September). Distinguishing Violinists and Pianists Based on Their Brain Signals. In International Conference on Artificial Neural Networks (pp. 123-137). Springer, Cham.
https://link.springer.com/chapter/10.1007%2F978-3-030-30487-4_11
Human Perception and Recognition of Musical Instruments: A Review (Editor IJCATR)
The musical instrument is the soul of music; the instrument and the player are its two fundamental components. Over the past decade, a new research field targeting musical instrument identification, retrieval, classification, recognition, and the management of large sets of music has grown: Music Information Retrieval. This paper attempts to review the relevant methods, features, and databases.
MUSIC GENERATION USING DEEP LEARNING (life45165)
To create a Streamlit application for music generation using deep learning, make sure every element of your Python script is set up correctly and that file paths are handled properly, especially given the system-specific paths involved.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
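One of the ideas above, skipping computation on vertices that have already converged, can be sketched in a few lines. This is an illustrative power-iteration implementation (the function name and tolerance are mine, not from the STICD paper), assuming every vertex has at least one out-link:

```python
import numpy as np

def pagerank_skip_converged(out_links, d=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank that stops updating vertices whose rank
    change has fallen below tol (per-vertex convergence skipping)."""
    n = len(out_links)
    # Build in-link lists once, so each vertex can pull rank from its sources.
    in_links = [[] for _ in range(n)]
    for u, outs in enumerate(out_links):
        for v in outs:
            in_links[v].append(u)
    rank = np.full(n, 1.0 / n)
    converged = np.zeros(n, dtype=bool)
    for _ in range(max_iter):
        new_rank = rank.copy()
        for v in range(n):
            if converged[v]:
                continue  # skip vertices whose rank has stabilized
            s = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1 - d) / n + d * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged[v] = True
        rank = new_rank
        if converged.all():
            break
    return rank
```

Note that skipping a converged vertex is a heuristic: its in-neighbors may still be moving, so this trades a little accuracy for iteration time, as the description above suggests.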
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
1. Automatic Music Transcription for Polyphonic Music
CS599 Deep Learning Final Project
By the Keen Quartet team
Guided by Prof. Joseph Lim & Artem Molchanov
2. Project Overview
● Attempt to design a system that can transcribe music
● Musical piece characteristics:
○ Has multiple musical sources (many instruments)
○ Each instrument part is polyphonic (more than one note at a given time)
● Motivation:
○ To make it easy for music amateurs to learn to play an instrument
3. Approach
● Challenges:
○ Polyphonic music: multiple notes per time frame → exponential combinations → difficult learning
○ Multiple instruments and vocals → multiple models, each to transcribe a single instrument
● We address these challenges by incorporating:
○ Separation of the music piece into its sources
■ Current focus: separating vocals and background instruments only
○ Identifying the predominant instrument and transcribing each source accordingly
○ Currently we focus on transcription of piano music only
5. Source Separation
Goal
● Separate out the different musical sources
○ Sources: voice, various instruments, etc.
● Multiple instruments => a highly complex task!
○ Need labels for each source type
○ Tune the loss function
● Focus on separating two sources: vocals and instruments
● Input: a spectrogram of the mixed audio signal
● Output: 2 audio files, one for each separated source
● Dataset used: MIR-1K
Image source:
http://www.cs.northwestern.edu/~pardo/courses/eecs352/lectures/source%20separation.pdf
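The spectrogram input mentioned in this slide can be computed with a short-time Fourier transform. Here is a minimal NumPy sketch (the frame length, hop size, and Hann window are illustrative choices, not the project's actual settings):

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=1024, hop=256):
    """Frame the signal, window each frame, and take the magnitude of the
    real FFT: the |STFT| matrix that separation models typically consume."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Shape: (n_frames, frame_len // 2 + 1), one row per time frame.
    return np.abs(np.fft.rfft(frames, axis=1))
```

As a sanity check, the spectrogram of a pure 440 Hz sine wave peaks in the frequency bin nearest 440 Hz.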
6. Source Separation - Our Approach
● LSTM-based approach
● 2 dense layers: capture each source
● Masking layer:
○ Normalize the outputs of the dense layers
○ Mask out the other source from the mixed spectra
● Joint training:
○ Network parameters
○ Output of the masking layer
● Discriminative training:
○ Increase the difference between:
■ Predicted vocals and actual instruments
■ Predicted instruments and actual vocals
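The masking layer described above can be illustrated as a soft time-frequency mask. In this hypothetical NumPy sketch the LSTM and dense layers are stubbed out; we assume their outputs are already nonnegative magnitude estimates for the two sources:

```python
import numpy as np

def soft_mask_separate(mix_mag, vocal_est, inst_est, eps=1e-8):
    """Turn raw per-source magnitude estimates into masks that sum to 1
    in every time-frequency bin, then apply them to the mixture."""
    total = vocal_est + inst_est + eps
    vocal_mask = vocal_est / total   # normalized, each value in [0, 1]
    inst_mask = inst_est / total     # the two masks sum to ~1 per bin
    return vocal_mask * mix_mag, inst_mask * mix_mag
```

Because the masks sum to one per bin, the two separated spectrograms add back up to the mixture, which is exactly the constraint the normalization in the masking layer enforces.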
8. Predominant Instrument Identification
Goal
Identify the predominant instrument in the file obtained from source separation.
Why? Transcription is instrument-specific, so it is very important to know the instrument before attempting transcription.
Approach
● Train a CNN model on 6000 audio files to infer the patterns in a music file
● 11 categories of instruments for training
Input: .wav files obtained from the previous steps
Output: label of the predominant instrument
Dataset: IRMAS
9. Predominant Instrument Identification
Model
Results
● Initially very bad accuracy (15%)
○ Why? Too little training, and input images larger than usual (43 x 128)
● Improved to achieve ~60% accuracy
● Batch normalization, more epochs (150 epochs with early stopping)
10. Automatic Transcription for Polyphonic Music
Goal
Obtain a transcription (note representation) of the music.
Seems easy: a one-to-one mapping between notes and notations.
But it is not easy. Why?
● Polyphonic music: several notes playing at a given time
● Exponential combinations at a given time
● Multiple instruments: need a separate model for each instrument, as the loss function differs for each
Currently focusing on piano music.
The same approach works for any instrument:
● Given a good dataset
● Lots of training and a proper loss function
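One common way to sidestep the exponential-combinations problem (an illustrative framing, not necessarily the exact loss this team used) is to predict each of the 88 piano keys independently per frame and sum binary cross-entropies, so the model outputs 88 sigmoid probabilities instead of a softmax over 2^88 note combinations:

```python
import numpy as np

def framewise_note_loss(pred_probs, target_notes):
    """Multi-label binary cross-entropy over piano keys: each of the 88
    notes is an independent on/off prediction per time frame, avoiding
    a softmax over the 2**88 possible note combinations."""
    p = np.clip(pred_probs, 1e-7, 1 - 1e-7)  # avoid log(0)
    per_note = -(target_notes * np.log(p) + (1 - target_notes) * np.log(1 - p))
    return per_note.mean()
```

A confident, correct prediction yields a much smaller loss than an uncommitted one, which is what drives training toward the right note set per frame.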
11. Automatic Transcription for Polyphonic Music
Approach:
● Train a ConvNet model on polyphonic piano music
● Used the MAPS dataset:
○ 45 GB of audio files, around 60 hours of recordings
○ Processed about 6 million time frames
Approach 1:
● Use the whole dataset
○ Computationally intensive
○ Trained for 7 epochs before early stopping
Approach 2:
● Iterative training, using one category at a time
○ Trained for 63, 20, 7, 7, 7 epochs
We obtain the probability distribution of notes being played, and infer the notes being played by applying a threshold.
Result: ~96% accuracy
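The final thresholding step can be sketched as follows (the 0.5 threshold is an illustrative default, not the value used in the project):

```python
import numpy as np

def infer_notes(note_probs, threshold=0.5):
    """Convert per-frame note probabilities into a binary piano roll by
    keeping every note whose probability clears the threshold."""
    return (np.asarray(note_probs) >= threshold).astype(int)
```

Each row of the result is one time frame, with a 1 for every note judged to be sounding in that frame.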
12. Learning outcome
● Explored a domain completely new to us
● Beginners in deep learning
● Our pipeline had 3 different models, one for each step, all using a deep learning approach. This required an extensive literature survey for each of them, plus implementation and training effort. Each model is trained on a different dataset.
● Attempted to build on existing concepts in each part:
○ Source separation: LSTM, discriminative learning
○ Predominant instrument identification: batch normalization
○ Transcription: different training approaches for better generalization
13. Summary
● Our system is divided into three components:
Source separation → Predominant instrument identification → Transcription
● First attempt to transcribe polyphonic music for multiple instruments using deep learning techniques
● Future directions:
○ Extend source separation to multiple instruments
○ Make the transcription model more flexible
14. References
1. Huang, Po-Sen, et al. "Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks." ISMIR, 2014.
2. Chandna, Pritish, et al. "Monoaural audio source separation using deep convolutional neural networks." International Conference on Latent Variable Analysis and Signal Separation. Springer, Cham, 2017.
3. MIR-1K dataset: Chao-Ling Hsu, DeLiang Wang, Jyh-Shing Roger Jang, and Ke Hu. "A Tandem Algorithm for Singing Pitch Extraction and Voice Separation from Music Accompaniment." IEEE Trans. Audio, Speech, and Language Processing, 2011.
4. Han, Yoonchang, et al. "Deep convolutional neural networks for predominant instrument recognition in polyphonic music." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 25.1 (2017): 208-221.
5. IRMAS dataset: Bosch, J. J., Janer, J., Fuhrmann, F., & Herrera, P. "A Comparison of Sound Segregation Techniques for Predominant Instrument Recognition in Musical Audio Signals." Proc. ISMIR (pp. 559-564), 2012.
6. Sigtia, Siddharth, Emmanouil Benetos, and Simon Dixon. "An end-to-end neural network for polyphonic piano music transcription." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24.5 (2016): 927-939.
7. MAPS dataset: Emiya, V., Badeau, R., & David, B. "Multi-pitch estimation of piano sounds using a new probabilistic spectral smoothness principle." IEEE Transactions on Audio, Speech and Language Processing, 2010.