Automatic Music Transcription
for Polyphonic Music
CS599 Deep Learning Final Project
By Keen Quartet team
Guided by Prof. Joseph Lim
& Artem Molchanov
Project Overview
● Attempt to design a system that can transcribe music
● Musical piece characteristics:
○ It has multiple musical sources (many instruments, vocals)
○ Each instrument's part is polyphonic (more than one note at a given time)
● Motivation:
○ Make it easy for music amateurs to learn to play an instrument
Approach
Challenges
○ Polyphonic music: multiple notes per time frame → exponential combinations → difficult learning
○ Multiple instruments and vocals → a separate model is needed to transcribe each instrument
● We address these challenges by:
○ Separating the music piece into its sources
■ Current focus: separating vocals and background instruments only
○ Identifying the predominant instrument and transcribing each source accordingly
○ Currently we focus on transcription of piano music only
Our Project Pipeline
[Pipeline diagram] Input music file → Source Separation → multiple files after source separation (e.g. voice, piano) → Predominant Instrument Identification → Transcription → notes for each source
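As a rough illustration of how the three stages fit together, here is a minimal sketch; separate_sources, identify_instrument, and transcribe are hypothetical wrappers around the three models described on the following slides, not actual project code.

```python
# Hypothetical glue code for the three-stage pipeline.
def transcribe_song(mixed_wav_path):
    # Stage 1: split the mixture into stems (currently voice vs. instruments)
    stems = separate_sources(mixed_wav_path)        # e.g. {"voice": ..., "instrument": ...}

    results = {}
    for name, stem_audio in stems.items():
        # Stage 2: decide which transcription model should handle this stem
        instrument = identify_instrument(stem_audio)  # e.g. "piano"
        # Stage 3: run the instrument-specific transcriber (currently piano only)
        results[name] = transcribe(stem_audio, instrument=instrument)
    return results
```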
Source Separation
Goal
● Separate out different musical sources
○ Sources: voice, various instruments, etc.
● Multiple instruments => a highly complex task!
○ Needs labels for each source type
○ The loss function must be tuned
● We focus on separating two sources: vocals and instruments
● Input: a spectrogram of the mixed audio signal
● Output: two audio files, one per source
● Dataset used: MIR-1K
Image source:
http://www.cs.northwestern.edu/~pardo/courses/eecs352/lectures/source%20separation.pdf
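For illustration, the input spectrogram fed to the separation network can be computed as below; the STFT parameters are illustrative assumptions, not our exact settings (MIR-1K audio is sampled at 16 kHz).

```python
import numpy as np
import librosa

# Build the magnitude spectrogram of the mixed signal (network input).
y, sr = librosa.load("mixture.wav", sr=16000, mono=True)
stft = librosa.stft(y, n_fft=1024, hop_length=256)
magnitude = np.abs(stft)    # (freq_bins, frames) magnitude spectrogram
phase = np.angle(stft)      # kept aside so each masked magnitude can be
                            # recombined with the phase and inverted via librosa.istft
```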
Source Separation - Our Approach
● LSTM-based approach
● Two dense layers: one to capture each source
● Masking layer:
○ Normalizes the outputs of the dense layers
○ Masks out the other source from the mixed spectrogram
● Joint training of:
○ Network parameters
○ Output of the masking layer
● Discriminative training:
○ Increase the difference between:
■ Predicted vocals and actual instruments
■ Predicted instruments and actual vocals
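A minimal Keras sketch of this network, in the spirit of Huang et al. [1]; the layer sizes, the 513-bin input, and the discriminative weight gamma are illustrative assumptions rather than our exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

n_bins = 513  # frequency bins of the magnitude spectrogram (assumption)

mix = layers.Input(shape=(None, n_bins), name="mixture")      # (time, freq)
h = layers.LSTM(256, return_sequences=True)(mix)
h = layers.LSTM(256, return_sequences=True)(h)

voice_hat = layers.Dense(n_bins, activation="relu")(h)         # one dense head per source
inst_hat = layers.Dense(n_bins, activation="relu")(h)

# Masking layer: normalize the two estimates into soft masks and apply them
# to the mixed spectrogram, so the two outputs always sum to the mixture.
def soft_mask(t):
    mix_in, v, i = t
    total = v + i + 1e-8
    return tf.concat([mix_in * v / total, mix_in * i / total], axis=-1)

out = layers.Lambda(soft_mask, name="masked_sources")([mix, voice_hat, inst_hat])

def discriminative_loss(y_true, y_pred, gamma=0.05):
    # Targets and predictions pack both sources along the last axis: [voice | instrument].
    v_t, i_t = tf.split(y_true, 2, axis=-1)
    v_p, i_p = tf.split(y_pred, 2, axis=-1)
    se = lambda a, b: tf.reduce_mean(tf.square(a - b))
    # Penalize reconstruction error, and push each prediction away from the other source.
    return se(v_p, v_t) + se(i_p, i_t) - gamma * (se(v_p, i_t) + se(i_p, v_t))

model = Model(mix, out)
model.compile(optimizer="adam", loss=discriminative_loss)
```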
Source Separation: Results
[Spectrogram visuals comparing the original mixture with the separated voice and music-only outputs, showing the effectiveness of our model]
Predominant Instrument Identification
Goal
Identify the predominant instrument in each file obtained from source separation
Why? Transcription is instrument-specific, so it is very important to identify the instrument before transcribing.
Approach
● Train a CNN model on 6,000 audio files to learn the patterns in the music
● 11 categories of instruments for training
Input: .wav files obtained from the previous step
Output: Label of the predominant instrument
Dataset: IRMAS
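A minimal sketch of such a CNN classifier in Keras; the filter counts and dense-layer size are illustrative assumptions, while the 43 × 128 spectrogram input and the 11 output classes follow the slides.

```python
from tensorflow.keras import layers, models

def build_instrument_cnn(n_classes=11, input_shape=(43, 128, 1)):
    m = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),          # batch normalization helped the accuracy jump
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),   # predominant-instrument label
    ])
    m.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return m
```

In training, the 150-epoch run with early stopping mentioned in the results would be expressed with a keras.callbacks.EarlyStopping callback on the validation loss.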
Predominant Instrument Identification
Model: [CNN architecture diagram]
Results
● Initially, accuracy was very poor (15%)
○ Why? Too little training and larger-than-usual input images (43 × 128)
● Improved to ~60% accuracy with batch normalization and more training (150 epochs with early stopping)
Automatic Transcription for Polyphonic Music
Goal
Obtain transcription (note representation) of music
Seems easy: a one-to-one mapping between notes and notation
But it is not easy. Why?
● Polyphonic music: multiple notes playing at a given time
● Exponentially many note combinations per time frame (see the sketch below)
● Multiple instruments: a separate model is needed per instrument, as the loss function differs for each
Currently, we focus on piano music
The same approach works for any instrument
● Given a good dataset
● With sufficient training and a proper loss function
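To make the "exponential combinations" point concrete: a piano has 88 keys, so the number of possible on/off note patterns in a single time frame is 2^88, far too many to treat each combination as its own class. This is why the transcriber predicts each note independently (see the sketch on the next slide).

```python
# With 88 piano keys, each frame has 2**88 possible note combinations,
# so transcription is framed per note rather than per combination.
n_keys = 88
print(2 ** n_keys)   # ≈ 3.1 × 10^26 on/off patterns per frame
```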
Automatic Transcription for Polyphonic Music
Approach:
● Train a ConvNet model on polyphonic piano music
● Used the MAPS dataset:
○ 45 GB of audio files, around 60 hours of recordings
○ Processed about 6 million time frames
Approach 1:
● Use the whole dataset
○ Computationally intensive
○ Trained for 7 epochs before early stopping
Approach 2:
● Iterative training, using one category at a time
○ Trained for 63, 20, 7, 7, and 7 epochs
We obtain a probability distribution over the notes being played and infer the notes by applying a threshold.
Result: ~96% accuracy
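A minimal Keras sketch of the frame-level transcriber and the thresholding step, in the spirit of Sigtia et al. [6]; the 7-frame context window, 229 frequency bins, and filter counts are illustrative assumptions rather than our exact configuration.

```python
import numpy as np
from tensorflow.keras import layers, models

N_NOTES = 88  # piano pitches

def build_transcriber(input_shape=(7, 229, 1)):   # short context window of a log-spectrogram
    m = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((1, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((1, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        # Multi-label output: one independent sigmoid per note, so any combination
        # of simultaneous notes can be represented without enumerating combinations.
        layers.Dense(N_NOTES, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])
    return m

def infer_notes(prob_frames, threshold=0.5):
    # prob_frames: (n_frames, 88) per-note probabilities; keep notes above the threshold.
    return prob_frames >= threshold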
Learning outcome
● Explored a domain that was completely new to us
● We were beginners in deep learning
● Our pipeline had three different models, one for each step, all using a deep learning approach. This required an extensive literature survey, implementation, and training effort for each of them. Each model is trained on a different dataset.
● Attempted to build on existing concepts in each part:
○ Source separation: LSTM, discriminative learning
○ Predominant instrument identification: batch normalization
○ Transcription: different training approaches for better generalization
Summary
● Our system is divided into three components: source separation → predominant instrument identification → transcription
● A first attempt to transcribe polyphonic music with multiple instruments using deep learning techniques
● Future directions:
○ Extend source separation to multiple instruments
○ Make the transcription model more flexible
References
1. Huang, Po-Sen, et al. “Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural
Networks.” ISMIR. 2014.
2. Chandna, Pritish, et al. “Monoaural audio source separation using deep convolutional neural networks.”
International Conference on Latent Variable Analysis and Signal Separation. Springer, Cham, 2017.
3. MIR-1K dataset: Chao-Ling Hsu, DeLiang Wang, Jyh-Shing Roger Jang, and Ke Hu. “A Tandem Algorithm for Singing Pitch Extraction and Voice Separation from Music Accompaniment.” IEEE Trans. Audio, Speech, and Language Processing, 2011.
4. Han, Yoonchang, et al. “Deep convolutional neural networks for predominant instrument recognition in
polyphonic music.” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 25.1 (2017):
208-221.
5. IRMAS Dataset: Bosch, J. J., Janer, J., Fuhrmann, F., & Herrera, P. “A Comparison of Sound Segregation Techniques
for Predominant Instrument Recognition in Musical Audio Signals”, in Proc. ISMIR (pp. 559-564), 2012.
6. Sigtia, Siddharth, Emmanouil Benetos, and Simon Dixon. “An end-to-end neural network for polyphonic piano
music transcription.” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24.5 (2016):
927-939.
7. MAPS Dataset: Emiya, V., Badeau, R., and David, B. “Multi-pitch estimation of piano sounds using a new probabilistic spectral smoothness principle.” IEEE Transactions on Audio, Speech, and Language Processing, 2010.
Thank You...