Automatic Music
Transcription :
Overview
01/26/2016
Cho Won Ik
• Introduction : What is transcription?
• Goal of automatic music transcription
• Early days in AMT research
• Current research areas on AMT
• Multi-pitch analysis
• Semi-automatic (informed) transcription
• Complete music notation
• Challenge
• Future works
Contents
• What is transcription?
• Notating a piece or a sound which was previously unnotated
• Usually hand-written in the past; digitally notated nowadays
• Why is it necessary?
• Information retrieval from blind source
• e.g. traditional music, improvised performances, pieces whose scores are unreleased …
• Objective musical performance measurement
• Application to systematic/computational musicology
Introduction
• Example of transcription software
• Mostly pitch estimation
Introduction
• What is required in music transcription?
• Pitch, onset time, duration (frequency-temporal analysis)
• Loudness (amplitude)
• Instrumentation (waveform, after source separation)
• High-level features
• Melody tracking (often among same instrument)
• Rhythmic information : tempo and beat
• Harmonic data : key and chord
Introduction
“Can a machine transcribe music just as a (trained) human does?”
• Melograph [Metfessel, 1928]
• Special-purpose hardware device that graphs the pitch of the input waveform over time
Early days in AMT research
• Segmentation and analysis of continuous musical sound
by digital computer [Moorer, 1975]
• First paper to discuss automatic transcription from a signal
processing viewpoint (especially filter theory)
• An optimum comb method is used to detect F0 (a comb-style harmonic-sum sketch follows below)
Early days in AMT research
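A minimal sketch of comb-style F0 detection in this spirit: for each candidate F0, sum the magnitude spectrum at its harmonic positions (weighted by 1/k to discourage subharmonic/octave errors) and pick the best candidate. This is a plain harmonic-sum illustration, not Moorer's exact optimum-comb filter; the 1 Hz candidate grid and harmonic count are arbitrary choices.

import numpy as np

def comb_f0_estimate(frame, sr, f0_min=50.0, f0_max=1000.0, n_harmonics=10):
    # Harmonic-sum sketch: score each candidate F0 by the (1/k-weighted) spectral
    # energy collected at its first harmonics, and return the best candidate.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    bin_width = freqs[1] - freqs[0]

    candidates = np.arange(f0_min, f0_max, 1.0)   # 1 Hz grid of candidate F0s
    scores = []
    for f0 in candidates:
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics < freqs[-1]]
        bins = np.round(harmonics / bin_width).astype(int)
        weights = 1.0 / np.arange(1, len(bins) + 1)   # de-emphasize high harmonics
        scores.append((spectrum[bins] * weights).sum())
    return candidates[int(np.argmax(scores))]

# Toy usage: a 220 Hz tone built from its first five partials.
sr = 16000
t = np.arange(2048) / sr
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
print(comb_f0_estimate(tone, sr))   # expected to land close to 220 Hz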
• Blackboard system [Martin, 1996]
• Various forms of knowledge integrated for specific purpose
• human physiology, acoustics, musical practice etc.
• Blackboard workspace is arranged in a
hierarchy of five hypothesis levels that become
more abstract going upward
• Tracks, Partials, Notes, Intervals, Chords
Early days in AMT research
• Blackboard system (cont’d)
• Input
• Discretized version of the information
in the spectrogram representations
• Output
• Textual representation of
the detected notes
• Graphical display of the
note onset data
Early days in AMT research
• Connectionist approach [Marolt, 2004]
• Resembles human perception of pitch
• Auditory-model based partial tracking
• Networks of adaptive oscillators inspired by the hair cells of the cochlea
• Note recognition based on a neural network
Early days in AMT research
• Current research areas
• Multi-pitch analysis
• Frame-level
• Note-level
• Timbre tracking
• Semi-automatic (informed) transcription
• Complete music notation
Research areas on AMT
• Core problem in automatic music transcription
• Most studies deal with western classical piano pieces
• Due to clarity, polyphony, plentiful DB
• Multi-pitch analysis is difficult even for humans
• Overlapping partials
Multi-pitch analysis
• Octave ambiguity
• Ambiguity in estimation of the number of sources
• Obscurity from instrumentation
Multi-pitch analysis
• Frame-level analysis
• Estimate pitches and polyphony in each frame
• Feature-based analysis
• Statistical model-based analysis
• Spectrogram decomposition-based analysis
• Note-level analysis
• Estimate pitch, onset & offset of notes
• Minimum duration pruning
• Hidden Markov model
• Efficient convolutional sparse coding
Multi-pitch analysis
• Feature-based analysis
• Pitch of complex tone : fundamental frequency (F0)
• Partials/Overtones
• Harmonics
• Harmonic instrument
• String, winds, piano etc.
• Differences in the produced overtones
cause diversity in timbre (spectral envelope)
Frame-level analysis
f = 440 Hz (n = 1) : fundamental tone / 1st harmonic / 1st partial
f = 880 Hz (n = 2) : 1st overtone / 2nd harmonic / 2nd partial
f = 1320 Hz (n = 3) : 2nd overtone / 3rd harmonic / 3rd partial
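A small sketch of the point about spectral envelopes: two synthetic tones with the same F0 but different (made-up) partial amplitudes share a pitch yet differ in timbre.

import numpy as np

sr, f0, dur = 16000, 440.0, 0.5
t = np.arange(int(sr * dur)) / sr

def harmonic_tone(partial_amps):
    # Sum of sinusoids at integer multiples of f0; partial_amps sets the spectral envelope.
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(partial_amps))

tone_a = harmonic_tone([1.0, 0.5, 0.25, 0.12])   # illustrative envelope A
tone_b = harmonic_tone([1.0, 0.1, 0.40, 0.05])   # illustrative envelope B
# Same F0 (same pitch), different partial amplitudes (different timbre).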
• Multiple-F0 estimation based on polyphony inference [Yeh, 2008]
• Goal : extract multiple-F0 from STFT frame of harmonic instrument
• Noise model / Source model / Source interaction model
• The noise model distinguishes components unnecessary for harmonic analysis
• Non-harmonically related F0s (NHRF0s)
• Reduced computation for proper F0 candidate selection
• Hypothetical partial sequence (HPS)
Frame-level analysis
Extraction of an HRF0 (F0c) from
the HPS of an NHRF0 (F0a)
• Multiple-F0 estimation based on polyphony inference (cont’d)
• Source model : Quasi-periodic
• Partial frequencies and amplitudes of hypothetical sources are estimated
• Source interaction model : Guiding principles for generative signal model
• Harmonicity
• Smoothness of spectral envelopes
• Synchronous amplitude evolution of partials
• A scoring function for joint evaluation is proposed
• Criteria : Harmonicity/Mean bandwidth/Spectral centroid/Synchronicity
• A smaller weighted sum indicates a better score (weights p_i determined experimentally)
Frame-level analysis
S = p1 ∙ HAR + p2 ∙ MBW + p3 ∙ SPC + p4 ∙ SYNC
• Multiple-F0 estimation based on polyphony inference (cont’d)
• Polyphony is inferred based on the assumption that the combination with the
correct number of F0s is expected to give the best score (a schematic sketch follows below)
Frame-level analysis
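A schematic sketch of the joint-scoring and polyphony-inference idea. The criterion functions and weights below are toy stand-ins, not Yeh's actual HAR/MBW/SPC/SYNC definitions; only the structure follows the slides: score every candidate F0 combination with a weighted sum (smaller is better) and read the polyphony off the best-scoring combination.

import itertools

def joint_score(combo, criteria, weights):
    # Weighted sum S = p1*HAR + p2*MBW + p3*SPC + p4*SYNC; smaller is better.
    return sum(p * crit(combo) for p, crit in zip(weights, criteria))

def infer_polyphony(candidates, criteria, weights, max_polyphony=4):
    # Score every combination of candidate F0s up to max_polyphony and keep the
    # lowest-scoring one; its size is the inferred polyphony.
    best_combo, best_score = (), float("inf")
    for n in range(1, max_polyphony + 1):
        for combo in itertools.combinations(candidates, n):
            s = joint_score(combo, criteria, weights)
            if s < best_score:
                best_combo, best_score = combo, s
    return best_combo, best_score

# Toy usage with made-up criteria: prefer combinations whose harmonics cover a
# fixed set of observed peak frequencies, with a small penalty per extra F0.
observed_peaks = {220.0, 330.0, 440.0, 660.0, 880.0, 990.0}

def coverage(combo):
    covered = {p for p in observed_peaks
               if any(abs(p - k * f0) < 1.0 for f0 in combo for k in range(1, 6))}
    return 1.0 - len(covered) / len(observed_peaks)    # smaller = better

def size_penalty(combo):
    return 0.05 * len(combo)                           # discourage spurious F0s

combo, score = infer_polyphony([110.0, 220.0, 330.0, 440.0],
                               criteria=[coverage, size_penalty],
                               weights=[1.0, 1.0])
print(combo, score)   # (220.0, 330.0) is expected to score best here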
• Statistical model-based analysis
• Multi-pitch estimation using a
new probabilistic spectral
smoothness principle [Emiya, 2010]
• Given an observed frame X and a set
𝑪 of all possible fundamental
frequency combinations, the multi-pitch
estimate Ĉ is obtained as a
maximum a posteriori (MAP) decision:
Frame-level analysis
𝐶𝐶̂ = 𝑃𝑃 𝐶𝐶 𝑋𝑋𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎
𝐶𝐶 ∈ 𝑪𝑪
=
𝑃𝑃 𝐶𝐶 𝑋𝑋 𝑃𝑃(𝐶𝐶)
𝑃𝑃(𝑋𝑋)
𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎
𝐶𝐶 ∈ 𝑪𝑪
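In code the MAP rule amounts to scoring each candidate combination by likelihood times prior and taking the argmax (P(X) is constant over C and can be ignored). The model functions below are placeholders; Emiya's actual spectral-smoothness likelihood and prior are not reproduced here.

import numpy as np

def map_estimate(frame, candidate_combos, log_likelihood, log_prior):
    # C_hat = argmax_C log P(X|C) + log P(C); the evidence P(X) is dropped.
    scores = [log_likelihood(frame, C) + log_prior(C) for C in candidate_combos]
    return candidate_combos[int(np.argmax(scores))]

# Toy usage with made-up models: a likelihood that happens to favour two-note combinations.
combos = [(440.0,), (550.0,), (440.0, 550.0)]
best = map_estimate(None, combos,
                    log_likelihood=lambda x, C: -abs(len(C) - 2),   # toy likelihood
                    log_prior=lambda C: 0.0)                        # flat prior
print(best)   # (440.0, 550.0)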
• Spectrogram decomposition-based analysis
• Nonnegative matrix factorization [Smaragdis, 2003]
• The NMF model decomposes an input spectrogram X with K frequency
bins and N frames into X ≈ WH
• For a number of pitch bases R ≪ N, K: W contains the spectral bases
for each of the R pitch components, and H is the pitch activity matrix
across time (see the sketch below)
Frame-level analysis
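A minimal NMF sketch with scikit-learn on a placeholder magnitude spectrogram (in practice X would be the magnitude STFT of the recording, e.g. via librosa.stft); W holds the R spectral bases and H the pitch activations, matching X ≈ WH.

import numpy as np
from sklearn.decomposition import NMF

K, N, R = 513, 200, 8                        # frequency bins, frames, pitch bases
X = np.random.default_rng(0).random((K, N))  # placeholder for a real |STFT|

# KL divergence with multiplicative updates is a common choice for audio NMF.
model = NMF(n_components=R, beta_loss="kullback-leibler", solver="mu",
            init="random", max_iter=300, random_state=0)
W = model.fit_transform(X)    # (K, R): spectral basis per pitch component
H = model.components_         # (R, N): activation of each basis over time
print(W.shape, H.shape)       # X is approximated by W @ H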
• The time-pitch representation must be further processed to
detect note events with
• Discrete pitch value
• An onset time and offset time (duration)
• Minimum duration pruning [Dessein, 2010]
• Simple and fast solution
• Applied after thresholding
• Note events which have a
duration smaller than a
pre-defined value are removed from the final score
Note-level analysis
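A minimal sketch of minimum duration pruning, assuming note events are (pitch, onset, offset) tuples obtained after thresholding the activation matrix; the 50 ms threshold is illustrative.

def prune_short_notes(notes, min_duration=0.05):
    # Drop note events shorter than min_duration (seconds).
    return [(p, on, off) for (p, on, off) in notes if off - on >= min_duration]

notes = [(60, 0.00, 0.40), (62, 0.40, 0.43), (64, 0.50, 1.10)]
print(prune_short_notes(notes))   # the 30 ms note (pitch 62) is removed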
• Hidden Markov models [Ryynanen, 2005]
• Model each note with a 3-state note event HMM
• 3 states : attack, sustain, noise states of each sound
• A musicological model is used to estimate the musical key and note
transition probabilities
• Observation :
• Pitch deviation
• Pitch salience
• Onset strength
• Model silence with a 1-state
silence HMM
Note-level analysis
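A compact Viterbi sketch for a single 3-state note event HMM (attack / sustain / noise). All probabilities below are invented for illustration; Ryynanen and Klapuri learn them from data and combine the note HMMs with the musicological model and a silence model.

import numpy as np

# States: 0 = attack, 1 = sustain, 2 = noise. Numbers are illustrative only.
log_A = np.log(np.array([[0.60, 0.35, 0.05],    # state transition probabilities
                         [0.00, 0.90, 0.10],
                         [0.00, 0.00, 1.00]]) + 1e-12)
log_pi = np.log(np.array([0.9, 0.1, 1e-6]))     # notes are entered via the attack state

def viterbi(log_B, log_A, log_pi):
    # log_B: (T, S) per-frame log observation likelihoods, e.g. computed from
    # pitch deviation / pitch salience / onset strength features.
    T, S = log_B.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A    # (S, S): previous x current state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy observation likelihoods over 6 frames favouring attack -> sustain -> noise.
log_B = np.log(np.array([[0.8, 0.1, 0.1],
                         [0.6, 0.3, 0.1],
                         [0.1, 0.8, 0.1],
                         [0.1, 0.8, 0.1],
                         [0.1, 0.3, 0.6],
                         [0.1, 0.1, 0.8]]))
print(viterbi(log_B, log_A, log_pi))   # [0, 0, 1, 1, 2, 2]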
• Efficient convolutional sparse coding [Wohlberg, 2014]
• Tracks notes directly from audio
s[t] : monaural, polyphonic audio recording of a piano piece
d_m[t] : dictionary element representing the notes of the piano
x_m[t] : activation vectors
• A nonzero value at index t of x_m[t] represents activation of note
m at sample t
Note-level analysis
s[t] ≅ Σ_m d_m[t] ∗ x_m[t]
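A numpy sketch of the convolutional mixture model above: the signal is approximated by the sum, over notes, of each note template convolved with its sparse activation vector. The templates and activations here are placeholders, and the actual convolutional sparse coding step (solving for x_m under a sparsity penalty) is not shown.

import numpy as np

rng = np.random.default_rng(0)
T, M, L = 4000, 3, 400    # signal length, number of note templates, template length

# Placeholder note templates d_m[t]; in practice these come from recorded piano notes.
templates = [rng.standard_normal(L) * np.hanning(L) for _ in range(M)]

# Sparse activations x_m[t]: a nonzero value at sample t triggers note m there.
activations = np.zeros((M, T))
activations[0, 100] = 1.0
activations[1, 100] = 0.7     # a two-note chord starting at the same sample
activations[2, 2000] = 1.0

# s[t] ~ sum over m of d_m[t] convolved with x_m[t], truncated to the signal length.
s = sum(np.convolve(activations[m], templates[m])[:T] for m in range(M))
print(s.shape)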
• An area closely related to the source separation problem
• Also known as multi-pitch streaming
• Supervised
• Train timbre models of sound sources
• Apply timbre models during pitch estimation
• Classify estimated pitches/notes
• Supervised with timbre adaptation
• Adapt trained timbre models to sources in mixture
• Unsupervised
• Cluster pitch estimates according to timbre
• Includes problem of percussive instrument separation
• Spectrogram decomposition is still useful
Timbre tracking
• Spectrogram decomposition-based analysis
• Probabilistic latent component analysis [Smaragdis, 2007]
• For an N-dimensional random variable x and latent variable z, the model is
P(x) = Σ_z P(z) Π_j P(x_j|z)
• Estimation of the marginals P(x_j|z) is performed using the EM algorithm
• In source separation, the magnitude spectrogram is expressed as
P(f, t) ≈ Σ_z P(z) P(f|z) P(t|z), so that the decomposition results in two sets of marginals
Timbre tracking
• Probabilistic latent component analysis (cont’d)
• P(f|z) = P1(f|z) ∪ P2(f|z)
• P1(f|z) and P2(f|z) are the known frequency marginals of the two sources
• For P(f|z) to explain the mixture spectrogram
P(f, t), we only need to estimate P(t|z)
• P(t|z) is split into two sets which
correspond to each source
• Reconstruction of the parts of the input spectrogram
that correspond to only one source
Timbre tracking
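A minimal EM sketch for the PLCA use described above, assuming the frequency marginals P(f|z) are already known (e.g. trained per source in isolation), so only P(t|z) and P(z) are estimated from the mixture spectrogram.

import numpy as np

def plca_time_marginals(V, Pf_given_z, n_iter=100):
    # V          : (F, T) nonnegative magnitude spectrogram of the mixture
    # Pf_given_z : (F, Z) fixed frequency marginals, columns summing to 1
    F, T = V.shape
    Z = Pf_given_z.shape[1]
    rng = np.random.default_rng(0)
    Pt_given_z = rng.random((T, Z))
    Pt_given_z /= Pt_given_z.sum(axis=0)
    Pz = np.full(Z, 1.0 / Z)

    for _ in range(n_iter):
        # E-step: posterior P(z|f,t) proportional to P(z) P(f|z) P(t|z)
        joint = Pz[None, None, :] * Pf_given_z[:, None, :] * Pt_given_z[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)    # (F, T, Z)
        # M-step: reweight the posterior by the observed spectrogram
        weighted = V[:, :, None] * post                              # (F, T, Z)
        Pt_given_z = weighted.sum(axis=0)                            # (T, Z)
        Pz = Pt_given_z.sum(axis=0)
        Pt_given_z /= Pt_given_z.sum(axis=0, keepdims=True) + 1e-12
        Pz /= Pz.sum()
    return Pt_given_z, Pz

# Toy usage with random data; components can then be grouped by which source's
# frequency marginals they came from, and each source reconstructed from its group.
V = np.random.default_rng(1).random((257, 100))
Pf = np.random.default_rng(2).random((257, 4))
Pf /= Pf.sum(axis=0)
Pt, Pz = plca_time_marginals(V, Pf, n_iter=50)
print(Pt.shape, Pz)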
• Current state-of-the-art AMT systems do not reach the same
level of accuracy as transcriptions made by human experts
• Humans can assist with aspects of the computational transcription process that
are crucial for an accurate transcription but difficult to model
algorithmically
• Instrument identification
• Auditory stream segregation
• Not applicable to the analysis of large music databases
• Useful for more detailed and accurate transcription of music
Semi-automatic transcription
• Current AMT systems can
• Detect (multiple) pitches,
onsets, offsets
• Identify instruments and
track notes in polyphony
• Identify articulation and
rhythm information
• Analyzed data need to be
translated into musical form
• Score form / MIDI form
• Fingering / string detection
• Direct mapping to software tools
Complete music notation
• MIREX (MIR Evaluation eXchange)
• Multiple F0 estimation & tracking
• Performance measure
• Precision (the proportion of correctly retrieved pitches among all pitches
retrieved for each frame)
• Recall (the ratio of correct pitches to all ground truth pitches for each
frame)
• Audio onset detection
• Performance measure
• Precision / Recall / F-measure / Scoring for doubled onset
• Time precision (tolerance from ±50 ms down to smaller values)
• Separate scoring for different instrument types
• Singing voice separation
• Performance measure
• SDR / SIR (Source to interference ratio) / SAR (Source to artifacts ratio)
Challenge
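A small sketch of the frame-level precision / recall / F-measure computation for multiple-F0 estimation (MIREX additionally reports accuracy and chroma-level variants, not shown here).

def frame_metrics(reference, estimated):
    # reference, estimated: lists of sets of active pitches, one set per frame.
    tp = sum(len(r & e) for r, e in zip(reference, estimated))
    n_est = sum(len(e) for e in estimated)
    n_ref = sum(len(r) for r in reference)
    precision = tp / n_est if n_est else 0.0
    recall = tp / n_ref if n_ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

ref = [{60, 64}, {60, 64, 67}, {60}]
est = [{60}, {60, 64, 67, 70}, {60, 62}]
print(frame_metrics(ref, est))   # approximately (0.714, 0.833, 0.769)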
• Apply ideas of AED/source separation in AMT
• Instrument identification and timbre tracking are still difficult
• AED can be used to identify onsets and offsets of instruments
• Source separation can be applied to decompose polyphonic
music into a set of monophonic streams
• Instrument presence information from AED can be useful
Future works
• Z. Duan and E. Benetos, “Tutorial : Automatic music transcription,” 16th International
Society for Music Information Retrieval Conference, 2015.
• E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri, “Automatic music
transcription: challenges and future directions,” Journal of Intelligent Information
Systems, Vol. 41, No. 3, pp. 407-434, 2013.
• J. A. Moorer, “On the segmentation and analysis of continuous musical sound by digital
computer,” PhD thesis, Stanford University, 1975.
• K. D. Martin, “A blackboard system for automatic transcription of simple polyphonic
music,” Massachusetts Institute of Technology Media Laboratory Perceptual Computing
Section Technical Report, No. 385, 1996.
• M. Marolt, “A Connectionist Approach to Automatic Transcription of Polyphonic Piano
Music,” IEEE Transactions on Multimedia, Vol. 6, No. 3, Jun. 2004.
• C. Yeh, “Multiple fundamental frequency estimation of polyphonic recordings,” PhD
thesis, Université Paris VI (Pierre et Marie Curie), 2008.
• V. Emiya, R. Badeau, and B. David, “Multipitch estimation of piano sounds using a
new probabilistic spectral smoothness principle,” IEEE Transactions on Audio,
Speech, and Language Processing, Vol. 18, No. 6, pp. 1643-1654, Aug. 2010.
Reference
• P. Smaragdis and J. C. Brown, “Non-negative matrix factorization for polyphonic music
transcription,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
NY, USA, Oct. 2003.
• A. Dessein, A. Cont, and G. Lemaitre, “Real-time polyphonic music transcription with
non-negative matrix factorization and beta-divergence,” In Proceedings of the 11th
International Society for Music Information Retrieval Conference, pp. 489-494, 2010.
• M. P. Ryynanen and A. Klapuri, “Polyphonic music transcription using note event
modeling,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
NY, USA, Oct. 2005.
• A. Cogliati, Z. Duan, and B. Wohlberg, “Piano music transcription with fast convolutional
sparse coding,” IEEE Workshop on Machine Learning for Signal Processing, Boston, USA,
Sep. 2015.
• P. Smaragdis, B. Raj, and M. Shashanka, “Supervised and semi-supervised
separation of sounds from single-channel mixtures,” In Proceedings of the 7th International
Conference on Independent Component Analysis and Signal Separation, pp. 414-421,
2007.
Reference
Thank You!!!
