This document proposes a melody extraction method using multi-column deep neural networks (MCDNNs). The key points are:
1. An MCDNN architecture is used in which each column classifies a frame's pitch at a different resolution (e.g. 1 semitone, 0.5 semitone). Combining the columns' outputs improves both pitch accuracy and pitch resolution.
2. Training data are augmented by pitch shifting. A singing voice detector identifies vocal segments, so melody pitch is estimated only on frames that contain the voice.
3. Hidden Markov models provide temporal smoothing of MCDNN outputs.
4. Evaluation on various datasets shows the MCDNN approach outperforms state-of-the-art methods for melody extraction.
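The summary does not say how the multi-resolution column outputs are combined. The sketch below shows one plausible scheme; treat it as an assumption, not the paper's confirmed rule. Each column's softmax posterior is upsampled onto the finest pitch grid by copying each coarse class's probability to the fine bins it covers. The columns are then fused with a product rule and renormalised. The function name `combine_columns` and the `factors` parameter are hypothetical.

```python
import numpy as np

def combine_columns(posteriors, factors):
    """Fuse per-column pitch posteriors on the finest pitch grid (assumed product rule).

    posteriors: list of 1-D arrays, one per column, each summing to 1.
    factors: number of finest-grid bins covered by one class of that column
             (e.g. 2 for a 1-semitone column when the finest grid is 0.5 semitone).
    """
    n_fine = len(posteriors[0]) * factors[0]
    combined = np.ones(n_fine)
    for post, factor in zip(posteriors, factors):
        if len(post) * factor != n_fine:
            raise ValueError("all columns must cover the same pitch range")
        # A coarse class spans `factor` consecutive fine bins: copy its probability to each.
        combined *= np.repeat(post, factor)
    # Renormalise so the fused scores form a probability distribution.
    return combined / combined.sum()

# Hypothetical two-column example over a 2-semitone range.
coarse = np.array([0.2, 0.8])           # 1-semitone column, 2 classes
fine = np.array([0.1, 0.1, 0.3, 0.5])   # 0.5-semitone column, 4 classes
fused = combine_columns([coarse, fine], [2, 1])
print(int(np.argmax(fused)))            # → 3
```

The coarse column does not locate the pitch precisely, but it does rule out regions of the finer grid. Multiplying the posteriors lets each column veto the others' unlikely bins. Averaging them instead would be a softer alternative.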
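Pitch-shift augmentation only produces valid training pairs if the pitch labels move together with the audio. Shifting by `n` semitones multiplies every voiced f0 by 2^(n/12), which moves the class index by `n / resolution`. The minimal sketch below covers only this label side. It assumes f0 annotations in Hz with 0 marking unvoiced frames. The audio itself could be shifted with a tool such as `librosa.effects.pitch_shift`. The helper names and the `fmin_hz` reference pitch are hypothetical.

```python
import numpy as np

def shift_f0_labels(f0_hz, n_semitones):
    """Shift a frame-wise f0 annotation by n_semitones; 0 Hz (unvoiced) stays 0."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    ratio = 2.0 ** (n_semitones / 12.0)
    return np.where(f0_hz > 0, f0_hz * ratio, 0.0)

def f0_to_class(f0_hz, fmin_hz, resolution_semitones):
    """Quantise voiced f0 to pitch-class indices above fmin_hz; unvoiced frames get -1.

    Frames below fmin_hz would get negative indices; no clamping is done here.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    classes = np.full(f0_hz.shape, -1, dtype=int)
    voiced = f0_hz > 0
    semitones = 12.0 * np.log2(f0_hz[voiced] / fmin_hz)
    classes[voiced] = np.round(semitones / resolution_semitones).astype(int)
    return classes

f0 = np.array([0.0, 220.0, 440.0])             # Hz; 0 = unvoiced frame
shifted = shift_f0_labels(f0, 12)              # one octave up
print(f0_to_class(shifted, 220.0, 1.0))        # → [-1 12 24]
```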
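HMM smoothing can be sketched as Viterbi decoding over pitch-class states. In this sketch the frame-wise network posteriors serve directly as emission scores, and a transition matrix penalises large pitch jumps. The Gaussian-shaped transition model, its `width` parameter, the uniform initial distribution, and the omission of an unvoiced state are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def transition_matrix(n_states, width):
    """Hypothetical transition model: probability decays with pitch-class distance."""
    idx = np.arange(n_states)
    dist = np.abs(idx[:, None] - idx[None, :])
    trans = np.exp(-(dist / width) ** 2)
    return trans / trans.sum(axis=1, keepdims=True)   # each row sums to 1

def viterbi(posteriors, trans, eps=1e-12):
    """Most likely pitch-class path for a (frames x states) posterior matrix."""
    log_obs = np.log(posteriors + eps)
    log_trans = np.log(trans + eps)
    n_frames, n_states = log_obs.shape
    score = log_obs[0] - np.log(n_states)             # uniform initial distribution
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans             # cand[i, j]: move from state i to j
        backptr[t] = np.argmax(cand, axis=0)          # best predecessor of each state
        score = cand[backptr[t], np.arange(n_states)] + log_obs[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):              # follow back-pointers
        path[t - 1] = backptr[t, path[t]]
    return path

# Frame 2 has a spurious peak at class 4; frame-wise argmax would jump there.
p = np.full((4, 5), 0.05)
p[:, 1] = 0.8
p[2] = [0.05, 0.35, 0.05, 0.05, 0.5]
print(np.argmax(p, axis=1).tolist())                  # → [1, 1, 4, 1]
print(viterbi(p, transition_matrix(5, 1.0)).tolist()) # → [1, 1, 1, 1]
```

In this toy example the isolated octave-like jump at frame 2 is removed. The two-step detour to class 4 and back costs far more in transition probability than the single frame of higher emission score gains.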