This document summarizes Chin-Yun Yu's presentation on developing a Non-negative Matrix Factorization (NMF) library in PyTorch. It introduces NMF and PyTorch, describes how Yu built a PyTorch implementation called torchnmf with features such as convolutional variants, and discusses potential applications of combining NMF with deep learning. The document also gives an overview of Yu's background and the presentation outline.
2. About me
● Background
○ Bachelor of CS, NCTU
○ Research Assistant in Music and Culture Technology Lab, IIS, AS
○ Engineer in Vive R&D/Multi Media, HTC (current position)
○ MIR research/open source project enthusiast
○ Music producer
○ Guitarist of Catalyst (Taipei)
3. Outline
● What is NMF?
● What is PyTorch?
● How I developed PyTorch NMF
● What NMF can do with PyTorch (particularly in the era of deep learning)
4. What is NMF?
● Non-negative Matrix Factorization
● V ≈ WH, where V, W, H ≥ 0
● Useful to analyze audio spectrogram
5. Parameter update
● Févotte, Cédric, and Jérôme Idier. "Algorithms for nonnegative matrix
factorization with the β-divergence." Neural computation 23.9 (2011):
2421-2456.
● Multiplicative update rules
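The multiplicative updates from Févotte & Idier keep the factors non-negative by construction, because every term in the update ratio is non-negative. A minimal sketch of the Euclidean special case (β = 2); all shapes and variable names here are illustrative, not the library's API:

```python
import torch

torch.manual_seed(0)

# Toy non-negative data matrix V (e.g. a magnitude spectrogram), modeled as V ~ W @ H
n_freq, n_time, K = 64, 100, 4        # frequency bins, time frames, components
V = torch.rand(n_freq, n_time)

W = torch.rand(n_freq, K)             # spectral templates
H = torch.rand(K, n_time)             # temporal activations

err_init = torch.dist(V, W @ H)

# Multiplicative updates for the Euclidean case (beta = 2); non-negativity
# is preserved because every factor in each update is non-negative.
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H).clamp_min(1e-8)
    W *= (V @ H.T) / (W @ H @ H.T).clamp_min(1e-8)

err_final = torch.dist(V, W @ H)
```

The β-divergence family in the cited paper generalizes this ratio form to KL (β = 1) and Itakura-Saito (β = 0) losses with the same multiplicative structure.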
6. Drums Transcription
● Wu, Chih-Wei, et al. "A review of automatic drum transcription." IEEE/ACM
Transactions on Audio, Speech, and Language Processing 26.9 (2018):
1457-1483.
7. Extend to convolutional case
● Smaragdis, Paris. "Non-negative matrix factor deconvolution; extraction of
multiple sound sources from monophonic inputs." International Conference on
Independent Component Analysis and Signal Separation. Springer, Berlin,
Heidelberg, 2004.
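Smaragdis's convolutive model (NMFD) replaces the product WH with a sum of time-shifted templates. A sketch of just the reconstruction step using `torch.nn.functional.conv1d`; shapes are chosen for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_freq, n_time, K, T = 64, 100, 3, 8    # bins, frames, components, template length

W = torch.rand(n_freq, K, T)            # a short time-frequency template per component
H = torch.rand(K, n_time)               # onset activations

# NMFD model: V_hat[f, t] = sum_k sum_tau W[f, k, tau] * H[k, t - tau].
# conv1d computes cross-correlation, so flip the templates along time and
# left-pad the activations by T - 1 frames to make the sum causal.
H_pad = F.pad(H.unsqueeze(0), (T - 1, 0))        # (1, K, n_time + T - 1)
V_hat = F.conv1d(H_pad, W.flip(-1)).squeeze(0)   # (n_freq, n_time)
```

Each column of V_hat is then a superposition of templates triggered at the onsets encoded in H, which is why NMFD suits drum transcription.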
8. Drums Transcription (with NMFD)
● Wu, Chih-Wei, et al. "A review of automatic drum transcription." IEEE/ACM
Transactions on Audio, Speech, and Language Processing 26.9 (2018):
1457-1483.
9. Drums Transcription (with NMFD)
● Dittmar, Christian. "Source Separation and Restoration of Drum Sounds in
Music Recordings." (2018).
10. Reverse Engineering the Amen Break
● Dittmar, Christian. "Source Separation and Restoration of Drum Sounds in
Music Recordings." (2018).
11. Music Structure Analyze
● López-Serrano, Patricio, et al. "NMF TOOLBOX: MUSIC PROCESSING
APPLICATIONS OF NONNEGATIVE MATRIX FACTORIZATION." (2019).
12. 2D deconvolutional case
● Schmidt, Mikkel N., and Morten Mørup. "Nonnegative matrix factor 2-D
deconvolution for blind single channel source separation." International
Conference on Independent Component Analysis and Signal Separation.
Springer, Berlin, Heidelberg, 2006.
16. Let it Bee - replacing sound sources
● Driedger, Jonathan, Thomas Prätzlich, and Meinard Müller. "Let it Bee -
Towards NMF-Inspired Audio Mosaicing." ISMIR. 2015.
https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee
17. PyTorch
● One of the most well-known deep learning frameworks
● Launched in 2016 by Facebook
● Features
○ Easy-to-use Python API
○ Dynamic computation graphs
○ GPU acceleration
○ Automatic gradient calculation
○ Easy prototyping
● Has been quickly adopted by researchers across many fields
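The dynamic-graph and autograd features above can be seen in a few lines: the graph is built as ordinary Python executes, and reverse-mode differentiation is one call.

```python
import torch

# The computation graph is recorded on the fly as operations run.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()

# Reverse-mode automatic differentiation fills in x.grad.
loss.backward()
print(x.grad)   # d(sum x^2)/dx = 2x -> tensor([2., 4., 6.])
```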
24. Regular NMF vs. PyTorch NMF
➔ Weight update in regular NMF
a. Compute the positive component
b. Compute the negative component
c. Derive the multiplicative update coefficients
d. Multiply
➔ Weight update in PyTorch NMF
a. Compute the loss value
b. Derive gradients via backpropagation
c. Compute the positive component
d. Compute the negative component by subtraction
e. Derive the multiplicative update coefficients
f. Multiply
➔ Cuts the code almost in half -> easier to maintain
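The autograd-based update exploits the fact that for β-divergence losses the gradient splits as grad = pos − neg with both parts non-negative, so only the positive part needs hand-coding. A sketch of this idea for the Euclidean loss; this is an illustration of the technique, not torchnmf's actual code:

```python
import torch

torch.manual_seed(0)
V = torch.rand(64, 100)                       # non-negative target matrix
W = torch.rand(64, 4, requires_grad=True)     # templates
H = torch.rand(4, 100, requires_grad=True)    # activations

def mu_update(param, pos):
    # grad = pos - neg with pos, neg >= 0, so the negative part falls out
    # by subtraction and the multiplicative update is param * neg / pos.
    neg = pos - param.grad
    with torch.no_grad():
        param *= neg / pos.clamp_min(1e-8)
    param.grad = None

losses = []
for _ in range(200):
    WH = W @ H
    loss = ((V - WH) ** 2).sum()              # Euclidean loss (beta = 2)
    losses.append(loss.item())
    loss.backward()                           # autograd derives the full gradient
    with torch.no_grad():
        pos_W = 2 * WH @ H.t()                # positive part of dL/dW
        pos_H = 2 * W.t() @ WH                # positive part of dL/dH
    mu_update(W, pos_W)
    mu_update(H, pos_H)
```

Only the loss and the positive term are written per divergence; the negative term comes from backpropagation for free, which is where the line-count saving comes from.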
25. Extended to convolutional cases
● Uses torch.nn.functional.conv1d/2d/3d and inherits from the NMF base class
● Class hierarchy: NMF -> NMFD, NMF2D, NMF3D
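The inheritance idea can be sketched as follows: the base class owns the factors and the update machinery, and each convolutional variant only overrides the reconstruction. These class bodies are a hypothetical illustration, not torchnmf's actual definitions:

```python
import torch
import torch.nn.functional as F

class NMF(torch.nn.Module):
    """Plain NMF: V_hat = W @ H."""
    def __init__(self, W, H):
        super().__init__()
        self.W = torch.nn.Parameter(W)
        self.H = torch.nn.Parameter(H)

    def reconstruct(self):
        return self.W @ self.H

class NMFD(NMF):
    """Convolutive NMF: inherit everything, override only the reconstruction."""
    def reconstruct(self):
        # W: (freq, K, T) templates, H: (K, time) activations
        T = self.W.shape[-1]
        H = F.pad(self.H.unsqueeze(0), (T - 1, 0))       # causal left-padding
        return F.conv1d(H, self.W.flip(-1)).squeeze(0)   # (freq, time)
```

NMF2D and NMF3D would follow the same pattern with conv2d and conv3d, which is why the 3-D case comes almost for free.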
34. What if we combine DL with traditional methods?
● Ravanelli, Mirco, and Yoshua Bengio. "Speaker recognition from raw
waveform with SincNet." 2018 IEEE Spoken Language Technology
Workshop (SLT). IEEE, 2018.
● Uses band-passed signals as input features
● Fewer parameters to learn, more robust, faster convergence, lower error
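SincNet's parameter saving comes from parameterizing each first-layer filter by just two cutoff frequencies instead of learning every tap. A sketch of such a windowed sinc band-pass kernel; the function name and defaults are illustrative:

```python
import torch

def sinc_bandpass(f_low, f_high, length=101, sr=16000):
    # Ideal band-pass FIR kernel built as the difference of two windowed
    # low-pass sinc filters. SincNet learns only f_low and f_high per
    # filter, so a filterbank needs far fewer parameters than a free
    # convolution layer with `length` taps per filter.
    t = torch.arange(length) - (length - 1) / 2

    def lowpass(fc):
        return 2 * fc / sr * torch.sinc(2 * fc / sr * t)

    h = lowpass(f_high) - lowpass(f_low)
    return h * torch.hamming_window(length, periodic=False)

h = sinc_bandpass(300.0, 3000.0)   # a 300-3000 Hz band-pass kernel
```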
36. Conclusion of torchnmf
● Advantages compared to other implementations
○ Easier to maintain
○ Better support for convolutional cases (especially 3D)
○ Can run on the GPU for faster convergence
● News
○ Features such as batching and sparsity control are on the way!
● Next steps
○ Full autograd support so it can be integrated into other DL models
○ Documentation
○ Upload to PyPI
● Feel free to create PRs or issues!
name = "torchnmf"
__version__ = '0.2'
__maintainer__ = 'Chin-Yun Yu'
__email__ = 'ya70201@gmail.com'