1. MDX 2021 Town Hall
Presentation
Chin-Yun Yu, Kin-Wai Cheuk
8/22
1
2. About Us
Chin-Yun Yu
● Organizer of the team Kazane_Ryo_no_Danna
● Software Engineer, Rayark (current position)
● Engineer, Vive R&D/Multi Media, HTC
● Research Assistant, Music and Culture Technology Lab, Institute of Information
Science, Academia Sinica
Kin-Wai Chuek
● PhD candidate, Singapore University of Technology and Design
2
3. Motivation
● Try out new ideas
● Get familiar with different types of MIR topics
● Test our limits
● GET THE PRIZE
3
4. Final rank
● 4th place on leaderboard A
● 6th place on leaderboard B
4
6. Model Fusion
● Blending outputs from three different model linearly
● Round 2 score: 6.85
Drums Bass Other Vocals
Model 1 (frequency masking) 0.2 0.2 0 0.2
Model 2 (frequency masking) 0.2 0.17 0.5 0.4
Model 3 (time domain) 0.6 0.73 0.5 0.4
6
7. Model 1: Improving X-UMX
● Two tiny tweaks on the loss function
○ Calculate the frequency-domain mse loss (Eq. 4a in [1]) using complex value instead of absolute
value
○ Apply one iteration of Multichannel Wiener Filtering (MWF) before calculating the loss
■ Our differentiable norbert package:
https://github.com/yoyololicon/norbert/tree/batch-support
○ Round 2 score: 5.5(baseline) => 5.64 (complex mse + MWF)
[1] Sawata, Ryosuke, et al. "All for One and One for All: Improving Music Separation by Bridging Networks." ICASSP 2021-2021
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. 7
8. Model 2: U-Net
● Adopt the U-Net structure from Spleeter
● Replace each convolution block with a D3
Block from D3Net
● Add two local attention layers as the
bottleneck layer
● Round 2 score: 6.03
● We also experimented with biaxial biLSTMs
(time and frequency axes)
● Round 2 score: 5.85
2D local
attention x2
8
9. Model 3: Improving Demus
● Cropping mechanism
○ The original Demucs uses center-crop to connect skipped layers when doing upsampling,
which might introduce some time shifting between layers
○ We neglectthose extra values at the end of the sequences
skipped layer
hidden layer
skipped layer
hidden layer
9
10. Model 3: Improving Demus
● Using 4 independent decoders, each of which corresponds to one source
○ We adjust the channel size of the decoders to have a comparable number of parameters
compared to the original version
● Round 2 score with these two modifications: 6.07 (baseline) => 6.27
LSTM
x2
LSTM
x2
Enc Dec
Enc
10
12. What we liked
● musdb dataloader is pretty slow
● Unexpected server error
● Cannot change team name after creation
12
● Friendly community
● Instant help from the support team
What could be improved