Data augmentation for sound data
First-year master's student, The University of Tokyo
Tomoya Koike
What is data augmentation
Increase the amount of training data by deforming the input in ways that happen in the real world
e.g. Flip
Data augmentation can be useful if the augmented data…
1. Can be observed in the real world (or in the test dataset)
2. Does not affect the features required for classification
Focus area in this survey
Automatic Speech Recognition (ASR)
Speech Emotion Recognition
Environmental Sound Classification (ESC)
Acoustic Scene Classification (ASC)
Audio Tagging
Traditional audio augmentation
• Pitch shifting
• Time stretching
• Loudness variation / Changing gain
• Adding background noise
• Adding reverberation noise[1]
• Time shifting
• Crop/Sub-sequence sampling
• Shuffling frames[2]
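Several of the waveform-level items above can be sketched with NumPy alone. This is a minimal illustration, not the implementation used in the cited works: the function names, the SNR-based noise scaling, the circular shift, and applying frame shuffling to raw samples rather than to spectrogram frames are all assumptions made for the example. Pitch shifting, time stretching, and reverberation are left out because they usually rely on an audio library.

```python
import numpy as np

def change_gain(wave, gain_db):
    """Loudness variation: scale the waveform by a gain given in decibels."""
    return wave * (10.0 ** (gain_db / 20.0))

def add_background_noise(wave, noise, snr_db):
    """Mix in background noise at a target signal-to-noise ratio (in dB)."""
    noise = np.resize(noise, wave.shape)  # repeat or truncate noise to the signal length
    signal_power = np.mean(wave ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # avoid division by zero
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return wave + scale * noise

def time_shift(wave, shift):
    """Time shifting: circularly shift the waveform by `shift` samples (assumed wrap-around)."""
    return np.roll(wave, shift)

def random_crop(wave, length, rng):
    """Crop / sub-sequence sampling: take a random window of `length` samples."""
    start = rng.integers(0, len(wave) - length + 1)
    return wave[start:start + length]

def shuffle_frames(wave, frame_len, rng):
    """Shuffling frames: split into fixed-length frames and permute their order."""
    n_frames = len(wave) // frame_len
    frames = wave[:n_frames * frame_len].reshape(n_frames, frame_len)
    return frames[rng.permutation(n_frames)].reshape(-1)
```

Each function takes and returns a 1-D array, so the transforms can be chained, e.g. `random_crop(add_background_noise(wave, noise, 10.0), 8000, rng)` with `rng = np.random.default_rng()`.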
SpecAugment[8]
Simple, but strong
Procedure
1. Convert the audio into a Mel spectrogram
2. Drop some sections along the time axis and the frequency axis
https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html
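Step 2 of the procedure can be sketched as follows. This is a rough illustration under stated assumptions, not the reference implementation from [8]: the function name `spec_mask`, its parameters, the uniform sampling of mask widths, and filling masked regions with zeros are choices made for the example. The Mel spectrogram is assumed to be computed beforehand and passed in as an array of shape `(n_mels, n_frames)`.

```python
import numpy as np

def spec_mask(spec, max_freq_width, max_time_width, rng,
              n_freq_masks=1, n_time_masks=1):
    """Zero out random frequency bands and time spans of a (n_mels, n_frames) array."""
    out = spec.copy()  # leave the input spectrogram untouched
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        width = rng.integers(0, max_freq_width + 1)   # band height, may be 0
        start = rng.integers(0, n_mels - width + 1)
        out[start:start + width, :] = 0.0             # drop a band on the frequency axis
    for _ in range(n_time_masks):
        width = rng.integers(0, max_time_width + 1)   # span length, may be 0
        start = rng.integers(0, n_frames - width + 1)
        out[:, start:start + width] = 0.0             # drop a span on the time axis
    return out
```

For example, `spec_mask(mel, 20, 40, np.random.default_rng())` masks at most 20 Mel bins and at most 40 frames of `mel`. The mask widths must not exceed the corresponding spectrogram dimensions.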
Spectrogram augmentation[11]
Demo page
Mixup[3]/BC-learning[4]
cf. SamplePairing [6], Extrapolation [5]
Mixup
Between-Class (BC) learning
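Both methods build a new training example by mixing two existing ones. The sketch below is a simplified illustration, with function names and argument layout chosen for the example. `mixup` draws the mixing weight from a Beta(alpha, alpha) distribution and interpolates inputs and one-hot labels. `bc_mix` is a simplified form of BC learning that draws the ratio uniformly and rescales the mixed waveform; the sound-pressure handling described in [4] is not reproduced here.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha, rng):
    """Mixup: convex combination of two inputs and of their one-hot labels."""
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def bc_mix(x1, y1, x2, y2, rng):
    """Simplified between-class mix of two waveforms (no sound-pressure correction)."""
    r = rng.uniform(0.0, 1.0)             # mixing ratio
    # Rescale so the mixture keeps roughly the same energy as the inputs.
    x = (r * x1 + (1.0 - r) * x2) / np.sqrt(r ** 2 + (1.0 - r) ** 2)
    y = r * y1 + (1.0 - r) * y2           # label is the mixing ratio over the two classes
    return x, y
```

In both cases the returned label is a soft label, so the model is trained with a loss that accepts label distributions (e.g. cross-entropy or KL divergence against the mixed label).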
cVAE / ACGAN[7]
References
[1] https://ieeexplore.ieee.org/abstract/document/7472835
[2] Domestic Activities Classification Based on CNN Using Shuffling and Mixing Data Augmentation
[3] https://arxiv.org/pdf/1710.09412.pdf
[4] https://arxiv.org/pdf/1711.10282.pdf
[5] https://arxiv.org/pdf/1808.03883.pdf
[6] https://arxiv.org/pdf/1801.02929.pdf
[7] http://dcase.community/documents/challenge2019/technical_reports/DCASE2019_Zhang_34.pdf
[8] https://arxiv.org/pdf/1904.08779.pdf
[9] https://arxiv.org/pdf/2002.12231.pdf
[10] https://www.sciencedirect.com/science/article/abs/pii/S0003682X19309442
[11] https://arxiv.org/pdf/2001.01401.pdf
[12] https://arxiv.org/pdf/1904.05862.pdf
