
Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning

Slides of my talk at the Interpretability and Robustness in Audio, Speech, and Language (IRASL) Workshop at NeurIPS2018

Published in: Science

Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning

  1. 1. Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Hamid Eghbal-zadeh 1,2 , Matthias Dorfer 1 , Gerhard Widmer 1,2 1 2
  3. 3. Motivation Covariance Analysis WCCN DWCCA Results Summary Deep Within-Class Covariance Analysis for Robust Deep Audio Representation Learning Motivation
  7. 7. Motivation:
      ● Convolutional Neural Networks (CNNs) learn useful features and build good representations
      ● CNNs are also known to generalize well to unseen data
      ● Many benchmark datasets have similar train/test distributions
      ● What happens when there is a distribution mismatch between training and test?
  11. 11. Distribution mismatch: when the distribution of the data in the training and validation sets differs from that of the test set
      ● Speaker recognition: training on English, testing on Chinese
      ● Acoustic scene classification: training on scenes recorded in one country, testing on scenes from another country, recorded in a different period of time
  12. 12. Performance of end-to-end CNNs (no mismatch vs. mismatched):
      ● We use the DCASE2016 (no mismatch) and DCASE2017 (mismatched) datasets (ref. 1)
      ● Same training and validation sets, different test set
      ● We look at several end-to-end CNNs
      1) Detection and Classification of Acoustic Scenes and Events, http://dcase.community
  13. 13. Covariance Analysis of the representation
  16. 16. Covariance Eigenvalue Analysis:
      ● We train a VGG network on the No mismatch and Mismatched datasets using spectrograms
      ● We analyse the internal representation of the VGG
      ● We use covariance analysis:
        ○ Eigenvalues of the covariance matrix
        ○ Visualisation of the representations projected via PCA
  17. 17. Covariance Eigenvalue Analysis [figure: panels "No mismatch" and "Mismatched"; legend: Train, Validation, Test]
  18. 18. Visualisation of the VGG representations [figure: panels "No mismatch" and "Mismatched"; legend: Train, Validation, Test]
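
The two analysis tools named on the slides above (eigenvalues of the covariance matrix and a PCA projection of the representations) can be sketched as follows. This is only an illustration under stated assumptions: the `embeddings` array is random stand-in data rather than real VGG activations, and the function names are hypothetical, not taken from the talk.

```python
import numpy as np

def covariance_eigenvalues(features):
    """Eigenvalues of the feature covariance matrix, largest first."""
    cov = np.cov(features, rowvar=False)   # rows are samples, columns are dimensions
    eigvals = np.linalg.eigvalsh(cov)      # ascending order for symmetric matrices
    return eigvals[::-1]

def pca_project(features, n_components=2):
    """Project centered features onto the top principal directions."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, ::-1][:, :n_components]   # reorder columns by descending eigenvalue
    return centered @ top

rng = np.random.default_rng(0)
# Hypothetical stand-in for network embeddings: 200 samples, 8 dimensions.
scales = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
embeddings = rng.normal(size=(200, 8)) * scales

spectrum = covariance_eigenvalues(embeddings)
projected = pca_project(embeddings)
```

Computing `spectrum` separately for the training, validation and test embeddings would give the kind of comparison the eigenvalue slides describe.
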
  19. 19. Within-Class Covariance Normalisation (WCCN)
  21. 21. Within-Class Covariance Normalization (refs. 1, 2):
      ● Proposed for speaker recognition to reduce false positives and false negatives
      ● Used to reduce the within-class variability in features such as GMM supervectors or i-vectors
      1) Hatch, Andrew O., et al. "Within-class covariance normalization for SVM-based speaker recognition." Ninth International Conference on Spoken Language Processing. 2006.
      2) Hatch, Andrew O., et al. "Generalized linear kernels for one-versus-all classification: application to speaker recognition." 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006) Proceedings. Vol. 5. IEEE, 2006.
  22. 22. Within-Class Covariance Normalization (refs. 1, 2) [slide body not preserved in the transcript]
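
As a rough illustration of WCCN as it is commonly formulated, the sketch below estimates the within-class covariance matrix W and takes the projection B from the Cholesky factorisation of its inverse, so that B Bᵀ = W⁻¹. The slide's own equations are not preserved in the transcript, so this formulation, the small ridge term `eps`, and all names and toy data here are assumptions rather than the authors' exact definition.

```python
import numpy as np

def within_class_covariance(features, labels):
    """Average of the per-class covariance matrices."""
    classes = np.unique(labels)
    dim = features.shape[1]
    w = np.zeros((dim, dim))
    for c in classes:
        x = features[labels == c]
        centered = x - x.mean(axis=0)
        w += centered.T @ centered / len(x)
    return w / len(classes)

def wccn_projection(features, labels, eps=1e-6):
    """Lower-triangular B with B @ B.T equal to the inverse of (W + eps * I)."""
    w = within_class_covariance(features, labels)
    w += eps * np.eye(w.shape[0])   # assumed ridge term to keep W invertible
    return np.linalg.cholesky(np.linalg.inv(w))

rng = np.random.default_rng(0)
# Toy data: 3 classes sharing an anisotropic covariance, with different means.
labels = np.repeat(np.arange(3), 300)
means = np.array([[0.0, 0.0, 0.0, 0.0], [4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0]])
feats = rng.normal(size=(900, 4)) * np.array([2.0, 1.0, 0.5, 0.5]) + means[labels]

b = wccn_projection(feats, labels)
normalised = feats @ b   # row-vector form of B^T x
w_after = within_class_covariance(normalised, labels)
```

After the projection, `w_after` is close to the identity matrix, which is the sense in which the within-class variability has been normalised.
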
  23. 23. Deep Within-Class Covariance Analysis
  31. 31. Deep Within-Class Covariance Analysis (DWCCA):
      ● A deep-learning-compatible version of WCCN
      ● A statistical deep learning layer, trained end-to-end using SGD with minibatches
      ● Can be placed anywhere in the network to reduce the within-class variability
      ● During training, B is equal to B_b in the forward pass
      ● Gradients with respect to B are computed and used in the backward pass
      ● A running average is computed for use at test time (similar to batch normalisation)
      ● Compatible with different supervised tasks (classification, detection, metric learning, ...) and data (raw audio, ...)
      ● Can be used with different supervised losses (CCE, BCE, L2, ...)
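
The forward-pass behaviour listed above (a projection estimated per minibatch during training, and a running average used at test time) might look roughly like the following NumPy sketch. It is not the authors' implementation: the class name, the `momentum` and `eps` parameters, and the choice to average the projection matrix itself (rather than, say, the covariance) are all assumptions, and the backward pass is omitted because in practice it would be left to an automatic differentiation framework.

```python
import numpy as np

class DWCCASketch:
    """Illustrative forward pass only (hypothetical, not the authors' code)."""

    def __init__(self, dim, momentum=0.9, eps=1e-3):
        self.momentum = momentum         # assumed running-average factor
        self.eps = eps                   # assumed ridge term for invertibility
        self.running_b = np.eye(dim)     # projection used at test time

    def _batch_projection(self, x, y):
        # Within-class covariance of the minibatch, then its WCCN-style projection.
        classes = np.unique(y)
        dim = x.shape[1]
        w = np.zeros((dim, dim))
        for c in classes:
            xc = x[y == c]
            centered = xc - xc.mean(axis=0)
            w += centered.T @ centered / len(xc)
        w = w / len(classes) + self.eps * np.eye(dim)
        return np.linalg.cholesky(np.linalg.inv(w))

    def forward(self, x, y=None, training=False):
        if training:
            b = self._batch_projection(x, y)   # batch estimate used directly
            self.running_b = (self.momentum * self.running_b
                              + (1.0 - self.momentum) * b)
        else:
            b = self.running_b                 # labels are not needed at test time
        return x @ b

rng = np.random.default_rng(0)
batch_y = np.arange(64) % 4                    # 4 classes, 16 samples each
batch_x = rng.normal(size=(64, 4)) + 3.0 * np.eye(4)[batch_y]

layer = DWCCASketch(dim=4)
out_train = layer.forward(batch_x, batch_y, training=True)
out_test = layer.forward(batch_x)
```

Keeping a running estimate means the layer needs class labels only during training, in the same way batch normalisation needs batch statistics only during training.
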
  32. 32. Results
  33. 33. Within-Class Covariance Eigenvalue Analysis (without DWCCA) [figure: panels "No mismatch" and "Mismatched"; legend: Train, Validation, Test]
  34. 34. Within-Class Covariance Eigenvalue Analysis (with DWCCA) [figure: panels "No mismatch" and "Mismatched"; legend: Train, Validation, Test]
  35. 35. Eigenvalue Analysis (with vs. without DWCCA) [figure: panels "No mismatch" and "Mismatched"; legend: Train, Validation, Test]
  36. 36. K-NN classification results on VGG representations [figure: panels "No mismatch" and "Mismatched"; Validation and Test]
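
For readers unfamiliar with evaluating representations by k-nearest-neighbour classification, a minimal sketch is given below. The slides do not state the distance metric or the value of k, so the Euclidean distance, `k=5`, the function name and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]   # labels must be non-negative integers
    return np.array([np.bincount(v).argmax() for v in votes])

rng = np.random.default_rng(0)
# Toy embeddings: two well-separated classes in 3 dimensions.
train_y = np.repeat(np.array([0, 1]), 50)
train_x = rng.normal(size=(100, 3)) + 10.0 * train_y[:, None]
query_y = np.repeat(np.array([0, 1]), 10)
query_x = rng.normal(size=(20, 3)) + 10.0 * query_y[:, None]

pred = knn_predict(train_x, train_y, query_x)
accuracy = float((pred == query_y).mean())
```
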
  37. 37. End-to-end classification [results table: No mismatch vs. Mismatched]. Legend: * = single model, single-channel features; two further markers (symbols lost in extraction) = multi-channel features, and ensemble of various models
  42. 42. End-to-end class-wise F1 [figure: panels "No mismatch" and "Mismatched"]
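
Class-wise F1, the metric named on the slide above, is the harmonic mean of precision and recall computed separately for each class. The sketch below uses made-up labels purely for illustration; the function name and the convention of returning 0 for a class with no true or predicted samples are assumptions.

```python
import numpy as np

def classwise_f1(y_true, y_pred, n_classes):
    """F1 score per class: 2*TP / (2*TP + FP + FN)."""
    scores = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom > 0 else 0.0
    return scores

# Made-up example with three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
f1 = classwise_f1(y_true, y_pred, n_classes=3)
# f1 is [0.5, 0.8, 0.666...]
```
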
  44. 44. Summary
  48. 48. Summary:
      ● We analysed the covariance of the representations in a VGG network
      ● We showed that the greater the mismatch between training and test data, the greater the within-class variability in the representation
      ● We proposed Deep Within-Class Covariance Analysis (DWCCA), a deep-learning-compatible layer capable of significantly reducing the within-class variability of a network's representation
      ● We showed empirically that DWCCA improves generalisation when the training and test sets have mismatched distributions
  49. 49. Thank you for your attention! Come to the poster for further discussion. hamid.eghbal-zadeh@jku.at, heghbalz
