Audio tagging system using densely connected convolutional networks
Presented by: Il-Young Jeong
Authors: Il-Young Jeong and Hyungui Lim
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018
20 November 2018, Surrey, UK
Introduction: DCASE 2018 challenge task 2
General-purpose audio tagging of Freesound content with AudioSet labels
• Classifying sound events of very diverse nature including:
- musical instruments
- human sounds
- domestic sounds
- animals
- etc.
• Dataset: Subset of Freesound Dataset with AudioSet Ontology
Difficulty of the task was due to:
• Varied input length
- from 300 ms to 30 s
• Insufficient training data
- ~9.5k recordings for 41 classes
• Imbalanced class distribution
- from 94 to 300 samples per class
• Unreliable annotation
- Only ~40% of the labels were verified
Our solutions (one per difficulty):
• Varied input length -> Segment-wise learning
• Insufficient training data -> Strong augmentation (mixup)
• Imbalanced class distribution -> Evenly distributed batch
• Unreliable annotation -> Batch-wise loss masking
• Ensemble approach
Framework: (On-the-fly) Preprocessing
• All the preprocessing steps are performed at each batch generation.
Pros: Fast implementation of various settings
Cons: Computation in batch generation
• Segmentation
- Long data -> take excerpts
- Short data -> zero-padding
• Mixup augmentation
- New data generated by mixing two segments
• T-F representation
- Raw waveform / logmel
- Faster operation on GPU, thanks to kapre
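The segmentation step can be sketched as follows. This is a minimal illustration in NumPy, not the authors' code: the function name `take_segment`, the random choice of excerpt position, and padding at the end of the clip are all assumptions, since the slides only say that long data is excerpted and short data is zero-padded.

```python
import numpy as np

def take_segment(audio, segment_len, rng):
    """Return a fixed-length segment of a 1-D waveform.

    Long clips: take a random excerpt (excerpt position is an assumption).
    Short clips: zero-pad at the end (padding side is an assumption).
    """
    n = len(audio)
    if n >= segment_len:
        start = rng.integers(0, n - segment_len + 1)
        return audio[start:start + segment_len]
    padded = np.zeros(segment_len, dtype=audio.dtype)
    padded[:n] = audio
    return padded

rng = np.random.default_rng(0)
long_clip = np.arange(100_000, dtype=np.float32)   # longer than one segment
short_clip = np.ones(5_000, dtype=np.float32)      # shorter than one segment
seg_long = take_segment(long_clip, 64_000, rng)
seg_short = take_segment(short_clip, 64_000, rng)
```

Because the excerpt is drawn again at every batch generation, a long recording can contribute a different segment each time it is sampled.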
Framework: Evenly distributed batch generation
• Mini-batch learning: Updates the model using a subset of the training data.
• Randomly selected batch: Randomly selects N data from the training set.
- Does not guarantee that a mini-batch contains all the classes.
- Has an imbalanced class distribution if the whole training set does.
• Evenly distributed batch: Chooses M data per class, so N = M*C (C: number of classes).
- Every mini-batch contains all the classes.
- Has a balanced class distribution.
- (Empirically) shows more stable and faster convergence.
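One way to build such a batch is sketched below. The helper name `balanced_batch_indices` and the fallback to sampling with replacement for very small classes are assumptions made for illustration; the slides only state that M items are chosen per class.

```python
import numpy as np

def balanced_batch_indices(labels, m_per_class, rng):
    """Pick M example indices per class, giving N = M * C indices in total."""
    labels = np.asarray(labels)
    batch = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Assumption: sample with replacement only if a class has fewer than M items.
        replace = len(idx) < m_per_class
        batch.append(rng.choice(idx, size=m_per_class, replace=replace))
    batch = np.concatenate(batch)
    rng.shuffle(batch)  # avoid a fixed class order inside the batch
    return batch

rng = np.random.default_rng(0)
# Toy imbalanced label set with three classes.
labels = np.array([0] * 94 + [1] * 300 + [2] * 150)
batch = balanced_batch_indices(labels, m_per_class=4, rng=rng)
counts = np.bincount(labels[batch], minlength=3)
```

Here `counts` holds the number of examples per class in the batch, which is M for every class regardless of how imbalanced `labels` is.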
Framework: Mixup augmentation
• Mixup: Data augmentation using linear interpolation between two data points
• We used mixup to train the model to predict the relative scale of data, rather than to perform binary classification.
x: data
t: label
λ: mixing parameter
w: scale parameter
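For reference, the standard mixup formulation interpolates both inputs and labels with the same λ. The sketch below shows only that standard form: the slide's scale parameter w is not modeled, because its exact role is not spelled out in the slide text. Drawing λ from a Beta(α, α) distribution and the value of `alpha` are assumptions taken from common mixup practice.

```python
import numpy as np

def mixup(x1, t1, x2, t2, lam):
    """Standard mixup: the same linear interpolation for data and labels."""
    x = lam * x1 + (1.0 - lam) * x2
    t = lam * t1 + (1.0 - lam) * t2
    return x, t

rng = np.random.default_rng(0)
alpha = 1.0                      # hypothetical hyperparameter
lam = rng.beta(alpha, alpha)     # mixing parameter λ in [0, 1]
x1 = rng.standard_normal(64_000)
x2 = rng.standard_normal(64_000)
t1 = np.eye(41)[3]               # one-hot label, class 3 of 41
t2 = np.eye(41)[17]              # one-hot label, class 17 of 41
x_mix, t_mix = mixup(x1, t1, x2, t2, lam)
```

The mixed target `t_mix` is no longer one-hot: it carries λ on one class and 1 - λ on the other, which is what lets the model learn relative scale rather than a hard decision.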
Framework: Architecture
[Figure: network architecture. Input (waveform or logmel) -> Low-level-k0 -> DenseNet-k1 -> … -> DenseNet-kh (h modules) -> n-head classifier -> predicted tag (e.g. 'Cello'). Panels: (a) Low-level-k module, (b) DenseNet-k module, (c) n-head classifier module. Block labels: BN, BN + Reshape, 3x3 Conv (k), BN + ReLU + 1x1 Conv (k), BN + ReLU + 3x3 Conv (k), SE, Concatenate, 2x2 MaxPool, Dense (n multi-head), GAP + Softmax, Average.]
• End-to-end DenseNet
• Frequency-wise BN
• Squeeze-and-Excitation Network
• Multi-head softmax
Framework: End-to-end DenseNet
• DenseNet: Densely connected network
f_dense(x) = concatenate(f(x), x)
• Allows a direct path for backpropagation
• End-to-end DenseNet:
- All layers from input (logmel) to output (loss) are concatenated
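The dense connection f_dense(x) = concatenate(f(x), x) can be illustrated with a toy example. `toy_layer` is a hypothetical stand-in for the BN + ReLU + Conv stack and only mimics its effect on the channel count; it is not the authors' layer.

```python
import numpy as np

def dense_block(x, f):
    """f_dense(x) = concatenate(f(x), x) along the channel axis."""
    return np.concatenate([f(x), x], axis=-1)

def toy_layer(k):
    """Hypothetical stand-in for BN + ReLU + Conv (k): yields k new channels."""
    def f(x):
        w = np.ones((x.shape[-1], k)) / x.shape[-1]
        return np.maximum(x @ w, 0.0)
    return f

x = np.ones((8, 8, 4))                 # (time, frequency, channels)
h = dense_block(x, toy_layer(k=3))     # 3 new + 4 original = 7 channels
h = dense_block(h, toy_layer(k=3))     # 3 new + 7 carried over = 10 channels
```

The original input `x` is still present, unchanged, in the last channels of `h`, which is the direct path that the slide refers to.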
Framework: Multi-head softmax
• Replaces the softmax layer with the average of multiple softmax outputs.
• Why?
- Good initialization, with prediction results close to 0.5, especially for mixup.
- Makes predictions near 0.5 easier.
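A minimal sketch of averaging several softmax heads is given below. It assumes the features have already been pooled to one vector per clip and that each head is a plain dense layer; the shapes, the number of heads, and the name `multi_head_softmax` are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_softmax(features, weights, biases):
    """Average the softmax outputs of n independent dense heads.

    features: (batch, d); weights: (n_heads, d, n_classes); biases: (n_heads, n_classes)
    """
    logits = np.einsum('bd,hdc->bhc', features, weights) + biases
    return softmax(logits, axis=-1).mean(axis=1)

rng = np.random.default_rng(0)
features = rng.standard_normal((2, 16))
weights = rng.standard_normal((8, 16, 41))   # 8 heads, 41 classes
biases = np.zeros((8, 41))
probs = multi_head_softmax(features, weights, biases)
```

Each row of `probs` is still a valid probability distribution, because an average of distributions is itself a distribution.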
Framework: Batch-wise loss masking (1)
• Categorical cross-entropy for a mini-batch:
• Masked loss when false-annotated data is known:
m_n: 1 when the n-th data has a true label, 0 when the n-th data has a false label
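In standard notation, the two losses named on this slide can be written as below. This is a sketch under assumptions, not a transcription of the slide: y_{n,c} (the predicted probability of class c for the n-th data) is a symbol introduced here, and normalizing the masked loss by the number of unmasked items is one common choice.

```latex
% Categorical cross-entropy over a mini-batch of N data and C classes
L = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} t_{n,c} \log y_{n,c}

% Masked loss: data with m_n = 0 do not contribute
% (normalization by \sum_n m_n is an assumption)
L_{\mathrm{masked}} = -\frac{1}{\sum_{n=1}^{N} m_n} \sum_{n=1}^{N} m_n \sum_{c=1}^{C} t_{n,c} \log y_{n,c}
```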
Framework: Batch-wise loss masking (2)
• Our solution: Remove outliers that have the highest loss from the gradient calculation.
- x may be false-annotated data if:
1) it is non-verified, and
2) it shows the highest or a similar loss in the current batch / iteration.
• Efficient computation of max(loss) using batch-wise calculation.
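The masking rule can be sketched in NumPy as follows. The threshold `tol`, which decides what counts as a loss "similar" to the batch maximum, is a hypothetical parameter, as are the function name and the normalization by the number of kept items.

```python
import numpy as np

def masked_batch_loss(probs, targets, verified, tol=0.9):
    """Cross-entropy that drops likely mislabeled items from the batch loss.

    An item is masked if it is non-verified AND its loss is within a factor
    `tol` of the batch maximum (`tol` is a hypothetical threshold).
    """
    eps = 1e-12
    per_sample = -(targets * np.log(probs + eps)).sum(axis=1)
    max_loss = per_sample.max()               # one batch-wise max, no sorting
    suspicious = (~verified) & (per_sample >= tol * max_loss)
    mask = (~suspicious).astype(per_sample.dtype)
    loss = (mask * per_sample).sum() / max(mask.sum(), 1.0)
    return loss, mask

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.05, 0.95], [0.3, 0.7]])
targets = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
verified = np.array([True, False, False, True])
loss, mask = masked_batch_loss(probs, targets, verified)
```

In this toy batch only the third item is masked: it is non-verified and has by far the highest loss. The fourth item also has a high loss but is verified, so it is kept. In an autodiff framework the mask would need to be treated as a constant so that no gradient flows through it.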
Experimental results
• Audio segment: 64,000 samples for all experiments

- 16kHz/4s, 32kHz/2s, 44.1kHz/1.45s

• Input domain: logmel or waveform

• MAP@3 Results
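For readers unfamiliar with the metric, MAP@3 can be computed as sketched below, assuming each clip has exactly one true label: a clip scores 1, 1/2, or 1/3 if the true label is ranked first, second, or third, and 0 otherwise. The function name and the toy scores are illustrative.

```python
import numpy as np

def map_at_3(scores, true_labels):
    """Mean average precision at 3, assuming one true label per clip."""
    top3 = np.argsort(-scores, axis=1)[:, :3]          # three best classes per clip
    hits = top3 == np.asarray(true_labels)[:, None]    # where the true label appears
    rank_weights = np.array([1.0, 1.0 / 2.0, 1.0 / 3.0])
    return float((hits * rank_weights).sum(axis=1).mean())

scores = np.array([
    [0.7, 0.2, 0.1, 0.0],   # true label 0 ranked 1st
    [0.1, 0.6, 0.3, 0.0],   # true label 2 ranked 2nd
    [0.4, 0.3, 0.2, 0.1],   # true label 2 ranked 3rd
    [0.1, 0.2, 0.3, 0.4],   # true label 0 not in the top 3
])
true_labels = [0, 2, 2, 0]
score = map_at_3(scores, true_labels)
```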
Images from https://www.kaggle.com/fizzbuzz/beginner-s-guide-to-audio-data
Future work
• Verifying ideas with additional experiments.

• Model size minimization

• Implementation for real-world application
• Thank you!
• We thank @Zafar and @daisukelab, who provided wonderful kernels and discussions for the task.
• If you are interested in Cochlear.ai, please visit www.cochlear.ai
