A Hybrid Approach with
Multi-channel I-Vectors and
Convolutional Neural Networks for
Acoustic Scene Classification
Hamid Eghbal-zadeh, Bernhard Lehner, Matthias Dorfer and Gerhard Widmer
A closer look into our winning submission
at the IEEE DCASE-2016 challenge [1]
for
Acoustic Scene Classification
[1] www.cs.tut.fi/sgn/arg/dcase2016/
Introduction
 
What is Acoustic Scene Classification
(ASC)?
photo credit: www.cs.tut.fi/sgn/arg/dcase2016/
1/23
 
Challenges in ASC: Overlapping events
Park City Center
photo credit: www.cs.tut.fi/sgn/arg/dcase2016/
2/23
 
Challenges in ASC: Session Variability
Vienna! Athens!
photo credit:www.google.com
3/23
 
Challenges in ASC: Session Variability
Vienna! Athens!
Prost! Υγειά μας! (German and Greek for "cheers")
photo credit:www.google.com
3/23
 
 
ASC is not that easy ...
photo credit: www.memegen.com
Methods for ASC
 
Deeeeeep learning
⬛ Pros:
⬜ A powerful method for supervised learning
⬜ Convolutional Neural Networks (CNNs)
⬜ Spectrograms as images
⬜ Feature Learning
⬜ Successfully applied to images, speech and music
⬛ Cons:
⬜ Confusion of classes when dealing with noisy scenes
and blurry spectrograms
⬜ Poor generalization and overfitting if the training data does
not cover a variety of sessions
Piczak, K. J. "Environmental sound classification with convolutional neural networks.", 2015.
photo credit: Yann LeCun's slides at NIPS 2016 keynote
4/23
 
Factor Analysis
⬛ Pros:
⬜ Session Variability reduction
⬜ Use of a Universal Background Model (UBM)
⬜ Better generalization due to the unsupervised methodology
⬜ Successfully applied to sequential data such as speech and
music
⬛ Cons:
⬜ Relies on engineered features
⬜ Limited freedom to use specialized features for audio scene analysis
because of the independence and Gaussian assumptions in FA
[1] Dehak, Najim, et al. "Front-end factor analysis for speaker verification.", 2011.
[2] Elizalde, B., et al. "An i-vector based approach for audio scene detection.", 2013.
5/23
 
A Hybrid system to overcome the
complexities ...
photo credit: www.imgflip.com
 
A hybrid approach to ASC
⬛ We combine a CNN with an I-Vector based ASC system:
⬜ A CNN is trained on spectrograms
⬜ I-Vector features (based on FA) are extracted from MFCCs
⬛ Late fusion
⬜ A score fusion technique is used to combine the two methods
⬛ Model averaging for better generalization
⬜ Multiple models are trained and the decisions of the different
models are averaged
Brummer, N., et al. "On calibration of language recognition scores.", 2006.
6/23
CNNs for ASC
 
A hybrid approach to ASC
⬛ A VGG-style fully convolutional architecture
⬜ A well-known model for object recognition
[Figure: network diagram. Legend: conv layer, pooling layer, average pooling layer. Labels: "Slide...", "Sum the probabilities", "30 secs", "Feature Learning part", "Feed-Forward part".]
Simonyan, K., et al. "Very deep convolutional networks for large-scale image recognition.", 2014.
7/23
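The "slide" and "sum the probabilities" labels in the diagram can be illustrated with a small sketch. This is not the authors' code: `predict_window` is a hypothetical stand-in for the trained CNN, and the window and hop sizes are assumed values chosen only for the example.

```python
import numpy as np

def classify_segment(spectrogram, predict_window, win=100, hop=50):
    """Slide a window along the time axis, sum the per-window class
    probabilities, and return (predicted class index, summed probabilities).

    spectrogram: array of shape (n_bins, n_frames)
    predict_window: callable mapping a (n_bins, win) window to class probabilities
    """
    n_frames = spectrogram.shape[1]
    starts = range(0, max(n_frames - win, 0) + 1, hop)
    probs = [predict_window(spectrogram[:, s:s + win]) for s in starts]
    total = np.sum(probs, axis=0)  # sum of probabilities over all windows
    return int(np.argmax(total)), total

def dummy_predict(window, n_classes=15):
    """Hypothetical stand-in for a CNN: softmax over mean band energies."""
    bands = np.array_split(window, n_classes, axis=0)
    logits = np.array([band.mean() for band in bands])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Summing (rather than taking a single window's output) lets every part of the 30-second segment contribute to the final decision.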
I-Vectors for ASC
 
I-Vector Features
[Figure: I-vector training pipeline. Labels: MFCCs; GMM (many components); sparse statistics (high dimension); train I-vector model (EM); I-vector (point estimate, low dimension).]
Adapted GMM params = GMM params + unknown matrix · hidden factor
Unknown matrix: learned via EM
[1] Dehak, Najim, et al. "Front-end factor analysis for speaker verification.", 2011.
[2] Elizalde, B., et al. "An i-vector based approach for audio scene detection.", 2013.
[3] Kenny, Patrick, et al. "Uncertainty Modeling Without Subspace Methods For Text-Dependent Speaker Recognition.", 2016.
8/23
 
I-Vector Features
[Figure: I-vector training and extraction pipelines.
Training: MFCCs; GMM (many components); sparse statistics (high dimension); train I-vector model (EM).
Extraction: MFCCs; sparse statistics (high dimension); I-vector (point estimate, low dimension).]
Adapted GMM params = GMM params + unknown matrix · hidden factor
Unknown matrix: learned via EM
[1] Dehak, Najim, et al. "Front-end factor analysis for speaker verification.", 2011.
[2] Elizalde, B., et al. "An i-vector based approach for audio scene detection.", 2013.
[3] Kenny, Patrick, et al. "Uncertainty Modeling Without Subspace Methods For Text-Dependent Speaker Recognition.", 2016.
9/23
 
I-Vector Features
⬛ Requires a Universal Background Model (UBM):
⬜ A GMM with 256 Gaussian components
⬜ MFCC features
⬛ MAP estimation of a hidden factor:
⬜ m: mean from the GMM
⬜ M: GMM mean adapted to the MFCCs of an audio segment
⬜ Solving the following factor analysis equation:
M = m + T·y
⬜ y is the hidden factor and its MAP estimate is the I-vector
Dehak, Najim, et al. "Front-end factor analysis for speaker verification.",2011.
10/23
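A minimal sketch of the MAP estimate of y in M = m + T·y, assuming a diagonal-covariance UBM and an already trained matrix T. It uses the standard closed form from the front-end factor analysis literature, y = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F, where N and F are the zeroth-order and centered first-order statistics. The function name, argument layout and toy dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def extract_ivector(frames, weights, means, variances, T):
    """MAP point estimate of the hidden factor y for one audio segment.

    frames:    (n_frames, D) MFCC vectors of the segment
    weights:   (C,)   UBM mixture weights
    means:     (C, D) UBM component means (m)
    variances: (C, D) UBM diagonal covariances
    T:         (C*D, R) factor loading matrix (assumed already trained)
    """
    C, D = means.shape
    R = T.shape[1]

    # Component posteriors for every frame (diagonal Gaussians, log domain).
    diff = frames[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth-order and centered first-order statistics.
    N = post.sum(axis=0)                       # (C,)
    F = post.T @ frames - N[:, None] * means   # (C, D)

    # Solve (I + T' S^-1 N T) y = T' S^-1 F in supervector form.
    n_big = np.repeat(N, D)                    # (C*D,)
    prec = 1.0 / variances.reshape(-1)         # (C*D,)
    A = np.eye(R) + T.T @ (T * (n_big * prec)[:, None])
    b = T.T @ (prec * F.reshape(-1))
    return np.linalg.solve(A, b)
```

With 256 components and typical MFCC dimensions the supervector is large, while y has only R entries, which is why the I-vector serves as a compact segment-level feature.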
 
Improving I-Vector Features for ASC
[Figure: four parallel pipelines, one per channel representation (left, right, average, difference), each with its own GMM and I-vector.]
⬛ Tuning MFCC parameters:
⬛ I-vectors from MFCCs of different channels
Elizalde, B, et al. "An i-vector based approach for audio scene detection." (2013).
11/23
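The four channel representations named in the figure could be derived from a stereo recording roughly as follows. The exact definitions of "average" and "difference" are assumptions here (mean of the two channels, and left minus right); the slides do not spell them out.

```python
import numpy as np

def channel_views(stereo):
    """Return the four mono signals used as separate I-vector inputs.

    stereo: array of shape (n_samples, 2) with left and right channels.
    """
    left = stereo[:, 0].astype(float)
    right = stereo[:, 1].astype(float)
    return {
        "left": left,
        "right": right,
        "average": 0.5 * (left + right),   # assumed definition
        "difference": left - right,        # assumed definition
    }
```

Each returned signal would then go through its own MFCC, GMM and I-vector pipeline.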
 
Post-processing and Scoring I-Vector
Features
⬛ Length-Normalization
⬛ Within-class Covariance Normalization (WCCN)
⬛ Linear Discriminant Analysis (LDA)
⬛ Cosine Similarity:
⬜ Average the I-vectors of each class in the training set (model I-vector)
⬜ Compute the cosine similarity between each test I-vector and the model
I-vector of each class
⬜ Pick the class with maximum similarity
[1] Garcia-Romero, D., et al. "Analysis of i-vector Length Normalization in Speaker Recognition.", 2011.
[2] Hatch, A. O., et al. "Within-class covariance normalization for SVM-based speaker recognition.", 2006.
[3] Dehak, Najim, et al. "Cosine similarity scoring without score normalization techniques.", 2010.
12/23
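The length-normalization and cosine-scoring steps can be sketched in a few lines of NumPy. WCCN and LDA are left out to keep the example short, and all function names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def length_normalize(x):
    """Scale each I-vector to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def class_models(train_ivecs, train_labels, n_classes):
    """Model I-vector per class: mean of that class's normalized I-vectors."""
    x = length_normalize(train_ivecs)
    return np.stack([x[train_labels == k].mean(axis=0) for k in range(n_classes)])

def cosine_scores(test_ivecs, models):
    """Cosine similarity between every test I-vector and every model I-vector."""
    return length_normalize(test_ivecs) @ length_normalize(models).T

def predict(test_ivecs, models):
    """Pick the class with maximum cosine similarity."""
    return np.argmax(cosine_scores(test_ivecs, models), axis=1)
```

The score matrix returned by `cosine_scores` has one row per test segment and one column per class, which is the form a later score-fusion stage would consume.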
Hybrid system
 
Why hybrid?
photo credit: www.mechanicalengineeringblog.com
www.google.com
 
Linear Logistic Regression for Score
Fusion
⬛ Combines the cosine scores of the I-vectors with the CNN probabilities
⬛ A Linear Logistic Regression (LLR) model is trained on the validation
set
⬛ A coefficient is learned for each model and a bias term for each
class
⬛ The final score is computed by applying the learned coefficients and
the bias terms to the test set scores
13/23
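One way to realize the fusion described above is a multiclass logistic regression with one coefficient per model and one bias per class, so that the fused score for class k is the sum over models of alpha_m · s_m,k plus beta_k. The sketch below trains it with plain gradient descent on the cross-entropy loss; the optimizer, learning rate and iteration count are assumptions, and the authors may well have used a dedicated toolkit instead.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fuse(scores, alpha, beta):
    """scores: (n_models, n_segments, n_classes); returns (n_segments, n_classes)."""
    return np.tensordot(alpha, scores, axes=1) + beta

def train_fusion(scores, labels, lr=0.1, n_iter=500):
    """Learn one coefficient per model (alpha) and one bias per class (beta)
    on validation-set scores by minimizing the multiclass cross-entropy."""
    n_models, n_segments, n_classes = scores.shape
    alpha = np.ones(n_models)
    beta = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(n_iter):
        # Gradient of the mean cross-entropy with respect to the fused scores.
        grad = (softmax(fuse(scores, alpha, beta)) - onehot) / n_segments
        alpha -= lr * np.einsum("mnk,nk->m", scores, grad)
        beta -= lr * grad.sum(axis=0)
    return alpha, beta
```

At test time, `fuse(test_scores, alpha, beta)` applies the learned coefficients and biases, and the predicted class is the argmax over the last axis. Training the same model on a single system's scores gives a calibration of that system alone.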
 
Model averaging
⬛ 4 separate models trained from each fold
⬛ Average the final score from models in each fold
14/23
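Averaging the final scores of several models is a one-liner; the sketch below assumes each model outputs a score matrix of the same shape (segments by classes), which the slides do not state explicitly.

```python
import numpy as np

def average_models(score_list):
    """Average per-class scores over models.

    score_list: list of arrays, each of shape (n_segments, n_classes).
    """
    return np.mean(np.stack(score_list), axis=0)

def decide(score_list):
    """Predicted class per segment from the averaged scores."""
    return np.argmax(average_models(score_list), axis=1)
```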
Dataset
 
TUT Acoustic Scenes 2016 dataset
⬛ 30-second audio segments from 15 acoustic scenes:
⬜ Bus - traveling by bus in the city (vehicle)
⬜ Cafe / Restaurant - small cafe/restaurant (indoor)
⬜ Car - driving or traveling as a passenger, in the city (vehicle)
⬜ City center (outdoor)
⬜ Forest path (outdoor)
⬜ Grocery store - medium size grocery store (indoor)
⬜ Home (indoor)
⬜ Lakeside beach (outdoor)
⬜ Library (indoor)
⬜ Metro station (indoor)
⬜ Office - multiple persons, typical work day (indoor)
⬜ Residential area (outdoor)
⬜ Train (traveling, vehicle)
⬜ Tram (traveling, vehicle)
⬜ Urban park (outdoor)
⬛ Development set:
⬜ Each acoustic scene has 78 segments totaling 39 minutes of audio.
⬜ 4-fold cross-validation
⬛ Evaluation set:
⬜ Each acoustic scene has 26 segments totaling 13 minutes of audio.
Mesaros, A., et al. "TUT database for acoustic scene classification and sound event detection.", 2016.
15/23
Results
 
CNN vs I-Vectors: Without Calibration
CNN accuracy: 83.33%
I-Vectors accuracy: 86.41%
[Figure panels for each system: class-wise accuracy, confusion matrix, scores]
16/23
 
CNN vs I-Vectors: With Calibration
CNN accuracy: 84.10%
I-Vectors accuracy: 88.70%
[Figure panels for each system: class-wise accuracy, confusion matrix, scores]
17/23
 
Hybrid
Hybrid accuracy: 89.74%
[Figure panels: class-wise accuracy, confusion matrix, scores]
18/23
Analysis: Score Calibration
 
CNN vs I-Vectors: Without Calibration
CNN accuracy: 83.33%
I-Vectors accuracy: 86.41%
[Figure panels for each system: class-wise accuracy, confusion matrix, scores]
19/23
 
CNN vs I-Vectors: With Calibration
CNN accuracy: 84.10%
I-Vectors accuracy: 88.70%
[Figure panels for each system: class-wise accuracy, confusion matrix, scores]
19/23
Analysis: Score Fusion
 
CNN vs I-Vectors: Without Calibration
CNN accuracy: 83.33%
I-Vectors accuracy: 86.41%
[Figure panels for each system: class-wise accuracy, confusion matrix, scores. Highlighted classes: metro, train]
20/23
 
Hybrid
Hybrid accuracy: 89.7%
[Figure panels: class-wise accuracy, confusion matrix, scores]
20/23
 
DCASE 2016 challenge: Results
[Figure: challenge results chart. Marked systems: Hybrid; Calibrated Multi-channel I-vectors; Multi-channel I-vectors; Our single CNN; MFCC+GMM baseline.]
21/23
 
DCASE 2016 challenge: Observations
22/23
 
Challenges in ASC: Session Variability
Kippis! Kippis! (Finnish for "cheers")
photo credit:www.google.com
22/23
Conclusion
 
Conclusion
⬛ Performance of I-Vectors can be noticeably improved by tuning
MFCCs
⬛ Different channels contain different information from a scene that
is beneficial to the I-vector system
⬛ I-Vectors and CNNs are complementary
⬛ Score Calibration improved both I-Vectors and CNN
⬛ Late fusion can efficiently combine the two systems' predictions
⬛ This method is easily adaptable to new conditions
23/23
 
JOHANNES KEPLER
UNIVERSITY LINZ
Altenberger Str. 69
4040 Linz, Austria
www.jku.at
Thank you!
For more information about this presentation, don’t hesitate to
contact me via:
hamid.eghbal-zadeh@jku.at
Slides available at:
https://www.slideshare.net/heghbalz
Code will soon be available at:
https://github.com/cpjku

Slides of my presentation at EUSIPCO 2017
