SlideShare a Scribd company logo
Experiments on the DCASE Challenge 2016:
Acoustic Scene Classification and
Sound Event Detection in Real Life Recordings
Benjamin Elizalde1
, Anurag Kumar1
, Ankit Shah2
, Rohan
Badlani3
, Emmanuel Vincent4
, Bhiksha Raj1
, Ian Lane1
1
CMU, 4
Inria, 2
NIT Surathkal, 3
BITS
Overview
1. Task 1: Acoustic Scene Classification
a. Approach
b. Results on challenge
c. Conclusions
2. Task 3: Sound Event Detection in Real Life Recordings
a. Approach
b. Results on challenge
c. Conclusions
Task 1: Acoustic Scene Classification
Approach
Multiclass classification of scenes
using high-level features and
multiple kernels for SVM
Approach
● Merge both audio channels into one.
● Compute MFCCs with 20 dim. +D+DD, 30 ms window and 50% overlap.
Compute
MFCC
Approach
Compute
MFCC
Train
UBM
● Train multiple GMM-UBMs of different component sizes (64,128,256,512)
over the MFCCs of all the training data.
Approach
Compute
MFCC
Train
UBM
Compute
high-level
features
● The high-level features try to capture the distribution of the scene’s MFCCs.
● Compute features, labeled as Beta, by adapting the UBM to the MFCCs using
MAP* for each component size.
○ Four supervectors with stacked means.
○ Four supervectors with stacked means and variance.
Approach
Compute
MFCC
Train
UBM
Compute
high-level
features
● The high-level features try to capture the distribution of the scene’s MFCCs.
● Compute features, labeled as Beta, by adapting the UBM to the MFCCs using
MAP* for each component size.
○ Four supervectors with stacked means.
○ Four supervectors with stacked means and variance.
MAP*
Approach
● Train SVM with Linear and RBF kernels for each of the 8 supervectors, thus 16
systems in total.
Train/Classify
SVM
Compute
MFCC
Train
UBM
Compute
high-level
features
Approach
● Fuse classifiers based on maximum prediction vote.
Fuse
classifiers
Output
Scores
Train/Classify
SVM
Compute
MFCC
Train
UBM
Compute
high-level
features
Task 1: Acoustic Scene Classification
Results
Overall results on the challenge
Accuracy %
Run Development Evaluation
Baseline 72.5 77.2
Best 89.9H
89.7H
Overall results on the challenge
Accuracy %
Run Development Evaluation
Baseline 72.5 77.2
Best 89.9H
89.7H
Our approach 78.9 85.97th
Class-wise results
Conclusions
● GMM-based representations capture well the content of long scene recordings.
○ For example, i-vectors which got the best performance.
● Fusion of different beta vectors and kernels improved performance.
● We tried Alpha features which are histograms, similar to Bag-of-Audio-Words
representations in combination with 7 kernels, but didn’t help.
Task 3: Sound Event Detection in
Real Life Recordings
Approach
Multiclass classification of
sound events in scenes,
including a generic sound event
and time-perturbation
Approach: Standard
● Create a pipeline for each scene: Home and Residential Area.
● For training: extract the sound events from the scene recordings to have
isolated files.
Extract
sounds
Train set
Approach: Standard
● Merge both audio channels into one..
● Compute MFCCs with 13 dim. +D+DD, 30 ms window and 50% overlap.
○ Also explored: Separable Gabor Filter Banks, Scatnet and Stacked with and without PCA.
Extract
sounds
Compute
MFCC
Approach: Standard
● Tpot is a toolbox that automatically creates machine learning pipelines.
● Tpot (v4) tests and optimizes 12 classifiers such as SVMs, Decision Trees,
K-Neighbors, Logistic Regression, etc.
● Main selections: Random Forest, Logistic Regression, Gradient Boosting
Extract
sounds
Compute
MFCC
Generate
TPOT
pipeline
Train
Classifier
Approach: Standard
Extract
sounds
Classify
segments
Compute
MFCC
Segment
audio
Output
Scores
Test set
● For testing: the scene recording was trimmed into contiguous one-second
segments.
● Compute MFCCs and then classify each segment.
Approach: Generic
● The generic class was created to address out-of-vocabulary sound events.
● The class is specific to each scene.
● The number of audio files corresponds to the average of files per class.
Extract
sounds
Compute
MFCC
Segment
audio
Output
Scores
Generic
Sound
Event
Classify
segments
Approach: Perturbation
● The time-perturbation is intended to address robustness and compensate for
small data.
● The signal was speeded up to 2x and slowed down to 0.3x to output 13
different versions of the original file.
Extract
sounds
Compute
MFCC
Segment
audio
Output
Scores
Time
perturbation
Classify
segments
Task 3: Sound Event Detection in
Real Life Recordings
Results
Results on challenge
Overall Segment-based Error Rate
Run Development Evaluation
Baseline 0.91 0.87
Best 0.91S
0.80S
Results on challenge
Overall Segment-based Error Rate
Run Development Evaluation
Standard (S) 0.84 1.05
Baseline 0.91 0.87
Best 0.91S
0.80S
Results on challenge
Overall Segment-based Error Rate
Run Development Evaluation
Standard (S) 0.84 1.05
S+Generic 0.81 1.07
Baseline 0.91 0.87
Best 0.91S
0.80S
Results on challenge
Overall Segment-based Error Rate
Run Development Evaluation
Standard (S) 0.84 1.05
S+Generic 0.81 1.07
S+Generic+Perturbation 0.76 0.961
Baseline 0.91 0.87
Best 0.91S
0.80S
9th
Results on challenge
Overall Segment-based Error Rate
Run Development Evaluation
Standard (S) 0.84 1.05
S+Generic 0.81 1.07
S+Generic+Perturbation 0.76 0.961
*S+Perturbation 0.74 0.96
Baseline 0.91 0.87
Best 0.91S
0.80S
9th
Results on challenge
Segment-based Error Rate
(Class-based average)
Run Home Residential
Baseline 0.97 1.31
Best 0.98S
0.98D
Results on challenge
Segment-based Error Rate
(Class-based average)
Run Home Residential
Standard (S) 1.12 0.99
S+Generic 1.92 2.08
S+Generic+Perturbation 1.52 1.24
*S+Perturbation 1.03 0.89
Baseline 0.97 1.31
Best 0.98S
0.98D
Class-wise results on Home
Class-wise results on Residential Area
Conclusions
● Classifier exploration improved performance, unlike feature exploration.
● The generic sound event could be an option for out-of-vocabulary-words.
● Time-perturbation significantly improved our system.
● Large-scale sound event detection using weakly-labeled data.
Questions?
bmartin1@andrew.cmu.edu
Per-scene results on the challenge
Scene Rank - ex aequo
Residential Area 1 tiers
Forest, Grocery Store, Metro, Office 2
Train, Home, City Center, Bus 3
Beach, Cafe, Car, Tram 4
Library, Park 5
Sound-events results on challenge
S+Generic+Perturbation
Home Residential Rank ...
Cupboard Wind Blowing 1
Glass Jingling Children Shouting 2
... ... ...
Water Tap Running People Speaking 9
Object Impact Bird Singing 13

More Related Content

What's hot

Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech SynthesisSparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Tomoki Koriyama
 
video compression
video compressionvideo compression
video compression
aniruddh Tyagi
 
Video Compression Basics - MPEG2
Video Compression Basics - MPEG2Video Compression Basics - MPEG2
Video Compression Basics - MPEG2
VijayKumarArya
 
Signal and image processing on satellite communication using MATLAB
Signal and image processing on satellite communication using MATLABSignal and image processing on satellite communication using MATLAB
Signal and image processing on satellite communication using MATLAB
Embedded Plus Trichy
 
Companding and DPCM and ADPCM
Companding and DPCM and ADPCMCompanding and DPCM and ADPCM
Companding and DPCM and ADPCM
naimish12
 
Voice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency FilteringVoice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency Filtering
Tejus Adiga M
 
Hw2
Hw2Hw2
MPEG Compression Standards
MPEG Compression StandardsMPEG Compression Standards
MPEG Compression Standards
Ajay
 
MPEG video compression standard
MPEG video compression standardMPEG video compression standard
MPEG video compression standard
anuragjagetiya
 
Mpeg 2
Mpeg 2Mpeg 2
Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)
Joel P
 
Jpeg
JpegJpeg
Image compression and it’s security1
Image compression and it’s security1Image compression and it’s security1
Image compression and it’s security1
Reyad Hossain
 
A quick illustration of jpeg 2000
A quick illustration of jpeg 2000A quick illustration of jpeg 2000
A quick illustration of jpeg 2000
Data Fok
 
Signal Compression and JPEG
Signal Compression and JPEGSignal Compression and JPEG
Signal Compression and JPEG
guest9006ab
 
Audio compression
Audio compressionAudio compression
Audio compression
Madhawa Gunasekara
 
Lab manual
Lab manualLab manual
Lab manual
Mat Awang
 
JPEG2000 in a nutshell
JPEG2000 in a nutshellJPEG2000 in a nutshell
JPEG2000 in a nutshell
Benoit Michel
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessors
Don William
 
Multimedia Object - Image
Multimedia Object - ImageMultimedia Object - Image
Multimedia Object - Image
Telkom Institute of Management
 

What's hot (20)

Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech SynthesisSparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
 
video compression
video compressionvideo compression
video compression
 
Video Compression Basics - MPEG2
Video Compression Basics - MPEG2Video Compression Basics - MPEG2
Video Compression Basics - MPEG2
 
Signal and image processing on satellite communication using MATLAB
Signal and image processing on satellite communication using MATLABSignal and image processing on satellite communication using MATLAB
Signal and image processing on satellite communication using MATLAB
 
Companding and DPCM and ADPCM
Companding and DPCM and ADPCMCompanding and DPCM and ADPCM
Companding and DPCM and ADPCM
 
Voice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency FilteringVoice Activity Detection using Single Frequency Filtering
Voice Activity Detection using Single Frequency Filtering
 
Hw2
Hw2Hw2
Hw2
 
MPEG Compression Standards
MPEG Compression StandardsMPEG Compression Standards
MPEG Compression Standards
 
MPEG video compression standard
MPEG video compression standardMPEG video compression standard
MPEG video compression standard
 
Mpeg 2
Mpeg 2Mpeg 2
Mpeg 2
 
Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)Image compression 14_04_2020 (1)
Image compression 14_04_2020 (1)
 
Jpeg
JpegJpeg
Jpeg
 
Image compression and it’s security1
Image compression and it’s security1Image compression and it’s security1
Image compression and it’s security1
 
A quick illustration of jpeg 2000
A quick illustration of jpeg 2000A quick illustration of jpeg 2000
A quick illustration of jpeg 2000
 
Signal Compression and JPEG
Signal Compression and JPEGSignal Compression and JPEG
Signal Compression and JPEG
 
Audio compression
Audio compressionAudio compression
Audio compression
 
Lab manual
Lab manualLab manual
Lab manual
 
JPEG2000 in a nutshell
JPEG2000 in a nutshellJPEG2000 in a nutshell
JPEG2000 in a nutshell
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessors
 
Multimedia Object - Image
Multimedia Object - ImageMultimedia Object - Image
Multimedia Object - Image
 

Similar to Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Classification and Sound Event Detection in real life recordings

Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
NAVER Engineering
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
Daniel Brewster
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
GeeksLab Odessa
 
Speaker Segmentation (2006)
Speaker Segmentation (2006)Speaker Segmentation (2006)
Speaker Segmentation (2006)
Luís Gustavo Martins
 
N017428692
N017428692N017428692
N017428692
IOSR Journals
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processing
aneetaanu
 
Icmmse slides
Icmmse slidesIcmmse slides
Icmmse slides
Manoj Shukla
 
ASR_final
ASR_finalASR_final
ASR_final
Bidhan Barai
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
Deepesh Lekhak
 
Weakly-Supervised Sound Event Detection with Self-Attention
Weakly-Supervised Sound Event Detection with Self-AttentionWeakly-Supervised Sound Event Detection with Self-Attention
Weakly-Supervised Sound Event Detection with Self-Attention
NU_I_TODALAB
 
Mini Project- Audio Enhancement
Mini Project-  Audio EnhancementMini Project-  Audio Enhancement
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by event
Yara Ali
 
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
adil raja
 
General Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker IdentificationGeneral Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker Identification
ijcisjournal
 
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
Lviv Data Science Summer School
 
Audio tagging system using densely connected convolutional networks (DCASE201...
Audio tagging system using densely connected convolutional networks (DCASE201...Audio tagging system using densely connected convolutional networks (DCASE201...
Audio tagging system using densely connected convolutional networks (DCASE201...
Hyun-gui Lim
 
Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdf
ssuser849b73
 
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
MLconf
 
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based ModelReal-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
adil raja
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
ssuser849b73
 

Similar to Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Classification and Sound Event Detection in real life recordings (20)

Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
 
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонны...
 
Speaker Segmentation (2006)
Speaker Segmentation (2006)Speaker Segmentation (2006)
Speaker Segmentation (2006)
 
N017428692
N017428692N017428692
N017428692
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processing
 
Icmmse slides
Icmmse slidesIcmmse slides
Icmmse slides
 
ASR_final
ASR_finalASR_final
ASR_final
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
 
Weakly-Supervised Sound Event Detection with Self-Attention
Weakly-Supervised Sound Event Detection with Self-AttentionWeakly-Supervised Sound Event Detection with Self-Attention
Weakly-Supervised Sound Event Detection with Self-Attention
 
Mini Project- Audio Enhancement
Mini Project-  Audio EnhancementMini Project-  Audio Enhancement
Mini Project- Audio Enhancement
 
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by event
 
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
Real-time Non-Intrusive Speech Quality Estimation of VoIP Using Genetic Progr...
 
General Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker IdentificationGeneral Kalman Filter & Speech Enhancement for Speaker Identification
General Kalman Filter & Speech Enhancement for Speaker Identification
 
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
Master defence 2020 - Borys Olshanetskyi -Context Independent Speaker Classif...
 
Audio tagging system using densely connected convolutional networks (DCASE201...
Audio tagging system using densely connected convolutional networks (DCASE201...Audio tagging system using densely connected convolutional networks (DCASE201...
Audio tagging system using densely connected convolutional networks (DCASE201...
 
Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdf
 
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
 
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based ModelReal-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
Real-Time Non-Intrusive Speech Quality Estimation: A Signal-Based Model
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 

Recently uploaded

dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 

Recently uploaded (20)

dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Classification and Sound Event Detection in real life recordings

  • 1. Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recordings
  • 2. Benjamin Elizalde1 , Anurag Kumar1 , Ankit Shah2 , Rohan Badlani3 , Emmanuel Vincent4 , Bhiksha Raj1 , Ian Lane1 1 CMU, 4 Inria, 2 NIT Surathkal, 3 BITS
  • 3. Overview 1. Task 1: Acoustic Scene Classification a. Approach b. Results on challenge c. Conclusions 2. Task 3: Sound Event Detection in Real Life Recordings a. Approach b. Results on challenge c. Conclusions
  • 4. Task 1: Acoustic Scene Classification Approach
  • 5. Multiclass classification of scenes using high-level features and multiple kernels for SVM
  • 6. Approach ● Merge both audio channels into one. ● Compute MFCCs with 20 dim. +D+DD, 30 ms window and 50% overlap. Compute MFCC
  • 7. Approach Compute MFCC Train UBM ● Train multiple GMM-UBMs of different component sizes (64,128,256,512) over the MFCCs of all the training data.
  • 8. Approach Compute MFCC Train UBM Compute high-level features ● The high-level features try to capture the distribution of the scene’s MFCCs. ● Compute features, labeled as Beta, by adapting the UBM to the MFCCs using MAP* for each component size. ○ Four supervectors with stacked means. ○ Four supervectors with stacked means and variance.
  • 9. Approach Compute MFCC Train UBM Compute high-level features ● The high-level features try to capture the distribution of the scene’s MFCCs. ● Compute features, labeled as Beta, by adapting the UBM to the MFCCs using MAP* for each component size. ○ Four supervectors with stacked means. ○ Four supervectors with stacked means and variance. MAP*
  • 10. Approach ● Train SVM with Linear and RBF kernels for each of the 8 supervectors, thus 16 systems in total. Train/Classify SVM Compute MFCC Train UBM Compute high-level features
  • 11. Approach ● Fuse classifiers based on maximum prediction vote. Fuse classifiers Output Scores Train/Classify SVM Compute MFCC Train UBM Compute high-level features
  • 12. Task 1: Acoustic Scene Classification Results
  • 13. Overall results on the challenge Accuracy % Run Development Evaluation Baseline 72.5 77.2 Best 89.9H 89.7H
  • 14. Overall results on the challenge Accuracy % Run Development Evaluation Baseline 72.5 77.2 Best 89.9H 89.7H Our approach 78.9 85.97th
  • 16. Conclusions ● GMM-based representations capture well the content of long scene recordings. ○ For example, i-vectors which got the best performance. ● Fusion of different beta vectors and kernels improved performance. ● We tried Alpha features which are histograms, similar to Bag-of-Audio-Words representations in combination with 7 kernels, but didn’t help.
  • 17. Task 3: Sound Event Detection in Real Life Recordings Approach
  • 18. Multiclass classification of sound events in scenes, including a generic sound event and time-perturbation
  • 19. Approach: Standard ● Create a pipeline for each scene: Home and Residential Area. ● For training: extract the sound events from the scene recordings to have isolated files. Extract sounds Train set
  • 20. Approach: Standard ● Merge both audio channels into one.. ● Compute MFCCs with 13 dim. +D+DD, 30 ms window and 50% overlap. ○ Also explored: Separable Gabor Filter Banks, Scatnet and Stacked with and without PCA. Extract sounds Compute MFCC
  • 21. Approach: Standard ● Tpot is a toolbox that automatically creates machine learning pipelines. ● Tpot (v4) tests and optimizes 12 classifiers such as SVMs, Decision Trees, K-Neighbors, Logistic Regression, etc. ● Main selections: Random Forest, Logistic Regression, Gradient Boosting Extract sounds Compute MFCC Generate TPOT pipeline Train Classifier
  • 22. Approach: Standard Extract sounds Classify segments Compute MFCC Segment audio Output Scores Test set ● For testing: the scene recording was trimmed into contiguous one-second segments. ● Compute MFCCs and then classify each segment.
  • 23. Approach: Generic ● The generic class was created to address out-of-vocabulary sound events. ● The class is specific to each scene. ● The number of audio files corresponds to the average of files per class. Extract sounds Compute MFCC Segment audio Output Scores Generic Sound Event Classify segments
  • 24. Approach: Perturbation ● The time-perturbation is intended to address robustness and compensate for small data. ● The signal was speeded up to 2x and slowed down to 0.3x to output 13 different versions of the original file. Extract sounds Compute MFCC Segment audio Output Scores Time perturbation Classify segments
  • 25. Task 3: Sound Event Detection in Real Life Recordings Results
  • 26. Results on challenge Overall Segment-based Error Rate Run Development Evaluation Baseline 0.91 0.87 Best 0.91S 0.80S
  • 27. Results on challenge Overall Segment-based Error Rate Run Development Evaluation Standard (S) 0.84 1.05 Baseline 0.91 0.87 Best 0.91S 0.80S
  • 28. Results on challenge Overall Segment-based Error Rate Run Development Evaluation Standard (S) 0.84 1.05 S+Generic 0.81 1.07 Baseline 0.91 0.87 Best 0.91S 0.80S
  • 29. Results on challenge Overall Segment-based Error Rate Run Development Evaluation Standard (S) 0.84 1.05 S+Generic 0.81 1.07 S+Generic+Perturbation 0.76 0.961 Baseline 0.91 0.87 Best 0.91S 0.80S 9th
  • 30. Results on challenge Overall Segment-based Error Rate Run Development Evaluation Standard (S) 0.84 1.05 S+Generic 0.81 1.07 S+Generic+Perturbation 0.76 0.961 *S+Perturbation 0.74 0.96 Baseline 0.91 0.87 Best 0.91S 0.80S 9th
  • 31. Results on challenge Segment-based Error Rate (Class-based average) Run Home Residential Baseline 0.97 1.31 Best 0.98S 0.98D
  • 32. Results on challenge Segment-based Error Rate (Class-based average) Run Home Residential Standard (S) 1.12 0.99 S+Generic 1.92 2.08 S+Generic+Perturbation 1.52 1.24 *S+Perturbation 1.03 0.89 Baseline 0.97 1.31 Best 0.98S 0.98D
  • 34. Class-wise results on Residential Area
  • 35. Conclusions ● Classifier exploration improved performance, unlike feature exploration. ● The generic sound event could be an option for out-of-vocabulary-words. ● Time-perturbation significantly improved our system. ● Large-scale sound event detection using weakly-labeled data.
  • 37. Per-scene results on the challenge Scene Rank - ex aequo Residential Area 1 tiers Forest, Grocery Store, Metro, Office 2 Train, Home, City Center, Bus 3 Beach, Cafe, Car, Tram 4 Library, Park 5
  • 38. Sound-events results on challenge S+Generic+Perturbation Home Residential Rank ... Cupboard Wind Blowing 1 Glass Jingling Children Shouting 2 ... ... ... Water Tap Running People Speaking 9 Object Impact Bird Singing 13