The Development of Excellence of the Telecommunication Research Team in
Relation to International Cooperation - CZ.1.07/2.3.00/20.0217
Multi-level audio classification architecture
Jozef Vavrek, Jozef Juhár
Department of Electronics and Multimedia Communications
Faculty of Electrical Engineering and Informatics
Technical University of Košice
email: {Jozef.Vavrek; Jozef.Juhar}@tuke.sk
Telecommunication
Educational Seminar
Content
1. Motivation and aim
2. Proposed classification system
3. Audio data
4. Segmentation, preprocessing, feature extraction, smoothing
4.1 Feature extraction techniques (cepstral)
4.2 Feature extraction techniques (spectral)
5. Basic principles of BN audio data classification via BDT
6. Basic principles of BN audio data classification via BDA
7. Binary discrimination architecture employing Support Vector
Machine classifier (BDASVM)
8. Experimental setup
9. Results
10. Additional experiments – One Against One (OAO) architecture
11. Additional results
12. Conclusions & future work
1. Motivation and aim
We built the classification system in order to refine the acoustic models for
each audio class and thereby lower the word error rate of the automatic speech
recognition (ASR) system.
We propose a binary discrimination architecture with a support vector machine
classifier (BDASVM) that aims to exceed the classification accuracy of binary
decision trees with SVM (BDTSVM) and to reduce the misclassification error that
propagates down from the top of the architecture.
2. Proposed classification system
3. Audio data
• Database: Slovak TV broadcast news BNKE1, part of COST-278
• Audio: 16 kHz, 16 bit, mono PCM
• Metadata: manually annotated using Transcriber
• Duration: 65 hours (188 recordings)
• Audio data used for training and testing:
Audio event                     Training set (min)   Testing set (min)
Pure Speech (PS)                10.19                9.16
Speech with env. sound (SES)    9.26                 9.44
Speech with music (MS)          9.41                 9.25
Music (M)                       11.7                 9.04
Env. sound (Background, B)      9.06                 9.31
Total                           49                   46.2
4. Segmentation, preprocessing, feature extraction, smoothing
4.1 Feature extraction techniques (cepstral)
Mel-Frequency Cepstral Coefficients (MFCC)
Variance of Acceleration Mel-Frequency Cepstral Coefficients (VAMFCC)
Variance of Mel-Filter Bank Energy (VMFBE)
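The formulas from this slide are not reproduced in the text. As a minimal sketch of the "variance of acceleration" idea behind VAMFCC, the block below assumes that MFCC vectors are already available per frame, that acceleration is a plain second-order difference along time, and that the variance is taken per coefficient over one segment. The function names and the difference formula are illustrative assumptions, not the authors' definition.

```python
from statistics import pvariance


def acceleration(track):
    """Second-order difference of one coefficient track over time.

    Assumption: a[t] = c[t+1] - 2*c[t] + c[t-1]; regression-based
    delta-delta formulas are also common and would give other values.
    """
    return [track[t + 1] - 2 * track[t] + track[t - 1]
            for t in range(1, len(track) - 1)]


def variance_of_acceleration(mfcc_frames):
    """Return one variance per cepstral coefficient for a segment.

    mfcc_frames: list of frames (at least 3), each a list of coefficients.
    """
    n_coeffs = len(mfcc_frames[0])
    result = []
    for k in range(n_coeffs):
        track = [frame[k] for frame in mfcc_frames]
        result.append(pvariance(acceleration(track)))
    return result


# Toy segment with two coefficients per frame (values are made up).
segment = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0], [9.0, 1.0], [16.0, 0.0]]
vamfcc = variance_of_acceleration(segment)
```

The first toy coefficient grows quadratically, so its acceleration is constant and the variance is zero; the second one oscillates and gives a non-zero value.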
4.2 Feature extraction techniques (spectral)
Spectral Centroid (SC)
Spectral Flux (SF)
Spectral Spread (SS)
Spectral Roll-off (ROLLOFF)
Band Periodicity (BP)
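The slide formulas for these features are not reproduced in the text. The block below is a minimal sketch of four of them (SC, SS, SF, ROLLOFF) using common textbook definitions on a magnitude spectrum. The function names, the magnitude (rather than power) weighting, the unnormalized flux and the 85 % roll-off fraction are assumptions. Band Periodicity is not covered.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def bin_frequencies(n_bins, sample_rate, n_fft):
    """Centre frequency in Hz of each DFT bin."""
    return [k * sample_rate / n_fft for k in range(n_bins)]


def spectral_centroid(mag, sample_rate, n_fft):
    """Magnitude-weighted mean frequency."""
    total = sum(mag)
    if total == 0:
        return 0.0
    freqs = bin_frequencies(len(mag), sample_rate, n_fft)
    return sum(f * m for f, m in zip(freqs, mag)) / total


def spectral_spread(mag, sample_rate, n_fft):
    """Magnitude-weighted standard deviation around the centroid."""
    total = sum(mag)
    if total == 0:
        return 0.0
    centroid = spectral_centroid(mag, sample_rate, n_fft)
    freqs = bin_frequencies(len(mag), sample_rate, n_fft)
    return math.sqrt(sum((f - centroid) ** 2 * m
                         for f, m in zip(freqs, mag)) / total)


def spectral_flux(mag, prev_mag):
    """Sum of squared magnitude differences between consecutive frames."""
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))


def spectral_rolloff(mag, sample_rate, n_fft, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the magnitude lies."""
    threshold = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= threshold:
            return k * sample_rate / n_fft
    return (len(mag) - 1) * sample_rate / n_fft
```

For a pure tone the centroid and roll-off both fall on the tone frequency and the spread is close to zero, which makes a convenient sanity check.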
5. Basic principles of BN audio data classification via BDT
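The diagram for this slide is not reproduced in the text. The block below is a minimal sketch of a binary decision tree cascade for the five audio classes. The tree layout is an assumption inferred from the discriminator names S-NS, PS-NPS, MS-SES and M-B used in the results; the function name and the callable interface are hypothetical, and in the real system each discriminator would be a trained SVM rather than a stub.

```python
def classify_bdt(x, s_ns, ps_nps, ms_ses, m_b):
    """Route one feature vector through a cascade of binary discriminators.

    Each discriminator is a callable that returns True when x belongs to
    the first-named class of its pair (assumed layout, see lead-in).
    """
    if s_ns(x):                 # speech vs. non-speech
        if ps_nps(x):           # pure speech vs. non-pure speech
            return "PS"
        return "MS" if ms_ses(x) else "SES"   # speech with music vs. env. sound
    return "M" if m_b(x) else "B"             # music vs. background


# Stub discriminators on a toy feature dictionary (purely illustrative).
sample = {"speech": True, "pure": False, "music": True}
label = classify_bdt(
    sample,
    s_ns=lambda x: x["speech"],
    ps_nps=lambda x: x["pure"],
    ms_ses=lambda x: x["music"],
    m_b=lambda x: x["music"],
)
```

In this layout a sample rejected by the top-level S-NS discriminator can never reach a speech leaf, so an error at the top propagates to the final label. This is the weakness named in the motivation.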
6. Basic principles of BN audio data classification via BDA
7. Binary Discrimination Architecture employing Support Vector Machine classifier (BDASVM)
8. Experimental setup
• Segmentation: rectangular window, 200 ms long, with 100 ms overlap
• Preprocessing: Hamming window, 50 ms long, with 25 ms overlap
• Feature extraction: frame-based, segment-based, frame-based with smoothing, segment-based with smoothing
• Smoothing: floating window, 1 s long
• Classification: support vector machine classifier, RBF kernel function, 5-fold cross-validation
• Evaluation parameters:
– for cross-validation: Area Under the Curve, AUC = <TPR>, with values in (0.5, 1)
– for classification performance: Accuracy, Acc = (TP+TN)/(TP+FP+TN+FN)
– Processing Time (PT)
• Software: wavex (wav extractor), libsvm-3.17
• Hardware: HPC TUKE, 24 nodes, IBM Blade System x HS22 with two six-core Intel Xeon L5640 processor units (2.27 GHz) and 48 GB RAM
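The windowing and accuracy settings above can be sketched as follows. This is a minimal illustration, assuming that "overlap" means the number of samples shared by consecutive windows (so the hop is length minus overlap) and using the 16 kHz sampling rate of the audio data. All function names are hypothetical and the confusion counts are made up.

```python
import math

SAMPLE_RATE = 16000  # Hz, from the audio data description


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * ms // 1000


def window_starts(n_samples, win_len, hop):
    """Start indices of all full windows of win_len samples, hop apart."""
    if n_samples < win_len:
        return []
    return list(range(0, n_samples - win_len + 1, hop))


def hamming(n):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


def accuracy(tp, fp, tn, fn):
    """Acc = (TP+TN)/(TP+FP+TN+FN), as defined in the setup."""
    return (tp + tn) / (tp + fp + tn + fn)


# Segments: 200 ms long, 100 ms overlap -> 100 ms hop.
seg_len = ms_to_samples(200)
seg_hop = seg_len - ms_to_samples(100)
# Frames: 50 ms long, 25 ms overlap -> 25 ms hop.
frm_len = ms_to_samples(50)
frm_hop = frm_len - ms_to_samples(25)

segments_in_1s = window_starts(SAMPLE_RATE, seg_len, seg_hop)
frames_in_segment = window_starts(seg_len, frm_len, frm_hop)
frame_window = hamming(frm_len)
example_acc = accuracy(tp=40, fp=5, tn=45, fn=10)
```

Under these assumptions one second of audio yields 9 full segments, and each 200 ms segment yields 7 full frames of 800 samples.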
9. Results
Tab.: Classification performance of BDTSVM and BDASVM architectures for different parameterization levels.
Acc is the average over the S-NS, PS-NPS, MS-SES and M-B discriminators.

              Acc [%]
Topology      frame    framefw  seg      segfw
BDTSVM        74.82    88.05    78.13    86.81
BDASVM        75.10    89.12    80.43    90.52
Difference    +0.28    +1.07    +2.30    +3.71

Tab.: Overall classification performance of BDTSVM and BDASVM architectures.
Acc is the average over all parameterization levels.

              Acc [%]
Topology      PS       MS       SES      M        B        Avg      PT [min]
BDTSVM        85.69    54.46    48.63    72.75    77.83    67.87    44.13
BDASVM        85.94    53.29    48.94    72.85    80.74    68.35    48.37
Difference                                                 +0.48    -4.24
10. Additional experiments – One Against One (OAO) architecture
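The diagram for this slide is not reproduced in the text. The block below is a minimal sketch of the generic one-against-one scheme: one binary classifier per pair of classes, with the final label chosen by majority vote. The stub classifiers, the function names and the tie-breaking rule are assumptions. The SVM implementation used in the experiments may handle ties differently.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["PS", "MS", "SES", "M", "B"]


def oao_pairs(classes):
    """All unordered class pairs; n classes give n*(n-1)/2 pairs."""
    return list(combinations(classes, 2))


def classify_oao(x, pairwise, classes=CLASSES):
    """Majority vote over pairwise classifiers.

    pairwise maps (class_a, class_b) to a callable that returns the
    winning label for x. Ties go to the class listed first in `classes`
    (an arbitrary choice made for this sketch).
    """
    votes = Counter(clf(x) for clf in pairwise.values())
    best = max(votes.values())
    return next(c for c in classes if votes.get(c, 0) == best)


def make_stub(pair):
    """Stub classifier: picks the true label if it is in the pair,
    otherwise the first class of the pair. Stands in for a trained SVM."""
    def clf(true_label):
        return true_label if true_label in pair else pair[0]
    return clf


pairs = oao_pairs(CLASSES)
stub_classifiers = {pair: make_stub(pair) for pair in pairs}
predicted = classify_oao("M", stub_classifiers)
```

With five audio classes this scheme needs 10 pairwise classifiers, and every sample is evaluated by all of them.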
11. Additional results
Tab.: Classification performance of OAOSVM and BDASVM architectures for different parameterization levels.
Acc is the average over the PS, MS, SES, M and B classes.

              Acc [%]
Topology      frame    framefw  seg      segfw
OAOSVM        63.70    77.38    64.41    74.16
BDASVM        54.95    79.75    62.91    75.79
Difference    -8.75    +2.37    -1.50    +1.63

Tab.: Overall classification performance of OAOSVM and BDASVM architectures.
Acc is the average over all parameterization levels.

              Acc [%]
Topology      PS       MS       SES      M        B        Avg      PT [min]
OAOSVM        86.92    53.22    46.79    76.49    86.13    69.91    24.56
BDASVM        85.94    53.29    48.94    72.85    80.74    68.35    48.37
Difference                                                 -1.56    -23.81
12. Conclusions & future work
Advantages of BDASVM
• reduction of the classification error at each individual classification level, regardless of the parameterization level (accuracy gains of 0.28 to 3.71 percentage points over BDTSVM)
• higher overall classification accuracy (68.35 % against 67.87 % for BDTSVM)
• possibility of using optimal parameterization and feature selection techniques at each individual classification level (against OAOSVM)
Disadvantages of BDASVM
• higher number of classifiers => higher processing time (against BDTSVM and OAOSVM)
• the need to find an optimal feature selection algorithm for selecting the optimal training and testing sets (against BDTSVM and OAOSVM)
In the near future, we will compare BDASVM with the One Against All SVM (OAASVM) architecture and extract each audio class using phoneme-based alignment.
Future work will also be directed towards implementing BDASVM in the BN transcription system.
Thank you for your attention