Random Forests for Laughter Detection

Random Forest Classifier, mRMR filter and Gaussian smoothing applied to frame-wise features for laughter and filler detection.



  1. RANDOM FORESTS FOR LAUGHTER DETECTION
     Heysem Kaya, A. Mehdi Ercetin, A. Ali Salah, S. Fikret Gurgen
     Department of Computer Engineering, Bogazici University, Istanbul
     WASSS '13, Grenoble, 23.08.2013
  2. Presentation Outline
     ● Introduction and Motivation
     ● A Brief Review of Literature
     ● The Challenge Corpus
     ● Methodology
       – Methods & Features Utilized
       – Experimental Results
     ● Conclusions
     ● Questions
  3. Introduction
     ● New directions in ASR: emotional and social cues
     ● Social Signal Sub-Challenge of Interspeech 2013: detection of laughter and fillers
     ● Challenges: the garbage class dominates, and the number of frame samples is very large
     ● Best results are obtained with fusion (e.g. speech + vision)
     ● TUM baseline feature set
  4. Brief Review of Related Work
     ● Ito et al., 2005: GMM on MFCC and Δ-MFCC. Classifier fusion with an AND operator was found to increase precision.
     ● Truong and van Leeuwen, 2007: GMM & SVM on PLP, prosody, voicing and modulation spectrum. Fusion with diverse meta classifiers was found to outperform fusion of classifiers sharing the same algorithm.
     ● Knox and Mirghafori, 2007: MLP on MFCC and prosody along with their first and second Δ. Class probabilities stacked to an ANN; Δ-MFCC outdid raw MFCC.
     ● Petridis and Pantic, 2008: MLP and Adaboost (feature selection) on PLP, prosody and their Δs, as well as video features. Multi-modal fusion was superior to the single-modal model; best results via stacking to an ANN.
     ● Vinciarelli et al., 2009: survey recommending the utilization of classifier fusion for SSP.
  5. The SSP Challenge Dataset – Corpus
     ● Based on the SSPNet Vocalization Corpus
     ● Collected from microphone recordings of 60 phone calls
     ● Contains 2763 clips, each 11 s long
     ● At least one laughter or filler event in every clip
     ● Fillers are vocalizations used to hold the floor (e.g. “uhm”, “eh”, “ah”)
     ● To provide speaker independence:
       – calls #1-35 served as training,
       – calls #36-45 as development, and the rest as test
  6. The SSP Challenge Dataset – Statistics
     | Property             | Statistic |
     |----------------------|-----------|
     | # of Clips           | 2763      |
     | Clip Duration        | 11 sec.   |
     | # of Phone Calls     | 60        |
     | # of Subjects        | 120       |
     | # Male Subjects      | 57        |
     | # Female Subjects    | 63        |
     | # of Filler Events   | 3.0 k     |
     | # of Laughter Events | 1.2 k     |
  7. Methodology
     ● Classifiers:
       – Random Forests (RF)
       – Support Vector Machines (SVM)
     ● Challenge features (TUM baseline)
     ● Feature selection:
       – minimum Redundancy Maximum Relevance (mRMR)
     ● Post-processing:
       – Gaussian window smoothing
  8. The SSP Challenge Dataset – Features
     ● Non-overlapping frames of length 10 ms (over 3 × 10⁶ frames)
     ● Considering memory limitations, a small set of 141 affectively potent features is extracted (Schuller et al., 2013):
       – MFCC 1-12, logarithmic energy, voicing probability, HNR, F0 and zero crossing rate, along with their first order Δ
       – Second order Δ also for MFCC and logarithmic energy
       – Frame-wise LLDs are augmented with the mean and std. of the frame and its 8 neighbors
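The contextual augmentation step above can be sketched as follows; a minimal NumPy illustration with hypothetical names, assuming the 8 neighbors are the 4 frames before and the 4 after (with 47 LLDs this tripling yields the 141 features mentioned on the slide):

```python
import numpy as np

def augment_with_context(lld, k=4):
    """Append mean and std over each frame and its 2k neighbors.

    lld: (n_frames, n_lld) array of frame-wise low-level descriptors.
    Returns an (n_frames, 3 * n_lld) array: raw LLDs, window means,
    window stds. Edges are padded by repeating the border frame."""
    n, d = lld.shape
    padded = np.pad(lld, ((k, k), (0, 0)), mode="edge")
    # one (2k+1)-frame window centered on every frame
    windows = np.stack([padded[i:i + 2 * k + 1] for i in range(n)])
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    return np.hstack([lld, mean, std])
```

The edge-padding policy is an assumption; the slides do not say how boundary frames are handled.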
  9. Random Forests
     ● A Random Forest is a fusion of decision tree predictors¹
     ● Each decision tree is grown with
       – a set of randomly selected instances (sampled with replacement)
       – a subset of features, also randomly selected
     ● Sampling with replacement leaves on average 1/3 of the instances 'out of the bag'
     ● RFs are shown to be superior to many current algorithms in accuracy and to perform well on large databases
     ¹ L. Breiman, “Random Forests”, University of California, Berkeley, USA, 2001
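The recipe above can be sketched as a toy ensemble (not the WEKA implementation used in the talk); the feature subset is drawn once per tree, as the slide describes, whereas Breiman's original algorithm re-draws features at every split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class TinyRandomForest:
    """Toy forest: each tree gets a bootstrap sample of instances
    and a random subset of d features; posteriors are averaged."""

    def __init__(self, n_trees=20, n_features=8, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.rng = np.random.default_rng(seed)
        self.trees, self.feat_idx = [], []

    def fit(self, X, y):
        n, d = X.shape
        for _ in range(self.n_trees):
            rows = self.rng.integers(0, n, n)   # sample with replacement
            cols = self.rng.choice(d, min(self.n_features, d),
                                   replace=False)
            tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
            self.trees.append(tree)
            self.feat_idx.append(cols)
        return self

    def predict_proba(self, X):
        # fuse the trees by averaging their class posteriors
        return np.mean([t.predict_proba(X[:, c])
                        for t, c in zip(self.trees, self.feat_idx)], axis=0)
```

On average each bootstrap sample misses about 1/3 of the instances, which is exactly the 'out of the bag' set the slide mentions.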
  10. mRMR Based Feature Selection
     ● mRMR was introduced by Peng et al. (2005) as a feature ranking algorithm based on mutual information (MI)
     ● MI quantifies the amount of shared information between two random variables
     ● A candidate feature is selected having
       – max MI with the target variable
       – min MI with the already selected features
     ● For mRMR we used the authors' original implementation*
     * http://penglab.janelia.org/proj/mRMR/
  11. Experimental Results
     ● We used the WEKA* implementations of SVM and RF
     ● For the Social Signal Sub-Challenge we report the Area Under the ROC Curve (AUC) for the laughter and filler classes, and their unweighted average (UAAUC)
     * M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, “The WEKA Data Mining Software”, 2009 (http://www.cs.waikato.ac.nz/ml/weka/)
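The measure is straightforward to compute frame-wise; a sketch with scikit-learn, where the label strings and variable names are illustrative (each target class is scored one-vs-rest against everything else, including the garbage class):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def uaauc(y_true, posteriors, classes=("laughter", "filler")):
    """Per-class frame-level AUC and their unweighted average (UAAUC).

    y_true: array of per-frame label strings (garbage frames included).
    posteriors: dict mapping class name -> per-frame posterior scores."""
    aucs = {c: roc_auc_score(y_true == c, posteriors[c]) for c in classes}
    aucs["UAAUC"] = float(np.mean([aucs[c] for c in classes]))
    return aucs
```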
  12. Baseline Results with Linear SVM (%AUC)
     SVM Baseline on Development Set¹
     | C²  | Laughter | Filler | UAAUC |
     |-----|----------|--------|-------|
     | 0.1 | 81.3     | 83.6   | 82.5  |
     | 1   | 81.2     | 83.7   | 82.5  |
     | 10  | 81.2     | 83.7   | 82.5  |
     ¹ The challenge paper reports 86.2% and 89.0% for laughter and filler, respectively
     ² SVM complexity parameter

     SVM Performance with mRMR Features on Development Set (C=0.1)
     | # of mRMR Features | Laughter | Filler | UAAUC |
     |--------------------|----------|--------|-------|
     | 90                 | 80.7     | 83.3   | 82.0  |
     | 70                 | 79.9     | 83.0   | 81.5  |
     | 50                 | 78.8     | 82.4   | 80.6  |
  13. Experiments with Random Forests
     ● The hyper-parameters of an RF are
       – the number of (randomly selected) features per decision tree (d), and
       – the number of trees (T) that form the forest
     ● We also investigated the effect of the number of mRMR-ranked features (D)
     ● The tested values for the hyper-parameters are d = {8, 16, 32}, T = {10, 20, 30} and D = {50, 70, 90, All}
  14. Random Forest Performance with T=20, Varying d and D on Development Set (% AUC)
     | d  | D   | Laughter | Filler | UAAUC |
     |----|-----|----------|--------|-------|
     | 8  | All | 88.9     | 90.7   | 89.8  |
     | 8  | 90  | 89.3     | 90.8   | 90.1  |
     | 8  | 70  | 88.3     | 90.3   | 89.3  |
     | 8  | 50  | 88.0     | 89.7   | 88.9  |
     | 16 | All | 89.0     | 90.6   | 89.8  |
     | 16 | 90  | 89.3     | 90.7   | 90.0  |
     | 16 | 70  | 88.8     | 90.7   | 89.8  |
     | 16 | 50  | 88.3     | 90.0   | 89.2  |
     | 32 | All | 89.5     | 90.9   | 90.2  |
     | 32 | 90  | 89.6     | 90.9   | 90.3  |
     | 32 | 70  | 89.1     | 90.8   | 90.0  |
     | 32 | 50  | 88.2     | 90.0   | 89.1  |
  15. Random Forest Performance with Varying T, d and D on Development Set (% UAAUC)
     | d   | D   | 10 Trees | 20 Trees | 30 Trees |
     |-----|-----|----------|----------|----------|
     | 8   | All | 87.4     | 89.8     | 89.7     |
     | 8   | 90  | 88.0     | 90.1     | 90.1     |
     | 8   | 70  | 87.9     | 89.3     | 89.8     |
     | 8   | 50  | 87.5     | 88.9     | 89.4     |
     | 16  | All | 88.2     | 89.8     | 90.4     |
     | 16  | 90  | 88.6     | 90.0     | 90.5     |
     | 16  | 70  | 88.5     | 89.8     | 90.2     |
     | 16  | 50  | 87.8     | 89.2     | 89.6     |
     | 32  | All | 88.8     | 90.2     | 90.4     |
     | 32  | 90  | 89.0     | 90.3     | 90.8     |
     | 32  | 70  | 88.7     | 90.0     | 90.4     |
     | 32  | 50  | 87.9     | 89.1     | 89.6     |
     | Avg |     | 88.2     | 89.7     | 90.0     |
     Development set baseline reported in the paper: 87.6%; reproduced baseline: 82.5%
  16. Gaussian Smoothing
     ● We further applied Gaussian window smoothing to the posteriors
     ● The posterior of each frame was re-calculated as a weighted sum over its 2K neighbors (K before & K after)
     ● The Gaussian weight used to smooth frame i with a neighboring frame j is

       w_{i,j} = (2πB)^(−1/2) · exp(−(i − j)² / (2B))

     ● We tested development set accuracy for K = 1, ..., 10
     ● Since the increase from K=8 to K=9 was less than 0.05%, we chose K=8
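The smoothing step can be sketched for one class's posterior track as a normalized convolution, assuming the weight formula above with B playing the role of the variance (B=4.0 here is an illustrative value; the slides do not specify it):

```python
import numpy as np

def gaussian_smooth(posteriors, K=8, B=4.0):
    """Smooth a 1-D posterior track with a (2K+1)-tap Gaussian window:
    w_{i,j} = (2*pi*B)**-0.5 * exp(-(i - j)**2 / (2 * B)).

    Weights are renormalized to sum to 1 so smoothed posteriors
    stay in [0, 1]; edges are handled by repeating border frames."""
    offsets = np.arange(-K, K + 1)
    w = (2 * np.pi * B) ** -0.5 * np.exp(-offsets**2 / (2 * B))
    w /= w.sum()
    padded = np.pad(posteriors, (K, K), mode="edge")
    return np.convolve(padded, w, mode="valid")
```

Because the window is symmetric, convolving and correlating give the same result, so `np.convolve` is safe here.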
  17. The Development Set Accuracy vs K
  18. The Challenge Test Set Result
     ● We re-trained on the training and development sets together with the best setting (T=30, D=90, d=32)
     ● We applied Gaussian smoothing with K=8
     ● Overall we attained
       – 89.6% and 87.3% AUC for laughter and fillers, respectively
       – a UAAUC of 88.4%, outperforming the challenge test set baseline (83.3%) by 5.1%
  19. Conclusions
     ● We proposed the use of RFs for laughter detection
     ● The results indicate superior performance in accuracy
     ● We observed that RFs benefit from feature reduction via mRMR, while SVMs deteriorate
     ● Together with Gaussian smoothing of the posteriors, we attained an increase of 5.1% (absolute) over the challenge test set baseline
  20. Thanks!
     ● Thank you for your attention
