
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets


  1. Technische Universität München
     Violent Scenes Detection with Large, Brute-Forced Acoustic and Visual Feature Sets
     Florian Eyben, Felix Weninger, Nicolas Lehment, Gerhard Rigoll, Björn Schuller
     Institute for Human-Machine Communication, Technische Universität München
     Session “Affect Task: Violent Scenes Detection”
     October 4, 2012
  2. “Large”
     • Start with frame-wise features (audio / video)
     • Summarize over a “meaningful unit”
       – Shot?
       – Sliding window?
       – Overlap?
     • Application of functionals:
       – Percentiles, moments, …
     • Results in 3.8k audio and 9.7k video features
     October 4, 2012, TUM / Felix Weninger
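The summarization step on this slide can be sketched as follows. This is a minimal illustration, not the authors' openSMILE configuration: the window size, step, the choice of functionals, and the example loudness values are all assumptions made for the example.

```python
import statistics


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def apply_functionals(lld_contour):
    """Map a variable-length contour of one frame-wise feature (LLD)
    to a fixed-length set of summary features."""
    return {
        "mean": statistics.fmean(lld_contour),
        "stddev": statistics.pstdev(lld_contour),
        "p25": percentile(lld_contour, 25),
        "p50": percentile(lld_contour, 50),
        "p75": percentile(lld_contour, 75),
    }


def sliding_windows(frames, size, step):
    """Yield windows of `size` frames; step < size gives overlapping windows."""
    for start in range(0, len(frames) - size + 1, step):
        yield frames[start:start + size]


# Hypothetical loudness contour (one value per frame).
loudness = [0.1, 0.3, 0.2, 0.8, 0.9, 0.4, 0.2, 0.1]
features = [apply_functionals(w) for w in sliding_windows(loudness, size=4, step=2)]
```

Applying many functionals to many LLDs in this way is what makes the feature count grow into the thousands: each LLD contributes one feature per functional.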
  3. Frame-Wise Features (LLDs)
     • Acoustic energy LLDs
       – Loudness, energy, ZCR
     • Acoustic spectral LLDs
       – MFCCs, band energy, centroid, roll-off point, flux, entropy, moments, sharpness, harmonicity
     • Visual LLDs
       – HSV histogram
       – Optical flow: histogram + mean + std. dev.
       – Laplacian edge image histogram + strongest edge
  4. “Brute-Forced”
     • Fully data-based approach (no pre-classification)
     • Little hand-crafting / engineering of features
     • Systematic feature (over-)generation
     • Emphasis on machine learning
     • Successful in affect recognition and speaker characterization tasks
       – INTERSPEECH 2009 Emotion Challenge
       – INTERSPEECH 2010 Paralinguistic Challenge
       – INTERSPEECH 2011 Intoxication / Sleepiness
     • Generalization?
  5. A Data-Based Approach
     • System development based on 3-fold CV of development data
       – “Movie-independent”
       – Stratified by violence proportion and age
     • Use all features from development data for evaluation on test data
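One way to build movie-independent folds is sketched below. The slides do not describe the actual stratification procedure, so the round-robin assignment is an assumption, the movie identifiers and proportions are hypothetical, and stratification by age is left out.

```python
def movie_independent_folds(violence_by_movie, k=3):
    """Assign whole movies to folds so that no movie is split across folds.

    Movies are sorted by violence proportion and dealt out round-robin,
    which roughly balances the proportion across folds (an assumed heuristic).
    """
    ordered = sorted(violence_by_movie, key=violence_by_movie.get, reverse=True)
    folds = [[] for _ in range(k)]
    for i, movie in enumerate(ordered):
        folds[i % k].append(movie)
    return folds


# Hypothetical movie IDs with hypothetical violence proportions.
example = {"movie_a": 0.30, "movie_b": 0.10, "movie_c": 0.20, "movie_d": 0.05}
folds = movie_independent_folds(example, k=3)
```

Because every segment of a movie lands in the same fold, a classifier is never evaluated on a movie it has seen during training.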
  6. “Acoustic and Visual”
     • Expect complementarity of modalities
     • Late fusion by confidences of single-modal classifiers
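Late fusion of confidences can be sketched as a weighted average of the per-segment confidences of the audio and video classifiers. The slides do not state the fusion rule, so the averaging and the `weight_audio` parameter are assumptions for illustration.

```python
def late_fusion(conf_audio, conf_video, weight_audio=0.5):
    """Fuse per-segment confidences of two single-modal classifiers.

    Both lists must be aligned: entry i refers to the same segment.
    """
    if not 0.0 <= weight_audio <= 1.0:
        raise ValueError("weight_audio must lie in [0, 1]")
    if len(conf_audio) != len(conf_video):
        raise ValueError("confidence lists must have the same length")
    weight_video = 1.0 - weight_audio
    return [weight_audio * a + weight_video * v
            for a, v in zip(conf_audio, conf_video)]


# Hypothetical confidences for two segments.
fused = late_fusion([0.8, 0.2], [0.4, 0.6])
```

Each modality keeps its own classifier and feature space; only the output confidences are combined.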
  7. Segmentation and Classification
     • Two segmentations evaluated on development set:
       – Functionals over shots
       – Functionals over X sec. sliding window
     • Sliding window segmentation:
       – Classify per window
       – Fuse window classification per shot
       – Alternative: Generate segmentation
     • Weka, SVM (SMO), C = 0.01
     • Logistic regression to obtain confidences
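The per-window classification and per-shot fusion could look roughly like this. It is a sketch under stated assumptions: the logistic `slope` and `intercept` would normally be fitted on training data, and averaging the confidences of all windows that overlap a shot is only one possible fusion rule; the slides do not specify either.

```python
import math


def logistic_confidence(decision_value, slope=1.0, intercept=0.0):
    """Map an SVM decision value to a confidence in (0, 1).

    slope and intercept are placeholders for fitted parameters.
    """
    return 1.0 / (1.0 + math.exp(-(slope * decision_value + intercept)))


def fuse_windows_per_shot(shots, windows):
    """Average the confidences of all windows overlapping each shot.

    shots:   list of (start, end) in seconds
    windows: list of (start, end, confidence)
    Returns one fused confidence per shot, or None if no window overlaps.
    """
    fused = []
    for s_start, s_end in shots:
        overlapping = [conf for w_start, w_end, conf in windows
                       if w_start < s_end and w_end > s_start]
        fused.append(sum(overlapping) / len(overlapping) if overlapping else None)
    return fused


# Hypothetical shot boundaries and window confidences.
shots = [(0, 4), (4, 10)]
windows = [(0, 2, 0.2), (2, 4, 0.4), (4, 6, 0.9), (6, 8, 0.7)]
shot_confidences = fuse_windows_per_shot(shots, windows)
```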
  8. TUM Test Runs
     Run    Modality  Overlap (Train / Eval)  MAP100 Test  MAP100 Dev (CV)  MAP20 Dev (CV)
     TUM-1  A+V       X                       .484         .397             .525
     TUM-2  A         X                       .376         .445             .515
     TUM-3  A         X X                     .360         .428             .518
     TUM-4  A         -                       .392         .442             .503
     TUM-5  V         -                       .320         .224             .213
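For readers unfamiliar with the MAP100 and MAP20 columns, the following sketch computes average precision over the top N ranked segments of one movie and averages it across movies. Normalizing by the number of violent segments found within the top N is an assumption; the official evaluation tool may normalize differently.

```python
def average_precision_at(ranked_labels, n):
    """Average precision over the top-n entries of a ranked list.

    ranked_labels: 1 (violent) or 0 (non-violent), sorted by
    descending classifier confidence.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:n], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at(rankings_per_movie, n):
    """Mean of the per-movie average precision values."""
    scores = [average_precision_at(r, n) for r in rankings_per_movie]
    return sum(scores) / len(scores)


# Hypothetical rankings for two movies.
map100 = mean_average_precision_at([[1, 0, 1, 0], [1, 1]], n=100)
```

Because the metric depends on the rank positions of individual segments, changing how a movie is cut into segments changes the ranked list itself, not just the scores attached to it.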
  9. TUM Test Runs
     Run    Modality  Overlap (Train / Eval)  UA Rec (Dev)  WA Rec (Dev)
     TUM-1  A+V       X                       .584          .848
     TUM-2  A         X                       .648          .830
     TUM-3  A         X X                     .648          .826
     TUM-4  A         -                       .634          .829
     TUM-5  V         -                       .537          .832
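UA and WA recall are read here in their usual sense: unweighted average recall is the mean of the per-class recalls, and weighted average recall equals overall accuracy. The sketch below assumes these definitions and uses hypothetical labels.

```python
from collections import Counter


def ua_wa_recall(y_true, y_pred):
    """Return (unweighted average recall, weighted average recall)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class_recall = {c: correct[c] / total[c] for c in total}
    ua = sum(per_class_recall.values()) / len(per_class_recall)
    wa = sum(correct.values()) / len(y_true)  # same as accuracy
    return ua, wa


# Hypothetical imbalanced data: 8 non-violent (NV), 2 violent (V) segments.
y_true = ["NV"] * 8 + ["V"] * 2
y_pred = ["NV"] * 8 + ["V", "NV"]
ua, wa = ua_wa_recall(y_true, y_pred)
```

With imbalanced classes, WA can stay high even when the minority class is poorly recognized, which is one plausible reading of why the WA column varies so little between runs while UA varies more.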
  10. Test Data: MAP100 by Movie
      Movie               TUM-1 (A+V)  TUM-2 (A)
      Dead Poets Society  .523         .158
      Fight Club          .321         .315
      Independence Day    .609         .656
  11. Discussion
      • MAP very sensitive to segmentation
        – Ex.: MAP100 = .73, MAP20 = .88 on Dev iff segment boundaries are aligned to violent / non-violent scenes
          [Diagram: segment boundaries aligned vs. not aligned to non-violent (NV) / violent (V) scenes]
        – Train on aligned data / test on not aligned data: MAP100 = .49
      • Accuracies: similar ranking, but less “sensitive”
        – Correlated with target function in learning
  12. Conclusions and Outlook
      • Demonstrated feasibility of “brute-force” approach
      • Acoustic features alone are often competitive
      • Visual features are complementary
      • Future: Deeper analysis of
        – Individual features' worth
        – Influence of segmentation on model training and evaluation
  13. Thank you.
      weninger@tum.de
      openSMILE: http://opensmile.sourceforge.net
