Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets

Transcript

  • 1. Technische Universität München
    Violent Scenes Detection with Large, Brute-Forced Acoustic and Visual Feature Sets
    Florian Eyben, Felix Weninger, Nicolas Lehment, Gerhard Rigoll, Björn Schuller
    Institute for Human-Machine Communication, Technische Universität München
    Session "Affect Task: Violent Scenes Detection"
    October 4, 2012
  • 2. "Large"
    • Start with frame-wise features (audio / video)
    • Summarize over a "meaningful unit":
      – Shot?
      – Sliding window?
      – Overlap?
    • Application of functionals:
      – Percentiles, moments, …
    • Results in 3.8k audio and 9.7k video features
    October 4, 2012 TUM / Felix Weninger 2
  • 3. Frame-Wise Features (Low-Level Descriptors, LLDs)
    • Acoustic energy LLDs
      – Loudness, energy, zero-crossing rate (ZCR)
    • Acoustic spectral LLDs
      – MFCCs, band energy, centroid, roll-off point, flux, entropy, moments, sharpness, harmonicity
    • Visual LLDs
      – HSV histogram
      – Optical flow: histogram + mean + std. dev.
      – Laplacian edge image: histogram + strongest edge
  • 4. "Brute-Forced"
    • Fully data-based approach (no pre-classification)
    • Little hand-crafting / engineering of features
    • Systematic feature (over-)generation
    • Emphasis on machine learning
    • Successful in affect recognition and speaker characterization tasks:
      – INTERSPEECH 2009 Emotion Challenge
      – INTERSPEECH 2010 Paralinguistic Challenge
      – INTERSPEECH 2011 Speaker State Challenge (intoxication / sleepiness)
    • Does it generalize to violent scenes detection?
  • 5. A Data-Based Approach
    • System development based on 3-fold cross-validation (CV) on the development data
      – "Movie-independent" folds
      – Stratified by violence proportion and age
    • Use all features from the development data for evaluation on the test data
  • 6. "Acoustic and Visual"
    • Expect complementarity of the modalities
    • Late fusion of the confidences of the single-modality classifiers
  • 7. Segmentation and Classification
    • Two segmentations evaluated on the development set:
      – Functionals over shots
      – Functionals over an X sec. sliding window
    • Sliding window segmentation:
      – Classify per window
      – Fuse window classifications per shot
      – Alternative: generate a segmentation
    • Weka, SVM (SMO), C = 0.01
    • Logistic regression to obtain confidences
  • 8. TUM Test Runs (A = acoustic, V = visual)
    Run    Modality  Overlap (Train / Eval)  MAP100 Test  MAP100 Dev (CV)  MAP20 Dev (CV)
    TUM-1  A+V       X                       .484         .397             .525
    TUM-2  A         X                       .376         .445             .515
    TUM-3  A         X X                     .360         .428             .518
    TUM-4  A                                 .392         .442             .503
    TUM-5  V                                 .320         .224             .213
  • 9. TUM Test Runs (UA / WA Recall: unweighted / weighted average recall)
    Run    Modality  Overlap (Train / Eval)  UA Recall Dev  WA Recall Dev
    TUM-1  A+V       X                       .584           .848
    TUM-2  A         X                       .648           .830
    TUM-3  A         X X                     .648           .826
    TUM-4  A                                 .634           .829
    TUM-5  V                                 .537           .832
  • 10. Test Data: MAP100 by Movie
    Movie               TUM-1 (A+V)  TUM-2 (A)
    Dead Poets Society  .523         .158
    Fight Club          .321         .315
    Independence Day    .609         .656
  • 11. Discussion
    • MAP is very sensitive to the segmentation
      – Example: MAP100 = .73, MAP20 = .88 on Dev if segment boundaries are aligned with violent / non-violent scene boundaries
      [Figure: segment boundaries aligned vs. not aligned with non-violent (NV) / violent (V) scenes]
      – Training on aligned data / testing on non-aligned data: MAP100 = .49
    • Accuracies: similar ranking, but less "sensitive"
      – Correlated with the target function in learning
  • 12. Conclusions and Outlook
    • Demonstrated feasibility of the "brute-force" approach
    • Acoustic features alone are often competitive
    • Visual features are complementary
    • Future: deeper analysis of
      – Individual features' worth
      – Influence of segmentation on model training and evaluation
  • 13. Thank you.
    weninger@tum.de
    openSMILE: http://opensmile.sourceforge.net
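Slide 2 summarizes frame-wise descriptors over a segment (shot or window) by applying functionals such as moments and percentiles. The following is a minimal NumPy sketch of that idea, assuming a (frames x descriptors) matrix per segment; the specific selection of functionals (mean, std, skewness, kurtosis, 1st/50th/99th percentile) is hypothetical and much smaller than the real feature sets.

```python
import numpy as np


def apply_functionals(lld: np.ndarray) -> np.ndarray:
    """Summarize a (frames x descriptors) LLD matrix over one segment.

    Hypothetical functional set: mean, std, skewness, kurtosis and the
    1st / 50th / 99th percentiles of every descriptor.
    """
    lld = np.asarray(lld, dtype=float)
    mean = lld.mean(axis=0)
    std = lld.std(axis=0)
    centered = lld - mean
    # avoid division by zero for constant descriptors (skew/kurt become 0)
    safe_std = np.where(std > 0, std, 1.0)
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=0) / safe_std ** 4
    pct = np.percentile(lld, [1, 50, 99], axis=0)  # shape (3, descriptors)
    # one fixed-length vector per segment, regardless of segment length
    return np.concatenate([mean, std, skew, kurt, pct.ravel()])
```

With 7 functionals, a segment of any length with 10 descriptors maps to a 70-dimensional vector; this fixed length is what lets shots and windows of different durations feed the same classifier.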
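Slide 3 lists the acoustic frame-wise LLDs. As an illustration only (the slides do not show how openSMILE computes them), here is a NumPy sketch of four of them per frame: RMS energy, zero-crossing rate, spectral centroid and spectral roll-off point. The 25 ms / 10 ms framing at 16 kHz (400 / 160 samples), the Hann window and the 90 % roll-off threshold are assumptions.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Cut a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def acoustic_llds(x, sr: int, frame_len: int = 400, hop: int = 160,
                  rolloff: float = 0.9) -> np.ndarray:
    """Per frame: [RMS energy, ZCR, spectral centroid (Hz), roll-off point (Hz)]."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sqrt(np.mean(frames ** 2, axis=1))
    # ZCR: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # power spectrum of each Hann-windowed frame
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    total = power.sum(axis=1) + 1e-12  # guard against silent frames
    centroid = (power * freqs).sum(axis=1) / total
    # roll-off: first frequency bin at which `rolloff` of the power is reached
    cum = np.cumsum(power, axis=1)
    k = np.argmax(cum >= rolloff * total[:, None], axis=1)
    return np.column_stack([energy, zcr, centroid, freqs[k]])
```

For a 1 kHz sine at 16 kHz, every frame should show an RMS energy near 0.707, a ZCR near 0.125 and a centroid near 1000 Hz.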
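Slide 4 describes "brute-forced" systematic feature over-generation. One way to picture it is as the cross product of LLDs, derivative (delta) orders and functionals. The sketch below only enumerates feature names; the LLD names, the `_de` delta suffix and the functional names are hypothetical placeholders, not the actual openSMILE configuration.

```python
from itertools import product


def brute_force_feature_names(llds, functionals, delta_orders=(0, 1)):
    """Enumerate every LLD x delta order x functional combination.

    Naming scheme (hypothetical): <lld>[_de per delta order]__<functional>.
    """
    names = []
    for lld, order, func in product(llds, delta_orders, functionals):
        suffix = "_de" * order
        names.append(f"{lld}{suffix}__{func}")
    return names


names = brute_force_feature_names(["loudness", "zcr", "mfcc1"],
                                  ["mean", "stddev", "pctl99"])
print(len(names))  # 3 LLDs x 2 delta orders x 3 functionals
```

Because the count multiplies, a few dozen LLDs combined with a few dozen functionals quickly reach feature sets in the thousands, leaving feature weighting to the learner rather than to manual engineering.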
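Slide 5 uses movie-independent 3-fold cross-validation stratified by violence proportion. The slides do not say how movies were assigned to folds; as a stand-in, this sketch uses a greedy "snake" assignment over movies sorted by their share of violent shots, so every movie lands in exactly one fold and the folds get similar violence proportions. It ignores the age criterion, and the movie identifiers in the example are hypothetical.

```python
def movie_independent_folds(violence_share, n_folds=3):
    """Assign whole movies to folds, balancing their violence proportion.

    violence_share: dict mapping movie id -> fraction of violent shots.
    Returns a list of n_folds lists of movie ids.
    """
    ordered = sorted(violence_share, key=violence_share.get, reverse=True)
    folds = [[] for _ in range(n_folds)]
    for i, movie in enumerate(ordered):
        block, pos = divmod(i, n_folds)
        # snake order 0, 1, 2, 2, 1, 0, 0, 1, 2, ... evens out the sums
        fold = pos if block % 2 == 0 else n_folds - 1 - pos
        folds[fold].append(movie)
    return folds
```

Keeping each movie inside a single fold is what makes the evaluation "movie-independent": no shot of a test movie is ever seen in training.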
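Slide 6 fuses the modalities late, using the confidences of the single-modality classifiers. A minimal sketch, assuming a weighted average of per-shot confidences in [0, 1]; the weight `w_audio` and its default of 0.5 are hypothetical, since the slides do not give the fusion rule.

```python
import numpy as np


def late_fusion(conf_audio, conf_video, w_audio=0.5):
    """Weighted average of per-shot confidences from two classifiers."""
    a = np.asarray(conf_audio, dtype=float)
    v = np.asarray(conf_video, dtype=float)
    if a.shape != v.shape:
        raise ValueError("both modalities must score the same shots")
    return w_audio * a + (1.0 - w_audio) * v
```

Averaging calibrated confidences only makes sense if both classifiers output comparable scores, which is one reason to map SVM outputs to confidences first.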
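Slide 7 classifies sliding windows and then fuses the window decisions per shot. A minimal sketch, assuming each window carries a confidence and each shot receives the overlap-weighted mean of the windows it intersects. The weighting scheme and the default score of 0.0 for shots without any window are assumptions.

```python
def fuse_windows_per_shot(windows, shots):
    """windows: list of (start, end, score); shots: list of (start, end); times in s.

    Returns one fused score per shot: the mean of overlapping window scores,
    weighted by the temporal overlap with the shot.
    """
    fused = []
    for s_start, s_end in shots:
        num = den = 0.0
        for w_start, w_end, score in windows:
            overlap = min(s_end, w_end) - max(s_start, w_start)
            if overlap > 0:
                num += overlap * score
                den += overlap
        fused.append(num / den if den > 0 else 0.0)
    return fused


windows = [(0.0, 2.0, 1.0), (1.0, 3.0, 0.0), (2.0, 4.0, 0.5)]
print(fuse_windows_per_shot(windows, [(0.0, 1.0), (1.0, 3.0)]))
```

The second shot overlaps all three windows (by 1 s, 2 s and 1 s), so its fused score is (1.0 + 0.0 + 0.5) / 4.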
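Slides 8 and 10 report MAP at cutoffs 100 and 20. The slides do not define the measure, and several variants of average precision at k exist; this sketch uses one common variant that averages precision at each relevant rank within the top k and normalizes by the number of hits found there, then takes the mean over movies.

```python
def average_precision_at_k(ranked_labels, k):
    """ranked_labels: 1 = violent, 0 = non-violent, sorted by descending confidence."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(per_movie, k=100):
    """Mean of the per-movie AP@k values (one ranked list per movie)."""
    return sum(average_precision_at_k(r, k) for r in per_movie) / len(per_movie)
```

For the ranking [1, 0, 1, 0], AP@4 is (1/1 + 2/3) / 2 ≈ 0.83. Because only the top of each ranking counts, moving a few violent segments up or down changes the score noticeably, consistent with the sensitivity discussed on slide 11.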
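Slide 9 reports unweighted (UA) and weighted (WA) average recall. UA recall averages the per-class recalls with equal weight, while WA recall weights each class by its frequency and therefore equals plain accuracy. A small sketch:

```python
def ua_wa_recall(y_true, y_pred):
    """Return (unweighted average recall, weighted average recall)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    ua = sum(recalls) / len(recalls)
    wa = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return ua, wa


# 8 non-violent and 2 violent shots, classifier always says "non-violent"
print(ua_wa_recall([0] * 8 + [1] * 2, [0] * 10))
```

On class-imbalanced data such as violent vs. non-violent shots, a trivial majority-class classifier already reaches a high WA recall (0.8 here) but only 0.5 UA recall.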
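Slide 11 contrasts segment boundaries that are aligned with violent / non-violent scene boundaries against boundaries that are not. A toy sketch of why this matters: it computes the violent fraction of each segment, assuming non-overlapping annotated violent intervals in seconds. Aligned segments come out as pure 0 or 1, while misaligned segments mix both classes.

```python
def violent_fraction(segments, violent_intervals):
    """Fraction of each (start, end) segment covered by violent intervals.

    Assumes violent_intervals do not overlap each other.
    """
    fractions = []
    for s_start, s_end in segments:
        covered = sum(max(0.0, min(s_end, v_end) - max(s_start, v_start))
                      for v_start, v_end in violent_intervals)
        fractions.append(covered / (s_end - s_start))
    return fractions


violent = [(10.0, 20.0)]
print(violent_fraction([(0, 10), (10, 20), (20, 30)], violent))  # aligned
print(violent_fraction([(0, 15), (15, 30)], violent))            # not aligned
```

In the misaligned case both segments are one third violent, so neither a binary label nor a ranking of these segments can match the annotation exactly.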