Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets

Transcript of "Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature Sets"

  1. Violent Scenes Detection with Large, Brute-Forced Acoustic and Visual Feature Sets
     Florian Eyben, Felix Weninger, Nicolas Lehment, Gerhard Rigoll, Björn Schuller
     Institute for Human-Machine Communication, Technische Universität München
     Session “Affect Task: Violent Scenes Detection”, October 4, 2012
  2. “Large”
     • Start with frame-wise features (audio / video)
     • Summarize over a “meaningful unit”
       – Shot?
       – Sliding window?
       – Overlap?
     • Application of functionals:
       – Percentiles, moments, …
     • Results in 3.8k audio and 9.7k video features
     October 4, 2012, TUM / Felix Weninger
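The summarization step on this slide can be sketched as follows. This is a minimal illustration, not the authors' feature extraction setup: the helper names `apply_functionals` and `sliding_windows`, the particular functionals (mean, standard deviation, quartiles), the window size and hop, and the toy loudness values are all assumptions made for the example.

```python
import statistics


def apply_functionals(frames):
    """Map a variable-length run of frame-wise values to a fixed set of statistics."""
    q1, median, q3 = statistics.quantiles(frames, n=4)
    return {
        "mean": statistics.fmean(frames),
        "stddev": statistics.pstdev(frames),
        "quartile1": q1,
        "median": median,
        "quartile3": q3,
    }


def sliding_windows(frames, size, hop):
    """Yield windows of `size` frames; a hop smaller than `size` gives overlap."""
    for start in range(0, len(frames) - size + 1, hop):
        yield frames[start:start + size]


# Hypothetical frame-wise loudness contour (one low-level descriptor).
loudness = [0.1, 0.2, 0.4, 0.8, 0.9, 0.7, 0.3, 0.2]

# One fixed-length feature vector per window, here with 50% overlap.
features = [apply_functionals(w) for w in sliding_windows(loudness, size=4, hop=2)]
```

Applying many functionals to many low-level descriptors in this way is what makes the feature count grow into the thousands.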
  3. Frame-Wise Features (LLDs)
     • Acoustic energy LLDs
       – Loudness, energy, ZCR
     • Acoustic spectral LLDs
       – MFCCs, band energy, centroid, roll-off point, flux, entropy, moments, sharpness, harmonicity
     • Visual LLDs
       – HSV histogram
       – Optical flow: histogram + mean + std. dev.
       – Laplacian edge image histogram + strongest edge
  4. “Brute-Forced”
     • Fully data-based approach (no pre-classification)
     • Little hand-crafting / engineering of features
     • Systematic feature (over-)generation
     • Emphasis on machine learning
     • Successful in affect recognition and speaker characterization tasks
       – INTERSPEECH 2009 Emotion Challenge
       – INTERSPEECH 2010 Paralinguistic Challenge
       – INTERSPEECH 2011 Intoxication / Sleepiness
     • Generalization?
  5. A Data-Based Approach
     • System development based on 3-fold CV of development data
       – “Movie-independent”
       – Stratified by violence proportion and age
     • Use all features from development data for evaluation on test data
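A movie-independent split of the kind named on this slide might look like the sketch below. The slide does not say how the stratification was carried out, so the round-robin scheme, the function name `movie_independent_folds`, the placeholder movie names, and the made-up violence proportions are all assumptions; stratification by age is left out for brevity.

```python
def movie_independent_folds(violence_share, n_folds=3):
    """Assign whole movies to folds so that no movie is split across folds."""
    # Sort by violence proportion, then deal movies out round-robin so each
    # fold receives a similar spread of proportions (a simple stratification).
    ranked = sorted(violence_share, key=violence_share.get)
    folds = [[] for _ in range(n_folds)]
    for i, movie in enumerate(ranked):
        folds[i % n_folds].append(movie)
    return folds


# Placeholder movie names with made-up violence proportions.
violence_share = {
    "movie_a": 0.02, "movie_b": 0.15, "movie_c": 0.07,
    "movie_d": 0.30, "movie_e": 0.11, "movie_f": 0.05,
}

folds = movie_independent_folds(violence_share)

# Each fold is held out once; training uses the movies of the other two folds.
splits = []
for k, held_out in enumerate(folds):
    train = [m for j, fold in enumerate(folds) if j != k for m in fold]
    splits.append((train, held_out))
```

Splitting by movie rather than by segment keeps segments of the same film out of both the training and the held-out side of a fold.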
  6. “Acoustic and Visual”
     • Expect complementarity of modalities
     • Late fusion by confidences of single-modal classifiers
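Late fusion by confidences can be sketched as a weighted average of the per-segment scores from the two single-modality classifiers. The slide does not state the combination rule, so the averaging, the `audio_weight` parameter, and the example scores are assumptions.

```python
def late_fusion(audio_conf, video_conf, audio_weight=0.5):
    """Combine per-segment confidences of two single-modality classifiers."""
    if len(audio_conf) != len(video_conf):
        raise ValueError("both modalities must score the same segments")
    w = audio_weight
    return [w * a + (1.0 - w) * v for a, v in zip(audio_conf, video_conf)]


# Hypothetical violence confidences for two segments.
fused = late_fusion([0.8, 0.2], [0.4, 0.6])
```

Because fusion happens on classifier outputs, each modality keeps its own feature set and model.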
  7. Segmentation and Classification
     • Two segmentations evaluated on development set:
       – Functionals over shots
       – Functionals over X sec. sliding window
     • Sliding window segmentation:
       – Classify per window
       – Fuse window classification per shot
       – Alternative: Generate segmentation
     • Weka, SVM (SMO), C = 0.01
     • Logistic regression to obtain confidences
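One way to fuse window-level decisions per shot is to average the confidences of all windows that fall inside the shot. The slide does not give the fusion rule, so the averaging, the assignment of windows by their center time, and the name `fuse_windows_per_shot` are assumptions for illustration.

```python
def fuse_windows_per_shot(shots, windows):
    """Average window confidences per shot.

    shots:   list of (start, end) times in seconds
    windows: list of (center_time, confidence) pairs
    Returns one fused confidence per shot, or None if no window falls inside.
    """
    fused = []
    for start, end in shots:
        inside = [conf for center, conf in windows if start <= center < end]
        fused.append(sum(inside) / len(inside) if inside else None)
    return fused


# Hypothetical shot boundaries and window-level violence confidences.
shots = [(0, 4), (4, 6), (6, 7)]
windows = [(1, 0.2), (3, 0.4), (5, 0.9)]
per_shot = fuse_windows_per_shot(shots, windows)
```

Short shots may contain no window center at all, which is why the sketch returns `None` for them rather than inventing a score.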
  8. TUM Test Runs

     Run    Modality  Overlap (Train/Eval)  MAP100 Test  MAP100 Dev (CV)  MAP20 Dev (CV)
     TUM-1  A+V       X                     .484         .397             .525
     TUM-2  A         X                     .376         .445             .515
     TUM-3  A         X X                   .360         .428             .518
     TUM-4  A                               .392         .442             .503
     TUM-5  V                               .320         .224             .213
  9. TUM Test Runs

     Run    Modality  Overlap (Train/Eval)  UA Rec (Dev)  WA Rec (Dev)
     TUM-1  A+V       X                     .584          .848
     TUM-2  A         X                     .648          .830
     TUM-3  A         X X                   .648          .826
     TUM-4  A                               .634          .829
     TUM-5  V                               .537          .832
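Assuming that "UA Rec" and "WA Rec" stand for unweighted and weighted average recall, the two measures can be computed as in the sketch below. The function name `recalls` and the toy labels are made up for illustration; they are not taken from the evaluation data.

```python
from collections import Counter


def recalls(y_true, y_pred):
    """Return (unweighted average recall, weighted average recall)."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {label: hits[label] / total[label] for label in total}
    # UA: every class counts equally, regardless of how many instances it has.
    ua = sum(per_class.values()) / len(per_class)
    # WA: classes are weighted by their frequency, which equals plain accuracy.
    wa = sum(hits.values()) / len(y_true)
    return ua, wa


# Toy example: 8 non-violent (NV) and 2 violent (V) segments,
# scored by a classifier that always predicts the majority class.
y_true = ["NV"] * 8 + ["V"] * 2
y_pred = ["NV"] * 10
ua, wa = recalls(y_true, y_pred)
```

On skewed data such as this toy set, WA can be high while UA stays at chance level, which is why the two columns can rank systems differently.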
  10. Test Data: MAP100 by Movie

      Movie               TUM-1 (A+V)  TUM-2 (A)
      Dead Poets Society  .523         .158
      Fight Club          .321         .315
      Independence Day    .609         .656
  11. Discussion
      • MAP very sensitive to segmentation
        – Ex.: MAP100 = .73, MAP20 = .88 on Dev iff segment boundaries are aligned to violent / non-violent scenes
        – [Diagram: segments aligned vs. not aligned to non-violent (NV) / violent (V) scene boundaries]
        – Train on aligned data / test on not aligned data: MAP100 = .49
      • Accuracies: similar ranking, but less “sensitive”
        – Correlated with target function in learning
  12. Conclusions and Outlook
      • Demonstrated feasibility of “brute-force” approach
      • Acoustic features alone are often competitive
      • Visual features are complementary
      • Future: Deeper analysis of
        – Individual features' worth
        – Influence of segmentation on model training and evaluation
  13. Thank you.
      weninger@tum.de
      openSMILE: http://opensmile.sourceforge.net