ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywood Movies


  1. An Uninformed Approach to Violence Detection in Hollywood Movies
     ARF (Austria-Romania-France) team:
       • Jan SCHLÜTER+,1 (jan.schlueter@ofai.at)
       • Bogdan IONESCU*,2,4 (bionescu@imag.pub.ro)
       • Ionuț MIRONICĂ2 (imironica@imag.pub.ro)
       • Markus SCHEDL3 (markus.schedl@jku.at)
     + This work was supported by the Austrian Science Fund (FWF) under project no. Z159.
     * This work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
     Affiliations named on the slide: Austrian Research Institute for Artificial Intelligence; University POLITEHNICA of Bucharest.
  2. Presentation outline
       • The approach
       • Video content description & classification
       • Experimental results
       • Conclusions and future work
     MediaEval - Pisa, Italy, 4-5 October 2012
  3. The approach
     > challenge: find a way to tag violence in movies;
     > what approach?
       • different correlations between violence and concepts;
       • high variability in the appearance of violent scenes from movie to movie;
       • training a classifier on ground truth to directly predict the violent frames is questionable.
     [Figure: correlation matrices (on ground truth) for example movies Harry Potter, Armageddon, Kill Bill and The Wicker Man; colour scale from low to high]
  4. The approach: machine learning
     > approach: training
       • input: movies & ground truth (annotations);
       • low-level features: frame-level descriptors;
       • mid-level prediction (training): concept predictors, e.g. pred. blood, …, pred. fire, …, pred. screams (real values);
       • predicting violence (training & optimizing): violence yes/no (+ score).
  5. The approach: machine learning
     > approach: testing
       • input: unseen movie;
       • low-level features: frame-level descriptors;
       • mid-level prediction: pred. blood, …, pred. fire, …, pred. screams;
       • predicting violence: violence yes/no (+ score).
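The two-stage scheme on these slides (frame-level descriptors, then mid-level concept predictions, then a violence decision) can be sketched roughly as below. This is only an illustration under assumptions: `predict_frame`, the toy predictors and the 0.5 threshold are hypothetical stand-ins, not the authors' trained models. The only detail taken from the slides is that the violence predictor sees the descriptors together with the real-valued concept predictions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def predict_frame(
    descriptor: Sequence[float],
    concept_predictors: Dict[str, Predictor],
    violence_predictor: Predictor,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Tuple[bool, float]:
    """Return (violent yes/no, score) for one frame."""
    # Stage 1: real-valued mid-level concept predictions (blood, fire, ...).
    concepts: List[float] = [
        concept_predictors[name](descriptor) for name in sorted(concept_predictors)
    ]
    # Stage 2: the violence predictor sees descriptors + concept predictions.
    score = violence_predictor(list(descriptor) + concepts)
    return score >= threshold, score


# Toy stand-ins for trained models (assumptions, for illustration only).
toy_concepts: Dict[str, Predictor] = {
    "blood": lambda d: d[0],
    "fire": lambda d: d[1],
}


def toy_violence(features: Sequence[float]) -> float:
    return sum(features) / len(features)


decision, score = predict_frame([0.2, 0.9], toy_concepts, toy_violence)
print(decision, score)
```

In the real system each stand-in would be a trained classifier; the sketch only fixes the data flow between the stages.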
  6. Video content description - audio
     standard audio features (frame-level):
       • Zero-Crossing Rate,
       • Linear Predictive Coefficients,
       • Line Spectral Pairs,
       • Mel-Frequency Cepstral Coefficients,
       • spectral centroid, flux, rolloff, and kurtosis,
       + mean & variance of each feature over a certain window.
     [Diagram: features f1, f2, …, fn over time, summarized into a global feature (mean, variance)]
     [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
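The "mean & variance of each feature over a certain window" step can be illustrated with a short sketch. The window length, the use of non-overlapping windows and the population variance are assumptions made here for illustration; the slide does not specify them, and `window_stats` is a hypothetical helper name.

```python
from typing import List, Sequence


def window_stats(frames: Sequence[Sequence[float]], win: int) -> List[List[float]]:
    """Summarize frame-level features over non-overlapping windows.

    Each output row holds the per-feature means followed by the
    per-feature (population) variances of one window.
    """
    out: List[List[float]] = []
    for start in range(0, len(frames) - win + 1, win):
        block = frames[start:start + win]
        n_feat = len(block[0])
        means = [sum(f[i] for f in block) / win for i in range(n_feat)]
        variances = [
            sum((f[i] - means[i]) ** 2 for f in block) / win for i in range(n_feat)
        ]
        out.append(means + variances)
    return out


# Four frames with two features each, summarized in windows of two frames.
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 20.0], [7.0, 20.0]]
print(window_stats(frames, win=2))
```

Each window thus doubles the descriptor dimensionality: one mean and one variance per original feature.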
  7. Video content description - visual
     feature descriptors (frame-level):
       • Histogram of oriented Gradients (HoG) ~ counts occurrences of gradient orientation in localized portions of an image (20º per bin);
     color descriptors (frame-level):
       • Color naming histogram ~ projects colours onto 11 universal colour names (black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow); [J. van de Weijer et al. IEEE TIP’09]
     visual activity (frame-level):
       • high values account for important visual changes ~ action. [B. Ionescu et al. IEEE ICASSP’06]
     [Plot: visual activity over time]
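To make the "20º per bin" orientation counting concrete, here is a heavily simplified sketch: a single magnitude-weighted histogram of unsigned gradient orientations over a whole grayscale image. It is not a full HoG descriptor (no cells, blocks or block normalization), and the central-difference gradients, the 0-180º range and the function name `orientation_histogram` are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def orientation_histogram(
    image: Sequence[Sequence[float]], bin_width_deg: float = 20.0
) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    n_bins = int(round(180.0 / bin_width_deg))  # 9 bins for 20 degrees per bin
    hist = [0.0] * n_bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences on interior pixels only.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Guard against rounding pushing the angle onto the upper edge.
            index = min(int(angle / bin_width_deg), n_bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# A horizontal intensity ramp: all gradients point along the x axis.
ramp = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
print(orientation_histogram(ramp))
```

For the ramp image all gradient energy falls into the first bin, which is the behaviour one would expect from an orientation histogram.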
  8. Classifier: multi-layer perceptron
     [Diagram: input layer (descriptor dimensionality) → 512 hidden units → 1-5 output units (~concept tags)]
     - training using back-propagation,
     - use dropout to reduce overfitting: a fraction of units is randomly omitted for each training case, so a unit cannot rely on all other units being present. [G. Hinton et al. arXiv.org’12]
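The dropout idea on this slide can be sketched in a few lines. The drop probability is an assumption (the slide only says "a fraction of units"), and the test-time scaling by the keep probability is one common way of compensating for the omitted units; the helper name `dropout` is hypothetical and this is not the authors' training code.

```python
import random
from typing import List, Sequence


def dropout(
    activations: Sequence[float],
    p_drop: float,
    rng: random.Random,
    training: bool,
) -> List[float]:
    """Apply dropout to one layer's activations for one training case."""
    if training:
        # Each unit is omitted independently with probability p_drop.
        return [0.0 if rng.random() < p_drop else a for a in activations]
    # At test time all units are present; scale to match the expected
    # activation seen during training.
    return [a * (1.0 - p_drop) for a in activations]


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, p_drop=0.5, rng=rng, training=True))
print(dropout(hidden, p_drop=0.5, rng=rng, training=False))
```

Because a different random subset of units is dropped for every training case, no unit can depend on a specific other unit being active.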
  9. Experimental results: concept prediction
     > validation of the concept predictor (on the 15 train movies);
     > use concept ground truth;
       • the purely visual concepts obtain high F-score* mainly because they are rare,
       • blood detector not that accurate (e.g. missed most blood in “Kill Bill”),
       • best results for fire and explosions (prominent yellow tones), gunshots and screams.
     [Chart: leave-one-movie-out cross-validation]
     *results reported for an optimum threshold
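Leave-one-movie-out cross-validation means that each movie is held out once while the predictor is trained on all the others. A minimal sketch of the splitting logic follows; the helper name `leave_one_movie_out` is hypothetical, and the three titles stand in for the 15 training movies.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_movie_out(
    movies: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training movies, held-out movie) for every movie in turn."""
    for i, held_out in enumerate(movies):
        train = list(movies[:i]) + list(movies[i + 1:])
        yield train, held_out


# Three titles mentioned in the slides, used here only as placeholders.
for train, test in leave_one_movie_out(["Armageddon", "Kill Bill", "The Wicker Man"]):
    print("train on", train, "- validate on", test)
```

Splitting by movie rather than by frame keeps frames of the same movie from appearing in both the training and the validation set.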
  10. Experimental results: violence prediction
     > validation of the violence predictor (on the 15 train movies);
     > input: descriptors + mid-level predictions (real numbers);
     > use violence ground truth;
     + median filtering for predictions
     [Two bar charts: precision, recall and F-score at the optimal threshold, leave-one-movie-out cross-validation; values shown: 0.41, 0.46, 0.3, 0.34, 0.23, 0.27]
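Median filtering of the frame-wise predictions suppresses isolated spikes and dips in the score sequence. The sketch below is an assumption-laden illustration: the window size and the border handling (repeating the first and last value) are not given on the slide, and `median_filter` is a hypothetical helper.

```python
from typing import List, Sequence


def median_filter(scores: Sequence[float], window: int = 3) -> List[float]:
    """Sliding median over an odd-sized window with repeated border values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not scores:
        return []
    half = window // 2
    padded = [scores[0]] * half + list(scores) + [scores[-1]] * half
    # The median of an odd-sized window is its middle element once sorted.
    return [sorted(padded[i:i + window])[half] for i in range(len(scores))]


frame_scores = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7]
print(median_filter(frame_scores, window=3))
```

In the example the lone spike of 0.9 at the second frame is removed, while the sustained run of high scores at the end is kept.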
  11. Experimental results: official runs
     > segment/shot violence decision: assign the frame-wise highest prediction score + thresholding;
     > segment-level results: precision 0.28, recall 0.49, F-score 0.36, MAP@100 0.55;
     > shot-level results: results vary significantly with the movie.
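The segment/shot decision rule (take the highest frame-wise score, then threshold) and the relation between the reported precision, recall and F-score can be sketched as follows. The threshold value and the helper names `segment_decision` and `f_score` are assumptions for illustration; the F-score is the usual harmonic mean of precision and recall.

```python
from typing import Sequence, Tuple


def segment_decision(
    frame_scores: Sequence[float], threshold: float
) -> Tuple[bool, float]:
    """Label a segment/shot by its highest frame-wise prediction score."""
    score = max(frame_scores)
    return score >= threshold, score


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


print(segment_decision([0.1, 0.7, 0.3], threshold=0.5))
# Segment-level precision and recall from the slide: 0.28 and 0.49.
print(round(f_score(0.28, 0.49), 2))
```

Plugging in the segment-level precision 0.28 and recall 0.49 gives about 0.356, which rounds to the reported F-score of 0.36.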
  12. Experimental results: official runs
     > shot-level comparative results:
     [Two bar charts: MAP and MAP@100 per submitted run; runs shown include DYNI, TUB, TEC, NII, LIG, TUM, Shanghai-Hongkong and ARF-1]
  13. Conclusions and future work
     > fair performance for a naïve attempt at violence detection;
     > a high baseline to be challenged by more sophisticated approaches;
     > future work:
       • investigate whether the concept predictions actually helped,
       • investigate the contribution of modalities,
       • investigate dropout vs. classic learning.
  14. Thank you! Any questions?
