An Uninformed Approach to Violence
        Detection in Hollywood Movies
                                    ARF (Austria-Romania-France) team


                                  Jan SCHLÜTER+1                   Bogdan IONESCU*2,4
                                    jan.schlueter@ofai.at            bionescu@imag.pub.ro


                                  Ionuț MIRONICĂ2                   Markus SCHEDL3
                                   imironica@imag.pub.ro            markus.schedl@jku.at



    +this work was supported by the Austrian Science Fund (FWF) under project no. Z159.
    *this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
    1 Austrian Research Institute for Artificial Intelligence
    2 University POLITEHNICA of Bucharest
Presentation outline


          • The approach

          • Video content description & classification

          • Experimental results

          • Conclusions and future work




MediaEval - Pisa, Italy, 4-5 October 2012
The approach
 > challenge: find a way to tag violence in movies;

 > what approach?
 different correlations between violence and concepts;

 high variability in the appearance of violent scenes from movie to movie;

 training a classifier on the ground truth to directly predict the violent
 frames is questionable.

 [figure: correlation matrices (on ground truth) for example movies: Harry Potter, Armageddon, Kill Bill, The Wicker Man; color scale: high to low]
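
The per-movie correlations between violence and concept annotations can be illustrated with a small sketch. The annotation arrays and concept tracks below are made-up toy values, not data from the task, and the use of Pearson correlation is an assumption about how such a matrix could be computed.

```python
import numpy as np

def concept_violence_correlation(violence, concepts):
    """Pearson correlation between a binary violence track and each concept track.

    violence: sequence of 0/1 ground-truth labels, one per frame.
    concepts: dict mapping concept name -> sequence of 0/1 labels, one per frame.
    """
    violence = np.asarray(violence, dtype=float)
    result = {}
    for name, track in concepts.items():
        track = np.asarray(track, dtype=float)
        # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry.
        result[name] = float(np.corrcoef(violence, track)[0, 1])
    return result

# Toy annotations for one hypothetical movie (8 frames).
violence = [0, 0, 1, 1, 1, 0, 0, 0]
concepts = {
    "blood":   [0, 0, 1, 1, 1, 0, 0, 0],  # coincides with violence
    "screams": [1, 1, 0, 0, 0, 1, 1, 1],  # exactly the opposite
}
corr = concept_violence_correlation(violence, concepts)
```

Running this per movie would give one row of values per film; if those rows differ strongly between films, a single fixed mapping from concepts to violence is unlikely to transfer.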
The approach: machine learning
 > approach: training

   movies & ground truth (annotations)
     -> low-level features: frame-level descriptors
     -> mid-level prediction (training): one predictor per concept
        (blood, …, fire, …, screams), each giving a real-valued prediction
     -> predicting violence (training & optimizing): violence classifier,
        output yes/no (+ score)



The approach: machine learning
 > approach: testing

   unseen movie
     -> low-level features: frame-level descriptors
     -> mid-level prediction: one predictor per concept
        (blood, …, fire, …, screams), each giving a prediction
     -> predicting violence: violence classifier,
        output yes/no (+ score)
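
The two-stage scheme (concept predictors first, then a violence predictor fed with descriptors plus concept predictions) can be sketched as follows. This is only an illustration under stated assumptions: a small logistic regression stands in for the multi-layer perceptron used in the talk, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Gradient-descent logistic regression (stand-in for the MLP, an assumption)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logreg(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return sigmoid(Xb @ w)  # real-valued score in [0, 1]

def train_two_stage(X, concept_labels, violence_labels):
    # Stage 1: one predictor per concept, trained on frame-level descriptors.
    concept_models = {c: train_logreg(X, y) for c, y in concept_labels.items()}
    # Stage 2: violence predictor on descriptors + real-valued concept predictions.
    mid = np.column_stack([predict_logreg(m, X) for m in concept_models.values()])
    violence_model = train_logreg(np.hstack([X, mid]), violence_labels)
    return concept_models, violence_model

def predict_violence(models, X, threshold=0.5):
    concept_models, violence_model = models
    mid = np.column_stack([predict_logreg(m, X) for m in concept_models.values()])
    score = predict_logreg(violence_model, np.hstack([X, mid]))
    return score >= threshold, score  # yes/no decision (+ score)

# Synthetic frame descriptors and annotations (toy data, not from the task).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
concept_labels = {
    "blood": (X_train[:, 0] > 0).astype(float),
    "fire": (X_train[:, 1] > 0).astype(float),
}
violence_labels = ((X_train[:, 0] > 0) & (X_train[:, 1] > 0)).astype(float)

models = train_two_stage(X_train, concept_labels, violence_labels)
decisions, scores = predict_violence(models, rng.normal(size=(10, 4)))
```

At test time only the descriptors of the unseen movie are needed; the concept ground truth is used during training alone.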



Video content description - audio
     standard audio features (frame-level):
         • Zero-Crossing Rate,
         • Linear Predictive Coefficients,
         • Line Spectral Pairs,
         • Mel-Frequency Cepstral Coefficients,
         • spectral centroid, flux, rolloff, and kurtosis,
         + variance of each feature over a certain window.

     [figure: frame-level features f1, f2, …, fn over time; global feature = mean & variance (var{f2}, …, var{fn})]

                                            [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
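
As an illustration of a frame-level audio feature and its statistics over a window, here is a minimal NumPy sketch for the zero-crossing rate. It does not use the Yaafe toolbox; the frame length, hop size, window size and the synthetic test tone are all assumptions chosen for the example.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[1:] != signs[:-1]))

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def windowed_mean_var(features, window):
    """Mean and variance of a frame-level feature over non-overlapping windows."""
    n = len(features) // window
    blocks = np.asarray(features[: n * window]).reshape(n, window)
    return blocks.mean(axis=1), blocks.var(axis=1)

# One second of a 100 Hz tone sampled at 8 kHz (synthetic example signal).
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 100 * t + 0.1)

frames = frame_signal(signal, frame_len=400, hop=200)
zcr = np.array([zero_crossing_rate(f) for f in frames])
zcr_mean, zcr_var = windowed_mean_var(zcr, window=13)
```

The same windowed mean and variance can be applied to any of the other frame-level features in the list.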

Video content description - visual
   feature descriptors (frame-level)
       • Histogram of Oriented Gradients (HoG) ~ counts occurrences of gradient
       orientations in localized portions of an image (20º per bin);

   color descriptors (frame-level)
       • Color naming histogram ~ projects colors onto 11 universal color names
       (black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow);
                                                             [J. van de Weijer et al. IEEE TIP’09]
   visual activity (frame-level)
       • high values account for important visual changes ~ action;
       [figure: visual activity plotted over time]
                                                              [B. Ionescu et al. IEEE ICASSP’06]
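
The idea behind the gradient-orientation histogram with 20º bins can be sketched in a few lines of NumPy. This is a simplified, whole-image version: a full HoG descriptor computes such histograms per local cell and normalizes them over blocks. The use of unsigned orientations (0 to 180 degrees, hence 9 bins) and magnitude weighting are assumptions.

```python
import numpy as np

def orientation_histogram(image, bin_width_deg=20):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # fold to unsigned orientation
    n_bins = int(180 // bin_width_deg)  # 9 bins for 20-degree bins
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy image: intensity ramps from left to right, so every gradient points along x.
image = np.tile(np.arange(16, dtype=float), (16, 1))
hist = orientation_histogram(image)
```

For this ramp image all gradient energy falls into the first bin, which covers orientations from 0 to 20 degrees.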

Classifier: multi-layer perceptron




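  [figure: multi-layer perceptron with an input layer (desc. dim.), a hidden layer (512 units), and an output layer (1-5 units ~ concept tags)]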

  - training using back-propagation,
  - use 'dropout' to reduce overfitting: a fraction of units is randomly
  omitted for each training case so a unit cannot rely on all other units
  being present.                                        [G. Hinton et al. arXiv.org’12]
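
A forward pass with dropout on the hidden layer might look like the sketch below. Only the 512 hidden units and the use of dropout come from the slide; the ReLU activation, the sigmoid outputs, the drop probability of 0.5, the weight initialization and the "inverted dropout" rescaling are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden=512, n_out=5):
    """Small random weights for one hidden layer (initialization is an assumption)."""
    return {
        "W1": rng.normal(0.0, 0.01, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x, drop_prob=0.5, train=True):
    """Forward pass; hidden units are randomly omitted only while training."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU (assumed activation)
    if train:
        # Keep each hidden unit with probability 1 - drop_prob and rescale the
        # survivors so the expected activation matches test time (inverted dropout).
        mask = rng.random(h.shape) >= drop_prob
        h = h * mask / (1.0 - drop_prob)
    logits = h @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))  # one score per concept tag

params = init_mlp(n_in=20)
x = rng.normal(size=(3, 20))            # three hypothetical frame descriptors
train_out = forward(params, x, train=True)
test_out = forward(params, x, train=False)
```

Because a different random mask is drawn for every training case, no unit can rely on any particular other unit being present.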

Experimental results: concept prediction
   > validation of the concept predictor (on the 15 train movies);
   > use concept ground truth;
   > leave-one-movie-out cross-validation;

   [figure: per-concept prediction results*]

   > the purely visual concepts obtain a high F-score mainly because they are rare;
   > the blood detector is not that accurate (e.g. it missed most blood in “Kill Bill”);
   > best results for fire and explosions (prominent yellow tones), gunshots and screams.

   *results reported for an optimum threshold
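
Leave-one-movie-out cross-validation holds out all frames of one movie per fold, so no movie contributes to both training and testing. A minimal sketch, with hypothetical movie identifiers:

```python
import numpy as np

def leave_one_movie_out(movie_ids):
    """Yield (held_out_movie, train_idx, test_idx), holding out one whole movie per fold."""
    movie_ids = np.asarray(movie_ids)
    for movie in np.unique(movie_ids):
        test_mask = movie_ids == movie
        yield movie, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# One movie label per frame (toy example with three movies).
movie_ids = ["A", "A", "B", "B", "B", "C"]
folds = list(leave_one_movie_out(movie_ids))
```

With 15 training movies this gives 15 folds, each trained on 14 movies and evaluated on the remaining one.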
Experimental results: violence prediction
   > validation of the violence predictor (on the 15 train movies);

   > input: descriptors + mid-level predictions (real numbers);

   > use violence ground truth;

   > leave-one-movie-out cross-validation, optimal threshold:

                                             prec.   rec.    F-sc.
        predictions                          0.23    0.41    0.3
        predictions + median filtering       0.27    0.46    0.34
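
The effect of median filtering on frame-wise predictions can be illustrated with a toy example. The scores, the ground truth, the kernel size of 3 and the threshold of 0.5 are invented for the sketch; the slide does not state which filter length or threshold was used.

```python
import numpy as np

def median_filter(scores, kernel_size=5):
    """Sliding-window median with edge padding; kernel_size should be odd."""
    scores = np.asarray(scores, dtype=float)
    half = kernel_size // 2
    padded = np.pad(scores, half, mode="edge")
    return np.array([np.median(padded[i : i + kernel_size]) for i in range(len(scores))])

def precision_recall_f(pred, truth):
    """Precision, recall and F-score of binary predictions against binary truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / truth.sum() if truth.sum() else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return float(precision), float(recall), float(f_score)

# Toy frame-wise scores with one isolated false alarm and one isolated miss.
scores = np.array([0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.2, 0.9, 0.8, 0.1])
truth = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

raw = precision_recall_f(scores >= 0.5, truth)
smoothed = precision_recall_f(median_filter(scores, kernel_size=3) >= 0.5, truth)
```

In this example the filter removes the isolated spike and fills the one-frame gap, which is the kind of temporal smoothing that can raise both precision and recall.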
Experimental results: official runs
   > segment/shot violence decision: assign the frame-wise highest
   prediction score + thresholding;

   > segment-level results:
       precision 0.28, recall 0.49, F-score 0.36, MAP@100 0.55;

   > shot-level results: results vary significantly with the movie.

   [figure: shot-level results per movie]
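
The segment/shot decision rule described above (take the highest frame-wise prediction score in the segment, then threshold it) can be sketched as follows. The segment boundaries, scores and the threshold value are hypothetical.

```python
import numpy as np

def segment_decisions(frame_scores, segments, threshold=0.5):
    """Score each segment by its highest frame-wise prediction, then threshold.

    segments: list of (start, end) frame indices, end exclusive.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    seg_scores = np.array([frame_scores[start:end].max() for start, end in segments])
    return seg_scores >= threshold, seg_scores

# Six frames grouped into three hypothetical shots of two frames each.
frame_scores = [0.1, 0.2, 0.7, 0.3, 0.1, 0.4]
shots = [(0, 2), (2, 4), (4, 6)]
decisions, seg_scores = segment_decisions(frame_scores, shots)
```

Taking the maximum means a single confidently violent frame is enough to flag the whole segment, which favors recall over precision.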




Experimental results: official runs
   > shot-level comparative results:

   [figure: bar charts comparing the submitted runs of all teams (DYNI, TUB, TEC, NII, LIG, TUM, Shanghai-Hongkong, ARF); reported measures include MAP and MAP@100]
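
For readers unfamiliar with the ranking measure, here is a minimal sketch of average precision over the top k ranked items and its mean over several rankings. The normalization (dividing by the number of relevant items found in the top k) is an assumption; the official evaluation tool may define MAP@100 differently.

```python
def average_precision_at_k(ranked_relevance, k=100):
    """Average precision over the top k items.

    ranked_relevance: 0/1 labels of the items, sorted by decreasing prediction score.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(list(ranked_relevance)[:k], start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: normalize by the number of relevant items found in the top k.
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at_k(rankings, k=100):
    """Mean of the per-ranking average precision values (e.g. one ranking per movie)."""
    return sum(average_precision_at_k(r, k) for r in rankings) / len(rankings)

# Two toy rankings of shots (1 = violent, 0 = non-violent).
rankings = [[1, 0, 1, 0], [0, 1]]
map_at_k = mean_average_precision_at_k(rankings, k=100)
```

Unlike the F-score, this measure depends only on the ordering of the shots by score, not on the decision threshold.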
Conclusions and future work

  > fair performance for a naïve attempt at violence detection;

  > a high baseline to be challenged by more sophisticated
  approaches;


  > future work:
      investigate whether the concept predictions actually helped,

      investigate contribution of modalities,

      investigate dropout vs. classic learning.



Thank you! Any questions?




MediaEval - Pisa, Italy, 4-5 October 2012   13/13
                                                14

More Related Content

Viewers also liked

14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация стуStanislav Litvinenko
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...MediaEval2012
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskMediaEval2012
 
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...MediaEval2012
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...MediaEval2012
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skillsJNavarro0321
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4souzadea1
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account MatchingMediaEval2012
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskMediaEval2012
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanewilovepurin
 
Designinteração– veda 3
Designinteração– veda 3Designinteração– veda 3
Designinteração– veda 3souzadea1
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация стуStanislav Litvinenko
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskMediaEval2012
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonSharon Jimenez
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingMediaEval2012
 
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval2012
 
Mr. & Mrs. S Before & After
Mr. & Mrs. S Before & AfterMr. & Mrs. S Before & After
Mr. & Mrs. S Before & AfterMichael Kret
 

Viewers also liked (20)

14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
 
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...
TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization...
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skills
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account Matching
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing Task
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanew
 
Designinteração– veda 3
Designinteração– veda 3Designinteração– veda 3
Designinteração– veda 3
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
 
Papiloma humano
Papiloma humanoPapiloma humano
Papiloma humano
 
κειμενο
κειμενοκειμενο
κειμενο
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharon
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-Tagging
 
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
 
Mr. & Mrs. S Before & After
Mr. & Mrs. S Before & AfterMr. & Mrs. S Before & After
Mr. & Mrs. S Before & After
 

Similar to ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywood Movies

ARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video ClassificationARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video ClassificationMediaEval2012
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskMediaEval2012
 
lec_11_self_supervised_learning.pdf
lec_11_self_supervised_learning.pdflec_11_self_supervised_learning.pdf
lec_11_self_supervised_learning.pdfAlamgirAkash3
 
IRJET- Survey Paper on Anomaly Detection in Surveillance Videos
IRJET-  	  Survey Paper on Anomaly Detection in Surveillance VideosIRJET-  	  Survey Paper on Anomaly Detection in Surveillance Videos
IRJET- Survey Paper on Anomaly Detection in Surveillance VideosIRJET Journal
 
D1.1. State of The Art and Requirements Analysis for Hypervideo
D1.1. State of The Art and Requirements Analysis for HypervideoD1.1. State of The Art and Requirements Analysis for Hypervideo
D1.1. State of The Art and Requirements Analysis for HypervideoLinkedTV
 
Engaging Games Storyboard - 8842
Engaging Games Storyboard - 8842Engaging Games Storyboard - 8842
Engaging Games Storyboard - 8842Lisa Durff
 
Automatic Visual Concept Detection in Videos: Review
Automatic Visual Concept Detection in Videos: ReviewAutomatic Visual Concept Detection in Videos: Review
Automatic Visual Concept Detection in Videos: ReviewIRJET Journal
 

ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywood Movies

  • 1. An Uninformed Approach to Violence Detection in Hollywood Movies
      ARF (Austria-Romania-France) team
      Jan SCHLÜTER+1 (jan.schlueter@ofai.at), Bogdan IONESCU*2,4 (bionescu@imag.pub.ro), Ionuț MIRONICĂ2 (imironica@imag.pub.ro), Markus SCHEDL3 (markus.schedl@jku.at)
      + This work was supported by the Austrian Science Fund (FWF) under project no. Z159.
      * This work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
      Affiliations: 1 Austrian Research Institute for Artificial Intelligence; 2 University POLITEHNICA of Bucharest (affiliations 3 and 4 appear without names).
  • 2. Presentation outline
      • The approach
      • Video content description & classification
      • Experimental results
      • Conclusions and future work
      MediaEval - Pisa, Italy, 4-5 October 2012
  • 3. The approach
      > challenge: find a way to tag violence in movies;
      > what approach?
        - different correlations between violence and concepts;
        - high variability in appearance of violent scenes from movie to movie;
        - training a classifier on ground truth to predict the violent frames directly is questionable.
      [Figure: correlation matrices (on ground truth), color scale from low to high, e.g. for the movies Harry Potter, Armageddon, Kill Bill, The Wicker Man]
  • 4. The approach: machine learning
      > approach: training
        - input: movies & ground truth (annotations);
        - low-level features: frame-level descriptors;
        - mid-level prediction: training & optimizing the concept predictors (pred. blood, …, pred. fire, …, pred. screams), which output real values;
        - predicting violence: training the violence predictor, output violence yes/no (+ score).
  • 5. The approach: machine learning
      > approach: testing
        - input: unseen movie;
        - low-level features: frame-level descriptors;
        - mid-level prediction: pred. blood, …, pred. fire, …, pred. screams;
        - predicting violence: violence yes/no (+ score).
  • 6. Video content description - audio
      > standard audio features (frame-level):
        • Zero-Crossing Rate,
        • Linear Predictive Coefficients,
        • Line Spectral Pairs,
        • Mel-Frequency Cepstral Coefficients,
        • spectral centroid, flux, rolloff, and kurtosis,
        + mean & variance of each feature over a certain window.
      [Figure: frame-level features f1, f2, …, fn over time; global feature = means + variances (var{f2}, …, var{fn}) of the frame-level features]
      [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
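The "mean & variance of each feature over a certain window" step on slide 6 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Yaafe toolbox; the function name `window_stats` and the `win` and `hop` parameters are hypothetical, and the slides do not state the window length that was used.

```python
from statistics import fmean, pvariance


def window_stats(frames, win, hop):
    """Summarise frame-level feature vectors by per-window mean and variance.

    frames: list of equal-length feature vectors, one per audio frame.
    win, hop: window length and step, in frames (hypothetical parameters).
    Returns one vector per window: the means of all features followed by
    their (population) variances.
    """
    out = []
    for start in range(0, len(frames) - win + 1, hop):
        block = frames[start:start + win]
        columns = list(zip(*block))  # one tuple of values per feature
        means = [fmean(c) for c in columns]
        variances = [pvariance(c) for c in columns]
        out.append(means + variances)
    return out


# Four frames with two features each, summarised in windows of two frames.
frames = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
stats = window_stats(frames, win=2, hop=2)
```

Each output vector is twice as long as a frame-level vector, which matches the "global feature = means + variances" layout suggested by the slide.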
  • 7. Video content description - visual
      > feature descriptors (frame-level):
        • Histogram of Oriented Gradients (HoG) ~ counts occurrences of gradient orientations in localized portions of an image (20° per bin);
      > color descriptors (frame-level):
        • Color naming histogram ~ projects colors onto 11 universal color names (black, blue, brown, grey, green, orange, pink, purple, red, white, and yellow); [J. van de Weijer et al. IEEE TIP’09]
      > visual activity (frame-level):
        • high values account for important visual changes ~ action. [B. Ionescu et al. IEEE ICASSP’06]
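To make the HoG idea on slide 7 concrete, the sketch below counts gradient orientations in one localized image portion with 20° per bin. It assumes unsigned orientations (0-180°, hence 9 bins), central-difference gradients, and magnitude weighting; none of these choices is stated on the slide, and `orientation_histogram` is a hypothetical name.

```python
import math


def orientation_histogram(cell, bin_width_deg=20):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    cell: 2D list of grayscale values for one localized image portion.
    Assumption: orientations are folded into 0-180 degrees, so a bin
    width of 20 degrees gives 9 bins.
    """
    n_bins = 180 // bin_width_deg
    hist = [0.0] * n_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width_deg) % n_bins] += magnitude
    return hist


# A vertical edge: all gradient energy falls into the first bin (0-20 degrees).
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(cell)
```

A full HoG descriptor would repeat this over a grid of cells and normalise the histograms over blocks; that part is omitted here.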
  • 8. Classifier: multi-layer perceptron
      [Diagram: inputs (desc. dim.) → 512 units → 1-5 outputs (~concept tags)]
      - training using back-propagation,
      - use 'dropout' to reduce overfitting: a fraction of units is randomly omitted for each training case, so a unit cannot rely on all other units being present. [G. Hinton et al. arXiv.org’12]
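The dropout rule on slide 8 (randomly omit a fraction of units for each training case) can be illustrated with a few lines of Python. This is a sketch under assumptions: the function name `dropout`, the `rate` parameter, and the explicit `rng` argument are hypothetical, and the rescaling of kept units ("inverted dropout") is a common variant chosen here for simplicity, not necessarily the scheme used by the authors.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction of unit activations for one training case.

    rate: probability of omitting a unit (hypothetical parameter).
    rng: a random.Random instance, passed in for reproducibility.
    At test time (training=False) all units are kept unchanged.
    """
    if not training:
        return list(activations)
    keep = 1.0 - rate
    # Inverted dropout (an assumption): kept units are scaled by 1/keep so
    # the expected activation matches the test-time value.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 1000
dropped = dropout(hidden, rate=0.5, rng=rng)
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)
```

Because a different random subset of units is omitted for every training case, no unit can rely on any particular other unit being present.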
  • 9. Experimental results: concept prediction
      > validation of the concept predictor (on the 15 train movies), using leave-one-movie-out cross-validation;
      > use concept ground truth;
        - the purely visual concepts obtain a high F-score* mainly because they are rare,
        - the blood detector is not that accurate (e.g. it missed most blood in “Kill Bill”),
        - best results for fire and explosions (prominent yellow tones), gunshots, and screams.
      * results reported for an optimum threshold
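The leave-one-movie-out cross-validation named on slide 9 can be sketched as a generator of train/test splits. The function name and the shortened movie list are illustrative only; the actual evaluation used the 15 training movies.

```python
def leave_one_movie_out(movies):
    """Yield (train_movies, held_out_movie) pairs, one fold per movie."""
    for i, held_out in enumerate(movies):
        yield movies[:i] + movies[i + 1:], held_out


# Illustrative subset of titles taken from the slides.
movies = ["Armageddon", "Kill Bill", "The Wicker Man"]
folds = list(leave_one_movie_out(movies))
```

Splitting by movie rather than by frame keeps all frames of the held-out movie out of training, which is relevant given the high movie-to-movie variability noted on slide 3.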
  • 10. Experimental results: violence prediction
      > validation of the violence predictor (on the 15 train movies), using leave-one-movie-out cross-validation;
      > input: descriptors + mid-level predictions (real numbers);
      > use violence ground truth;
      > + median filtering for predictions.
      [Figure: two panels showing precision, recall, and F-score at the optimal threshold; values shown: 0.41, 0.46, 0.3, 0.34, 0.23, 0.27]
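Slide 10 mentions median filtering of the predictions and an optimal threshold. A minimal sketch of both steps follows; the window width, the truncated windows at the edges, and the search over observed score values are assumptions, since the slides give no details.

```python
from statistics import median


def median_filter(scores, width):
    """Sliding median over frame scores; edges use a truncated window."""
    half = width // 2
    return [median(scores[max(0, i - half):i + half + 1])
            for i in range(len(scores))]


def f_score(pred, truth):
    """F-score of binary predictions against binary ground truth."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, truth):
    """Return (threshold, F-score) maximising F over the observed scores."""
    best = (None, -1.0)
    for thr in sorted(set(scores)):
        f = f_score([s >= thr for s in scores], truth)
        if f > best[1]:
            best = (thr, f)
    return best


raw = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7, 0.1]
truth = [0, 0, 1, 1, 1, 1, 0]
smoothed = median_filter(raw, width=3)
thr, f = best_threshold(smoothed, truth)
```

In this toy example the isolated spike at index 1 is suppressed and the dip at index 2 is filled in, so the smoothed scores separate the violent frames cleanly.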
  • 11. Experimental results: official runs
      > segment/shot violence decision: assign the highest frame-wise prediction score + thresholding;
      > segment-level results: precision 0.28, recall 0.49, F-score 0.36, MAP@100 0.55;
      > shot-level results: results vary significantly with the movie.
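The segment/shot decision rule on slide 11 (take the highest frame-wise score, then threshold) can be sketched as below. The representation of segments as `(start, end)` frame index pairs with an exclusive end, and the threshold value, are assumptions made for illustration.

```python
def segment_decisions(frame_scores, segments, threshold):
    """Score each segment by its highest frame score, then threshold it.

    segments: list of (start, end) frame index pairs, end exclusive
    (a hypothetical representation).
    Returns a list of (score, is_violent) pairs, one per segment.
    """
    results = []
    for start, end in segments:
        score = max(frame_scores[start:end])
        results.append((score, score >= threshold))
    return results


frame_scores = [0.1, 0.2, 0.9, 0.3, 0.2, 0.1]
decisions = segment_decisions(frame_scores, [(0, 3), (3, 6)], threshold=0.5)
```

As a consistency check on the reported segment-level numbers, precision 0.28 and recall 0.49 give an F-score of 2 · 0.28 · 0.49 / (0.28 + 0.49) ≈ 0.36, which matches the value on the slide.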
  • 12. Experimental results: official runs
      > shot-level comparative results:
      [Figure: two bar charts comparing the official runs by MAP and by MAP@100; runs shown come from the teams DYNI, TUB, TEC, NII, LIG, TUM, Shanghai-Hongkong, and ARF]
  • 13. Conclusions and future work
      > fair performance for a naïve attempt at violence detection;
      > a high baseline to be challenged by more sophisticated approaches;
      > future work:
        - investigate whether the concept predictions actually helped,
        - investigate the contribution of the modalities,
        - investigate dropout vs. classic learning.
  • 14. Thank you! Any questions?