
The MediaEval 2012 Affect Task: Violent Scenes Detection


  1. Affect Task: Violent Scenes Detection – Task overview
     MediaEval 2012, October 3, 2012
     Guillaume, Mohammad, Cédric & Claire-Hélène
  2. Task definition
     Second year! The task derives from a Technicolor use case: helping users choose movies that are suitable for the children in their family by proposing a preview of the most violent segments.
     The definition is the same as in 2011: “Physical violence or accident resulting in human injury or pain”
       • As objective as possible, but:
         • Dead people shown without showing how they died => not annotated
         • Somebody hurting himself while shaving => annotated
         • These border cases do not quite match the use case…
  3. Task definition
     Two types of runs:
       • Primary and required run at shot level, i.e. a violent/non-violent decision must be provided for each movie shot
       • Optional run at segment level, i.e. participants extract violent segments (starting and ending times)
       • Confidence scores are required to compute the official measure (a possible run-file layout is sketched after this slide)
     Rules:
       • Any features automatically extracted from the DVDs can be used; this includes audio, video and subtitles
       • No external additional data (e.g. from the internet)
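Concretely, a shot-level run boils down to one binary decision plus one confidence score per shot. The sketch below shows one way such a run file could be written; the column layout, file name and field names are illustrative assumptions, not the official MediaEval submission format.

```python
# Minimal sketch of writing a shot-level run file.
# The column layout, file name and field names are illustrative
# assumptions, not the official MediaEval submission format.
from typing import List, Tuple

# One entry per shot: (start_sec, end_sec, is_violent, confidence_score)
Shot = Tuple[float, float, bool, float]

def write_shot_run(movie: str, shots: List[Shot], path: str) -> None:
    with open(path, "w") as f:
        for start, end, violent, score in shots:
            # One line per shot: movie id, shot boundaries, binary decision, score
            f.write(f"{movie} {start:.2f} {end:.2f} {int(violent)} {score:.4f}\n")

if __name__ == "__main__":
    demo = [(0.0, 4.2, False, 0.12), (4.2, 9.8, True, 0.87)]
    write_shot_run("FightClub", demo, "fightclub_run.txt")
```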
  4. 4. Data set 18 Hollywood movies purchased by participants  Of different genre (from extremely violent to non violent) both in the learning and test sets.4 10/08/12
  5. Data set – development set

     | Movie               | Duration (s)        | Shots | Violence duration (%) | Violent shots (%) |
     |---------------------|---------------------|-------|-----------------------|-------------------|
     | Armageddon          | 8680.16             | 3562  | 14.03                 | 14.6              |
     | Billy Elliot        | 6349.44             | 1236  | 5.14                  | 4.21              |
     | Eragon              | 5985.44             | 1663  | 11.02                 | 16.6              |
     | Harry Potter 5      | 7953.52             | 1891  | 10.46                 | 13.43             |
     | I Am Legend         | 5779.92             | 1547  | 12.75                 | 20.43             |
     | Leon                | 6344.56             | 1547  | 4.3                   | 7.24              |
     | Midnight Express    | 6961.04             | 1677  | 7.28                  | 11.15             |
     | Pirates Carib.      | 8239.44             | 2534  | 11.3                  | 12.47             |
     | Reservoir Dogs      | 5712.96             | 856   | 11.55                 | 12.38             |
     | Saving Private Ryan | 9751.0              | 2494  | 12.92                 | 18.81             |
     | The Sixth Sense     | 6178.04             | 963   | 1.34                  | 2.80              |
     | The Wicker Man      | 5870.44             | 1638  | 8.36                  | 6.72              |
     | Kill Bill 1         | 5626.6              | 1597  | 17.4                  | 24.8              |
     | The Bourne Identity | 5877.6              | 1995  | 7.5                   | 9.3               |
     | The Wizard of Oz    | 5415.7              | 908   | 5.5                   | 5.0               |
     | TOTAL               | 100725.8 (27h58min) | 26108 | 9.39                  | 11.99             |
  6. Data set – test set

     | Movie              | Duration (s)       | Shots | Violence duration (%) | Violent shots (%) |
     |--------------------|--------------------|-------|-----------------------|-------------------|
     | Dead Poets Society | 7413.24            | 1583  | 0.75                  | 2.15              |
     | Fight Club         | 8005.72            | 2335  | 7.61                  | 13.28             |
     | Independence Day   | 8834.32            | 2652  | 6.4                   | 13.99             |
     | TOTAL              | 24253.28 (6h44min) | 6570  | 4.92                  | 9.80              |
  7. Annotations & additional data
     Ground truth manually created by 7 human assessors:
       • Segments containing violent events according to the definition
         • One unique violent action per segment wherever possible, otherwise the tag ‘multiple_action_scenes’
       • 7 high-level video concepts:
         • Presence of blood
         • Presence of fire
         • Presence of guns or similar weapons
         • Presence of cold arms (knives or similar weapons)
         • Fights (1 against 1, small, large, distant attack)
         • Car chases
         • Gory scenes (graphic images of bloodletting and/or tissue damage)
       • 3 high-level audio concepts:
         • Gunshots, cannon fire
         • Screams, effort noise
         • Explosions
     Automatically generated shot boundaries with keyframes are also provided (since the ground truth is segment-based, a possible mapping onto shots is sketched after this slide).
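Because the ground truth is given as violent segments while the required run is at shot level, participants have to map segment annotations onto shots at some point. Below is a minimal sketch of one such mapping, under the assumption (not taken from the task guidelines) that a shot counts as violent as soon as it overlaps any annotated violent segment; the data structures are illustrative, not the released annotation format.

```python
# Minimal sketch: map segment-level violence annotations onto shots.
# Assumption (not from the task guidelines): a shot is labelled violent
# as soon as it overlaps any annotated violent segment.
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) of an annotated violent segment
Shot = Tuple[float, float]     # (start_sec, end_sec) of an automatically detected shot

def label_shots(shots: List[Shot], violent_segments: List[Segment]) -> List[bool]:
    labels = []
    for s_start, s_end in shots:
        # Two intervals overlap iff each starts before the other ends
        overlaps = any(s_start < v_end and v_start < s_end
                       for v_start, v_end in violent_segments)
        labels.append(overlaps)
    return labels

if __name__ == "__main__":
    shots = [(0.0, 4.0), (4.0, 9.0), (9.0, 15.0)]
    violent = [(5.5, 7.0)]
    print(label_shots(shots, violent))  # [False, True, False]
```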
  8. Results
  9. Evaluation metrics
     Official measure: Mean Average Precision at 100 (MAP@100)
       • Average precision over the 100 top-ranked violent shots, computed over the 3 test movies
     For comparison with 2011, the MediaEval cost:
       C = C_fa × P_fa + C_miss × P_miss, with C_fa = 1 and C_miss = 10,
       where P_fa and P_miss are the estimated probabilities of false alarm and missed detection
     Additional metrics:
       • False alarm rate, missed detection rate, precision, recall, F-measure, MAP@20, MAP
       • Detection error trade-off (DET) curves
     (An illustrative computation of MAP@100 and the cost follows this slide.)
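The two headline measures can be computed directly from shot-level labels, decisions and scores. The sketch below is illustrative only (it is not the official MediaEval evaluation tool, and AP@k has several common normalization variants); the cost follows the formula above with the probabilities estimated from counts.

```python
# Minimal sketch of the two headline measures (illustrative only,
# not the official MediaEval evaluation tool).
from typing import List

def average_precision_at_k(scores: List[float], labels: List[bool], k: int = 100) -> float:
    """AP over the k top-scored shots: mean precision at the ranks of the violent shots."""
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])[:k]
    hits, precisions = 0, []
    for rank, (_, is_violent) in enumerate(ranked, start=1):
        if is_violent:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mediaeval_cost(decisions: List[bool], labels: List[bool],
                   c_fa: float = 1.0, c_miss: float = 10.0) -> float:
    """C = C_fa * P_fa + C_miss * P_miss, with probabilities estimated from counts."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    false_alarms = sum(1 for d, l in zip(decisions, labels) if d and not l)
    misses = sum(1 for d, l in zip(decisions, labels) if l and not d)
    return c_fa * false_alarms / max(n_neg, 1) + c_miss * misses / max(n_pos, 1)
```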
  10. Task participation
      Survey:
        • 35 teams expressed interest in the task (among which 12 were very interested); 2011: 13 teams
      Registration:
        • 11 teams = 6 core participants + 1 organizers’ team + 4 additional teams
        • At least 3 joint submissions, 16 research teams, 9 countries
        • 3 teams had already worked on the detection of violence in movies
        • 2011: 6 teams = 4 + 2 organizers’ teams, 1 joint submission, 4 countries
      Submission:
        • 7 teams + 1 organizers’ team
        • We lost 3 teams (corpus availability, economic issues, low performance)
        • Grand total of 36 runs: 35 at shot level and 1 brave submission at segment level!
        • 2011: 29 runs at shot level, 4 teams + 2 organizers’ teams
      Workshop participation:
        • 6 teams; 2011: 3 teams
  11. Task baseline – random classification

      | Movie              | MAP@100 |
      |--------------------|---------|
      | Dead Poets Society | 2.17    |
      | Fight Club         | 13.27   |
      | Independence Day   | 13.98   |
      | Total              | 9.08    |
  12. Task participation

      | Team                       | Country          | Runs submitted        | MAP@100       | MediaEval cost |
      |----------------------------|------------------|-----------------------|---------------|----------------|
      | ARF                        | Austria          | 1 (shot), 1 (segment) | 65.05 / 54.82 | 3.56 / 5.13    |
      | DYNI – LSIS                | France           | 5                     | 12.44         | 7.96           |
      | NII – Video Processing Lab | Japan            | 5                     | 30.82         | 1.28           |
      | Shanghai-Hongkong          | China            | 5                     | 62.38         | 5.52           |
      | TUB – DAI                  | Germany          | 5                     | 18.53         | 4.20           |
      | TUM                        | Germany, Austria | 5                     | 48.43         | 7.83           |
      | LIG – MRIM                 | France           | 4                     | 31.37         | 4.16           |
      | TEC*                       | France-UK        | 5                     | 61.82         | 3.56           |
      | Total: 8 teams (23% of the 35 interested) |   | 36                    |               |                |
      | Random classification      |                  |                       | 9.8           |                |

      *: task organizer. MAP@100 and cost are given for each team’s best run according to MAP@100.
      The original slide also marked 2011 participation (5 teams) and 2012 workshop attendance (6 teams, 75%); TUB – DAI, LIG – MRIM and TEC did both.
  15. Learned points
      Features:
        • Mainly classic low-level features, either audio or video
        • Mainly computed at frame level
      Classification step:
        • Mainly supervised machine learning systems: mostly SVM-based, 1 NN, 1 BN
        • Two systems based on similarity computation (k-NN)
      Multimodality:
        • Is audio, video, or audio plus video more informative? No real convergence
        • No use of text features
      Mid-level concepts:
        • Yes! This year they were largely used (4 teams out of 8)
        • Seems promising for some of them (except blood)
        • But how to use them: as additional features, or as an intermediate step? (see the sketch after this slide)
      Test set: it seems that…
        • Systems worked better on Independence Day, while Dead Poets Society was more difficult
        • Due to some similarity with other movies in the development set? A generalization issue?
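As a concrete illustration of the dominant recipe reported by participants (a supervised SVM over low-level audio/video features, with mid-level concept scores optionally appended as extra features), here is a minimal scikit-learn sketch on placeholder data. The feature types, dimensions and split are assumptions for illustration, not any team's actual pipeline.

```python
# Minimal sketch of the dominant recipe: an SVM over low-level shot
# features, optionally augmented with mid-level concept scores
# (blood, fire, gunshots, ...). All feature choices and dimensions
# are illustrative assumptions, not any team's actual pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: 200 shots, 64-dim low-level features (e.g. audio energy,
# color/motion statistics) plus 10 mid-level concept scores per shot.
low_level = rng.normal(size=(200, 64))
concept_scores = rng.uniform(size=(200, 10))
X = np.hstack([low_level, concept_scores])   # concepts used as additional features
y = rng.integers(0, 2, size=200)             # 1 = violent shot, 0 = non-violent

clf = make_pipeline(StandardScaler(), SVC(probability=True))
clf.fit(X[:150], y[:150])

# Confidence scores for ranking shots (needed for MAP@100) and binary decisions
scores = clf.predict_proba(X[150:])[:, 1]
decisions = scores > 0.5
```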
  16. DET curves (best run per participant, selected according to MAP@100)
  17. Recall vs. precision (best run per participant, selected according to MAP@100)
  18. Conclusions & perspectives
      Success of the task:
        • Increased number of participants
        • Attracted people from the domain
        • The quality of the results has greatly improved
      MediaEval 2013:
        • Which task definition?
        • How to go one step further with multimodality? Text is still not used
        • Who will join the organizers’ group for next year?
