This document summarizes the MediaEval 2012 Violent Scenes Detection task. The goal is to detect violent segments in movies to help users choose content that is suitable for children. Violence is defined as physical violence or accidents resulting in human injury or pain. Participants were provided with 18 Hollywood movies annotated for violence at the shot level. Runs detecting violence at the shot level and at the segment level were submitted. Evaluation used mean average precision (MAP@100) and a cost-based metric. 11 teams from 9 countries registered, and 8 teams submitted 36 runs in total. The best-performing run achieved a MAP@100 of 65.05. Participation was higher than in the previous year, with more joint submissions and higher workshop attendance.
2. Task definition
Second year of the task!
Derives from a Technicolor use case:
Helping users choose movies that are suitable for the children in their family by
proposing a preview of the most violent segments
Same definition as in 2011:
“Physical violence or accident resulting in human injury or pain”
As objective as possible
But:
Dead people shown without seeing how they died => not annotated
Somebody hurting themselves while shaving => annotated
Such cases do not match the use case…
3. Task definition
Two types of runs:
Primary and required run at shot level,
i.e. a violent / non-violent decision must be provided for each movie shot
Optional run at segment level,
i.e. violent segments (starting and ending times) must be extracted by the
participants
Scores are required to compute the official measure (a toy run-file sketch follows after this list)
Rules:
Any features automatically extracted from the DVDs can be used
This includes audio, video and subtitles
No external additional data (e.g. from the Internet)
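Purely as an illustration of the shot-level run type, here is a minimal Python sketch that writes one line per shot with a score and a binary decision. The file name, column layout and threshold are hypothetical and do not reproduce the official submission format.

```python
# Hypothetical shot-level run writer; the actual MediaEval submission
# format is not reproduced here, all fields are illustrative only.

def write_shot_run(path, shots, threshold=0.5):
    """shots: iterable of (movie, shot_index, violence_score) tuples."""
    with open(path, "w") as f:
        for movie, shot_index, score in shots:
            decision = 1 if score >= threshold else 0  # violent / non-violent
            f.write(f"{movie} {shot_index} {score:.4f} {decision}\n")

# Toy usage with made-up scores for two shots of one test movie.
write_shot_run("run_shot_level.txt",
               [("Fight_Club", 0, 0.12), ("Fight_Club", 1, 0.87)])
```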
4. Data set
18 Hollywood movies, purchased by the participants
Of different genres (from extremely violent to non-violent), in both the learning and
test sets
5. Data set – development set
Movie | Duration (s) | # Shots | Violence duration (%) | Violent shots (%)
Armageddon | 8680.16 | 3562 | 14.03 | 14.6
Billy Elliot | 6349.44 | 1236 | 5.14 | 4.21
Eragon | 5985.44 | 1663 | 11.02 | 16.6
Harry Potter 5 | 7953.52 | 1891 | 10.46 | 13.43
I Am Legend | 5779.92 | 1547 | 12.75 | 20.43
Leon | 6344.56 | 1547 | 4.3 | 7.24
Midnight Express | 6961.04 | 1677 | 7.28 | 11.15
Pirates of the Caribbean | 8239.44 | 2534 | 11.3 | 12.47
Reservoir Dogs | 5712.96 | 856 | 11.55 | 12.38
Saving Private Ryan | 9751.0 | 2494 | 12.92 | 18.81
The Sixth Sense | 6178.04 | 963 | 1.34 | 2.80
The Wicker Man | 5870.44 | 1638 | 8.36 | 6.72
Kill Bill 1 | 5626.6 | 1597 | 17.4 | 24.8
The Bourne Identity | 5877.6 | 1995 | 7.5 | 9.3
The Wizard of Oz | 5415.7 | 908 | 5.5 | 5.0
TOTAL | 100725.8 (27h58min) | 26108 | 9.39 | 11.99
6. Data set – test set
Movie | Duration (s) | # Shots | Violence duration (%) | Violent shots (%)
Dead Poets Society | 7413.24 | 1583 | 0.75 | 2.15
Fight Club | 8005.72 | 2335 | 7.61 | 13.28
Independence Day | 8834.32 | 2652 | 6.4 | 13.99
TOTAL | 24253.28 (6h44min) | 6570 | 4.92 | 9.80
7. Annotations & additional data
Ground truth manually created by 7 human assessors:
Segments containing violent events according to the definition
One single violent action per segment wherever possible
Otherwise tagged ‘multiple_action_scenes’
7 high-level video concepts:
Presence of blood
Presence of fire
Presence of firearms or similar weapons
Presence of cold weapons (knives or similar weapons)
Fights (1 against 1, small, large, distant attack)
Car chases
Gory scenes (graphic images of bloodletting and/or tissue damage)
3 high-level audio concepts:
Gunshots, cannon fire
Screams, effort noises
Explosions
Automatically generated shot boundaries with key frames
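As a rough illustration of what these annotations contain, here is a minimal Python sketch of a possible in-memory representation. All class, field and concept names are made up for illustration; the official annotation files may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative container for the ground truth described above.
# Names are hypothetical; not the official annotation file format.

@dataclass
class ViolentSegment:
    start: float                          # seconds
    end: float                            # seconds
    multiple_action_scene: bool = False   # set when several actions share one segment

@dataclass
class MovieAnnotation:
    title: str
    violent_segments: List[ViolentSegment] = field(default_factory=list)
    # high-level concepts as lists of (start, end) intervals keyed by name,
    # e.g. "blood", "fire", "gunshots", "explosions"
    video_concepts: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)
    audio_concepts: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)
    shot_boundaries: List[float] = field(default_factory=list)  # automatic cuts (seconds)

ann = MovieAnnotation(title="Leon")
ann.violent_segments.append(ViolentSegment(start=120.4, end=131.9))
ann.video_concepts["blood"] = [(121.0, 125.5)]
```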
9. Evaluation metrics
Official measure: Mean Average Precision at 100 (MAP@100)
Average precision over the 100 top-ranked violent shots, averaged over the 3 test movies
For comparison purposes with 2011, the MediaEval cost (both measures are sketched in code after this list):
C = C_fa · P_fa + C_miss · P_miss, with C_fa = 1 and C_miss = 10,
where P_fa and P_miss are the estimated probabilities of false alarm and missed detection
Additional metrics:
false alarm rate, missed detection rate, precision, recall, F-measure, MAP@20, MAP
Detection error trade-off (DET) curves
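A toy re-implementation of the two measures, assuming binary per-shot ground-truth labels. This is not the official scoring tool, and the exact AP@100 normalization used by the organizers may differ from the one chosen here.

```python
# Sketch of MAP@100 and the MediaEval cost, given per-shot ground truth
# (1 = violent, 0 = non-violent). Not the official evaluation script.

def average_precision_at_k(ranked_labels, k=100):
    """ranked_labels: labels of one movie's shots, sorted by decreasing score.
    Normalizes by the number of violent shots retrieved in the top k; the
    official tool may use a different normalization."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def map_at_100(per_movie_ranked_labels):
    """Mean AP@100 over the test movies (3 movies in this task)."""
    aps = [average_precision_at_k(labels) for labels in per_movie_ranked_labels]
    return sum(aps) / len(aps)

def mediaeval_cost(decisions, labels, c_fa=1.0, c_miss=10.0):
    """C = C_fa * P_fa + C_miss * P_miss, estimated from binary decisions."""
    false_alarms = sum(d == 1 and y == 0 for d, y in zip(decisions, labels))
    misses = sum(d == 0 and y == 1 for d, y in zip(decisions, labels))
    negatives = sum(y == 0 for y in labels)
    positives = sum(y == 1 for y in labels)
    p_fa = false_alarms / negatives if negatives else 0.0
    p_miss = misses / positives if positives else 0.0
    return c_fa * p_fa + c_miss * p_miss
```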
10. Task participation
Survey:
35 teams expressed interest in the task (among which 12 were very interested)
2011: 13 teams
Registration:
11 teams = 6 core participants + 1 organizers’ team + 4 additional teams
At least 3 joint submissions - 16 research teams - 9 countries
3 teams had already worked on the detection of violence in movies
2011: 6 teams = 4 + 2 organizers, 1 joint submission, 4 countries
Submission:
7 teams + 1 organizers’ team
We lost 3 teams (corpus availability, economic issues, low performance)
Grand total of 36 runs: 35 at shot level and 1 brave submission at segment level!
2011: 29 runs at shot level, 4 teams + 2 organizers’ teams
Workshop participation:
6 teams
2011: 3 teams
11. Task baseline – random classification
Movie | MAP@100
Dead Poets Society | 2.17
Fight Club | 13.27
Independence Day | 13.98
Total | 9.08
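As a rough check of where such numbers come from, here is a self-contained Python sketch that estimates the expected AP@100 of a random ranking; the shot counts below only mimic Dead Poets Society, and this is not the organizers' baseline script.

```python
import random

# Estimate the random-classification AP@100 for one movie by averaging
# over many random rankings of its shots. Illustrative only.

def ap_at_k(ranked_labels, k=100):
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def random_baseline_ap(labels, k=100, trials=1000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = labels[:]
        rng.shuffle(shuffled)           # random ranking of the movie's shots
        total += ap_at_k(shuffled, k)
    return total / trials

# Roughly Dead Poets Society: 1583 shots, about 2% of them violent.
labels = [1] * 34 + [0] * (1583 - 34)
print(round(100 * random_baseline_ap(labels), 2))
```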
12. Task participation
Team | Country | Runs submitted | 2011 / workshop participation | MAP@100 | MediaEval cost
ARF | Austria | 1 (shot) + 1 (segment) | X | 65.05 (shot) / 54.82 (segment) | 3.56 (shot) / 5.13 (segment)
DYNI – LSIS | France | 5 | X | 12.44 | 7.96
NII - Video Processing Lab | Japan | 5 | X | 30.82 | 1.28
Shanghai-Hongkong | China | 5 | X | 62.38 | 5.52
TUB - DAI | Germany | 5 | X / X | 18.53 | 4.20
TUM | Germany-Austria | 5 | X | 48.43 | 7.83
LIG - MRIM | France | 4 | X / X | 31.37 | 4.16
TEC* | France-UK | 5 | X / X | 61.82 | 3.56
Total: 8 teams (23%) | | 36 | 5 / 6 (75%) | |
Random classification | | | | 9.8 |
*: task organizer
Best run according to MAP@100: ARF, 65.05 (shot level).
15. Learned points
Features:
Mainly classic low-level features, either audio or video
Mainly computed at frame level
Classification step (a toy pipeline of this kind is sketched after this list):
Mainly supervised machine-learning systems
Mostly SVM-based; 1 NN, 1 BN
Two systems based on similarity computation (k-NN)
Multimodality:
Which is more informative: audio, video, or both together? No real convergence
No use of text features
Mid-level concepts:
YES! This year they were largely used (4 teams out of 8)
Seems promising for some of them (except blood)
But how to use them? (as additional features, or as an intermediate step)
Test set: it seems that…
Systems worked better on Independence Day, and Dead Poets Society was more difficult.
Due to some similarity with other movies in the dev set?
A generalization issue?
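To make the dominant design concrete, here is a minimal sketch of a shot-level SVM classifier in which mid-level concept scores are concatenated with low-level features. It uses scikit-learn with random placeholder arrays and is not any team's actual system.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative pipeline: low-level audio/video features per shot, optionally
# concatenated with mid-level concept scores, fed to an SVM.
# All arrays below are random placeholders, not real features.

n_train, n_test, n_lowlevel, n_concepts = 500, 100, 64, 10
rng = np.random.default_rng(0)

X_low = rng.normal(size=(n_train, n_lowlevel))     # e.g. audio/colour statistics per shot
X_mid = rng.uniform(size=(n_train, n_concepts))    # e.g. blood, fire, gunshot detector scores
y = rng.integers(0, 2, size=n_train)               # 1 = violent shot (placeholder labels)

X_train = np.hstack([X_low, X_mid])                # concepts used as additional features

clf = make_pipeline(StandardScaler(), SVC(probability=True))
clf.fit(X_train, y)

X_test = rng.normal(size=(n_test, n_lowlevel + n_concepts))
scores = clf.predict_proba(X_test)[:, 1]           # ranking scores (for MAP@100)
decisions = (scores >= 0.5).astype(int)            # binary violent / non-violent decisions
```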
18. Conclusions & perspectives
Success of the task:
Increased number of participants
Attracted people from the domain
Quality of results has increased markedly
MediaEval 2013:
Which task definition?
How to go one step further with multimodality?
Text is still not used
Who will join the organizers’ group next year?