This paper describes the participation of the TUB-IRML group in the MediaEval 2014 Violent Scenes Detection (VSD) affect task. We employ low- and mid-level audio-visual features fused at the decision level. We partition the feature space of training samples through k-means clustering and train a separate model for each cluster. These models are then used to predict the violence level of videos by employing two-class support vector machines (SVMs) and a classifier selection approach. The experimental results obtained on Hollywood movies and short Web videos show the superiority of mid-level audio features over visual features in terms of discriminative power, and a further performance gain from fusing audio-visual cues at the decision level. Finally, the results also demonstrate that partitioning the feature space and training multiple models outperforms a unique violence detection model.
http://ceur-ws.org/Vol-1263/mediaeval2014_submission_68.pdf
1. Competence Center Information Retrieval & Machine Learning
TUB-IRML at MediaEval 2014 Violent Scenes Detection Task: Violence Modeling through Feature Space Partitioning
Esra Acar, Sahin Albayrak
2. Outline
►The Violence Detection Method
Video Representation
Violence Detection Model
►Results & Discussion
►Conclusions & Future Work
3. The Violence Detection Method
►The two main components of our method are:
(1) the representation of video segments, and
(2) the learning of a violence model.
4. Video Representation (1)
The generation process of sparse-coding-based audio and visual representations for video segments.
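As a rough illustration of this encoding step, the Python sketch below sparse-codes the per-frame MFCCs of a segment's audio track against a learned dictionary and max-pools the activations into one mid-level vector. The function name `encode_segment`, the OMP solver and the sparsity level are our assumptions, not the exact settings from the paper.

```python
import numpy as np
import librosa
from sklearn.decomposition import SparseCoder

def encode_segment(wav_path, dictionary, n_mfcc=20):
    """Sparse-code the MFCC frames of one video segment's audio track."""
    y, sr = librosa.load(wav_path, sr=None)
    # Per-frame MFCCs, transposed to (n_frames, n_mfcc).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    # Encode each frame against the dictionary (n_atoms, n_mfcc) with
    # orthogonal matching pursuit; the sparsity level is a placeholder.
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=5)
    codes = coder.transform(mfcc)          # (n_frames, n_atoms)
    # Max-pool absolute activations over frames -> one mid-level vector.
    return np.abs(codes).max(axis=0)
```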
5. Video Representation (2)
The generation of audio and visual dictionaries with sparse coding.
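The dictionaries themselves can be learned offline from pooled training frames; below is a minimal sketch using scikit-learn's MiniBatchDictionaryLearning, where the atom count and sparsity penalty are placeholder values rather than the paper's settings.

```python
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_dictionary(frame_features, n_atoms=128):
    """Learn a sparse-coding dictionary from stacked low-level descriptors.

    frame_features: array of shape (n_frames_total, n_dims), e.g. MFCCs
    for the audio dictionary or HoG/HoF descriptors for the visual one.
    """
    dl = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                     batch_size=256, random_state=0)
    dl.fit(frame_features)
    return dl.components_  # (n_atoms, n_dims): one atom per row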
6. Video Representation (3)
► In addition to the mid-level audio and visual representations, we use the following low-level features:
Motion-related descriptors – Violent Flows (ViF), a descriptor proposed for real-time detection of violent crowd behavior, and
Static content representations – affect-related color descriptors such as statistics on saturation, brightness and hue in the HSL color space, and colorfulness (a sketch of these color statistics follows below).
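The slides do not spell out the exact statistics, so the sketch below computes one plausible variant per frame: moments of hue, lightness and saturation via OpenCV's HLS conversion, plus the Hasler & Süsstrunk colorfulness measure, which we assume is the colorfulness meant here.

```python
import cv2
import numpy as np

def color_affect_features(frame_bgr):
    """Affect-related color statistics for a single video frame (uint8 BGR)."""
    # OpenCV's HLS channel ordering is (hue, lightness, saturation).
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS).astype(np.float32)
    h, l, s = hls[..., 0], hls[..., 1], hls[..., 2]
    # Colorfulness (Hasler & Suesstrunk, 2003) from opponent color channels.
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = (np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                    + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))
    # Mean/std of hue, lightness (brightness) and saturation + colorfulness.
    return np.array([h.mean(), h.std(), l.mean(), l.std(),
                     s.mean(), s.std(), colorfulness])
```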
7. Violence Detection Model
► Violence is a concept that can be expressed audio-visually in diverse manners.
► We therefore learn multiple models for the violence concept instead of a single one:
we partition the feature space by clustering the video segments of the training dataset, and
learn a separate model for each violence sub-concept.
► We perform classifier selection to solve the classifier combination issue (see the sketch below).
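A minimal sketch of this partition-then-select scheme, assuming each k-means cluster contains both violent and non-violent samples; the cluster count, the RBF kernel and the nearest-centroid selection rule are our placeholder choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def train_partitioned_models(X, y, n_clusters=4):
    """Cluster the training feature space and fit one SVM per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    models = []
    for c in range(n_clusters):
        mask = km.labels_ == c
        # Two-class SVM (violent vs. non-violent) for this sub-concept.
        models.append(SVC(kernel="rbf", probability=True).fit(X[mask], y[mask]))
    return km, models

def violence_score(x, km, models):
    """Classifier selection: apply the model of the nearest cluster centroid."""
    c = km.predict(x.reshape(1, -1))[0]
    return models[c].predict_proba(x.reshape(1, -1))[0, 1]
```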
8. Results & Discussion (1)
Method                   MAP2014 (Movies)   MAP@100 (Movies)   MAP2014 (Web videos)   MAP@100 (Web videos)
Run1                     0.169              0.368              0.517                  0.582
Run2                     0.139              0.284              0.371                  0.478
Run3                     0.080              0.208              0.477                  0.495
Run4                     0.172              0.409              0.489                  0.586
Run5                     0.170              0.406              0.479                  0.567
SVM-based unique model   0.093              0.302              -                      -

Run1: MFCC-based mid-level audio representations
Run2: HoG- and HoF-based mid-level features and ViF
Run3: Affect-related color features
Run4: Audio and visual features (except color)
Run5: All audio-visual representations linearly fused at the decision level

Table: The MAP2014 and MAP@100 scores of our method with different representations.
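Run5's linear decision-level fusion can be pictured as a weighted average of per-modality violence scores; the sketch below assumes uniform weights, since the slides do not give the actual coefficients.

```python
import numpy as np

def fuse_decisions(score_matrix, weights=None):
    """Late fusion: weighted linear combination of per-modality scores.

    score_matrix: (n_modalities, n_segments) violence scores in [0, 1].
    """
    scores = np.asarray(score_matrix, dtype=float)
    if weights is None:
        # Uniform weights are an assumption; real weights would be tuned.
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.average(scores, axis=0, weights=weights)
```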
9. Results & Discussion (2)
► The mid-level audio representation (Run1) provides promising performance and outperforms all other representations (Runs 2 & 3).
► The performance is further improved by decision-level fusion (Run4).
► Affect-related color features do NOT help much (Run5).
► The results on the Web video dataset are superior (i.e., our method generalizes well).
► Affect-related color features seem to provide better results on the Web video dataset (Run3).
► Our method outperforms the SVM-based unique model.
10. Conclusions & Future Work
► The mid-level audio representation based on MFCC and sparse coding
provides promising performance in terms of the MAP2014 and MAP@100 metrics, and
also outperforms our visual representations.
► As future work, we plan to
extend and improve our visual representation set, and
further investigate the feature space partitioning concept.
11. Competence Center Information Retrieval & Machine Learning
www.dai-labor.de
Phone: +49 (0) 30 / 314 – 74 013
Fax: +49 (0) 30 / 314 – 74 003
DAI-Labor
Technische Universität Berlin
Fakultät IV – Elektrotechnik & Informatik
Sekretariat TEL 14
Ernst-Reuter-Platz 7
10587 Berlin, Germany
Esra Acar, M.Sc.
Researcher
esra.acar@tu-berlin.de
Thanks!