RECOD at MediaEval 2014: Violent Scenes Detection Task


This paper presents the RECOD approaches used in the MediaEval 2014 Violent Scenes Detection task. Our system is based on the combination of visual, audio, and text features. We also evaluate the performance of a convolutional network as a feature extractor. We combined those features using a fusion scheme. We participated in the main and the generalization tasks. http://ceur-ws.org/Vol-1263/mediaeval2014_submission_89.pdf


RECOD @ MediaEval 2014: Violent Scenes Detection Task
Sandra Avila (2), Daniel Moreira (1), Mauricio Perez (1), Daniel Moraes (3), Isabela Cota (1),
Vanessa Testoni (3), Eduardo Valle (2), Siome Goldenstein (1), Anderson Rocha (1)
(1) Institute of Computing, University of Campinas (Unicamp), SP, Brazil
(2) School of Electrical and Computing Engineering, University of Campinas (Unicamp), SP, Brazil
(3) Samsung Research Institute Brazil, SP, Brazil
Abstract
This paper presents the RECOD approaches used in the
MediaEval 2014 Violent Scenes Detection task. Our system is
based on the combination of visual, audio, and text
features. We also evaluate the performance of a convolutional
network as a feature extractor. We combined those features
using a fusion scheme. We participated in the main and the
generalization tasks.
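The poster does not disclose how the outputs of the individual classifiers are fused (see the footnote on patenting). Purely as an illustration of what a late-fusion step can look like, the sketch below averages the scores that separate per-modality classifiers assign to each video segment and then ranks the segments by the fused score. The modality names, segment ids, score values, and the helpers `fuse_scores` and `rank_segments` are all hypothetical and are not taken from the RECOD system.

```python
from statistics import mean


def fuse_scores(scores_by_modality):
    """Average per-segment scores across modalities (simple late fusion).

    scores_by_modality maps a modality name to {segment_id: score}.
    Only segments scored by every modality are fused.
    """
    if not scores_by_modality:
        return {}
    per_modality = list(scores_by_modality.values())
    common = set(per_modality[0])
    for scores in per_modality[1:]:
        common &= set(scores)
    return {seg: mean(s[seg] for s in per_modality) for seg in common}


def rank_segments(fused):
    """Return segment ids sorted from highest to lowest fused score."""
    return sorted(fused, key=lambda seg: fused[seg], reverse=True)


# Hypothetical classifier outputs in [0, 1]; assumed values for illustration only.
example_scores = {
    "visual": {"seg1": 0.9, "seg2": 0.2, "seg3": 0.6},
    "audio": {"seg1": 0.7, "seg2": 0.4, "seg3": 0.3},
    "text": {"seg1": 0.5, "seg2": 0.3, "seg3": 0.6},
}

fused = fuse_scores(example_scores)
ranking = rank_segments(fused)
print(ranking)
```

Averaging is only one of many possible fusion rules; weighted sums or a second-stage classifier over the per-modality scores are common alternatives.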
System Description*
Visual Features: SURF, dense trajectories and motion boundary descriptors, PCA, bag of visual words-based representation;
Audio Features: several types of audio features, PCA, bag of visual words-based representation;
Text Features: movie subtitles, bag of words approach;
Classification: SVM classifiers using the LIBSVM library.
Framework
[Figure: framework diagram. Block labels: Visual Features, Audio Features, Text Features, Bag of words-based, Bag of words, Classification, Fusion, Prediction.]
Runs Submitted
For the main task (m):
● m1: 3 types of audio features + 3 types of visual features (including a visual feature extractor based on CNNs) + text features;
● m2: 1 type of audio features + 3 types of visual features (including a visual feature extractor based on CNNs) + text features;
● m3: 1 type of audio features + 3 types of visual features (including a visual feature extractor based on CNNs);
● m4: 1 type of audio features + 2 types of visual features + text features;
● m5: 1 type of audio features.
For the generalization task (g):
● g1: 3 types of audio features + 3 types of visual features (including a visual feature extractor based on CNNs);
● g2: 1 type of audio features + 3 types of visual features (including a visual feature extractor based on CNNs);
● g3: 1 type of audio features + 2 types of visual features (including a visual feature extractor based on CNNs);
● g4: 1 type of audio features;
● g5: 1 type of visual features.
Results
Table 1: Official results obtained for the main task in terms of MAP2014.
Run  8mil   Brav   Desp   Ghos   Juma   Term   Vven   MAP
m1   0.204  0.477  0.337  0.567  0.188  0.479  0.378  0.376
m2   0.239  0.459  0.308  0.348  0.362  0.465  0.431  0.373
m3   0.189  0.545  0.277  0.465  0.212  0.418  0.489  0.371
m4   0.115  0.319  0.209  0.270  0.159  0.502  0.167  0.249
m5   0.373  0.301  0.307  0.423  0.175  0.308  0.317  0.315

Table 2: Official results obtained for the generalization task in terms of MAP2014.
Run  g1     g2     g3     g4     g5
MAP  0.618  0.615  0.604  0.545  0.515

Table 3: Unofficial results obtained for the main task in terms of MAP2014. u1: 1 type of audio features; u2: text features.
Run  8mil   Brav   Desp   Ghos   Juma   Term   Vven   MAP
u1   0.351  0.601  0.636  0.530  0.521  0.352  0.463  0.493
u2   0.402  0.237  0.407  0.345  0.232  0.277  0.188  0.298
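As a sanity check on Tables 1 and 3, the MAP column appears to be the unweighted mean of the seven per-movie values, rounded to three decimals. The snippet below recomputes it from the numbers in the tables. The equal-weight averaging is an inference from the figures rather than something the poster states, and the names `RESULTS`, `mean_over_movies`, `recomputed`, and `mismatches` are illustrative.

```python
MOVIES = ["8mil", "Brav", "Desp", "Ghos", "Juma", "Term", "Vven"]

# run -> (per-movie values in MOVIES order, reported MAP), copied from Tables 1 and 3.
RESULTS = {
    "m1": ([0.204, 0.477, 0.337, 0.567, 0.188, 0.479, 0.378], 0.376),
    "m2": ([0.239, 0.459, 0.308, 0.348, 0.362, 0.465, 0.431], 0.373),
    "m3": ([0.189, 0.545, 0.277, 0.465, 0.212, 0.418, 0.489], 0.371),
    "m4": ([0.115, 0.319, 0.209, 0.270, 0.159, 0.502, 0.167], 0.249),
    "m5": ([0.373, 0.301, 0.307, 0.423, 0.175, 0.308, 0.317], 0.315),
    "u1": ([0.351, 0.601, 0.636, 0.530, 0.521, 0.352, 0.463], 0.493),
    "u2": ([0.402, 0.237, 0.407, 0.345, 0.232, 0.277, 0.188], 0.298),
}


def mean_over_movies(values):
    """Unweighted mean of the per-movie values, rounded to three decimals."""
    return round(sum(values) / len(values), 3)


recomputed = {run: mean_over_movies(vals) for run, (vals, _) in RESULTS.items()}

# Runs whose recomputed mean differs from the reported MAP column.
mismatches = {
    run: (recomputed[run], reported)
    for run, (_, reported) in RESULTS.items()
    if abs(recomputed[run] - reported) > 1e-9
}

for run in RESULTS:
    print(run, recomputed[run], RESULTS[run][1])
```

Table 2 reports only the overall MAP per run, so the same check cannot be repeated for the generalization task.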
*Some technical aspects cannot be described here because we are patenting the developed approach.
Contact: Sandra Avila (sandra@dca.fee.unicamp.br)
