The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features

  1. The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features
     Yu-Gang Jiang*, Qi Dai*, Chun Chet Tan**, Xiangyang Xue*, Chong-Wah Ngo**
     *School of Computer Science, Fudan University, Shanghai
     **Department of Computer Science, City University of Hong Kong, HK
     MediaEval 2012 Workshop, Oct 4-5, Pisa, Italy
  2. Outline
     • Introduction
     • Framework
     • Feature Extraction
     • Classifiers
     • Temporal Smoothing
     • Results
     • Discussions
     • First 20 clips retrieved
  3. Introduction
     • Violent Scene Detection task [1]: a practical challenge with great potential in applications.
     • Focus on novel features.
     • Top performance in mAP@20, runner-up in mAP@100.
     [1] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. The MediaEval 2012 Affect Task: Violent Scenes Detection. In MediaEval 2012 Workshop, Pisa, Italy, 2012.
  4. Framework (diagram): video shots → feature extraction → classifiers → detection
     • Features: trajectory-based (7 features), SIFT, spatial-temporal interest point, MFCC audio feature, concept-based
     • Classifiers: χ2 kernel SVM
     • Temporal feature smoothing is applied to all features except the concept-based ones, before the χ2 kernel SVM
     • Score-level temporal smoothing is applied to the detection scores
     • The circled numbers in the diagram indicate the 5 submitted runs
  5. Feature Extraction
     • Trajectory-based features [2]:
       - dense trajectory, HOG, HOF, MBH [5]
       - TrajMF (relative locations and motions between trajectory pairs)
       - Trajectory shape feature
     • Advantages: robust to camera movement, rich in information, and implicitly capture object-object and object-background relationships.
     [2] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In ECCV, 2012.
     [5] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
  6. Feature Extraction
     • SIFT [4]
     • STIP [3]
     • MFCC
     • Concept-based features (10 concepts: blood, carchase, coldarms, fights, fire, firearms, gore, explosions, gunshots, screams)
     [3] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64:107-123, 2005.
     [4] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110, 2004.
  7. Classifiers
     • BoW representation
     • Chi-squared kernel SVMs
     • Kernel-level early fusion is used to combine multiple features
  8. Temporal Smoothing
     • Feature smoothing: features averaged over a three-shot window.
     • Score smoothing: prediction scores averaged over a three-shot window.
  9. Results (mAP@20)
     (bar chart; y-axis: Mean Average Precision at 20, from 0 to 0.8; x-axis: runs r3, r2, r5, r4, r1)
     • Run 5: 7 dense trajectory features
     • Run 4: Run 5 + SIFT + STIP + MFCC
     • Run 3: Run 4 + concept scores
     • Run 2: Run 4 + feature smoothing
     • Run 1: Run 4 + score smoothing
  10. Results (mAP@100)
     (bar chart; y-axis: Mean Average Precision at 100, from 0 to 0.7; x-axis: runs r3, r4, r5, r2, r1)
     • Run 5: 7 dense trajectory features
     • Run 4: Run 5 + SIFT + STIP + MFCC
     • Run 3: Run 4 + concept scores
     • Run 2: Run 4 + feature smoothing
     • Run 1: Run 4 + score smoothing
  11. Discussions
     • Adding SIFT + STIP + MFCC gives only an insignificant improvement: TrajMF has already encoded the rich information of SIFT and STIP.
     • Concept-based scores do not improve performance, because the SVMs overfit with insufficient training data. Even so, using mid-level concept detectors is a promising direction.
     • Score smoothing boosts performance. Feature smoothing, which "blurs" the features across shots, might not be a good option.
  12. First 20 clips retrieved
  13. Thank You
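
The Classifiers slide names chi-squared kernel SVMs with kernel-level fusion but gives no formulas. The sketch below shows one common way this is done; it is an illustration under assumptions, not the team's implementation. Assumed here: the exponential chi-squared kernel exp(-gamma * D) with D(x, y) = 0.5 * sum((x_i - y_i)^2 / (x_i + y_i)), a default gamma of 1.0, and fusion by a weighted average of the per-feature kernel matrices. All function names are hypothetical.

```python
import math


def chi2_distance(x, y):
    """Chi-squared distance between two non-negative BoW histograms."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # bins that are empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def chi2_kernel_matrix(rows, cols, gamma=1.0):
    """Kernel matrix K[i][j] = exp(-gamma * chi2_distance(rows[i], cols[j])).

    gamma=1.0 is an assumed default; in practice it is usually tuned.
    """
    return [[math.exp(-gamma * chi2_distance(r, c)) for c in cols] for r in rows]


def fuse_kernels(kernels, weights=None):
    """Kernel-level fusion as a weighted average of same-shaped kernel matrices.

    Equal weights are an assumption; the slides do not state the weighting.
    """
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    n_rows, n_cols = len(kernels[0]), len(kernels[0][0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: two shots described by two different feature types.
trajectory_bow = [[1.0, 0.0], [0.0, 1.0]]
sift_bow = [[2.0, 2.0], [2.0, 2.0]]
fused = fuse_kernels([
    chi2_kernel_matrix(trajectory_bow, trajectory_bow),
    chi2_kernel_matrix(sift_bow, sift_bow),
])
```

The fused matrix can then be handed to any SVM implementation that accepts a precomputed kernel, so each feature type keeps its own codebook and histogram while a single classifier is trained.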

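The Temporal Smoothing slide describes averaging over a three-shot window, either on features or on prediction scores. A minimal sketch of both variants follows. The slides do not say how the window is placed or how the first and last shots are handled, so this assumes a window centred on the current shot that is simply truncated at the boundaries, with shots given in temporal order. Function names are hypothetical.

```python
def smooth_scores(scores, window=3):
    """Average each shot's score with its temporal neighbours.

    Assumes a centred window, truncated at the start and end of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        neighbourhood = scores[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed


def smooth_features(features, window=3):
    """Apply the same averaging to every dimension of per-shot feature vectors."""
    columns = zip(*features)  # one sequence per feature dimension
    smoothed_columns = [smooth_scores(list(col), window) for col in columns]
    return [list(row) for row in zip(*smoothed_columns)]


# One isolated high-scoring shot spreads its evidence to its neighbours.
print(smooth_scores([0.0, 3.0, 0.0, 0.0]))  # [1.5, 1.0, 1.0, 0.0]
```

Score smoothing operates on the classifier output, whereas feature smoothing changes the classifier input, which is the "blurring" effect the Discussions slide refers to.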