
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features

  1. The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Trajectory-based Features
     Yu-Gang Jiang*, Qi Dai*, Chun Chet Tan**, Xiangyang Xue*, Chong-Wah Ngo**
     *School of Computer Science, Fudan University, Shanghai
     **Department of Computer Science, City University of Hong Kong, HK
     MediaEval 2012 Workshop, Oct 4-5, Pisa, Italy
  2. Outline
     • Introduction
     • Framework
     • Feature Extraction
     • Classifiers
     • Temporal Smoothing
     • Results
     • Discussions
     • First 20 clips retrieved
  3. Introduction
     • Violent Scene Detection task [1]: a practical challenge with great potential in applications.
     • Focus on novel features.
     • Top performance in mAP@20, runner-up in mAP@100.
     [1] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. The MediaEval 2012 Affect Task: Violent Scenes Detection. In MediaEval 2012 Workshop, Pisa, Italy, 2012.
  4. Framework
     [Diagram] Video shots → feature extraction → classifiers (χ2 kernel SVM) → temporal smoothing → detection.
     • Features: trajectory-based (7 features), SIFT, spatial-temporal interest point, MFCC audio feature, concept-based.
     • Temporal feature smoothing is applied to all features except the concept-based one, before the χ2 kernel SVM.
     • Score-level temporal smoothing is applied to the χ2 kernel SVM outputs.
     • The circled numbers in the diagram indicate the 5 submitted runs.
  5. Feature Extraction
     • Trajectory-based features [2]:
       - dense trajectory, HOG, HOF, MBH [5]
       - TrajMF (relative locations and motions between trajectory pairs)
       - trajectory shape feature
     • Advantages: robust to camera movement, rich in information, and implicitly capture object-object and object-background relationships.
     [2] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In ECCV, 2012.
     [5] H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
  6. Feature Extraction
     • SIFT [4]
     • STIP [3]
     • MFCC
     • Concept-based features (10 concepts: blood, carchase, coldarms, fights, fire, firearms, gore, explosions, gunshots, screams)
     [3] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64:107-123, 2005.
     [4] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110, 2004.
  7. Classifiers
     • BoW representation
     • Chi-squared kernel SVMs
     • Kernel-level early fusion is used to combine multiple features
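The classifier slide can be illustrated with a minimal, standard-library sketch of a chi-squared kernel over BoW histograms and a kernel-level fusion step. The slides do not state which chi-squared kernel variant or which fusion rule was used, so the exponential form, the `gamma` parameter, and the unweighted average of per-feature kernel matrices are all assumptions made for illustration. The function names are hypothetical.

```python
import math


def chi2_distance(x, y):
    """Chi-squared distance between two non-negative BoW histograms."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


def chi2_kernel_matrix(hists, gamma=1.0):
    """Exponential chi-squared kernel: K[i][j] = exp(-gamma * d(h_i, h_j)).

    Assumption: the slides only say "chi-squared kernel"; this is one
    common variant, and gamma is an illustrative parameter.
    """
    n = len(hists)
    return [
        [math.exp(-gamma * chi2_distance(hists[i], hists[j])) for j in range(n)]
        for i in range(n)
    ]


def fuse_kernels(kernels):
    """Kernel-level fusion by unweighted elementwise averaging.

    Assumption: the averaging rule is illustrative; the slides do not
    specify how the per-feature kernels were combined.
    """
    n = len(kernels[0])
    m = len(kernels)
    return [
        [sum(k[i][j] for k in kernels) / m for j in range(n)]
        for i in range(n)
    ]


# Toy data: three shots, two feature types, each a normalized histogram.
feature_a = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
feature_b = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]
fused = fuse_kernels([chi2_kernel_matrix(feature_a), chi2_kernel_matrix(feature_b)])
```

A fused matrix of this kind could then be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.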
  8. Temporal Smoothing
     • Feature smoothing: averaged features over a three-shot window.
     • Score smoothing: averaged prediction scores over a three-shot window.
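The two smoothing variants can be sketched as follows. The slides only say "three-shot window", so this sketch assumes a window centered on the current shot (previous, current, next) that is truncated at the start and end of the video. Both the centering and the boundary handling are assumptions, and the function names are hypothetical.

```python
def _window_bounds(i, n, window):
    """Index range of a centered window around shot i, clipped to [0, n)."""
    half = window // 2
    return max(0, i - half), min(n, i + half + 1)


def smooth_scores(scores, window=3):
    """Score smoothing: average per-shot prediction scores over the window."""
    out = []
    for i in range(len(scores)):
        lo, hi = _window_bounds(i, len(scores), window)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def smooth_features(features, window=3):
    """Feature smoothing: average per-shot feature vectors over the window."""
    out = []
    for i in range(len(features)):
        lo, hi = _window_bounds(i, len(features), window)
        neighbors = features[lo:hi]
        # Average each feature dimension across the shots in the window.
        out.append([sum(dim) / len(neighbors) for dim in zip(*neighbors)])
    return out


# Toy example: an isolated high score is spread over its neighbors.
shot_scores = [0.0, 0.9, 0.0, 0.0]
smoothed = smooth_scores(shot_scores)
```

Score smoothing operates after classification, while feature smoothing changes the classifier's input, which is why the two can behave differently in the results.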
  9. Results (mAP@20)
     [Bar chart] Mean Average Precision at 20 (y-axis from 0 to 0.8) for the five runs, shown in the order r3, r2, r5, r4, r1.
     • Run 5: 7 dense trajectory features
     • Run 4: Run 5 + SIFT + STIP + MFCC
     • Run 3: Run 4 + concept scores
     • Run 2: Run 4 + feature smoothing
     • Run 1: Run 4 + score smoothing
  10. Results (mAP@100)
     [Bar chart] Mean Average Precision at 100 (y-axis from 0 to 0.7) for the five runs, shown in the order r3, r4, r5, r2, r1.
     • Run 5: 7 dense trajectory features
     • Run 4: Run 5 + SIFT + STIP + MFCC
     • Run 3: Run 4 + concept scores
     • Run 2: Run 4 + feature smoothing
     • Run 1: Run 4 + score smoothing
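For readers unfamiliar with the mAP@20 and mAP@100 measures on the results slides, here is a minimal sketch of average precision at a cutoff k. The official MediaEval evaluation tool may normalize differently. This sketch assumes one common definition, which averages the precision values at the ranks of the relevant (violent) shots within the top k, then takes the mean over test videos. The function names and the toy labels are hypothetical.

```python
def average_precision_at_k(ranked_labels, k):
    """AP@k for one ranked list; labels are truthy for violent shots.

    Assumption: precision is averaged over the relevant shots found in
    the top k; other normalizations exist.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_violent in enumerate(ranked_labels[:k], start=1):
        if is_violent:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_at_k(ranked_lists, k):
    """mAP@k: mean of AP@k over several ranked lists (e.g. one per video)."""
    return sum(average_precision_at_k(r, k) for r in ranked_lists) / len(ranked_lists)


# Toy ranked outputs for two videos (1 = violent shot, 0 = non-violent).
video_a = [1, 0, 1, 0]
video_b = [0, 1, 1, 1]
map_at_4 = mean_average_precision_at_k([video_a, video_b], k=4)
```

Because only the top k shots count, a small cutoff such as 20 rewards getting the very first results right, while a cutoff of 100 also reflects ranking quality further down the list.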
  11. Discussions
     • Adding SIFT + STIP + MFCC brings no significant improvement. TrajMF already encodes the rich information of SIFT and STIP.
     • Concept-based scores do not improve performance: the SVMs overfit because of insufficient training data. Even so, using mid-level concept detectors remains a promising direction.
     • Score smoothing boosts performance. Feature smoothing, which “blurs” the features across shots, might not be a good option.
  12. First 20 clips retrieved
  13. Thank You
