Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

【ECCV 2016 BNMW】Human Action Recognition without Human


Published on

Project page:

The objective of this paper is to evaluate "human action recognition without human". Motion representation is frequently discussed in human action recognition. We have examined several sophisticated options, such as dense trajectories (DT) and the two-stream convolutional neural network (CNN). However, some features from the background could be too strong, as shown in some recent studies on human action recognition. Therefore, we considered whether a background sequence alone can classify human actions in current large-scale action datasets (e.g., UCF101). In this paper, we propose a novel concept for human action analysis that is named "human action recognition without human". An experiment clearly shows the effect of a background sequence for understanding an action label.
To the best of our knowledge, this is the first study of human action recognition without human. However, we should not have done that kind of thing. The motion representation from a background sequence is effective to classify videos in a human action database. We demonstrated human action recognition in with and without a human settings on the UCF101 dataset. The results show the setting without a human (47.42%; without human setting) was close to the setting with a human (56.91%; with human setting). We must accept this reality to realize better motion representation.

Published in: Science
  • Be the first to comment

【ECCV 2016 BNMW】Human Action Recognition without Human

  1. 1. Human Action Recognition without Human He Yun1,2, Soma Shirakabe1,2, Yutaka Satoh1,2, Hirokatsu Kataoka1 1Computer Vision Research Group, AIST, Japan 2Human-Centered Vision Lab., University of Tsukuba, Japan
  2. 2. Motion representation •  Database: UCF101, HMDB51, ActivityNet •  Approach: IDT, Two-Stream CNN –  DBs and approaches have been prepared in the field
  3. 3. Action Database h"p://
  4. 4. The problem setting in action recognition •  Video-level prediction –  1 action-label prediction per input video Tennis Swing Mo6on Descriptor
  5. 5. Dense Trajectories (DT) [Wang+, CVPR11] •  Trajectory-based representation –  A large amount of trajectories –  Feature description (HOG, HOF, MBH) –  Codeword vector is generated
  6. 6. Two-Stream CNN [Simonyan+, NIPS14] •  Spatial and temporal convolution –  Spatial-stream: From a RGB image –  Temporal-stream: From a stacked flows –  Score fusion: Average or SVM
  7. 7. Is background enough to classify actions? •  RGB input is too strong! –  The two-stream CNN[Simonyan+, NIPS14] reported spatial-stream can understand an action more than expected •  72.4% with spatial-stream (RGB) @UCF101 •  “Human Action Recognition without Human”
  8. 8. Without Human? •  Human action recognition can be done just by motion of the background? Tennis Swing Mo6on Descriptor Tennis Swing? Mo6on Descriptor
  9. 9. Detailed setting of w/ and w/o Human •  With and without human setting –  Without human setting: center-blind image with UCF101 –  With human setting: inverse of the without human setting I (x, y) f (x, y) * I’ (x, y) 1/2 1/4 1/4 1/2 1/4 1/4 I (x, y) f (x, y) * I’ (x, y) 1/2 1/4 1/4 1/2 1/4 1/4 ー ー Without Human SeIng With Human SeIng
  10. 10. Framework –  Baseline: Very deep two-stream CNN [Wang+, arXiv15] –  Two different scenarios: without human and with human
  11. 11. Exploration experiment •  @UCF101 –  UCF101 pre-trained model with very deep two-stream CNN –  With/Without Human Setting
  12. 12. Visual results (Full Image)
  13. 13. Visual results (Without Human Setting)
  14. 14. Without Human •  The concept of ”Human Action Recognition without Human” –  The accuracies are very close •  With human is +9.49% better than without human –  The current motion representation heavily rely on the backgrounds
  15. 15. Future work •  This is a suggestive reality –  We must accept this reality to realize better motion representation –  Pure motion representation is an urgent work! •  More sophisticated approach •  Human only motion