
【BMVC2016】Recognition of Transitional Action for Short-Term Action Prediction using Discriminative Temporal CNN Feature

Project page
http://www.hirokatsukataoka.net/research/transitionalactionrecognition/transitionalactionrecognition.html

Herein, we address the transitional action class, a class that lies between actions. Transitional actions should be useful for producing short-term action predictions while one action is transitioning into another. However, transitional action recognition is difficult because actions and transitional actions partially overlap each other. To deal with this issue, we propose a subtle motion descriptor (SMD) that identifies the subtle differences between actions and transitional actions. The two primary contributions of this paper are as follows: (i) defining transitional actions for short-term action predictions that permit earlier predictions than early action recognition, and (ii) utilizing a convolutional neural network (CNN) based SMD to present a clear distinction between actions and transitional actions. Using three different datasets, we show that our proposed approach produces better results than other state-of-the-art models. The experimental results show the effectiveness of our model's recognition performance, as well as its ability to capture temporal motion in transitional actions.

Published in: Science

  1. Recognition of Transitional Action for Short-Term Action Prediction using Discriminative Temporal CNN Feature •  Hirokatsu Kataoka, Ph.D., Computer Vision Research Group (CVRG), AIST, http://www.hirokatsukataoka.net/ •  Yudai Miyashita (TDU), Masaki Hayashi (Liquid Inc., Keio Univ.), Kenji Iwata, Yutaka Satoh (AIST)
  2. Related work: Early Action Recognition •  [Ryoo, ICCV2011] M. S. Ryoo, “Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos”, International Conference on Computer Vision (ICCV), pp.1036-1043, 2011.
  3. Related work: Action Prediction •  [Kataoka+, VISAPP2016] H. Kataoka, Y. Aoki, K. Iwata, Y. Satoh, “Activity Prediction using a Space-Time CNN and Bayesian Framework”, in VISAPP, 2016. [Figure: time series in which the time zone (daytime, x_timezone), the previous activity (walking, x_previous), and the current activity (sitting, x_current) are given, and the next activity θ is not given; example: θ = “Using a PC”]
  4. Problems with related work •  Early action recognition –  Recognizes an action from its early frames –  Sufficient cues are required, so it is almost equivalent to action recognition •  Action prediction –  Predicts the future completely, which is unstable
  5. Proposal •  Transitional Action (TA): an action class assigned while one action is transitioning into another –  TA contains cues for prediction: earlier than early action recognition –  Recognition-like future action prediction: more stable prediction •  Applications: autonomous driving, active safety, and robotics [Figure: timeline t1 to t12 showing Walk straight (action), Walk straight – Cross (transitional action), and Cross (action), with an interval labeled Δt. Proposal (short-term action prediction): recognizes “cross” at time t5. Previous work (early action recognition): recognizes “cross” at time t9]
  6. Problem settings (framework: problem) •  Action recognition: $f(F^{A}_{1 \ldots t}) \to A_t$ •  Early action recognition: $f(F^{A}_{1 \ldots t-L}) \to A_t$ •  Action prediction: $f(F^{A}_{1 \ldots t}) \to A_{t+L}$ •  Transitional action recognition: $f(F^{TA}_{1 \ldots t}) \to A_{t+L}$
  7. Difference: early action recognition •  $f(F^{A}_{1 \ldots t-L}) \to A_t$, where the objective action is A(cross) –  Early action recognition responds late [Figure: timeline t1 to t12 showing Walk straight (action) followed by Cross (action)]
  8. Difference: action prediction •  $f(F^{A}_{1 \ldots t}) \to A_{t+L}$, where the objective action is A(cross) –  Action prediction is unstable [Figure: timeline t1 to t12 showing Walk straight (action) followed by Cross (action)]
  9. Difference: transitional action recognition •  $f(F^{TA}_{1 \ldots t}) \to A_{t+L}$, where the objective action is A(cross) –  Transitional action recognition is reasonable [Figure: timeline t1 to t12 showing Walk straight (action), Walk straight – Cross (transitional action), and Cross (action)]
  10. Details of transitional action (TA) •  Annotation for TA –  TA and normal action (NA) classes partially overlap each other •  Difficulty of TA –  NA and TA are temporally mixed
  11. Subtle Motion Descriptor (SMD) •  A discriminative temporal CNN feature –  Used to separate the NA and TA classes
  12. Subtle Motion Descriptor (SMD) •  Activation feature from VGG-16 –  Fully-connected layer (N = 4,096) –  Based on pooled time series (PoT) [Ryoo+, CVPR2015]
  13. Subtle Motion Descriptor (SMD) •  The temporal difference $\Delta V_t$ is calculated –  $\Delta V_t$ = (frame t) − (frame t−1)
  14. Subtle Motion Descriptor (SMD) •  Temporal pooling of $\Delta V_t$ –  Positive and negative values –  Near-zero values are also pooled (this is the contribution of SMD) –  The threshold TH is fixed experimentally
  15. Datasets •  Temporal action datasets –  NTSEL [Kataoka+, ITSC2015] •  Walk (NA), cross (NA), bicycle (NA), turn (TA), with human bbox –  UTKinect-Action [Xia+, CVPRW2012] •  10 ordered NAs (e.g. walk, throw, sit) •  8 TAs (excluding push/pull; next page) •  Without human bbox –  Watch-n-Patch [Wu+, CVPR2015] •  10 daily NAs (e.g. read, turn on monitor, leave office) •  The 10 most frequent TAs (next page) •  Without human bbox
  16. Experimental settings (list of TAs) •  [Tables: lists of TAs for UTKinect-Action and Watch-n-Patch]
  17. Implementation •  Action recognition approaches –  Temporal CNN models •  Pooled Time-series (PoT) [Ryoo+, CVPR2015] •  CNN accumulation •  CNN + IDT [Jain+, ECCVW2014] –  Improved dense trajectories (IDT), alone and with improved features •  IDT [Wang+, ICCV2013] •  IDT + co-occurrence feature [Kataoka+, ACCV2014] •  All features in IDT
  18. Exploration experiment •  Parameters –  Frame accumulation –  Threshold value TH –  Layer fc6 vs fc7
  19. Exploration experiment •  Temporal accumulation [frames] –  Faster prediction: 3 frames (0.1 s) –  Toward state-of-the-art: 10 frames (0.33 s) –  Baselines should use 3- and 10-frame accumulation
  20. Exploration experiment •  Threshold value TH –  Depends on the data
  21. Exploration experiment •  Layer fc6 vs fc7 –  Layer fc6 is better
  22. Results •  SMD (ours) is state-of-the-art in transitional action recognition
  23. Comparison with PoT •  Subtle motion is effective for transitional action recognition –  NTSEL: +2.18%, +8.63% –  UTKinect: +7.19%, +4.31% –  Watch-n-Patch: +4.82%, +5.12%
  24. Conclusion •  Two contributions: 1.  Definition of transitional action for short-term action prediction 2.  Subtle Motion Descriptor (SMD) to classify transitional and normal actions
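
The SMD slides above describe a temporal difference of per-frame CNN activations followed by pooling of positive, negative, and near-zero values. The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation. The function name `subtle_motion_descriptor`, the use of sums for the positive and negative bins (in the spirit of PoT), the use of a count for the near-zero bin, and the way TH separates the three bins are all assumptions made for illustration; the slides only state that near-zero values are pooled and that TH is fixed experimentally.

```python
import numpy as np


def subtle_motion_descriptor(features, th):
    """Pool temporal differences of per-frame features into one vector.

    features: array of shape (T, N), one N-dimensional activation per frame
              (e.g. a fully-connected layer output; T >= 2).
    th:       non-negative threshold TH separating "near-zero" differences
              from clearly positive or negative ones (assumed usage).
    Returns a vector of length 3 * N: [positive sums, negative sums,
    near-zero counts].
    """
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2 or features.shape[0] < 2:
        raise ValueError("features must have shape (T, N) with T >= 2")
    if th < 0:
        raise ValueError("th must be non-negative")

    # Temporal difference: delta[t - 1] = V_t - V_{t-1}
    delta = np.diff(features, axis=0)

    # Positive and negative pooling (assumption: sums over the window).
    plus = np.where(delta > th, delta, 0.0).sum(axis=0)
    minus = np.where(delta < -th, delta, 0.0).sum(axis=0)

    # Near-zero pooling (assumption: count of |delta| <= TH per dimension).
    near_zero = (np.abs(delta) <= th).sum(axis=0).astype(np.float64)

    return np.concatenate([plus, minus, near_zero])
```

With N = 4,096 activations accumulated over a short window such as 3 or 10 frames, this sketch would produce one 3 × 4,096 dimensional vector per window, which could then be passed to a classifier of choice. The count-based near-zero bin is only one plausible reading of the slides; other statistics over the near-zero values would fit the same description.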
