Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

【CVPR2016_LAP】Dominant Codewords Selection with Topic Model for Action Recognition

602 views

Published on

http://www.hirokatsukataoka.net/pdf/cvprw16_kataoka_ddt.pdf

In this paper, we propose a framework for recognizing human activities that uses only in-topic dominant codewords and a mixture of intertopic vectors. Latent Dirichlet allocation (LDA) is used to develop approximations of human motion primitives; these are mid-level representations, and they adaptively integrate dominant vectors when classifying human activities. In LDA topic modeling, action videos (documents) are represented by a bag-of-words (input from a dictionary), and these are based on improved dense trajectories ([18]). The output topics correspond to human motion primitives, such as finger moving or subtle leg motion. We eliminate the impurities, such as missed tracking or changing light conditions, in each motion primitive. The assembled vector of motion primitives is an improved representation of the action. We demonstrate our method on four different datasets.

Published in: Science
  • Be the first to comment

  • Be the first to like this

【CVPR2016_LAP】Dominant Codewords Selection with Topic Model for Action Recognition

  1. 1. Dominant Codewords Selec2on with Topic Model for Ac2on Recogni2on Hirokatsu KATAOKA, Masaki Hayashi†, Yoshimitsu AOKI†, Kenji IWATA, Yutaka SATOH, Slobodan Ilic‡ Na2onal Ins2tute of Advanced Industrial Science and Technology (AIST) † Keio University ‡ Technische Universität München hPp://www.hirokatsukataoka.net/
  2. 2. Human Sensing •  Several tasks are included in vision-based human sensing –  Detec2on, tracking, face recogni2on –  Posture es2ma2on, ac2on analysis (event recogni2on) –  Ac2on recogni2on is able to extend human sensing applica2ons Mental state Body Situa2on APen2on Ac2on Analysis shakinghands Look at people Detec2on Gaze Es2ma2on Ac2on Recogni2on Posture Es2ma2on Face Recogni2on Trajectory extrac2on Tracking
  3. 3. Understanding Human Ac2ons •  Ac2on is defined as something that people do or cause to happen –  e.g. walking, running, si^ng This image contains a man walking •  Ac2on recogni2on Classifica2on •  Ac2on detec2on Classifica2on & localiza2on Walking Walking Ac2on Recogni2on Ac2on Detec2on
  4. 4. Current Ac2on Recogni2on •  Trajectory-based representa2on Improved Dense Trajectories (IDT) [Wang+, ICCV13] Trajectory-pooled Deep-convolu2onal Descriptors (TDD) [Wang+, CVPR15] •  “Dense” representa2on is important •  Trajectory-based hand-craied descrip2on (HOG/HOF/MBH/Traj.) •  Intersec2on of hand-craied and deep- learned features •  Intui2vely, IDT feature is replaced by ConvMaps •  CNN is based on Two-stream ConvNet [Simonyan+, NIPS14]
  5. 5. Typical Ac2on Recogni2on Pipeline Trajectories / keypoints extrac2on Feature descrip2on Codeword pooling Classifica2on e.g. BoF, VLAD, Fisher Vectors hPp://www.analy2calway.com/images/SMV/svmFeatureSpace2.gif e.g. SVM
  6. 6. Typical Ac2on Recogni2on Pipeline •  Two strategies for more sophis2cated feature vector Trajectory extrac2on Trajectory descrip2on 1. Trajectory refinement! (main contribu2on) 2. BePer feature!
  7. 7. Our proposal •  1. Dominant codewords selec2on (DCS) –  Significant ac2on representa2on by using topic model (typical LDA) –  Topic-based grouping from dense trajectories –  Noise elimina2on at each topic •  2. Co-occurrence feature representa2on for bePer feat. –  As an addi2onal feature into typical HOG/ HOF / MBH / Traj. –  The feature is based on [Kataoka+, ACCV14] which is improved co-occurrence feature focusing on extrac2ng subtle mo2on × × × × × × × × × × × × × × × There are some noise even in the sta2c scene!
  8. 8. Flowchart of dominant codewords selec2on (DCS) •  1. Feature extrac2on –  Dense Trajectories •  2. Topic modeling to divide primi2ve mo2on –  Each topic is approxima2ng each primi2ve mo2on •  3. Noise canceling at each topic •  4. Dominant dense trajectories (DDT)
  9. 9. Feature extrac2on •  Improved dense trajectories [Wang+, ICCV13] –  HOG/ HOF/ MBHx, /MBHy –  CoHOG, Extended CoHOG [Kataoka+, ACCV14] –  Bag-of-features (BoF) to input of topic model Extended CoHOG
  10. 10. Topic Model •  Latent Dirichlet alloca2on (LDA) [Blei+, JMLR03] –  We applied simplified model [Griffiths+, 04] –  Typical LDA –  Parameters •  Probability distribu2on of topic: Θ •  Topic: T •  Hyper parameter: α, β •  DT-BoF from video: v
  11. 11. Topic Model for DCS •  Noise rejec2on at each topic –  Each topic indicates each “primi2ve mo2on” –  In-topic adap2ve thresholding × × × × × × × × × × × “cut off ends” Topic1 Topic2 Topic3 Topic4 MPII cooking Each topic indicates each “primi2ve mo2on” × × × × × × × × × × × × × × × × × × × “cut off ends” × × × × × × × × × × × × × Unsophis2cated rejec2on is not desirable × Eliminated vector
  12. 12. Dominant dense trajectoies (DDT) •  All refined topics are integrated –  DDT –  DDT is a mixture of dominant codewords using AND opera2on e.g. •  T1 = {vw1, vw3, vw5} => T1’ = {vw1, vw5} •  T2 = {vw1, vw2, vw7} => T2’ = {vw1, vw2, vw7} •  T3 = {vw3, vw6, vw7} => T3’ = {vw7} •  DDT = {vw1, vw2, vw5, vw7} •  DDT is sophis2cated representa2on with dominant codewords
  13. 13. Experiments •  Four datasets for ac2on recogni2on 【INRIA surgery】 View: 4 Activity: 4 view1 view2 view3 view4 【IXMAS】 View: 5 Activity: 11 【MPII cooking activities】 View: 1 Activity: 65 view1 view2 view3 view4 view5 【NTSEL traffic】 View: 1 Activity: 4
  14. 14. Parameters •  DT, classifica2on & LDA are based on the previous works –  # of codeword: 4,000 –  # of trajectory pooling: 15 frame –  The parameters are following original DT –  α and β of LDA are set at 1.0 and 0.01 –  1,000 Gibbs sampler itera2ons –  The topic modeling params are based on [Griffiths and Steyvers, 04] •  Dominant codewords selec2on (DCS) –  Topic-based codewords are set as 1% of the frequency –  # of topic: # of view x # of ac2on
  15. 15. Comparison of DT •  Dominant DT vs DT –  By using HOG, HOF, MBHx, MBHy and combined features –  Dominant codewords selec2on is effec2ve on the DT
  16. 16. Co-occurrence feature in DDT •  On fine-grained dataset –  MPII cooking [Rohrbach+, CVPR12] –  Extended co-occurrence feature is improved by DCS –  DDT is improved with extended co-occurrence feature –  [Ni+, ECCV14] and [Ni+, CVPR15] are state-of-the-art with mid-level features
  17. 17. Topic Visualiza2on × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × × Topic1 (View 2) “sit” View 1 “sit” View 2 Topic1 (View 1) Topic2 (View 2) Topic2 (View 1) Topic3 (View 2) Topic3 (View 1) Topic4 (View 2) Topic4 (View 1) “getup” View 2 Topic1 (View 2) Topic2 (View 2) Topic3 (View 2) Topic4 (View 2) “crossing” Topic1 Topic2 Topic3 Topic4 “cut off ends” Topic1 Topic2 Topic3 Topic4 INRIA surgery INRIA surgery IXMAS NTSEL traffic MPII cooking
  18. 18. Conclusion •  Dominant codewords selec2on (DCS) for ac2on recogni2on –  We applied topic model for noise canceling in trajectory-based representa2on –  Extra feature (co-occurrence feature) •  Future works –  Exploring parameters and more effec2ve visualiza2on (on-going…) –  Seman2c flow with topic models

×