ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification

  1. 1. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei. Computer Science Dept., Stanford University
  2. 2. Recognizing Human Activities. Themes: motion analysis; interactions with objects; detecting unusual behavior; temporal structure & causality. Applications: judging sports automatically, providing cooking assistance, smart surveillance, biomechanics, psychology studies, video game interfaces, …
  3. 3. Activity landscape, by temporal scale (seconds): snapshot (10^-1; e.g. catch), atomic action (10^0; e.g. run), activities (10^1; e.g. high jump), events (10^3; e.g. football), long-term event (10^7 to 10^8; e.g. construction of a building)
  4. 4. Activity landscape: snapshot, atomic action, activities, events, long-term event (temporal scale 10^-1, 10^0, 10^1, 10^3, 10^7 to 10^8 seconds; examples: catch, run, high jump, football, construction of a building). Related work cited along the scale: Thurau & Hlavac, 2008; Bobick & Davis, 2001; Ramanan & Forsyth, 2003; Sridhar et al, 2010; Gupta et al, 2009; Efros et al, 2003; Kuettel, 2010; Ikizler & Duygulu, 2009; Schuldt et al, 2004; Laxton et al, 2007; Ikizler-Cinbis et al, 2009; Alper & Shah, 2005; Ikizler & Forsyth, 2008; Yao & Fei-Fei 2010a,b; Dollar et al, 2005; Gupta et al, 2009; Yang, Wang and Mori, 2010; Blank et al, 2005; Choi & Savarese, 2009; Niebles et al, 2006; Laptev et al, 2008; Wang & Mori, 2008; Rodriguez et al, 2008; Wang & Mori, 2009; Gupta et al, 2009; Liu et al, 2009; Marszalek et al, 2009
  5. 5. Activity landscape: snapshot, atomic action, activities, events, long-term event (temporal scale 10^-1, 10^0, 10^1, 10^3, 10^7 to 10^8 seconds). Activities: • Composition of simple motions • Non-periodic • Longer duration than atomic actions
  6. 6. Activity landscape – related datasets (temporal scale 10^-1, 10^0, 10^1, 10^3, 10^7 to 10^8 seconds). Snapshot: Actions in still images [Ikizler 2009], PPMI [Yao & Fei-Fei 2010], UIUC Sports [Li & Fei-Fei 2007]. Atomic action: KTH [Schuldt et al 2004], Hollywood [Laptev et al 2008], UCF Sports [Rodriguez et al 2008], Ballet [Yang et al 2009]. Activities: new Olympic Sports Dataset
  7. 7. Activity landscape (temporal scale 10^-1, 10^0, 10^1, 10^3, 10^7 to 10^8 seconds). Possible approaches: pose-based recognition; HMM, CRF; bag of features. Limitations noted: • Computationally intensive • Simple action recognition: fails when actions are complex. Cited work: Ferrari et al 2008; Ramanan & Forsyth 2003; Laptev et al 2008; Sminchisescu 2006; Nazli & Forsyth 2008; Niebles et al 2006; Blank et al 2005; Liu et al 2009; Efros et al 2003; […]
  8. 8. Our proposal – decompose activities into simpler motion segments: 1. Simple motions are easier to describe computationally. 2. Can leverage temporal context. 3. The human visual system seems to rely on decomposition for understanding [Zacks et al, Nature Neuro 2001; Tversky et al, JEP 2006].
  9. 9. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  10. 10. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  11. 11. A model for activities (figure: activity model)
  12. 12. A model for complex activities. Model properties: • Use a standard time range: [0,1]
  13. 13. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions
  14. 14. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions
  15. 15. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions • Local motion appearance (figure: Motion Segment 1)
  16. 16. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions • Local motion appearance • Encode temporal order (figure: anchor location of Motion Segment 1)
  17. 17. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions • Local motion appearance • Encode temporal order • Temporal flexibility (figure: temporal location uncertainty around the anchor location of Motion Segment 1)
  18. 18. A model for complex activities. Model properties: • Use a standard time range: [0,1] • Model is formed by a few simple motions • Local motion appearance • Encode temporal order • Temporal flexibility • Multiple temporal scales (figure: shorter and longer segments, each with an anchor location and temporal location uncertainty)
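The model properties listed on these slides can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the class names, the field names, the helper `to_standard_time`, and every numeric value are hypothetical and chosen for illustration; none of them comes from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MotionSegment:
    """One simple motion in the activity model (field names are hypothetical)."""
    name: str
    anchor: float       # preferred temporal location in the standard range [0, 1]
    duration: float     # temporal scale: fraction of the video the segment covers
    uncertainty: float  # larger value = more tolerance around the anchor


@dataclass
class ActivityModel:
    segments: List[MotionSegment]

    def in_temporal_order(self) -> List[MotionSegment]:
        """Segments sorted by anchor location, which encodes temporal order."""
        return sorted(self.segments, key=lambda s: s.anchor)


def to_standard_time(frame: int, num_frames: int) -> float:
    """Map a frame index to [0, 1] so videos of any length share one time axis."""
    if num_frames < 2:
        return 0.0
    return frame / (num_frames - 1)


# Illustrative values only; they are not learned parameters from the talk.
high_jump = ActivityModel([
    MotionSegment("take off", anchor=0.6, duration=0.1, uncertainty=0.05),
    MotionSegment("run", anchor=0.3, duration=0.4, uncertainty=0.02),
    MotionSegment("landing", anchor=0.85, duration=0.1, uncertainty=0.05),
])
print([s.name for s in high_jump.in_temporal_order()])
print(to_standard_time(50, 101))
```

Keeping the anchor and the uncertainty per segment is what lets a short segment float freely while a long one stays pinned, which is the behavior the learned-model slides describe later in the deck.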
  19. 19. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  20. 20. Recognition: query video
  21. 21. Recognition: query video on [0, 1], activity model on [0, 1]
  22. 22. Recognition: query video on [0, 1], activity model on [0, 1]. Match Motion Segment 1
  23. 23. Recognition: match Motion Segment 1: • Consider a candidate location
  24. 24. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  25. 25. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  26. 26. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Appearance term: spatio-temporal interest points with HOG/HOF descriptors [Laptev et al, 2005]
  27. 27. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Appearance term: descriptors are vector-quantized into a codebook of 1000 spatio-temporal words
  28. 28. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Appearance feature: histogram of video words
  29. 29. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Appearance similarity score: chi-square kernel SVM
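The appearance pipeline on slides 26 to 29 (quantize descriptors into video words, build a histogram, compare with a chi-square kernel) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the tiny 2-word codebook stands in for the 1000-word one, the exponential form of the kernel and the `gamma` parameter are assumed, and the SVM that would sit on top of the kernel is omitted.

```python
import math


def quantize(descriptor, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def word_histogram(descriptors, codebook):
    """Normalized histogram of video words for the descriptors in one segment."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def chi_square_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel (assumed form): exp(-gamma * chi2 distance)."""
    dist = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)


# Toy data: 2-D descriptors and a 2-word codebook, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.2]]
hist = word_histogram(descriptors, codebook)
print(hist, chi_square_kernel(hist, hist))
```

Identical histograms give a kernel value of 1, and histograms with no words in common give the smallest value, so the kernel behaves as a similarity between a candidate segment and what the model expects.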
  30. 30. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  31. 31. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  32. 32. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Temporal location feature: the distance between h_1 and the anchor location
  33. 33. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]. Temporal location disagreement score: 2nd-order polynomial
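The temporal term on slides 32 and 33 can be sketched as a quadratic penalty on the offset between a candidate location and the anchor. This is a minimal sketch, assuming a signed offset and hypothetical weights `alpha` and `beta`; the slides only state that the disagreement score is a 2nd-order polynomial of the distance.

```python
def temporal_feature(h, anchor):
    """Offset of candidate location h from the anchor, as (d^2, d)."""
    d = h - anchor
    return (d * d, d)


def temporal_penalty(h, anchor, alpha, beta=0.0):
    """Second-order polynomial penalty; alpha and beta are hypothetical weights."""
    squared, linear = temporal_feature(h, anchor)
    return alpha * squared + beta * linear


# A candidate at the anchor costs nothing; the cost grows with the offset.
print(temporal_penalty(0.5, 0.5, alpha=4.0))
print(temporal_penalty(0.75, 0.5, alpha=4.0))
```

A large `alpha` pins the segment near its anchor (low location uncertainty), while a small `alpha` lets it move freely.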
  34. 34. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  35. 35. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  36. 36. Recognition: match Motion Segment 1: • Consider a candidate location • Matching score for this segment: [equation shown on slide]
  37. 37. Recognition: query video on [0, 1], activity model on [0, 1]. • Matching score for all segments: [equation shown on slide]
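One plausible reading of the recognition slides is that each motion segment is placed at its best-scoring candidate location and the per-segment scores are summed. The sketch below assumes exactly that: the segment score is taken as appearance score minus a quadratic temporal penalty, and all function names, parameters, and numbers are hypothetical. The equations themselves were not captured in the transcript.

```python
def best_placement(appearance_scores, locations, anchor, alpha):
    """Best (score, location) for one segment over all candidate locations."""
    best = None
    for h, appearance in zip(locations, appearance_scores):
        score = appearance - alpha * (h - anchor) ** 2
        if best is None or score > best[0]:
            best = (score, h)
    return best


def match_video(per_segment_scores, locations, anchors, alphas):
    """Sum of best per-segment scores, plus the chosen location of each segment."""
    total, placements = 0.0, []
    for scores, anchor, alpha in zip(per_segment_scores, anchors, alphas):
        score, h = best_placement(scores, locations, anchor, alpha)
        total += score
        placements.append(h)
    return total, placements


# Toy example: 3 candidate locations in standard time, 2 motion segments.
locations = [0.0, 0.5, 1.0]
per_segment_scores = [
    [1.0, 0.8, 0.0],  # appearance score of segment 1 at each location
    [0.0, 0.5, 2.0],  # appearance score of segment 2 at each location
]
total, placements = match_video(per_segment_scores, locations,
                                anchors=[0.0, 0.5], alphas=[1.0, 4.0])
print(total, placements)
```

In the toy example the second segment settles away from its anchor because the appearance evidence there outweighs the temporal penalty, which is the trade-off the matched-sequence slides illustrate.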
  38. 38. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  39. 39. Learning from weakly labeled data (positive examples, negative examples): • YouTube videos • Class label per video collected on Amazon Mechanical Turk • No annotation of temporal segments
  40. 40. Learning from weakly labeled data: positive examples, negative examples, activity model on [0, 1]
  41. 41. Learning. Goal: learn • motion segment appearance • temporal arrangement. A max-margin framework that optimizes a discriminative loss: [equation shown on slide]. Optimized by coordinate descent [Felzenszwalb et al 2008]. Activity model on [0, 1]
  42. 42. Learning by coordinate descent: • Initialize model parameters (positive examples, negative examples, activity model on [0, 1])
  43. 43. Learning by coordinate descent: • Initialize model parameters 1. Find best matching locations
  44. 44. Learning by coordinate descent: • Initialize model parameters 1. Find best matching locations 2. Update
  45. 45. Learning by coordinate descent: • Initialize model parameters 1. Find best matching locations 2. Update
  46. 46. Learning by coordinate descent: • Initialize model parameters 1. Find best matching locations 2. Update. Repeat steps 1 and 2 until convergence (or max iterations)
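The alternation on these slides can be sketched as a toy loop. This is a sketch under stated assumptions, not the authors' procedure: each video is reduced to a list of candidate feature vectors (one per candidate placement), the update step is plain subgradient descent on a regularized hinge loss (a swapped-in stand-in for whatever solver was actually used), and the function names, hyperparameters, and toy data are all hypothetical.

```python
def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def best_candidate(w, candidates):
    """Step 1: the candidate placement with the highest score under w."""
    return max(candidates, key=lambda f: dot(w, f))


def train(videos, labels, w_init, outer_iters=10, inner_iters=50,
          lr=0.1, reg=0.01):
    """Alternate between fixing placements and updating the weights."""
    w = list(w_init)
    for _ in range(outer_iters):
        # Step 1: fix the best matching placement for each positive video.
        fixed = [best_candidate(w, c) if y > 0 else None
                 for c, y in zip(videos, labels)]
        # Step 2: update w by subgradient steps on a regularized hinge loss.
        for _ in range(inner_iters):
            grad = [reg * wi for wi in w]
            for cands, y, f_pos in zip(videos, labels, fixed):
                # Negatives use their current best placement at every step.
                f = f_pos if y > 0 else best_candidate(w, cands)
                if y * dot(w, f) < 1.0:
                    for i, fi in enumerate(f):
                        grad[i] -= y * fi
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy data: features are [response, bias]; one positive and one negative video.
videos = [
    [[0.0, 1.0], [2.0, 1.0]],  # positive: one placement has a strong response
    [[0.0, 1.0], [0.5, 1.0]],  # negative: only weak responses
]
labels = [1, -1]
w = train(videos, labels, w_init=[1.0, 0.0])
pos_score = dot(w, best_candidate(w, videos[0]))
neg_score = dot(w, best_candidate(w, videos[1]))
print(pos_score > 0, neg_score < 0)
```

The nonzero `w_init` matters: with all-zero weights every placement ties in step 1, so the slide's "initialize model parameters" step is what gives the alternation a meaningful starting point.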
  47. 47. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  48. 48. Experiment I: Simple Actions. • KTH dataset [Schuldt et al 2004]; classes: walking, jogging, running, boxing, hand-waving, hand-clapping. Per-class accuracy of our model: walking 94.4%, running 79.5%, jogging 78.2%, hand-waving 99.9%, hand-clapping 96.5%, boxing 99.2%. Figure: bar chart of accuracy (50% to 100%) comparing Ours, Wang et al 2009, Laptev et al 2008, Wong et al 2007, Schuldt et al 2004
  49. 49. Experiment II: Proof of concept. • Activities synthesized from Weizmann [Blank 2005] • 6 classes • Ours 100%, bag-of-features 17%. Figure: learned activity model on [0, 1] with shorter and longer segments; labels: wave, jump, jumping jacks, waving, transition from jump to jumping jacks
  50. 50. Experiment III: Olympic Sports Dataset. • YouTube videos with class labels per video from AMT • 16 classes, ~100 videos each • http://vision.stanford.edu/Datasets/OlympicSports • Classes: high-jump, long-jump, triple-jump, pole-vault, discus, hammer, javelin, shot put, basketball lay-up, bowling, tennis-serve, platform, springboard, snatch, clean-jerk, vault
  51. 51. Learned model: High Jump. Activity model on [0, 1]. Shorter segments: Start running, Run, Take off, Landing & stand up. Longer segment: Run
  52. 52. Learned model: High Jump. Activity model on [0, 1]. Shorter segments: Start running, Run, Take off, Landing & stand up. Longer segment: Run. Note: shorter segment, larger location uncertainty
  53. 53. Learned model: High Jump. Activity model on [0, 1]. Segments: Start running, Run, Take off, Landing & stand up, Run. Note: long segment, small location uncertainty
  54. 54. Learned Model: Clean and Jerk. Activity model on [0, 1]. Segments: Hold weight while crouching; Lift weight to shoulders; Hold weight on shoulders; Hold weight while crouching; Transition to upright position
  55. 55. Learned Model: Clean and Jerk. Activity model on [0, 1]. Segments: Hold weight while crouching; Lift weight to shoulders; Hold weight on shoulders; Hold weight while crouching; Transition to upright position. Note: a short segment with low location uncertainty; it had high location consistency in training
  56. 56. Learned Model: Clean and Jerk. Activity model on [0, 1]. Segments: Hold weight while crouching; Lift weight to shoulders; Hold weight on shoulders; Hold weight while crouching; Transition to upright position. Note: segments encode similar appearance; their possible locations overlap
  57. 57. Matched Sequences: Long Jump. Sequence 1 and Sequence 2 matched to the Long Jump model on [0, 1]; segments: Run, Take off, Stand up. Remarks: • Matching is tolerant to variations in the exact temporal location of motion segments. • Query videos can have different lengths.
  58. 58. Matched Sequences: Vault. Sequence 1 and Sequence 2 matched to the Vault model on [0, 1]; segments: Run, Up in the air, Landing. Vault Sequence 2: low matching score, good temporal alignment, bad appearance
  59. 59. Classifying Olympic Sports. Figure: bar chart of accuracy (30% to 100%), Ours vs. Laptev et al, CVPR 08. Our method: 72.1%. Laptev et al 2008: 62.0%
  60. 60. Outline: • Discriminative model for activities – Representation – Recognition – Learning • Experiments • Conclusions
  61. 61. Conclusions. Temporal context and structures are useful for activity recognition. Olympic Sports Dataset (16 classes, ~100 videos/class). Future directions: • Explore richer temporal structures • Introduce semantics for more meaningful decomposition
  62. 62. Thank you! Juan Carlos Niebles, graduate student, Princeton/Stanford. Acknowledgments: Bangpeng Yao, Barry Chai, Jia Deng, Hao Su, Olga Russakovsky, and all Stanford Vision Lab members.
