ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
Presentation Transcript

• Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei. Computer Science Dept., Stanford University.
• Recognizing human activities involves motion analysis, interactions with objects, and temporal structure & causality. Applications: detecting unusual behavior, judging sports automatically, providing cooking assistance, smart surveillance, biomechanics, psychology studies, video game interfaces, …
• Activity landscape. [Figure: temporal scale in seconds, from snapshots (10^-1 s, e.g. a catch), through atomic actions (10^0 s, e.g. running), activities (10^1 s, e.g. high jump), and events (10^3 s, e.g. a football game), to long-term events (10^7–10^8 s, e.g. construction of a building).]
• Activity landscape – related work across these temporal scales includes: Bobick & Davis 2001; Efros et al. 2003; Ramanan & Forsyth 2003; Schuldt et al. 2004; Alper & Shah 2005; Dollar et al. 2005; Blank et al. 2005; Niebles et al. 2006; Laxton et al. 2007; Thurau & Hlavac 2008; Ikizler & Forsyth 2008; Laptev et al. 2008; Rodriguez et al. 2008; Wang & Mori 2008, 2009; Gupta et al. 2009; Ikizler & Duygulu 2009; Ikizler-Cinbis et al. 2009; Choi & Savarese 2009; Liu et al. 2009; Marszalek et al. 2009; Yao & Fei-Fei 2010a,b; Yang, Wang & Mori 2010; Sridhar et al. 2010; Kuettel 2010.
• Activity landscape: activities are compositions of simple motions, are non-periodic, and have longer duration than atomic actions.
• Activity landscape – related datasets. Snapshots: Actions in still images [Ikizler 2009], PPMI [Yao & Fei-Fei 2010], UIUC Sports [Li & Fei-Fei 2007]. Atomic actions: KTH [Schuldt et al. 2004], Hollywood [Laptev et al. 2008], UCF Sports [Rodriguez et al. 2008], Ballet [Yang et al. 2009]. Activities: the new Olympic Sports dataset (this work).
• Activity landscape – possible approaches. Pose-based recognition (Ramanan & Forsyth 2003; Sminchisescu 2006; Ferrari et al. 2008; Nazli & Forsyth 2008; …) is computationally intensive. HMMs and CRFs. Bag of features (Efros et al. 2003; Blank et al. 2005; Niebles et al. 2006; Laptev et al. 2008; Liu et al. 2009; …) works for simple action recognition but fails when actions are complex.
• Our proposal – decompose activities into simpler motion segments: 1. simple motions are easier to describe computationally; 2. we can leverage temporal context; 3. the human visual system seems to rely on decomposition for understanding [Zacks et al., Nature Neuroscience 2001; Tversky et al., JEP 2006].
• Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
• A model for complex activities (see the sketch below). Model properties:
  – The activity occupies a standard time range [0, 1].
  – The model is formed by a few simple motions (motion segments).
  – Each motion segment carries a local motion appearance model.
  – Temporal order is encoded through an anchor location per segment.
  – Temporal flexibility: each segment has a temporal location uncertainty around its anchor.
  – Multiple temporal scales: segments can be shorter or longer.
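A minimal sketch (not the authors' code) of how such a model could be represented in Python; the field names and the two-coefficient form of the location penalty are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class MotionSegment:
    appearance_weights: np.ndarray  # weights over a video-word histogram (appearance model)
    anchor: float                   # preferred temporal location in [0, 1]
    location_penalty: np.ndarray    # (quadratic, linear) coefficients of the 2nd-order
                                    # penalty on displacement from the anchor
    scale: float                    # segment duration as a fraction of the video

@dataclass
class ActivityModel:
    segments: List[MotionSegment]   # a few motion segments, ordered in time
    bias: float = 0.0
```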
• Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
• Recognition. Given a query video (also mapped to [0, 1]), match each motion segment of the activity model:
  – Consider a candidate temporal location for the segment.
  – Compute a matching score for the segment at that location.
  – Appearance: extract spatio-temporal interest points and HOG/HOF descriptors [Laptev et al., 2005], vector-quantize them into a codebook of 1000 spatio-temporal words, and describe the candidate segment by a histogram of video words (see the sketch below).
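A runnable sketch of the quantization and histogram steps, under the assumption that the codebook has already been learned offline (e.g. with k-means); the function names are illustrative, not from the paper:

```python
import numpy as np

def quantize_descriptors(descriptors, codebook):
    """Assign each HOG/HOF descriptor to its nearest codebook word.

    descriptors: (n, d) local descriptors from one candidate segment.
    codebook:    (k, d) spatio-temporal words, e.g. k = 1000.
    Returns the (n,) array of word indices.
    """
    # Squared Euclidean distance from every descriptor to every word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def word_histogram(word_ids, k=1000):
    """L1-normalized histogram of video words: the appearance feature."""
    h = np.bincount(word_ids, minlength=k).astype(float)
    return h / max(h.sum(), 1.0)
```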
  – Appearance similarity score: a chi-square kernel SVM on the word histograms (one possible kernel form is sketched below).
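One common form of the exponentiated chi-square kernel used with such histograms; the slides do not give the exact normalization, so treat this as an assumption rather than the paper's definition:

```python
import numpy as np

def chi_square_kernel(h1, h2, eps=1e-10):
    """Exponentiated chi-square kernel between two normalized histograms:
    k(h1, h2) = exp(-0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i))."""
    return float(np.exp(-0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))))
```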
  – Temporal location feature: the distance between the candidate location h_1 and the segment's anchor location.
  – Temporal location disagreement score: a 2nd-order polynomial of that distance.
• The matching score for the whole activity combines the best per-segment scores over all motion segments (see the sketch below).
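A simplified, runnable sketch of the per-segment and total matching scores, reusing the MotionSegment / ActivityModel classes above. It scores segments independently with a linearized appearance response; the actual model may couple segments and use the kernel SVM directly, so this is illustrative only:

```python
import numpy as np

def segment_score(seg, histograms, candidate_locs):
    """Best score for one motion segment over its candidate locations.

    histograms:     appearance feature at each candidate location.
    candidate_locs: the corresponding locations, in [0, 1].
    """
    a, b = seg.location_penalty          # 2nd-order disagreement coefficients
    best = -np.inf
    for h, loc in zip(histograms, candidate_locs):
        appearance = float(seg.appearance_weights @ h)
        d = abs(loc - seg.anchor)        # temporal location feature
        best = max(best, appearance - (a * d * d + b * d))
    return best

def activity_score(model, per_segment_inputs):
    """Total matching score: sum of best per-segment scores plus a bias."""
    return model.bias + sum(
        segment_score(seg, hists, locs)
        for seg, (hists, locs) in zip(model.segments, per_segment_inputs)
    )
```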
• Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
• Learning from weakly labeled data: positive and negative examples are YouTube videos with one class label per video, collected on Amazon Mechanical Turk; there is no annotation of temporal segments.
• Learning goal: learn the motion segment appearances and their temporal arrangement, in a max-margin framework that optimizes a discriminative loss via coordinate descent [Felzenszwalb et al. 2008]:
  – Initialize the model parameters.
  – 1. Find the best matching locations on the training examples.
  – 2. Update the model parameters.
  – Repeat steps 1–2 until convergence (or a maximum number of iterations). A sketch of this loop follows.
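A runnable toy version of this coordinate-descent loop, again reusing the classes above. The real update solves a regularized max-margin problem in the spirit of [Felzenszwalb et al. 2008]; here a hinge-loss subgradient step on the appearance weights and bias stands in for it, which is an assumption, not the authors' procedure:

```python
import numpy as np

def train(model, positives, negatives, n_iters=10, lr=0.1):
    """Alternate between latent-location inference and a parameter update.

    Each video is given as per-segment (histograms, candidate_locs)
    pairs, in the same format as activity_score above.
    """
    examples = [(v, +1.0) for v in positives] + [(v, -1.0) for v in negatives]
    for _ in range(n_iters):
        for video, label in examples:
            # Step 1: parameters fixed, pick each segment's best-scoring
            # location (the latent variables) in this video.
            chosen, total = [], model.bias
            for seg, (hists, locs) in zip(model.segments, video):
                a, b = seg.location_penalty
                scores = [float(seg.appearance_weights @ h)
                          - (a * (l - seg.anchor) ** 2 + b * abs(l - seg.anchor))
                          for h, l in zip(hists, locs)]
                i = int(np.argmax(scores))
                chosen.append(hists[i])
                total += scores[i]
            # Step 2: locations fixed, hinge-loss subgradient step on the
            # appearance weights and bias (stand-in for the full update).
            if label * total < 1.0:
                for seg, h in zip(model.segments, chosen):
                    seg.appearance_weights = seg.appearance_weights + lr * label * h
                model.bias += lr * label
    return model
```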
• Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
• Experiment I: simple actions on the KTH dataset [Schuldt et al. 2004]. Per-class accuracy of our model: walking 94.4%, running 79.5%, jogging 78.2%, hand-waving 99.9%, hand-clapping 96.5%, boxing 99.2%. [Bar chart comparing overall accuracy: ours vs. Wang et al. 2009, Laptev et al. 2008, Wong et al. 2007, Schuldt et al. 2004.]
• Experiment II: proof of concept. Activities synthesized from the Weizmann dataset [Blank et al. 2005], 6 classes built from atomic actions such as wave, jump, and jumping jacks (e.g., a transition from jumping to jumping jacks). Accuracy: ours 100%, bag-of-features 17%.
• Experiment III: the Olympic Sports Dataset. YouTube videos with one class label per video, collected via Amazon Mechanical Turk; 16 classes, ~100 videos each; http://vision.stanford.edu/Datasets/OlympicSports. Classes: high-jump, long-jump, triple-jump, pole-vault, discus, hammer, javelin, shot put, basketball lay-up, bowling, tennis-serve, platform, springboard, snatch, clean-jerk, vault.
• Learned model: High Jump. The model decomposes the activity into segments such as start running, run, take off, and landing & stand up, at shorter and longer temporal scales. Shorter segments tend to have larger location uncertainty, while the long "run" segment has small location uncertainty.
• Learned model: Clean and Jerk. The model decomposes the activity into hold weight while crouching, lift weight to shoulders, hold weight on shoulders, and the transition to an upright position. A short segment can still have low location uncertainty when its location was highly consistent in training, and segments that encode similar appearance can have overlapping possible locations.
• Matched sequences: Long Jump. Two query sequences are matched to the model segments run, take off, and stand up. Remarks: matching is tolerant to variations in the exact temporal locations of motion segments, and query videos can have different lengths.
• Matched sequences: Vault. Two query sequences are matched to run, up in the air, and landing. One sequence receives a low matching score: good temporal alignment but bad appearance.
• Classifying Olympic Sports: our method 72.1% vs. Laptev et al. CVPR 2008 62.0%. [Bar chart.]
• Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
• Conclusions. Temporal context and structure are useful for activity recognition; we also introduce the Olympic Sports Dataset (16 classes, ~100 videos/class). Future directions: explore richer temporal structures, and introduce semantics for a more meaningful decomposition.
• Thank you! Juan Carlos Niebles, graduate student, Princeton/Stanford. Thanks to Bangpeng Yao, Barry Chai, Jia Deng, Hao Su, Olga Russakovsky, and all Stanford Vision Lab members.