CVPR 2011: Human Activity Recognition - Part 4: Syntactic

Transcript

  • 1. Frontiers of Human Activity Analysis. J. K. Aggarwal, Michael S. Ryoo, Kris M. Kitani
  • 2. Overview. Human Activity Recognition: Single-layer (Space-time, Sequential) and Hierarchical (Statistical, Syntactic, Descriptive) 2
  • 3. Motivation: How do we interpret a sequence of actions? 3
  • 4. Hierarchy: Hierarchy implies decomposition into sub-parts 4
  • 5. Now we’ll cover… Human Activity Recognition: Single-layer (Space-time, Sequential) and Hierarchical (Statistical, Syntactic, Descriptive) 5
  • 6. Syntactic Approaches 6
  • 7. Syntactic Models: Activities as strings of symbols. What is the underlying structure? 7
  • 8. Early applications to Vision: Tsai and Fu 1980. Attributed Grammar - A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. 8
  • 9. Hierarchical syntactic approach  Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure  Not for   Systems with a lot of errors and uncertainty   Activities with shallow structure 9
  • 10. Basics of Context-Free Grammars (generic language ↔ natural languages): Start Symbol (S) ↔ Sentences; Set of Terminal Symbols (T) ↔ Words; Set of Non-Terminal Symbols (N) ↔ Parts of Speech; Set of Production Rules (P) ↔ Syntax Rules 10
  • 11. Parsing with a grammar. S → NP VP (0.8), S → VP (0.2); NP → NOUN (0.4), NP → NOUN PP (0.4), NP → NOUN NP (0.2); VP → VERB (0.3), VP → VERB NP (0.3), VP → VERB PP (0.2), VP → VERB NP PP (0.2); PP → PREP NP (1.0); PREP → like (1.0); VERB → swat (0.2), VERB → flies (0.4), VERB → like (0.4); NOUN → swat (0.05), NOUN → flies (0.45), NOUN → ants (0.5). Input string: swat flies like ants 11
  • 12. Parsing with a grammar (same rules as slide 11). [Parse tree for “swat flies like ants”: S → NP VP (0.8); NP → NOUN NP (0.2) with NOUN → swat (0.05) and NP → NOUN (0.4) → flies (0.45); VP → VERB NP (0.3) with VERB → like (0.4) and NP → NOUN (0.4) → ants (0.5)] 12
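The toy parse above can be reproduced in a few lines. Below is a minimal sketch (not the tutorial's code; the rule table is copied from the slide, and the helper names `best` and `splits` are mine) that finds the highest-probability derivation of "swat flies like ants" by recursively trying every production and every split of the span:

```python
from functools import lru_cache

# Toy PCFG from the slide: non-terminal -> [(right-hand side, probability)]
RULES = {
    "S":    [(("NP", "VP"), 0.8), (("VP",), 0.2)],
    "NP":   [(("NOUN",), 0.4), (("NOUN", "PP"), 0.4), (("NOUN", "NP"), 0.2)],
    "VP":   [(("VERB",), 0.3), (("VERB", "NP"), 0.3),
             (("VERB", "PP"), 0.2), (("VERB", "NP", "PP"), 0.2)],
    "PP":   [(("PREP", "NP"), 1.0)],
    "PREP": [(("like",), 1.0)],
    "VERB": [(("swat",), 0.2), (("flies",), 0.4), (("like",), 0.4)],
    "NOUN": [(("swat",), 0.05), (("flies",), 0.45), (("ants",), 0.5)],
}
SENT = ("swat", "flies", "like", "ants")

@lru_cache(maxsize=None)
def best(symbol, i, j):
    """Highest-probability derivation of SENT[i:j] from `symbol`."""
    if symbol not in RULES:                                   # terminal symbol
        return (1.0, symbol) if (j - i == 1 and SENT[i] == symbol) else (0.0, None)
    top_p, top_tree = 0.0, None
    for rhs, p in RULES[symbol]:
        for prob, kids in splits(rhs, i, j):                  # every split of the span
            if p * prob > top_p:
                top_p, top_tree = p * prob, (symbol, kids)
    return top_p, top_tree

def splits(rhs, i, j):
    """Yield (probability, subtrees) for the rhs symbols covering SENT[i:j]."""
    if len(rhs) == 1:
        p, t = best(rhs[0], i, j)
        if p > 0:
            yield p, (t,)
        return
    for k in range(i + 1, j):                                 # first symbol takes SENT[i:k]
        p1, t1 = best(rhs[0], i, k)
        if p1 == 0:
            continue
        for p2, rest in splits(rhs[1:], k, j):
            yield p1 * p2, (t1,) + rest

prob, tree = best("S", 0, len(SENT))
print(prob, tree)   # most likely parse and its probability
```

Because the grammar is ambiguous, the highest-probability derivation the sketch returns need not match the particular tree drawn on the slide; changing the rule probabilities changes which reading wins.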
  • 13. Video analysis with CFGs The “Inverse Hollywood problem”: From video to scripts and storyboards via causal analysis. Brand 1997 Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 13
  • 14. CFG for human activities: enter detach leave enter detach attach touch touch detach attach leave. M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 14
  • 15. [Parse tree: SCENE (Open up a PC) decomposes into ACTIONs (Open PC, unscrew) via IN/OUT/MOVE/REMOVE/ADD motions over the primitive string: enter detach leave enter detach attach touch touch detach attach leave] • Deterministic low-level primitive detection • Deterministic parsing. M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 15
  • 16. Stochastic CFGs. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 16
  • 17. Gesture analysis with CFGs: Primitive recognition with HMMs. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 17
  • 18. left-right. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 18
  • 19. up-down. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 19
  • 20. right-left. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 20
  • 21. down-up. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 21
  • 22. [Parse tree for the gesture grammar: S → RH, with TOP and BOT non-terminals expanding into the primitives LR (left-right), UD (up-down), RL (right-left), DU (down-up)] 22
  • 23. Errors: likelihood values over time (not discrete symbols), HMM a / HMM b. Errors are inevitable… but the grammar acts as a top-down constraint. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 23
  • 24. Dealing with uncertainty & errors: Stolcke-Earley (probabilistic) parser; SKIP rules to deal with insertion errors (HMM a, HMM b, HMM c). Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 24
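One way to realize the SKIP idea, sketched under assumptions (this is not Bobick and Ivanov's implementation; `wrap_with_skip`, `NOISE`, and `SKIP_PENALTY` are made-up names, and the rule format matches the toy parser sketch above): wrap every terminal in a non-terminal that may first absorb spurious detections at a small penalty, so isolated insertion errors no longer break the parse.

```python
SKIP_PENALTY = 0.05   # assumed probability mass reserved for spurious detections

def wrap_with_skip(rules, terminals):
    """Return a grammar in which each terminal t is replaced by t_skip,
    a non-terminal that may absorb noise symbols before emitting t."""
    wrapped = {}
    for lhs, productions in rules.items():
        wrapped[lhs] = [
            (tuple(f"{s}_skip" if s in terminals else s for s in rhs), p)
            for rhs, p in productions
        ]
    for t in terminals:
        wrapped[f"{t}_skip"] = [
            ((t,), 1.0 - SKIP_PENALTY),                 # the expected symbol itself
            (("NOISE", f"{t}_skip"), SKIP_PENALTY),     # skip one insertion, then retry
        ]
    # NOISE matches any observed symbol with uniform (low) probability
    wrapped["NOISE"] = [((t,), 1.0 / len(terminals)) for t in terminals]
    return wrapped
```

Deletion and substitution errors need analogous productions, which is part of why the hypothesis space grows so quickly, as the next slides discuss.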
  • 25. SCFG for Blackjack. Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 • Deals with more complex activities • Deals with more error types 25
  • 26. extracting primitive actions 26
  • 27. Game grammar. Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 27
  • 28. Dealing with errors  Ungrammatical strings cause the parser to fail  Account for errors with multiple hypotheses   Insertion, deletion, substitution  Issues   How many errors should we tolerate?   Potentially exponential hypothesis space   Ungrammatical strings: vision problem or illegal activity? 28
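To make the hypothesis-space issue concrete, here is a small illustrative sketch (the event names are hypothetical, not Moore and Essa's vocabulary) that enumerates every correction within one insertion, deletion, or substitution of an observed string; with an error budget of k the number of hypotheses grows roughly like (|Σ| · n)^k.

```python
def one_edit_hypotheses(observed, alphabet):
    """All symbol strings reachable from `observed` by one insertion,
    deletion, or substitution."""
    hyps = set()
    for i in range(len(observed) + 1):               # insertion before position i
        for a in alphabet:
            hyps.add(observed[:i] + (a,) + observed[i:])
    for i in range(len(observed)):
        hyps.add(observed[:i] + observed[i + 1:])    # deletion of symbol i
        for a in alphabet:                           # substitution at position i
            hyps.add(observed[:i] + (a,) + observed[i + 1:])
    return hyps

obs = ("deal", "hit", "hit", "stand")                # hypothetical Blackjack event string
print(len(one_edit_hypotheses(obs, {"deal", "hit", "stand", "split"})))  # number of single-error hypotheses
```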
  • 29. Observations  CFGs good for structured activities  Can incorporate uncertainty in observations  Natural contextual prior for recognizing errors  Not clear how to deal with errors  Assumes ‘good’ action classifiers  Need to define grammar manually Can we learn the grammar from data? 29
  • 30. Heuristic Grammatical Induction 1.  Lexicon learning •  Learn HMMs •  Cluster HMMs 2.  Convert video to string 3.  Learn Grammar Unsupervised Analysis of Human Gestures. Wang et al 2001 30
  • 31. COMPRESSIVE: a b c d a b c d b c d a b a b. [Diagram: a repeated substring (length × number of occurrences) is rewritten by deleting the substring and inserting a new symbol defined by a new rule] On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning 2000. 31
  • 32. Example: S → a b c d a b c d b c d a b a b (DL=16); A → b c d, S → a A a A A a b a b (DL=14). Repeat until compression becomes 0. 32
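A minimal sketch of this loop, assuming a plain symbol count as the description length (my own illustration, not Nevill-Manning and Witten's Sequitur implementation): repeatedly replace the repeated substring that saves the most symbols with a fresh rule, and stop when no replacement helps.

```python
from collections import Counter

def compress(seq):
    """Greedy compressive induction: substitute the repeated substring with
    the largest saving by a fresh rule, until no substitution saves symbols."""
    rules, names = {}, iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    while True:
        best_sub, best_saving = None, 0
        for n in range(2, len(seq) // 2 + 1):
            # occurrence counts ignore overlaps; good enough for this sketch
            counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
            for sub, k in counts.items():
                if k < 2:
                    continue
                saving = k * (n - 1) - n     # symbols saved minus cost of the new rule
                if saving > best_saving:
                    best_sub, best_saving = sub, saving
        if best_sub is None:                 # nothing compresses any further
            return rules, seq
        name = next(names)
        rules[name] = list(best_sub)
        seq = substitute(seq, list(best_sub), name)

def substitute(seq, sub, name):
    out, i = [], 0
    while i < len(seq):
        if seq[i:i + len(sub)] == sub:
            out.append(name); i += len(sub)
        else:
            out.append(seq[i]); i += 1
    return out

rules, start = compress(list("abcdabcdbcdabab"))
print(rules, start)   # {'A': ['b','c','d']}  ['a','A','a','A','A','a','b','a','b']
```

On the string above it introduces the single rule A → b c d and then stops, matching the slide.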
  • 33. Critical assumption  No uncertainty  No errors   insertions   deletions   substitution Can we learn grammars despite errors? 33
  • 34. Learning with noise: Can we learn the basic structure of a transaction? Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 34
  • 35. Extracting primitives. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 35
  • 36. Underlying structure? D → a x b y c a b x c y a b c x. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 36
  • 37. Underlying structure? D → a x b y c a b x c y a b c x; D → a b c a b c a b c. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 37
  • 38. Underlying structure? D → a x b y c a b x c y a b c x; D → a b c a b c a b c. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 38
  • 39. Underlying structure? D → a x b y c a b x c y a b c x; D → a b c a b c a b c; A → a b c, D → A A A. Simple grammar, efficient compression. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 39
  • 40. Information Theory Problem (MDL): Ĝ = argmin_G { DL(G) + DL(D|G) }, i.e. model complexity plus data compression. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 40
  • 41. Model complexity: DL(G) = −log p(G) = −log p(θ_S, G_S) = −log p(θ_S|G_S) − log p(G_S) = DL(θ_S|G_S) + DL(G_S), i.e. grammar parameters plus grammar structure. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 41
  • 42. Data compression: DL(D|G) = −log p(D|G), the likelihood computed with inside probabilities. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 42
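A hedged sketch of how a candidate grammar could be scored under this objective (the encodings are placeholders and `inside_prob` stands in for an inside-algorithm likelihood routine; this is not Kitani et al.'s implementation):

```python
import math

def description_length(grammar, data, inside_prob):
    """MDL score DL(G) + DL(D|G) for a candidate grammar over a set of strings."""
    # DL(G_S): symbols needed to write down the grammar structure
    dl_structure = sum(len(rhs) + 1
                       for productions in grammar.values()
                       for rhs, _ in productions)
    # DL(theta_S | G_S): bits for the rule probabilities (BIC-style, (k/2) log N)
    free_params = sum(len(productions) - 1 for productions in grammar.values())
    dl_params = 0.5 * free_params * math.log2(max(len(data), 2))
    # DL(D | G): negative log-likelihood of the data via inside probabilities
    dl_data = -sum(math.log2(inside_prob(grammar, s)) for s in data)
    return dl_structure + dl_params + dl_data
```

The grammar search keeps whichever candidate minimizes this score, trading model complexity against how well the grammar explains the noisy symbol strings.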
  • 43. Minimum Description Length. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 43
  • 44. Minimum Description Length. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 44
  • 45. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 45
  • 46. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 46
  • 47. Conclusions  Possible to learn basic structure  Robust to errors (insertion, deletion, substitution)  Need a lot of training data  Computational complexity 47
  • 48. Bayesian Approaches: Infinite Hierarchical Hidden Markov Models (Heller et al. 2009); The Infinite PCFG using Hierarchical Dirichlet Processes (Liang et al. 2007). 48
  • 49. Take home message Hierarchical Syntactic Models  Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure  Not for   Systems with a lot of errors and uncertainty   Activities with weak structure 49
  • 50. Statistical Approaches 50
  • 51. Using a hierarchical statistical approach  Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics  Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 51
  • 52. Statistical (State-based) Model: Activities as a stochastic path. What are the underlying dynamics? 52
  • 53. Characteristics  Strong Markov assumption  Strong dynamics prior  Robust to uncertainty  Modifications to account for   Hierarchical structure   Concurrent structure 53
  • 54. Hierarchical activities. Problem: How do we model hierarchical activities? Combinatorial state space! Solution: “stack” actions for hierarchical activities 54
  • 55. Hierarchical hidden Markov model Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 55
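To make the "stacking" idea concrete, here is a purely illustrative two-level state layout (the activity and sub-action names are invented, not Nguyen et al.'s model): each top-level activity owns its own sub-HMM and an end probability that returns control to the parent, so the sub-states of different activities never get multiplied into one flat chain.

```python
# Illustrative hierarchical HMM layout: top-level activities own sub-HMMs.
hhmm = {
    "enter_room": {
        "sub_states":  ["walk_to_door", "open_door", "step_inside"],
        "transitions": {("walk_to_door", "open_door"): 0.9,
                        ("open_door", "step_inside"): 0.9},
        "end_prob":    {"step_inside": 0.8},   # chance of returning to the parent level
    },
    "leave_room": {
        "sub_states":  ["step_outside", "close_door", "walk_away"],
        "transitions": {("step_outside", "close_door"): 0.9,
                        ("close_door", "walk_away"): 0.9},
        "end_prob":    {"walk_away": 0.8},
    },
}
# The top-level transition table stays small because sub-states never mix across parents.
top_level_transitions = {("enter_room", "leave_room"): 0.7,
                         ("leave_room", "enter_room"): 0.3}
```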
  • 56. Context-free activity grammar. Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 56
  • 57. Context-free activity grammar. Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 57
  • 58. Observations  Tree structures useful for hierarchies  Tight integration of trajectories with abstract semantic states  Activities are not always a single sequence (i.e. they sometimes happen in parallel) 58
  • 59. Concurrent activities. Problem: How do we model concurrent activities? Combinatorial state space! Solution: “stand-up” model for concurrent activities 59
  • 60. Propagation network. Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 60
  • 61. Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 61
  • 62. Temporal inference: inference by standing the state transition model on its side 62
  • 63. Inferring structure (storylines) Understanding Videos, Constructing Plots – Learning a Visually Grounded Storyline Model from Annotated Videos Gupta, Srinivasan, Shi and Davis CVPR 2009 Learn AND-OR graphs from weakly labeled data 63
  • 64. Scripts from structure. Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis CVPR 2009 64
  • 65. Take home message Hierarchical statistical model  Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics  Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 65
  • 66. Contrasting hierarchical approaches. Statistical: actions as probabilistic states, activities as paths, model: DBN, characteristic: robust to uncertainty. Syntactic: actions as discrete symbols, activities as strings, model: CFG, characteristic: describes deep hierarchy. Descriptive: actions as logical relationships, activities as sets, model: CFG, MLN, characteristic: encodes complex logic. 66
  • 67. References (not included in ACM survey paper)  W. Tsai and K.S. Fu. Attributed Grammar - A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. SMC 1980.  M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.  T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PCM 2001.  C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Proceedings of the IEEE, 2000.  K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Models. AISTATS 2009.  P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007.  A. Gupta, N. Srinivasan, J. Shi and L. Davis. Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR 2009. 67