CVPR 2011: Human Activity Recognition - Part 4: Syntactic

Transcript

  • 1. Frontiers of Human Activity Analysis. J. K. Aggarwal, Michael S. Ryoo, Kris M. Kitani.
  • 2. Overview. Human Activity Recognition splits into Single-layer approaches (Space-time, Sequential) and Hierarchical approaches (Statistical, Syntactic, Descriptive).
  • 3. Motivation. How do we interpret a sequence of actions?
  • 4. Hierarchy. Hierarchy implies decomposition into sub-parts.
  • 5. Now we'll cover… the Hierarchical branch of the taxonomy: Human Activity Recognition → Single-layer (Space-time, Sequential) vs. Hierarchical (Statistical, Syntactic, Descriptive).
  • 6. Syntactic Approaches.
  • 7. Syntactic Models. Activities as strings of symbols. What is the underlying structure?
  • 8. Early applications to vision: Tsai and Fu 1980. Attributed Grammar: A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition.
  • 9. Hierarchical syntactic approach. Useful for activities with: deep hierarchical structure; repetitive (cyclic) structure. Not for: systems with a lot of errors and uncertainty; activities with shallow structure.
  • 10. Basics. CFG components and their natural-language analogues: Start Symbol (S) ↔ Sentences; Set of Terminal Symbols (T) ↔ Words; Set of Non-Terminal Symbols (N) ↔ Parts of Speech; Set of Production Rules (P) ↔ Syntax Rules.
  • 11. Parsing with a grammar. Example PCFG (rule probabilities in parentheses): S → NP VP (0.8) | VP (0.2); NP → NOUN (0.4) | NOUN PP (0.4) | NOUN NP (0.2); VP → VERB (0.3) | VERB NP (0.3) | VERB PP (0.2) | VERB NP PP (0.2); PP → PREP NP (1.0); PREP → like (1.0); VERB → swat (0.2) | flies (0.4) | like (0.4); NOUN → swat (0.05) | flies (0.45) | ants (0.5). Input: "swat flies like ants".
  • 12. Parsing with a grammar (continued). The parse tree for "swat flies like ants": S → NP VP (0.8); NP → NOUN NP (0.2) with NOUN → swat (0.05) and NP → NOUN (0.4) → flies (0.45); VP → VERB NP (0.3) with VERB → like (0.4) and NP → NOUN (0.4) → ants (0.5).
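The toy grammar above is small enough to run through an off-the-shelf probabilistic parser. Here is a minimal sketch using NLTK (our library choice, not the tutorial's); `ViterbiParser` recovers the most probable parse, matching the tree on slide 12:

```python
# Sketch: the slide-11 PCFG in NLTK, parsed for the most probable tree.
# Requires `pip install nltk`.
import nltk

grammar = nltk.PCFG.fromstring("""
    S  -> NP VP      [0.8]
    S  -> VP         [0.2]
    NP -> NOUN       [0.4]
    NP -> NOUN PP    [0.4]
    NP -> NOUN NP    [0.2]
    VP -> VERB       [0.3]
    VP -> VERB NP    [0.3]
    VP -> VERB PP    [0.2]
    VP -> VERB NP PP [0.2]
    PP -> PREP NP    [1.0]
    PREP -> 'like'   [1.0]
    VERB -> 'swat' [0.2] | 'flies' [0.4] | 'like' [0.4]
    NOUN -> 'swat' [0.05] | 'flies' [0.45] | 'ants' [0.5]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("swat flies like ants".split()):
    tree.pretty_print()   # the most probable parse tree
    print(tree.prob())    # its probability
```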
  • 13. Video analysis with CFGs. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. Brand 1997. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998. Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001.
  • 14. CFG for human activities. Example primitive stream: enter, detach, leave, enter, detach, attach, touch, touch, detach, attach, leave. M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
  • 15. Parse tree. SCENE (open up a PC) decomposes into ACTION (open PC) and ACTION (unscrew), which expand through MOVE, REMOVE, and ADD motions into the primitive stream: enter, detach, leave, enter, detach, attach, touch, touch, detach, attach, leave. Deterministic low-level primitive detection; deterministic parsing. (Brand, AAAI 1997)
  • 16. Stochastic CFGs. Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998.
  • 17. Gesture analysis with CFGs: primitive recognition with HMMs. (Bobick and Ivanov 1998)
  • 18. Gesture primitive: left-right. (Bobick and Ivanov 1998)
  • 19. Gesture primitive: up-down. (Bobick and Ivanov 1998)
  • 20. Gesture primitive: right-left. (Bobick and Ivanov 1998)
  • 21. Gesture primitive: down-up. (Bobick and Ivanov 1998)
  • 22. Parse tree. S → RH, which expands through TOP (UD), BOT (DU), LR, and RL into the primitive sequence: left-right, up-down, right-left, down-up.
  • 23. Errors. Each primitive HMM (a, b) emits a likelihood value over time, not discrete symbols. Errors are inevitable… but the grammar acts as a top-down constraint. (Bobick and Ivanov 1998)
  • 24. Dealing with uncertainty and errors: a Stolcke-Earley (probabilistic) parser, plus SKIP rules to handle insertion errors from the primitive HMMs (a, b, c); see the sketch below. (Bobick and Ivanov 1998)
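As a rough illustration of the SKIP idea only (the nonterminal names and probabilities here are invented, not Bobick and Ivanov's grammar), a stochastic grammar can absorb spurious detections by allowing an optional noise-eating nonterminal after each primitive:

```python
# Continuing the NLTK sketch above: a SKIP-style rule that soaks up
# insertion errors between gesture primitives. All symbols and
# probabilities are invented for illustration.
import nltk

skip_grammar = nltk.PCFG.fromstring("""
    S    -> LR UD        [1.0]
    LR   -> 'lr' SKIP    [0.5]
    LR   -> 'lr'         [0.5]
    UD   -> 'ud' SKIP    [0.5]
    UD   -> 'ud'         [0.5]
    SKIP -> 'noise'      [0.6]
    SKIP -> 'noise' SKIP [0.4]
""")

parser = nltk.ViterbiParser(skip_grammar)
# The spurious 'noise' symbol between the two primitives is absorbed
# by the SKIP rule instead of breaking the parse:
print(next(parser.parse(["lr", "noise", "ud"])))
```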
  • 25. SCFG for Blackjack. Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001. Deals with more complex activities and with more error types.
  • 26. Extracting primitive actions.
  • 27. Game grammar. (Moore and Essa 2001)
  • 28. Dealing with errors. Ungrammatical strings cause the parser to fail, so errors are accounted for with multiple hypotheses: insertion, deletion, substitution. Issues: how many errors should we tolerate? The hypothesis space is potentially exponential. And is an ungrammatical string a vision problem or an illegal activity?
  • 29. Observations. CFGs are good for structured activities, can incorporate uncertainty in observations, and provide a natural contextual prior for recognizing errors. But it is not clear how to deal with errors, 'good' action classifiers are assumed, and the grammar must be defined manually. Can we learn the grammar from data?
  • 30. Heuristic grammatical induction: 1. lexicon learning (learn HMMs, cluster HMMs); 2. convert video to a string; 3. learn the grammar. Unsupervised Analysis of Human Gestures. Wang et al. 2001.
  • 31. The COMPRESSIVE heuristic. Over the string a b c d a b c d b c d a b a b, choose the substring whose length and number of occurrences make a new rule most compressive, then delete each occurrence of the substring and insert the new rule's symbol in its place. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning 2000.
  • 32. Example. S → a b c d a b c d b c d a b a b (DL = 16). Introducing A → b c d gives S → a A a A A a b a b (DL = 14). Repeat until the compression gain becomes 0, as in the sketch below.
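A rough sketch of this greedy, compression-driven loop (a simplification for illustration, not Nevill-Manning's exact on-line algorithm) reproduces the A → b c d step above:

```python
# Greedy grammar induction by substring compression: a simplified sketch
# of the COMPRESSIVE heuristic, not Nevill-Manning's exact algorithm.

def count_nonoverlapping(s, sub):
    """Count non-overlapping occurrences of tuple sub in symbol list s."""
    n = i = 0
    while i <= len(s) - len(sub):
        if tuple(s[i:i + len(sub)]) == sub:
            n += 1
            i += len(sub)
        else:
            i += 1
    return n

def best_substring(s):
    """Return the repeated substring whose extraction saves most symbols."""
    candidates = {tuple(s[i:i + L])
                  for L in range(2, len(s) // 2 + 1)
                  for i in range(len(s) - L + 1)}
    best, best_gain = None, 0
    for sub in candidates:
        n = count_nonoverlapping(s, sub)
        # n occurrences of len(sub) symbols collapse to n nonterminals,
        # at the cost of one new rule body of len(sub) symbols:
        gain = n * len(sub) - n - len(sub)
        if n > 1 and gain > best_gain:
            best, best_gain = sub, gain
    return best

def induce(string):
    s, rules, names = list(string), {}, iter("ABCDEFGH")
    while (sub := best_substring(s)) is not None:
        name = next(names)
        rules[name] = sub
        out, i = [], 0
        while i < len(s):
            if tuple(s[i:i + len(sub)]) == sub:
                out.append(name)
                i += len(sub)
            else:
                out.append(s[i])
                i += 1
        s = out
    return s, rules

start, rules = induce("abcdabcdbcdabab")
print("S ->", " ".join(start))              # S -> a A a A A a b a b
for lhs, rhs in rules.items():
    print(lhs, "->", " ".join(rhs))         # A -> b c d
```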
  • 33. Critical assumption: no uncertainty and no errors (insertions, deletions, substitutions). Can we learn grammars despite errors?
  • 34. Learning with noise. Can we learn the basic structure of a transaction? Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al. 2008.
  • 35. Extracting primitives. (Kitani et al. 2008)
  • 36. Underlying structure? Observed symbol string: D → a x b y c a b x c y a b c x. (Kitani et al. 2008)
  • 37. Underlying structure? The observed D → a x b y c a b x c y a b c x versus the clean string D → a b c a b c a b c. (Kitani et al. 2008)
  • 38. Underlying structure? The same comparison: x and y are noise symbols inserted into repetitions of a b c. (Kitani et al. 2008)
  • 39. Underlying structure? With A → a b c and D → A A A we get a simple grammar and efficient compression. (Kitani et al. 2008)
  • 40. Information-theoretic formulation (MDL): Ĝ = arg min_G {DL(G) + DL(D|G)}, trading off model complexity DL(G) against data compression DL(D|G). (Kitani et al. 2008)
  • 41. Model complexity: DL(G) = −log p(G) = −log p(θ_S, G_S) = −log p(θ_S|G_S) − log p(G_S) = DL(θ_S|G_S) + DL(G_S), i.e. the cost of the grammar parameters plus the cost of the grammar structure. (Kitani et al. 2008)
  • 42. Data compression: DL(D|G) = −log p(D|G), the negative log-likelihood computed with inside probabilities. (Kitani et al. 2008)
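A toy two-part MDL comparison of the slide-39 candidates gives the flavor of the trade-off. The costs here are our own simplification: grammar symbols are counted at unit cost and a Levenshtein distance stands in for −log p(D|G), whereas the paper uses real code lengths and inside probabilities:

```python
# Toy MDL comparison: a grammar that memorizes the noisy string versus
# the compact A -> a b c grammar. Costs are counted in raw symbols,
# an assumption made for illustration only.

def edit_distance(a, b):
    """Levenshtein distance: a crude stand-in for -log p(D|G)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def expand(grammar, sym):
    """Expand a nonterminal down to its terminal yield."""
    if sym not in grammar:
        return sym
    return "".join(expand(grammar, s) for s in grammar[sym])

def mdl(grammar, data):
    dl_g = sum(len(rhs) for rhs in grammar.values())   # model complexity
    dl_dg = edit_distance(expand(grammar, "D"), data)  # residual data cost
    return dl_g + dl_dg

noisy = "axbycabxcyabcx"                 # slide 36's observed string
memorize = {"D": noisy}                  # big model, zero residual
compact = {"D": "AAA", "A": "abc"}       # small model, pays for the noise
print(mdl(memorize, noisy))              # 14
print(mdl(compact, noisy))               # 11 -> the compact grammar wins
```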
  • 43. Minimum Description Length. (Kitani et al. 2008)
  • 44. Minimum Description Length (continued). (Kitani et al. 2008)
  • 45. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al. 2008.
  • 46. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al. 2008.
  • 47. Conclusions. It is possible to learn basic structure, robustly to errors (insertion, deletion, substitution), but it requires a lot of training data and is computationally complex.
  • 48. Bayesian approaches. Infinite Hierarchical Hidden Markov Models, Heller et al. 2009. The Infinite PCFG using Hierarchical Dirichlet Processes, Liang et al. 2007.
  • 49. Take-home message. Hierarchical syntactic models are useful for activities with deep hierarchical structure or repetitive (cyclic) structure; not for systems with a lot of errors and uncertainty, or activities with weak structure.
  • 50. Statistical Approaches.
  • 51. Using a hierarchical statistical approach. Use when low-level action detectors are noisy, the structure of the activity is sequential, or dynamics must be integrated. Not for activities with deep hierarchical structure or complex temporal structure.
  • 52. Statistical (state-based) models. Activities as a stochastic path. What are the underlying dynamics?
  • 53. Characteristics. A strong Markov assumption and a strong dynamics prior make these models robust to uncertainty (see the sketch below). Modifications account for hierarchical structure and concurrent structure.
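That robustness comes from marginalizing over all state paths rather than committing to one symbol per frame. Below is a minimal scaled forward pass for a generic HMM (a sketch with invented toy matrices, not code from the cited papers); recognition picks the action model with the higher log-likelihood:

```python
# Minimal scaled HMM forward pass: score an observation sequence under
# an action model. All matrices below are invented toy values.
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Return log p(obs | model). pi: initial state probabilities,
    A[s, s']: transition probabilities, B[s, o]: emission probabilities."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)   # accumulate scaling factors to avoid underflow
        alpha = alpha / c
    return loglik

pi = np.array([1.0, 0.0])                    # always start in state 0
A_a = np.array([[0.8, 0.2], [0.1, 0.9]])     # model "a": slow left-to-right
A_b = np.array([[0.3, 0.7], [0.6, 0.4]])     # model "b": rapid alternation
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # shared emission model
obs = [0, 0, 1, 1, 1]                        # quantized detector outputs

# Recognition = the model with the higher log-likelihood (here, "a"):
print(forward_loglik(pi, A_a, B, obs), forward_loglik(pi, A_b, B, obs))
```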
  • 54. Hierarchical activities. Problem: how do we model hierarchical activities? A flat model has a combinatorial state space. Solution: "stack" actions to model hierarchical activities.
  • 55. Hierarchical hidden Markov model. Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al. 2005.
  • 56. Context-free activity grammar. (Nguyen et al. 2005)
  • 57. Context-free activity grammar (continued). (Nguyen et al. 2005)
  • 58. Observations. Tree structures are useful for hierarchies, and trajectories integrate tightly with abstract semantic states. But activities are not always a single sequence (i.e., they sometimes happen in parallel).
  • 59. Concurrent activities. Problem: how do we model concurrent activities? A flat model has a combinatorial state space. Solution: a "stand-up" model for concurrent activities.
  • 60. Propagation network. Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al. 2004.
  • 61. (Shi et al. 2004)
  • 62. Temporal inference: inference by standing the state transition model on its side, as sketched below.
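In code, the partial order is just a set of precedence edges over detected intervals, and any pair of actions without an edge between them is free to overlap. A toy consistency check (invented action names, not Shi et al.'s inference algorithm):

```python
# Toy version of the partial-order constraint behind propagation networks:
# actions are detected time intervals, constraints are precedence edges,
# and any unconstrained pair may run concurrently. Names are invented.

def consistent(intervals, precedes):
    """intervals: {action: (start, end)}; precedes: set of (before, after)."""
    return all(intervals[a][1] <= intervals[b][0] for a, b in precedes)

detections = {"wash_hands": (0, 4), "prepare_kit": (2, 6), "run_test": (7, 9)}
constraints = {("wash_hands", "run_test"), ("prepare_kit", "run_test")}

# wash_hands and prepare_kit overlap (no edge between them), but both end
# before run_test starts, so the partial order is satisfied:
print(consistent(detections, constraints))   # True
```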
  • 63. Inferring structure (storylines). Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis. CVPR 2009. Learns AND-OR graphs from weakly labeled data.
  • 64. Scripts from structure. (Gupta, Srinivasan, Shi and Davis, CVPR 2009)
  • 65. Take-home message. Hierarchical statistical models: use when low-level action detectors are noisy, the structure of the activity is sequential, or dynamics must be integrated. Not for activities with deep hierarchical structure or complex temporal structure.
  • 66. Contrasting hierarchical approaches. Statistical: actions as probabilistic states, activities as paths, model DBN, robust to uncertainty. Syntactic: actions as discrete symbols, activities as strings, model CFG, describes deep hierarchy. Descriptive: actions as logical relationships, activities as sets, models CFG and MLN, encodes complex logic.
  • 67. References (not included in the ACM survey paper):
    • W. Tsai and K. S. Fu. Attributed Grammar: A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. IEEE Trans. SMC, 1980.
    • M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
    • T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PRCM 2001.
    • C. G. Nevill-Manning, I. H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Proceedings of the IEEE, 2000.
    • K. Heller, Y. W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Models. AISTATS 2009.
    • P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007.
    • A. Gupta, N. Srinivasan, J. Shi and L. Davis. Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR 2009.