cvpr2011: human activity recognition - part 4: syntactic


  1. Frontiers of Human Activity Analysis. J. K. Aggarwal, Michael S. Ryoo, Kris M. Kitani
  2. Overview. Human Activity Recognition divides into single-layer approaches (space-time, sequential) and hierarchical approaches (statistical, syntactic, descriptive).
  3. Motivation. How do we interpret a sequence of actions?
  4. Hierarchy. Hierarchy implies decomposition into sub-parts.
  5. Now we’ll cover… the same taxonomy, now focusing on the hierarchical approaches: statistical, syntactic, and descriptive.
  6. Syntactic Approaches
  7. Syntactic Models. Activities as strings of symbols. What is the underlying structure?
  8. Early applications to vision: Tsai and Fu 1980, “Attributed Grammar: A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition.”
  9. Hierarchical syntactic approach. Useful for activities with deep hierarchical structure or repetitive (cyclic) structure; not for systems with a lot of errors and uncertainty, or for activities with shallow structure.
  10. Basics. A context-free grammar G = (S, T, N, P) parallels natural language: the start symbol S generates sentences, the terminal symbols T are the words, the non-terminal symbols N are the parts of speech, and the production rules P are the syntax rules.
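To make the four components concrete, here is a minimal sketch (Python; the data representation is my own choice, not from the slides) of the grammar used on the next two slides:

```python
# The CFG 4-tuple G = (S, T, N, P) from slide 10, written as plain data.
S = "S"                                               # start symbol
T = {"swat", "flies", "like", "ants"}                 # terminals (words)
N = {"S", "NP", "VP", "PP", "NOUN", "VERB", "PREP"}   # non-terminals
P = {                                                 # production rules
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["NOUN"], ["NOUN", "PP"], ["NOUN", "NP"]],
    "VP":   [["VERB"], ["VERB", "NP"], ["VERB", "PP"], ["VERB", "NP", "PP"]],
    "PP":   [["PREP", "NP"]],
    "PREP": [["like"]],
    "VERB": [["swat"], ["flies"], ["like"]],
    "NOUN": [["swat"], ["flies"], ["ants"]],
}
```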
  11. Parsing with a grammar. The example PCFG (rule probabilities in parentheses):
      S → NP VP (0.8)      S → VP (0.2)
      NP → NOUN (0.4)      NP → NOUN PP (0.4)     NP → NOUN NP (0.2)
      VP → VERB (0.3)      VP → VERB NP (0.3)     VP → VERB PP (0.2)     VP → VERB NP PP (0.2)
      PP → PREP NP (1.0)   PREP → like (1.0)
      VERB → swat (0.2)    VERB → flies (0.4)     VERB → like (0.4)
      NOUN → swat (0.05)   NOUN → flies (0.45)    NOUN → ants (0.5)
      Input string: swat flies like ants
  12. Parsing with a grammar (continued). One parse of “swat flies like ants” under the same PCFG:
      S → NP VP (0.8)
        NP → NOUN NP (0.2): NOUN → swat (0.05), NP → NOUN (0.4), NOUN → flies (0.45)
        VP → VERB NP (0.3): VERB → like (0.4), NP → NOUN (0.4), NOUN → ants (0.5)
      The parse probability is the product of the rule probabilities used.
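To check the arithmetic, here is a minimal sketch (Python; the nested-tuple encoding is my own) that scores the slide-12 parse by multiplying its rule probabilities; a real recognizer would search over all parses with CYK or Earley rather than score one by hand:

```python
# Scoring the slide-12 parse of "swat flies like ants": the parse
# probability is the product of the probabilities of every rule used.
from math import prod  # Python 3.8+

# Nested tuples: (non-terminal, rule probability, children...).
parse = ("S", 0.8,                       # S -> NP VP
         ("NP", 0.2,                     # NP -> NOUN NP
          ("NOUN", 0.05, "swat"),
          ("NP", 0.4,                    # NP -> NOUN
           ("NOUN", 0.45, "flies"))),
         ("VP", 0.3,                     # VP -> VERB NP
          ("VERB", 0.4, "like"),
          ("NP", 0.4,
           ("NOUN", 0.5, "ants"))))

def tree_prob(node):
    """Multiply rule probabilities over all internal nodes."""
    if isinstance(node, str):            # terminal word
        return 1.0
    _, p, *children = node
    return p * prod(tree_prob(c) for c in children)

print(tree_prob(parse))                  # ~3.46e-05
```

Competing parses (e.g. reading “flies” as the verb) would be scored the same way and the maximum taken.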
  13. Video analysis with CFGs:
      The “Inverse Hollywood Problem”: From Video to Scripts and Storyboards via Causal Analysis. Brand 1997.
      Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998.
      Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001.
  14. CFG for human activities. Primitive symbol string: enter detach leave enter detach attach touch touch detach attach leave. (M. Brand. The “Inverse Hollywood Problem”: From Video to Scripts and Storyboards via Causal Analysis. AAAI 1997.)
  15. Parse tree. The SCENE “open up a PC” decomposes into ACTIONs (“open PC”, “unscrew”), which decompose into sub-events (IN, OUT, MOVE, REMOVE, ADD, MOTION) over the primitive string enter detach leave enter detach attach touch touch detach attach leave. Deterministic low-level primitive detection; deterministic parsing. (Brand, AAAI 1997.)
  16. Stochastic CFGs. (Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998.)
  17. Gesture analysis with CFGs: primitive recognition with HMMs. (Bobick and Ivanov 1998.)
  18–21. HMM-detected gesture primitives (figures): left-right, up-down, right-left, down-up. (Bobick and Ivanov 1998.)
  22. Parse tree. The start symbol S expands through non-terminals (RH, TOP, BOT) to the primitives LR, UD, RL, DU, i.e. left-right, up-down, right-left, down-up.
  23. Errors. The HMMs output likelihood values over time (not discrete symbols). Errors are inevitable… but the grammar acts as a top-down constraint. (Bobick and Ivanov 1998.)
  24. Dealing with uncertainty and errors: a Stolcke-Earley (probabilistic) parser, with SKIP rules to absorb insertion errors from the HMM bank. (Bobick and Ivanov 1998.)
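The SKIP idea can be sketched mechanically: interleave an optional error-absorbing non-terminal through every production. This is my own minimal construction in the spirit of Bobick and Ivanov’s SKIP rules, not their exact grammar:

```python
# Making a grammar tolerant to insertion errors with SKIP rules: an
# optional SKIP non-terminal, able to absorb any stray symbol, is
# interleaved through every production.
TERMINALS = {"LR", "RL", "UD", "DU"}     # gesture primitives from slide 22

def interleave(rhs):
    """Place an optional SKIP before each symbol and one at the end."""
    out = []
    for sym in rhs:
        out += ["SKIP", sym]
    return out + ["SKIP"]

def add_skip_rules(productions):
    """`productions` maps a non-terminal to a list of right-hand sides."""
    augmented = {lhs: [interleave(rhs) for rhs in rhss]
                 for lhs, rhss in productions.items()}
    # SKIP -> terminal SKIP | (empty): absorbs zero or more insertions.
    augmented["SKIP"] = [[t, "SKIP"] for t in sorted(TERMINALS)] + [[]]
    return augmented

print(add_skip_rules({"S": [["LR", "UD", "RL", "DU"]]}))
```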
  25. SCFG for Blackjack: deals with more complex activities and more error types. (Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001.)
  26. Extracting primitive actions (figure).
  27. Game grammar. (Moore and Essa 2001.)
  28. Dealing with errors. Ungrammatical strings cause the parser to fail, so errors are accounted for with multiple hypotheses (insertion, deletion, substitution). Issues: how many errors should we tolerate? The hypothesis space is potentially exponential. And is an ungrammatical string a vision problem or an illegal activity?
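To see why the hypothesis space explodes, here is a small sketch (my own toy, not from the paper) that enumerates every correction of an observed string within k insertion/deletion/substitution edits:

```python
# Enumerating correction hypotheses: every string reachable from the
# observed string with at most k edits. The set grows roughly as
# (|alphabet| * |string|)^k, hence the exponential hypothesis space.
ALPHABET = "abc"

def correction_hypotheses(observed, k):
    frontier = {observed}
    for _ in range(k):
        nxt = set(frontier)
        for t in frontier:
            for i in range(len(t) + 1):
                for a in ALPHABET:
                    nxt.add(t[:i] + a + t[i:])          # undo a deletion
                    if i < len(t):
                        nxt.add(t[:i] + a + t[i + 1:])  # undo a substitution
                if i < len(t):
                    nxt.add(t[:i] + t[i + 1:])          # undo an insertion
        frontier = nxt
    return frontier

print(len(correction_hypotheses("abcabc", 1)))  # dozens of strings at k = 1
```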
  29. Observations. CFGs are good for structured activities, can incorporate uncertainty in the observations, and provide a natural contextual prior for recognizing errors. But it is not clear how to deal with errors in general, ‘good’ action classifiers are assumed, and the grammar must be defined manually. Can we learn the grammar from data?
  30. Heuristic grammatical induction: (1) lexicon learning (learn HMMs, then cluster them); (2) convert video to a symbol string; (3) learn the grammar. (Unsupervised Analysis of Human Gestures. Wang et al. 2001.)
  31. COMPRESSIVE heuristic. Given a string such as a b c d a b c d b c d a b a b, repeatedly replace the repeated substring whose length × occurrence count is most compressive, trading the deletion of the substring’s occurrences against the insertion of a new symbol and a new rule. (On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning and Witten 2000.)
  32. Example:
      S → a b c d a b c d b c d a b a b   (DL = 16)
      Introduce A → b c d:
      S → a A a A A a b a b               (DL = 14)
      Repeat until the compression gain becomes 0.
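A minimal sketch of this loop (Python; the symbol-count description length and the rule naming are my own simplifications, and my DL convention differs from the slide’s by a small constant per rule):

```python
# Greedy grammar induction by substring substitution: repeatedly replace
# the repeated substring that most reduces a simple symbol-count DL.
from itertools import count

def description_length(rules):
    """Crude DL: one unit per symbol on every right-hand side."""
    return sum(len(rhs) for rhs in rules.values())

def best_substitution(seq):
    """Most compressive repeated substring (length >= 2), or None.
    Occurrence counting ignores overlaps, which is fine for this toy."""
    best, best_gain = None, 0
    for n in range(2, len(seq) // 2 + 1):
        for i in range(len(seq) - n + 1):
            sub = tuple(seq[i:i + n])
            occ = sum(1 for j in range(len(seq) - n + 1)
                      if tuple(seq[j:j + n]) == sub)
            gain = occ * n - (occ + n)   # symbols removed vs. symbols added
            if occ > 1 and gain > best_gain:
                best, best_gain = sub, gain
    return best

def induce(string):
    rules, fresh = {"S": list(string)}, count()
    while (sub := best_substitution(rules["S"])) is not None:
        nt = f"A{next(fresh)}"           # new non-terminal, e.g. A0
        rules[nt] = list(sub)            # new rule for the repeat
        s, out, i = rules["S"], [], 0
        while i < len(s):                # left-to-right replacement
            if tuple(s[i:i + len(sub)]) == sub:
                out.append(nt)
                i += len(sub)
            else:
                out.append(s[i])
                i += 1
        rules["S"] = out
    return rules

g = induce("abcdabcdbcdabab")
print(g, description_length(g))  # learns a rule for "bcd"; DL drops 15 -> 12
```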
  33. Critical assumption: no uncertainty and no errors (insertions, deletions, substitutions). Can we learn grammars despite errors?
  34. Learning with noise. Can we learn the basic structure of a transaction? (Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al. 2008.)
  35. Extracting primitives (figure). (Kitani et al. 2008.)
  36–39. Underlying structure? The noisy observed string
      D → a x b y c a b x c y a b c x
      hides the clean string
      D → a b c a b c a b c,
      which in turn compresses to A → a b c, D → A A A: a simple grammar and an efficient compression. (Kitani et al. 2008.)
  40–42. Information-theoretic formulation (MDL):
      Ĝ = argmin_G { DL(G) + DL(D|G) }, i.e. model complexity plus data compression.
      Model complexity: DL(G) = −log p(G) = −log p(θ_S, G_S) = −log p(θ_S | G_S) − log p(G_S) = DL(θ_S | G_S) + DL(G_S), the cost of the grammar parameters plus the cost of the grammar structure.
      Data compression: DL(D|G) = −log p(D|G), the data likelihood computed with inside probabilities.
      (Kitani et al. 2008.)
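To see the trade-off numerically, here is a toy sketch (entirely my own construction: deterministic grammars scored through a substitution-only noise channel, not the paper’s SCFG machinery with inside probabilities):

```python
# Two-part MDL score for candidate grammars: DL(G) counts bits to write
# the rules down; DL(D|G) scores the data through a substitution-only
# noise channel with per-symbol error rate EPS.
import math

ALPHABET = "abcxy"          # terminals plus possible noise symbols
EPS = 0.1                   # assumed detector error rate (made up)

def expand(rules, sym="D"):
    """Deterministically expand the grammar into its terminal string."""
    if sym not in rules:
        return sym
    return "".join(expand(rules, s) for s in rules[sym])

def dl_model(rules):
    """DL(G): log2(codebook size) bits per right-hand-side symbol."""
    n = sum(len(rhs) for rhs in rules.values())
    return n * math.log2(len(ALPHABET) + len(rules))

def dl_data(data, rules):
    """DL(D|G) = -log2 p(D|G) under the toy noise channel."""
    clean = expand(rules)
    if len(clean) != len(data):
        return math.inf     # this toy channel has no insertions/deletions
    bits = 0.0
    for c, d in zip(clean, data):
        p = 1 - EPS if c == d else EPS / (len(ALPHABET) - 1)
        bits -= math.log2(p)
    return bits

data = "abcabxabcaycabcabc"              # six repeats of "abc", two corrupted
structured = {"D": ["A"] * 6, "A": list("abc")}
memorized = {"D": list(data)}
for name, g in [("structured", structured), ("memorized", memorized)]:
    print(name, dl_model(g) + dl_data(data, g))
# The structured grammar wins (~38 bits vs ~49): it pays a little more to
# encode the two residual errors but far less to encode the model.
```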
  43–44. Minimum Description Length (figures). (Kitani et al. 2008.)
  45–46. (Figures.) Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al. 2008.
  47. Conclusions. It is possible to learn the basic structure, and the method is robust to insertion, deletion, and substitution errors; but it needs a lot of training data and is computationally expensive.
  48. Bayesian approaches: Infinite Hierarchical Hidden Markov Models (Heller et al. 2009); The Infinite PCFG using Hierarchical Dirichlet Processes (Liang et al. 2007).
  49. Take-home message. Hierarchical syntactic models are useful for activities with deep hierarchical structure or repetitive (cyclic) structure; not for systems with a lot of errors and uncertainty, or for activities with weak structure.
  50. Statistical Approaches
  51. Using a hierarchical statistical approach. Use when low-level action detectors are noisy, the structure of the activity is sequential, or dynamics must be integrated; not for activities with deep hierarchical structure or complex temporal structure.
  52. Statistical (state-based) models. Activities as a stochastic path: what are the underlying dynamics?
  53. Characteristics: a strong Markov assumption and a strong dynamics prior make these models robust to uncertainty; modifications account for hierarchical and concurrent structure.
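The canonical inference under these assumptions is HMM forward filtering, where noisy per-frame detector likelihoods are fused with a transition prior. A generic minimal sketch (my own, not from the slides; the numbers are arbitrary):

```python
# HMM forward filtering: fuse noisy per-frame detector likelihoods with a
# state-transition prior to get filtered state posteriors.
import numpy as np

A = np.array([[0.9, 0.1],    # p(next state | current state): sticky dynamics
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs_lik):
    """obs_lik[t, s] = p(frame t's observation | state s).
    Returns p(state_t | observations up to t) for every t."""
    alpha = pi * obs_lik[0]
    alpha /= alpha.sum()
    posteriors = [alpha]
    for lik in obs_lik[1:]:
        alpha = (alpha @ A) * lik        # predict with dynamics, then update
        alpha /= alpha.sum()
        posteriors.append(alpha)
    return np.stack(posteriors)

# A noisy detector flips its vote at frame 1; the dynamics prior smooths it.
scores = np.array([[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3]])
print(forward(scores))
```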
  54. Hierarchical activities. Problem: how do we model hierarchical activities without a combinatorial state space? Solution: “stack” actions into a hierarchy.
  55. Hierarchical hidden Markov model. (Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al. 2005.)
  56–57. Context-free activity grammar (figures). (Nguyen et al. 2005.)
  58. Observations. Tree structures are useful for hierarchies, and trajectories integrate tightly with abstract semantic states. But activities are not always a single sequence (i.e., they sometimes happen in parallel).
  59. Concurrent activities. Problem: how do we model concurrent activities without a combinatorial state space? Solution: the “stand-up” model for concurrent activities.
  60–61. Propagation networks (figures). (Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al. 2004.)
  62. Temporal inference: inference by standing the state-transition model on its side.
  63. Inferring structure (storylines): learn AND-OR graphs from weakly labeled data. (Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis, CVPR 2009.)
  64. Scripts from structure. (Gupta, Srinivasan, Shi and Davis, CVPR 2009.)
  65. Take-home message. Use a hierarchical statistical model when low-level action detectors are noisy, the activity structure is sequential, or dynamics must be integrated; not for activities with deep hierarchical structure or complex temporal structure.
  66. Contrasting hierarchical approaches:
                     Actions as:             Activities as:   Model:     Characteristic:
      Statistical    probabilistic states    paths            DBN        robust to uncertainty
      Syntactic      discrete symbols        strings          CFG        describes deep hierarchy
      Descriptive    logical relationships   sets             CFG, MLN   encodes complex logic
  67. References (not included in the ACM survey paper):
      W. Tsai and K. S. Fu. Attributed Grammar: A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. IEEE SMC, 1980.
      M. Brand. The "Inverse Hollywood Problem": From Video to Scripts and Storyboards via Causal Analysis. AAAI, 1997.
      T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PCM, 2001.
      C. G. Nevill-Manning, I. H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Proceedings of the IEEE, 2000.
      K. Heller, Y. W. Teh, D. Gorur. Infinite Hierarchical Hidden Markov Models. AISTATS, 2009.
      P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP, 2007.
      A. Gupta, N. Srinivasan, J. Shi, L. Davis. Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR, 2009.
