SlideShare a Scribd company logo
1 of 41
Download to read offline
Frontiers of
Human Activity Analysis

                J. K. Aggarwal
               Michael S. Ryoo
                 Kris M. Kitani
1990 ~




         Single layered
          approaches


                          2
Two different views
 Activities as human    Activities as video
  movements               observations
   Semantic-oriented       Data-oriented
   3-D body-part           Spatio-temporal
    estimation               features
   Tracking                Bag-of-words




 Sequence               Space-time distribution

                                                    3
1990s




        Single layered
         approaches
           Sequential approaches


                                   4
Sequential approaches
 Actions as a set of videos
   Videos as feature sequences




                                  5
Sequential approaches
 Motivation
   An action is a sequence
    of body-part states
   Each frame in an action
    video describes
    a particular body-part
    configuration
    Example:
     11 points
     body configuration of ‘kicking’
                                       6
Action recognition using HMMs
 Recognition using hidden Markov models
   Each HMM generates a particular sequence
    of features.
   Matching observed features with the model.
     An action -> a set of sequences of features


 [Yamato et al. CVPR 1992]: Tennis plays



                                                    7
HMMs for actions
 Human action as a pose sequence
 Each hidden state is trained to generate a
  particular body posture.
   Each HMM produces a pose sequence: action
         a00         a11          a22             ann
               a01          a12
         w0          w1           w2      ……..    wn
           b0k         b1k          b2k             bnk

        pose         pose         pose           pose


                                                          8
Hidden Markov models
       This is a classic evaluation problem of HMMs.
           Given observations VT (a sequence of poses), find the
            HMM Mi that maximizes P(VT|Mi): forward algorithm.
           Transition probabilities aij and observations
            probabilities bik are pre-trained using training data.

Noise HMM                                     Structure of other basic HMMs
a00                    a11                     a00           a11         a22           ann
        a01                                          a01           a12
 w0                    w1                      w0            w1          w2      ……    wn

      (b00,b01,b02)          (b10,b11,b12)       (b00,b01,b02) b1k         b2k               bnk
      =(1/3,1/3,1/3)         =(1/3,1/3,1/3)      =(1/3,1/3,1/3)
pose                   pose                   pose          pose         pose         pose
(observation)
                                                                                              9
HMMs for hand gestures
 HMMs for gesture recognition
   American Sign Language (ASL)
   Sequential HMMs
         Features from colored globes




 [Starner, T. and Pentland, A., Real-time American Sign Language recognition from video
 using hidden Markov models. International Symposium on Computer Vision, 1995.]
                                                                                          10
Dynamic time warping
 Dynamic programming algorithm to match
  two strings (e.g. sequences).
   [Gavrila and L. Davis, 1995]
   Each frame generates a symbol (of a feature
    vector)




                                                  11
Coupled HMMs
 Pentland CHMMs
   Human-human interactions
         Two types of states for two agents
   Synthetic agents for training HMMs


                                             Vs.




 [Oliver, N. M., Rosario, B., and Pentland, A. P., A Bayesian computer vision sys-
 tem for modeling human interactions. IEEE T PAMI, 2000.]
                                                                                     12
HMM Variations
 Coupled hidden semi-Markov models
   Natarajan and Nevatia 2007
   Human-human interactions
   Activities with varying durations
         Models probabilistic
          distributions of state
          durations.




 [Natarajan, P. and Nevatia, R., Coupled hidden semi Markov models for activity recognition.
 WMVC 2007]
                                                                                               13
Dynamic Bayesian networks
 Diverse variations
   Dynamic Bayesian
    networks
         Body-part analysis




 [Park, S. and Aggarwal, J. K., A hierarchical Bayesian network for event recognition of
 human actions and interactions. Multimedia Systems, 2004]
                                                                                           14
Hierarchical human-body modeling

                                                                                                                       Person Pi   Person ID


             180
                                                                                                 Head
             160

             140

             120

             100
  #pixels




                                                                                               Upper-body    Head       Upper bo   Lower bo
              80

              60




                                                                                                                          dy          dy
              40

              20

               0
                    0    50    100        150         200         250     300    350
                                                row




                                                                                                Lower-body
             0.08
                                                                                                               Hair        Torso       Legs
             0.07


             0.06


             0.05
p(x|theta)




             0.04




                                                                                                              Face         Arms        Feet
             0.03


             0.02


             0.01


               0
                    0   20    40     60         80          100     120    140   160
                                                x




                     Vertical                                                          A simple human
                    projection                                                           body model                   Data structure


                                                                                                                                               15
Space-time trajectories
 Trajectory patterns
   Yilmaz and Shah, 2005 – UCF
   Joint trajectories in
    3-D XYT space.
   Compared trajectory
    shapes to classify
    human actions.



 [Yilmaz, A. and Shah, M., Recognizing human
 actions in videos acquired by uncalibrated
 moving cameras, ICCV 2005]                    16
Sequential approaches - summary
 Designed for modeling sequential dynamics
   Markov process
   Motion features are extracted per frame

 Limitations
   Feature extraction
      Assumes good observation models
   Complex human activities?
      Large amount of training data


                                              17
2000s




        Single layered
         approaches
          Space-time approaches


                                  18
Space-time approaches
 Actions as a set of videos
   Videos as space-time volumes




                                   19
Space-time approaches
 Videos as 3-D XYT volumes
 Problem: matching between two volumes
   Match volumes directly
        Compare volumes from testing videos with those
         from training videos.
   t                                   t



                            similar?




         Training video                    Testing video
                                                           20
Motion history images
 Matching two
  volumes
 Bobick and J. Davis,
  2001
   Motion history images
    (MHIs)
   Weighted projection of
    a XYT foreground
    volume
   Template matching
 [Bobick, A. and Davis, J., The recognition of
 human movement using temporal templates.
 IEEE T PAMI 23(3), 2001]
                                                 21
3-D volume matching
 Ke, Suktankar, Herbert 2007
   Volume
    matching
    based on its
    segments.
   Segment
    matching
    scores are
    combined.

 [Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action
 recognition. CVPR 2007]
                                                                                                  22
Global features from volumes
 Efros et al. 2003
   Concatenated optical flow features from 3-D
    XYT volumes
   Analyzed
    soccer plays
    from low-
    resolution
    videos.



 [Efros, A., Berg, A., Mori, G., and Malik, J., Recognizing action at a distance, ICCV 2003]
                                                                                               23
Sparse features from videos
 Problem: matching between two videos
   Match volumes directly?
   Extracts sparse features -
     Video version of SIFT features
                                 t




     SIFT for images             Features for videos
                                                       24
Sparse features from videos
 Spatio-temporal features
   Reliable under noise, background changes,
    lighting condition changes, …
   Laptev 2003, Dollar et al. 2005




          (a)         (b)         (c)
                                                25
Cuboid features
 Cuboid descriptors
   Dollar et al., Cuboid, VS-PETS 2005
   Appearances of local 3-D XYT volumes
         Raw appearance
         Gradients
         Optical flows
   Captures salient
    periodic motion.


 [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse
 spatio-temporal features, VS-PETS 2005]
                                                                                            26
STIP interest point detector
 Laptev and Linderberg 2003
   Simple periodic actions
         Spatio-temporal local features + SVMs
   Introduced the KTH dataset
   Local descriptor
    based on Harris
    corner detector



 [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach,
 ICPR 2004]
                                                                                              27
Bag-of-words representation
   t




Classify features based on their   Histogram (bag-of-words)
           appearance                      similarity
   t




                                                              28
pLSA models for actions
 pLSA from text recognition
   Probabilistic latent semantic analysis
   Reasoning the probability of features
    originated from a particular action video.




 [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories
 using spatial-temporal words, BMVC 2006]
                                                                                                29
Approach overview
 Recognition using local spatio-temporal
  features                      t


   Bag-of-words
   Classifiers
      e.g. SVMs, pLSA, …

 Extensions
   Structural considerations
   Hybrid features
   Grouping features
                                            30
Structural considerations
 Bag-of-features ignores structure.
 Structures?
   Wong et al. 2007
         pLSA-ISM: encodes relative
          locations of features
   Savarese et al. 2008
         Feature correlation:
          pairwise proximity

 [Wong, S.-F., Kim, T.-K., and Cipolla, R., Learning motion categories using both semantic
 and structural information, CVPR 2007]
 [Savarese, S., DelPozo, A., Niebles, J., and Fei-Fei, L., Spatial-temporal correlatons
 for unsupervised action classification, WMVC 2008]                                          31
Local features for movie scenes
 Laptev et al. 2009 – movies
   Movie scenes with camera movements
      Instantaneous actions




   Improved their local descriptor [Laptev 03] for
    analyzing movie videos.
      Gradients + optical flows
                                                      32
Grouping features
 Groups a small number of features
   2~3 features which appear jointly
   Spatially/temporally
    adjacent features
         grouping


 Multiple levels
   Hierarchical?

 [Kovashka A. and Grauman K., Learning a hierarchy of discriminative space-time neighborhood
 features for human action recognition, CVPR 2010]
                                                                                               33
XYT approaches: pros and cons
 Advantages
   Robust under noise
      Background changes, camera movements, …
      YouTube-type videos
 Limitations
   Bag-of-words
      Spatio-temporal relations among features are
       ignored.
   Not hierarchical
      Difficult to model complex activities

                                                      34
Summary: single layered
 In general, suitable for action recognition
   Single actor
   Structural variations?
 Handle uncertainties reliably
   Strong to noise, background, illuminations, …
 Stochastic decision
   Can be served as building blocks.
 A large number of training videos required.

                                                    35
Datasets
 KTH dataset
   Single action video
    classification
      Single actor
      One action per video


 Weizmann dataset
   Similar to the KTH
    dataset (single action)

                              36
KTH results
100                                                 Shuldt et al. '04
                                                    Dollar et al. '05
95
                                                    Jiang et al. '06

90                                                  Niebles et al. '06
                                                    Yeo et al. '06
85
                                                    Ke et al.'07

80                                                  Savarese et al. '08
                                                    Laptev et al. '08
75
                                                    Rodriguez et al. '08

70                                                  Liu et al. '08
                                                    Bregonzio et al. '09
65
                                                    Ranpantzikos et al. '09

60                                                  Ryoo et al. '09b
  2004   2005   2006    2007   2008   2009   2010




                                                                           37
New datasets
 Hollywood dataset [Laptev 07,08]
   Movie scenes
   Goal: recognition in complex environments
      Moving cameras
      Background changes

   Action classification
      Segmented videos
      Atomic movements
       (e.g. kissing)


                                                38
New datasets
 UT-Interaction dataset
   Multiple actors
      Human interactions
      Pedestrians
   Continuous videos


 UT-Tower dataset
   Low-resolution
   Simple actions

                             39
Single layered: References
Space-Time approaches
 Bobick, A. and Davis, J., The recognition of human movement using temporal templates. IEEE T
  PAMI 23(3), 2001.
 Efros, A., Berg, A., Mori, G., and Malik, J., Recognizing action at a distance, ICCV 2003.
 Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR
  2004.
 Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving
  cameras, ICCV 2005.
 Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-
  temporal features, VS-PETS 2005.
 Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using
  spatial-temporal words, BMVC 2006.
 Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action
  recognition. CVPR 2007.
 Wong, S.-F., Kim, T.-K., and Cipolla, R., Learning motion categories using both semantic and
  structural information, CVPR 2007
 Savarese, S., DelPozo, A., Niebles, J., and Fei-Fei, L., Spatial-temporal correlatons for
  unsupervised action classification, WMVC 2008.
 Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B., Learning realistic human actions from
  movies, CVPR 2008.



                                                                                                      40
Singled-layered: References (2)
Sequential approaches
 Yamato, J., Ohya, J., and Ishii, K., Recognizing human action in time-sequential images using
  hidden Markov model. CVPR 1992.
 Gavrila, D. and Davis, L., Towards 3-D model-based tracking and recognition of human movement.
  In International Workshop on Face and Gesture Recognition 1995.
 Starner, T. and Pentland, A., Real-time American Sign Language recognition from video using
  hidden Markov models. International Symposium on Computer Vision, 1995.
 Oliver, N. M., Rosario, B., and Pentland, A. P., A Bayesian computer vision system for modeling
  human interactions. IEEE T PAMI, 2000.
 Park, S. and Aggarwal, J. K., A hierarchical Bayesian network for event recognition of human
  actions and interactions. Multimedia Systems, 2004.
 Natarajan, P. and Nevatia, R., Coupled hidden semi Markov models for activity recognition. WMVC
  2007.




                                                                                                    41

More Related Content

Viewers also liked

cvpr2011: human activity recognition - part 2: overview
cvpr2011: human activity recognition - part 2: overviewcvpr2011: human activity recognition - part 2: overview
cvpr2011: human activity recognition - part 2: overviewzukun
 
cvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductioncvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductionzukun
 
Simple and Complex Activity Recognition Through Smart Phones
Simple and Complex Activity Recognition Through Smart PhonesSimple and Complex Activity Recognition Through Smart Phones
Simple and Complex Activity Recognition Through Smart PhonesBarnan Das
 
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...PyData
 
Activity Recognition using Cell Phone Accelerometers
Activity Recognition using Cell Phone AccelerometersActivity Recognition using Cell Phone Accelerometers
Activity Recognition using Cell Phone AccelerometersIshara Amarasekera
 
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...Rupali Bhatnagar
 

Viewers also liked (7)

cvpr2011: human activity recognition - part 2: overview
cvpr2011: human activity recognition - part 2: overviewcvpr2011: human activity recognition - part 2: overview
cvpr2011: human activity recognition - part 2: overview
 
cvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introductioncvpr2011: human activity recognition - part 1: introduction
cvpr2011: human activity recognition - part 1: introduction
 
Simple and Complex Activity Recognition Through Smart Phones
Simple and Complex Activity Recognition Through Smart PhonesSimple and Complex Activity Recognition Through Smart Phones
Simple and Complex Activity Recognition Through Smart Phones
 
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...
Introduction to Action Recognition in Python by Bertrand Nouvel, Jonathan Kel...
 
Activity Recognition using Cell Phone Accelerometers
Activity Recognition using Cell Phone AccelerometersActivity Recognition using Cell Phone Accelerometers
Activity Recognition using Cell Phone Accelerometers
 
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel...
 
Format Of Synopsis
Format Of SynopsisFormat Of Synopsis
Format Of Synopsis
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 

cvpr2011: human activity recognition - part 3: single layer

  • 1. Frontiers of Human Activity Analysis J. K. Aggarwal Michael S. Ryoo Kris M. Kitani
  • 2. 1990 ~ Single layered approaches 2
  • 3. Two different views  Activities as human  Activities as video movements observations  Semantic-oriented  Data-oriented  3-D body-part  Spatio-temporal estimation features  Tracking  Bag-of-words  Sequence  Space-time distribution 3
  • 4. 1990s Single layered approaches Sequential approaches 4
  • 5. Sequential approaches  Actions as a set of videos  Videos as feature sequences 5
  • 6. Sequential approaches  Motivation  An action is a sequence of body-part states  Each frame in an action video describes a particular body-part configuration  Example: 11 points body configuration of ‘kicking’ 6
  • 7. Action recognition using HMMs  Recognition using hidden Markov models  Each HMM generates a particular sequence of features.  Matching observed features with the model.  An action -> a set of sequences of features  [Yamato et al. CVPR 1992]: Tennis plays 7
  • 8. HMMs for actions  Human action as a pose sequence  Each hidden state is trained to generate a particular body posture.  Each HMM produces a pose sequence: action a00 a11 a22 ann a01 a12 w0 w1 w2 …….. wn b0k b1k b2k bnk pose pose pose pose 8
  • 9. Hidden Markov models  This is a classic evaluation problem of HMMs.  Given observations VT (a sequence of poses), find the HMM Mi that maximizes P(VT|Mi): forward algorithm.  Transition probabilities aij and observations probabilities bik are pre-trained using training data. Noise HMM Structure of other basic HMMs a00 a11 a00 a11 a22 ann a01 a01 a12 w0 w1 w0 w1 w2 …… wn (b00,b01,b02) (b10,b11,b12) (b00,b01,b02) b1k b2k bnk =(1/3,1/3,1/3) =(1/3,1/3,1/3) =(1/3,1/3,1/3) pose pose pose pose pose pose (observation) 9
  • 10. HMMs for hand gestures  HMMs for gesture recognition  American Sign Language (ASL)  Sequential HMMs  Features from colored globes [Starner, T. and Pentland, A., Real-time American Sign Language recognition from video using hidden Markov models. International Symposium on Computer Vision, 1995.] 10
  • 11. Dynamic time warping  Dynamic programming algorithm to match two strings (e.g. sequences).  [Gavrila and L. Davis, 1995]  Each frame generates a symbol (of a feature vector) 11
  • 12. Coupled HMMs  Pentland CHMMs  Human-human interactions  Two types of states for two agents  Synthetic agents for training HMMs Vs. [Oliver, N. M., Rosario, B., and Pentland, A. P., A Bayesian computer vision sys- tem for modeling human interactions. IEEE T PAMI, 2000.] 12
  • 13. HMM Variations  Coupled hidden semi-Markov models  Natarajan and Nevatia 2007  Human-human interactions  Activities with varying durations  Models probabilistic distributions of state durations. [Natarajan, P. and Nevatia, R., Coupled hidden semi Markov models for activity recognition. WMVC 2007] 13
  • 14. Dynamic Bayesian networks  Diverse variations  Dynamic Bayesian networks  Body-part analysis [Park, S. and Aggarwal, J. K., A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Systems, 2004] 14
  • 15. Hierarchical human-body modeling Person Pi Person ID 180 Head 160 140 120 100 #pixels Upper-body Head Upper bo Lower bo 80 60 dy dy 40 20 0 0 50 100 150 200 250 300 350 row Lower-body 0.08 Hair Torso Legs 0.07 0.06 0.05 p(x|theta) 0.04 Face Arms Feet 0.03 0.02 0.01 0 0 20 40 60 80 100 120 140 160 x Vertical A simple human projection body model Data structure 15
  • 16. Space-time trajectories  Trajectory patterns  Yilmaz and Shah, 2005 – UCF  Joint trajectories in 3-D XYT space.  Compared trajectory shapes to classify human actions. [Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005] 16
  • 17. Sequential approaches - summary  Designed for modeling sequential dynamics  Markov process  Motion features are extracted per frame  Limitations  Feature extraction  Assumes good observation models  Complex human activities?  Large amount of training data 17
  • 18. 2000s Single layered approaches Space-time approaches 18
  • 19. Space-time approaches  Actions as a set of videos  Videos as space-time volumes 19
  • 20. Space-time approaches  Videos as 3-D XYT volumes  Problem: matching between two volumes  Match volumes directly  Compare volumes from testing videos with those from training videos. t t similar? Training video Testing video 20
  • 21. Motion history images  Matching two volumes  Bobick and J. Davis, 2001  Motion history images (MHIs)  Weighted projection of a XYT foreground volume  Template matching [Bobick, A. and Davis, J., The recognition of human movement using temporal templates. IEEE T PAMI 23(3), 2001] 21
  • 22. 3-D volume matching  Ke, Suktankar, Herbert 2007  Volume matching based on its segments.  Segment matching scores are combined. [Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action recognition. CVPR 2007] 22
  • 23. Global features from volumes  Efros et al. 2003  Concatenated optical flow features from 3-D XYT volumes  Analyzed soccer plays from low- resolution videos. [Efros, A., Berg, A., Mori, G., and Malik, J., Recognizing action at a distance, ICCV 2003] 23
  • 24. Sparse features from videos  Problem: matching between two videos  Match volumes directly?  Extracts sparse features -  Video version of SIFT features t SIFT for images Features for videos 24
  • 25. Sparse features from videos  Spatio-temporal features  Reliable under noise, background changes, lighting condition changes, …  Laptev 2003, Dollar et al. 2005 (a) (b) (c) 25
  • 26. Cuboid features  Cuboid descriptors  Dollar et al., Cuboid, VS-PETS 2005  Appearances of local 3-D XYT volumes  Raw appearance  Gradients  Optical flows  Captures salient periodic motion. [Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio-temporal features, VS-PETS 2005] 26
  • 27. STIP interest point detector  Laptev and Linderberg 2003  Simple periodic actions  Spatio-temporal local features + SVMs  Introduced the KTH dataset  Local descriptor based on Harris corner detector [Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004] 27
  • 28. Bag-of-words representation t Classify features based on their Histogram (bag-of-words) appearance similarity t 28
  • 29. pLSA models for actions  pLSA from text recognition  Probabilistic latent semantic analysis  Reasoning the probability of features originated from a particular action video. [Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006] 29
  • 30. Approach overview  Recognition using local spatio-temporal features t  Bag-of-words  Classifiers  e.g. SVMs, pLSA, …  Extensions  Structural considerations  Hybrid features  Grouping features 30
  • 31. Structural considerations  Bag-of-features ignores structure.  Structures?  Wong et al. 2007  pLSA-ISM: encodes relative locations of features  Savarese et al. 2008  Feature correlation: pairwise proximity [Wong, S.-F., Kim, T.-K., and Cipolla, R., Learning motion categories using both semantic and structural information, CVPR 2007] [Savarese, S., DelPozo, A., Niebles, J., and Fei-Fei, L., Spatial-temporal correlatons for unsupervised action classification, WMVC 2008] 31
  • 32. Local features for movie scenes  Laptev et al. 2009 – movies  Movie scenes with camera movements  Instantaneous actions  Improved their local descriptor [Laptev 03] for analyzing movie videos.  Gradients + optical flows 32
  • 33. Grouping features  Groups a small number of features  2~3 features which appear jointly  Spatially/temporally adjacent features  grouping  Multiple levels  Hierarchical? [Kovashka A. and Grauman K., Learning a hierarchy of discriminative space-time neighborhood features for human action recognition, CVPR 2010] 33
  • 34. XYT approaches: pros and cons  Advantages  Robust under noise  Background changes, camera movements, …  YouTube-type videos  Limitations  Bag-of-words  Spatio-temporal relations among features are ignored.  Not hierarchical  Difficult to model complex activities 34
  • 35. Summary: single layered  In general, suitable for action recognition  Single actor  Structural variations?  Handle uncertainties reliably  Strong to noise, background, illuminations, …  Stochastic decision  Can be served as building blocks.  A large number of training videos required. 35
  • 36. Datasets  KTH dataset  Single action video classification  Single actor  One action per video  Weizmann dataset  Similar to the KTH dataset (single action) 36
  • 37. KTH results 100 Shuldt et al. '04 Dollar et al. '05 95 Jiang et al. '06 90 Niebles et al. '06 Yeo et al. '06 85 Ke et al.'07 80 Savarese et al. '08 Laptev et al. '08 75 Rodriguez et al. '08 70 Liu et al. '08 Bregonzio et al. '09 65 Ranpantzikos et al. '09 60 Ryoo et al. '09b 2004 2005 2006 2007 2008 2009 2010 37
  • 38. New datasets  Hollywood dataset [Laptev 07,08]  Movie scenes  Goal: recognition in complex environments  Moving cameras  Background changes  Action classification  Segmented videos  Atomic movements (e.g. kissing) 38
  • 39. New datasets  UT-Interaction dataset  Multiple actors  Human interactions  Pedestrians  Continuous videos  UT-Tower dataset  Low-resolution  Simple actions 39
  • 40. Single layered: References Space-Time approaches  Bobick, A. and Davis, J., The recognition of human movement using temporal templates. IEEE T PAMI 23(3), 2001.  Efros, A., Berg, A., Mori, G., and Malik, J., Recognizing action at a distance, ICCV 2003.  Schuldt, C., Laptev, I., and Caputo, B., Recognizing human actions: A local SVM approach, ICPR 2004.  Yilmaz, A. and Shah, M., Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005.  Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S., Behavior recognition via sparse spatio- temporal features, VS-PETS 2005.  Niebles, J. C., Wang, H., and Fei-Fei, L., Unsupervised learning of human action categories using spatial-temporal words, BMVC 2006.  Ke, Y., Sukthankar, R., and Hebert, M., Spatio-temporal shape and flow correlation for action recognition. CVPR 2007.  Wong, S.-F., Kim, T.-K., and Cipolla, R., Learning motion categories using both semantic and structural information, CVPR 2007  Savarese, S., DelPozo, A., Niebles, J., and Fei-Fei, L., Spatial-temporal correlatons for unsupervised action classification, WMVC 2008.  Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B., Learning realistic human actions from movies, CVPR 2008. 40
  • 41. Singled-layered: References (2) Sequential approaches  Yamato, J., Ohya, J., and Ishii, K., Recognizing human action in time-sequential images using hidden Markov model. CVPR 1992.  Gavrila, D. and Davis, L., Towards 3-D model-based tracking and recognition of human movement. In International Workshop on Face and Gesture Recognition 1995.  Starner, T. and Pentland, A., Real-time American Sign Language recognition from video using hidden Markov models. International Symposium on Computer Vision, 1995.  Oliver, N. M., Rosario, B., and Pentland, A. P., A Bayesian computer vision system for modeling human interactions. IEEE T PAMI, 2000.  Park, S. and Aggarwal, J. K., A hierarchical Bayesian network for event recognition of human actions and interactions. Multimedia Systems, 2004.  Natarajan, P. and Nevatia, R., Coupled hidden semi Markov models for activity recognition. WMVC 2007. 41