Human Action Recognition
 by Learning Bases of Action
     Attributes and Parts
 Bangpeng Yao, Xiaoye Jiang, Aditya Khosla,
Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei

            Stanford University



                                                1
Action Classification in Still Images
Example action: Riding bike
Low-level feature
High-level representation
  - Semantic concepts – Attributes (riding a bike, sitting on bike seat, wearing a helmet, pedaling the pedal)
  - Objects (parts)
  - Human poses (parts)
  - Contexts of attributes & parts
Related work:
  Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011
  Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011
  Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010
  Yang et al., 2010; Maji et al., 2011; Liu et al., 2011
• Incorporate human knowledge;
• More understanding of image content;
• More discriminative classifier.

Outline
• Intuition: Action Attributes and Parts
• Algorithm: Learning Bases of Attributes and Parts
• Experiments: PASCAL VOC & Stanford 40 Actions
• Conclusion

Action Attributes and Parts
Attributes: semantic descriptions of human actions
  ……
• Discriminative classifier, e.g. SVM: "Riding bike" vs. "Not riding bike"
  Lampert et al., 2009; Berg et al., 2010

Action Attributes and Parts
Attributes: ……
Parts-Objects: ……
Parts-Poselets: ……
• A pre-trained detector
  Object Bank, Li et al., 2010
  Poselet, Bourdev & Malik, 2009

Action Attributes and Parts
a: Image feature vector
  Attributes: ……        (attribute classification)
  Parts-Objects: ……     (object detection)
  Parts-Poselets: ……    (poselet detection)
Action bases Φ: …

Action Attributes and Parts
Attributes: ……
Parts-Objects: ……
Parts-Poselets: ……
a: Image feature vector
Action bases Φ: …
a ≈ Φw
Bases coefficients w:
  • Sparse
  • Encodes context
  • Robust to initially weak detections

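Bases of Attr. & Parts: Training
• Input: $a_1, \dots, a_N$
• Output: $\Phi = [\Phi_1, \dots, \Phi_M]$ and sparse $W = [w_1, \dots, w_N]$, with $a_i \approx \Phi w_i$
• Jointly estimate $\Phi$ and $W$:

  $$\min_{\Phi, W} \sum_{i=1}^{N} \left( \frac{1}{2} \| a_i - \Phi w_i \|_2^2 + \lambda \| w_i \|_1 \right) \quad \text{s.t.} \quad \forall j,\ \| \Phi_j \|_1 + \frac{\gamma}{2} \| \Phi_j \|_2^2 \le 1$$

  - $\frac{1}{2} \| a_i - \Phi w_i \|_2^2$: accurate approximation
  - $\lambda \| w_i \|_1$: L1 regularization, sparsity of $W$
  - constraint on $\Phi_j$: elastic net, sparsity of $\Phi$ [Zou & Hastie, 2005]
  - $\lambda$, $\gamma$: regularization weights
• Optimization: stochastic gradient descent.
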
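The training objective above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it alternates ISTA sparse coding with a gradient step on Φ instead of the stochastic gradient descent named on the slide, and it enforces the elastic-net constraint by rescaling each column, which yields a feasible Φ but is not an exact projection and does not by itself make Φ sparse. The function names, data sizes, and the values of `lam` and `gamma` are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(A, Phi, lam, n_iter=100):
    """ISTA on all columns of A: min_W 0.5*||A - Phi W||_F^2 + lam*||W||_1."""
    L = np.linalg.norm(Phi, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    W = np.zeros((Phi.shape[1], A.shape[1]))
    for _ in range(n_iter):
        W = soft_threshold(W - Phi.T @ (Phi @ W - A) / L, lam / L)
    return W

def rescale_to_constraint(Phi, gamma):
    """Shrink columns so that ||Phi_j||_1 + gamma/2 * ||Phi_j||_2^2 <= 1.

    Simplification (assumption): a per-column rescaling, not a projection.
    """
    l1 = np.abs(Phi).sum(axis=0)
    qa = 0.5 * gamma * (Phi ** 2).sum(axis=0)
    # Positive root s of qa*s^2 + l1*s - 1 = 0, written in a stable form.
    s = 2.0 / (l1 + np.sqrt(l1 ** 2 + 4.0 * qa) + 1e-12)
    return Phi * np.minimum(s, 1.0)

def learn_bases(A, n_bases, lam, gamma, n_epochs=20, seed=0):
    """Alternate between sparse coding (W) and a gradient step on Phi."""
    rng = np.random.default_rng(seed)
    Phi = rescale_to_constraint(rng.standard_normal((A.shape[0], n_bases)), gamma)
    for _ in range(n_epochs):
        W = sparse_codes(A, Phi, lam)
        grad = (Phi @ W - A) @ W.T                 # gradient of the data term in Phi
        step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-12)
        Phi = rescale_to_constraint(Phi - step * grad, gamma)
    return Phi, sparse_codes(A, Phi, lam)

# Toy data: 200 hypothetical feature vectors a_i of dimension 20, 10 bases.
lam, gamma = 0.01, 0.1
A = np.random.default_rng(1).standard_normal((20, 200))
Phi, W = learn_bases(A, n_bases=10, lam=lam, gamma=gamma)
constraint = np.abs(Phi).sum(axis=0) + 0.5 * gamma * (Phi ** 2).sum(axis=0)
```

Each column of `Phi` plays the role of one action basis and each column of `W` holds the coefficients of one training image; `constraint` holds the left-hand side of the elastic-net condition for every basis.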
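Bases of Attr. & Parts: Testing
• Input: $a$ and the learned $\Phi = [\Phi_1, \dots, \Phi_M]$
• Output: sparse $w$, with $a \approx \Phi w$
• Estimate $w$:

  $$\min_{w} \frac{1}{2} \| a - \Phi w \|_2^2 + \lambda \| w \|_1$$

  - $\frac{1}{2} \| a - \Phi w \|_2^2$: accurate approximation
  - $\lambda \| w \|_1$: L1 regularization, sparsity of $w$
• Optimization: stochastic gradient descent.
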
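As a rough illustration of the test-time step, the sketch below estimates w for a single feature vector with ISTA (iterative soft-thresholding), a standard solver for this L1-regularized least-squares problem that is swapped in here for the stochastic gradient descent named on the slide. The dictionary, its size (191 × 400), the value of `lam`, and the synthetic input are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(a, Phi, w, lam):
    """Value of 0.5 * ||a - Phi w||_2^2 + lam * ||w||_1."""
    return 0.5 * np.sum((a - Phi @ w) ** 2) + lam * np.sum(np.abs(w))

def estimate_w(a, Phi, lam, n_iter=500):
    """Approximately minimize the objective over w with ISTA."""
    L = np.linalg.norm(Phi, 2) ** 2  # largest eigenvalue of Phi^T Phi
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - a)           # gradient of the quadratic term
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Hypothetical dictionary with unit-norm columns and a synthetic sparse input.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((191, 400))
Phi /= np.linalg.norm(Phi, axis=0)
w_true = np.zeros(400)
w_true[[3, 50, 200]] = [1.5, -2.0, 1.0]
a = Phi @ w_true

lam = 0.1
w = estimate_w(a, Phi, lam)
print(objective(a, Phi, w, lam), objective(a, Phi, np.zeros(400), lam))
```

With step size 1/L each ISTA iteration cannot increase the objective, so the estimate is at least as good as the all-zero starting point.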
PASCAL VOC 2010 Action Dataset
• 9 classes, 50-100 trainval / testing images per class




                                         Figure credit: Ivan Laptev

• 14 attributes – trained from the trainval images;
 27 objects – taken from Li et al, NIPS 2010;
 150 poselets – taken from Bourdev & Malik, ICCV 2009.
                                                                      23
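A minimal sketch of how the image feature vector a could be assembled from these three sources, assuming one confidence score per attribute, object, and poselet (the slides do not spell out how detector responses are pooled). The function name and the random scores are placeholders.

```python
import numpy as np

# Counts taken from the slide: 14 attributes, 27 objects, 150 poselets.
N_ATTRIBUTES, N_OBJECTS, N_POSELETS = 14, 27, 150

def build_feature_vector(attr_scores, obj_scores, poselet_scores):
    """Concatenate per-attribute, per-object and per-poselet scores into a."""
    parts = [np.asarray(attr_scores, dtype=float),
             np.asarray(obj_scores, dtype=float),
             np.asarray(poselet_scores, dtype=float)]
    expected = (N_ATTRIBUTES, N_OBJECTS, N_POSELETS)
    for scores, n in zip(parts, expected):
        if scores.shape != (n,):
            raise ValueError(f"expected {n} scores, got shape {scores.shape}")
    return np.concatenate(parts)

# Placeholder scores standing in for classifier and detector outputs.
rng = np.random.default_rng(0)
a = build_feature_vector(rng.random(N_ATTRIBUTES),
                         rng.random(N_OBJECTS),
                         rng.random(N_POSELETS))
print(a.shape)  # (191,)
```

Under this assumption a has 14 + 27 + 150 = 191 entries for the VOC 2010 setup.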
VOC 2010: Classification Result
[Figure: bar chart of average precision (axis from 0.3 to 0.9) for the nine classes Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Methods: SURREY_MK; UCLEAR_DOSP; Poselet, Maji et al., 2011; our method, use "a"; our method, use "w".]

VOC 2010: Analysis of Bases
[Figure: the per-class average precision chart of "VOC 2010: Classification Result", shown together with the 400 action bases; labels: attributes, objects, poselets.]

VOC 2010: Control Experiment
[Figure: bar chart of mean average precision (axis from 0.45 to 0.7) for the combinations A+O+P, A+O, A+P and O+P, comparing use "a" with use "w". A: attribute; O: object; P: poselet.]

PASCAL VOC 2011 Result
• Our method ranks first in nine out of ten classes in comp10.

  Class                Others' best in comp9   Others' best in comp10   Our method
  Jumping                      71.6                    59.5                66.7
  Phoning                      50.7                    31.3                41.1
  Playing instrument           77.5                    45.6                60.8
  Reading                      37.8                    27.8                42.2
  Riding bike                  88.8                    84.4                90.5
  Riding horse                 90.2                    88.3                92.2
  Running                      87.9                    77.6                86.2
  Taking photo                 25.7                    31.0                28.8
  Using computer               58.9                    47.4                63.5
  Walking                      59.5                    57.6                64.2

• Our method achieves the best performance in five out of ten classes when both comp9 and comp10 are considered.

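The two counts quoted for the VOC 2011 results can be checked directly against the table. The short script below copies the numbers from the slide; the variable names are illustrative only.

```python
# (others' best in comp9, others' best in comp10, our method), from the slide.
results = {
    "Jumping":            (71.6, 59.5, 66.7),
    "Phoning":            (50.7, 31.3, 41.1),
    "Playing instrument": (77.5, 45.6, 60.8),
    "Reading":            (37.8, 27.8, 42.2),
    "Riding bike":        (88.8, 84.4, 90.5),
    "Riding horse":       (90.2, 88.3, 92.2),
    "Running":            (87.9, 77.6, 86.2),
    "Taking photo":       (25.7, 31.0, 28.8),
    "Using computer":     (58.9, 47.4, 63.5),
    "Walking":            (59.5, 57.6, 64.2),
}

# Classes where our method beats the best other comp10 entry.
wins_comp10 = [c for c, (_, c10, ours) in results.items() if ours > c10]
# Classes where our method beats the best entry of both competitions.
wins_both = [c for c, (c9, c10, ours) in results.items() if ours > max(c9, c10)]

print(len(wins_comp10), len(wins_both))  # 9 5
print(wins_both)
```

The only class lost within comp10 is Taking photo; across both competitions the method leads on Reading, Riding bike, Riding horse, Using computer and Walking.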
Stanford 40 Actions
• 40 action classes, 9532 real-world images from Google, Flickr, etc.
• Classes: Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper.
http://vision.stanford.edu/Datasets/40actions.html

• Highlighted classes: Fixing bike, Riding bike.
• Highlighted classes: Writing on board, Writing on paper.
• Highlighted classes: Drinking, Gardening, Smoking cigarette.

Stanford 40 Actions: Result
• We use 45 attributes, 81 objects, and 150 poselets.
• Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.
[Figure: bar chart of average precision (axis from 0 to 0.9) for each of the 40 action classes; legend: LLC, Our Method.]

Stanford 40 Actions: Result
[Bar chart: average precision (0 to 0.9) for each of the 40 action classes; legend: LLC, Our Method]
                                            37
Outline
• Intuition: Action Attributes and Parts
• Algorithm: Learning Bases of Attributes
and Parts
• Experiments: PASCAL VOC & Stanford
40 Actions
• Conclusion

                                            38
Conclusion
[Diagram: the image feature vector a collects the outputs for Attributes, Parts-Objects, and Parts-Poselets. It is approximated by the action bases Φ weighted by the bases coefficients w: a ≈ Φw]

                                                              39
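
The relation a ≈ Φw can be made concrete with a small sketch of how sparse coefficients might be recovered for one image once the bases are fixed. It assumes the L1-regularized least-squares objective named on the Testing slide. The weight `lam`, the step size, the toy matrix `phi`, and the use of ISTA (proximal gradient) are all assumptions made here for brevity; the slides state that the authors used stochastic gradient descent.

```python
# Minimal sketch (not the authors' implementation): recover sparse
# coefficients w so that a ≈ Φw by approximately minimizing
#     0.5 * ||a - phi w||_2^2 + lam * ||w||_1
# with ISTA. ISTA is swapped in here; the slides name stochastic
# gradient descent as the optimizer actually used.

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(a, phi, lam=0.1, step=0.3, n_iter=200):
    """Return coefficients w for feature vector a and bases phi.

    a    : list of d floats (image feature vector)
    phi  : list of d rows, each with M floats (each column is one basis)
    step : must not exceed 1 / (largest eigenvalue of phi^T phi),
           which is 1/3 for the toy bases below
    """
    d, m = len(phi), len(phi[0])
    w = [0.0] * m
    for _ in range(n_iter):
        # Residual of the current reconstruction: r = phi w - a
        r = [sum(phi[i][j] * w[j] for j in range(m)) - a[i] for i in range(d)]
        # Gradient of the quadratic term: phi^T r
        g = [sum(phi[i][j] * r[i] for i in range(d)) for j in range(m)]
        # Gradient step followed by soft-thresholding (the L1 proximal step)
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(m)]
    return w


# Hypothetical toy data: 3 feature dimensions and 4 bases. The fourth
# basis fires the first two features together, so it can explain their
# co-occurrence with a single coefficient.
phi = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
a = [1.0, 1.0, 0.0]
w = sparse_code(a, phi)
print([round(x, 3) for x in w])  # -> [0.0, 0.0, 0.0, 0.95]
```

In this toy case a single basis that captures the co-occurrence of two features is selected and the other coefficients are exactly zero, which illustrates the "sparse" and "encodes context" properties claimed for w.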
Acknowledgement




                  40

ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

  • 1. Human Action Recognition by Learning Bases of Action Attributes and Parts Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei Stanford University 1
  • 2. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • Example action: riding bike
  • 3. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • High-level representation: semantic concepts (attributes) • Example descriptions of "riding bike": riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
  • 4. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • High-level representation: semantic concepts (attributes); objects • Example descriptions of "riding bike": riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
  • 5. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • High-level representation: semantic concepts (attributes); parts (objects, human poses) • Example descriptions of "riding bike": riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
  • 6. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • High-level representation: semantic concepts (attributes); parts (objects, human poses); contexts of attributes & parts • Example descriptions of "riding bike": riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
  • 7. Action Classification in Still Images • Low-level feature (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) • High-level representation: semantic concepts (attributes); parts (objects, human poses); contexts of attributes & parts • Image labels for "riding bike": wearing a helmet, sitting on bike seat, pedaling the pedal, riding a bike • Related work: Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011; Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010; Yang et al., 2010; Maji et al., 2011; Liu et al., 2011 • Benefits: incorporate human knowledge; more understanding of image content; more discriminative classifier.
  • 8. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion 8
  • 9. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion 9
  • 10. Action Attributes and Parts Attributes: semantic descriptions of human actions …… 10
  • 11. Action Attributes and Parts Attributes: semantic descriptions of human actions Discriminative classifier, e.g. SVM …… Riding bike Not riding bike Lampert et al., 2009 Berg et al., 2010 11
  • 12. Action Attributes and Parts Attributes: A pre-trained detector …… Parts-Objects: …… Parts-Poselets: …… Object Bank, Li et al., 2010 Poselet, Bourdev & Malik, 2009 12
  • 13. Action Attributes and Parts Attributes: a: Image feature vector …… Attribute classification Parts-Objects: Object detection …… Parts-Poselets: Poselet detection …… 13
  • 14. Action Attributes and Parts Attributes: Action bases Φ a: Image feature vector …… Attribute classification Parts-Objects: … Object detection …… Parts-Poselets: Poselet detection …… 14
  • 15. Action Attributes and Parts Attributes: Action bases Φ a: Image feature vector …… Parts-Objects: … …… Parts-Poselets: …… 15
  • 16. Action Attributes and Parts Attributes: Action bases Φ a: Image feature vector …… Parts-Objects: … …… Parts-Poselets: …… 16
  • 17. Action Attributes and Parts • Attributes, Parts-Objects, Parts-Poselets → a: image feature vector • Action bases Φ • Bases coefficients w • a ≈ Φw
  • 18. Action Attributes and Parts • Attributes, Parts-Objects, Parts-Poselets → a: image feature vector • Action bases Φ • a ≈ Φw • Bases coefficients w: sparse; encodes context; robust to initially weak detections
  • 19. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion 19
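  • 20. Bases of Attr. & Parts: Training • Input: $a_1, \dots, a_N$ • Output: $\Phi = [\Phi_1, \dots, \Phi_M]$ and sparse $W = [w_1, \dots, w_N]$, with $a \approx \Phi w$ • Jointly estimate $\Phi$ and $W$: $\min_{\Phi, W} \sum_{i=1}^{N} \frac{1}{2}\|a_i - \Phi w_i\|_2^2 + \lambda \|w_i\|_1$ (first term: accurate approximation; second term: L1 regularization, sparsity of $W$) s.t. $\forall j,\ \|\Phi_j\|_1 + \frac{\gamma}{2}\|\Phi_j\|_2^2 \le 1$ (elastic net, sparsity of $\Phi$ [Zou & Hastie, 2005]) • Optimization: stochastic gradient descent.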
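  • 21. Bases of Attr. & Parts: Testing • Input: $a$ and the bases $\Phi = [\Phi_1, \dots, \Phi_M]$ • Output: sparse $w$, with $a \approx \Phi w$ • Estimate $w$: $\min_{w} \frac{1}{2}\|a - \Phi w\|_2^2 + \lambda \|w\|_1$ (first term: accurate approximation; second term: L1 regularization, sparsity of $w$) • Optimization: stochastic gradient descent.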
  • 22. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion 22
  • 23. PASCAL VOC 2010 Action Dataset • 9 classes, 50-100 trainval / testing images per class Figure credit: Ivan Laptev • 14 attributes – trained from the trainval images; 27 objects – taken from Li et al, NIPS 2010; 150 poselets – taken from Bourdev & Malik, ICCV 2009. 23
  • 24. VOC 2010: Classification Result [Bar chart: average precision (0.3 to 0.9) for nine classes: 1 Phoning, 2 Playing instrument, 3 Reading, 4 Riding bike, 5 Riding horse, 6 Running, 7 Taking photo, 8 Using computer, 9 Walking. Methods: SURREY_MK; UCLEAR_DOSP; Poselet (Maji et al., 2011); our method using “a”. Inset diagram: a ≈ Φw.]
  • 25. VOC 2010: Classification Result [Same bar chart with one more method: our method using “w”. Inset diagram: a ≈ Φw.]
  • 26. VOC 2010: Analysis of Bases [Same bar chart as slide 25. Inset diagram: a ≈ Φw, with Φ holding 400 action bases over attributes, objects, and poselets.]
  • 27. VOC 2010: Analysis of Bases [Same bar chart as slide 25. Inset diagram: a ≈ Φw, with Φ holding 400 action bases over attributes, objects, and poselets.]
  • 28. VOC 2010: Analysis of Bases [Same bar chart as slide 25. Inset diagram: a ≈ Φw, with Φ holding 400 action bases over attributes, objects, and poselets.]
  • 29. VOC 2010: Control Experiment [Bar chart: mean average precision (0.45 to 0.7) using “a” versus using “w” for the feature combinations A+O+P, A+O, A+P, and O+P, where A: attribute, O: object, P: poselet. Inset diagram: a ≈ Φw.]
  • 30. PASCAL VOC 2011 Result • Our method ranks first in nine out of ten classes in comp10.
        Class                Others' best   Others' best   Our
                             in comp9       in comp10      method
        Jumping              71.6           59.5           66.7
        Phoning              50.7           31.3           41.1
        Playing instrument   77.5           45.6           60.8
        Reading              37.8           27.8           42.2
        Riding bike          88.8           84.4           90.5
        Riding horse         90.2           88.3           92.2
        Running              87.9           77.6           86.2
        Taking photo         25.7           31.0           28.8
        Using computer       58.9           47.4           63.5
        Walking              59.5           57.6           64.2
  • 31. PASCAL VOC 2011 Result • Our method achieves the best performance in five out of ten classes if we consider both comp9 and comp10. (Same table as slide 30.)
  • 32. Stanford 40 Actions • 40 actions classes, 9532 real world images from Google, Flickr, etc. Applauding Blowing Brushing Calling Cleaning Climbing Cooking Cutting bubbles teeth floor wall trees Cutting Drinking Feeding Fishing Fixing Gardening Holding Jumping vegetables horse bike umbrella Playing Playing Pouring Pushing Reading Repairing Riding Riding guitar violin liquid cart car bike horse Rowing Running Shooting Smoking Taking Texting Throwing Using arrow cigarette photo message frisbee computer Using Using Walking Washing Watching Waving Writing on Writing on microscope telescope dog dishes television hands board paper http://vision.stanford.edu/Datasets/40actions.html 32
  • 33. Stanford 40 Actions • 40 actions classes, 9532 real world images from Google, Flickr, etc. Applauding Blowing Brushing Calling Cleaning Climbing Cooking Cutting bubbles teeth floor wall trees Fixing bike Cutting Drinking Feeding Fishing Fixing Gardening Holding Jumping vegetables horse bike umbrella Riding bike Playing Playing Pouring Pushing Reading Repairing Riding Riding guitar violin liquid cart car bike horse Rowing Running Shooting Smoking Taking Texting Throwing Using arrow cigarette photo message frisbee computer Using Using Walking Washing Watching Waving Writing on Writing on microscope telescope dog dishes television hands board paper http://vision.stanford.edu/Datasets/40actions.html 33
  • 34. Stanford 40 Actions • Highlighted classes: Writing on board, Writing on paper
  • 35. Stanford 40 Actions • Highlighted classes: Drinking, Gardening, Smoking cigarette
  • 36. Stanford 40 Actions: Result • We use 45 attributes, 81 objects, and 150 poselets. • Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline. [Figure: bar chart of average precision (axis 0 to 0.9) for each of the 40 action classes; legend: LLC, Our Method]
  • 37. Stanford 40 Actions: Result [Figure: bar chart of average precision (axis 0 to 0.9) for each of the 40 action classes; legend: LLC, Our Method]
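The chart reports average precision per class. As a rough illustration of that metric, here is a minimal sketch of non-interpolated average precision computed from classifier scores and binary labels. The slides do not state the exact evaluation protocol, so this variant is an assumption, and `average_precision`, `toy_scores` and `toy_labels` are made-up names and data.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision measured at each positive's rank.

    scores : classifier confidences, higher means "more likely positive"
    labels : 1 for images of the target action, 0 otherwise
    """
    # Rank images from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision among the top `rank` images
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy data (not from the slides): positives land at ranks 1 and 3.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
ap = average_precision(toy_scores, toy_labels)
print(round(ap, 4))
```

Benchmark toolkits often use an interpolated variant of this measure, so numbers produced this way are not directly comparable to the chart.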
  • 38. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion
  • 39. Conclusion • a: image feature vector (attributes, parts-objects, parts-poselets) • Φ: action bases • w: bases coefficients • a ≈ Φw
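To make the relation between a, Φ and w concrete, the sketch below recovers sparse coefficients w for a given feature vector a and fixed bases Φ. The slide does not give the objective or the solver, so an ℓ1-regularized least-squares problem solved with iterative soft-thresholding (ISTA) is used here as an assumed stand-in. The names `sparse_coefficients`, `soft_threshold`, `lam` and the toy data are hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_coefficients(a, phi, lam=0.1, n_iters=500):
    """Approximately minimize 0.5 * ||a - phi w||^2 + lam * ||w||_1 with ISTA.

    a   : list of length d (image feature vector)
    phi : list of d rows with m entries each (columns are the bases)
    """
    d, m = len(phi), len(phi[0])
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for row in phi for v in row)
    w = [0.0] * m
    for _ in range(n_iters):
        # Gradient of the quadratic term: phi^T (phi w - a).
        residual = [sum(phi[i][j] * w[j] for j in range(m)) - a[i] for i in range(d)]
        grad = [sum(phi[i][j] * residual[i] for i in range(d)) for j in range(m)]
        # Gradient step followed by soft-thresholding, which zeroes small entries.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(m)]
    return w


# Toy example (not from the slides): identity bases; a uses bases 0 and 2 only.
phi = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
a = [2.0, 0.0, -1.0]
w = sparse_coefficients(a, phi, lam=0.1)
print([round(v, 3) for v in w])
```

With identity bases the minimizer is simply a soft-thresholded by `lam`, so the coefficient of the unused basis stays exactly zero while the other two shrink slightly toward zero.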