ICCV2011: Human Action Recognition by Learning bases of action attributes and parts

  1. Human Action Recognition by Learning Bases of Action Attributes and Parts. Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei. Stanford University.
  2. Action Classification in Still Images. Low-level feature. Example action: riding bike. (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011)
  3. Action Classification in Still Images (build of slide 2). Adds the high-level representation: semantic concepts, i.e. attributes (riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …).
  4. Action Classification in Still Images (build of slide 3). Adds: objects.
  5. Action Classification in Still Images (build of slide 4). Adds: human poses; objects and human poses are grouped as parts.
  6. Action Classification in Still Images (build of slide 5). Adds: contexts of attributes & parts.
  7. Action Classification in Still Images.
      • Low-level feature: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011.
      • High-level representation: semantic concepts (attributes: riding a bike, sitting on bike seat, wearing a helmet, pedaling the pedal); parts (objects, human poses); contexts of attributes & parts.
      • Related work on high-level representations: Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011; Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010; Yang et al., 2010; Maji et al., 2011; Liu et al., 2011.
      • Benefits: incorporate human knowledge; more understanding of image content; more discriminative classifier.
  8. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion
  10. Action Attributes and Parts. Attributes: semantic descriptions of human actions.
  11. Action Attributes and Parts. Attributes: semantic descriptions of human actions, each learned with a discriminative classifier, e.g. an SVM (riding bike vs. not riding bike). (Lampert et al., 2009; Berg et al., 2010)
  12. Action Attributes and Parts. Attributes; Parts-Objects; Parts-Poselets. Parts use a pre-trained detector (Object Bank, Li et al., 2010; Poselet, Bourdev & Malik, 2009).
  13. Action Attributes and Parts. a: image feature vector, built from attribute classification, object detection (Parts-Objects), and poselet detection (Parts-Poselets).
  14. Action Attributes and Parts (build of slide 13). Adds: action bases Φ.
  15. Action Attributes and Parts (build of slide 14). Action bases Φ; a: image feature vector over attributes, Parts-Objects, and Parts-Poselets.
  16. Action Attributes and Parts. Same text content as slide 15.
  17. Action Attributes and Parts (build of slide 16). Adds: a ≈ Φw, with bases coefficients w.
  18. Action Attributes and Parts (build of slide 17). The bases coefficients w are: • Sparse • Encode context • Robust to initially weak detections
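The relationship a ≈ Φw from slides 17 and 18 can be illustrated with a toy sketch. Everything in it is hypothetical: the row names, the three bases in `PHI`, and the scores are made-up numbers, not values from the talk. It only shows how a sparse w that selects a single basis reconstructs a, and how a weak raw detection (the bike score here) differs from the value the selected basis expects.

```python
def reconstruct(phi, w):
    """Compute Phi @ w, with phi given as a list of rows."""
    return [sum(p_rc * w_c for p_rc, w_c in zip(row, w)) for row in phi]

# Rows are entries of the feature vector a (hypothetical names);
# columns are 3 hypothetical action bases.
ROWS = ["attr:riding", "attr:sitting", "obj:bike", "obj:horse", "poselet:legs_bent"]
PHI = [
    [0.9, 0.9, 0.0],
    [0.8, 0.7, 0.1],
    [0.9, 0.0, 0.0],
    [0.0, 0.9, 0.0],
    [0.7, 0.3, 0.2],
]

# Raw scores with a weak (mostly missed) bike detection.
a = [0.9, 0.8, 0.2, 0.0, 0.7]
# A sparse code that uses only the first basis.
w = [1.0, 0.0, 0.0]

a_hat = reconstruct(PHI, w)
residual = [raw - rec for raw, rec in zip(a, a_hat)]

for name, raw, rec in zip(ROWS, a, a_hat):
    print(f"{name:18s} raw={raw:.1f} reconstructed={rec:.1f}")
```

In this toy setting the only large residual is on the bike row, which is the sense in which a representation based on w can be less sensitive to one weak detector than the raw vector a.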
  19. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion
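  20. Bases of Attributes & Parts: Training
      • Input: $a_1, \dots, a_N$
      • Output: $\Phi = [\Phi_1, \dots, \Phi_M]$ and sparse $W = [w_1, \dots, w_N]$
      • Jointly estimate $\Phi$ and $W$ so that $a_i \approx \Phi w_i$:
        $\min_{\Phi, W} \sum_{i=1}^{N} \left( \tfrac{1}{2} \| a_i - \Phi w_i \|_2^2 + \lambda \| w_i \|_1 \right)$ s.t. $\forall j,\ \| \Phi_j \|_1 + \tfrac{\gamma}{2} \| \Phi_j \|_2^2 \le 1$
        The squared-error term gives an accurate approximation; the L1 regularization gives sparsity of $W$; the elastic-net constraint gives sparsity of $\Phi$ [Zou & Hastie, 2005].
      • Optimization: stochastic gradient descent.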
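As a rough illustration of the training step on slide 20, the sketch below performs one stochastic gradient update of Φ for a single sample (a, w) and then shrinks any basis column that violates the elastic-net constraint. It is a simplified sketch under stated assumptions, not the authors' implementation: `gamma` is an assumed name for the elastic-net weight, `lr` is a hypothetical learning rate, and a violating column is uniformly rescaled onto the constraint boundary, which is a simple feasibility fix rather than an exact projection.

```python
import math

def rescale_column(col, gamma):
    """Shrink col uniformly so that ||col||_1 + (gamma/2)*||col||_2^2 <= 1."""
    l1 = sum(abs(v) for v in col)
    l2sq = sum(v * v for v in col)
    if l1 + 0.5 * gamma * l2sq <= 1.0:
        return list(col)  # already feasible
    if gamma * l2sq == 0.0:
        s = 1.0 / l1
    else:
        # Positive root of (gamma/2)*l2sq*s^2 + l1*s - 1 = 0.
        s = (-l1 + math.sqrt(l1 * l1 + 2.0 * gamma * l2sq)) / (gamma * l2sq)
    return [s * v for v in col]

def sgd_step_phi(phi, a, w, lr, gamma):
    """One gradient step on 0.5*||a - Phi w||^2 with respect to Phi, then rescale columns."""
    n_rows, n_cols = len(phi), len(w)
    residual = [a[r] - sum(phi[r][c] * w[c] for c in range(n_cols))
                for r in range(n_rows)]
    # The gradient with respect to phi[r][c] is -residual[r] * w[c].
    new_phi = [[phi[r][c] + lr * residual[r] * w[c] for c in range(n_cols)]
               for r in range(n_rows)]
    for c in range(n_cols):
        col = rescale_column([new_phi[r][c] for r in range(n_rows)], gamma)
        for r in range(n_rows):
            new_phi[r][c] = col[r]
    return new_phi

# Toy usage with made-up numbers.
phi0 = [[0.1, 0.0], [0.0, 0.1]]
phi1 = sgd_step_phi(phi0, a=[0.5, 0.0], w=[1.0, 0.0], lr=0.5, gamma=1.0)
print(phi1)
```

A full training loop would alternate this update with re-estimating each sparse code w_i for the current Φ.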
  21. Bases of Attributes & Parts: Testing
      • Input: $a$ and $\Phi = [\Phi_1, \dots, \Phi_M]$
      • Output: sparse $w$
      • Estimate $w$ so that $a \approx \Phi w$: $\min_{w} \tfrac{1}{2} \| a - \Phi w \|_2^2 + \lambda \| w \|_1$ (accurate approximation; L1 regularization, sparsity of $w$)
      • Optimization: stochastic gradient descent.
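The test-time problem on slide 21 is an L1-regularized least-squares problem in w for a fixed Φ. The sketch below solves it with iterative soft-thresholding (ISTA, a proximal-gradient method), which is swapped in here for brevity; the slide itself names stochastic gradient descent. `lam` is an assumed name for the L1 weight, and `step` is a hypothetical step size that should not exceed the reciprocal of the largest eigenvalue of ΦᵀΦ.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sparse_code(phi, a, lam, step, n_iter=200):
    """Approximately minimize 0.5*||a - Phi w||^2 + lam*||w||_1 over w with ISTA."""
    n_rows, n_cols = len(phi), len(phi[0])
    w = [0.0] * n_cols
    for _ in range(n_iter):
        # residual = Phi w - a
        residual = [sum(phi[r][c] * w[c] for c in range(n_cols)) - a[r]
                    for r in range(n_rows)]
        # Gradient of the smooth term: Phi^T (Phi w - a)
        grad = [sum(phi[r][c] * residual[r] for r in range(n_rows))
                for c in range(n_cols)]
        w = [soft_threshold(w[c] - step * grad[c], step * lam)
             for c in range(n_cols)]
    return w

# Toy usage: with an identity Phi, the solution is the soft-thresholded input.
phi_toy = [[1.0, 0.0], [0.0, 1.0]]
w_toy = sparse_code(phi_toy, a=[1.0, 0.05], lam=0.1, step=1.0)
print(w_toy)
```

In the toy call the small second entry of a falls below `lam` and is set exactly to zero, which is the sparsity effect of the L1 term.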
  22. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion
  23. PASCAL VOC 2010 Action Dataset • 9 classes, 50-100 trainval / testing images per class (figure credit: Ivan Laptev) • 14 attributes, trained from the trainval images; 27 objects, taken from Li et al., NIPS 2010; 150 poselets, taken from Bourdev & Malik, ICCV 2009.
  24. VOC 2010: Classification Result. [Plot: average precision (axis 0.3 to 0.9) for the 9 classes Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Methods: SURREY_MK; UCLEAR_DOSP; Poselet (Maji et al., 2011); our method, use “a”.]
  25. VOC 2010: Classification Result. [Same plot as slide 24, adding: our method, use “w”.]
  26. VOC 2010: Analysis of Bases. [Same plot as slide 25, plus a diagram of the 400 action bases Φ over attributes, objects, and poselets.]
  27. VOC 2010: Analysis of Bases. [Same text content as slide 26.]
  28. VOC 2010: Analysis of Bases. [Same text content as slide 26.]
  29. VOC 2010: Control Experiment. [Plot: mean average precision (axis 0.45 to 0.7) using “a” vs. using “w”, for the combinations A+O+P, A+O, A+P, and O+P. A: attribute; O: object; P: poselet.]
  30. PASCAL VOC 2011 Result • Our method ranks first in nine out of ten classes in comp10.

      Class                Others’ best in comp9   Others’ best in comp10   Our method
      Jumping              71.6                    59.5                     66.7
      Phoning              50.7                    31.3                     41.1
      Playing instrument   77.5                    45.6                     60.8
      Reading              37.8                    27.8                     42.2
      Riding bike          88.8                    84.4                     90.5
      Riding horse         90.2                    88.3                     92.2
      Running              87.9                    77.6                     86.2
      Taking photo         25.7                    31.0                     28.8
      Using computer       58.9                    47.4                     63.5
      Walking              59.5                    57.6                     64.2

  31. PASCAL VOC 2011 Result • Our method achieves the best performance in five out of ten classes if we consider both comp9 and comp10. [Same table as slide 30.]
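The two headline counts on slides 30 and 31 can be checked directly against the table. The short script below does so; the numbers are copied from the slides, while the variable names are only illustrative.

```python
# (others' best in comp9, others' best in comp10, our method), as on the slides.
RESULTS = {
    "Jumping": (71.6, 59.5, 66.7),
    "Phoning": (50.7, 31.3, 41.1),
    "Playing instrument": (77.5, 45.6, 60.8),
    "Reading": (37.8, 27.8, 42.2),
    "Riding bike": (88.8, 84.4, 90.5),
    "Riding horse": (90.2, 88.3, 92.2),
    "Running": (87.9, 77.6, 86.2),
    "Taking photo": (25.7, 31.0, 28.8),
    "Using computer": (58.9, 47.4, 63.5),
    "Walking": (59.5, 57.6, 64.2),
}

# Classes where our method beats the best other comp10 entry.
first_in_comp10 = [name for name, (comp9, comp10, ours) in RESULTS.items()
                   if ours > comp10]
# Classes where our method beats the best entry of both competitions.
best_overall = [name for name, (comp9, comp10, ours) in RESULTS.items()
                if ours > max(comp9, comp10)]

print(len(first_in_comp10), "of", len(RESULTS), "classes first in comp10")
print(len(best_overall), "of", len(RESULTS), "classes best overall:", best_overall)
```

The one class where the method is not first in comp10 is Taking photo (28.8 vs. 31.0).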
  32. Stanford 40 Actions • 40 action classes, 9532 real-world images from Google, Flickr, etc. http://vision.stanford.edu/Datasets/40actions.html
      Classes: Applauding; Blowing bubbles; Brushing teeth; Calling; Cleaning floor; Climbing wall; Cooking; Cutting trees; Cutting vegetables; Drinking; Feeding horse; Fishing; Fixing bike; Gardening; Holding umbrella; Jumping; Playing guitar; Playing violin; Pouring liquid; Pushing cart; Reading; Repairing car; Riding bike; Riding horse; Rowing; Running; Shooting arrow; Smoking cigarette; Taking photo; Texting message; Throwing frisbee; Using computer; Using microscope; Using telescope; Walking dog; Washing dishes; Watching television; Waving hands; Writing on board; Writing on paper.
  33. Stanford 40 Actions. [Same class list as slide 32, highlighting: Fixing bike; Riding bike.]
  34. Stanford 40 Actions. [Same class list as slide 32, highlighting: Writing on board; Writing on paper.]
  35. Stanford 40 Actions. [Same class list as slide 32, highlighting: Drinking; Gardening; Smoking cigarette.]
  36. Stanford 40 Actions: Result • We use 45 attributes, 81 objects, and 150 poselets. • Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline. [Plot: average precision (axis 0 to 0.9) per action class, LLC vs. our method.]
  37. Stanford 40 Actions: Result. [Same plot as slide 36: average precision per action class, LLC vs. our method.]
  38. Outline • Intuition: Action Attributes and Parts • Algorithm: Learning Bases of Attributes and Parts • Experiments: PASCAL VOC & Stanford 40 Actions • Conclusion
  39. Conclusion. Attributes, Parts-Objects, and Parts-Poselets form the image feature vector a; with action bases Φ, a ≈ Φw, where w are the bases coefficients.
  40. Acknowledgement
