Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses

Published in: Education, Technology, Business

  1. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Presented by Arwa Chittalwala, Irfan Shaikh, Heena Patel.
  2. Human-Object Interaction: robots interact with objects; automatic sports commentary (“Kobe is dunking the ball.”); medical care.
  3. Human-Object Interaction. Vs.: playing saxophone; playing bassoon; playing saxophone. (Previous talk: Grouplet) Grouplet is a generic feature for structured objects, or interactions of groups of objects. Caltech101: Berg & Malik, 2005: 48%; Grauman & Darrell, 2005: 59%; Gehler & Nowozin, 2009: 77%; OURS: 62%. Holistic image-based classification. Detailed understanding and reasoning. HOI activity: Tennis Forehand.
  4. Human-Object Interaction. Holistic image-based classification. Detailed understanding and reasoning: • Human pose estimation (torso, head).
  5. Human-Object Interaction. Holistic image-based classification. Detailed understanding and reasoning: • Human pose estimation • Object detection (tennis racket).
  6. Human-Object Interaction. Holistic image-based classification. Detailed understanding and reasoning: • Human pose estimation (torso, head) • Object detection (tennis racket). HOI activity: Tennis Forehand.
  7. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  8. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  9. Human pose estimation & Object detection. Human pose estimation is challenging: difficult part appearance; self-occlusion; image region looks like a body part. (Felzenszwalb & Huttenlocher, 2005; Ren et al, 2005; Ramanan, 2006; Ferrari et al, 2008; Yang & Mori, 2008; Andriluka et al, 2009; Eichner & Ferrari, 2009)
  10. Human pose estimation & Object detection. Human pose estimation is challenging. (Felzenszwalb & Huttenlocher, 2005; Ren et al, 2005; Ramanan, 2006; Ferrari et al, 2008; Yang & Mori, 2008; Andriluka et al, 2009; Eichner & Ferrari, 2009)
  11. Human pose estimation & Object detection. Facilitate: given the object is detected.
  12. Human pose estimation & Object detection. Object detection is challenging: small, low-resolution, partially occluded; image region similar to detection target. (Viola & Jones, 2001; Lampert et al, 2008; Divvala et al, 2009; Vedaldi et al, 2009)
  13. Human pose estimation & Object detection. Object detection is challenging. (Viola & Jones, 2001; Lampert et al, 2008; Divvala et al, 2009; Vedaldi et al, 2009)
  14. Human pose estimation & Object detection. Facilitate: given the pose is estimated.
  15. Human pose estimation & Object detection. Mutual Context.
  16. Context in Computer Vision. Previous work: use context cues to facilitate object detection (Hoiem et al, 2006; Rabinovich et al, 2007; Oliva & Torralba, 2007; Heitz & Koller, 2008; Desai et al, 2009; Divvala et al, 2009; Murphy et al, 2003; Shotton et al, 2006; Harzallah et al, 2009; Li, Socher & Fei-Fei, 2009; Marszalek et al, 2009; Bao & Savarese, 2010). Helpful, but only a moderate improvement: ~3-4% with context over without context. (Viola & Jones, 2001; Lampert et al, 2008)
  17. Context in Computer Vision. Previous work: use context cues to facilitate object detection (Hoiem et al, 2006; Rabinovich et al, 2007; Oliva & Torralba, 2007; Heitz & Koller, 2008; Desai et al, 2009; Divvala et al, 2009; Murphy et al, 2003; Shotton et al, 2006; Harzallah et al, 2009; Li, Socher & Fei-Fei, 2009; Marszalek et al, 2009; Bao & Savarese, 2010). Helpful, but only a moderate improvement: ~3-4% with context over without context. Our approach: two challenging tasks serve as mutual context of each other. Comparison shown: with mutual context; without context.
  18. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  19. Mutual Context Model Representation. Nodes: A, O, H, P1, P2, ..., PN, with image evidence fO, f1, f2, ..., fN.
      A: Activity (croquet shot, volleyball smash, tennis forehand).
      O: Object (croquet mallet, volleyball, tennis racket).
      H: Human pose. Intra-class variations: • More than one H for each A; • Unobserved during training.
      P: Body parts. lP: location; θP: orientation; sP: scale.
      f: Image evidence. Shape context. [Belongie et al, 2002]
  20. Mutual Context Model Representation. Markov Random Field over A, O, H, P1, P2, ..., PN and fO, f1, f2, ..., fN: $\Psi = \sum_{e \in E} w_e \psi_e$, where $\psi_e$ is a clique potential and $w_e$ is a clique weight.
      • $\psi_e(A, O)$, $\psi_e(A, H)$, $\psi_e(O, H)$: frequency of co-occurrence between A, O, and H.
  21. Mutual Context Model Representation. Markov Random Field: $\Psi = \sum_{e \in E} w_e \psi_e$ (clique potential $\psi_e$, clique weight $w_e$).
      • $\psi_e(A, O)$, $\psi_e(A, H)$, $\psi_e(O, H)$: frequency of co-occurrence between A, O, and H.
      • $\psi_e(O, P_n)$, $\psi_e(P_m, P_n)$, $\psi_e(H, P_n)$: spatial relationship among object and body parts, built from a location term $\mathrm{bin}(l_O - l_{P_n})$, an orientation term $\mathrm{bin}(\theta_O - \theta_{P_n})$, and a size term in $s_O$ and $s_{P_n}$.
  22. Mutual Context Model Representation. Markov Random Field: $\Psi = \sum_{e \in E} w_e \psi_e$ (clique potential $\psi_e$, clique weight $w_e$).
      • $\psi_e(A, O)$, $\psi_e(A, H)$, $\psi_e(O, H)$: frequency of co-occurrence between A, O, and H.
      • $\psi_e(O, P_n)$, $\psi_e(P_m, P_n)$, $\psi_e(H, P_n)$: spatial relationship among object and body parts, built from a location term $\mathrm{bin}(l_O - l_{P_n})$, an orientation term $\mathrm{bin}(\theta_O - \theta_{P_n})$, and a size term in $s_O$ and $s_{P_n}$.
      • Learn structural connectivity among the body parts and the object (obtained by structure learning).
  23. Mutual Context Model Representation. Markov Random Field: $\Psi = \sum_{e \in E} w_e \psi_e$ (clique potential $\psi_e$, clique weight $w_e$).
      • $\psi_e(A, O)$, $\psi_e(A, H)$, $\psi_e(O, H)$: frequency of co-occurrence between A, O, and H.
      • $\psi_e(O, P_n)$, $\psi_e(P_m, P_n)$, $\psi_e(H, P_n)$: spatial relationship among object and body parts, built from a location term $\mathrm{bin}(l_O - l_{P_n})$, an orientation term $\mathrm{bin}(\theta_O - \theta_{P_n})$, and a size term in $s_O$ and $s_{P_n}$.
      • Learn structural connectivity among the body parts and the object.
      • $\psi_e(O, f_O)$ and $\psi_e(P_n, f_{P_n})$: discriminative part detection scores. Shape context + AdaBoost. [Andriluka et al, 2009] [Belongie et al, 2002] [Viola & Jones, 2001]
  24. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  25. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$ over A, O, H, P1, P2, ..., PN and fO, f1, f2, ..., fN. Input: cricket shot, cricket bowling. Goals: hidden human poses.
  26. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Input: cricket shot, cricket bowling. Goals: hidden human poses; structural connectivity.
  27. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Input: cricket shot, cricket bowling. Goals: hidden human poses; structural connectivity; potential parameters; potential weights.
  28. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Input: cricket shot, cricket bowling. Goals: hidden human poses (hidden variables); structural connectivity (structure learning); potential parameters and potential weights (parameter estimation).
  29. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Goals: hidden human poses; structural connectivity; potential parameters; potential weights. Approach: croquet shot.
  30. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Goals: hidden human poses; structural connectivity; potential parameters; potential weights. Approach: hill-climbing on $\max_{E} \sum_{e \in E} w_e \psi_e - \frac{|E|^2}{2\sigma^2}$, where the first term is the joint density of the model and the second term is a Gaussian prior on the number of edges.
  31. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Goals: hidden human poses; structural connectivity; potential parameters; potential weights. Approach:
      • Maximum likelihood: $\psi_e(A, O)$, $\psi_e(A, H)$, $\psi_e(O, H)$, $\psi_e(O, P_n)$, $\psi_e(P_m, P_n)$, $\psi_e(H, P_n)$.
      • Standard AdaBoost: $\psi_e(O, f_O)$, $\psi_e(P_n, f_{P_n})$.
  32. Model Learning. Model: $\Psi = \sum_{e \in E} w_e \psi_e$. Goals: hidden human poses; structural connectivity; potential parameters; potential weights. Approach: max-margin learning.
      $\min_{\mathbf{w}, \xi} \frac{1}{2} \sum_r \|\mathbf{w}_r\|_2^2 + \beta \sum_i \xi_i$
      s.t. $\forall i, r$ where $y(r) \neq y(c_i)$: $\mathbf{w}_{c_i} \cdot \mathbf{x}_i - \mathbf{w}_r \cdot \mathbf{x}_i \geq 1 - \xi_i$; $\forall i$: $\xi_i \geq 0$.
      Notations:
      • $\mathbf{x}_i$: Potential values of the i-th image.
      • $\mathbf{w}_r$: Potential weights of the r-th pose.
      • $y(r)$: Activity of the r-th pose.
      • $\xi_i$: A slack variable for the i-th image.
  33. Learning Results: cricket defensive shot; cricket bowling; croquet shot.
  34. Learning Results: tennis serve; volleyball smash; tennis forehand.
  35. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  36. Model Inference. Image I; the learned models.
  37. Model Inference. Image I; the learned models. Head detection; torso detection; tennis racket detection. Layout of the object and body parts. Compositional inference [Chen et al, 2007]. Result for the first model: $A_1, H_1, O_1^*, \{P_{1,n}^*\}_n$.
  38. Model Inference. Image I; the learned models. Output: $A_1, H_1, O_1^*, \{P_{1,n}^*\}_n$; ...; $A_K, H_K, O_K^*, \{P_{K,n}^*\}_n$.
  39. Outline: • Background and Intuition • Mutual Context of Object and Human Pose (Model Representation, Model Learning, Model Inference) • Experiments • Conclusion
  40. Dataset and Experiment Setup. Sport data set [Gupta et al, 2009]: 6 classes (cricket defensive shot, cricket bowling, croquet shot, tennis forehand, tennis serve, volleyball smash); 180 training (supervised with object and part locations) & 120 testing images. Tasks: • Object detection; • Pose estimation; • Activity classification.
  41. Dataset and Experiment Setup. Sport data set [Gupta et al, 2009]: 6 classes (cricket defensive shot, cricket bowling, croquet shot, tennis forehand, tennis serve, volleyball smash); 180 training (supervised with object and part locations) & 120 testing images. Tasks: • Object detection; • Pose estimation; • Activity classification.
  42. Object Detection Results. [Precision-recall plots (axes: recall, precision) for cricket bat, croquet mallet, tennis racket, volleyball, and cricket ball.] Methods compared: our method; sliding window; pedestrian context. [Andriluka et al, 2009] [Dalal & Triggs, 2006] Valid region.
  43. Object Detection Results. [Precision-recall plots (axes: recall, precision) for volleyball and cricket ball; curves: our method, pedestrian as context, scanning window detector.] Example detections: sliding window; pedestrian context; our method. Difficulties: small object; background clutter.
  44. Dataset and Experiment Setup. Sport data set [Gupta et al, 2009]: 6 classes (cricket defensive shot, cricket bowling, croquet shot, tennis forehand, tennis serve, volleyball smash); 180 training & 120 testing images. Tasks: • Object detection; • Pose estimation; • Activity classification.
  45. Human Pose Estimation Results.
      Method                  Torso  Upper Leg  Lower Leg  Upper Arm  Lower Arm  Head
      Ramanan, 2006           .52    .22 .22    .21 .28    .24 .28    .17 .14    .42
      Andriluka et al, 2009   .50    .31 .30    .31 .27    .18 .19    .11 .11    .45
      Our full model          .66    .43 .39    .44 .34    .44 .40    .27 .29    .58
  46. Human Pose Estimation Results.
      Method                  Torso  Upper Leg  Lower Leg  Upper Arm  Lower Arm  Head
      Ramanan, 2006           .52    .22 .22    .21 .28    .24 .28    .17 .14    .42
      Andriluka et al, 2009   .50    .31 .30    .31 .27    .18 .19    .11 .11    .45
      Our full model          .66    .43 .39    .44 .34    .44 .40    .27 .29    .58
      Examples: Andriluka et al, 2009 vs. our estimation result (tennis serve model); Andriluka et al, 2009 vs. our estimation result (volleyball smash model).
  47. Human Pose Estimation Results.
      Method                  Torso  Upper Leg  Lower Leg  Upper Arm  Lower Arm  Head
      Ramanan, 2006           .52    .22 .22    .21 .28    .24 .28    .17 .14    .42
      Andriluka et al, 2009   .50    .31 .30    .31 .27    .18 .19    .11 .11    .45
      Our full model          .66    .43 .39    .44 .34    .44 .40    .27 .29    .58
      One pose per class      .63    .40 .36    .41 .31    .38 .35    .21 .23    .52
      [Four example estimation results.]
  48. Dataset and Experiment Setup. Sport data set [Gupta et al, 2009]: 6 classes (cricket defensive shot, cricket bowling, croquet shot, tennis forehand, tennis serve, volleyball smash); 180 training & 120 testing images. Tasks: • Object detection; • Pose estimation; • Activity classification.
  49. Activity Classification Results. Classification accuracy: our model 83.3%; Gupta et al, 2009 78.9%; bag-of-words (SIFT+SVM) 52.5%. [Bar chart, accuracy axis from 0.5 to 0.9.] Annotations: no scene information; scene is critical!! Examples: cricket shot; tennis forehand.
  50. Conclusion. Human-Object Interaction: grouplet representation vs. mutual context model. Next steps: • Pose estimation & object detection on PPMI images. • Modeling multiple objects and humans.
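
The model slides (20 to 23) describe a score that is a weighted sum of clique potentials, and the inference slides (36 to 38) produce one result per learned model. A minimal Python sketch of that scoring idea follows. The edge names, the potential values, the single shared weight table, and the helper names `score` and `best_model` are all illustrative assumptions, not values or code from the presentation.

```python
from typing import Dict, List, Tuple


def score(potentials: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of potentials: sum over edges e of w_e * psi_e."""
    return sum(weights[edge] * psi for edge, psi in potentials.items())


def best_model(
    candidates: List[Tuple[str, Dict[str, float]]],
    weights: Dict[str, float],
) -> Tuple[str, float]:
    """Score every candidate configuration and return the highest-scoring one."""
    scored = [(name, score(potentials, weights)) for name, potentials in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical weights and potential values, chosen only for illustration.
weights = {"A-O": 1.0, "A-H": 0.5, "O-H": 0.5, "O-P": 2.0}
candidates = [
    ("tennis forehand", {"A-O": 0.9, "A-H": 0.6, "O-H": 0.7, "O-P": 0.8}),
    ("volleyball smash", {"A-O": 0.2, "A-H": 0.4, "O-H": 0.3, "O-P": 0.1}),
]

label, value = best_model(candidates, weights)
print(label, round(value, 2))
```

In the slides the weights are defined per pose rather than shared, so a closer version would keep one weight table per learned model and score each candidate with its own table.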

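To compare the rows of the human pose estimation table (slides 45 to 47) at a glance, the ten per-part numbers in each row can be averaged. The short Python sketch below does that. The numbers are copied from the table; the unweighted mean and the name `mean_score` are choices made here for illustration and are not a metric reported in the slides.

```python
from statistics import mean
from typing import Dict, List

# Per-part scores in table order: torso, upper legs (2), lower legs (2),
# upper arms (2), lower arms (2), head.
rows: Dict[str, List[float]] = {
    "Ramanan, 2006": [.52, .22, .22, .21, .28, .24, .28, .17, .14, .42],
    "Andriluka et al, 2009": [.50, .31, .30, .31, .27, .18, .19, .11, .11, .45],
    "Our full model": [.66, .43, .39, .44, .34, .44, .40, .27, .29, .58],
    "One pose per class": [.63, .40, .36, .41, .31, .38, .35, .21, .23, .52],
}


def mean_score(values: List[float]) -> float:
    """Unweighted average over the ten part columns (an illustrative summary)."""
    return mean(values)


summary = {method: round(mean_score(values), 3) for method, values in rows.items()}

for method, average in summary.items():
    print(f"{method}: {average:.3f}")
```

An unweighted mean treats every part equally; a per-part comparison, as in the table itself, is still needed to see where the gains are concentrated.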