Recovering 3D human body configurations using shape contexts

Mori and Malik
Presented by Joseph Vainshtein



  1. Recovering 3D human body configurations using shape contexts
     Greg Mori & Jitendra Malik
     Presented by Joseph Vainshtein, Winter 2007
  2. Agenda
     - Motivation and goals
     - The framework
     - The basic pose estimation method
       - Pose estimation
       - Estimating joint locations (deformation)
     - Scaling to large image databases
     - Using part exemplars
     - 3D model estimation
     - Some results
  3. Motivation
     - We receive an image of a person as input.
     - What is the person in the image doing?
  4. Motivation – continued
     - We know that there is a person in the input image. We want to recover the body posture in order to understand the image (what the person is doing).
     - If we had a database of many people in various poses, we could compare our image against those images.
     - But it is not so simple...
  5. Goals
     - Given an input image of a person:
       - Estimate body posture (joint locations)
       - Build a 3D model
     - (Examples taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  6. Agenda
     - Motivation and goals
     - The framework
     - The basic pose estimation method
       - Pose estimation
       - Estimating joint locations (deformation)
     - Scaling to large image databases
     - Using part exemplars
     - 3D model estimation
     - Some results
  7. The framework
     - We assume a database of images of people in various poses, photographed from different angles.
     - In each database image, 14 joint locations are manually marked (wrists, elbows, shoulders, hips, knees, ankles, head, waist).
  8. Agenda
     - Motivation and goals
     - The framework
     - The basic pose estimation method
       - Pose estimation
       - Estimating joint locations (deformation)
     - Scaling to large image databases
     - Using part exemplars
     - 3D model estimation
     - Some results
  9. The basic estimation algorithm – intuition
     - In the basic estimation algorithm, we attempt to deform each database image into the input image and compute a "fit score".
     - Later we will see how to do this more efficiently.
     - (Figure: query image vs. database image)
  10. The basic estimation algorithm
     - We want to test our input image against an image from the database and obtain a "fit score".
     - Edge detection is applied to each of the two images.
     - Points are sampled from the resulting boundaries (300–1000 points per image).
     - From now on, we work only with these sampled points.
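As a rough sketch of this sampling step: the edge detector here is a plain gradient-magnitude threshold (the paper would use a proper edge detector such as Canny), and the `thresh` parameter and fixed random seed are assumptions for illustration:

```python
import numpy as np

def sample_boundary_points(image, n_points=300, thresh=0.2):
    """Sketch of the point-sampling step: find edge pixels with a
    simple gradient-magnitude threshold, then sample n_points of
    them uniformly at random."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                 # image gradients
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ys, xs = np.nonzero(mag > thresh * mag.max())
    edge_pts = np.stack([xs, ys], axis=1)     # (x, y) edge pixels
    rng = np.random.default_rng(0)            # fixed seed for illustration
    idx = rng.choice(len(edge_pts),
                     size=min(n_points, len(edge_pts)),
                     replace=False)
    return edge_pts[idx].astype(float)
```

Only these sampled points are used in all subsequent stages.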
  11. The basic estimation algorithm
     - The deformation process consists of:
       - Finding a correspondence between the points sampled from the two images (for every point sampled from the exemplar boundary, find the "best" point on the input boundary)
       - Finding a deformation of the exemplar points into the input image
     - This is repeated for several iterations.
  12. The shape context
     - A term we will use: shape contexts.
     - Shape contexts are point descriptors: each one describes the shape around its point.
     - The algorithm uses a variation, generalized shape contexts; first we look at the simpler variant.
  13. Shape context (simple version)
     - Count the number of sampled points falling into each bin of a log-polar histogram centered at the point.
     - The radii of the binning structure grow with distance from the point, because we want closer points to have more effect on the descriptor.
     - (Figure with example bin counts, Count = 4 and Count = 10, taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
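A minimal sketch of the simple shape context, assuming the common 5-radial by 12-angular log-polar binning; the exact bin counts and radius range are assumptions, not values from the slides:

```python
import numpy as np

def shape_context(points, center, n_r=5, n_theta=12,
                  r_min=0.125, r_max=2.0):
    """Simple shape context for one point: a log-polar histogram of
    the positions of all other sampled points relative to `center`.
    Radii are normalized by the mean pairwise distance."""
    d = points - center
    r = np.hypot(d[:, 0], d[:, 1])
    mask = r > 0                              # exclude the point itself
    r, d = r[mask], d[mask]
    theta = np.arctan2(d[:, 1], d[:, 0])      # angle in [-pi, pi)
    # logarithmically spaced radial bin edges
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    r_bin = np.digitize(r / r.mean(), r_edges) - 1
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    ok = (r_bin >= 0) & (r_bin < n_r)         # drop out-of-range radii
    np.add.at(hist, (r_bin[ok], t_bin[ok]), 1)
    return hist.ravel()
```

Each sampled point gets one such descriptor vector.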
  14. Generalized shape context
     - A variation on the regular shape context.
     - Instead of counting the points falling into each bin, we sum their tangent vectors.
     - (In the figure, the gray arrows are tangent vectors of the sampled points; the blue ones are the normalized histogram bin values.)
     - For K bins, we build a 2K-dimensional vector.
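The generalized variant can be sketched the same way, replacing bin counts with summed tangent vectors; the binning parameters are again assumptions carried over from the simple version:

```python
import numpy as np

def generalized_shape_context(points, tangents, center,
                              n_r=5, n_theta=12):
    """Generalized shape context: sum the unit tangent vectors of the
    points falling into each log-polar bin, giving a 2K-dimensional
    descriptor for K bins (normalized to unit length)."""
    d = points - center
    r = np.hypot(d[:, 0], d[:, 1])
    mask = r > 0                              # exclude the point itself
    r, d, t = r[mask], d[mask], tangents[mask]
    theta = np.arctan2(d[:, 1], d[:, 0])
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.digitize(r / r.mean(), r_edges) - 1
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    acc = np.zeros((n_r, n_theta, 2))         # one 2D sum per bin
    ok = (r_bin >= 0) & (r_bin < n_r)
    np.add.at(acc, (r_bin[ok], t_bin[ok]), t[ok])
    v = acc.reshape(-1)                       # 2K-dimensional vector
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```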
  15. The matching
     - For every point on the exemplar image, we want to find its corresponding point in the query image.
     - A generalized shape context is computed for each point in the exemplar and query images.
       - Points with similar descriptors should be matched.
     - Bipartite matching is used for this.
  16. The bipartite matching
     - We construct a weighted complete bipartite graph.
     - The nodes on the two sides represent the points sampled from the two images.
     - The weight of an edge represents the cost of matching the corresponding pair of sample points.
     - To deal with outliers, we add several "artificial" nodes to each side, connected to every node on the other side with a fixed cost.
     - We find the lowest-cost perfect matching in this graph.
       - One (simple) option is the Hungarian algorithm.
     - The exemplar with the lowest matching cost is selected.
     - (Figure: points sampled from the exemplar vs. points sampled from the query image)
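The outlier-tolerant matching described above can be sketched with SciPy's Hungarian-style solver; the padding scheme and the `dummy_cost` value are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_with_outliers(cost, dummy_cost):
    """Lowest-cost perfect matching on a cost matrix padded with
    'artificial' rows/columns, so outliers on either side can be
    matched to a dummy node at a fixed cost."""
    n, m = cost.shape
    size = n + m                       # enough dummies for every point
    padded = np.full((size, size), dummy_cost, dtype=float)
    padded[:n, :m] = cost              # real-to-real matching costs
    padded[n:, m:] = 0.0               # dummy-to-dummy matches are free
    rows, cols = linear_sum_assignment(padded)
    # keep only pairs where both endpoints are real points
    pairs = [(r, c) for r, c in zip(rows, cols) if r < n and c < m]
    total = padded[rows, cols].sum()
    return pairs, total
```

A pair whose matching cost exceeds what two dummy edges would cost is left unmatched, which is how outliers are absorbed.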
  17. The deformable matching
     - Our task now is to estimate the joint locations in the input image.
     - We have the point pairs obtained from the matching.
     - We rely on the anatomical kinematic chain as the basis for our deformation model.
     - The kinematic chain consists of 9 segments: the torso, the upper and lower arms, and the upper and lower legs.
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  18. The deformable matching – cont'd
     - First, we determine for each exemplar point the segment it belongs to.
     - For this, we connect the joint locations by line segments.
     - Each point is assigned to the segment it is closest to.
     - We record the segment chosen for each point; it is used in the deformation below.
  19. The deformation model – cont'd
     - We allow translation of the torso points, and rotation of the other segments around their joints
       - (lower legs around the knees, arms around the shoulders, etc.)
     - General idea:
       - Find the optimal (in the least-squares sense) translation for the torso points.
       - Find the optimal rotations of the upper legs and arms around the hips and shoulders.
       - Find the optimal rotations of the lower legs and arms around the knees and elbows.
     - After finding the optimal deformation for all points, we apply it to the joints and obtain the joint locations in the query image.
  20. The deformation model – cont'd
     - The optimal (in the least-squares sense) translation for the torso points minimizes the summed squared distance between the translated exemplar torso points and their matched query points.
     - The solution is the difference between the centroid of the matched query points and the centroid of the exemplar torso points.
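The closed form for this least-squares translation is standard: writing $p_i$ for the exemplar torso points and $q_i$ for their matched query points,

```latex
T^{*} \;=\; \arg\min_{T} \sum_{i=1}^{n} \bigl\| (p_i + T) - q_i \bigr\|^{2}
       \;=\; \frac{1}{n}\sum_{i=1}^{n} q_i \;-\; \frac{1}{n}\sum_{i=1}^{n} p_i .
```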
  21. The deformation model – cont'd
     - For each remaining segment, we seek a rotation around the relevant joint that minimizes the least-squares distances to the matched points.
     - The deformation found for the segments already processed is applied first; the segment then rotates around its (already deformed) joint location.
     - The optimal rotation angle also has a closed-form least-squares solution.
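The least-squares rotation of a segment about its joint has a closed form in 2D; this sketch recovers the angle from the matched point pairs (the function name and interface are illustrative):

```python
import numpy as np

def optimal_rotation_about_joint(p, q, joint):
    """Least-squares rotation of exemplar segment points p about a
    fixed joint so they best fit their matched query points q.
    Closed form in 2D: theta = atan2(sum of cross products,
    sum of dot products) over the joint-centered point pairs."""
    a = p - joint                      # joint-centered exemplar points
    b = q - joint                      # joint-centered query points
    cross = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])
    dot = np.sum(a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1])
    theta = np.arctan2(cross, dot)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return joint + a @ R.T, theta      # rotated points and the angle
```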
  22. The deformation model – cont'd
     - The process (point matching followed by deformation) is repeated for a small number of iterations.
     - The joint locations in the input image are found by applying the optimal deformation to the exemplar joints.
     - We also obtain a score for the fit: the matching cost of the optimal assignment.
  23. A matching and deformation example
     - (Figure: query image points vs. exemplar points; example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  24. A matching and deformation example
     - (Figure: matching and deformation over iterations 1–3; example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  25. Agenda
     - Motivation and goals
     - The framework
     - The basic pose estimation method
       - Pose estimation
       - Estimating joint locations (deformation)
     - Scaling to large image databases
     - Using part exemplars
     - 3D model estimation
     - Some results
  26. Scaling to large exemplar databases
     - The simplest algorithm one can think of:
       - Run the basic algorithm on every image in the database and obtain a matching score for each.
       - Choose the image with the best score.
     - This is not practical for systems with large exemplar databases, which are needed if we do not want to restrict the algorithm to specific body postures.
     - We present a method to address this.
  27. Scaling to large exemplar databases
     - The idea:
       - If the query image and an exemplar image are very different, there is no need to run the smart, expensive algorithm just to discover that the fit is bad.
     - Solution:
       - Use a fast pruning algorithm to obtain a shortlist of "good" candidate images, then run the expensive, more accurate algorithm on each.
  28. The pruning algorithm
     - For each exemplar in the database, we precompute a large number of shape contexts.
     - For the query image, we compute only a small number of representative shape contexts.
     - These are enough to "disqualify" bad candidates.
  29. The pruning algorithm – cont'd
     - For each representative, we find its best match among the precomputed shape contexts of each exemplar.
       - The distance between shape context vectors is computed with the same formula used for the matching cost.
  30. The pruning algorithm – cont'd
     - We now estimate the distance between the shapes as a normalized sum of the matching costs of the r representative points.
     - A normalizing factor is applied per representative:
       - If representative u was not a good representative point, we want it to have less effect on the cost.
  31. The pruning algorithm – cont'd
     - The shortlist of candidates is selected by sorting the exemplars by their estimated distance from the query image.
     - The basic algorithm is then run on the shortlist to find the best match.
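A hedged sketch of the pruning and shortlist steps; the Euclidean descriptor distance and the down-weighting of unreliable representatives here are stand-ins for the paper's exact cost and normalization:

```python
import numpy as np

def shortlist(query_reprs, exemplar_scs, k=10):
    """Rank exemplars by a cheap distance built from a few
    representative query shape contexts, and return the top-k
    shortlist (indices into exemplar_scs)."""
    costs = []
    for scs in exemplar_scs:              # scs: (m, d) precomputed descriptors
        # for each representative, cost of its best match in this exemplar
        d = np.linalg.norm(query_reprs[:, None, :] - scs[None, :, :], axis=2)
        best = d.min(axis=1)              # (r,) best-match costs
        # give unreliable (badly matched) representatives less weight
        w = 1.0 / (1.0 + best)
        costs.append(np.sum(w * best) / np.sum(w))
    order = np.argsort(costs)             # smallest estimated distance first
    return order[:k]
```

The expensive deformable matching is then run only on the returned shortlist.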
  32. Selecting a shortlist – example
     - (Figure: query image and top 10 candidates; example taken from the paper)
  33. Agenda
     - Motivation and goals
     - The framework
     - The basic pose estimation method
       - Pose estimation
       - Estimating joint locations (deformation)
     - Scaling to large image databases
     - Using part exemplars
     - 3D model estimation
     - Some results
  34. Matching part exemplars – motivation
     - When the algorithm presented above is used in a general matching framework (not restricted to specific body positions and camera angles), a very large image database is needed.
     - In this section we show a method that reduces the exemplar database needed to match a shape.
       - This also reduces runtime.
  35. Matching part exemplars – intuition
     - The idea is not to match the entire shape, but to match the different body parts independently.
     - The resulting match may combine body parts taken from different exemplar images.
     - We use six "limbs" as body parts:
       - Left and right arms
       - Left and right legs
       - Waist
       - Head
  36. Example of matching part exemplars
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  37. Matching part exemplars – cont'd
     - The matching process starts similarly to the algorithm of the previous section.
     - The difference is that the score is not computed for the entire shape; instead, a matching score is computed separately for each limb.
     - Each limb of each exemplar is matched to the corresponding limb in the query image, yielding a per-limb matching score.
     - (Figure: points sampled from the i-th exemplar vs. points sampled from the query image)
  38. Matching part exemplars – cont'd
     - We now want to combine these separately matched limbs into a match for the entire shape.
     - The first idea that comes to mind is to simply choose, for each limb, the exemplar with the highest score.
       - This is not a good idea: nothing enforces the resulting combination to be consistent.
     - Solution:
       - Define a measure of consistency for a combination.
       - Then create a total score that takes into account both the consistency score and the individual limb matching scores.
  39. Matching part exemplars: consistency score
     - A combination is consistent if the limbs are at "proper" distances from one another.
     - Our measure of consistency uses the distances between limb base points
       - (shoulders for the arms, hips for the legs; for the waist and head, the base is the point itself).
     - We enforce the following distances to be "proper":
       - Left arm – head
       - Right arm – head
       - Waist – head
       - Left leg – waist
       - Right leg – waist
  40. Matching part exemplars: consistency score – cont'd
     - A combination of two limbs is consistent if the distance between them in the combination is comparable to the distance between those limbs in their original images.
     - The consistency score of a combination is the sum of the consistency scores across the links.
       - For each link, we try all matching options and compute the distance between the bases in every option. This can even be precomputed.
  41. Matching part exemplars: consistency score – cont'd
     - We define a consistency cost for combining a limb from one exemplar with a limb from another exemplar along a link.
     - It is based on the 2D distance between the limb base points.
     - Note that as the distance deviates from that of consistent exemplars, the cost increases exponentially.
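A possible shape for such a link cost; the exponential growth matches the slide's description, but the exact functional form and the `sigma` constant are assumptions, not the paper's published formula:

```python
import numpy as np

def consistency_cost(base_a, base_b, ref_dist, sigma=10.0):
    """Sketch of a link consistency cost: zero when the 2D distance
    between two limb base points equals the reference distance seen
    in consistent exemplars, growing exponentially as it deviates."""
    d = np.linalg.norm(np.asarray(base_a, float) - np.asarray(base_b, float))
    return np.exp(abs(d - ref_dist) / sigma) - 1.0
```

The total consistency score of a combination is then the sum of this cost over the five links listed on slide 39.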
  42. Matching part exemplars
     - Finally, we define the total cost of a combination as a weighted sum of:
       - the individual limb "fit scores", and
       - the sum of the consistency scores over all links.
     - The weights are determined manually.
     - The combination with the lowest overall cost is selected.
  43. Example of matching part exemplars
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  44. Agenda
     - Motivation and goals
     - The model
     - The basic pose estimation method
       - Point sampling
       - The shape context & generalized shape context
       - The point matching
       - Shape deformation
     - Scaling the algorithm to large image databases
     - Matching part exemplars
     - 3D model estimation
     - Some results
  45. Estimating the 3D configuration
     - We now want to build a 3D "stick model" in the pose of the person in the query image.
     - The method relies on simple geometry and assumes an orthographic camera model.
     - It assumes we know the following:
       - The image coordinates of the key points (obtained by the algorithm of the previous sections)
       - The relative lengths of the segments connecting these key points (simply the proportions of human body parts)
       - For each segment, a label marking the "closer endpoint"
         - We assume these labels are supplied on the exemplars and automatically transferred after the matching process.
  46. Estimating the 3D configuration – cont'd
     - We can recover the configuration in 3D space up to a scale factor s.
     - For every segment, the orthographic projection relates the known image displacement and relative length to the unknown depth difference.
     - For every segment, one endpoint position is known:
       - Since the configuration is connected, we fix one key point (say, the head) and iteratively compute the other key points by traversing the segments.
     - The system is solvable once s is fixed.
       - There is a lower bound on s (otherwise the depth difference dZ would be complex).
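Under orthography each segment yields one depth difference up to sign (the sign comes from the "closer endpoint" label). This sketch implements that relation and the resulting lower bound on the scale s; the function names are illustrative:

```python
import math

def delta_z(du, dv, length, s):
    """Depth difference along one segment under orthographic
    projection at scale s. From length^2 = (du^2 + dv^2)/s^2 + dZ^2:
    dZ = sqrt(length^2 - (du^2 + dv^2)/s^2).
    The 'closer endpoint' label supplies the sign of dZ."""
    val = length * length - (du * du + dv * dv) / (s * s)
    if val < 0:
        raise ValueError("scale s is below its lower bound for this segment")
    return math.sqrt(val)

def min_scale(segments):
    """Smallest s that keeps every segment's dZ real:
    s >= sqrt(du^2 + dv^2) / length for each (du, dv, length)."""
    return max(math.hypot(du, dv) / length for du, dv, length in segments)
```

Starting from the fixed key point, traversing the segments and accumulating each signed dZ yields the full 3D stick figure.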
  47. Agenda
     - Motivation and goals
     - The model
     - The basic pose estimation method
       - Point sampling
       - The shape context & generalized shape context
       - The point matching
       - Shape deformation
     - Scaling the algorithm to large image databases
     - Matching part exemplars
     - 3D model estimation
     - Some results
  48. Results of creating the 3D model
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  49. Results of creating the 3D model
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  50. Questions
     - Now's the time for your questions...
     - (Example taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt)
  51. Bibliography & credits
     - Some results and a few slides were taken from Mori's webpage:
       - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
     - A slightly different version of the paper can also be found there:
       - http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/mori-eccv02.pdf
