Recovering 3D human body configurations using shape contexts Greg Mori & Jitendra Malik Presented by Joseph Vainshtein Winter 2007
Agenda Motivation and goals The Framework The Basic pose estimation method Pose estimation Estimate joint locations (deformation) Scaling to large image databases Using part exemplars 3D Model estimation Some Results
Motivation We receive an image of a person as input. What is the person in the image doing?
Motivation – continued We know that there is a person in the input image. We want to recover the person's body posture to understand the image (what the person in the image is doing). If we had a database of many people in various poses, we could compare our image to those images. But – it's not so simple…
Goals Given an input image of a person: Estimate body posture (joint locations) Build a 3D model. Examples taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Agenda Motivation and goals The Framework The Basic pose estimation method Pose estimation Estimate joint locations (deformation) Scaling to large image databases Using part exemplars 3D Model estimation Some Results
The Framework We assume that we have a database of images of people in various poses, photographed from different angles.  In each image in the database, 14 joint locations are manually marked (wrists, elbows, shoulders, hips, knees, ankles, head, waist)
Agenda Motivation and goals The Framework The Basic pose estimation method Pose estimation Estimate joint locations (deformation) Scaling to large image databases Using part exemplars 3D Model estimation Some Results
The basic estimation algorithm - intuition In the basic estimation algorithm, we will attempt to deform each image from the database into the input image and compute a "fit score". Later we will see how to do this more efficiently. (Slide figure: query image vs. database image.)
The basic estimation algorithm We want to test our input image against some image from our database and obtain a "fit score". Edge detection is applied to each of the two images, and points are sampled from the resulting boundaries (300-1000 points). From now on, we will only work with these points.
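As a rough sketch (not the authors' exact pipeline), the edge detection and boundary sampling step could look like the following; the Canny thresholds and the sample count are illustrative assumptions.

```python
import cv2
import numpy as np

def sample_boundary_points(image_gray, n_samples=300, canny_lo=100, canny_hi=200):
    """Detect edges and sample roughly n_samples points from them.
    Thresholds and sampling strategy are illustrative, not the paper's exact choices."""
    edges = cv2.Canny(image_gray, canny_lo, canny_hi)        # binary edge map
    ys, xs = np.nonzero(edges)                               # edge pixel coordinates
    idx = np.random.choice(len(xs), size=min(n_samples, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)              # (n, 2) array of (x, y) points
```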
The basic estimation algorithm The deformation process consists of: finding a correspondence between the points sampled from both images (for every point sampled from the boundary of the exemplar image, find the "best" point on the boundary of the input image), then finding a deformation of the exemplar points into the input image. This is repeated for several iterations.
The shape context A term we will use: shape contexts. Shape contexts are point descriptors; each describes the shape around its point. In the algorithm we will use a variation, generalized shape contexts, but first we will see the simpler variant.
Shape context (simple version) The radii of the binning structure grow with distance from the point because we want closer points to have more effect on the descriptor (SC). We count the number of points in each histogram bin (e.g. Count = 4, Count = 10 in the slide figure). Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Generalized shape context A variation on the regular shape contexts: we sum the tangent vectors falling into each bin instead of counting points. In the slide figure, the gray arrows are the tangent vectors of the sampled points and the blue ones are the (normalized) histogram bin values. For K bins we build a 2K-dimensional vector.
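A minimal sketch of the generalized shape context described above: for each point, the tangent vectors of the other points are summed into log-polar bins, giving a 2K-dimensional descriptor. The bin counts and the log-radius limits here are assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np

def generalized_shape_context(points, tangents, index, n_r=5, n_theta=12,
                              r_min=0.125, r_max=2.0):
    """Descriptor for points[index]: sum unit tangent vectors into K = n_r * n_theta
    log-polar bins around the point, returning a normalized 2K-dim vector.
    Bin parameters are illustrative assumptions."""
    p = points[index]
    d = np.delete(points, index, axis=0) - p           # offsets to all other points
    t = np.delete(tangents, index, axis=0)             # their unit tangent vectors
    r = np.linalg.norm(d, axis=1)
    mean_r = r.mean()                                  # normalize radii for scale invariance
    r_bin = np.digitize(np.log(r / mean_r + 1e-12),
                        np.linspace(np.log(r_min), np.log(r_max), n_r + 1)) - 1
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    t_bin = (theta / (2 * np.pi) * n_theta).astype(int) % n_theta
    desc = np.zeros((n_r, n_theta, 2))
    for rb, tb, vec in zip(r_bin, t_bin, t):
        if 0 <= rb < n_r:                              # ignore points outside the radial range
            desc[rb, tb] += vec                        # sum tangent vectors per bin
    return (desc / (np.linalg.norm(desc) + 1e-12)).ravel()   # 2K-dimensional vector
```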
The matching We want to find, for every point on the exemplar image, its corresponding point in the query image. For each point in the exemplar and query images, a generalized shape context is calculated; points with similar descriptors should be matched. Bipartite matching is used for this.
The bipartite matching We construct a weighted complete bipartite graph. The nodes on the two sides represent the points sampled from the two images, and the weight of an edge represents the cost of matching the corresponding sample points. To deal with outliers, we add to each side several "artificial" nodes, connected to every node on the other side with a fixed cost. We then find the lowest-cost perfect matching in this graph; one (simple) option is the Hungarian algorithm. The exemplar with the lowest matching cost is selected. (Slide figure: points sampled from the exemplar; points sampled from the query image.)
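A hedged sketch of this correspondence step: build a cost matrix from descriptor distances, pad it with dummy rows and columns at a fixed outlier cost, and solve the assignment with the Hungarian method (here via scipy). The Euclidean distance and the outlier cost value are assumptions, not the paper's exact choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_points(desc_exemplar, desc_query, outlier_cost=0.5):
    """Return (exemplar_idx, query_idx) pairs of matched sample points plus total cost.
    desc_* are (n, D) descriptor arrays; outlier_cost is an assumed constant."""
    cost = cdist(desc_exemplar, desc_query, metric='euclidean')  # pairwise matching costs
    n, m = cost.shape
    size = n + m                                                 # pad to allow outliers on both sides
    padded = np.full((size, size), outlier_cost)
    padded[:n, :m] = cost                                        # real costs in the top-left block
    rows, cols = linear_sum_assignment(padded)                   # min-cost perfect matching
    pairs = [(r, c) for r, c in zip(rows, cols) if r < n and c < m]
    total_cost = padded[rows, cols].sum()
    return pairs, total_cost
```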
The deformable matching Our mission now is to estimate the joint locations in the input image. We have the point pairs obtained from the matching. We rely on the anatomical kinematic chain as the basis for our deformation model; the kinematic chain consists of 9 segments: the torso, upper and lower arms, and upper and lower legs. Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
The deformable matching – cont'd First of all, we determine for each exemplar point the segment it belongs to. For this we connect the joint locations by lines, and each point is assigned to the segment to which it is closest. The chosen segment is recorded for every point.
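To illustrate the segment assignment, here is a small sketch that computes point-to-segment distances and picks the closest segment for each sampled point; the segment endpoints come from the marked joint locations. The data layout is an assumption.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all 2D numpy arrays)."""
    ab, ap = b - a, p - a
    t = np.clip(np.dot(ap, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def assign_points_to_segments(points, segments):
    """segments: list of (joint_a, joint_b) endpoint pairs forming the kinematic chain.
    Returns, for each sampled point, the index of the closest segment."""
    return np.array([
        int(np.argmin([point_to_segment_distance(p, a, b) for (a, b) in segments]))
        for p in points
    ])
```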
The deformation model – cont'd We allow translation of the torso points, and rotation of the other segments around their joints (upper legs around the hips, lower legs around the knees, arms around the shoulders and elbows, etc.). General idea: find the optimal (in the least-squares sense) translation for the torso points; find the optimal rotation for the upper legs and upper arms around the hips and shoulders; find the optimal rotation of the lower legs and lower arms around the knees and elbows. After we find the optimal deformation for all points, we can apply it to the joints and obtain the joint locations in the query image.
The deformation model – cont'd The optimal (in the least-squares sense) translation for the torso points has a simple closed-form solution.
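A sketch of the elided formula, under the standard least-squares interpretation named on the slide: if exemplar torso points $p_i$ are matched to query points $q_i$, the translation minimizing the summed squared error is the mean displacement.

```latex
T^{*} \;=\; \arg\min_{T} \sum_{i \in \text{torso}} \bigl\lVert (p_i + T) - q_i \bigr\rVert^{2}
       \;=\; \frac{1}{|\text{torso}|} \sum_{i \in \text{torso}} (q_i - p_i)
```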
The deformation model – cont'd For all the other segments, we seek a rotational deformation around the relevant joint that minimizes the least-squares distances. Given the deformation applied so far and the (deformed) joint location of the segment, the optimal rotation also has a closed-form solution.
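Similarly, a sketch of the rotation step: for the segment's points $p_i$ (already carrying the deformation applied so far), matched to query points $q_i$, rotating about the joint location $c$, the least-squares angle has the closed form below, where $\times$ denotes the scalar 2D cross product. This is the standard 2D least-squares rotation about a fixed center, offered here as a sketch rather than the paper's exact notation.

```latex
\theta^{*} = \arg\min_{\theta} \sum_{i} \bigl\lVert R_{\theta}(p_i - c) + c - q_i \bigr\rVert^{2},
\qquad
\theta^{*} = \operatorname{atan2}\!\Bigl( \sum_i (p_i - c) \times (q_i - c),\;
                                          \sum_i (p_i - c) \cdot (q_i - c) \Bigr)
```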
The deformation model – cont'd The process (point matching and deformation) is repeated for a small number of iterations. The joint locations in the input image are found by applying the optimal deformation to the joints of the exemplar. We also get a score for the fit: the matching cost of the optimal assignment.
A matching and deformation example Query image points Exemplar points. Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
A matching and deformation example Iterations 1, 2 and 3 (matching, then deformation). Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Agenda Motivation and goals The Framework The Basic pose estimation method Pose estimation Estimate joint locations (deformation) Scaling to large image databases Using part exemplars 3D Model estimation Some Results
Scaling to large exemplar databases The simplest algorithm one can think of: run the basic algorithm on all images in the database, obtain a matching score for each one, and choose the image with the best score. This is not practical for systems with large exemplar databases, which are needed if we do not want to restrict the algorithm to specific body postures. We will present a method to solve this.
Scaling to large exemplar databases The idea: if the query image and the exemplar image are very different, there is no need to run the smart and expensive algorithm to find out that this is a bad fit. Solution: use a pruning algorithm to obtain a shortlist of "good" candidate images, then run the expensive, more accurate algorithm on each of them.
The pruning algorithm For each exemplar in the database, we precompute a large number of shape contexts. For the query image we compute only a small number r of representative shape contexts. These will be enough to "disqualify" bad candidates.
The pruning algorithm – cont'd For each of the r representatives, we find its best match among the shape contexts precomputed for each exemplar. The distance between shape-context vectors is computed using the same formula as the matching cost.
The pruning algorithm – cont'd Now we estimate the distance between the two shapes as a normalized sum of the matching costs of the r representative points. A normalizing factor is used so that if representative number u was not a good representative point, it has less effect on the cost.
The pruning algorithm – cont'd The shortlist of candidates is selected by sorting the exemplars by their distance from the query image. The basic algorithm is then performed on the shortlist to find the best match.
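A rough sketch of the pruning step under the description above: each query representative is matched to its best-scoring precomputed shape context in every exemplar, the per-representative costs are combined with a normalizing weight, and the exemplars are sorted by this estimated distance. The uniform default weighting and the distance function are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def shortlist(query_reprs, exemplar_sc_db, k=10, repr_weights=None):
    """query_reprs: (r, D) representative descriptors of the query image.
    exemplar_sc_db: list of (N_i, D) precomputed descriptor arrays, one per exemplar.
    Returns the indices of the k exemplars with the smallest estimated distance.
    The uniform default weighting is an assumption."""
    r = len(query_reprs)
    w = np.ones(r) / r if repr_weights is None else repr_weights   # normalizing factors
    dists = []
    for sc_i in exemplar_sc_db:
        best = cdist(query_reprs, sc_i).min(axis=1)   # best match per representative
        dists.append(np.dot(w, best))                 # weighted (normalized) sum of costs
    return np.argsort(dists)[:k]
```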
Selecting shortlist – example Query image Top 10 candidates example taken from the paper
Agenda Motivation and goals The Framework The Basic pose estimation method Pose estimation Estimate joint locations (deformation) Scaling to large image databases Using part exemplars 3D Model estimation Some Results
Matching part exemplars - motivation When the algorithm presented above is used in a general matching framework (not restricted to specific body positions and camera angles), a very large image database is needed to succeed. In this section we will show a method to reduce the exemplar database needed to match the shape; this will also reduce runtime.
Matching part exemplars - intuition The idea here is not to match the entire shape, but to match the different body parts independently; the resulting match might include body parts matched from different images. We allow six "limbs" as body parts: left and right arms, left and right legs, waist, head.
Example of matching part exemplars Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Matching part exemplars – cont'd The matching process starts similarly to the algorithm from the previous section, with the difference that the score is not computed for the entire shape; instead, a matching score is computed separately for each limb. Matching a limb from an exemplar to the corresponding limb in the query image yields a limb matching and its matching score. (Slide figure: points sampled from the i-th exemplar; points sampled from the query image.)
Matching part exemplars – cont'd We now want to combine these separately matched limbs into a match for the entire shape. The first idea that comes to mind is simply to choose, for each limb, the exemplar with the best score. This is not a good idea, since in this simple manner nothing forces the combination to be consistent. Solution: define a measure of consistency for a combination, then create a score that takes into account both the consistency score and the individual matching scores of the limbs.
Matching part exemplars: consistency score A combination is consistent if the limbs are at "proper" distances from one another. Our measure of consistency will use the distances between limb base points (shoulders for arms, hips for legs; for the waist and head the base is simply the marked point). We will enforce the following distances to be "proper": left arm – head, right arm – head, waist – head, left leg – waist, right leg – waist.
Matching part exemplars: consistency score – cont'd A combination of two limbs is consistent if the distance between them in the combination is comparable to the distance between those limbs in the original images. The consistency score of a combination is the sum of the consistency scores across the links. For each link, we try all matching options and compute the distance between the bases in every option; this could even be computed in advance.
Matching part exemplars: consistency score – cont'd We define a consistency cost for combining a limb from one exemplar with a limb from another exemplar along a link, based on the 2D distance between their limb bases. Note that as this distance deviates from the distances seen in consistent exemplars, the cost increases exponentially.
Matching part exemplars Finally, we define the total cost of a combination as a weighted sum of the individual limb "fit scores" and the consistency scores over all links; the two weights are determined manually. The combination with the lowest overall cost is selected.
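A hedged sketch of this combination step: given per-limb candidate matches and a consistency cost on the five links listed above, the total cost weights the limb fit scores against the link consistency scores, and the lowest-cost combination is kept. The exponential consistency form, the weights alpha and beta, and the brute-force search are illustrative assumptions, not the paper's exact definitions.

```python
import itertools
import numpy as np

LINKS = [("left_arm", "head"), ("right_arm", "head"), ("waist", "head"),
         ("left_leg", "waist"), ("right_leg", "waist")]

def consistency_cost(base_a, base_b, ref_dist, sigma=20.0):
    """Assumed form: grows exponentially as the 2D base distance deviates
    from the reference distance observed in the exemplars (ref_dist)."""
    d = np.linalg.norm(np.asarray(base_a) - np.asarray(base_b))
    return np.exp(abs(d - ref_dist) / sigma) - 1.0

def best_combination(candidates, ref_dists, alpha=1.0, beta=1.0):
    """candidates: dict limb -> list of (fit_score, base_point) candidate matches.
    ref_dists: dict link -> reference base distance taken from the exemplars.
    Brute-force search over all combinations (fine for short candidate lists)."""
    limbs = list(candidates.keys())
    best, best_cost = None, np.inf
    for choice in itertools.product(*(range(len(candidates[l])) for l in limbs)):
        picked = {l: candidates[l][i] for l, i in zip(limbs, choice)}
        fit = sum(score for score, _ in picked.values())
        cons = sum(consistency_cost(picked[a][1], picked[b][1], ref_dists[(a, b)])
                   for a, b in LINKS)
        total = alpha * fit + beta * cons          # weighted sum of fit and consistency
        if total < best_cost:
            best, best_cost = picked, total
    return best, best_cost
```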
Example of matching part exemplars Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Agenda Motivation and goals The Model The Basic pose estimation method Point sampling The shape context & generalized shape context The point matching Shape deformation Scaling the algorithm to large image databases Matching part exemplars 3D Model estimation Some Results
Estimating 3D configuration We now want to build a 3D "stick model" in the pose of the person in the query image. The method we use relies on simple geometry and assumes the orthographic camera model. It assumes we know the following: the image coordinates of the key points (we obtained these with the algorithm from the previous sections), the relative lengths of the segments connecting these key points (these are simply the proportions of human body parts), and, for each segment, a labeling of the "closer endpoint" (we assume these labels are supplied on the exemplars and transferred automatically after the matching process).
Estimating 3D configuration – cont'd We can find the configuration in 3D space up to some scaling factor s. For every segment we have a relation between its known length and its projected endpoints, and one endpoint position is known. Since the configuration is connected, we fix one keypoint (let's say the head) and iteratively compute the other keypoints by traversing the segments. The system is solvable (if s is also fixed), and there is a bound on s (because dZ must be real, not complex).
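A sketch of this 3D step under scaled orthographic projection: if a segment of known relative length l projects to an image displacement (du, dv), then dZ = ±sqrt(l^2 - (du^2 + dv^2)/s^2), with the sign taken from the "closer endpoint" label; s must be large enough that the square root is real for every segment, which gives the bound mentioned on the slide. The traversal order, data layout, and sign convention here are assumptions.

```python
import numpy as np

def min_scale(segments):
    """Smallest scale s for which every dZ is real: s >= sqrt(du^2+dv^2)/l per segment."""
    return max(np.hypot(du, dv) / l for (_, _, du, dv, l, _) in segments)

def reconstruct_3d(root_xy, segments, s):
    """segments: list of (parent, child, du, dv, length, child_closer) tuples,
    ordered so each parent is reconstructed before its children (tree traversal).
    du, dv are image displacements child - parent; child_closer is the supplied label.
    Returns a dict keypoint -> (X, Y, Z), up to the scale factor s."""
    root = segments[0][0]
    pts = {root: (root_xy[0] / s, root_xy[1] / s, 0.0)}   # fix the root (e.g. the head)
    for parent, child, du, dv, l, child_closer in segments:
        X, Y, Z = pts[parent]
        dz = np.sqrt(max(l ** 2 - (du ** 2 + dv ** 2) / s ** 2, 0.0))
        dz = -dz if child_closer else dz        # sign from the "closer endpoint" label (convention assumed)
        pts[child] = (X + du / s, Y + dv / s, Z + dz)
    return pts
```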
Agenda Motivation and goals The Model The Basic pose estimation method Point sampling The shape context & generalized shape context The point matching Shape deformation Scaling the algorithm to large image databases Matching part exemplars 3D Model estimation Some Results
Results of creating 3D model Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Results of creating 3D model Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Questions Now's the time for your questions… ? Example taken from Mori's webpage - www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt
Bibliography & credits Some results and a few slides were taken from Mori's webpage: www.cs.sfu.ca/~mori/research/papers/mori_mecv01.ppt A slightly different version of the paper can also be found there: http://www.cs.sfu.ca/~mori/courses/cmpt882/papers/mori-eccv02.pdf
