Pami meanshift

604 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 5, MAY 2002

Fig. 1. Example of a feature space. (a) A 400 × 276 color image. (b) Corresponding L*u*v* color space with 110,400 data points.

...some proximity measure. See [28, Section 3.2] for a survey of hierarchical clustering methods. The hierarchical methods tend to be computationally expensive, and the definition of a meaningful stopping criterion for the fusion (or division) of the data is not straightforward.

The rationale behind the density estimation-based nonparametric clustering approach is that the feature space can be regarded as the empirical probability density function (p.d.f.) of the represented parameter. Dense regions in the feature space thus correspond to local maxima of the p.d.f., that is, to the modes of the unknown density. Once the location of a mode is determined, the cluster associated with it is delineated based on the local structure of the feature space [25], [60], [63].

Our approach to mode detection and clustering is based on the mean shift procedure, proposed in 1975 by Fukunaga and Hostetler [21] and largely forgotten until Cheng's paper [7] rekindled interest in it. In spite of its excellent qualities, the mean shift procedure does not seem to be known in the statistical literature. While the book [54, Section 6.2.2] discusses [21], the advantages of employing a mean shift type procedure in density estimation were only recently rediscovered [8].

As will be proven in the sequel, a computational module based on the mean shift procedure is an extremely versatile tool for feature space analysis and can provide reliable solutions for many vision tasks. In Section 2, the mean shift procedure is defined and its properties are analyzed. In Section 3, the procedure is used as the computational module for robust feature space analysis and implementational issues are discussed. In Section 4, the feature space analysis technique is applied to two low-level vision tasks: discontinuity preserving filtering and image segmentation. Both algorithms can have as input either gray level or color images, and the only parameter to be tuned by the user is the resolution of the analysis. The applicability of the mean shift procedure is not restricted to the presented examples. In Section 5, other applications are mentioned and the procedure is put into a more general context.

2 THE MEAN SHIFT PROCEDURE

Kernel density estimation (known as the Parzen window technique in the pattern recognition literature [17, Section 4.3]) is the most popular density estimation method. Given n data points x_i, i = 1, ..., n, in the d-dimensional space R^d, the multivariate kernel density estimator with kernel K(x) and a symmetric positive definite d × d bandwidth matrix H, computed at the point x, is given by

  f^(x) = (1/n) ∑_{i=1}^{n} K_H(x − x_i),    (1)

where

  K_H(x) = |H|^{−1/2} K(H^{−1/2} x).    (2)

The d-variate kernel K(x) is a bounded function with compact support satisfying [62, p. 95]

  ∫_{R^d} K(x) dx = 1,    lim_{‖x‖→∞} ‖x‖^d K(x) = 0,
  ∫_{R^d} x K(x) dx = 0,    ∫_{R^d} x xᵀ K(x) dx = c_K I,    (3)

where c_K is a constant. The multivariate kernel can be generated from a symmetric univariate kernel K_1(x) in two different ways:

  K^P(x) = ∏_{i=1}^{d} K_1(x_i),    K^S(x) = a_{k,d} K_1(‖x‖),    (4)

where K^P(x) is obtained from the product of the univariate kernels and K^S(x) from rotating K_1(x) in R^d, i.e., K^S(x) is radially symmetric. The constant a_{k,d}^{−1} = ∫_{R^d} K_1(‖x‖) dx assures that K^S(x) integrates to one, though this condition can be relaxed in our context. Either type of multivariate kernel obeys (3) but, for our purposes, the radially symmetric kernels are often more suitable.

We are interested only in a special class of radially symmetric kernels satisfying

  K(x) = c_{k,d} k(‖x‖²),    (5)

in which case it suffices to define the function k(x), called the profile of the kernel, only for x ≥ 0. The normalization constant c_{k,d}, which makes K(x) integrate to one, is assumed strictly positive.

Using a fully parameterized H increases the complexity of the estimation [62, p. 106] and, in practice, the bandwidth matrix H is chosen either as diagonal, H = diag[h_1², ..., h_d²], or proportional to the identity matrix, H = h²I.
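With H = h²I, the general estimator (1) depends on a single bandwidth and, for a radially symmetric kernel of the form (5), only on the profile k. As a concrete illustration, the estimator amounts to a few lines of NumPy; this is a minimal sketch, not the paper's implementation — the function names are ours, and the argument `c` stands in for the normalization constant c_{k,d}:

```python
import numpy as np

def epanechnikov_profile(x):
    # Epanechnikov profile: k_E(x) = 1 - x on [0, 1], zero beyond;
    # note it is not differentiable at the boundary x = 1.
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def normal_profile(x):
    # Normal profile: k_N(x) = exp(-x / 2).
    return np.exp(-0.5 * x)

def kde(x, points, h, profile=normal_profile, c=1.0):
    # Profile form of the kernel density estimate at x:
    #   f_hat(x) = (c / (n * h**d)) * sum_i k(||(x - x_i) / h||**2)
    # `c` plays the role of the normalization constant c_{k,d}.
    n, d = points.shape
    sq_norms = np.sum(((x - points) / h) ** 2, axis=1)
    return c / (n * h ** d) * np.sum(profile(sq_norms))
```

Evaluated on samples drawn around the origin, the estimate is large near the cluster and decays far from it, as expected of a density estimate.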
COMANICIU AND MEER: MEAN SHIFT: A ROBUST APPROACH TOWARD FEATURE SPACE ANALYSIS 605

The clear advantage of the latter case (H = h²I) is that only one bandwidth parameter h > 0 must be provided; however, as can be seen from (2), the validity of a Euclidean metric for the feature space should then be confirmed first. Employing only one bandwidth parameter, the kernel density estimator (1) becomes the well-known expression

  f^(x) = (1/(n h^d)) ∑_{i=1}^{n} K((x − x_i)/h).    (6)

The quality of a kernel density estimator is measured by the mean of the square error between the density and its estimate, integrated over the domain of definition. In practice, however, only an asymptotic approximation of this measure (denoted AMISE) can be computed. Under the asymptotics, the number of data points n → ∞, while the bandwidth h → 0 at a rate slower than n^{−1}. For both types of multivariate kernels, the AMISE measure is minimized by the Epanechnikov kernel [51, p. 139], [62, p. 104], having the profile

  k_E(x) = 1 − x   for 0 ≤ x ≤ 1,
  k_E(x) = 0       for x > 1,    (7)

which yields the radially symmetric kernel

  K_E(x) = (1/2) c_d^{−1} (d + 2)(1 − ‖x‖²)   if ‖x‖ ≤ 1,
  K_E(x) = 0   otherwise,    (8)

where c_d is the volume of the unit d-dimensional sphere. Note that the Epanechnikov profile is not differentiable at the boundary. The profile

  k_N(x) = exp(−x/2),   x ≥ 0,    (9)

yields the multivariate normal kernel

  K_N(x) = (2π)^{−d/2} exp(−‖x‖²/2)    (10)

for both types of composition (4). The normal kernel is often symmetrically truncated to have a kernel with finite support. While these two kernels will suffice for most applications we are interested in, all the results presented below are valid for arbitrary kernels within the conditions to be stated. Employing the profile notation, the density estimator (6) can be rewritten as

  f^_{h,K}(x) = (c_{k,d}/(n h^d)) ∑_{i=1}^{n} k(‖(x − x_i)/h‖²).    (11)

The first step in the analysis of a feature space with the underlying density f(x) is to find the modes of this density. The modes are located among the zeros of the gradient, ∇f(x) = 0, and the mean shift procedure is an elegant way to locate these zeros without estimating the density.

2.1 Density Gradient Estimation

The density gradient estimator is obtained as the gradient of the density estimator by exploiting the linearity of (11):

  ∇f^_{h,K}(x) = (2 c_{k,d}/(n h^{d+2})) ∑_{i=1}^{n} (x − x_i) k′(‖(x − x_i)/h‖²).    (12)

We define the function

  g(x) = −k′(x),    (13)

assuming that the derivative of the kernel profile k exists for all x ∈ [0, ∞), except for a finite set of points. Now, using g(x) for the profile, the kernel G(x) is defined as

  G(x) = c_{g,d} g(‖x‖²),    (14)

where c_{g,d} is the corresponding normalization constant. The kernel K(x) was called the shadow of G(x) in [7], in a slightly different context. Note that the Epanechnikov kernel is the shadow of the uniform kernel, i.e., the d-dimensional unit sphere, while the normal kernel and its shadow have the same expression.

Introducing g(x) into (12) yields

  ∇f^_{h,K}(x) = (2 c_{k,d}/(n h^{d+2})) ∑_{i=1}^{n} (x_i − x) g(‖(x − x_i)/h‖²)
              = (2 c_{k,d}/(n h^{d+2})) [ ∑_{i=1}^{n} g(‖(x − x_i)/h‖²) ] [ (∑_{i=1}^{n} x_i g(‖(x − x_i)/h‖²)) / (∑_{i=1}^{n} g(‖(x − x_i)/h‖²)) − x ],    (15)

where ∑_{i=1}^{n} g(‖(x − x_i)/h‖²) is assumed to be a positive number. This condition is easy to satisfy for all the profiles met in practice. Both terms of the product in (15) have special significance. From (11), the first term is proportional to the density estimate at x computed with the kernel G:

  f^_{h,G}(x) = (c_{g,d}/(n h^d)) ∑_{i=1}^{n} g(‖(x − x_i)/h‖²).    (16)

The second term is the mean shift

  m_{h,G}(x) = (∑_{i=1}^{n} x_i g(‖(x − x_i)/h‖²)) / (∑_{i=1}^{n} g(‖(x − x_i)/h‖²)) − x,    (17)

i.e., the difference between the weighted mean, using the kernel G for weights, and x, the center of the kernel (window). From (16) and (17), (15) becomes

  ∇f^_{h,K}(x) = f^_{h,G}(x) (2 c_{k,d}/(h² c_{g,d})) m_{h,G}(x),    (18)

yielding

  m_{h,G}(x) = (1/2) h² c (∇f^_{h,K}(x) / f^_{h,G}(x)).    (19)

The expression (19) shows that, at location x, the mean shift vector computed with kernel G is proportional to the normalized density gradient estimate obtained with kernel K. The normalization is by the density estimate at x computed with the kernel G. The mean shift vector thus always points toward the direction of maximum increase in the density. This is a more general formulation of the property first remarked by Fukunaga and Hostetler [20, p. 535], [21], and discussed in [7]. The relation captured in (19) is intuitive: the local mean is shifted toward the region in which the majority of the
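For the Epanechnikov profile (7), g = −k_E′ is the uniform profile, so the weighted mean in (17) reduces to the plain mean of the samples inside the window; iterating x ← x + m_{h,G}(x) then climbs the estimated density gradient, as (19) suggests. The NumPy sketch below assumes that choice of kernel; the function names and tolerance are ours, not the paper's:

```python
import numpy as np

def mean_shift_vector(x, points, h):
    # Mean shift vector m_{h,G}(x) for the Epanechnikov kernel:
    # g = -k_E' is the uniform profile, so the weighted mean reduces
    # to the plain mean of the samples inside the ball of radius h
    # centered at x.
    sq = np.sum(((x - points) / h) ** 2, axis=1)
    inside = sq <= 1.0
    if not inside.any():
        raise ValueError("no samples within bandwidth h of x")
    return points[inside].mean(axis=0) - x

def seek_mode(x, points, h, tol=1e-6, max_iter=500):
    # Translate the window by the mean shift vector until it stalls;
    # each step moves in the direction of maximum increase of the
    # density estimate.
    for _ in range(max_iter):
        m = mean_shift_vector(x, points, h)
        x = x + m
        if np.linalg.norm(m) < tol:
            break
    return x
```

Started inside the basin of attraction of a cluster, the iteration settles near that cluster's mode and never drifts to a distant one, since samples outside the window carry zero weight.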