
- 1. Advanced Information Theory in CVPR "in a Nutshell". CVPR Tutorial, June 13-18 2010, San Francisco, CA. Interest Points and Method of Types: Visual Localization & Texture Categorization. Francisco Escolano
- 2. Interest Points
Background. Since the proposal of the SIFT detector and descriptor [Lowe,04], a big bang of interest-point detectors ensuring some sort of invariance to zoom/scale, rotation and perspective distortion has emerged from the classical Harris detector. These detectors, which typically include a multi-scale analysis of the image, include Harris Affine, MSER [Matas et al,02] and SURF [Bay et al,08]. An intensive comparison of the different detectors (and descriptors), mainly in terms of spatio-temporal stability (repeatability, distinctiveness and robustness), remains a classic challenge [Mikolajczyk et al, 05]. Stability experiments are key to predicting the future behavior of the detector/descriptor in subsequent tasks (bag-of-words recognition, matching, ...). 2/32
- 3. Entropy Saliency Detector
Local saliency, in contrast to global saliency [Ullman, 96], means local distinctiveness (outstanding/popout pixel distributions over the image) [Julesz,81][Nothdurft,00]. In Computer Vision, a mild IT definition of local saliency is linked to visual unpredictability [Kadir and Brady,01]. Then a salient region is locally unpredictable (measured by entropy), and this is consistent with a peak of entropy in scale-space. Scale-space analysis is key because we do not know the scale of regions beforehand. In addition, isotropic detections may be extended to affine detectors with an extra computational cost (see Alg. 1). 3/32
- 4. Entropy Saliency Detector (2)
Alg. 1: Kadir and Brady scale saliency algorithm
Input: Input image I, initial scale smin, final scale smax
for each pixel x do
    for each scale s between smin and smax do
        Calculate local entropy HD(s, x) = − Σ_{i=1..L} Ps,x(di) log2 Ps,x(di)
    end
    Choose the set of scales at which entropy is a local maximum:
        Sp = {s : HD(s − 1, x) < HD(s, x) > HD(s + 1, x)}
    for each scale s between smin and smax do
        if s ∈ Sp then
            Entropy weight calculation by means of a self-dissimilarity measure in scale space:
                WD(s, x) = (s^2 / (2s − 1)) Σ_{i=1..L} |Ps,x(di) − Ps−1,x(di)|
            Entropy weighting: YD(s, x) = HD(s, x) WD(s, x)
        end
    end
end
Output: A sparse three-dimensional matrix containing weighted local entropies for all pixels at those scales where entropy is peaked 4/32
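The loop structure of Alg. 1 can be sketched in Python. This is a minimal, unoptimized sketch assuming a single-channel 8-bit image and gray-level histograms as the local distribution Ps,x; all function and parameter names are mine, not from the slides:

```python
import numpy as np

def scale_saliency(image, s_min, s_max, n_bins=16):
    """Sketch of Kadir-Brady scale saliency (Alg. 1) for an 8-bit image.

    The histogram of a circular window of radius s plays the role of
    P_{s,x}; entropy peaks across scale are weighted by inter-scale
    self-dissimilarity. Returns {(y, x, s): Y_D(s, x)}.
    """
    h, w = image.shape
    scales = range(s_min, s_max + 1)

    def local_hist(y, x, s):
        # Circular neighborhood of radius s, clipped at image borders.
        ys, xs = np.ogrid[-s:s + 1, -s:s + 1]
        mask = ys**2 + xs**2 <= s**2
        y0, y1 = max(0, y - s), min(h, y + s + 1)
        x0, x1 = max(0, x - s), min(w, x + s + 1)
        patch = image[y0:y1, x0:x1]
        m = mask[s - (y - y0): s + (y1 - y), s - (x - x0): s + (x1 - x)]
        p, _ = np.histogram(patch[m], bins=n_bins, range=(0, 256))
        return p / p.sum()

    def entropy(p):
        nz = p[p > 0]
        return float(-np.sum(nz * np.log2(nz)))

    out = {}
    for y in range(h):
        for x in range(w):
            hist = {s: local_hist(y, x, s) for s in scales}
            H = {s: entropy(hist[s]) for s in scales}
            for s in range(s_min + 1, s_max):
                if H[s - 1] < H[s] > H[s + 1]:       # entropy peak in scale
                    W = (s**2 / (2 * s - 1)) * np.abs(hist[s] - hist[s - 1]).sum()
                    out[(y, x, s)] = H[s] * W        # Y_D(s, x)
    return out
```

A real implementation would vectorize the per-pixel histograms; the nested loops here simply mirror the pseudocode.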
- 5. Entropy Saliency Detector (and 3)
Figure: Isotropic (top) vs Affine (bottom) detections. 5/32
- 6. Learning and Chernoff Information
Scale-space analysis is thus one of the bottlenecks of the process. However, with prior knowledge of the statistics of the images being analyzed, it is possible to discard a significant number of pixels and thus avoid scale-space analysis.
Working hypothesis: If the local distribution around a pixel at scale smax is highly homogeneous (low entropy), one may assume that the same holds for scales s < smax; thus, scale-space peaks will not exist in this range of scales [Suau and Escolano,08].
Inspired by the statistical detection of edges [Konishi et al.,03] and contours [Cazorla & Escolano,03], and also by contour grouping [Cazorla et al.,02]. 6/32
- 7. Learning and Chernoff Information (2)
Relative entropy and threshold: a basic procedure consists of computing the ratio between the entropy at smax and the maximum of the entropies for all pixels at their smax.
Filtering by homogeneity along scale-space:
1. Calculate the local entropy HD for each pixel at scale smax.
2. Select an entropy threshold σ ∈ [0, 1].
3. X = {x | HD(x, smax) / maxx{HD(x, smax)} > σ}
4. Apply the scale saliency algorithm only to those pixels x ∈ X.
What is the optimal threshold σ? 7/32
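The four filtering steps above can be sketched as follows. This is a sketch with assumed names; square windows stand in for circular neighborhoods and an 8-bit image is assumed:

```python
import numpy as np

def filter_by_relative_entropy(image, s_max, sigma, n_bins=16):
    """Keep only pixels whose local entropy at the coarsest scale s_max,
    relative to the image-wide maximum, exceeds sigma in [0, 1].
    Only the returned pixels need the full scale-space saliency search."""
    h, w = image.shape
    H = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            # Step 1: local entropy H_D at scale s_max (square window sketch).
            y0, y1 = max(0, y - s_max), min(h, y + s_max + 1)
            x0, x1 = max(0, x - s_max), min(w, x + s_max + 1)
            p, _ = np.histogram(image[y0:y1, x0:x1], bins=n_bins,
                                range=(0, 256))
            p = p / p.sum()
            nz = p[p > 0]
            H[y, x] = -np.sum(nz * np.log2(nz))
    # Steps 2-3: threshold the relative entropy; step 4 runs Alg. 1 on the mask.
    return H / H.max() > sigma        # boolean mask X of retained pixels
```

The mask directly encodes the set X of step 3; the saliency algorithm is then run only where the mask is true.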
- 8. Learning and Chernoff Information (3)
Images belonging to the same image category or environment share similar intensity and texture distributions, so it seems reasonable to expect the entropy values of their most salient regions to lie in the same range.
On/Off distributions: pon(θ) defines the probability of a region being part of the most salient regions of the image given that its relative entropy value is θ, while poff(θ) defines the probability of a region not being part of the most salient regions of the image.
Then the maximum relative entropy σ with pon(σ) > 0 may be chosen as an entropy threshold for that image category, by finding a trade-off between false positives and false negatives. 8/32
- 9. Learning and Chernoff Information (4)
Figure: On (blue) / Off (red) distributions and thresholds. 9/32
- 10. Learning and Chernoff Information (5)
Chernoff Information: The expected error rate of a likelihood test based on pon(φ) and poff(φ) decreases exponentially with respect to C(pon(φ), poff(φ)), where:
C(p, q) = − min_{0≤λ≤1} log Σ_{j=1..J} p^λ(yj) q^{1−λ}(yj)
A related measure is the Bhattacharyya distance (Chernoff with λ = 1/2):
BC(p, q) = − log Σ_{j=1..J} p^{1/2}(yj) q^{1/2}(yj) 10/32
- 11. Learning and Chernoff Information (6)
Chernoff Bounds: A threshold T must be chosen for an image class so that any pixel from an image belonging to the same class may be discarded if log(pon(θ)/poff(θ)) < T, where
−D(poff(θ)||pon(θ)) < T < D(pon(θ)||poff(θ))
and D(·||·) is the Kullback-Leibler divergence:
D(p||q) = Σ_{j=1..J} p(yj) log (p(yj)/q(yj)) 11/32
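All three quantities are straightforward to evaluate for discrete distributions. Below is a small numerical sketch (names are mine; the Chernoff minimization over λ is done by a simple grid search, and the KL divergence assumes q > 0 wherever p > 0):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in bits for discrete pmfs.
    Assumes q > 0 wherever p > 0."""
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def chernoff_information(p, q, n_lambda=1001):
    """C(p, q) = -min_{0<=l<=1} log2 sum_j p^l(y_j) q^(1-l)(y_j),
    approximated by a grid search over lambda in [0, 1]."""
    lam = np.linspace(0.0, 1.0, n_lambda)
    vals = [np.log2(np.sum(p**l * q**(1.0 - l))) for l in lam]
    return float(-min(vals))

def bhattacharyya(p, q):
    """Bhattacharyya distance: the Chernoff integrand at lambda = 1/2."""
    return float(-np.log2(np.sum(np.sqrt(p * q))))
```

For a symmetric pair such as p = (0.9, 0.1), q = (0.1, 0.9) the minimizing λ is 1/2, so the Chernoff information coincides with the Bhattacharyya distance; in general C(p, q) is bounded above by both KL divergences.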
- 12. Learning and Chernoff Information (7)
Figure: Filtering for different image categories, T = 0. 12/32
- 13. Learning and Chernoff Information (8)

Test set         Chernoff |   T      % Points   % Time          |  T   % Points   % Time
airplanes side    0.415   |  -4.98   30.79%     42.12%  0.0943  |  0   60.11%     72.61%  2.9271
background        0.208   |  -2.33   15.89%     24.00%  0.6438  |  0   43.91%     54.39%  5.0290
bottles           0.184   |  -2.80    9.50%     20.50%  0.4447  |  0   23.56%     35.47%  1.9482
camel             0.138   |  -2.06   10.06%     20.94%  0.2556  |  0   40.10%     52.43%  4.2110
cars brad         0.236   |  -2.63   24.84%     36.57%  0.4293  |  0   48.26%     61.14%  3.4547
cars brad bg      0.327   |  -3.24   22.90%     34.06%  0.2091  |  0   57.18%     70.02%  4.1999
faces             0.278   |  -3.37   25.31%     37.21%  0.9057  |  0   54.76%     67.92%  8.3791
google things     0.160   |  -2.15   14.58%     25.48%  0.7444  |  0   40.49%     52.81%  5.7128
guitars           0.252   |  -3.11   15.34%     26.35%  0.2339  |  0   37.94%     50.11%  2.3745
houses            0.218   |  -2.62   16.09%     27.16%  0.2511  |  0   44.51%     56.88%  3.4209
leaves            0.470   |  -6.08   29.43%     41.44%  0.8699  |  0   46.60%     59.28%  3.0674
motorbikes side   0.181   |  -2.34   15.63%     27.64%  0.2947  |  0   38.62%     51.64%  3.7305

13/32
- 14. Learning and Chernoff Information (9)
Figure: Filtering in the Caltech101 database, T = Tmin vs T = 0. 14/32
- 15. Learning and Chernoff Information (10)
Chernoff & Classification Error: Following the Chernoff theorem, which exploits Sanov's theorem (quantifying the probability of rare events), for n i.i.d. samples distributed following Q, the probability of error for the test with hypotheses Q = pon and Q = poff is given by:
pe = πon 2^{−n D(pλ||pon)} + πoff 2^{−n D(pλ||poff)} = 2^{−n min{D(pλ||pon), D(pλ||poff)}},
where πon and πoff are the priors.
Choosing λ so that D(pλ||pon) = D(pλ||poff) = C(pon, poff), we have that the Chernoff information is the best achievable exponent in the Bayesian probability of error. 15/32
- 16. Coarse-to-fine Visual Localization
Problem Statement: A 6DOF SLAM method has built a 3D+2D map of an indoor/outdoor environment [Lozano et al, 09]. We have manually labeled 6 environments and trained a minimal-complexity supervised classifier (see next lesson) for performing coarse localization. We gathered the statistics of the images of each environment in order to infer their respective pon and poff distributions, and hence their Chernoff information and T bounds. Once a test image is submitted, it is classified and filtered according to Chernoff information. Then the keypoints are computed. 16/32
- 17. Coarse-to-fine Visual Localization (2)
Problem Statement (cont.): Using the SIFT descriptors of the keypoints and the GTM algorithm [Aguilar et al, 09], we match the image with a structural + appearance prototype previously learned in an unsupervised manner through an EM algorithm. The prototype tells us the sub-environment to which the image belongs. In order to perform fine localization, we match the image against the structure and appearance of all images assigned to a given sub-environment and then select the one with the highest likelihood. See more in the Feature-selection lesson. 17/32
- 18. Coarse-to-fine Visual Localization (3)
Figure: 3D+2D map for coarse-to-fine localization with % discarded. 18/32
- 19. Coarse-to-fine Visual Localization (4)
Figure: Filtering results in three environments with T = Tmin and T = 0. 19/32
- 20. KD-Partitions and Entropy
Data Partitions and Density Estimation: Let X be a d-dimensional random variable and f(x) its pdf. Let A = {Aj | j = 1, ..., m} be a partition of X for which Ai ∩ Aj = ∅ if i ≠ j and ∪j Aj = X. Then we have [Stowell & Plumbley, 09]:
f̂Aj(x) = (1/µ(Aj)) ∫_{Aj} f(x) dx ≈ nj / (n µ(Aj)),
where f̂Aj approximates f(x) in each cell and µ(Aj) is the d-dimensional volume of Aj. If f(x) is unknown and we are given a set of samples X = {x1, ..., xn} from it, with xi ∈ R^d, we can approximate the probability mass of f(x) in each cell as pj = nj/n, where nj is the number of samples in cell Aj. 20/32
- 21. KD-Partitions and Entropy
Entropy Estimation: The differential Shannon entropy is then asymptotically approximated by
Ĥ = Σ_{j=1..m} (nj/n) log( (n/nj) µ(Aj) ),
and this approximation relies on the way the partition is built. It is created recursively following the data-splitting method of the k-d tree algorithm. At each level, the data are split at the median along one axis. Then data splitting is recursively applied to each subspace until a uniformity stop criterion is satisfied. 21/32
- 22. KD-Partitions and Entropy
Entropy Estimation (cont.): The aim of this stop criterion is to ensure a uniform density in each cell, in order to best approximate f(x). The chosen uniformity test is fast and depends on the median. The distribution of the median of the samples in Aj tends to a normal distribution that can be standardized as:
Zj = √nj (2 medd(Aj) − mind(Aj) − maxd(Aj)) / (maxd(Aj) − mind(Aj)).
When |Zj| > 1.96 (the 95% confidence threshold of a standard Gaussian), declare a significant deviation from uniformity. The test is not applied until there are fewer than √n data points in each partition. 22/32
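The estimator and its stopping rule can be sketched as follows; this follows the scheme summarized above (split at the median along cycling axes, stop once a cell has fewer than √n points and passes the |Z| ≤ 1.96 test), with helper names and the choice of nats being mine:

```python
import numpy as np

def kdp_entropy(X):
    """Sketch of a k-d partition entropy estimate for samples X of
    shape (n, d). Returns (entropy estimate in nats, list of leaf cells)."""
    n, d = X.shape
    sqrt_n = np.sqrt(n)

    def recurse(pts, lo, hi, axis):
        nj = len(pts)
        mn, mx = pts[:, axis].min(), pts[:, axis].max()
        if nj < sqrt_n and mx > mn:
            med = np.median(pts[:, axis])
            z = np.sqrt(nj) * (2 * med - mn - mx) / (mx - mn)
            if abs(z) <= 1.96:                   # uniform enough: stop
                return [(pts, lo.copy(), hi.copy())]
        if nj <= 1 or mx == mn:
            return [(pts, lo.copy(), hi.copy())]
        med = np.median(pts[:, axis])            # split at the median
        left = pts[pts[:, axis] <= med]
        right = pts[pts[:, axis] > med]
        if len(left) == 0 or len(right) == 0:
            return [(pts, lo.copy(), hi.copy())]
        hi_l, lo_r = hi.copy(), lo.copy()
        hi_l[axis] = med
        lo_r[axis] = med
        nxt = (axis + 1) % d                     # cycle through axes
        return recurse(left, lo, hi_l, nxt) + recurse(right, lo_r, hi, nxt)

    lo = X.min(axis=0).astype(float)
    hi = X.max(axis=0).astype(float)
    cells = recurse(X, lo, hi, 0)
    # H = sum_j (n_j/n) log((n/n_j) mu(A_j)) over the leaf cells.
    H = 0.0
    for pts, clo, chi in cells:
        nj, mu = len(pts), np.prod(chi - clo)
        if nj > 0 and mu > 0:
            H += (nj / n) * np.log((n / nj) * mu)
    return H, cells
```

For samples uniform on the unit square the true differential entropy is 0 nats, and the estimate should be close to that.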
- 23. KD-Partitions and Divergence
KDP Total-Variation Divergence: The total variation distance [Denuit and Bellegem,01] between two probability measures P and Q over a finite alphabet is given by:
δ(P, Q) = (1/2) Σx |P(x) − Q(x)|.
Then the divergence is simply formulated as:
δ(P, Q) = (1/2) Σ_{j=1..p} |pj − qj| ∈ [0, 1], with pj = p(Aj) = nx,j / nx and qj = q(Aj) = no,j / no,
where pj and qj are the proportions of samples of P and Q in cell Aj. 23/32
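A minimal sketch of this sample-proportion form, for the 1-D case and using fixed histogram bins in place of the shared k-d partition cells (function and parameter names are assumptions):

```python
import numpy as np

def tv_divergence(x_samples, o_samples, edges):
    """Total-variation divergence delta(P, Q) = 1/2 sum_j |p_j - q_j|
    over a fixed 1-D partition given by bin `edges`. The slide uses the
    cells of a shared k-d partition instead of histogram bins."""
    p, _ = np.histogram(x_samples, bins=edges)   # counts n_{x,j}
    q, _ = np.histogram(o_samples, bins=edges)   # counts n_{o,j}
    p = p / p.sum()                              # proportions p_j
    q = q / q.sum()                              # proportions q_j
    return 0.5 * float(np.abs(p - q).sum())
```

Identical sample sets give δ = 0, and sets with disjoint supports (no shared cell) give δ = 1, matching the [0, 1] range above.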
- 24. KD-Partitions and Divergence
Figure: KDP divergence. Left: δ = 0.24. Right: δ = 0.92. 24/32
- 25. Multi-Dimensional Saliency Algorithm
Alg. 2: MD Kadir and Brady scale saliency algorithm
Input: m-dimensional image I, initial scale smin, final scale smax
for each pixel x do
    for each scale si between smin and smax do
        (1) Create an m-dimensional sample set Xi = {xi} from N(si, x) in image I
        (2) Apply the kd-partition to Xi in order to estimate
            Ĥ(si, x) = Σ_{j=1..m} (nj/n) log( (n/nj) µ(Aj) )
        if i > smin + 1 then
            (3) if Ĥ(si−2, x) < Ĥ(si−1, x) > Ĥ(si, x) then
                (4) Compute divergence: W = δ(Xi−1, Xi−2) = (1/2) Σ_{j=1..r} |pi−1 − pi−2|
                (5) Entropy weighting: Y(si−1, x) = Ĥ(si−1, x) · W
            else
                (6) Y(si−1, x) = 0
            end
        end
    end
end
Output: An array Y containing weighted entropy values for all pixels of the image at each scale 25/32
- 26. MD-(Gabor) Saliency for Textures
KDP Total-Variation Divergence:
- Use the Brodatz dataset (111 textures and 9 images per category: 999 images).
- Use 15 Gabor filters to obtain multi-dimensional data.
- Both gray-level saliency and MD saliency are tuned to obtain 150 salient points.
- Use each image in the database as a query image.
- Use saliency with only RIFT, with only spin images, and with RIFT and spin images combined.
- Retrieval-recall results are strongly influenced by the type of descriptor used [Suau & Escolano,10]. 26/32
- 27. MD-(Gabor) Saliency for Textures (2)
Figure: Salient pixels in textures. Left: MD-KDP. Right: Gray-level saliency. 27/32
- 28. MD-(Gabor) Saliency for Textures (3)
Figure: Average recall vs number of retrievals. 28/32
- 29. References
[Lowe,04] Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110
[Matas et al,02] Matas, J., Chum, O., Urban, M., and Pajdla, T. (2004). Robust wide baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767
[Mikolajczyk et al, 05] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L. V. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43–72
[Ullman,96] Ullman, S. (1996). High-level Vision. MIT Press 29/32
- 30. References (2)
[Julesz,81] Julesz, B. (1981). Textons, the elements of texture perception, and their interactions. Nature, 290(5802):91–97
[Nothdurft,00] Nothdurft, H.C. (2000). Salience from feature contrast: variations with texture density. Vision Research, 40:3181–3200
[Kadir and Brady,01] Kadir, T. and Brady, M. (2001). Scale, saliency and image description. International Journal of Computer Vision, 45(2):83–105
[Suau and Escolano,08] Suau, P. and Escolano, F. (2008). Bayesian optimization of the scale saliency filter. Image and Vision Computing, 26(9):1207–1218 30/32
- 31. References (3)
[Konishi et al.,03] Konishi, S., Yuille, A. L., Coughlan, J. M., and Zhu, S. C. (2003). Statistical edge detection: learning and evaluating edge cues. IEEE Trans. on PAMI, 25(1):57–74
[Cazorla & Escolano,03] Cazorla, M. and Escolano, F. (2003). Two Bayesian methods for junction detection. IEEE Transactions on Image Processing, 12(3):317–327
[Cazorla et al.,02] Cazorla, M., Escolano, F., Gallardo, D., and Rizo, R. (2002). Junction detection and grouping with probabilistic edge models and Bayesian A*. Pattern Recognition, 35(9):1869–1881
[Lozano et al, 09] Lozano, M.A., Escolano, F., Bonev, B., Suau, P., Aguilar, W., Sáez, J.M., and Cazorla, M. (2009). Region and constellation based categorization of images with unsupervised graph learning. Image and Vision Computing, 27(7):960–978 31/32
- 32. References (4)
[Aguilar et al, 09] Aguilar, W., Frauel, Y., Escolano, F., Martínez-Pérez, M.E., Espinosa-Romero, A., and Lozano, M.A. (2009). A robust Graph Transformation Matching for non-rigid registration. Image and Vision Computing, 27(7):897–910
[Stowell & Plumbley, 09] Stowell, D. and Plumbley, M. D. (2009). Fast multidimensional entropy estimation by k-d partitioning. IEEE Signal Processing Letters, 16(6):537–540
[Suau & Escolano,10] Suau, P. and Escolano, F. (2010). Analysis of the Multi-Dimensional Scale Saliency Algorithm and its Application to Texture Categorization. SSPR 2010 (accepted) 32/32
