Advanced Information Theory in CVPR
"in a Nutshell"

CVPR Tutorial
June 13-18, 2010
San Francisco, CA

Interest Points and Method of Types:
Visual Localization & Texture Categorization

Francisco Escolano
Interest Points

Background. Since the proposal of the SIFT detector and descriptors
[Lowe,04], a big bang of interest-point detectors ensuring some sort
of invariance to zoom/scale, rotation and perspective distortion has
emerged beyond the classical Harris detector. These detectors,
typically built on a multi-scale analysis of the image, include
Harris Affine, MSER [Matas et al,02] and SURF [Bay et al,08]. The
need for an intensive comparison of different detectors (and
descriptors), mainly in terms of spatio-temporal stability
(repeatability, distinctiveness and robustness), remains a classic
challenge [Mikolajczyk et al,05].
Stability experiments are key to predicting the future behavior of
the detector/descriptor in subsequent tasks (bag-of-words
recognition, matching, ...).

Entropy Saliency Detector

Local saliency, in contrast to global saliency [Ullman,96], means
local distinctiveness (outstanding/pop-out pixel distributions over
the image) [Julesz,81][Nothdurft,00].
In Computer Vision, a mild information-theoretic definition of local
saliency is linked to visual unpredictability [Kadir and Brady,01]. A
salient region is then locally unpredictable (as measured by
entropy), and this is consistent with a peak of entropy in
scale-space.
Scale-space analysis is key because we do not know the scale of
regions beforehand. In addition, isotropic detections may be extended
to affine detections at an extra computational cost (see Alg. 1).

Entropy Saliency Detector (2)

Alg. 1: Kadir and Brady scale saliency algorithm

Input: image I, initial scale s_min, final scale s_max
for each pixel x do
    for each scale s between s_min and s_max do
        Calculate the local entropy
        H_D(s, x) = -\sum_{i=1}^{L} P_{s,x}(d_i) \log_2 P_{s,x}(d_i)
    end
    Choose the set of scales at which the entropy is a local maximum:
    S_p = {s : H_D(s-1, x) < H_D(s, x) > H_D(s+1, x)}
    for each scale s between s_min and s_max do
        if s ∈ S_p then
            Compute the entropy weight by means of a self-dissimilarity
            measure in scale space:
            W_D(s, x) = \frac{s^2}{2s-1} \sum_{i=1}^{L} |P_{s,x}(d_i) - P_{s-1,x}(d_i)|
            Entropy weighting: Y_D(s, x) = H_D(s, x) W_D(s, x)
        end
    end
end
Output: a sparse three-dimensional matrix containing the weighted
local entropies of all pixels at those scales where the entropy peaks

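As a concrete reference, the following is a minimal NumPy sketch of
Alg. 1 for graylevel images. It is an illustration, not the authors'
code: square patches stand in for circular windows, and the bin count
L, the scale range and the function names (patch_hist,
scale_saliency) are our own assumptions.

import numpy as np

def patch_hist(img, x, y, s, L=16):
    """Normalized L-bin intensity histogram of the (2s+1)x(2s+1) patch at (y, x)."""
    patch = img[y - s:y + s + 1, x - s:x + s + 1]
    h, _ = np.histogram(patch, bins=L, range=(0, 256))
    return h / h.sum()

def scale_saliency(img, s_min=3, s_max=10, L=16, eps=1e-12):
    """Weighted entropies Y[s, y, x] for every pixel and every peaked scale."""
    H, W = img.shape
    Y = np.zeros((s_max + 1, H, W))
    for y in range(s_max + 1, H - s_max - 1):
        for x in range(s_max + 1, W - s_max - 1):
            # histograms for scales s_min-1 .. s_max+1 (peak test needs both neighbors)
            P = [patch_hist(img, x, y, s, L) for s in range(s_min - 1, s_max + 2)]
            Hd = [-(p * np.log2(p + eps)).sum() for p in P]
            for i, s in enumerate(range(s_min, s_max + 1), start=1):
                if Hd[i - 1] < Hd[i] > Hd[i + 1]:          # entropy peak in scale
                    Wd = (s * s / (2.0 * s - 1.0)) * np.abs(P[i] - P[i - 1]).sum()
                    Y[s, y, x] = Hd[i] * Wd
    return Y
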
Entropy Saliency Detector (and 3)




Figure: Isotropic (top) vs Affine (bottom) detections.
Learning and Chernoff Information

Scale-space analysis is thus one of the bottlenecks of the process.
However, given prior knowledge of the statistics of the images being
analyzed, it is possible to discard a significant number of pixels
and thus avoid their scale-space analysis.
Working hypothesis
If the local distribution around a pixel at scale s_max is highly
homogeneous (low entropy), one may assume that the same holds for
scales s < s_max, so no scale-space peaks will exist in this range of
scales [Suau and Escolano,08].
Inspired by the statistical detection of edges [Konishi et al.,03]
and junctions [Cazorla & Escolano,03], and also by contour grouping
[Cazorla et al.,02].

Learning and Chernoff Information (2)

Relative entropy and threshold: a basic procedure consists of
computing the ratio between the entropy of each pixel at s_max and
the maximum of these entropies over all pixels.
Filtering by homogeneity along scale-space
 1. Calculate the local entropy H_D for each pixel at scale s_max.
 2. Select an entropy threshold σ ∈ [0, 1].
 3. X = {x : H_D(x, s_max) / max_x{H_D(x, s_max)} > σ}
 4. Apply the scale saliency algorithm only to those pixels x ∈ X
    (see the sketch below).
What is the optimal threshold σ?

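A minimal sketch of this filter, reusing patch_hist from the previous
sketch; σ is the free parameter whose optimal value is addressed
next.

import numpy as np

def entropy_filter(img, s_max=10, sigma=0.5, L=16, eps=1e-12):
    """Return the pixels whose relative entropy at s_max exceeds sigma."""
    H, W = img.shape
    Hmap = np.zeros((H, W))
    for y in range(s_max, H - s_max):
        for x in range(s_max, W - s_max):
            p = patch_hist(img, x, y, s_max, L)            # step 1
            Hmap[y, x] = -(p * np.log2(p + eps)).sum()
    keep = Hmap / Hmap.max() > sigma                        # steps 2-3
    return np.argwhere(keep)                                # step 4: run Alg. 1 here only
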
Learning and Chernoff Information (3)

Images belonging to the same image category or environment share
similar intensity and texture distributions, so it seems reasonable
to think that the entropy values of their most salient regions will
lie in the same range.

On/Off distributions
p_on(θ) is the probability that a region belongs to the most salient
regions of the image, given that its relative entropy value is θ,
while p_off(θ) is the probability that a region does not belong to
the most salient regions.
Then the maximum relative entropy σ with p_on(σ) > 0 may be chosen as
the entropy threshold for that image category, by finding a trade-off
between false positives and false negatives.

Learning and Chernoff Information (4)




Figure: On(blue)/Off(red) distributions and thresholds.
Learning and Chernoff Information (5)

Chernoff Information
The expected error rate of a likelihood test based on p_on(θ) and
p_off(θ) decreases exponentially with respect to
C(p_on(θ), p_off(θ)), where:

    C(p, q) = -\min_{0 \le \lambda \le 1} \log \sum_{j=1}^{J} p^{\lambda}(y_j) q^{1-\lambda}(y_j)

A related measure is the Bhattacharyya distance (Chernoff with
λ = 1/2):

    BC(p, q) = -\log \sum_{j=1}^{J} p^{1/2}(y_j) q^{1/2}(y_j)

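A short sketch of both quantities for discrete distributions given as
NumPy arrays; the grid search over λ is a simplification of ours (the
objective is convex in λ, so any 1-D minimizer works).

import numpy as np

def chernoff_information(p, q, grid=1001):
    """C(p, q) = -min_lambda log2 sum_j p^lambda(y_j) q^(1-lambda)(y_j)."""
    lambdas = np.linspace(0.0, 1.0, grid)
    logs = [np.log2((p ** lam * q ** (1.0 - lam)).sum()) for lam in lambdas]
    return -min(logs)

def bhattacharyya(p, q):
    """Chernoff objective evaluated at lambda = 1/2."""
    return -np.log2(np.sqrt(p * q).sum())
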
Learning and Chernoff Information (6)

Chernoff Bounds
A threshold T must be chosen for each image class, so that any pixel
from an image belonging to that class may be discarded if
log(p_on(θ)/p_off(θ)) < T, with T bounded by

    -D(p_off(θ) || p_on(θ)) < T < D(p_on(θ) || p_off(θ))

and D(·||·) the Kullback-Leibler divergence:

    D(p || q) = \sum_{j=1}^{J} p(y_j) \log \frac{p(y_j)}{q(y_j)}

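A sketch of the resulting discarding rule. Here p_on and p_off are
histograms of relative-entropy values learned from training images of
the class; the bin lookup via np.digitize and the eps smoothing are
our own details.

import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p||q) in bits for discrete p, q."""
    return (p * np.log2((p + eps) / (q + eps))).sum()

def discard_pixel(theta, p_on, p_off, bins, T=0.0, eps=1e-12):
    """True if the pixel with relative entropy theta can be skipped."""
    j = np.digitize(theta, bins) - 1                  # histogram bin holding theta
    return np.log2((p_on[j] + eps) / (p_off[j] + eps)) < T

# Valid range for the threshold:
#   -kl(p_off, p_on) < T < kl(p_on, p_off)
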
Learning and Chernoff Information (7)




Figure: Filtering for different image categories, T = 0

Learning and Chernoff Information (8)
Test set          Chernoff     T     % Points   % Time
airplanes side     0.415    -4.98   30.79%     42.12%   0.0943
                              0     60.11%     72.61%   2.9271
background         0.208    -2.33   15.89%     24.00%   0.6438
                              0     43.91%     54.39%   5.0290
bottles            0.184    -2.80    9.50%     20.50%   0.4447
                              0     23.56%     35.47%   1.9482
camel              0.138    -2.06   10.06%     20.94%   0.2556
                              0     40.10%     52.43%   4.2110
cars brad          0.236    -2.63   24.84%     36.57%   0.4293
                              0     48.26%     61.14%   3.4547
cars brad bg       0.327    -3.24   22.90%     34.06%   0.2091
                              0     57.18%     70.02%   4.1999
faces              0.278    -3.37   25.31%     37.21%   0.9057
                              0     54.76%     67.92%   8.3791
google things      0.160    -2.15   14.58%     25.48%   0.7444
                              0     40.49%     52.81%   5.7128
guitars            0.252    -3.11   15.34%     26.35%   0.2339
                              0     37.94%     50.11%   2.3745
houses             0.218    -2.62   16.09%     27.16%   0.2511
                              0     44.51%     56.88%   3.4209
leaves             0.470    -6.08   29.43%     41.44%   0.8699
                              0     46.60%     59.28%   3.0674
motorbikes side    0.181    -2.34   15.63%     27.64%   0.2947
                              0     38.62%     51.64%   3.7305

Learning and Chernoff Information (9)




Figure: Filtering in the Caltech101 database, T = Tmin vs T = 0
Learning and Chernoff Information (10)

Chernoff & Classification Error
Following the Chernoff theorem, which exploits Sanov's theorem
(quantifying the probability of rare events), for n i.i.d. samples
drawn from Q, the probability of error of the test between the
hypotheses Q = p_on and Q = p_off is given by:

    p_e = \pi_on 2^{-n D(p_λ || p_on)} + \pi_off 2^{-n D(p_λ || p_off)}
        ≐ 2^{-n \min\{D(p_λ || p_on), D(p_λ || p_off)\}}

(equality to first order in the exponent), where \pi_on and \pi_off
are the priors and p_λ is the tilted distribution
p_λ(y) ∝ p_on^λ(y) p_off^{1-λ}(y).
Choosing λ so that D(p_λ || p_on) = D(p_λ || p_off) = C(p_on, p_off),
we have that the Chernoff information is the best achievable exponent
in the Bayesian probability of error.

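A sketch that finds the balancing λ* by bisection, reusing kl from
the earlier sketch; at λ* both divergences equal the Chernoff
information.

import numpy as np

def tilted(p_on, p_off, lam):
    """Tilted distribution p_lambda, normalized."""
    t = p_on ** lam * p_off ** (1.0 - lam)
    return t / t.sum()

def chernoff_balance(p_on, p_off, iters=60):
    """Bisection on D(p_lam||p_on) - D(p_lam||p_off), which decreases in lambda."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        p_lam = tilted(p_on, p_off, lam)
        if kl(p_lam, p_on) > kl(p_lam, p_off):
            lo = lam          # p_lam still too far from p_on: increase lambda
        else:
            hi = lam
    return lam, kl(p_lam, p_on)   # (lambda*, C(p_on, p_off))
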
Coarse-to-fine Visual Localization

Problem Statement
   A 6DOF SLAM method has built a 3D+2D map of an
   indoor/outdoor environment [Lozano et al, 09].
   We have manually marked 6 environments and trained a
   minimal complexity supervised classifier (see next lesson) for
   performing coarse localization.
   We got the statistics from the images of each environment in
   order to infer their respective pon and poff distributions and
   hence their Chernoff information and T bounds.
   Once a test image is submitted it is classified and filtered
   according to Chernoff information. Then the keypoints are
   computed.
Coarse-to-fine Visual Localization (2)

Problem Statement (cont.)
    Using the SIFT descriptors of the keypoints and the GTM
    algorithm [Aguilar et al,09], we match the image against a
    structural + appearance prototype previously learned, in an
    unsupervised way, through an EM algorithm.
    The prototype tells us which sub-environment the image belongs
    to.
    In order to perform fine localization, we match the image with
    the structure and appearance of all images assigned to the given
    sub-environment and then select the one with the highest
    likelihood.
    See more in the Feature-selection lesson.
Coarse-to-fine Visual Localization (3)




Figure: 3D+2D map for coarse-to-fine localization with % discarded

Coarse-to-fine Visual Localization (4)




Figure: Filtering results in three environments with T = Tmin and T = 0
KD-Partitions and Entropy

Data Partitions and Density Estimation
Let X be a d-dimensional random variable and f(x) its pdf. Let
A = {A_j | j = 1, ..., m} be a partition of X for which
A_i ∩ A_j = ∅ if i ≠ j and ∪_j A_j = X. Then we have
[Stowell & Plumbley,09]:

    f_{A_j} = \frac{\int_{A_j} f(x) dx}{\mu(A_j)},    \hat{f}_{A_j}(x) = \frac{n_j}{n \mu(A_j)},

where f_{A_j} approximates f(x) in each cell and \mu(A_j) is the
d-dimensional volume of A_j. If f(x) is unknown and we are given a
set of samples X = {x_1, ..., x_n} from it, with x_i ∈ R^d, we can
approximate the probability mass of f(x) in each cell as p_j = n_j/n,
where n_j is the number of samples in cell A_j.

KD-Partitions and Entropy (2)

Entropy Estimation
The differential Shannon entropy is then asymptotically approximated
by

    \hat{H} = \sum_{j=1}^{m} \frac{n_j}{n} \log\left( \frac{n}{n_j} \mu(A_j) \right),

and this approximation relies on the way the partition is built. The
partition is created recursively, following the data-splitting method
of the k-d tree algorithm: at each level, the data are split at the
median along one axis, and the splitting is recursively applied to
each subspace until a uniformity stop criterion is satisfied.

KD-Partitions and Entropy (3)

Entropy Estimation (cont.)
The aim of this stop criterion is to ensure a uniform density within
each cell, in order to best approximate f(x). The chosen uniformity
test is fast and depends on the median: the distribution of the
median of the samples in A_j tends to a normal distribution, which
can be standardized as

    Z_j = \sqrt{n_j} \, \frac{2 med_d(A_j) - min_d(A_j) - max_d(A_j)}{max_d(A_j) - min_d(A_j)}.

When |Z_j| > 1.96 (the 95% confidence threshold of a standard
Gaussian), a significant deviation from uniformity is declared. The
test is not applied until there are fewer than √n data points in a
partition (see the sketch below).

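A compact sketch of the whole estimator. Approximating each cell
volume \mu(A_j) by the samples' bounding box (instead of carrying the
exact partition boxes through the recursion) and using base-2 logs
are our simplifications.

import numpy as np

def kd_entropy(X):
    """k-d partition entropy estimate (bits) from samples X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    def leaf(S):
        nj = len(S)
        sides = np.maximum(S.max(axis=0) - S.min(axis=0), 1e-12)
        mu = np.prod(sides)                    # approx. d-dimensional cell volume
        return (nj / n) * np.log2((n / nj) * mu)

    def recurse(S, depth):
        nj = len(S)
        if nj <= 2:
            return leaf(S)
        axis = depth % d                       # cycle through the split axes
        lo, hi = S[:, axis].min(), S[:, axis].max()
        med = np.median(S[:, axis])
        if nj <= np.sqrt(n) and hi > lo:       # Z test only below sqrt(n) points
            Z = np.sqrt(nj) * (2 * med - lo - hi) / (hi - lo)
            if abs(Z) <= 1.96:                 # median consistent with uniformity
                return leaf(S)
        left, right = S[S[:, axis] <= med], S[S[:, axis] > med]
        if len(left) == 0 or len(right) == 0:  # degenerate split (ties at median)
            return leaf(S)
        return recurse(left, depth + 1) + recurse(right, depth + 1)

    return recurse(X, 0)
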
KD-Partitions and Divergence

KDP Total-Variation Divergence
The total variation distance [Denuit and Bellegem,01] between two
probability measures P and Q over a finite alphabet is given by:

    \delta(P, Q) = \frac{1}{2} \sum_x |P(x) - Q(x)|.

Then, over a common partition {A_j}, the divergence is simply
formulated as:

    \delta(P, Q) = \frac{1}{2} \sum_{j=1}^{p} |p_j - q_j| ∈ [0, 1],
    p_j = p(A_j) = \frac{n_{x,j}}{n_x},    q_j = q(A_j) = \frac{n_{o,j}}{n_o},

where p_j and q_j are the proportions of samples of P and Q falling
in cell A_j.

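A sketch computing δ over a shared k-d partition built on the pooled
samples; the min_cell leaf size is an assumed stopping rule standing
in for the uniformity criterion above.

import numpy as np

def kdp_divergence(X, O, min_cell=8):
    """Total-variation divergence in [0, 1] between sample sets X and O."""
    X, O = np.asarray(X, dtype=float), np.asarray(O, dtype=float)
    nx, no = len(X), len(O)

    def recurse(Sx, So, depth):
        if len(Sx) + len(So) <= min_cell:      # leaf cell A_j: add 0.5 |p_j - q_j|
            return 0.5 * abs(len(Sx) / nx - len(So) / no)
        pooled = np.vstack([Sx, So])
        axis = depth % pooled.shape[1]
        med = np.median(pooled[:, axis])
        if not (pooled[:, axis] > med).any():  # degenerate split (ties at median)
            return 0.5 * abs(len(Sx) / nx - len(So) / no)
        return (recurse(Sx[Sx[:, axis] <= med], So[So[:, axis] <= med], depth + 1)
                + recurse(Sx[Sx[:, axis] > med], So[So[:, axis] > med], depth + 1))

    return recurse(X, O, 0)
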
KD-Partitions and Divergence (2)




Figure: KDP divergence. Left: δ = 0.24. Right: δ = 0.92


Multi-Dimensional Saliency Algorithm
Alg. 2: MD Kadir and Brady scale saliency algorithm

Input: m-dimensional image I, initial scale s_min, final scale s_max
for each pixel x do
    for each scale s_i between s_min and s_max do
        (1) Create an m-dimensional sample set X_i = {x_i} from N(s_i, x) in image I;
        (2) Apply the kd-partition to X_i in order to estimate
            \hat{H}(s_i, x) = \sum_{j} \frac{n_j}{n} \log\left( \frac{n}{n_j} \mu(A_j) \right)
        if i > s_min + 1 then
            (3) if \hat{H}(s_{i-2}, x) < \hat{H}(s_{i-1}, x) > \hat{H}(s_i, x) then
                (4) Compute the divergence:
                    W = \delta(X_{i-1}, X_{i-2}) = \frac{1}{2} \sum_{j=1}^{r} |p_{i-1} - p_{i-2}|
                (5) Entropy weighting: Y(s_{i-1}, x) = \hat{H}(s_{i-1}, x) · W
            else
                (6) Y(s_{i-1}, x) = 0
            end
        end
    end
end
Output: an array Y containing the weighted entropy values of all
pixels at each scale

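A sketch of the per-pixel loop of Alg. 2, reusing kd_entropy and
kdp_divergence from the previous sketches. neighborhood(I, x, s) is a
hypothetical helper returning the (n, m) matrix of m-dimensional
responses of the pixels within distance s of x.

import numpy as np

def md_scale_saliency(I, x, s_min, s_max):
    """Weighted entropies Y over the scales s_min..s_max for pixel x."""
    scales = list(range(s_min, s_max + 1))
    samples = [neighborhood(I, x, s) for s in scales]            # step (1)
    H = [kd_entropy(S) for S in samples]                         # step (2)
    Y = np.zeros(len(scales))
    for i in range(2, len(scales)):
        if H[i - 2] < H[i - 1] > H[i]:                           # step (3): peak
            W = kdp_divergence(samples[i - 1], samples[i - 2])   # step (4)
            Y[i - 1] = H[i - 1] * W                              # step (5)
    return Y                                                     # step (6): rest stay 0
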
MD-(Gabor) Saliency for Textures

Experimental setup
    Use the Brodatz dataset (111 textures and 9 images per category:
    999 images).
    Use 15 Gabor filters to obtain multi-dimensional data.
    Both graylevel saliency and MD saliency are tuned to obtain
    150 salient points.
    Use each image in the database as a query image.
    Use saliency with only RIFT, only spin images, and combined
    RIFT and spin images.
    Retrieval-recall results are strongly influenced by the type of
    descriptor used [Suau & Escolano,10].

MD-(Gabor) Saliency for Textures (2)




Figure: Salient pixels in textures. Left: MD-KDP. Right: graylevel saliency.

MD-(Gabor) Saliency for Textures (3)




Figure: Average recall vs number of retrievals.
References

[Lowe,04] Lowe, D. (2004). Distinctive image features from
scale-invariant keypoints. International Journal of Computer Vision,
60(2):91–110
[Matas et al,02] Matas, J., Chum, O., Urban, M., and Pajdla, T.
(2004). Robust wide baseline stereo from maximally stable extremal
regions. Image and Vision Computing, 22(10):761–767
[Bay et al,08] Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L.
(2008). Speeded-Up Robust Features (SURF). Computer Vision and Image
Understanding, 110(3):346–359
[Mikolajczyk et al,05] Mikolajczyk, K., Tuytelaars, T., Schmid, C.,
Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Van Gool,
L. (2005). A comparison of affine region detectors. International
Journal of Computer Vision, 65(1/2):43–72
[Ullman,96] Ullman, S. (1996). High-level Vision. MIT Press

References (2)


[Julesz,81] Julesz, B. (1981). Textons, the Elements of Texture
Perception, and their Interactions. Nature 290 (5802): 91–97
[Nothdurft,00] Nothdurft, H.C. Salience from feature contrast:
variations with texture density. Vision Research 40 (2000): 3181–3200
[Kadir and Brady,01] Kadir, T. and Brady, M. (2001). Scale, saliency
and image description. International Journal of Computer Vision,
45(2):83–105
[Suau and Escolano,08] Suau, P., Escolano, F. (2008) Bayesian
Optimization of the Scale Saliency Filter. Image and Vision
Computing, 26(9), pp. 1207–1218


References (3)

[Konishi et al.,03] Konishi, S., Yuille, A. L., Coughlan, J. M., and
Zhu, S. C. (2003). Statistical edge detection: learning and
evaluating edge cues. IEEE Trans. on PAMI, 25(1):57–74
[Cazorla & Escolano,03] Cazorla, M. and Escolano, F. (2003). Two
Bayesian methods for junction detection. IEEE Transactions on Image
Processing, 12(3):317–327
[Cazorla et al.,02] Cazorla, M., Escolano, F., Gallardo, D., and
Rizo, R. (2002). Junction detection and grouping with probabilistic
edge models and Bayesian A*. Pattern Recognition, 35(9):1869–1881
[Lozano et al,09] Lozano, M.A., Escolano, F., Bonev, B., Suau, P.,
Aguilar, W., Sáez, J.M., and Cazorla, M. (2009). Region and
constellation based categorization of images with unsupervised graph
learning. Image and Vision Computing, 27(7):960–978

References (4)


[Aguilar et al,09] Aguilar, W., Frauel, Y., Escolano, F.,
Martínez-Pérez, M.E., Espinosa-Romero, A., and Lozano, M.A. (2009). A
robust Graph Transformation Matching for non-rigid registration.
Image and Vision Computing, 27(7):897–910
[Stowell & Plumbley,09] Stowell, D. and Plumbley, M. D. (2009). Fast
multidimensional entropy estimation by k-d partitioning. IEEE Signal
Processing Letters, 16(6):537–540
[Suau & Escolano,10] Suau, P. and Escolano, F. (2010). Analysis of
the Multi-Dimensional Scale Saliency Algorithm and its Application to
Texture Categorization. SSPR 2010 (accepted)
