IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 11, NOVEMBER 2015 3321
Adaptive Metric Learning for Saliency Detection
Shuang Li, Huchuan Lu, Senior Member, IEEE, Zhe Lin, Member, IEEE,
Xiaohui Shen, Member, IEEE, and Brian Price
Abstract—In this paper, we propose a novel adaptive metric
learning algorithm (AML) for visual saliency detection. A key
observation is that the saliency of a superpixel can be estimated
by its distances from the most certain foreground and background
seeds. Instead of measuring distances in Euclidean space,
we present a learning method based on two complementary
Mahalanobis distance metrics: 1) generic metric learning (GML)
and 2) specific metric learning (SML). GML aims at the global
distribution of the whole training set, while SML considers the
specific structure of a single image. Considering that multiple
similarity measures from different views may enhance the
relevant information and suppress the irrelevant information,
we fuse GML and SML together and experimentally find that the
combined result works well. Unlike most existing methods,
which are directly based on low-level features,
we devise a superpixelwise Fisher vector coding approach to
better distinguish salient objects from the background. We also
propose an accurate seeds selection mechanism and exploit
contextual and multiscale information when constructing the final
saliency map. Experimental results on various image sets
show that the proposed AML performs favorably against
state-of-the-art methods.
Index Terms—Metric learning, saliency detection,
Mahalanobis distance, Fisher vector.
I. INTRODUCTION
VISUAL saliency aims at finding the regions on an image
that are more visually distinctive or important and often
serves as a pre-processing procedure for many vision tasks,
such as image categorization [1], image retrieval [2], image
compression [3], content-aware image/video resizing [4], etc.
Visual saliency basically breaks down into the problem of
separating the salient regions from the non-salient ones by
measuring differences in their features. Numerous models and
algorithms have been proposed for this task. Unsupervised
approaches [5]–[9] are stimuli-driven and rely largely on
distinguishing low-level visual features. Early unsupervised
models, such as Gaussian pyramids [5], center-surround contrast [5], and fuzzy growing [10], are mainly inspired by biological vision stimuli.
Manuscript received August 21, 2014; revised February 10, 2015 and April 10, 2015; accepted May 26, 2015. Date of publication June 3, 2015; date of current version June 23, 2015. This work was supported in part by the Natural Science Foundation of China under Grant 61472060 and in part by the Fundamental Research Funds for the Central Universities under Grant DUT14YQ101. The associate editor coordinating the review of this manuscript and approving it for publication was Mr. Pierre-Marc Jodoin.
S. Li and H. Lu are with the School of Information and Communication Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China (e-mail: shuangli59app@gmail.com; lhchuan@dlut.edu.cn).
Z. Lin, X. Shen, and B. Price are with Adobe Research, San Jose, CA 95110 USA (e-mail: zlin@adobe.com; xshen@adobe.com; bprice@adobe.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2015.2440755
Fig. 1. The comparison between the Euclidean distance space and the Mahalanobis distance space. The Mahalanobis distance is more discriminative than the Euclidean distance, since its background part is less salient.
Later studies address saliency detection from
broader views, e.g., convex hull [7], [11] and frequency
domain [12], [13]. In contrast, supervised methods [14]–[16]
incorporate high-level and known information to better
distinguish the salient regions by learning salient visual
information from a large number of images with ground truth
labels.
Despite the differences in these methods, they all require
the basic ability to compute a difference measure on region
features to distinguish them. To the best of our
knowledge, all existing models address saliency detection
based on the Euclidean distance. However, the Euclidean distance
weights features equally without considering the distribution of
the data, and thus becomes unreliable when detecting objects in
complex images. This phenomenon happens frequently in the
saliency detection process, especially when the salient regions
and backgrounds are similar, which leads to the problem
that the Euclidean distances between the foregrounds and
the similar backgrounds are smaller than the distances within
the foregrounds. Figure 1 illustrates this problem. Given an
image, we first select some initial seeds, including foreground
and background seeds. The seeds selection process is the
same as that described in Section III-C. We compute the distance
between each superpixel and the seeds and draw the distance
distribution in Figure 1. We observe that the Mahalanobis
distance is more distinctive than the Euclidean distance, since
its background part is less salient. This motivates us to train
a discriminative distance metric to assign appropriate weights
to features so that the objects can be precisely separated from
the background.
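For a concrete sense of the difference, the following sketch (ours, not the paper's code) contrasts the two distances; the metric M here is a hand-set placeholder standing in for a learned metric.

```python
import numpy as np

def euclidean_dist(x, y):
    # Plain Euclidean distance: all feature dimensions weighted equally.
    return np.sqrt(np.sum((x - y) ** 2))

def mahalanobis_dist(x, y, M):
    # Mahalanobis distance under a metric M (positive semi-definite):
    # M re-weights and correlates feature dimensions, so directions that
    # separate foreground from background can count more.
    d = x - y
    return np.sqrt(d @ M @ d)

# Toy example: a metric that emphasizes the first feature dimension.
x, y = np.array([0.9, 0.2]), np.array([0.1, 0.25])
M = np.diag([4.0, 0.25])
print(euclidean_dist(x, y), mahalanobis_dist(x, y, M))
```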
We use metric learning to compute a more discriminative
distance measure. Distance metric learning has been widely
Fig. 2. The comparison between low-level features and our SFV feature. (a) input image. (b) saliency map based on low-level features. (c) saliency map
based on SFV. (d) ground truth.
Fig. 3. Pipeline of the adaptive metric learning algorithm. IT [5], GB [19], LR [14], and RC [6] are four other saliency methods.
adopted for this purpose in different applications since it takes
into account the covariance information when estimating the
data distributions and improves the performance of learning
methods significantly. To our knowledge, we are the first to
successfully formulate saliency detection within a
metric learning framework, and our method works well on
different databases. We also propose a Superpixel-wise Fisher
Vector (SFV) coding approach, which maps low-level features,
such as RGB and LAB, to a high-dimensional sparse vector.
Compared with using low-level features directly, the SFV is
more discriminative in challenging environments as shown
in Figure 2. Thus we use SFV features to describe each
superpixel.
In this paper, we adopt an effective feature coding method
and propose a novel metric learning based saliency detection
model, which incorporates both supervised and
semi-supervised information. Our algorithm considers both
the global distribution of the whole training dataset (GML)
and the typical structure of a specific image (SML), and
we successfully fuse them together to extract the clustering
characteristics for estimating the final saliency map. Figure 3
shows the pipeline of our method. First, as an extension of the
traditional Fisher Vector coding [17], Superpixel-wise Fisher
Vector coding is proposed to describe superpixels by learning
the parameters of a Gaussian mixture model (Section III-A).
Second, we train a Generic metric from the training
set (Section III-B1) and apply it to a single image to find
the saliency seeds with the assistance of the superpixel-wise
objectness map generated by [18] (Section III-C).
Third, a Specific metric based on kernel classification is
learnt from the chosen seeds for each image (Section III-B2).
Finally, by integrating the Generic metric and Specific
metric together (Section III-D), we obtain the clustering
information for each superpixel and use it to generate the
final saliency map (Section III-E). The GML and SML maps
shown in Figure 3 are two intermediate images that are
not actually generated when computing saliency maps; they
serve as comparisons to demonstrate the effectiveness of the
fused results in Section IV-A. The main contributions of our
work include:
• Two metric learning approaches are applied to
saliency detection for the first time, serving as the optimal
distance measure between two superpixels. GML is learnt from the global training
set while SML is learnt from the specific image training
samples. They are complementary to each other and
achieve promising results after the affinity aggregation.
• A superpixel-wise Fisher Vector coding method is first
put forward, which incorporates image contextual information
when representing superpixels and makes supervised
learning methods more suitable for single-image
processing.
• An accurate seeds selection method is first presented
based on the Mahalanobis distance metric. The selected
seeds serve as training samples of the Specific metric
learning and reference nodes when evaluating saliency
values.
Experimental results on various image sets show that our
method is comparable with most state-of-the-art methods, and
the proposed metric learning approaches can be extended to
other fields as well.
II. RELATED WORK
Significant improvements in saliency detection have been
witnessed in recent years. Numerous
unsupervised approaches have been proposed under different
theoretical models. Cheng et al. [6] propose a global region
contrast algorithm which simultaneously considers the spatial
coherence across the regions and the global contrast over
the entire image. However, low-level color contrast becomes
invalid when dealing with challenging scenes. Li et al. [20]
compute the dense and sparse reconstruction errors based
on background templates which are extracted from image
boundaries. They propose several integration strategies, such
as multi-scale reconstruction error and Bayesian integration,
which improve the performance of saliency detection
significantly. In [21], boundary connectivity, a robust
background measure, is first applied to saliency detection.
It characterizes the spatial layout of image regions and
provides a specific geometrical explanation to its definition.
Perazzi et al. [22] formulate saliency estimation and
contrast computation using high-dimensional Gaussian filters.
They modify SLIC [23] and demonstrate the effectiveness of
their superpixel segmentation approach in detecting salient
objects.
Furthermore, lacking knowledge of the sizes and locations
of objects, boundary priors and objectness are often adopted
to highlight the salient regions or suppress the background.
Jiang et al. [18] construct saliency by integrating
three visual cues, including uniqueness, focusness and
objectness (UFO), where uniqueness represents color contrast;
focusness indicates the degree of focus, often appearing as the
reverse of blurriness; objectness proposed by Alexe et al. [24]
is the likelihood of a given image window containing an
object. In [25], Wei et al. define the saliency value of
each patch as the shortest distance to the image boundary,
observing that image boundaries are more likely to be the
background. However, this assumption is less convincing,
especially when the scene is challenging.
Compared with unsupervised approaches, supervised
methods are relatively rare. In [26] and [27], Jiang et al.
also propose a multi-scale learning approach, which maps the
regional feature vector to a saliency score and fuses these
scores across multiple levels to generate the final saliency
map. They introduce a novel feature vector, which integrates
the regional contrast, regional property and regional
backgroundness descriptors together, to represent each region
and learn a discriminative random forest regressor to predict
regional scores. Shen and Wu [14] treat an image as a
combination of sparse noise and a low-rank matrix. They
extract low-level features to form high-level priors and then
incorporate the priors to a low-rank matrix recovery model
for constructing the saliency map. However, the saliency
assignment near the object is unsatisfactory due to the ambiguity
of prior maps. Liu et al. [28] formulate the saliency detection
as a partial differential equation problem and solve it under
an adaptive PDE learning framework. They learn the optimal
saliency seeds via discrete submodularity and use the seeds as
boundary conditions to solve a linear elliptic system.
Inspired by these works, we construct a metric fusion
framework which contains two complementary metric learning
approaches to generate robust and accurate saliency maps even
in complex scenes. Our method encodes low-level features into
a high-dimensional feature space and incorporates multi-scale
and objectness information when measuring saliency values.
Therefore, our method can uniformly highlight objects with
explicit object boundaries.
III. PROPOSED ALGORITHM
In this section, we present an effective and robust adaptive
metric learning method for visual saliency detection. The
proposed algorithm proceeds through five steps to generate
the final saliency map. Firstly, we extract low-level features
to encode the superpixels generated by the simple
linear iterative clustering (SLIC) [23] algorithm
with a Superpixel-wise Fisher Vector representation.
Secondly, two Mahalanobis distance metric learning
approaches, Generic metric learning and Specific metric
learning, are introduced to learn the optimal distance measure
of superpixels. Thirdly, we propose a novel seeds selection
strategy based on the Mahalanobis distance to generate
saliency seeds, which serve as training samples for the Specific
metric and as reference nodes when evaluating saliency
values. Fourthly, a metric fusion framework is
presented to fuse the Generic and Specific metrics together.
Finally, we obtain smooth and coherent saliency maps by
combining spectral clustering and multi-scale information.
A. Superpixel-Wise Fisher Vector Coding (SFV)
Appropriate feature coding approaches can effectively
extract main information and remove the redundancies, thus
greatly improving the performance of saliency detection.
Fisher Vector can be regarded as an extension of the
well-known bag-of-words representation, since it captures
the first-order and second-order differences between local
features and the centers of a Mixture of Gaussian Distributions.
Recently, Chen et al. [29] extend Fisher Vector to the point
level image representation for object detection. For a different
purpose, we propose to further extend the FV coding to
superpixel level and experimentally verify the superiority of
our Superpixel-wise Fisher Vector coding method.
Given a superpixel $i = \{p_t, t = 1, \ldots, T\}$, where $p_t$ is a $d$-dimensional image pixel and $T$ is the number of pixels within $i$, we train a Gaussian mixture model (GMM) $\lambda(p_t) = \sum_{k=1}^{K} \upsilon_k \psi_k(p_t)$ from all the pixels of an image using the Maximum Likelihood (ML) criterion. The parameters of the $K$-component GMM are defined as $\lambda = \{\upsilon_k, \mu_k, \Sigma_k, k = 1, \ldots, K\}$, where $\upsilon_k$, $\mu_k$ and $\Sigma_k$ are the mixture weight, mean vector and covariance matrix of Gaussian $k$ respectively. Similar to the FV coding method, the SFV representation can be written in a $D = 2dK$-dimensional concatenated form:
$$\varphi_i = \{\zeta_{\mu_1}, \zeta_{\sigma_1}, \ldots, \zeta_{\mu_K}, \zeta_{\sigma_K}\} \qquad (1)$$
where $\zeta_{\mu_k}$ and $\zeta_{\sigma_k}$ are defined as
$$\zeta_{\mu_k} = \frac{1}{T\sqrt{\upsilon_k}} \sum_{t=1}^{T} \eta_t(k)\, \frac{p_t - \mu_k}{\sigma_k}, \qquad \zeta_{\sigma_k} = \frac{1}{T\sqrt{\upsilon_k}} \sum_{t=1}^{T} \eta_t(k)\, \frac{1}{\sqrt{2}} \Big\{\frac{(p_t - \mu_k)^2}{\sigma_k^2} - 1\Big\},$$
$\sigma_k$ is the square root of the diagonal values of $\Sigma_k$, and $\eta_t(k)$ is the soft assignment of $p_t$ to Gaussian $k$.
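As an illustration, a minimal SFV encoder for one superpixel might look as follows; this is a sketch of Eqn 1 assuming diagonal-covariance GMMs fitted with scikit-learn on 6-D per-pixel features, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def sfv_encode(pixels, gmm):
    """Superpixel-wise Fisher Vector (Eqn 1): pixels is a (T, d) array of
    low-level features for one superpixel; gmm is fitted on the whole image."""
    T, d = pixels.shape
    eta = gmm.predict_proba(pixels)              # soft assignments eta_t(k), (T, K)
    w, mu = gmm.weights_, gmm.means_             # upsilon_k, mu_k
    sigma = np.sqrt(gmm.covariances_)            # diagonal covariances assumed
    phi = []
    for k in range(gmm.n_components):
        u = (pixels - mu[k]) / sigma[k]          # normalized first-order residual
        zeta_mu = (eta[:, [k]] * u).sum(0) / (T * np.sqrt(w[k]))
        zeta_sig = (eta[:, [k]] * (u**2 - 1)).sum(0) / (T * np.sqrt(2 * w[k]))
        phi.extend([zeta_mu, zeta_sig])
    return np.concatenate(phi)                   # D = 2*d*K dimensional

# Usage: fit a K=1 diagonal GMM on all image pixels, then encode each superpixel.
all_pixels = np.random.rand(10000, 6)            # stand-in for RGB+LAB features
gmm = GaussianMixture(n_components=1, covariance_type="diag").fit(all_pixels)
phi_i = sfv_encode(all_pixels[:200], gmm)        # 12-D SFV for one superpixel
```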
The SFV representation ϕi is hereby used to describe
superpixel i in this paper. It has several advantages:
• As an extension of Fisher Vector coding,
SFV successfully realizes superpixel level coding
representation, making Fisher Vector more suitable for
single image processing. Instead of averaging low-level
features of contained pixels, SFV statistically analyzes
the internal feature distribution of each superpixel,
providing a more accurate and reliable representation
for it. Experiments show that our SFV generates smoother
and more uniform saliency maps and improves the
precision-recall curve by about 2 percent compared with
low-level features on the MSRA-1000 database, as
shown in Figure 7.
• SFV can be regarded as an adaptive Fisher Vector coding,
since the parameters of the GMM model are trained
on a specific image online. This means even the same
superpixels in different images have different coding
representations. Therefore, our SFV better considers
image contextual information.
• Due to the small number of superpixels in an image and
their disjoint nature, SFV is much faster than existing
state-of-the-art FV variants. Furthermore, besides saliency
detection, SFV can also be applied to other vision tasks,
such as image segmentation and content-aware image
resizing, etc.
B. Adaptive Metric Learning
Learning a discriminative metric can better distinguish the
samples in different classes, as well as shortening the distance
within the same class. Numerous models and methods
have been proposed in the last decade, especially for the
Mahalanobis distance metric learning, such as information
theoretic metric learning (ITML) [30], large margin nearest
neighbor (LMNN) [31], [32], and logistic discriminative based
metric learning (LDML) [33].
However, most existing metric learning approaches learn a
fixed metric for all samples without considering the deeper
structure of the data, thereby breaking down in the presence of
irrelevant or unreliable features. In this paper, we propose an
adaptive metric learning approach, which considers both the
global distribution of the whole training set (GML) and the
specific structure of a single image (SML) to better separate
objects from the background. Our approach can also be viewed
as an integration of a supervised distance metric learning
model (GML) and a semi-supervised distance metric learning
model (SML). Since GML and SML are complementary to
each other, we obtain promising results after fusing
them together under an affinity aggregation framework
(Section III-D).
1) Generic Metric Learning (GML): Metric learning has
been widely applied to vision tasks, but never been used for
saliency detection because of its long training time, which is
infeasible for single image processing. In this part, we solve
this problem by pre-training a Generic metric Mg from the
first 500 images of the MSRA-1000 database using gradient
descent, and we verify, both experimentally and empirically,
that Mg is generally suitable for all images.
First, we construct a training set $\{\varphi_i, i = 1, 2, \ldots, M\}$ consisting of superpixels extracted from all training images, where $\varphi_i$ is the SFV representation of superpixel $i$. To find the most discriminative $M_g$, we minimize
$$M_g^* = \arg\min_{M_g} \frac{1}{2}\alpha \|M_g\|^2 + \sum_n \sum_{\{ij \mid \delta_i^n = 1,\, \delta_j^n = 0\}} D(i, j) \qquad (2)$$
$$D(i, j) = \exp\{-(\varphi_i - \varphi_j)^T M_g (\varphi_i - \varphi_j)/\sigma_1^2\} \qquad (3)$$
where $\delta_i^n$ indicates whether the $i$th superpixel in the $n$th image belongs to the foreground or the background, and $D(i, j)$ is the exponential Mahalanobis distance between $i$ and $j$ under the distance metric $M_g$. We set $\sigma_1 = 0.1$ to control the strength of distances.
Considering that the background is diverse and cluttered, and
that different object regions are distinctive as well, we only impose
restrictions on pairwise distances between positive samples
and negative ones, which is more reliable and reasonable
given that salient objects are always distinct from the
background. This minimization maximizes the feature
distances between foreground and background samples,
thereby significantly improving the performance of saliency
detection. Eqn 2 can be easily solved by gradient descent.
The Generic metric incorporates the information of all superpixels
in the whole training set, and is thus appropriate for most
images.
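A minimal sketch of this training step, assuming the objective of Eqns 2-3 and plain full-batch gradient descent (the PSD projection is a common safeguard we add; the paper does not detail this step):

```python
import numpy as np

def train_gml(pairs, dim, alpha=1e-3, sigma1=0.1, lr=0.1, iters=200):
    """Gradient descent for Eqns 2-3. `pairs` is a list of (phi_fg, phi_bg)
    SFV pairs, one foreground and one background superpixel per pair."""
    M = np.eye(dim)                                # start from the Euclidean metric
    for _ in range(iters):
        grad = alpha * M                           # gradient of (alpha/2)*||M||^2
        for phi_i, phi_j in pairs:
            d = phi_i - phi_j
            e = np.exp(-d @ M @ d / sigma1**2)     # D(i, j) in Eqn 3
            grad -= (e / sigma1**2) * np.outer(d, d)
        M -= lr * grad
        # Project back onto the PSD cone so M stays a valid metric
        # (a common safeguard; not described in the paper).
        vals, vecs = np.linalg.eigh(M)
        M = (vecs * np.clip(vals, 0, None)) @ vecs.T
    return M
```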
2) Specific Metric Learning (SML): Recently,
Wang et al. [34] propose a novel doublet-SVM metric
learning approach based on a kernel classification framework,
formulating metric learning as an SVM problem and
achieving desirable results with less training time. However,
experiments show that directly applying doublet-SVM
to saliency detection cannot ensure good detection
accuracy. Therefore, we modify this approach by adding
a weight $\omega(\tau_1,\tau_2)$, which significantly improves the
performance of the final saliency map.
Let {ϕi, i = 1, 2, . . . , m} be the training dataset, where ϕi is
the SFV representation of a labeled superpixel extracted from
a specific image. The detailed process of extracting labeled
superpixels from an image will be discussed in Section III-C.
We first divide these samples into foreground seeds and
background seeds and label them as 1 and 0 respectively.
Given a training sample ϕi with label hi , we find its q1 nearest
neighbors with the same label and q2 nearest neighbors with
different labels, and then (q1 + q2) doublets are constructed
for it. Each doublet consists of the training sample ϕi and
one of its nearest neighbors. By combining the doublets of
all samples together, a doublet set χ = {x1, x2, . . . , xZ } is
established, where xτ = (ϕτ,1, ϕτ,2), τ = 1, 2, . . . Z is one
of the doublets, and ϕτ,1 and ϕτ,2 are the SFV of superpixel
τ1 and τ2 in doublet xτ , We assign xτ a label as follows:
lτ = −1 if hτ,1 = hτ,2, and lτ = 1 if hτ,1 = hτ,2.
LI et al.: ADAPTIVE METRIC LEARNING FOR SALIENCY DETECTION 3325
As an extension of degree-2 polynomial kernel, we define
the doublet level degree-2 polynomial kernel as:
Kp(xτ , xι)
= tr
ω(τ1,τ2)(ϕτ,1 − ϕτ,2)(ϕτ,1 − ϕτ,2)T
ω(ι1,ι2)(ϕι,1 − ϕι,2)(ϕι,1 − ϕι,2)T
= ω(τ1,τ2)ω(ι1,ι2){(ϕτ,1 − ϕτ,2)T
(ϕι,1 − ϕι,2)}2
(4)
where $\omega(\tau_1,\tau_2) = \theta(\tau_1,\tau_2) \cdot O(\tau_1,\tau_2)$ is a weight parameter, with
$$\theta(\tau_1,\tau_2) = 1 - \exp(-\mathrm{dist}(\tau_1,\tau_2)/\sigma_2) \qquad (5)$$
$$O(\tau_1,\tau_2) = 1 - \exp\{-(O_{\tau_1} - O_{\tau_2})^2/\sigma_2\} \qquad (6)$$
Here $\mathrm{dist}(\tau_1,\tau_2)$ is the spatial distance between superpixels $\tau_1$ and $\tau_2$, and $\theta(\tau_1,\tau_2)$ is the corresponding exponential spatial distance. $O_{\tau_1}$ is the objectness score of superpixel $\tau_1$ as defined in Eqn 11, and $O(\tau_1,\tau_2)$ is the superpixel-wise objectness distance between $\tau_1$ and $\tau_2$. We set $\sigma_2 = 0.1$. The weight parameter $\omega(\tau_1,\tau_2)$ provides crucial spatial and prior information regarding the objects of interest, and is thus more robust for evaluating the similarity between a pair of superpixels than the feature distance alone. In order to determine the similarity of the two samples in a doublet, we further define a kernel decision function as follows:
$$E(x) = \mathrm{sgn}\Big\{\sum_\tau \alpha_\tau l_\tau K_p(x_\tau, x) + \beta\Big\} \qquad (7)$$
where $\alpha_\tau$ is the weight of doublet $x_\tau$ and $\beta$ is a bias parameter. We have
$$\sum_\tau \alpha_\tau l_\tau K_p(x_\tau, x) + \beta = \omega(x_1,x_2)(\varphi_{x,1} - \varphi_{x,2})^T M_s (\varphi_{x,1} - \varphi_{x,2}) + \beta \qquad (8)$$
$$M_s = \sum_\tau \alpha_\tau l_\tau\, \omega(\tau_1,\tau_2)(\varphi_{\tau,1} - \varphi_{\tau,2})(\varphi_{\tau,1} - \varphi_{\tau,2})^T \qquad (9)$$
For ease of computation, we set $\omega(x_1,x_2) = 1$. The
proposed Specific metric $M_s$ can be easily solved by existing
SVM solvers. The Specific metric is trained only on the
test image, and training it is much faster than with existing metric
learning approaches: according to [34], the doublet-SVM
is on average 2000 times faster than ITML [30].
Therefore, it is feasible to train a Specific metric for each
image to better distinguish its objects from the background.
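A condensed sketch of SML training follows; it uses scikit-learn's SVC with a precomputed kernel as a stand-in for whichever SVM solver the authors used, and the inputs (doublet differences, weights, labels) are assumed to come from the construction above.

```python
import numpy as np
from sklearn.svm import SVC

def train_sml(diffs, weights, labels, C=1.0):
    """diffs: (Z, D) array of doublet differences (phi_{t,1} - phi_{t,2});
    weights: (Z,) array of omega(t1, t2) from Eqns 5-6;
    labels: (Z,) array with -1 for same-label doublets, +1 otherwise."""
    # Doublet-level degree-2 polynomial kernel (Eqn 4):
    # K[t, i] = w_t * w_i * (diffs[t] @ diffs[i])**2
    G = diffs @ diffs.T
    K = np.outer(weights, weights) * G**2
    svm = SVC(kernel="precomputed", C=C).fit(K, labels)
    # Recover the Specific metric M_s from the dual solution (Eqn 9).
    alpha_l = svm.dual_coef_.ravel()               # alpha_tau * l_tau for SVs
    sv = svm.support_
    Ms = np.zeros((diffs.shape[1], diffs.shape[1]))
    for a, t in zip(alpha_l, sv):
        Ms += a * weights[t] * np.outer(diffs[t], diffs[t])
    return Ms
```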
In this part, we propose two metric learning approaches:
GML and SML. The first one considers more about the global
distribution of the whole training set, while the second one
aims at exploring the deeper structure of a specific image.
GML can be pretrained offline and is generally suitable for
all images, while SML is much faster, since it can be solved
by existing SVM solvers. We should mention that the
image-specific metric is not always better than the Generic metric, as it has
fewer training samples and less reliable labels. Instead, these
and can be fused together to improve the performance of the
final detection results.
C. Iterative Seeds Selection by Mahalanobis Distance (ISMD)
As a preliminary criterion of saliency detection, saliency
seeds directly influence the performance of seeds-based
solutions. Recently, Liu et al. [28] propose an optimal
seeds selection strategy via submodularity. By adding a stop
criterion, the submodularity problem can be solved and then
the optimal seed set is obtained accordingly. In [35], Lu et al.
learn optimal seeds by combining bottom-up saliency maps
and mid-level vision cues. Inspired by their works, we propose
a compact but efficient iterative seeds selection scheme based
on the Mahalanobis distance assessment (ISMD).
Alexe et al. [24] present a novel objectness method to
measure the likelihood of a given image window containing
an object. Jiang et al. [18] extend the original objectness to
Pixel-level Objectness $O(p)$ and Region-level Objectness $O_i$ by defining:
$$O(p) = \sum_{w=1}^{W} P(w) \qquad (10)$$
$$O_i = \frac{1}{T} \sum_{p \in i} O(p) \qquad (11)$$
where $W$ is the number of sampling windows that contain pixel $p$, $P(w)$ is the probability score of the $w$th window, and $T$ is the number of pixels within region $i$. We refer to the region-level objectness as superpixel-wise objectness in this paper.
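As a sketch (with hypothetical window inputs; the actual window sampling and scoring come from [24]), Eqns 10-11 can be accumulated as:

```python
import numpy as np

def objectness_maps(windows, scores, shape, superpixels):
    """windows: list of (x0, y0, x1, y1) sampled boxes; scores: P(w) per box;
    superpixels: (H, W) label map. Returns per-pixel O(p) and per-superpixel O_i."""
    H, W = shape
    O_p = np.zeros((H, W))
    for (x0, y0, x1, y1), p_w in zip(windows, scores):
        O_p[y0:y1, x0:x1] += p_w              # Eqn 10: sum scores of covering windows
    n_sp = superpixels.max() + 1
    O_i = np.array([O_p[superpixels == i].mean() for i in range(n_sp)])  # Eqn 11
    return O_p, O_i
```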
Motivated by the fact that highlighted regions of the superpixel-wise objectness map are more likely to be foreground seeds, a set of initial foreground seeds is constructed from the lightest two percent of regions of the objectness map. Considering that the background is massive and scattered, we pick out several superpixels with the lowest objectness values from each boundary of the superpixel-wise objectness map as initial background seeds. The intuition is that if superpixel $i$ is a foreground seed, the ratio of its distances from foreground seeds to those from background seeds should be small. We formulate the ratio as follows:
$$\Lambda_i = \frac{\sum_{fs} d_{rat}(i, fs)}{\sum_{bs} d_{rat}(i, bs)} \qquad (12)$$
where
$$d_{rat}(i, fs) = \phi(i, fs)(\varphi_i - \varphi_{fs}) M_g (\varphi_i - \varphi_{fs})^T \qquad (13)$$
is the Mahalanobis distance between superpixel $i$ and one of the foreground seeds $fs$ under the Generic metric $M_g$, and $\phi(i, fs) = d(i, fs) \cdot O(i, fs)$ is a weight parameter, where
$$d(i, fs) = \exp(-\mathrm{dist}^2(i, fs)/\sigma_2) \qquad (14)$$
is another kind of exponential spatial distance between superpixel $i$ and $fs$. Only when $\Lambda_i \le \theta_0$ or $\Lambda_i \ge \theta_1$ can superpixel $i$ be added to the foreground seed set or the background seed set respectively, where $\theta_0$ and $\theta_1$ are two thresholds. With the new seeds added each time, we iterate this process $N_1$ times.
Since most of the area in an image belongs to the background, in order to generate more background seeds, the iteration continues $N_2$ more times, but only selects seeds with $\sum_{bs} d_{rat}(i, bs) \le \theta_2$, where $\theta_2$ is a threshold. Then we obtain the final seeds set as illustrated in Figure 4.
Fig. 4. Iterative seeds selection by Mahalanobis distance. Initial saliency seeds are first selected from the lightest and the darkest parts of the superpixel-wise objectness map. By computing the Mahalanobis distance between any superpixel and the chosen seeds, we iteratively increase the foreground and background seeds.
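The selection loop can be sketched as follows; the threshold and iteration-count values are illustrative placeholders, and d_rat follows Eqns 12-14:

```python
import numpy as np

def ismd(phi, Mg, weights, fg, bg, theta0=0.2, theta1=1.0, theta2=5.0, N1=3, N2=2):
    """phi: (r, D) SFV features; weights: (r, r) phi(i, j) weights from Eqn 14;
    fg, bg: sets of initial seed indices taken from the objectness map."""
    def d_rat(i, seeds):                      # Eqn 13, summed over a seed set
        return sum(weights[i, s] * (phi[i]-phi[s]) @ Mg @ (phi[i]-phi[s]) for s in seeds)

    for it in range(N1 + N2):
        for i in range(len(phi)):
            if i in fg or i in bg:
                continue
            if it < N1:                       # Eqn 12: ratio-based selection
                ratio = d_rat(i, fg) / d_rat(i, bg)
                if ratio <= theta0:
                    fg.add(i)
                elif ratio >= theta1:
                    bg.add(i)
            elif d_rat(i, bg) <= theta2:      # extra rounds: background seeds only
                bg.add(i)
    return fg, bg
```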
As elaborated in Section III-B2, the Specific metric Ms
can be learnt from the labeled seeds via doublet-SVM.
One may be concerned that $M_s$ will rely too much on $M_g$, since the
labeled seeds are generated under $M_g$. Fortunately, by learning
a generally suitable metric, we can enforce a very high seeds
accuracy (98.82% on the MSRA-1000 database), which means the
seeds-based Specific metric is reliable enough to measure the
distance.
D. Metric Fusion for Extracting Spectral
Clustering Characteristics
Aggregating several affinity matrices appropriately may
enhance the relevant and useful information and, at the same
time, suppress the irrelevant and unreliable information. Spectral
clustering is an important unsupervised clustering algorithm
for transferring the feature representation into a more
discriminative indicator space; we refer to this property as
"spectral clustering characteristics". Spectral clustering has
been applied to many fields for its effective and outstanding
performance.
In this section, we merge the metric fusion into a spectral clustering feature extraction process [36] and learn the optimal aggregation weight for each affinity matrix. The fusion strategy significantly improves the results of saliency detection, as shown in Figure 5. Based on the two metrics learnt above, two affinity matrices $\Pi_g$ and $\Pi_s$ are constructed with the corresponding $(i,j)$th elements
$$\pi^g_{i,j} = \exp\{-\phi(i, j)(\varphi_i - \varphi_j) M_g (\varphi_i - \varphi_j)^T/\sigma_3\}$$
$$\pi^s_{i,j} = \exp\{-\phi(i, j)(\varphi_i - \varphi_j) M_s (\varphi_i - \varphi_j)^T/\sigma_3\} \qquad (15)$$
where $\sigma_3 = 0.1$.
Fig. 5. Evaluation of metrics. (a) Input images. (b) Generic metric. (c) Specific metric. (d) Fused results. (e) Ground truth.
The affinity aggregation strategy aims at finding the optimal clustering characteristic vector $U$ of all the superpixels in an image and the weight parameter $\vartheta = [\vartheta_g, \vartheta_s]^T$ associated with $\Pi_g$ and $\Pi_s$, so the fusion problem can be formulated as:
$$\min_{\vartheta_g, \vartheta_s, u_1, \ldots, u_r} \Big\{ \sum_{i,j} \vartheta_g^2\, \pi^g_{i,j}\, \|u_i - u_j\|^2 + \sum_{i,j} \vartheta_s^2\, \pi^s_{i,j}\, \|u_i - u_j\|^2 \Big\} = \min_{\vartheta_g, \vartheta_s, U} \big\{ \vartheta_g^2\, U^T (H_g - \Pi_g) U + \vartheta_s^2\, U^T (H_s - \Pi_s) U \big\} = \min_{\vartheta_g, \vartheta_s} \big( \beta_g \vartheta_g^2 + \beta_s \vartheta_s^2 \big) \qquad (16)$$
where $u_i$ is the clustering characteristic indicator of superpixel $i$, $r$ is the number of superpixels in an image, $H_g = \mathrm{diag}\{h_{11}, \ldots, h_{rr}\}$ is the diagonal matrix of $\Pi_g$ with diagonal elements $h_{ii} = \sum_j \pi^g_{i,j}$, and $\beta_g = U^T (H_g - \Pi_g) U$. To solve this problem, we first employ two constraints: the normalized weight constraint $\vartheta_g + \vartheta_s = 1$ and the normalized spectral clustering constraint $U^T H U = 1$. By fixing $\vartheta$, the clustering characteristic vector $U$ can be easily obtained using standard spectral clustering. If $U$ is given, Eqn 16 can be formulated as:
$$\min_{\vartheta_g, \vartheta_s} (\beta_g \vartheta_g^2 + \beta_s \vartheta_s^2) = \min_{\mu_g, \mu_s} (\rho_g \mu_g^2 + \rho_s \mu_s^2) \qquad (17)$$
subject to
$$\mu_g^2 + \mu_s^2 = 1, \qquad \frac{\mu_g}{\sqrt{\alpha_g}} + \frac{\mu_s}{\sqrt{\alpha_s}} = 1 \qquad (18)$$
where $\alpha_g = U^T H_g U$, $\rho_g = \beta_g / \alpha_g$, and $\mu_g = \sqrt{\alpha_g}\, \vartheta_g$. This can be easily solved by existing 1D line-search methods.
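A compact sketch of this two-step alternation, with the smallest generalized eigenvectors standing in for "standard spectral clustering" and a grid search standing in for the 1D line search:

```python
import numpy as np
from scipy.linalg import eigh

def fuse_metrics(Pi_g, Pi_s, n_iters=3, dim=4):
    """Two-step alternation for Eqns 16-18: fix the weights and solve the
    spectral problem for the indicators U, then fix U and update the weights."""
    th_g = th_s = 0.5
    for _ in range(n_iters):
        # Spectral step: Laplacian of the weighted combined affinity.
        Pi = th_g**2 * Pi_g + th_s**2 * Pi_s
        H = np.diag(Pi.sum(1))
        # Smallest generalized eigenvectors of (H - Pi) u = lam * H u.
        _, U = eigh(H - Pi, H, subset_by_index=[0, dim - 1])
        # Weight step: beta_m = tr(U^T (H_m - Pi_m) U), then minimize
        # beta_g * t^2 + beta_s * (1 - t)^2 under th_g + th_s = 1.
        def beta(Pm):
            return np.trace(U.T @ (np.diag(Pm.sum(1)) - Pm) @ U)
        bg_, bs_ = beta(Pi_g), beta(Pi_s)
        th_g = min(np.linspace(0.0, 1.0, 101), key=lambda t: bg_*t**2 + bs_*(1-t)**2)
        th_s = 1.0 - th_g
    return U, (th_g, th_s)
```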
To summarize, metric fusion tries to find the optimal
clustering characteristic vector and the optimal weight
parameter ϑ via a two-step iterative strategy. Since affinity
matrices incorporate φ(i, j) in Eqn 15, the convergence
can be very fast, about three iterations in each image.
We use the indicator representation to compute saliency maps
(Section III-E).
E. Context-Based Multi-Scale Saliency Detection
In this section, we propose a context-based multi-scale
saliency detection algorithm to compute the saliency map for
each image. Lacking the knowledge of sizes of objects, we first
generate superpixels in S different scales. Then the K-means
algorithm is applied in each scale to segment an image into $N$ clusters via their SFV features.
Fig. 6. The distribution of saliency values of ground truth foregrounds and backgrounds. (a) Generic metric on MSRA-1000. (b) Specific metric on MSRA-1000. (c) AML on MSRA-1000. (d) AML on MSRA-5000.
According to the intuition
that a superpixel is salient if its cluster neighbors are close
to the foreground seeds and far from the background seeds,
we define the distances between superpixel $i$ and the saliency seeds at scale $s$ as:
$$D^{(s)}_{i,fs} = \sum_{q=1}^{fn(s)} \Big\{ \gamma\, \|u_i - u_q\| + (1 - \gamma) \sum_{j=1}^{N_c^{(s)}} W_{i,j}\, \|u_j - u_q\| \Big\}$$
$$D^{(s)}_{i,bs} = \sum_{q=1}^{bn(s)} \Big\{ \gamma\, \|u_i - u_q\| + (1 - \gamma) \sum_{j=1}^{N_c^{(s)}} W_{i,j}\, \|u_j - u_q\| \Big\} \qquad (19)$$
where
$$W_{i,j} = Q_1 \exp\{-\mathrm{dist}(i,j)/\sigma_2\} \cdot Q_2 \exp\{-(O_i - O_j)^2/\sigma_2\} \qquad (20)$$
is the weighted distance between superpixel $i$ and its cluster neighbor $j$, $u_i$ is the clustering characteristic indicator of superpixel $i$, and $fn$ and $bn$ are the numbers of foreground and background seeds chosen by our ISMD seeds selection approach. $Q_1$, $Q_2$ and $\gamma$ are weight parameters, and $N_c$ is the number of cluster neighbors of superpixel $i$. The saliency value of superpixel $i$ can be formulated as:
$$\mathrm{sal}(i) = \sum_{s=1}^{S} \frac{\nu_s \cdot \exp(O_i)}{1 + \{1 - \exp(-D^{(s)}_{i,fs}/\sigma_4)\}/D^{(s)}_{i,bs}} = \sum_{s=1}^{S} \frac{\nu_s \cdot \exp(O_i) \cdot D^{(s)}_{i,bs}}{D^{(s)}_{i,bs} + 1 - \exp(-D^{(s)}_{i,fs}/\sigma_4)} \qquad (21)$$
where $\nu_s$ is the weight of scale $s$, and $\sigma_4 = 0.1$.
Considering all the other superpixels belonging to the same
cluster, together with multiple scales, smooths the saliency map
effectively and makes our approach more robust in dealing with
complicated scenes.
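A sketch of Eqns 19-21, assuming the indicators, weights, objectness scores, and seed lists come from the previous steps, and simplifying to a shared superpixel set across scales:

```python
import numpy as np

def saliency(u_scales, W_scales, O, fg_scales, bg_scales, neigh_scales,
             gamma=0.5, nu=None, sigma4=0.1):
    """u_scales[s]: (r, dim) clustering indicators at scale s; W_scales[s]:
    Eqn 20 weights; fg/bg_scales[s]: seed index lists; neigh_scales[s][i]:
    cluster neighbors of superpixel i. Simplification: the same r superpixels
    are assumed at every scale."""
    S = len(u_scales)
    if nu is None:
        nu = [1.0 / S] * S                         # uniform scale weights
    def D(i, seeds, u, W, neigh):                  # Eqn 19 for one superpixel
        total = 0.0
        for q in seeds:
            own = gamma * np.linalg.norm(u[i] - u[q])
            ctx = (1 - gamma) * sum(W[i, j] * np.linalg.norm(u[j] - u[q])
                                    for j in neigh[i])
            total += own + ctx
        return total
    sal = np.zeros(len(u_scales[0]))
    for s in range(S):
        u, W, neigh = u_scales[s], W_scales[s], neigh_scales[s]
        for i in range(len(sal)):
            d_fs = D(i, fg_scales[s], u, W, neigh)
            d_bs = D(i, bg_scales[s], u, W, neigh)
            # Eqn 21: high objectness, small fg distance, large bg distance.
            sal[i] += nu[s] * np.exp(O[i]) * d_bs / (d_bs + 1 - np.exp(-d_fs / sigma4))
    return sal
```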
IV. EXPERIMENTS
We evaluate the proposed method on four benchmark
datasets. The first one is MSRA-1000 [13], a subset of MSRA-5000, which has been widely used in previous works for its accurate human-labelled masks. The second one is the MSRA-5000 dataset [15], which includes 5000 more comprehensive images. The third one is THUS-10000 [37], which consists of 10000 images, each of which has an unambiguous salient object with pixel-wise ground truth labeling. The last one is Berkeley-300 [38], which contains more challenging scenes with multiple objects of different sizes and locations.
Fig. 7. (a) Precision-recall curves for the Generic metric, the Specific metric, and the fused results without neighbor smoothness (MSRA-1000 and Berkeley-300); precision-recall curves based on SFV and low-level features; precision-recall curves for two other fusion methods. (b) Images of fused results based on SFV and low-level features.
Since we have already used the first 500 images
of MSRA-1000 for training, we evaluate our algorithm
and compare it with other methods on the remaining 500 images of
MSRA-1000; on the 4500 images of MSRA-5000 that exclude
the 500 training images (MSRA-5000 contains all the images of
MSRA-1000); on the 9501 images of THUS-10000 (THUS-10000
contains 499 of the training images); and on Berkeley-300.
A. Evaluation of Metrics
We perform several comparative experiments, shown
in Figure 5, Figure 6 and Figure 7(a), to demonstrate the
effectiveness of the Generic metric (GML), the Specific metric (SML),
and their combination (AML based on SFV). In order to
eliminate the influence of the neighbor smoothness in Eqn 19 when
comparing metrics, we compute only the distance between each
superpixel and the seeds, instead of the sum of weighted distances
of its cluster neighbors:
$$D^{(s)}_{i,fs} = \sum_{q=1}^{fn(s)} \|u_i - u_q\|, \qquad D^{(s)}_{i,bs} = \sum_{q=1}^{bn(s)} \|u_i - u_q\| \qquad (22)$$
The precision-recall curves of the Generic metric and Specific
metric are almost the same, but their combination outperforms
both of them. We also try to add or multiply saliency maps
generated by these two metrics directly, but the PR curves are
much lower than our fusion approach in Figure 7(a). This
is consistent with our motivation: $M_g$ is trained from the whole training dataset, capturing the global distribution of the data, while $M_s$ targets a single image, capturing the specific structure of its samples.
Fig. 8. Results of different methods. (a), (b) Precision-recall curves on MSRA-1000. (c) Average precisions, recalls, F-measures and AUC on MSRA-1000. (d), (e) Precision-recall curves on MSRA-5000. (f) Average precisions, recalls, F-measures and AUC on MSRA-5000.
Fig. 9. Results of different methods. (a), (b) Precision-recall curves on THUS-10000. (c) Average precisions, recalls, F-measures and AUC on THUS-10000. (d), (e) Precision-recall curves on Berkeley-300. (f) Average precisions, recalls, F-measures and AUC on Berkeley-300.
Figure 5 demonstrates that the fused results significantly
remove the light saliency values in the background regions
produced by GML and SML. Since most parts in computing
saliency maps under different metrics are the same,
e.g., objectness prior map, seeds selection, etc., it is reasonable
that Figure 5 (b) and (c) are similar, but there are still
differences between them. To further prove this, we conduct an
extra experiment as shown in Figure 11. The second line of Figure 11
shows the results generated by fusing the GML with itself, the third line
shows the results generated by fusing the SML with itself, and the
fourth line is obtained by fusing the GML and SML. We refer to
them as GG, SS, and AML respectively. Limited by the image
resolution, some differences between the GML and SML may
not be visible in Figure 5, but the integration of a metric with
itself can apparently enlarge their distinctiveness. Furthermore,
if one metric is incorrect, the other can compensate for it.
The SS performs better than the GG in Figure 11 (a)-(e),
while the GG is better in (f)-(g), and the AML tends to
take the best results of both, which demonstrates that the
GML and SML are indeed complementary to each other and
improve the performance of saliency detection after fusion.
Figure 11 (k)-(m) show that if both the GML and SML produce
bad results, the fused results are still bad.
In addition, we plot the distribution of saliency values in
Figure 6. Ground truth masks provide a specific label, 1 or 0,
for each pixel, and we regard a superpixel as foreground when
more than 80% of its pixels are labelled 1; otherwise, the
superpixel is regarded as background. We put all the foreground
superpixels from the whole dataset together and get the
distribution of their saliency values computed by different
saliency methods as the red line. The blue line is the
distribution of saliency values of background superpixels.
Figure 6(a), (b), and (c) show the saliency distributions produced
by GML, SML and AML on MSRA-1000 respectively, and
Figure 6(d) shows AML on MSRA-5000. These plots show that AML is
better than GML and SML, since its background saliency
values are closer to 0.
Furthermore, our Generic metric is robust across different
databases. We apply the metric trained on MSRA-1000 to
all the databases, including MSRA-1000, MSRA-5000,
THUS-10000, and Berkeley-300. As shown in
Figure 8 and Figure 9, the results are still promising even
on different databases, which demonstrates the effectiveness
and adaptiveness of our Generic metric. Overall, the fused
results based on two outstanding and complementary metrics
achieve higher precision and recall values and generate more
accurate saliency maps.
B. Evaluation of Superpixel-Wise Fisher Vector
We have mentioned that our Superpixel-wise Fisher Vector
coding approach can improve the performance of saliency
detection by capturing the average first-order and second-order
differences between local features and the centers of a Mixture
of Gaussian Distributions. In our experiments, we extract the
low-level features RGB and LAB to learn a 12-D SFV
representation for each superpixel ($d = 6$, $K = 1$,
$D = 2dK = 12$). Figure 7(a) shows the effectiveness of our
SFV coding approach by comparing the precision-recall curves
of low-level features and the SFV on the MSRA-1000 database.
Figure 7(b) shows the corresponding images.
C. Evaluation of Saliency Maps
We compare the proposed saliency detection model with
several state-of-the-art methods: IT [5], GB [19], FT [13], GC [39], UFO [18], SVO [40], HS [41], PD [42], AMC [43], RCJ [37], DSR [20], DRFI [26], CB [44], RC [6], LR [14], and XL [45]. We use the source code provided by the authors or implement the methods based on the available code or software.
Fig. 10. The comparison of previous methods, our algorithm and ground truth. (a) Test image. (b) IT [5]. (c) GB [19]. (d) GC [39]. (e) CB [44]. (f) UFO [18]. (g) Proposed. (h) Ground truth.
We conduct several quantitative comparisons of some
typical saliency detection methods. Figure 8(a), (b), (d) and (e)
show that the proposed AML is comparable with most
state-of-the-art methods on the MSRA-1000 and MSRA-5000 databases.
Figure 8(c) and (f) show the comparisons of average precision,
recall, F-measure and AUC. We use AUC as an evaluation
criterion, since it represents the area under the PR curve
and can effectively reflect the global properties of different
algorithms. Instead of using bounding boxes to evaluate
saliency detection performance on the MSRA-5000 database,
we adopt the accurate human-labeled masks provided by [26]
to ensure more reliable comparative results. We also perform
experiments on THUS-10000 and Berkeley-300 databases
as shown in Figure 9. The precision-recall curves show that
AML reaches precision rates of 97.4%, 94.0%, 96.5%, and 81.5% on
MSRA-1000, MSRA-5000, THUS-10000, and Berkeley-300
respectively. All of these results demonstrate the effectiveness of our
method.
Figure 10 shows some sample results of five previous
approaches and our AML algorithm. The IT and GB methods
are capable of finding the salient regions in most cases, but
they tend to highlight the boundaries and miss much of the object
information because of the blurriness of their saliency maps. The
GC method cannot capture all the salient pixels and often
mislabels small background patches as salient regions. The
CB and UFO models can highlight the objects uniformly, but
they become ineffective in dealing with challenging scenes. Our
method can capture both small and large salient objects even
in complex environments. In addition, we can highlight the
objects uniformly with accurate boundaries and do not need
to care about the number and locations of the salient objects.
We also test the average computational cost on different
datasets: 18.15s on MSRA-1000, 18.42s on MSRA-5000,
17.90s on THUS-10000 and 18.78s on Berkeley-300. The
proposed algorithm is implemented in MATLAB on a PC
with an Intel i7-3370 CPU (3.4 GHz) and 32 GB of memory.
D. Evaluation of Selected Seeds
We train an effective Specific metric based on the
assumption that the selected seeds are correct. In our experiments, we cannot ensure that the chosen seeds are completely accurate, but we can enforce a very high seeds accuracy.
Fig. 11. Example results of different metrics. The first line is the input images, the second line shows the results generated by fusing the GML with itself, the third line shows the results generated by fusing the SML with itself, the fourth line is obtained by fusing the GML and SML, and the last line is the ground truth images.
The accuracy of the selected seeds is defined as follows:
$$sa = \frac{fs_c + bs_c}{fs_t + bs_t} = \frac{fs_c + bs_c}{(fs_c + fs_{ic}) + (bs_c + bs_{ic})} \qquad (23)$$
where
$$fs_c = \sum_n \sum_i (gt_i^n \,\&\, seed_i^n), \qquad bs_c = \sum_n \sum_i (\overline{gt_i^n} \,\&\, \overline{seed_i^n}) \qquad (24)$$
Here $i$ indexes the $i$th superpixel extracted from the $n$th image of a given database, and $gt_i^n$ and $seed_i^n$ are the ground-truth label and the label assigned by our seeds selection mechanism for superpixel $i$.
accuracy rates of four databases are: 0.9882 on MSRA-1000,
0.9769 on MSRA-5000, 0.9822 on THUS-10000 and 0.8874
on Berkeley-300. We experimentally verify that the seeds are
accurate enough to generate a reliable Specific metric for
each image.
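A small sketch of this accuracy computation, with boolean arrays standing in for the per-superpixel ground-truth and seed labels:

```python
import numpy as np

def seed_accuracy(gt_labels, seed_labels, is_seed):
    """gt_labels, seed_labels: boolean arrays over all superpixels of a dataset
    (True = foreground); is_seed: mask of superpixels actually chosen as seeds."""
    gt, lab = gt_labels[is_seed], seed_labels[is_seed]
    fs_c = np.sum(gt & lab)                # correctly selected foreground seeds
    bs_c = np.sum(~gt & ~lab)              # correctly selected background seeds
    return (fs_c + bs_c) / is_seed.sum()   # Eqn 23: correct seeds / total seeds
```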
V. CONCLUSION
In this paper, we propose two Mahalanobis
distance metric learning models and a superpixel-wise Fisher
Vector representation for visual saliency detection. To our
knowledge, we are the first to apply metric learning to
saliency detection and to construct a metric fusion mechanism
to improve the detection accuracy. Different from previous
methods, we adopt a new feature coding strategy and make
the supervised metric learning more suitable for single image
processing. In addition, we propose an accurate seeds selection
method based on the Mahalanobis distance measure to train
the Specific metric and construct the final saliency map.
We estimate the saliency value of each superpixel from a
multi-scale view and include the contextual information when
computing it. Experimental results against sixteen state-of-the-art
algorithms on four benchmark image databases demonstrate
the effectiveness of our metric learning approach and the saliency
detection model. In the future, we plan to explore more robust
object detection approaches to further improve the accuracy
of saliency detection.
REFERENCES
[1] C. Siagian and L. Itti, “Rapid biologically-inspired scene classification
using features shared with visual attention,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 29, no. 2, pp. 300–312, Feb. 2007.
[2] H. Liu, X. Xie, X. Tang, Z.-W. Li, and W.-Y. Ma, “Effective browsing
of Web image search results,” in Proc. 6th ACM SIGMM Int. Workshop
Multimedia Inf. Retr., 2004, pp. 84–90.
[3] C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG2000 still
image coding system: An overview,” IEEE Trans. Consum. Electron.,
vol. 46, no. 4, pp. 1103–1127, Nov. 2000.
[4] Y. Niu, F. Liu, X. Li, and M. Gleicher, “Warp propagation for video
resizing,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010,
pp. 537–544.
[5] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual
attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[6] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu,
“Global contrast based salient region detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., Jun. 2011, pp. 409–416.
[7] Y. Xie, H. Lu, and M.-H. Yang, “Bayesian saliency via low and mid
level cues,” IEEE Trans. Image Process., vol. 22, no. 5, pp. 1689–1698,
May 2013.
[8] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, “Saliency detection
via graph-based manifold ranking,” in Proc. IEEE Conf. Comput. Vis.
Pattern Recognit., Jun. 2013, pp. 3166–3173.
[9] J. Sun, H. Lu, and X. Liu, “Saliency region detection based on Markov
absorption probabilities,” IEEE Trans. Image Process., vol. 24, no. 5,
pp. 1639–1649, May 2015.
[10] Y.-F. Ma and H.-J. Zhang, “Contrast-based image attention analysis by
using fuzzy growing,” in Proc. 11th ACM Int. Conf. Multimedia, 2003,
pp. 374–381.
[11] J. Sun, H. Lu, and S. Li, “Saliency detection based on integration
of boundary and soft-segmentation,” in Proc. IEEE Int. Conf. Image
Process., Sep./Oct. 2012, pp. 1085–1088.
[12] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007,
pp. 1–8.
[13] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, “Frequency-tuned
salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit. (CVPR), Jun. 2009, pp. 1597–1604.
[14] X. Shen and Y. Wu, “A unified approach to salient object detection via
low rank matrix recovery,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., Jun. 2012, pp. 853–860.
[15] T. Liu et al., “Learning to detect a salient object,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 33, no. 2, pp. 353–367, Feb. 2011.
[16] J. Yang and M.-H. Yang, “Top-down visual saliency via joint CRF and
dictionary learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.,
Jun. 2012, pp. 2296–2303.
[17] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, “Image
classification with the Fisher vector: Theory and practice,” Int. J.
Comput. Vis., vol. 105, no. 3, pp. 222–245, 2013.
[18] P. Jiang, H. Ling, J. Yu, and J. Peng, “Salient region detection by UFO:
Uniqueness, focusness and objectness,” in Proc. IEEE Int. Conf. Comput.
Vis., Dec. 2013, pp. 1976–1983.
[19] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Proc.
Adv. Neural Inf. Process. Syst., 2006, pp. 545–552.
[20] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, “Saliency detection
via dense and sparse reconstruction,” in Proc. IEEE Int. Conf. Comput.
Vis., Dec. 2013, pp. 2976–2983.
[21] W. Zhu, S. Liang, Y. Wei, and J. Sun, “Saliency optimization from
robust background detection,” in Proc. IEEE Conf. Comput. Vis. Pattern
Recognit., Jun. 2014, pp. 2814–2821.
[22] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, “Saliency filters:
Contrast based filtering for salient region detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733–740.
[23] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk,
“SLIC superpixels compared to state-of-the-art superpixel methods,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282,
Nov. 2012.
[24] B. Alexe, T. Deselaers, and V. Ferrari, “Measuring the objectness of
image windows,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34,
no. 11, pp. 2189–2202, Nov. 2012.
[25] Y. Wei, F. Wen, W. Zhu, and J. Sun, “Geodesic saliency using
background priors,” in Proc. 12th Eur. Conf. Comput. Vis. (ECCV), 2012,
pp. 29–42.
[26] H. Jiang, Z. Yuan, M.-M. Cheng, Y. Gong, N. Zheng, and J. Wang.
(2014). “Salient object detection: A discriminative regional feature inte-
gration approach.” [Online]. Available: http://arxiv.org/abs/1410.5926
[27] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient
object detection: A discriminative regional feature integration approach,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013,
pp. 2083–2090.
[28] R. Liu, J. Cao, Z. Lin, and S. Shan, “Adaptive partial differential
equation learning for visual saliency detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., Jun. 2014, pp. 3866–3873.
[29] Q. Chen et al., “Efficient maximum appearance search for large-scale
object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.,
Jun. 2013, pp. 3190–3197.
[30] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon,
“Information-theoretic metric learning,” in Proc. 24th Int. Conf. Mach.
Learn., 2007, pp. 209–216.
[31] K. Q. Weinberger, J. Blitzer, and L. K. Saul, “Distance metric learning
for large margin nearest neighbor classification,” in Proc. Adv. Neural
Inf. Process. Syst., 2005, pp. 1473–1480.
[32] K. Q. Weinberger and L. K. Saul, “Fast solvers and efficient
implementations for distance metric learning,” in Proc. 25th Int. Conf.
Mach. Learn., 2008, pp. 1160–1167.
[33] M. Guillaumin, J. Verbeek, and C. Schmid, “Is that you? Metric
learning approaches for face identification,” in Proc. IEEE 12th Int.
Conf. Comput. Vis., Sep./Oct. 2009, pp. 498–505.
[34] F. Wang, W. Zuo, L. Zhang, D. Meng, and D. Zhang. (2013). “A kernel
classification framework for metric learning.” [Online]. Available:
http://arxiv.org/abs/1309.5823
[35] S. Lu, V. Mahadevan, and N. Vasconcelos, “Learning optimal seeds for
diffusion-based salient object detection,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit., Jun. 2014, pp. 2790–2797.
[36] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen, “Affinity aggregation for
spectral clustering,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.,
Jun. 2012, pp. 773–780.
[37] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu,
“Global contrast based salient region detection,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 37, no. 3, pp. 569–582, Mar. 2014.
[38] V. Movahedi and J. H. Elder, “Design and perceptual validation
of performance measures for salient object segmentation,” in Proc.
IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops,
Jun. 2010, pp. 49–56.
[39] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and N. Crook,
“Efficient salient region detection with soft image abstraction,” in Proc.
IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1529–1536.
[40] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai, “Fusing generic
objectness and visual saliency for salient object detection,” in Proc. IEEE
Int. Conf. Comput. Vis., Nov. 2011, pp. 914–921.
[41] Q. Yan, L. Xu, J. Shi, and J. Jia, “Hierarchical saliency detection,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013,
pp. 1155–1162.
[42] R. Margolin, A. Tal, and L. Zelnik-Manor, “What makes a patch
distinct?” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
Jun. 2013, pp. 1139–1146.
[43] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, “Saliency detection
via absorbing Markov chain,” in Proc. IEEE Int. Conf. Comput. Vis.,
Dec. 2013, pp. 1665–1672.
[44] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li, “Automatic
salient object segmentation based on context and shape prior,” in Proc.
BMVC, 2011, pp. 110.1–110.12.
[45] Y. Xie and H. Lu, “Visual saliency detection based on Bayesian model,”
in Proc. 18th IEEE Int. Conf. Image Process., Sep. 2011, pp. 645–648.
Shuang Li is currently pursuing the
B.E. degree with the School of Information
and Communication Engineering, Dalian University
of Technology (DUT), China. From 2012 to 2015,
she was a Research Assistant with the Computer
Vision Group, DUT. Her research interests focus
on saliency detection and object recognition.
Huchuan Lu (SM’12) received the M.Sc. degree
in signal and information processing and the
Ph.D. degree in system engineering from the Dalian
University of Technology (DUT), Dalian, China,
in 1998 and 2008, respectively. He joined as a
Faculty Member in 1998, and is currently a Full
Professor with the School of Information and
Communication Engineering, DUT. His current
research interests include the areas of computer
vision and pattern recognition with a focus on visual
tracking, saliency detection, and segmentation.
He is also a member of the Association for Computing Machinery and
an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN AND
CYBERNETICS—PART B.
Zhe Lin (M’10) received the B.Eng. degree in
automatic control from the University of Science
and Technology of China, in 2002, the M.S. degree
in electrical engineering from the Korea Advanced
Institute of Science and Technology, in 2004, and the
Ph.D. degree in electrical and computer engineering
from the University of Maryland, College Park,
in 2009. He has been a Research Intern with
Microsoft Live Labs Research. He is currently a
Senior Research Scientist with Adobe Research,
San Jose, CA. His research interests include deep
learning, object detection and recognition, image classification and tagging,
content-based image and video retrieval, human motion tracking, and activity
analysis.
Xiaohui Shen (M’11) received the B.S. and
M.S. degrees from the Department of Automation,
Tsinghua University, China, and the Ph.D. degree
from the Department of Electrical Engineering
and Computer Sciences, Northwestern University,
in 2013. He is currently a Research Scientist with
Adobe Research, San Jose, CA. He is generally
interested in the research problems in the area of
computer vision, in particular, image retrieval, object
detection, and image understanding.
Brian Price received the Ph.D. degree in computer
science from Brigham Young University under the
advisement of Dr. B. Morse. He has contributed
new features to many Adobe products, such as
Photoshop, Photoshop Elements, and After-Effects,
mostly involving interactive image segmentation and
matting. He is currently a Senior Research Scientist
with Adobe Research, specializing in computer
vision. His research interests include semantic seg-
mentation, interactive object selection and matting,
stereo and RGBD, and broad interest in computer
vision and its intersections with machine learning and computer graphics.
Fig. 1. Comparison between the Euclidean distance space and the Mahalanobis distance space. The Mahalanobis distance is more discriminative than the Euclidean distance, since its background part is less salient.

Later studies address saliency detection from broader views, e.g., convex hull [7], [11] and the frequency domain [12], [13]. In contrast, supervised methods [14]–[16] incorporate high-level prior information to better distinguish the salient regions by learning salient visual information from a large number of images with ground-truth labels. Despite the differences among these methods, they all require the basic ability to compute a difference measure on region features.

To the best of our knowledge, all existing models address saliency detection based on the Euclidean distance. However, the Euclidean distance weights features equally without considering the distribution of the data, and therefore becomes unreliable when detecting objects in complex images. This situation arises frequently in saliency detection, especially when the salient regions and the background are similar, in which case the Euclidean distances between the foreground and the similar background can be smaller than the distances within the foreground itself. Figure 1 illustrates this problem. Given an image, we first select some initial seeds, including foreground and background seeds; the seed selection process is the one described in Section III-C. We then compute the distance between each superpixel and the seeds and plot the distance distributions in Figure 1. We observe that the Mahalanobis distance is more discriminative than the Euclidean distance, since its background part is less salient. This motivates us to train a discriminative distance metric that assigns appropriate weights to features so that objects can be precisely separated from the background.

We use metric learning to compute a more discriminative distance measure.
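To make this contrast concrete, here is a minimal numpy sketch (ours, not from the paper) comparing the two distances; the feature vectors and the matrix M are synthetic placeholders, with M standing in for a learned metric such as the Generic or Specific metric introduced below.

```python
import numpy as np

def euclidean_dist(x, y):
    """Plain Euclidean distance: every feature dimension weighted equally."""
    return np.sqrt(np.sum((x - y) ** 2))

def mahalanobis_dist(x, y, M):
    """Mahalanobis distance under a PSD matrix M: sqrt((x - y)^T M (x - y))."""
    d = x - y
    return np.sqrt(d @ M @ d)

rng = np.random.default_rng(0)
x, y = rng.normal(size=16), rng.normal(size=16)

# A toy PSD metric standing in for a learned metric matrix.
A = rng.normal(size=(16, 16))
M = A @ A.T + 1e-3 * np.eye(16)

print(euclidean_dist(x, y))       # all dimensions weighted equally
print(mahalanobis_dist(x, y, M))  # dimensions re-weighted by the metric
```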
Fig. 2. Comparison between low-level features and our SFV feature. (a) input image. (b) saliency map based on low-level features. (c) saliency map based on SFV. (d) ground truth.

Fig. 3. Pipeline of the adaptive metric learning algorithm. IT [5], GB [19], LR [14], RC [6] are four other saliency methods.

Distance metric learning has been widely adopted for this purpose in different applications, since it takes the covariance information into account when estimating data distributions and significantly improves the performance of learning methods. To our knowledge, we are the first to successfully formulate the saliency detection problem within a metric learning framework, and our method works well on different databases.

We also propose a Superpixel-wise Fisher Vector (SFV) coding approach which maps low-level features, such as RGB and LAB, to a high-dimensional sparse vector. Compared with using low-level features directly, SFV is more discriminative in challenging environments, as shown in Figure 2. We therefore use SFV features to describe each superpixel.

In this paper, we adopt an effective feature coding method and propose a novel metric learning based saliency detection model which incorporates both supervised and semi-supervised information. Our algorithm considers both the global distribution of the whole training set (GML) and the typical structure of a specific image (SML), and fuses the two to extract the clustering characteristics used to estimate the final saliency map. Figure 3 shows the pipeline of our method. First, as an extension of traditional Fisher Vector coding [17], Superpixel-wise Fisher Vector coding is proposed to describe superpixels by learning the parameters of a Gaussian mixture model (Section III-A). Second, we train a Generic metric on the training set (Section III-B1) and apply it to a single image to find the saliency seeds, assisted by the superpixel-wise objectness map generated by [18] (Section III-C). Third, a Specific metric based on kernel classification is learned from the chosen seeds of each image (Section III-B2). Finally, by integrating the Generic and Specific metrics (Section III-D), we obtain the clustering information for each superpixel and use it to generate the final saliency map (Section III-E). The GML and SML maps shown in Figure 3 are intermediate results that are not actually generated when computing saliency maps; they serve as comparisons to demonstrate the effectiveness of the fused results in Section IV-A.

The main contributions of our work include:

• Two metric learning approaches are applied to saliency detection for the first time as the optimal distance measure between two superpixels. GML is learned from the global training set, while SML is learned from image-specific training samples. The two are complementary and achieve promising results after affinity aggregation.

• A superpixel-wise Fisher Vector coding method is put forward which captures image contextual information when representing superpixels and makes supervised learning methods more suitable for single-image processing.

• An accurate seed selection method is presented based on the Mahalanobis distance metric. The selected seeds serve as training samples for the Specific metric learning and as reference nodes when evaluating saliency values.
Experimental results on various image sets show that our method is comparable with most state-of-the-art approaches, and that the proposed metric learning approaches can be extended to other fields as well.

II. RELATED WORK

Saliency detection has seen significant improvement and activity in recent years. Numerous unsupervised approaches have been proposed under different theoretical models. Cheng et al. [6] propose a global region contrast algorithm which simultaneously considers the spatial coherence across regions and the global contrast over the entire image; however, low-level color contrast becomes unreliable in challenging scenes. Li et al. [20] compute dense and sparse reconstruction errors based on background templates extracted from image boundaries, and propose several integration strategies, such as multi-scale reconstruction error and Bayesian integration, which significantly improve detection performance. In [21], boundary connectivity, a robust background measure, is first applied to saliency detection; it characterizes the spatial layout of image regions and comes with a clear geometrical interpretation. Perazzi et al. [22] formulate saliency estimation as contrast computation with high-dimensional Gaussian filters; they modify SLIC [23] and demonstrate the effectiveness of their superpixel segmentation approach for detecting salient objects.

Furthermore, lacking knowledge of the sizes and locations of objects, boundary priors and objectness are often adopted to highlight the salient regions or suppress the background. Jiang et al. [18] construct saliency by integrating three visual cues: uniqueness, focusness and objectness (UFO), where uniqueness represents color contrast, focusness indicates the degree of focus (often appearing as the reverse of blurriness), and objectness, proposed by Alexe et al. [24], is the likelihood that a given image window contains an object. In [25], Wei et al. define the saliency value of each patch as its shortest distance to the image boundary, observing that image boundaries are more likely to be background. However, this assumption is less convincing, especially in challenging scenes.

Compared with unsupervised approaches, supervised methods are relatively rare. In [26] and [27], Jiang et al. propose a multi-scale learning approach which maps a regional feature vector to a saliency score and fuses these scores across multiple levels to generate the final saliency map. They introduce a novel feature vector, integrating regional contrast, regional property and regional backgroundness descriptors, to represent each region, and learn a discriminative random forest regressor to predict regional scores. Shen and Wu [14] treat an image as the combination of sparse noise and a low-rank matrix; they extract low-level features to form high-level priors and then incorporate the priors into a low-rank matrix recovery model to construct the saliency map. However, the saliency assignment near object boundaries is unsatisfying due to the ambiguity of the prior maps. Liu et al. [28] formulate saliency detection as a partial differential equation problem and solve it under an adaptive PDE learning framework; they learn the optimal saliency seeds via discrete submodularity and use the seeds as boundary conditions for solving a linear elliptic system.
Inspired by these works, we construct a metric fusion framework which contains two complementary metric learning approaches and generates robust and accurate saliency maps even in complex scenes. Our method encodes low-level features into a high-dimensional feature space and incorporates multi-scale and objectness information when measuring saliency values. Therefore, our method can uniformly highlight objects with explicit object boundaries.

III. PROPOSED ALGORITHM

In this section, we present an effective and robust adaptive metric learning method for visual saliency detection. The proposed algorithm proceeds through five steps to generate the final saliency map. First, we extract low-level features and encode the superpixels generated by the simple linear iterative clustering (SLIC) [23] algorithm with a Superpixel-wise Fisher Vector representation. Second, two Mahalanobis distance metric learning approaches, Generic metric learning and Specific metric learning, are introduced to learn the optimal distance measure between superpixels. Third, we propose a novel seed selection strategy based on the Mahalanobis distance to generate saliency seeds, which are used both as training samples for the Specific metric and as reference nodes when evaluating saliency values. Fourth, a metric fusion framework is presented to fuse the Generic and Specific metrics. Finally, we obtain smooth saliency maps by combining spectral clustering and multi-scale information.

A. Superpixel-Wise Fisher Vector Coding (SFV)

Appropriate feature coding can effectively extract the main information and remove redundancies, thus greatly improving the performance of saliency detection. The Fisher Vector can be regarded as an extension of the well-known bag-of-words representation, since it captures the first-order and second-order differences between local features and the centers of a mixture of Gaussian distributions. Recently, Chen et al. [29] extended the Fisher Vector to a point-level image representation for object detection. For a different purpose, we propose to further extend FV coding to the superpixel level and experimentally verify the superiority of our Superpixel-wise Fisher Vector coding method.

Given a superpixel $i = \{p_t,\ t = 1, \ldots, T\}$, where $p_t$ is a $d$-dimensional image pixel and $T$ is the number of pixels within $i$, we train a Gaussian mixture model (GMM) $\lambda(p_t) = \sum_{k=1}^{K} \upsilon_k \psi_k(p_t)$ on all pixels of the image using the maximum likelihood (ML) criterion. The parameters of the $K$-component GMM are $\lambda = \{\upsilon_k, \mu_k, \Sigma_k,\ k = 1, \ldots, K\}$, where $\upsilon_k$, $\mu_k$ and $\Sigma_k$ are the mixture weight, mean vector and covariance matrix of Gaussian $k$, respectively. Similar to FV coding, the SFV representation can be written as a $2dK$-dimensional concatenated form:

$$\varphi_i = \{\zeta_{\mu_1}, \zeta_{\sigma_1}, \ldots, \zeta_{\mu_K}, \zeta_{\sigma_K}\} \tag{1}$$
where $\zeta_{\mu_k}$ and $\zeta_{\sigma_k}$ are defined as

$$\zeta_{\mu_k} = \frac{1}{T\sqrt{\upsilon_k}} \sum_{t=1}^{T} \eta_t(k)\, \frac{p_t - \mu_k}{\sigma_k}, \qquad \zeta_{\sigma_k} = \frac{1}{T\sqrt{\upsilon_k}} \sum_{t=1}^{T} \eta_t(k)\, \frac{1}{\sqrt{2}} \left[ \frac{(p_t - \mu_k)^2}{\sigma_k^2} - 1 \right],$$

where $\sigma_k$ is the square root of the diagonal of $\Sigma_k$, and $\eta_t(k)$ is the soft assignment of $p_t$ to Gaussian $k$. The SFV representation $\varphi_i$ is hereby used to describe superpixel $i$ in this paper. It has several advantages:

• As an extension of Fisher Vector coding, SFV realizes a superpixel-level coding representation, making the Fisher Vector more suitable for single-image processing. Instead of averaging the low-level features of the contained pixels, SFV statistically analyzes the internal feature distribution of each superpixel, providing a more accurate and reliable representation. Experiments show that our SFV generates smoother and more uniform saliency maps and improves the precision-recall curve on the MSRA-1000 database by about 2 percent compared with low-level features, as shown in Figure 7.

• SFV can be regarded as an adaptive Fisher Vector coding, since the parameters of the GMM are trained online on a specific image. Identical superpixels in different images therefore have different coding representations, so SFV better captures image contextual information.

• Owing to the small number of superpixels in an image and their disjoint nature, SFV is much faster than existing state-of-the-art FV variants. Beyond saliency detection, SFV can also be applied to other vision tasks, such as image segmentation and content-aware image resizing.
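As a concrete illustration, the following sketch implements the SFV computation of Eqn. (1) with scikit-learn's diagonal-covariance GMM. Variable names and the exact normalization details are our assumptions, not the authors' code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def sfv_encode(pixels, labels, K=4):
    """Superpixel-wise Fisher Vector coding (Section III-A, Eqn. 1).
    pixels: (N, d) low-level features (e.g. RGB/LAB) of one image.
    labels: (N,) superpixel index of each pixel.
    Returns a (num_superpixels, 2*d*K) matrix of SFV descriptors."""
    gmm = GaussianMixture(n_components=K, covariance_type='diag').fit(pixels)
    w, mu, sigma = gmm.weights_, gmm.means_, np.sqrt(gmm.covariances_)
    resp = gmm.predict_proba(pixels)          # soft assignments eta_t(k)
    sfvs = []
    for sp in np.unique(labels):
        p, eta = pixels[labels == sp], resp[labels == sp]
        T = len(p)
        parts = []
        for k in range(K):
            u = (p - mu[k]) / sigma[k]        # normalized first-order term
            zeta_mu = (eta[:, k:k+1] * u).sum(0) / (T * np.sqrt(w[k]))
            zeta_sig = (eta[:, k:k+1] * (u**2 - 1)).sum(0) / (T * np.sqrt(2 * w[k]))
            parts += [zeta_mu, zeta_sig]
        sfvs.append(np.concatenate(parts))    # 2*d*K-dimensional descriptor
    return np.array(sfvs)
```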
B. Adaptive Metric Learning

Learning a discriminative metric can better separate samples of different classes while shortening the distances within the same class. Numerous models have been proposed in the last decade, especially for Mahalanobis distance metric learning, such as information theoretic metric learning (ITML) [30], large margin nearest neighbor (LMNN) [31], [32], and logistic discriminative based metric learning (LDML) [33]. However, most existing metric learning approaches learn a fixed metric for all samples without considering the deeper structure of the data, and therefore break down in the presence of irrelevant or unreliable features. In this paper, we propose an adaptive metric learning approach which considers both the global distribution of the whole training set (GML) and the specific structure of a single image (SML) to better separate objects from the background. Our approach can also be viewed as an integration of a supervised distance metric learning model (GML) and a semi-supervised one (SML). Since GML and SML are complementary, we obtain promising results after fusing them under an affinity aggregation framework (Section III-D).

1) Generic Metric Learning (GML): Metric learning has been widely applied to vision tasks, but never to saliency detection, because its long training time is infeasible for single-image processing. We address this problem by pre-training a Generic metric $M_g$ on the first 500 images of the MSRA-1000 database using gradient descent, and we verify, both experimentally and empirically, that $M_g$ is generally suitable for all images.

First, we construct a training set $\{\varphi_i,\ i = 1, 2, \ldots, M\}$ consisting of superpixels extracted from all training images, where $\varphi_i$ is the SFV representation of superpixel $i$. To find the most discriminative $M_g$, we minimize

$$M_g^* = \arg\min_{M_g} \; \frac{1}{2}\,\alpha \|M_g\|^2 + \sum_n \sum_{\{ij \,|\, \delta_i^n = 1,\ \delta_j^n = 0\}} D(i, j) \tag{2}$$

$$D(i, j) = \exp\{ -(\varphi_i - \varphi_j)^T M_g (\varphi_i - \varphi_j) / \sigma_1^2 \} \tag{3}$$

where $\delta_i^n$ indicates whether the $i$th superpixel in the $n$th image belongs to the foreground or the background, and $D(i, j)$ is the exponential Mahalanobis distance between $i$ and $j$ under the metric $M_g$. We set $\sigma_1 = 0.1$ to control the strength of the distances. Considering that the background is varied and cluttered, and that different object regions are distinctive as well, we only impose restrictions on pairwise distances between positive and negative samples, which is more reliable and reasonable given that salient objects are always distinct from the background. This minimization maximizes the feature distances between foreground and background samples, thereby significantly improving detection performance. Eqn. (2) can be easily solved by gradient descent. The Generic metric incorporates information from all superpixels of the training images, so it is appropriate for most images.
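A minimal gradient-descent sketch of the GML objective in Eqns. (2)-(3) follows. The learning rate, iteration count, and the PSD projection step are our own choices; the paper only states that Eqn. (2) is solved by gradient descent.

```python
import numpy as np

def train_gml(fg, bg, alpha=1.0, sigma1=0.1, lr=0.01, iters=100):
    """Gradient descent on Eqns. (2)-(3):
    (alpha/2)*||Mg||^2 + sum over (fg, bg) pairs of
    exp{-(phi_i - phi_j)^T Mg (phi_i - phi_j) / sigma1^2}.
    fg: (nf, D) foreground SFVs; bg: (nb, D) background SFVs."""
    D = fg.shape[1]
    Mg = np.eye(D)
    # All fg/bg pair differences (subsample pairs in practice for memory).
    diffs = (fg[:, None, :] - bg[None, :, :]).reshape(-1, D)
    for _ in range(iters):
        d2 = np.einsum('nd,de,ne->n', diffs, Mg, diffs)  # squared Mahalanobis distances
        w = np.exp(-d2 / sigma1**2)                      # D(i, j) terms
        grad = alpha * Mg - (diffs * w[:, None]).T @ diffs / sigma1**2
        Mg -= lr * grad
        # Keep Mg symmetric PSD; this projection is our addition,
        # not spelled out in the paper.
        Mg = (Mg + Mg.T) / 2
        evals, evecs = np.linalg.eigh(Mg)
        Mg = (evecs * np.clip(evals, 0, None)) @ evecs.T
    return Mg
```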
2) Specific Metric Learning (SML): Recently, Wang et al. [34] proposed a doublet-SVM metric learning approach based on a kernel classification framework, formulating metric learning as an SVM problem and achieving desirable results with much less training time. However, experiments show that directly applying doublet-SVM to saliency detection cannot ensure good detection accuracy. We therefore modify this approach by adding a constraint $\omega_{(\tau_1, \tau_2)}$, which significantly improves the quality of the final saliency map.

Let $\{\varphi_i,\ i = 1, 2, \ldots, m\}$ be the training set, where $\varphi_i$ is the SFV representation of a labeled superpixel extracted from a specific image. The detailed process of extracting labeled superpixels from an image is discussed in Section III-C. We first divide these samples into foreground seeds and background seeds, labeled 1 and 0 respectively. Given a training sample $\varphi_i$ with label $h_i$, we find its $q_1$ nearest neighbors with the same label and $q_2$ nearest neighbors with different labels, and construct $(q_1 + q_2)$ doublets for it, each consisting of $\varphi_i$ and one of its nearest neighbors. By combining the doublets of all samples, a doublet set $\chi = \{x_1, x_2, \ldots, x_Z\}$ is established, where $x_\tau = (\varphi_{\tau,1}, \varphi_{\tau,2})$, $\tau = 1, 2, \ldots, Z$, and $\varphi_{\tau,1}$ and $\varphi_{\tau,2}$ are the SFVs of superpixels $\tau_1$ and $\tau_2$ in doublet $x_\tau$. We assign $x_\tau$ the label $l_\tau = -1$ if $h_{\tau,1} = h_{\tau,2}$, and $l_\tau = 1$ if $h_{\tau,1} \neq h_{\tau,2}$.

As an extension of the degree-2 polynomial kernel, we define the doublet-level degree-2 polynomial kernel as

$$K_p(x_\tau, x_\iota) = \mathrm{tr}\!\left( \omega_{(\tau_1,\tau_2)} (\varphi_{\tau,1} - \varphi_{\tau,2})(\varphi_{\tau,1} - \varphi_{\tau,2})^T \, \omega_{(\iota_1,\iota_2)} (\varphi_{\iota,1} - \varphi_{\iota,2})(\varphi_{\iota,1} - \varphi_{\iota,2})^T \right) = \omega_{(\tau_1,\tau_2)}\, \omega_{(\iota_1,\iota_2)} \{ (\varphi_{\tau,1} - \varphi_{\tau,2})^T (\varphi_{\iota,1} - \varphi_{\iota,2}) \}^2 \tag{4}$$

where $\omega_{(\tau_1,\tau_2)} = \theta_{(\tau_1,\tau_2)} \cdot O_{(\tau_1,\tau_2)}$ is a weight parameter, with

$$\theta_{(\tau_1,\tau_2)} = 1 - \exp(-\mathrm{dist}_{(\tau_1,\tau_2)} / \sigma_2) \tag{5}$$

$$O_{(\tau_1,\tau_2)} = 1 - \exp\{ -(O_{\tau_1} - O_{\tau_2})^2 / \sigma_2 \} \tag{6}$$

where $\mathrm{dist}_{(\tau_1,\tau_2)}$ is the spatial distance between superpixels $\tau_1$ and $\tau_2$, and $\theta_{(\tau_1,\tau_2)}$ is the corresponding exponential spatial distance. $O_{\tau_1}$ is the objectness score of superpixel $\tau_1$ defined in Eqn. (11), and $O_{(\tau_1,\tau_2)}$ is the superpixel-wise objectness distance between $\tau_1$ and $\tau_2$. We set $\sigma_2 = 0.1$. The weight $\omega_{(\tau_1,\tau_2)}$ provides crucial spatial and prior information about the objects of interest, making it more robust for evaluating the similarity of a pair of superpixels than the feature distance alone.

To determine the similarity of the two samples in a doublet, we further define a kernel decision function

$$E(x) = \mathrm{sgn}\Big\{ \sum_\tau \alpha_\tau l_\tau K_p(x_\tau, x) + \beta \Big\} \tag{7}$$

where $\alpha_\tau$ is the weight of doublet $x_\tau$ and $\beta$ is a bias parameter. We have

$$\sum_\tau \alpha_\tau l_\tau K_p(x_\tau, x) + \beta = \omega_{(x_1,x_2)} (\varphi_{x,1} - \varphi_{x,2})^T M_s (\varphi_{x,1} - \varphi_{x,2}) + \beta \tag{8}$$

$$M_s = \sum_\tau \alpha_\tau l_\tau \, \omega_{(\tau_1,\tau_2)} (\varphi_{\tau,1} - \varphi_{\tau,2})(\varphi_{\tau,1} - \varphi_{\tau,2})^T \tag{9}$$

For ease of computation, we set $\omega_{(x_1,x_2)} = 1$. The proposed Specific metric $M_s$ can be solved by existing SVM solvers. The Specific metric is trained only on the test image, and it is obtained much faster than with existing metric learning approaches; according to [34], the doublet-SVM is, on average, 2000 times faster than ITML [30]. It is therefore feasible to train a Specific metric for each image to better distinguish its objects from the background.

In summary, we propose two metric learning approaches, GML and SML. The first captures the global distribution of the whole training set, while the second explores the deeper structure of a specific image. GML can be pre-trained offline and is generally suitable for all images, while SML is much faster, since it can be solved by existing SVM solvers. We note that the image-specific metric is not always better than the Generic metric, as it has fewer training samples and less reliable labels. Instead, the two metrics are complementary and can be fused to improve the final detection results.
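Given the doublets, Eqns. (4)-(9) reduce to a standard SVM with a precomputed kernel. The sketch below follows that reduction with scikit-learn; the doublet construction (nearest-neighbor pairing) is assumed to have been done beforehand, and the variable names are ours.

```python
import numpy as np
from sklearn.svm import SVC

def train_sml(deltas, omegas, labels, C=1.0):
    """Specific metric via the doublet-SVM formulation (Eqns. 4-9).
    deltas: (Z, D) differences phi_{tau,1} - phi_{tau,2}, one per doublet.
    omegas: (Z,) spatial/objectness weights omega_(tau1,tau2) (Eqns. 5-6).
    labels: (Z,) doublet labels, -1 for same-class pairs, +1 otherwise
            (both classes must be present)."""
    G = deltas @ deltas.T                    # inner products (Delta_tau^T Delta_iota)
    K = np.outer(omegas, omegas) * (G ** 2)  # degree-2 doublet kernel, Eqn. (4)
    svm = SVC(C=C, kernel='precomputed').fit(K, labels)
    # Eqn. (9): Ms = sum_tau alpha_tau * l_tau * omega_tau * Delta Delta^T,
    # where dual_coef_ holds alpha_tau * l_tau for the support doublets.
    D = deltas.shape[1]
    Ms = np.zeros((D, D))
    for coef, idx in zip(svm.dual_coef_[0], svm.support_):
        Ms += coef * omegas[idx] * np.outer(deltas[idx], deltas[idx])
    return Ms
```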
C. Iterative Seeds Selection by Mahalanobis Distance (ISMD)

As a preliminary step in saliency detection, saliency seeds directly influence the performance of seed-based methods. Recently, Liu et al. [28] proposed an optimal seed selection strategy via submodularity: by adding a stopping criterion, the submodularity problem can be solved and the optimal seed set obtained. In [35], Lu et al. learn optimal seeds by combining bottom-up saliency maps and mid-level vision cues. Inspired by these works, we propose a compact but efficient iterative seed selection scheme based on Mahalanobis distance assessment (ISMD).

Alexe et al. [24] present an objectness measure of the likelihood that a given image window contains an object. Jiang et al. [18] extend the original objectness to pixel-level objectness $O(p)$ and region-level objectness $O_i$ by defining

$$O(p) = \sum_{w=1}^{W} P(w) \tag{10}$$

$$O_i = \frac{1}{T} \sum_{p \in i} O(p) \tag{11}$$

where $W$ is the number of sampling windows containing pixel $p$, $P(w)$ is the probability score of the $w$th window, and $T$ is the number of pixels within region $i$. We redefine the region-level objectness as superpixel-wise objectness in this paper.

Motivated by the fact that highlights of the superpixel-wise objectness map are more likely to be foreground, a set of initial foreground seeds is constructed from the lightest two percent of regions of the objectness map. Considering that the background is large and scattered, we pick several of the lowest objectness values from each boundary of the superpixel-wise objectness map as initial background seeds.

The intuition is that if superpixel $i$ is a foreground seed, the ratio of its distances to the foreground and background seeds should be small. We formulate the ratio as

$$\Gamma_i = \frac{\sum_{fs} d_{rat}(i, fs)}{\sum_{bs} d_{rat}(i, bs)} \tag{12}$$

where

$$d_{rat}(i, fs) = \phi(i, fs)\, (\varphi_i - \varphi_{fs})\, M_g\, (\varphi_i - \varphi_{fs})^T \tag{13}$$

is the Mahalanobis distance between superpixel $i$ and one of the foreground seeds $fs$ under the Generic metric $M_g$, and $\phi(i, fs) = d(i, fs) \cdot O(i, fs)$ is a weight parameter, where

$$d(i, fs) = \exp(-\mathrm{dist}^2(i, fs) / \sigma_2) \tag{14}$$

is another exponential spatial distance between superpixel $i$ and $fs$. Only when $\Gamma_i \leq \theta_0$ or $\Gamma_i \geq \theta_1$ is superpixel $i$ added to the foreground or background seed set, where $\theta_0$ and $\theta_1$ are two thresholds. With the newly added seeds, we iterate this process $N_1$ times. Since most of the area of an image belongs to the background, the iteration then continues $N_2$ more times to generate additional background seeds, selecting only seeds with $\sum_{bs} d_{rat}(i, bs) \leq \theta_2$, where $\theta_2$ is a threshold.
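A compact sketch of the ISMD loop (Eqn. 12 and the growth rules above) follows. For brevity the spatial/objectness weight $\phi(i, fs)$ is fixed at 1, and the thresholds and iteration counts are placeholders rather than the paper's settings.

```python
import numpy as np

def maha(x, y, M, w=1.0):
    """Weighted Mahalanobis distance of Eqn. (13), with the weight
    phi(i, fs) exposed as w (set to 1 in this sketch)."""
    d = x - y
    return w * (d @ M @ d)

def ismd(phis, Mg, fg, bg, th0=0.2, th1=1.0, th2=1.0, N1=3, N2=2):
    """Iterative seed selection by Mahalanobis distance (Section III-C).
    phis: (r, D) SFVs of all superpixels; fg, bg: initial seed index lists
    taken from the objectness map."""
    fg, bg = set(fg), set(bg)
    for _ in range(N1):
        for i in range(len(phis)):
            if i in fg or i in bg:
                continue
            df = sum(maha(phis[i], phis[f], Mg) for f in fg)
            db = sum(maha(phis[i], phis[b], Mg) for b in bg)
            ratio = df / max(db, 1e-12)            # Gamma_i, Eqn. (12)
            if ratio <= th0:
                fg.add(i)                          # close to foreground seeds
            elif ratio >= th1:
                bg.add(i)                          # close to background seeds
    for _ in range(N2):  # extra rounds that only grow the background set
        for i in range(len(phis)):
            if i in fg or i in bg:
                continue
            if sum(maha(phis[i], phis[b], Mg) for b in bg) <= th2:
                bg.add(i)
    return sorted(fg), sorted(bg)
```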
Fig. 4. Iterative seed selection by Mahalanobis distance. Initial saliency seeds are first selected from the lightest and darkest parts of the superpixel-wise objectness map. By computing the Mahalanobis distance between any superpixel and the chosen seeds, we iteratively grow the foreground and background seed sets.

We then obtain the final seed set, as illustrated in Figure 4. As elaborated in Section III-B2, the Specific metric $M_s$ can be learned from the labeled seeds via doublet-SVM. One might worry that $M_s$ relies too heavily on $M_g$, since the labeled seeds are generated under $M_g$. Fortunately, by learning a generally suitable metric, we achieve a very high seed accuracy (98.82% on the MSRA-1000 database), which means the seed-based Specific metric is reliable enough to measure the distance.

D. Metric Fusion for Extracting Spectral Clustering Characteristics

Appropriately aggregating several affinity matrices can enhance the relevant and useful information while alleviating the irrelevant and unreliable parts. Spectral clustering is an important unsupervised clustering algorithm that transfers the feature representation into a more discriminative indicator space; we call this property the "spectral clustering characteristics". Spectral clustering has been applied to many fields for its effective and outstanding performance. In this section, we merge the metric fusion into a spectral clustering feature extraction process [36] and learn the optimal aggregation weight for each affinity matrix. The fusion strategy significantly improves the results of saliency detection, as shown in Figure 5.

Fig. 5. Evaluation of metrics. (a) input images. (b) Generic metric. (c) Specific metric. (d) fused results. (e) ground truth.

Based on the two metrics learned above, two affinity matrices $\Pi^g$ and $\Pi^s$ are constructed, with the corresponding $ij$th elements

$$\pi_{i,j}^g = \exp\{ -\phi(i, j)\, (\varphi_i - \varphi_j) M_g (\varphi_i - \varphi_j)^T / \sigma_3 \}, \qquad \pi_{i,j}^s = \exp\{ -\phi(i, j)\, (\varphi_i - \varphi_j) M_s (\varphi_i - \varphi_j)^T / \sigma_3 \} \tag{15}$$

where $\sigma_3 = 0.1$. The affinity aggregation strategy seeks the optimal clustering characteristic vector $\Lambda$ of all superpixels in an image and the weight parameter $\vartheta = [\vartheta_g, \vartheta_s]^T$ associated with $\Pi^g$ and $\Pi^s$, so the fusion problem can be written as

$$\min_{\vartheta_g, \vartheta_s,\, \Lambda_1, \ldots, \Lambda_r} \Big\{ \sum_{i,j} \vartheta_g^2\, \pi_{i,j}^g \|\Lambda_i - \Lambda_j\|^2 + \sum_{i,j} \vartheta_s^2\, \pi_{i,j}^s \|\Lambda_i - \Lambda_j\|^2 \Big\} = \min_{\vartheta_g, \vartheta_s,\, \Lambda} \big\{ \vartheta_g^2\, \Lambda^T (H_g - \Pi^g) \Lambda + \vartheta_s^2\, \Lambda^T (H_s - \Pi^s) \Lambda \big\} = \min_{\vartheta_g, \vartheta_s} \big( \beta_g \vartheta_g^2 + \beta_s \vartheta_s^2 \big) \tag{16}$$

where $\Lambda_i$ is the clustering characteristic indicator of superpixel $i$, $r$ is the number of superpixels in an image, $H_g = \mathrm{diag}\{h_{11}, \ldots, h_{rr}\}$ is the degree matrix of $\Pi^g$ with diagonal elements $h_{ii} = \sum_j \pi_{i,j}^g$, and $\beta_g = \Lambda^T (H_g - \Pi^g) \Lambda$.

To solve this problem, we employ two constraints: the normalized weight constraint $\vartheta_g + \vartheta_s = 1$ and the normalized spectral clustering constraint $\Lambda^T H \Lambda = 1$. With $\vartheta$ fixed, the clustering characteristic vector $\Lambda$ can be obtained by standard spectral clustering. With $\Lambda$ given, Eqn. (16) can be formulated as

$$\min_{\vartheta_g, \vartheta_s} \big( \beta_g \vartheta_g^2 + \beta_s \vartheta_s^2 \big) = \min_{\mu_g, \mu_s} \big( \rho_g \mu_g^2 + \rho_s \mu_s^2 \big) \tag{17}$$

$$\text{subject to} \quad \mu_g^2 + \mu_s^2 = 1, \qquad \frac{\mu_g}{\sqrt{\alpha_g}} + \frac{\mu_s}{\sqrt{\alpha_s}} = 1 \tag{18}$$

where $\alpha_g = \Lambda^T H_g \Lambda$, $\rho_g = \beta_g / \alpha_g$ and $\mu_g = \sqrt{\alpha_g}\, \vartheta_g$. This can be easily solved by existing 1D line-search methods. In summary, metric fusion finds the optimal clustering characteristic vector $\Lambda$ and the optimal weight parameter $\vartheta$ via a two-step iterative strategy.
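The two-step alternation can be sketched as follows. We use scipy's generalized symmetric eigensolver for the spectral step and, for the weight step, the closed form under the constraint $\vartheta_g + \vartheta_s = 1$ instead of the paper's 1D line search over Eqns. (17)-(18); that substitution is an implementation choice of ours.

```python
import numpy as np
from scipy.linalg import eigh

def fuse_metrics(Pg, Ps, n_vec=2, iters=3):
    """Two-step affinity aggregation (Section III-D, Eqn. 16).
    Pg, Ps: (r, r) affinity matrices from the Generic/Specific metrics (Eqn. 15).
    Alternates a spectral embedding (weights fixed) with a weight update
    (embedding fixed); returns the embedding and the weights."""
    tg = ts = 0.5
    for _ in range(iters):
        P = tg**2 * Pg + ts**2 * Ps                    # fused affinity
        H = np.diag(P.sum(axis=1))                     # degree matrix
        # Smallest generalized eigenvectors of (H - P, H): the clustering
        # characteristics (here n_vec columns instead of a single Lambda).
        _, V = eigh(H - P, H, subset_by_index=[0, n_vec - 1])
        beta_g = np.trace(V.T @ (np.diag(Pg.sum(1)) - Pg) @ V)
        beta_s = np.trace(V.T @ (np.diag(Ps.sum(1)) - Ps) @ V)
        # Minimize beta_g*tg^2 + beta_s*ts^2 subject to tg + ts = 1.
        tg = beta_s / (beta_g + beta_s + 1e-12)
        ts = 1.0 - tg
    return V, (tg, ts)
```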
Since the affinity matrices incorporate \phi(i, j) in Eqn. (15), convergence is very fast, typically about three iterations per image. We use the indicator representation to compute saliency maps (Section III-E).

E. Context-Based Multi-Scale Saliency Detection

In this section, we propose a context-based multi-scale saliency detection algorithm to compute the saliency map for each image. Lacking knowledge of the sizes of objects, we first generate superpixels at S different scales. Then the K-means algorithm is applied at each scale to segment an image into N clusters via their SFV features. According to the intuition that a superpixel is salient if its cluster neighbors are close to the foreground seeds and far from the background seeds, we define the distance between superpixel i and the saliency seeds at scale s as:

    D^{(s)}_{i,fs} = \sum_{q=1}^{fn^{(s)}} \{ \gamma \|u_i - u_q\| + (1 - \gamma) \sum_{j=1}^{N_c^{(s)}} W_{i,j} \|u_j - u_q\| \}
    D^{(s)}_{i,bs} = \sum_{q=1}^{bn^{(s)}} \{ \gamma \|u_i - u_q\| + (1 - \gamma) \sum_{j=1}^{N_c^{(s)}} W_{i,j} \|u_j - u_q\| \}    (19)

where

    W_{i,j} = Q_1 \exp\{-dist(i, j)/\sigma^2\} * Q_2 \exp\{-(O_i - O_j)^2/\sigma^2\}    (20)

is the weighted distance between superpixel i and its cluster neighbor j, u_i is the clustering characteristic indicator of superpixel i, and fn and bn are the numbers of foreground and background seeds chosen by our ISMD seeds selection approach. Q_1, Q_2 and \gamma are weight parameters, and N_c is the number of cluster neighbors of superpixel i. The saliency value of superpixel i can be formulated as:

    sal(i) = \sum_{s=1}^{S} \nu_s * \frac{\exp(O_i)}{1 + \{1 - \exp(-D^{(s)}_{i,fs}/\sigma_4)\}/D^{(s)}_{i,bs}}
           = \sum_{s=1}^{S} \nu_s * \frac{\exp(O_i) * D^{(s)}_{i,bs}}{D^{(s)}_{i,bs} + 1 - \exp(-D^{(s)}_{i,fs}/\sigma_4)}    (21)

where \nu_s is the weight of scale s and \sigma_4 = 0.1. Considering all the other superpixels belonging to the same cluster, as well as multiple scales, smooths the saliency map effectively and makes our approach more robust in dealing with complicated scenes.
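A compact sketch of Eqs. (19)-(21) follows, assuming precomputed per-scale indicators u, cluster-neighbor weights W, neighbor index lists, seed index lists, and per-superpixel objectness O; a common superpixel indexing across scales and uniform scale weights nu are simplifying assumptions, and sigma4 follows the text.

    import numpy as np

    def seed_distance(u, W, i, neighbors, seeds, gamma=0.5):
        """Eq. (19): context-weighted indicator distance from superpixel i to seeds."""
        d = 0.0
        for q in seeds:
            own = np.linalg.norm(u[i] - u[q])
            ctx = sum(W[i, j] * np.linalg.norm(u[j] - u[q]) for j in neighbors[i])
            d += gamma * own + (1.0 - gamma) * ctx
        return d

    def saliency(u_scales, W_scales, nbr_scales, fg_scales, bg_scales, O,
                 nu=None, sigma4=0.1, gamma=0.5):
        """Eq. (21): multi-scale saliency of every superpixel over S scales."""
        S = len(u_scales)
        nu = nu if nu is not None else [1.0 / S] * S   # assumed uniform weights
        n = len(O)
        sal = np.zeros(n)
        for s in range(S):
            for i in range(n):
                d_fs = seed_distance(u_scales[s], W_scales[s], i,
                                     nbr_scales[s], fg_scales[s], gamma)
                d_bs = seed_distance(u_scales[s], W_scales[s], i,
                                     nbr_scales[s], bg_scales[s], gamma)
                sal[i] += nu[s] * np.exp(O[i]) * d_bs / (
                    d_bs + 1.0 - np.exp(-d_fs / sigma4))
        return sal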
Fig. 6. The distribution of saliency values of ground truth foregrounds and backgrounds. (a) Generic metric on MSRA-1000. (b) Specific metric on MSRA-1000. (c) AML on MSRA-1000. (d) AML on MSRA-5000.

IV. EXPERIMENTS

We evaluate the proposed method on four benchmark datasets. The first is MSRA-1000 [13], a subset of MSRA-5000, which has been widely used in previous works for its accurate human-labelled masks. The second is the MSRA-5000 dataset [15], which includes 5000 more comprehensive images. The third is THUS-10000 [37], which consists of 10000 images, each containing an unambiguous salient object with pixel-wise ground truth labeling. The last is Berkeley-300 [38], which contains more challenging scenes with multiple objects of different sizes and locations. Since we have already used the first 500 images of MSRA-1000 for training, we evaluate our algorithm and compare it with other methods on the remaining 500 images of MSRA-1000, the 4500 images of MSRA-5000 that exclude the 500 training images (MSRA-5000 contains all the images of MSRA-1000), the 9501 images of THUS-10000 (THUS-10000 contains 499 of the training images), and Berkeley-300.

Fig. 7. (a) Precision-recall curves for the Generic metric, the Specific metric, and fused results without neighbor smoothness (MSRA-1000 and Berkeley-300); precision-recall curves based on SFV and low-level features; precision-recall curves for two other fusion methods. (b) Images of fused results based on SFV and low-level features.

A. Evaluation of Metrics

We perform several comparative experiments, as shown in Figure 5, Figure 6 and Figure 7(a), to demonstrate the efficiency of the Generic metric (GML), the Specific metric (SML), and their combination (AML based on SFV). In order to eliminate the influence of neighbor smoothness (Eqn. (19)) when comparing metrics, we compute only the distance between each superpixel and the seeds, instead of the sum of weighted distances over its cluster neighbors:

    D^{(s)}_{i,fs} = \sum_{q=1}^{fn^{(s)}} \|u_i - u_q\|,    D^{(s)}_{i,bs} = \sum_{q=1}^{bn^{(s)}} \|u_i - u_q\|    (22)

The precision-recall curves of the Generic metric and the Specific metric are almost the same, but their combination outperforms both of them.
We also try adding or multiplying the saliency maps generated by these two metrics directly, but the resulting PR curves are much lower than those of our fusion approach in Figure 7(a). This is consistent with our motivation: M_g is trained from the whole training dataset and captures the global distribution of the data, while M_s aims at a single image and considers the specific structure of its samples.
Fig. 8. Results of different methods. (a), (b) Precision-recall curves on MSRA-1000. (c) Average precisions, recalls, F-measures and AUC on MSRA-1000. (d), (e) Precision-recall curves on MSRA-5000. (f) Average precisions, recalls, F-measures and AUC on MSRA-5000.

Fig. 9. Results of different methods. (a), (b) Precision-recall curves on THUS-10000. (c) Average precisions, recalls, F-measures and AUC on THUS-10000. (d), (e) Precision-recall curves on Berkeley-300. (f) Average precisions, recalls, F-measures and AUC on Berkeley-300.

Figure 5 demonstrates that the fused results significantly suppress the light saliency values in the background regions produced by GML and SML. Since most parts of computing the saliency maps under different metrics are the same, e.g., the objectness prior map, seeds selection, etc., it is reasonable that Figure 5 (b) and (c) are similar, but there are still differences between them. To further prove this, we conduct an extra experiment, as shown in Figure 11. The second line shows the results generated by fusing the GML with itself, the third line shows the results generated by fusing the SML with itself, and the fourth line is obtained by fusing the GML and SML. We refer to them as GG, SS, and AML, respectively. Limited by the image resolution, some differences between the GML and SML may not be visible in Figure 5, but fusing a metric with itself apparently enlarges their distinctiveness. Furthermore, if one metric is incorrect, the other can compensate for it. The SS performs better than the GG in Figure 11 (a)-(e), while the GG is better in (f)-(g), and the AML tends to take the best of both, which demonstrates that the GML and SML are indeed complementary to each other and improve the performance of saliency detection after fusion. Figure 11 (k)-(m) show that if both the GML and SML obtain bad results, the fused results are still bad.

In addition, we plot the distribution of saliency values in Figure 6. Ground truth masks provide a specific label, 1 or 0, for each pixel, and we regard a superpixel as foreground when more than 80% of its pixels are labelled 1; otherwise, the superpixel is background. We put all the foreground superpixels from the whole dataset together and plot the distribution of their saliency values computed by different saliency methods as the red line. The blue line is the distribution of the saliency values of the background superpixels. Figure 6(a), (b) and (c) show the saliency distributions produced by GML, SML and AML on MSRA-1000, respectively. Figure 6(d) is AML on MSRA-5000. This shows that AML is better than GML and SML, since its background saliency values are closer to 0.

Furthermore, our Generic metric is robust across different databases. We apply the metric trained on MSRA-1000 to all the databases, including MSRA-1000, MSRA-5000, THUS-10000, and Berkeley-300. As shown in Figure 8 and Figure 9, the results are still promising even on different databases, which demonstrates the effectiveness and adaptiveness of our Generic metric. Overall, the fused results based on two outstanding and complementary metrics achieve higher precision and recall values and generate more accurate saliency maps.
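For reference, the precision-recall curves used throughout this evaluation follow the standard protocol: a saliency map is thresholded at every level, precision and recall are computed against the binary ground truth mask, and AUC is the area under the resulting curve. A minimal sketch, not the authors' evaluation code, assuming saliency maps normalized to [0, 255]:

    import numpy as np

    def pr_curve(sal_map, gt_mask, n_thresholds=256):
        """Precision/recall of a saliency map against a binary ground truth mask."""
        gt = gt_mask.astype(bool)
        precisions, recalls = [], []
        for t in range(n_thresholds):
            pred = sal_map >= t
            tp = np.logical_and(pred, gt).sum()
            precisions.append(tp / max(pred.sum(), 1))
            recalls.append(tp / max(gt.sum(), 1))
        return np.array(precisions), np.array(recalls)

    def auc_pr(precisions, recalls):
        """Area under the PR curve via the trapezoidal rule."""
        order = np.argsort(recalls)
        return float(np.trapz(precisions[order], recalls[order]))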
B. Evaluation of Superpixel-Wise Fisher Vector

We have mentioned that our Superpixel-wise Fisher Vector coding approach can improve the performance of saliency detection by capturing the average first-order and second-order differences between local features and the centers of a mixture of Gaussian distributions. In the experiments, we extract the low-level features RGB and LAB to learn a 12-D SFV representation for each superpixel (D = 6, K = 1, 2DK = 12). Figure 7(a) shows the efficiency of our SFV coding approach by comparing the precision-recall curves of low-level features and the SFV on the MSRA-1000 database. Figure 7(b) shows the corresponding images.
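As an illustration of the coding step, the sketch below computes a 12-D Fisher vector for one superpixel from its pixels' 6-D RGB+LAB features under a single Gaussian (K = 1), following the standard first- and second-order FV statistics [17]; the normalization details of the actual SFV may differ, so treat this as an assumption-laden sketch.

    import numpy as np

    def superpixel_fisher_vector(feats, mu, sigma, w=1.0):
        """First- and second-order Fisher vector statistics for one superpixel.

        feats: (n_pixels, D) local features (e.g., D = 6 for RGB + LAB),
        mu, sigma: (D,) mean and standard deviation of the single Gaussian (K = 1),
        w: mixture weight (1.0 when K = 1). Returns a 2*D = 12 dimensional vector.
        """
        n, D = feats.shape
        z = (feats - mu) / sigma                       # standardized residuals
        g_mu = z.mean(axis=0) / np.sqrt(w)             # first-order statistic
        g_sigma = (z ** 2 - 1.0).mean(axis=0) / np.sqrt(2.0 * w)  # second-order
        fv = np.concatenate([g_mu, g_sigma])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))         # power normalization
        return fv / max(np.linalg.norm(fv), 1e-12)     # L2 normalization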
Fig. 10. The comparison of previous methods, our algorithm and ground truth. (a) Test image. (b) IT [5]. (c) GB [19]. (d) GC [39]. (e) CB [44]. (f) UFO [18]. (g) Proposed. (h) Ground truth.

C. Evaluation of Saliency Maps

We compare the proposed saliency detection model with several state-of-the-art methods: IT [5], GB [19], FT [13], GC [39], UFO [18], SVO [40], HS [41], PD [42], AMC [43], RCJ [37], DSR [20], DRFI [26], CB [44], RC [6], LR [14] and XL [45]. We use the source codes provided by the authors or implement the methods based on the available codes or software.

We conduct several quantitative comparisons of some typical saliency detection methods. Figure 8(a), (b), (d) and (e) show that the proposed AML is comparable with most of the state-of-the-art methods on the MSRA-1000 and MSRA-5000 databases. Figure 8(c) and (f) compare the average precision, recall, F-measure and AUC. We use AUC as an evaluation criterion, since it represents the area under the PR curve and effectively reflects the global properties of different algorithms. Instead of using bounding boxes to evaluate the saliency detection performance on the MSRA-5000 database, we adopt the accurate human-labeled masks provided by [26] to ensure more reliable comparative results. We also perform experiments on the THUS-10000 and Berkeley-300 databases, as shown in Figure 9. The precision-recall curves show that AML reaches precision rates of 97.4%, 94.0%, 96.5% and 81.5% on MSRA-1000, MSRA-5000, THUS-10000 and Berkeley-300, respectively. All of these results demonstrate the efficiency of our method.

Figure 10 shows some sample results of five previous approaches and our AML algorithm. The IT and GB methods are capable of finding the salient regions in most cases, but they tend to highlight the boundaries and miss much of the object information because of the blurriness of their saliency maps. The GC method cannot cover all the salient pixels and often mislabels small background patches as salient regions. The CB and UFO models can highlight the objects uniformly, but they fail in challenging scenes. Our method can capture both small and large salient objects even in complex environments. In addition, it highlights the objects uniformly with accurate boundaries and does not depend on the number or locations of the salient objects.

We also test the average computational cost on the different datasets: 18.15s on MSRA-1000, 18.42s on MSRA-5000, 17.90s on THUS-10000 and 18.78s on Berkeley-300. The proposed algorithm is implemented in MATLAB on a PC with an Intel i7-3370 CPU (3.4 GHz) and 32 GB of memory.

D. Evaluation of Selected Seeds

We train an effective Specific metric based on the assumption that the selected seeds are correct.
Fig. 11. Example results of different metrics. The first line is the input images, the second line is the results generated by fusing the GML with itself, the third line is the results generated by fusing the SML with itself, the fourth line is obtained by fusing the GML and SML, and the last line is the ground truth images.

In the experiments, we cannot ensure that the chosen seeds are completely accurate, but we can enforce a very high seed accuracy. The accuracy of the selected seeds is defined as follows:

    sa = \frac{fs_c + bs_c}{fs_t + bs_t} = \frac{fs_c + bs_c}{(fs_c + fs_{ic}) + (bs_c + bs_{ic})}    (23)

where

    fs_c = \sum_n \sum_i (gt^n_i \,\&\, seed^n_i),    bs_c = \sum_n \sum_i (\overline{gt^n_i} \,\&\, \overline{seed^n_i})    (24)

Here i indexes the i-th superpixel extracted from the n-th image of a given database, and gt^n_i and seed^n_i are its ground truth label and the label assigned by our seeds selection mechanism; fs_c and bs_c count the correctly selected foreground and background seeds, while fs_{ic} and bs_{ic} count the incorrectly selected ones. The accuracy rates on the four databases are: 0.9882 on MSRA-1000, 0.9769 on MSRA-5000, 0.9822 on THUS-10000 and 0.8874 on Berkeley-300. We experimentally verify that the seeds are accurate enough to generate a reliable Specific metric for each image.
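A small sketch of Eqs. (23)-(24), assuming per-image boolean arrays gt (ground truth foreground per superpixel) and seed_fg/seed_bg (labels produced by the seed selection), collected over a database:

    import numpy as np

    def seed_accuracy(gt_list, seed_fg_list, seed_bg_list):
        """Eq. (23): fraction of selected seeds whose label matches the ground truth.

        gt_list: per-image boolean arrays, True where a superpixel is foreground.
        seed_fg_list / seed_bg_list: boolean arrays marking selected fg/bg seeds.
        """
        correct, total = 0, 0
        for gt, fg, bg in zip(gt_list, seed_fg_list, seed_bg_list):
            correct += np.logical_and(gt, fg).sum()        # fs_c
            correct += np.logical_and(~gt, bg).sum()       # bs_c
            total += fg.sum() + bg.sum()                   # fs_t + bs_t
        return correct / max(total, 1)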
V. CONCLUSION

In this paper, we explicitly propose two Mahalanobis distance metric learning models and a superpixel-wise Fisher vector representation for visual saliency detection. To our knowledge, we are the first to apply metric learning to saliency detection and to employ a metric fusion mechanism to improve the detection accuracy. Different from previous methods, we adopt a new feature coding strategy and make the supervised metric learning more suitable for single image processing. In addition, we propose an accurate seeds selection method based on the Mahalanobis distance measure to train the Specific metric and construct the final saliency map. We estimate the saliency value of each superpixel from a multi-scale view and include the contextual information when computing it. Experimental results against sixteen state-of-the-art algorithms on four benchmark image databases demonstrate the efficiency of our metric learning approach and the saliency detection model. In the future, we plan to explore more robust object detection approaches to further improve the accuracy of saliency detection.

REFERENCES

[1] C. Siagian and L. Itti, "Rapid biologically-inspired scene classification using features shared with visual attention," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 2, pp. 300-312, Feb. 2007.
[2] H. Liu, X. Xie, X. Tang, Z.-W. Li, and W.-Y. Ma, "Effective browsing of Web image search results," in Proc. 6th ACM SIGMM Int. Workshop Multimedia Inf. Retr., 2004, pp. 84-90.
[3] C. Christopoulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 still image coding system: An overview," IEEE Trans. Consum. Electron., vol. 46, no. 4, pp. 1103-1127, Nov. 2000.
[4] Y. Niu, F. Liu, X. Li, and M. Gleicher, "Warp propagation for video resizing," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 537-544.
[5] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
[6] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu, "Global contrast based salient region detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 409-416.
[7] Y. Xie, H. Lu, and M.-H. Yang, "Bayesian saliency via low and mid level cues," IEEE Trans. Image Process., vol. 22, no. 5, pp. 1689-1698, May 2013.
[8] C. Yang, L. Zhang, H. Lu, X. Ruan, and M.-H. Yang, "Saliency detection via graph-based manifold ranking," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3166-3173.
[9] J. Sun, H. Lu, and X. Liu, "Saliency region detection based on Markov absorption probabilities," IEEE Trans. Image Process., vol. 24, no. 5, pp. 1639-1649, May 2015.
[10] Y.-F. Ma and H.-J. Zhang, "Contrast-based image attention analysis by using fuzzy growing," in Proc. 11th ACM Int. Conf. Multimedia, 2003, pp. 374-381.
[11] J. Sun, H. Lu, and S. Li, "Saliency detection based on integration of boundary and soft-segmentation," in Proc. IEEE Int. Conf. Image Process., Sep./Oct. 2012, pp. 1085-1088.
[12] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1-8.
[13] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, "Frequency-tuned salient region detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 1597-1604.
[14] X. Shen and Y. Wu, "A unified approach to salient object detection via low rank matrix recovery," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 853-860.
[15] T. Liu et al., "Learning to detect a salient object," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 353-367, Feb. 2011.
[16] J. Yang and M.-H. Yang, "Top-down visual saliency via joint CRF and dictionary learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 2296-2303.
[17] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: Theory and practice," Int. J. Comput. Vis., vol. 105, no. 3, pp. 222-245, 2013.
[18] P. Jiang, H. Ling, J. Yu, and J. Peng, "Salient region detection by UFO: Uniqueness, focusness and objectness," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1976-1983.
[19] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Proc. Adv. Neural Inf. Process. Syst., 2006, pp. 545-552.
[20] X. Li, H. Lu, L. Zhang, X. Ruan, and M.-H. Yang, "Saliency detection via dense and sparse reconstruction," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 2976-2983.
[21] W. Zhu, S. Liang, Y. Wei, and J. Sun, "Saliency optimization from robust background detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2814-2821.
[22] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung, "Saliency filters: Contrast based filtering for salient region detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733-740.
[23] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274-2282, Nov. 2012.
[24] B. Alexe, T. Deselaers, and V. Ferrari, "Measuring the objectness of image windows," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2189-2202, Nov. 2012.
[25] Y. Wei, F. Wen, W. Zhu, and J. Sun, "Geodesic saliency using background priors," in Proc. 12th Eur. Conf. Comput. Vis. (ECCV), 2012, pp. 29-42.
[26] H. Jiang, Z. Yuan, M.-M. Cheng, Y. Gong, N. Zheng, and J. Wang, "Salient object detection: A discriminative regional feature integration approach," 2014. [Online]. Available: http://arxiv.org/abs/1410.5926
[27] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, "Salient object detection: A discriminative regional feature integration approach," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2083-2090.
[28] R. Liu, J. Cao, Z. Lin, and S. Shan, "Adaptive partial differential equation learning for visual saliency detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 3866-3873.
[29] Q. Chen et al., "Efficient maximum appearance search for large-scale object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3190-3197.
[30] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, "Information-theoretic metric learning," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 209-216.
[31] K. Q. Weinberger, J. Blitzer, and L. K. Saul, "Distance metric learning for large margin nearest neighbor classification," in Proc. Adv. Neural Inf. Process. Syst., 2005, pp. 1473-1480.
[32] K. Q. Weinberger and L. K. Saul, "Fast solvers and efficient implementations for distance metric learning," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 1160-1167.
[33] M. Guillaumin, J. Verbeek, and C. Schmid, "Is that you? Metric learning approaches for face identification," in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 498-505.
[34] F. Wang, W. Zuo, L. Zhang, D. Meng, and D. Zhang, "A kernel classification framework for metric learning," 2013. [Online]. Available: http://arxiv.org/abs/1309.5823
[35] S. Lu, V. Mahadevan, and N. Vasconcelos, "Learning optimal seeds for diffusion-based salient object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 2790-2797.
[36] H.-C. Huang, Y.-Y. Chuang, and C.-S. Chen, "Affinity aggregation for spectral clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 773-780.
[37] M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu, "Global contrast based salient region detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 569-582, Mar. 2015.
[38] V. Movahedi and J. H. Elder, "Design and perceptual validation of performance measures for salient object segmentation," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2010, pp. 49-56.
[39] M.-M. Cheng, J. Warrell, W.-Y. Lin, S. Zheng, V. Vineet, and N. Crook, "Efficient salient region detection with soft image abstraction," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1529-1536.
[40] K.-Y. Chang, T.-L. Liu, H.-T. Chen, and S.-H. Lai, "Fusing generic objectness and visual saliency for salient object detection," in Proc. IEEE Int. Conf. Comput. Vis., Nov. 2011, pp. 914-921.
[41] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1155-1162.
[42] R. Margolin, A. Tal, and L. Zelnik-Manor, "What makes a patch distinct?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 1139-1146.
[43] B. Jiang, L. Zhang, H. Lu, C. Yang, and M.-H. Yang, "Saliency detection via absorbing Markov chain," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2013, pp. 1665-1672.
[44] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li, "Automatic salient object segmentation based on context and shape prior," in Proc. BMVC, 2011, pp. 110.1-110.12.
[45] Y. Xie and H. Lu, "Visual saliency detection based on Bayesian model," in Proc. 18th IEEE Int. Conf. Image Process., Sep. 2011, pp. 645-648.

Shuang Li is currently pursuing the B.E. degree with the School of Information and Communication Engineering, Dalian University of Technology (DUT), China. From 2012 to 2015, she was a Research Assistant with the Computer Vision Group, DUT. Her research interests focus on saliency detection and object recognition.

Huchuan Lu (SM'12) received the M.Sc. degree in signal and information processing and the Ph.D. degree in system engineering from the Dalian University of Technology (DUT), Dalian, China, in 1998 and 2008, respectively. He joined DUT as a Faculty Member in 1998 and is currently a Full Professor with the School of Information and Communication Engineering. His current research interests include computer vision and pattern recognition, with a focus on visual tracking, saliency detection, and segmentation. He is a member of the Association for Computing Machinery and an Associate Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B.

Zhe Lin (M'10) received the B.Eng. degree in automatic control from the University of Science and Technology of China in 2002, the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology in 2004, and the Ph.D. degree in electrical and computer engineering from the University of Maryland, College Park, in 2009. He was a Research Intern with Microsoft Live Labs Research. He is currently a Senior Research Scientist with Adobe Research, San Jose, CA. His research interests include deep learning, object detection and recognition, image classification and tagging, content-based image and video retrieval, human motion tracking, and activity analysis.
Xiaohui Shen (M'11) received the B.S. and M.S. degrees from the Department of Automation, Tsinghua University, China, and the Ph.D. degree from the Department of Electrical Engineering and Computer Science, Northwestern University, in 2013. He is currently a Research Scientist with Adobe Research, San Jose, CA. He is generally interested in research problems in computer vision, in particular image retrieval, object detection, and image understanding.

Brian Price received the Ph.D. degree in computer science from Brigham Young University under the advisement of Dr. B. Morse. He has contributed new features to many Adobe products, such as Photoshop, Photoshop Elements, and After Effects, mostly involving interactive image segmentation and matting. He is currently a Senior Research Scientist with Adobe Research, specializing in computer vision. His research interests include semantic segmentation, interactive object selection and matting, stereo and RGBD, and broad interests in computer vision and its intersections with machine learning and computer graphics.