This document summarizes a research paper on visual-textual joint relevance learning for tag-based social image search. It introduces the challenges with conventional tag-based search and proposes using a hypergraph to jointly model visual and textual features. The method constructs hyperedges based on visual words and tags to build the hypergraph. It then performs semi-supervised learning on the hypergraph to learn relevance scores. Experimental results show the proposed approach achieves better performance than baseline methods.
1. Visual-Textual Joint Relevance Learning
For Tag-Based Social Image Search
Sushil Kumar(16EC65R16)
M.Tech, VIPES
IIT Kharagpur
2. Content
• Introduction
• Related Works
A) Social Image Search
B) Hypergraph Learning
• Hypergraph Analysis
• Visual-Textual Relevance Learning
• Experimental Results
• Conclusion
• References
3. Introduction
Conventional tag-based social image search methods cannot achieve satisfactory results for
two reasons:
1) Too much noise in user-provided tags
2) The lack of an optimal ranking strategy
Current ranking options:
1) Time-based ranking
2) Interestingness-based ranking
Most existing algorithms explore visual content and tags separately or sequentially.
4. Fig. 1[1]. Schematic illustration of the proposed visual-textual joint relevance learning approach.
5. Related works
A) Social Image Search
1) Separated Methods: In separated methods, only the textual content or the
visual content is employed for tag analysis.
2) Sequential Methods: In sequential methods, the visual content and the tags are
sequentially employed.
B) Hypergraph Learning
Hypergraphs have been employed for image retrieval and object recognition.
7. Hypergraph Analysis
In a simple graph, samples are represented by vertices and an edge links the
two related vertices.
A hyperedge in a hypergraph is able to link more than two vertices.
Here, vertices represent images and hyperedges represent tags/visual words.
A hypergraph G = (V, E, w) is composed of a vertex set V, an edge set E, and the weights
of the edges w.
The hypergraph G can be denoted by a |V| × |E| incidence matrix H with entries defined
as

h(v, e) = 1, if v ∈ e
          0, if v ∉ e                                        (1)
8. For a vertex v ∈ V, its vertex degree can be estimated by

d(v) = Σ_{e ∈ E} w(e) h(v, e)                                (2)

For a hyperedge e ∈ E, its hyperedge degree can be estimated by

δ(e) = Σ_{v ∈ V} h(v, e)                                     (3)

Let W denote the diagonal matrix of the hyperedge weights:

W(i, j) = w(i), if i = j
          0,    otherwise                                    (4)
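The quantities above can be checked with a small numerical sketch (the incidence matrix H and weights w below are invented purely for illustration):

```python
import numpy as np

# Toy incidence matrix H (|V| = 4 images, |E| = 3 hyperedges), Eq. (1):
# H[v, e] = 1 iff image v belongs to hyperedge e (shares that tag/visual word).
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

w = np.array([0.5, 1.0, 2.0])      # hyperedge weights w(e)

d = H @ w                          # vertex degrees, Eq. (2): d(v) = sum_e w(e) h(v, e)
delta = H.sum(axis=0)              # hyperedge degrees, Eq. (3): delta(e) = sum_v h(v, e)
W = np.diag(w)                     # diagonal weight matrix, Eq. (4)

print(d)       # [2.5 1.5 3.  3.5]
print(delta)   # [3. 3. 3.]
```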
9. Learning
Regularization framework:

arg min_f { λ R_emp(f) + Ω(f) }                              (5)

where f is the to-be-learned classification function, Ω(f) is a regularizer on the
hypergraph, R_emp(f) is the empirical loss, and λ > 0 is a weighting parameter.
The regularizer on the hypergraph is defined as

Ω(f) = (1/2) Σ_{e ∈ E} Σ_{u,v ∈ V} ( w(e) h(u, e) h(v, e) / δ(e) ) ( f(u)/√d(u) − f(v)/√d(v) )²   (6)
10. In matrix form:

Ω(f) = f^T Δ f,   where Δ = I − Θ                            (7)

Θ = Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2}

where Dv and De are the diagonal matrices of the vertex degrees and the
hyperedge degrees.
The loss term is defined as:

R_emp(f) = ‖f − y‖² = Σ_{u ∈ V} ( f(u) − y(u) )²             (9)

where y is an n × 1 initial label vector.
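A minimal sketch of computing Θ and the hypergraph Laplacian Δ from these definitions (the toy H and w below are invented for illustration):

```python
import numpy as np

# Toy incidence matrix (4 images, 3 hyperedges) and hyperedge weights;
# the numbers are made up for illustration.
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
w = np.array([0.5, 1.0, 2.0])

W = np.diag(w)
Dv = np.diag(H @ w)                  # vertex degrees d(v) on the diagonal
De = np.diag(H.sum(axis=0))          # hyperedge degrees delta(e) on the diagonal

Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
Theta = Dv_inv_sqrt @ H @ W @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt
Delta = np.eye(len(H)) - Theta       # hypergraph Laplacian, Eq. (7)

# Delta is symmetric positive semi-definite, so f.T @ Delta @ f >= 0
# for any score vector f, matching the regularizer Omega(f).
f = np.random.default_rng(0).standard_normal(4)
assert f @ Delta @ f >= -1e-9
```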
11. Visual-Textual Relevance Learning Algorithm
Step 1. Hypergraph Construction
1. Take each image in the social image set X = {x1, x2, . . . , xn} as a vertex in the hypergraph G = (V, E, w).
2. Generate a bag-of-visual-words description f_i^bow for each image xi, where f_i^bow(k, 1) = 1 indicates
that xi contains the kth visual word. Construct hyperedges by using f_i^bow.
3. For each image, the tags are ranked and only the top min(nl, ni) tags are kept for further processing.
4. Generate a bag-of-textual-words description f_i^tag for each image xi, where f_i^tag(k, 1) = 1 indicates
that xi contains the kth selected tag.
5. Construct hyperedges by using f_i^tag.
6. Generate the incidence matrix H, the diagonal matrices of the vertex degrees and the hyperedge
degrees Dv and De, and the initial weights of all hyperedges w.
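The tag-based hyperedge construction in Step 1 can be sketched as follows (the images and tags below are made up; each distinct tag yields one hyperedge containing every image carrying it):

```python
import numpy as np

# Sketch of Step 1: build tag-based hyperedges for a toy image set.
# The image names and tags are invented for illustration.
images = ["img1", "img2", "img3", "img4"]
tags = {
    "img1": ["beach", "sea"],
    "img2": ["beach", "people"],
    "img3": ["sea"],
    "img4": ["people", "sea"],
}

vocab = sorted({t for ts in tags.values() for t in ts})   # one hyperedge per tag
H = np.zeros((len(images), len(vocab)))                   # |V| x |E| incidence matrix
for i, img in enumerate(images):
    for t in tags[img]:
        H[i, vocab.index(t)] = 1                          # Eq. (1): h(v, e) = 1 iff v in e

print(vocab)   # ['beach', 'people', 'sea']
print(H)
```

Visual hyperedges are built the same way, with visual words from the bag-of-visual-words description in place of tags.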
12. Step 2. Pseudo-Relevant Sample Selection
The Flickr Distance is employed to estimate the semantic relevance of an image xi to
the query tag tq , and the top K results are selected as the pseudo-relevant images.
Step 3. Relevance Learning on Hypergraph
Conduct semi-supervised learning on the hypergraph structure, iteratively learning the
relevance score vector f and the hyperedge weights w.
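With the hyperedge weights held fixed, the f-step of the regularization framework in Eq. (5) has a closed-form solution, since setting the gradient of λ‖f − y‖² + f^T Δ f to zero gives (λI + Δ) f = λy. A minimal sketch (toy data; the value of `lam` is an arbitrary choice):

```python
import numpy as np

# Fixing the hyperedge weights w, the scores f minimizing
#   lam * ||f - y||^2 + f.T @ Delta @ f
# satisfy (lam * I + Delta) f = lam * y.  All numbers are invented.
H = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)
w = np.array([0.5, 1.0, 2.0])
lam = 10.0

Dv_is = np.diag(1.0 / np.sqrt(H @ w))
Theta = Dv_is @ H @ np.diag(w) @ np.diag(1.0 / H.sum(axis=0)) @ H.T @ Dv_is
Delta = np.eye(len(H)) - Theta

y = np.array([1.0, 0.0, 0.0, 1.0])   # pseudo-relevant images get initial label 1
f = lam * np.linalg.solve(lam * np.eye(len(H)) + Delta, y)

# f stays close to y for large lam, but is smoothed over the hypergraph.
print(np.round(f, 3))
```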
13. Fig. 3[1]. Examples of hyperedge construction. (a) Example of textual hyperedge
construction, where three hyperedges are generated by tags “people,” “gun,”
and “tank.” (b) Example of visual hyperedge construction, where three hyperedges are
generated by three visual words.
Fig. 4[1]. Example of the connection
between two images.
14. Experimental Results
Query     Seq.     HG       HG+WE    HG+WE(VIS)  HG+WE(TAG)
Airshow   0.4193   0.5759   0.7183   0.5847      0.6869
Apple     0.2433   0.6975   0.8128   0.8100      0.7875
Aquarium  0.5640   0.8163   0.9346   0.9189      0.9134
Basin     0.2981   0.4911   0.6115   0.6178      0.5946
Beach     0.5986   0.8270   1.0000   0.9949      0.9869
Bird      0.8931   0.9576   0.9653   0.9375      0.9618
Bmw       0.5910   0.6244   0.7265   0.7048      0.6826
Table 1 [1]: NDCG@7 results of different methods; the best result in each row is marked in blue.
(NDCG = Normalised Discounted Cumulative Gain)
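NDCG@k, the metric reported above, can be computed as in the sketch below; this uses one common graded-relevance formulation, and the relevance levels in the example are invented:

```python
import numpy as np

def ndcg_at_k(rels, k):
    """NDCG@k for a ranked list of graded relevance levels `rels`."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rels) + 2))   # 1/log2(rank + 1)
    dcg = np.sum((2.0 ** rels - 1.0) * discounts)
    ideal = np.sort(rels)[::-1]                              # best possible ordering
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts)
    return dcg / idcg if idcg > 0 else 0.0

# Made-up relevance levels for the top 7 results of some query.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2, 3], 7), 4))
```

A perfectly ordered ranking scores 1.0; misplacing highly relevant results pushes the score toward 0.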
15. Fig. 5. Top results obtained by different methods for the query "apple."
(a) Sequential social image relevance learning.
(b) Hypergraph-based relevance learning.
(c) Hypergraph-based relevance learning with hyperedge weight estimation, i.e., the proposed method.
(d) Proposed learning method with merely visual information.
(e) Proposed learning method with merely tag information.
16. Conclusion
• In the proposed method, both visual content and tags are used to generate the
hyperedges of a hypergraph, and a relevance learning procedure is performed on
the hypergraph structure.
• Experimental results demonstrate that the proposed method achieves better results
than many baseline methods, including sequential social image ranking,
hypergraph-based relevance learning, HG+WE (Visual), and HG+WE (Tag).
17. References
[1]. Y. Gao, M. Wang, Z. J. Zha, J. Shen, X. Li and X. Wu, "Visual-Textual
Joint Relevance Learning for Tag-Based Social Image Search," in IEEE
Transactions on Image Processing, vol. 22, no. 1, pp. 363-376, Jan. 2013.
[2]. M. Wang, K. Yang, X. S. Hua, and H.-J. Zhang, “Toward a relevant and
diverse search of social images,” IEEE Trans. Multimedia, vol. 12, no. 8, pp.
829–842, Dec. 2010.
[3]. Q. Liu, Y. Huang, and D. Metaxas, “Hypergraph with sampling for image
retrieval,” Pattern Recognit., vol. 44, nos. 10–11, pp. 2255–2262, 2011.