Transcript of "Towards data driven estimation of image tag relevance using visually similar and dissimilar folksonomy images"
TOWARDS DATA-DRIVEN ESTIMATION OF IMAGE TAG RELEVANCE USING VISUALLY SIMILAR AND DISSIMILAR FOLKSONOMY IMAGES
ACM Multimedia 2012: Workshop on Socially-Aware Multimedia, Nara, Oct. 29, 2012
Sihyoung Lee (1), Wesley De Neve (1,2), Yong Man Ro (1)
(1) Image and Video Systems Lab, Dept. of Electrical Engineering, KAIST
(2) Multimedia Lab, ELIS, Ghent University - iMinds
Introduction
• Increasing online availability of images
  – thanks to easy-to-use multimedia devices and online services
  – thanks to cheap storage and bandwidth
  – thanks to an increasing number of people going online
• Some statistics
  – every minute, over 2,500 images are uploaded to Flickr
  – every day, over 300 million photos are uploaded to Facebook
• How to effectively retrieve images for consumption purposes?
Problems in Image Folksonomies
• Most image search engines strongly depend on tags
• Non-relevant tags hinder effective consumption
  – example: among the 60 images retrieved for the query 'apple', only 20 images are actually related to 'apple'
Motivation
• The correlation between visual and semantic similarity
  – is high for images that are semantically and visually distant
  – is lower for images that are semantically and visually close
  – the probability of having semantically similar images in a set of visually similar images is lower than the probability of having semantically dissimilar images in a set of visually dissimilar images
• The above observation motivated us to develop a novel technique for tag relevance learning
  – takes advantage of both visually similar and dissimilar images
Conceptual Illustration of the Proposed Method
[Figure: a seed image tagged 'desert, bicycle' surrounded by folksonomy images with tags such as food, sign, airplane, building, rifle, ship, street, nature, atomium, basketball, desert, and bicycle; the figure contrasts image tag relevance estimation using visually dissimilar images with image tag relevance estimation using the proposed method.]
Proposed Method
• Let r(i, t) be an image tag relevance learning function based on the proposed method; it is defined as

    r(i, t) := r_similar(i, t, k) − r_dissimilar(i, t, l)

  – r_similar(i, t, k) := n_t[N_s(i, k)] − n_t[N_rand(k)] = Σ_{j ∈ N_s(i,k)} vote(j, t) − k · ( Σ_{j ∈ I} vote(j, t) / |I| )
  – r_dissimilar(i, t, l) := n_t[N_d(i, l)] − n_t[N_rand(l)] = Σ_{j ∈ N_d(i,l)} vote(j, t) − l · ( Σ_{j ∈ I} vote(j, t) / |I| )

  where n_t[·] represents the number of images annotated with t, N_s(i, k) is a set of k images visually similar to i, N_d(i, l) is a set of l images visually dissimilar to i, N_rand(k) is a set of k randomly selected neighbors, I is the set of all folksonomy images, and vote(j, t) is 1 if image j is annotated with t and 0 otherwise

• Relationship between r_similar(i, t, k), r_dissimilar(i, t, l), and r(i, t):

                   r_similar(i, t, k)   r_dissimilar(i, t, l)   r(i, t)
  t relevant               +                      −                ++
  t irrelevant             −                      +                −−
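The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the helper names (`vote`, `tag_relevance`), the toy tag dictionary, and the assumption that the similar and dissimilar neighbor sets are given as input (rather than computed from visual features) are all ours.

```python
def vote(image_tags, j, t):
    """vote(j, t): 1 if image j is annotated with tag t, else 0."""
    return 1 if t in image_tags[j] else 0

def tag_relevance(i, t, image_tags, similar, dissimilar):
    """Sketch of r(i, t) = r_similar(i, t, k) - r_dissimilar(i, t, l).

    `similar` holds the k visually most similar neighbors of image i,
    `dissimilar` the l visually most dissimilar ones; how they are
    obtained (e.g., BoVW histograms + cosine similarity) is out of scope.
    """
    k, l = len(similar), len(dissimilar)
    size = len(image_tags)  # |I|, the number of images in the folksonomy
    # Expected number of votes for t among k (or l) random neighbors.
    prior = sum(vote(image_tags, j, t) for j in image_tags) / size
    r_similar = sum(vote(image_tags, j, t) for j in similar) - k * prior
    r_dissimilar = sum(vote(image_tags, j, t) for j in dissimilar) - l * prior
    return r_similar - r_dissimilar

# Toy folksonomy: image 2 shows a desert; 'sea' would be a noisy tag on it.
image_tags = {0: {'desert'}, 1: {'desert'}, 2: {'desert', 'bicycle'},
              3: {'city'}, 4: {'city'}, 5: {'sea'}}
```

On this toy data, `tag_relevance(2, 'desert', image_tags, [0, 1], [3, 5])` is positive while `tag_relevance(2, 'sea', image_tags, [0, 1], [3, 5])` is negative, matching the +/− pattern in the table above.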
Rationale
• For t_relevant, a tag relevant to the content of i,
  – P(t_relevant | N_s(i, k)) is higher than P(t_relevant | N_rand(k)), so r_similar(i, t_relevant, k) returns a positive value
  – P(t_relevant | N_d(i, l)) is lower than P(t_relevant | N_rand(l)), so r_dissimilar(i, t_relevant, l) returns a negative value
• For t_irrelevant, a tag irrelevant to the content of i,
  – P(t_irrelevant | N_s(i, k)) is lower than P(t_irrelevant | N_rand(k)), so r_similar(i, t_irrelevant, k) returns a negative value
  – P(t_irrelevant | N_d(i, l)) is higher than P(t_irrelevant | N_rand(l)), so r_dissimilar(i, t_irrelevant, l) returns a positive value
Outline
• Introduction
• Motivation
• The Proposed Image Tag Relevance Estimation
• Experiments
• Conclusions
Experimental Setup (1/2)
• Image set used: subset of MIRFlickr-1M
  – 100,000 images annotated with 1,130,342 tags by 13,343 users
  – concept vocabulary of 159,300 unique tags
  – test set: 1,000 images annotated with at least four tags each, carrying 24,474 tags in total
    • we manually classified 6,534 tags as correct
    • we manually classified 17,940 tags as noisy
• Image descriptor
  – Bag of Visual Words (BoVW) with a vocabulary size of 500
  – use of cosine similarity for measuring image similarity
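The similarity measure named above is easy to make concrete. A minimal sketch, assuming each image is represented as a 500-bin BoVW term-frequency histogram (the function name `cosine_similarity` and the NumPy-based formulation are ours, not from the slides):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two BoVW histograms: (a . b) / (|a| |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the measure depends only on the angle between the histograms, two images with proportional visual-word counts score 1.0, while histograms with no visual words in common score 0.0.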
Experimental Setup (2/2)
• Metrics used for evaluating the effectiveness of the proposed technique for image tag relevance estimation
  – for image tag refinement:

      NL = |A_noise| / |A|

    where NL (Noise Level) denotes the proportion of irrelevant tag assignments in the set of all tag assignments, A is the set of tag assignments in an image folksonomy, and A_noise is the set of incorrect (noisy) tag assignments
  – for tag-based image retrieval:

      P@m for t = |I_t_relevant ∩ I_t,m_retrieved| / m

    where I_t_relevant is the set of all folksonomy images relevant to t, and I_t,m_retrieved is the set of the m topmost images that have been retrieved for t (given the estimated tag relevance values)
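Both metrics reduce to simple set arithmetic; a sketch with hypothetical function names (`noise_level`, `precision_at_m`) of our own choosing:

```python
def noise_level(all_assignments, noisy_assignments):
    """NL = |A_noise| / |A|: the fraction of noisy tag assignments."""
    return len(noisy_assignments) / len(all_assignments)

def precision_at_m(relevant_images, ranked_images, m):
    """P@m = |relevant images among the top-m retrieved| / m."""
    return len(set(relevant_images) & set(ranked_images[:m])) / m
```

With the test-set counts from the previous slide (17,940 noisy assignments out of 24,474), `noise_level` gives roughly 0.733, i.e., the NL before refinement reported in the results table.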
Effectiveness of Image Tag Relevance Estimation for Image Tag Refinement
• Effectiveness of image tag relevance estimation using visually similar and dissimilar images, compared to previous approaches
  – neighbor voting and a variant of neighbor voting estimate image tag relevance by only making use of visually similar images

                              Before refinement   Using visually similar images   Using the proposed technique
  Number of relevant tags           6,534                    5,881                          5,881
  Number of irrelevant tags        17,940                   13,117                         12,094
  NL                                0.733                    0.690                          0.673
Effectiveness of Image Tag Relevance Estimation for Tag-based Image Retrieval
• Effectiveness of image tag relevance estimation using visually similar and dissimilar images, compared to previous approaches
  – neighbor voting and a variant of neighbor voting estimate image tag relevance by only making use of visually similar images
Conclusions
• We proposed an image tag relevance estimation technique that makes use of both visually similar and dissimilar images
  – increases the difference in image tag relevance between tags relevant and tags not relevant with respect to a seed image
  – comes with only a low increase in computational complexity
• The effectiveness of the proposed technique was confirmed using MIRFLICKR-25000 and MIRFLICKR-1M
  – by showing that the proposed technique allows increasing the effectiveness of tag refinement and tag-based image retrieval
• Future research
  – combining visual information and tag statistics
  – comparing our data-driven approach with a classifier-based approach for detecting a number of predefined semantic concepts