Exploiting collective knowledge in an image folksonomy for semantic-based near-duplicate video detection



Hyun-seok Min, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
e-mail: hsmin@kaist.ac.kr
website: http://ivylab.kaist.ac.kr

I. INTRODUCTION

- Increasing number of duplicates and near-duplicates on websites for video sharing
  - need for efficient and effective near-duplicate detection techniques
- Conventional video signatures are based on low-level visual features
  - highly sensitive to spatiotemporal transformations
- This paper proposes a novel technique for semantic-based near-duplicate video detection
  - based on the observation that near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in a set of user-contributed images (i.e., an image folksonomy)

II. SYSTEM ARCHITECTURE

Processing steps applied to a query video sequence:

- Pre-processing
  - Shot segmentation
  - Low-level feature extraction
- Creation of a semantic video signature
  - Detection of semantic concepts (using the image folksonomy)
  - Creation of semantic signature
- Video matching using semantic video signatures
  - Semantic video matching (against the reference video database)
  - Computation of similarity
- Near-duplicate detection
  - Decide whether the query video is a near-duplicate or not

Fig. 1. Semantic-based near-duplicate detection using an image folksonomy

III. MODEL-FREE SEMANTIC CONCEPT DETECTION

- Folksonomy-based semantic concept detection:
  - The key frame $s_i$ of the $i$th shot of a query video sequence is compared, by visual similarity measurement, with the folksonomy images (strongly tagged images) $I_1, I_2, \ldots, I_F$, where $I_f$ denotes a folksonomy image
  - The $K$ nearest neighbor images $I_1, \ldots, I_K$ yield a set of tags
  - The frequency of tag $t$ in the set of visual neighbors reflects the relevance of tag $t$ with respect to the content of $s_i$
  - The tag frequency and the number of images labeled with $t$ in the image folksonomy determine the detected semantic concepts

Fig. 2. Folksonomy-based semantic concept detection

- Metric for measuring the relevance of a tag $t$:

$$ J(t) = \frac{c}{K} - \frac{L_t}{F} $$

  $c$ : the frequency of tag $t$ in the set of $K$ nearest neighbor images
  $L_t$ : the number of images labeled with tag $t$ in the image folksonomy (containing $F$ images)

- The semantic signature $U$ of $V$, with $V = \{S_1, S_2, \ldots, S_N\}$:

$$ U = \{A_1, A_2, \ldots, A_N\} $$

  $A_i$ : the set of semantic concepts for $S_i$

IV. DETECTION OF NEAR-DUPLICATES

Video matching aims at determining whether a given query video sequence $V_q$ appears in a target or reference video sequence $V_t$.

- The semantic dissimilarity between two video sequences $V_q$ and $V_t$:

$$ d_{\mathrm{video}}(U_q, U_t) = \frac{1}{N} \sum_{i=1}^{N} d_{\mathrm{shot}}(A_i^q, A_{i+p}^t) $$

  $U_q, U_t$ : the semantic video signatures of $V_q$ and $V_t$
  $p$ : the video shot in the reference video sequence at which similarity measurement starts

- The semantic distance between two video shots:

$$ d_{\mathrm{shot}}(A_i^q, A_j^t) = \frac{|A_i^q \cap A_j^t|}{|A_i^q| \times |A_j^t|} $$

  $|A|$ : the cardinality of $A$

V. EXPERIMENTS

1. Experimental setup

- Our experiments made use of the MUSCLE-VCD-2007 dataset
- To construct an image folksonomy, 3000 images with at least one relevant tag were retrieved from Flickr

2. Experimental results

- The proposed method misclassified only two out of 15 spatially transformed query video sequences
- For the 1,604 query video shots, the total number of detected semantic concepts is 7,927
  - five semantic concepts were predicted on average for a video shot
  - among the 7,927 detected semantic concepts, 272 different concepts could be identified

3. Visual results

Fig. 3. Example key frames with visual neighbors and detected semantic concepts (underlined semantic concepts are considered to be correct). The figure shows a key frame from a reference video sequence and one from a query video sequence, each with its nearest neighbor images. Detected semantic concepts for the two example key frames: "interior, home, inside, night, sunset" and "home, house, interior, inside, style, cottage".

VI. CONCLUSIONS

- This paper discussed a novel technique for semantic-based near-duplicate video detection
  - near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in an image folksonomy (i.e., collective knowledge)
- Semantic video signatures are constructed by detecting semantic concepts along the temporal axis of video sequences
  - our model-free approach is able to exploit an unrestricted tag vocabulary (unlike model-based semantic concept detection)
- Preliminary experimental results look encouraging

IEEE International Conference on Image Processing (ICIP), September 2010, Hong Kong
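
Illustrative sketch (not part of the poster)

The tag relevance metric J(t) = c/K - L_t/F, the shot-level measure d_shot, and the video-level measure d_video described in this poster can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `min_score` threshold used to turn relevance scores into a concept set, the convention of returning 0 for empty concept sets, and the toy data are all hypothetical. The K visual neighbors are assumed to have been retrieved already (the visual similarity measurement is out of scope here), and the shot offset p is 0-based. The code follows the formulas as printed; how the resulting value is thresholded to decide near-duplication is not specified in the poster and is left out.

```python
from collections import Counter


def tag_relevance(neighbor_tags, folksonomy_tag_counts, folksonomy_size):
    """Return J(t) = c/K - L_t/F for every tag t found among the K neighbors.

    neighbor_tags: list of K tag lists, one per nearest neighbor image.
    folksonomy_tag_counts: maps tag -> L_t, the number of folksonomy images with that tag.
    folksonomy_size: F, the number of images in the folksonomy.
    """
    k = len(neighbor_tags)
    # c: number of neighbor images carrying tag t (each image counted once).
    c = Counter(tag for tags in neighbor_tags for tag in set(tags))
    return {
        tag: count / k - folksonomy_tag_counts.get(tag, 0) / folksonomy_size
        for tag, count in c.items()
    }


def detect_concepts(neighbor_tags, folksonomy_tag_counts, folksonomy_size,
                    min_score=0.0):
    """Keep tags whose relevance exceeds min_score (hypothetical selection rule)."""
    scores = tag_relevance(neighbor_tags, folksonomy_tag_counts, folksonomy_size)
    return {tag for tag, score in scores.items() if score > min_score}


def d_shot(a_q, a_t):
    """|A_q ∩ A_t| / (|A_q| * |A_t|); returns 0.0 if either set is empty (assumption)."""
    if not a_q or not a_t:
        return 0.0
    return len(a_q & a_t) / (len(a_q) * len(a_t))


def d_video(u_q, u_t, p):
    """Average of d_shot over the N query shots, aligned at 0-based offset p."""
    n = len(u_q)
    if n == 0 or p < 0 or p + n > len(u_t):
        raise ValueError("query does not fit in the reference at offset p")
    return sum(d_shot(u_q[i], u_t[i + p]) for i in range(n)) / n


# Toy data: a folksonomy of F = 10 images and K = 4 visual neighbors.
tag_counts = {"home": 2, "interior": 2, "sky": 8}
neighbors = [["home", "interior"], ["home", "sky"], ["interior", "home"], ["sky"]]
concepts = detect_concepts(neighbors, tag_counts, folksonomy_size=10)

# Toy signatures: two query shots compared against three reference shots.
u_query = [{"home", "interior"}, {"night"}]
u_reference = [{"sky"}, {"home", "interior"}, {"night", "sunset"}]
aligned = d_video(u_query, u_reference, p=1)
misaligned = d_video(u_query, u_reference, p=0)
```

In the toy data, "sky" is frequent among the neighbors but even more frequent in the folksonomy as a whole, so its relevance is negative and it is not kept as a concept. Sliding p over all valid offsets and keeping the best-scoring alignment would be one way to search a longer reference sequence, though the poster does not state how p is chosen.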