Exploiting Collective Knowledge in an Image Folksonomy
for Semantic-based Near-duplicate Video Detection
Hyun-seok Min, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST)
Daejeon, South Korea
e-mail: hsmin@kaist.ac.kr
website: http://ivylab.kaist.ac.kr
I. INTRODUCTION
- Increasing number of duplicates and near-duplicates on websites for video sharing
  - need for efficient and effective near-duplicate detection techniques
- Conventional video signatures are based on low-level visual features
  - highly sensitive to spatiotemporal transformations
- This paper proposes a novel technique for semantic-based near-duplicate video detection
  - based on the observation that near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in a set of user-contributed images (i.e., an image folksonomy)

IV. DETECTION OF NEAR-DUPLICATES
- Video matching aims at determining whether a given query video sequence $V_q$ appears in a target or reference video sequence $V_t$
- The semantic dissimilarity between two video sequences $V_q$ and $V_t$:
  $$d_{video}(U_q, U_t) = \frac{1}{N} \sum_{i=1}^{N} d_{shot}(A_i^q, A_{i+p}^t)$$
  - $U_q$, $U_t$: the semantic video signatures of $V_q$ and $V_t$
  - $p$: the video shot in the reference video sequence at which similarity measurement starts
- The semantic distance between two video shots:
  $$d_{shot}(A_i^q, A_j^t) = \frac{|A_i^q \cap A_j^t|}{|A_i^q| \times |A_j^t|}$$
  - $|A|$: the cardinality of $A$
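The two distance definitions of Section IV can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the handling of empty concept sets, and the 0-based offset `p` are choices made for this sketch, and the example signatures are made up. The shot-level formula is coded exactly as it is printed on the poster.

```python
from typing import List, Set


def shot_distance(a_q: Set[str], a_t: Set[str]) -> float:
    """d_shot as printed on the poster: |A_q ∩ A_t| / (|A_q| * |A_t|)."""
    if not a_q or not a_t:
        # Assumption of this sketch: a shot without concepts contributes 0.
        return 0.0
    return len(a_q & a_t) / (len(a_q) * len(a_t))


def video_distance(u_q: List[Set[str]], u_t: List[Set[str]], p: int) -> float:
    """d_video: mean of d_shot over the N query shots.

    Query shot i is compared with reference shot i + p, where p is a
    0-based offset into the reference signature (the poster uses 1-based i).
    """
    n = len(u_q)
    if n == 0 or p < 0 or p + n > len(u_t):
        raise ValueError("query signature must fit in the reference at offset p")
    return sum(shot_distance(u_q[i], u_t[i + p]) for i in range(n)) / n


# Hypothetical semantic signatures (one set of concepts per shot).
u_q = [{"home", "interior"}, {"night", "sunset"}]
u_t = [{"car"}, {"home", "interior", "inside"}, {"night", "sunset"}]

aligned = video_distance(u_q, u_t, p=1)     # query matches reference shots 1..2
misaligned = video_distance(u_q, u_t, p=0)  # no shared concepts at this offset
print(aligned, misaligned)
```

Under the formula as printed, shots that share more concepts produce a larger value. The poster does not state how the resulting value is thresholded to decide whether the query is a near-duplicate, so no decision rule is included here.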
II. SYSTEM ARCHITECTURE
Input: query video sequence
1. Pre-processing
   - Shot segmentation
   - Low-level feature extraction
2. Creation of a semantic video signature
   - Detection of semantic concepts (input: image folksonomy)
   - Creation of semantic signature
3. Video matching using semantic video signatures
   - Semantic video matching (input: reference video database)
   - Computation of similarity
4. Near-duplicate detection
   - Decide whether the query video is a near-duplicate or not
Fig. 1. Semantic-based near-duplicate detection using an image folksonomy

V. EXPERIMENTS
1. Experimental setup
- Our experiments made use of the MUSCLE-VCD-2007 dataset
- To construct an image folksonomy, 3,000 images with one or more relevant tags were retrieved from Flickr
2. Experimental results
- The proposed method misclassified only two out of 15 spatially transformed query video sequences
- For the 1,604 query video shots, the total number of detected semantic concepts is 7,927
  - five semantic concepts were predicted on average for a video shot
  - among the 7,927 detected semantic concepts, 272 different concepts could be identified
3. Visual results
- Key frames of a reference video sequence and a query video sequence (see Fig. 3)

III. MODEL-FREE SEMANTIC CONCEPT DETECTION
Fig. 2. Folksonomy-based semantic concept detection
- Input: the key frame $s_i$ of the $i$-th shot of a query video sequence
- Visual similarity measurement against the folksonomy images $I_1, I_2, \ldots, I_F$ (strongly tagged images; $I_f$: folksonomy image) yields the nearest neighbor images $I_1, \ldots, I_K$
- Folksonomy-based semantic concept detection uses the set of tags of the nearest neighbor images, the tag frequency, and the number of images labeled with $t$ in the image folksonomy to obtain the detected semantic concepts
- The frequency of tag $t$ in the set of visual neighbors reflects the relevance of tag $t$ with respect to the content of $s_i$

- Metric for measuring the relevance of a tag $t$:
  $$J(t) = \frac{c}{K} - \frac{L_t}{F}$$
  - $c$: the frequency of tag $t$ in the set of $K$ nearest neighbor images
  - $L_t$: the number of images labeled with tag $t$ in the image folksonomy (containing $F$ images)
- The semantic signature $U$ of $V$, with $V = \{S_1, S_2, \ldots, S_N\}$:
  $$U = \{A_1, A_2, \ldots, A_N\}$$
  - $A_i$: the set of semantic concepts for $S_i$

Fig. 3. Example key frames with visual neighbors and detected semantic concepts (underlined semantic concepts are considered to be correct)
- Detected semantic concepts, first example key frame: interior, home, inside, night, sunset
- Detected semantic concepts, second example key frame: home, house, interior, inside, style, cottage

VI. CONCLUSIONS
- This paper discussed a novel technique for semantic-based near-duplicate video detection
  - near-duplicates still convey the same semantic information
  - takes advantage of the wide variety of user-supplied tags present in an image folksonomy (i.e., collective knowledge)
- Semantic video signatures are constructed by detecting semantic concepts along the temporal axis of video sequences
  - our model-free approach is able to exploit an unrestricted tag vocabulary (unlike model-based semantic concept detection)
- Preliminary experimental results look encouraging: only two out of 15 spatially transformed query video sequences were misclassified
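
The tag relevance metric J(t) and the semantic signature U from Section III can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the visual nearest-neighbor search is taken as given (the tag sets of the K neighbors are passed in directly), the `min_relevance` cut-off used to select concepts is hypothetical because the poster does not state a selection rule, and the small folksonomy is made up.

```python
from collections import Counter
from typing import Dict, List, Set


def tag_relevance(c: int, k: int, l_t: int, f: int) -> float:
    """J(t) = c/K - L_t/F, as defined in Section III."""
    return c / k - l_t / f


def folksonomy_tag_counts(folksonomy: List[Set[str]]) -> Dict[str, int]:
    """L_t for every tag: number of folksonomy images labeled with it."""
    return dict(Counter(tag for tags in folksonomy for tag in tags))


def detect_concepts(neighbor_tags: List[Set[str]], counts: Dict[str, int],
                    f: int, min_relevance: float) -> Set[str]:
    """Concept set A_i for one shot, given the tag sets of its K neighbors.

    Assumption of this sketch: a tag is kept when J(t) > min_relevance.
    """
    k = len(neighbor_tags)
    freq = Counter(tag for tags in neighbor_tags for tag in tags)  # c per tag
    return {t for t, c in freq.items()
            if tag_relevance(c, k, counts[t], f) > min_relevance}


def semantic_signature(shots: List[List[Set[str]]], counts: Dict[str, int],
                       f: int, min_relevance: float) -> List[Set[str]]:
    """U = {A_1, ..., A_N}: one concept set per shot, in temporal order."""
    return [detect_concepts(n, counts, f, min_relevance) for n in shots]


# Hypothetical image folksonomy with F = 6 tagged images.
folksonomy = [
    {"home", "interior"}, {"home", "inside"}, {"beach", "sunset"},
    {"sunset", "night"}, {"car"}, {"home", "car"},
]
counts = folksonomy_tag_counts(folksonomy)

# Tag sets of the K = 3 visual neighbors of one key frame (images 0, 1, 5).
neighbors = [folksonomy[0], folksonomy[1], folksonomy[5]]
concepts = detect_concepts(neighbors, counts, f=len(folksonomy), min_relevance=0.1)
signature = semantic_signature([neighbors], counts, len(folksonomy), 0.1)
print(sorted(concepts))
```

In this example "home" appears in all three neighbors but in only half of the folksonomy, so J("home") = 3/3 - 3/6 = 0.5, while "car" scores 1/3 - 2/6 = 0 and is dropped. Subtracting L_t/F penalizes tags that are common across the whole folksonomy and therefore carry little information about the particular key frame.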

IEEE International Conference on Image Processing (ICIP), September 2010, Hong Kong
