Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words

Hyun-seok Min, Se Min Kim, Wesley De Neve, and Yong Man Ro
Image and Video Systems Lab
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
e-mail: ymro@ee.kaist.ac.kr
website: http://ivylab.kaist.ac.kr

I. INTRODUCTION

- BoVW-based approaches can be effectively used for the detection of both image and video copies
  - however, these approaches typically ignore the inherent temporal nature of video content
- Conventional video tomography extracts slices from a space-time cube that are parallel to the time axis
  - however, slices that are parallel to the time axis do not take advantage of spatial information
- This paper proposes to create a content-based video signature by means of the following two sequential steps
  1) extraction of inclined tomography images from the video content
     - the angle of inclination is dependent on the amount of motion
  2) characterization of the inclined tomography images by means of BoVW

II. CREATION OF A VIDEO SIGNATURE BY MEANS OF INCLINED VIDEO TOMOGRAPHY AND BOVW

1. Extraction of inclined tomography images

- To extract inclined tomography images from a video clip V, we first segment V into N space-time cubes such that V = <S1, S2, …, SN>
- We subsequently segment each space-time cube into several space-time sub-cubes
  - Fv, Fb: number of frames in a space-time cube and space-time sub-cube
  - Wv, Wb: width of a space-time cube and space-time sub-cube
  - Hv, Hb: height of a space-time cube and space-time sub-cube

Fig. 1. Segmentation of a space-time cube into space-time sub-cubes.

- The angle of inclination of the tomography image extracted reflects the intensity of motion in the space-time sub-cube under consideration

$$\theta = \frac{\beta}{W_b \times H_b \times F_b} \sum_{(x, y, f)} \left| L(x, y, f + 1) - L(x, y, f) \right|$$

  - L(x, y, t): the luminance value of a pixel (x, y) of a particular frame at time t
  - β: a weight parameter

Fig. 2. Extraction of an inclined tomography image from a space-time sub-cube.

2. BoVW applied to inclined tomography images

- Each space-time cube Si can be represented as a vector Ai that summarizes how the space-time sub-cubes are distributed over the vocabulary of visual words used

$$A_i = \langle a_{i,1}, a_{i,2}, \ldots, a_{i,M} \rangle$$

  - M: the number of visual words vj in the vocabulary used
  - ai,j: the weight of the jth visual word

Fig. 3. Extraction of a histogram of visual words from an inclined tomography image.

III. VIDEO MATCHING USING HISTOGRAMS

- The dissimilarity between two video clips Vq and Vr:

$$D(V^q, V^r) = \min_p \frac{1}{N} \sum_{i=1}^{N} D_{shot}(S_i^q, S_{i+p}^r), \qquad V^q = \{S_i^q\}_{i=1}^{N}, \quad V^r = \{S_l^r\}_{l=1}^{L}$$

  - p: the position of the video shot in the reference video clip at which similarity measurement starts

- The dissimilarity between two video shots Sq and Sr is measured by making use of the cosine similarity:

$$D_{shot}(S^q, S^r) = 1 - \frac{\sum_{j=1}^{M} a_j^q \times a_j^r}{\sqrt{\sum_{j=1}^{M} (a_j^q)^2} \, \sqrt{\sum_{j=1}^{M} (a_j^r)^2}}$$

  - M: the number of visual words in the vocabulary
  - aj: the weight of the jth visual word

IV. EXPERIMENTS

1. Experimental setup

- Use of TRECVID 2009 for creating near-duplicate video clips (NDVCs) and reference video clips
- Use of 100 query video clips, created by applying five transformations to 20 video clips randomly selected from the reference video database
  - blurring: we blurred frames using a Gaussian kernel with a radius of 15;
  - picture-in-picture: we inserted a picture with a size that is 30% of the size of the main frame;
  - change in brightness: we increased the brightness by 40%;
  - mirroring: we reversed frames from the left to the right;
  - change in frame rate: we halved the frame rate.

2. Experimental results

Fig. 4. Comparison of the effectiveness of several video signatures. (Two bar charts, one for precision and one for recall, per transformation: blur, pattern insertion, change in brightness, mirroring, frame rate change, and average. Signatures compared: proposed video signature, BoVW using SIFT, and video tomography.)

Fig. 5. Example images: (a) example key frame and (b) 16 inclined tomography images extracted from the key frame shown in (a).

V. CONCLUSIONS

- This paper introduced a novel video signature that takes advantage of both inclined video tomography and BoVW
- The proposed video signature is able to capture both spatial and temporal information
  - the angle of inclination of the extracted tomography images is dependent on the amount of motion in the local volumes

IEEE International Conference on Multimedia and Expo (ICME), July 2012, Melbourne (Australia)
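
The angle of inclination θ from Section II.1 is a weighted mean absolute luminance difference between consecutive frames of a space-time sub-cube. The following is a minimal sketch of that computation, not the authors' implementation. It assumes the sub-cube is a nested list indexed as `sub_cube[f][y][x]`, that the sum runs over the frame pairs inside the sub-cube, and that the absolute difference is intended. The function name `inclination_angle` is hypothetical.

```python
from typing import Sequence

# Luminance values indexed as sub_cube[f][y][x] (assumed layout).
SubCube = Sequence[Sequence[Sequence[float]]]


def inclination_angle(sub_cube: SubCube, beta: float) -> float:
    """Return theta = beta / (W_b * H_b * F_b) * sum |L(x, y, f+1) - L(x, y, f)|."""
    f_b = len(sub_cube)
    if f_b == 0 or not sub_cube[0] or not sub_cube[0][0]:
        raise ValueError("sub_cube must contain at least one non-empty frame")
    h_b = len(sub_cube[0])
    w_b = len(sub_cube[0][0])

    total = 0.0
    # Assumption: f ranges over the frame pairs (f, f + 1) inside the sub-cube.
    for f in range(f_b - 1):
        for y in range(h_b):
            for x in range(w_b):
                total += abs(sub_cube[f + 1][y][x] - sub_cube[f][y][x])

    return beta * total / (w_b * h_b * f_b)
```

A static sub-cube yields θ = 0, and θ grows with the amount of frame-to-frame change. The scale of θ depends entirely on β, for which the poster gives no value.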

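The matching step of Section III can be sketched in the same way: a cosine-based dissimilarity between two visual-word histograms, and a clip-level dissimilarity that slides the query over the reference and keeps the smallest average. This is an illustrative sketch under stated assumptions. The offset p is zero-based and restricted to positions where the whole query fits inside the reference, and an all-zero histogram is treated as maximally dissimilar. Neither choice is specified in the poster, and the function names are hypothetical.

```python
import math
from typing import Sequence

Histogram = Sequence[float]  # weights a_1 .. a_M of the visual words


def shot_dissimilarity(a_q: Histogram, a_r: Histogram) -> float:
    """Return 1 minus the cosine similarity of two visual-word histograms."""
    if len(a_q) != len(a_r):
        raise ValueError("histograms must share the same vocabulary size M")
    dot = sum(x * y for x, y in zip(a_q, a_r))
    norm_q = math.sqrt(sum(x * x for x in a_q))
    norm_r = math.sqrt(sum(y * y for y in a_r))
    if norm_q == 0.0 or norm_r == 0.0:
        return 1.0  # assumption: an empty histogram matches nothing
    return 1.0 - dot / (norm_q * norm_r)


def clip_dissimilarity(query: Sequence[Histogram],
                       reference: Sequence[Histogram]) -> float:
    """Return the minimum over offsets p of the mean shot dissimilarity."""
    n = len(query)
    if n == 0 or n > len(reference):
        raise ValueError("query must be non-empty and no longer than reference")
    best = math.inf
    # Assumption: p covers every offset at which the query fits completely.
    for p in range(len(reference) - n + 1):
        total = sum(shot_dissimilarity(query[i], reference[i + p])
                    for i in range(n))
        best = min(best, total / n)
    return best
```

For example, a two-shot query whose histograms reappear unchanged at offset 1 of a four-shot reference has a clip dissimilarity of 0 at that offset, regardless of how poorly the other offsets match.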