AT&T Research at TRECVID 2009



  1. AT&T Research at TRECVID 2009
     Content-based Copy Detection
  2. TRECVID 2009
     TREC Video Retrieval Evaluation
     Tasks for 2009:
     - surveillance event detection
     - high-level feature extraction
     - search (interactive, manually assisted, and/or fully automatic)
     - content-based copy detection
  3. Video data
     Sound and Vision
     - from The Netherlands Institute for Sound and Vision
     - news magazines, science news, news reports, documentaries, educational programming, and archival video
     BBC rushes
     - unedited material
     All material is in MPEG-1.
  4. Datasets
     Development
     - (32.9 GB) (reference)
     - (31.4 GB) (reference)
     - (64.3 GB) (reference)
     - (12.2 GB) (non-reference)
     - (10.9 GB) (non-reference)
     - (10.8 GB) (non-reference)
     Test
     - (32.9 GB) (reference)
     - (31.4 GB) (reference)
     - (64.3 GB) (reference)
     - (114.8 GB) (reference)
     - (12.2 GB) (non-reference)
     - (10.9 GB) (non-reference)
     - (10.8 GB) (non-reference)
     - (19.0 GB) (non-reference)
  5. Content-based copy detection
     - copyright control
     - business intelligence
     - advertisement tracking
     - law enforcement investigations
  6. Video transformations
     - Picture in picture (the original video is inserted in front of a background video)
     - Insertion of patterns
     - Strong re-encoding
     - Change of gamma
     - Decrease in quality: blur, change of gamma, frame dropping, contrast, compression, ratio, white noise
     - Post production: crop, shift, contrast, caption (text insertion), flip (vertical mirroring), insertion of pattern, picture in picture (the original video is in the background)
     For each query, one transformation is chosen at random from each of the 3 main categories.
  7. AT&T Research at TRECVID 2009: Content-based Copy Detection
     Applications
     - discovering copyright infringement of multimedia content
     - monitoring commercial air time
     - querying video by example
     Approaches
     - digital video watermarking
     - content-based copy detection (CBCD)
  8. Overview
  9. Content-based sampling
     Shot boundary detection (SBD)
     - adopts a "divide and conquer" strategy
     - six independent detectors: cut, fade in, fade out, fast dissolve (fewer than 5 frames), dissolve, and motion
     - each detector is a finite state machine (FSM)
     - the FSMs depend on two types of visual features:
       - intra-frame (computed from a single frame)
       - inter-frame (computed from the current and the previous frame)
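The deck names the six detectors but not their internals. As a minimal sketch under assumed features and thresholds (not AT&T's actual parameters), a cut-detector FSM driven by an inter-frame histogram difference could look like this: a large difference moves the machine into a candidate state, and the cut is confirmed only if the next frame settles down, which rejects single-frame flashes.

```python
import numpy as np

def hist_diff(f1, f2, bins=16):
    """Inter-frame feature: L1 distance between normalized grayscale histograms."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    return np.abs(h1 / f1.size - h2 / f2.size).sum()

def detect_cuts(frames, threshold=0.5):
    """Tiny two-state FSM: NORMAL -> CANDIDATE -> (confirmed cut or fall back).

    `threshold` is an illustrative value, not a tuned system parameter.
    """
    state, cuts = "NORMAL", []
    for i in range(1, len(frames)):
        d = hist_diff(frames[i - 1], frames[i])
        if state == "NORMAL":
            if d > threshold:
                state, candidate = "CANDIDATE", i
        else:
            # A real cut is followed by a stable frame; a flash is not.
            if d <= threshold:
                cuts.append(candidate)
            state = "NORMAL"
    return cuts
```

The real system runs six such machines in parallel, one per transition type, over both intra- and inter-frame features.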
  10. Overview
  11. Transformation detection and normalization for query keyframes
      - Letterbox detection
      - Picture-in-picture detection
      - Query keyframe normalization
  12. Transformation detection and normalization for query keyframes
      - Letterbox detection
  13. Picture-in-picture detection
  14. Canny edge detection operator
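The transcript names the detection steps but the detail was on the slide images. As an illustrative sketch only: the actual system uses the Canny edge operator, but the same idea can be shown more compactly by treating near-black border rows as letterbox bars. The threshold is an assumption.

```python
import numpy as np

def detect_letterbox(gray, dark_thresh=16):
    """Return (top, bottom) row indices bounding the active picture area.

    A row counts as part of a letterbox bar when its mean intensity is
    near black; `dark_thresh` is illustrative, not the system's value
    (the deck's detector is built on Canny edges instead).
    """
    row_mean = gray.mean(axis=1)
    top, bottom = 0, gray.shape[0]
    while top < bottom and row_mean[top] < dark_thresh:
        top += 1
    while bottom > top and row_mean[bottom - 1] < dark_thresh:
        bottom -= 1
    return top, bottom
```

Picture-in-picture detection works similarly but searches for a rectangle of strong edges inside the frame rather than uniform bars at its border.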
  15. Transformation detection and normalization for query keyframes
      Query keyframe normalization
      - Equalize and blur the query keyframe to counter the change-of-gamma and white-noise transformations.
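The two normalization operations can be sketched as follows; the exact equalization and blur kernels used by the system are not in the transcript, so standard histogram equalization and a box filter stand in for them here.

```python
import numpy as np

def equalize(gray):
    """Histogram equalization: map intensities through the normalized CDF.

    Counters the change-of-gamma transformation by flattening the
    intensity distribution.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    denom = cdf[-1] - cdf.min()
    cdf = (cdf - cdf.min()) / max(denom, 1)
    return np.round(cdf[gray] * 255).astype(np.uint8)

def box_blur(gray, k=3):
    """Simple k x k mean filter, a stand-in for the slide's 'blur' step
    that suppresses white noise."""
    pad = k // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    out = np.zeros(gray.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return (out / (k * k)).astype(np.uint8)
```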
  16. Transformation detection and normalization for query keyframes
      This yields 10 types of query keyframe: original, letterbox-removed, PiP-scaled, equalized, and blurred, plus flipped versions of each of these five.
  17. Overview
  18. Reference keyframe transformations
      Only 2 transformations are applied:
      - Half-resolution rescaling, to compare against the detected PiP regions in the query keyframes
      - Strong re-encoding, to deal with strongly re-encoded query keyframes
      This yields 3 types of reference keyframe.
  19. Overview
  20. Scale-Invariant Feature Transform (SIFT) extraction
  21. Scale-Invariant Feature Transform (SIFT) extraction
      SIFT is the main feature used for locating video copies.
      - Locate keypoints that are local maxima of the Difference-of-Gaussians response in both scale and space; each keypoint is specified by location, scale, and orientation.
      - Compute a descriptor for each keypoint: a gradient-orientation histogram, giving a 128-dimensional feature vector.
  22. Overview
  23. Locality-sensitive hashing (LSH)
      The basic idea: hash the input items so that similar items are mapped to the same bucket with high probability.
      h(v) = floor((a . v + b) / w), where
      - a: random vector whose entries follow a Gaussian distribution with zero mean and unit variance
      - w: preset bucket size
      - b: drawn uniformly from the range [0, w]
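The three parameters on the slide are those of the standard p-stable LSH scheme, which can be sketched directly; the bucket width and number of hash functions below are illustrative defaults, not the system's settings.

```python
import numpy as np

def make_lsh(dim, w=4.0, n_hashes=8, seed=0):
    """Build h(v) = floor((a . v + b) / w), as defined on the slide.

    a: Gaussian random vectors (zero mean, unit variance),
    b: uniform in [0, w),
    w: preset bucket width.
    Returns a function mapping a vector to a tuple of bucket indices.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_hashes, dim))
    b = rng.uniform(0.0, w, size=n_hashes)

    def h(v):
        return tuple(np.floor((a @ np.asarray(v) + b) / w).astype(int))

    return h
```

Vectors whose Euclidean distance is small relative to w land in the same buckets with high probability, which is what makes the scheme useful for matching 128-dimensional SIFT descriptors.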
  24. Overview
  25. Indexing and search by LSH
      - Sort the LSH values independently
      - Save them with the SIFT identifications in separate index files
      - A SIFT identification is a string composed of:
        - reference video ID
        - keyframe ID
        - SIFT ID
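The index structure on the slide amounts to a mapping from each LSH value to the SIFT identifications that produced it. An in-memory sketch (the real system keeps sorted, on-disk index files instead):

```python
from collections import defaultdict

def build_index(entries):
    """Map each LSH value to its SIFT identifications.

    Each identification is the (reference video ID, keyframe ID,
    SIFT ID) triple described on the slide.
    """
    index = defaultdict(list)
    for lsh_value, video_id, keyframe_id, sift_id in entries:
        index[lsh_value].append((video_id, keyframe_id, sift_id))
    return index

def lookup(index, lsh_value):
    """Candidate matches: all reference SIFTs sharing the query's LSH value."""
    return index.get(lsh_value, [])
```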
  26. Overview
  27. Keyframe-level query refinement
      Two issues:
      - the original SIFT matching by Euclidean distance is not reliable
      - two SIFT features that are far apart can still be mapped to the same LSH value
  28. Keyframe-level query refinement
      Random Sample Consensus (RANSAC)
  29. Keyframe-level query refinement
      Random Sample Consensus (RANSAC):
      1. Randomly select 3 pairs of matching keypoints (pairs having the same LSH value)
      2. Determine the affine model from the sampled pairs
      3. Transform all keypoints in the reference keyframe into the query keyframe
      4. Count the reference keypoints whose transformed coordinates match the coordinates of their corresponding keypoints in the query keyframe; these keypoints are called inliers
      5. Repeat steps 1 to 4 a fixed number of times, and output the maximum number of inliers
      The estimated affine model covers transformations such as PiP, shift, and ratio change.
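The steps above can be sketched directly; iteration count and inlier tolerance are illustrative assumptions, not the system's tuned values.

```python
import numpy as np

def fit_affine(src, dst):
    """Solve the 6-parameter affine model from 3 point pairs:
    [x y 1] @ M = [x' y'], with M a 3x2 matrix."""
    A = np.hstack([src, np.ones((3, 1))])
    return np.linalg.solve(A, dst)

def ransac_affine(ref_pts, qry_pts, iters=200, tol=2.0, seed=0):
    """Steps 1-5 from the slides: sample 3 matched pairs, fit an affine
    model, count inliers, and return the best inlier count."""
    rng = np.random.default_rng(seed)
    n, best = len(ref_pts), 0
    ref_h = np.hstack([ref_pts, np.ones((n, 1))])
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)
        sample = ref_pts[idx]
        # Skip degenerate (collinear) samples, which have no unique model.
        if abs(np.linalg.det(np.hstack([sample, np.ones((3, 1))]))) < 1e-6:
            continue
        model = fit_affine(sample, qry_pts[idx])
        proj = ref_h @ model  # transform all reference keypoints (step 3)
        inliers = int((np.linalg.norm(proj - qry_pts, axis=1) < tol).sum())
        best = max(best, inliers)
    return best
```

The inlier count serves as the keyframe-level relevance score, and because the model is affine it absorbs the PiP, shift, and ratio transformations the slide mentions.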
  34. Overview
  35. Keyframe-level result merge
      If a reference keyframe appears more than once across the 12 result lists, its relevance score is set to the maximum of its individual scores.
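The merge rule is a max over duplicate entries, which fits in a few lines; list and keyframe identifiers here are hypothetical placeholders.

```python
def merge_keyframe_results(result_lists):
    """Merge per-variant result lists: when a reference keyframe appears
    in several lists, keep the maximum relevance score (per the slide)."""
    merged = {}
    for results in result_lists:
        for keyframe_id, score in results:
            merged[keyframe_id] = max(score, merged.get(keyframe_id, score))
    return merged
```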
  36. Overview
  37. Video-level result fusion
      Find the pair (i, j) with the best sum of relevance scores.
  38. Overview
  39. Video relevance score normalization
      Normalize the relevance scores into the range [0, 1]
      - x: original relevance score
      - y: normalized relevance score
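The slide's actual mapping from x to y did not survive in the transcript, so the sketch below uses plain min-max scaling as an assumed stand-in; the real formula may differ.

```python
def normalize_scores(scores):
    """Map relevance scores x into y in [0, 1].

    Min-max scaling is an assumption here: the deck's exact formula is
    not in the transcript. A constant score list maps to all 1.0.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(x - lo) / (hi - lo) for x in scores]
```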
  40. Overview
  41. CBCD result generation
      Each result entry contains:
      - query video ID
      - reference video ID
      - information on the copied reference video segment
      - starting frame of the copied segment in the query video
      - decision score
  42. CBCD Evaluation Results
      Dataset
      - 1407 short query videos
      - 838 reference videos
      - 208 non-reference videos
      Extracted features
      - entire reference video set: 268,000 keyframes, 57,000,000 SIFT features
      - entire query video set: 18,000 keyframes, 2,600,000 SIFT features
  43. CBCD Evaluation Criteria
      - Parameters for the NoFA (no false alarm) profile
      - Parameters for the Balanced profile
  44. CBCD Evaluation Results
  45. CBCD Evaluation Results
  46. CBCD Evaluation Results
  47. About
  48. Want more information?
      Kirill Lazarev
      Skype: kirill_lazarev
      Mail:
      Twitter: