
Understanding Near-Duplicate Videos: A User-Centric Approach



Popular content on video-sharing websites (e.g., YouTube) is usually duplicated. Most scholars define near-duplicate video clips (NDVC) based on non-semantic features (e.g., different image/audio quality), while a few also include semantic features (different videos of similar content). However, it is unclear which features contribute to the human perception of similar videos. Findings of two large-scale online surveys (N = 1003) confirm the relevance of both types of features. While some of our findings confirm the adopted definitions of NDVC, other findings are surprising. For example, videos that vary in visual content, by overlaying or inserting additional information, may not be perceived as near-duplicate versions of the original videos. Conversely, two different videos with distinct sounds, people, and scenarios were considered to be NDVC because they shared the same semantics (none of the pairs had additional information). Furthermore, the exact role played by semantics in relation to the features that make videos alike is still an open question. In most cases, participants preferred to see only one of the NDVC in the results of a video search query, and they were more tolerant of changes in the audio track than in the video track. Finally, we propose a user-centric NDVC definition and present implications for how duplicate content should be dealt with by video-sharing websites.

Published in: Technology, Business


  1. Near-Duplicate Videos
  2. Let’s say you’re looking for the Bush attack video…
  3. … and you get 11,100 results.
  4. … after 40 minutes watching the videos listed on the first page, you notice that > 50% are similar, i.e., NDVC (27% on average [Wu et al., 2007])
  5. NDVC technical definition
      • Identical or approximately identical videos that differ in some feature:
          • file formats, encoding parameters
          • photometric variations (color, lighting changes)
          • overlays (caption, logo, audio commentary)
          • editing operations (adding/removing frames)
          • semantic similarity
     NDVC are videos that are “essentially the same”
  6. … like this
  7. Two challenges:
      • There is no agreement on a single definition of NDVC
      • NDVC are mostly considered redundant content that has to be removed from the system
  8. Human Perception of Near Duplicate Videos
     Mauro Cherubini, Rodrigo de Oliveira, Nuria Oliver
  9. What kind of NDVC?
      • Malicious (i.e., spam produced by a single user)
      • Copyright infringement (e.g., pirated music videos)
      • User-edited content: videos that complement the original material with additional information
  10. Recently: NDVC detection algorithm
  11. Recently: NDVC detection algorithm
  12. Why not? NDVC detection algorithm?
  13. Methodology
      • 2 large-scale online surveys (N = 1003)
      • 7 pairs of NDVC (differing in 1 feature)
      • Subjects were asked about:
          • Similarity
          • Preference
  14. NDVC technical definition
      • Identical or approximately identical videos that differ in some features:
          • photometric variations (color, lighting changes)
          • overlays (caption, logo, audio commentary)
          • editing operations (adding/removing frames)
          • And …
          • semantic similarity (e.g., two deer grazing in two different forests)
  15. Audio quality (stereo, 44 kHz vs. mono, 11 kHz): NDVC; Preference
  16. Image quality: NDVC; Preference
  17. Audio content (overlay): NDVC; Preference
  18. Visual + audio content (length): Not NDVC; Preference
  19. Visual content (editing): Not NDVC; Want both
  20. Similar semantics, different videos (similar visual info): NDVC; Want both
  21. Similar semantics, different videos (similar audio info): Not NDVC; Preference
  22. Implications for Design
      • User-centric NDVC definition
     NDVC are approximately identical videos that might differ in audio/image quality or overlays. Conversely, identical videos with relevant complementary information (changing clip length or scenes) are not considered NDVC. Furthermore, users perceive as near-duplicates videos that are not alike but are visually similar and semantically related.
  23. Implications for Design
      • Clustering
          • Groups sharing video, audio, semantic content
          • Ranking based on user-submitted query
          • Highlight the most representative
  24. Implications for Design
      • Feature and user adaptation
          • Boost ranking based on general observations
              • More content
              • Better image/audio quality
              • …
          • Boost ranking based on personalization
              • Abilities (e.g., auditory skills)
              • Task (e.g., video producer vs. movie enthusiast)
              • Search query
  25. Future Work
      • NDVCs differing in more than 1 low-level feature
      • Propose ways to visualize the NDVCs
      • Study effects of users’ goals while searching videos
  26. A Human-Centric Stance in Multimedia Research
      • Biomimetics
      • Crowdsourcing
      • Psychophysical experiments
  27. Thank you!
     Mauro Cherubini, Rodrigo de Oliveira, Nuria Oliver
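
The user-centric definition and the clustering idea from the "Implications for Design" slides can be illustrated with a minimal Python sketch. This is not the authors' implementation. The slides do not say how similarity is measured, so every field here (`footage_id`, `cut_id`, `semantics`, `look`, `quality`, `relevance`) is a hypothetical, precomputed label standing in for a real detector. The rule of thumb encoded below: the same footage with the same cut counts as NDVC regardless of quality or overlays, a different cut (changed length or scenes) does not, and different footage counts as NDVC only when it is visually similar and semantically related.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # All fields are hypothetical stand-ins for the output of real analysis.
    video_id: str
    footage_id: str   # identifies the underlying recording
    cut_id: str       # identifies the edit (scene selection and clip length)
    semantics: str    # label for what the video is about
    look: str         # label for what the video looks like
    quality: float    # image/audio quality score in [0, 1]
    relevance: float  # relevance to the user's query in [0, 1]


def is_ndvc(a: Video, b: Video) -> bool:
    """Rule-of-thumb reading of the user-centric definition (an assumption)."""
    if a.footage_id == b.footage_id:
        # Quality and overlay differences are ignored; a different cut
        # (extra or missing scenes) makes the pair distinct.
        return a.cut_id == b.cut_id
    # Different footage: near-duplicate only if it looks similar and
    # is semantically related.
    return a.semantics == b.semantics and a.look == b.look


def cluster_results(videos: list[Video]) -> list[list[Video]]:
    """Greedy grouping: a video joins the first cluster whose seed it matches."""
    clusters: list[list[Video]] = []
    for v in videos:
        for c in clusters:
            if is_ndvc(c[0], v):
                c.append(v)
                break
        else:
            clusters.append([v])
    # Inside a cluster, rank by query relevance, then by quality, so the
    # first element is the representative to highlight.
    for c in clusters:
        c.sort(key=lambda v: (v.relevance, v.quality), reverse=True)
    clusters.sort(key=lambda c: c[0].relevance, reverse=True)
    return clusters


videos = [
    Video("orig", "f1", "full", "deer-grazing", "forest", 0.9, 0.8),
    Video("lowq", "f1", "full", "deer-grazing", "forest", 0.3, 0.8),
    Video("extended", "f1", "extended", "deer-grazing", "forest", 0.7, 0.7),
    Video("other", "f2", "full", "deer-grazing", "forest", 0.6, 0.5),
    Video("cooking", "f3", "full", "cooking", "kitchen", 0.8, 0.4),
]

clusters = cluster_results(videos)
representatives = [c[0].video_id for c in clusters]
print(representatives)  # ['orig', 'extended', 'cooking']
```

Using quality as the tie-breaker inside a cluster is one possible reading of "boost ranking based on general observations"; a personalized variant could weight audio against image quality per user, as the feature and user adaptation slide suggests.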