ICT.OPEN2012 - One of These Things is Not Like the Other: Crowdsourcing Semantic Similarity of Multimedia Files

Poster version of earlier work, presented at ICT.OPEN 2012.

Original paper:
Discovering User Perceptions of Semantic Similarity in Near-duplicate Multimedia Files. In Proc. of the 1st International Workshop on Crowdsourcing Web Search, Lyon, France, April 17, 2012, CEUR-WS.org. Available online: http://msp.ewi.tudelft.nl/sites/default/files/crowdsearch2012-vliegendhart.pdf.

One of These Things is Not Like the Other: Crowdsourcing Semantic Similarity of Multimedia Files

Raynor Vliegendhart*, Martha Larson*, and Johan Pouwelse**
* Multimedia Information Retrieval Lab, Delft University of Technology
** Parallel and Distributed Systems Group, Delft University of Technology

Problem

● Problem: What constitutes a near duplicate? For example: Are these two files the same? Why (not)?
  ● Chrono Cross - Dream of the Shore Near Another World Violin/Piano Cover (YouTube: IQYNEj51EUI)
  ● Chrono Cross Dream of the Shore Near Another World Violin and Piano (YouTube: Iuh3YrJtK3M)
  Yes: It’s the same song.
  No: These are different performances by different performers.
● Definition: Functional near-duplicate multimedia items are items that fulfill the same purpose for the user. Once the user has one of these items, there is no additional need for another.
● Task: Discovering new notions of user-perceived similarity between multimedia files in a file-sharing setting.
● Motivation: Clustering items in search results.

Approach

[Screenshots from Tribler 5.4 (tribler.org)]

● Idea: Point the odd one out, inspired by Sesame Street’s “one of these things is not like the other”.
● Crowdsourcing Task:
  ● 3 multimedia files displayed as search results.
  ● Worker points the odd one out and justifies why.
  ● Challenge: Eliciting serious judgments.

HIT Design

Amazon Mechanical Turk (AMT) is a crowdsourcing platform to which Human Intelligence Tasks (HITs) can be submitted. Phrasing in our HIT is important in order to elicit serious judgments:

● “Imagine that you download the three items in the list and that you view them.”
● Don’t force workers to make a contrast, and
● Explain the definition of functional similarity.

Example items shown in the HIT:

  Harry Potter and the Sorcerers Stone Audio Book (478 MB)
  Harry Potter and the Sorcerer s Stone (2001)(ENG GER NL) 2Lions- (4.36 GB)
  Harry Potter.And.The.Sorcerer.Stone.DVDR.NTSC.SKJACK.Universal.S (4.46 GB)

Answer options:

o The items are comparable. They are for all practical purposes the same. Someone would never really need all three of these.
o Each item can be considered unique. I can imagine that someone might really want to download all three of these items.
o One item is not like the other two. (Please mark that item in the list.) The other two items are comparable.

Experiments

● Dataset:
  ● Popular file-sharing site: The Pirate Bay (thepiratebay.se).
  ● 75 queries derived from Top 100 list.
  ● 32,773 filenames and metadata.
  ● 1000 random triads sampled from search results.
● Crowdsourcing Experiment:
  ● Recruitment HIT and Main HIT run concurrently on AMT.
  ● 8 out of 14 qualified workers produced free-text judgments for 308 triads within 36 hours.
● Card Sort:
  ● Group similar judgments into piles, merge piles iteratively, and finally label each pile.
  ● End result: 44 user-perceived dimensions of similarity discovered.

Conclusion

● Wealth of user-perceived dimensions of similarity discovered.
● Quick results due to interesting crowdsourcing task.

Contact: R.Vliegendhart@tudelft.nl, @ShinNoNoir
ICT.OPEN 2012, Rotterdam, The Netherlands, 2012
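
The dataset step of sampling random triads from search results can be sketched as follows. This is only a minimal illustration, not the authors' procedure: the poster does not say how triads were drawn, so choosing a query uniformly at random, the function name sample_triads, and the results_by_query mapping are all assumptions made here.

```python
import math
import random


def sample_triads(results_by_query, n_triads, seed=0):
    """Draw n_triads distinct triads of filenames, each from one query's results.

    Hypothetical helper: results_by_query maps a query string to the list of
    filenames returned for it. Uniform choice of query is an assumption.
    """
    rng = random.Random(seed)

    # Only queries with at least three distinct results can yield a triad.
    eligible = {}
    for query, names in results_by_query.items():
        unique = sorted(set(names))
        if len(unique) >= 3:
            eligible[query] = unique

    # Guard against asking for more distinct triads than exist.
    capacity = sum(math.comb(len(names), 3) for names in eligible.values())
    if n_triads > capacity:
        raise ValueError(f"only {capacity} distinct triads are available")

    queries = sorted(eligible)
    seen = set()
    triads = []
    while len(triads) < n_triads:
        query = rng.choice(queries)
        # rng.sample picks three different filenames without replacement.
        triad = tuple(sorted(rng.sample(eligible[query], 3)))
        if (query, triad) not in seen:
            seen.add((query, triad))
            triads.append((query, triad))
    return triads
```

With a mapping of the 75 queries to their result lists, sample_triads(results_by_query, 1000) would return 1000 (query, triad) pairs, each of which could be shown to a worker as one odd-one-out task.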