We describe the development of a test collection for investigating speech retrieval beyond the identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from the video sharing platform blip.tv, and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach is used to identify segments in the video data which contain speech acts, to create a description of the video containing each act, and to generate search queries designed to refind that speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.