Aggregating search results from a variety of heterogeneous
sources, so-called verticals, such as news, image and video,
into a single interface is a popular paradigm in web search.
Current approaches that evaluate the effectiveness of aggregated
search systems are based on rewarding systems that
return highly relevant verticals for a given query, where this
relevance is assessed under different assumptions. It is difficult
to evaluate or compare those systems without fully
understanding the relationship between those underlying assumptions.
To address this, we present a formal analysis and
a set of extensive user studies to investigate the effects of various
assumptions made for assessing query vertical relevance.
A total of more than 20,000 assessments on 44 search tasks
across 11 verticals are collected through Amazon Mechanical
Turk and subsequently analysed. Our results provide
insights into various aspects of query vertical relevance and
allow us to explain in more depth as well as questioning the
evaluation results published in the literature.
Work with Ke (Adam) Zhou, Ronan Cummins and Joemon Jose.
Presented at WWW 2013, Rio de Janeiro.