Query/Task Satisfaction and Grid-based Evaluation Metrics Under Different Image Search Intents (SIGIR 2020)
Kosetsu Tsukuda and Masataka Goto
National Institute of Advanced Industrial Science and Technology (AIST)
Japan
ACM SIGIR 2020
Search intent in image search
People use web image search with various search intents:
from serious demands for study or work to just passing time
Example (Learn): query "Jupiter". Intent: I want to learn what Jupiter looks like.
Example (Entertain): query "Tom Cruise". Intent: I want to see photos of my favorite actor Tom Cruise.
Research goal
Investigate the influence of user’s intent on
query/task satisfaction and grid-based evaluation metrics
[Diagram: Intent, Query/task satisfaction, Grid-based evaluation metrics]
Query/task satisfaction
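[Figure: under the intent "I want to learn about Jupiter," the user submits the queries "Jupiter," "Jupiter's satellite," and "Jupiter europa" along a time axis t; query satisfaction (Query sat.) is attached to each query and task satisfaction (Task sat.) to the whole task]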
Under a search intent, a user addresses a specific image search task
A task consists of one or more queries submitted by the user
The user gains satisfaction for each query and the task
Relationship between intents, query satisfaction, and task satisfaction
Task difficulty would vary according to the user's search intent
Query satisfaction would influence the task satisfaction
[Figure: task satisfaction (Task sat.) compared for Learn and Entertain]
RQ1
What are the characteristics of the query satisfaction
and the task satisfaction and what is the relationship
between them under different image search intents?
Answering this question enables us to
understand user behavior at a deeper level
reveal an appropriate approach to support the users according to their intent
Publicly available dataset
Dataset developed through a field study [Wu et al. WSDM’19]
29 users, 447 tasks, and 1,758 queries
Each user provided 5-level satisfaction feedback for each query and each task
Assessors annotated each task with one intent from each of four taxonomies
Taxonomy 1: Locate, Learn, Entertain
Taxonomy 2: Work&Study, Daily-life
Taxonomy 3: Specific, General
Taxonomy 4: Mental Image, Navigation
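The dataset structure described above can be sketched as a small data model. This is only an illustration: the class and field names (`Query`, `Task`, `Taxonomy1` to `Taxonomy4`, `user_id`) are hypothetical and do not reflect the actual schema of the dataset by Wu et al., and the 1 to 5 range for the 5-level satisfaction scale is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Taxonomy1(Enum):
    LOCATE = "Locate"
    LEARN = "Learn"
    ENTERTAIN = "Entertain"


class Taxonomy2(Enum):
    WORK_STUDY = "Work&Study"
    DAILY_LIFE = "Daily-life"


class Taxonomy3(Enum):
    SPECIFIC = "Specific"
    GENERAL = "General"


class Taxonomy4(Enum):
    MENTAL_IMAGE = "Mental Image"
    NAVIGATION = "Navigation"


@dataclass
class Query:
    text: str
    satisfaction: int  # 5-level user feedback; the 1..5 range is an assumption


@dataclass
class Task:
    user_id: str       # hypothetical identifier
    satisfaction: int  # 5-level user feedback for the whole task
    intent1: Taxonomy1  # one annotated intent per taxonomy
    intent2: Taxonomy2
    intent3: Taxonomy3
    intent4: Taxonomy4
    queries: List[Query] = field(default_factory=list)


# Made-up example values, loosely following the "Jupiter" task on the slides.
example_task = Task(
    user_id="u01",
    satisfaction=4,
    intent1=Taxonomy1.LEARN,
    intent2=Taxonomy2.WORK_STUDY,
    intent3=Taxonomy3.GENERAL,
    intent4=Taxonomy4.NAVIGATION,
    queries=[
        Query("Jupiter", 3),
        Query("Jupiter's satellite", 4),
        Query("Jupiter europa", 5),
    ],
)
```

Keeping one field per taxonomy mirrors the annotation scheme, in which every task carries exactly one intent from each of the four taxonomies.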
Analysis
Number of unsatisfied queries before the first satisfied query
Influence of query satisfaction on task satisfaction
[Figure: example task with 2 unsatisfied queries before the first satisfied query]
[Figure: task satisfaction (Task sat.) plotted against the average (Avg) and maximum (Max) of query satisfaction; panel shown: Learn]
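The two quantities analyzed on this slide could be computed per task roughly as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the threshold for calling a query "satisfied" (a rating of 4 or higher on the 5-level scale) is an assumption, since the slides do not state it.

```python
from statistics import mean
from typing import Dict, Optional, Sequence

# Assumption: a query counts as "satisfied" when its rating is >= 4.
SATISFIED_THRESHOLD = 4


def unsatisfied_before_first_satisfied(
    query_sats: Sequence[int], threshold: int = SATISFIED_THRESHOLD
) -> Optional[int]:
    """Count the unsatisfied queries preceding the first satisfied one.

    Returns None when the task contains no satisfied query at all.
    """
    for count, sat in enumerate(query_sats):
        if sat >= threshold:
            return count
    return None


def query_sat_summary(query_sats: Sequence[int]) -> Dict[str, float]:
    """Average and maximum query satisfaction within one task."""
    return {"avg": mean(query_sats), "max": max(query_sats)}


# Made-up ratings for one task, in submission order.
ratings = [2, 3, 5, 4]
first_hit = unsatisfied_before_first_satisfied(ratings)
summary = query_sat_summary(ratings)
```

The per-task averages and maxima can then be compared against task satisfaction for each intent.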
Take-home messages for RQ1
Users who have more demanding intents (Learn and Work&Study)
tend to have low query/task satisfaction
Because such users struggle to get satisfying results with the first query
in a task, helping them to formulate their first query in a task is one
possible way to increase their satisfaction
For users who want to learn something and look for general information
rather than specific information, submitting many satisfied queries contributes
to increasing task satisfaction
Therefore, it is beneficial to support such users' search process
even after they have found a desired image
Grid-based evaluation metric
Xie et al. proposed a grid-based evaluation metric for image search [TheWebConf’19]
The metric considers “middle bias,” which indicates that users tend to pay
more attention to images in the middle horizontal position on the SERP
The metric is implemented by expanding an evaluation metric for
general web search such as RBP (Rank-Biased Precision)
[Figure: middle bias]
RBP:
M = \sum_{i=0}^{\infty} \prod_{j=0}^{i-1} C_j \, (1 - C_i) \sum_{j=0}^{i} R_j
RBP-MB (RBP + middle bias (MB)):
M_{MB} = \sum_{i=0}^{\infty} \prod_{j=0}^{i-1} f(c(i)) \, C_j \, (1 - C_i) \sum_{j=0}^{i} R_j
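The two sums can be evaluated with a short loop. The sketch below rests on several assumptions that the slides do not state: C_j is read as a constant continuation probability (as in RBP), R_j as the relevance of the image at position j, the sum is truncated at the end of the result list, and f(c(i)) is applied as a per-position weight on term i. The linear `middle_weight` function and the fixed number of columns per row are purely hypothetical stand-ins for the middle-bias function of Xie et al.

```python
from typing import Callable, Sequence


def user_model_metric(
    relevances: Sequence[float],
    continue_prob: float,
    position_weight: Callable[[int], float] = lambda i: 1.0,
) -> float:
    """Sum over i of w(i) * (prod_{j<i} C) * (1 - C) * (sum_{j<=i} R_j).

    The sum is truncated at the end of `relevances`; C is constant.
    """
    total = 0.0
    reach_prob = 1.0       # prod_{j<i} C_j: probability of reaching position i
    cumulative_gain = 0.0  # sum_{j<=i} R_j
    for i, rel in enumerate(relevances):
        cumulative_gain += rel
        stop_here = reach_prob * (1.0 - continue_prob)
        total += position_weight(i) * stop_here * cumulative_gain
        reach_prob *= continue_prob
    return total


def middle_weight(columns: int) -> Callable[[int], float]:
    """Hypothetical f(c(i)): 1.0 in the middle column, 0.5 at the edges.

    Assumes a row-major grid with a fixed number of columns per row.
    """
    center = (columns - 1) / 2.0

    def weight(i: int) -> float:
        col = i % columns
        return 1.0 - 0.5 * abs(col - center) / max(center, 1.0)

    return weight


# Made-up binary relevances for a single row of three images.
rels = [1.0, 0.0, 1.0]
plain = user_model_metric(rels, continue_prob=0.5)
with_mb = user_model_metric(rels, continue_prob=0.5, position_weight=middle_weight(3))
```

With the default weight of 1.0 the function reduces to the first formula, so both variants share one code path and differ only in the weighting function passed in.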
RQ2
How do image search intents affect the
performance of the grid-based evaluation metric?
Answering this question is beneficial for improving the evaluation metric
Analysis
The dataset includes relevance scores for each pair of a query and an image
We can compute RBP/RBP-MB for each query
We compute Pearson's correlation between a metric and query satisfaction
A good evaluation metric is highly correlated with query satisfaction
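[Figure: example grid of relevance scores (49 92 88 37 / 66 71 90 60 / 56 73 81 91) shown with RBP and RBP-MB and the values 0.324 and 0.479]
[Figure: query satisfaction (Query sat.) plotted against RBP/RBP-MB; panel shown: Learn]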
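The correlation step above can be sketched in plain Python. The function name `pearson_r` and the sample values are illustrative only; the numbers are made up and are not taken from the dataset.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up per-query metric scores and 5-level satisfaction ratings.
metric_scores = [0.2, 0.4, 0.5, 0.9]
query_sats = [2, 3, 3, 5]
r = pearson_r(metric_scores, query_sats)
```

Computing `r` separately for the queries of each intent, once with RBP scores and once with RBP-MB scores, gives one pair of correlations per intent.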
Take-home messages for RQ2
When users want to learn something or find images for daily life, or when
users know what the image content looks like before submitting a query,
it is effective to incorporate the middle bias behavior into the evaluation metric
For other intents, there is still room for improvement in the evaluation metric,
for example by developing intent-aware metrics.
Intent         RBP     RBP-MB
Locate         0.304   0.304
Learn          0.401   0.429*
Entertain      0.360   0.379
Work&Study     0.299   0.299
Daily-life     0.334   0.372*
Specific       0.389   0.399
General        0.272   0.286
Mental Image   0.377   0.411*
Navigation     0.314   0.319
Pearson's correlation between evaluation metrics and query satisfaction (*: p < 0.01)