Query/Task Satisfaction and Grid-based Evaluation Metrics Under Different Image Search Intents (SIGIR 2020)

  1. Query/Task Satisfaction and Grid-based Evaluation Metrics Under Different Image Search Intents. Kosetsu Tsukuda and Masataka Goto, National Institute of Advanced Industrial Science and Technology (AIST), Japan. ACM SIGIR 2020.
  2. Search intent in image search. People use web image search with various search intents, from serious demands for study or work to just passing time. Example (Learn): query "Jupiter", intent "I want to learn what Jupiter looks like." Example (Entertain): query "Tom Cruise", intent "I want to see photos of my favorite actor Tom Cruise."
  3. Research goal: investigate the influence of the user's intent on query/task satisfaction and on grid-based evaluation metrics.
  4. Query/task satisfaction
  5. Query/task satisfaction. Under a search intent, a user addresses a specific image search task. A task consists of one or more queries submitted by the user. The user gains satisfaction for each query and for the task. Example: with the intent "I want to learn about Jupiter," the user submits the queries "Jupiter", "Jupiter's satellite", and "Jupiter europa" over time; each query has a query satisfaction, and the whole task has a task satisfaction.
  6. Relationship between intents, query satisfaction, and task satisfaction. Task difficulty would vary according to the user's search intent. Query satisfaction would influence task satisfaction. [Figure: task satisfaction under the Learn and Entertain intents.]
  7. RQ1: What are the characteristics of query satisfaction and task satisfaction, and what is the relationship between them, under different image search intents? Answering this question enables us to understand user behavior at a deeper level and to reveal an appropriate approach to supporting users according to their intent.
  8. Publicly available dataset. The dataset was developed through a field study [Wu et al. WSDM'19]. It contains 29 users, 447 tasks, and 1,758 queries. Users provided 5-level satisfaction feedback for each query and each task. Assessors annotated each task with one intent from each of four taxonomies. Taxonomy 1: Locate, Learn, Entertain. Taxonomy 2: Work&Study, Daily-life. Taxonomy 3: Specific, General. Taxonomy 4: Mental Image, Navigation.
  9. Analysis. (1) Number of unsatisfied queries before the first satisfied query. (2) Influence of query satisfaction on task satisfaction, using the average and the maximum of query satisfaction within a task. [Figure: an example task with 2 unsatisfied queries before the first satisfied query; task satisfaction plotted against the average/maximum of query satisfaction for the Learn intent.]
  10. Take-home messages for RQ1. Users who have more demanding intents (Learn and Work&Study) tend to have low query/task satisfaction. Because such users struggle to get satisfying results from the first query in a task, helping them to formulate that first query is one possible way to increase their satisfaction. For users who want to learn something and look for general rather than specific information, submitting many satisfied queries contributes to increasing task satisfaction. Therefore, it is beneficial to support such users' search process even after they have found a desired image.
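
The two RQ1 analyses from slide 9 can be sketched per task as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the cutoff at which a 5-level rating counts as "satisfied" (`threshold=4`) is an assumption, since the slides do not state it.

```python
from statistics import mean


def unsatisfied_before_first_satisfied(query_sats, threshold=4):
    """Count the queries that precede the first satisfied query in a task.

    query_sats: 5-level satisfaction ratings, in submission order.
    threshold:  ASSUMED cutoff for "satisfied"; not given in the slides.
    Returns None when no query in the task reaches the threshold.
    """
    for count, sat in enumerate(query_sats):
        if sat >= threshold:
            return count
    return None


def summarize_task(query_sats):
    """Average and maximum query satisfaction within one task."""
    return {"avg": mean(query_sats), "max": max(query_sats)}


# Made-up ratings for one task with four queries.
task = [2, 3, 5, 4]
print(unsatisfied_before_first_satisfied(task))  # 2 queries before the first satisfied one
print(summarize_task(task))
```

The per-task average and maximum can then be compared against the task satisfaction rating, grouped by intent.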
  11. Grid-based evaluation metric
  12. Grid-based evaluation metric. Xie et al. proposed a grid-based evaluation metric for image search [TheWebConf'19]. The metric considers "middle bias," which indicates that users tend to pay more attention to images in the middle horizontal position on the SERP. The metric is implemented by extending an evaluation metric for general web search, such as RBP (Rank-Biased Precision), with a middle bias (MB) term, giving RBP-MB.
      Base metric (e.g., RBP):
      $$M = \sum_{i=0}^{\infty} \prod_{j=0}^{i-1} C_j \, (1 - C_i) \sum_{j=0}^{i} R_j$$
      With middle bias (e.g., RBP-MB):
      $$M_{\mathrm{MB}} = \sum_{i=0}^{\infty} \prod_{j=0}^{i-1} f(c_i) \, C_j \, (1 - C_i) \sum_{j=0}^{i} R_j$$
  13. RQ2: How do image search intents affect the performance of the grid-based evaluation metric? Answering this question is beneficial for improving the evaluation metric.
  14. Analysis. The dataset includes a relevance score for each query-image pair, so we can compute RBP and RBP-MB for each query. We compute Pearson's correlation between each metric and query satisfaction. A good evaluation metric is highly correlated with query satisfaction. [Figure: example SERP for "Jupiter" with per-image relevance scores; RBP and RBP-MB plotted against query satisfaction for the Learn intent, with the values 0.324 (RBP) and 0.479 (RBP-MB) shown.]
  15. Take-home messages for RQ2. When users want to learn something or find images for daily life, or when users know what the image content looks like before submitting a query, it is effective to incorporate the middle bias behavior into the evaluation metric. For other intents, there is still room for improvement in the evaluation metric, for example by developing intent-aware metrics.
      Pearson's correlation between evaluation metrics and query satisfaction (*: p < 0.01):
      Intent         RBP     RBP-MB
      Locate         0.304   0.304
      Learn          0.401   0.429*
      Entertain      0.360   0.379
      Work&Study     0.299   0.299
      Daily-life     0.334   0.372*
      Specific       0.389   0.399
      General        0.272   0.286
      Mental Image   0.377   0.411*
      Navigation     0.314   0.319
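
The RQ2 pipeline (score each query with a metric, then correlate the scores with query satisfaction) can be sketched as below. This is a rough illustration under stated assumptions, not the authors' implementation. `expected_gain` and `pearson` are hypothetical names. The metric follows the base formula on slide 12, read as "probability of stopping at position i, times the gain accumulated up to i," and it assumes the user always stops at the last image so that the infinite sum becomes finite. The middle-bias factor f(c_i) is left out because the slides do not define it; any position-dependent adjustment would have to be folded into `cont_probs` by the caller.

```python
import math


def expected_gain(gains, cont_probs):
    """Base metric from slide 12 over a finite ranked list.

    gains:      R_j, the gain (relevance) of the image at position j.
    cont_probs: C_j, the probability of continuing past position j.
                A constant value p gives an RBP-style user model.
    ASSUMPTION: the user stops at the last position (its C_j is ignored).
    """
    if len(gains) != len(cont_probs):
        raise ValueError("gains and cont_probs must have the same length")
    total = 0.0
    reach = 1.0        # prod_{j<i} C_j: probability of reaching position i
    cumulative = 0.0   # sum_{j<=i} R_j: gain accumulated up to position i
    last = len(gains) - 1
    for i, (gain, cont) in enumerate(zip(gains, cont_probs)):
        cumulative += gain
        stop = 1.0 if i == last else 1.0 - cont
        total += reach * stop * cumulative
        reach *= cont
    return total


def pearson(xs, ys):
    """Pearson's correlation coefficient; undefined if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up toy data: three queries, each with three ranked images.
p = 0.5  # assumed persistence parameter
relevance_per_query = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
query_satisfaction = [4, 2, 5]
scores = [expected_gain(rel, [p] * len(rel)) for rel in relevance_per_query]
print(pearson(scores, query_satisfaction))
```

Running the correlation separately on the queries of each intent would give one coefficient per intent, which is the shape of the table on slide 15.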