Struggling and Success in Web Search @ CIKM2015

Slides for our CIKM2015 paper that received the best student paper award. This paper is the result of my internship at Microsoft Research in Redmond. In this joint work with Ryen White, Ahmed Hassan Awadallah and Susan Dumais, we investigate why some web searchers succeed where others struggle.
PDF: http://hdl.handle.net/11245/2.163418

Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle is important in improving search systems. We address this important issue using a mixed methods study using large-scale logs, crowd-sourced labeling, and predictive modeling. We analyze anonymized search logs from the Microsoft Bing Web search engine to characterize aspects of struggling searches and better explain the relationship between struggling and search success. To broaden our understanding of the struggling process beyond the behavioral signals in log data, we develop and utilize a crowd-sourced labeling methodology. We collect third-party judgments about why searchers appear to struggle and, if appropriate, where in the search task it became clear to the judges that searches would succeed (i.e., the pivotal query). We use our findings to propose ways in which systems can help searchers reduce struggling. Key components of such support are algorithms that accurately predict the nature of future actions and their anticipated impact on search outcomes. Our findings have implications for the design of search systems that help searchers struggle less and succeed more.

Daan Odijk, Ryen W. White, Ahmed Hassan Awadallah and Susan T. Dumais. Struggling and Success in Web Search. In CIKM ’15, 2015.

BibTeX: @inproceedings{odijk2015struggling,
Author = {Odijk, Daan and White, Ryen W. and Hassan Awadallah, Ahmed and Dumais, Susan T.},
Booktitle = {CIKM 2015: 24th ACM International Conference on Information and Knowledge Management},
Month = {October},
Publisher = {ACM},
Title = {Struggling and Success in Web Search},
Year = {2015}}

Published in: Science

Struggling and Success in Web Search @ CIKM2015

  1. 1. Struggling and Success in Web Search. Daan Odijk, University of Amsterdam, The Netherlands; Ryen W. White, Ahmed Hassan Awadallah & Susan T. Dumais,
 Microsoft Research, Redmond, WA, USA
  2. 2. Motivation • Long web search sessions are common • Half of web search sessions contain multiple queries [Shokouhi et al., SIGIR’13] • 40% of sessions take 3+ minutes [Dumais, NSF'13] • Account for most of search time • What are searchers doing? Struggling or Exploring.
 [Hassan et al., WSDM’14] • Struggling in 60% of long sessions, often successful
  3. 3. Motivation, continued, with a paper excerpt shown on the slide • Sessions were filtered to long, topically coherent ones: after removing navigational queries and segmenting the logs, sessions with fewer than three unique queries were excluded • External judges labeled 3000 sampled sessions (17,117 queries, 13,168 distinct queries and 13,780 result clicks) by session type and by success: Successful, Partially Successful or Unsuccessful • Struggling characterizes the search process, success characterizes its outcome, so a searcher can struggle and still succeed, or fail without struggling • Figure 2. Session type distribution as labeled by judges: Exploring 40%, Exploring with Struggle 23%, Struggling 36%, Cannot Judge 1% • Figure 3. Session success distribution as labeled by judges: most exploring sessions are successful (more than 75%) or partially successful (more than 20%)
  4. 4. Example of Struggling: logged session from June 2014 • 9:13:11 AM Query: us open • 9:13:24 AM Query: us open golf • 9:13:36 AM Query: us open golf 2013 live • 9:13:59 AM Query: watch us open live streaming • 9:14:02 AM Click: http://inquisitr.com/1300340/watch-2014-u-s-open-live-online-final-round-free-streaming-video • 9:31:55 AM END
  5. 5. Example of Struggling: the same logged session from June 2014, with the Pivotal Query marked
  6. 6. Example of Struggling: the same logged session from June 2014, with the Pivotal Query marked • How & why do searchers struggle? • How do searchers go from struggle to success? • Can we help them struggle less?
  7. 7. Outline: Characterizing Struggling • Annotating Query Transitions • Predicting Future Actions. Struggling searchers behave differently given different outcomes on many search aspects, including: queries, reformulations, clicks, dwell time & topic.
  8. 8. Identifying Struggling Sessions (logs from June 1–7, 2014) • 1. Filter sessions: US & English, start with a typed query • 2. Segment into tasks: topically coherent sub-sessions with at least 3 queries • 3. Filter struggling tasks [Hassan et al., WSDM’14] • 4. Partition based on final clicks, a commonly used proxy for success: 2,937,450 Successful Tasks (dwell time > 30s or end of task) and 4,508,821 Unsuccessful Tasks (no click or dwell time < 10s)
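The partitioning on final clicks from the Identifying Struggling Sessions slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `FinalClick` record and `label_outcome` function are hypothetical names, and leaving dwell times between 10s and 30s unlabeled is an assumption, since the slide only states the two thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FinalClick:
    """Hypothetical record of the click on a task's final query."""
    dwell_s: Optional[float]  # None when the click is the last action of the task


def label_outcome(final_click: Optional[FinalClick]) -> Optional[str]:
    """Label a struggling task from the click on its final query."""
    if final_click is None:
        return "unsuccessful"      # no click on the final query
    if final_click.dwell_s is None:
        return "successful"        # the click ends the task
    if final_click.dwell_s < 10:
        return "unsuccessful"      # dwell time < 10s
    if final_click.dwell_s > 30:
        return "successful"        # dwell time > 30s
    return None                    # 10s to 30s: left unlabeled (assumption)
```

Tasks for which `label_outcome` returns `None` would simply be left out of both partitions in this sketch.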
  9. 9. Query Grouping: First Query • All intermediate queries • Last Query
  10. 10. Queries become longer • Length of first query, short: 3.35 vs 3.23 terms • Length of intermediate queries, longer: 4.29 vs 4.04 terms • Length of last query, larger difference: 4.29 vs 3.93 terms • [Figures: distribution of query length (1 to 8+ terms) and of query percentile for the first, intermediate and last query, Successful vs Unsuccessful]
  11. 11. Query Reformulation [Hassan, EMNLP’13] • New: shares no terms with previous queries • Back: exact repeat of a previous query in the task • Spelling: changed the spelling, e.g. fixing a typo • Substitution: replaced a single term with another term • Generalization: removed a term from the query • Specialization: added a term to the query • [Figure: share of each reformulation type (0% to 50%) from the first query, for intermediate queries and to the final query; Successful (Yes/No), compared to average (Above/Below)]
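The six reformulation types defined on the Query Reformulation slide can be approximated with simple term-set comparisons. The sketch below is only an illustration of those definitions, not the classifier from [Hassan, EMNLP’13]: the function name, the `"other"` fallback, the order in which the rules are checked, and the 0.8 string-similarity threshold for spelling changes are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def classify_reformulation(previous: List[str], query: str) -> str:
    """Classify `query` relative to the earlier queries of the same task."""
    if not previous:
        raise ValueError("need at least one previous query")
    previous = [q.lower().strip() for q in previous]
    query = query.lower().strip()
    if query in previous:
        return "back"                      # exact repeat of a previous query

    terms = set(query.split())
    last = set(previous[-1].split())
    added, removed = terms - last, last - terms

    swapped_one = len(added) == 1 and len(removed) == 1
    if swapped_one:
        old, new = next(iter(removed)), next(iter(added))
        # Assumed threshold: very similar strings count as a spelling change.
        if SequenceMatcher(None, old, new).ratio() >= 0.8:
            return "spelling"

    seen = set().union(*(set(q.split()) for q in previous))
    if not terms & seen:
        return "new"                       # shares no terms with previous queries
    if swapped_one:
        return "substitution"              # replaced a single term
    if added and not removed:
        return "specialization"            # added a term
    if removed and not added:
        return "generalization"            # removed a term
    return "other"                         # mixed change, outside the six types
```

For the example session on slide 4, `classify_reformulation(["us open"], "us open golf")` yields `"specialization"`.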
  13. 13. Outline: Characterizing Struggling • Annotating Query Transitions • Predicting Future Actions. Struggling searchers in successful vs unsuccessful tasks issue fewer and shorter queries. Query reformulations indicate more trouble choosing correct vocabulary.
  14. 14. Outline: Characterizing Struggling • Annotating Query Transitions • Predicting Future Actions. Crowd-sourcing to better understand: connection between struggling and success & query transitions. First an exploratory pilot, then main annotations.
  15. 15. Exploratory Pilot: 175 task pairs with identical initial queries. Annotating pairs of tasks • Validate partitioning on success based on final clicks • Understanding struggling based on open-ended questions: • Could you describe how the user in the more successful task eventually succeeded in becoming successful? • Why was the user in the less successful task struggling to find the information they were looking for? • What might you do differently if you were the user in the less successful task?
  16. 16. Exploratory Pilot: 175 task pairs with identical initial queries • For 35 initial queries for which 40%-60% of tasks were successful: • Five successful-unsuccessful task pairs for each query • 175 task pairs annotated • [Diagram: First Query, Second Query, …, Last Query; 2,937,450 Successful, 4,508,821 Unsuccessful]
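The pilot sample described on the slide above (35 initial queries with a 40%-60% success rate, five successful-unsuccessful pairs each) could be drawn roughly as follows. The slides do not say how queries or tasks were chosen among the eligible ones, so the uniform random sampling, the `(task_id, initial_query, successful)` tuple layout and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_task_pairs(tasks, n_queries=35, pairs_per_query=5, seed=0):
    """Sample successful-unsuccessful task pairs that share an initial query.

    `tasks` is an iterable of (task_id, initial_query, successful) tuples.
    """
    rng = random.Random(seed)
    by_query = defaultdict(lambda: {True: [], False: []})
    for task_id, initial_query, successful in tasks:
        by_query[initial_query][bool(successful)].append(task_id)

    # Keep initial queries whose tasks are successful 40% to 60% of the time
    # and that have enough tasks on both sides to form the pairs.
    eligible = []
    for query, groups in by_query.items():
        n_ok, n_fail = len(groups[True]), len(groups[False])
        rate = n_ok / (n_ok + n_fail)
        if 0.4 <= rate <= 0.6 and min(n_ok, n_fail) >= pairs_per_query:
            eligible.append(query)

    pairs = []
    for query in rng.sample(sorted(eligible), min(n_queries, len(eligible))):
        ok = rng.sample(by_query[query][True], pairs_per_query)
        fail = rng.sample(by_query[query][False], pairs_per_query)
        pairs.extend((query, s, u) for s, u in zip(ok, fail))
    return pairs
```

With 35 eligible queries and five pairs per query this returns the 175 pairs mentioned on the slide.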
  17. 17. Exploratory Pilot: Which task was more successful? • Agreement between three judges: 142 (vs 33) • Picked Successful: 97 (other bars: 9 and 36) • Last query: reasonable proxy for success
  18. 18. Intent-based query reformulation taxonomy (based on the answers to the open-ended questions) • Added, removed or substituted: an action (e.g., download, contact); an attribute (e.g., printable, free) • Specified: a particular instance (e.g., added a brand name or version number) • Rephrased: corrected a spelling error or typo; used a synonym or related term • Switched: to a related task (changed main focus); to a new task
  19. 19. Main Annotations: Transitions in 659 successful tasks • Annotated 659 new successful tasks • Single tasks, no pairs • Annotated transitions between each query pair • [Diagram: First Query, Second Query, …, Last Query; 2,937,450 Successful Tasks]
  20. 20. Success & Pivotal Query • Task Success (legend: not at all, somewhat, mostly, completely): labeled segments 19%, 40%, 39% • Made at least some progress: 62% • [Figure: Location of the pivotal query, number of tasks (0 to 200) per query position Q1 to Q7, split into Continued and Final Query]
  21. 21. Query Transitions: number of annotated transitions per type (bars split into from first query, others, to final query) • Added Attribute 465 • Specified Instance 374 • Substituted Attribute 203 • Switched to Related 179 • Removed Attribute 99 • Rephrased w/Synonym 91 • Added Action 80 • Switched to New 46 • Substituted Action 41 • Corrected Typo 34 • Removed Action 18
  25. 25. Query Transitions: transition counts as on slide 21, plus task success and share of pivotal queries per transition type (legend: not at all, somewhat, mostly, completely | pivotal; values below are the three labeled success segments, then the pivotal share, in extraction order) • Removed Action 25%, 50%, 25% | 11% • Corrected Typo 21%, 53%, 26% | 6% • Substituted Action 19%, 50%, 27% | 15% • Switched to New 46%, 29%, 25% | 13% • Rephrased w/Synonym 25%, 36%, 36% | 13% • Added Action 19%, 41%, 39% | 8% • Removed Attribute 12%, 54%, 34% | 12% • Switched to Related 22%, 40%, 33% | 10% • Substituted Attribute 16%, 47%, 34% | 6% • Specified Instance 30%, 32%, 36% | 12% • Added Attribute 14%, 45%, 40% | 11%
  28. 28. Outline: Characterizing Struggling • Annotating Query Transitions • Predicting Future Actions. Substantial differences in how searchers refine queries in different stages in a struggling task. Strong connections with task outcomes, and particular pivotal queries. Struggling searchers, successful or not, behave differently on many search aspects, including: queries, reformulations, clicks, dwell time & topic.
  29. 29. Predicting Reformulation: What query reformulation strategy will a user employ? • [Figure: query transition counts per type, as on slide 21] • [Diagram: First Query, Second Query; Tailor query suggestions, Tailor auto-completions, Re-rank results or suggestions]
  30. 30. Feature Sets: Query Features • Interaction Features • Transition Features • +History Features • [Diagram: First Query, Second Query; Tailor query suggestions, Tailor auto-completions, Re-rank results or suggestions]
  31. 31. Prediction Results • [Figure: F1 (0% to 60%) for predicting the Q1 to Q2 and the Q2 to Q3 reformulation, comparing Majority, Baseline Q1, +Interaction Q2, Baseline Q2 and +Interaction Q3; labeled values: 53%, 48%, 46%, 39%, 34%, 30%, 20%]
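The Prediction Results slide reports F1 against a majority baseline. As a rough illustration of what such a comparison involves, the sketch below implements a majority-class predictor and a support-weighted F1 in plain Python. The slides do not state how F1 was averaged over reformulation types, so the weighted average is an assumption, and the function names and labels are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), or 0.0 when there are no true positives."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores[label] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true frequency."""
    per_class = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(per_class[c] * n for c, n in support.items()) / len(y_true)


def majority_baseline(train_labels, n):
    """Always predict the most frequent reformulation type seen in training."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n
```

A trained classifier would be scored with the same `weighted_f1` call, so that its result is directly comparable to the baseline's.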
  32. 32. Outline: Characterizing Struggling • Annotating Query Transitions • Predicting Future Actions. Accurately predicting reformulation strategy. Provide situation-specific support at a higher level. Substantial differences in how searchers behave in a struggling task, depending on task outcomes.
  33. 33. Limitations • Very specific and apparent type of struggling • Determination of search success • Final click as a proxy • Judgments by third parties, not by searchers themselves
  34. 34. Future Work • Directly applying reformulation strategies • Query suggestions & auto-completions • Mining query ➣ pivotal query pairs • Can identify automatically: F1 59% 
 (final query baseline: 51%) • Hints and tips on reformulation strategies
