Learning by example: training users through high-quality query suggestions

A presentation given at UvA in September 2015, discussing joint work with Morgan Harvey and David Elsweiler.
Full paper: http://dl.acm.org/citation.cfm?id=2767731

Published in: Science

  1. 1. Learning by Example: training users through high-quality query suggestions (SIGIR’15) A collaboration with Morgan Harvey & David Elsweiler. Claudia Hauff Web Information Systems
  2. 2. [Chart: DuckDuckGo traffic over time, Sep 2012 to Jan 2016; y-axis from 0 to 350,000,000.] Data available at https://duckduckgo.com/traffic.html NSA collecting phone records of millions of Verizon customers daily. The Guardian. June 6, 2013. Not everyone stays around.
  3. 3. I do care about privacy … until the moment my searches fail me. @flickr:eviloars Can we teach searchers to use an arbitrary search engine as well as possible?
  4. 4. @flickr:practicalowl Advanced retrieval algorithms; queries as a given. Assisting users in creating better queries: query suggestions, related searches, query autocompletion. Personalised & context-driven search. Educate users to become better searchers: complementary to technical solutions; system specific
  5. 5. • Altering the size [Franzen & Karlgren, 2000] and wording [Belkin et al., 2003] of the search box influences the length of submitted queries • Exchanging a complex multi-field catalogue interface for a simple search box radically alters user behaviour [McKay & Buchanan, 2013] • Training users how to construct Boolean logic queries can change search behaviour [Lucas & Topi, 2004] • Allowing users to compare their search behaviour to expert searchers enables them to reflect and change their habits [Bateman et al., 2012] Behaviour change support systems: “… information systems designed to form, alter, or reinforce attitudes or behaviours or both without using coercion or deception” [Oinas-Kukkonen & Harjumaa, 2008]
  6. 6. We created zing
  7. 7. Our questions Are users able to notice differences between good queries and their own? Can they abstract these differences to change their own behaviour? How effectively can users learn and abstract from good queries? Do users who are “trained” perform better than users who did not receive training? @flickr:eviloars
  8. 8. Our hypotheses @flickr:carbonnyc H1: Users can adapt their querying behaviour to pose good queries to an unfamiliar search system. H2: Users are able to identify salient characteristics of good queries. H3: A small number of “training queries” are sufficient. H4: A user who receives training with queries he can relate to, learns better than a user who receives training with less-relatable queries. H5: A user who receives training with queries he can relate to, learns faster than a user who receives training with less-relatable queries.
  9. 9. A collection of user studies Piloting zing User perception of high-quality queries Main study: zing Training size study Generating training queries All studies are based on AQUAINT and the TREC 2005 Robust track topics.
  10. 10. Generating high-quality queries I • Query quality is measured in Average Precision (AP) • The queries should intuitively make sense to humans (instead of relying on quirks in documents) • The queries should not be overly verbose or specific
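
Since query quality is measured in Average Precision, a minimal sketch of the standard non-interpolated AP for a single ranked list may help. The slides do not say which AP variant or rank cutoff was used, so the function name, the document IDs and the absence of a cutoff are assumptions made for illustration only.

```python
from typing import Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked result list.

    `ranking` holds document IDs in rank order; `relevant` holds the IDs
    judged relevant for the topic (the qrels). Both names are hypothetical.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant documents.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    print(average_precision(["d1", "d5", "d2", "d9"], {"d1", "d2", "d7"}))
```

With the toy data, relevant documents appear at ranks 1 and 3, so AP is (1/1 + 2/3) / 3, roughly 0.556.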
  11. 11. Generating high-quality queries II. For each TREC topic: relevant documents in AQUAINT yield 100 single-term queries. Hand-crafted filtering rules avoid unintuitive term selection.
  12. 12. Generating high-quality queries II. For each TREC topic: relevant documents in AQUAINT, then AP-based query ranking, then the top two-term queries. Hand-crafted filtering rules avoid unintuitive term selection.
  13. 13. Generating high-quality queries II. For each TREC topic: relevant documents in AQUAINT, then AP-based query ranking, repeated 3x: top 100 queries up to length 4. Hand-crafted filtering rules avoid unintuitive term selection.
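
The three slides above read like an iterative expansion: start from single-term queries, rank by AP, keep the best 100, add one term, and repeat up to length 4. The following is a rough sketch of that reading, not the authors' implementation. The `score` callable (standing in for running a query against the collection and computing AP), the `keep_term` predicate (standing in for the hand-crafted filtering rules) and all parameter names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List

Query = FrozenSet[str]


def generate_queries(
    candidate_terms: Iterable[str],
    score: Callable[[Query], float],
    keep_term: Callable[[str], bool] = lambda term: True,
    beam_size: int = 100,
    max_length: int = 4,
) -> List[Query]:
    """Beam-style query generation: expand the best queries one term at a time."""
    # Drop duplicate terms and those rejected by the (assumed) filtering rules.
    terms = [t for t in dict.fromkeys(candidate_terms) if keep_term(t)]

    def rank(queries: Iterable[Query]) -> List[Query]:
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(queries, key=lambda q: (-score(q), sorted(q)))[:beam_size]

    # Round 1: single-term queries.
    beam = rank(frozenset([t]) for t in terms)
    # Further rounds: add one term to each surviving query.
    for _ in range(max_length - 1):
        expanded = {q | {t} for q in beam for t in terms if t not in q}
        # Shorter queries stay in the pool, so the result is "up to" max_length.
        beam = rank(set(beam) | expanded)
    return beam


if __name__ == "__main__":
    # Toy term weights, not taken from the study.
    weights = {"hubble": 3.0, "universe": 2.0, "galaxies": 1.0, "the": 0.1}
    best = generate_queries(
        weights,
        score=lambda q: sum(weights[t] for t in q),
        keep_term=lambda t: t != "the",
        beam_size=2,
        max_length=2,
    )
    print([sorted(q) for q in best])
```

Keeping shorter queries in the pool is a design choice made here because the slides say "up to length 4"; whether the original pipeline did the same is not stated.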
  14. 14. Generating high-quality queries III. Topic 303: Identify positive accomplishments of the Hubble telescope since it was launched in 1991. Queries: universe astronomer faint hubble; infrared galaxies universe hubble; infrared stars universe hubble. Topic 383: Identify drugs used in the treatment of mental illness. Queries: antidepressant risk zoloft prozac; zoloft studies prozac; antidepressant effective zoloft. Topic 416: What is the status of The Three Gorges Project? Queries: cofferdams damming generating 2009; dam corporation phase 2009; 2009 river construction. Median AP across the 100 generated queries: 0.38
  15. 15. A collection of user studies Piloting User perception of high-quality queries Main study: Training size study Generating training queries
  16. 16. User perception I. Task: You are given an information need and a query suggestion that has been derived for this information need. Rate the suggestion along four dimensions: knowledge, surprise, usage and relevance. Example information need: Identify positive accomplishments of the Hubble telescope since it was launched in 1991. Example suggestion: universe astronomer faint hubble. Top 15 queries per topic. HIT: 10 tasks, 12 cents. 3 workers per task.
  17. 17. User perception II [Three histograms of the number of ratings per rating value (1 to 5): "How surprised were you?" (Not to Very), "Would you use the suggestion?" (No to Yes), "What will the quality of the search results be?" (Low to High).]
  18. 18. User perception II [Three histograms of the number of ratings per rating value (1 to 5): "How surprised were you?" (Not to Very), "Would you use the suggestion?" (No to Yes), "What will the quality of the search results be?" (Low to High).] Indicates that our query generation approach is valid. Many of our suggestions are not very convincing. Expected search result quality is mostly average.
  19. 19. User perception III • Familiar topics tend to be of broad interest • Topics covering specific themes attract low knowledge ratings. Examples: What factors contributed to the growth of consumer on-line shopping? (639): 3.0/5. Identify drugs used in the treatment of mental illness. (383): 2.89/5. What is the status of The Three Gorges Project? (416): 1.58/5.
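
The per-topic scores on this slide look like averages over individual worker ratings. The slides do not state how ratings were aggregated, so the sketch below simply assumes an arithmetic mean per topic for one rating dimension. The tuple layout, the function name and the sample ratings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (topic_id, dimension, rating on a 1-5 scale); a hypothetical record layout.
Rating = Tuple[str, str, int]


def mean_rating_per_topic(ratings: Iterable[Rating], dimension: str) -> Dict[str, float]:
    """Average all worker ratings of one dimension, grouped by topic."""
    by_topic: Dict[str, List[int]] = defaultdict(list)
    for topic_id, dim, value in ratings:
        if dim == dimension:
            by_topic[topic_id].append(value)
    return {topic: mean(values) for topic, values in by_topic.items()}


if __name__ == "__main__":
    # Made-up ratings for illustration; these are not the study's data.
    sample = [
        ("383", "knowledge", 3),
        ("383", "knowledge", 2),
        ("383", "knowledge", 4),
        ("416", "knowledge", 1),
        ("416", "knowledge", 2),
        ("416", "surprise", 5),
    ]
    print(mean_rating_per_topic(sample, "knowledge"))
```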
  20. 20. A collection of user studies Piloting zing User perception of high-quality queries Main study: Training size study Generating training queries
  21. 21. A closer look at zing. How well am I doing? Suggestions (higher AP than user queries) after 2 initial queries. Relevant documents are marked by the system.
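
The suggestion rule on this slide (suggestions with higher AP than the user's queries, shown after 2 initial queries) can be sketched as follows. Comparing against the user's best query so far, and capping the number of suggestions, are assumptions; the slide does not say which user query is the reference or how many suggestions are shown. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pick_suggestions(
    user_query_aps: Sequence[float],
    candidates: Sequence[Tuple[str, float]],
    min_user_queries: int = 2,
    max_suggestions: int = 3,
) -> List[str]:
    """Return pre-generated queries that beat the user's best AP so far.

    `candidates` pairs each generated query with its precomputed AP.
    """
    # No suggestions until the user has tried a few queries on their own.
    if len(user_query_aps) < min_user_queries:
        return []
    best_user_ap = max(user_query_aps)
    better = [(query, ap) for query, ap in candidates if ap > best_user_ap]
    # Show the strongest suggestions first.
    better.sort(key=lambda pair: pair[1], reverse=True)
    return [query for query, _ in better[:max_suggestions]]


if __name__ == "__main__":
    # Toy AP values, not taken from the study.
    pool = [("infrared galaxies universe hubble", 0.5), ("hubble", 0.15), ("universe hubble", 0.3)]
    print(pick_suggestions([0.1, 0.2], pool))
```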
  22. 22. Piloting • N=22 undergraduates • 10 medium difficulty topics • Randomized topic order • Reflection prompts. When does fatigue set in? By topic 7, median AP≈0. Query characteristics (81 reflections encoded): C1: Specific query terms; C2: More general query terms; C3: Queries not in topic description; C4: Unexpected or surprising vocab.; C5: Surprising non-use of vocab.; C6: Terms the user was surprised at the usefulness of; C7: Thinking creatively; C8: Advanced vocabulary (rare); C9: Specialist vocabulary (rare); C10: Good combination of search terms; C11: Synonyms and related concepts; C12: Query requires specialist knowledge. Users are able to identify salient characteristics of good queries.
  23. 23. A collection of user studies Piloting User perception of high-quality queries Main study: zing Training size study Generating training queries
  24. 24. Main study • Between-group design, N=91 • 6 medium difficulty topics • Randomized topic order • Training & test phase. Group Gexp_high: Trained on high-quality suggestions that were also perceived as high quality. Group Gexp_low: Trained on high-quality suggestions that were perceived as low quality. Group Gcontrol: No training at any stage. [Diagram: the experimental groups see topics with suggestions first, followed by topics without suggestions; the control group sees topics without suggestions throughout.]
  25. 25. Main study: query effectiveness [Plots for training topics and test topics.] Users who receive high-quality training suggestions perform better on average & achieve considerably higher max. AP scores.
  26. 26. Main study: query sequence effectiveness [Plot: average precision (0 to 0.4) by query sequence position (1 to 10) for Control, Exp_High and Exp_Low.] Average precision over sequences of queries on test topics. Each point represents the mean AP of all queries submitted as nth query. Gexp_high & Gexp_low significantly outperform Gcontrol. No significant differences observed between Gexp_high & Gexp_low.
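
Each point in the plot described above is the mean AP of all queries submitted as the nth query. A minimal sketch of that aggregation, assuming each search session is stored as a list of AP values in submission order (a hypothetical layout), could look like this:

```python
from typing import Iterable, List, Sequence


def mean_ap_by_position(sessions: Iterable[Sequence[float]]) -> List[float]:
    """Mean AP of the 1st, 2nd, ... nth query across all sessions.

    Sessions may differ in length; each position averages only over the
    sessions that actually reached it.
    """
    sums: List[float] = []
    counts: List[int] = []
    for session in sessions:
        for position, ap in enumerate(session):
            if position == len(sums):
                sums.append(0.0)
                counts.append(0)
            sums[position] += ap
            counts[position] += 1
    return [total / count for total, count in zip(sums, counts)]


if __name__ == "__main__":
    # Toy AP values, not taken from the study.
    print(mean_ap_by_position([[0.1, 0.3], [0.3], [0.2, 0.5, 0.4]]))
```

Running this once per group (control and the two experimental groups) would give one curve per group, under the stated assumptions.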
  27. 27. A collection of user studies Piloting User perception of high-quality queries Main study: zing Training size study Generating training queries
  28. 28. Training size study • Between-group design, N=57 • Analogous setup to Main study [Two plots of average precision (0 to 0.4) by query sequence position (1 to 10) for Control, Exp_High and Exp_Low: one for the main study (4 training & 2 test topics) and one for this study (now: 2 training & 4 test topics).] Less training yields fewer (but still stat. significant) improvements. Similarity between Gexp_high & Gexp_low remains stable.
  29. 29. Looking back at our hypotheses @flickr:carbonnyc H1: Users can adapt their querying behaviour to pose good queries to an unfamiliar search system. H2: Users are able to identify salient characteristics of good queries. H3: A small number of “training queries” are sufficient. H4: A user who receives training with queries he can relate to, learns better than a user who receives training with less-relatable queries. H5: A user who receives training with queries he can relate to, learns faster than a user who receives training with less-relatable queries.
  30. 30. Looking ahead • Learning is limited to a single session • Does the learning effect hold across sessions and over time? • How to translate this approach (requiring qrels) into settings where users are unwilling to train? • Are implicit relevance indicators sufficient? • What is the most efficient manner of presenting such “learning queries” to users? @flickr:
  31. 31. Ideas, comments & suggestions are more than welcome! Thank you. c.hauff@tudelft.nl
