Direct Answers for Search Queries in the Long Tail

Published at CHI 2012
  1. Direct Answers for Search Queries in the Long Tail
     Michael Bernstein, Jaime Teevan, Susan Dumais, Dan Liebling, and Eric Horvitz
     MIT CSAIL and Microsoft Research
  2. Answers: Direct Search Results
     Manually constructed for popular queries
     Example: weather boston
  3. Good answers reduce clicks on result pages, and users trigger answers repeatedly once discovered [Chilton + Teevan 2009]
     Example answer queries: the girl with the dragon tattoo, memorial day 2012, AAPL
  4. But answers cover only popular query types, and they are high cost and high maintenance
     Same example queries: the girl with the dragon tattoo, memorial day 2012, AAPL
  5. Prevalence of Uncommon Searches
     No answers for many information needs: molasses substitutes, increase volume windows xp, dissolvable stitches speed, dog body temperature, CHI 2013 deadline, …
  6. Tail Answers
     Direct results for queries in the long tail
     Example: molasses substitutes
  7. Tail Answers
     Direct results for queries in the long tail (example: molasses substitutes)
     Tail Answers improve the search experience for less common queries, and fully compensate for poor results.
  8. The Long Tail of Answers
     [Figure: # occurrences vs. information needs; head needs such as weather and movies already have answers, while Tail Answers target the long tail, e.g., chi 2017 location]
     Hard to find structured information; not enough query volume for dedicated teams
  9. Crowds in Tail Answers
     Crowd data: search logs (75 million Bing search trails)
     Paid crowds: on-demand workers (Extract, Proofread, Title)
  10. Crowds can support the long tail of user goals in interactive systems.
     Crowd data: search logs. Paid crowds: on-demand.
  11. Tail Answers Pipeline
     Find URLs that satisfy fact-finding information needs, then extract answers from those pages
  12. Tail Answers Pipeline
     1) Find candidate information needs  2) Filter candidates  3) Write answers
  13. Tail Answers Pipeline
     1) Identify answer candidates  2) Filter candidates  3) Extract answer content
  14. Identify Answer Candidates
     Crowd data: 75 million search sessions
     All information needs are answer candidates: queries leading to a clickthrough on a single URL
     Example: force quit mac, force quit on macs, how to force quit mac → URL
  15. Abstraction: Search Trails [White, Bilenko, and Cucerzan 2007]
     URL path from a query to a 30-minute session timeout
     Example trail: query → URL1 → URL2 → URL3
  16. Example Answer Candidates
     force quit mac, force quit on macs, how to force quit mac → URL1
     410 area code, area code 410 location → URL2
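
A minimal sketch (not from the talk) of how search trails could be grouped into answer candidates like the ones above. It assumes a log of (user, time, query, url) visit events; the 30-minute timeout and the single-clickthrough grouping follow the slides, while the field names and data shapes are illustrative.

    from collections import defaultdict
    from datetime import timedelta

    SESSION_TIMEOUT = timedelta(minutes=30)  # timeout from the search-trail slide

    def build_trails(events):
        """events: (user, time, query, url) tuples, one per page visit.
        Returns trails as (query, [url, ...]) in visit order."""
        trails, prev_user, prev_time = [], None, None
        for user, time, query, url in sorted(events):
            new_session = (user != prev_user or prev_time is None
                           or time - prev_time > SESSION_TIMEOUT)
            if new_session or query != trails[-1][0]:
                trails.append((query, []))
            trails[-1][1].append(url)
            prev_user, prev_time = user, time
        return trails

    def answer_candidates(trails):
        """A candidate is a destination URL together with every query
        whose trail clicked through to that URL and nothing else."""
        by_url = defaultdict(set)
        for query, urls in trails:
            if len(urls) == 1:          # single clickthrough only
                by_url[urls[0]].add(query)
        return by_url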
  17. Tail Answers Pipeline
     1) Identify answer candidates  2) Filter candidates  3) Extract answer content
  18. Filtering Answer Candidates
     Focus on fact-finding information needs [Kellar 2007]
     Exclude popular but unanswerable candidates, e.g., radio pandora, pandora radio log in → pandora.com
  19. Filtering Answer Candidates
     Three filters remove answer candidates that do not address fact-finding information needs:
     Navigation behavior → pages addressing search needs
     Query behavior → unambiguous needs
     Answer type → succinct answers
  20. Filter by Navigation Behavior
     Destination probability for a URL: P(session length = 2 | URL in trail)
     The probability of ending the session at that URL after clicking through from the search results
     Example: if two of four trails that click through to URL1 end there immediately, its destination probability is 0.5
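
Sketch (mine, not the paper's code) of the destination probability above, computed over the trail representation from the earlier sketch; the 0.5 cutoff in the filter is purely illustrative, since the slides do not give the threshold.

    from collections import Counter

    def destination_probability(trails):
        """P(session length = 2 | URL in trail): of all trails containing a
        URL, the fraction that consist of just the query plus that URL,
        i.e. the searcher clicked through and ended the session there."""
        containing, ended_at = Counter(), Counter()
        for query, urls in trails:
            for url in set(urls):
                containing[url] += 1
            if len(urls) == 1:          # query + one URL = session length 2
                ended_at[urls[0]] += 1
        return {url: ended_at[url] / n for url, n in containing.items()}

    def navigation_filter(candidates, trails, threshold=0.5):
        """Keep only candidate URLs with a high destination probability."""
        prob = destination_probability(trails)
        return {url: queries for url, queries in candidates.items()
                if prob.get(url, 0.0) >= threshold}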
  21. Filter by Navigation Behavior
     Destination Probability Filter: remove URLs that searchers are unlikely to end their session on (lots of back navigations and later clicks)
     Focus on queries where searchers addressed an information need
  22. Filter by Query Behavior
     What answers are these searchers looking for?
     dissolvable stitches (how long they last? what they're made of?)
     732 area code (city and state? count of active phone numbers?)
  23. Filter by Query Behavior
     A minority of searchers use question words: how long dissolvable stitches last, where is 732 area code
     Filter out candidates with fewer than 1% of clickthroughs from question queries
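
Sketch of the query-behavior filter above; the 1% cutoff comes from the slide, while the question-word list and the assumption that each candidate carries per-query clickthrough counts are mine.

    QUESTION_WORDS = {"how", "what", "when", "where", "why", "who", "which"}

    def question_fraction(query_counts):
        """query_counts: {query: clickthrough count} for one candidate URL."""
        total = sum(query_counts.values())
        asked = sum(count for query, count in query_counts.items()
                    if query.split() and query.split()[0] in QUESTION_WORDS)
        return asked / total if total else 0.0

    def query_behavior_filter(candidates, min_fraction=0.01):
        """Drop candidates with fewer than 1% of clickthroughs
        coming from question-word queries."""
        return {url: counts for url, counts in candidates.items()
                if question_fraction(counts) >= min_fraction}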
  24. Filter by Answer Type
     Can a concise answer address this need? Ask paid crowdsourcing workers to select:
     - Short: a phrase or sentence, e.g., “The optimal fish frying temperature is 350°F.”
     - List: a small set of directions or alternatives, e.g., “To change your password over Remote Desktop: 1) Click on Start > Windows Security. 2) Click the Change Password button. [...]”
     - Summary: synthesize a large amount of content, e.g., Impact of Budget Cuts on Teachers
  25. Creating Tail Answers
     1) Identify answer candidates  2) Filter candidates  3) Extract answer content
  26. Extracting the Tail Answer
     We now have answer candidates with factual, succinct responses
     However, the answer is buried in the page
     Example queries: dissolvable stitches, dissolvable stitches how long, dissolvable stitches absorption
  27. Crowdsourcing Workflow
     Reliably extract the relevant answer from the URL via paid crowdsourcing (CrowdFlower)
     Extract → Vote → Proofread → Vote → Title → Vote [Bernstein et al. 2010, Little et al. 2010]
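
A rough sketch of the generate-then-vote structure of the workflow above. ask_workers stands in for whatever crowdsourcing API posts a task and returns one response per worker; it is a placeholder, not CrowdFlower's actual interface, and the task prompts are illustrative.

    def vote(prompt, options, ask_workers, k=5):
        """Ask k workers to pick the best option and return the majority choice."""
        ballots = ask_workers(prompt, options, k)
        return max(set(ballots), key=ballots.count)

    def tail_answer_workflow(page_text, ask_workers):
        """Extract -> Vote -> Proofread -> Vote -> Title -> Vote:
        every generation step is done redundantly by several workers,
        and an independent voting step picks the output that moves on."""
        extracts = ask_workers("Highlight the passage that answers the query.", page_text, 3)
        answer = vote("Which extraction answers the query best?", extracts, ask_workers)
        rewrites = ask_workers("Proofread this passage so it stands alone.", answer, 3)
        answer = vote("Which rewrite reads best?", rewrites, ask_workers)
        titles = ask_workers("Write a short title for this answer.", answer, 3)
        title = vote("Which title fits best?", titles, ask_workers)
        return title, answer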
  28. Quality Challenge: Overgenerating
     Typical extraction length:
     [Figure: placeholder page text with a large portion selected, illustrating how long typical worker extractions are]
  29. Inclusion/Exclusion Gold Standard
     Inclusion/exclusion lists test worker agreement with a few annotated examples: text that must be present, and text that must not be present
     Implementable via a negative look-ahead regex
     Gold standard questions: [Le et al. 2010]
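
A small worked example (not from the paper) of checking an inclusion/exclusion gold standard with a single regular expression, using negative look-ahead as the slide suggests; the phrases are hypothetical.

    import re

    def gold_standard_pattern(must_include, must_exclude):
        """Build one regex that accepts a worker's extraction only if it contains
        every 'include' phrase and none of the 'exclude' phrases. Negative
        look-aheads reject excluded text; positive look-aheads require each
        included phrase to appear somewhere (DOTALL lets '.' span lines)."""
        include = "".join(r"(?=.*" + re.escape(p) + r")" for p in must_include)
        exclude = "".join(r"(?!.*" + re.escape(p) + r")" for p in must_exclude)
        return re.compile(r"\A" + exclude + include + r".*\Z",
                          re.DOTALL | re.IGNORECASE)

    # Hypothetical gold standard for a "dissolvable stitches" extraction:
    pattern = gold_standard_pattern(
        must_include=["absorbed by the body"],
        must_exclude=["click here", "advertisement"],
    )
    assert pattern.search("Dissolvable stitches are absorbed by the body over weeks.")
    assert not pattern.search("Click here to read about dissolvable stitches.")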
  30. Quality Challenge: Overgenerating
     Extraction length with gold standards:
     [Figure: the same placeholder page text with only a short span selected, showing that gold standards rein in overgeneration]
  31. Tail Answers Pipeline
     1) Identify answer candidates  2) Filter candidates  3) Extract answer content
  32. 75 million search trails; 19,167 answer candidates
     Median answer triggered once a day; 44 cents to create each answer
  33. Evaluation: Answer Quality
     Hand-coded for correctness and writing errors (two to three redundant coders)
     83% of Tail Answers had no writing errors
     87% of Tail Answers were completely correct or had only a minor error (e.g., title does not match content)
     False positives in crowd data: dynamic web pages
  34. Field Experiment
     How do Tail Answers impact searchers' subjective impressions of the result page?
     Method: recruit 361 users to issue queries that trigger Tail Answers on a modified version of Bing.
  35. Field Experiment Design
     Within-subjects 2x2 design: Tail Answers vs. no Tail Answers; good ranking vs. bad ranking
     Measurement: 7-point Likert responses: (1) the result page is useful; (2) no need to click through to a result
     Analysis: linear mixed effects model (a generalization of ANOVA)
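
Sketch of the kind of linear mixed-effects analysis the slide describes, using statsmodels on made-up data; the column names, the effect sizes in the synthetic generator, and the random-intercept-per-participant structure are assumptions, not the paper's exact specification.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data: one Likert rating per participant x condition
    # (the real study had 361 participants; the numbers here are made up).
    rng = np.random.default_rng(0)
    rows = []
    for participant in range(40):
        for answer in (0, 1):            # Tail Answer shown?
            for good_rank in (0, 1):     # good vs. bad result ranking
                rating = 3 + 0.3 * answer + 0.7 * good_rank + rng.normal(0, 1)
                rows.append({"participant": participant, "answer": answer,
                             "good_rank": good_rank,
                             "useful": float(np.clip(round(rating), 1, 7))})
    ratings = pd.DataFrame(rows)

    # Fixed effects for answer, ranking, and their interaction;
    # a random intercept per participant for the within-subjects design.
    model = smf.mixedlm("useful ~ answer * good_rank", ratings,
                        groups=ratings["participant"])
    print(model.fit().summary())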
  36. Results: Tail Answers' Usefulness Is Comparable to Good Result Ranking
     Tail Answers main effect: 0.34 points (7-point Likert)
     Ranking main effect: 0.68 points
     Interaction effect: 1.03 points
     [Chart: usefulness ratings for Tail Answer vs. no Tail Answer under good and bad ranking]
     All results significant at p < 0.001
  37. Results: Answers Make Result Clickthroughs Less Necessary
     Tail Answers main effect: 1.01 points (7-point Likert)
     Result ranking main effect: 0.50 points
     Interaction effect: 0.91 points
     [Chart: "no need to click" ratings for Tail Answer vs. no Tail Answer under good and bad ranking]
     All results significant at p < 0.001
  38. Tail Answers impact subjective ratings half as much as good ranking, and fully compensate for poor results.
     …but we need to improve the trigger queries.
  39. Ongoing Challenges
     Spreading incorrect or unverified information
     Cannibalizing pageviews from the original content pages
  40. Extension: A.I.-driven Answers
     Use open information extraction systems to propose answers, and crowds to verify them
     Crowd-authored vs. authored by AI and verified by crowds
  41. Extension: Better Result Snippets
     Improve result pages for popular queries
     Automatically extracted vs. crowd-authored snippets
  42. Extension: Domain-Specific Answers
     Design for specific information needs; crowds structuring new data types
  43. Direct Answers for Search Queries in the Long Tail
     Crowd data can support many uncommon user goals in interactive systems.
