Better Search Through Query Understanding

Presented as a Data Talk at Intuit on April 22, 2014

Search is a fundamental problem of our time — we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.

About the Speaker

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Published in: Technology, Business
2 Comments
  • Thanks for sharing. This is great: a clean and neat presentation. One question, though: did you use an HMM or another specific technique for sequence modelling? I am looking to implement a query tagger to better understand our user queries, and I am wondering how you went about creating a training set. Was it labor-intensive, human-labelled tagging, or an automated approach using dictionaries? Some context on our business: we sell consumer electronics, home, and FMCG products, with a catalog of 1.5M unique products.
  • Sure, without the query concept you can't find anything.

Better Search Through Query Understanding

  1. better search through query understanding. Daniel Tunkelang, Head, Query Understanding
  2. overview: query understanding: what is it?; how we do query understanding at LinkedIn; some other thoughts from search in the wild. what I’m not going to cover:
  3. bird’s-eye view of how a search engine works. user: information need, query, select from results. system: rank using IR model (tf-idf, PageRank)
  4. query understanding. user: information need, query, select from results. system: rank using IR model (tf-idf, PageRank)
  5. search is a communication problem
  6. tag: skill OR title; related skills: search, ranking, …; tag: company; id: 1337; industry: internet; verticals: people, jobs; intent: exploratory
  7. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  8. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  9. spelling correction: fix obvious typos; help users spell names
  10. spelling out the details. PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES. n-grams: marissa => ma ar ri is ss sa. metaphone: mark/marc => MRK. co-occurrence counts: marissa:mayer = 1000. example query: marisa meyer yahoo; terms: marissa, marisa, meyer, mayer, yahoo
  11. spelling out the details. problem: corpus as well as query logs contain many spelling errors; certain spelling errors are quite frequent, while genuine words (especially names) might be infrequent
  12. spelling out the details. problem: corpus & query logs contain spelling errors. solution: use query chains to infer correct spelling: [product manger] → [product manager] CLICK; [marissa mayer] CLICK
  13. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  14. query tagging: identifying entities in the query. tags: TITLE, CO, GEO. TITLE-237: software engineer, software developer, programmer, … CO-1441: Google Inc., Industry: Internet. GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W. (RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
  15. query tagging: identifying entities in the query. TITLE, CO, GEO: MORE PRECISE MATCHING WITH DOCUMENTS
  16. entity-based filtering: BEFORE
  17. entity-based filtering: AFTER, BEFORE
  18. entity-based filtering: BEFORE
  19. entity-based filtering: AFTER, BEFORE
  20. entity-based suggestions
  21. entity-based suggestions
  22. query tagging: sequential model. TRAINING: EMISSION PROBABILITIES (learned from user profiles); TRANSITION PROBABILITIES (learned from query logs)
  23. query tagging: sequential model. INFERENCE: given a query, find the most likely sequence of tags
  24. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  25. vertical intent prediction: distribution. JOBS, PEOPLE, COMPANIES (probability distribution over verticals)
  26. vertical intent prediction: relevance. [company], [employees], [jobs], [name search]
  27. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  28. query expansion: name synonyms
  29. query expansion: job title synonyms
  30. query expansion: signals. trained using query chains: [jon] → [jonathan] CLICK; [programmer] → [developer] CLICK; [software engineer] → [software developer] CLICK. symmetric but not transitive! [francis] ⇔ [frank]; [franklin] ⇔ [frank]; [francis] ≠ [franklin]. context based! [software engineer] => [software developer]; [civil engineer] ≠ [civil developer]
  31. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  32. what else can we learn from search in the wild?
  33. don’t guess when it’s better to ask
  34. clarify then refine: computers, books
  35. give users transparency, guidance, and control
  36. think beyond individual search queries (Gene Golovchinsky, FXPAL)
  37. know when you don’t know (Claudia Hauff, Query Difficulty for Digital Libraries [2009])
  38. Daniel Tunkelang, dtunkelang@linkedin.com, https://linkedin.com/in/dtunkelang
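
Slides 10 through 12 mention character n-grams and the use of query chains to infer correct spellings. The following is a minimal Python sketch of that idea, not LinkedIn's implementation. The session format, the bigram-overlap threshold, and the function names (`char_bigrams`, `bigram_overlap`, `learn_corrections`, `suggest`) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def char_bigrams(word):
    """Split a string into overlapping character bigrams (slide 10)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_overlap(a, b):
    """Jaccard similarity between the bigram sets of two strings."""
    set_a, set_b = set(char_bigrams(a)), set(char_bigrams(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def learn_corrections(sessions, min_overlap=0.5):
    """Count rewrites from an unclicked query to a clicked one.

    Each session is a list of (query, clicked) pairs in time order.
    A rewrite is counted when a query with no click is immediately
    followed by a similar-looking query that did get a click.
    The 0.5 threshold is an arbitrary, hypothetical choice.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if not clicked1 and clicked2 and q1 != q2:
                if bigram_overlap(q1, q2) >= min_overlap:
                    counts[q1][q2] += 1
    return counts


def suggest(query, counts):
    """Return the most frequent clicked rewrite, or the query itself."""
    if query in counts:
        return counts[query].most_common(1)[0][0]
    return query
```

Given a session such as `[("product manger", False), ("product manager", True)]`, `learn_corrections` records the rewrite, and `suggest("product manger", counts)` returns the clicked query. Relying on clicks rather than raw frequency is what lets an infrequent but genuine name survive while a frequent misspelling gets corrected.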
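
Slides 22 and 23 describe query tagging as a sequential model with emission and transition probabilities, where inference finds the most likely tag sequence. The slides do not name the inference algorithm; the sketch below assumes a hidden Markov model decoded with the Viterbi algorithm, which is one standard way to do this. Every probability in the tables is invented for illustration, and the tag set is limited to the three tags shown on slide 14.

```python
import math

TAGS = ["TITLE", "CO", "GEO"]

# Hypothetical probabilities, invented for this example only.
START = {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2}
TRANSITION = {
    "TITLE": {"TITLE": 0.6, "CO": 0.3, "GEO": 0.1},
    "CO": {"TITLE": 0.2, "CO": 0.4, "GEO": 0.4},
    "GEO": {"TITLE": 0.2, "CO": 0.2, "GEO": 0.6},
}
EMISSION = {
    "TITLE": {"software": 0.4, "engineer": 0.5},
    "CO": {"google": 0.8, "software": 0.05},
    "GEO": {"new": 0.4, "york": 0.5},
}
UNSEEN = 1e-6  # floor for words a tag was never observed to emit


def viterbi(tokens):
    """Return the most likely tag sequence for a list of lowercase tokens."""
    if not tokens:
        return []

    def emit(tag, word):
        return math.log(EMISSION[tag].get(word, UNSEEN))

    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (math.log(START[t]) + emit(t, tokens[0]), None) for t in TAGS}]
    for word in tokens[1:]:
        column = {}
        for tag in TAGS:
            score, prev = max(
                (best[-1][p][0] + math.log(TRANSITION[p][tag]), p) for p in TAGS
            )
            column[tag] = (score + emit(tag, word), prev)
        best.append(column)

    # Trace back from the best final tag to recover the full path.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))
```

With these made-up tables, `viterbi(["software", "engineer", "google", "new", "york"])` tags the first two tokens as TITLE, the third as CO, and the last two as GEO. In a real system the emission table would be estimated from profile fields and the transition table from query logs, as slide 22 states.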
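
Slide 30 notes that synonyms used for query expansion are symmetric but not transitive, and that they depend on context. A small Python sketch of a lookup table with those two properties follows. The `SynonymTable` class is hypothetical; matching whole phrases instead of single words is one assumed way to keep the context, and the slides do not say how this is done in practice.

```python
from collections import defaultdict


class SynonymTable:
    """Pairwise synonyms: symmetric, deliberately not transitive."""

    def __init__(self):
        self._pairs = defaultdict(set)

    def add(self, a, b):
        # Store both directions so that lookup is symmetric.
        self._pairs[a].add(b)
        self._pairs[b].add(a)

    def expand(self, term):
        # Return only direct neighbours: no transitive closure is computed.
        return {term} | self._pairs.get(term, set())


# Name synonyms taken from slide 30.
names = SynonymTable()
names.add("francis", "frank")
names.add("franklin", "frank")

# Title synonyms are stored as whole phrases, so "engineer" alone is never
# rewritten to "developer" outside the "software" context.
titles = SynonymTable()
titles.add("software engineer", "software developer")
```

Here `names.expand("frank")` includes both "francis" and "franklin", while `names.expand("francis")` includes "frank" but not "franklin". Likewise `titles.expand("civil engineer")` returns only the original phrase, so [civil engineer] is never expanded to [civil developer].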
