Better Search Through Query Understanding

Presented as a Data Talk at Intuit on April 22, 2014

Search is a fundamental problem of our time — we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.

About the Speaker

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Published in: Technology, Business
  • Thanks for sharing. This is a great, clean, and neat presentation. One question, though: did you use an HMM or some other specific technique for sequence modelling? I am looking to implement a query tagger to better understand our user queries, and I am wondering how you went about creating the training set. Was it labor-intensive human labelling, or some automated approach using dictionaries? Some context on our business: we sell consumer electronics, home, and FMCG products, with a catalog of 1.5M unique products.
  • Sure, without the query concept you can't find anything.

Better Search Through Query Understanding

  1. Recruiting Solutions. Daniel Tunkelang, Head, Query Understanding. better search through query understanding
  2. overview: (1) query understanding: what is it? (2) how we do query understanding at LinkedIn (3) some other thoughts from search in the wild. what I’m not going to cover:
  3. bird’s-eye view of how a search engine works. user: information need, query, select from results. system: rank using IR model (tf-idf, PageRank)
  4. query understanding. user: information need, query, select from results. system: rank using IR model (tf-idf, PageRank)
  5. search is a communication problem
  6. tag: skill OR title; related skills: search, ranking, … tag: company; id: 1337; industry: internet. verticals: people, jobs. intent: exploratory
  7. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  8. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  9. spelling correction: fix obvious typos; help users spell names
  10. spelling out the details. PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES. n-grams: marissa => ma ar ri is ss sa. metaphone: mark/marc => MRK. co-occurrence counts: marissa:mayer = 1000. marisa meyer yahoo: marissa / marisa, meyer / mayer, yahoo
  11. spelling out the details. problem: corpus as well as query logs contain many spelling errors; certain spelling errors are quite frequent, while genuine words (especially names) might be infrequent
  12. spelling out the details. problem: corpus & query logs contain spelling errors. solution: use query chains to infer correct spelling: [product manger] [product manager] CLICK; [marissa mayer] CLICK
  13. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  14. query tagging: identifying entities in the query. TITLE, CO, GEO. TITLE-237: software engineer, software developer, programmer, … CO-1441: Google Inc., Industry: Internet. GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W. (RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
  15. query tagging: identifying entities in the query. TITLE, CO, GEO. MORE PRECISE MATCHING WITH DOCUMENTS
  16. entity-based filtering: BEFORE
  17. entity-based filtering: AFTER, BEFORE
  18. entity-based filtering: BEFORE
  19. entity-based filtering: AFTER, BEFORE
  20. entity-based suggestions
  21. entity-based suggestions
  22. query tagging: sequential model. TRAINING: EMISSION PROBABILITIES (learned from user profiles); TRANSITION PROBABILITIES (learned from query logs)
  23. query tagging: sequential model. INFERENCE: given a query, find the most likely sequence of tags
  24. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  25. vertical intent prediction: distribution. JOBS, PEOPLE, COMPANIES (probability distribution over verticals)
  26. vertical intent prediction: relevance. [company] [employees] [jobs] [name search]
  27. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  28. query expansion: name synonyms
  29. query expansion: job title synonyms
  30. query expansion: signals. trained using query chains: [jon] [jonathan] CLICK; [programmer] [developer] CLICK. symmetric but not transitive! [francis] ⇔ [frank]; [franklin] ⇔ [frank]; [francis] ≠ [franklin]. [software engineer] [software developer] CLICK. context based! [software engineer] => [software developer]; [civil engineer] ≠ [civil developer]
  31. query understanding pipeline: raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
  32. what else can we learn from search in the wild?
  33. don’t guess when it’s better to ask
  34. clarify then refine: computers, books
  35. give users transparency, guidance, and control
  36. think beyond individual search queries (Gene Golovchinsky, FXPAL)
  37. know when you don’t know (Claudia Hauff, Query Difficulty for Digital Libraries [2009])
  38. Daniel Tunkelang, dtunkelang@linkedin.com, https://linkedin.com/in/dtunkelang
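
Slide 10 lists character n-grams (marissa => ma ar ri is ss sa) as one of the spellcheck signals. The sketch below shows one way such bigrams could be used to pick a correction candidate. It is an illustration only: the Jaccard overlap score, the function names, and the tiny dictionary are assumptions and are not taken from the talk.

```python
def bigrams(word):
    """Split a word into overlapping character bigrams, as on slide 10."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_similarity(a, b):
    """Jaccard overlap of the two bigram sets (an assumed scoring choice)."""
    set_a, set_b = set(bigrams(a)), set(bigrams(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_candidate(query_term, dictionary):
    """Return the dictionary entry whose bigrams overlap most with the query term."""
    return max(dictionary, key=lambda entry: bigram_similarity(query_term, entry))


# Hypothetical dictionary; the talk's sources are people names, companies,
# titles, and past queries.
names = ["marissa", "mark", "melissa"]
print(bigrams("marissa"))
print(best_candidate("marisa", names))
```

A production spellchecker would presumably combine this with the other signals on the slide (metaphone keys, co-occurrence counts) rather than rely on bigram overlap alone.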
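
Slides 22 and 23 describe the query tagger as a sequential model with emission and transition probabilities, and inference as finding the most likely tag sequence. Those are the ingredients of a hidden Markov model decoded with the Viterbi algorithm, so the sketch below assumes that setup; the talk itself does not name the algorithm. All probabilities, the three-tag set, and the floor for unseen words are made-up values for illustration.

```python
import math

# Hypothetical toy model; none of these numbers come from the talk.
TAGS = ["TITLE", "CO", "GEO"]
START = {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2}
TRANSITION = {  # P(next tag | current tag); learned from query logs in the talk
    "TITLE": {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2},
    "CO": {"TITLE": 0.2, "CO": 0.4, "GEO": 0.4},
    "GEO": {"TITLE": 0.2, "CO": 0.2, "GEO": 0.6},
}
EMISSION = {  # P(word | tag); learned from user profiles in the talk
    "TITLE": {"software": 0.4, "engineer": 0.5},
    "CO": {"google": 0.8, "software": 0.05},
    "GEO": {"boston": 0.7},
}
UNSEEN = 1e-6  # assumed floor probability for words a tag never emitted


def emit(tag, word):
    return math.log(EMISSION[tag].get(word, UNSEEN))


def viterbi(words):
    """Return the most likely tag sequence for a tokenized query."""
    if not words:
        return []
    # Each column maps tag -> (best log-probability so far, previous tag).
    columns = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        prev_col = columns[-1]
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: prev_col[p][0] + math.log(TRANSITION[p][tag]))
            score = prev_col[prev][0] + math.log(TRANSITION[prev][tag]) + emit(tag, word)
            column[tag] = (score, prev)
        columns.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("software engineer google boston".split()))
```

Working in log space avoids numeric underflow on longer queries. With this toy model, the example query is tagged TITLE, TITLE, CO, GEO.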
